frightanic
New Member
Posts: 35
Raspberry Pi: Yes
Other Device: OurWeather (ESP8266), lots of ESP8266/ESP32 modules
|
Post by frightanic on Mar 21, 2018 13:38:38 GMT -8
Note: this was branched off from forum.switchdoc.com/thread/466/ourweather-unit-freezes-randomly?page=3
That's why I was a bit irritated that the other thread was "hijacked" for something else. Anyone else following along now has to follow all the twists and turns. I acknowledge, though, that it may not have been clear from the beginning that it's a different issue.
Uuhhm, no, not really. I'm sorry if it's confusing but I'm still on the temporary AP out on the patio. So, I ran the data gathering spike against a stable system. Very good point about "sending packets to the device may keep the wifi in place longer" - in theory. I'll run the same test without the temp AP in place just to see if it makes any difference in practice (I doubt it).
When you asked to turn off data gathering, did you do so on the hypothesis that there might be a memory leak in the software which would cause the system to freeze? I did consider that, but with the underlying ESP8266 SDK, running out of heap or failing to feed the watchdog makes the device reset/reboot rather than freeze.
|
|
|
Post by SDL on Mar 22, 2018 8:05:15 GMT -8
Frightanic,
Tomorrow, we are looking at the V029 code (moving to V030) to add some code that will detect a connection loss and re-establish the connection. I'll want you to test it out.
Note: I moved this to a new thread so I could separate the two problems.
I asked to turn off data gathering to isolate the problem. Remember that some browsers with BHOs (and possibly some data retrieval software) can hang the REST interface.
This remote debugging is frustrating for the user, but imagine how it is for me when I don't have complete information! So I always ask lots of questions and try to isolate the problem so we can duplicate it. Then it is almost always easy to fix.
Best regards,
BP
|
|
Post by frightanic on Mar 22, 2018 8:25:20 GMT -8
Thanks, looking forward to new code. I've been monitoring github.com/switchdoclabs/OurWeatherWeatherPlus for new stuff ever since I discovered this issue. The "REST interface hanging" situation is clear, the forum threads about that are hard to miss. As the OLED should continue to update in that case I knew it must be something else (remember mine freezes). I appreciate your support and fully understand your situation. I often find myself in your shoes for the open-source projects I maintain.
|
|
|
Post by SDL on Mar 24, 2018 15:09:40 GMT -8
Frightanic,
I have put V030 up on github.com/switchdoclabs/OurWeatherWeatherPlus which should resolve this issue. I'm not releasing it to the Auto Update server until I hear from you that the problem has been resolved. It works for us: we tested by connecting, then turning off the AP for 10 minutes and turning it back on.
Best regards,
BP
|
|
Post by frightanic on Mar 25, 2018 23:45:59 GMT -8
The good news is that v030 works, i.e. it doesn't break anything AFAICT. The bad news is that it doesn't fix my problem.
Test procedure:
- OurWeather out on patio
- switch back to original AP (i.e. the temporary one on the patio turned off)
- OurWeather hooked up through FTDI
- serial monitor in Arduino IDE running
- no continuous data gathering software, i.e. no hits against the REST API
After an arbitrary time (can be less than 10min) the system freezes as reported (no blinking white LED, no OLED updates, REST requests time out).
BUT it's still pingable in this state!
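For anyone who wants to catch the moment a unit enters this "pingable but frozen" state, a small host-side probe can log it. This is only a rough sketch under assumptions: it relies on the system `ping` command with the Linux/macOS `-c` flag, it requests the root path `/` (any HTTP response at all counts as "answering"), and the state names are made up for illustration. Note that polling the HTTP server changes the "no hits against the device" test conditions, so it is not a substitute for the procedure above.

```python
import http.client
import subprocess
import sys
import time
import urllib.error
import urllib.request


def is_pingable(host, timeout_s=3.0):
    """Send one ICMP echo request via the system `ping` (assumes Linux/macOS `-c`)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def http_answers(url, timeout_s=5.0):
    """Return True if the URL yields any HTTP response within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def classify(pingable, http_ok):
    """Map the two probe results to an illustrative state name."""
    if http_ok:
        return "ok"
    if pingable:
        return "frozen-but-pingable"
    return "unreachable"


if __name__ == "__main__" and len(sys.argv) == 2:
    host = sys.argv[1]  # device IP or hostname, supplied by the user
    while True:
        state = classify(is_pingable(host), http_answers(f"http://{host}/"))
        print(time.strftime("%Y-%m-%d %H:%M:%S"), state, flush=True)
        time.sleep(60)
```

Run it with the device address as the only argument; the first "frozen-but-pingable" line gives an approximate freeze time to line up against the serial output.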
Attached you'll find the (redacted) serial report. All seems well and the heap is absolutely stable at around 30k.
What is missing from the report are the last two lines. I don't know why they were not copy pasted from Arduino to Vim. There were just two numbers on two separate lines; no prefix, no suffix - just numbers. The second might have been the heap bytes. I vaguely remember it to be 30xxx or 300xxx something.
Attachments: ourweather.txt (29.56 KB)
|
|
|
Post by SDL on Mar 26, 2018 17:50:27 GMT -8
Hi Frightanic,
I still think that V030 will fix the loss of signal problem, but now you are looking more like Eduardo.
Please run it again until it freezes and then grab those last numbers (and about 100 lines above them). Let's start digging into this.
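One way to avoid losing the final lines to copy and paste is to write the serial output to a file as it arrives. The following is a minimal sketch, not a tested tool: it assumes some other program pipes the device's serial output into the script's standard input, and the function names and the log file argument are purely illustrative.

```python
import sys
from collections import deque
from datetime import datetime


def stamp(line, now):
    """Prefix one serial line with a wall-clock time and strip the line ending."""
    return f"{now.strftime('%H:%M:%S')} {line.rstrip()}"


def last_lines(lines, count=100):
    """Return the final `count` items of an iterable without keeping the rest."""
    return list(deque(lines, maxlen=count))


def capture(lines, out):
    """Copy every incoming line to `out`, flushing so nothing is lost on a freeze."""
    for line in lines:
        out.write(stamp(line, datetime.now()) + "\n")
        out.flush()


if __name__ == "__main__" and len(sys.argv) == 2:
    log_path = sys.argv[1]  # log file name chosen by the user
    try:
        with open(log_path, "a", encoding="utf-8") as log_file:
            capture(sys.stdin, log_file)
    except KeyboardInterrupt:
        pass  # stop capturing once the freeze has been observed
    with open(log_path, encoding="utf-8") as log_file:
        print("\n".join(last_lines(log_file, 100)))
```

Because every line is flushed to disk immediately, the last two numbers before a freeze end up in the file even if the capture is interrupted, and the final 100 lines are printed when it stops.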
Are you using the Weather Underground interface? I'm seeing failures. I wonder if that is related. Maybe....
BP
|
|
|
Post by SDL on Mar 26, 2018 17:58:15 GMT -8
BTW, my thinking is that maybe all these WeatherUnderground failures are screwing with the web software. One difference in my system here is that I am transmitting to WeatherUnderground all the time. If you aren't using WeatherUnderground, could you set up a station for an experiment? It's free and pretty easy to do. See the WeatherUnderground manual on shop.switchdoc.com/products/ourweather-complete-weather-kit under downloads.
I'm also working on the Thunderboard Interface for OurWeather, but before we release that, we would like to get this fixed.
BP
|
|
Post by frightanic on Mar 26, 2018 21:11:07 GMT -8
Yes, the code looks sane and it would reconnect to the AP. However, we were never able to establish that the cause must be a loss of signal. There were just indications pointing in that direction.
Yep, I'll do that when I get around to it.
I ran another test last night, on the hypothesis that this is NOT a WiFi issue. I took a NodeMCU 1.0 (aka v2) module which has an ESP-12 on it just like your weather kit. Then I flashed the NodeMCU firmware and ran a simple HTTP server on it, without any WiFi-reconnect logic or the like. It is connected to the same AP and hasn't stuttered once in the last 24h.
Slowly but surely this is starting to look like faulty hardware to me.
|
|
|
Post by SDL on Mar 27, 2018 9:51:30 GMT -8
Your second test is pretty interesting. For it to be a valid one, we need to have the same external hardware and the same software hooked up to it. I still think this is a software problem (maybe in an ESP8266 library).
It would be interesting for you to take the same software that you used on your NodeMCU and then run it on the WeatherPlus ESP8266. Then we have an apples to apples test.
I am taking one of our units and setting it up so that the WeatherUnderground connection fails, to see what happens.
I'm liking that we are starting to get some real data on what this might be.
BP
|
|
Post by frightanic on Mar 27, 2018 10:13:46 GMT -8
Yep, I had planned that for tonight or tomorrow. Likewise I'll try to flash OurWeather onto the naked NodeMCU v2 and see what happens.
|
|
Post by frightanic on Mar 28, 2018 11:10:15 GMT -8
The results are in but...they're inconclusive for me.
I had two devices out on the patio connected to the AP inside the house.
- device 1: naked NodeMCU v2 module on a breadboard running OurWeather v30
- device 2: fully assembled WeatherPlus Kit including all sensors, running NodeMCU firmware and a small Lua script that provides an HTTP endpoint. Of course, neither access to the sensors nor the OLED is integrated into the software.
Both have been running for over 24h without a hiccup or freeze.
|
|
|
Post by SDL on Mar 28, 2018 14:10:40 GMT -8
Frightanic,
The results are in but...they're inconclusive for me.
I had two devices out on the patio connected to the AP inside the house. - device 1: naked NodeMCU v2 module on a breadboard running OurWeather v30
And device 1 is failing logging into WeatherUnderground?
Try adding an AM2315 to the NodeMCU v2 on the I2C bus. I'm wondering if V28/V29 did something.
- device 2: fully assembled WeatherPlus Kit including all sensors running NodeMCU firmware and a small Lua script that provides an HTTP endpoint. Of course, neither access to sensors nor OLED are integrated into the software.
And we are not using the AM2315 in device 2.
So, it looks like we have a software problem, but it could be in relation to some piece of hardware.
Best,
BP
|
|
Post by frightanic on Apr 2, 2018 12:40:25 GMT -8
I made yet another effort over the Easter weekend. Flashed v31 to the WeatherPlus board and created a Wunderground account to let OurWeather publish data there. That worked for 8h until the system completely froze again. Then I opened the outdoor case for the umpteenth time, disconnected OLED & AM2315 and rebooted. 2h later the system was frozen again. So, I also unplugged the wind & rain plugs and rebooted the nearly naked system. Again, it stalled shortly afterwards. As reported earlier, in all cases the device was still pingable. However, it didn't upload data to WU when in the frozen state. Hardware problems with any of the sensors can now definitely be ruled out.
Sorry, I'm done with this. I won't start to rebuild OurWeather on the NodeMCU stack (adding AM2315 et al.) just to figure out what the problem could be. I've already invested way too many hours in debugging and analysis for what is essentially an off-the-shelf product, Kickstarter or not. Of course it's not just me who's frustrated, disappointed and annoyed but primarily my daughter, who I built the weather station with and for. First the dead USB cable in the package, then the hassle/failure with the USB connector and now this.
I don't think I'm asking too much when I expect that a replacement WeatherPlus board is going to be dispatched to me soon, right? I'm backer #9.
|
|
|
Post by SDL on Apr 2, 2018 13:09:06 GMT -8
Frightanic,
How totally bizarre. I hate non-repeatable problems. We (and most other people) are not having this issue.
So, maybe you have a bad ESP8266. Send it back in and we will replace it (and get around to testing it).
Send it to:
SwitchDoc Labs RMA#040218-02
909 S Shelley Lake Lane
Spokane Valley, WA 99037
Best regards,
BP
|
|
Post by frightanic on Apr 15, 2018 13:57:48 GMT -8
I had my board packaged and ready to be dropped off at the post office tomorrow but... Thanks to a support case with one of our own products over the weekend I may have discovered another - or a different - smoking gun here.
A customer reported that his device would lock up randomly after a while during HTTP transmission. Much to my surprise I was able to reproduce it with my own test device (same model). Debug statements around the code in question pointed to some WiFi issue. Some digging in the ESP8266 Arduino core issues list revealed: github.com/esp8266/Arduino/issues/4522. It's described as a regression against the WiFi stack in 2.4.1 which apparently fails with certain APs.
I left a comment there, downgraded to ESP8266 Arduino core 2.3.0 (Boards Manager) and re-flashed OurWeather. I'll leave it outside for the night with no sensor but just the RTC connected. Preliminary results should be in by tomorrow morning.
|
|