|
Post by juliwood on Dec 30, 2018 2:13:07 GMT -8
Hi, I also have this freeze problem with my board. After reading through the forum, I have been running some tests and checking what comes out of the board. Here is what I have seen so far, in case it helps to solve this.

There seem to be two different “freezing” situations, at least on my board. The onset is quite random: sometimes it starts after 10 minutes, other times after a few hours.

When the “freeze” starts, humidity values appear as NaN% and the temperature value freezes. Sometimes the rest of the board seems to work fine, and wind, pressure, etc. are measured correctly (this can be seen in the attached Weather Underground screenshots). But at some point the board freezes completely and stops sending data to Weather Underground. The “ourweather” website also stops responding, and even the LCD screen seems to freeze at this “second stage”. I do not think this second stage always happens, though.

I am currently using v035. I upgraded using the “update” option, and then also uploaded the same version using the Arduino IDE to see if anything changed, but it had no effect. Uploading code and reading through the serial port both work fine. Before this, I tested the board with the previous software (v034), with the same results.

I log what comes out of the serial port, and I can also see there when it “freezes”. In the log, temperature stays at the same value, humidity goes to “nan”, and the wind values also freeze (unlike what Weather Underground shows). In contrast, readings from the pressure sensor (pressure and temperature) change from one reading to the next. Blynk shows something similar: pressure values look OK, but temperature is constant and humidity is gone. Wind direction and speed are also frozen.

The OurWeather web page responds at first, but shows the same values all the time. I have checked with two different browsers, with the same results. After some hours it stops responding.

I have also noticed that I have started getting random rainfall values, but I think this may be unrelated and that something may be wrong with the rain gauge, since the fake rainfall disappears if I unplug the rain gauge.

Is there anything I can test to try to fix this? Just let me know. I do not know what else to try (besides switching it on/off every 5 minutes). I attach a couple of screenshots from Weather Underground and the log from the serial port. Thanks!

Attachments: error.log (81.38 KB)
|
|
|
Post by SDL on Dec 30, 2018 9:22:58 GMT -8
Here are my summaries of both freezes:
1) The full lockup scenario - caused by some interaction between the underlying WiFi operating system of the ESP8266 and the REST interface (which obviously uses the WiFi). You can prove this by reducing the frequency of your REST calls to the OurWeather system: the frequency of the lockups goes down. We are looking for (and have not found at this point) software solutions to this problem. The typical solution of locking the rest of the system out while servicing REST calls may not work.
There seem to be two parts to this problem. One is as described above, but the second is caused by what browser or system you are using to make the REST calls. There is not enough data to tell you exactly what to do, but some users suggest, with some data, that BHO (browser add-on software) is causing this problem in browsers.
2) The AM2315 problem - It has been dramatically improved, but not solved. It only seems to affect some users now. I think it is power and noise related. It could be a 5V/3.3V issue: the Raspberry Pi with Pi2Grover seems to have fewer problems (5V versus 3.3V). I know a way to solve the lockup problem, but it requires some additional hardware. This seems to be related to the AM2315 itself and not to OurWeather. We are looking at updating the OurWeather hardware base board to solve the lockup on the AM2315, and we can also supply the hardware for you to do it yourself. Grove PowerSave is the off-the-shelf solution. A key piece of information here is that when the AM2315 fails or locks up, power cycling it always fixes it.
This is a problem that has been devilishly difficult for us to duplicate in the Labs. And there is data in that fact too!
3) The rain issue is related to noise on your interrupt line. Again, it doesn't affect everyone because it is environmentally related. Adding an additional 5K ohm pullup resistor on the rain interrupt line should improve this.
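One way to act on point 1 from the client side is to enforce a minimum interval between REST calls. The following is only a minimal sketch, assuming a polling client written in C++; the `PollThrottle` class and any interval you pass to it are illustrative and not part of the OurWeather software.

```cpp
#include <cstdint>

// Hypothetical helper (not from the OurWeather code base): allows a REST
// call only if at least minIntervalMs have elapsed since the last allowed one.
class PollThrottle {
 public:
  explicit PollThrottle(std::uint64_t minIntervalMs)
      : minIntervalMs_(minIntervalMs) {}

  // nowMs is a monotonic timestamp in milliseconds supplied by the caller.
  // Returns true (and records the call) when polling is allowed.
  bool tryAcquire(std::uint64_t nowMs) {
    if (hasPolled_ && nowMs - lastPollMs_ < minIntervalMs_) {
      return false;  // too soon since the last poll, skip this one
    }
    lastPollMs_ = nowMs;
    hasPolled_ = true;
    return true;
  }

 private:
  std::uint64_t minIntervalMs_;
  std::uint64_t lastPollMs_ = 0;
  bool hasPolled_ = false;
};
```

A client would call `tryAcquire` before each request and simply skip the request when it returns false. Lengthening the interval is then a one-line change, which makes it easy to check whether the lockup frequency drops as described.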
Thank you for your well thought out and documented reply. I hope this information is useful to you!
Best regards,
BP
|
|
perrito
New Member
Posts: 22
Raspberry Pi: Yes
|
Post by perrito on Jan 2, 2019 12:29:09 GMT -8
Hey, let me chime in here, and sorry if I am too far off the mark. This started for me when I upgraded from 021 to 035 (I know, I really procrastinate sometimes). Along with the lock-up came a significant increase in response time from the particular endpoint I am querying (/). I presume some more work has been added to the micro, and therefore there is not enough juice for the temp reading (which would make a lot of sense).
I will try making a series of calls to the individual endpoints instead of one big one, to see if that eases the burden and stops the issue, and will report back.
Another idea I have: could I just feed 5V to the AM2315 and put a voltage divider on the I2C wire to bring it back down to 3.3V? That way I would ensure a constant power supply (sorry, I am completely unfamiliar with how I2C communication works, but I assume it is a one-wire protocol and that the other wires are only supply).
|
|
|
Post by SDL on Jan 2, 2019 15:31:39 GMT -8
Perrito, we appreciate your information and efforts. This has been a nasty, nasty problem and we are just starting to see a pattern in the very sparse data. Where is my machine learning program in TensorFlow when I need it? Yes, you could put 5V on that line and use a voltage divider on the SDA and SCL lines coming back to the machine. You could also use this board: shop.switchdoc.com/products/3-3v-to-5v-i2c-voltage-translator-and-i2c-hub
BP
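As a quick check of the voltage-divider idea, the output of an unloaded resistive divider is Vout = Vin * Rbottom / (Rtop + Rbottom). The sketch below just evaluates that formula. The function name and the resistor values in the usage note are illustrative assumptions, not values recommended in this thread, and the calculation ignores the I2C pull-ups and any bus loading.

```cpp
// Output voltage of an unloaded resistive divider.
// rTopOhms sits between the input and the tap; rBottomOhms between tap and ground.
double dividerOutputVolts(double vinVolts, double rTopOhms, double rBottomOhms) {
  return vinVolts * rBottomOhms / (rTopOhms + rBottomOhms);
}
```

For example, `dividerOutputVolts(5.0, 10000.0, 20000.0)` gives roughly 3.33 V. Keep in mind that SDA is bidirectional, so a plain divider only scales the signal travelling toward the 3.3V side, which is one reason a dedicated translator board may be the simpler option.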
|
|
|
Post by perrito on Jan 2, 2019 17:19:19 GMT -8
OK, I tried querying the individual values and it eventually (in under 3 hours) came to the same result, so I am not sure this is a power drain issue. Still, I have a couple of theories.

On the hardware side: I suspect (a lot of guesswork here) that whatever is causing the power drain happens periodically, and that when a read from the sensor and the draining event happen to collide, the AM2315 hangs. From what I have gathered by reading the specs for this sensor and past issues, you should be able to repro this by either:
1) turning off another sensor on the bus before this one (this is most likely not it)
2) producing an artificial power drain while reading the data

On the software side: I suggest reverting this commit. I was not able to read it as closely as I would have liked because GH is misbehaving today, and I still need to do a side-by-side diff when I get the chance, but I see some potential culprits there. I am especially suspicious of the changes around `SDL_ESP8266_HR_AM2315::readData`. I am willing to push another binary to the device without that rev, but it will have to wait until the weekend because I need to climb onto my roof to do it (juliwood, if you feel inclined to test it, I would be super thankful not to have to reprogram something on the roof at ~30C).

BTW, is there a way to cut power to the whole I2C bus/card via software?

Sorry if I sound pompous; I am sure much of my politeness wore off in translation to English.
|
|
|
Post by juliwood on Jan 4, 2019 8:23:28 GMT -8
Hi,
My knowledge of both electronics and programming is quite basic. However, I would be happy to help: if it is something like uploading new code, seeing if it works, etc., I can do it.
Also, my board is currently indoors, so it is quite accessible (but without real rainfall or wind).
|
|
|
Post by perrito on Jan 7, 2019 8:32:39 GMT -8
OK, I think the best solution here is for me to go to my board and upload a version that I suspect works (that way we risk nothing but my board with my crazy ideas). SDL, are there clear instructions on how to make the Arduino IDE compile the project properly for my card? I am uncertain which board to select and whether there are other settings to tweak (shame on the Arduino IDE for not packing those into the .ino file).
|
|
|
Post by SDL on Jan 8, 2019 20:28:57 GMT -8
Perrito, you don't sound pompous. I appreciate your remarks. I don't think reverting that code is a good idea. It works for us and reports the issues found. I don't mean to discount your findings, I just have to duplicate things to fix them! Take a look in the advanced programming manual to learn how to program the OurWeather WeatherPlus board. FYI, it emulates an Adafruit Huzzah board and is programmed with an FTDI cable. Here is a way to shut off the AM2315 with software control: shop.switchdoc.com/products/grovepowersave-control-grove-device-power-with-your-computer-perfect-for-solar-power
BP
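Since power cycling the AM2315 reportedly always clears the lockup, a software-controlled power switch could in principle be driven by a small recovery policy. The sketch below is a hypothetical illustration of such a policy, not code from the OurWeather firmware: the `PowerCycler` class, its thresholds, and the idea of feeding it one result per read attempt are all assumptions.

```cpp
#include <cstdint>

// Hypothetical recovery policy: after maxBadReads consecutive failed
// readings, ask for sensor power to be held off for offTimeMs.
class PowerCycler {
 public:
  PowerCycler(int maxBadReads, std::uint32_t offTimeMs)
      : maxBadReads_(maxBadReads), offTimeMs_(offTimeMs) {}

  // Report the outcome of one read attempt made at time nowMs.
  void reportRead(bool ok, std::uint32_t nowMs) {
    if (poweredOff_) return;  // no reads are expected while power is off
    if (ok) {
      badReads_ = 0;
      return;
    }
    if (++badReads_ >= maxBadReads_) {
      poweredOff_ = true;  // start a power-off period
      offSinceMs_ = nowMs;
      badReads_ = 0;
    }
  }

  // Desired switch state at time nowMs; true means the sensor is powered.
  bool powerShouldBeOn(std::uint32_t nowMs) {
    if (poweredOff_ && nowMs - offSinceMs_ >= offTimeMs_) {
      poweredOff_ = false;  // off period has elapsed, restore power
    }
    return !poweredOff_;
  }

 private:
  int maxBadReads_;
  std::uint32_t offTimeMs_;
  int badReads_ = 0;
  bool poweredOff_ = false;
  std::uint32_t offSinceMs_ = 0;
};
```

In an Arduino-style loop, the value returned by `powerShouldBeOn` would be written to whichever pin controls the power switch, and sensor reads would be skipped while it is false. The pin wiring and timing values would depend on the actual hardware.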
|
|
|
Post by perrito on Jan 9, 2019 3:00:56 GMT -8
While I fully acknowledge that this code works for you, there evidently is a software issue at play here: it all began with a software upgrade for all of us. It is evidently a corner case, since it has proven hard to reproduce in lab conditions. I am not saying the software is wrong, just that it triggers the conditions for this to happen. I'll try to get hold of an FTDI cable so I can try a git bisect on this issue, since I am in a position to reproduce it. Are there any board details that I could send so you can pinpoint the exact hardware configuration I am using, to try to repro? Sadly, I do not live in a place where your products are available, but I'll make sure to get a few accessories next time I visit the US. Thanks for pointing me to the board for the reset.
|
|
|
Post by perrito on Jan 9, 2019 16:47:22 GMT -8
Turns out the FTDI board was bundled with the original kit (I love this kit), I am so happy. I built a custom version of 035 that excludes the commit I am suspicious of. One thing I noticed missing from the compiling docs: the Blynk library needs a manual install. Perhaps that should be added to the next revision of the docs. I'll keep you all posted.
|
|
|
Post by perrito on Jan 10, 2019 3:02:36 GMT -8
So, SDL, the flash seems to have been successful. I have been polling for 8 hours straight and nothing has failed (when the module freezes, this graph shows humidity at 0 and the temperature becomes stuck at the value it had when it froze). You can monitor that graph for some time to make sure it still works. The only thing I did was revert this commit.
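The freeze signature described in this thread (humidity reported as 0 or NaN while the temperature stays stuck at one value) can also be checked automatically on logged samples instead of by watching a graph. This is a minimal sketch under that assumption. The `Sample` struct, the `looksFrozen` function, and the window size are hypothetical, and a genuinely constant temperature together with a real 0% humidity reading would be misclassified.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One polled reading; the field names are illustrative.
struct Sample {
  double temperatureC;
  double humidityPct;
};

// Heuristic: returns true when the last `window` samples all have an
// invalid humidity (NaN or 0) and an identical temperature.
bool looksFrozen(const std::vector<Sample>& samples, std::size_t window) {
  if (window == 0 || samples.size() < window) return false;
  const std::size_t start = samples.size() - window;
  const double firstTemp = samples[start].temperatureC;
  for (std::size_t i = start; i < samples.size(); ++i) {
    const Sample& s = samples[i];
    const bool humidityInvalid =
        std::isnan(s.humidityPct) || s.humidityPct == 0.0;
    if (!humidityInvalid || s.temperatureC != firstTemp) return false;
  }
  return true;
}
```

A polling script could append each reading to the vector and raise an alert once `looksFrozen` returns true, which gives a timestamp for the onset of the freeze without anyone watching the output.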
|
|
|
Post by SDL on Jan 10, 2019 9:58:28 GMT -8
Perrito,
Now that is VERY interesting. Let's let it run a few days.
I wonder if it is the serial port usage in the commit that did this. Interrupts are funny things. When you have the released software loaded, is DEBUGPRINT defined?
BP
|
|
|
Post by perrito on Jan 10, 2019 10:25:31 GMT -8
Sorry SDL, sadly I cannot leave a serial monitor attached for debugging; this thing is in a remote part of my house (near the roof, under the water tank). From reviewing the code and seeing how it behaves, the problem (or at least what causes the visible symptoms) seems to be badcrc: that seems to be the only way we end up with this particular combination of values in these circumstances. From reviewing the code, the CRC calculation seems to have changed, so if I am correct, when it fails the first time it causes the sensor to be hit 8 times in a row in very quick succession, which might be what causes it to hang. Again, this is just an educated guess; sadly my whole system is fixed to the roof and it would be super hard for me to go and monitor it.
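To make the CRC theory concrete, here is a minimal sketch of a CRC check plus spaced-out retries. It assumes the sensor frame ends with a Modbus-style CRC-16 (initial value 0xFFFF, reflected polynomial 0xA001, low byte first), which should be verified against the AM2315 datasheet. The function names and the backoff policy are hypothetical and are not taken from the OurWeather driver.

```cpp
#include <cstddef>
#include <cstdint>

// CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001.
std::uint16_t crc16Modbus(const std::uint8_t* data, std::size_t len) {
  std::uint16_t crc = 0xFFFF;
  for (std::size_t i = 0; i < len; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      if (crc & 0x0001) {
        crc = static_cast<std::uint16_t>((crc >> 1) ^ 0xA001);
      } else {
        crc = static_cast<std::uint16_t>(crc >> 1);
      }
    }
  }
  return crc;
}

// Assumed frame layout: payload bytes followed by the CRC, low byte first.
bool frameCrcOk(const std::uint8_t* frame, std::size_t len) {
  if (len < 3) return false;  // need at least one payload byte plus the CRC
  const std::uint16_t expected =
      static_cast<std::uint16_t>(frame[len - 2] | (frame[len - 1] << 8));
  return crc16Modbus(frame, len - 2) == expected;
}

// Delay before retry number `attempt` (0-based): doubles each time, capped.
std::uint32_t retryDelayMs(unsigned attempt, std::uint32_t baseMs,
                           std::uint32_t capMs) {
  std::uint32_t delay = baseMs;
  for (unsigned i = 0; i < attempt; ++i) {
    if (delay > capMs / 2) return capMs;  // doubling would exceed the cap
    delay *= 2;
  }
  return delay < capMs ? delay : capMs;
}
```

The point of `retryDelayMs` is that a failed CRC leads to progressively longer pauses rather than several back-to-back reads, which is the access pattern suspected above of hanging the sensor.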
|
|
|
Post by SDL on Jan 10, 2019 13:11:48 GMT -8
Even though you are not monitoring, the serial port is still running. The question I have for you is: in the released software that you were using, was DEBUGPRINT defined?
Again, interesting data. I'm going to run the exact same thing on a unit tomorrow and see what I get. I have one sitting here in the lab.
In the repository, it is:
#define WEATHERPLUSESP8266VERSION "035"
#define WEATHERPLUSPUBNUBPROTOCOL "OURWEATHER035"
// define DEBUGPRINT to print out lots of debugging information for WeatherPlus.
#undef DEBUGPRINT
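For readers unfamiliar with the pattern, a flag like this typically gates debug output at compile time. The sketch below shows the general idea only and is not taken from the OurWeather source: `DEBUG_LOG` and `debugPrintEnabled` are hypothetical names, and `std::cout` stands in for the serial port.

```cpp
#include <iostream>

// As in the lines quoted above, the flag is left undefined.
#undef DEBUGPRINT

// Hypothetical logging macro: expands to nothing unless DEBUGPRINT is defined.
#ifdef DEBUGPRINT
#define DEBUG_LOG(msg) (std::cout << (msg) << '\n')
#else
#define DEBUG_LOG(msg) ((void)0)
#endif

// Reports at run time which branch was compiled in.
bool debugPrintEnabled() {
#ifdef DEBUGPRINT
  return true;
#else
  return false;
#endif
}
```

With the flag undefined, every `DEBUG_LOG(...)` call compiles to nothing, so in this sketch the debug statements add no output and no run-time cost. Replacing `#undef` with `#define` switches them all on at once.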
BP
|
|
|
Post by perrito on Jan 10, 2019 13:20:26 GMT -8
Ah, sorry for the confusion: `DEBUGPRINT` is not defined (the code remains as in 035); the define comment is there and `#undef` is invoked.
|
|