tecronin
Junior Member
Posts: 91
Raspberry Pi: Yes
|
Post by tecronin on Apr 13, 2021 20:25:36 GMT -8
Some days my camera goes reddish, and I have to reboot it to reset it.
|
|
|
Post by Jason on Apr 14, 2021 3:58:31 GMT -8
Totally off-topic but gorgeous view!!!
Jason
|
|
|
Post by pagerman on Apr 14, 2021 5:09:56 GMT -8
Depends on the camera. If it has the capability to work at night with I/R lights I have seen this. The camera has an IR filter for daytime and moves it out of the way for night vision. I hear an audible "click" on my camera. If the filter is stuck and not in front of the lens, daytime will be reddish. I see this all the time on older security cameras.
|
|
|
Post by SDL on Apr 14, 2021 8:37:52 GMT -8
I agree. It sounds like your IR filter gets stuck. Next time it happens, rather than rebooting, can you try just killing and restarting SkyWeather2? That would show whether we can address this programmatically.
BP
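If the stuck filter can be handled in software, the first step would be noticing the red cast. Here is a minimal sketch of one possible check, not something taken from SkyWeather2: it compares the average red level of a frame against the average of green and blue. The function names and the 1.3 threshold are hypothetical, and a real sunset would also trip it, so treat it as a starting point only.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def red_cast_ratio(pixels: Iterable[Pixel]) -> float:
    """Return mean red divided by the mean of green and blue."""
    total_r = total_g = total_b = 0
    count = 0
    for r, g, b in pixels:
        total_r += r
        total_g += g
        total_b += b
        count += 1
    if count == 0:
        raise ValueError("no pixels supplied")
    mean_r = total_r / count
    mean_gb = (total_g + total_b) / (2 * count)
    # Guard against dividing by zero on a frame with no green or blue.
    return mean_r / max(mean_gb, 1.0)


def looks_reddish(pixels: Iterable[Pixel], threshold: float = 1.3) -> bool:
    # threshold is a hypothetical starting value, not a measured one.
    return red_cast_ratio(pixels) > threshold
```

The pixel tuples could come from whatever image library reads the camera's JPEG. A neutral gray frame gives a ratio of 1.0, so anything well above that during daylight hints that the IR filter is out of the light path.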
|
|
|
Post by tecronin on Apr 14, 2021 16:26:06 GMT -8
I'm using the Camera that came with the kit.
I'm running the app via systemd. github.com/tim-oe/SkyWeather2/blob/main/src/lib/systemd/system/skyweather.service
It happened again today.
When I shut the service down, the stop timed out, but the process wasn't running afterward. I then tried running testSkyCamera.py, but it hung too.
I captured the running processes (ps.log, 10.68 KB) before rebooting, but nothing stands out.
After the reboot it's working again, and testSkyCamera.py is working too.
|
|
|
Post by tecronin on Apr 14, 2021 17:56:58 GMT -8
In digging a little more, I noticed that a subprocess, /usr/bin/pigpiod, was still running (see ps.log above), and I could not kill it without a reboot. In googling, I came across this: raspberrypi.stackexchange.com/questions/70568/how-to-run-pigpiod-on-boot
I see it being explicitly started in the code. Is there a reason to run it that way and not via a system service?
It might not be anything, though.
|
|
|
Post by SDL on Apr 15, 2021 10:57:45 GMT -8
You can do that, but we wanted it done programmatically in order to simplify installation.
The piece of information that I would like is this: after a redshift problem, does a kill / restart of SkyWeather2.py fix it?
BP
|
|
|
Post by tecronin on Apr 15, 2021 15:45:07 GMT -8
It didn't; see my previous post. pigpiod seemed to be hung and I couldn't kill it, so I had to reboot to get it back.
I will try it again if it happens again.
|
|
|
Post by tecronin on Apr 28, 2021 13:08:50 GMT -8
It happened again today. I had upgraded the code to 025 three days ago.
I killed the pigpiod, skyweather2, and rtl_433 processes using sudo kill -9 pid. Checking with ps -aux | grep pid showed the processes were no longer present.
I restarted the skyweather2 script and there were two pigpiod processes, one of them defunct. The only way to clear it was to power cycle the Pi.
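For anyone chasing the same symptom: a defunct (zombie) process has already exited, so kill -9 has nothing left to act on. It disappears only when its parent reaps it. A process in uninterruptible sleep (state D) is waiting inside the kernel and will not act on signals until that wait returns. The sketch below is one way to list both kinds from `ps -eo pid,ppid,stat,comm` output. The helper names and the sample text are made up for illustration, not taken from the ps.log in this thread.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    stat: str
    comm: str


def parse_ps(text: str) -> List[Proc]:
    """Parse the output of `ps -eo pid,ppid,stat,comm`."""
    procs = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)
        if len(parts) == 4:
            procs.append(Proc(int(parts[0]), int(parts[1]), parts[2], parts[3]))
    return procs


def stuck(procs: List[Proc]) -> List[Proc]:
    # Z = zombie (shown as <defunct>), D = uninterruptible sleep.
    return [p for p in procs if p.stat[:1] in ("Z", "D")]


# Hypothetical sample, only to show the expected input shape.
SAMPLE = """\
  PID  PPID STAT COMMAND
  512     1 Ssl  python3
  640     1 Dl   pigpiod
  655   512 Z    pigpiod <defunct>
"""

for proc in stuck(parse_ps(SAMPLE)):
    print(proc.pid, proc.ppid, proc.stat, proc.comm)
```

On a live system the text could come from `subprocess.run(["ps", "-eo", "pid,ppid,stat,comm"], capture_output=True, text=True).stdout`. The PPID column shows which parent would have to reap a zombie.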
|
|
dow4hurst
Full Member
SkyWeather2 Newbie
Posts: 117
|
Post by dow4hurst on Apr 28, 2021 15:49:44 GMT -8
tecronin, Pretty much anytime I've had the skyweather2 python process be affected by a firmware transaction error, the pi has to be rebooted with:
echo s | sudo tee /proc/sysrq-trigger;echo u | sudo tee /proc/sysrq-trigger;echo b | sudo tee /proc/sysrq-trigger
or else I have to cycle power. The camera IR filter problem sounds similar since you couldn't recover without a hard reboot. I don't know enough about what is happening at the driver/kernel level with problems like this, but I've seen Linux systems with hung drivers require a reboot many times in the past. InfiniBand drivers and NVIDIA kernel modules have been the worst offenders in my career. Sending a kill signal or terminate signal just won't fix it. The above command will at least keep the filesystem from getting corrupted, and you don't have to go physically power cycle the Pi. I usually keep an ssh session to the Pi running all the time from my laptop at home just for this purpose.
At this point, I'm pretty sure my firmware transaction failures listed in /var/log/messages occur when the communication between the WeatherRack2 and the SkyWeather2 is corrupted. I have 4 walls or 3 walls and a roof that the signal has to go through. I think when I get the units physically closer I won't have this occasional problem. The antenna base plate I added really made a big difference on this problem.
Dow
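The sysrq sequence in the post above writes three keys in order: s (sync filesystems), u (remount filesystems read-only), and b (reboot immediately). Below is a rough Python equivalent, in case it is handier to call from a watchdog script. It is a sketch under assumptions: the function name and the two-second pause are my own choices, and it must run as root to write to the real trigger file.

```python
import time
from typing import List

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sysrq_reboot(trigger_path: str = SYSRQ_TRIGGER, pause: float = 2.0) -> List[str]:
    """Sync, remount read-only, then reboot via magic SysRq. Needs root."""
    written = []
    for key in ("s", "u", "b"):
        # Open and close the file for each key, like the echo | tee chain.
        with open(trigger_path, "w") as trigger:
            trigger.write(key)
        written.append(key)
        if key != "b":
            # Hypothetical pause so the sync and remount can finish.
            time.sleep(pause)
    return written
```

Calling `sysrq_reboot()` with the default path reboots the machine at once, so point `trigger_path` at a scratch file when trying it out.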
|
|
|
Post by tecronin on Apr 28, 2021 17:38:21 GMT -8
Googling how to deal with defunct processes didn't give me much beyond killing the parent, which is already down. I tried pulling the forced kill and restart of pigpiod out of skyweather, running pigpiod as an independent service, and making the skyweather service dependent on it. It worked for a while but would eventually fail. I need to dig into it more. My current setup is here: forum.switchdoc.com/thread/1594/skyweather2-alton-il
I need to grep through the logs. What are the errors you're seeing in /var/log/messages?
|
|
wshealy
Full Member
Posts: 187
Raspberry Pi: Yes
|
Post by wshealy on Apr 28, 2021 20:55:03 GMT -8
Just making sure you dealt with what I discovered: turn off the red LED in the case.
|
|
|
Post by dow4hurst on Apr 29, 2021 8:46:17 GMT -8
Tecronin, Here is what I usually get (shown below).
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.760892] ------------[ cut here ]------------
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.760927] WARNING: CPU: 1 PID: 8550 at drivers/firmware/raspberrypi.c:64 rpi_firmware_transaction+0xec/0x128
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.760940] Firmware transaction timeout
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.760952] Modules linked in: cmac rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc sha256_generic cfg80211 rfkill 8021q garp stp llc v3d i2c_bcm2835 gpu_sched vc4 bcm2835_v4l2(C) bcm2835_isp(C) raspberrypi_hwmon videobuf2_vmalloc bcm2835_codec(C) bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 cec videobuf2_common snd_bcm2835(C) videodev vc_sm_cma(C) mc drm_kms_helper drm drm_panel_orientation_quirks snd_soc_core rpivid_mem snd_compress snd_pcm_dmaengine snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops uio_pdrv_genirq backlight uio nvmem_rmem i2c_dev ip_tables x_tables ipv6
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761649] CPU: 1 PID: 8550 Comm: kworker/1:0 Tainted: G C 5.10.17-v7l+ #1403
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761657] Hardware name: BCM2711
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761674] Workqueue: events dbs_work_handler
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761689] Backtrace:
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761716] [<c0b63050>] (dump_backtrace) from [<c0b633e4>] (show_stack+0x20/0x24)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761729] r7:ffffffff r6:00000000 r5:60000013 r4:c12e69fc
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761745] [<c0b633c4>] (show_stack) from [<c0b67774>] (dump_stack+0xcc/0xf8)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761760] [<c0b676a8>] (dump_stack) from [<c0220bac>] (__warn+0xfc/0x114)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761774] r10:dec02008 r9:00000009 r8:c099d5b8 r7:00000040 r6:00000009 r5:c099d5b8
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761783] r4:c0e9abc8 r3:c1205094
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761798] [<c0220ab0>] (__warn) from [<c0b63b88>] (warn_slowpath_fmt+0xa4/0xd8)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761809] r7:00000040 r6:c0e9abc8 r5:c1205048 r4:c0e9abe8
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761826] [<c0b63ae8>] (warn_slowpath_fmt) from [<c099d5b8>] (rpi_firmware_transaction+0xec/0x128)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761838] r9:c1a944c0 r8:00000018 r7:00000000 r6:ffffff92 r5:c1a944c0 r4:c1205048
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761854] [<c099d4cc>] (rpi_firmware_transaction) from [<c099d6b0>] (rpi_firmware_property_list+0xbc/0x170)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761865] r7:c1205048 r6:dec02000 r5:00001000 r4:dec02024
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761880] [<c099d5f4>] (rpi_firmware_property_list) from [<c099d7d4>] (rpi_firmware_property+0x70/0x118)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761892] r10:c583004c r9:00038002 r8:00000018 r7:c1a944c0 r6:c637fd18 r5:0000000c
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761900] r4:c5830040
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761916] [<c099d764>] (rpi_firmware_property) from [<c07e4e54>] (raspberrypi_clock_property+0x54/0x7c)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761928] r10:c2694080 r9:29b92700 r8:35a4e900 r7:c24a5a40 r6:00000000 r5:c637fd40
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761937] r4:c1205048 r3:0000000c
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761949] [<c07e4e00>] (raspberrypi_clock_property) from [<c07e4fb0>] (raspberrypi_fw_set_rate+0x4c/0xb8)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761959] r5:c24a5b40 r4:c1205048
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761973] [<c07e4f64>] (raspberrypi_fw_set_rate) from [<c07dd348>] (clk_change_rate+0xf4/0x4ac)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761983] r7:00000000 r6:00000000 r5:00000000 r4:c24b3000
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.761998] [<c07dd254>] (clk_change_rate) from [<c07dd87c>] (clk_core_set_rate_nolock+0x17c/0x1a4)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762010] r10:c2694080 r9:00000000 r8:c24b3000 r7:35a4e900 r6:00000000 r5:c1205048
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762018] r4:c24b3000
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762031] [<c07dd700>] (clk_core_set_rate_nolock) from [<c07dd8e4>] (clk_set_rate+0x40/0x98)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762042] r8:c26aab80 r7:29b92700 r6:c1205048 r5:35a4e900 r4:c26aab80
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762056] [<c07dd8a4>] (clk_set_rate) from [<c0968dcc>] (dev_pm_opp_set_rate+0x42c/0x588)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762065] r5:35a4e900 r4:c26bc800
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762080] [<c09689a0>] (dev_pm_opp_set_rate) from [<c0973900>] (set_target+0x38/0x3c)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762091] r10:00000000 r9:00000000 r8:00000003 r7:00000000 r6:c1404e44 r5:c1205048
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762099] r4:c26bca00
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762115] [<c09738c8>] (set_target) from [<c096e604>] (__cpufreq_driver_target+0x1bc/0x540)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762130] [<c096e448>] (__cpufreq_driver_target) from [<c0971994>] (od_dbs_update+0xc4/0x170)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762141] r10:00000000 r9:c4465100 r8:c405ee40 r7:c4465780 r6:c4465100 r5:c4465780
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762149] r4:c26bca00
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762165] [<c09718d0>] (od_dbs_update) from [<c0972bf8>] (dbs_work_handler+0x3c/0x64)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762177] r9:00000000 r8:c131beac r7:c26bca00 r6:c4465784 r5:00000000 r4:c44657b8
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762193] [<c0972bbc>] (dbs_work_handler) from [<c023dc18>] (process_one_work+0x250/0x5a0)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762205] r9:00000000 r8:00000040 r7:eff29900 r6:eff26640 r5:c48f5e00 r4:c44657b8
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762220] [<c023d9c8>] (process_one_work) from [<c023dfc8>] (worker_thread+0x60/0x5c4)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762231] r10:eff26640 r9:c1203d00 r8:eff26658 r7:00000008 r6:eff26640 r5:c48f5e14
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762239] r4:c48f5e00
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762254] [<c023df68>] (worker_thread) from [<c0245a50>] (kthread+0x170/0x174)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762266] r10:c7bfbe74 r9:c48f5e00 r8:c023df68 r7:c637e000 r6:00000000 r5:c8d31200
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762274] r4:c7baaf40
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762287] [<c02458e0>] (kthread) from [<c02000ec>] (ret_from_fork+0x14/0x28)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762296] Exception stack(0xc637ffb0 to 0xc637fff8)
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762307] ffa0: 00000000 00000000 00000000 00000000
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762318] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762329] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762340] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c02458e0
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762348] r4:c8d31200
Apr 27 03:50:47 SwitchDocLabs kernel: [172379.762359] ---[ end trace c5cc2ca59519dd4f ]---
Once this occurs, forget recovering the ability to use the radio.
Dow
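To see how often this warning shows up, and whether it lines up with the camera or radio problems, the log can be scanned for the "Firmware transaction timeout" line. A minimal sketch, assuming the usual syslog layout of one entry per line with a leading "Mon DD HH:MM:SS" timestamp; the function name is hypothetical.

```python
import re
from typing import Iterable, List

# Matches the kernel warning text from the trace in this thread.
TIMEOUT_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*Firmware transaction timeout"
)


def firmware_timeouts(lines: Iterable[str]) -> List[str]:
    """Return the syslog timestamp of each firmware transaction timeout."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line)
        if match:
            hits.append(match.group(1))
    return hits
```

Typical use would be to open /var/log/messages, which often needs root, and pass the file object straight in, since a file iterates line by line: `with open("/var/log/messages") as log: print(firmware_timeouts(log))`.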
|
|
|
Post by SDL on Apr 29, 2021 14:14:20 GMT -8
Dow,
Your speculation on the source of the firmware transaction errors (not getting WeatherRack2 messages) just doesn't ring true to me. Do you have any evidence of that or just a feeling? We have tons and tons of sensors running at 433MHz and we have yet to see a documented case of this (although we do have SDR recovery code in SkyWeather2).
John
|
|
dow4hurst
Full Member
SkyWeather2 Newbie
Posts: 117
|
Post by dow4hurst on Apr 30, 2021 7:32:23 GMT -8
John,
Running top, I could see rtl_433 running when the SkyWeather2 python scripts were running normally. After that firmware transaction error, that program was no longer running. The SkyWeather2.py script was still showing up in the ps output. I did assume, which is probably not good, that the rtl_433 program was accessing a module that had gotten corrupted. However, that was when I was still having two errors occurring. I had a radio signal strength problem that a ground plate fixed, and I had a network communication problem because of dual WANs on my router. Also, I didn't try to capture a core file. I do occasionally get the firmware transaction error, but the number of walls the signal is going through at the moment (the SkyWeather2 box is still on my desk, not outside) is likely the cause. But I do have to either physically cycle power or use the magic sysrq function to recover from this error. You are probably right and I am assuming too much.
Dow
|
|