I have a program that is stuck at 100% CPU. The UI is frozen, but networking and other threads are still running. A clock that is updated via Timer.CallLater is not ticking, and none of the button or paint event handlers appear to be working.
The weird thing is that other parts of the program are working just fine. There are values that are read from a CAN bus (Shell.DataAvailable event handler) and sent to a telemetrics server via a TCP socket. For this to work correctly, the timer that updates the time must be working, event handlers must be getting processed, and the thread scheduler must be working. (The CAN data read and the TCP send are in different threads.) All the threads I have created have the same default priority.
Running GDB “bt” shows that the process appears to be stuck in XojoGUIFrameworkARM.so:
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x24b11e4)
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x0, cond=0x24b11b8)
#2 __pthread_cond_wait (cond=0x24b11b8, mutex=0x0) at pthread_cond_wait.c:655
#3 0xb6937790 in ?? ()
#4 0xb69352e8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
I do not have debugging symbols to find out what 0xb69352e8 or 0xb6937790 reference internally.
This has happened a few times now, and nothing I have tried has solved it. The app is deployed remotely and is currently frozen. I did not want to restart it in case someone has ideas for debugging the issue while it is frozen.
The IDE version is 2023r1.1
The app is running on Raspberry Pi 4
Thanks for your time!
Upon further investigation, I was able to get more of a stack trace to work with (see “Figuring out corrupt stacktraces on ARM”, Happy coding).
This has led me to my serial read implementation: it appears to be getting stuck while attempting to read serial data.
Some background on how I read serial data:
I use the Serial.DataAvailable event to call “receive_thread.Run”
In the receive thread I get the data with Serial.ReadAll and then parse it.
Maybe calling Serial.ReadAll from a thread causes a mutex lock condition where the lock is never released? I may try reading in the Serial.DataAvailable event instead.
My similar threads all do the ReadAll in the thread. If BytesAvailable is zero, the thread suspends itself. All I do in the DataAvailable event is resume the thread.
If this is a Pi then it’s Linux, and I’ve had a problem in the past with x86 Linux using 100% of a CPU core (although the app and its GUI ran just fine) but supposedly this issue was resolved.
I have many of these devices deployed using the same code for a few years now. I have had a few freezes in the past but was not able to diagnose because they were so infrequent. It’s only now that I have one captured and the client is letting me keep it frozen while I try to find the issue.
I have had issues in the past with threads using 100% CPU, specifically with Thread.Sleep or Thread.Suspend (see “Raspberry Pi Thread.Sleep Causes 100% CPU Utilization”, post #5 by kevin_g), but as you said, the UI never froze. I was not able to reproduce those issues in the last few versions of Xojo. I don’t know if this issue is somehow related, but it would seem to be in the framework.
Do your threads yield time to the main thread when they’re processing data? If not, you just need to call the thread’s Sleep command once in a while to allow other things to run.
None of the threads run for more than a few ms, probably less than 5% of the total time. They also are not able to restart unless another DataAvailable event fires (I believe this comes from the main thread)
I think I figured it out…
I was reading the data in from a GPS to a buffer. I would then split the buffer on end of lines to get each line of NMEA0183 data. Anything left over would be prepended to the buffer on the next read.
This all works fine when there are end of lines in the data. However, it would appear that at some point the serial port stopped working properly, producing garbled data that never contained any end of lines. Within only a few minutes the buffer would hold millions of characters, and the code kept trying to split it into lines, only to put everything back into the buffer for the next try. Making matters worse, I was using the character versions of the split method, causing it to try to make heads or tails of badly formatted encoded characters.
I have implemented a buffer cap: when it is hit, the data is thrown out and the serial port is reset. Let’s hope this fixes the issue.
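For anyone hitting the same thing, the buffer cap can be sketched roughly as below. This is a language-neutral illustration in Python, not the Xojo code from the app: the LineBuffer class, the feed method, the overflowed flag and the 4096-byte cap are all hypothetical names and values chosen for the example. It splits on raw bytes rather than decoded characters, so garbled input never goes through text decoding.

```python
MAX_BUFFER_BYTES = 4096  # hypothetical cap, far longer than any single NMEA line


class LineBuffer:
    """Accumulates serial bytes and returns complete lines (illustrative only)."""

    def __init__(self, max_bytes=MAX_BUFFER_BYTES):
        self.max_bytes = max_bytes
        self.pending = b""       # leftover bytes with no end of line yet
        self.overflowed = False  # set when the cap is hit; caller should reset the port

    def feed(self, chunk):
        """Add a chunk of raw bytes and return the complete lines found so far."""
        data = self.pending + chunk
        # Everything before the last newline is a complete line;
        # the final piece is the unterminated remainder.
        *lines, remainder = data.split(b"\n")
        if len(remainder) > self.max_bytes:
            # No end of line within the cap: assume the data is garbage and drop it.
            remainder = b""
            self.overflowed = True
        self.pending = remainder
        # Strip a trailing carriage return and skip blank lines.
        return [line.rstrip(b"\r") for line in lines if line.strip()]
```

The caller would check overflowed after each feed, reset the serial port if it is set, and then clear the flag. With the cap in place, the work done per read is bounded no matter how long the port keeps sending data without line endings.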
There also appears to be some issue with setting the serial port baud rate in Xojo. For the meantime I have a shell calling “stty”, and that seems to work well.
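The exact stty invocation is not given above, so the following is only a guess at what such a workaround might look like, written in Python for illustration. It assumes GNU stty, whose -F option selects the device; the helper names and the device path and baud rate in the usage note are hypothetical.

```python
import subprocess


def stty_baud_command(device, baud):
    """Build the argument list for setting a serial port's baud rate with GNU stty."""
    # -F selects the device to configure; the bare number sets the speed.
    return ["stty", "-F", device, str(baud)]


def set_baud(device, baud):
    """Run stty and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(stty_baud_command(device, baud), check=True)
```

For example, set_baud("/dev/ttyUSB0", 9600) would run stty -F /dev/ttyUSB0 9600. Passing the arguments as a list avoids going through a shell, so the device path does not need quoting.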