I spent some more time decompiling the framework hoping to figure out what’s going on. I was able to use the macOS framework to fill in some of the missing function names.
It appears that “sub_3ca4cb(0x1, rsi, rdx, 0x0);” is something along the lines of “RuntimeThread::YieldAsNeeded”.
The stack then looks something like this:
Thread 1 (Thread 0x78d1e73a6040 (LWP 157369) "redacted"):
#0 0x000078d1e9298d61 in __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x31e47e8) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x31e47e8) at ./nptl/futex-internal.c:87
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x31e47e8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x000078d1e929b7dd in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x31e4798, cond=0x31e47c0) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x31e47c0, mutex=0x31e4798) at ./nptl/pthread_cond_wait.c:627
#5 0x000078d1e99ccf10 in xojo::ConditionVariable::Wait () from /redacted Libs/XojoGUIFramework64.so
#6 0x000078d1e99caf6c in RuntimeThread::Wait () from /redacted Libs/XojoGUIFramework64.so
#7 0x000078d1e99c911a in RuntimeThread::RoundRobin () from /redacted Libs/XojoGUIFramework64.so
#8 0x000078d1e99be7bf in RuntimeThread::YieldAsNeeded () from /redacted Libs/XojoGUIFramework64.so
#9 0x00000000007d99c8 in DesktopApplication._CallFunctionWithExceptionHandling%%o<DesktopApplication>p ()
#10 0x000078d1e99be6cb in ?? () from /redacted Libs/XojoGUIFramework64.so
#11 0x000078d1e99be8b2 in ?? () from /redacted Libs/XojoGUIFramework64.so
#12 0x000078d1e99bd406 in RuntimeRun () from /redacted Libs/XojoGUIFramework64.so
#13 0x0000000000849063 in REALbasic._RuntimeRun ()
#14 0x0000000001b3e96c in _Main ()
#15 0x0000000001b3e1c3 in main ()
Inside of RoundRobin a function “DetermineNextThreadToRun” is called which appears to pick every thread except for the main thread for some reason. Still investigating further.
I would not use DoEvents in a desktop app. At this point the app is hung and I am in GDB (external debugger) trying to shake a crash out of it to see if it falls apart in some way that explains how it got hung in the first place.
I can tell by the call stack and by setting breakpoints that all the other threads are pausing/sleeping and running as expected. No one thread is taking more than its fair share. Most threads spend their time sleeping.
Can you get these threads to report regularly - e.g. by adding text to a TextArea? Or are they all doing that now? By “main thread frozen” do you mean that there are screen updates that should but are not happening? Or that no UI element responds to mouse clicks?
Anything that happens in the main thread stops functioning (events, gui, etc.). I put a break point in “gtk_main_iteration_do” to catch if the main thread was running and it never got called.
I only get a chance to debug this once every few weeks when it happens, so everything that I put in the app itself to debug has to be there weeks before. Once it’s hung, I can only do things that do not require events (I can read/write files, poll sockets and read, but no event handlers fire).
Absolutely wild, just as we were talking, after 5 days 19 hours and 35 minutes it started working again. I can’t be sure that it wasn’t because of any of my efforts around calling “___pthread_cond_signal” but it’s like nothing ever happened.
I was paying attention to how long the main thread was frozen for, however, I should have been looking at how long before it froze. I was able to calculate an uptime of about 7.10 weeks or 4,294,080 seconds. This instantly stuck out like a sore thumb: a uint32 has 4,294,967,296 possible values, and 4,294,967,296ms is pretty much exactly 7.10 weeks.
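As a quick sanity check on that arithmetic, here is a minimal C++ sketch that converts a millisecond count to weeks. The names MillisecondsToWeeks and kUint32Range are made up for illustration and are not part of the framework.

```cpp
// Hypothetical helper: convert a millisecond count to weeks.
double MillisecondsToWeeks(double milliseconds) {
    const double kMsPerWeek = 7.0 * 24.0 * 60.0 * 60.0 * 1000.0;  // 604,800,000 ms
    return milliseconds / kMsPerWeek;
}

// Number of distinct values a 32-bit unsigned counter can hold (2^32).
const double kUint32Range = 4294967296.0;
```

MillisecondsToWeeks(kUint32Range) comes out at roughly 7.10 weeks, and the observed uptime of 4,294,080 seconds lands about 15 minutes short of that.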
It looks like the “RuntimeThread::DetermineNextThreadToRun” function uses something like this to get its time. Not sure why it gets ticks, then turns them into milliseconds instead of directly calculating milliseconds from the timespec, but it looks to be scaling it to an int32.
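#include <cmath>    // std::fmod
#include <cstdint>  // int64_t

int GetTicks();  // assumed framework function, declared here only so the snippet compiles

int64_t GetMilliseconds() {
    int ticks = GetTicks();  // 1/60th of a second per tick
    double milliseconds = static_cast<double>(ticks) * (50.0 / 3.0);  // 1000/60 ms per tick
    double value = milliseconds + 2147483648.0;  // shift up by 2^31
    // Wrap once the shifted value reaches 2^32
    if (value >= 4294967296.0)
        value = std::fmod(value, 4294967296.0);
    value -= 2147483648.0;  // shift back down into the signed 32-bit range
    return static_cast<int64_t>(value);
}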
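To see what that wrap does in practice, here is a minimal sketch of the same arithmetic with the millisecond count passed in directly, so it can be fed arbitrary uptimes. WrapToInt32Range is a hypothetical name, and this only mirrors the reconstruction above, not the framework’s actual code.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper mirroring the reconstructed wrap logic:
// shift up by 2^31, wrap at 2^32, shift back down.
int64_t WrapToInt32Range(double milliseconds) {
    double value = milliseconds + 2147483648.0;
    if (value >= 4294967296.0)
        value = std::fmod(value, 4294967296.0);
    value -= 2147483648.0;
    return static_cast<int64_t>(value);
}
```

Under this sketch the counter climbs to 2,147,483,647, jumps to -2,147,483,648 at about 24.9 days of uptime, and passes back through 0 at 4,294,967,296ms. That zero crossing would line up with the roughly 7.10 week uptime at which the freeze started, assuming the reconstruction is right.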
Seems a bit suspect but nothing conclusive yet, the hunt continues…
There are many variables in the RuntimeThread object that I have gone through, however, the one at +1080 stands out to me. This appears to be something along the lines of “sleep_time” and is a timestamp in ms for when the thread should sleep until.
One of the working threads that runs every second or so was at 125,760,320 and the main thread was at 502,247,482. Calculating the difference gives us 376,487,162ms or 4.36 days. I took the core dump around 1.4 days after it hung and it resumed about 4.35 days later so the math works out. Also 125,760,320ms is about 1.4 days.
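The same numbers as a small C++ sketch, in case anyone wants to redo the math. The two constants are the values read from the core dump; the helper names are hypothetical, and treating the worker thread’s value as roughly “now” is an assumption.

```cpp
#include <cstdint>

// Values read from the core dump (milliseconds).
const int64_t kWorkerSleepTimeMs = 125760320;  // worker thread that runs every second or so
const int64_t kMainSleepTimeMs   = 502247482;  // main thread

// Hypothetical helper: convert milliseconds to days.
double MillisecondsToDays(int64_t milliseconds) {
    return static_cast<double>(milliseconds) / 86400000.0;  // 86,400,000 ms per day
}

// Time left until the main thread's sleep time is reached, assuming the
// worker's value is close to "now" on the framework's millisecond clock.
int64_t RemainingSleepMs() {
    return kMainSleepTimeMs - kWorkerSleepTimeMs;
}
```

RemainingSleepMs() gives 376,487,162ms, and MillisecondsToDays turns that into roughly 4.36 days.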
I put a watch on that variable and waited for something to write to it. Eventually I found out that in “ThreadRun” it sets the current thread’s sleep time to the current milliseconds - 1.
It appears that there is an issue with the thread scheduler where it can set the sleep time to something wildly different than it should be. It also appears that rolling over the milliseconds causes the issue to clear itself.
If the current thread is the main thread then it will hang until the sleep time expires. I can probably have a dedicated thread just for starting other threads, that way if it gets frozen, it won’t take the whole app down with it. Most of the threads are still either sleeping or paused, and new threads are only created when the user is configuring the software, so there is less chance of it going wrong when it’s unattended.
Still not a concrete answer but I will get to the bottom of this if it’s the last thing I do.