Preemptive Threads and Listboxes Don't Play Nice

Well now I have just put my thread code into a critical section and then in the CellTextPaint event of the listbox, I try to enter that CriticalSection.

So the code in the CellTextPaint event should not run if the thread is running…

We shall see what happens now…

I’m not sure what else you are doing, but avoid Thread.Stop() when you can end the thread gracefully.

Isn’t it prohibited to share a lock between a preemptive thread and the main thread (where your UI runs)?

Isn’t that the point of the lock? To prevent items in one thread from accessing a protected resource in another?

Agreed. And actually, the thread completes long before that stop command ever gets issued. I am likely going to take it out of the final code.

Ah. You must set the Type of the CriticalSection to match Thread.Type (Cooperative or Preemptive) for it to work properly. Are you doing that?

Yep
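
For anyone following along, here is a minimal sketch of the setup being discussed, as I understand it. ImageLock is a hypothetical CriticalSection property (the name is mine, not from the original code), and the comments stand in for the real work. Using TryEnter in the paint event instead of Enter is my own assumption: it skips the custom drawing for that pass rather than making the main thread wait while the worker holds the lock. It does not settle the question raised above of whether the main thread should share a preemptive lock at all.

```xojo
// Once, before the thread starts.
// ImageLock is a hypothetical property: ImageLock As CriticalSection
ImageLock = New CriticalSection
ImageLock.Type = Thread.Types.Preemptive // match the worker thread's Type, per the advice above

// In the thread's Run code
ImageLock.Enter
Try
  // ... work that touches the shared data ...
Finally
  ImageLock.Leave // released even if an exception is raised
End Try

// In the listbox's CellTextPaint event
If ImageLock.TryEnter Then
  Try
    // ... custom drawing that reads the shared data ...
  Finally
    ImageLock.Leave
  End Try
End If
// If TryEnter returned False, the thread holds the lock and the custom drawing is skipped this time.
```

The Try/Finally blocks are there so a raised exception cannot leave the lock held forever, which would otherwise look like a hang in the paint event.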

So more crash reports. Now with the CellTextPaint Event being restricted from painting while the thread is running (and I can see the effect of this in the listbox FYI), I am still crashing. Here’s where it gets interesting…

Here’s the first part of the first crash report I had:

Thread 0::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x1981ed9ec __psynch_cvwait + 8
1   libsystem_pthread.dylib       	       0x19822b55c _pthread_cond_wait + 1228
2   libc++.1.dylib                	       0x198150b14 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
3   XojoFramework                 	       0x1082527e0 0x107fd4000 + 2615264
4   XojoFramework                 	       0x10824fae4 0x107fd4000 + 2603748
5   XojoFramework                 	       0x10814731c RuntimeBackgroundTask + 68
6   MediaSwitcher                 	       0x106079db0 DeviceMaintWindow.DeviceMaintWindow.ScrollBarHorizontalVisible.Get%b%o<DeviceMaintWindow.DeviceMaintWindow>i4 + 720
7   MediaSwitcher                 	       0x10602d674 DeviceMaintWindow.DeviceMaintWindow.DeviceList_CellTextPaint%%o<DeviceMaintWindow.DeviceMaintWindow>o<ListBoxPopup>o<Graphics>i8i8i8i8 + 9904
8   MediaSwitcher                 	       0x10588b894 Delegate.IM_Invoke%%o<ListBoxPopup>o<Graphics>i8i8i8i8 + 100
9   MediaSwitcher                 	       0x10588b930 AddHandler.Stub.51%%o<Graphics>i8i8i8i8 + 136

Look where it crashed - at a wait. Looks like the thread was wanting to start its thing, the listbox was doing its stuff and so then the thread was asked to wait - it crashed. This was using a critical section and yes, the type was set to preemptive.

So second crash report:

Thread 0::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x1981ed9ec __psynch_cvwait + 8
1   libsystem_pthread.dylib       	       0x19822b55c _pthread_cond_wait + 1228
2   libc++.1.dylib                	       0x198150b14 std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
3   XojoFramework                 	       0x105c567e0 0x1059d8000 + 2615264
4   XojoFramework                 	       0x105c53ae4 0x1059d8000 + 2603748
5   MediaSwitcher                 	       0x1027a7a8c Thread.Start%%o<Thread> + 16
6   MediaSwitcher                 	       0x103eecd60 JAPDevice.PicSockDataAvailable%%o<JAPDevice>o<TCPSocket> + 1164

Same thing. The thread is starting up and it is having to wait when it crashes…

I’ve changed the critical section to a semaphore to see if that makes a difference.

Just had another crash. This one I could actually catch and get a stack dump. Same thing. But it always seems to happen during an unlock. Maybe that is the culprit…

ThreadAccessingUIException
RuntimeRaiseException
REALbasic._UITrap
ListColumn.__Exit%%o<ListColumn>
RuntimeUnlockObject
RuntimeLockUnlockObjects
JAPDevice.HandleImageThreadRun%%o<JAPDevice>o<Thread>
Delegate.IM_Invoke%%o<Thread>
threadRun
_ZN4xojo11SpawnThreadEmPFvPvES0_
_pthread_start
2024   4.8.0.782

Look at how many of these crashes have an “UnlockObject” or “UnlockObjects” in them…

Maybe the continuous CellText Painting of the listbox is stressing the system…

I also notice that there is a thread and a ListColumn. That’s where your problem lies…

So what’s the code in JAPDevice.HandleImageThreadRun look like?

Edit: NM, I see it above.

I think I may have multiple issues here. The ThreadAccessingUI exception part seems to have been resolved by making a copy of the image created in the threads before painting it to the canvases in the main thread.
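
A rough sketch of one way to do that hand-off, in case it helps others. It assumes the thread's events are wired up with AddHandler, as the stack dump suggests. BuildImage, LatestImage, and PreviewCanvas are hypothetical names I made up for illustration, and this is not necessarily how the original code is organised. The idea is that the thread never touches a control: it queues the finished picture with AddUserInterfaceUpdate, and the copy and repaint happen in the UserInterfaceUpdate event, which runs on the main thread.

```xojo
// Handler for the thread's Run event (attached with AddHandler).
Sub HandleImageThreadRun(sender As Thread)
  Var finishedImage As Picture = BuildImage() // hypothetical helper that does no UI access
  sender.AddUserInterfaceUpdate("image" : finishedImage)
End Sub

// Handler for the same thread's UserInterfaceUpdate event. Runs on the main thread.
Sub HandleImageThreadUpdate(sender As Thread, data() As Dictionary)
  For Each update As Dictionary In data
    If update.HasKey("image") Then
      Var source As Picture = update.Value("image")
      // Draw into a fresh picture so the main thread has its own copy.
      Var copy As New Picture(source.Width, source.Height)
      copy.Graphics.DrawPicture(source, 0, 0)
      LatestImage = copy    // hypothetical Picture property read by the canvas Paint event
      PreviewCanvas.Refresh // hypothetical canvas, asks for a repaint
    End If
  Next
End Sub
```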

But then I still keep getting random crashes and I’m not entirely certain why. A number of the crash reports show a crash in com.apple.CFNetwork.Connection:

Thread 6 Crashed::  Dispatch queue: com.apple.CFNetwork.Connection
0   libsystem_platform.dylib      	       0x198258904 _platform_strlen + 4
1   libsystem_c.dylib             	       0x1980c42c8 __vfprintf + 5120
2   libsystem_c.dylib             	       0x1980edb24 _vsnprintf + 224
3   libsystem_c.dylib             	       0x1980ccd48 snprintf + 68
4   Network                       	       0x19f687814 nw_parameters_copy_description_internal + 100
5   Network                       	       0x19f68776c -[NWConcrete_nw_parameters description] + 20
6   Foundation                    	       0x199400eac _NS_os_log_callback + 284
7   libsystem_trace.dylib         	       0x197f7cc5c _os_log_fmt_flatten_NSCF + 64
8   libsystem_trace.dylib         	       0x197f7c6c0 _os_log_fmt_flatten_object + 220
9   libsystem_trace.dylib         	       0x197f7a394 _os_log_impl_flatten_and_send + 1864
10  libsystem_trace.dylib         	       0x197f79c34 _os_log + 168
11  libsystem_trace.dylib         	       0x197f79b84 _os_log_impl + 28
12  Network                       	       0x19f71fe28 __nw_connection_cancel_inner_block_invoke + 504
13  Network                       	       0x19f71fbd4 nw_connection_cancel_inner +

I think this is happening due to some sort of collision or something if I reset the URLConnection or TCPSocket. When I make the send request on the network object I start a timer. In the received event I turn the timer off. If the timer ends up running it resets the socket. I think I may not be letting the framework have enough time to close and destroy the socket before recreating it. Hard to tell but this is my best guess. I can do this reset manually from my UI and I have never seen it crash. However, if there are multiple sockets at once that need to be reset, I can see something like this possibly happening. And this is all on the main thread by the way, so it’s not a thread issue.

It’s just really hard to debug and figure out. My app ran all night until 6:45 AM this morning, when it crashed. It makes debugging difficult when something has to run for hours before you have an event, and then it’s still hard to tell. This morning’s crash was on the main thread during a canvas draw. No network thread crash. I wish we had a way to get a better description of what’s happening in these crash reports…

Out of curiosity, for those of us on the sidelines wondering about the value (time overhead plus debugging headaches versus doing things the old-fashioned, single-core way), what is your expected performance bump?

I understand you can run an A/B test until you get the “B” part (pre-emptive routines) working. Did you have an expected performance return before embarking on your journey?

I think the performance improvement is significant. Previously with capturing too many images, the UI would just start lagging. Things just are smoother. If I can figure out some of these issues, then yes, it will be well worth it.

It seems like the issue I am having right now is not even related to preemptive threads as it is an OS X framework thread that is crashing. I’m trying to figure out if there is anything in my code or network objects that could be causing this.

And truthfully, this sort of issue could have happened before I went to preemptive threading. I don’t really know if I ever ran the routines for as long as I have been running them now in testing. It is a major challenge trying to figure out what you are doing wrong by reading crash reports and having to let your app run for hours and hours before it crashes.

Potentially very significant, but there are some issues to be worked out - you can read the bad news (very slow in 3.1) and good news (possibly has already been fixed for next release) here: Xojo1BRC (Xojo One Billion Row Challenge) - #38 by Kem_Tekinay

Yes, this is a work in progress. Step one was getting preemptive threads to work at all in a usable way, and that went way quicker than (I think) anyone imagined.

Now it’s dealing with the inevitable issues that come along with such a huge change, including performance bottlenecks.

If/when that’s addressed, I’d expect a divisible task that takes n time to take something more than n/c when using c cores. (It likely won’t be exactly n/c because of the processing overhead.)

You guys make great points in terms of raw data processing power. But I think even now preemptive threads work much better than cooperative threading.

In my app here, prior to using preemptive threads the maximum data rate I would see when pulling images was maybe about 60 Mb/s across the network. Now I am getting nearly 10x that rate.

I should try doing the same thing but using cooperative threads. My original code ran entirely on the main thread, using timers.

Are you protecting the timer state changes with a critical section or semaphore?

Not sure what you mean…

Sorry, I should have been more explicit. I mean starting and stopping timers, resetting the period, etc. If a lot of things are using the same timer, then with preemptive threads the timer can end up in odd states. One piece of code could have something like:

MyTimer.Interval = 300
MyTimer.Mode = Multiple

Another part of the code could:

MyTimer.Interval = 200
MyTimer.Mode = Stop

In a preemptive world you could end up with the following sequence:

MyTimer.Interval = 300
MyTimer.Interval = 200
MyTimer.Mode = Multiple
MyTimer.Mode = Stop

That can happen when the threads’ statements execute interleaved. Setting the multiple properties needs to be an atomic operation, so that either the first pair runs to completion or the second pair runs to completion.
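
A minimal sketch of making each pair atomic, assuming every piece of code that touches the timer goes through the same lock. TimerLock is a hypothetical CriticalSection (with its Type set to match the threads, as discussed earlier in the thread). I have written Period and RunMode, which I believe are the actual Timer property names, where the shorthand above says Interval and Mode. Whether a Timer should be touched from a preemptive thread at all is a separate question this does not answer.

```xojo
// TimerLock is a hypothetical CriticalSection shared by all code that changes MyTimer.

// First piece of code
TimerLock.Enter
Try
  MyTimer.Period = 300
  MyTimer.RunMode = Timer.RunModes.Multiple
Finally
  TimerLock.Leave
End Try

// Second piece of code
TimerLock.Enter
Try
  MyTimer.Period = 200
  MyTimer.RunMode = Timer.RunModes.Off
Finally
  TimerLock.Leave
End Try
```

With the lock in place the timer ends up as either 300/Multiple or 200/Off, depending on which caller ran last, but never as a mix of the two.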

Why would you only have one timer? My app uses sockets, and I subclass SSLSocket for that. One of the subclass properties is a Timer, another is the thread that will own the socket. Thus everything is nicely encapsulated and the thread is not going to be interfering with properties used by another thread.