Impossible situation with CriticalSection

2025r1 Linux x86_64 Cooperative threading. I’m getting an IllegalLockingException for trying to unlock a CriticalSection on the wrong thread. However, the code is very simple and there’s no possible way for this to be wrong.

I use a module named ObjectLocks that is initialized on app open with its own lock and dictionary. I call ObjectLocks.GetLock(Id) to fetch a CriticalSection from the dictionary, then lock it, do some work, and unlock it. The GetLock function uses its own lock to make sure the dictionary isn’t touched by another thread while doing its job.

The trouble is, there’s simply no way for the locking exception to happen. My code looks like this:

Var Lock As CriticalSection = ObjectLocks.GetLock(Id)
Lock.Enter

Try
  // Do something, Lock is not touched inside the routine
Catch Err As RuntimeException
  // Report the error
End Try

Lock.Leave

That’s it. There’s nothing at all complicated about this usage. There’s no other code outside the Try/Catch but between the Enter/Leave, so no possibility of an exception aborting before the CriticalSection can be unlocked. No other code in the project uses the ObjectLocks module. And yet it sometimes fires the IllegalLockingException. Does anybody have any idea how this could possibly happen? Because I’m reaching the conclusion that this is a framework bug, which of course I’ll never be able to reproduce.
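For comparison, here is a minimal sketch of the same pattern with the Leave call moved into a Finally block. It assumes the same ObjectLocks.GetLock(Id) call shown above. It does not explain the exception by itself; it only guarantees that Leave runs on every exit path from the Try block, which removes one variable while investigating.

```xojo
Var Lock As CriticalSection = ObjectLocks.GetLock(Id)
Lock.Enter

Try
  // Do something, Lock is not touched inside the routine
Catch Err As RuntimeException
  // Report the error
Finally
  // Runs whether the Try body completed normally or raised an exception
  Lock.Leave
End Try
```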

To be clear, this exception is raised at the Lock.Leave line, and none of your threads are set to Preemptive?

Correct.

And there is no chance the CriticalSection was set to Preemptive?

I’d add logging to this code that logs the ThreadID and the Id used for the Lock to see if that sheds some light. Throw in the Thread and CriticalSection Type too for good measure.
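A rough sketch of what that logging could look like. LogLock is a hypothetical helper, not something from the original project, and the message format is only an illustration. The Thread and CriticalSection Type values are left out here and could be appended to the message in the same way.

```xojo
// Hypothetical helper (an assumption, not part of the original project).
Private Sub LogLock(Action As String, Id As String)
  // Thread.Current returns Nil when called on the main thread.
  Var Current As Thread = Thread.Current
  Var ThreadLabel As String = "main"
  If Current <> Nil Then
    ThreadLabel = Current.ThreadID.ToString
  End If
  System.DebugLog(Action + " lock=" + Id + " thread=" + ThreadLabel)
End Sub

// Usage around the existing Enter/Leave pair:
Var Lock As CriticalSection = ObjectLocks.GetLock(Id)
LogLock("enter", Id)
Lock.Enter
// ... work ...
LogLock("leave", Id)
Lock.Leave
```

If an "enter" and the matching "leave" for the same Id ever show different thread values, the log pinpoints which lock was involved.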

No chance. The trouble with logging is this only happens in production and is unpredictable. So that ends up being a LOT of log messages, as you basically need messages before and after each lock / unlock. Browsing the output is nearly impossible.

I’ve reverted this project to 2023r3. We’ve had this code working reliably for years. If it happens again, we’ll know it’s my code somehow. But I have my doubts it will.

Are you 100% sure that GetLock isn’t returning the same CriticalSection for two different threads?

Could you show us the GetLock code?

Well… it should return the same lock for two threads, otherwise the lock is pointless.

The method looks like this. mLock is the module’s CriticalSection:

Protected Function GetLock(Id As String, Create As Boolean) As CriticalSection
  mLock.Enter
  Var Lock As CriticalSection
  If mLocks.HasKey(Id) Then
    Lock = mLocks.Value(Id)
  ElseIf Create = True Then
    Lock = New CriticalSection
    mLocks.Value(Id) = Lock
  End If
  mLock.Leave
  Return Lock
End Function

I see, right. Sorry I was thinking the other way around.

I had to read this code several times before I really understood that it can return nil if Create is false and the ID isn’t found in the dictionary. I would gently suggest a restructure to make it more clear what’s going on:

Protected Function GetLock(Id As String, Create As Boolean) As CriticalSection
    Var result as CriticalSection

    mLock.Enter

    result=mLocks.Lookup(Id, nil)

    If result=nil Then
        If Create then
            result=new CriticalSection
            mLocks.Value(Id)=result
            mLock.Leave
            Return result
        Else
            mLock.Leave
            Return nil
        End
    Else
        mLock.Leave
        Return result
    End
End Function

Why are you doing this? If your app is truly only using cooperative threading, it should be impossible for the dictionary to get hit by multiple simultaneous threads.

The app was retrofitted with preemptive threading as an option. We haven’t deployed that to production as there are still problems that will probably never be found.

I can be certain there are no preemptive threads because we use compile time constants, and it’s all or nothing. Since types don’t mix, the only practical option is for all threads to be cooperative or preemptive.

Related advice: don’t try to retrofit preemptive threading into existing code. It’s a maddening exercise. New projects are much more reliable.

The process has been running with 2023r3 for hours without an issue. Same code as the 2025r1 version. It’s looking more and more likely to be a framework issue.

Have you tried the latest 2025r1.1 beta? The framework’s REALSetPropValueObject() has been broken since 2024r2, and I’m not sure whether it is causing side effects in other places.

After this test, if problems persist, try writing a simple sample that stresses your engine until it raises an IllegalLockingException within a few seconds, and show that the same sample works when built with 2023r3, so William can track the regression.
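A minimal sketch of such a stress sample, under several assumptions: StressWorker, StartStress, and the mThreads array are hypothetical names, the "shared" Id is arbitrary, and GetLock is assumed to return a non-Nil CriticalSection for it. Whether this particular shape triggers the exception is not known; it only shows several cooperative threads contending for one lock with plenty of yields while the lock is held.

```xojo
// Hypothetical stress harness; all names here are illustrative assumptions.
Private Sub StressWorker(Sender As Thread)
  For i As Integer = 1 To 100000
    Var Lock As CriticalSection = ObjectLocks.GetLock("shared")
    Lock.Enter
    Try
      // Give other threads many chances to run while the lock is held.
      For j As Integer = 1 To 100
        Thread.YieldToNext
      Next
    Finally
      Lock.Leave
    End Try
  Next
End Sub

Private Sub StartStress()
  For n As Integer = 1 To 8
    Var t As New Thread
    AddHandler t.Run, AddressOf StressWorker
    mThreads.Add(t) // keep a reference so the thread is not destroyed early
    t.Start
  Next
End Sub
```

Building the same sample with both Xojo versions and running it on the affected platform would give a side-by-side comparison for a bug report.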

Maybe they reintroduced this bug when refactoring for preemptive threads…

https://tracker.xojo.com/xojoinc/xojo/-/issues/10165

…but on Linux. Is it only doing this on Linux?

I haven’t experienced it when testing on my Mac, but that’s not a guarantee it’s a Linux-only issue. We’ve tried stressing the dev copy hard with no luck. So I can’t confidently say it’s Linux-only, but there’s a decent chance of it.

Edit: That bug sounds exactly like what’s happening. There are a few levels of nested loops in between. Lots of yield opportunity.

I’ve marked a solution, though I’m only about 95% confident.

I made this project and ran it on my Mac using 2025r1 so long that I forgot about it. It was running in the background for probably 12 hours without an issue. So I then built a Linux x86_64 version and ran it on a dev server that mirrors our production environment, and it hit an IllegalLockingException within 5 minutes. So 100% a Xojo bug.

So I made a build with 2025r1.1 and am running it on the same server, and it’s been a couple hours without issue. It does seem like it fixes this issue. I’m going to let it run a while longer to get more confident, but it looks like good news so far.

CriticalSection Test.zip (2.7 KB)


The framework REALSetPropValueObject() bug could potentially cause random results like the ones you got. For me, that marks the entire series from Xojo 2024r2 to 2025r1.0 as unstable releases.

Maybe. At the very least, 2025r1.0 is a total no-go in my book. I’m not absolutely confident that REALSetPropValueObject is to blame. I’m leaning towards an undocumented fix. If I were seeing this in functionality provided by a plugin, such as a built-in plugin, and/or I was seeing this on the Mac version, I might be more inclined to agree. But it’s also well outside my area of expertise, so maybe I’m just wrong.

I’ve seen some side effects of it. But they only pop up under certain circumstances, and people could write “stable” apps simply by not hitting those circumstances, which is why I wrote “potentially”. And since carrying potential instabilities is something I don’t accept, the affected versions are a no-go “for me”. So, again, for me, people using 2024r2 to 2025r1.0 should jump to 2025r1.1 as soon as it is released, looking for a more stable version. That’s what I intend to do.