RuntimeLockObject - could it be faster?

Even after extensive performance tuning (see Machine Learning in XOJO - #34 by Mike_D) it seems like the Xojo framework spends a lot of time locking and unlocking objects.

Here’s an Instruments trace:

Notice that of the ~200 msec in this call, about 135 msec is spent inside one of these functions:

  • RuntimeLockObject
  • RuntimeUnlockObject
  • RuntimeLockUnlockObjects

Thoughts:

  • It appears that these calls are mostly due to accessing objects stored in arrays.
  • Could these functions be made faster?
  • Could these functions be eliminated?

For example… if you are inside a method, and background tasks are disabled, and the method does not call out to any other functions or methods, wouldn’t it be possible to simply skip the locking and unlocking altogether? (I don’t have a clear idea of how the framework is designed, so this may be a dumb question)

1 Like

Sure, it could be faster.

Since RuntimeLockObject just adds one to the reference count, it could even be inlined.
Unlock is more difficult, but could also be inlined: if the reference count is > 1, you just decrease it by one; if it is 1, you call the existing function to run the destructor.
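To make the idea concrete, here’s a sketch in Python (CPython is also reference counted, so the mechanics are analogous). RefCounted, lock_object and unlock_object are invented names, not Xojo’s actual runtime: the point is that the lock is a bare increment and the unlock is a decrement with a slow path only when the count reaches zero.

```python
# Hypothetical sketch of the lock/unlock fast paths; not Xojo's runtime.

class RefCounted:
    def __init__(self):
        self.refs = 1          # a freshly created object starts at one reference
        self.destroyed = False

    def destroy(self):
        # Stands in for the out-of-line call that runs the destructor.
        self.destroyed = True

def lock_object(obj):
    # The "lock" fast path: just add one to the reference count.
    obj.refs += 1

def unlock_object(obj):
    # The "unlock" fast path: subtract one; only when the count reaches
    # zero do we take the slow path and run the destructor.
    obj.refs -= 1
    if obj.refs == 0:
        obj.destroy()

o = RefCounted()
lock_object(o)    # refs: 2
unlock_object(o)  # refs: 1, object still alive
unlock_object(o)  # refs: 0, destructor runs
```

Both fast paths are a single arithmetic operation, which is why a compiler could in principle inline them and leave only the zero-count case as an out-of-line call.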

Feel free to make a feature request for such a change.

2 Likes

They cannot be eliminated; they’re the heart of Xojo’s memory management.

And I would not say they’re slow either; there are just naturally a lot of calls to them. (As there would be in any other reference-counted language or framework.)

1 Like

the code in question does something like this:

Assume Class B has an array of items holding Class A.

function total(foo as classB) as double
1  dim total as double
2  for i as integer = 0 to foo.items.ubound
3    var a as classA = foo.items(i)
4    total = total + a.foobar
5  next
6  return total
end function

I can see where in line 3, the reference count of foo.items(i) needs to be increased, since var a now points to it.

However, since we know var a will be immediately released in line 5 at the loop boundary, if we can somehow know that line 4 doesn’t have any side effects, then there is no point in incrementing and then decrementing the reference count.

  • Is the compiler smart enough to realize this? (I don’t think so)
  • Could the compiler be smarter?
  • If the lock/unlock code were inlined, I wonder if the compiler optimizer would be smart enough to realize “this code has no effect” and eliminate it?
  • Or would it make more sense to have a new #pragma, such as #pragma ReferenceCounting false, which could be a ‘use at your own risk’ feature?
1 Like

If you’re squeezing out CPU cycles at that level, wouldn’t it make more sense to do

total = total + classA(foo.items(i)).foobar

Edit: I would prefer the compiler not try to be too smart. I’ve been bitten by “optimizing” compilers in the past. It was wise of Xojo to go with an optimizing compiler that has a lot of people working on it. They really shouldn’t try to do it on their own.

1 Like

@Tim_Hare that’s what I would think, however, in the optimizations in that thread, it seemed as if caching the value once was actually faster than dereferencing it multiple times, even though caching it would cause a Lock/Unlock, whereas multiple dereferences could (in theory) be done without lock/unlock.

In other words it seems as if:

for i = 0 to u
   total = total + foo.items(i).x
   total = total + foo.items(i).y
   total = total + foo.items(i).z
next

was slower than

for i = 0 to u
   var a = foo.items(i)
   total = total + a.x
   total = total + a.y
   total = total + a.z
next

My hunch is that the compiler is doing object locking and unlocking more than it needs to?
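For what it’s worth, the same trade-off can be sketched in Python (class names invented to mirror the Xojo code; CPython is also reference counted, so the comparison is analogous). Both loops compute the same total, but the cached version does one element lookup per iteration instead of three:

```python
# Sketch of "repeated dereference" vs "cache the element once" per iteration.

class ClassA:
    def __init__(self):
        self.x = 1.0
        self.y = 2.0
        self.z = 3.0

class ClassB:
    def __init__(self, n):
        self.items = [ClassA() for _ in range(n)]

foo = ClassB(1000)

def repeated_dereference():
    total = 0.0
    for i in range(len(foo.items)):
        total += foo.items[i].x   # each access locates the element again
        total += foo.items[i].y
        total += foo.items[i].z
    return total

def cached_reference():
    total = 0.0
    for i in range(len(foo.items)):
        a = foo.items[i]          # one lookup (and one refcount bump)
        total += a.x
        total += a.y
        total += a.z
    return total
```

Timing these (e.g. with timeit) on CPython typically shows the cached version ahead, consistent with the observation above, though the exact gap depends on the runtime.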

For such a function, this would be perfect. None of the objects gets created or destroyed, so the reference count stays the same, and it could be faster.

2 Likes

Array dereference is calling RuntimeLockObject and RuntimeUnlockObject when not necessary
See https://tracker.xojo.com/xojoinc/xojo/-/issues/75311

How would it know if it’s needed or not?

It has no way to know, since you might be assigning it to a new variable, for example.

Pretty much all reference-counted frameworks do this, including Cocoa itself.

I don’t know, but it seems like one could have some simple rules that might work?
E.g.
If on a single line of code…

  • the calculation uses only value types (integer, double…)
  • all value types are properties (no computed properties or functions are called)
  • the result of the calculation is a value type

Then…

  • reference counting is not needed, and a chain of properties such as
    foo.bar.zoo(27).fee.fi(3) will not call RuntimeLockObject for any objects.

Maybe it’s not that simple, and we should just have a #pragma so we can footgun ourselves and not blame the compiler? :slight_smile:

1 Like

No one should touch the ref count in any way, just the compiler (and plugin devs, at a low level, in their plugins). In the past I remember a kind of silly me discussing something like that with Joe, and he politely said “no”. Lol.
If the compiler can be smart enough to make some correct assumptions, OK, it could auto-optimize something, but never the user. The level of crashing/destruction a user can get playing with the ref count can be astronomical. :smiley:
I think the way to go is along the lines of Christian’s first suggestion: the object ownership system could try to “inline” some fast paths instead of making calls, when possible.

1 Like

Yes and no. Yes, this shouldn’t be our concern. However, the recent speedup in Xojo 2023r4 also shouldn’t have been our concern. The speed was improved because Christian found a glaring problem. Perhaps Xojo can improve this and perhaps they can’t. But at least they can have a look.

2 Likes

You are raining on the ocean for no reason.

It’s true that if you are accessing multiple properties/methods of an object (as in your x-y-z example), it is quicker to store a reference to it to avoid having to locate it in the array multiple times.

However, this is not true if you are only accessing a single property/method (like your foobar example). You’ll get a performance gain by dropping the var assignment and just using the array element directly. I’d be curious if the framework even bothers to lock the object if you use it in this fashion.

Write a complete, simple example. Probably just by observing it we can infer how it behaves in the standard case.

In a chain of method calls, the runtime creates temporary values, marks them “in use” (+1 ref), and passes them to the next call. When the last call resolves, it unwinds the pile of calls one by one: every temporary value reaches 0 refs and gets destroyed, back to the root call.
In a chain of properties, the same mechanism applies, except there’s no need for temporary values: you act on objects that are already set and already referenced, so they won’t be destroyed (1 + 1 instead of 0 + 1).
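The temporary-value part can be observed directly in Python (CPython’s refcounting behaves analogously; the Node class here is invented). Using __del__, we can see that a property chain creates no temporaries, while a method chain destroys each intermediate result as soon as the next link is done with it:

```python
# Observe when temporaries reach zero references, via __del__.

destroyed = []

class Node:
    def __init__(self, name, child=None):
        self.name = name
        self.child = child                 # property: already owned, already referenced

    def make_child(self):
        return Node(self.name + ".tmp")    # method: returns a fresh temporary

    def __del__(self):
        destroyed.append(self.name)

root = Node("root", Node("a", Node("b")))

# Property chain: no temporaries are created, nothing is destroyed.
b = root.child.child
assert destroyed == []

# Method chain: the intermediate temporary dies as soon as the next
# call in the chain has consumed it.
leaf = root.make_child().make_child()
assert destroyed == ["root.tmp"]
```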

I think that there’s more happening than you suppose.

So, forgive the question but, in simple terms, are we saying that in a tight loop,


for x as integer = 0 to bignumber
//do something with foo.bar.zoo(27).fee.fi(3)
next

would in general be faster than

dim thing as objecttype
thing = foo.bar.zoo(27).fee

for x as integer = 0 to bignumber
//do something with thing.fi(3)
next

due to reference counting?

No, the other way around.

And not only due to reference counting; accessing elements also has a cost. (That is a function call.)

This construct

Var sum As Integer = 0

For i As Integer = 0 to bignumber
  sum = sum + foo.bar.zoo(27).fee.fi(3) // the chain needs to be reevaluated for changes every time
Next

MUST be slower than

Var sum As Integer = 0
Var oFee As FeeType = foo.bar.zoo(27).fee // Evaluate the chain, cache the last referenced object

For i As Integer = 0 to bignumber
  sum = sum + oFee.fi(3) // Reevaluate only the final segment cached in the past
Next
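Here is the shape of the two loops above in Python (classes invented to mirror the Xojo sketch; CPython is also reference counted). Both return the same sum; the difference is whether the chain prefix foo.bar.zoo(27).fee is re-walked on every iteration or hoisted out once:

```python
# Re-evaluating the full chain per iteration vs caching its invariant prefix.

class Fee:
    def fi(self, n):
        return n

class Zoo:
    def __init__(self):
        self.fee = Fee()

class Bar:
    def __init__(self):
        self.zoo = [Zoo() for _ in range(28)]

class Foo:
    def __init__(self):
        self.bar = Bar()

foo = Foo()
bignumber = 10_000

def reevaluate_chain():
    s = 0
    for _ in range(bignumber):
        s += foo.bar.zoo[27].fee.fi(3)  # full chain walked every iteration
    return s

def cache_chain():
    o_fee = foo.bar.zoo[27].fee         # walk the chain once, cache the tail
    s = 0
    for _ in range(bignumber):
        s += o_fee.fi(3)                # only the final segment per iteration
    return s
```

Timing the two (e.g. with timeit) would turn the claim into a measurement; how large the gap is will depend on the runtime and chain length.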

Instead of asserting that it MUST be slower, perhaps you should write a complete, simple example. Probably just by observing it you can come to conclusions supported by data.