Why does a subtraction operation make this over 100 times slower?

I have two methods that are essentially the same, except for one subtraction operation. The one including subtraction is over 100 times slower than the one without it. This makes no sense to me.

The methods take an array of Booleans and turn it into a MemoryBlock where each bit (not byte) represents true/false.

Here’s the “slow” one:

[code]Protected Function EncodedBytesStringFromPoints(Points() As Boolean) as String
dim ms as double = Microseconds

if Points() = Nil or Points.Ubound = -1 then return “”

#pragma DisableBackgroundTasks

’ express a boolean array as bits in a memoryblock
dim j, k, bytePos, size, bitPos, value as integer

’ This is how many bytes we need
size = ceil( (Points.Ubound+1) / 8 )

’ make the memoryblock
dim mb as new memoryblock( size )
size = size - 1

’ write bits
for j = Points.Ubound DownTo 0
if Points(j) then
k = Points.Ubound - j // *** this line makes the operation over 100 times slower ???
bytePos = size-floor(k / 8)
bitPos = k mod 8
value = pow( 2, bitPos )
mb.byte( bytePos ) = Bitwise.BitOr( mb.byte( bytePos ), value )
end if
next

System.DebugLog str( Microseconds - ms )

return mb
End Function
[/code]

And here’s the “fast” one. The only difference is that the order of the bits is reversed.

[code]Protected Function EncodedInvertedBytesStringFromPoints(Points() As Boolean) as String
dim ms as double = Microseconds

if Points() = Nil or Points.Ubound = -1 then return “”

#pragma DisableBackgroundTasks

’ express a boolean array as bits in a memoryblock
dim k, bytePos, size, bitPos, value as integer

’ This is how many bytes we need
size = ceil( (Points.Ubound+1) / 8 )

’ make the memoryblock
dim mb as new memoryblock( size )
size = size - 1

’ write bits
for k = Points.Ubound DownTo 0
if Points(k) then
bytePos = size-floor(k / 8)
bitPos = k mod 8
value = pow( 2, bitPos )
mb.byte( bytePos ) = Bitwise.BitOr( mb.byte( bytePos ), value )
end if
next

System.DebugLog str( Microseconds - ms )

return mb
End Function
[/code]

To perform the test:

Sub Open() Handles Open dim a, b as string dim points() as boolean = Array( true, false, false, false, true, false, true, true, false, false, true, false, false, true, true) // 100010110010011 a = EncodedBytesStringFromPoints( points() ) b = EncodedInvertedBytesStringFromPoints( points() ) quit End Sub

I must be missing something?

precalculate points.ubound and put it in a variable before the loop ?

Sounds like it is being evaluated every time

[code]dim pointsubound as integer
pointsubound = Points.Ubound

’ write bits
for j = pointsubound DownTo 0
if Points(j) then
k = pointsubound - j // ***[/code]

Are you sure it is the subtraction?

I would have thought the call to Points.Ubound in the calculation would be the slow part.

Unless the size of the array might change while the loop is running I could calculate UBound once into a local variable and then use that elsewhere (ie: don’t use UBound in your For loop declaration).

Ah, of course. You’re both absolutely right. But the one with subtraction is still about 40 times slower (!) EDIT: no, it’s still about 100 times slower.

Updated methods:

[code]Protected Function EncodedBytesStringFromPoints(Points() As Boolean) as String
dim ms as double = Microseconds

if Points() = Nil or Points.Ubound = -1 then return “”

#pragma DisableBackgroundTasks

’ express a boolean array as bits in a memoryblock
dim p, j, k, bytePos, size, bitPos, value as integer
p = Points.Ubound

’ This is how many bytes we need
size = ceil( (p+1) / 8 )

’ make the memoryblock
dim mb as new memoryblock( size )
size = size - 1

’ write bits
for j = p DownTo 0
if Points(j) then
k = p - j // *** still 100 times slower ???
bytePos = size-floor(k / 8)
bitPos = k mod 8
value = pow( 2, bitPos )
mb.byte( bytePos ) = Bitwise.BitOr( mb.byte( bytePos ), value )
end if
next

System.DebugLog str( Microseconds - ms )

return mb
End Function
[/code]

[code]Protected Function EncodedInvertedBytesStringFromPoints(Points() As Boolean) as String
dim ms as double = Microseconds

if Points() = Nil or Points.Ubound = -1 then return “”

#pragma DisableBackgroundTasks

’ express a boolean array as bits in a memoryblock
dim p, k, bytePos, size, bitPos, value as integer
p = Points.Ubound

’ This is how many bytes we need
size = ceil( (p+1) / 8 )

’ make the memoryblock
dim mb as new memoryblock( size )
size = size - 1

’ write bits
for k = p DownTo 0
if Points(k) then
bytePos = size-floor(k / 8)
bitPos = k mod 8
value = pow( 2, bitPos )
mb.byte( bytePos ) = Bitwise.BitOr( mb.byte( bytePos ), value )
end if
next

System.DebugLog str( Microseconds - ms )

return mb
End Function
[/code]

Actually, after removing calls to UBound it’s still on average 100 times slower. The performance varies, no idea why. After reading about how Ubound affects performance in Xojo (see ), it seems obvious that making the above change isn’t going to gain any speed in this case. The Boolean arrays are at most only tens of elements. So to this question:

I don’t know what else it could be. Besides the subtraction and adding one variable to hold the subtracted value, the methods are identical. Yet one is 100 times slower. It makes no sense to me.

I’d hope that a UBound check wasn’t causing the problem, I think I read somewhere that they are cached to avoid such issues (might be wrong, might be a different language).

I realise you have DisableBackgroundTasks but do you actually have background threads? If so, try sleeping them at the start of the method and resuming them at the end. There might be a bug or a call outside the current method that allows the background thread to step in. For example I don’t know the internals of how floor or Bitwise work, if Bitwise leaves the current method, could that allow background tasks to run? If the length of the method is sufficiently increased by inserting the subtraction then that might allow a background task to run? Without knowing how things work internally that’s all just speculation.

Is it any quicker if you just k=k+1 starting k at 0, that’s what its doing, right?

You might also want to throw in all other pragmas that might be pertinent to a fast loop, BoundsChecking, BreakOnExceptions, NilObjectChecking and StackOverflowChecking to see if that improves things.

Just a few thoughts.

[quote=426378:@Aaron Hunt]I have two methods that are essentially the same, except for one subtraction operation. The one including subtraction is over 100 times slower than the one without it. This makes no sense to me.

The methods take an array of Booleans and turn it into a MemoryBlock where each bit (not byte) represents true/false.

Here’s the “slow” one:

[code]Protected Function EncodedBytesStringFromPoints(Points() As Boolean) as String
dim ms as double = Microseconds

if Points() = Nil or Points.Ubound = -1 then return “”

#pragma DisableBackgroundTasks

’ express a boolean array as bits in a memoryblock
dim j, k, bytePos, size, bitPos, value as integer

’ This is how many bytes we need
size = ceil( (Points.Ubound+1) / 8 )

’ make the memoryblock
dim mb as new memoryblock( size )
size = size - 1

’ write bits
for j = Points.Ubound DownTo 0
if Points(j) then
k = Points.Ubound - j // *** this line makes the operation over 100 times slower ???
bytePos = size-floor(k / 8)
bitPos = k mod 8
value = pow( 2, bitPos )
mb.byte( bytePos ) = Bitwise.BitOr( mb.byte( bytePos ), value )
end if
next

System.DebugLog str( Microseconds - ms )

return mb
End Function
[/code]

And here’s the “fast” one. The only difference is that the order of the bits is reversed.

[code]Protected Function EncodedInvertedBytesStringFromPoints(Points() As Boolean) as String
dim ms as double = Microseconds

if Points() = Nil or Points.Ubound = -1 then return “”

#pragma DisableBackgroundTasks

’ express a boolean array as bits in a memoryblock
dim k, bytePos, size, bitPos, value as integer

’ This is how many bytes we need
size = ceil( (Points.Ubound+1) / 8 )

’ make the memoryblock
dim mb as new memoryblock( size )
size = size - 1

’ write bits
for k = Points.Ubound DownTo 0
if Points(k) then
bytePos = size-floor(k / 8)
bitPos = k mod 8
value = pow( 2, bitPos )
mb.byte( bytePos ) = Bitwise.BitOr( mb.byte( bytePos ), value )
end if
next

System.DebugLog str( Microseconds - ms )

return mb
End Function
[/code]

To perform the test:

Sub Open() Handles Open dim a, b as string dim points() as boolean = Array( true, false, false, false, true, false, true, true, false, false, true, false, false, true, true) // 100010110010011 a = EncodedBytesStringFromPoints( points() ) b = EncodedInvertedBytesStringFromPoints( points() ) quit End Sub

I must be missing something?[/quote]

On what platform are you seeing this difference?

MacBook Pro 2015 3,1 GHz i7, Xojo2018r4

No, I’ve made a test app with only the code that I’ve posted here, so anyone should be able to reproduce the same behaviour.

No, in the one that’s running slow, k is only calculated when a Points() value is true, and then it depends on the value of j, so I don’t see a more efficient way to do it.

Yes, I’d thought of that too, but if I haven’t done it in either method, then none of that will help with this imbalance.

I just dropped both your methods into a test app with similar timings under Windows 10 and MacOs VM with 2018r4.

Do you have a test project that shows the issue?

Here is my test project (source) which includes two alternate ways of doing the same thing, which I was trying to improve (hence method name ending with “OLD”). The “fast” one is faster than the old code, and the “slow” one definitely isn’t.

Odd - Norm had taken your code & whipped up a quick test that ran each methods code 100 times and they are within a hair of each other with the exact same inputs.

Window1.EncodedBytesStringFromPointsavg = 20.84779 msec
Window1.EncodedInvertedBytesStringFromPointsavg = 20.76631

Here’s the project:

https://blog.xojo.com/wp-content/uploads/2019/02/speedtest.xojo_binary_project.zip

[quote=426455:@Jason Parsley]Odd - Norm had taken your code & whipped up a quick test that ran each methods code 100 times and they are within a hair of each other with the exact same inputs.

Window1.EncodedBytesStringFromPointsavg = 20.84779 msec
Window1.EncodedInvertedBytesStringFromPointsavg = 20.76631

Here’s the project:

https://blog.xojo.com/wp-content/uploads/2019/02/speedtest.xojo_binary_project.zip[/quote]

Hm, that’s interesting. But I removed the loops to call each method only once in Norman’s project, and the results are the same as my project. Here it is .

P.S. I also randomised the input.

Yeah I originally did that and didn’t see any issue, but the interesting thing is this, if you move the code (just posted by Aaron) from the open to a button, and click it twice you can see a massively different result:

21:38:07 : INPUT: 100010110010011
OLD: 30.05469
NEW: 57.37695
OLD-i: 13.66138
NEW-i: 9.904297
21:38:08 : INPUT: 100010110010011
OLD: 16.73486
NEW: 15.71021
OLD-i: 13.66113
NEW-i: 9.904541

Just by running the same method again, there’s this huge drop. Its like something is being initialised the first time around but never again, very odd.

I guess that Aaron didn’t see this because he just ran it once and quit the app

[quote=426458:@JulianS]Yeah I originally did that and didn’t see any issue, but the interesting thing is this, if you move the code (just posted by Aaron) from the open to a button, and click it twice you can see a massively different result …
I guess that Aaron didn’t see this because he just ran it once and quit the app[/quote]

Thanks, Julian. I tried the same, and here are my results. First time = ridiculously slow, second time = performance as expected.

OLD: 146.911
NEW: 1446.086
OLD-i: 39.74603
NEW-i: 30.16602

OLD: 43.04602
NEW: 38.18799
OLD-i: 58.48804
NEW-i: 31.56195

So then I thought – well, when it’s done in a loop, it becomes fast. When I call it twice in a row, it becomes fast. What about when it’s called from a timer just every so often? Same. Only the first call is slow.

I put the inverted functions first like so:

c = EncodedInvertedBytesStringFromPointsOLD( points() )
d = EncodedInvertedBytesStringFromPoints( points() )
a = EncodedBytesStringFromPointsOLD( points() )
b = EncodedBytesStringFromPoints( points() )

And got these results:
OLD-i: 61.94
NEW-i: 348.013
OLD: 10.745
NEW: 7.936005

There’s something going on behind the scenes that’s not related to your algorithm.

Next, I reversed the order in which these methods were called. Guess what? Now it’s the inverted one that’s slow…

INPUT: 1110011000101010000100111000010
OLD-i: 70.54504
NEW-i: 1484.864
OLD: 26.73303
NEW: 19.52899

So! This appears to be something to do with memory allocation! Nothing whatsoever to do with a subtraction being in the loop!

I agree, I just tried it in 2016r3 and it exhibited the same time issues between 1st and 2nd executions.

Guess what guys, put these two lines in the button or open event or wherever, BEFORE calling either method. Then the problem goes away completely.

dim mb as new memoryblock(2) mb.byte( 0 ) = Bitwise.BitOr( mb.byte( 0 ), 32 )

So this is something to do with Bitwise.

Bitwise must perform some form of initialization the first time it’s used.