MemoryBlocks are slower starting with 2024r3

See the thread "Array() function using Pairs causes slow Aggressive Compile" (post #6 by Mike_D).

It turns out that using Variants can really slow down aggressive compilation.

Eric:
Thanks for the suggestion. I have left it like this because the DLL does exactly the same calculation. I wonder how much difference this optimization will make for the Xojo code compared with the DLL code. You are also correct about why I use MemoryBlocks in this case: the oscilloscope data is read directly into a MemoryBlock.

Aaron:
Thanks for your suggestion. If the compiler handles nPoints - 1 in a very slow way, that might explain why Xojo loops perform so badly. I wonder why this hasn’t been fixed in the compiler.

Ian:
My experience is that there is very little difference in speed between using MemoryBlocks and arrays. Arrays are a little faster, but there is one big difference when you need to copy the data: copying an array requires a loop, while copying a MemoryBlock can be done with a single memcpy call, and you can’t get faster than memcpy.
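To illustrate the copy point on the C side (assuming the DLL is written in C, which the thread only suspects), here is a minimal sketch with two hypothetical helpers. The first copies a sample buffer element by element, the second hands the whole block to the standard library’s memcpy. Note that memcpy is a library function rather than a single CPU instruction, and an optimizing C compiler may turn the first loop into a memcpy call on its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n samples one element at a time. */
void copy_samples_loop(double *dst, const double *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Hypothetical helper: copy the same n samples as one block of bytes.
 * dst and src must not overlap (memmove handles overlapping buffers). */
void copy_samples_block(double *dst, const double *src, size_t n) {
    memcpy(dst, src, n * sizeof *src);
}
```

Both leave dst identical to src; the block version is usually at least as fast, and the gap tends to grow with buffer size.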

Mike: I don’t use Variants, for speed reasons, and I will ask you a question about this in your thread.

It will definitely make a difference in Xojo. I have an algorithm that I’ve optimized in exactly this fashion.

Generally speaking, it is almost always faster to perform a calculation once if it is going to be reused. There might be edge cases where this isn’t the case, but it is true for the vast majority of situations.
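A minimal C sketch of "compute once, reuse" for a loop of the kind discussed here. It assumes, purely for illustration, 16-bit signed oscilloscope samples converted to volts with a hypothetical full_scale parameter; the function names are made up. The hoisted version computes the last index (the counterpart of nPoints - 1) and the scale factor once, before the loop. Optimizing compilers often hoist such loop-invariant expressions by themselves, so the measurable gain depends on the compiler and its settings.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive version: the last index and the scale factor are written inside
 * the loop, so without optimization they are recomputed on every pass. */
void to_volts_naive(const int16_t *raw, double *volts, size_t n,
                    double full_scale) {
    if (n == 0) return;
    for (size_t i = 0; i <= n - 1; i++) {
        volts[i] = raw[i] * (full_scale / 32768.0);
    }
}

/* Hoisted version: loop-invariant values are computed once, before the loop. */
void to_volts_hoisted(const int16_t *raw, double *volts, size_t n,
                      double full_scale) {
    if (n == 0) return;
    const size_t last = n - 1;                 /* like nPoints - 1 */
    const double scale = full_scale / 32768.0; /* reused for every sample */
    for (size_t i = 0; i <= last; i++) {
        volts[i] = raw[i] * scale;
    }
}
```

Both functions give identical results, because the hoisted factor is exactly the expression the naive loop recomputes; only the work per iteration changes. The inclusive `<=` bound mirrors Xojo’s `For i = 0 To nPoints - 1`.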

Aaron: It seems that the compiler already optimizes nPoints - 1, at least with aggressive compilation.

The result with Eric’s suggestions and nPoints - 1 calculated before the loop, together with aggressive compilation:
Xojo code time = 116400.5 µs, DLL code time = 56812.7 µs. No difference from the last measurement; the DLL code is still 2 times faster.
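The "2 times faster" figure follows directly from the two timings:

```latex
\text{speedup} = \frac{t_{\text{Xojo}}}{t_{\text{DLL}}}
  = \frac{116400.5\ \mu\text{s}}{56812.7\ \mu\text{s}} \approx 2.05
```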

To be fair, having Xojo run at half the speed of a DLL (which is probably compiled C code?) is pretty good. If this were my project, I might do this instead of using the DLL, since with a DLL you have to deal with all the extra complexity (another programming language, another build system, etc.).


This is an example of how you can speed up your application if you run into performance problems with long loops, which I think Aaron might be suffering from in his initial question. Before deciding to stop using Xojo, you might find it useful to move just a few loop constructs into a DLL. I think Xojo is an excellent tool, but sometimes you need to do something about your loops. Moving a loop into a DLL and gaining only 2 times the speed can make a huge difference for the complete application.

In my case, this example is a read method for one file. My users have oscilloscopes that give 4 signals with 1e9 points of data per measurement (4 files), which they want to compare with stored reference measurements. In their situation, this loop must run 8 times, with 20 times more data, just to read everything in. Then they want to do some calculations and zoom into the data, and the result has to be presented graphically. Depending on what they want to do, this adds 3 to more than 5 loops in series before the user can see the result. The time you gain on the complete application becomes huge with only a 2 times speed increase in one loop. In my case, by moving a few loop constructs into a DLL, I have gone from 3 to 5 minutes of user wait time to 10 to 20 s for the application. Everything else is pure Xojo.
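As a sketch of what moving a loop into a DLL can look like, here is a hypothetical C function built as a shared library. It is not the author’s code, and the thread does not show the actual calculations: it performs min/max decimation, reducing a long waveform to one minimum and one maximum per display bucket, which is one common way to draw very large sample sets. The export macro assumes MSVC/MinGW on Windows and GCC/Clang elsewhere. On the Xojo side such a function would be called through a Declare, passing the MemoryBlock that holds the samples where a pointer is expected; the exact Declare signature is an assumption to check against the Xojo documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed toolchains: MSVC/MinGW on Windows, GCC/Clang elsewhere. */
#if defined(_WIN32)
#define DLL_EXPORT __declspec(dllexport)
#else
#define DLL_EXPORT __attribute__((visibility("default")))
#endif

/* Hypothetical export: split y[0..n-1] into `buckets` consecutive ranges and
 * write the minimum and maximum of each range to mins[] and maxs[].
 * Returns 0 on success, -1 on invalid arguments. */
DLL_EXPORT int minmax_decimate(const double *y, size_t n, size_t buckets,
                               double *mins, double *maxs) {
    if (y == NULL || mins == NULL || maxs == NULL) return -1;
    if (buckets == 0 || buckets > n) return -1; /* keeps every bucket non-empty */

    for (size_t b = 0; b < buckets; b++) {
        /* Bucket b covers indices [start, end); 64-bit products avoid
         * overflowing a 32-bit size_t for large n. */
        size_t start = (size_t)((uint64_t)b * n / buckets);
        size_t end = (size_t)((uint64_t)(b + 1) * n / buckets);

        double lo = y[start];
        double hi = y[start];
        for (size_t i = start + 1; i < end; i++) {
            if (y[i] < lo) lo = y[i];
            if (y[i] > hi) hi = y[i];
        }
        mins[b] = lo;
        maxs[b] = hi;
    }
    return 0;
}
```

Keeping the interface to plain pointers, sizes and an integer return code avoids passing Xojo objects across the DLL boundary.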

What you describe sounds to me like an ideal task for a parallel-processing DSP framework such as the built-in Accelerate framework on macOS, or a similar cross-platform library. Is this what your DLL addresses? If so, yes, I agree, you cannot reach that performance with pure Xojo.
If not, I’d recommend having a look at one. Performance will be much better.

Yes, it would be great to move to parallel processing with hardware support, and that is one thing I might need to do in the future when the data sizes increase. Are those frameworks hardware-dependent, or are they just optimized C code?

The former. Admittedly, I can only speak with certainty about Apple, where I once made a declare library and modified Roger Meier’s DataView (? I didn’t look the name up) classes for real-time waveform display and Fourier analysis. A big chunk of the framework uses the system’s DSP units to do bulk parallel processing like you described.
I don’t have numbers at hand for comparison, but I am not exaggerating when I say that performance improved by a factor of 5 to 20.

Thanks for the information. I will do a search.

My desktop apps are slower across the board from 2024r3 onwards. The speed hit is about 30%. They don’t make much use of MemoryBlocks, though; they mainly use my own classes, dictionaries, arrays and graphics drawn in a Canvas.

Not enough to matter, though, as they’re still fast enough.

Right, it has been confirmed to be an across-the-board issue (not just to do with memoryblocks).

Xojo has since done some tweaking on thread-related issues, and my tests show that the latest beta release of 2024r4 is slightly faster than the current stable release 2024r3.1, but it’s still nowhere near as fast as 2024r2.

When, as in your case, speed isn’t critical, then of course it doesn’t matter much, but when you need your app to run as fast as possible, the loss of speed is extremely frustrating.


I use 2024r1; is there any advantage to using 2024r2 (Mac and Windows)?
I think, though I’m not sure right now, that 2024r1 builds are smaller than 2024r2 builds.
Thanks.

I don’t have that version installed. I suggest you run the above test in the IDE and see if the number is the same as listed for r2.

MacBook Air M3, 8 GB, macOS Sonoma 14.6.1
With MemoryBlocks:

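// Time 1,000,001 loop iterations (0 to 1000000), each allocating an 8-byte MemoryBlock
dim s, e, t as double
s = System.Microseconds
for j as integer = 0 to 1000000
  dim m as new MemoryBlock(8)
next
e = System.Microseconds
t = e - s // elapsed time in microseconds
// Put the result on the clipboard, show it, then quit
dim r as string = format( t, "0.0000" )
dim clip as new Clipboard
clip.SetText r
MsgBox r
quit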

// 2021r2.1 = 417843.1667
// 2024r2.1 = 154772.4583
// 2024r3 = 193853.0000

// Aaron's results
// 2024r2 = 212367.6667
// 2024r3 = 300177.5833

Without MemoryBlock (ARM build sizes added):

dim s, e, t as double
s = System.Microseconds
for j as integer = 0 to 1000000
  'dim m as new MemoryBlock(8)  // disabled, so this run times only the empty loop
next
e = System.Microseconds
t = e - s
dim r as string = format( t, "0.0000" )
dim clip as new Clipboard
clip.SetText r
MsgBox r
quit

// 2021r2.1 = 47822.1666   ARM build size 95.6
// 2024r1 = 44442.7917     ARM build size 100.9
// 2024r2 = 44468.8750     ARM build size 101
// 2024r3 = 53382.1250     ARM build size 100.2

// Aaron's results
// 2024r2 = 57570.9583
// 2024r3 = 84383.8333
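Subtracting the empty-loop time from the MemoryBlock time gives a rough per-allocation cost. This assumes the loop overhead is the same in both runs and simply adds up, and uses Aaron’s numbers (presumably measured on one machine) with the 1,000,001 iterations of `for j as integer = 0 to 1000000`:

```latex
\begin{aligned}
t_{\text{alloc}} &\approx \frac{t_{\text{with}} - t_{\text{without}}}{1\,000\,001} \\
\text{2024r2:}\quad t_{\text{alloc}} &\approx \frac{212367.6667 - 57570.9583}{1\,000\,001}\ \mu\text{s} \approx 0.155\ \mu\text{s} \\
\text{2024r3:}\quad t_{\text{alloc}} &\approx \frac{300177.5833 - 84383.8333}{1\,000\,001}\ \mu\text{s} \approx 0.216\ \mu\text{s} \\
\frac{0.216}{0.155} &\approx 1.39
\end{aligned}
```

So on that machine each MemoryBlock allocation takes roughly 39% longer in 2024r3, while the empty loop itself also got slower (84383.8333 / 57570.9583 ≈ 1.47), which fits the across-the-board slowdown described in the thread.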