Memoryblocks are slower starting with 2024r3

Kem_Tekinay · November 4, 2024, 2:25pm

Well said, and I am confident.

William_Yu · November 4, 2024, 2:34pm

Your numbers look like those from Debug->Run. We can likely improve them when preemptive threads aren’t active. The slowdown might also be less noticeable in a built app, or at least it should be.

Hakan_Nyberg · November 9, 2024, 3:03pm

Hi
I have been working for many years on an analysis tool that reads data from oscilloscopes. Data from modern oscilloscopes can be huge, easily 500 Mpts of doubles per signal. I have learned from experience that loops in Xojo are very slow in many situations, and have been for many releases. The only way to get reasonable application speed in a loop over 500 Mpts is to move the loop code out to a plugin DLL and pass the MemoryBlock to the DLL. It doesn’t matter whether you are working with arrays or with MemoryBlocks; the plugin DLL performs much faster. In many situations I get 10 times faster performance in the plugin compared to Xojo code.

Jeff_Tullin · November 9, 2024, 4:00pm

Calling a DLL means that the loop is not ‘debug’ code, it’s the final code, even when Xojo is running debug.
Is there still a 10x difference between compiled Xojo and the plugin speed?

Hakan_Nyberg · November 9, 2024, 4:15pm

Yes, I always compare compiled application speeds when I do speed analysis. In some situations the compiled Xojo application is 10x faster when using DLL code compared to pure Xojo code.

Eric_Williams · November 9, 2024, 5:41pm

What does the DLL actually do?

Aaron_Hunt · November 9, 2024, 6:25pm

If you’ve been doing things this way for over a decade, you should try again with pure Xojo code. Our large project used to work about 10x faster with a dylib. Until the recent Xojo release in which everything became faster in Xojo. Then I tried the code in Xojo again, without the dylib, and guess what, compiled Xojo was faster than the dylib. So we ditched the dylib, yay!

Hakan_Nyberg · November 9, 2024, 6:56pm

Eric:
When I end up with a loop construction with many iterations, I always compare a DLL implementation with a Xojo implementation to figure out which one is faster. In my application I have found that ~80% of the loops with many iterations are faster when implemented in a DLL. One example is a calculation you always need to do on data from an oscilloscope, the calculation of the x-value and the y-value:

For n as integer = 0 to numberOfDatapoints
x(n) = n * xIncrement + xOffset
y(n) = Yrawvalue(n) * yIncrement + yOffset
next n

Yrawvalue is typically an Int16 or a UInt16 value because of the oscilloscope’s A/D converter.
This loop is one example that needs to be implemented in a plugin DLL to get reasonable speed in the compiled application.
The example is written with arrays, but it doesn’t matter if you use a MemoryBlock instead.
xIncrement, xOffset, yIncrement and yOffset are variables that don’t change during the loop.
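
As a rough illustration of the calculation above, here is a minimal C sketch of the kind of loop a plugin DLL might run. The function name `scale_trace` and its parameter names are made up for this example and are not taken from the actual DLL; it assumes Int16 raw samples and output buffers allocated by the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the actual DLL code): converts raw A/D samples
   into x/y values using the formula from the post above. */
void scale_trace(const int16_t *raw, double *x, double *y, size_t count,
                 double xIncrement, double xOffset,
                 double yIncrement, double yOffset)
{
    for (size_t n = 0; n < count; n++) {
        x[n] = (double)n * xIncrement + xOffset;       /* time axis from the sample index */
        y[n] = (double)raw[n] * yIncrement + yOffset;  /* scaled voltage from the raw sample */
    }
}
```

The four scale factors are passed by value because they stay constant for the whole loop, which leaves the compiler free to keep them in registers.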

Aaron:
I will do some new tests with pure Xojo code.

Eric_Williams · November 9, 2024, 7:22pm

I see. Are you implementing this DLL yourself? Or is it provided by a third party?

Mike_D · November 9, 2024, 7:41pm

Can you share your actual Xojo code (the full method)? I bet we could find some ways to speed it up dramatically.

(Might want to create a new thread if you do)

Hakan_Nyberg · November 10, 2024, 11:52am

The DLL is implemented by myself; it’s just the completeModule example with more functions and subs added. It’s compiled with Visual Studio 2019 for Windows. I have tried to do some isolated tests on Xojo 2024r3, but it complains about the DLL call. The DLL function works fine on Xojo 2022r3.2, which I currently use. When you hover the mouse over the function name, Xojo 2024 shows the function as:

TestModule.calcOscXYInt8(RawArr As Ptr, xArr As Ptr, yArr As Ptr, VERTICAL_GAIN As double, VERTICAL_OFFSET As double, HORIZ_INTERVAL As double, HORIZ_OFFSET As double, startPosByte As Int64, nrData As Int64)

The actual Xojo call is:

var mrawdata as Ptr = rawdata //rawdata is a memoryblock
var dataBlock0 as Ptr = HnData.dataBlock( 0 ) //HnData.dataBlock( 0 ) is a memoryblock
var dataBlock1 as Ptr = HnData.dataBlock( 1 ) //HnData.dataBlock( 1 ) is a memoryblock
TestModule.calcOscXYInt8( mrawdata, dataBlock0, dataBlock1,dVERTICAL_GAIN, dVERTICAL_OFFSET, dHORIZ_INTERVAL, dHORIZ_OFFSET,0, nrDataPoints)

or only
TestModule.calcOscXYInt8(rawdata, HnData.dataBlock( 0 ), HnData.dataBlock( 1 ),dVERTICAL_GAIN, dVERTICAL_OFFSET,dHORIZ_INTERVAL, dHORIZ_OFFSET,0, nrDataPoints)

//dVERTICAL_GAIN, dVERTICAL_OFFSET, dHORIZ_INTERVAL and dHORIZ_OFFSET are locally defined Double variables
//nrDataPoints is a locally defined Int64 variable

and the C definition is:

static void calcOscXYInt8(
signed char* RawArr,
double* xArr,
double* yArr,
double VERTICAL_GAIN,
double VERTICAL_OFFSET,
double HORIZ_INTERVAL,
double HORIZ_OFFSET,
const long long startPosByte,
const long long nrData);

Does anyone know if something has changed on the plugin side of things in Xojo 2023 and Xojo 2024 that might cause my present problems?

Hakan_Nyberg · November 10, 2024, 3:21pm

Hi again
Sorry for posting the DLL issue in this thread. I have solved it and I need to file a bug report.
Here are my test results for a compiled Xojo 2024r3 application running Xojo code compared to running DLL code:

Xojo code time = 462963.2 us Dll code time = 63762.8 us => DLL code is about 7 times faster, and this is a little better than I have measured before.

This is the Xojo code used in the test:
var dataBlock0 as ptr = HnData.dataBlock( 0 )
var dataBlock1 as ptr = HnData.dataBlock( 1 )
var hMBPtr as Ptr = hMb
var xIncrement as double = FileHeader.xIncrement
var xOriginZero as double = FileHeader.xOriginZero
var yIncrement as double = TraceHeader.yIncrement
var YOrigin as double = TraceHeader.YOrigin
var yRef as double = TraceHeader.yReference

var d1 as double = system.Microseconds
for k as integer = 0 to nPoints - 1
dataBlock0.double( k * doublesize) = k * xIncrement + xOriginZero
dataBlock1.double( k * doublesize) = ( hMBPtr.UInt16( k*2) - YOrigin - yRef ) * yIncrement
next k
var dxojo as double = system.Microseconds - d1

d1 = system.Microseconds
TestModule.calcOscXYUint16Nan( hMb, HnData.dataBlock( 0 ), HnData.dataBlock( 1 ),TraceHeader.yIncrement, TraceHeader.YOrigin, TraceHeader.yReference, FileHeader.xIncrement, FileHeader.xOriginZero, 0, nPoints ) //my DLL code doing the same as the xojo code.
var d_dll as double = system.Microseconds - d1

hnData.WindowText = "Xojo code time = " + str( dxojo ) + " us Dll code time = " + str( d_dll ) + " us"

nPoints = 50e6 in this test.
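
For comparison, the same measurement can be repeated entirely on the C side. The sketch below is only an illustration under stated assumptions: `convert_uint16` mirrors the formula in the Xojo loop above, but it is not the real `calcOscXYUint16Nan` (whose source is not shown in this thread), and `time_conversion_us` is a hypothetical helper with placeholder scale factors. It uses the standard `clock()` function, which reports CPU time rather than wall-clock time.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Same arithmetic as the Xojo loop: x from the sample index,
   y from the raw UInt16 value. Illustrative only. */
static void convert_uint16(const uint16_t *raw, double *x, double *y, size_t nPoints,
                           double xIncrement, double xOriginZero,
                           double yIncrement, double yOrigin, double yRef)
{
    for (size_t k = 0; k < nPoints; k++) {
        x[k] = (double)k * xIncrement + xOriginZero;
        y[k] = ((double)raw[k] - yOrigin - yRef) * yIncrement;
    }
}

/* Hypothetical timing helper: returns the CPU time of one conversion pass
   in microseconds. The scale factors are arbitrary placeholder values. */
static double time_conversion_us(const uint16_t *raw, double *x, double *y, size_t nPoints)
{
    clock_t start = clock();
    convert_uint16(raw, x, y, nPoints, 1e-9, 0.0, 0.001, 32768.0, 0.0);
    clock_t stop = clock();
    return (double)(stop - start) * 1e6 / (double)CLOCKS_PER_SEC;
}
```

Timing a single pass is noisy; averaging several passes over the full 50e6 points would give a steadier number to compare against the Xojo loop.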

Kem_Tekinay · November 10, 2024, 6:04pm

Hakan, in the future please wrap code within code tags to make it easier for us to help. You can either use the toolbar icon that looks like </> or just use three backticks before and after the code block. It makes this:

var dataBlock0 as ptr = HnData.dataBlock( 0 )
var dataBlock1 as ptr = HnData.dataBlock( 1 )

Look like this:

var dataBlock0 as ptr = HnData.dataBlock( 0 )
var dataBlock1 as ptr = HnData.dataBlock( 1 )

I wonder, did you try adding #pragma statements to speed up the code? If not, try putting this at the top of your method:

#if not DebugBuild
  #pragma BackgroundTasks false
  #pragma BoundsChecking false
  #pragma NilObjectChecking false
  #pragma StackOverflowChecking false
#endif

That and compiling as Aggressive can often make a dramatic difference.

Hakan_Nyberg · November 10, 2024, 7:49pm

Kem, thanks for the advice.
You are correct that compiling as Aggressive can make a difference, and it does in this example too, but the DLL is still much faster.
Adding your pragma definitions gives the result:
Xojo code time = 404760.3 us Dll code time = 62297 us => DLL code is 6.5 times faster

Using the pragmas and Aggressive compiling gives the result:
Xojo code time = 117765.3 us Dll code time = 61813.3 us => DLL code is still 1.9 times faster

My problem with Aggressive builds is that it takes 1.1 hours to build instead of a couple of minutes.

Eric_Williams · November 10, 2024, 9:10pm

Hakan Nyberg:

for k as integer = 0 to nPoints - 1
dataBlock0.double( k * doublesize) = k * xIncrement + xOriginZero
dataBlock1.double( k * doublesize) = ( hMBPtr.UInt16( k*2) - YOrigin - yRef ) * yIncrement
next k

It would probably be faster (if only slightly) to do this:

var kIndex as integer
for k as integer = 0 to nPoints - 1
dataBlock0.double( kIndex) = k * xIncrement + xOriginZero
dataBlock1.double( kIndex) = ( hMBPtr.UInt16( k*2) - YOrigin - yRef ) * yIncrement
kIndex=kIndex+doublesize
next k

This avoids two multiplications and replaces them with a single addition and store.

It might also be incrementally faster to do this:

var kIndex as integer
var kOffset as double
var twoK as integer
for k as integer = 0 to nPoints - 1
dataBlock0.double( kIndex) = kOffset + xOriginZero
dataBlock1.double( kIndex) = ( hMBPtr.UInt16(twoK) - YOrigin - yRef ) * yIncrement
kIndex=kIndex+doublesize
kOffset=kOffset+xIncrement
twoK=twoK+2
next k

I’m pretty sure my first suggestion will make the loop noticeably faster. I’m not sure about the second.
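
To make the trade-off in the second suggestion concrete, here is a hedged C sketch of the same idea (C is used only for illustration, and the function names are invented). `fill_x_multiply` computes each x value from the index, while `fill_x_running` keeps a running sum. A running floating-point sum accumulates rounding error, so over many millions of points its results can drift slightly from the multiply version; whether that matters depends on the application.

```c
#include <stddef.h>

/* Index-based: one multiplication per point, no accumulated error. */
void fill_x_multiply(double *x, size_t nPoints, double xIncrement, double xOriginZero)
{
    for (size_t k = 0; k < nPoints; k++) {
        x[k] = (double)k * xIncrement + xOriginZero;
    }
}

/* Running-sum variant: replaces the multiplication with an addition,
   at the cost of rounding error that can build up across iterations. */
void fill_x_running(double *x, size_t nPoints, double xIncrement, double xOriginZero)
{
    double offset = 0.0; /* must be a floating-point type, not an integer */
    for (size_t k = 0; k < nPoints; k++) {
        x[k] = offset + xOriginZero;
        offset += xIncrement;
    }
}
```

With an increment that is exactly representable in binary (such as 0.5) the two variants agree bit for bit; with an increment like 0.1 they can differ in the last digits.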

Aaron_Hunt · November 10, 2024, 9:44pm

I would put nPoints - 1 in a variable so it doesn’t get calculated every iteration (or does the compiler do that for us now?)

Eric_Williams · November 10, 2024, 10:47pm

I thought it did but I could be mistaken.

Ian_Kennedy · November 10, 2024, 10:49pm

Pretty sure it doesn’t. Radical idea: what happens if you use arrays of doubles rather than MemoryBlocks all the time? Unless it all has to be in a single structure for some external reason?

Eric_Williams · November 10, 2024, 11:07pm

I believe he’s receiving a block of data from the oscilloscope, so it’s faster to deal with it in place than break it up first.

MemoryBlocks are also faster for this kind of operation. Since all you need to worry about are the values, you avoid a lot of the overhead of bounds checking, references, etc.

Ian_Kennedy · November 11, 2024, 10:58am

  #pragma BoundsChecking false

Bounds checking can be turned off. I just think there are a lot of index calculations going on that could be avoided. Yes, sure, the indexing for arrays does similar calculations behind the scenes, but I imagine that code to be quite highly optimised, as it is used so much.