Can you speed this up?

Running from the IDE, this takes around 60 seconds to execute; compiled, it takes around 2 seconds. That makes it unusable for rapid prototyping with data that can vary in size and composition. Does anyone have any ideas for speeding it up when running in the IDE, short of caching the results between executions (building the cache in a built app and using it when running from the IDE) or running it via XojoScript so it doesn’t have the IDE/debugging overhead?

Thanks

#Pragma BackgroundTasks False
#Pragma BoundsChecking False
#Pragma NilObjectChecking False
#Pragma StackOverflowChecking False
#Pragma BreakOnExceptions False

Dim mb As New MemoryBlock(1e6 * 200) '200MB file

'for testing, place a marker at the end of the file (could be anywhere)
mb.UInt32Value(mb.Size - 4) = &hDEADBEEF

Dim loc As Integer = 0
Dim test As UInt32 = 0
Dim s, e As Double = 0.0

s = System.Microseconds

Do
  test = mb.UInt32Value(loc) 'I'll need to work on this value in the loop so it can't be moved
  loc = loc + 1
Loop Until test = &hDEADBEEF 'stop when we hit the marker

e = System.Microseconds

Dim d As Double = (e - s) / 1e6

MessageBox(d.ToString())

What is the speed using For … Next?
Using For … Next removes the loc = loc + 1 line.
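
For reference, a For … Next version of the original loop might look roughly like the sketch below. It assumes the same 200MB test block and marker as the first post; the names lastOffset and found are just illustrative and not from the original code.

```xojo
' Sketch only: same test setup as the original post
Var mb As New MemoryBlock(200000000) ' 200MB test block
mb.UInt32Value(mb.Size - 4) = &hDEADBEEF ' marker at the end, for testing

Var test As UInt32 = 0
Var lastOffset As Integer = mb.Size - 4 ' last offset where a UInt32 still fits
Var found As Integer = -1 ' illustrative: offset of the marker, -1 if not found

For offset As Integer = 0 To lastOffset
  test = mb.UInt32Value(offset) ' per-offset work would go here
  If test = &hDEADBEEF Then
    found = offset
    Exit For
  End If
Next
```

Note that this bounds the loop by the block size, so it also stops cleanly if the marker is missing, which the Do … Loop version does not.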

Use a Ptr variable.
Assign the MemoryBlock to it before the loop.

Then use the Ptr to access memory inside the loop.

Thanks for the suggestions.

For … Next: no difference.

Ptr: slower, now 70 seconds.

Something like this:

#if not DebugBuild
  #pragma BackgroundTasks False
  #pragma NilObjectChecking False
  #pragma StackOverflowChecking False
  #pragma BoundsChecking False
#endif

var markerMB as new MemoryBlock( 4 )
markerMB.UInt32Value( 0 ) = &hDEADBEEF
var marker as string = markerMB

Dim mb As New MemoryBlock(1e6 * 200) '200MB file

'for testing, place a marker at the end of the file (could be anywhere)
mb.UInt32Value(mb.Size - 4) = &hDEADBEEF

Dim loc As Integer = 0
Dim s, e As Double = 0.0

s = System.Microseconds

var bytePosition as integer = 0
const kChunkSize as integer = 10 * 1024

do
  var chunkSize as integer = min( kChunkSize, mb.Size - bytePosition )
  var chunk as string = mb.StringValue( bytePosition, chunkSize )
  loc = chunk.IndexOfBytes( marker )
  if loc <> -1 then
    loc = loc + bytePosition + 1
    exit
  end if
  
  'advance by 3 bytes less than a full chunk so a marker straddling two chunks is still found
  bytePosition = bytePosition + kChunkSize - 3
loop until bytePosition > ( mb.Size - 4 )

e = System.Microseconds

Dim d As Double = (e - s) / 1e6

MessageBox(d.ToString())

0.18s here.

I can’t just skip over all the data as I need to work on each byte.

Apologies, I thought you were just looking for the marker.

But I am confused. Your code isn’t pulling bytes but UInt32 values at each byte location. Can you tell us more about what the processing looks like? There might be efficiencies that can be achieved overall.

A couple more datapoints:
Macbook Pro 16" M1 Max, Xojo 2022R2

Running in IDE as presented: 27s
Running in IDE without #Pragma statements: 27s (no change)
Running compiled as presented: 0.58s
Running compiled without #Pragma statements: 2.6s

Curious the pragmas had no measurable effect in the IDE.

(Edited for clarity)

My tests:

Macbook Pro 2019 (core i9) Xojo 2022 R4.1:


Running in IDE
44000 msec : Original version
45000 msec : using Ptr instead of MemoryBlock

Built App (Optimization=Default)
1500 msec : Original version
395 msec : using Ptr instead of MemoryBlock


Built App - with compiler Optimization = Aggressive
940 msec : Original version
68 msec : using Ptr instead of MemoryBlock

Conclusions:

  • the IDE tests are not a good measure of speed
  • running in the IDE can be 1000x as slow as a compiled app built with Aggressive Optimization
  • Ptr.UInt32() is about 4-5x as fast as MemoryBlock.UInt32Value() – and up to 13x as fast if built with Aggressive optimization.
  • using Aggressive compilation really helps when using Ptr!

To use Ptr, just replace your loop with this:

dim p as Ptr = mb
Do
  test = p.UInt32(loc)
  loc = loc + 1
Loop Until test = &hDEADBEEF 'stop when we hit the marker

Another test:

M1 Mac Mini 2020:
Built App - with compiler Optimization = Aggressive
635 msec : Original version
123 msec : using Ptr instead of MemoryBlock

Interesting that the M1 is slower than the i9 when using Ptr - I think Intel chips may be better at non-aligned memory access than Apple Silicon? Or maybe the Xojo compiler / aggressive optimization is just not as good for M1?

Thanks all for the suggestions and tests (shakes fist at you lot with fast CPUs), but I was hoping for a silver bullet that could make a code block run at build speeds. I can’t go into what the processing is; this was just an example to see if there was a way to avoid the overhead of the IDE when I really don’t care for or need to debug.

I’ll have a think about the suggestions and see where I can go from here, if not back to another language and IDE.

Thanks

Reading between the lines here, and apologies if I’m misunderstanding, but it sounds like you don’t have a license to the software, and can’t build a compiled app? If so, I agree “yeah, that’s a problem!” :)

If you do have a license, then I would suggest you make a smaller test case that will run in the IDE, e.g. instead of a 200MB file, make a 0.2 MB file for testing in the IDE.

If the file is 1000x smaller, that will make up for the IDE being 1000x slower, no?
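
One way to set that up, as a rough sketch: pick the test size with the built-in DebugBuild constant so debug runs in the IDE use a small block and built apps use the full one. The variable name byteCount and the two sizes are assumptions for illustration only.

```xojo
' Sketch only: choose a smaller data set when running in the IDE
Var byteCount As Integer ' illustrative name
#If DebugBuild Then
  byteCount = 200000 ' 0.2MB while running in the IDE
#Else
  byteCount = 200000000 ' full 200MB in a built app
#EndIf

Var mb As New MemoryBlock(byteCount)
mb.UInt32Value(mb.Size - 4) = &hDEADBEEF ' marker at the end, as in the original test
```

The rest of the code stays the same, so the logic being prototyped is identical in both cases.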

Looks like he has some licence…

Also, if the end marker is more likely to be towards the end of the file, and there is only one instance, start at the end and work backwards.
On average, that has to be faster than starting at the beginning.
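
A backwards scan could be sketched like this, using the Ptr access suggested earlier in the thread. It assumes a single marker; if there were several, this would find the last one rather than the first. The name found is illustrative.

```xojo
' Sketch only: scan from the end of the block towards the start
Var mb As New MemoryBlock(200000000) ' 200MB test block
mb.UInt32Value(mb.Size - 4) = &hDEADBEEF ' marker at the end, for testing

Var p As Ptr = mb
Var found As Integer = -1 ' illustrative: offset of the marker, -1 if not found

For offset As Integer = mb.Size - 4 DownTo 0
  If p.UInt32(offset) = &hDEADBEEF Then
    found = offset
    Exit For
  End If
Next
```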

I have a license and the marker can be anywhere in the file. Thanks though. Looks like this is a moot point as I’ve just found out that workers don’t work in console even though the docs say they do, brilliant. That’ll teach me for prototyping in desktop and not in the console. Back to the drawing board.

In general, code running under the debugger has a call into the debugger code after each line of code you write.
So your loop gets two additional function calls per iteration, and this slows it down a lot, even though they just pass the current position in the code to the debugger.

If you want speed, you should not do the loop yourself. Use the framework instead if possible, like below. It takes 0.5s inside the IDE on my computer:

var mb as new MemoryBlock(1e6 * 200 )
var s,e as double

mb.UInt32Value( mb.size -4 ) = &hDEADBEEF
var mb2 as new MemoryBlock( 4 )
mb2.UInt32Value(0) = &hDEADBEEF
s = System.Microseconds
var ix as integer = mb.StringValue( 0,mb.Size ).indexof( mb2.StringValue(0,4))
e = System.Microseconds

var d as double = e-s 'elapsed time in microseconds
MessageBox( str( ix) + " " +str( d ) )

/Håkan

Can you give us any more details about the data and the search key? For example, is the key always a 4 byte value? Are the bytes constrained in any way - numbers only, for example? There might be some interesting optimizations to be discovered.

Even better: do not search at all.

  1. Do not write the file into the memory block starting at byte 0. Instead, build in an offset, or “header”, of say 8 or 16 bytes. Heck, maybe make it longer (who cares, if it’s 200MB, what’s another 512k).

  2. Load the file into the memoryblock from the header (ie byte 16 onwards).

  3. Once done, write the number of bytes actually written into the memory block as an integer, starting from byte 0.

  4. To read the data, start at byte 0 to read the actual data length (L) and from byte 16, read for L bytes.

This way you don’t need to search for &hDEADBEEF, and you don’t need an EOF marker either.
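
As a minimal sketch of that layout: the 16-byte header size, the 8-byte length field at byte 0, and the names kHeaderSize, payload, and dataLength are all assumptions for illustration, and the short string stands in for the real file contents.

```xojo
' Sketch only: length-prefixed layout instead of an end marker
Const kHeaderSize As Integer = 16 ' assumed header length
Var payload As String = "example payload bytes" ' stands in for the file contents

' Writing: data goes in after the header, its length goes in at byte 0
Var mb As New MemoryBlock(kHeaderSize + payload.Bytes)
mb.StringValue(kHeaderSize, payload.Bytes) = payload
mb.Int64Value(0) = payload.Bytes

' Reading: fetch the length first, then exactly that many bytes
Var dataLength As Integer = mb.Int64Value(0)
Var data As String = mb.StringValue(kHeaderSize, dataLength)
```

The remaining header bytes are unused here and could hold a version number or checksum.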

You might also find descriptions of “packed binary” file structures useful, as you can write and read data of arbitrary length this way.

With big files I would also suggest you consider error detection and validation, and a recovery mechanism so the app doesn’t crash if a 1 or 2 byte error occurs; there are ways to do this.

Back in the days of the “Classic” Mac OS, many binary file formats had a 512 byte header - to define the actual “data fork” length in bytes (it started from byte 512) and pointers defining where the “resource fork” started and its length, plus whatever metadata might be appropriate. For example documents could contain their icon in the resource fork and Finder would display it, even if the user did not have the corresponding app installed to open it.

DOS and Windows however had neither headers nor resource forks, and remain crippled by this today.
