Workers 2.0

Now that we have a 20-core Mac Mini Ultra, speculation is that the new Mac Pro will be a 40-core Mac Vision chiplet design.

The current version of Workers is a step toward concurrency, but they suffer from a fundamental bottleneck: sharing data.

In order for a worker to process a task, the data must be loaded into each worker. Let's say you have a 4 GB log file you want to look through for specific data.

These are the current techniques I can see (please recommend any I'm missing); a sketch of option 3 follows the list.

  1. Load all the data in the main app and separate it into lines, then pass each helper a blob of data for it to scan (requires some form of serialization, one file access, and processing).
  2. Load all the data into memory in each worker, split it into lines and then work on its section (20 × 4 GB of memory, 20 file reads, and additional processing per worker).
  3. Load all the data in the main app, figure out the line locations and pass a start location and an end location to each worker. Each worker then opens the file but loads only the section it was given into memory (requires 21 file accesses, plus additional processing time to calculate line positions and for each worker to split its lines).
  4. Load all the data into shared memory, process that data to create a second shared block containing line positions, and give each worker access to the shared data blob, the shared line-positions blob and the lines it should be working on (requires declares or a plugin, minimal file access, minimal serialization, but additional processing time to create a line 'map' array).
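For what it's worth, here is a minimal sketch of how option 3 could look on the coordinator side, assuming a plain LF-delimited log file. The path and worker count are made up, and a real version would read the file in larger chunks rather than byte by byte:

```
Const kWorkerCount As Integer = 20

Var f As New FolderItem("/path/to/huge.log", FolderItem.PathModes.Native)
Var bs As BinaryStream = BinaryStream.Open(f, False) ' read-only

' Record the byte offset of the start of every line.
Var lineStarts() As UInt64
lineStarts.Add(0)
While Not bs.EndOfFile
  If bs.ReadUInt8 = 10 Then ' LF
    If Not bs.EndOfFile Then lineStarts.Add(bs.Position)
  End If
Wend
Var fileLength As UInt64 = bs.Length
bs.Close

' Hand each worker one contiguous range of lines as a start/end byte offset.
Var linesPerWorker As Integer = (lineStarts.Count + kWorkerCount - 1) \ kWorkerCount
For w As Integer = 0 To kWorkerCount - 1
  Var firstLine As Integer = w * linesPerWorker
  If firstLine > lineStarts.LastIndex Then Exit For

  Var lastLine As Integer = firstLine + linesPerWorker - 1
  If lastLine > lineStarts.LastIndex Then lastLine = lineStarts.LastIndex

  Var startOffset As UInt64 = lineStarts(firstLine)
  Var endOffset As UInt64 = fileLength
  If lastLine < lineStarts.LastIndex Then endOffset = lineStarts(lastLine + 1)

  ' Pass f.NativePath, startOffset and endOffset to worker w (for example as
  ' a short job string); the worker opens the file itself and reads only the
  ' bytes between the two offsets.
Next
```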

My proposal (without knowing what Xojo does under the hood) is a shared immutable object.
It is my understanding that core Xojo data types are just blocks of memory (simplified), so it would be very cool if Xojo could copy these objects into a shared memory block that is then shared with the workers. The workers would have wickedly fast access to all the lines of the 4 GB text file, because each worker is given enough internal Xojo metadata to understand what the shared memory contains (an array of strings) and which lines it should be working on. This eliminates the bottleneck and gives the most efficient memory management: the original 4 GB string and line array can be dumped from memory, so only the immutable 4 GB block is present during the search.
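To make the idea concrete, here is a minimal sketch of one possible flat, immutable layout for such a blob: a count, a table of offset/length pairs, then the raw bytes. This is only an illustration, not what Xojo actually does internally, and the MemoryBlock would still need to be placed in OS shared memory (via declares or a plugin) before the workers could see it:

```
Function PackLines(lines() As String) As MemoryBlock
  ' 8 bytes for the line count, then 16 bytes (offset + length) per line.
  Var headerSize As Integer = 8 + (lines.Count * 16)
  Var dataSize As Integer = 0
  For Each s As String In lines
    dataSize = dataSize + s.Bytes
  Next

  Var mb As New MemoryBlock(headerSize + dataSize)
  mb.UInt64Value(0) = lines.Count

  Var tablePos As Integer = 8
  Var dataPos As Integer = headerSize
  For Each s As String In lines
    mb.UInt64Value(tablePos) = dataPos     ' where this line's bytes start
    mb.UInt64Value(tablePos + 8) = s.Bytes ' how many bytes it has
    mb.StringValue(dataPos, s.Bytes) = s
    tablePos = tablePos + 16
    dataPos = dataPos + s.Bytes
  Next

  Return mb
End Function

' A worker that can see the block needs no parsing to get at line i:
' Var start As Integer = mb.UInt64Value(8 + (i * 16))
' Var length As Integer = mb.UInt64Value(8 + (i * 16) + 8)
' Var line As String = mb.StringValue(start, length)
```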

Obviously, sending data back to the main application about which lines contain the data we're looking for needs some thought, even if each worker only 'returns' the information we're after.

This doesn’t cover things like a find and replace as the shared memory is immutable, but maybe someone smarter than me can figure that out.

Imagine your Xojo-made application running 4× faster on a Mac Mini Ultra than on an M1 Mac Mini, instead of taking roughly the same time*.

*The M1 Mac Mini Ultron has 16 performance cores, while the M1 Mac Mini has 4. Both have 4 efficiency cores. Core performance is about the same for both models, which means a Xojo application without some form of concurrency will run at roughly the same speed on both machines.

Are you referring to the Mac Studio? It seems very much aimed at what I would surmise to be the most common users of the current Mac Pro. Maybe I am wrong and just assume that is the target market.

Is that what it's called? I see so many different names for it: "Big Mac Mini", "Mac Pro Mini Pro Max" and "Mac Mini Pro Ultra". It's hard to tell nowadays with Apple's naming scheme. I guess they're all better than "Mac Mini XTi" or Performa 1,048,576.

Yes, at least during the event and on their website.

Not Ultron :sunglasses:.

Yes, sharing data is the problem. For database access like Valentina it's worse, because Valentina acts like a singleton for writing/reading.

Copying back and forth will have a non-trivial overhead, making it useless for many tasks (image processing for one).

The data really needs to originate in shared memory or Xojo needs to support concurrency properly.

I was merely thinking about a potentially easier way to get from an existing Xojo object to a point where it could easily be shared between the workers. In my mind, a direct memory copy appeared to be the quickest way.

It depends on how you do it: you could get the raw data of a CGImage and copy it into a shared block, then create a new CGImage that points to the data in that block. So each worker would have a shared CGImage. If Xojo attempted something like this, hopefully they would do it in such a way that we wouldn't need declares to accomplish it.

Confession: I've never tried to do image processing with Xojo workers; instead I resorted to GCD via a plugin and just let GCD sort that out.

I don't see the latter happening, as Xojo has said many times that it's too complicated with their framework. So I was putting out a suggestion to build upon Workers to make them more efficient.

If you were breaking up a task that took several seconds then using shared memory could help. However, you might find that there is no real benefit to using shared memory over reading and writing files to share the data.

I know you don't use plugins, but MBS does have some shared memory functions available, so it probably could be done today if it were needed.

Unfortunately, I also don't see better concurrency support happening. I've seen enough excuses over the years (the UI won't work / it's too hard for users / the framework isn't thread safe) to come to the conclusion that it will never happen. Luckily, most of our multi-processing requirements have been resolved with the help of the MBS and Einhugur plugins.

It is a real pity, as computer performance is now mainly improving through the addition of more cores and CPUs, which means Xojo apps are never really going to get faster, and they are restricted to one core on mobile.

The coordinator (main app) just reads the data size, decides a block size, and fires the lookup tasks, passing to each worker:
– the filename with path, the block size, the offset at which to start searching, and what to look for (a sketch of the worker loop follows the list below).

  • Each worker opens the file as a read-only stream at its offset and collects lines from there.

  • For the first line: if offset = 0, read bytes until you find an EOL or EOF; if offset > 0, read bytes until you find an EOL or EOF and discard them (they belong to the worker that owns the previous block), then start the real first line: if not at EOF, read bytes until the next EOL or EOF. Count every byte read (totalBytesRead).

  • process the line as you wish

  • Loop: if EOF is reached or totalBytesRead >= block size, end the task (if there's another line beyond this point, it belongs to another worker).

    • Next line: if not at EOF, read bytes until you find an EOL or EOF, adding 1 to the totalBytesRead counter for every byte read.
    • process the line as you wish
  • Next loop
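Here is a rough sketch of that worker loop, assuming LF line endings. The job parameters (path, offset, block size) would arrive however you pass jobs today, and ProcessLine is a hypothetical placeholder for whatever the actual search does:

```
Sub ScanBlock(path As String, offset As UInt64, blockSize As UInt64)
  Var f As New FolderItem(path, FolderItem.PathModes.Native)
  Var bs As BinaryStream = BinaryStream.Open(f, False) ' read-only stream
  bs.Position = offset

  Var totalBytesRead As UInt64 = 0

  ' The bytes before the first EOL belong to the previous block's worker,
  ' so read and discard them (unless we start at the top of the file).
  If offset > 0 Then
    While Not bs.EndOfFile
      totalBytesRead = totalBytesRead + 1
      If bs.ReadUInt8 = 10 Then Exit While ' LF
    Wend
  End If

  ' Read whole lines until EOF or until we have passed the end of our block.
  While Not bs.EndOfFile And totalBytesRead < blockSize
    Var lineStart As UInt64 = bs.Position
    Var sawEOL As Boolean = False

    While Not bs.EndOfFile
      totalBytesRead = totalBytesRead + 1
      If bs.ReadUInt8 = 10 Then
        sawEOL = True
        Exit While
      End If
    Wend

    Var lineEnd As UInt64 = bs.Position
    If sawEOL Then lineEnd = lineEnd - 1 ' exclude the LF itself

    ' Go back and pull the whole line in one Read call.
    bs.Position = lineStart
    Var line As String = bs.Read(lineEnd - lineStart, Encodings.UTF8)
    If sawEOL Then bs.Position = bs.Position + 1 ' step over the LF again

    ProcessLine(line) ' hypothetical: search the line, record any match
  Wend

  bs.Close
End Sub
```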

And yes, shared, inter-process-accessible memory / data objects would be a very desirable feature, for an infinite number of reasons.

There’s an edge case I see now:
If offset > 0, peek at the byte at offset − 1; if it is an EOL, don't discard the first line, because it is not a fragment from the previous block: the offset happens to land exactly on the start of a line.
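That check slots straight into the sketch a couple of posts above (same assumptions, LF line endings): peek at the byte just before the offset before deciding whether to discard anything.

```
Var discardFirstLine As Boolean = (offset > 0)
If offset > 0 Then
  bs.Position = offset - 1
  If bs.ReadUInt8 = 10 Then discardFirstLine = False ' we start exactly at a line start
End If
' Reading the peeked byte leaves bs.Position back at offset, so only run the
' discard loop when discardFirstLine is True.
```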

  • I'm not sure that 20 workers reading the same file will actually be slow: most OSes do extensive read-ahead caching of filesystem reads, especially when the file is read-only. I would recommend benchmarking this to figure out whether it's actually a problem or not.
  • It may, however, use 20× the memory if each process reads the entire file in one chunk, which could definitely slow things down quite a bit. In this case, the shared memory solution may help a lot.
  • This sort of serialization limitation is not unique to Xojo. JavaScript has Web Workers, which until just recently did not support SharedArrayBuffer (per the Can I use support tables: it requires Safari 15.3, and is not supported in IE11 at all…).
  • If Xojo is going to implement something like this, I would recommend reading up on SharedArrayBuffer on MDN; the browser APIs are usually pretty well designed and not a bad starting point for a similar/same design…

Thanks all for your input on this.

It’s helped me understand what I am asking for, and that is a “Shared” immutable object, which can be created in the main application and accessed from the workers. Maybe even t’other way around as well.

If anyone else thinks that this may be useful to them, I’ll file a Feedback request, otherwise not.

With Workers as they are, you can only pass strings back and forth. What I have been doing is putting XML data into the strings. This sometimes would gag if the XML, and thus the string, were particularly large. So I switched to saving the data as temporary files in /tmp or SpecialFolder.ApplicationData/myApp. Then the string that I pass to (or from) the Worker is short and sweet, just containing the file path. The Worker then opens the file, parses it, and off it goes.
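For illustration, a minimal sketch of that hand-off, assuming the data is already an XML string; WriteJobFile and the file name are made up, and in practice each job would get a unique name:

```
Function WriteJobFile(xmlData As String) As String
  Var f As FolderItem = SpecialFolder.Temporary.Child("job.xml") ' give each job a unique name in practice
  Var out As TextOutputStream = TextOutputStream.Create(f)
  out.Write(xmlData)
  out.Close
  Return f.NativePath ' this short path string is all that gets passed to the Worker
End Function

' On the worker side, the job string is just a path:
' Var f As New FolderItem(jobPath, FolderItem.PathModes.Native)
' Var xml As New XmlDocument(TextInputStream.Open(f).ReadAll)
```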

I don’t use Workers 2.0 (yet) because I already had in place “helper apps” which predate Workers support. They use shared memory routines from MBS to pass data back and forth while the helper apps are active. MBS supports both strings and memory blocks, though for my purposes I just use strings with JSON payloads. Both sides just use a timer to update or read the shared memory periodically so the main app can be aware of progress / results of the helper app activity, or to instruct it to do something different.

I originally got MBS Complete because I did not want to take the time to develop some specific tasks myself. But I keep finding more and more things in it that are useful. And I don't have to keep them up to date to jump through various OS change hoops – Christian does.

@Jerry_Fritschle Which is exactly my point.
The facilities, as they are, are detrimental to performance, while the whole hope behind Workers is to gain maximum performance.

Xojo's objects (IMHO, and until Xojo says otherwise) are just blobs of memory, and these could be copied to a shared block of memory (or even created there directly) which the workers can then simply access.

Currently your process requires:

  1. XML serialization (as it's a text format, it's slower than a binary format such as IFF).
  2. A file save.
  3. A file load.
  4. XML de-serialization (see the serialization note above).

@Douglas_Handy Your solution is more efficient in my view; however, you still have a slow serialization/de-serialization step via JSON. In theory, using the class I am talking about would save the time it takes to make the JSON strings and to reverse the process.

@Geoff_Perlman any thoughts?

While true, I don’t know of a way to achieve that, especially given the version I was using at the time I put this in place. Plus my JSON objects are typically not all that big. They are more status updates and accepting instructions, and not so much directly passing large blocks of data around.

It was the most efficient I could come up with at the time, and still serves me well enough that I have not moved to workers.

Doug, please don't think that I was criticizing; I wasn't, and I am sorry if it came out that way. Any process that has to render the data between formats in order to transfer it is going to be a bottleneck. Hopefully, if Xojo takes this seriously, the only bottleneck will be the memory copy, which should be minimal.

They may have their data spread around while active: data, vars and states not contained in a single blob. MemoryBlocks and Structures are blobs.

Oh, most likely, but I would expect them to keep the pointers to those properties in close proximity to one another, so that they're easy to access, and copying the data structure and values in memory should still be faster than serializing the data, transferring it and de-serializing it.

I have the code to create my own dictionary and I have the declares to utilize shared memory, so I could implement a shared immutable dictionary. However, it would still need some form of serialization (albeit it could be a binary format), and I actually think that an array (more like a Xojo collection than a Xojo array) would be more performant, but that requires the developer to remember which properties are at which position (a small sketch follows).
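To illustrate what "remembering which properties are at which position" means in practice, here is a tiny sketch of a record packed at fixed offsets in a MemoryBlock. The field names and offsets are invented, and the block would still have to live in shared memory (declares or a plugin) for the workers to see it:

```
' Both sides must agree on this layout.
Const kID As Integer = 0       ' UInt32
Const kScore As Integer = 4    ' Double
Const kNameLen As Integer = 12 ' UInt32
Const kName As Integer = 16    ' UTF-8 bytes of the name

Var name As String = "example"
Var mb As New MemoryBlock(kName + name.Bytes)
mb.UInt32Value(kID) = 42
mb.DoubleValue(kScore) = 3.14
mb.UInt32Value(kNameLen) = name.Bytes
mb.StringValue(kName, name.Bytes) = name

' Any process that can see the block reads the same offsets back:
Var readName As String = mb.StringValue(kName, mb.UInt32Value(kNameLen))
```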

Time and enthusiasm are not my friends at the moment.