Now that we have a 20-core Mac Mini Ultra, speculation is that the new Mac Pro will be a 40-core "Mac Vision" chiplet design.
The current version of Workers is a step toward concurrency, but they suffer from a fundamental bottleneck: sharing data.
In order to get data into a worker to process a task, the data must be loaded into each worker. Let's say you have a 4GB log file you want to search for specific data.
These are the current techniques I can see (please recommend any I'm missing):
- Load all the data in the main app and separate it into lines, then pass each helper a blob of data for it to scan (requires some form of serialization, one file access, and processing).
- Load all the data into memory in each worker, split it into lines, and have each worker work on its section (20 × 4GB of memory, 20 file reads, and additional processing per worker).
- Load all the data in the main app, figure out the line locations, and pass a start location and an end location to each worker. Each worker then opens the file but loads only the section it is told to into memory (requires 21 file accesses, plus additional processing time to calculate line positions and for each worker to split lines).
- Load all the data into shared memory, process that data to create a second shared block containing line positions, and give each worker access to the shared data blob, the shared line-positions blob, and which lines it should be working on (requires Declares or a plugin, minimal file access, minimal serialization, but additional processing time to create a line 'map' array).
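To make the last technique concrete, here is a minimal sketch in Python (Xojo has no built-in equivalent today, which is the point of this proposal): the main app copies the file bytes into one shared block and builds a second shared block holding the byte offset where each line starts. The function name and layout are my own assumptions for illustration, not any Xojo API.

```python
import array
from multiprocessing import shared_memory

def build_shared_blocks(data: bytes):
    """Copy file bytes into shared memory and build a shared line-offset map."""
    # Block 1: the raw file data, immutable by convention once written.
    data_blk = shared_memory.SharedMemory(create=True, size=len(data))
    data_blk.buf[:len(data)] = data

    # Compute the byte offset where each line starts. The final offset
    # (just past the last newline) acts as an end sentinel.
    offsets = [0]
    start = 0
    while True:
        nl = data.find(b"\n", start)
        if nl == -1:
            break
        offsets.append(nl + 1)
        start = nl + 1

    # Block 2: the line 'map' as 64-bit signed offsets.
    offs = array.array("q", offsets)
    map_blk = shared_memory.SharedMemory(create=True, size=len(offs) * offs.itemsize)
    map_blk.buf[:len(offs) * offs.itemsize] = offs.tobytes()
    return data_blk, map_blk, len(offsets)

data_blk, map_blk, n_offsets = build_shared_blocks(b"alpha\nbeta\ngamma\n")
# A worker would attach by name: shared_memory.SharedMemory(name=data_blk.name)
# and read lines without any copy of the whole file.
for blk in (data_blk, map_blk):
    blk.close()
    blk.unlink()
```

Each worker only needs the two block names, the offset map, and its assigned line range; the 4GB of data is mapped once, not serialized 20 times.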
My proposal (without knowing what Xojo does under the hood) is a shared immutable object.
It is my understanding that core Xojo data types are (simplified) just blocks of memory. It would be very cool if Xojo could copy these objects into a shared memory block, which is then shared with the workers. Each worker would be given enough internal Xojo metadata to understand what the shared memory is (an array of strings) and which lines it should be working on, giving it wickedly fast access to all the lines of the 4GB text file. This eliminates the bottleneck and gives the most efficient memory management: the original 4GB string and line array can be dumped from memory, so only the immutable 4GB block is present during the search.
Obviously, sending data back to the main application (i.e. which lines contain the data we're looking for) needs some thought, even if each worker only 'returns' the info we're looking for.
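One way to keep the return path cheap is for each worker to send back only the matching line numbers, never the data itself. Here is a hedged sketch of the worker side, continuing the Python illustration above; `scan_lines` and its parameters are invented for this example and are not a Xojo API.

```python
import array
from multiprocessing import shared_memory

def scan_lines(data_name: str, map_name: str, first: int, last: int, needle: bytes):
    """Attach to the shared blocks read-only and return the indices of
    lines in [first, last) that contain `needle`."""
    data_blk = shared_memory.SharedMemory(name=data_name)
    map_blk = shared_memory.SharedMemory(name=map_name)
    try:
        offsets = array.array("q")
        offsets.frombytes(bytes(map_blk.buf))
        hits = []
        # The last offset is an end sentinel, so there are len(offsets)-1 lines.
        for i in range(first, min(last, len(offsets) - 1)):
            line = bytes(data_blk.buf[offsets[i]:offsets[i + 1]])
            if needle in line:
                hits.append(i)
        return hits  # tiny result: line numbers only, no 4GB payload
    finally:
        data_blk.close()
        map_blk.close()

# Usage: the main app creates the blocks, then hands each worker the block
# names plus a disjoint line range; results are just small integer lists.
text = b"error: disk\nok\nerror: net\n"
data_blk = shared_memory.SharedMemory(create=True, size=len(text))
data_blk.buf[:len(text)] = text
offs = array.array("q", [0, 12, 15, 26])
map_blk = shared_memory.SharedMemory(create=True, size=len(offs) * offs.itemsize)
map_blk.buf[:len(offs) * offs.itemsize] = offs.tobytes()
print(scan_lines(data_blk.name, map_blk.name, 0, 3, b"error"))  # [0, 2]
for blk in (data_blk, map_blk):
    blk.close()
    blk.unlink()
```

In a real design the workers would run in separate processes and report their hit lists back over whatever channel the framework provides; the point is that the reply is a handful of integers, not a copy of the data.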
This doesn’t cover things like find and replace, as the shared memory is immutable, but maybe someone smarter than me can figure that out.
Imagine your Xojo-made application running 4x faster on a Mac Mini Ultra than on an M1 Mac Mini, instead of taking roughly the same time*.
*The M1 Mac Mini Ultra has 16 performance cores, while the M1 Mac Mini has 4. Both have 4 efficiency cores. Per-core performance is about the same for both models, which means a Xojo application without some form of concurrency will run at roughly the same speed on both machines.