Worker operating on large matrices

This seems like a rather critical issue given the size of my expected objects. But would it not be possible to remove the memory object in the main program should the creation process fail, indicating that a previous one remains in memory?

Yes, absolutely. You can try this with the “Open” button in the test app (which always uses the same name for demo purposes, so it can never forget it). If you “Open” and it fails, it means the object with that name does not exist. If it opens successfully, the object exists, either from the current run or some previous one. You can then destroy it by name and start afresh.
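The thread is about Xojo, but the same open-then-destroy check can be sketched with Python's stdlib `multiprocessing.shared_memory` module, since it uses the same named-object model (the name `demo_object`, the size, and the helper are illustrative, not from the test app):

```python
from multiprocessing import shared_memory

def destroy_if_exists(name: str) -> bool:
    """Try to attach to a named shared-memory object; if it exists
    (from the current or a crashed previous run), destroy it."""
    try:
        leftover = shared_memory.SharedMemory(name=name, create=False)
    except FileNotFoundError:
        return False                      # "Open" failed: no such object
    leftover.close()
    leftover.unlink()                     # destroy by name, start afresh
    return True

# Simulate a previous run leaving an object behind:
shm = shared_memory.SharedMemory(name="demo_object", create=True, size=1024)
shm.close()                               # handle dropped; object persists (POSIX)

cleaned = destroy_if_exists("demo_object")
```

A second call would return `False`, because the first call already destroyed the object.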

The problem arises if you forget the object’s name: in my “real” app I create many of these (one for each scientific image, in fact, which can be many tens of GB in size), and each is given a unique name (you can see a commented-out line in the test app, basically a usec time-stamp string to ensure the names are unique). So if my app crashes and I did not make a note of all these object names, there is no way (that I know of) to retrieve them and destroy them, short of a reboot.

So that’s why I write their names to a file as soon as they are created: at least I can try opening them by name later to see if they exist, and if they do and are not part of the current session, I clean them up, else the user would eventually have to reboot his/her machine. (Another problem arises if my users are running more than one copy of my app, which some of my students have a habit of doing, so I had to make sure that one instance of my app does not destroy valid objects belonging to another instance, etc.)

But if you are only working on a single object at a time, you can use a fixed name like the demo app, and you’ll always be able to retrieve it and kill it.

Create unique temp files in an app temp folder (like myappfolder/temp_shared_objects), one per temporary object name (e.g. a name.txt file where “name” is a UUID, or simply use the object name itself as the file name with 0 bytes written — the file’s existence is the record). Add a locking scheme: open them write-only and keep them open for the whole session, so nobody else can open them (the open will fail because they are locked, which tells us a live instance of the app is using them).
When you start a task, pick a unique object name (like a UUID), record it in your list, allocate (or attach to) the shared memory, do the job, release the memory, then close and delete the file.
Every time an instance of your app starts, check temp_shared_objects: if there are files there, either another instance of your app is running, or a crash left orphaned objects in memory that need releasing. Find the unlocked files, read the names, release the objects, and erase each file, repeating until no unlocked files remain.
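Here is a minimal Python sketch of that scheme, POSIX-only, using `fcntl.flock` for the locking (the folder name follows the suggestion above; the helper names are mine, not any Xojo API):

```python
import fcntl
from pathlib import Path
from multiprocessing import shared_memory

TEMP_DIR = Path("temp_shared_objects")    # illustrative app temp folder

def register(name: str):
    """Create a lock file named after the shared object and hold an
    exclusive lock on it for the life of the session."""
    TEMP_DIR.mkdir(exist_ok=True)
    f = open(TEMP_DIR / f"{name}.lock", "w")
    fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    return f                              # keep it open: closing drops the lock

def cleanup_orphans():
    """At startup, any lock file we CAN lock has no live owner, so the
    object it names is an orphan: unlink the object and erase the file."""
    for p in TEMP_DIR.glob("*.lock"):
        f = open(p, "r+")
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            f.close()                     # locked: a live instance owns it
            continue
        try:
            orphan = shared_memory.SharedMemory(name=p.stem)
            orphan.close()
            orphan.unlink()               # release the orphaned object
        except FileNotFoundError:
            pass                          # object already gone
        f.close()
        p.unlink()                        # erase the annotation file
```

Each live instance keeps its `register()` handles open for its whole session; anything it can lock at startup is debris left by a crash.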

If Xojo some day implements a SharedMemoryBlock class, as I argued for in the past when talking about Workers and IPC, they could automate the cleanup behind the scenes in a global fashion instead of per app. If some app uses SharedMemoryBlock(), a run-once shared-memory cleanup could run at app start time, and the object-name bookkeeping and erasing would be handled behind the scenes in the constructor/destructor phases.

I agree. I think with the arrival of multicore capability, unless the datasets are small, the only reasonable way to send large data back and forth is via shared memory, so a built-in mechanism with good cleanup would be very useful.


Curious, what is the size of data we are talking about?

For comparison, on my box, sending around 100K bytes (about the limit of reliability) takes about 0.6 ms between request and receipt (that’s 600 microseconds), with occasional spikes up to ~15 ms. More data than that and the Worker will quit.

Are we talking about more data than that?


Typically several GB, sometimes up to 30-40 GB for a large image. For Robert’s application he could probably send a 1 MB dataset via IPC (though why would you if you implement shared memory? Plus you mention problems with blocks larger than 100K), but for larger data like imaging or Markus’ protein or gene-sequence data, sizes are typically in the GB range, so short of a disk file with its I/O overhead (and then the data are duplicated in the workers), shared memory is the best way to go.

I agree, I’m just trying to see if there is a practical benefit when a Worker doesn’t have to duplicate the data.

For example, if a Worker is sent a path to a file with a start index and length, it won’t be duplicating data. If it then takes 30s to perform its operation, even 1s to fetch the data won’t have much impact.
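That pattern can be sketched in Python (names and the stand-in dataset are illustrative): the parent writes one data file, and each worker is handed only a path, start index, and length, so no worker duplicates anything outside its own slice.

```python
import os
import tempfile

def read_slice(path: str, start: int, length: int) -> bytes:
    """Worker-side fetch: seek to the assigned slice and read only it."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)

# Parent writes the dataset once...
data = bytes(range(256)) * 4                 # 1 KB stand-in for a big dataset
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)

# ...and a "worker" fetches its 256-byte slice starting at offset 256.
chunk = read_slice(path, 256, 256)
os.remove(path)
```

If the worker’s computation takes tens of seconds, the cost of this one seek-and-read is negligible, which is the point being made above.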

I have no idea if any of this is relevant in this case, so I’m just here for the discussion.

FYI, sending 1B bytes (~ 1 GB) through a file takes about 570 ms round-trip. That’s creating one file that is then passed to each Worker, and each Worker reads its contents entirely, then records the time.

Interesting, that is much faster than I expected (SSD I assume, or is the whole “file” cached in RAM behind your back?). So given this, file-based may not be that bad.

However a few issues:

  • The data WILL be duplicated, because the console app will have to read it in to process it, so now you have a 2nd copy in the console app’s address space. For 1 GB, no big deal, but for 20-30 GB this could tax your resources unless you’re loaded with RAM.
  • 2nd, once completed, main will have to read that data back over its original data with the updated version, so more delay (agreed that if your task takes 20 min to complete, this overhead is trivial, but I have operations where the user mouses over parts of an image and a complex operation is updated in a graph in real time, so such overhead would degrade responsiveness). With shared memory, there is only one copy in RAM that is simultaneously hit on by main and the consoles.
  • 3rd, this precludes what Robert wanted to do with status flags being passed back and forth between main and consoles (the 100-byte header idea I proposed to him). I actually find this extremely useful, as main can keep real-time tabs on what each console is doing by checking various flags that the consoles continuously update as they do their thing.

To be clear, I’m not arguing against Shared Memory, and think that a standard way to do that through code would be a good thing. But I think we can agree that most use cases probably won’t need it, especially given these benchmarks.

As to your questions and points:

Yes, it’s an SSD.

I have 64 GB of RAM so I wouldn’t be surprised if the file is being cached, I just don’t know how to test that. This is also on a Mac which, based on experience, is generally better at this type of thing than, say, Windows. There might be more of a performance hit there, and I’m happy to make the test project available for anyone who wants to test it.

I didn’t follow the entire discussion so perhaps I’m misunderstanding your third point, but wouldn’t the SendProgress method help there?

Agree, Kem, and agree with SendProgress (haven’t tried that yet), but you can’t have main talk to a console on the fly if you ever needed to, though that would be unusual, so SendProgress should do, and is more straightforward.

BTW, in the test app I posted I just noticed these statements in the sharedMemViewClass Constructor that require my own plugin and will cause your compile to fail, so just delete them:
' these require my plugin and are not necessary:

#if Target64Bit
  me.address_32bit = 0
#else
  me.address_32bit = memblock2Integer(me.memblock,0)
#endif
me.address_64bit = Memblock2Int64(me.memblock)

There is also a logical error with view offsets in the Init and computeSingleCore buttons that I just noticed, but that shouldn’t affect the underlying concept.


Would this not cause a memory-protection fault? Or is it valid because they’re child processes?

I think it’s OK because the shared memory object is owned by the system, not the app (I presume), but others who know more can chime in.

It should be, yes. I thought you were casting a normal MemoryBlock to an Integer.

I see, so you were referring to those particular statements above, not the whole shared-memory concept in general. These statements are taken from my sharedMemViewClass, which already opens a view into the shared memory object, returning a local MemoryBlock for you, mapped into your app’s address space. So casting it to an integer is perfectly fine, because it’s all inside your app’s address space now.

My recollection from experimenting with shared memory is that macOS handles it quite a bit differently from Windows. With macOS, IIRC, the shared memory object will persist until explicitly deleted or until a reboot. So even if there are zero processes with the shared memory object open, the object still exists and the data is retained.

Whereas with Windows the object disappears as soon as the last process has dropped a reference to it.

Or at least that is my recollection.
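That matches the POSIX behavior Python's stdlib exposes: a named object outlives every handle to it until someone explicitly unlinks it. A small sketch (object name illustrative; on Windows the reattach below would fail, since the object vanishes with its last handle):

```python
from multiprocessing import shared_memory

# Create a named object, write to it, and drop our handle WITHOUT unlinking.
a = shared_memory.SharedMemory(name="persist_demo", create=True, size=8)
a.buf[:5] = b"hello"
a.close()                      # handle closed; the object itself remains

# On POSIX (macOS/Linux) the object persists with zero handles open:
# reattach by name and the data is still there.
b = shared_memory.SharedMemory(name="persist_demo", create=False)
payload = bytes(b.buf[:5])
b.close()
b.unlink()                     # the explicit delete macOS requires
```

This is exactly why the cleanup schemes discussed above matter on macOS: without that final `unlink` step, the object survives a crash until reboot.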

We have a newer MBS Plugin to do >2 GB views for Peter.
So instead of Int32 we use Integer everywhere now.

This is absolutely correct (on macOS anyway)