Worker operating on large matrices

If the app dies, is the shared memory retained anyway? Is there a way to get rid of it after the fact?

(I know nothing about this.)

Please see my comments earlier in the thread.

On Windows, I believe it will go away by itself. On macOS, it will last until reboot unless / until you explicitly delete it. So if you are concerned about shared memory objects hanging around after an abnormal termination (and prior to the next reboot) then handle it in the main app’s UnhandledException event handler. Or save the name(s) somewhere and check for them on app restart. Or whatever method best suits your use case.
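As a hedged illustration of that restart-time cleanup idea (Python's stdlib is used purely as a stand-in for the Xojo class under discussion; the segment name is made up, not from the thread):

```python
# Sketch: removing a named shared-memory object left behind by a crashed
# process. "ci_matrix_demo" is a placeholder name for illustration only.
from multiprocessing import shared_memory

def cleanup_stale_segment(name: str) -> bool:
    """Delete a leftover shared-memory object if it exists."""
    try:
        seg = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False      # nothing to clean up
    seg.close()           # detach this process's mapping
    seg.unlink()          # ask the OS to remove the backing object
    return True

# e.g. called from the main app on restart, for each saved name:
cleanup_stale_segment("ci_matrix_demo")
```

The same shape works for the UnhandledException route: save the names somewhere persistent, then sweep them on the next launch.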

In my usage, I generate names related to the task that was being performed, and having the object stick around was actually not a problem anyway.

YMMV.

In the case of the large matrix, a typical size is 400 x 400 doubles but it can go to ten times this size with a large molecule.

All workers will process all the data?

All the workers need access to the entire matrix of data, although each will only use sections of it for an individual calculation. For the few familiar with configuration interaction calculations, the matrix holds the eigenvectors of the molecular orbital calculation. The Worker uses the eigenvectors to calculate a single element of the configuration interaction matrix, which it returns to the main program. The Worker then moves on to calculate a new element, likely using different components of the matrix. My goal is to have four or eight Workers operating at the same time.
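That access pattern (one matrix published once, each worker reading only the rows it needs) can be sketched in Python, since the Xojo shared-memory class isn't shown in the thread; the segment name and dummy matrix contents are placeholders:

```python
# Illustrative sketch: main app publishes the eigenvector matrix in shared
# memory; a worker attaches by name and reads just one row, with only a
# single copy of the matrix in RAM. Names are hypothetical.
from multiprocessing import shared_memory

N = 400                                    # typical case: 400 x 400 doubles
SEG = "ci_eigenvectors_demo"               # hypothetical segment name

# Main app: create the segment and fill it with row-major doubles.
owner = shared_memory.SharedMemory(name=SEG, create=True, size=N * N * 8)
mat = memoryview(owner.buf).cast("d")      # view the raw bytes as float64
for i in range(N):
    mat[i * N + i] = 1.0                   # dummy data: identity matrix

# Worker: attach by name; no copy of the matrix is made.
worker = shared_memory.SharedMemory(name=SEG)
w = memoryview(worker.buf).cast("d")
row_5 = [w[5 * N + j] for j in range(N)]   # read only the row it needs

# Tear down: release views before closing; only the owner unlinks.
del w, mat
worker.close()
owner.close()
owner.unlink()
```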

I am not one of the few, and never was.
Even so, I am trying hard to forget I ever took a Quantum Mechanics course in grad school at all! (I forgot the course contents themselves several decades ago, right after the final! :wink: )

-Karen

Well Karen, I taught Quantum Mechanics for many years to graduate students at the University of Connecticut, and many of the students felt the same as you did. My problem was organic chemistry. Never made any sense to me at all.

Bob

Sounds like your problem is similar to mine where we have large multidimensional images, and each “stripe” can be processed independently by a different worker. In that case, shared memory is definitely the way to go IMHO.

Peter,

I am convinced based on your very impressive example that shared memory is the best way to handle this. I hope Xojo implements a SharedMemoryBlock Class as you proposed to make this easier to implement and handle cleanup.

What about a SQLite database to hold the matrix and store the results? On the same machine, the workers and main app can share it, and the data can be processed over time.
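As a rough sketch of that idea (table and column names invented; Python's sqlite3 module used for illustration), each matrix row could be stored as a BLOB of packed doubles that any worker on the machine can query:

```python
# Sketch: a matrix held in SQLite, one row per record, packed as a BLOB of
# doubles. A real setup would use a shared file path instead of :memory:.
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matrix (row INTEGER PRIMARY KEY, data BLOB)")

n = 4                                       # tiny demo matrix
for i in range(n):
    row = [float(i * n + j) for j in range(n)]
    conn.execute("INSERT INTO matrix VALUES (?, ?)",
                 (i, struct.pack(f"{n}d", *row)))
conn.commit()

# A worker reads back only the rows it needs:
blob = conn.execute("SELECT data FROM matrix WHERE row = ?", (2,)).fetchone()[0]
row_2 = list(struct.unpack(f"{n}d", blob))
```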

Years ago I made the mistake of using SQLite to store my image data, and larger images brought the BLOBs to their knees: if Robert’s datasets are large enough, SQLite won’t do, in my experience.

In the past (Mac OS 10.11, as a possible example), the Activity Monitor used to show “inactive memory”. In recent versions, the Activity Monitor just mentions “cached files”; I’m unsure whether they are the same thing.

So, if shared objects become inactive memory once the owning process has quit, that memory should be made available again once the system runs low on free memory.
If these shared objects are still considered “active memory”, then, of course, they stay in RAM, like running processes, until the computer is shut down.

Would a shared object be considered in “inactive memory” in Mac OS X when the owning app quits?

Spitballing here:

If I were doing this in a vacuum, I’d consider creating a class that acted like an array of Double but used a binary file as its storage. The Worker would get a path to that file and would use that class to read from it. (It’s unclear if Robert needs to write back to it.) The code that accesses the array would not have to change much, and the sharing would be done invisibly.
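A minimal Python stand-in for this idea (the class name and layout are invented; the real thing would be Xojo): an array-of-Double façade over a memory-mapped binary file, so a Worker only needs the file’s path:

```python
# Sketch: read-only "array of Double" backed by a binary file via mmap.
# Indexing code barely changes; the file sharing is invisible to callers.
import mmap
import struct

class FileBackedDoubles:
    """Index like an array; storage is a memory-mapped binary file."""

    def __init__(self, path: str):
        self._f = open(path, "rb")
        self._mm = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._mm) // 8          # 8 bytes per double

    def __len__(self) -> int:
        return self._count

    def __getitem__(self, i: int) -> float:
        if not 0 <= i < self._count:
            raise IndexError(i)
        return struct.unpack_from("d", self._mm, i * 8)[0]

    def close(self):
        self._mm.close()
        self._f.close()
```

With mmap the OS caches hot pages in RAM, though as noted later in the thread, a spinning disk can still make the first pass through a large file slow.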

I assume such a thing could be accomplished around your Shared Memory class too? Or is that essentially what the View does?

Kem,

The method you propose is one of the options I investigated early on, but for some matrices it is too slow on computers without an SSD. Because I distribute my program to others, most of whom do not have SSDs, I am looking for something that uses memory. Shared memory seems to be the best option. I do not write back to the matrix; rather, I am creating a new matrix with elements that are calculated by the Workers.

To be clear, what was too slow was using a file on the backend, not the concept of a wrapper class, right? That class around Shared Memory (if that’s doable, and I assume it is) sounds like the ticket for you.

I agree. I am just learning how to use shared memory, but I think a wrapper class is an excellent idea. I really appreciate the fact that Xojo has provided the Worker class, as I think Workers will make my CI programs significantly more efficient. Creating the CI matrix takes 85-95% of the time; that will now be done using four or eight Workers.

Is your code available anywhere?

I appreciate what Xojo has done to create the Worker class, but I really feel that it is just a good start for what most people will need. The data sharing and aggregation are the hard part of the needed capability, and the Worker class is a long way from providing a robust data solution.

I have been experimenting quite a lot with Python, learning about prime factorization and quadratic sieves. This algorithm has basically three phases: setup, sieving, and solving equations. One script that I would like to migrate to Xojo creates a class, does the setup phase that creates instance variables for that class (data that each worker needs), creates workers from a class function that accomplishes the sieving, aggregates the data, and finally solves the equations (single processor) to determine the prime factors.

To make the script multiprocessing literally takes two statements: one to import the multiprocessing module and one to use the Pool class, which is quite remarkable. This one call allocates data to workers, starts the workers, waits for the workers to be done, and then aggregates the data returned from the workers. No files are involved. It is nearly totally transparent. If anyone is interested, the Python docs page is below:

Somehow, the developers managed to work around the Python global interpreter lock to create this capability, and it is quite remarkable. On my machine it speeds up sieving by almost a factor of 8 (on a hyper-threaded quad-core machine). I managed to factor a 90-digit number (a product of two 45-digit primes) in about a day: the sieving took about 12 hours and the equation solving about 12 hours. Without multiprocessing it would have been 4.5-5 days total. BTW, I am looking forward to a 16" Apple Silicon machine to see what I can do with it. Right now I am using a 2013 MacBook Pro. I have been waiting for a new model that is clearly twice as fast. Next year for sure.

Anyway, my wish is for Xojo to create a capability similar to what the Python Pool statement does. That would be awesome.

If I understand correctly Kem, we are probably talking about two different solutions:

  1. the one you are proposing is for transferring file-based data to workers (a very reasonable solution judging by your speed tests), and this could be easily accomplished by the IPC socket-based mechanism built into workers.
  2. the one I described is built around shared memory, so everything exists as a single copy in RAM

So adding #1 to my shared memory class would not be useful, because you’d be mixing two conceptually different solutions: you either do #1 or #2, and neither depends on the other.

And to answer your question, no the View has nothing to do with files, it is simply a memory block/Ptr, created for you in your app’s local address space, to access a subrange of a much larger shared memory block.

Hope this makes sense.
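For what it’s worth, that “View” idea (a small window onto a subrange of a much larger shared block, exposed as an ordinary buffer in the local address space) can be mimicked in Python; this is a stand-in only, not Peter’s actual Xojo class:

```python
# Sketch: a zero-copy "view" into a subrange of a large shared-memory block.
from multiprocessing import shared_memory

big = shared_memory.SharedMemory(create=True, size=1024 * 1024)  # 1 MB block
big.buf[:5] = b"hello"                     # pretend data at the block's start

def view(seg: shared_memory.SharedMemory, offset: int, length: int) -> memoryview:
    """Return a zero-copy window onto a subrange of the shared block."""
    return seg.buf[offset:offset + length]

v = view(big, 0, 5)
greeting = bytes(v)    # copies just the 5-byte window, not the whole block
v.release()            # release the view before the segment is closed
big.close()
big.unlink()
```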