Is it possible to run several instances of the same Worker in parallel?

I’m playing with the Worker class and, so far, I’m seeing a big improvement in responsiveness and speed (that part impresses me).
The Worker searches for files inside the folder passed as a parameter to the JobRun event. So far, so good.

Now, my app was already designed to run multiple searches in parallel (e.g. while a search runs in /Applications, the user may start another search in another folder). Refactoring this with the Worker class puzzles me.

By design, I can’t create new instances of a Worker, and the existing Worker, while it runs, won’t give me a chance to tell it there’s another task to perform at the same time; I can only run tasks serially.

Of course, adding more Workers to the project makes no sense, although it could work…

Is this just a limitation, or am I overlooking something?

You don’t have to create additional Worker instances if, as I understand it, you’re passing multiple FolderItem paths to the same JobRun.

I’m not in front of my computer now, but the picture resize example is a good one. It too runs the same process on multiple file paths, and does as many at once as there are cores available.

With this template, when you loop through the file paths, append them to a string array. You might also prepend the array index, i.e. ThisFile.Append(x.ToString + "," + theFilePath). Then call Worker.Start. In the JobRequested event (again, similar to what’s in the example), define a string as ThisFile(0) and return it to JobRun. This must be done BEFORE ThisFile(0) is removed from the array, as in the example.

In JobRun, split the string at the comma and read back both values. You can then Return index + "," + report to JobCompleted.

As the ThisFile array gets whittled down (because, again, the zero index is constantly removed in JobRequested), eventually all jobs are done. Workers proceed in parallel to the extent allowed by the system’s cores (and the properties set on the Worker itself).

The index is there on the presumption that you want string information (like logging) back from each job and want to see it by index. More complex data can be structured as something like XML and passed back and forth in string form.
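
Here’s a minimal sketch of that pattern, assuming a window with a Worker instance named Worker1, a ThisFile() As String property, a thePaths() array holding the folders to search, and a hypothetical SearchFolder method that does the real work:

// Queue the batch (e.g. from a button), prefixing each path with its index:
For x As Integer = 0 To thePaths.LastRowIndex
  ThisFile.Append(x.ToString + "," + thePaths(x))
Next
Worker1.Start

// Worker1.JobRequested event: hand out the next job and remove it from the queue.
Function JobRequested() As String
  If ThisFile.LastRowIndex = -1 Then Return "" // empty string = no job to run
  Var job As String = ThisFile(0)
  ThisFile.Remove(0)
  Return job
End Function

// Worker1.JobRun event: runs in the helper process; split the string back apart.
// (Assumes the index and path contain no commas.)
Function JobRun(data As String) As String
  Var parts() As String = data.Split(",")
  Var report As String = SearchFolder(parts(1)) // hypothetical work method
  Return parts(0) + "," + report // "index,report", delivered to JobCompleted
End Function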

HTH.

Edit: tweaked this after I did get to a computer and could look at code. :slight_smile:

Future Jerry: I now understand your question a little better. I’m talking about running a bunch of stuff in a batch. You’re talking about starting an altogether new run before one is done. I’m afraid I just dim out the button in that case; the whole batch has been set up in advance.

Off the top of my head, you could queue additional requests and release them to the method that calls Worker.Start when the run is finished. Again alluding to my example, run a Timer that checks whether FilePath.LastRowIndex = -1.

Edit (again): no, that’s not quite it. FilePath will be empty when the last JobRun STARTS. Instead, have a jobCompleted integer property that gets incremented at each JobCompleted event. That’s how I decide when to un-dim my aforementioned button :slight_smile:
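
Roughly like this (a sketch; jobCompleted, totalJobs and StartButton are hypothetical properties/controls on the window):

// Worker1.JobCompleted event: raised in the app as each job reports back.
Sub JobCompleted(data As String)
  jobCompleted = jobCompleted + 1
  If jobCompleted = totalJobs Then // totalJobs was set when the batch was queued
    StartButton.Enabled = True // the whole batch is done; un-dim the button
  End If
End Sub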

Thanks for your answers, Jerry.
Your second thought is correct.

That would still be serial (or did I misunderstand what you’re telling me?).

Running serial jobs is easy with the Worker (and I guess that’s how it was designed). But starting an additional job while another is already running: that’s what I still don’t get…

In the past, I did similar tasks using two projects (the desktop app and a console helper), but (1) they’re not completely reusable and (2) managing two project files for a single product is bad practice (and since I had really bad experiences with external items, I avoided them; using them was worse than not).

If you set MaximumCoreCount to, say, 4, then when you call Worker.Start it should raise the JobRequested event multiple times, quickly, one after another. This looks serial, but if you return a string from every call, the workers actually start almost at the same time.
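
For instance (a sketch; Worker1 is a placeholder name):

Worker1.MaximumCoreCount = 4
Worker1.Start
// JobRequested is now raised up to four times in quick succession;
// each call that returns a non-empty string becomes a job, and those
// jobs run in parallel helper processes.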

Did you try the Worker examples and add

System.DebugLog CurrentMethodName

to every method to see how it actually flows?

Actually, it doesn’t seem serial to me; that’s along the lines of what I’m looking for.

But it’s the user who decides how many searches to run. In the beginning, only one search is done; then the user may want to start other searches.
If MaximumCoreCount can be modified at runtime (I haven’t checked yet), half of my problem is solved: I can ask for new jobs to be run on demand.

The other half of the problem is when the user wants to perform more searches than there are available cores. Can MaximumCoreCount be more than 100%?
(Using more than 100% of the cores is of course doable on a computer; the processes just run “slower”.)

I’ve done a lot of debugging with DebugLog in my Worker trials, yes.
But I haven’t tried the examples (I just saw them on video when the Worker class was being implemented), as my case is too different from what the examples do.

Thank you.


MaximumCoreCount seems to be Get/Set, so it’s changeable at runtime.
You can also just return an empty string in JobRequested to ignore the request.
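
So on-demand throttling could look like this (a sketch; the paused flag and queue array are hypothetical):

// Adjust the limit while the app runs:
Worker1.MaximumCoreCount = 2

// Worker1.JobRequested event: ignore requests until there is real work.
Function JobRequested() As String
  If paused Or queue.LastRowIndex = -1 Then
    Return "" // empty string = no job for this request
  End If
  Var job As String = queue(0)
  queue.Remove(0)
  Return job
End Function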

The other half of the problem is when the user wants to perform more searches than there are available cores. Can MaximumCoreCount be more than 100%?
(Using more than 100% of the cores is of course doable on a computer; the processes just run “slower”.)

You’ll get more JobRequested events; those jobs will run after ones that have finished. I think you can set it to more than the actual core count. Perhaps it should have been called “MaxInstanceCount”. It’s true that they would share CPU resources, the same as normal console applications, when you have multiple running on the same core.


I don’t know how reliable that is… :thinking:
Imagine you start with 2 jobs and 2 cores (on a 2-core system, so you’re at 100%). Then you change MaximumCoreCount to 1 while the workers are running. At this point, we can assume both workers will run at half speed (or one worker gets halted, but I don’t assume that’s right).
Then you switch back to 2 cores. I’d expect both workers to run again at 100%, rather than the Worker asking for a new job while keeping the existing ones (each then running at 66%).
Which, in turn, makes MaximumCoreCount not the definitive answer for running a new job.

That would still be after a job has finished (or before any has started).

Still serial, then.

I can’t see why not, but the limitations aren’t clear. And the relationship with the number of jobs running at once is still vague.

Granted, that would have been clearer. The fact that it isn’t named that way could mean that’s indeed not its actual purpose.

Lack of documentation and flexibility, that’s my current opinion.

Well, a Worker is actually a console application that is built for you, and it can run multiple instances at the same time. Since Xojo runs line by line, the app where the worker(s) are managed seems serial. But multiple worker instances can run at the same time, and when they are done they report back via JobCompleted, though possibly out of order. So it’s actually parallel.
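
Since completions can arrive out of order, the index passed through JobRun lets the app file each result under the right job (a sketch; results() is a hypothetical string array sized to the batch beforehand):

// Worker1.JobCompleted event: data arrives as "index,report", possibly out of order.
Sub JobCompleted(data As String)
  Var commaPos As Integer = data.IndexOf(",")
  Var index As Integer = data.Left(commaPos).ToInteger
  results(index) = data.Middle(commaPos + 1) // store by original index
End Sub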


I forgot to say: in a debug run it uses threads instead of actual workers! That may be what you’re seeing…

Well, I’m not talking about stepping through code, just about having as many jobs running as I (or the user) want.

Yes, they can run in parallel, but not truly “on demand”.

No, I’m well aware of that fact.
It’s just that launching a new “instance” (job) while the worker(s) are already running doesn’t look flexible enough.

Thanks.

Actually… they will both be run to completion at 100% but then only one will request another job.

The maximum core count does not limit the number of jobs that can be passed. The core limit will just limit how many are done at once. Serial in this respect, as Arnaud says.

This is still likely to be pretty fast. I am implementing Workers in a project for a client. The original project wasn’t even threaded. With the test data of thirty jobs, it worked, but took minutes (and beachballed, of course). Long story short, with eight cores it now takes fifteen seconds, and the four cores of a decent laptop can do it in the thirties. The latter is about what I could get with threads, which we tried before Workers were released, but Workers are more consistent, and, of course, the UI NEVER hangs. :slight_smile:


That would still be serial (or did I misunderstand what you’re telling me?).

Yes, Arnaud, it would be serial, and yes, you understood me. My only point was to suggest that it could be made reasonably invisible to the user.

Ultimately, in this model, the more jobs your user starts at once, the better. To your question of somehow subclassing the worker to have multiple workers running, I have no answer.

I have been made bluntly aware of that fact. With the project I’m working on, using the usual test data, I almost have to build to run it. In the IDE, with thirty jobs, the performance is terrible. For debugging I have to use a much smaller data set.


You quoted me on something I didn’t say but had quoted…


Yes. Not enough coffee. What happened, I think, is that I snipped off the part above your quote. My apologies.

No worries. The original poster’s name is not in the quote I posted. Something changed there, perhaps…


This is something that could have been done better. Since threads don’t behave like workers, a thread shouldn’t be used. Perhaps caching the worker executable after a build would speed up debug runs and builds, so the actual workers are used even in a debug run. Only when the code in JobRun changes should the worker (cache) be rebuilt, and even that could be done by the IDE, maybe in the background. Anyway, those things could have been implemented better, and debugging should be as close to the real thing as possible.
