Improving performance in shell

Starting a console application or shell command takes time. You might want to start several at the same time using an array of shells; that may improve your overall completion time.
In mode 1 or 2 (asynchronous or interactive).
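
A rough sketch of that idea, assuming the classic Shell API, macOS’s md5 -q, and mode 1; kMaxShells, the Files() array, and the method names are just placeholders to adapt:

Const kMaxShells = 4
Dim shells() As Shell

For i As Integer = 1 To kMaxShells
  Dim sh As New Shell
  sh.Mode = 1 // asynchronous: Execute returns immediately, Completed fires when md5 exits
  AddHandler sh.Completed, AddressOf HashCompleted
  shells.Append(sh)
  StartNext(sh)
Next

Sub StartNext(sh As Shell)
  If Files.Ubound >= 0 Then
    Dim f As FolderItem = Files(Files.Ubound)
    Files.Remove(Files.Ubound)
    sh.Execute("md5 -q " + f.ShellPath) // -q prints only the hash
  End If
End Sub

Sub HashCompleted(sender As Shell)
  Dim hash As String = Trim(sender.ReadAll)
  // ...record the hash for the file this shell was working on, then reuse the shell...
  StartNext(sender)
End Sub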

I’m not really seeing a huge performance hit from launching shells. I ran a job yesterday that completed in about 10 hours; almost 93,000 75 MB images were hashed. That’s pretty good, in my book, considering this machine is no powerhouse. It’s certainly faster than the Python version we were using before. With that kind of time frame and that number of files (which isn’t unusual; I’ll be testing a 300,000-file set on Monday night), even a tiny per-file improvement will add up to a noticeable overall speedup.

At this stage, the consensus seems to be to create a helper application, launch multiple instances of that helper, and keep them open, communicating with them as necessary to tell them which files to process. I might try it next week, since the more I think about it, the more applicable that will be to my next project, starting in a few weeks. Might as well learn how to do it now…
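
If anyone wants a halfway step before a full helper app: an interactive shell (Mode 2) already gives you a process that stays open and can be fed file paths one at a time. A rough sketch, assuming /bin/sh and md5 -q; nextFile and the handler name are placeholders, and matching results back to files is left out:

Dim worker As New Shell
worker.Mode = 2 // interactive: the process stays alive between commands
AddHandler worker.DataAvailable, AddressOf ResultArrived
worker.Execute("/bin/sh")

// Whenever the worker is idle, send it the next file:
worker.Write("md5 -q " + nextFile.ShellPath + EndOfLine.UNIX)

Sub ResultArrived(sender As Shell)
  Dim output As String = Trim(sender.ReadAll) // one hash per command written
  // ...store it against the file you last sent, then write the next command...
End Sub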

With that many files, I would be more inclined to expect an I/O bottleneck than anything.

In this case, md5 IS your helper app.

[quote=459266:@Perry Paolantonio]I’m not really seeing a huge performance hit from launching shells. I ran a job yesterday that completed in about 10 hours; almost 93,000 75 MB images were hashed. That’s pretty good, in my book, considering this machine is no powerhouse. It’s certainly faster than the Python version we were using before. With that kind of time frame and that number of files (which isn’t unusual; I’ll be testing a 300,000-file set on Monday night), even a tiny per-file improvement will add up to a noticeable overall speedup.

At this stage, the consensus seems to be to create a helper application, launch multiple instances of that helper, and keep them open, communicating with them as necessary to tell them which files to process. I might try it next week, since the more I think about it, the more applicable that will be to my next project, starting in a few weeks. Might as well learn how to do it now…[/quote]
Quick hint… if you want the machine to remain responsive for any reason, consider using n-1 cores. Otherwise you could end up with users thinking that their computer locked up. Xojo does this when compiling.

Cores = max(totalcores - 1, 1)
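
If you need the core count at runtime, one way on macOS is a quick synchronous shell call; sysctl is a macOS assumption, so adjust for other platforms:

Dim s As New Shell // Mode 0 (synchronous) by default
s.Execute("sysctl -n hw.ncpu") // hw.ncpu = logical CPUs (hyperthreads included); hw.physicalcpu = physical cores
Dim totalCores As Integer = Val(Trim(s.Result))
Dim workerCount As Integer = Max(totalCores - 1, 1) // leave one core free for the UI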

[quote=459406:@Greg O’Lone]Quick hint… if you want the machine to remain responsive for any reason, consider using n-1 cores. Otherwise you could end up with users thinking that their computer locked up. Xojo does this when compiling.

Cores = max(totalcores - 1, 1)[/quote]

For the general processing case (not I/O bound) using helper apps: is that still true if the CPU supports hyperthreading?

In other words, how much does hyperthreading boost processor efficiency? Obviously the factor (F) should be somewhere between 1.0 and 2.0. So what would the factor be… and I would think that would have a bearing on the number of helpers…

A formula something like this might apply to hyperthreaded multi-core CPUs, where F is the hyperthreading efficiency factor:

MaxHelpersPlusMainApp = Floor(TotalCores*F - F*FreeCoresDesired)
Or
MaxHelpersPlusMainApp = Floor(TotalCores*2 - FreeThreadsDesired)

(though maybe Round should be used instead?)

If one assumes an F of 1.5 and one core free on my 4-core i7, that would be:
MaxHelpersPlusMainApp = Floor(4*1.5 - 1.5) = 4
Without hyperthreading that would be 3

But on a new 8 core i9 iMac with hyperthreading that would be
MaxHelpersPlusMainApp = Floor(8*1.5 - 1.5) = 10
Without hyperthreading that would be 7
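
Written out as a function, that works out to something like this (F and the example numbers are the same assumptions as above):

Function MaxHelpersPlusMainApp(totalCores As Integer, F As Double, freeCoresDesired As Integer) As Integer
  // F = hyperthreading efficiency factor: 1.0 = no benefit, 2.0 = perfect doubling
  Return Floor(totalCores * F - F * freeCoresDesired)
End Function

// 4-core i7, F = 1.5, one core kept free: Floor(6.0 - 1.5) = 4
// 8-core i9, F = 1.5, one core kept free: Floor(12.0 - 1.5) = 10
Dim helpers As Integer = MaxHelpersPlusMainApp(8, 1.5, 1)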

Obviously, the closer F gets to 1, the less hyperthreading matters; and the more cores there are, the more of a factor it could be.

The reason I ask is that I’m thinking about writing an app that uses helpers, and I want to do it as efficiently as possible while keeping the machine responsive… and to use that to decide how many helpers to spawn, and when jobs should be queued to wait for a free helper.

  • Karen

https://www.percona.com/blog/2015/01/15/hyper-threading-double-cpu-throughput/

Looking at all the comments, it’s not so simple… with 4+ cores, it may be best not to factor in hyperthreading at all!

  • Karen

Forgive my old-timer thinking, but there may be another bottleneck to consider with intensive file swapping/copying: the bus itself. Even assuming the buffer used for copying files is well sized, the bus has a finite throughput. And since copying files, as far as I understand, necessarily goes through memory, there lies a potential huge traffic jam.

I don’t know if it is possible to assign a different buffer size for shell copying, but that could be worth exploring.

It seems possible here:
https://duckduckgo.com/?q=command+prompt+copy+buffer+size&ia=web

We’re not copying files though, just running the hash - so wouldn’t that be a function of the md5 tool and not the shell?

Files are moved into a subfolder at one point in the process before hashing begins, but this only takes a split second even on a very large file set.

At any rate, you need to load them in order to hash them.

I ran a quick test on a large file (about 1.1 GB) using md5 in a shell vs. reading it in chunks through MD5Digest and MD5DigestMBS. md5 took about 3 s to process the file, the native MD5Digest about 3.5 s, and the MBS version about 2.3 s. A way for you to speed this up might be to write your own md5 utility around MD5DigestMBS and call that from your app.
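
The chunked-read approach looks roughly like this with the built-in class (assuming MD5Digest’s Process/Value pair; the MBS class follows the same pattern, though its method names may differ), and the path here is just a placeholder:

Dim f As FolderItem = GetFolderItem("/path/to/file.tif", FolderItem.PathTypeShell) // placeholder path
Dim bs As BinaryStream = BinaryStream.Open(f, False)
Dim md As New MD5Digest

While Not bs.EOF
  md.Process(bs.Read(1048576)) // 1 MB chunks; tune to taste
Wend

bs.Close
Dim hexHash As String = Lowercase(EncodeHex(md.Value)) // lowercase to match the md5 tool’s output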

I’m running a test right now (still using the Mac’s built-in md5) on a 6-core 3.3 GHz Xeon Mac Pro (5,1 cheese grater) with a 10 Gb connection to the network. Performance is blowing my iMac’s 6-core i7 out of the water. Granted, many of the files I’m running right now are small (metadata sidecar files as well as some TIFFs), but it’s done almost 25,000 of them in less than 10 minutes with 4 concurrent md5 instances.

Because this has the same network speed as my iMac (same SAN volume, 10 GbE connection), that points to either the increased power of the Xeon (even though it’s many years older than the i7-based iMac) or the bus speed of the Mac Pro. It could also be memory, since that machine has 28 GB and the iMac has 8 GB. In any case, we have plenty of beefy Mac Pros that are only used some of the time, so this bodes well for cranking through lots of files quickly.

I’m running a batch that has 433,000 files and it’s just cutting through it like butter.

Sounds like a case for a distributed processing system across multiple unused or lightly-used machines. :)

Funny you should mention that: I was sitting in bed at 2:00 AM, after my kid woke me up, thinking exactly this for the next app.

We’re building a film scanner from scratch, and it works with ridiculously high-res images (14k x 9k pixels). Each color channel is scanned separately using a monochrome camera and appropriately colored light, to create three B/W images representing R, G, and B. And each of those is made using PixelShift, which is itself 9 separate images stitched into a single composite frame. So for each frame of color film, there are a minimum of 27 images. If we do two-flash HDR processing (the same thing, but with separate exposures for shadows and highlights combined into a single image), it’s double that; three-flash (shadows, mids, highs) is triple.

The camera isn’t particularly fast, and on a single machine I was thinking we’d be lucky to get around 2-3 seconds per frame (not frames per second, the other way around). But it might be better to have a bank of simple computers (old blade servers, maybe) inside the scanner chassis, wired together with some kind of high-speed interconnect like 40 GbE Ethernet or InfiniBand, with each computer dedicated to processing a channel and a fourth handling recombining them into a color image.

So it might be time to learn how to do that too!