Sorting Challenge for multi-core CPUs

Paul_Chance · December 19, 2024, 6:34pm

So far, the examples I’ve seen for embracing multi-core computers - like the Mac M4 mini - involve graphic manipulation. I’m waiting for techniques to appear in other computing areas. Maybe, someday, the OS can assess the tasks and employ more than one core to do the job.

For now, it seemed that “Sorting” would be a candidate. Because it seems the “best” technique depends on two factors - the number of items to be sorted, and the initial order of the items - I’m guessing an assessment of that starting state would be the first order of business. So there is that overhead.

A google search, or even a ChatGPT inquiry will give recommendations for shorting techniques given initial starting conditions. But they are based on single-core computations. I wonder if those recommendations would change if multi-cores were involved?

For example, with three cores, I could imagine some scheme that would divide the number of records into three groups, assess the best technique for each group, use each core to simultaneously sort its group, and then weave the groups back together.

Now that “weaving” could be complicated. What if, instead of just breaking the records into three groups like - with 300 records (or 30,000) - the first 100, the second 100, the third 100, that a first pass would create three groups with everything in the first group was smaller (on sort index) than everything in the middle group, which was smaller than everything in the third group?

That way, once the first, second, and third groups were sorted, you just put them back together in that order - first, second, third.

I’m guessing there is some records/order threshold, like the number and order of records needs to be such that the increased sorting speed benefit would be greater than the computing overhead of setting it up. For example, with less than 10,000 records, it might be difficult to beat a standard Shell Sort (that employs the best Gap Sequence for the particular record number/order) because of the overhead of something more complex.

Mike_Linacre1 · December 20, 2024, 6:38am

Hi Paul: your approach reminds me of sorting boxes of 2,000 punched 80-column computer cards using a computer-controlled mechanical sorter (IBM 2560 MFCM connected to an IBM 360-20) with two input feeds and five output feeds. First run through sorted the cards into ordered subsets. A new subset would start whenever the current card was sequenced lower than any of the top cards in the output feed hoppers. Second and subsequent runs through merged the subsets. We card-sorted many more times than we should have, because boxes of cards kept getting dropped!

Robert_Weaver · December 20, 2024, 7:24am

A merge sort will efficiently merge several datasets that are already sorted. So, if you have n processors, you can divide your unsorted data into n groups and sort each one individually in the separate processors. Then use the merge sort to combine the n subgroups into the final sorted dataset.

Paul_Chance · December 20, 2024, 5:26pm

Robert, I guess a merge sort would be the last step after the n groups were also processed efficiently. So you can specify “merge sort” because at the time of the “merge” you already know something about the data - that each set is sorted.

But when sorting each n set, there is still, without some pre-analysis, the unknown of the number of records and their order. Now perhaps, at some small number of records (say under 10,000) the intermediate sort technique wouldn’t matter. The overhead of analyzing the records to ascertain their best sort method would “cost” more than any process time-saving. I don’t know.

I was wondering what that would look like coded with Xojo’s multi-core programming options. Are preemptive threads easier/faster than workers?

Eric_Williams · December 20, 2024, 8:36pm

For this particular application, preemptive threads are a MUCH better option than workers, because they can share the same memory space. Workers are completely separate processes with their own context and need to communicate with the controlling app via sockets. This is fine for some applications but not a good fit if each thread needs to have fast read/write access to shared resources.

MarkusR · December 21, 2024, 10:48am

i think its similar to division of tasks for human resources.
for sorting i would separate into beginning with a,b,c or number ranges 0 to 99, 100 to 199.