Threads, arrays and timing

I have a series of threads, each of which builds on the output of the previous one, applying a different type of filter to the data from the first thread.
The threads are designed to run in parallel so that the UI can continually update as they progress.
Each thread can start running when the previous thread has started, and won’t stop until the previous has completed.
So each is of the form:

////////////
if thisThreadJustStarting then
  i = 0
endif

inputArray = OutputArrayFromPreviousThread

Do
  While i <= inputArray.ubound
    foundItem = inputArray(i)
    if filter(foundItem) then
      outputArray.append foundItem
    endif
    i = i + 1
  Wend
Loop Until previousThread.isComplete

thisThread.isComplete = true

UpdateTheUI
///////////////

The first thread iterates through folderItem.items to generate its OutputArray, the second thread does the first set of filtering. Where the folderitems are local this works fine. But where the folderitems are on a NAS or server the delay seems to give me problems. At the conclusion of the second thread the inputArray is showing the correct number of items, but the outputArray varies at random. Sometimes it gets all the files, sometimes not.

I had expected that inputArray, being a reference, would always have the correct items from the previous thread, and that the thread would yield on each Wend to ensure this, but my suspicion is that the network delay is somehow causing problems. I’ve tried various techniques like delayMBS, yieldToNextThread and SleepCurrentThread to let things catch up, but none seem to work reliably.

Suggestions?

In your code, you say

Loop Until previousThread.isComplete

Which seems like it would be stopping the current thread at the wrong time - you need to let the current thread process all of the items in inputArray.

Threads can yield at loop boundaries (do, loop, while/wend) but can also yield on some other framework calls, so that may be causing something unexpected.

My hunch is that the algorithm only worked in the past due to luck, and that the different timing caused by a slower NAS is exposing that.

If the logic bug isn’t obvious, I would add tons of logging:

system.debuglog ("ThreadA, item " + str(i) + " did something")
system.debuglog ("ThreadB, item " + str(i) + " did something else")

etc.

Thanks Micheal - I agree there must be something wrong with the logic.

The inner While-Wend loop is supposed to catch all the inputArray items, but somehow it seems that inputArray is being added to in between the last Wend and the last Loop Until. I suppose that is possible if the 2nd thread yields to the 1st thread at that point.

Currently I am relying on inputArray.ubound to confirm all have been completed. I’m thinking that I ought to set a shared property at the conclusion of the first thread with the final count of its outputArray, and keep going in the second thread until that point is reached. That might be more deterministic.
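A minimal sketch of that counting idea, in the same Xojo style as the pseudocode in the first post. It assumes a hypothetical shared property finalCount on the previous thread that starts at -1 and is set exactly once, after that thread's last append. The FolderItem item type is also an assumption; inputArray, outputArray, filter and previousThread are the names from the original post.

```xojo
// In the first thread, after its last append (finalCount is a hypothetical
// property, assumed to start at -1, meaning "not known yet"):
//   finalCount = outputArray.Ubound + 1

// In the second thread:
Dim i As Integer = 0   // index of the next item to read = items read so far

Do
  While i <= inputArray.Ubound
    Dim foundItem As FolderItem = inputArray(i)
    If filter(foundItem) Then
      outputArray.Append foundItem
    End If
    i = i + 1
  Wend
  // Nothing left to read right now, so let the first thread run
  App.YieldToNextThread
Loop Until previousThread.finalCount >= 0 And i >= previousThread.finalCount

isComplete = True   // this thread's own flag, as in the original post
```

Because the exit test compares the number of items actually read against the final total, it does not matter where the threads happen to yield: if the first thread adds items just before the test, i is still smaller than finalCount and the loop runs again.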

I’ll revert when tested.

Some tips:

  • FYI, if you put [ code ] tags around your source code in this forum, it will get nice formatting.
  • Does your filter() function yield, or call anything else that may yield?
  • Try putting #pragma disableBackgroundTasks in this method, and then add an explicit Yield() command so it only yields exactly when you want it.

Otherwise, I think your basic idea of sharing a single array is sound, as long as you are always checking .UBound() on each loop.
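A rough sketch of how the pragma tip and the shared-array approach could fit together, assuming the loop lives in the second thread's Run event and that the first thread sets isComplete only after its last append. upstreamDone is a hypothetical local variable and the FolderItem item type is an assumption; the other names come from the original post.

```xojo
#Pragma DisableBackgroundTasks   // no automatic yields in this method

Dim i As Integer = 0
Dim upstreamDone As Boolean      // hypothetical local snapshot of the flag

Do
  // Read the flag BEFORE draining the array. If it is already True,
  // every item is in inputArray and the While loop below sees them all.
  upstreamDone = previousThread.isComplete

  While i <= inputArray.Ubound
    Dim foundItem As FolderItem = inputArray(i)
    If filter(foundItem) Then
      outputArray.Append foundItem
    End If
    i = i + 1
  Wend

  // The only explicit yield: let the first thread and the UI run
  If Not upstreamDone Then
    App.YieldToNextThread
  End If
Loop Until upstreamDone

isComplete = True
```

Checking the flag before the inner loop rather than after it closes the gap between the last Wend and the Loop Until test, so the result no longer depends on whether filter() or anything else yields. If the first pass over a large array keeps the UI waiting too long, an extra yield inside the While loop should be safe for the same reason.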

Can you put all the code into a single thread?

I agree. Adding more threads actually slows everything down, and adds complication. If the threads depend on each other, combine them.

There’s a cascade of filters, each in its own thread, each providing output to separate displays in the UI. Each of the filters can be changed independently, the result of which then cascades through the remaining filters. For each filter I want to show the results progress in the UI (but leave the UI responsive) - otherwise there can be a long wait before the UI updates. Hence I tried to make each filter/UI set somewhat independent and put into separate threads.

There might be some optimisation later by combining some of the threads - once I have the beast working reliably.

I’m avoiding this solution as it still depends on the timing of different threads. Instead I’m just counting to make sure all the inputArray items have been processed before completing that thread.

A refactor might also be the solution.

For example, you can turn your filters into classes. Add a new instance of each filter to the thread before starting it. The thread can then cycle through each filter for each value, and you can still get the updates you want while controlling the flow.

The Decorator Pattern might be a good fit here too.

Interesting Kem, thanks. I keep trying to read the book on Design Patterns but it is quite sleep-inducing. Perhaps I should have bought an idiot’s guide version.

Basically, you turn your filters into a MyFilter class with a property NextFilter as MyFilter. An AddFilter method would fill in the NextFilter property of the last Filter, and a Process(value) method would process the value, then call NextFilter.Process(value). You can design reusable filter chains that way and assign a chain to a Thread before you start it.
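As a hedged illustration of that description, the methods of such a class might look roughly like this. MyFilter, NextFilter, AddFilter and Process come from the post above. The Accepts method, the Results array and the FolderItem value type are assumptions added for the sketch, and the class and its properties would be created in the IDE rather than typed as text.

```xojo
// Class MyFilter
// Properties (assumed): NextFilter As MyFilter, Results() As FolderItem

Sub AddFilter(f As MyFilter)
  // Walk to the end of the chain and attach the new filter there
  If NextFilter Is Nil Then
    NextFilter = f
  Else
    NextFilter.AddFilter(f)
  End If
End Sub

Function Accepts(value As FolderItem) As Boolean
  // Hypothetical hook: subclasses override this with their own test
  Return True
End Function

Sub Process(value As FolderItem)
  // A rejected value stops here and never reaches later filters
  If Not Accepts(value) Then Return

  Results.Append value   // what this filter's display would show

  If Not (NextFilter Is Nil) Then
    NextFilter.Process(value)
  End If
End Sub
```

With something like this, a single thread could call Process on the first filter for each incoming item, and each filter's Results array would hold the list for its own part of the UI.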

Or something like that.

The problem with explaining patterns is that the explainer understands the patterns and the learner understands nothing.

THE book to learn patterns is “Head First Design Patterns”. It’s from 2004 but the content doesn’t age. Link https://www.amazon.com/dp/0596007124 . I translated most of the patterns to Xojo (https://www.mothsoftware.com/content/xojo/xojo.php) . But the implementation very likely isn’t current anymore.

You need to do some updates, like rename the files to the new file endings. But the code itself works. There is an example for the decorator pattern.

Without threads it looks like
output=Process(input)
output=Process(Process(Process(Process(input))))

With threads, the input/output available at any moment can be one item, part of the data, or all of it.

In your first pseudocode you keep the second thread open, waiting for more data, but normally it would end its task once all the data it was given had been processed. If there is a new block of data to process, a new thread should be started.

Thank you for that reference Beatrix - an excellent read.
With regard to your translated patterns - I see they are all in .rb format - does anyone know how I can open these in Xojo?

You may need to rename them to .rbp, but they should still load.

Thanks Tim