Xojo equivalent to Python's scipy argrelextrema?

First, let me preface this by saying I’m well beyond my depth here, math-wise. So please, go easy. I’m pretty sure I understand what I need to do, just not sure how…

I have an OpenCV setup in Python that detects objects using a template and returns all the instances of that template in the target image. The result (in Python) is stuffed into a NumPy array. Since there are many hits on each match due to thresholding, it was suggested that I find the local extrema in the array and derive the best X,Y position for each match from those. In Python this is done with argrelextrema from scipy.signal.

However, this will ultimately live in Xojo; I’m just using Python to test stuff out because it’s easier and quicker. This is a Xojo project with a single listbox containing the results of the Python script.

How would I go about determining the local minima and maxima from a data set like this, in Xojo?

What are those values in the listbox, the X,Y positions of the returned pixels?

If so, aren’t we missing the value for each pixel (the color value, or grayscale) in order to find the maxima?

I am not saying I know how to find them, I am trying to understand what you are doing first.


EDIT: As a side note, you can add all the elements of a row with a single AddRow call: Listbox1.AddRow("327", "693")

Well, why not use Python in a Xojo app?

See Einhugur Software - Plugin Script Engines for Xojo.

Those are the X,Y coordinates of the start of the bounding box for the matched image. The box dimensions are the size of the template (which is small, relative to the image being searched). Pixel values are irrelevant here. What we’re looking for is the location of the match within the image, which OpenCV finds.

With that set of values, there are 4 matches (if you look at the Y values, you’ll see they fall into four distinct ranges). Here’s what the result looks like, in Python:

The issue is that we have to be liberal with the thresholds in order to ensure we get matches. If you look closely at that image, you’ll see that the lines are thicker at the top than at the bottom. This is because the top match gets more hits, due to slight variations in distortion between the top and bottom of the image. The numbers in the Xojo project reflect this (you’ll see there are more hits for the top match than for the bottom).

So what we’re trying to do is whittle down the results to one set of X,Y coordinates for each match.

Performance. We need this operation to happen in less than 100ms. We’re using OpenCV via declares with a custom built C-API to OpenCV 4.5.2, working with extremely high resolution images captured with a frame grabber. The idea is to keep everything in memory. I did experiment a bit with using shared memory and python, but it’s too clunky. It’s just much easier to code everything if it’s all in Xojo.

So you need one pixel per yellow rectangle? Any pixel?

Are the white spots always at the same distance from each other? Are they always more or less vertically aligned?

I guess not, since you said you wanted the extrema. I don’t understand what you are after.

The best X,Y coordinates of the upper left point of the match. See the comments in this Reddit thread for an explanation (and more detail about what we’re doing).

These are scans of motion picture film. In an ideal world, the perforations would all be intact, and they would all be in exactly the same position relative to one another. However, due to shrinkage of the film the distances can change (in fact, we’ll be calculating shrinkage values as we scan, using the results of this pattern match), and due to physical damage, some could be missing entirely, which is why we need to find as many as we can, just in case.

The position of the film within the scanned image will change from frame to frame. We’re looking for them in order to align one or more to a fixed position in order to register all the frames in the same location.

But what defines the best position?

I don’t understand Python, so the code doesn’t tell me anything useful.

Quoting the Reddit comments from above:

Since you have multiple perforation instances, you have to threshold, which opens you up to this problem of multiple matches in the same location. However since these matches are located right next to each other, it’s not too hard to group them together. You could use a large local maxima (or minima) filter to find peak points in the match result and AND (&) that with the thresholded result image, giving you only the extrema inside your matched points, that’d be the easiest thing to do.
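For anyone following along, here is a rough sketch of what that Reddit suggestion seems to describe, written in plain Python with only loops so it can be ported to Xojo line by line. It assumes you have access to the match score map itself (a 2D grid of scores where higher means a better match), which the listbox data in this thread does not include. The function name `local_max_peaks`, the `radius` parameter, and the tiny `scores` grid are all made up for illustration.

```python
def local_max_peaks(scores, threshold, radius=1):
    """Return (x, y) cells that pass the threshold AND equal the maximum
    of their (2*radius+1) x (2*radius+1) neighbourhood."""
    rows, cols = len(scores), len(scores[0])
    peaks = []
    for y in range(rows):
        for x in range(cols):
            value = scores[y][x]
            if value < threshold:
                continue  # fails the threshold half of the AND
            # Clamp the window to the edges of the grid.
            y0, y1 = max(0, y - radius), min(rows, y + radius + 1)
            x0, x1 = max(0, x - radius), min(cols, x + radius + 1)
            window_max = max(
                scores[j][i] for j in range(y0, y1) for i in range(x0, x1)
            )
            if value == window_max:  # passes the local-maximum half
                peaks.append((x, y))
    return peaks


# Hypothetical score map with two blobs of above-threshold hits.
scores = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.8, 0.1, 0.0],
    [0.1, 0.8, 0.7, 0.1, 0.2],
    [0.0, 0.1, 0.1, 0.85, 0.95],
    [0.0, 0.0, 0.2, 0.8, 0.9],
]
peaks = local_max_peaks(scores, threshold=0.75, radius=1)
print(peaks)  # [(1, 1), (4, 3)]
```

Seven cells pass the threshold here, but only one per blob survives. Two caveats: a flat plateau of equal scores would report every cell on it, and for a matching method where lower scores are better the comparisons would need to be flipped.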

My read on this is that I need to use the local extrema to group the sets of coordinates.

Looking at the set of numbers for the top match, we’re talking a range of about 19 pixels on the Y and 14 on the X. Considering the source image here is 6500 x 5000 or so, that spread is well under 0.5% of the image dimensions. So my guess would be that for each set of coordinates (each set of matches for the same object) I’d need to find the median or the mean value of the X,Y positions (I’d need to test to see what works best), and go with that.
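To make the mean-versus-median question concrete, here is a small sketch in Python using only the standard library. The coordinates are invented for illustration (they are not the values from the Xojo project), and `summarize` is a hypothetical helper name. The last point is deliberately a stray hit so the difference between the two shows up.

```python
from statistics import mean, median


def summarize(points):
    """Return ((mean_x, mean_y), (median_x, median_y)) for one group of hits."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (mean(xs), mean(ys)), (median(xs), median(ys))


# Made-up hits for a single match; the last one is an outlier.
group = [(327, 693), (328, 693), (329, 694), (330, 695), (341, 712)]
mean_xy, median_xy = summarize(group)
print(mean_xy)    # roughly (331, 697.4)
print(median_xy)  # (329, 694)
```

The mean gets pulled a few pixels toward the stray hit, while the median stays in the dense part of the group. Which one registers the frames better is something only testing on real scans will tell.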

The results would be the same regardless of language. The issue at hand just has to do with the end result, which I put in the Xojo project in the original post. So all I need to know is how to isolate the four sets of values from that dataset, within Xojo. The Python code isn’t relevant here.

It is the first time you mention this…

I believe this could work:

The distance between any two points within a match is smaller than the distance between points of different matches. Begin with one point, compare it against all the others, and remove those belonging to the same match from the list. Then proceed with the next point remaining in the list.

Even faster: once you have the positions of all the points in the first match, you can roughly compute where the other matches should be. Then you would only have to check the Y position of each remaining point, one by one.

Then you should compute the mean of the positions for each match.
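The steps above could be sketched roughly like this. It is in Python only because that is what the thread uses for prototyping; it is nothing but loops and comparisons, so it should translate to Xojo arrays directly. The names `group_points`, `group_centers`, and `max_gap`, as well as the sample coordinates, are assumptions for illustration. `max_gap` needs to be larger than the spread within one match (about 19 pixels in this thread) and smaller than the distance between two matches.

```python
def group_points(points, max_gap):
    """Greedy grouping: the first remaining point is the seed, and every
    point within max_gap of it on both axes joins its group."""
    remaining = list(points)
    groups = []
    while remaining:
        seed_x, seed_y = remaining[0]
        group, rest = [], []
        for x, y in remaining:
            near = abs(x - seed_x) <= max_gap and abs(y - seed_y) <= max_gap
            (group if near else rest).append((x, y))
        groups.append(group)
        remaining = rest  # the seed is always in its own group, so this shrinks
    return groups


def group_centers(groups):
    """Mean X,Y of each group."""
    return [
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for g in groups
    ]


# Made-up hits: three matches stacked vertically, in no particular order.
hits = [(327, 693), (331, 699), (325, 1510), (329, 696),
        (326, 1514), (331, 2330), (333, 2334)]
groups = group_points(hits, max_gap=40)
centers = group_centers(groups)
print(centers)  # [(329.0, 696.0), (325.5, 1512.0), (332.0, 2332.0)]
```

Each pass over the list removes one whole match, so with four matches that is four passes over a few dozen points, which should be far below the 100 ms budget.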


Yes, this works. I was hoping for a simpler one-liner, but this does the trick reasonably quickly. Thanks.