# Optimizing the speed of a pixel processing algorithm

Can anyone spot a way to increase the processing speed of the following algorithm? I’m tapped out of ideas on how to do so.

Is there perhaps a way to get a Ptr or MemoryBlock of a Picture’s data? If not, I think a feature to do so would make a big difference when building speed-critical image processing applications with Xojo.

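```
Function ScalePicture(p As Picture, newWidth As Integer, newHeight As Integer) As Picture
  // based on code from Dr. Gerard Hammond
  // with performance/functional improvements by Tomis Erwin
  // support for alpha channel and minor performance improvements by Alwyn Bester

  #if DebugBuild = False then
    #pragma BoundsChecking False
    #pragma NilObjectChecking False
    #pragma StackOverflowChecking False
  #endif

  Dim pOut As Picture
  Dim s, sm As RGBSurface   // source picture and source mask
  Dim o, om As RGBSurface   // output picture and output mask
  Dim x, y, xMax, yMax As Integer
  Dim xx() As Double
  Dim c1, c2, c3, c4 As Color
  Dim alpha As Integer

  // These declarations are missing from the code as posted; the names come from their uses below.
  // NOTE: the post never shows where aPlusXAdd and bPlusYAdd are assigned. They must be set to
  // the second sample position in each direction before the Pixel calls that use them.
  Dim a, b, aPlusXAdd, bPlusYAdd As Double
  Dim xMult, yMult, xSub, ySub As Double

  s = p.RGBSurface
  sm = p.Mask.RGBSurface   // assumption: sm is the source mask; the post never assigns it

  pOut = New Picture(newWidth, newHeight, 32)

  o = pOut.RGBSurface
  om = pOut.Mask.RGBSurface   // assumption: om is the output mask; the post never assigns it

  xMax = pOut.Width - 1
  yMax = pOut.Height - 1

  // source pixels per output pixel in each direction
  yMult = p.Height / newHeight
  xMult = p.Width / newWidth

  a = newWidth / p.Width
  If a > .5 Then
    xSub = .45
  ElseIf a < .5 Then
    xSub = .75
  Else
    xSub = 0
  End If

  a = newHeight / p.Height
  If a > .5 Then
    ySub = .45
  ElseIf a < .5 Then
    ySub = .75
  Else
    ySub = 0
  End If

  // precompute the source x position for every output column
  Redim xx(xMax)

  For x = 0 To xMax
    xx(x) = (x * xMult) - xSub
  Next x

  For y = 0 To yMax

    b = (y * yMult) - ySub

    For x = 0 To xMax
      a = xx(x)

      // average four source samples for the colour
      c1 = s.Pixel(a, b)
      c2 = s.Pixel(aPlusXAdd, b)
      c3 = s.Pixel(a, bPlusYAdd)
      c4 = s.Pixel(aPlusXAdd, bPlusYAdd)

      o.Pixel(x, y) = RGB( _
      (c1.Red + c2.Red + c3.Red + c4.Red) \ 4, _
      (c1.Green + c2.Green + c3.Green + c4.Green) \ 4, _
      (c1.Blue + c2.Blue + c3.Blue + c4.Blue) \ 4 _
      )

      // average the same four samples of the mask for the alpha value
      c1 = sm.Pixel(a, b)
      c2 = sm.Pixel(aPlusXAdd, b)
      c3 = sm.Pixel(a, bPlusYAdd)
      c4 = sm.Pixel(aPlusXAdd, bPlusYAdd)

      alpha = (c1.Red + c2.Red + c3.Red + c4.Red) \ 4

      om.Pixel(x, y) = RGB(alpha, alpha, alpha)

    Next x

  Next y

  Return pOut
End Function
```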

You mean memoryblock = picture.getdata (format, quality)?

You can surely speed things up by dividing the picture into as many slices as there are CPU cores/threads and setting up that many sub-apps running in parallel. Errr, btw: can Xojo use GPU features? All those doubles are handled much more easily by a GPU.

Yes, so that one can loop through the pixel data as bytes, avoiding a call to RGBSurface.Pixel(x, y) that returns an object for each pixel.

[quote=67613:@Ulrich Bogun]You mean memoryblock = picture.getdata (format, quality)?

You can surely speed things up by dividing the picture into as many slices as there are CPU cores/threads and setting up that many sub-apps running in parallel. Errr, btw: can Xojo use GPU features? All those doubles are handled much more easily by a GPU.[/quote]

Haven’t used threads before… guess this is a great opportunity for me to see how I could potentially use threads to speed up the processing. Will have a look into this.

Not sure about using GPU features. Probably doable with declares of some sort?

GPU: Quite certain done by declares and therefore beyond my scope.
Regarding the threads: better check first whether normal threads would do it (as far as I know, real parallel processing with Xojo threads is a bit limited), or whether it would be better to set up several windowless apps like in the multiprocessing example.

The problem with the current picture.GetData() is that it returns the data in a format such as JPG, PNG, BMP etc., and not in raw RGB bytes.

But perhaps I should have a look at the picture.getdata method again. Used together with Picture.FromData() it could be a solution.

I just wish there was a way to get a MemoryBlock of the Picture.RGBSurface object, so that one could manipulate the pixel byte data directly. This would really speed up things a lot.

Got your code setup to time a test image in a built app, runs at around 97-99 thousand microseconds.
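For anyone who wants to reproduce this kind of measurement, a minimal sketch of a timing harness could look like the following. It assumes the classic framework's global `Microseconds` function; the method name `TimeScalePicture` and the 800 x 600 target size are hypothetical placeholders, not the test setup used above.

```xojo
// Hypothetical timing harness: src is any Picture that is already loaded.
Sub TimeScalePicture(src As Picture)
  Dim startTime As Double = Microseconds

  // The 800 x 600 target size is an arbitrary example value.
  Dim scaled As Picture = ScalePicture(src, 800, 600)

  Dim elapsed As Double = Microseconds - startTime

  // Show the elapsed time in microseconds.
  MsgBox("ScalePicture took " + Format(elapsed, "#") + " microseconds")
End Sub
```

Running it several times in a built app (not the debugger) and comparing the range of values gives figures comparable to the ones quoted in this thread.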

Changing the setting of alpha from RGB() to a Color array runs around 95-97

[code]//before the xy loop
static greys(-1) As Color
if greys.Ubound < 255 then
redim greys(255)
for x = 0 to 255
greys(x) = RGB(x, x, x)
next
end

//in the loop switch this line
//om.Pixel(x, y) = RGB( alpha, alpha, alpha )
om.Pixel(x, y) = greys(alpha)[/code]

And I noticed a, b, aPlusXAdd and bPlusYAdd are doubles. Copying those values to ints and using those vars where ints are expected runs around 87-90.

[code]Dim ai, bi, aip, bip As Integer
//…

For y = 0 To yMax

  b = (y * yMult) - ySub
  bi = b                  //copy to ints

  For x = 0 To xMax
    a = xx(x)
    ai = a                 //copy to ints

    c1 = s.Pixel(ai, bi)   //use the ints
    c2 = s.Pixel(aip, bi)
    c3 = s.Pixel(ai, bip)
    c4 = s.Pixel(aip, bip)
    //...
    c1 = sm.Pixel(ai, bi)
    c2 = sm.Pixel(aip, bi)
    c3 = sm.Pixel(ai, bip)
    c4 = sm.Pixel(aip, bip)[/code]

Excellent, thanks Will.

Not declares, but a language that can use the framework that exposes the GPU. OpenCL is one such language. Maybe you can compile OpenCL code to a DLL (or the corresponding MacOS and Linux library format) that can be used in Xojo via declares; I don’t know. But you can’t simply use the GPU via system declares.

[quote=67619:@Ulrich Bogun]Regarding the threads: better check first whether normal threads would do it (as far as I know, real parallel processing with Xojo threads is a bit limited), or whether it would be better to set up several windowless apps like in the multiprocessing example.[/quote] Correct, if you want to use more than one core in Xojo you need to launch several applications and make them work in parallel. If you use the standard Xojo threads you will only be using one core, so no speed gain.

A few months back there was a Xojo blog post on this topic: http://www.xojo.com/blog/en/2013/07/take-advantage-of-your-multi-core-processor.php

Julen
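To make the "several applications in parallel" idea a bit more concrete, here is a rough sketch of how the output rows could be divided among the helper apps. The method `SliceBounds` and its parameters are hypothetical; how the helpers are launched and how they receive the picture data is not covered here.

```xojo
// Hypothetical helper: returns the first and last output row that worker
// number index (0-based) should process when the picture is split into
// count horizontal bands of nearly equal height.
Sub SliceBounds(height As Integer, count As Integer, index As Integer, ByRef firstRow As Integer, ByRef lastRow As Integer)
  // Integer division keeps the bands contiguous: each band starts exactly
  // where the previous one ended, and the last band ends at height - 1.
  firstRow = (height * index) \ count
  lastRow = ((height * (index + 1)) \ count) - 1
End Sub
```

With a height of 10 and 3 workers this yields the bands 0 to 2, 3 to 5 and 6 to 9. Each helper would then run the `For y` loop only over its own band.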

I guess you can extract the image data if you remove the unnecessary tags. Have a look at  TIFF image data is presented in Byte form, which should be what you are looking for, or am I wrong?

I’ll first have to test how much overhead the Picture => TIFF (do stuff) TIFF => Picture causes. If the TIFF to picture and back conversions are fast enough, then that might be a possible way to increase the processing speed.

Sure. If it turns out to be helpful, you could also check other uncompressed image formats. BMP is much simpler; it could very well be that the conversion saves a few msecs. And it delivers the image information in rows after a declared offset. If you skip that (can one easily move the lower border of a MemoryBlock to cut away the offset? Pushing the Ptr offset bytes further?), conversion should be quite fast (if general BMP conversion is fast, of course).
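As a rough, untested sketch of skipping that offset: the BMP file header stores the start of the pixel data as a little-endian 32-bit value at byte 10. The function name `BMPPixelBytes` is hypothetical, and the sketch assumes that `Picture.GetData` supports `Picture.FormatBMP` on the platform in question (worth checking with `Picture.IsExportFormatSupported`) and that the result is uncompressed. It copies the pixel bytes into a new MemoryBlock instead of moving a pointer.

```xojo
// Hypothetical helper: returns only the pixel bytes of p's BMP representation.
Function BMPPixelBytes(p As Picture) As MemoryBlock
  Dim bmp As MemoryBlock = p.GetData(Picture.FormatBMP)

  // BMP header fields are little-endian.
  bmp.LittleEndian = True

  // Bytes 10 to 13 of the file header hold the offset of the pixel data.
  Dim pixelOffset As Integer = bmp.UInt32Value(10)

  // Copy everything from that offset to the end into a new MemoryBlock.
  Dim pixels As MemoryBlock = bmp.StringValue(pixelOffset, bmp.Size - pixelOffset)
  Return pixels
End Function
```

Keep in mind that BMP rows are padded to a multiple of 4 bytes and are usually stored bottom-up, and that the bit depth depends on what `GetData` actually writes, so the header should be inspected before indexing into the result.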

[quote=67634:@Alwyn Bester]I still think that

MemoryBlock = Picture.RGBSurface.GetMemoryBlock()
would be first prize. I’m sure the RGBSurface is already stored internally as an array of bytes, so if one could just somehow get access to those bytes directly, it would be easy to design speedy algorithms for picture objects.[/quote]
Seems very possible, especially when you read the definition of an RGBSurface. If only I were savvy with handling pointers... but I am sure someone else here is.

I have not analyzed the code in depth. but first thought… perhaps you can cache the C1-Cx values so as to not execute the PIXEL function so many times?
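One way this caching could be sketched is to read a source row into a Color array once and then index the array instead of calling `Pixel` again. `ReadRow` is a hypothetical helper, not part of the posted function, and it only pays off when the same source pixels are read more than once, for example when upscaling, where several output rows map to the same source row.

```xojo
// Hypothetical helper: copies one row of an RGBSurface into a Color array.
Function ReadRow(surf As RGBSurface, y As Integer, width As Integer) As Color()
  Dim row() As Color
  Redim row(width - 1)

  For x As Integer = 0 To width - 1
    row(x) = surf.Pixel(x, y)
  Next

  Return row
End Function
```

In the scaling loop, the rows would be reloaded only when the integer source row changes from one output row to the next; the inner loop then reads the cached arrays instead of calling `s.Pixel`.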

So I don’t have to analyze the code, can someone tell me what that function does that’s different than the built-in scaling offered by Graphics.DrawPicture?

Have a look here, Ken: scale-quality-of-canvas-control

Should have been Kem. Sorry!

Darn autocorrect.

Thanks for the link, that clears it up.

Will give it a shot, and post the results once I’ve tested it.

I think your easiest solution and what’ll probably give you the fastest routine will be processing your image as an OpenGL texture. There’s nearest neighbour and bilinear sampling buried in there. Also trilinear if you want to process a series of images.

It’d be handy if the GetData routines would give you an appropriate memory block. I haven’t looked at the tiff option, but it may be fairly easy. In any case you can make your own easily enough using MemoryBlock.ColorValue(offset,32) = rgbsurface.Pixel(x,y) and increment offset by 4 for each pixel you add in. I find if I need to access each pixel more than 2 or 3 times then it’s quicker putting the picture into a memoryblock, process that, and put it back into a picture at the end.
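A minimal sketch of that round trip, using the `MemoryBlock.ColorValue(offset, 32)` approach described above, might look like this. The two method names are hypothetical, and the layout (4 bytes per pixel, rows stored one after another) is simply the one chosen for this example.

```xojo
// Copies every pixel of p into a MemoryBlock, 4 bytes per pixel, row by row.
Function PictureToMemoryBlock(p As Picture) As MemoryBlock
  Dim surf As RGBSurface = p.RGBSurface
  Dim w As Integer = p.Width
  Dim h As Integer = p.Height
  Dim mb As New MemoryBlock(w * h * 4)
  Dim offset As Integer = 0

  For y As Integer = 0 To h - 1
    For x As Integer = 0 To w - 1
      mb.ColorValue(offset, 32) = surf.Pixel(x, y)
      offset = offset + 4
    Next
  Next

  Return mb
End Function

// Writes the pixels back; mb must use the same layout and match p's size.
Sub MemoryBlockToPicture(mb As MemoryBlock, p As Picture)
  Dim surf As RGBSurface = p.RGBSurface
  Dim w As Integer = p.Width
  Dim h As Integer = p.Height
  Dim offset As Integer = 0

  For y As Integer = 0 To h - 1
    For x As Integer = 0 To w - 1
      surf.Pixel(x, y) = mb.ColorValue(offset, 32)
      offset = offset + 4
    Next
  Next
End Sub
```

With this layout the pixel at (x, y) lives at offset `(y * w + x) * 4`, so the processing code in between can address any pixel without going through `RGBSurface.Pixel`.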

If you don’t want the overhead of the whole picture put into a memory block and know how many lines of the picture you need at a time (say applying a 3x3 convolution kernel), then just read the 3 lines of the picture into 3 separate memoryblocks, process them, transfer line 2 into the old line 3 (using memoryblock.stringvalue), transfer line 1 into line 2, and now read a new line from the image into line 1, process those three lines and so on. Much quicker than repeated calls of rgbsurface.pixel(x, y) to the same pixel.

Regards - Richard.
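The three-line approach could be sketched roughly as follows. The names `LoadRow` and `ProcessWithThreeRows` are hypothetical, the kernel itself is left out, and the sketch rotates the three MemoryBlock references instead of copying the contents with `StringValue`, which is a simplification of what is described above.

```xojo
// Hypothetical helper: loads row y of surf into buffer, 4 bytes per pixel.
Sub LoadRow(surf As RGBSurface, y As Integer, w As Integer, buffer As MemoryBlock)
  For x As Integer = 0 To w - 1
    buffer.ColorValue(x * 4, 32) = surf.Pixel(x, y)
  Next
End Sub

Sub ProcessWithThreeRows(p As Picture)
  Dim surf As RGBSurface = p.RGBSurface
  Dim w As Integer = p.Width
  Dim h As Integer = p.Height
  If h < 3 Then Return

  // Three row buffers: the row above, the row being processed, the row below.
  Dim above As New MemoryBlock(w * 4)
  Dim middle As New MemoryBlock(w * 4)
  Dim below As New MemoryBlock(w * 4)
  LoadRow(surf, 0, w, above)
  LoadRow(surf, 1, w, middle)
  LoadRow(surf, 2, w, below)

  For y As Integer = 1 To h - 2
    // ... apply the 3x3 kernel to row y here, reading above, middle and below ...

    If y < h - 2 Then
      // Rotate the buffers and reuse the oldest one for the next source row.
      Dim recycled As MemoryBlock = above
      above = middle
      middle = below
      below = recycled
      LoadRow(surf, y + 2, w, below)
    End If
  Next
End Sub
```

Each source pixel is read through `RGBSurface.Pixel` only once, and the kernel works on the buffered copies, so results written back to row y do not affect the values used for later rows.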

I actually found that RGBSurface is quicker than accessing each color channel of each pixel of a memory block. I can’t explain why, only that this is what I found in the past.