What's the fastest way to set the pixels of a Picture object?

GarryPettet · December 8, 2015, 12:42pm

I have a medical image that I’m reading from a binary (DICOM) file. Example code:

[code]’ data is a Xojo.Core.MemoryBlock containing the raw pixel information
’ transformedData results from applying a calculation to the rawValue to convert the
’ stored pixel data to a non-manufacturer dependent value
’ winCentre and winWidth are thresholds that constrain the pixel value after transformation

#pragma BackgroundTasks False

image = new Picture(columns, rows)
surface = image.RGBSurface

for row = 0 to maxRows
for col = 0 to maxCols
rawValue = data.Int16Value(offset)
’ Apply the rescale slope and intercept transformation
transformedValue = (rescaleSlope * rawValue) + rescaleIntercept
’ Apply the windowing levels to the transformed pixel value
surface.pixel(col, row) = ColourAfterWindowing(transformedValue, winCentre, winWidth)
offset = offset + 2
next col
next row[/code]

This correctly creates the image I need but there are a couple of issues:

1). It takes about 200 ms to create a 512x512 image. About 100 ms of this is a result of the code in ColourAfterWindowing(). The code for this is below. Putting the code inline (i.e. not a separate function call) doesn’t seem to reduce the time to create the image significantly:

[code]sub ColourAfterWindowing(inputValue as Double, centre as Double, width as Double) as Color
#pragma BackgroundTasks False

dim value as Double

if (inputValue <= centre - 0.5 - (width - 1)/2) then
value = 0
elseif (inputValue > centre - 0.5 + (width - 1)/2) then
value = 1
else
value = ((inputValue - (centre - 0.5)) / (width - 1) + 0.5)
end if

return HSV(0, 0, value)
end sub[/code]

2). One of the key operations needed when viewing medical images is the ability to “re-window” in realtime. Essentially, this means adjusting the pixel value of the pixels in the image to be within a certain threshold. Unless I can think of a clever way, this would require me to re-run the loop everytime the user wanted to adjust the window (usually done by holding down the mouse button and moving the mouse over the image). There’s no way that would be usable if it took 200 ms for every re-windowing adjustment.

Does anyone have any suggestions as to how I can alter the pixel values more quickly? I feel like I have grabbed the low-lying fruit already with #pragma BackgroundTasks False and using the RGBSurface.

I don’t want to use a plugin and it needs to work on all Xojo platforms. This will be an open source library when complete.

Thanks,

Will_Shank · December 8, 2015, 1:10pm

For the windowing you can try RGBSurface.Transform. This modifies the Picture so you’ll want to cache an original and Transform a copy.

But ultimately you should find the proportions of what’s eating time. Don’t use Xojo’s Profile feature, instead build an app with strategically placed Microsecond readings that allow you to list how much time was spent in creating an image, transforming, windowing, hsv and setting the values.

The fastest you may be able to get cross-platform without declares is probably OpenGL. The pixel data can be written directly into a Memoryblock of Singles, or maybe even Int16 values so you could pass your original data directly in. Once the texture is loaded in OpenGL you can draw it in real real-time with windowing. I’d have to refresh my memory to get more specific.

Greg_O_Lone · December 8, 2015, 1:37pm

You should also try using Pragmas to turn off BoundsChecking and StackOverflowChecking when not in debug mode (wrap them in #if not debug). That’ll squeeze a little more time out of them. Just be aware that they disable more exception checking so you’ll need to code a little more defensively.

Will_Shank · December 8, 2015, 4:39pm

[quote=234404:@Garry Pettet].
if (inputValue <= centre - 0.5 - (width - 1)/2) then
value = 0
elseif (inputValue > centre - 0.5 + (width - 1)/2) then
value = 1
else
value = ((inputValue - (centre - 0.5)) / (width - 1) + 0.5)[/quote]

You’re recomputing the math here on each if. Do once then clamp.

Also, that equation is linear, as is this one…

so they can be combined into a single linear equation y=m*x+b and your loop would look like

for row = 0 to maxRows for col = 0 to maxCols v = data.Int16Value(offset) * m + b if v < 0 then v = 0 elseif v > 1 then v = 1 end surface.pixel(col, row) = HSV(0, 0, v) offset = offset + 2 next col next row

Better yet, just let HSV do the clamping

surface.pixel(col, row) = HSV(0, 0, data.Int16Value(offset) * m + b)

Not sure if this is the bottleneck though

Will_Shank · December 8, 2015, 4:52pm

Also, get your data Xojo.Core.Memoryblock as a Ptr and use the Ptr instead. Ptr accesses faster but make sure the indices are correct because it doesn’t respect the Memoryblocks bounds.

Jeff_Tullin · December 8, 2015, 6:15pm

And if the bulk of the image is black, or white, fill the rectangle first, and only plot points which are a different color

GarryPettet · December 8, 2015, 10:28pm

These are all excellent suggestions, thank you. I’m on call this evening covering someone at short notice but hopefully I’ll get a couple of hours tomorrow evening to try to implement these and report back.

GarryPettet · December 9, 2015, 3:46pm

I tried all of the above suggestions, thank you. The only one I couldn’t figure out was how to convert a Xojo.Core.MemoryBlock to a Ptr to manipulate it that way.

The only one which seems to have made a noticeable impact is to Graphics.FillRect() the image black first and then only manipulate non-black pixels. This has cut the time per CT slice from 200-210 ms to 150-160 ms, a definite improvement. MR slices are much faster (50 ms) but plain films take forever (6 seconds) due to the number of pixels.

I may investigate OpenGL although I get the impression that would be a challenge.

I shall work on the windowing components to see if RGBSurface.Transform() will help.

Thanks again,

Jeff_Tullin · December 9, 2015, 4:58pm

Idea:

What you get is an array of INT16 values.
These are turned into a greyscale value.

Right now, if the array is 200 x 200, you are performing the calc 40000 times.
a 16 bit integer can hold 65535 values

The maths you perform on these values leads to a value between 0…1 and you only plot if not 0

So, how about a precalculated array?

Start by

dim RGBVALS as color(65535) for x as uint16 = 0 to 65535 RGBVALS (x) = calculate_RGB_value_For(x) //based on whatever constraints you want next

Then plotting is:

[code]image = new Picture(columns, rows)
surface = image.RGBSurface
dim theval as color

for row = 0 to maxRows
for col = 0 to maxCols
theval = RGBVALS ( data.Int16Value(offset))
if theval <> &c000000 then surface.pixel(col, row) = theval
offset = offset + 2
next col
next row[/code]

Christian_Schmitz · December 9, 2015, 5:35pm

The MBS Plugin ha sa couple of high optimized functions to copy picture to/from memoryblock:

https://www.monkeybreadsoftware.net/pluginpart-picturememory.shtml

Some people use it with OpenGL to fill textures.

GarryPettet · December 9, 2015, 11:02pm

Thanks @Jeff Tullin. I think in theory that could work but the issue is that whilst there are only 65535 rawValues, they are first transformed by applying the formula transformedValue = (rawValue * slope) + intercept. The value of intercept and slope can change between files. I suppose I could cache the computed array after the first slice and then reuse it if the next slice has the same slope and intercept, if not, recalculate it.

Fiddly work but maybe worth it. Thanks.

@Christian Thanks. I do love your plugins but I’m trying to do this in native code as I’m hoping to open source the module without dependencies.

GarryPettet · December 9, 2015, 11:28pm

Found a very easy way to speed it up - compile for 64 bit (tested on OS X). The average speed is 110 ms per slice. This compares to 150 ms for a built 32 bit app and 160 ms for a debug build. An MRI slice is just 33 ms (50 ms 32-bit) and the large radiographs are about 2.5 secs (down from 5 secs).

Thank God for LLVM.

Counting the days to debugging 64 bit builds

Marco_Hof · December 9, 2015, 11:44pm

Can you clarify what “re-window” means in real-world?
Is the user resizing the image? Is it moved around? Refreshed? Another drawn on top of it?

GarryPettet · December 10, 2015, 12:15am

Lets assume I have pixel values of between -1000 and 3000 (standard for CT images).

All images then have a “window” applied to them. A window has a “centre” and a “width”. Essentially, a window emphasises pixels whose value lies within a certain range and excludes others. For example, most of the contents of the abdomen has a value of between 0 and 100. We can display a greater range of grayscale colours for the content of the abdomen if we make every thing that’s very dense (has a high pixel value) white and everything that is not very dense (low pixel value) black. In this case we might choose a centre of 50 and a width of 300. Therefore, all pixels with a value > 350 will be made white and those < -250 will be made black. We can then use the full range of grayscale colours for the remaining range.

Here’s the same image from a DICOM file with 3 different windows applied:

Standard soft tissues:

Bone:

Liver:

When the radiologist wants to rewindow, the viewing software gradually alters the appearance of the image in real time as the mouse is moved around. See below:

Here’s a short video (few seconds) illustrating this:

Marco_Hof · December 10, 2015, 2:25am

Now I can see why you need it as fast as possible.
All I can think of is either using OpenGL or some kind of helper that keeps calculating ahead what all possible/most likely rewindow’s are going to be requested next and fill caches with it. Either way, I think you will end up with lots of fiddling.
But for sure it’s a very interesting project.

Patrick_Lim · December 10, 2015, 8:42am

Here’s a shot in the dark: each time winCentre and winWidth are changed, pre-build a dictionary lookup that maps transformedValue (or rawValue, if possible) to the required HSV colour. Then use the dictionary in your loops. Dictionary lookups are supposed to be fast and this way would move most of your calculations out of the MaxRows*MaxCols loops.

Jeff_Tullin · December 10, 2015, 10:08am

That’s what I said.

Except instead of using a dictionary, I used an array, where the starting value can be accessed by ordinal value
(I expect that to be faster than calling an element in a dictionary because there is an element of searching there… an array should be able to get the value by simple maths)

This way, no matter how big the image is, there are 65536 calculations to do, then plot as fast as possible.
An 800 x 800 pixel image would require 10 times more calculations doing every pixel one by one. (640000)

if they start with a gamut of -1000 and 3000 then the range is only 4000 elements.
Using the array would need to know the offset of the lowest.

So instead of myArray(10) you would use myArray(10 + lowestvalue)

Patrick_Lim · December 10, 2015, 11:12am

Sorry, Jeff, I didn’t mean to steal your idea! I am curious, however, about whether using a dictionary and going straight from raw data to colour without having to do any calculations in the loop would have any benefits in this case.

One other idea would be to create a half-sized copy of the data and use that for when the user is “re-windowing”. With a quarter of the data, the live updates would be much faster. Once the mouse is released, the full resolution image could then be shown.

Jeff_Tullin · December 10, 2015, 11:50am

It should.
Consider an image 1000 x 1000 pixels
Thats 1 million calls to the functions shown below

[code]for row = 0 to maxRows
for col = 0 to maxCols
rawValue = data.Int16Value(offset)
’ Apply the rescale slope and intercept transformation

// 1 million calls to the calculation to get a color
transformedValue = (rescaleSlope * rawValue) + rescaleIntercept
’ Apply the windowing levels to the transformed pixel value
surface.pixel(col, row) = ColourAfterWindowing(transformedValue, winCentre, winWidth)
//==============

  offset = offset + 2

next col
next row[/code]

Whereas if the gamut is 4000, then there are 4000 calls to the calc instead of 1 million
followed by 1 million pixel plots with no significant maths involved.

It would only make little difference if the calcs took an insignificant time.
1 million insignificants should add up to something…

Will_Shank · December 10, 2015, 2:34pm

In my testing HSV and RGBSurface.Pixel consume most of the time.

A way around those is an idea that comes from Alwyn Bester, or maybe it was Alain Bailleul: write a Memoryblock of BMP data (which is an easy, uncompressed format) and then use Picture.FromData to get the actual Picture.

It does take significant time for FromData to process, but writing ints to a Ptr is so much faster than Pixel that overall it’s quicker. Combine this with Jeff’s idea of using a map and it’s even faster.

Here’s my code if you want to try. bmpscan.zip

Firstly, I didn’t realize your data is Signed Int16. I used UInt16. Also I used Memoryblock, not Xojo.Core.Memoryblock. These are trifles. If you want to plug in your data use “dim dataPtr As Ptr = data.Data”. Ah right, your data is signed.

Method makeBMPMem takes an image size and returns a MemoryBlock formatted as a BMP image. Pixel data starts at byte 54+4*256 and is 1 byte per pixel grayscale. So long as the image is the same size this Memoryblock can be reused/cached.

The value map is being computed for all 65536 elements but I believe your data is 12 bit so only 4096 need be done.

Also I specified the windowing as a min/max value instead of center/range. Top slider is min (maps to black) and other is max (white).

  dim minv As Double = Slider1.Value/Slider1.Maximum * 65535
  dim maxv As Double = Slider2.Value/Slider2.Maximum * 65535
  dim m As Double = 255 / (maxv - minv)
  dim b As Double = minv * -m
  
  dim map(65535) As UInt8, v As integer
  for i As integer = 0 to 65535 
    v = i * m + b          
    if v < 0 then
      v = 0
    elseif v > 255 then
      v = 255
    end
    map(i) = v
  next
  
  dim w, h As integer = 512
  
  dim bmpMem As MemoryBlock = makeBMPMem(w, h)
  dim bmpPtr As Ptr = bmpMem
  dim dataPtr As Ptr = data  //data.Data when Xojo.Core.Memoryblock
  
  dim headerSize As integer = 54 + 4 * 256
  dim lastPix As integer = headerSize + w * h
  
  dim idx As integer
  for i As integer = headerSize to lastPix     //map pixels
    bmpPtr.UInt8(i) = map(dataPtr.UInt16(idx))
    idx = idx + 2
  next
  
  dim pic As Picture = Picture.FromData(bmpMem)
  g.DrawPicture(pic, 0, 0)

OpenGL is faster still and simpler(ish) because there’s built in scale and bias to rewindow the pixels. Although this is using glDrawPixels which isn’t really the best way, pixel data is sent to the gpu each time. Better would be to load it once onto the gpu as a texture and use fragment shaders for mapping, but shaders aren’t part of Xojos built in OpenGL. They require a few declares but I believe are cross-platform if it’s something you’d consider.

I don’t know if OpenGL would be a complete solution though. I mean it draws well but for saving picture files or drawing nice antialiased lines and text on top then Picture/Graphics will be easier.

Function Render() As Boolean

  OpenGL.glViewport 0, 0, me.Width, me.Height
  
  OpenGL.glMatrixMode(OpenGL.GL_PROJECTION)
  OpenGL.glLoadIdentity
  OpenGL.gluOrtho2D(0, me.Width, me.Height, 0)
  OpenGL.glMatrixMode(OpenGL.GL_MODELVIEW)
  OpenGL.glLoadIdentity
  
  OpenGL.glClearColor(0, 0, 0, 1)
  OpenGL.glClear(OpenGL.GL_COLOR_BUFFER_BIT)
  
  dim minv As Single = Slider1.Value/Slider1.Maximum
  dim maxv As Single = Slider2.Value/Slider2.Maximum
  dim m As Single = 1 / (maxv - minv)
  dim b As Single = minv * -m
  
  OpenGL.glPixelTransferf(OpenGL.GL_RED_SCALE,   m)
  OpenGL.glPixelTransferf(OpenGL.GL_GREEN_SCALE, m)
  OpenGL.glPixelTransferf(OpenGL.GL_BLUE_SCALE,  m)
  OpenGL.glPixelTransferf(OpenGL.GL_RED_BIAS,    b)
  OpenGL.glPixelTransferf(OpenGL.GL_GREEN_BIAS,  b)
  OpenGL.glPixelTransferf(OpenGL.GL_BLUE_BIAS,   b)
  
  OpenGL.glRasterPos2i(0, 0)
  OpenGL.glPixelZoom(1, -1)
  
  OpenGL.glDrawPixels(512, 512, OpenGL.GL_LUMINANCE, OpenGL.GL_UNSIGNED_SHORT, data)

End Function