How to report only parts of pictures that are different?

Arnaud_N · March 10, 2021, 4:54pm

In a flow of pictures, I’m looking for a way to send only their differences rather than the whole pictures, to lower transmission size and time (unless, of course, the only available methods take significantly longer than sending the whole pictures).
I believe apps like Remote Desktop do that: find differences between two frames and only send the relevant changes over the network.

For now, I have this test code:

dim p1 as Picture=Picture.Open(SpecialFolder.Desktop.Child("c1.png"))
dim p2 as Picture=Picture.Open(SpecialFolder.Desktop.Child("c2.png"))
dim p3 as Picture=DiffPicturesMBS(p1,p2,True)

c1.png and c2.png are screenshots I’ve taken with slight differences (seconds on the clock and one window’s content).

This produces a picture where all pixels that are equal are black. So far, so good; now I have to send the areas of the original picture (p2 in my example above) that are not black in p3.
But if I have to loop through p3 to check every non-black pixel, it won’t be faster than manually comparing every pixel between p1 and p2. Timing is my issue here.

Ideally, I’d like to get rectangles around non-black pixels so I can extract the coordinates and report them on p2 to send its changes, but I recognise “packaging” pixels in best-arranged rectangles is going to be hard.
Otherwise, I guess I could be satisfied by encoding each pixel (x, y and colour) and sending them in one message.

Whichever path I choose, I fail to see how to encode these pixels; that is, how to quickly find their x,y positions (and then their colours) without looping through all the pixels of every picture, which would be too slow.
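
One common way around the per-pixel loop is to compare the two frames tile by tile and report only the tiles that differ as rectangles. The sketch below is an illustration in Python rather than Xojo, and it assumes both frames are already available as raw RGBA byte buffers of identical size; the function name `dirty_tiles` and the 16-pixel tile size are hypothetical choices, not something from MBS or Xojo.

```python
def dirty_tiles(prev, cur, width, height, tile=16, bpp=4):
    """Return (x, y, w, h) for every tile whose bytes differ between two frames.

    prev and cur are row-major byte buffers with bpp bytes per pixel (assumption).
    """
    if len(prev) != len(cur) or len(cur) != width * height * bpp:
        raise ValueError("frames must have identical dimensions")
    stride = width * bpp  # bytes per image row
    rects = []
    for ty in range(0, height, tile):
        th = min(tile, height - ty)  # last row of tiles may be shorter
        for tx in range(0, width, tile):
            tw = min(tile, width - tx)  # last column of tiles may be narrower
            for row in range(ty, ty + th):
                start = row * stride + tx * bpp
                end = start + tw * bpp
                # One slice comparison covers a whole tile row at once.
                if prev[start:end] != cur[start:end]:
                    rects.append((tx, ty, tw, th))
                    break  # tile is dirty, no need to check its other rows
    return rects
```

Each comparison covers a whole tile row, so the number of comparisons depends on the tile count rather than the pixel count, and the result is already a list of rectangles that can be cropped from the newer picture.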

I’m sure this is not built into Xojo, but I’m wondering:
• can the MBS plugin do that? (based on my searches, I’d think the answer is “no”)
• is it actually a waste of time trying to improve how I transmit my pictures and I have no other choice than sending them entirely each time?
• any other thoughts?

In the real project, pictures (screenshots) are taken in a loop, so this makes a lot of them.

Jeff_Tullin · March 10, 2021, 5:12pm

You might investigate GraphicsMagick. Composite function with the Difference parameter?
(no, I don’t know how to use it directly, but it sounds promising)

Arnaud_N · March 10, 2021, 5:30pm

Thanks for the tip.
I have quickly looked at the MBS implementation, but there’s no call involving a Difference parameter. I’ll take a look at the original class.

So, the Composite function would mix both pictures (that’s faster than computing masks and pictures in Xojo), but I’d still have a whole picture to deal with, which won’t solve my problem of sending only the different areas/pixels; or I’m missing something?

Jeff_Tullin · March 10, 2021, 6:00pm

“sending only the different areas/pixels; or I’m missing something?”
How would you send those anyway, though?
pixels?
Small rectangles?
How big is small?

Is 20 small rectangles faster than sending the whole image?

In a sense, what you might be doing here is recreating the process used by MPEG, which compresses video streams: a key frame followed by changes.
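
The key-frame-then-changes idea can be sketched as a small scheduler. This is only an illustration in Python under stated assumptions: the name `plan_frames` and the default interval of 30 frames are hypothetical, and frames are treated as opaque values that can be compared for equality.

```python
def plan_frames(frames, key_interval=30):
    """Yield (kind, frame) where kind is "key", "delta" or "skip".

    key_interval is an arbitrary illustrative value, not a recommendation.
    """
    previous = None
    for index, frame in enumerate(frames):
        if previous is None or index % key_interval == 0:
            yield "key", frame    # send the whole picture
        elif frame == previous:
            yield "skip", frame   # nothing changed, send nothing
        else:
            yield "delta", frame  # send only what changed since previous
        previous = frame
```

The receiver keeps the last reconstructed picture and applies each delta to it; the periodic key frame bounds how long an error can persist.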

Arnaud_N · March 10, 2021, 6:41pm

That was actually my second wondering:

So, I don’t know the answer. I just imagine 3 solutions:
1: always send the whole pictures (current solution, but I’d like it to perform faster)
2: send rectangles. As you say, I’d have to send several pictures of undetermined size (so a back-and-forth transmission would be needed, which doesn’t sound good).
3: send pixel-encoded values. The string would be sent only once for a given picture and formatted as (pseudo-code) Pixel1.Red+“;”+Pixel1.Green+“;”+Pixel1.Blue+“;”+Pixel1.X+“;”+Pixel1.Y+“<”+Pixel2.Red+“;”+Pixel2.Green+“;”+Pixel2.Blue+“;”+Pixel2.X+“;”+Pixel2.Y+“<” (etc.), where “<” separates the pixels.

With option 3, once I know which pixels have changed (which is still an open question), I’d just loop through them and construct the string. If too many pixels have changed, so that the string would be too large, I’d fall back to sending the whole picture that time (similar to a keyframe, in codec terms). So, when only a dozen pixels (actual number to be determined) have changed, the sent string would be very small.
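
Option 3 and its fallback can be sketched as follows. This is a Python illustration, not the author's implementation: it swaps the text format above for a fixed 7-byte binary record per pixel, assumes frames are raw RGB buffers (3 bytes per pixel) with coordinates below 65536, and uses hypothetical names (`encode_changes`, `decode_changes`).

```python
import struct

# One record per changed pixel: x, y (16-bit each), red, green, blue = 7 bytes.
PIXEL = struct.Struct("<HHBBB")

def encode_changes(prev, cur, width, height):
    """Return ("delta", payload), or ("full", cur) once the delta stops paying off."""
    if len(prev) != len(cur) or len(cur) != width * height * 3:
        raise ValueError("frames must have identical dimensions")
    out = bytearray()
    for i in range(0, len(cur), 3):
        if prev[i:i + 3] != cur[i:i + 3]:
            y, x = divmod(i // 3, width)  # pixel index -> (row, column)
            out += PIXEL.pack(x, y, cur[i], cur[i + 1], cur[i + 2])
            if len(out) >= len(cur):
                return "full", bytes(cur)  # delta is no smaller than the frame
    return "delta", bytes(out)

def decode_changes(prev, payload, width):
    """Apply a delta payload to the previous frame and return the new frame."""
    frame = bytearray(prev)
    for x, y, r, g, b in PIXEL.iter_unpack(payload):
        i = (y * width + x) * 3
        frame[i:i + 3] = bytes((r, g, b))
    return bytes(frame)
```

At 7 bytes per changed pixel against 3 bytes per pixel for a raw frame, the delta only wins while fewer than roughly 3/7 of the pixels have changed; against a compressed full picture the break-even point would be much lower and would have to be measured.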

Still, the answer to “is it worth it?” is unknown to me. But I have to improve how it currently works.

That’s actually what I’m after, but in real-time (or almost).
Thanks.

Douglas_Handy · March 10, 2021, 6:48pm

It seems like this problem may have similarities to how remote screen sharing tries to optimize sending just changed portions of a screen. I think VNC refers to it as Remote Framebuffer (RFB). So you may want to investigate open source implementations of it for ideas.

Some generic info on RFB protocol on Wikipedia.
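
For orientation, an RFB FramebufferUpdate message is a small header followed by a list of rectangles, each with its own position, size and encoding type. The Python sketch below packs such a message using the Raw encoding as laid out in RFC 6143; the function name `framebuffer_update` is hypothetical, and the pixel bytes are assumed to be already converted to whatever pixel format the client negotiated.

```python
import struct

RAW_ENCODING = 0  # encoding-type value for Raw in RFC 6143

def framebuffer_update(rects):
    """Build a FramebufferUpdate from (x, y, w, h, pixel_bytes) tuples.

    pixel_bytes must already be in the negotiated pixel format (assumption).
    """
    rects = list(rects)
    # message-type (0), one padding byte, number-of-rectangles; big-endian.
    msg = bytearray(struct.pack(">BxH", 0, len(rects)))
    for x, y, w, h, pixels in rects:
        # x, y, width, height as unsigned 16-bit; encoding-type as signed 32-bit.
        msg += struct.pack(">HHHHi", x, y, w, h, RAW_ENCODING)
        msg += pixels
    return bytes(msg)
```

The point for this thread is the shape of the message: several changed rectangles travel in one message, so no back-and-forth is needed per rectangle.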

Arnaud_N · March 10, 2021, 6:58pm

That’s exactly my use case, actually, with my own-made project.

Not that I can’t do that (though I’m not sure), but reading and understanding languages other than Xojo and AppleScript will, at the very least, take me a long time. Hopefully, comments will give me clues. I also hope the solutions found there are doable in Xojo at all (the Wikipedia page you mention, which I’ve started to read, mentions it’s done at the frame buffer level; that may not be so simple in Xojo).

Thank you, I’ll take a look.
Still interested in quicker-to-implement ways, though.

Douglas_Handy · March 10, 2021, 7:19pm

Well, my only experience with RFB was a few years ago. Had a charter school where some of my grandkids went and each classroom had an Epson projector the teacher could use to display a laptop image or other sources on the classroom whiteboards. Epson also had a network program that could pre-empt the classroom displays and show a static image, intended for things like showing emergency alerts or whatever from premade images.

The school wanted to show real-time screen updates across all rooms with the sequence of vehicles arriving to pick up students at the end of the day, but Epson had no software to do anything but static images. So I used WireShark to reverse engineer the traffic being sent to projectors by their program, and had to dig into how they sent images. That ultimately led to discovering RFB.

In the end we used people outside with tablets updating Google docs with codes of students to be picked up, and in the office used a Google doc to display a real-time list built from codes keyed outdoors. Then I would capture that office display using MBS, convert the JPEG to a string (again using MBS), then convert that to RFB buffers with Epson style command prefixes to update each classroom display.

But I didn’t bother to really optimize it. For their purposes, I only needed to update the display every few seconds, so I just resent the entire screen in an RFB buffer.

Bottom line is I suspect RFB is what you want to implement here, but I didn’t do enough research at the time to do proper RFB encoding. I just grabbed the entire window (or screen) and treated it as an RFB rectangle regardless of changes from the previous transmission.

Then I sent via UDP multicast so dozens of projectors could pick it up at once with minimal network traffic.

This allowed each classroom to have a scrolling list of the sequence in which kids should exit their classrooms to line up in an orderly way, without masses of students in the hallways.

In your case you want quicker updates, so I was just suggesting that finding explanations of RFB algorithms or open source implementations for code review may be much better than trying to guess at how to optimize this. That seems to be what RFB is all about, so presumably others have already done that research and algorithms exist. Just not as likely in Xojo.

Arnaud_N · March 10, 2021, 7:32pm

Nice story.

In my case, the “protocol” is already implemented and works (the app is actually being used in some schools, as someone told me). It’s just me who would like to optimise it.

I’ve already started reading the RFB implementation (not that much yet) and will do my best to translate ideas into my implementation (so, Xojo language).

I don’t know how this has evolved, but Xojo was known to be slow at picture manipulations. I’d like to do as much as possible with the MBS plugin, which would be faster, but this would involve passing pictures back and forth between Xojo and the plugin (assuming I find the necessary calls in the plugin in the first place, which I doubt). Would these round trips slow things down significantly?
Thank you.

Douglas_Handy · March 10, 2021, 7:34pm

Ironic we both approach from a school standpoint.

Douglas_Handy · March 10, 2021, 7:45pm

Well, calling the plugins and passing images will be far faster than the network transfer of the images (or RFB-optimized segments of them). So I wouldn’t worry about the overhead of calling MBS. Their functions are a treasure trove of useful stuff.

In my case I used ScreenshotMBS() if I wanted the whole display. Or if only wanting to send part of that, used REALbasic.Rect.Intersection() to determine which parts overlapped what I wanted, Picture.CopyPixelFastMBS() to crop out what I wanted, and JPEGStringToPictureMBS() to convert the image to a string I could use in my network calls.

But in your use case, you want to find an algorithm description or code implementations of how RFB compares stuff and optimizes which rectangles to crop and send. I didn’t have to optimize so just grabbed the desired screen area and resent every few seconds.

Edit: Oops, it is PictureToJPEGStringMBS() which is described here.
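
The intersect-then-crop steps described above can be sketched in a language-neutral way. The Python below is an illustration only, not the MBS or Xojo API: rectangles are plain (x, y, w, h) tuples, the frame is assumed to be a row-major byte buffer, and the names `intersect` and `crop` are hypothetical.

```python
def intersect(a, b):
    """Intersection of two (x, y, w, h) rectangles, or None if they do not overlap."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2 - x1, y2 - y1)

def crop(frame, width, rect, bpp=4):
    """Copy the bytes of rect out of a row-major frame with bpp bytes per pixel."""
    x, y, w, h = rect
    stride = width * bpp  # bytes per image row
    rows = []
    for row in range(y, y + h):
        start = row * stride + x * bpp
        rows.append(frame[start:start + w * bpp])
    return b"".join(rows)
```

The cropped bytes are what would then be compressed (as JPEG or PNG) and sent along with the rectangle's coordinates.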

Arnaud_N · March 10, 2021, 8:02pm

Granted.
In my case, I’d never propose something so “advanced” for firms or where production is essential. I’m too much of a coward for that.

But, yes, both projects are close.

While my program actually works, it has some flaws: for example, when Windows shows a UAC dialog (on a separate, protected desktop), the connection is lost. I’d have to make a service, but (1) I’ve never written one and (2) some functions, like sending a message or chatting, wouldn’t work from a service, which can’t show windows or other UI. I’ve managed to check on launch and ask for the user’s authorisation to turn off “separate desktop for UAC prompt”, but this setting broke some years ago (in current Windows versions, changing it subtly breaks the user’s ability to run administrative tasks, showing weird error messages; a reinstall is usually required, so it’s still a problem).
I’m occasionally thinking about improving it (as I do these days), but there are specific problems where I’m currently puzzled.

Hmm… I was actually comparing handling the pictures purely in code versus using plugins. Sharing the images over a network is still mandatory and I don’t plan (yet) on changing how this is implemented.

I’m doing almost the same as you. The only differences are that I don’t crop and I send as PNG.

Speaking of which, why didn’t you use ScreenshotRectMBS for parts of the screen?

Thank you. I feel reassured someone already tried this kind of thing with Xojo.

Douglas_Handy · March 10, 2021, 8:11pm

Probably due to not noticing it at the time. That project was a freebie volunteer thing for the school. Then I realized that I didn’t necessarily want to send the whole office display and let students see whatever was on that display if the window was not maximized. I also tracked whether the active window changed and quit sending updates until the doc was the current window again, so classrooms still saw whatever portion of the list was viewable before the office user returned to the doc.

Sometimes you just find things that work and go with it, especially on freebie projects.

Greg_O_Lone · March 10, 2021, 8:13pm

If you don’t care about the alpha channel, you can use it to your advantage…

When comparing the two images and setting the common areas to black, also set the alpha at that position to 255 (fully transparent) and then save the diff in PNG format. PNG will compress-away the runs of black pixels and the transmission frames will be relatively small.

The advantage is that on the receiving end, all you need to do is draw the transparent PNG on top of the previously received “frame” so it should be relatively fast.

You’ll probably want to send a full reference frame periodically in case things get out of sync.
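
The transparent-diff idea can be sketched as follows. This is a Python illustration under assumptions, not Greg's or Xojo's code: frames are raw RGBA buffers of opaque screenshots, alpha follows PNG's own convention (0 is transparent, whereas the post above uses the value 255 for transparent), and `zlib` stands in for the DEFLATE step that PNG applies to its pixel data. All function names are hypothetical.

```python
import zlib

def make_diff(prev, cur):
    """Keep changed RGBA pixels; unchanged ones become transparent black."""
    if len(prev) != len(cur):
        raise ValueError("frames must have identical dimensions")
    out = bytearray(len(cur))  # all zero: black with alpha 0 (transparent)
    for i in range(0, len(cur), 4):
        if prev[i:i + 4] != cur[i:i + 4]:
            out[i:i + 4] = cur[i:i + 4]
    return bytes(out)

def apply_diff(prev, diff):
    """Receiver side: draw the non-transparent pixels over the previous frame."""
    frame = bytearray(prev)
    for i in range(0, len(diff), 4):
        if diff[i + 3] != 0:  # alpha 0 means "unchanged, keep old pixel"
            frame[i:i + 4] = diff[i:i + 4]
    return bytes(frame)

def compressed_size(buffer):
    """Size after DEFLATE, as a rough proxy for the PNG payload size."""
    return len(zlib.compress(buffer))
```

Long runs of identical transparent pixels compress to almost nothing, which is why a diff frame with few changes stays small even though it has the full picture's dimensions.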

Arnaud_N · March 10, 2021, 8:25pm

Oh yes, clever!
I didn’t know that a PNG with transparent parts would have a smaller data representation. I can’t wait to try, as it’s an ideal solution (I think, as of now).

Granted.

Yes, that’s planned; I have yet to see the results with various intervals before choosing the value.

Thank you!
I’m marking this as the solution since, with your information about how PNG compression works, I’m confident I’ll get it working.