Multi-threaded Beowulf app for fast HEVC H.265 video compression

I’m wondering if it’s even possible to write an effective multi-threaded Xojo app running on an expandable 8-node Raspberry Pi 4 Beowulf cluster to compress HEVC (H.265) video files as well as or better than a dedicated GPU on a PC or Mac, assuming each node’s multiple cores are used as well.

Is it even possible? How difficult would it be? Is someone already working on it? Would it be able to use ffmpeg or VideoCore? Or is there a better RasPi solution already?

Thanks! :smiley:

You could certainly start up multiple instances of ffmpeg on the different cores. I’m not completely familiar with how a Beowulf cluster hands out processes. Even with that, you couldn’t do the work in Xojo code alone, as its threads do not take advantage of multiple cores unless you start up more processes and talk to them. I would have no idea how to even go about doing H.265 compression in my own code anyway though :wink:

Having an ffmpeg process for each available node or core working on a specific part of the file to be encoded would probably work. Xojo could easily be the wrapper program that managed all that. I’m not sure how you would stitch them back together at the end; I don’t think the file format would let you just concatenate them, but it might. One more ffmpeg pass at the end might be necessary to put the results back together into a single file, but since it would be doing that work without any compression, just pass-through, it would be as fast as the disk performance on the thing would let it.
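To make that concrete, here’s a rough Python sketch of what the wrapper’s job might look like (the same orchestration could live in a Xojo app using its Shell class). The filename, segment boundaries, and output names are all made up for illustration; the ffmpeg flags (`-ss`/`-to` for slicing, `libx265` for HEVC, and the concat demuxer with `-c copy` for a lossless final stitch) are real.

```python
# Sketch: build one encode command per node/segment, plus a final
# lossless concat pass. Paths and segment times are placeholders.

def encode_cmd(src, start, end, out):
    # Re-encode one slice of the source to HEVC with libx265;
    # audio is stream-copied unchanged.
    return ["ffmpeg", "-ss", str(start), "-to", str(end), "-i", src,
            "-c:v", "libx265", "-c:a", "copy", out]

def concat_cmd(list_file, out):
    # Stitch the encoded parts back together without re-encoding
    # (stream copy), using ffmpeg's concat demuxer.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", out]

# One command per segment; each would be dispatched to a different
# Pi node and run concurrently.
segments = [(0, 60), (60, 120), (120, 180)]
cmds = [encode_cmd("movie.mp4", s, e, f"part{i:03d}.mp4")
        for i, (s, e) in enumerate(segments)]
```

Each command in `cmds` would be handed to a node (via SSH, a job queue, whatever the cluster uses) and the concat pass run once they all finish.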


Thanks for your input. It looks like we’re on the same page. I wasn’t sure if ffmpeg had been compiled to work on the ARM processor in the RasPi. If it has, that would put this idea closer to possible in my mind.

As for breaking up a video file into individual segments, I think you are correct in assuming that simple concatenation would not work unless, perhaps, the file were segmented at specific locations based on the format, making sure that keyframes, checksums, and the like are not interrupted.
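For what it’s worth, ffmpeg’s segment muxer handles that concern for you when stream-copying: with `-c copy` it can only cut at keyframes, so each chunk automatically starts on one (chunks just end up slightly longer than the requested time while it waits for the next keyframe). A minimal sketch, with a placeholder filename, in the same Python style:

```python
# Sketch: split the source into ~60 s chunks without re-encoding.
# With stream copy (-c copy) the segment muxer can only cut at
# keyframes, so every chunk begins on one automatically.
split_cmd = ["ffmpeg", "-i", "movie.mp4", "-c", "copy",
             "-f", "segment", "-segment_time", "60",
             "-reset_timestamps", "1", "part%03d.mp4"]
```

The `-reset_timestamps 1` flag restarts each chunk’s timestamps at zero, which keeps the later per-chunk encodes simple.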

I’m just fascinated with the possibility of determining just how many Raspberry Pi nodes would be necessary to effectively process a full-length movie video file in HEVC format. As it stands, my mid-2016 MacBook Pro with Nvidia GPU takes nearly an hour and my 2014 Mac mini with integrated Intel video takes about six times as long!

Since HEVC, like MPEG-4, is predictive (it encodes differences between frames), I would guess the best you could do is split the compression at key frames as boundaries and then reassemble things from that point on.

Within each process doing the encoding, I don’t know how much can be done in parallel.


Thanks. I agree. My experience with breaking up video files into smaller segments is limited, but I have run across the issue of a program requiring the cut be made at a key frame, so I’m guessing that’s the best (and maybe only) approach. Now the question becomes, how the heck do you identify a key frame in a video file? Oy!
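One answer to that question: ffprobe (which ships with ffmpeg) can list keyframes directly. Asking it for each packet’s `pts_time` and `flags` yields one CSV line per packet, with a leading `K` in the flags marking a keyframe. A sketch, again with a placeholder filename; the parsing function is my own illustration, not part of ffmpeg:

```python
# Sketch: find keyframe timestamps with ffprobe, then parse them.
# ffprobe prints one "pts_time,flags" pair per packet; a leading
# "K" in the flags column marks a keyframe.
probe_cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
             "-show_entries", "packet=pts_time,flags",
             "-of", "csv=p=0", "movie.mp4"]

def keyframe_times(ffprobe_csv):
    """Return the pts_time of every packet flagged as a keyframe."""
    times = []
    for line in ffprobe_csv.splitlines():
        parts = line.split(",")
        if len(parts) >= 2 and parts[1].startswith("K"):
            times.append(float(parts[0]))
    return times

# Made-up sample of ffprobe's output: two keyframes, one delta frame.
sample = "0.000000,K__\n0.040000,___\n2.000000,K__"
```

Those timestamps would then become the `-ss`/`-to` boundaries handed to each node.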

It’s been so long since I had to do any of this. The last time I worked on encoding, the fastest darned machine was a Power Mac running the PowerPC chips; those vector units just smoked everything else.

Ahhh, the good old PowerPC days. I’m a technical writer working primarily on military contract jobs, and I was recently asked to write up a single-page justification for deploying Apple laptops instead of PCs. I pondered this for days. I could have easily tried to fall back on product quality, service, closed architecture, etc. But in the end, my client was concerned about longevity. So I wrote four paragraphs on the tenacity of Apple in choosing the winning processor at every step of its evolution over the past forty-plus years. They started with the 6502, then the 68000, then the PowerPC, then Intel, and now they’re steadily moving toward their own ARM processors. You gotta admire that kind of versatility in a market that so easily sloughed off big players like Atari, Commodore, and Texas Instruments, to name a few. Meanwhile, PCs have used only Intel (or compatible) processors since day one. Bor-ing!

I ran across a comment on an article about Raspberry Pi clusters insisting that the math just doesn’t add up.

So, I might be trying to ice skate uphill on this one.