Shrinking large RTF files containing images

Hi there,

I’m posting this here in the hope it might help someone in future, either in the Xojo community or elsewhere. It’s a follow-up to a post I made on the NUG in 2012, where Kem was very helpful (as ever!).

When Word saves an RTF file that contains a picture, it also saves a second copy of that picture in a backwards-compatible form so that older versions of Word can still open the file. That copy is stored as a bitmap, and as a result the file can end up around 20 times larger than it needs to be.

To fix this, I use two regex passes to strip out the backwards-compatible parts. I read the RTF in as a string (sourceText below), then apply the following:

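[code]// sourceText holds the whole RTF file, read in as a String
Dim rx As New RegEx
rx.ReplacementPattern = ""
rx.Options.ReplaceAllMatches = True

// Pass 1: remove the {\nonshppict{...}} fallback groups (JPG images)
rx.SearchPattern = "(?msi-U)\{\\nonshppict\R?\{[^}]+\}\}"
Dim outputFileContent As String = rx.Replace(sourceText)

// Pass 2: remove the {\shprslt ... \macpict ...}\par} fallback groups (PNG images).
// Run it on the result of pass 1, not on sourceText, or pass 1's work is thrown away.
// Xojo strings don't use backslash escapes, so each RTF backslash is just \\ here.
rx.SearchPattern = "\{\\shprslt[^}]+\\macpict[^}]+\}\\par\}"
outputFileContent = rx.Replace(outputFileContent)[/code]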

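The first pass strips out the fallback bitmaps Word adds for JPG images (the \nonshppict groups). The second pass does the same for PNG images (the \shprslt groups containing a \macpict picture).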

After that I save outputFileContent back over the original file. My test file, which contained lots of PNGs, shrank from 34MB to 1.4MB.
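For anyone who wants to try the two patterns outside Xojo, or batch-process a folder of files, here is a rough Python port of the same read, strip, save sequence. It is a sketch, not the code used above: the function names are hypothetical, Python's re has no \R so an optional CR/LF/CRLF is spelled out instead, and the file is handled as raw bytes so the RTF's text encoding doesn't matter. Try it on a copy of your file first.

```python
import re
import sys
from pathlib import Path

# Pass 1, ported from the Xojo pattern: {\nonshppict{...}} groups.
# \R is replaced by an explicit optional line break (CRLF, CR or LF).
NONSHPPICT = re.compile(rb"\{\\nonshppict(?:\r\n|\r|\n)?\{[^}]+\}\}", re.IGNORECASE)

# Pass 2, ported from the Xojo pattern: {\shprslt ... \macpict ...}\par} groups.
SHPRSLT_MACPICT = re.compile(rb"\{\\shprslt[^}]+\\macpict[^}]+\}\\par\}", re.IGNORECASE)


def strip_fallback_pictures(rtf: bytes) -> bytes:
    """Hypothetical helper: remove both kinds of backwards-compatible picture group.

    The second pass runs on the output of the first, so both removals are kept.
    """
    rtf = NONSHPPICT.sub(b"", rtf)
    return SHPRSLT_MACPICT.sub(b"", rtf)


def shrink_rtf_file(path: Path) -> tuple[int, int]:
    """Hypothetical helper: rewrite the file in place, return (bytes before, bytes after)."""
    original = path.read_bytes()
    shrunk = strip_fallback_pictures(original)
    path.write_bytes(shrunk)
    return len(original), len(shrunk)


if __name__ == "__main__":
    # Usage: python shrink_rtf.py file1.rtf file2.rtf ...
    for name in sys.argv[1:]:
        before, after = shrink_rtf_file(Path(name))
        print(f"{name}: {before} -> {after} bytes")
```

The patterns only work when the fallback picture group contains no nested braces before its hex data, which matches the files described above but is an assumption for other RTF producers.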

Hope this is of use to someone sometime in the future.