DynaPDF Optimize command

As you may know, we have a great Optimize command for DynaPDF. You can use it on PDF documents with the MBS Xojo DynaPDF Plugin. Over time the command has become more and more powerful, so we would like to describe some of its features here.

In general the function rebuilds the content stream of all pages, templates, patterns, annotations, and form fields. This may remove errors in the content stream and produce a consistent document.

The default flag value (0) just rebuilds the content streams and fixes errors. You may specify the InMemory flag so that the changes are made in memory and the PDF is not flushed to the current output. Normally you will not notice the difference, but if you want to continue writing to the PDF after optimizing, the InMemory flag is needed.

Scale images

The Optimize function can reduce the size of PDF files. If you pass the ScaleImages flag, all images are checked. You can define a minimum and a target resolution for images. Only pictures with at least the minimum resolution are processed, so small items such as icons are skipped and only pictures with a significant resolution are touched. DynaPDF scales these images down to the target resolution and compresses them with the compression algorithm you specify, usually JPEG. If the resulting picture is smaller in size, we store it; otherwise we keep the original image. The reason is that a 1-bit TIFF image, for example, can often be smaller than a reduced-resolution JPEG.
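To make the selection logic above more concrete, here is a minimal Python sketch of the decision steps. It is not DynaPDF code: the `PlacedImage` type and all function names are hypothetical, and it assumes the resolution is measured from the pixel width and the width at which the image is drawn on the page.

```python
from dataclasses import dataclass


@dataclass
class PlacedImage:
    # Hypothetical record, not a DynaPDF type.
    width_px: int           # pixel width of the stored image
    placed_width_pt: float  # width on the page in points (1/72 inch)
    stored_bytes: int       # size of the compressed image data


def effective_dpi(image: PlacedImage) -> float:
    """Resolution of the image at the size it is drawn on the page."""
    return image.width_px / (image.placed_width_pt / 72.0)


def should_downscale(image: PlacedImage, min_dpi: float) -> bool:
    """Only images at or above the minimum resolution are candidates."""
    return effective_dpi(image) >= min_dpi


def target_width_px(image: PlacedImage, target_dpi: float) -> int:
    """Pixel width that gives the target resolution at the same placed size."""
    return max(1, round(image.placed_width_pt / 72.0 * target_dpi))


def keep_recompressed(original_bytes: int, new_bytes: int,
                      size_check: bool = True) -> bool:
    """Store the new image only if it is smaller, unless the check is off."""
    return (not size_check) or new_bytes < original_bytes


photo = PlacedImage(width_px=3000, placed_width_pt=360.0, stored_bytes=900_000)
icon = PlacedImage(width_px=32, placed_width_pt=32.0, stored_bytes=400)
```

With a minimum of 300 dpi, the photo (600 dpi at 5 inches wide) qualifies and the icon (about 72 dpi) is skipped.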

You can pass the SkipMaskedImages flag to skip masked images, as JPEG compression may not work well with pictures where a specific color marks transparency. The check whether the new picture is smaller than the original image can be disabled with the NoImageSizeCheck flag.

If you would like images compressed with JBIG2, you can use the CompressWithJBIG2 flag. This can drastically reduce the file size, since JBIG2 compression achieves much higher compression rates than any other 1-bit image filter that PDF supports. The JBIG2 compression filter in DynaPDF is lossless, which means the original image quality is preserved. It combines well with the ConvertGrayTo1Bit flag discussed below.

Link Names

Links in a PDF often carry names taken from the original content. For example, our documentation uses the function names as link names. The NewLinkNames flag renames the internal link names with a running counter. This saves a few bytes for each link, which can add up with thousands of link names.
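As a rough illustration of the idea, the following Python sketch maps each original link name to a short counter-based name and estimates the bytes saved per definition. The naming scheme (plain decimal counters) and the function names are assumptions for illustration; the names DynaPDF actually generates are not described here.

```python
def rename_link_names(names):
    """Map each distinct link name to a short name from a running counter."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = str(len(mapping) + 1)
    return mapping


def bytes_saved(mapping):
    """Bytes saved if every name is written once (references save more)."""
    return sum(len(old.encode("utf-8")) - len(new.encode("utf-8"))
               for old, new in mapping.items())


links = ["DynaPDFMBS.Optimize", "DynaPDFMBS.FlattenForm", "DynaPDFMBS.Optimize"]
mapping = rename_link_names(links)
saved = bytes_saved(mapping)
```

Every place that refers to a renamed link has to be updated with the same mapping, which is why the savings grow with the number of links.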

Invisible Paths

Some drawing paths end with a no-op operator: the path is defined but never painted. Instead of keeping such a path, we can remove its definition. This happens with a lot of tools that create PDFs, as paths are defined automatically. For example, you may have a rectangle that groups items in the layout, and the rectangle is not visible because it has no stroke and no fill. The resulting path is there, but produces no output.
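A simplified sketch of such a check in Python, working on a content stream that has already been parsed into (operands, operator) pairs. It relies on the standard PDF operators: `n` ends a path without painting it, and `W` / `W*` turn the path into a clipping path, which must be kept. The function name and data layout are hypothetical, and DynaPDF's real implementation is certainly more involved.

```python
PATH_CONSTRUCTION = {"m", "l", "c", "v", "y", "h", "re"}
CLIP = {"W", "W*"}
PAINT = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"}


def remove_invisible_paths(ops):
    """Drop paths that end with 'n' and do not set a clipping path."""
    result = []
    pending = []      # operators of the path currently being built
    has_clip = False
    for operands, operator in ops:
        if operator in PATH_CONSTRUCTION:
            pending.append((operands, operator))
        elif operator in CLIP and pending:
            pending.append((operands, operator))
            has_clip = True
        elif operator in PAINT and pending:
            invisible = operator == "n" and not has_clip
            if not invisible:
                result.extend(pending)
                result.append((operands, operator))
            pending = []
            has_clip = False
        else:
            # Any other operator: flush what we have and pass it through.
            result.extend(pending)
            pending = []
            has_clip = False
            result.append((operands, operator))
    result.extend(pending)
    return result


stream = [
    ((0, 0, 100, 50), "re"), ((), "n"),               # no stroke, no fill
    ((10, 10, 20, 20), "re"), ((), "f"),              # filled rectangle
    ((0, 0, 200, 200), "re"), ((), "W"), ((), "n"),   # clipping path
]
cleaned = remove_invisible_paths(stream)
```

Only the first rectangle is removed; the filled rectangle and the clipping path stay in the stream.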

Flatten layers

If you pass the FlattenLayers flag, all layers get flattened and invisible layers may get removed. As some layers may have content hidden behind other layers, the invisible paths check above may kick in and remove content.

See also the FlattenAnnots and FlattenForm functions.

Delete Stuff

The PDF may contain additional data that can safely be removed. This includes private data from applications like InDesign or Adobe Illustrator, which store details for later editing inside the PDF as a BLOB. We can skip this data and get a smaller PDF. Usually a PDF viewer ignores any item it doesn’t understand.

We can remove embedded thumbnails, as the viewer will generate those if needed. The same goes for alternative representations of images. For example, an application may include both a CMYK and an RGB representation, and we can remove one (the one marked as alternative).

Convert to colorspace

Besides optimization, we can do some extra operations like converting colors to other color spaces. This includes converting to grayscale, RGB or CMYK. If you want a grayscale PDF, converting colors here may be useful. If you have to send a PDF to a printer, you can convert to CMYK, although usually you don’t need this, as the printer will rasterize the PDF in the CMYK color space anyway. Converting CMYK content to RGB may reduce size, as images then have only 3 instead of 4 color channels.
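The size argument can be shown with a small Python sketch. It uses the textbook naive CMYK to RGB formula, which is only an assumption for illustration; a real converter such as DynaPDF would normally work with color profiles and produce different values. The function names are hypothetical.

```python
def cmyk_to_rgb(c, m, y, k):
    """Naive conversion without color management; components in 0.0..1.0."""
    r = (1.0 - c) * (1.0 - k)
    g = (1.0 - m) * (1.0 - k)
    b = (1.0 - y) * (1.0 - k)
    return (r, g, b)


def raw_image_bytes(width, height, channels, bits_per_channel=8):
    """Uncompressed size of an image with the given number of channels."""
    return width * height * channels * bits_per_channel // 8


cmyk_size = raw_image_bytes(1000, 1000, channels=4)
rgb_size = raw_image_bytes(1000, 1000, channels=3)
```

Before compression, the RGB version of the same image needs a quarter less data than the CMYK version.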

The ConvertAllColors flag also converts Separation, DeviceN and NChannel color spaces to RGB or CMYK colors. This usually means you lose color information, as the alternate colors are used instead, and those are specified in RGB, CMYK or Lab color spaces. For example, a printer may have cyan, magenta, yellow and black inks plus special colors like gold and silver. An area marked with a special color like gold is printed with the gold ink to produce the shiny effect on the paper. Replacing those colors with their alternate CMYK color removes that effect, and the area is then printed as a yellowish mixture of the regular inks.

Finally, the ConvertGrayTo1Bit flag can be used to convert colors to black and white. You can additionally pass the UseOtsuFilter flag to request the Otsu filter for this conversion.
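For readers who have not met it before: Otsu's method picks the black/white threshold that best separates the dark and the bright pixels of an image. Below is a plain Python sketch of the classic algorithm on 8-bit gray values. It only illustrates the technique; how DynaPDF implements its Otsu filter internally is not described here, and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return threshold t (0..255) maximizing the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0     # number of pixels with value <= t
    sum_bg = 0   # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def to_1bit(pixels, threshold):
    """Pixels above the threshold become white (1), all others black (0)."""
    return [1 if p > threshold else 0 for p in pixels]


gray = [10] * 50 + [200] * 50   # dark text on a bright background
t = otsu_threshold(gray)
bits = to_1bit(gray, t)
```

The threshold adapts to each image, which usually works better on scans than a fixed 50% cut.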

Convert Text to Outlines

This was added in the last few weeks: DynaPDF can now convert text to outlines. This removes fonts and text information from the PDF and leaves the content as vector graphics. A human can still read it, but a computer needs OCR to get the text back.

Optionally you can pass the ConvNonEmbFontsOnly flag to limit the conversion of text to outlines to non-embedded fonts. This way a PDF referencing fonts installed on your computer gets those converted to outlines. This is great when you email your PDF to a print shop that doesn’t have the required font: your PDF still prints well.

Annotations and form fields are not affected by the conversion. In order to consider these objects too it is possible to flatten all annotations and form fields before calling Optimize. See FlattenAnnots and FlattenForm functions.

Usage

The flags above can be combined in various ways. In general, Optimize should be called at the end of your PDF processing, just before outputting the PDF document. Our plugin may render a preview and close the PDF file.
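Combining flags works like any bit mask: each flag is one bit and you join them with a bitwise OR. The Python sketch below shows the principle only. The bit values are made up for illustration and are not the real DynaPDF constants; please take the actual constants from the plugin.

```python
from enum import IntFlag


class OptimizeFlags(IntFlag):
    # Hypothetical bit values for illustration only, not DynaPDF's constants.
    DEFAULT = 0
    IN_MEMORY = 1
    SCALE_IMAGES = 2
    SKIP_MASKED_IMAGES = 4
    NEW_LINK_NAMES = 8
    FLATTEN_LAYERS = 16


# Scale images, leave masked images alone and shorten the link names.
flags = (OptimizeFlags.SCALE_IMAGES
         | OptimizeFlags.SKIP_MASKED_IMAGES
         | OptimizeFlags.NEW_LINK_NAMES)

scales_images = OptimizeFlags.SCALE_IMAGES in flags
flattens_layers = OptimizeFlags.FLATTEN_LAYERS in flags
```

The combined value is then passed as a single integer argument.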

License

To use the Optimize function, you need a DynaPDF license at the Pro or Enterprise level. The Starter and Lite licenses are not sufficient.

Without a license key you can test this feature and see how it works.

More

For more details, please check the Optimize function in the DynaPDF manual. For Xojo-specific details, please check the Optimize function in the DynaPDFMBS class.

Also check the DuplicateCheck flag for the SetImportFlags2 function to enable the duplicate check. This helps to reduce the file size, as duplicate fonts, images, templates and extended graphics states are replaced with references to the first one.
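The principle behind such a duplicate check can be sketched in a few lines of Python: identical objects are detected by a hash of their data, and every later copy becomes a reference to the first one. This is an illustration under the assumption that objects are compared byte for byte; the function name is hypothetical and DynaPDF's actual comparison is not described here.

```python
import hashlib


def deduplicate(objects):
    """Return (unique_objects, references) for a list of byte strings.

    references[i] is the index in unique_objects that object i points to.
    """
    first_index = {}
    unique = []
    references = []
    for data in objects:
        digest = hashlib.sha256(data).digest()
        if digest not in first_index:
            first_index[digest] = len(unique)
            unique.append(data)
        references.append(first_index[digest])
    return unique, references


imported = [b"logo image data", b"photo image data", b"logo image data"]
unique, references = deduplicate(imported)
```

A logo imported on every page of a merged document is then stored only once.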

Please do not hesitate to contact us with your questions.

How come I have overlooked this feature? That’s the sole issue with your plugins: too many options :slight_smile:

An adventure in itself. :grinning:

:slight_smile: Whenever I’m going nuts working through your huge documentation, I tell myself how lucky I am: perhaps I don’t find what I’m looking for, but at least I don’t have to maintain this monster :slight_smile:

Talking about performance: I’m currently building something where I have to parse thousands (actually 3296) of web pages and then update a Postgres DB. It started as a quick and dirty project. By mistake I used HttpSocket at the beginning. Runtime: 59 minutes. The new API2 UrlConnection brought this time down significantly to some 40 minutes.

Today I replaced my logic with CurlMBS and your SQL Plugin. The whole “thing” now runs through in 20 minutes! That’s impressive.

Now available with 21.0 release:

MonkeyBread Software Releases the MBS Xojo Plugins in version 21.0
