PDF/A Support

I'm looking for a library, either something that works directly with Xojo or an external library, that can generate valid PDF/A files. Specifically, it needs to generate files that pass veraPDF verification.

My preference would be for a non-Xojo library if possible. The use case is a microfilm scanner that does almost all the heavy lifting on a Raspberry Pi, but I’m not using Xojo for that. My desktop Xojo app is used to set up and monitor the scanner, which runs independently of the desktop app.

Ideally I’d generate the PDF/A file directly on the Pi and just be done with it. If that’s not possible, the fallback would be to capture to TIFF and then export from the desktop app to PDF/A. Thus the library-first preference.

MBS Xojo DynaPDF Plugin can do PDF/A.

For a new PDF, you can use Lite edition.
For conversion of existing PDFs, you need Pro + PDF/A converter.

My vns fpdf library works on Linux, and you can generate PDF files with it.
To build PDF/A files you need to restrict yourself to certain types of PDF commands.
I’m building a paid premium PDF/A module for the library, but it’s not ready yet.
Maybe you can already try creating PDF files on the Pi using my libraries. I have not tested them on a Pi.

I’m still mapping this out. Doing the work on the Pi is going to require some serious testing to ensure that the PDF creation isn’t bogging down the job of scanning. I don’t think it will but I’ve only just gotten the scanning part working reliably this week.

So my fallback approach at the moment is to have the Pi just make a multi-page TIFF file, and because the scanner is connected to the desktop app and requires the desktop app to run, that app would pull the TIFF over the network when the scan is complete and would export it out to PDF/A locally. This is starting to feel like the better workflow, but again, will require some testing. It would also allow for OCR at export time, which is probably better done on a desktop than on the Pi.

With DynaPDF, the licensing is one-time (plus annual maintenance) and developer-side only, correct? So no per-seat licenses? Also, the machine running this can’t phone home to check for license keys, so I need to make sure that’s not an issue.

How is OCR handled with DynaPDF, or is it? The desktop client will likely be Windows, since that’s more widely used than Mac in the world where this machine will exist.

Thanks - I’ll check it out. What’s your timeframe for PDF/A?

Because I heard about PDF/A here for the first time (and I’m certainly not alone), here are two pieces of information about it:

Google:
What is PDF/A? PDF/A is an archival format of PDF that embeds all fonts used in the document within the PDF file. This means that a user of your file will not have to have the same fonts that you used to create the file installed on their computer to read the file.

and from a gov site pdf file: How Do I Create a PDF-A file

You can download our plugin and try it.
The license is a one-time purchase per developer, plus optional maintenance for updates.

DynaPDF doesn’t phone home.
Please try it.

Not sure what you’re looking for in OCR, but the MBS Xojo Plugins have several OCR solutions:
the TessEngineMBS class with Tesseract, WindowsOCREngineMBS for Windows, and VNRecognizeTextRequestMBS for macOS.

It’s not just about embedding fonts. An image-only PDF, for example, doesn’t need fonts embedded at all so it can comply with PDF/A-1b or PDF/A-2b. But when you start doing OCR then you need to embed fonts. The standard also has requirements for stuff like embedded ICC profiles, etc. There’s actually quite a bit to it. I think the spec is something like 80 pages long.

We use an expensive commercial scanner for microfilm right now and it generates PDF/A files that fail verification, forcing us to run them through software that corrects that. Not a major issue but super annoying. Anyone can say a file is PDF/A, but it doesn’t mean it’s actually a valid file.
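To illustrate the gap between claiming and conforming: a PDF/A file announces its level through the `pdfaid:part` and `pdfaid:conformance` properties in its XMP metadata. Below is a minimal Python sketch that reads that claim. It assumes the XMP packet is stored uncompressed in the file, and the function name `claimed_pdfa_level` is just something I made up for illustration. It only tells you what the file says about itself, not whether that is true.

```python
import re

# XMP identification properties used by PDF/A. They can appear as
# elements (<pdfaid:part>2</pdfaid:part>) or as attributes (pdfaid:part="2").
_PART = re.compile(rb'pdfaid:part(?:>|=["\'])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|=["\'])\s*([A-Za-z])')


def claimed_pdfa_level(pdf_bytes: bytes):
    """Return the PDF/A level a file claims (e.g. '2B'), or None.

    Hypothetical helper: it only reads the claim from uncompressed XMP
    metadata and performs no conformance checking at all.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None  # no PDF/A identification found
    level = part.group(1).decode("ascii")
    conf = _CONF.search(pdf_bytes)
    if conf is not None:
        level += conf.group(1).decode("ascii").upper()
    return level
```

Usage would be something like `claimed_pdfa_level(open("scan.pdf", "rb").read())`. A file can return `'1B'` here and still fail validation, which is exactly the problem with our current scanner.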

The standard is to check it against veraPDF, which is a validator, and that’s what I need to make sure we’re using. The files this scanner produces will be for archives that require conformance with the PDF/A standard.
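For anyone who wants to automate that check, here is a rough Python sketch that shells out to the veraPDF command-line tool. Treat the details as assumptions on my part: that the executable is on the PATH as `verapdf`, that `--flavour` selects the profile, and that the default machine-readable report contains an `isCompliant` attribute. Check `verapdf --help` for your installed version before relying on any of it. The function names are hypothetical.

```python
import shutil
import subprocess


def verapdf_command(pdf_path, flavour="2b", exe="verapdf"):
    """Build the argument list for one veraPDF run (flag name assumed)."""
    return [exe, "--flavour", flavour, pdf_path]


def report_says_compliant(report_xml):
    """Crude check of the report text; assumes an isCompliant attribute."""
    return ('isCompliant="true"' in report_xml
            and 'isCompliant="false"' not in report_xml)


def validate(pdf_path, flavour="2b", exe="verapdf"):
    """Run veraPDF on one file and return (compliant, raw_report)."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    proc = subprocess.run(
        verapdf_command(pdf_path, flavour, exe),
        capture_output=True,
        text=True,
    )
    # Return the raw report too, so a human can read the failed rules.
    return report_says_compliant(proc.stdout), proc.stdout
```

Something like `ok, report = validate("roll_0042.pdf", "1b")` could then run on the desktop right after export, with the report saved next to the file whenever `ok` is false.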

None for now! I need user examples to see what’s really needed…

PDF/A is a complex thing to support. DynaPDF supports a couple of variants. Let me quote documentation:

PDF/A 1b:

When creating new PDF/A 1a or PDF/A 1b files, the following features are prohibited:

• The fill or stroke alpha constant in an extended graphics state must be 1.0 if present (see CreateExtGState()).

• Transparency groups, blend modes, as well as alpha channels in images.

• Layers (CreateOCG(), CreateOCMD() and all related functions).

• Annotations which are not defined in PDF 1.4. Highlight annotations cannot be used since these annotations require the blend mode Multiply.

• Form fields (form fields will be flattened if present). Note that check boxes use the font ZapfDingbats which is mostly not present on a Windows system.

• Embedded ICC profiles with a major version higher than 2. Version 4 profiles cannot be used in PDF/A 1 files.

• PDF/A files cannot be encrypted. The usage of CloseFileEx() or CloseAndSignFileEx() is not allowed.

• All features which are not defined in PDF 1.4.

PDF/A 2b, 2u, 3b, 3u

PDF/A 2b, 2u, 3b, 3u are based on PDF 1.7 and hence support more features like transparency or optional contents (layers). The only difference between PDF/A 2b and 3b is that the latter version also supports embedded files. The following features are prohibited:

• Annotations which are not defined in PDF 1.7.

• Form fields (form fields will be flattened if present).

• Overprinting is permitted but the overprinting mode cannot be set to 1 if an ICCBased CMYK color space is used. Due to implicit color conversion rules this applies also to DeviceCMYK.

• Application events are prohibited in PDF/A 2 and 3 (see AddOCGToAppEvent() for further information).

• Annotation replies are still prohibited (see SetAnnotMigrationState() for further information).

• PDF/A files cannot be encrypted. The usage of CloseFileEx() or CloseAndSignFileEx() is not allowed.

• PDF/A 3b: Embedded files must be associated with a PDF object. See AssociateEmbFile() for further information.

PDF/A 4, 4e, 4f

PDF/A 4 files are based on PDF 2.0. PDF/A 4e and 4f support embedded files but file attach annotations are supported by PDF/A 4f only.

Main differences in comparison to PDF/A 3:

• The flag coFlushPages is not supported since the special color space handling requires a cleanup run when closing the file. Links to color space resources might have to be changed in this run. This is only possible if parts of the document have not already been written to the output file.

• Overprinting is fully supported.

• Geospatial and rectilinear measurement properties are supported.

• Type1 and Type5 halftone screens are supported.

• Besides embedded files, PDF/A 4e also enables the usage of 3D contents in RichMedia annotations. Note that RichMedia annotations are based on Flash.

• File attach annotations are supported by PDF/A 4f only.

So with DynaPDF you can do all these formats and output a new PDF with that or convert an existing PDF.
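The differences quoted above can be condensed into a small lookup table. This is only a reading aid sketched in Python from the quoted text, not part of DynaPDF or any other library; the names `PdfaVariant`, `VARIANTS` and `candidates` are made up. The entries for PDF/A 1 embedded files and PDF/A 4 transparency are not spelled out in the quote and are filled in from my understanding of the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfaVariant:
    base_pdf: str          # PDF version the variant is based on
    transparency: bool     # transparency and layers permitted
    embedded_files: bool   # embedded files permitted


# Summary of the quoted documentation (illustrative, not authoritative).
VARIANTS = {
    "1a": PdfaVariant("1.4", False, False),
    "1b": PdfaVariant("1.4", False, False),
    "2b": PdfaVariant("1.7", True, False),
    "2u": PdfaVariant("1.7", True, False),
    "3b": PdfaVariant("1.7", True, True),
    "3u": PdfaVariant("1.7", True, True),
    "4":  PdfaVariant("2.0", True, False),
    "4e": PdfaVariant("2.0", True, True),
    "4f": PdfaVariant("2.0", True, True),
}


def candidates(need_transparency=False, need_embedded_files=False):
    """List the variants that permit the requested features."""
    return [
        name for name, v in VARIANTS.items()
        if (v.transparency or not need_transparency)
        and (v.embedded_files or not need_embedded_files)
    ]
```

For an image-only scan that needs neither feature, `candidates()` returns every variant, so the choice comes down to what the receiving archive asks for.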
