Alternative solutions for converting HTML to PDF

I have a solution in place for converting HTML to PDF: the HTML is loaded into an HTML viewer and printed to PDF with the MBS plugin. Then I add page numbers and a header, also with the MBS plugin. Since the change from the HtmlViewer to WKWebViewControlMBS, several users have had different problems: either the intermediate file can’t be opened or the final file can’t be written. Neither problem is reproducible. Therefore, I’m looking for a different solution.

WKHtmlToPDF: there is no ARM version and it’s unlikely that one is coming.
Weasyprint (Python): not suitable for my crappy HTML.
pyhtml2pdf (Python): doesn’t seem very fast, but it was ten times easier to set up than Weasyprint.

Does anyone have other ideas? The HTML coming from emails is really, really bad.

There’s GraffitiHTMLtoPDF, but it’s a Desktop-only solution and – in the currently available version – produces rasterized pages. In the next release of GraffitiSuite it will output proper PDFs that are not rasterized.

For Web, GraffitiPDF has a method for converting HTML to PDF.

I need a solution for desktop, so that is okay. Your solution is slow, though: some users make tens of thousands of documents.

Tables are not rendered quite correctly. GraffitiSuite: [screenshot]

PDF from my app: [screenshot]

If you have a GraffitiSuite account, open a ticket and attach the HTML as a txt file. The next version uses a different engine for the conversion, so it may appear different.

What do you use there?

The solution we had for years is asking WebKit to load the page and then print to PDF or render output as PDF or image.

Drawing the page into a WebCanvas within the HTMLViewer and getting a picture from that WebCanvas may also work.

I would not want to try and export the DOM tree to render it myself as I would have to implement all the CSS quirks.

If you’re asking me, I use a combination of solutions to convert the HTML and CSS to a JavaScript object that is then rendered using PDFKit.

If you are asking me, then yes, that’s what I’m using. I think we exchanged some emails a while ago. One of my problems is that

thePDFDoc = thePDFFile.OpenAsCGPDFDocumentMBS

results in a variable that is nil.

The second step to create another PDF out of the first PDF can also fail.
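Since neither failure is reproducible, one way to narrow it down might be to check the intermediate file before opening it. The following is only a sketch in Python (not Xojo), and the function name is hypothetical. It assumes nothing more than that a complete PDF starts with the `%PDF-` header and has an `%%EOF` marker near the end, so it would catch a missing, empty or truncated file but not other kinds of corruption.

```python
from pathlib import Path


def looks_like_complete_pdf(path, tail_bytes=1024):
    """Return True if the file exists, starts with the PDF header
    and contains an %%EOF marker within the last `tail_bytes` bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    size = p.stat().st_size
    if size == 0:
        return False
    with p.open("rb") as f:
        head = f.read(5)                    # "%PDF-" header
        f.seek(max(0, size - tail_bytes))   # jump close to the end of the file
        tail = f.read()
    return head == b"%PDF-" and b"%%EOF" in tail
```

If the check fails right after printing, the file may simply not be fully written yet; logging the file size at that point could show whether a short wait and a retry would be enough.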

@Anthony_G_Cyphers: no, I don’t have an account. Here is the HTML: html - Pastebin.com.

But the speed is the major problem.

That HTML hits an exception on some of the images being loaded in the new version. That’s probably why the old version was so slow: timeouts on failed image loads.

Your example takes several seconds to render. My current solution is much faster.

I think I broke the example app with a nice email that has 24 attachment previews. It’s been over a minute.

Isn’t this an option? GitHub - parallax/jsPDF: Client-side JavaScript PDF generation for everyone.
You can add the JS to the document, then use HTMLViewer.ExecuteJavascript to run it.
Some samples running this: jsPDF - Create PDFs with HTML5 JavaScript Library

Thanks, I’ll have a look.

jsPDF is what I use for Xojo Web. It uses – or used to – HTML2Canvas to rasterize the HTML elements and apply them to a PDF page. There are caveats.

I don’t think it’ll do what you want. There are server content restrictions that are tripping it up, errors in loading resources, etc. For images in the document that are BASE64, it works flawlessly. URLs, not so much. There are a lot of security hurdles (especially on macOS) for the HTMLViewer to jump through.
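Because BASE64 images work and URLs do not, one possible workaround is to inline images that are already available as local files (for example, saved attachments) as data URIs before conversion. This is a minimal sketch in Python with hypothetical function names. It assumes double-quoted `src` attributes and uses a regular expression rather than a real HTML parser, which will miss some cases in messy email HTML. Whether it gets around the HTMLViewer restrictions is untested.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Simplification: only matches <img ... src="..."> with double quotes.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=")([^"]+)(")', re.IGNORECASE)


def to_data_uri(data, mime_type):
    """Encode raw bytes as a data: URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_local_images(html, base_dir):
    """Replace image sources that point at existing local files with data URIs.
    Remote, cid: and already-inlined sources are left untouched."""
    base = Path(base_dir)

    def replace(match):
        src = match.group(2)
        if src.startswith(("data:", "http:", "https:", "cid:")):
            return match.group(0)
        candidate = base / src
        if not candidate.is_file():
            return match.group(0)
        mime_type = mimetypes.guess_type(candidate.name)[0] or "application/octet-stream"
        return match.group(1) + to_data_uri(candidate.read_bytes(), mime_type) + match.group(3)

    return IMG_SRC.sub(replace, html)
```

Inlining makes the HTML larger, so for newsletters with many resources it trades failed or slow loads for bigger input.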

At least WKHtmlToPDF is open source… so one could (try to) compile it for ARM.

But it seems somewhere between hard and impossible, according to the discussions on GitHub…

And I assume you want/need a solution that is working locally with no WebService involved?

It’s not just the issue of ARM for me, but installed fonts and Android.

I don’t do large PDF files like you, but I created a Xojo web app (the server has all the needed fonts) that receives the HTML via URLConnection, converts it to PDF (via wkHTMLtoPDF), then returns the PDF to the user. It returns exactly the same results whether called from Desktop, Web, Windows, macOS, Linux, iOS or Android!

Not a solution for all, I understand, but it works!

Good to know. Bummer.

I have 2 main classes of emails: the personal ones with or without attachments. And newsletters which have a lot of resources which may or may not be available.

GraffitiSuite - as it is now - is waayyyyy too slow.

@Christian Schmitz: I know that the problems are not reproducible. Would you take a look again at my problems?

WKHtmlToPDF might be a good alternative for now.

Well, the next version is very fast, but the way it operates isn’t very lenient with outside resources. It was mainly developed to work with GraffitiEditor, which it does quite well.

I’ll try the new version when it becomes available. My 30 MB email didn’t even give a result in the current version.

The outside resources are a problem.

I’ve stumbled upon another open source tool. It’s basically a super simple console application that works as a wrapper to load a URL into a WebView, then saves the contents as PDF.

A blog post by Gavin Ballard (from 2013) mentioned this tool called URL2PDF. The source is available on GitHub: scottgarner/URL2PDF.

Apart from the ReadMe, the code seems to be about 6 years old… Anyway, I’ve downloaded it and built it as a macOS Universal binary (x86_64 arm64) with Deployment Target macOS 10.10.

Then I’ve downloaded your test file and tried it. Seems to work just fine. ;)
You could use that source/approach as a “how to build your own dedicated helper” (console application)… strip out unnecessary features, change default parameters, maybe add dedicated parameters for different paper format(s), use the newer WKWebView, whatever.

Anyway - if you’d like to quickly try what I’ve built, you can download the source and my “out of the box build”, along with your test file, from my Dropbox: Source and Build: URL2PDF
Note: The built console application is not codesigned for Distribution…
Note 2: I’ll delete this from my Dropbox sooner or later. If someone comes by later, links to the source are provided above.

If you extract it in your Downloads folder, usage would be like this (change paths to reflect your environment…):

cd /Users/juerg/Downloads/URL2PDF
./BUILD/Products/usr/local/bin/url2pdf --help
./BUILD/Products/usr/local/bin/url2pdf -n URL -p /Users/juerg/Downloads/URL2PDF/test -u file:///Users/juerg/Downloads/URL2PDF/test/test.html

So from the “extracted URL2PDF folder” I’m calling the built console app. Parameters:

  • --help → displays available options
  • -n URL → the output filename should be the URL-part (and not the content title, which is the default)
  • -p /Path/to/output → where to save the PDF
  • -u URL → the URL this tool is going to save as PDF. So you have to provide a local file URL!
  • if you don’t want a “single page PDF”: add the 2 options for “paginate” and “orientation”. It seems the tool is trying to get the default paper size.
    • -g YES -o Portrait
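To call the helper from a script instead of the Terminal, the parameters above could be assembled like this. It is a sketch in Python with hypothetical function names. It uses only the options listed above, and the 60-second timeout is my own assumption, added because a render that hangs on unavailable resources would otherwise block the caller.

```python
import subprocess
from pathlib import Path


def build_url2pdf_command(tool, html_file, output_dir, paginate=True):
    """Build the url2pdf argument list from the options described above."""
    # The tool expects a URL, so the local file path becomes a file:// URL.
    html_url = Path(html_file).resolve().as_uri()
    command = [str(tool), "-n", "URL", "-p", str(output_dir), "-u", html_url]
    if paginate:
        # Without these two options the result is a single-page PDF.
        command += ["-g", "YES", "-o", "Portrait"]
    return command


def convert(tool, html_file, output_dir, timeout_seconds=60):
    """Run the helper; raises subprocess.TimeoutExpired if it takes too long."""
    command = build_url2pdf_command(tool, html_file, output_dir)
    return subprocess.run(command, capture_output=True, text=True,
                          timeout=timeout_seconds)
```

Passing the arguments as a list avoids shell quoting problems with spaces in file names, which are common for files derived from email subjects.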

But I can’t say if an approach using such a command line tool fits your needs for speed and quality… ;)
