Remove unrecognised characters from PDF

Hi all,
I have a PDF, which I have created via wkhtmltopdf. In there are some unrecognised chars - showing as boxes with an X in them. (See https://dl.dropboxusercontent.com/u/15108631/27315589275.pdf for an example.)

Is there any way, perhaps by reading it as a BinaryStream and then stripping out characters I don’t want, to get rid of them?

Thanks,

Hamish

Not simply :stuck_out_tongue:
A PDF is a structured document, and that looks to be one “object” all by itself.
I’d tweak the HTML that’s used.
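To illustrate why stripping bytes out of the raw file tends not to work: PDF page content usually sits in compressed streams (FlateDecode, i.e. zlib/deflate, is common), so the visible text does not appear literally in the file's bytes. A minimal Python sketch, using a made-up content stream rather than the actual PDF from this thread:

```python
import zlib

# A hypothetical, hand-written page content stream (not taken from the real PDF).
content = b"BT /F1 12 Tf (Order confirmation) Tj ET\n" * 4

# PDF writers commonly store such streams deflate-compressed (FlateDecode).
compressed = zlib.compress(content)


def contains_text(raw: bytes, needle: bytes) -> bool:
    """Return True if needle occurs literally in the raw bytes."""
    return needle in raw


print(contains_text(content, b"Order confirmation"))                       # uncompressed stream
print(contains_text(compressed, b"Order confirmation"))                    # as stored in the file
print(contains_text(zlib.decompress(compressed), b"Order confirmation"))   # after inflating
```

On top of that, a PDF's cross-reference table records byte offsets of its objects, so removing bytes from the middle of the file would leave those offsets pointing at the wrong places.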

The HTML doesn’t have these characters; the utility which I’m using to convert HTML to PDF inserts them. It’s rather annoying. I’m trying a three-pronged approach: ask about making that not happen any more; ask about how to strip them out in terminal; ask about how to strip them out in Xojo… :slight_smile:

Hamish, want to send me your HTML file for that? Do you actually see any of those weird characters in the HTML file?

The HTML file is clean; it’s wkhtmltopdf which is inserting them. It doesn’t occur in 0.11, but in the current release candidate it does occur. It’s annoying; I want to use the current release candidate, because that gives selectable text.
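One way to double-check that the HTML really is clean is to scan it for characters that tend to render as boxes (control, zero-width, private-use or unassigned code points). This is only a rough sketch; the sample string and the choice of Unicode categories are assumptions, not something taken from the file in this thread:

```python
import unicodedata

# Cc = control, Cf = format (e.g. zero-width), Co = private use, Cn = unassigned
SUSPICIOUS = {"Cc", "Cf", "Co", "Cn"}


def suspicious_chars(text: str):
    """Yield (offset, code point, name) for characters that often render as boxes."""
    for i, ch in enumerate(text):
        if ch in "\t\n\r":
            continue  # ordinary whitespace is fine
        if unicodedata.category(ch) in SUSPICIOUS:
            yield i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


# Hypothetical input containing a zero-width space after "Order".
sample = "<h1>Order\u200b confirmation</h1>"
for hit in suspicious_chars(sample):
    print(hit)
```

If this reports nothing for the real HTML file, that supports the view that the converter, not the markup, is introducing the characters.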

HTML file at https://dl.dropboxusercontent.com/u/15108631/27315589275.html if you want to have a look.

H

There’s something funky in the HTML.
If that HTML is what generated that PDF, there’s really something odd, as the HTML looks like … well … yuck when I open it!
The text doesn’t even match up.
The HTML has a tag

Order confirmatioetails

but this appears as 2 lines in the pdf

very odd

Agh, sorry, I edited the file to try to get to the bottom of things. I’ve updated that file now so it matches the PDF - try downloading it now.

Right, stand down: after four hours on this I’ve found a workaround. https://github.com/wkhtmltopdf/wkhtmltopdf/issues/1734 has info. Thanks all for the help; have a nice weekend.

DynaPDF Plugin? Replace text feature?

Yeah, I suppose I could have tried that as well. I just wasn’t sure what the character was I was wanting to replace! Plus, my current workflow is pretty complicated (read HTML file, replace tags, save HTML file, convert to PDF, read PDF, convert to graphic, send to printer) and I didn’t want to make it even more complicated if I could at all help it! :slight_smile:
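For reference, the first steps of that workflow (read HTML, replace tags, save HTML, convert to PDF) could be sketched roughly like this in Python. The `{{name}}` placeholder syntax and all function names are made up for illustration; the only thing assumed about wkhtmltopdf is its basic `wkhtmltopdf <input> <output>` invocation:

```python
import subprocess
from pathlib import Path


def replace_tags(template: str, values: dict) -> str:
    """Replace hypothetical {{name}} placeholders with their values."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template


def build_command(binary: str, html_path: str, pdf_path: str) -> list:
    """Build the basic command line: wkhtmltopdf <input> <output>."""
    return [binary, html_path, pdf_path]


def html_to_pdf(template_path, values, html_path, pdf_path, binary="wkhtmltopdf"):
    """Read the template, fill in the tags, save the HTML, then convert it."""
    html = replace_tags(Path(template_path).read_text(encoding="utf-8"), values)
    Path(html_path).write_text(html, encoding="utf-8")
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(build_command(binary, str(html_path), str(pdf_path)), check=True)
```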

I just tried it with the latest wkhtmltopdf, both 32-bit and 64-bit for Mac, running it from Terminal with your HTML file, and it generates fine. I can’t see the weird characters, but then I wasn’t using ‘Myriad Pro’ since I don’t have it.

So it looks like it has something to do with fonts.

I think it depends what version you are running. I was running 0.12.1 rc3 from http://wkhtmltopdf.org/downloads.html .
I believe it happens with some fonts and not others, but we let our customers choose the font they use, so we have to fix the problem!

Do you have any idea where wkhtmltopdf is located after installation on the Mac?

Yes, it goes in /usr/local/bin/ . However, in our application, we extract it from there and ship it as a directory that’s part of the application package, so we don’t need to get people to install it on their machines.

I do that too. I have a folder called components, and the Windows and Mac versions are located there.
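A rough Python sketch of that lookup, preferring a copy shipped alongside the application and falling back to the system install. The function name, the `.exe` file name on Windows and the exact folder layout are assumptions for illustration; the `components` folder and `/usr/local/bin/` come from the posts above:

```python
import sys
from pathlib import Path


def find_wkhtmltopdf(app_dir, system_path="/usr/local/bin/wkhtmltopdf"):
    """Return the bundled wkhtmltopdf if present, else the system-wide path."""
    # Assumed file name on Windows; adjust to however the binary is actually shipped.
    name = "wkhtmltopdf.exe" if sys.platform.startswith("win") else "wkhtmltopdf"
    bundled = Path(app_dir) / "components" / name
    if bundled.is_file():
        return bundled
    # Fall back to the default Mac install location mentioned in the thread.
    return Path(system_path)
```

The returned path is not checked for existence in the fallback case, so the caller would still want to handle a missing binary.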