Hi everyone
Once an Adobe InDesign page is converted to PDF, it loses all references to the names of the included images.
Is there some small reference, like an ID or something else, that would tell me
whether picture A or picture B is embedded in the PDF,
without viewing it?
Just by searching the PDF (code) as text, and without any third-party plug-in.
I don't need the name or a human-readable reference… Just whether an image is different between PDF 1 and PDF 2.
thanks
Without a plugin?
Normally an image is embedded without a name…
That I know… no name.
I don't need the name, just whether it's different,
by ID, structure or something else.
By opening the PDF as text I can isolate the picture part,
so from there, can I compare some of the code in it?
With DynaPDF Pro you could extract them. But if you just need the bytes, you can of course read the PDF file directly with a binary stream…
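A minimal sketch of the binary-stream approach, written in Python only because the thread has no code of its own; the same idea should carry over to reading the file with a binary stream in Xojo. It assumes an unencrypted PDF and relies on a regex heuristic rather than a real PDF parser: it finds every object whose dictionary contains `/Subtype /Image`, takes the bytes between `stream` and `endstream`, and hashes them. The names `image_fingerprints` and `compare_image_sets` are hypothetical.

```python
import hashlib
import re

# Heuristic match for "N G obj << dictionary >> stream <data> endstream".
# Assumptions: the PDF is not encrypted, a dictionary never spans an "endobj",
# and the image bytes never contain the literal "endstream".
STREAM_OBJ = re.compile(
    rb"(\d+)\s+(\d+)\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n(.*?)\r?\n?endstream",
    re.S,
)
IS_IMAGE = re.compile(rb"/Subtype\s*/Image\b")


def image_fingerprints(pdf_bytes: bytes) -> set[str]:
    """SHA-256 of the raw (still compressed) data of every image XObject."""
    hashes = set()
    for match in STREAM_OBJ.finditer(pdf_bytes):
        dictionary, data = match.group(3), match.group(4)
        if IS_IMAGE.search(dictionary):
            hashes.add(hashlib.sha256(data).hexdigest())
    return hashes


def compare_image_sets(pdf_a: bytes, pdf_b: bytes) -> dict[str, set[str]]:
    """Split image hashes into those unique to each PDF and those shared."""
    a, b = image_fingerprints(pdf_a), image_fingerprints(pdf_b)
    return {"only_in_a": a - b, "only_in_b": b - a, "shared": a & b}
```

Usage would be something like `compare_image_sets(Path("a.pdf").read_bytes(), Path("b.pdf").read_bytes())` with `from pathlib import Path` (file names hypothetical). Identical hashes mean identical embedded bytes; different hashes only mean the stored data differs, so the same picture exported with different compression or resampling settings will also hash differently.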
OK, that is possible (with a binary stream).
thanks
[quote=101309:@Denis Despres]ok that is possible ( in binary stream)
thanks[/quote]
Keep in mind that, depending on how the PDF was generated, what counts as an “image” may vary. Some PDFs are just one huge image per page. Others have lots of images that internally represent text, and some have actual text and vector drawings, with only the original pictures still stored as images.
Also, depending on the version of the PDF and whether it’s encrypted or not, even the above may not apply.
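Both of those properties can be checked before trying to pull images out. A rough Python sketch, with the hypothetical name `pdf_header_info`, that reads the version from the `%PDF-x.y` header and looks for an `/Encrypt` entry. If one is present, each stream is encrypted with a key that depends on its object number, so comparing raw image bytes between two files is unlikely to be meaningful.

```python
import re

HEADER = re.compile(rb"%PDF-(\d\.\d)")
ENCRYPT = re.compile(rb"/Encrypt\b")


def pdf_header_info(pdf_bytes: bytes) -> dict:
    """Report the header version and whether the file looks encrypted.

    Heuristics: the header is searched for in the first 1024 bytes, a
    /Version entry in the document catalog (which can override the header
    from PDF 1.4 on) is ignored, and any /Encrypt key is taken to mean the
    stream data is encrypted.
    """
    header = HEADER.search(pdf_bytes[:1024])
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "encrypted": ENCRYPT.search(pdf_bytes) is not None,
    }
```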
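Raster images in PDFs usually come as image XObjects: a stream object whose dictionary holds the metadata (width, height, color space, compression filter), followed by the stream with the actual image data. They may also come as inline images, which are embedded raw inside a page's content stream. There are multiple formats used for image data within PDFs (although the most common are JPEG and TIFF-style encodings).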
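Image XObjects are streams, and the PDF specification does not allow streams inside compressed object streams, so in an unencrypted file the dictionary in front of each image stays readable as text. That metadata alone can act as a rough fingerprint: two images with different pixel dimensions or encodings are certainly stored differently, although matching metadata does not prove they are identical. A minimal Python sketch, using regex heuristics instead of a real parser and the hypothetical name `describe_images`, that lists width, height, bit depth and the filter names found in each image dictionary. Inline images live inside page content streams, which are usually Flate-compressed, so this sketch does not see them.

```python
import re

# Heuristic match for the dictionary of a stream object ("N G obj << ... >> stream").
# Assumes an unencrypted file and direct integer values (not "5 0 R" references).
STREAM_DICT = re.compile(rb"(\d+)\s+(\d+)\s+obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n", re.S)
IS_IMAGE = re.compile(rb"/Subtype\s*/Image\b")

# Standard PDF filter names and the encodings they correspond to.
FILTERS = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"FlateDecode": "Deflate (zlib)",
    b"LZWDecode": "LZW",
    b"RunLengthDecode": "run-length",
    b"CCITTFaxDecode": "CCITT fax (Group 3/4, as in TIFF)",
    b"JBIG2Decode": "JBIG2",
}
FILTER_RE = re.compile(rb"/(" + b"|".join(FILTERS) + rb")\b")


def _int_entry(dictionary: bytes, key: bytes):
    """Return the integer value of /key, or None if it is missing."""
    match = re.search(rb"/" + key + rb"\s+(\d+)", dictionary)
    return int(match.group(1)) if match else None


def describe_images(pdf_bytes: bytes) -> list[dict]:
    """List size, bit depth and encoding(s) of every image XObject."""
    images = []
    for match in STREAM_DICT.finditer(pdf_bytes):
        dictionary = match.group(3)
        if not IS_IMAGE.search(dictionary):
            continue  # some other stream (page content, font, etc.)
        images.append({
            "object": int(match.group(1)),
            "width": _int_entry(dictionary, b"Width"),
            "height": _int_entry(dictionary, b"Height"),
            "bits": _int_entry(dictionary, b"BitsPerComponent"),
            "encodings": [FILTERS[name] for name in FILTER_RE.findall(dictionary)],
        })
    return images
```

Comparing the two lists from PDF 1 and PDF 2 is a cheap first pass; only images whose metadata matches need their stream bytes compared.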
The link below is a good starting point, but the whole thing is a can of worms once you start down that path, thanks to the way the standard has evolved and allows for variation and errors.
http://blog.idrsolutions.com/2010/09/understanding-the-pdf-file-format-images/
In all honesty, the number of variants and possibilities is so large that you're better off getting a package to do it for you. DynaPDF has a function for it, which can be nicely integrated into a Xojo program. Other than that, you can try external helpers like the expensive PDF2DTP or the free The Unarchiver.
I'm in love with The Unarchiver, and its command-line tools are great helpers for things like this. The Unarchiver treats a PDF as if it were a compressed archive with images in it. Support in the CLI tools was added in version 1.2:
http://unarchiver.c3.cx/commandline
Other free tools, whose status I don't know, include pdfimages