How to decode a PDF 'stream'

Hi everyone,

I’m looking for the procedure to decode the ‘stream’ objects in a PDF file.
Do you have any document that describes the process,
or any link or example showing how to decode such a chunk of bytes?

I’ve searched quite a lot for this, and did not find anything useful.

The idea is to open and display a PDF at a desired page
(and no, Christian, I don’t want to use a plugin! - sorry, no offense)

thanks.

PDF Streams can contain all kinds of information…

  • Text
  • Images
  • Drawing commands (lines, rectangles etc)

They can also contain other information such as input fields, tables, pointers, and pointers to other pointers.

And they can be in “plain text” (human readable) format, or they can be compressed streams (zip-style deflate compression, which is not the same thing as encryption).
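To tell which case you are looking at, check the /Filter entry in the dictionary that precedes the stream. Here is a rough sketch in Python (used only for illustration; the same idea carries over to Xojo). The helper name `stream_filters` is made up for this example, and it does only simple pattern matching on the dictionary bytes, not real PDF parsing.

```python
import re

# Characters that end a PDF name: whitespace and delimiters.
NAME = rb"/([^\s/<>\[\]()]+)"

def stream_filters(dictionary):
    """Return the filter names listed under /Filter, in the order they appear.

    `dictionary` is the raw bytes between "<<" and ">>" of a stream object.
    /Filter may hold a single name or an array of names.
    """
    match = re.search(rb"/Filter\s*(\[[^\]]*\]|/[^\s/<>\[\]()]+)", dictionary)
    if match is None:
        return []  # no /Filter entry: the stream data is stored as-is
    return [name.decode("ascii") for name in re.findall(NAME, match.group(1))]
```

An empty list means the stream is stored uncompressed; `FlateDecode` means it was compressed with zlib. Other filter names exist as well, and this sketch only reports them.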

So, the answer is: there is no “easy” way to do it without reading and understanding the Adobe PDF Specification (and trust me, that is no easy task).

I’ve spent the last couple of years on and off, working with the PDF specs… download the demo of gPDF and look at the output it produces (and this is the “easy” stuff)

How do you decode the compressed (zip-format) streams?
For now that’s the only thing I’m looking for.
(And this is my original question!)

What do you want to do?

DynaPDF can decode a lot for you, and you can get the decoded content stream.

You can read the PDF spec from Adobe.

For zlib, look at line 276 of the contentPage class of the DBReportPDF component. It works like the Graphics class of Xojo.

PDF files use zlib to compress stream data, so use zlib to uncompress it. As Dave says, the result could be text, an image or other data; the object tells you what the “stream” is.
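For anyone who wants to see the zlib step spelled out, here is a minimal sketch in Python (for illustration only; Xojo code would follow the same steps). It assumes the file is not encrypted, that the only filter in use is /FlateDecode, and that no /DecodeParms predictor is applied. The function name `iter_streams` and the regular expression are made up for this example and are a simplification, not a real PDF parser.

```python
import re
import zlib

# Roughly: "<< dictionary >> stream EOL data endstream".
# Simplification: the dictionary match can be thrown off by unusual files.
STREAM_RE = re.compile(rb"<<(.*?)>>\s*stream\r?\n(.*?)endstream", re.DOTALL)

def iter_streams(pdf_bytes):
    """Yield (dictionary_bytes, decoded_bytes) for each stream found."""
    for match in STREAM_RE.finditer(pdf_bytes):
        dictionary, data = match.group(1), match.group(2)
        if b"/FlateDecode" in dictionary:
            # decompressobj ignores the end-of-line bytes that sit between
            # the compressed data and the "endstream" keyword.
            data = zlib.decompressobj().decompress(data)
        else:
            # Uncompressed stream: just drop the trailing end-of-line.
            data = data.rstrip(b"\r\n")
        yield dictionary, data
```

Typical use would be to read the whole file as bytes (`open(path, "rb").read()`) and loop over `iter_streams(...)`, printing the decoded data of each stream to see what it contains.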

A stream is NOT one thing (image OR text…); it could be any number or combination of things.
Here is a “simple” example:

that happens to be 6 or 8 styleruns from an Xojo TextArea translated to PDF using my gPDF class… this one doesn’t happen to have any “drawing” commands, but it just as well could.
So having the “stream” in human readable format, as this is, may not get you anywhere unless you also know how to decipher the contents… (and don’t get me started on an image stream)
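To give an idea of what “deciphering” means, here is a very small Python sketch that pulls out the literal strings passed to the Tj (show text) operator in an already decoded content stream. The function name `shown_text` is made up for this example. It ignores the TJ array form, hex strings, nested parentheses and all positioning operators, and with many fonts the bytes are glyph codes rather than readable characters, so treat it as a starting point only.

```python
import re

# A literal string "( ... )" immediately followed by the Tj operator.
# Backslash escapes are matched but left undecoded.
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")

def shown_text(content):
    """Return the raw strings shown with Tj, in stream order."""
    return [m.group(1).decode("latin-1") for m in TJ_RE.finditer(content)]
```

Even for plain text this only recovers the strings; where they end up on the page depends on the operators around them (Td, Tm, Tf and so on).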

Well, if you use DynaPDF Lite and you import that page, you can just query the images on a page and DynaPDF provides them decoded. JPEG images are passed through to a JPEG file; images in any other format are converted to TIFF, PNG or JPEG.

see
http://www.monkeybreadsoftware.net/example-dynapdf-extractimageobjects.shtml

And when using DynaPDF Pro the ParseInterface can give you all those draw commands as events when processing them. This way we can decode all the commands just fine.
see
http://www.monkeybreadsoftware.net/class-dynapdfparseinterfacembs.shtml

Christian… not sure how that applies to the question the OP asked… but a nice plug nonetheless.

Well, the OP can read the PDF specs from Adobe:
https://www.adobe.com/devnet/pdf/pdf_reference.html

Jens did, and built DynaPDF using those specs.
And it may be easier for the OP and others to just use what’s there instead of spending hours to reinvent the wheel.
(others may read the thread later and prefer the plugin)

Well, the answer I was seeking is “zlib”!
Thank you, Bernardo.
I will have to try it in my apps, but it seems to be it.

Christian and Dave, no offense, but I have been stuck with plugins that stopped being updated years ago, and I don’t want to use them anymore.
DynaPDF is a huge piece of work, and I cannot afford it (and it’s a plugin, see the line above…)
And yes, maybe people reading this thread will want to use DynaPDF.

I read (quickly) the pdf reference, but did not find any example for the stream decompression.

Jean-Yves… by no means am I attempting to encourage you to use plug-ins (Christian’s or anyone else’s). All I am attempting to do is educate you on the fact that decoding the guts of a PDF file is by no means a trivial matter.

If you want an easy way to CREATE a PDF… and don’t want an expensive plug-in… look at my gPDF class (it is source code in Xojo itself)… rdS.com/gpdf

And no, the PDF spec does not speak of decompression; as a matter of fact, for as verbose as the spec is, it leaves out about 90% of the important information.