Extract images from PDF

Hello guys,

Is there a way to extract images from a PDF document? They are national IDs, usually scanned front and back, and we need to extract the images and save them properly.

The idea was to use it on the Web, but if there are other options available on Windows or macOS, they are more than welcome.


You can use the MBS Xojo DynaPDF Plugin (with DynaPDF Lite license) to extract images.

See example projects:

Thanks Christian, however the package is a little bit too much for what I need. I’ll try to look at some other non-Xojo solutions and then post-process the results.

For many years I’ve been using PDFGenius (macOS) for extracting images, among other things, from PDFs.

Yes, that one does the job. But there are many PDF tools, some of them free, which can do this job, and much more.

You could possibly declare into the free pdfium library.

Thanks a lot guys,

The idea is a little bit strange.

The PDFs that I’m getting are generated by a Konica printer using its ID Copy function, so each PDF contains a copy of the front and back of the document.
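For what it's worth, scanner-generated PDFs often store each page as a single JPEG stream (filter /DCTDecode). Assuming that holds for these files (nothing in this thread confirms it), a dependency-free Python sketch could carve those streams out directly. The name extract_jpeg_streams is made up for illustration. This is a heuristic, not a real PDF parser: it ignores encrypted files, object streams, and images stored with other filters.

```python
import re

# Lazily matches from a "<<" up to the "stream" keyword that opens a stream body.
# The captured text may include earlier dictionaries; that is harmless here because
# only stream dictionaries carry the /DCTDecode and /Length keys we look for.
_STREAM_RE = re.compile(rb"<<(.*?)>>\s*stream\r?\n", re.DOTALL)

# A direct "/Length 1234" entry; indirect references such as "/Length 5 0 R" are skipped.
_LENGTH_RE = re.compile(rb"/Length\s+(\d+)\b(?!\s+\d+\s+R)")


def extract_jpeg_streams(pdf_bytes):
    """Return the raw bytes of every /DCTDecode (JPEG) stream found in pdf_bytes.

    ASSUMPTION: the images are stored as plain, unencrypted JPEG streams.
    """
    images = []
    for match in _STREAM_RE.finditer(pdf_bytes):
        header = match.group(1)
        if b"/DCTDecode" not in header:
            continue  # not a JPEG stream
        start = match.end()
        length = _LENGTH_RE.search(header)
        if length:
            data = pdf_bytes[start:start + int(length.group(1))]
        else:
            # Fall back to scanning for the end marker when the length is indirect.
            end = pdf_bytes.find(b"endstream", start)
            if end == -1:
                continue
            data = pdf_bytes[start:end].rstrip(b"\r\n")
        if data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
            images.append(data)
    return images
```

Each returned item can be written straight to a .jpg file. With ID Copy output, the front and back will most likely still sit on the same image, so the splitting step remains a separate post-processing job.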

So far I’ve been doing some tests in Python and OpenCV, and it seems that some work and some don’t. I guess it depends on the filters and algorithms you use.

Considering that I want this service to be on a Linux server and to run as a batch feeder, it would be nice to have something console-based to do the job and then post-process the output. I’ll think about it to see whether it is worth it or not.
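One console-based option on Linux is the pdfimages tool from poppler-utils, which nobody in this thread has tested against these files. A minimal sketch of a batch wrapper, assuming pdfimages is installed and on the PATH; the function names and folder layout are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, out_dir):
    """Build the pdfimages call for one PDF (hypothetical helper)."""
    pdf_path = Path(pdf_path)
    # Output files are named <prefix>-000.jpg, <prefix>-001.png, and so on.
    prefix = Path(out_dir) / pdf_path.stem
    # -all keeps each embedded image in its native format where possible.
    return ["pdfimages", "-all", str(pdf_path), str(prefix)]


def extract_folder(in_dir, out_dir):
    """Run pdfimages over every *.pdf in in_dir; return the names processed."""
    # ASSUMPTION: poppler-utils is installed on the server.
    if shutil.which("pdfimages") is None:
        raise RuntimeError("pdfimages not found; install poppler-utils")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    processed = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if pdfimages fails on a file.
        subprocess.run(build_pdfimages_command(pdf, out), check=True)
        processed.append(pdf.name)
    return processed
```

Calling extract_folder("incoming", "extracted") from a cron job or a watcher script would cover the batch-feeder part; any OpenCV post-processing could then run on the extracted files.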

@PaulS I guess the app is standalone, which would imply that I need to either write an Automator script or use something else to interact with it. I wanted something more embedded in the code to handle the whole flow.

Thanks again.

An example of a ChatGPT response, with three examples of approaches: via the shell, via MBS, and via a web service.

(And of course I could have continued the discussion to ask it to use only a Console project and to process a whole folder of PDF files.)

If the Xojo team could produce a plugin for ChatGPT, it would be able to understand Xojo better, not confuse it with VB, integrate the language changes made after 2021, directly analyse and produce Xojo files and Xojo projects, etc.

In Python, it gives the code, but it also executes it immediately (so we can be sure that the code works without needing to correct it) and generates the requested files. A Xojo plugin could do the same thing.

If you click on “Show work”, you’ll see the Python code used:

That means it knows how to reproduce code it has already been given. No intelligence there.

What about something where it has to compute the result by itself (code no one gave it)?

Of course, you are free to train it.

Of course not. You can ask it whatever you want, continue to ask it to add new features, work differently, and so on.

With the new “Code interpreter”, it becomes an intelligent development environment. It proposes code, which you can continue to ask it to evolve, and it can also execute it directly.

Call me Thomas on that one.

Maybe you can use PDFArea through the Shell if you use Windows, or something like it. Good luck! [Top 10 Free PDF Image Extractor to Extract Image from PDF]