Win 11 Xojo 2024 R4.1
I am tasked with developing a document storage app. I need an OCR piece that will scan the PDF document, automatically recognize the text in it, and fill in fields in the WebApp. The customer does not want to copy and paste text from the OCR output. Is this possible, or does anyone know of OCR software that will integrate with my WebApp and do this? I did try Tesseract-OCR with poor results. The customer is willing to train the OCR software to recognize invoices from different vendors. Thanks
You will need to do multiple things.
- PDF documents contain text and images. You can extract both with the MBS Xojo DynaPDF Plugin on Windows.
- For image files or images from PDF, you can use OCR functions, e.g. our WindowsOCRMBS class in the MBS Xojo WinFrameworks Plugin.
- For a Word file, you may use our WordFileMBS class to extract text or images.
- For Excel files you may use our XLBookMBS class to extract values.
And so on for other file types.
Try NAPS2; I use it as the scanning solution in my own apps. It has OCR capability and is free (although I do donate to assist development). https://www.naps2.com/ It is a Windows/Mac/Linux solution, works great, and can be driven from the command line.
ChatGPT will do miracles with OCR.
It can even return you a JSON with the scanned bills and tell you whether it is a parking bill, a highway toll bill, a restaurant bill, etc., give you the VAT and its different rates; you only have to parse the JSON into your database.
Thanks for the info. I will take a look at all of these options.
Hi @Gary_Smith, very curious what solution you landed on. I too have a web app project, and the user requirement is that textual data be OCR'd from a scanned image of an ID card. Would be eager to hear how your own experience with this is going.
@William_Reynolds I am running my WebApp on an Ubuntu VM. The app uses pdftoppm (already installed on Ubuntu) to extract the scanned image from the PDF file. Then the app calls Tesseract to OCR the PPM (or PNG) image and return a text file containing all the text it can find in the image. Later on, the app sends OpenAI a prompt containing the text from that file, asking for specific information to be returned in JSON format.
pdftoppm and Tesseract are called with a Shell command.
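Here is a rough sketch of those Shell calls. The paths, file names, and language code are placeholders, and it assumes pdftoppm and tesseract are on the PATH of the Ubuntu VM, so treat it as an outline rather than production code:

```
' Convert page 1 of the uploaded PDF to PNG, then OCR it with Tesseract.
' /tmp/invoice.pdf and /tmp/page are placeholder paths.
Var sh As New Shell
sh.TimeOut = -1 ' wait for each command to finish

' pdftoppm writes /tmp/page-1.png (the number may be zero-padded on multi-page PDFs)
sh.Execute("pdftoppm -png -f 1 -l 1 /tmp/invoice.pdf /tmp/page")
If sh.ExitCode <> 0 Then
  System.DebugLog("pdftoppm failed: " + sh.Result)
  Return
End If

' tesseract writes /tmp/page.txt
sh.Execute("tesseract /tmp/page-1.png /tmp/page -l eng")
If sh.ExitCode <> 0 Then
  System.DebugLog("tesseract failed: " + sh.Result)
  Return
End If

' Read the OCR text back in for the next step (the prompt sent to OpenAI).
Var txtFile As New FolderItem("/tmp/page.txt", FolderItem.PathModes.Native)
Var stream As TextInputStream = TextInputStream.Open(txtFile)
Var ocrText As String = stream.ReadAll(Encodings.UTF8)
stream.Close
```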
I can post more complete code if you need it. If you are hosting on Windows, the Tesseract install is pretty complicated. Thanks
Wow! Hats off to you for what looks like an insane amount of work. I’ll be sure to hit you up for more insight - thanks for taking the time with such a thorough reply.
You can send the picture directly to ChatGPT and ask for a JSON of the elements you need from it.
It will cost you more credits than Gary's method, but it may be easier.
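Roughly something like this with URLConnection. Just a sketch: the model name, prompt, and file path are my own assumptions, and YOUR_API_KEY is a placeholder:

```
' Send an image to the OpenAI chat completions endpoint and ask for JSON back.
Var f As New FolderItem("/tmp/bill.png", FolderItem.PathModes.Native)
Var bs As BinaryStream = BinaryStream.Open(f)
Var imgBase64 As String = EncodeBase64(bs.Read(bs.Length), 0) ' 0 = no line wrapping
bs.Close

' Build the request body with Dictionaries and GenerateJSON.
Var imageURL As New Dictionary
imageURL.Value("url") = "data:image/png;base64," + imgBase64

Var imagePart As New Dictionary
imagePart.Value("type") = "image_url"
imagePart.Value("image_url") = imageURL

Var textPart As New Dictionary
textPart.Value("type") = "text"
textPart.Value("text") = "Return a JSON object with the vendor, date, total and VAT rates on this bill."

Var contentParts() As Variant
contentParts.Add(textPart)
contentParts.Add(imagePart)

Var message As New Dictionary
message.Value("role") = "user"
message.Value("content") = contentParts

Var messages() As Variant
messages.Add(message)

Var body As New Dictionary
body.Value("model") = "gpt-4o-mini"
body.Value("messages") = messages

Var conn As New URLConnection
conn.RequestHeader("Authorization") = "Bearer YOUR_API_KEY"
conn.SetRequestContent(GenerateJSON(body), "application/json")
Var response As String = conn.SendSync("POST", "https://api.openai.com/v1/chat/completions", 60)
' The JSON you asked for is in choices(0).message.content of the response; parse it with ParseJSON.
```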
My only concern is that there is ‘personal’ info on the ID card being scanned, and I’m paranoid about privacy concerns and external Chat/AI resources.
Are all of the ID card layouts the same? If so…
You could use Tesseract to extract the text and parse that text file yourself, without sending it to AI.
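For example, if the card always labels its fields the same way, a couple of RegEx passes over the Tesseract output may be all you need. The labels and patterns below are hypothetical and would have to be adapted to the real card layout:

```
' Pull one field out of the OCR text with a capture group.
Function ExtractField(ocrText As String, pattern As String) As String
  Var re As New RegEx
  re.SearchPattern = pattern
  Var match As RegExMatch = re.Search(ocrText)
  If match <> Nil Then
    If match.SubExpressionCount > 1 Then
      Return match.SubExpressionString(1).Trim
    End If
  End If
  Return ""
End Function

' Usage, assuming ocrText holds Tesseract's output and the card has
' lines like "ID No: A12345678" and "DOB: 01/02/1980":
Var idNumber As String = ExtractField(ocrText, "ID No[:.]?\s*([A-Z0-9]+)")
Var dob As String = ExtractField(ocrText, "DOB[:.]?\s*(\d{2}/\d{2}/\d{4})")
```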
If you want a secure AI, then host your own LLM. Take a look at hosting DeepSeek. Self-hosted, it does not send info to external sources, and the DeepSeek LLM is free. But you will need some good hardware to run it on.
I think DeepSeek does not allow picture input, only text or PDF (text only),
at least not in the online chat.