PDF text extraction for invoices

Hi,

I’ve been asked to add a facility to import PDF purchase invoices into our accounts product.
I thought it was just a matter of linking the PDF to the record for viewing, but they want to extract the detail and
automatically add the purchase invoice to the accounts.

Is that even possible?

DynaPDFMBS will let you read PDFs. How you go about linking the data together will probably be implementation specific and related to the design of the PDF.

Yeah, I thought about DynaPDF. I have got a licence. Normally I have a clue about direction and can get started. I just can't think how to even go about starting, though!

Hey Russ,

I’m sure something like that is possible. But I think it’s only possible with the help of some plugins. MBS might do that kind of trick.

Have you ever looked at UBL invoices? More and more companies provide this kind of file. It usually contains the PDF file, together with all the data fields that make up your invoice. It is a new standard that I have been using for a while now.

Hi Edwin,

UBL looks similar to EDI. We already do EDI for some customers/suppliers, but these purchase invoices are from various suppliers that do not use the EDI system and have different accounts systems producing the invoices.

I had a thought about perhaps getting them to pick a supplier for each PDF file as a first step, which should mean that the invoices are at least in the same format per supplier, but then I'm stuck.
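
A rough sketch of where the pick-a-supplier idea could lead, written in Python only to show the shape (it is not Xojo and not DynaPDF's API). It assumes the page text has already been pulled out of the PDF by some other means. The supplier name, the patterns and the sample text are all made up for illustration; each real supplier would need its own set of patterns.

```python
import re
from decimal import Decimal

# Hypothetical per-supplier templates: field name -> regex with one capture group.
# Real suppliers would each need patterns written against their own layout.
TEMPLATES = {
    "ACME Ltd": {
        "invoice_no": r"Invoice\s+No[.:]?\s*(\S+)",
        "date": r"Date[.:]?\s*(\d{2}/\d{2}/\d{4})",
        "total": r"Total\s+Due[.:]?\s*([\d,]+\.\d{2})",
    },
}


def extract_fields(supplier, text):
    """Run the chosen supplier's patterns over the extracted text.

    Returns a dict of field -> captured string, or None where nothing matched,
    so the user can be asked to fill in the gaps by hand.
    """
    result = {}
    for field, pattern in TEMPLATES[supplier].items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result


def parse_amount(value):
    """Turn a string like '1,234.50' into a Decimal for the ledger."""
    return Decimal(value.replace(",", ""))


# Made-up text standing in for whatever the PDF library returns.
SAMPLE_TEXT = """ACME Ltd
Invoice No: INV-1042
Date: 14/03/2024
Total Due: 1,234.50
"""

fields = extract_fields("ACME Ltd", SAMPLE_TEXT)
```

Any field that comes back as None is a sign the template does not fit that PDF, which is probably the point to show the invoice to the user rather than post it automatically.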

Hi Russ,

We have a desktop app that we created in Xojo with the DynaPDF plugin that does exactly what you are trying to do. It has a visual designer: you select the areas of the PDF that you want to import, and it extracts the text from each area and saves it in a variable. You can also embed XojoScript code in the template to modify anything you have extracted, for example if the date on the invoice is in a strange format. You can also move the extraction areas dynamically for cases where you need to grab an invoice total or some other piece of data that is not in a consistent place. Contact me directly if you want to discuss.
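
For anyone wondering what area-based extraction looks like in principle, here is a rough Python sketch. It is not the app described above and does not use DynaPDF's API. It assumes the PDF library can hand back each word with its position on the page; the `Word` and `Region` types, the coordinates and the sample words are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge in points (assumed coordinate system)
    y: float  # distance from the top of the page in points


@dataclass
class Region:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word):
        return (self.left <= word.x <= self.right
                and self.top <= word.y <= self.bottom)


def text_in_region(words, region):
    """Join the words whose position falls inside the region, in reading order."""
    inside = [w for w in words if region.contains(w)]
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


def region_right_of(words, label, width=200.0, tolerance=3.0):
    """Build a region just to the right of the first word equal to `label`.

    This is one way to follow a value, such as the total, that moves up or
    down the page depending on how many lines the invoice has.
    """
    for w in words:
        if w.text == label:
            return Region(w.x + 1.0, w.y - tolerance, w.x + width, w.y + tolerance)
    return None


# Made-up words standing in for what a PDF library might return for one page.
WORDS = [
    Word("Invoice", 400, 50), Word("INV-1042", 460, 50),
    Word("Widgets", 50, 300), Word("10", 300, 300),
    Word("Total:", 400, 620), Word("1,234.50", 460, 620),
]

invoice_no = text_in_region(WORDS, Region(450, 40, 560, 60))
total = text_in_region(WORDS, region_right_of(WORDS, "Total:"))
```

The fixed region suits fields that always sit in the same place, such as the invoice number in a header, while the label-anchored region suits values that drift. Sorting on the exact y value is a simplification; real pages usually need words on the same line grouped with a small tolerance.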

Maybe @Alain Clausen could help you too. He is the author of Alinof Archives. It is also written in Xojo and, I believe, does not use a plugin for this function.