Here is some code I have used to extract text from PDF files:
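[code]Imports Yiigo.PDF
Imports Yiigo.Image

' Presumably returns the bounding boxes of `count` characters starting at `index` on the page.
' The document type is assumed to be PDFDocument, matching Search below.
Public Function ObtainTextBBox(doc As PDFDocument, pageNum As Integer, index As Integer, count As Integer) As Quadrilateral()
    Dim page As PDFPage = DirectCast(doc.ObtainDocPage(pageNum), PDFPage)
    Return page.ObtainBoxes(index, count)
End Function

' Searches the given page for txt, starting at index.
Public Sub Search(doc As PDFDocument, pageNum As Integer, index As Integer, txt As String)
    Dim p As PDFPage = DirectCast(doc.ObtainDocPage(pageNum), PDFPage)
    Dim res As PDFSearchResult = p.Search(index, txt)
    For Each idxAndCntPair As Integer() In res.Results
        ' Each pair presumably holds a match's start index and character count,
        ' which can be passed to ObtainTextBBox to locate the match on the page.
    Next
End Sub[/code]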
And here are the relevant tutorials about how to extract text from PDF files. I hope it helps. Good luck.
I currently use the CLI utility PDFTK successfully on Windows to extract text from a text-only PDF. It works for me, but I'll soon need a Mac solution to start work on a new cross-platform project.
My particular purpose requires the PDF to be uncompressed first, since most are compressed with FlateDecode and encoded with ASCII85Decode. I then read it, because I need to know where on the page the text appears in order to import it correctly. Once the PDF is uncompressed into plain ASCII text, I read the whole file into an array and create custom class objects for the PDF text, each with integer x and y page positions.
Obviously not what everyone wants or needs, but it works well for me.
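For anyone curious what that might look like, here is a rough VB.NET sketch of the "uncompress, then record where each text run sits" idea. It is not the poster's actual code. It assumes the file has already been uncompressed (pdftk's documented way is [code]pdftk in.pdf output out.pdf uncompress[/code]) and that text is positioned with plain Td or translation-only Tm operators and shown with Tj. The names PdfTextItem, PdfTextScanner and ParseTextItems are made up for illustration. TJ arrays, scaled or rotated matrices, and string escapes are not handled.

[code]Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

' Hypothetical holder for one text run and its integer page position.
Public Class PdfTextItem
    Public Property X As Integer
    Public Property Y As Integer
    Public Property Text As String
End Class

Public Module PdfTextScanner
    Private Const Num As String = "-?\d*\.?\d+"

    ' Matches, in document order: "(string) Tj", "a b c d e f Tm", "tx ty Td", "BT".
    Private ReadOnly Ops As New Regex(
        "\((?<text>(?:\\.|[^\\()])*)\)\s*Tj\b" &
        "|(?:" & Num & "\s+){4}(?<mx>" & Num & ")\s+(?<my>" & Num & ")\s+Tm\b" &
        "|(?<tx>" & Num & ")\s+(?<ty>" & Num & ")\s+Td\b" &
        "|\bBT\b")

    Private Function ParseNum(s As String) As Double
        Return Double.Parse(s, CultureInfo.InvariantCulture)
    End Function

    ' Scans uncompressed content-stream text and returns each Tj string
    ' together with the line position that was current when it was shown.
    Public Function ParseTextItems(content As String) As List(Of PdfTextItem)
        Dim items As New List(Of PdfTextItem)()
        Dim x As Double = 0
        Dim y As Double = 0
        For Each m As Match In Ops.Matches(content)
            If m.Groups("text").Success Then
                ' Raw string body; escapes such as \( are left undecoded.
                items.Add(New PdfTextItem With {
                    .X = CInt(Math.Round(x)),
                    .Y = CInt(Math.Round(y)),
                    .Text = m.Groups("text").Value})
            ElseIf m.Groups("mx").Success Then
                ' Tm: take the translation part as an absolute position (assumes no scaling).
                x = ParseNum(m.Groups("mx").Value)
                y = ParseNum(m.Groups("my").Value)
            ElseIf m.Groups("tx").Success Then
                ' Td: offset relative to the start of the current line.
                x += ParseNum(m.Groups("tx").Value)
                y += ParseNum(m.Groups("ty").Value)
            Else
                ' BT: a new text object starts at the origin.
                x = 0
                y = 0
            End If
        Next
        Return items
    End Function
End Module[/code]

You would feed it the uncompressed file's text, for example the result of System.IO.File.ReadAllText on the pdftk output, and then loop over the returned items. Since it only relies on the base class library, it should in principle also run on a Mac under a cross-platform .NET runtime, though I have not checked that against real-world PDFs.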