Open source: ocrJob - an ocrmypdf GUI front-end

Hello all,

I’ve open sourced ocrJob, a GUI application for creating and executing batch OCR jobs using ocrmypdf/Tesseract.
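For anyone curious what a batch OCR job boils down to on the command line, here is a minimal sketch in Python. It is not ocrJob's actual code: the function names `build_ocr_commands` and `run_batch` are made up for illustration, and it assumes the `ocrmypdf` executable is on your PATH. It only uses the `--language` and `--skip-text` options of the ocrmypdf CLI.

```python
from pathlib import Path
import subprocess


def build_ocr_commands(input_dir, output_dir, language="eng"):
    """Return one ocrmypdf command line per PDF found in input_dir (hypothetical helper)."""
    commands = []
    for pdf in sorted(Path(input_dir).glob("*.pdf")):
        target = Path(output_dir) / pdf.name
        # --skip-text leaves pages that already have a text layer untouched.
        commands.append(
            ["ocrmypdf", "--language", language, "--skip-text", str(pdf), str(target)]
        )
    return commands


def run_batch(commands):
    """Run every command in order and return the ones that exited with a non-zero code."""
    failed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Something like `run_batch(build_ocr_commands("scans", "searchable"))` would then process a whole folder, assuming both folders exist.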

It is primarily focused on Windows, but I guess it can work on macOS and Linux. I just don’t have the time to test it on those two platforms. Plus, I don’t have a Mac.

It is in its initial stages of development, so if you experience any bugs or inconsistencies, let me know.
If you also have any suggestions for improvements, or for integrating it into your workflows, I could implement them if they are generalizable enough.

Cheers,

George


I tried Tesseract for my OCR application but found it was very sensitive to image quality (I’m working with video caps with a lot of variation in target orientation, lighting, etc.). I switched to AWS Textract and while not free, the performance is a quantum leap above anything I could get from Tesseract. It finds every scrap of text in the image even under poor lighting and at crazy angles.
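In case it helps anyone comparing the two, a Textract call can be sketched roughly like this in Python with boto3. This is only an illustration under assumptions, not the setup described above: the helper names `extract_lines` and `ocr_image` and the default region are made up, and it assumes AWS credentials are already configured. It relies on the `detect_document_text` operation, whose response lists `Blocks` with a `BlockType` and, for lines and words, a `Text` field.

```python
def extract_lines(response):
    """Collect the text of every LINE block from a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_image(path, region_name="us-east-1"):
    """Send one image file to Textract and return its detected lines (hypothetical helper)."""
    # Imported here so extract_lines can be used without boto3 installed.
    import boto3

    client = boto3.client("textract", region_name=region_name)
    with open(path, "rb") as image_file:
        response = client.detect_document_text(Document={"Bytes": image_file.read()})
    return extract_lines(response)
```

Filtering on `LINE` blocks avoids getting every word twice, since the response also contains the individual `WORD` blocks.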

That’s nice to know!
My own experience is that on-premise Tesseract’s prime use case is this:

  • Best effort OCR/searchable PDF creation. Not if the OCR output is mission-critical in some way.
  • You can’t send the content to someone else’s computer (i.e. the Cloud)
  • The source images are of some controlled quality: text documents from a scanner. No video feeds and the like.

But it’s true, Tesseract’s output quality is noticeably lower than that of commercial engines. Sometimes, it’s good enough :slight_smile:


OK, it’s macOS only, but have you tried Live Text (Ventura)?

Julia ?

@Emile_Schwarz, how would you interface Live Text with your Xojo app?!

Declares ?

I have read that someone has used Quick Look with Xojo, so… maybe?

It is part of VisionKit, so likely with declares.

My app has to be cross-platform, so Mac-only is a non-starter for me. AWS Textract was pretty easy to set up in an hour or two.

It’s always easy to do Mac development by using a remote Mac from macweb.com

ah, you tried it on a Mac and it works ok?
that’s cool! :slight_smile:

thanks, will keep it in mind! I don’t see myself making any money from developing for the Mac any time soon though :slight_smile:

I have a serious bug warning about versions 1.x.x: the last document in the queue is not getting OCR’ed!
It’s being fixed in version 2.0.0, along with a big refactoring of the job engine.

Sorry about that, totally slipped through :face_with_diagonal_mouth: