This script based on Tesseract worked well for me. It requires Tesseract and Ghostscript to be installed, and produces a set of ASCII text files from the PDF. Since Tesseract is the open-source OCR engine originally developed at HP and later maintained by Google, you can expect it to work pretty well.
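I won't reproduce that script here, but the basic pipeline is: Ghostscript renders each PDF page to an image, then Tesseract OCRs each image into its own text file. The Python snippet below is a minimal sketch of that idea, not the linked script. It assumes `gs` and `tesseract` are on your PATH (on Windows the Ghostscript binary is usually `gswin64c` instead). The function names, the `pnggray` output device, the 300 dpi default and the `page-NNN` file naming are my own choices, so adjust them to taste.

```python
import subprocess
from pathlib import Path


def gs_command(pdf_path, out_pattern, dpi=300):
    """Build a Ghostscript command that renders every PDF page to a grayscale PNG."""
    return [
        "gs",
        "-dSAFER",    # restrict what the PDF itself is allowed to do
        "-dBATCH",    # exit when finished instead of dropping to the gs prompt
        "-dNOPAUSE",  # don't wait for a keypress between pages
        "-sDEVICE=pnggray",
        f"-r{dpi}",   # resolution; around 300 dpi is a common choice for OCR
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def tesseract_command(image_path, out_base):
    """Build a Tesseract command; Tesseract itself appends '.txt' to out_base."""
    return ["tesseract", str(image_path), str(out_base)]


def ocr_pdf(pdf_path, out_dir, dpi=300):
    """Render the PDF with Ghostscript, then OCR each page into page-NNN.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Ghostscript expands %03d to the page number: page-001.png, page-002.png, ...
    subprocess.run(gs_command(pdf_path, out_dir / "page-%03d.png", dpi), check=True)
    text_files = []
    for image in sorted(out_dir.glob("page-*.png")):
        out_base = image.with_suffix("")  # page-001.png -> page-001
        subprocess.run(tesseract_command(image, out_base), check=True)
        text_files.append(out_base.with_suffix(".txt"))
    return text_files
```

Calling `ocr_pdf("scanned.pdf", "scanned-text")` should leave one `page-NNN.txt` per page in `scanned-text/` and return their paths. Keeping the command builders separate from `ocr_pdf` just makes it easy to tweak the Ghostscript device or resolution without touching the loop.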
A somewhat less comfy solution can be found in this Linux.com article, which uses a shell script that is also based on Tesseract.
Here is another solution, using different OCR engines.
There also seems to be a potentially elegant GUI solution in OCRFeeder, but I haven't tried it yet. I'll let you know how it works; for now I'm just bookmarking these links.