Friday, 18 February 2011

OCR of scanned PDFs in Linux

It seems there is still no quick, ready-made solution, but I found a few interesting scripts.

This script, based on Tesseract, worked well for me. It requires Tesseract and Ghostscript to be installed, and it produces a number of plain ASCII text files from the PDF. Since the OCR engine is the same one used by Google, it should work pretty well.
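For reference, here is a minimal sketch of the same idea, not the script linked above: Ghostscript renders each page of the PDF to a TIFF image, then Tesseract turns each image into a text file. The function name `ocr_pdf`, the 300 dpi resolution, the `tiffg4` (black-and-white) device and the output folder layout are my own assumptions; it also assumes `gs` and `tesseract` are on your PATH.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper, not the linked script): OCR a scanned
# PDF by rendering every page to a TIFF with Ghostscript, then running
# Tesseract on each page image. Assumes `gs` and `tesseract` are installed;
# 300 dpi and the tiffg4 device are illustrative choices.
ocr_pdf() {
    local pdf="${1:-}"
    local outdir="${2:-${pdf%.pdf}_txt}"
    local tool img

    # Refuse to run without an existing input file.
    if [ -z "$pdf" ] || [ ! -f "$pdf" ]; then
        echo "usage: ocr_pdf file.pdf [output_dir]" >&2
        return 1
    fi

    # Both external tools must be available.
    for tool in gs tesseract; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "ocr_pdf: '$tool' is not installed" >&2
            return 1
        fi
    done

    mkdir -p "$outdir" || return 1

    # Render each page as a 300 dpi black-and-white TIFF:
    # page-001.tif, page-002.tif, ...
    gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r300 \
       -sOutputFile="$outdir/page-%03d.tif" "$pdf" || return 1

    # Tesseract writes <basename>.txt for each image; the TIFF is then
    # deleted so only the text output remains.
    for img in "$outdir"/page-*.tif; do
        tesseract "$img" "${img%.tif}" || return 1
        rm -f "$img"
    done
}
```

After sourcing it, `ocr_pdf scan.pdf` would leave `scan_txt/page-001.txt`, `scan_txt/page-002.txt` and so on, one text file per page.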

A slightly less convenient solution can be found in this Linux.com article, which also uses a shell script based on Tesseract.

Here is another solution that uses other OCR engines.

There also seems to be a potentially elegant GUI solution in OCRFeeder, but I haven't tried it yet. I'll let you know how it works; for now I'm just bookmarking these links.

Tuesday, 8 February 2011

Install True Type fonts on Ubuntu in three steps

It's very easy:

1) Create a .fonts subdirectory in your home directory
cd ~
mkdir .fonts

2) Move the .ttf file into the .fonts directory
mv myfont.ttf ~/.fonts

3) Refresh the font cache
sudo fc-cache -f -v

Your font should now be ready to use.
Credit to Detector Pro; I streamlined the process a bit, since I'm comfortable with the command line.
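If you install fonts often, the three steps above can be wrapped into a small function. This is only a sketch: `install_font` is a hypothetical name, and unlike the steps above it runs `fc-cache` without sudo and only on `~/.fonts`, since that directory belongs to your own user.

```shell
#!/usr/bin/env bash
# Minimal sketch (hypothetical helper): the three steps above as one function.
# Usage: install_font myfont.ttf [other.ttf ...]
install_font() {
    local font
    if [ "$#" -eq 0 ]; then
        echo "usage: install_font font.ttf [more.ttf ...]" >&2
        return 1
    fi

    # 1) Create ~/.fonts (-p: no error if it already exists).
    mkdir -p "$HOME/.fonts" || return 1

    # 2) Move each font file into ~/.fonts.
    for font in "$@"; do
        mv -- "$font" "$HOME/.fonts/" || return 1
    done

    # 3) Refresh the font cache for the user font directory only;
    #    no sudo needed because ~/.fonts is owned by the user.
    if command -v fc-cache >/dev/null 2>&1; then
        fc-cache -f -v "$HOME/.fonts"
    else
        echo "install_font: fc-cache not found; cache not refreshed" >&2
    fi
}
```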