Is there a program that makes PDF docs searchable?

muhammed
Joined: 04/13/2013

I mean the kind of PDF where the text is a scan (an image) of a physical document.

I found this page (below). Does anyone know whether this program works with PDF files?

http://packages.trisquel.info/dagda/amd64/graphics/tesseract-ocr
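Tesseract itself reads image files (the formats Leptonica can load) rather than PDFs, so a common workaround is to rasterize each page first. Below is a minimal Python sketch, assuming `pdftoppm` (from poppler-utils) and `tesseract` are both installed and on the PATH; the function names and the 300 dpi default are illustrative choices, not part of either tool. It renders every page to PNG, hands tesseract a text file listing the page images (one per line), and asks for `pdf` output, which produces a PDF whose pages are the original images with an invisible, searchable text layer.

```python
import glob
import os
import subprocess
import sys
import tempfile


def rasterize_cmd(pdf_path, out_prefix, dpi=300):
    """pdftoppm (poppler-utils) command: render each page to out_prefix-N.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_cmd(image_list_path, out_base):
    """tesseract command: OCR every image named in image_list_path (one per
    line) and write out_base + '.pdf' with an invisible text layer."""
    return ["tesseract", image_list_path, out_base, "pdf"]


def make_searchable(pdf_path, out_base, dpi=300):
    """Hypothetical helper: scanned PDF in, searchable PDF out."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(rasterize_cmd(pdf_path, prefix, dpi), check=True)

        # pdftoppm zero-pads page numbers to a common width,
        # so a plain sort keeps the pages in order.
        pages = sorted(glob.glob(prefix + "-*.png"))
        if not pages:
            raise RuntimeError("pdftoppm produced no page images")

        # tesseract treats a non-image input file as a list of image paths.
        list_path = os.path.join(tmp, "pages.txt")
        with open(list_path, "w") as f:
            f.write("\n".join(pages) + "\n")

        subprocess.run(ocr_cmd(list_path, out_base), check=True)
    return out_base + ".pdf"


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python make_searchable.py scan.pdf scan-ocr  ->  writes scan-ocr.pdf
    print(make_searchable(sys.argv[1], sys.argv[2]))
```

The temporary page images are deleted once tesseract has finished; only the searchable PDF is kept. Recognition quality depends heavily on the scan, so the output text layer may still contain errors.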

Platypus333
Joined: 12/10/2010

The gImageReader and OCRFeeder front-ends are listed as able to open PDF files. There may be others, too.

http://en.wikipedia.org/wiki/Tesseract_%28software%29#User_interfaces

OCRFeeder is in the Trisquel repository as the package ocrfeeder.

lembas
Joined: 05/13/2010

>I found this page (below). Does anyone know whether this program will work with pdf docs?

OCR will likely work with pretty much any file format, as long as the pages can be converted to images first.

You could also try the pdftotext command from the poppler-utils package.

Magic Banana
Joined: 07/24/2010

'pdftotext' does not do OCR. It only works on documents produced by a program (ones containing real characters, not images of them).
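Because pdftotext only recovers a text layer that already exists, running it is also a quick way to tell whether a PDF is a scan. A rough Python sketch, assuming poppler-utils is installed; `extract_text`, `needs_ocr` and the 20-character threshold are illustrative choices, not part of poppler:

```python
import subprocess
import sys


def extract_text(pdf_path):
    """Run pdftotext (poppler-utils); '-' as the output file sends the
    text to stdout instead of writing a .txt file."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def needs_ocr(text, min_chars=20):
    """Heuristic: if almost no non-whitespace characters come back, the pages
    are probably images, so pdftotext cannot help and OCR is needed.
    min_chars is an arbitrary, illustrative threshold."""
    return len("".join(text.split())) < min_chars


if __name__ == "__main__" and len(sys.argv) == 2:
    text = extract_text(sys.argv[1])
    print("needs OCR" if needs_ocr(text) else "already has a text layer")
```

A PDF that already has real text is searchable as it is; one that yields (almost) nothing is a candidate for OCR with one of the tools mentioned in this thread.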

muhammed
Joined: 04/13/2013

Thanks, guys. This really helps!