Is there a program that makes pdf docs searchable?

muhammed
Joined: 04/13/2013

I mean the kind of pdf doc where the text is a scanned image of a physical document.

I found this page (below). Does anyone know whether this program will work with pdf docs?

http://packages.trisquel.info/dagda/amd64/graphics/tesseract-ocr

Platypus333
Joined: 12/10/2010

The gImageReader and OCRFeeder front-ends for Tesseract are listed as able to open pdf files. There may be others too.

http://en.wikipedia.org/wiki/Tesseract_%28software%29#User_interfaces

OCRFeeder is in the Trisquel repository as the package ocrfeeder.
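If you would rather script it than use a front-end, here is a rough sketch of one common command-line approach. The tesseract command-line tool reads images rather than PDFs, so the idea is to render each page to an image, OCR each image into a one-page PDF with an invisible text layer, then join the pages. It assumes pdftoppm and pdfunite (both from poppler-utils) and Tesseract 3.03 or later (which added the "pdf" output config) are installed; the function names and the "page" file prefix are just made up for this example.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def rasterize_cmd(pdf: str, prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm renders every page to <prefix>-N.png at the given resolution
    return ["pdftoppm", "-r", str(dpi), "-png", pdf, prefix]


def ocr_cmd(image: str, outbase: str) -> list[str]:
    # the "pdf" config makes tesseract write <outbase>.pdf with a hidden text layer
    return ["tesseract", image, outbase, "pdf"]


def merge_cmd(pages: list[str], out: str) -> list[str]:
    # pdfunite concatenates the one-page PDFs into the final document
    return ["pdfunite", *pages, out]


def page_number(name: str) -> int:
    # "page-7.png" or "page-07.png" -> 7, so pages sort numerically
    return int(Path(name).stem.rsplit("-", 1)[1])


def make_searchable(pdf: str, out: str, workdir: str) -> None:
    """Rasterize, OCR each page, then merge; needs the three tools on PATH."""
    for tool in ("pdftoppm", "tesseract", "pdfunite"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, str(work / "page")), check=True)
    images = sorted(work.glob("page-*.png"), key=lambda p: page_number(p.name))
    page_pdfs = []
    for image in images:
        outbase = str(image.with_suffix(""))  # tesseract appends ".pdf" itself
        subprocess.run(ocr_cmd(str(image), outbase), check=True)
        page_pdfs.append(outbase + ".pdf")
    if not page_pdfs:
        raise RuntimeError("pdftoppm produced no page images")
    subprocess.run(merge_cmd(page_pdfs, out), check=True)
```

Something like make_searchable("scan.pdf", "scan-searchable.pdf", "/tmp/ocr-work") should then give a copy you can search and copy text from (the file names are only examples). 300 dpi is a typical choice for OCR, but you may need to adjust it and add a language option for non-English scans.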

lembas
Joined: 05/13/2010

>I found this page (below). Does anyone know whether this program will work with pdf docs?

OCR will likely work with pretty much any file format.

You could also try the pdftotext command from the poppler-utils package.

Magic Banana

Joined: 07/24/2010

'pdftotext' does not do OCR. It only works on documents created by a program that stores real characters, not images of them.
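That difference can still be useful: running pdftotext first tells you whether a file already has real text or is only scanned images and needs OCR. A minimal sketch, assuming pdftotext from poppler-utils is installed; passing "-" as the output file makes it print to stdout. The function names and the 20-character threshold are arbitrary choices for this example, not anything pdftotext defines.

```python
from __future__ import annotations

import subprocess


def extract_text_cmd(pdf: str) -> list[str]:
    # "-" as the output file sends the extracted text to stdout
    return ["pdftotext", pdf, "-"]


def has_text_layer(extracted: str, min_chars: int = 20) -> bool:
    # Heuristic: a scan-only PDF yields little more than whitespace and
    # form feeds, so count the non-whitespace characters that came back.
    visible = sum(1 for ch in extracted if not ch.isspace())
    return visible >= min_chars


def pdf_has_text(pdf: str, min_chars: int = 20) -> bool:
    result = subprocess.run(
        extract_text_cmd(pdf), capture_output=True, text=True, check=True
    )
    return has_text_layer(result.stdout, min_chars)
```

If pdf_has_text("scan.pdf") comes back False, the file most likely needs OCR before it can be searched.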

muhammed
Joined: 04/13/2013

Thanks guys, this really helps.