Is there a program that makes PDF docs searchable?

4 replies
muhammed
Offline
Joined: 04/13/2013

I mean the kind of PDF doc where the text is a scan (image) of a physical text document.

I found this page (below). Does anyone know whether this program will work with PDF docs?

http://packages.trisquel.info/dagda/amd64/graphics/tesseract-ocr

Platypus333
Offline
Joined: 12/10/2010

The gImageReader and OCRFeeder front-ends are listed as able to open PDF files. There may be others too.

http://en.wikipedia.org/wiki/Tesseract_%28software%29#User_interfaces

OCRFeeder is in the Trisquel repository as the package ocrfeeder.
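
For anyone who prefers the command line to a front-end, here is a rough Python sketch of the same idea: render each PDF page to an image with pdftoppm (from poppler-utils), then run tesseract on each image with its "pdf" output config so the result carries a searchable text layer. This is only a sketch under assumptions: it assumes both tools are installed and that the Tesseract version is recent enough to support PDF output (3.03 or later, as far as I know; older packages may not have it). The function names, the "page" prefix, and the 300 dpi default are my own illustrative choices.

```python
import glob
import os
import subprocess


def rasterize_cmd(pdf_path, prefix, dpi=300):
    """Command that renders every page of the PDF to <prefix>-<n>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_cmd(image_path, out_base, lang="eng"):
    """Command that OCRs one image and writes <out_base>.pdf (searchable)."""
    return ["tesseract", image_path, out_base, "-l", lang, "pdf"]


def ocr_pdf(pdf_path, workdir):
    """Rasterize a scanned PDF and OCR each page; returns the per-page PDFs."""
    prefix = os.path.join(workdir, "page")
    subprocess.run(rasterize_cmd(pdf_path, prefix), check=True)
    outputs = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(glob.glob(prefix + "-*.png")):
        out_base = os.path.splitext(image)[0]
        subprocess.run(ocr_cmd(image, out_base), check=True)
        outputs.append(out_base + ".pdf")
    return outputs
```

Calling ocr_pdf("scan.pdf", "/tmp/ocr") would leave one searchable PDF per page in that (already existing) directory; joining them back into a single file is a separate step.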

lembas
Offline
Joined: 05/13/2010

>I found this page (below). Does anyone know whether this program will work with pdf docs?

OCR will likely work with pretty much any file format.

You could also try the pdftotext command from the poppler-utils package.

Magic Banana

Offline
Joined: 07/24/2010

'pdftotext' does not do OCR. It only works on documents that were produced by a program (real characters, not images of them).
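
That said, pdftotext is handy for checking which kind of PDF you have. A minimal Python sketch, assuming pdftotext from poppler-utils is on the PATH: if it extracts next to nothing, the file is probably a scan and needs OCR. The function names and the 20-character threshold are arbitrary choices of mine, not anything pdftotext defines.

```python
import subprocess


def extract_text(pdf_path):
    """Run pdftotext; the '-' argument sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def looks_scanned(extracted, min_chars=20):
    """Guess that a PDF is image-only if almost no visible text came out.

    Whitespace (including the form feeds pdftotext puts between pages)
    is ignored. The threshold is a heuristic, not a guarantee.
    """
    visible = "".join(extracted.split())
    return len(visible) < min_chars
```

Usage would be something like looks_scanned(extract_text("scan.pdf")). A scan with a few stray characters embedded could still fool it, so treat the answer as a hint.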

muhammed
Offline
Joined: 04/13/2013

Thanks guys, this really helps