Is there a program that makes PDF docs searchable?

4 replies [Last post]
muhammed
Offline
joined: 04/13/2013

I mean the kind of PDF doc where the text is a scan (an image) of a physical text document.

I found this page (below). Does anyone know whether this program will work with pdf docs?

http://packages.trisquel.info/dagda/amd64/graphics/tesseract-ocr

Platypus333
Offline
joined: 12/10/2010

The gImageReader and OCRFeeder front-ends are listed as opening pdf filetypes. There may be others too.

http://en.wikipedia.org/wiki/Tesseract_%28software%29#User_interfaces

OCRFeeder is in the Trisquel repository as the package ocrfeeder.
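
If you would rather script it than use a front-end, here is a rough sketch of the usual two-step approach: render each PDF page to an image, then run Tesseract on each image. It assumes the pdftoppm tool (from poppler-utils) and the tesseract binary are both installed and that your pdftoppm supports PNG output; the function names, the 300 dpi default, and the "page" file prefix are my own choices for illustration, not anything from the packages themselves.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(pdf_path, out_prefix, dpi=300):
    """Build the pdftoppm command that renders every page to a PNG.

    pdftoppm names the files <out_prefix>-<page number>.png.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_cmd(image_path, lang="eng"):
    """Build the tesseract command for one page image.

    tesseract takes an output *base* name and adds ".txt" itself.
    """
    out_base = str(Path(image_path).with_suffix(""))
    return ["tesseract", image_path, out_base, "-l", lang]


def ocr_pdf(pdf_path, workdir, lang="eng"):
    """Render a scanned PDF into workdir, OCR each page, return the text."""
    prefix = str(Path(workdir) / "page")
    subprocess.run(rasterize_cmd(pdf_path, prefix), check=True)

    texts = []
    # pdftoppm zero-pads page numbers, so a plain sort keeps page order.
    for image in sorted(Path(workdir).glob("page-*.png")):
        subprocess.run(ocr_cmd(str(image), lang), check=True)
        texts.append(image.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(texts)
```

Calling `ocr_pdf("scan.pdf", "/tmp/ocr-work")` (with an existing work directory) would give you plain text that you can search with grep. Note that this does not put a text layer back into the PDF itself. Newer Tesseract releases can also write a searchable PDF directly, but check whether the version packaged for your release supports that before relying on it.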

lembas
Offline
joined: 05/13/2010

>I found this page (below). Does anyone know whether this program will work with pdf docs?

OCR works on images, so it will likely work with pretty much any file format that can be rendered to one.

You could also try the pdftotext command from the poppler-utils package.

Magic Banana

Offline
joined: 07/24/2010

'pdftotext' does not do OCR. It only works on documents generated by a program, which contain real characters rather than images of them.
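
That limitation is actually handy as a quick test for whether a PDF needs OCR at all: run pdftotext on it and see whether anything readable comes out. A minimal sketch, assuming pdftotext is on the PATH; the function names and the 20-character threshold are hypothetical choices of mine, and the check is only a heuristic (a scan with a stamped page number, for instance, might fool it).

```python
import shutil
import subprocess


def has_text_layer(extracted_text, min_chars=20):
    """Heuristic: a pure scan yields (almost) no non-whitespace characters.

    str.split() with no arguments also discards the form feeds that
    pdftotext emits between pages.
    """
    visible = "".join(extracted_text.split())
    return len(visible) >= min_chars


def extract_text(pdf_path):
    """Return what pdftotext finds; "-" sends the output to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install poppler-utils")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def needs_ocr(pdf_path, extractor=extract_text):
    """True if the PDF appears to have no embedded text."""
    return not has_text_layer(extractor(pdf_path))
```

If `needs_ocr("scan.pdf")` returns True, the file is probably images only and has to go through an OCR program; if it returns False, the PDF should already be searchable as it is.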

muhammed
Offline
joined: 04/13/2013

Thanks guys, this really helps.