Is there a program that makes PDF docs searchable?
I mean the kind of PDF where the text is a scan (image) of a physical document.
I found this page (below). Does anyone know whether this program will work with PDF docs?
http://packages.trisquel.info/dagda/amd64/graphics/tesseract-ocr
The gImageReader and OCRFeeder front-ends are listed as being able to open PDF files. There may be others too.
http://en.wikipedia.org/wiki/Tesseract_%28software%29#User_interfaces
OCRFeeder is in the Trisquel repository as the package ocrfeeder.
>I found this page (below). Does anyone know whether this program will work with pdf docs?
OCR will likely work with pretty much any file format, as long as the pages can be rendered as images first.
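As far as I know, the tesseract command itself reads image files rather than PDFs, so for a scanned PDF the usual route is to render each page to an image and then OCR those images. Here is a minimal Python sketch of that, assuming the `pdftoppm` tool from poppler-utils and the `tesseract` command (plus the language data you need) are installed. The helper names, the 300 dpi setting and the English default are my own choices for illustration, not anything defined by those packages.

```python
import glob
import os
import subprocess
import tempfile


def render_command(pdf_path: str, prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm (poppler-utils) writes one PNG per page, named <prefix>-N.png.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def ocr_command(image_list: str, out_base: str, lang: str = "eng") -> list[str]:
    # tesseract accepts a text file listing one image path per line; the "pdf"
    # config writes <out_base>.pdf: the page images plus an invisible text layer.
    return ["tesseract", image_list, out_base, "-l", lang, "pdf"]


def page_number(path: str) -> int:
    # "page-12.png" -> 12, so pages sort numerically (page 10 after page 9).
    return int(os.path.basename(path).rsplit("-", 1)[1].split(".")[0])


def make_searchable(pdf_path: str, out_base: str, lang: str = "eng") -> str:
    """Render every page of pdf_path, OCR the images, return the new PDF's path."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(render_command(pdf_path, prefix), check=True)
        pages = sorted(glob.glob(prefix + "-*.png"), key=page_number)
        if not pages:
            raise RuntimeError("pdftoppm produced no page images")
        # Hand all pages to tesseract at once so it builds one multi-page PDF.
        image_list = os.path.join(tmp, "pages.txt")
        with open(image_list, "w") as f:
            f.write("\n".join(pages) + "\n")
        subprocess.run(ocr_command(image_list, out_base, lang), check=True)
    return out_base + ".pdf"
```

For example, `make_searchable("scan.pdf", "scan-searchable")` should leave a `scan-searchable.pdf` that looks like the original scan but lets you search and select the recognised text.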
You could also try the pdftotext command from the poppler-utils package.
'pdftotext' does not do OCR. It only works on documents that were generated by a program (containing real text characters, not images of them).
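That still makes pdftotext handy for checking whether a PDF already has a text layer before bothering with OCR. A rough Python sketch, assuming poppler-utils is installed; the function names and the 20-character cutoff are hypothetical choices, not part of pdftotext:

```python
import subprocess


def extracted_text(pdf_path: str) -> str:
    # Passing "-" as the output file makes pdftotext print to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def needs_ocr(text: str, min_chars: int = 20) -> bool:
    # A pure scan usually yields only whitespace and page-break form feeds,
    # so count the non-whitespace characters and compare with the cutoff.
    return len("".join(text.split())) < min_chars
```

For example, `needs_ocr(extracted_text("scan.pdf"))` should be True for an image-only scan and False for a PDF exported from a word processor.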
Thanks guys, this really helps