Kick out proprietary software and SaaSS: convert PDF to ODT using LibreOffice Writer

7 replies
nadebula.1984
Joined: 05/01/2018

Scroll down the long list of file types in the Open File dialogue and choose PDF - Portable Document Format (Writer). Make sure you see "Writer" in the name of the type; otherwise, the PDF will be opened in Draw. This generally gives better results than converting the PDF to HTML with poppler-utils and then converting the resulting HTML to ODT.

The PDF - Portable Document Format (Writer) option is available in LibreOffice 6.1.5, as shipped in the current Debian stable (buster), and should also be available in the current Trisquel 9. If you are not satisfied with the results, you can try manually installing the latest version of LibreOffice.
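
If you have many PDFs, the same Writer import can probably be scripted with LibreOffice in headless mode. This is a minimal sketch, assuming that `soffice` (or `libreoffice`) is on your PATH and that `writer_pdf_import` is the input filter behind the "PDF - Portable Document Format (Writer)" entry; that filter name is the one commonly used for this, but check it against your LibreOffice version. The helper names are my own.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pdf, outdir, soffice="soffice"):
    """Build the soffice command line for a Writer (not Draw) PDF import."""
    return [
        soffice,
        "--headless",                      # no GUI
        "--infilter=writer_pdf_import",    # assumed name of the Writer PDF import filter
        "--convert-to", "odt",             # save the result as ODT
        "--outdir", str(outdir),
        str(pdf),
    ]


def convert_pdf_to_odt(pdf, outdir="."):
    """Run the conversion and return the expected path of the .odt file."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise FileNotFoundError("LibreOffice (soffice) not found in PATH")
    subprocess.run(build_convert_command(pdf, outdir, soffice), check=True)
    # LibreOffice names the output after the input file's stem.
    return Path(outdir) / (Path(pdf).stem + ".odt")
```

For example, `convert_pdf_to_odt("book.pdf", "out")` should leave `out/book.odt` behind. Note that the import quality is the same as in the GUI, text frames included.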

andyprough
Joined: 02/12/2015

> Scroll down the long list of file types in the Open File dialogue and choose PDF - Portable Document Format (Writer). Be sure that you see "Writer" in the name of the type. Otherwise, the PDF would be opened using Draw. It is generally better than converting PDF to HTML using poppler-utils and subsequently converting resulting HTML to ODT.

I did not know about that - that is a very good import into Writer. Thanks, very useful!

nadebula.1984
Joined: 05/01/2018

I've been doing localization work on another LibreOffice book. I managed to find a downloadable PDF but not an ODT. When I began the work, I used poppler-utils to extract the text from it and then formatted the resulting text using markdown.

Now the markdown file is half finished, and I have just learned this trick from my local community. However, I'd like to finish my markdown version, since I've found that markdown is good enough for preparing documents that are not very complex.

Basically, to prepare a book using markdown, perform the following steps (let's assume that you want to translate a PDF book into your local language, as I always do):

1. Use poppler-utils to extract text and images from the PDF
2. As you translate the main text, type your localized version following markdown syntax
3. Copy all extracted images to a sub-directory and add links to the main text using relative path
4. When you finish, render the resulting markdown file using pandoc and open the resulting HTML in a browser for proofreading
5. If everything is okay, pack the entire directory as a .tar.gz archive and release your localized documentation under a free/libre license
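
The steps above could be glued together roughly like this. It is a sketch under my own assumptions, not the exact commands I use: `pdftotext -layout` and `pdfimages -png` come from poppler-utils, `pandoc -s` produces a standalone HTML page, and the directory layout (`source.txt`, `images/img-NNN.png`) and helper names are hypothetical.

```python
import subprocess
import tarfile
from pathlib import Path


def extract_commands(pdf, workdir):
    """Step 1: poppler-utils commands that pull text and images out of the PDF."""
    workdir = Path(workdir)
    return [
        ["pdftotext", "-layout", str(pdf), str(workdir / "source.txt")],
        # pdfimages writes images/img-000.png, images/img-001.png, ...
        ["pdfimages", "-png", str(pdf), str(workdir / "images" / "img")],
    ]


def run_extraction(pdf, workdir):
    """Create the images sub-directory (pdfimages won't) and run step 1."""
    (Path(workdir) / "images").mkdir(parents=True, exist_ok=True)
    for cmd in extract_commands(pdf, workdir):
        subprocess.run(cmd, check=True)


def image_link(alt, image_name, subdir="images"):
    """Step 3: a markdown image link using a relative path."""
    return f"![{alt}]({subdir}/{image_name})"


def render_command(markdown, html):
    """Step 4: pandoc renders the markdown to standalone HTML for proofreading."""
    return ["pandoc", "-s", str(markdown), "-o", str(html)]


def pack(directory, archive):
    """Step 5: pack the whole directory (text plus images) as .tar.gz."""
    directory = Path(directory)
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(directory, arcname=directory.name)
    return archive
```

Keeping the images in a sub-directory and linking them relatively means the archive can be unpacked anywhere and the HTML still finds its pictures.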

lutes
Joined: 09/04/2020

> I learn this trick from my local community.

Thanks for sharing anyway. Did they find a way to overcome the curse of the text frame monster? I have now tried a few other PDF sources and I always get these unusable text frames. Maybe they have found a way to merge them into one large text frame?

I have a feeling that your poppler-utils technique is still the recommended one.

lutes
Joined: 09/04/2020

I would rate this: great feature, work in progress.

With the first PDF I tried to convert, I got one text frame per line of the original document. Can you convert text frames into body text in bulk? I doubt it.
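
One possible workaround outside LibreOffice: save the imported document as .odt and pull the text out of every frame yourself. This is a hypothetical helper, not a LibreOffice feature, and it assumes that each imported frame is a `draw:frame` in `content.xml` carrying `svg:x`, `svg:y` and `text:anchor-page-number` attributes (unzip your .odt and look at `content.xml` to confirm). It orders frames by page, then top to bottom, then left to right.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespaces used below.
NS = {
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Conversion factors to centimetres for common ODF length units.
UNIT_TO_CM = {"cm": 1.0, "mm": 0.1, "in": 2.54, "pt": 2.54 / 72, "pc": 2.54 / 6}


def to_cm(length):
    """Convert an ODF length such as '2.5cm' or '0.75in' to centimetres."""
    m = re.fullmatch(r"(-?[\d.]+)([a-z]+)", (length or "0cm").strip())
    if not m:
        return 0.0
    return float(m.group(1)) * UNIT_TO_CM.get(m.group(2), 1.0)


def frames_to_text(odt_path):
    """Return the text of all frames in an .odt, ordered by page, top, left."""
    with zipfile.ZipFile(odt_path) as z:
        root = ET.fromstring(z.read("content.xml"))
    frames = []
    for frame in root.iter(f"{{{NS['draw']}}}frame"):
        page = int(frame.get(f"{{{NS['text']}}}anchor-page-number", "0"))
        y = to_cm(frame.get(f"{{{NS['svg']}}}y"))
        x = to_cm(frame.get(f"{{{NS['svg']}}}x"))
        # Concatenate all paragraphs inside the frame (spans included).
        paras = ["".join(p.itertext()) for p in frame.iter(f"{{{NS['text']}}}p")]
        text = "\n".join(paras).strip()
        if text:
            frames.append((page, y, x, text))
    frames.sort(key=lambda f: f[:3])  # page, then vertical, then horizontal
    return "\n".join(f[3] for f in frames)
```

Because it sorts by vertical position first, two-column pages will come out interleaved, and nested frames may repeat text; the output is plain text that still needs the line clean-up discussed in this thread.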

The second one consumed all 4 GB of my RAM and started eating into swap until the system became almost unresponsive. This had not happened for so long that I had forgotten it could.

For now I'll stick with good old copy-pasting and reformatting the lines; that way the text at least comes out as plain text. Chasing CR/LF marks is only one Search & Replace trick away.
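
The CR/LF chasing can also be done in one pass on the copied text. A small sketch of my own, assuming blank lines mark real paragraph breaks and every other line break is just the PDF's line wrapping; a line ending in "-" followed by a lowercase word is de-hyphenated.

```python
import re


def unwrap_lines(text):
    """Join hard-wrapped lines, keeping blank lines as paragraph breaks."""
    text = text.replace("\r\n", "\n")          # normalise CR/LF to LF
    paragraphs = re.split(r"\n\s*\n", text)     # blank line = paragraph break
    out = []
    for para in paragraphs:
        lines = [l.strip() for l in para.split("\n") if l.strip()]
        joined = ""
        for line in lines:
            if joined.endswith("-") and line[:1].islower():
                joined = joined[:-1] + line     # "hyph-" + "enated" -> "hyphenated"
            elif joined:
                joined += " " + line            # ordinary wrap: join with a space
            else:
                joined = line
        if joined:
            out.append(joined)
    return "\n\n".join(out)
```

The de-hyphenation is blind, so a genuine compound split at a line end ("well-" / "known") becomes "wellknown"; drop that branch if your source has many such words.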

andyprough
Joined: 02/12/2015

> The second one absorbed all my 4GB RAM and started eating at SWAP until the system became almost unresponsive. This had not happened for so long I had forgotten it could.

Hahaha, the LibreOffice monster tried to eat you alive! Like the monster girl from the horror movie 'The Ring' crawling out of the television to come and attack you.

lutes
Joined: 09/04/2020

Spooky. I had to run swapoff to drain the remains of the monster away.

GNUbahn
Joined: 02/18/2016

> I would rate this: great feature, work in progress.

I tend to agree. Unless there is a function I am unaware of, can all those separate text frames be combined?