Submitted by Luck-02 on Fri, 02/18/2022 - 20:31

Processing information

(This page is a work in progress.)

Text-processing commands

The Unix operating system came with several text-processing commands that are still very useful today: head, tail, cat, tr, wc, cut, paste, comm, join, sort, uniq, grep, etc.

Each of these commands does one specific job, and does it very efficiently. The GNU project has improved them considerably (e.g., by adding options). The original commands are standardized by POSIX.
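
To give a feel for how these small commands combine, here is a rough Python analogue of the classic word-frequency pipeline tr -cs '[:alpha:]' '\n' < file | sort | uniq -c | sort -rn | head -n 3. It is only a sketch for illustration (the function name top_words is hypothetical), not a substitute for the commands: Python's regular expressions treat non-ASCII letters as letters whereas GNU tr works byte by byte, and words with equal counts may come out in a different order than with sort -rn.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent words with their counts, most frequent first."""
    # tr -cs '[:alpha:]' '\n': keep only runs of letters (one word per match)
    words = re.findall(r"[^\W\d_]+", text)
    # sort | uniq -c: count how often each distinct word occurs
    counts = Counter(words)
    # sort -rn | head -n N: highest counts first, keep the first n
    return counts.most_common(n)
```

For example, top_words("the cat and the dog and the bird", 2) returns [('the', 3), ('and', 2)].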

sed and awk are less specialized than the commands listed above, but they are extremely powerful for processing text.
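
As a taste of what awk does, the one-liner awk -F, '{ s[$1] += $2 } END { for (k in s) print k, s[k] }' sums the second comma-separated field for each distinct first field. Below is a rough Python analogue; the function name sum_by_key is hypothetical, and the sketch assumes the second field is numeric (awk is more lenient and converts non-numeric text to 0).

```python
from collections import defaultdict


def sum_by_key(lines, sep=","):
    """Sum the second field of each line, grouped by the first field."""
    totals = defaultdict(float)  # awk variables start at 0 (numbers are floats)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines (awk would record an empty key instead)
        fields = line.split(sep)
        # like awk, a missing second field counts as 0
        value = fields[1] if len(fields) > 1 else "0"
        totals[fields[0]] += float(value)  # assumes the field is numeric
    return dict(totals)
```

For instance, sum_by_key(["a,1", "b,2", "a,3"]) returns {'a': 4.0, 'b': 2.0}. Unlike awk's for (k in s) loop, whose order is unspecified, the dictionary keeps keys in first-seen order.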

Besides their 'info' manuals, introductory material on all these commands is widely available on the Web. For instance, slide sets 3 to 7 at http://dcc.ufmg.br/~lcerf/en/mda.html#slides present these commands (with exercises) and let you learn their basics within a few hours.

Commands to process PDFs

The packages "poppler-utils" and "pdfjam" provide several commands to process PDFs (e.g., to concatenate several PDFs into a single document, to extract specific pages, to view the metadata, or to get the content as plain text).

Like any other commands, these can be used inside scripts. Following this forum thread, a script was written that extracts, from PDF documents, the pages matching given regular expressions (simple strings, for example): http://dcc.ufmg.br/~lcerf/en/utilities.html#pdf-page-grep
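
The idea behind such a script can be sketched in Python, assuming poppler-utils' pdftotext is installed. This is not the script at the URL above, only an illustration of the approach: convert the whole PDF to text once (pdftotext writes to standard output when the output file name is "-", and ends every page with a form-feed character), then report the numbers of the pages matching any of the regular expressions. The names pdf_text and matching_pages are hypothetical. The sketch only finds page numbers; extracting those pages into a new PDF could then be done with another tool, e.g., pdfjam.

```python
import re
import subprocess
import sys


def pdf_text(path: str) -> str:
    """Return the text of a PDF, via poppler-utils' pdftotext (must be installed)."""
    # "-" as the output file makes pdftotext write to standard output
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout


def matching_pages(text: str, patterns: list[str]) -> list[int]:
    """Return the 1-based numbers of the pages matching at least one pattern."""
    regexes = [re.compile(p) for p in patterns]
    pages = text.split("\f")  # pdftotext ends each page with a form feed
    if pages[-1] == "":
        pages.pop()  # drop the empty piece after the last form feed
    return [number for number, page in enumerate(pages, start=1)
            if any(rx.search(page) for rx in regexes)]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # hypothetical usage: python pdf_page_grep_sketch.py FILE.pdf REGEX [REGEX ...]
    print(*matching_pages(pdf_text(sys.argv[1]), sys.argv[2:]))
```

Note that pdftotext may break a phrase across lines, so a pattern containing a space can miss a match that spans a line break.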

Lucas Westermann (Full Circle Magazine) wrote a pedagogical article about Professor Loic Cerf's 'pdf-page-grep'. The article appeared in issue 89 (pages 10–11) of the magazine: http://dl.fullcirclemagazine.org/issue89_en.pdf
