One more sed example (should be awk?)

3 replies [Last post]
Avron

I am a translator!

Online
Joined: 08/18/2020

I have a text file with things like

Some text.
-- ASN1START
nice ASN.1

more nice ASN.1

-- ASN1STOP
Some text.
More text.
-- ASN1START
other nice ASN.1
yet other nice ASN.1
-- ASN1STOP

I'd like to get only the lines between -- ASN1START and -- ASN1STOP.

After 1.5 hours of searching through the sed manual, I found section 6.3 on multiline techniques and was happy to see that the following seems to work: sed '/^-- ASN1STOP/!{H;d} ; x ; s/^.*-- ASN1START[^\n]*\(.*\)/\1/'

In the example in 6.3, there is /./{H;$!d}. The condition /./ is met on every line, so here it makes no difference. But if the address were something other than /./, one that is not met on every line, would the d command be executed only when both that address and $! are met, or whenever $! is met?
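
A quick experiment can settle this kind of question. As far as I can tell, the commands inside { } only run for lines selected by the address in front of the block, and $!d additionally requires that the line is not the last one, so both conditions must hold. The sketch below uses made-up input (the letters a, b, c) and made-up variable names (nested, last):

```shell
# Hypothetical input; only "b" matches the outer address /b/.
# "a" and "c" skip the whole block and are printed as usual,
# "b" matches and is not the last line, so it is held and deleted.
nested=$(printf 'a\nb\nc\n' | sed '/b/{H;$!d;}')
printf '%s\n' "$nested"    # prints "a" and "c"

# Here "b" matches the address but is the last line,
# so $!d does not apply and "b" is printed.
last=$(printf 'a\nb\n' | sed '/b/{H;$!d;}')
printf '%s\n' "$last"      # prints "a" and "b"
```

In the first run, "a" is not the last line and yet survives, which suggests that $! alone is not enough to trigger d.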

Also, perhaps there is something simpler than that, with sed or with something else, awk maybe?

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

Using -n (to suppress automatic printing), a selection based on an interval of regular expressions and p to print is simpler:
$ sed -n '/^-- ASN1START$/,/^-- ASN1STOP$/p'
Selection based on an interval of regular expressions works as well with AWK:
$ awk '/^-- ASN1START$/,/^-- ASN1STOP$/'

If you do not want the lines "-- ASN1START" and "-- ASN1STOP", you can remove them by piping the output to grep:
$ ... | grep -Evxe '-- ASN1ST(ART|OP)'
I assumed "-- ASN1START" and "-- ASN1STOP" must be whole lines.
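
For illustration, here is the whole pipeline on a small input modelled on the excerpt from the first post. The variable names sample and blocks are made up for this sketch, and the delimiters are assumed to be whole lines, as stated above:

```shell
# Sample input modelled on the excerpt from the first post.
sample='Some text.
-- ASN1START
nice ASN.1
more nice ASN.1
-- ASN1STOP
Some text.
-- ASN1START
other nice ASN.1
-- ASN1STOP'

# Keep the delimited blocks, then drop the delimiter lines themselves.
blocks=$(printf '%s\n' "$sample" |
  sed -n '/^-- ASN1START$/,/^-- ASN1STOP$/p' |
  grep -Evxe '-- ASN1ST(ART|OP)')
printf '%s\n' "$blocks"
```

Only the three ASN.1 lines should remain; both "Some text." lines fall outside the ranges and are never printed by sed.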

Another simple solution in AWK is to redefine the record separator (RS variable) and print even-numbered records:
$ awk -v RS='\n-- ASN1ST(ART|OP)\n' 'NR % 2 - 1'
That solution only works if "-- ASN1START" and "-- ASN1STOP" strictly alternate (no two consecutive "-- ASN1START" or two consecutive "-- ASN1STOP"). Also, the first line cannot be "-- ASN1START", because the record separator starts with a newline character.
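
If those assumptions are a concern, a flag-based variant is a common alternative. It is a different idiom, not one of the solutions above, and the variable names inside and flagged are made up for this sketch. It uses only plain awk features and does not need a regular expression in RS (which, as far as I know, is an extension offered by gawk and mawk rather than something POSIX guarantees):

```shell
# Input starting directly with "-- ASN1START",
# the case the RS-based command cannot handle.
flagged=$(printf '%s\n' '-- ASN1START' 'nice ASN.1' '-- ASN1STOP' 'Some text.' \
  '-- ASN1START' 'other nice ASN.1' '-- ASN1STOP' |
  awk '/^-- ASN1STOP$/  { inside = 0 }  # leave the block before printing
       inside                           # print lines while inside a block
       /^-- ASN1START$/ { inside = 1 }  # enter the block after the delimiter
      ')
printf '%s\n' "$flagged"
```

Because the flag is cleared before the print rule and set after it, neither delimiter line is printed, so no extra grep is needed.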

Avron


Thanks, your examples with sed are definitely simpler.

I missed that an address range could be made of two regular expressions. I could not find any example of that in the manual, but the possibility is clearly mentioned in the text (in 4.4).

As for awk, I had forgotten that the record separator could be a regular expression rather than a fixed string. Regarding the command, I guess you meant $ awk -v RS='-- ASN1ST(ART|OP)\n' 'FNR %2 == 0'
(in my file, the wanted blocks are actually the even-numbered records).

EDIT: You were probably editing your message when I tried; your example as it stands now is fine. I am not sure what the difference between NR and FNR is.
EDIT 2: I found it: FNR is reset at the start of each input file while NR is not, so with a single input file it makes no difference.

Magic Banana


Exactly. As a consequence, whether it is better to use FNR or NR depends on whether the files are independent or continuations of each other (here with "-- ASN1START" towards the end of a file and "-- ASN1STOP" towards the beginning of the next file). I further edited my previous post: I added that selection based on an interval of regular expressions works as well with AWK and stressed the additional assumptions for the last solution to work.
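
The difference between NR and FNR discussed above can be checked with two throwaway files. The file names first.txt and second.txt and the variable names tmpdir and counters are hypothetical, chosen only for this sketch:

```shell
# Two throwaway files of two lines each (the names are hypothetical).
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\n' > "$tmpdir/second.txt"

# Print both counters for every record: NR keeps counting across files,
# FNR starts over with each new file.
counters=$(awk '{ print NR, FNR }' "$tmpdir/first.txt" "$tmpdir/second.txt")
printf '%s\n' "$counters"
rm -r "$tmpdir"
```

NR runs from 1 to 4 while FNR goes 1, 2, 1, 2, so a test such as 'FNR % 2 == 0' treats each file independently, whereas 'NR % 2 == 0' treats the files as one continuous stream.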