Separating IP addresses in a mixed list of hostnames and addresses

3 replies [Last post]
amenex
Offline
Joined: 01/04/2015

Staring at a supply of a couple hundred files with mixed IP addresses and hostnames,
I'd like to separate the pure addresses (never gratuitously converted to PTRs by the
naive servers at Internet Service Providers) from the converted hostnames in the two
hundred files, some of them thousands of lines in length.

IPv6 addresses are easy, as they are the only strings containing colons.
Hostnames usually contain letters of the alphabet, so it should be easy
to invert the grep selection process to collect names that contain no
letters, but I haven't been able to find a grep syntax encompassing all the
uppercase and lowercase letters ...
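
The colon test for IPv6 addresses can be sketched as below. This is only an illustration, assuming one entry per line; the helper names and the sample entries are made up and are not taken from the files discussed here.

```shell
# Hypothetical helpers; input is assumed to hold one entry per line.
# Entries containing a colon are treated as IPv6 addresses.
only_ipv6() { grep ':' "$@"; }
# Everything else (IPv4 addresses and hostnames) passes through.
without_ipv6() { grep -v ':' "$@"; }

# Made-up sample data for illustration only.
sample='2001:db8::1
192.0.2.7
host-192-0-2-7.example.net'

printf '%s\n' "$sample" | only_ipv6
printf '%s\n' "$sample" | without_ipv6
```

Both helpers also accept file names as arguments, e.g. only_ipv6 Mixed_List.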

However, this link helps:
https://www.shellhacks.com/regex-find-ip-addresses-file-grep/

Applied in the present context:
grep -E -o "([0-9]{1,3}[\.]){3}[0-9]{1,3}" Redacted/MonthlyCVs/Redacted-2014-12.txt > Redacted/IPv4s/Redacted-2014-12.txt

However, when I check the output against the original lists, there seem to be fewer
than half as many IPv4s in the original list as in the list the script finds,
meaning that the script is also extracting IPv4s from inside the hostnames, which is
not what I intend. Confirming that with grep:
grep -f Redacted/IPv4s/Redacted-2014-12.txt Redacted/No-IPv6s/Redacted-2014-12.txt > Temp-03312021-C01.txt
which produces a list of the same number of items as the first grep script, containing mixed IPv4's and hostnames.
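
The over-matching can be reproduced on a small made-up sample, which may make the cause easier to see: -o prints every matching substring, including digits embedded in a hostname, whereas -x only accepts lines that match from start to end. The sample entries below are assumptions for illustration, and the pattern is the one quoted above (it still accepts out-of-range values such as 999.999.999.999).

```shell
# The pattern from the post, stored in a variable.
ipv4_re='([0-9]{1,3}[\.]){3}[0-9]{1,3}'

# Made-up sample data: one address, one hostname embedding an address,
# and one hostname without digits.
sample='192.0.2.7
7.2.0.192.example.net
mail.example.org'

# -o prints each matching substring, so the second line yields "7.2.0.192".
printf '%s\n' "$sample" | grep -E -o "$ipv4_re"

# -x keeps only lines that consist entirely of the pattern.
printf '%s\n' "$sample" | grep -E -x "$ipv4_re"
```

With -x in place of -o, only the first sample line is printed.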

This brings me back to my original task: List the elements of the file (with no
IPv6's) that have no letters in them.
An inverse grep such as this:
grep -v [a A ...z Z] Mixed_List > IPv4-only_List
In all my notes I'm not finding that syntax.

George Langford

Magic Banana

I am a member!

Offline
Joined: 07/24/2010

As always, your problem is not clearly specified: please show us an excerpt of the input and the corresponding expected output.

Assuming the strings to analyze are separated by horizontal and/or vertical whitespace, this may be what you want ("[file] ..." stands for your input files):
$ awk -F . 'BEGIN { RS = "[[:space:]]+" } NF == 4 { while (++i != 5 && $i >= 0 && $i < 256); } i == 5 { i = 0; print }' [file] ...

Magic Banana

I am a member!

Offline
Joined: 07/24/2010

Given what you tried, you apparently have every IPv4 address on a separate line. The BEGIN block could then be removed. More importantly, I forgot to reset i for lines with four dot-delimited fields that are not all numbers between 0 and 255:
$ awk -F . 'NF == 4 { while (++i != 5 && $i >= 0 && $i < 256); if (i == 5) print; i = 0 }' [file] ...
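
For anyone who wants to try the same idea on test data, here is a slightly stricter variant, offered only as a sketch and not as the command above: it also requires each of the four fields to consist of digits only, since a field such as 1a would, as far as I can tell, be compared as a string rather than as a number. The function name ipv4_lines is hypothetical, and one entry per line is assumed.

```shell
# Hypothetical helper: print only lines made of four dot-separated
# decimal numbers, each between 0 and 255. Assumes one entry per line.
ipv4_lines() {
    awk -F . '
        NF == 4 {
            ok = 1
            for (i = 1; i <= 4; i++)
                # Digits only, then the numeric upper bound.
                if ($i !~ /^[0-9]+$/ || $i + 0 > 255) ok = 0
            if (ok) print
        }' "$@"
}

# Made-up sample data for illustration only.
printf '%s\n' 192.0.2.7 300.1.1.1 1a.2.3.4 mail.example.org 10.0.0.1 | ipv4_lines
```

Only 192.0.2.7 and 10.0.0.1 pass on this sample; the function also takes file names, e.g. ipv4_lines [file] ...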

amenex
Offline
Joined: 01/04/2015

Setting aside Magic Banana's comments for the moment, I managed to get past this impasse:

This brings me back to my original task: List the elements of the file (with no
IPv6's) that have no letters in them.
An inverse grep such as this:
grep -v [a A ...z Z] Mixed_List > IPv4-only_List

In all my notes I'm not finding that syntax.

But somehow I stumbled upon it:
grep -v "[a-zA-Z]" Mixed_List > IPv4-only_List
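
The same inverted match can also be written with the POSIX class [[:alpha:]], sketched here on made-up entries. Inside a bracket expression every character counts, so a comma or space placed between ranges would be matched literally too. The helper name no_letters is hypothetical, and one entry per line is assumed.

```shell
# Hypothetical helper: keep only the lines that contain no letter at all.
# [[:alpha:]] is the POSIX class for alphabetic characters.
no_letters() { grep -v '[[:alpha:]]' "$@"; }

# Made-up sample data for illustration only.
printf '%s\n' 192.0.2.7 host-192-0-2-7.example.net 198.51.100.23 | no_letters
```

On this sample the hostname is dropped and the two addresses are kept; with a file, the call would be no_letters Mixed_List > IPv4-only_List.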

In a day or two I'll elucidate my task with a series of scripts which have been
effective, if not always very efficient (such as grep). They bring to mind my
chemistry classes from sixty-five years ago ...