Posted by amenex on Fri, 02/18/2022 - 16:05.

Distinguish the end of a list of IPv4 addresses from the following alphanumeric strings


4 replies

Fri, 02/18/2022 - 16:05

amenex

Joined: 01/03/2015

The attached Mixed-Types.txt file is copied from a 2000-row list of resolved domain names.
The original list had been rearranged with sort -Vk 2,2, leaving about two dozen partially
resolved domains at the end. Those partially resolved domains have to be processed again
with dig, but by grep-ing six additional lines after the ANSWER SECTION because the IPv4
address is in the last line of the grep-ed output.
Selection of those last two dozen lines requires that they be counted somehow; for example:
awk '{print $1}' Mixed-Types.txt > PartAA.txt
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' <(awk '{print $2}' Mixed-Types.txt) > PartBB.txt
paste -d ' ' PartAA.txt PartBB.txt > PartCC.txt
wc -l PartBB.txt | sed 's/PartBB.txt//g'
wc -l PartCC.txt | sed 's/PartCC.txt//g'
Then the main file can be separated into two parts with:
awk '{print $1,$2}' PartCC.txt | head -n 35 '-' > Resolved-2k-Domains.txt
awk '{print $1}' PartAA.txt | tail -n -25 '-' > Unresolved-two-dozen-Domains.txt
Perform a dig on any one of those last 25 domains and you're likely to get different answers
every day ...
Two questions remain:
(1) Is there a less cumbersome way of counting the resolved domains ?
(2) How does one move the wc -l [filename] counts into those last two scripts ?

Attachment	Size
Mixed-Types.txt	1.84 KB

Fri, 02/18/2022 - 16:47

Magic Banana

Joined: 07/24/2010

As far as I understand, you want:

the whole lines whose last field is an IPv4 address:
$ grep -E ' ([0-9]{1,3}\.){3}[0-9]{1,3}$' Mixed-Types.txt > Resolved-2k-Domains.txt
the first field of the remaining lines:
$ grep -vE ' ([0-9]{1,3}\.){3}[0-9]{1,3}$' Mixed-Types.txt | cut -d ' ' -f 1 > Unresolved-two-dozen-Domains.txt

Both commands can run in parallel, and Mixed-Types.txt need not be sorted.
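
A minimal, self-contained sketch of that split, run on made-up data rather than on the real attachment. The function names and the sample lines are hypothetical; only the grep and cut invocations come from the post above.

```shell
#!/bin/sh
# Regex for "last field is a dotted quad". It does not check that octets are <= 255.
IPV4_LAST=' ([0-9]{1,3}\.){3}[0-9]{1,3}$'

# Print the lines of file $1 whose last field looks like an IPv4 address.
resolved_lines() { grep -E "$IPV4_LAST" "$1"; }

# Print the first field of every other line of file $1.
unresolved_names() { grep -vE "$IPV4_LAST" "$1" | cut -d ' ' -f 1; }

# Hypothetical sample standing in for Mixed-Types.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
example.com 192.0.2.34
partial.example alias.partial.example.
example.net 198.51.100.10
other.example cdn.other.example.
EOF

resolved_lines "$sample"
unresolved_names "$sample"
```

Because each grep tests every line on its own, the sample is deliberately left unsorted.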

Answering your questions anyway:

(1) Is there a less cumbersome way of counting the resolved domains ?

Grep has option -c for that:
$ grep -Ec ' ([0-9]{1,3}\.){3}[0-9]{1,3}$' Mixed-Types.txt
35

(2) How does one move the wc -l [filename] counts into those last two scripts ?

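With $(...):
$ head -n "$(wc -l < PartBB.txt)" PartCC.txt > Resolved-2k-Domains.txt
$ tail -n "+$(( $(wc -l < PartBB.txt) + 1 ))" PartAA.txt > Unresolved-two-dozen-Domains.txt
The second command starts at the line following the last resolved one: PartCC.txt has as many lines as PartAA.txt, so its line count cannot select the unresolved tail.
Notice also the removal of the useless uses of awk (which only copied its input here) and of sed (redirecting wc's input stops it from printing the file name).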
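
To illustrate command substitution on a sorted list, here is a small sketch that stores the count in a variable and reuses it for both halves. The sample data and the variable names are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sorted sample: resolved lines first, partially resolved ones last.
sorted=$(mktemp)
cat > "$sorted" <<'EOF'
a.example 192.0.2.1
b.example 192.0.2.2
c.example 198.51.100.7
d.example alias.d.example.
e.example alias.e.example.
EOF

# $(...) captures what grep -c prints: the number of lines ending in a dotted quad.
count=$(grep -Ec ' ([0-9]{1,3}\.){3}[0-9]{1,3}$' "$sorted")

# The first $count lines are the resolved domains.
head -n "$count" "$sorted"

# "tail -n +N" starts printing at line N, so this prints everything after them.
tail -n "+$((count + 1))" "$sorted" | cut -d ' ' -f 1
```

This only works when the file really is sorted with the resolved lines first; the two grep commands given earlier do not need that.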

Fri, 02/18/2022 - 18:50

amenex

Joined: 01/03/2015

The lesson for today was to learn the use of "$" in sed and in head as well as the use of cut,
not to mention counting with grep.

Another related question comes up:
mboxgrep has no provision for a pattern file like grep's, nor can one pipe in patterns
one at a time, as from awk. My workaround has been to create a huge script file with Leafpad,
which takes only a few keystrokes. Is there a better way of listing all the emails containing
the pattern(s)? My big scripts are held up by the dig searches, not by processing their outputs.

Fri, 02/18/2022 - 20:54

Magic Banana

Joined: 07/24/2010

For the shell to read the lines on the standard input one by one:
while read pattern
do
    (... "$pattern" is the line ...)
done
As always, < can redirect the standard input so that the lines are read from a file ("$1" below, i.e., the first argument of the shell script):
while read pattern
do
    (... "$pattern" is the line ...)
done < "$1"
Of course, the variable containing the line need not be named "pattern".
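
A runnable version of that loop, as a sketch. The function name each_pattern and the sample lines are made up; the additions relative to the loop above (IFS= and -r, so that read keeps leading spaces and backslashes, and the blank-line check) are optional refinements, not part of the original suggestion.

```shell
#!/bin/sh
# Hypothetical helper: print each non-empty line of the file given as $1 with a prefix.
each_pattern() {
    while IFS= read -r pattern
    do
        [ -n "$pattern" ] || continue      # skip blank lines
        printf 'pattern: %s\n' "$pattern"  # "$pattern" is the current line
    done < "$1"
}

# Made-up input, including a blank line and a line with a backslash.
patterns=$(mktemp)
printf '%s\n' 'example.com' 'two words' '' 'back\slash' > "$patterns"

each_pattern "$patterns"
```

Replacing the printf line with any command that takes "$pattern" as an argument gives the one-pattern-at-a-time behaviour asked about.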

Fri, 02/18/2022 - 22:31

amenex

Joined: 01/03/2015

Magic Banana's assigned homework problem:
while read pattern
do
    (... "$pattern" is the line ...)
done < "$1"
Solutions proposed by amenex:
while read pattern
do
    dig "$pattern" | grep -A6 ";; ANSWER SECTION:" | sed 's/;;\ ANSWER SECTION://g' | awk '{print $1,$5} NR==6{exit}' >> Resolved-XX-Domains-MB.txt
done < Unresolved-two-dozen-Domains.txt
plus some sed coding to clean up Resolved-XX-Domains-MB.txt:
sed 's/\.\ /\ /g' Resolved-XX-Domains-MB.txt | sed -r 's/\.$//' | sed 's/;;\ msec//g' | sed 's/;;//g' | grep "\S" '-' > Resolved-YY-Domains-MB.txt
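
As a possible alternative to grepping a fixed number of lines and cleaning up afterwards, awk can pick the A records out of dig's output directly. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it runs on canned text shaped like typical dig output (real dig separates fields with tabs, which awk's default field splitting handles the same way) so that it needs no network access.

```shell
#!/bin/sh
# Hypothetical helper: read dig-style text on stdin and print "<name> <address>",
# where <address> is taken from the last "IN A" record seen.
last_ipv4_from_dig() {
    awk -v name="$1" '
        $3 == "IN" && $4 == "A" { ip = $5 }       # remember each A record in turn
        END { if (ip != "") print name, ip }      # print nothing if none was found
    '
}

# Canned text standing in for the output of: dig www.example.org
canned=$(mktemp)
cat > "$canned" <<'EOF'
;; QUESTION SECTION:
;www.example.org.        IN  A

;; ANSWER SECTION:
www.example.org.  300  IN  CNAME  cdn.example.net.
cdn.example.net.  60   IN  A      192.0.2.44

;; Query time: 12 msec
EOF

last_ipv4_from_dig www.example.org < "$canned"
```

In real use the input would come from a pipe, as in dig "$pattern" | last_ipv4_from_dig "$pattern", and a domain with no A record in the answer simply produces no line.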
In response to my complaint about the limitations of mboxgrep:
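while read pattern
do
    mboxgrep "$pattern" /media/george/523ff5d3-64ea-486d-ba82-58721680b667/george/Georgesbasement.com.A/Thumb256E/GeorgesBasement.com/AAspam/1998-2021.Newest
done < Unresolved-two-dozen-Domains.txt > Emails-02182022.txt
(The output redirection sits after "done" so that each pass adds to Emails-02182022.txt; with > inside the loop, every pattern would overwrite the results of the previous one.)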
Takes a little longer to analyze a 200+ MB email collection ... bear in mind that the Email collection
remains fixed, but the unresolved domains vary from day to day.
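
Since the loop rereads the whole mailbox once per domain, one way to cut the running time might be to join the domains into a single alternation and scan once. The sketch below only builds and demonstrates such a pattern with grep -E on made-up text; whether mboxgrep accepts an extended-regex alternation should be checked against its manual page before relying on it. The function name and sample data are hypothetical.

```shell
#!/bin/sh
# Join the non-empty lines of file $1 into one alternation, e.g. a\.example|b\.example
# Dots are escaped so they match literally; other regex metacharacters are not handled.
join_patterns() {
    grep -v '^$' "$1" | sed 's/\./\\./g' | paste -sd '|' -
}

# Made-up list standing in for Unresolved-two-dozen-Domains.txt.
names=$(mktemp)
printf '%s\n' 'spam.example' 'junk.example' > "$names"

regex=$(join_patterns "$names")
printf '%s\n' "$regex"

# Demonstration on hypothetical header lines: one pass covers every name.
printf '%s\n' 'From: a@spam.example' 'From: b@ok.example' 'From: c@junk.example' |
    grep -E "$regex"
```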

Thanks to Magic Banana for making me think about applying his suggestions.
