Basic math on text files

Joined: 01/04/2015

Starting with 70+ months of current visitor data, my task is to calculate the trend
of unresolvable hosts over the period punctuated by the advent of hostname lookup in
Apache servers. In my case, the transition happened abruptly about halfway through
the period.

The example that I've uploaded contains 649 IPv4 addresses from the beginning of the
set of data.

What I've done so far is to use the following nmap script to look up the PTR records
(a.k.a. hosts) of as many of the addresses as are still live:
awk '{print $1}' Summary.txt | sort -u | sudo nmap -Pn -sn -T2 --max-retries 8 -iL '-' -oG - | grep "Host:" '-' | awk '{print $2,$3}' '-' | sed 's/()/(No_DNS)/g' | tr -d '()' | sed 's/No_DNS//g' > Step01.txt
immediately followed by another instance of nmap that tries to do the reverse of the first task:
nmap -Pn -sn -T2 --max-retries 8 -iL <(awk '{print $2}' Step01.txt) -oG - | grep "Host:" '-' | awk '{print $2,$3}' '-' | sed 's/()/(No_DNS)/g' | tr -d '()' >> Output.txt
and a comm command finds out which pairs of IPv4 address and PTR record are in both the
Step01.txt file and the Output.txt file:
comm -12 <(sort Step01.txt) <(sort Output.txt) > Passed.txt
You'll notice that any originally unresolvable addresses are discarded in the first
nmap script; the comm command discards the second nmap script's unresolvable addresses.
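
The comm step can be tried without running any scan. The following is a minimal sketch on invented data: the addresses and host names are hypothetical placeholders (documentation ranges), and the two small files only stand in for Step01.txt and Output.txt. It sorts both files under the same locale, since comm expects its inputs sorted consistently, and keeps the lines common to both.

```shell
workdir=$(mktemp -d)

# Hypothetical forward results ("IP PTR"), standing in for Step01.txt
cat > "$workdir/Step01.txt" <<'EOF'
192.0.2.20 host-b.example.net
192.0.2.10 host-a.example.net
192.0.2.30
EOF

# Hypothetical results of the reverse pass, standing in for Output.txt
cat > "$workdir/Output.txt" <<'EOF'
198.51.100.5 host-b.example.net
192.0.2.10 host-a.example.net
EOF

# Sort both inputs the same way, then keep only lines present in both (-12
# suppresses the "only in file 1" and "only in file 2" columns).
LC_ALL=C sort "$workdir/Step01.txt" > "$workdir/Step01.sorted"
LC_ALL=C sort "$workdir/Output.txt" > "$workdir/Output.sorted"
LC_ALL=C comm -12 "$workdir/Step01.sorted" "$workdir/Output.sorted" > "$workdir/Passed.txt"

cat "$workdir/Passed.txt"
```

Only the pair whose address and name agree in both files survives; the line with no name and the name that came back with a different address both drop out.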

The wc command lets me calculate the figure of merit for the Resolvability of the
hosts of the set of IPv4 addresses:
wc -l Passed.txt   gives   359 Passed.txt   (column 1 is the numerator of the figure of merit)
wc -w Step01.txt   gives  1108 Step01.txt   (the difference between column 1 of this line
wc -l Step01.txt   gives   649 Step01.txt    and column 1 of this line is the denominator)

which works out to 359 / (1108 - 649) = 359 / 459 ≈ 0.78, which is OK, considering how
much time has passed since those data were collected.

It's at this stage that I become baffled by the syntax of bash mathematics ...
Most of the file crunching will be in the hands of a couple of scripts that
I've prepared, but those ratios are a task of several hours for me.

George langford

Summary01.txt 19.17 KB
Step01.txt 24.6 KB
Passed01.txt 17.42 KB
Magic Banana

Joined: 07/24/2010

The shell is inappropriate for math. It is especially bad for floating-point arithmetic: Bash's own arithmetic only handles integers. You can feed the mathematical expression to 'bc -l'. Something like this:
$ echo "$(wc -l < Passed.txt) / ($(wc -w < Step01.txt) - $(wc -l < Step01.txt))" | bc -l
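
For repeated use over many files, the same ratio could be wrapped in a small function. This is only a sketch of one possible approach, not the method used in the thread: it swaps bc for awk, which also does floating-point arithmetic, and the function name figure_of_merit and the sample data are hypothetical. It assumes newline-terminated files in which every line of the second file holds one or two words, so that words minus lines equals the number of resolved hosts.

```shell
# figure_of_merit PASSED STEP01
# Prints lines(PASSED) / (words(STEP01) - lines(STEP01)) with four decimals.
figure_of_merit() {
    [ -r "$1" ] && [ -r "$2" ] || { echo "missing input file" >&2; return 1; }
    awk -v passed="$(wc -l < "$1")" '
        { words += NF; lines++ }        # running totals, like wc -w and wc -l
        END {
            resolved = words - lines    # lines that carry a second word
            if (resolved <= 0) { print "no resolved hosts" > "/dev/stderr"; exit 1 }
            printf "%.4f\n", passed / resolved
        }' "$2"
}

# Invented sample: 3 passed pairs; 5 addresses, 4 of them with a name.
workdir=$(mktemp -d)
printf '%s\n' '192.0.2.10 a.example.net' '192.0.2.20 b.example.net' \
    '192.0.2.30 c.example.net' > "$workdir/Passed.txt"
printf '%s\n' '192.0.2.10 a.example.net' '192.0.2.20 b.example.net' \
    '192.0.2.30 c.example.net' '192.0.2.40 d.example.net' \
    '192.0.2.50' > "$workdir/Step01.txt"

figure_of_merit "$workdir/Passed.txt" "$workdir/Step01.txt"   # prints 0.7500
```

With the counts quoted earlier in the thread (359, 1108 and 649) the result should land near 0.78. The guard on the denominator avoids a division by zero when no host resolved at all.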

Joined: 01/04/2015

Magic Banana's solution is elegant & accurate.

At first our results differed, but then I realized that the nmap results had changed between
the posted data and my saved data because the same scan had been re-run. The number of rows
of data was the same, but the word counts had changed.

Now I'm suffering through all-too-frequent Etiona freezes, the last time without any meddling
on my part. Apparently, there are nmap errors that Etiona can't handle. The freezes happen
while the nmap data is being transferred to the HDD.

There was an inconsequential error in Magic Banana's code, herein corrected:
$ echo "$(wc -l < Passed01.txt) / ($(wc -w < Step01.txt) - $(wc -l < Step01.txt))" | bc -l
Passed.txt doesn't exist ...

Joined: 01/04/2015

Road blocks have appeared ...

To streamline the process of capturing the error messages that have
been appearing in the terminal window during the scans, I tried to run just
the middle step:
nmap -Pn -sn -T2 --max-retries 8 -iL <(awk '{print $2}' Temp-03212021-A55.txt) -oG - | grep "Host:" '-' | awk '{print $2,$3}' '-' | sed 's/()/(No_DNS)/g' | tr -d '()' >> Summary.01906.nMapoG.txt 2> FailedToResolve.01906.txt
but forgot to add the required sudo for the nmap command ... Perhaps as a
consequence, the script ran at a glacially slow pace and somehow clogged
my access to the Internet to the point where I could not ping anything
beyond the local host; even the router was inaccessible ... but the script
motored on, albeit at a pace slower than a growing blade of grass.

After aborting this scan and rebooting (making the Internet accessible
once more) I restarted the script, this time with the requisite sudo:
sudo nmap -Pn -sn -T2 --max-retries 8 -iL <(awk '{print $2}' Temp-03212021-A55.txt) -oG - | grep "Host:" '-' | awk '{print $2,$3}' '-' | sed 's/()/(No_DNS)/g' | tr -d '()' >> Summary.01906.nMapoG.txt 2> FailedToResolve.01906.txt
and hit a second roadblock:
Failed to open input file /dev/fd/63 for reading
The source file is big (350.7 KB) but within the 1.0 MB upload limit,
so here's the real thing.

Is /dev/fd/63 actually Temp-03212021-A55.txt ?
Or is it a remembered portion of the first step of the overall script from an
earlier execution of the main script ? /dev/fd lists the files 0,1,2 and 255.

Embedded in the overall script, the middle section runs OK, and the script
completes its overall task just fine, but without saving any of the error messages,
which I attempted to gather from the terminal window.

The same middle section, again run separately (with sudo!), again hits the /dev/fd/63 error.

Temp-03212021-A55.txt 350.7 KB
Magic Banana

Joined: 07/24/2010

Is /dev/fd/63 actually Temp-03212021-A55.txt ?

I am pretty sure it is the output of awk '{print $2}' Temp-03212021-A55.txt. It should not be in a subshell, which certainly exits before nmap is over. Also, cut would be faster than awk (I assume spaces delimit the columns; if tabs do, remove "-d ' '"), you use '-' where it is unnecessary, your sed substitution keeps parentheses that tr deletes right after, etc.:

cut -d ' ' -f 2 Temp-03212021-A55.txt | sudo nmap -Pn -sn -T2 --max-retries 8 -iL - -oG - | grep Host: | awk '{print $2, $3}' | sed 's/()/No_DNS/g' | tr -d '()' >> Summary.01906.nMapoG.txt 2> FailedToResolve.01906.txt
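
On the /dev/fd/63 question, the sketch below shows what Bash substitutes for <(...) and that a pipe delivers the same input on standard input. It is an illustration under assumptions: printf stands in for the awk command, cat stands in for nmap, and the addresses are invented. A further possible factor, not confirmed anywhere in this thread, is that sudo by default closes inherited file descriptors above 2 (the closefrom setting in sudoers) before starting the command, which would leave a path like /dev/fd/63 pointing at nothing inside the sudo-run nmap; the pipe form does not depend on such a descriptor.

```shell
# Process substitution is a Bash feature, so each demo line goes through
# `bash -c`; that keeps the example usable when pasted into another shell.

# 1. What <(...) expands to: a path naming an open descriptor of the shell
#    (typically /dev/fd/63 on Linux), not the name of any file on disk.
fd_path=$(bash -c 'echo <(true)')
echo "process substitution expands to: $fd_path"

# 2. Reading that path yields the inner command's output.
#    printf stands in for: awk '{print $2}' Temp-03212021-A55.txt
via_substitution=$(bash -c 'cat <(printf "192.0.2.10\n192.0.2.20\n")')

# 3. A pipe hands over the same bytes on standard input, which is what the
#    `-iL -` form relies on. cat stands in for nmap here.
via_pipe=$(printf '192.0.2.10\n192.0.2.20\n' | cat -)

[ "$via_substitution" = "$via_pipe" ] && echo "same input either way"
```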

Joined: 01/04/2015

Efficiencies notwithstanding, those
Failed to resolve ""..
scroll by way too fast already and do not end up in the error_output file.

Now running another instance of Etiona, this one on the same HDD as the data files,
the previously mentioned tendencies of that OS to freeze at the slightest change
in the environment (such as saving another file or encountering an nmap error)
have vanished, leaving us only those nmap "comments" to collect by hand.
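
One likely reason the messages never reach the error file: in the commands above, 2> sits at the very end of the pipeline, so it redirects the standard error of the last command (tr) only, while anything nmap writes to standard error still goes to the terminal. The sketch below demonstrates the difference with a hypothetical stand-in function, fake_scan, in place of nmap; that nmap emits its "Failed to resolve" lines on standard error is an assumption here.

```shell
# fake_scan is a hypothetical stand-in for nmap: one result line on stdout,
# one diagnostic on stderr.
fake_scan() {
    printf 'Host: 192.0.2.10 ()\tStatus: Up\n'
    printf 'Failed to resolve "".\n' >&2
}

workdir=$(mktemp -d)

# Redirection at the end of the pipeline: 2> belongs to tr, so the
# diagnostic from fake_scan still lands on the terminal and the file is empty.
fake_scan | tr -d '()' > "$workdir/out_end.txt" 2> "$workdir/err_end.txt"

# Redirection attached to the producing command: the diagnostic is captured.
fake_scan 2> "$workdir/err_cmd.txt" | tr -d '()' > "$workdir/out_cmd.txt"

# Compare the sizes of the two error files.
wc -c "$workdir/err_end.txt" "$workdir/err_cmd.txt"
```

Applied to the thread's command, this would mean placing 2> FailedToResolve.01906.txt directly after the nmap arguments, before the first pipe into grep.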