Outside the box thoughts

amenex
Joined: 01/04/2015

A substantial fraction of the Internet traffic that my website's Webalizer data
reports to me consists of unresolvable hosts. The 2016 US election was punctuated
by a sharp decrease in resolvable-host traffic.

Considering that 100% of network packets are identified by numerical information
in the form of IPv4 or IPv6 addresses, what's to prevent my 'Puter's operating
system from detecting and storing IP addresses that resolve to hosts that in turn
cannot be resolved? Such a task need not slow down access to specific addresses
because it would not have to transmit all of that IP address's data in order to
perform its detection function.

Attached is one set of data based on my own unpublished Webalizer data which
shows a sharp downward trend in resolvable ("passed" in the figure) hosts that
reached its nadir in the month of November 2016, followed by a sharp rebound.
It took me four years to learn how to recognize what was happening.

If the Trisquel operating system could detect the unresolvability of hostnames
attributable to the incoming IPv4- or IPv6-based packets, we could spot trends
like that shown in the attached figure, and we might better be able to trust the
traffic that has a positive resolvability attribute.

The sharp drops in November and December 2019 remain a mystery to me ...

George Langford

Attachment  Size
Passed.Fraction.II_.jpg 1.23 MB
amenex
Joined: 01/04/2015

Installman asked what it is about the traffic that makes it resolvable.

The traffic comes into your system as packets identifiable by their Internet
address, IPv4 (four decimal numbers between 0 & 255, separated by dots) or IPv6
(eight groups of up to four hex digits, each between 0 & FFFF, separated by
colons). The host/domain/pointer name is resolved by a DNS (Domain Name System)
lookup. By convention a single IP address has just one pointer (PTR) name,
controlled through one or another assigned number clearing house.
Alas, the reverse isn't controlled at all. A host name can have any number of
addresses. Some IPv6-based servers send out the same name in response to all
DNS queries, making it appear that the name has an infinite number of addresses
associated with it, a quantity that would take forever to assign to the actual
information storage devices.
The IPv4 address space is being squandered by the practice of filling servers
with like-named hosts, not unlike the hoarding of wealth.
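
For reference, the name-from-address lookup mentioned above (what dig -x does) is a PTR query on the IPv4 octets written in reverse order under in-addr.arpa. A minimal sketch of building that query name; the function name reverse_name is made up for illustration and is not a standard tool:

```shell
# Build the reverse-lookup (PTR) query name for a dotted-quad IPv4 address.
# reverse_name is a hypothetical helper; it does no range checking on the octets.
reverse_name() {
    printf '%s\n' "$1" |
        awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

reverse_name 173.44.56.244    # prints 244.56.44.173.in-addr.arpa
```

Input that does not split into four dot-separated parts produces no output at all.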

amenex
Joined: 01/04/2015

Regarding the puzzling output in my data plot:
The sharp drops in November and December 2019 remain a mystery to me ...

After failing to find any glitches in the Webalizer software & my associated scripting,
I went back to the beginning and evaluated the data for November 2019, December 2019,
and January 2020. The results are quite different from Passed.Fraction.II_.jpg
but there are still a few of the original PTR's that should be evaluated lest I include
inactive PTR's.

Here's an outline of my progress:
The initial test for resolvability was done by editing the output of the nmap scan of
the list of PTR's with sed to convert "92.242.140.21" and "127.0.0.1" to "unresolvable,"
the "unresolvable" placed in what would be the IPv4 position in the outputs. Some PTR's
are not flagged as 92.242.140.21's by the Barefruit error-handling service and appear as
"Failed to resolve" standard error outputs including the associated PTR's.
Those "Failed to resolve" PTR's (FTR's) are the subject of this communication.
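
The marking step described above might look roughly like the following sketch. It assumes two columns per line with the IPv4 address first ("IPv4 PTR"); the function name mark_unresolvable and the sample lines are invented for illustration. The dots are escaped so that they only match literal dots:

```shell
# Replace the two catch-all addresses with the word "unresolvable".
# Assumption: each input line is "IPv4 PTR", address first.
mark_unresolvable() {
    sed -e 's/^92\.242\.140\.21 /unresolvable /' \
        -e 's/^127\.0\.0\.1 /unresolvable /'
}

printf '%s\n' '92.242.140.21 a.example' '192.0.2.7 c.example' | mark_unresolvable
```

Anchoring with ^ and the trailing space keeps an address such as 192.242.140.21 from being altered by accident.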
It could taint the data to include "No_DNS" addresses and "Failed to resolve" PTR's in
the unresolvable counts; whereas the "No_DNS" addresses are simply dropped from the
analysis, some of the "Failed to Resolve" PTR's can actually still be looked up with dig -x
or nmap by evaluating the 24 permutations of the four IPv4 octets often included in the
PTR names, so those subsequently resolved PTR's must still be live and belong in the
"unresolvable" counts.
The FTR's for the three months in this analysis were concatenated and subjected to Magic
Banana's 4321-permutations script:
awk -f ./Script-MB-4321-permutations -v sep='\n' <(awk '{print $1}' GB.FTRs.txt) | sudo nmap -Pn -sn -T2 --max-retries 8 -iL '-' oG - | grep "Nmap scan report for " '-' | awk '{print $6,$5}' '-' | sed 's/()/(No_DNS)/g' | tr -d '()' > GB.FTR.Extracted-IPv4s.Permutations-oGnMap.txt
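
For readers without the attachment, the idea of trying every ordering of four octets can be sketched as below. This is not Magic Banana's script, only an illustration; it assumes the four octets have already been pulled out of the PTR name into dotted-quad form, and octet_permutations is a made-up name:

```shell
# Print all 24 orderings of the four octets of a dotted-quad address.
# octet_permutations is a hypothetical helper; octets are not range-checked.
octet_permutations() {
    printf '%s\n' "$1" | awk -F. 'NF == 4 {
        for (a = 1; a <= 4; a++)
            for (b = 1; b <= 4; b++)
                for (c = 1; c <= 4; c++)
                    for (d = 1; d <= 4; d++)
                        if (a != b && a != c && a != d && b != c && b != d && c != d)
                            print $a "." $b "." $c "." $d
    }'
}

octet_permutations 185.102.78.99
```

When two octets are equal some orderings repeat, so piping the result through sort -u before scanning saves a few lookups.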
Some of these nmap results clearly are resolutions of the 1234-arranged IPv4 octets, but
only the following grep scripts include them in their outputs:
grep -f GB.CurrentVisitors-2019-11.FTR.txt Temp-05162021-B01.txt > GB.CurrentVisitors.IPv4-PTR-Pairs-2019-11.txt
grep -f GB.CurrentVisitors-2019-12.FTR.txt Temp-05162021-B01.txt > GB.CurrentVisitors.IPv4-PTR-Pairs-2019-12.txt
grep -f GB.CurrentVisitors-2020-01.FTR.txt Temp-05162021-B01.txt > GB.CurrentVisitors.IPv4-PTR-Pairs-2020-01.txt

The many false positives are troubling, so I attempted the same end with awk:
awk '{print $1}' GB.CurrentVisitors-2019-11.FTR.txt | awk '/-/' Temp-05162021-B01.txt > GB.Awk.CurrentVisitors.IPv4-PTR-Pairs-2019-11.txt
awk '{print $1}' GB.CurrentVisitors-2019-12.FTR.txt | awk '/-/' Temp-05162021-B01.txt > GB.Awk.CurrentVisitors.IPv4-PTR-Pairs-2019-12.txt
awk '{print $1}' GB.CurrentVisitors-2020-01.FTR.txt | awk '/-/' Temp-05162021-B01.txt > GB.Awk.CurrentVisitors.IPv4-PTR-Pairs-2020-01.txt

All three of these scripts produce the same 35.1 kB output, so they are not working as I
might expect. I tried changing the end-of-line character in GB.CurrentVisitors-20??-??.FTR.txt
to the pipe ("|") symbol and re-running the last three scripts, but the same 35.1 kB output
was produced by those as well.

Attachment  Size
GB.FTRs_.txt 5.47 KB
Script-MB-4321-permutations.txt 863 bytes
GB.CurrentVisitors-2019-11.FTR_.txt 2.61 KB
GB.CurrentVisitors-2019-12.FTR_.txt 1.49 KB
GB.CurrentVisitors-2020-01.FTR_.txt 1.84 KB
Temp-05162021-B01.txt 41.03 KB
Magic Banana

Joined: 07/24/2010

All three of these scripts produce the same 35.1 kB output, so they are not working as I might expect.

They filter the lines containing "-", i.e., produce the same output as 'grep - Temp-05162021-B01.txt'. What you wrote up to "|" serves no purpose (other than wasting computing resources), because the subsequent command does not process the standard input. It processes Temp-05162021-B01.txt.
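
A small demonstration of that point, with throwaway data invented for the purpose: once awk is given a file operand, whatever is piped into it is never read.

```shell
# awk ignores standard input when a file operand is present.
tmp=$(mktemp)
printf '%s\n' 'a-b' 'plain' > "$tmp"

# The piped text is discarded; only "$tmp" is scanned for "-".
piped=$(printf '%s\n' 'ignored-1' 'ignored-2' | awk '/-/' "$tmp")
direct=$(awk '/-/' "$tmp")

[ "$piped" = "$direct" ] && echo "same output: $piped"
rm -f "$tmp"
```

Both variables end up holding the single line a-b, and neither "ignored" line appears.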

In your previous grep calls, the pattern files contain dots that are interpreted as "any single character", as I tell you every month or so...
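
To make the dot problem concrete, here is a minimal sketch with invented sample lines. An unescaped dot matches any character; escaping it, or giving grep the -F option (fixed strings), makes it literal:

```shell
# Count matching lines with the dots treated three different ways.
tmp=$(mktemp)
printf '%s\n' '10.1.2.3 a.example.net' '1001020 other.example.net' > "$tmp"

regex_hits=$(grep -c '10.1.2' "$tmp")       # dots are wildcards: both lines match
fixed_hits=$(grep -cF '10.1.2' "$tmp")      # -F: dots are literal
escaped_hits=$(grep -c '10\.1\.2' "$tmp")   # escaped dots are literal

echo "$regex_hits $fixed_hits $escaped_hits"    # prints 2 1 1
rm -f "$tmp"
```

With a pattern file, the same effect is obtained by adding -F to grep -f.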

It looks like you want join (after sorting the join fields).
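
A minimal sketch of what that could look like, assuming one file of "IPv4 PTR" pairs and one file of PTR names (all contents below are invented for illustration):

```shell
# Keep the "IPv4 PTR" pairs whose PTR is listed in a file of names.
pairs=$(mktemp); names=$(mktemp)
pairs_sorted=$(mktemp); names_sorted=$(mktemp)
printf '%s\n' '192.0.2.1 b.example.net' \
              '192.0.2.2 a.example.net' \
              '192.0.2.3 c.example.net' > "$pairs"
printf '%s\n' 'c.example.net' 'a.example.net' > "$names"

# join requires both inputs sorted on the join field, in the same locale.
LC_ALL=C sort -k 2,2 "$pairs" > "$pairs_sorted"
LC_ALL=C sort "$names" > "$names_sorted"

# Match field 2 of the pairs against field 1 of the names.
joined=$(LC_ALL=C join -1 2 -2 1 "$pairs_sorted" "$names_sorted")
printf '%s\n' "$joined"

rm -f "$pairs" "$names" "$pairs_sorted" "$names_sorted"
```

join compares whole fields, so dots need no escaping and a short name cannot match inside a longer one. The join field is printed first, followed by the remaining fields.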

amenex
Joined: 01/04/2015

I must have been misinterpreting my script:
awk '{print $1}' GB.CurrentVisitors-2019-11.FTR.txt | awk '/-/' Temp-05162021-B01.txt > GB.Awk.CurrentVisitors.IPv4-PTR-Pairs-2019-11.txt
What I was thinking is that the print statement picks a PTR from the first file, and awk's "/-/"
takes that PTR as the pattern to search in the second file ... but that didn't work as intended.
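
For what it's worth, the behaviour intended there (take each PTR from the first file and look for it in the second) can be had from a single awk call that reads both files. A sketch, assuming the second file holds "IPv4 PTR" pairs; lookup_pairs is a made-up name:

```shell
# lookup_pairs NAMES PAIRS
# Print the lines of PAIRS whose second field equals some first field of NAMES.
# While reading the first file (NR == FNR) the names are only remembered;
# the comparison is an exact string match, not a regular expression.
# Caveat: if NAMES is empty, NR == FNR also holds for PAIRS and nothing is printed.
lookup_pairs() {
    awk 'NR == FNR { wanted[$1]; next } $2 in wanted' "$1" "$2"
}
```

Usage would be along the lines of: lookup_pairs GB.CurrentVisitors-2019-11.FTR.txt Temp-05162021-B01.txt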

Onwards to Magic Banana's suggestion to use join; after executing the script:
awk '(NF==2) {print $1,$2}' GB.FTR.Extracted-IPv4s.Permutations-oGnMap.txt > Temp-05172021-A01.txt
I joined the Temp-05172021-A01.txt file (attached) as follows:
join -1 2 -2 1 <(sort -k 2,2 Temp-05172021-A01.txt) <(awk '{print $1}' GB.CurrentVisitors-2019-11.FTR.txt | sort -k 1,1 ) | sort -u > GB.CurrentVisitors.IPv4-PTR-Pairs-2019-11.txt
Then I used comm to extract the FTR'ed PTR's from the original Current Visitor data:
comm -23 <(sort -k 1,1 GB.CurrentVisitors-2019-11.FTR.txt) <(awk '{print $1}' GB.CurrentVisitors.IPv4-PTR-Pairs-2019-11.txt | sort -k 1,1) > GB.CurrentVisitors-2019-11.FTR-Remainder.txt
and followed that with an Internet search for "live" PTR's whose IPv4 addresses could satisfy a
dig -x lookup. After rejecting the not-live PTR's, 16, 13 & 12 additional Unresolvable PTR's could
be added to the previous counts for 2019-11, 2019-12, & 2020-01 Unresolvables, respectively.

In my previous join attempts, I tried to combine the "awk '(NF==2) {print $1,$2}' GB.FTR.Extracted ..."
step with that first join script, which was a nonstarter.
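
The comm step above amounts to a set difference: names in the first list that are missing from the second. A sketch without process substitution, with the invented helper name only_in_first:

```shell
# only_in_first A B
# Print the lines of A that do not occur in B.
# comm needs both inputs sorted the same way, hence the temporary copies.
only_in_first() {
    _a=$(mktemp); _b=$(mktemp)
    LC_ALL=C sort -u "$1" > "$_a"
    LC_ALL=C sort -u "$2" > "$_b"
    LC_ALL=C comm -23 "$_a" "$_b"    # -2 drops lines unique to B, -3 drops common lines
    rm -f "$_a" "$_b"
}
```

Both inputs are expected to hold one PTR per line, so any address column has to be stripped first (as the awk '{print $1}' does in the command above).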

Attachment  Size
Temp-05172021-A01.txt 41.03 KB
GB.CurrentVisitors-2019-11.FTR-Remainder.GG_.txt 1.84 KB
GB.CurrentVisitors-2019-12.FTR-Remainder.GG_.txt 1.53 KB
GB.CurrentVisitors-2020-01.FTR-Remainder.GG_.txt 1.59 KB
GB.CurrentVisitors-2019-11.txt 501.15 KB
GB.CurrentVisitors-2019-12.txt 440.91 KB
GB.CurrentVisitors-2020-01.txt 479.32 KB
amenex
Joined: 01/04/2015

Magic Banana's patience is wearing thin regarding my abuse of dots in grep, so I attempted
a subterfuge because the double Q sequence is a rarity:
grep -f <(awk '{print $1}' GB.CurrentVisitors-2019-11.FTR.txt | sed 's/\./QQ/g') <(awk '{print$1,$2}' Temp-05162021-B01.txt | sed 's/\./QQ/g') | sed 's/QQ/\./g' | sort -u > GB.CurrentVisitors.IPv4-PTR-Pairs-2019-11-S.txt
Works, but carries over a few that CJ (first, comm, then, join) didn't:
comm -23 <(sort -k 1,1 GB.CurrentVisitors.IPv4-PTR-Pairs-2019-11-S.txt) <(awk '{print $2,$1}' GB.CurrentVisitors.IPv4-PTR-Pairs-2019-11-CJ.txt | sort -k 1,1) > Temp-05182021-A01.txt
Stung by the realization that the QQ (actually, "qq") sequence has special meaning in the
perl language, I tried escaping another character "\~" instead:
grep -f <(awk '{print $1}' GB.CurrentVisitors-2019-11.FTR.txt | sed 's/\./\~/g') <(awk '{print $1,$2}' Temp-05162021-B01.txt | sed 's/\./\~/g') | sed 's/\~/\./g' | sort -u > GB.CurrentVisitors.IPv4-PTR-Pairs-2019-11-S2.txt
with the exact same result. The three PTR's in Temp-05182021-A01.txt are artefacts of the
4321 permutation script and will be rejected by comparison with the original Current
Visitors list, but their selection by grep is puzzling nevertheless (but see my final sentence).
Here they are in plain view:
173.44.56.244 unassigned.quadranet.com
185.102.78.99 unassigned-185-102-78.click2call.cz
64.40.114.190 unassigned.netnation.com

Now I see the problem: There's a PTR named "unassigned" in GB.CurrentVisitors-2019-11.FTR.txt.
Grep was just doing its job ...
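
The effect can be reproduced with just those three lines. In this sketch the pattern "unassigned" is counted three ways; note that the -w (whole word) option does not help here, because "." and "-" count as word boundaries:

```shell
# How often does the bare name "unassigned" match the three pairs above?
tmp=$(mktemp)
printf '%s\n' '173.44.56.244 unassigned.quadranet.com' \
              '185.102.78.99 unassigned-185-102-78.click2call.cz' \
              '64.40.114.190 unassigned.netnation.com' > "$tmp"

substring_hits=$(grep -cF 'unassigned' "$tmp")     # matches inside longer names
word_hits=$(grep -cwF 'unassigned' "$tmp")         # "." and "-" still end a word
exact_hits=$(awk '$2 == "unassigned" { n++ } END { print n + 0 }' "$tmp")

echo "$substring_hits $word_hits $exact_hits"      # prints 3 3 0
rm -f "$tmp"
```

Only the exact comparison of the PTR field rejects all three.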

Attachment  Size
Temp-05182021-A01.txt 128 bytes
Magic Banana

Joined: 07/24/2010

Magic Banana's patience is wearing thin regarding my abuse of dots in grep, so I attempted a subterfuge because the double Q sequence is a rarity

And you will get into trouble whenever "QQ" occurs. I explained in this thread how to match a dot: https://trisquel.info/forum/outside-box-thoughts#comment-157907

I also suspect you do not understand that, without option -x, only part of the line needs to match the pattern. And it still looks like you actually want to use join.
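
A short sketch of the -x point, on invented data. The PTR column is extracted first, because with -x the whole line has to equal a pattern:

```shell
# Compare partial-line and whole-line matching with a pattern file.
names=$(mktemp); pairs=$(mktemp)
printf '%s\n' 'unassigned' 'a.example.net' > "$names"
printf '%s\n' '192.0.2.2 a.example.net' \
              '64.40.114.190 unassigned.netnation.com' > "$pairs"

# Without -x, "unassigned" matches part of a longer name.
loose=$(awk '{ print $2 }' "$pairs" | grep -F -f "$names")
# With -x, the line must be identical to one of the patterns.
strict=$(awk '{ print $2 }' "$pairs" | grep -F -x -f "$names")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$names" "$pairs"
```

The strict variant keeps only a.example.net, but it also loses the address column, which is one more reason join is the better fit for pairing names with addresses.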

amenex
Joined: 01/04/2015

Here's the updated graph, where I've patched in the recalculated data for November and
December 2019, as well as January 2020. The middle three of the five stars are the new
data, but the first and last of the five stars are the original data used to keep the
spline-fit line continuous. The puzzling drop in the passed fraction remains and might
only be explained by finding out the specific activity of the unresolvable PTR's.

Attachment  Size
Passed.Fraction.III_.pdf 116.78 KB