Acquiring autonomous system numbers from a list of IPv4 addresses containing malicious data
My old standby, nmap --script asn-query [several thousand nasty IPv4 addresses]
causes my 'puter to freeze.
Not just often ... every time that the script is run. Worse yet, it's not saving any information about the actor.
This probably wouldn't happen if the script simply confined itself to ASN, country code, and location. Instead, it
performs a wide-ranging set of port scans ... and the computer freezes.
Whois is safer, but almost never returns any ASN data from US servers. The anti-hacker-alliance website also runs port
scans on every IPv4 address that is submitted. I even found a website that will perform free ASN-lookups on batches of
up to twenty IPv4 addresses, but that website also froze (and my 'puter with it) on my first submission of addresses
from my list. The CIDR report (https://www.cidr-report.org/as2.0/) gives CIDR address block data within its
webpage, but I don't know of any way of navigating that webpage to see where a specific IPv4 might be in ASN space.
I've even split the overall list into twenty-address chunks, but that doesn't stop the freezes.
Is there any non-proprietary ASN-lookup software available?
Solved: Applying the same options to this nmap scan as to the usual scan of IPv4 addresses and blocks thereof bypassed the port-scanning feature and made the scans stealthier yet more persistent, so the several thousand emails were scanned for autonomous system numbers, country codes, etc. with no more freezes and not even any hassles:
nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL 2kIPv4s-ThruM3740.txt > 2kIPv4sthruM3740.options.txt
Now it's time to process all the data that this command collected. After multiple steps of editing, there is a condensed version of 3740 rows in the second attached file.
Alas, the nmap script avoids duplication of effort by referring similar outputs to key addresses, which I've tried to process with a series of additional scripts:
grep "see-" 2kIPv4s-ThruM3740.options.edit09.txt | awk '{print $1}' > ListofSeekers.txt
grep -ef ListofSeekers.txt 2kIPv4s-ThruM3740.options.edit09.txt | sed 's/\:/\t/g' | awk '{print $2"\t"$3"\t"$4"\t"$5"\t"$6}' | grep -v "see-" '-' > ListofReplacements.txt
join -a 2 -1 1 -2 1 <(sort -k 1,1 2kIPv4s-ThruM3740.options.edit09.txt) <(awk '{print $1"\t"$2}' ListofSeekers.txt | sort -k 1,1) > ListofTargets.txt
The next step, filling in the rows that the nmap script left unfinished, requires a script that, for every row in ListofTargets.txt which has "see-IPv4" in Col.$3, replaces that row's 2nd through 6th columns with the corresponding columns of the ListofReplacements.txt row that has that IPv4 address in Col.$1. Doing so will improve the portability of the data to other steps in the spam analysis.
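The request above can be sketched as a two-pass awk program. This is an illustration under assumptions, not the script that was eventually used: it assumes the complete rows and the pointer rows live in one tab-separated file (rather than in ListofTargets.txt and ListofReplacements.txt), that a pointer is "see-" followed by the address in Col.$3, and that the referenced row has that address in Col.$1. The function name fill_see_rows is made up for the example.

```shell
# fill_see_rows FILE  (hypothetical helper name)
# Assumes FILE is tab-separated; a row whose 3rd column is "see-A.B.C.D"
# gets its columns 2-6 replaced by columns 2-6 of the row whose 1st column
# is A.B.C.D. FILE is read twice: pass 1 remembers every complete row,
# pass 2 prints all rows in their original order with pointers filled in.
fill_see_rows() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR {                            # pass 1: complete rows only
         if ($3 !~ /^see-/)
           ref[$1] = $2 OFS $3 OFS $4 OFS $5 OFS $6
         next
       }
       $3 ~ /^see-/ {                         # pass 2: a pointer row
         target = substr($3, 5)               # the address after "see-"
         if (target in ref) {
           line = $1 OFS ref[target]
           sub(/\t+$/, "", line)              # drop tabs left by short rows
           $0 = line
         }
       }
       { print }' "$1" "$1"
}
```

Usage would be along the lines of `fill_see_rows 2kIPv4s-ThruM3740.options.edit09.txt > filled.txt`. Reading the file twice keeps the rows in their original order (a `for (s in see)` loop visits keys in no particular order), and a pointer whose target has no complete row is printed unchanged.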
Attachment | Size |
---|---|
2kIPv4s-ThruM3740.options.txt | 615.23 KB |
2kIPv4s-ThruM3740.options.edit09.txt | 236.82 KB |
ListofSeekers.txt | 32.86 KB |
ListofReplacements.txt | 11.57 KB |
ListofTargets.txt | 130.09 KB |
As far as I understand (and assuming 2kIPv4s-ThruM3740.options.edit09.txt is OK despite the varying number of tabulations among the lines without "see-": two, four or five) that is what you want:
$ awk '{ if (substr($3, 1, 4) == "see-") see[$1 "\t" $2] = substr($3, 5); else { print; ref[$1] = $3 "\t" $4 "\t" $5 } } END { for (s in see) print s "\t" ref[see[s]] }' 2kIPv4s-ThruM3740.options.edit09.txt
Magic Banana contributed the script above, which worked perfectly on the submitted file,
but then the need arose to include the Peer Autonomous System Numbers that the associated emails follow on their malicious way to my Inbox, which Magic Banana addressed with another efficient script https://trisquel.info/en/forum/edit-last-few-fields-line-starting-particular-string#comment-165743:
$ awk -F 'Peer-AS-' -v OFS='\t' '{ gsub(/\t*$/, "") } $2 { gsub(/\t/, "-AS", $2); $2 = "Peer-AS" $2 } { print }' Peer-problem.txt
which works perfectly when applied to the attached files.
Alas, the first script is dumbfounded by the extra column containing the Peer-ASN list. Previously, what was Col.$3 of each attached file was handled OK in spite of the partially empty Cols. $2 and $4, but the files, previously last edited with LibreOffice Calc to remove superfluous data (ISP addresses, etc.), lost the double tabs that awk usually leaves alone, and that has left the "see-IPv4" data scattered among Cols. $2, $3 and $4.
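Before adjusting the fill-in script, it may help to measure how far the pointers have drifted. The sketch below is a hypothetical diagnostic, not one of the thread's scripts: it splits each line on single tabs, so an empty field between two adjacent tabs still counts as a column, and reports in which column the first "see-" field of each line sits. Matching "see-" in any capitalization is an assumption.

```shell
# see_column_report FILE  (hypothetical helper name)
# FILE is assumed tab-separated. With FS set to a single tab, two adjacent
# tabs delimit an empty field, so column numbers reflect the real layout.
# For each line, the first field starting with "see-" (any case) is counted
# under its column number; lines without such a field are ignored.
see_column_report() {
  awk 'BEGIN { FS = "\t" }
       {
         for (i = 1; i <= NF; i++)
           if (tolower(substr($i, 1, 4)) == "see-") { count[i]++; break }
       }
       END { for (c in count) print "column " c ": " count[c] " rows" }' "$1" |
    sort -k 2,2n
}
```

Running it on the file before and after a LibreOffice Calc round trip would show whether the collapsed double tabs are what moved the pointers out of Col.$3.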
Even though the two files are each based on the same list of 3,740 IPv4 addresses, the outputs of the nmap --script asn-query command are very different, which inspires further investigation on a daily basis. Also attached for consideration is 2kIPv4s-ThruM3740.options-H.txt, the unedited output of this script:
nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL nMapLists/2kIPv4s-ThruM000.txt > nMapData/2kIPv4s-ThruM000.options-H.txt ;
nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL nMapLists/2kIPv4s-ThruM001.txt > nMapData/2kIPv4s-ThruM001.options-H.txt ;
nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL nMapLists/2kIPv4s-ThruM002.txt > nMapData/2kIPv4s-ThruM002.options-H.txt ;
...
nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL nMapLists/2kIPv4s-ThruM184.txt > nMapData/2kIPv4s-ThruM184.options-H.txt ;
nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL nMapLists/2kIPv4s-ThruM185.txt > nMapData/2kIPv4s-ThruM185.options-H.txt ;
nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL nMapLists/2kIPv4s-ThruM186.txt > nMapData/2kIPv4s-ThruM186.options-H.txt ;
This script operates on 187 chunks of 20 addresses each from the 3,740-line IPv4 list, so that a single error can't kill the whole run. The script is easily constructed with split and Leafpad, and easier yet to update with Leafpad each morning; it takes ten or eleven minutes of nmap time and doesn't run afoul of the malicious data on the IPv4 addresses' servers, because no port scanning is done and the data is collected from the servers' registrars. The last attached file is the concatenated output of those 187 chunks. Running the script in the third file again isn't going to replicate this morning's outputs.
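The chunked command list can also be generated instead of edited by hand. The sketch below assumes GNU split (for the -d, -a and --additional-suffix options); the function name is hypothetical, the 20-line chunk size follows from 3,740 / 187 = 20, and the directory and file names follow the commands shown above. It deletes earlier chunk files matching the same pattern, then only prints the commands, so they can be reviewed before being run with sh.

```shell
# make_nmap_chunks LIST  (hypothetical helper name)
# Splits LIST into 20-address chunks named nMapLists/2kIPv4s-ThruM000.txt,
# ...001.txt, etc., then prints one nmap command per chunk.
# Stale chunks from an earlier, longer list are deleted first so they are
# not picked up by the loop.
make_nmap_chunks() {
  mkdir -p nMapLists nMapData || return 1
  rm -f nMapLists/2kIPv4s-ThruM[0-9][0-9][0-9].txt
  split -l 20 -d -a 3 --additional-suffix=.txt "$1" nMapLists/2kIPv4s-ThruM || return 1
  for chunk in nMapLists/2kIPv4s-ThruM[0-9][0-9][0-9].txt; do
    out="nMapData/$(basename "$chunk" .txt).options-H.txt"
    printf 'nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL %s > %s ;\n' "$chunk" "$out"
  done
}
```

For example, `make_nmap_chunks 2kIPv4s-ThruM3740.txt > Script-ASNs.sh` followed by `sh Script-ASNs.sh` would regenerate and run the daily list.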
Attachment | Size |
---|---|
2kIPv4s-ThruM3740.options-A.edit12MB.txt | 256.46 KB |
2kIPv4s-ThruM3740.options-C.edit09MB.txt | 257.5 KB |
Script-ASNs-2kIPv4s-ThruM.options-H.txt | 23.92 KB |
2kIPv4s-ThruM3740.options-H.txt | 616.13 KB |
Magic Banana's script preserves all 3740 rows of the original IPv4 address list and identifies 2008 resolved domains
associated with those addresses; another 1732 are unresolved. On a typical day there are 2000+ resolved domains in the
list, but the specific ones resolved differ by 300+/- to 2000+ names from day to day.
Flattening the nmap scan results is proving to be problematic. Here's where it stands:
awk '{print $2}' Resolved-Domains-NthruZ-Sort-IPv4.txt | nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL '-' > ASN-2kIPv4s-NthruZ.txt
This command agglomerates the results of a daily series of dig searches on the domains separated from a set of collected
email addresses and then runs an nmap scan to ascertain their servers' basic data, such as AS number, CIDR
block, country code, ownership and peer AS numbers.
sed 's/Nmap\ scan\ report\ for\ /Nmap-scan-report-for-/g' ASN-2kIPv4s-NthruZ-snip.txt | grep -v "Starting" '-' | grep -v "seconds" | awk 'FS="Nmap-scan-report-for-" {print $1,$2}' | awk '{printf "%s+",$0} END {print ""}' '-'
Recognizing that each record of the nmap scan begins with the string "Nmap scan report for" the script separates
those records and then attempts to flatten each record by substituting "+" for each newline character. Alas, the
records then become indistinguishable.
See this reference for the last conversion step:
https://serverfault.com/questions/391360/remove-line-break-using-awk
Attachment | Size |
---|---|
ASN-2kIPv4s-NthruZ-snip.txt | 5.77 KB |
Resolved-Domains-NthruZ-Sort-IPv4.txt | 94.01 KB |
ASN-2kIPv4s-NthruZ.txt | 516.55 KB |
It is unclear what you want. I guess "Nmap scan report for " as the input record separator (RS, not FS) and "+" (a weird choice of delimiter) substituting any sequence of newlines in a record but its trailing sequence (to delete). If so:
$ head -n -2 ASN-2kIPv4s-NthruZ.txt | awk -v RS='Nmap scan report for ' 'NR > 1 { sub(/\n+$/, ""); gsub(/\n+/, "+"); print }'
As in one of your recent posts, you apparently try to define a variable (FS) where a condition is expected. Please learn the basic structure of an AWK program: a sequence of condition-action pairs.
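The point about condition-action pairs can be seen on a made-up two-line input. When FS="..." is written where a condition belongs, awk evaluates the assignment as the condition of each record (a non-empty string, hence true), and in gawk and other POSIX awks the new FS only takes effect with the next record, so the first line is not split on ":". Setting FS in a BEGIN action or with -F splits every line. Separating whole records, as attempted above, is yet another job: that is what RS is for.

```shell
# 1) Assignment in the condition position: line 1 is split with the
#    default FS (blanks), so $1 is the whole line "a:b"; line 2 uses ":".
printf 'a:b\nc:d\n' | awk 'FS = ":" { print $1 }'
# 2) FS set before any input is read, in a BEGIN action: "a", then "c".
printf 'a:b\nc:d\n' | awk 'BEGIN { FS = ":" } { print $1 }'
# 3) The same with the -F option: "a", then "c".
printf 'a:b\nc:d\n' | awk -F ':' '{ print $1 }'
```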
Alas, I skipped a necessary step which became abundantly obvious upon inspection of the ASN-2kIPv4s-NthruZ-snip.txt file:
The dig results return the "A" records of the domains, which are mainly on shared servers whose "PTR" records are returned
by the nmap scans, so we might as well reduce the targets of the nmap scans to those IPv4 addresses which are unique:
uniq --skip-fields=1 Resolved-Domains-NthruZ-Sort-IPv4.txt | nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL '-' | head -n -2 '-' | awk -v RS='Nmap scan report for ' 'NR > 1 { sub(/\n+$/, ""); gsub(/\n+/, "+"); print }' > ASN-NthruZ-Uniq.txt
where I've incorporated Magic Banana's script to distinguish the nmap scan's records.
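uniq only drops adjacent duplicates, so the command above relies on Resolved-Domains-NthruZ-Sort-IPv4.txt being sorted by address, and it passes both columns (domain and address) on to nmap. As an alternative sketch, assuming the address is the second blank-separated column, awk can keep the first occurrence of each address and print the address alone, whatever the order of the file. The function name is hypothetical.

```shell
# unique_addresses  (hypothetical helper name): reads "domain address" lines
# on standard input and prints each address in column 2 once, in order of
# first appearance. seen[] counts occurrences, so !seen[$2]++ is true only
# the first time a given address is met; lines with fewer than two fields
# are skipped.
unique_addresses() {
  awk 'NF >= 2 && !seen[$2]++ { print $2 }'
}
```

It could stand in front of the scan as `unique_addresses < Resolved-Domains-NthruZ-Sort-IPv4.txt | nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL '-'`.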
Another script attempts to accommodate Magic Banana's script to complete the "See the result for" tasks:
sed 's/See/\+See/g' ASN-NthruZ-Uniq.txt | sed 's/result\ for\ /result\ for\ \+/g' > ASN-NthruZ-Uniq.See.resultforplus.txt
Change the plus's to tabs to create a set of columns within each nmap record:
sed 's/\+/\t/g' ASN-NthruZ-Uniq.See.resultforplus.txt > ASN-NthruZ-Uniq.tabs.txt
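The two commands above can be folded into one sed invocation. This sketch assumes GNU sed, where \t in a replacement stands for a tab. It first turns the existing "+" separators into tabs, then puts a tab before "See" and after "result for "; that gives the same result as inserting "+" first and converting afterwards, because neither inserted string contains a "+". The function name is hypothetical.

```shell
# tabify_records  (hypothetical helper name): reads "+"-separated records
# on standard input and writes tab-separated ones, with "See the result for"
# and the address that follows it in separate columns. Requires GNU sed.
tabify_records() {
  sed 's/+/\t/g; s/See/\tSee/g; s/result for /&\t/g'
}
```

Used as `tabify_records < ASN-NthruZ-Uniq.txt > ASN-NthruZ-Uniq.tabs.txt`, it would replace the intermediate .See.resultforplus.txt file.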
Now's a good time to use Magic Banana's Peer-fixing script:
awk -F 'Peer AS-' -v OFS='\t' '{ gsub(/\t*$/, "") } $2 { gsub(/\t/, "-AS", $2); $2 = "Peer AS" $2 } { print }' ASN-NthruZ-Uniq.tabs.txt > ASN-NthruZ-Uniq.tabs.Peers.txt
The nmap scan results have now become a lot flatter and more readable.
There are some additional considerations:
(1) "Other addresses for" signals additional addresses bearing the same PTR name for the name of the queried IPv4 address.
(2) "rDNS record for" sometimes signals an alternate PTR address for that name associated with the IPv4 address scanned.
(3) "Host is up" isn't always the third string in each record; when that IPv4 hasn't been resolved, I've been adding "No_DNS" as
the first string of each record.
(4) The owner addresses make for excessive clutter; one column for the name and a second column for the country code are enough.
We shouldn't lose the information in these four considerations.
Magic Banana's "See the results for" script worked very well on my hand-flattened nmap scans, but I've not understood it well
enough to edit the script successfully yet. Attached are the beginning and end points of today's analysis.
Attachment | Size |
---|---|
Resolved-Domains-NthruZ-Sort-IPv4.txt | 94.01 KB |
ASN-NthruZ-Uniq.tabs_.Peers_.txt | 748.36 KB |
Magic Banana's "See the results for" script worked very well on my hand-flattened nmap scans, but I've not understood it well enough to edit the script successfully yet.
- awk receives all the input lines but the last two (an empty line and the "Nmap done" summary), thanks to head -n -2;
- RS is defined as "Nmap scan report for ", i.e., that string separates the records;
- the condition NR > 1 makes the subsequent action apply to every record but the first one (the "Starting Nmap" line);
- the action consists of three successive instructions:
  - sub(/\n+$/, "") replaces the trailing newlines with the empty string (a deletion);
  - gsub(/\n+/, "+") replaces every remaining sequence of newlines with the character "+"... but you now write that you want a tabulation: gsub(/\n+/, "\t");
  - print outputs the edited record.
Is it clear now?
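Here is the same one-liner with the tab variant, wrapped in a function (the name flatten_nmap is made up) and fed a miniature, made-up nmap report. It needs head from GNU coreutils (for the negative line count) and an awk that accepts a multi-character RS, such as gawk or mawk.

```shell
# flatten_nmap  (hypothetical helper name): nmap report on stdin, one
# tab-separated line per "Nmap scan report for" record on stdout.
flatten_nmap() {
  head -n -2 |
    awk -v RS='Nmap scan report for ' 'NR > 1 { sub(/\n+$/, ""); gsub(/\n+/, "\t"); print }'
}

# Made-up input: a "Starting" line, two records, then the empty line and
# the "Nmap done" summary that head -n -2 removes.
printf '%s\n' 'Starting Nmap' \
  'Nmap scan report for 192.0.2.1' 'Host is up.' '' 'Host script results:' \
  'Nmap scan report for 192.0.2.2' 'Host is up.' \
  '' 'Nmap done: 2 IP addresses' | flatten_nmap
```

With that input it prints two lines: "192.0.2.1", "Host is up." and "Host script results:" separated by tabs, then "192.0.2.2" and "Host is up.". The empty line inside the first record disappears because gsub replaces the whole run of newlines with a single tab.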
In the intervening time before Magic Banana's most recent post, I realized that I had not removed the first column (domains) of
the input file, so I've adjusted the pertinent scripts accordingly:
uniq --skip-fields=1 Resolved-Domains-NthruZ-Sort-IPv4.txt | awk '{print $2}' | nmap --script asn-query -Pn -sn -T2 --max-retries 8 -iL '-' | head -n -2 '-' | awk -v RS='Nmap scan report for ' 'NR > 1 { sub(/\n+$/, ""); gsub(/\n+/, "+"); print }' > ASN-NthruZ-Uniq-02.txt
sed 's/See/\+See/g' ASN-NthruZ-Uniq-02.txt | sed 's/result\ for\ /result\ for\ \+/g' > ASN-NthruZ-Uniq-02.See.resultforplus.txt
sed 's/\+/\t/g' ASN-NthruZ-Uniq-02.See.resultforplus.txt > ASN-NthruZ-Uniq-02.tabs.txt
awk -F 'Peer AS' -v OFS='\t' '{ gsub(/\t*$/, "") } $2 { gsub(/\t/, "-AS", $2); $2 = "Peer AS" $2 } { print }' ASN-NthruZ-Uniq-02.tabs.txt > ASN-NthruZ-Uniq-02.tabs.Peers.txt
Now it's time to edit Magic Banana's original script to complete the "See the results for "[IPv4 address] statements:
awk '{ if (substr($3, 1, 4) == "see-") see[$1 "\t" $2] = substr($3, 5); else { print; ref[$1] = $3 "\t" $4 "\t" $5 } } END { for (s in see) print s "\t" ref[see[s]] }' ASN-NthruZ-Uniq-02.tabs.Peers.txt > ASN-NthruZ-Uniq-02.tabs.Peers.See.txt
Magic Banana's explanation [Sat, 03/12/2022 - 19:02] doesn't apply to that script, because it concerns his other script:
head -n -2 ASN-2kIPv4s-NthruZ.txt | awk -v RS='Nmap scan report for ' 'NR > 1 { sub(/\n+$/, ""); gsub(/\n+/, "+"); print }'
However, now I do understand what that script does and how it performs those three steps.
Back to the pertinent script: There are several differences from the hand-flattened target file:
(1) The leading IPv4 addresses in some records are missing a preceding "No_DNS" entry.
(2) The other Col.$1 records now collapse the domain and its address into one field; there should be two fields
which can be corrected by using tr -d '()' to generate a space separating the two, followed by making that middle
space into a tab; the trailing [space,tab] can be fixed with sed.
(3) Where there is a leading domain (PTR) name, the "See the result for" string is now in Col.$6; it was Col.$3.
(4) Once the first three differences are corrected in the target file, the following three columns are superfluous
in the complete records and can be removed with sed.
(5) The pointer (IPv4) to the asn-query data that's now in Col.$7 should end up in Col.$4 after the changes.
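Differences (1) and (2) can be handled together in awk, without tr. This sketch assumes tab-separated records whose first field is either "name (A.B.C.D)", when nmap found a name for the address, or a bare address otherwise; it writes the name (or No_DNS) in Col.$1 and the address in Col.$2, shifting the rest of the record one column to the right. The function name is hypothetical; only IPv4 addresses in parentheses are recognized, and any other first field is treated as unresolved.

```shell
# fix_first_field  (hypothetical helper name): tab-separated records on
# stdin; the first field becomes two fields, name (or No_DNS) and address.
fix_first_field() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         n = split($1, part, " ")
         if (n == 2 && part[2] ~ /^\([0-9.]+\)$/) {
           gsub(/[()]/, "", part[2])       # keep the address, drop ( and )
           $1 = part[1] OFS part[2]        # difference (2): name, address
         } else {
           $1 = "No_DNS" OFS $1            # difference (1): no name found
         }
         print
       }'
}
```

Because $1 is assigned a value containing OFS, print rebuilds the record with one extra column, which is what moves the later columns to the right.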
Attachment | Size |
---|---|
ASN-NthruZ-Uniq-02.tabs_.Peers_.txt | 354.42 KB |
You wrote: "using tr -d '()' to generate a space separating the two".
tr -d '()' does not generate spaces. It deletes every opening or closing parenthesis.
After repeatedly using tr in that way, it should have become clear what "delete" means ...
Back to the task at hand:
After a series of my inefficient sed and awk scripts bludgeoned the file into an approximate degree of flatness,
expedience led me to use LibreOffice Calc to perform a couple of remaining tasks, with the attached result.
To prepare the attached file for Magic Banana's "See-the-result-for-"[IPv4 address] script:
$ awk '{ if (substr($3, 1, 4) == "see-") see[$1 "\t" $2] = substr($3, 5); else { print; ref[$1] = $3 "\t" $4 "\t" $5 } } END { for (s in see) print s "\t" ref[see[s]] }' 2kIPv4s-ThruM3740.options.edit09.txt
I saw to it that the new columns are in the same order as the ones to which the script applies, and I used LibreOffice Calc
to insert No_DNS in Col.$2 for the unresolved IPv4 addresses.
$ awk '{ if (substr($3, 1, 4) == "see-") see[$1 "\t" $2] = substr($3, 5); else { print; ref[$1] = $3 "\t" $4 "\t" $5 } } END { for (s in see) print s "\t" ref[see[s]] }' ASN-NthruZ-flattened.sort.MB.txt
which works OK in spite of the new data. Tasks still remain:
(1) Fill in the missing domain names where the IPv4 addresses could not be resolved by nmap;
(2) Truncate the domain-owner information to a single name-only column (no addresses or gratuitous information) plus a
country-code column, those two columns to precede the Peer AS list at the end of each record.
Attachment | Size |
---|---|
ASN-NthruZ-flattened.sort_.MB_.txt | 199.22 KB |