How can one capture the "Failed to resolve" output to the console during an nMap scan?

amenex

After removing the unconverted IPv4 addresses from the Webalizer statistics of a domain,
there's a long list of PTR records and an occasional A record left behind. nMap has a
useful script (--script asn-query) that looks up the ASN, port status, CIDR block,
country code, and even the IPv4 address, all in one go, when fed that long list with -iL.

There's a hitch in this process: Large numbers of hostnames "Failed to resolve."

Can I capture the "Failed to resolve" lines (say, with grep "Failed to resolve") in one
output file, in addition to the output file with the successful portions of the nMap
scan? Once an nMap scan is over, I can copy and paste some of what would have been that
grep output from the console, but the vast majority of that output has scrolled off the
top of the console ...

I capture those IPv4 addresses manually by simple inspection, with an occasional one
derived from the ARPA record (i.e., with its octets reversed). In a list of 10,000 such
rejected PTR records, about ninety percent can be extracted in a LibreOffice Calc spreadsheet.

Some of the hostnames with embedded IPv4 addresses make it through nMap's reverse
hostname lookup process, perhaps because they haven't got a duplicate PTR record with
the same IPv4 address embedded in it.

The attached file has about half such easily looked-up hostnames, mainly because I
selected a group with some diversity from the batch I captured from the console.

George Langford

Attachments:
Failed2Resolve.txt (1.25 KB)
Magic Banana


You can redirect the output of any command (including 'nmap') to a file. Append "&> file_name" (without the quotes and with the file name of your choice) to the command line, to redirect what it writes on both the standard and the error outputs.
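For instance (a sketch; the output file name is illustrative):

$ sudo nmap --script asn-query -iL hostname_list &> nmap-all-output.txt

Both the scan reports and the "Failed to resolve" messages then end up in nmap-all-output.txt, where 'grep' can pick out either population.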

amenex

Magic Banana wisely suggested:

> You can redirect the output of any command (including 'nmap') to a file.
> Append "&> file_name" (without the quotes and with the file name of your choice) to the command line,
> to redirect what it writes on both the standard and the error outputs.

That teeny little ampersand (&) is the key:

sudo nmap -sS -p3389 -T4 --max-retries 8 --script asn-query -iL FTR-UnresolvedHNs.txt &> HNs-nMap-testlist.txt
==> only "Failed to resolve" messages in the output file

sudo nmap -sS -p3389 -T4 --max-retries 8 --script asn-query -iL HNs-nMap-MixedList.txt &> HNs-nMap-MixedOutput.txt
==> "Failed to resolve" listed first. OK!
[I randomized the list from among the first list and a "selection" that was accidentally all resolved]

The original list of hostnames had 22,000 lines and took nMap over three hours to process.

Note in passing: the list of IPv4 addresses that were not (for whatever reason) converted
to hostnames by the server software is much longer and equally productive of useful data.
I wish there were a scripted way of separating the two populations into two lists.
Manually, it's very tedious.

George Langford

Attachments:
FTR-UnresolvedHNs.txt (758 bytes)
HNs-nMap-MixedList.txt (1.33 KB)
Magic Banana


> That teeny little ampersand (&) is the key

You can also redirect only the standard output with '>' (or '1>') and only the error output with '2>'. And you can use both:
$ command > standard_output 2> error_output
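Applied to the scans above, that might look like this (a sketch; the output file names are illustrative, and it assumes nmap writes its "Failed to resolve" messages on the error output):

$ sudo nmap -sS -p3389 --script asn-query -iL hostname_list > scan-reports.txt 2> failed-to-resolve.txt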

> I wish there could be a scripted way of separating the two populations into two lists. Manually, it's very tedious.

Two 'grep's probably provide a solution. But I do not really understand what you want. As always, an excerpt of the input (with all cases) and the expected outputs would help.

amenex

Here's a sample of one online Webalizer file (stripped of reams of frequency and byte data)
which is trivial to separate. Now imagine about half a million lines of such data in random
order.

I can strip out the hostnames by putting two identical target lists side-by-side in
OpenOffice Calc, stripping the dots (.) out of one, then applying "text to columns" and
deleting all but the left-most five columns of the spreadsheet. Some non-complying IPv4's
are left behind with leading letters in (usually) the first octets' column, which grep
might find easily. There will also be short hostnames that didn't have three dots, but grep
should catch those, too. Once that's done, subtract the IPv4 list from the original list ...
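A hedged sketch of that separation with two 'grep's (the input is the attached sample; the output file names are illustrative):

# keep lines that are exactly four dot-separated groups of one to three digits (bare IPv4s)
grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$' Mixed-HNs-IPv4s.txt > bare-IPv4s.txt
# everything else is treated as a hostname
grep -vE '^([0-9]{1,3}\.){3}[0-9]{1,3}$' Mixed-HNs-IPv4s.txt > hostnames.txt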

Attachments:
Mixed-HNs-IPv4s.txt (357 bytes)
Magic Banana


I do not understand the separation you want. Again: please give the expected output given the input attached in your previous message (assuming it covers all cases).

amenex

It turns out that nMap doesn't care which way the lookup process goes; I'm currently processing the entire file (35,000 lines) for one four-month data set, and the hostnames as well as the IPv4 addresses are handled with the same output schemes.

The actual separation process will have to be performed on the "Failed to resolve" hostnames in order to extract
their IPv4 addresses (see the sketch below). The vast majority of those have the four octets of the IPv4 address
at the beginning of the hostname, so my previous explanation will work to yield another set of unencumbered IPv4
addresses that can be handled easily by the scripts already given. Some of those IPv4 addresses will be dead (host not up) or pointing to more than one hostname, which made those hostnames unresolvable and therefore easy to abuse. The other hostnames in each set of duplicates can be uncovered by running nMap on the CIDR block that holds the same server's array of hostnames. A few hostnames are duplicated across different CIDR blocks; some are even duplicated across different ASN's.
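A hedged sketch of that extraction (the captured output file name is illustrative, and it assumes the failure lines look like: Failed to resolve "hostname".):

grep 'Failed to resolve' nmap-all-output.txt | sed 's/.*Failed to resolve "\(.*\)".*/\1/' > failed-hostnames.txt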

amenex

I went through the file HNs-nMap-MixedList.txt and resolved all the hostnames: mainly by inspection (two were based on ARPA naming and so had their octets arranged in backwards order), many with Google (usually the first item in the results), and one by truncating the prefix from q.jaso.ru to jaso.ru (198.54.120.43), though Hurricane Electric's BGP database doesn't list any subdomains on that server.

amenex

Armed with my new knowledge, I processed all 35,000 lines of the source file so as to separate out a file with
the original four-octet-containing hostnames and four additional columns, each containing one of those four
octets. That list of IPv4's is 3,500 lines long. Of the 22,500 rows scanned, some 14,000 hosts were up at the
time of the scan, another 5,000 rows were otherwise accounted for, and about 7,500 were not up.

About 5,000 rows were cast aside by nMap as unresolvable; the 3,500 lines came from those 5,000 rows.

Now my task is to create just two columns of data from the current five-column file. The following command
does not quite do that successfully:

awk -f beginning-5-col-file.txt '{ print $1"\t"$2"\:"$3"\:"$4"\:"$5 }' > output-2-column-file.txt

[awk finds a decimal point that it doesn't like. The reason is that -f expects a program file, so awk
tries to parse the data file itself as an awk program and chokes on the first dot in a hostname; the
data file should instead be given as an argument after the program text, and the backslashes before
the colons are unnecessary.]
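A minimal corrected sketch (same file names as above):

awk '{ print $1 "\t" $2 ":" $3 ":" $4 ":" $5 }' beginning-5-col-file.txt > output-2-column-file.txt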

I'm using colons (:) as the field separator here, with the intent to replace the colons with decimal points
using search-and-replace-all in Leafpad; there are no colons in the hostname column. The final output
file can then be processed with an nslookup script to make sure the octets are in the correct order and not
reversed. An awk script could compare the hostnames of the IPv4 column to the first column's hostnames and
then reverse the octet order of the ones that fail that test before trying again.

Magic Banana


Input excerpt and expected output...

amenex

Magic Banana woke me up with: "Input excerpt and expected output..."

OK. The beginning-5-col-file shows all five tab-separated fields.

The output-2-column-file actually has an added third column listing the
proper IPv4 addresses that resolve to the hostnames in the first column.

You will see that a couple of the server owners have active imaginations:
3rd row: 53.198.27.67 to 198.27.67.35
14th row: 10.181.14.194 to 181.14.194.10

Took a while to solve those two.

I initially converted the five-column file to two columns in Leafpad by converting the
tabs to dots and then converting the .com. to .com[space], then .net. ... .br. ... .cn. ...
and so on, with a couple of unintended consequences ... not a method for a 5,000-row file. I also
had to correct the hostnames of two entries that had been truncated at their beginnings,
by reference to the original four-month list. These hostnames all appear in online
Webalizer and spam-statistics files.

I noticed that when nslookup fails to return the correct hostname, the reversed order
can be grep'ed from the four octets at the beginning of the arpa-format reply. That would
have gotten the right answer for all but two of this list of hostnames.

Attachments:
beginning-5-col-file.txt (688 bytes)
output-2-column-file.txt (947 bytes)
Magic Banana


The third column of "output-2-column-file.txt" cannot be computed from the input alone, as far as I understand. Is the script you want supposed to take the list of hostnames (the first column of beginning-5-col-file.txt) and compute what you call "literal IPv4"? If so, why is "ns530300.ip-198-27-67.net" mapped to 53.198.27.67? That makes little sense to me (and, indeed, 53 is not part of the actual address).

Defining the "literal IPv4" as the first sequence, in the hostname, of four numbers (separated by any number of other characters) that are all smaller than 256, I wrote that:
$ tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { for (k = i - 4; ++k < i; ) printf $k "."; print $i } else print "" }' | paste hostname_list -

If hostname_list is the first column of beginning-5-col-file.txt (i.e., the output of 'cut -f 1 beginning-5-col-file.txt'), I get:
0-187-172-163.rev.cloud.scaleway.com 0.187.172.163
0-224-15-51.rev.cloud.scaleway.com 0.224.15.51
ns530300.ip-198-27-67.net
tsn109-201-154-166.dyn.nltelcom.net 109.201.154.166
1-124-15-51.rev.cloud.scaleway.com 1.124.15.51
1.170.246.94.jerzostrada.pl 1.170.246.94
1.186.151.206.dvois.com 1.186.151.206
1.186.242.20.dvois.com 1.186.242.20
1.186.251.134.dvois.com 1.186.251.134
1.186.63.130.dvois.com 1.186.63.130
1.207.86.109.triolan.net 1.207.86.109
10-96.239.170.global.psi.br 10.96.239.170
10.136.133.188.msk.enforta.com 10.136.133.188
host10.181-14-194.telecom.net.ar 10.181.14.194
10.192.7.191.online.net.br 10.192.7.191
10.52.18.175.adsl-pool.jlccptt.net.cn 10.52.18.175

Notice that ns530300.ip-198-27-67.net is not mapped. As I wrote above, I do not think it should.

amenex

Magic Banana lamented:

>The third column of "output-2-column-file.txt" cannot be computed from the input alone, as far as I understand.

If it were mine to do as well as you can ... I'd reverse the octet order in a third column, look both
up with nMap or nslookup, and compare to the first column. That would work for some of the IPv4's in column 2,
but would not solve the ones that move the left-most octet to the far right, leaving the other three octets
in "standard" order. Another script might be written to handle those, in order to produce a fourth column.

> If so, why is "ns530300.ip-198-27-67.net" mapped to 53.198.27.67? That makes little sense to me (and,
> indeed, 53 is not part of the actual address).

Tricky server owner: he reversed the 53 to 35 and put it at the far right, just as some of the 10's were handled;
now we're up to a fifth column to try in our output comparisons.

My short-term strategy is to try Google, then Hurricane Electric's BGP files. The next step in my plan is to
grab all the hostnames of the home servers of the malevolent country codes that appear in the output files of
the searches emanating from the 35,000-row list (maybe a couple of thousand CIDR blocks), see how many
duplicated hostnames appear, and then go after all the CIDR blocks in each of the active ASN's. With the
tweaks that we've discussed recently, those nMap scans run remarkably quickly. A dedicated pattern searcher
would not limit the analysis to the known malevolents. For example, my 100+ sextortion emails came from every
corner of the world; I have a couple written in Portuguese. I grew tired of collecting them and now block any
message with the word for a popular form of virtual currency, a word that grew out of the ancient practice of
biting precious-metal currency, which stopped when newer currency started to be made with reeded edges.

It's too hard to predict what will appear in the results of a Google search on an arbitrary but nonresolving
hostname ... maybe concentrate on some anti-spam services.

Hurricane Electric's BGP application is a little easier: plug a three-octet abbreviation of the IPv4 address
(as oct.oct.oct.0/24) into the Search: box, pick the smaller of the two CIDR blocks that appear in response,
ask for the DNS records, put the unknown hostname into the browser's search-this-page function, and see what,
if any, IPv4 address lights up. The BGP data is limited to one thousand output lines, so many hostnames
disappear from view with the larger CIDR blocks, such as /21's, especially when there are subdirectories,
i.e., "A" records.

> Notice that ns530300.ip-198-27-67.net is not mapped. As I wrote above, I do not think it should.

I peeked at the Google data, which made the "guessing" easier. "ns530300" ==> 53 ==> 35; 300 is impossible.

I tried Magic Banana's suggested scripts, and they work exactly as written. Alas, with this modification I
haven't managed to print more than the left-most of my intended reverse-ordered octets:

tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0
} if (k == 4) { for (k = 4 - i; ++k > i; ) printf $k "."; print $i } else print "" }' | paste hostname_list -

Magic Banana


It is easy to get the two additional columns you asked for. For instance with two additional loops. But since you seem confused about these loops (with three iterations, the last number is printed out of the loop), I put everything in one big print:
$ tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i, $i "." $--i "." $--i "." $--i, $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste -d ' ' hostname_list -
0-187-172-163.rev.cloud.scaleway.com 0.187.172.163 163.172.187.0 187.172.163.0
0-224-15-51.rev.cloud.scaleway.com 0.224.15.51 51.15.224.0 224.15.51.0
ns530300.ip-198-27-67.net
tsn109-201-154-166.dyn.nltelcom.net 109.201.154.166 166.154.201.109 201.154.166.109
1-124-15-51.rev.cloud.scaleway.com 1.124.15.51 51.15.124.1 124.15.51.1
1.170.246.94.jerzostrada.pl 1.170.246.94 94.246.170.1 170.246.94.1
1.186.151.206.dvois.com 1.186.151.206 206.151.186.1 186.151.206.1
1.186.242.20.dvois.com 1.186.242.20 20.242.186.1 186.242.20.1
1.186.251.134.dvois.com 1.186.251.134 134.251.186.1 186.251.134.1
1.186.63.130.dvois.com 1.186.63.130 130.63.186.1 186.63.130.1
1.207.86.109.triolan.net 1.207.86.109 109.86.207.1 207.86.109.1
10-96.239.170.global.psi.br 10.96.239.170 170.239.96.10 96.239.170.10
10.136.133.188.msk.enforta.com 10.136.133.188 188.133.136.10 136.133.188.10
host10.181-14-194.telecom.net.ar 10.181.14.194 194.14.181.10 181.14.194.10
10.192.7.191.online.net.br 10.192.7.191 191.7.192.10 192.7.191.10
10.52.18.175.adsl-pool.jlccptt.net.cn 10.52.18.175 175.18.52.10 52.18.175.10

As you can see, I also chose AWK's default separator, the space. It is indeed usually easier to work with files that do not have tabulations (especially with 'sed'). Well, except with 'cut' and 'paste', where option -d must then be set (as I did above). If you really want tabulations, replace the commas in the print with ' "\t" ' (the space concatenates in AWK) and remove option -d of 'paste'.

> Tricky server owner: He reversed the 53 to 35 and put it at far right

Is it a consistent pattern over several machines? I mean: it could be coincidence.

amenex

Magic Banana wrote:

> As you can see, I also chose AWK's default separator, the space. It is indeed usually easier to work with files
> that do not have tabulations (especially with 'sed'). Well, except with 'cut' and 'paste', where option -d must
> then be set (as I did above). If you really want tabulations, replace the commas in the print with ' "\t" ' (the
> space concatenates in AWK) and remove option -d of 'paste'.

In a hurry, I grabbed a few lines of the output of MB's new script above, run on my own hostname list, opened it
with LibreOffice Calc (thereby replacing the spaces with tabs), and then copied & pasted the selection back into Leafpad.

Before I ran the new script, I followed MB's advice and tested it with my "ftr05_output_test.txt" file:

time tr -sc 0-9\\n ' ' < ftr-shortlist.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k;
else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i, $i "." $--i "." $--i "." $--i"\t" $++i "."
$++i "." $++i "." $(i - 3) } else print "" }' | paste ftr-shortlist.txt - > ftr05_shortlist_output.txt

Looks OK, and it processes nearly all the hostnames in this short list; at home, same result with a much longer list
in which I had corrected [many of] the obviously truncated hostnames ... which would have processed OK in the main
list because the correct version of those truncated hostnames was also present. I did guess one or two octet
prefixes on the assumption that they were the least significant of the original four octets.

Here comes the rub:

sudo nmap -sS -p3389 -iL ftr05_shortlist_output.txt &> ftr05_output_test-lookup.txt
Only looks up the first of four output columns in the tab-delimited target file.

sudo nmap -sS -p3389 -iL ftr-onelineinput.txt &> ftr-onelineoutput.txt
Does the trick: one of four outputs (reversed octet order) matches the hostname that was input:

grep -C0 "Nmap scan report for" ftr-onelineoutput.txt:

Nmap scan report for 11-124-15-51.rev.cloud.scaleway.com (51.15.124.11)
--
Nmap scan report for 24.15.51.1

Just need to suppress that non-matching data ... which can be done with LibreOffice Calc in post
processing by sorting and then deleting the rows with blank entries ...

sudo 'for i <=NR' nmap -sS -p3389 -iL ftr05_shortlist_output.txt &> ftr05_output_test-lookup.txt
Not quite "it." Only processes the first column of the tab-delimited, four-column input file.

That's where it stands for now. If we get these steps to work, the entire original Webalizer list
can be run in one grand rush ... of maybe eight hours ... which is a lot faster than looking the
names up one at a time.

>> Tricky server owner: He reversed the 53 to 35 and put it at far right
> Is it a consistent pattern over several machines? I mean: it could be coincidence.

The good news is that it appeared in a hostname that had been truncated in one month's Webalizer
data but not in another month's. There are so few hostnames that escape these scripts that
they could be analyzed manually after running the scripts. The incentive is that the trend is
going in the direction of maximum obfuscation, so careful post-processing will be the order
of the day.

Here are all the unprocessed hostnames from the second, long script, for which I managed to get
confirmed IPv4 addresses (by nslookup) with Google or BGP: ftr05_output_empty_IPv4.google.txt

Attachments:
ftr05_output_empty_IPv4.google.txt (1.84 KB)
ftr-shortlist.txt (368 bytes)
ftr05_shortlist_output.txt (858 bytes)
ftr-onelineinput.txt (70 bytes)
ftr-onelineoutput.txt (460 bytes)
Magic Banana


> Just need to suppress that non-matching data ... which can be done with LibreOffice Calc in post processing by sorting and then deleting the rows with blank entries ...

Are you referring to the lines where my command line does not detect any IPv4 address? If so, 'grep' can be used at the output:
$ tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste hostname_list - | grep '[0-9]$'

A perhaps more elegant solution uses one single AWK program for everything: "[^0-9]+" as the field separator (no need for 'tr'); printing $0 before the IPv4 addresses (no need for 'paste'); no else statement printing a blank line (no need for 'grep'); "." defined as the output field separator, so that commas can be used in the print instead of always concatenating "." (which could be done in the solution above too); and k == 4 tested after (rather than in) the action executed on each line (because it is nicer). Here is a script, to write in an executable file (to execute with a hostname list in argument, or several), that does all that:
#!/usr/bin/awk -f
BEGIN {
    FS = "[^0-9]+"
    OFS = "." }
{
    k = 0
    for (i = 0; k < 4 && ++i <= NF; )
        if ($i != "" && $i < 256)
            ++k
        else
            k = 0 }
k == 4 {
    i -= 3
    print $0 "\t" $i, $++i, $++i, $++i "\t" $i, $--i, $--i, $--i "\t" $++i, $++i, $++i, $(i - 3) }
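A quick usage sketch, assuming the script was saved as find-IPv4-addr.awk in the current directory:

$ chmod +x find-IPv4-addr.awk
$ ./find-IPv4-addr.awk hostname_list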

amenex

Magic Banana suggested:

> Just need to suppress that non-matching data ... which can be done with LibreOffice Calc in post processing by sorting and the deleting the rows with blank entries ...

> Are you referring to the lines where my command line does not detect any IPv4 address? If so, 'grep' can be used at the output:

$ tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else
k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i
"." $++i "." $++i "." $(i - 3) } else print "" }' | paste hostname_list - | grep '[0-9]$'

With a modification to complete (?) the solution:

time tr -sc 0-9\\n ' ' < ftr-shortlist.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k;
else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t"
$++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste ftr-shortlist.txt - | grep '[0-9]$' >
intermediate.txt | sudo nmap -sS -p3389 -iL intermediate.txt &> ftr_shortlist-GL06182019.txt

This modification actually seems to work; it solves the task presented by the original list of four-column fields.

Oops ... not so fast. The output list, ftr_shortlist-GL06182019.txt, contains some interlopers: extraneous hosts
that resolve from the rearranged octets. After applying:
grep -C0 "Nmap scan report for" ftr_shortlist-GL06182019.txt > ftr_shortlist-GB-2-cols.txt

then removing the row separators (--) with Leafpad, re-opening in LibreOffice Calc, concatenating the original
ftr_shortlist.txt, saving the combined LibreOffice Calc files (to re-number the rows), and sorting the resulting
file, ftr_shortlist-GB-2-cols-plus.ods, we can see the interlopers as lonesome rows. The properly resolved hosts
appear in duplicate, once with the properly transposed-octet IPv4 address and once without an IPv4. It
remains to be seen how to manage that so the interlopers are discarded.

Attachments:
ftr_shortlist-GL06182019.txt (2.77 KB)
ftr_shortlist-GB-3-cols-plus.txt (1.34 KB)
ftr_shortlist-GB-2-cols-plus.txt (995 bytes)
Magic Banana


The "modification to complete" does not run nmap in parallel with the rest of the command line. If you do not actually need "intermediate.txt", then remove '> intermediate.txt' and give '-' (which here means "standard input") to the -iL option. If you really want "intermediate.txt", then replace '> intermediate.txt' with '| tee intermediate.txt' and, again, give '-' to the -iL option. The single quotes in the two previous sentences are not to be copied.

'grep -C 0' asks for zero lines of context: you only get the separator "--" that you then remove with Leafpad: do not use that option here! Then you can pipe (with '|') grep's output to 'sort'. Give it two arguments: '-' (again, it means "standard input" here) and the other file whose lines you want added to the selected ones. You need neither Leafpad nor LibreOffice Calc.

I am not sure I understand what you mean by "It remains to be seen how to manage that so the interlopers are discarded". Does it mean removing every line having one single field if that line is preceded (or maybe followed: sort's ordering depends on the locale of the system) by a line with the same first field? That is easy to do with a line of AWK... but, in the attached files, that would apparently remove all lines with one single field, i.e., all lines previously selected with 'grep'. That does not look right. Again, I would like an input (also to see what sort's ordering is), with all cases (lines that are discarded and lines that are not), and the expected output.

amenex

Magic Banana worries:

> I am not sure I understand what you mean by "It remains to be seen how to manage that so the interlopers are discarded".

Once the [long] command is done, it's too late to compare the [two or] three outputs from the nmap lookups of the candidate
IP addresses [that I piped to the expedient intermediates.txt file]. That has to be done in-line with the main command.

> Does it mean removing every line having one single field if that line is preceded (or maybe followed: sort's ordering
> depend on the localization of the system) by a line with the same first field? That is easy to do with a line of AWK
> ... but, in the attached files, that would apparently remove all lines with one single field, i.e., all lines previously
> selected with 'grep'. That does not look right.

Definitely not right. In my expedient "solution" we need more steps in this process:

... | grep '[0-9]$' intermediate.txt | sudo nmap -sS -p3389 -iL intermediate.txt &> penultimate.txt | awk -f 'penultimate.txt'
[i.e., pick the match and discard the non-matches of the nMap output] - > hostname-IPv4.txt | grep "Nmap scan report for"
hostname-IPv4.txt > Two-column-list.txt

That awk command should be easy to work out for someone familiar with the syntax ... I use the extraneous filenames to avoid
those hassles.

> Again, I would like an input (also to see what is sort's ordering), with all cases (lines that are discarded and lines
> that are not), and the respect[ive] output.

Here's a longer version; ftr05.txt is about 1/7th of the 4,500 rows of "failed to resolve" data from the four-month
data set of 35,000 rows. Our analysis could just as well be applied to the 35,000-row file, once the inefficiencies
are cleaned up:

time tr -sc 0-9\\n ' ' < ftr05.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 }
if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "."
$++i "." $(i - 3) } else print "" }' | paste ftr05.txt - | grep '[0-9]$' > intermediate.txt | sudo nmap -sS -p3389 -iL
intermediate.txt &> ftr05-GL06192019.txt

Output: Nmap done: 2069 IP addresses (611 hosts up) scanned in 553.76 seconds (original data getting very stale ...)

Applied to one-eighth of the four-month dataset:

time tr -sc 0-9\\n ' ' < JFMA-Partial.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else
k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "."
$++i "." $++i "." $(i - 3) } else print "" }' | paste JFMA-Partial.txt - | grep '[0-9]$' > intermediate.txt | sudo
nmap -sS -p3389 -iL intermediate.txt &> JFMA-partial-GL-06192019.txt

Output: Nmap done: 17492 IP addresses (6692 hosts up) scanned in 6091.89 seconds ... better to try current data.

The input file below came from today's Recent Visitors to my main website (GB), pared down to a single column of
hostnames and a lot of IPv4 addresses that weren't resolved by the shared site's Apache server:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256)
++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t"
$++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt - | grep '[0-9]$' >
intermediate.txt | sudo nmap -sS -p3389 -iL intermediate.txt &> CPV-GB-OneCol-output.txt

Output: Nmap done: 345 IP addresses (101 hosts up) scanned in 59.02 seconds. That 101 includes 78 original hosts;
the rest are interlopers resulting from nMap lookups of rearranged octets. I tried marking the extraneous hostnames,
but a number of IPv4 addresses that are also extraneous escaped my tired eyes.

The nMap scans reveal only one open port 3389 among these would-be visitors, but many filtered ports 3389 ... and a
much smaller fraction of addresses that were apparently turned off right after scanning GB than appear in the
Internet-accessible Webalizer data.

Attachments:
ftr05.txt (23.68 KB)
ftr05-GL06192019.txt (127.83 KB)
JFMA-Partial.txt (85.12 KB)
CPV-GB-OneCol0-6192019.txt (2.93 KB)
CPV-GB-OneCol-output.txt (14.41 KB)
CPV-GB-TwoCol-output.txt (4.05 KB)
Magic Banana


Read again the first paragraph of my previous post.

More generally, the following command line makes no sense:
$ command1 > file | command2 file

You want:
$ command1 | command2 -
Or, if you want to save the partial results (the output of 'command1', not post-processed by 'command2'):
$ command1 | tee file | command2 -

In both cases, "-" means "standard input" and can be omitted if, by default, 'command2' processes the standard input (not the case as an argument of option -iL of 'nmap': the argument is mandatory).

amenex

Correction to the output file:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else
k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "."
$++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt - | grep '[0-9]$' > intermediate.txt | sudo nmap
-sS -p3389 -iL intermediate.txt &> CPV-GB-TwoCol-output.txt

Magic Banana admonished against my use of:
> $ command1 > file | command2 file

> You want:
> $ command1 | command2 -

> Or, if you want to save the partial results (the output of 'command1', not post-processed by 'command2'):
> $ command1 | tee file | command2 -

> In both cases, "-" means "standard input" and can be omitted if, by default, 'command2' processes the
> standard input (not the case as an argument of option -iL of 'nmap': the argument is mandatory).

Which means that the end of my command should be:
>> ... | paste Input-OneColumn | grep '[0-9]$' > Output-FourColumns | sudo nmap -sS -p3389 -iL Output-FourColumns &> Output-TwoColumns
Instead of:
>> ... | paste Input-OneColumn - | grep '[0-9]$' > Output-FourColumns | sudo nmap -sS -p3389 -iL Output-FourColumns &> OutputX-TwoColumns

Actually, the ultimate aim is to eliminate the unintended consequences: the hostnames returned for lookups
of incorrect octet sequences. These should be tested and rejected on the fly:

>> ... | paste Input-OneColumn | grep '[0-9]$' > Output-FourColumns | sudo nmap -sS -p3389 -iL Output-FourColumns | [awk command] &> OutputY-TwoColumns

Where OutputX-TwoColumns contains innocent bystanders and OutputY-TwoColumns reveals the correct IPv4
addresses of Input-OneColumn without the unintended consequences.

Complications: the output of each instance of my nMap command has four lines:

>> Nmap scan report for [hostname] ([IPv4 address])
>> Host is up (0.16s latency).
>> PORT STATE SERVICE
>> 3389/tcp closed ms-wbt-server

For which the command:
>> grep "Nmap scan report for" [nMap output]
returns lines of the form {Nmap scan report for [hostname] ([IPv4 address])}, which finally need to become: [hostname] ([IPv4 address])
i.e., that grep output has to be stripped of the string "Nmap scan report for" and those two pesky parentheses,
which I've been doing with LibreOffice Calc or with Leafpad; both are expedient only for short hostname lists.
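A hedged in-pipeline alternative to those spreadsheet steps (the input file name is illustrative):

# strip the fixed prefix, then delete both parentheses
grep "Nmap scan report for" nmap-output.txt | sed -e 's/^Nmap scan report for //' -e 's/[()]//g'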

In the present two-column output file there are several for-instances; see Two-Col-OutputX-grep.txt:
there are unresolved IPv4 addresses (servers down or NXDOMAIN responses);
unintended hostnames and their unsuccessfully rearranged IPv4 octets; and
matching hostnames and their correctly rearranged IPv4 octets.

When I delete all the unaccompanied IPv4 addresses and sort the remainder (see Two-Col-OutputY-grep-sort.txt)
there are duplicates of all the proper hostnames from the One-Col-Input file and singular hostnames for all
the incorrectly rearranged IPv4 addresses. Deleting the singular ones and then removing the remaining
duplicates would leave us with the successfully resolved original hostnames and their IPv4 addresses.
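A minimal AWK sketch of that filter (the sorted file is read twice: the first pass counts each first-field hostname, the second keeps, once per hostname, only duplicated hostnames' rows that carry an IPv4 in a second field):

awk 'NR == FNR { c[$1]++; next } c[$1] > 1 && NF == 2 && !seen[$1]++' Two-Col-OutputY-grep-sort.txt Two-Col-OutputY-grep-sort.txt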

On the other hand, nMap's on-the-fly outputs come in threes or fours. The four-fold outputs occur for
a resolvable hostname and include the lookup of the hostname obtained from the input file; another [correct]
hostname from the properly rearranged octets of the IPv4 address harvested from the original hostname; an
incorrect hostname (or hostnames) resolved from one or the other of the two permutations of the harvested
IPv4 address; or one or two unresolvable IPv4 addresses. The unresolved IPv4 addresses will not match the
string of the original hostname (i.e., 266.198.1.7 doesn't equal orange.calm) and can be dropped. Non-matching
hostnames can also be dropped, leaving the one or two matched hostnames and their correct IPv4 permutation.
All that remains is to remove duplicate rows from the two-column output.

When I realize that the target is the list of all the current visitors, which includes resolvable hostnames,
unresolvable hostnames, and resolvable as well as unresolving IPv4 addresses, my method of detecting the
unintended hostnames looks shaky ... but we can block unwanted IPv4 addresses whether or not they are
resolvable.

Attachments:
Two-Col-OutputX-grep.txt (3.86 KB)
Two-Col-OutputY-grep-sort.txt (3.49 KB)
amenex

This may be on the right track, but it has a syntax error when I try to process the three permuted IPv4 addresses:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt | grep '[0-9]$' > intermediate.txt | awk { for (i = 2; ++i <= 4; ) 'if "nmap -sn intermediate.txt | grep "Nmap scan report for " - | cut -c22- == CPV-GB-OneCol0-6192019.txt' } print CPV-GB-OneCol0-6192019.txt, intermediate.txt ' > CPV-GB-TwoCol-OutputY.txt

This is my tentative cleanup code:

| awk { for (i = 1; ++i <= 4; ) 'if "nmap -sn intermediate.txt | grep "Nmap scan report for " - | cut -c22- == CPV-GB-OneCol0-6192019.txt' } print CPV-GB-OneCol0-6192019.txt"\t"intermediate.txt ' > CPV-GB-TwoCol-OutputY.txt

The awk command is intended to select the one combination of hostname and permuted IPv4 address that correctly resolves.
The grep command { grep "Nmap scan report for " - | cut -c22- } should remove all but the hostname and its IPv4 address
from the output of the earlier grep command { | grep '[0-9]$' > intermediate.txt }.

The following attempt has no flagged errors, but stalls after creating intermediate.txt:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt | grep '[0-9]$' > intermediate.txt | awk { 'for (i = 2; ++i <= 4; )' 'if "nmap -sn intermediate.txt | grep "Nmap scan report for " - | cut -c22- == CPV-GB-OneCol0-6192019.txt' } print CPV-GB-OneCol0-6192019.txt, intermediate.txt ' > CPV-GB-TwoCol-OutputY.txt

Maybe my suggested awk command should be integrated with the earlier awk command. Lots to think about.

Magic Banana


No. The command should be:
$ tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste hostname_list - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -
Or, using the small AWK program I wrote to do everything (here, I assume it is written in "./find-IPv4-addr.awk", an executable file in the current directory):
$ ./find-IPv4-addr.awk hostname_list | sudo nmap -sS -p3389 -iL -
You can add 'tee intermediate.txt |' after the last pipe if you want to save what goes through this pipe in "intermediate.txt".

Just to be clear: it is not a matter of esthetics. In the subsequent command line, nothing goes through the pipe and 'command2' can start running before 'command1' (they truly run in parallel):
$ command1 > file | command2 file
As a consequence, it is possible that "file" does not even exist when 'command2' tries to read it. 'command2' would then end with an error.

amenex

Starting with Magic Banana's firm suggestion:

tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste hostname_list - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -

I added a little post-processing on the fly:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -| grep "Nmap scan report for " | cut -c22- > Two-Col-Output.txt

Two-Col-Output.txt could be the starting point for some tedious work: eliminating the unresolved, unaccompanied IPv4
addresses and then sorting out the inadvertently resolved extraneous hostnames which weren't in the input hostname list.

It would be better to cast aside the collateral damage before impinging upon any affected domains, by comparing on the
fly the nMap scan report for each of the three candidate permutations of the derived IPv4 address.

I tried to do that with:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt | grep '[0-9]$' > intermediate.txt | awk { 'for (i = 2; ++i <= 4; )' 'if "nmap -iL intermediate.txt | grep "Nmap scan report for " - | cut -c22- == CPV-GB-OneCol0-6192019.txt' } print CPV-GB-OneCol0-6192019.txt, intermediate.txt ' > CPV-GB-TwoCol-OutputY.txt

But it stalls without telling me anything ...

Attachments:
Two-Col-Output.txt (3.96 KB)
Magic Banana


For the fourth time, in the last command you give, you redirect the standard output of grep '[0-9]$' to a file before a pipe: that is wrong, and I do not know how to explain it to you in a clearer way than what I attempted in my three last posts.

Because there is an odd number of single quotes, the shell considers that the command is unfinished and lets you type the rest of it. It does not "stall". It lets you write more. As the character at the beginning of the line, ">", indicates. The whole AWK program should be between single quotes.

That program makes no sense: the index i of the loop is never used, you apparently try to call external commands without the AWK's system function, you 'cut' characters starting from the 22nd (?), you apparently try to test if the return value (a number) of three piped commands are equal to a file (except that 'awk' will interpret CPV-GB-OneCol0-6192019.txt as an uninitialized variable, hence ""), you apparently try to print that file and "intermediate.txt" (but print does not print files, which will be here understood as uninitialized variables anyway), etc.

You should learn about the shell and then about the text-processing commands, for instance using the material I have already pointed you to in https://trisquel.info/forum/separating-hostnames-having-multiple-ipv4-addresses-long-two-column-list#comment-140501

amenex

Magic Banana grades my homework constructively:
> ... redirect the standard output of 'grep '[0-9]$' to a file before a pipe: that is wrong

Here's my mistake: ... | grep '[0-9]$' > intermediate.txt | awk ...
It should have been: ... | grep '[0-9]$' - | awk ...
And Magic Banana's last version of the command did exactly this, as accepted by awk.

> ... Because there is an odd number of single quotes, ...

I have an awful time finding the orphans ... awk senses a missing } which I've yet to find in my long script.

I've tried to clean things up:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt | grep '[0-9]$' - | awk { 'for (i = 2; ++i <= 4; )' {./myscript01.awk myfile01.in ./myscript02.awk myfile02.in } } - &> Output-TwoCol.txt

Where myscript01.awk is: { nmap -iL - | grep "Nmap scan report for " - | cut -c22- }
and myscript02.awk is: if {'CPV-GB-OneCol0-6192019.txt ~ ./myscript.awk myfile.in' then print ./myscript01.awk myfile.in}

(Both saved awk scripts are according to: https://www.funtoo.org/Awk_by_Example,_Part_1)

The first short script strips the "Nmap scan report for " from the first line of the nMap output; the second short script makes
the comparison between the input hostname and the hostname (and IPv4 address) returned by running nMap on that candidate IPv4
address. The match needs to be partial, because the output of myscript01.awk returns a single field with only a space between the
hostname and the parenthesized IPv4 address; I am having trouble with that aspect of pattern matching.

I'm assuming all along that the main script processes only one row of the input hostname file at a time and that the subsidiary
scripts remember that hostname and your three permutations of the IPv4 address gleaned from the hostname.

In order to keep your three-step process variable in operation, I suspect that the intermediate print statements should be
skipped:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' {./myscript01.awk myfile01.in ./myscript02.awk myfile02.in } - &> Output-TwoCol.txt

This is not yet working, as awk doesn't recognize myscript01.awk or myscript02.awk, which are both in the current directory and
were made executable with "chmod +x myscrip??.awk" according to the referenced HTML link. Taking the forward slashes out doesn't
help; nor does removing the leading dot character. Do those scripts need a preceding -f ?

As in Fortran, the error statements only reflect the consequences of errors, not their cause ...

Magic Banana


The tutorial you found is excellent. It never presents the execution of an external command line from within AWK. That is actually possible, using the "system" statement. You apparently only want to do that (calling, from within an AWK program, other AWK programs, which themselves call 'nmap', etc.). I very much doubt it is what you need.

I also very much doubt you did the tutorial: you must write simple programs before writing more complicated ones. Here, you are basically asking (not with a correct syntax) to execute two AWK programs on fixed input files as many times as there are lines processed by the main AWK program. You probably only want to post-process the output of the main AWK program (not that I really understand what you try to achieve, though).

And before learning AWK, you should learn the basics of the Shell (redirections, piping, etc.).

amenex

Backtracking to find a less error-prone script-creation process:

Step (1): Apply Magic Banana's one-line processor to the single-column hostname_list containing all the Webalizer
records for a time period:

tr -sc 0-9\\n ' ' < hostname_list | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste hostname_list - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -

This outputs a list of "Failed to resolve" error messages and nMap outputs, as well as many of the original bare IPv4
addresses and candidate IPv4 hostnames that are unresolvable; all of it needs post-processing.

Step (2) Add my in-line, post-processing steps:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -| grep "Nmap scan report for " | cut -c22- > Two-Col-Output.txt

Where CPV-GB-OneCol0-6192019.txt is the sorted and duplicate-hostname-free list from my website's cPanel Current Visitor
list. It includes quite a few bare IPv4 addresses (among which might be an occasional PTR record that uses its own IPv4
address, or a forged IPv4 address, over which I have little control other than Google).

The [mostly] legitimate bare IPv4 addresses are in the original hostname_list file and can be treated concurrently. There are
also a number of duplicated hostnames and their real IPv4 addresses, plus a number of singular hostnames and _their_ legitimate
IPv4 addresses which are _not_ in the original hostname_list file. These are innocent bystanders and must be removed from the
output file.

Step (3) Sort the Two-Col-Output.txt file:

sort Two-Col-Output.txt > Two-Col-Output-Sorted.txt

Where it will be clear that many bare IPv4 addresses are duplicated, along with many resolved hostnames. But there will
be unintended bystanders that have legitimate IPv4 addresses yet are absent from the original hostname_list and are not
duplicated, along with bare IPv4 addresses that weren't resolved with nMap and also aren't duplicated.

Step (4) Select the properly resolved outputs from the sorted output file:

I tried selecting just the duplicated rows of the sorted output file, but that discarded legitimately resolved hostnames
along with the inadvertent bystanders, so the sorted hostname_list file has to be compared to the sorted output file in
order to retain all the properly resolved input hostnames. This can be done with LibreOffice Calc, but that would be
considered "off-script" in the present discussion.

Hostnames that haven't been resolved with the main script have to be included and resolved by using other search methods,
such as Google and Hurricane Electric.

Steps (1-3) can be concatenated:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -| grep "Nmap scan report for " | cut -c22- | sort - > Two-Col-Output-Sorted.txt

Remove the duplicates from Two-Col-Output-Sorted.txt (a shell sketch follows below) ... and compare to CPV-GB-OneCol0-6192019.txt ...
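The deduplication needn't leave the shell: 'sort' itself can drop whole-line duplicates. A sketch, replacing the plain 'sort -' at the end of the pipeline above (output file name illustrative):

... | cut -c22- | sort -u > Two-Col-Output-Sorted-NoDupes.txt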

Here's the file as processed with LibreOffice Calc: Two-Col-Output-Sorted-Calc-NoDupes.txt. It's now in solid two-column
format with tabs as separators. There are no B-column cells for the bare IPv4 addresses, so those IPv4 addresses can be
shifted into the second column easily, leaving blanks in the hostname column. All that remains is to find and delete the
non-matching hostnames from the output file.

A couple of dozen hostnames whose IPv4 addresses are discernible by inspection aren't in the output list from our scripts;
it turns out that none of them were "up" at the time of the nMap scan. But the resolution of the deduced IPv4 addresses can
be confirmed with nslookup:

time for i in `cat CPV-Supplemental_list-Sorted.txt`; do nslookup $i; done | grep -A1 "Name:" > CPV-NSL-Supplemental-Output-Sorted.txt
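A hedged variant of that loop, safer if a name ever contains shell metacharacters (same file names):

# read one hostname per line instead of word-splitting the whole file
time while read -r i; do nslookup "$i"; done < CPV-Supplemental_list-Sorted.txt | grep -A1 "Name:" > CPV-NSL-Supplemental-Output-Sorted.txt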

Lots of post-processing to be done.

Attachments:
Two-Col-Output-Sorted-Calc-NoDupes.txt (2.1 KB)
CPV-Supplemental_list-Sorted.txt (854 bytes)
CPV-NSL-Supplemental-Output-Sorted.txt (1.43 KB)
amenex

After composing yet another chapter in this treatise and then losing it after an errant pinkie brushed across the
freeze-the-system key, I'm starting over to find a way of finishing the hostname-resolution process:

There are three methods of resolving hostnames:

1. Magic Banana's technique of extracting the four octets of a candidate IPv4 address and then testing three different
permutations of those octets by performing nMap scans, with some post-processing added in line:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-GB-OneCol0-6192019.txt - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -| grep "Nmap scan report for " | cut -c22- > Two-Col-Output.txt

2. My expedient nslookup script:

time for i in `cat CPV-Supplemental_list-Sorted.txt`; do nslookup $i; done | grep -A1 "Name:" > CPV-NSL-Supplemental-Output-Sorted.txt

3. Then there's dig -x to perform reverse hostname lookups:

time tr -sc 0-9\\n ' ' < CPV-GB-OneCol0-6192019.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print "-x "$i "." $++i "." $++i "." $++i "\t" "-x "$i "." $--i "." $--i "." $--i "\t" "-x "$++i "." $++i "." $++i "." $(i - 3) } else print "\t" }' | tee twixt - | dig -f twixt | grep -A1 "ANSWER SECTION:" &> CPV-MB-answers-noNS-Rev-Supplemental-Output-Sorted.txt

Takes fifty-four seconds ... all the answers, as well as the interlopers, but the IPv4's have to be extracted from the ARPA
data. Note that dig -f will not return the PTR records of the candidate IPv4 addresses unless there's an "-x " in front of
each of the three permutations of the extracted four octets in Magic Banana's portion of this script that are then sent to dig -f.

Process the ARPA list:

time tr -sc 0-9\\n ' ' < CPV-ARPA-list.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i = 4; print $i "." $--i "." $--i "." $--i }}' &> De-ARPAd-IPv4List.txt

About half of these IPv4 addresses match the IPv4 addresses (obtained from CPV-Supplemental_list-Sorted.txt) found with
nslookup in my most recent posting. None of the other half are in the original hostname_list (CPV-GB-OneCol0-6192019.txt).

To sum everything up, I've annotated the three main outputs, concatenated them, added some more annotations, and made three
different sorts of the results:

Two-Col-Output-Sorted-Calc-NoDupes-ann.ods plus CPV-MB-NSL-TwoColumns-06222019-ann.ods plus CPV-Dig-De-ARPAd-IPv4List-ann.ods:

A. CPV-ThreeMethods-06222019-ann.ods ==> Sorted on the candidate hostname list and then compared manually to the original list
of hostnames gathered from a single morning's Recent Visitors.

B. CPV-ThreeMethods-06222019-sort.ods ==> Sorted on the search method, with dig first, then nMap and finally NSL.

C. CPV-ThreeMethods-06222019-Yes.ods ==> Sorted on the outcomes of the searches, with successful (Y) first. "O" indicates names
that are outside of the scope of the present scripts, but which can be found easily on the Internet.

And here's the scorecard: CPV-GB-OneCol0-6192019-Resolved-Sort.ods ==> Three hostnames that should have been found with Magic
Banana's script were somehow missed ("None"?). All the rest of the hostnames were resolved by one or more of the three methods.
Of course, the bare IPv4 addresses at the bottom of this last table are easily fleshed out with whois, then their CIDR blocks,
and lastly the list of CIDR blocks within their Autonomous System, which is what I plan to do with the resolved IPv4 addresses
in order to find the multiply duplicated PTR records which otherwise hide those IPv4 addresses from scrutiny.

Attachments:
CPV-GB-OneCol0-6192019.txt (2.93 KB)
CPV-Supplemental-Output-Sorted.txt (60 bytes)
CPV-ARPA-list.txt (1018 bytes)
Two-Col-Output-Sorted-Calc-NoDupes-ann.ods (30.52 KB)
CPV-MB-NSL-TwoColumns-06222019-ann.ods (24.98 KB)
CPV-Dig-De-ARPAd-IPv4List-ann.ods (24.17 KB)
CPV-ThreeMethods-06222019-ann.ods (24.25 KB)
CPV-ThreeMethods-06222019-sort.ods (25.56 KB)
CPV-ThreeMethods-06222019-Yes.ods (24.48 KB)
CPV-GB-OneCol0-6192019-Resolved-Sort.ods (24.88 KB)
Magic Banana


> And here's the scorecard: CPV-GB-OneCol0-6192019-Resolved-Sort.ods ==> Three hostnames that should have been found with Magic
> Banana's script were somehow missed ("None"?).

It works here (hostname_list contains the first column of your spreadsheet):
$ ./find-IPv4-addr.awk hostname_list
178-137-92-231.broadband.kyivstar.net 178.137.92.231 231.92.137.178 137.92.231.178
static.83.94.46.78.clients.your-server.de 83.94.46.78 78.46.94.83 94.46.78.83
bb121-7-170-10.singnet.com.sg 121.7.170.10 10.170.7.121 7.170.10.121
fulltextrobot-77-75-76-161.seznam.cz 77.75.76.161 161.76.75.77 75.76.161.77
google-proxy-64-233-173-150.google.com 64.233.173.150 150.173.233.64 233.173.150.64
17-58-96-188.applebot.apple.com 17.58.96.188 188.96.58.17 58.96.188.17
baiduspider-180-76-15-137.crawl.baidu.com 180.76.15.137 137.15.76.180 76.15.137.180
crawl-54-236-1-12.pinterest.com 54.236.1.12 12.1.236.54 236.1.12.54
host-185-251-38-166.hosted-by-vdsina.ru 185.251.38.166 166.38.251.185 251.38.166.185
ip199.213-181-133.pegonet.sk 199.213.181.133 133.181.213.199 213.181.133.199
msnbot-157-55-39-153.search.msn.com 157.55.39.153 153.39.55.157 55.39.153.157
msnbot-157-55-39-222.search.msn.com 157.55.39.222 222.39.55.157 55.39.222.157
msnbot-157-55-39-88.search.msn.com 157.55.39.88 88.39.55.157 55.39.88.157
msnbot-207-46-13-125.search.msn.com 207.46.13.125 125.13.46.207 46.13.125.207
static.88-198-36-62.clients.your-server.de 88.198.36.62 62.36.198.88 198.36.62.88
140.82.10.185.vultr.com 140.82.10.185 185.10.82.140 82.10.185.140
143.237.193.82.ediscom.de 143.237.193.82 82.193.237.143 237.193.82.143
219.63.227.35.bc.googleusercontent.com 219.63.227.35 35.227.63.219 63.227.35.219
116-48-158-174.static.netvigator.com 116.48.158.174 174.158.48.116 48.158.174.116
24-181-158-230.static.dlth.mn.charter.com 24.181.158.230 230.158.181.24 181.158.230.24
5-255-250-121.spider.yandex.com 5.255.250.121 121.250.255.5 255.250.121.5
62.62.198.146.dyn.plus.net 62.62.198.146 146.198.62.62 62.198.146.62
77-88-47-28.spider.yandex.com 77.88.47.28 28.47.88.77 88.47.28.77
bot-103-131-71-48.coccoc.com 103.131.71.48 48.71.131.103 131.71.48.103
c-71-236-162-44.hsd1.or.comcast.net 71.236.162.44 44.162.236.71 236.162.44.71
crawl-66-249-79-111.googlebot.com 66.249.79.111 111.79.249.66 249.79.111.66
crawl-66-249-79-139.googlebot.com 66.249.79.139 139.79.249.66 249.79.139.66
fulltextrobot-77-75-78-162.seznam.cz 77.75.78.162 162.78.75.77 75.78.162.77
fulltextrobot-77-75-79-95.seznam.cz 77.75.79.95 95.79.75.77 75.79.95.77
google-proxy-64-233-173-149.google.com 64.233.173.149 149.173.233.64 233.173.149.64
google-proxy-64-233-173-151.google.com 64.233.173.151 151.173.233.64 233.173.151.64
google-proxy-66-102-6-215.google.com 66.102.6.215 215.6.102.66 102.6.215.66
google-proxy-66-102-6-217.google.com 66.102.6.217 217.6.102.66 102.6.217.66
host81-142-252-114.in-addr.btopenworld.com 81.142.252.114 114.252.142.81 142.252.114.81
ip174-67-55-131.ok.ok.cox.net 174.67.55.131 131.55.67.174 67.55.131.174
va-71-51-6-42.dhcp.embarqhsd.net 71.51.6.42 42.6.51.71 51.6.42.71
abts-tn-dynamic-70.160.49.171.airtelbroadband.in 70.160.49.171 171.49.160.70 160.49.171.70
customer-static-201-216-208.145.iplannetworks.net 201.216.208.145 145.208.216.201 216.208.145.201
static-mum-182.59.70.225.mtnl.net.in 182.59.70.225 225.70.59.182 59.70.225.182

amenex
Offline
Joined: 01/03/2015

Magic Banana checked my homework:

> It works here (hostname_list contains the first column of your spreadsheet):

... [selecting the last three rows]

> abts-tn-dynamic-70.160.49.171.airtelbroadband.in 70.160.49.171 171.49.160.70 160.49.171.70

> customer-static-201-216-208.145.iplannetworks.net 201.216.208.145 145.208.216.201 216.208.145.201

> static-mum-182.59.70.225.mtnl.net.in 182.59.70.225 225.70.59.182 59.70.225.182

Task: resolve each of the three hostnames above, trying each of the three methods:

Truncating each of the three search methods to avoid mis-aimed greps:

(1) time tr -sc 0-9\\n ' ' < CPV-ThreeNone.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print $i "." $++i "." $++i "." $++i "\t" $i "." $--i "." $--i "." $--i "\t" $++i "." $++i "." $++i "." $(i - 3) } else print "" }' | paste CPV-ThreeNone.txt - | grep '[0-9]$' | sudo nmap -sS -p3389 -iL -

Only one of the three hosts was found "up."

(2) time tr -sc 0-9\\n ' ' < CPV-ThreeNone.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print "-x "$i "." $++i "." $++i "." $++i "\t" "-x "$i "." $--i "." $--i "." $--i "\t" "-x "$++i "." $++i "." $++i "." $(i - 3) } else print "\t" }' | tee twixt - | dig -f twixt

Finds all three and returns each one's arpa address, from which the IPv4 address can be obtained by reversing the octet
order, but two innocent bystanders are included ...

(3) time for i in `cat CPV-ThreeNone.txt`; do nslookup $i; done

All three were "not found" by nslookup ... but remember that those are hostnames; nslookup works fine on IPv4 addresses.

Results [stripped of innocent bystanders by visual inspection]:

abts-tn-dynamic-70.160.49.171.airtelbroadband.in ..... 171.49.160.70 ..... dig
customer-static-201-216-208.145.iplannetworks.net .... 201.216.208.145 ... nMap, dig
static-mum-182.59.70.225.mtnl.net.in ................. 182.59.70.225 ..... dig

... iplannetworks.net might also have been "down" when last I checked ...

My ISP's Apache server did hostname lookup on all three when I collected the CPV data,
apparently not with nslookup !

NSLookup failed, but dig did not ... maybe I missed something in cutting & pasting. Next time: sort !:

> cat CPV-ThreeNone-Output.txt | more [and remove the intervening rows & trailing dots with LibreOffice Calc]:

> 171.49.160.70.in-addr.arpa ip70-160-49-171.hr.hr.cox.net
> 70.160.49.171.in-addr.arpa abts-tn-dynamic-70.160.49.171.airtelbroadband.in
> 145.208.216.201.in-addr.arpa customer-static-201-216-208.145.iplannetworks.net
> 201.216.208.145.in-addr.arpa 145.208.early-registration.of.surfnet.invalid
> 225.70.59.182.in-addr.arpa static-mum-182.59.70.225.mtnl.net.in

When I tried the short searches, I discovered that my post-processing of the ARPA names
scrambled the order of the IPv4 addresses... which is corrected below:

> cat CPV-ThreeNone-Final.txt | more [presented in the same order as for the preceding list]:

> 70.160.49.171
> 171.49.160.70
> 201.216.208.145
> 145.208.216.201
> 182.59.70.225

The combined table is cat CPV-Dig-Output-Sort-RemoveDots-NoDupes.txt:

145.208.216.201 145.208.early-registration.of.surfnet.invalid
171.49.160.70 abts-tn-dynamic-70.160.49.171.airtelbroadband.in
201.216.208.145 customer-static-201-216-208.145.iplannetworks.net
70.160.49.171 ip70-160-49-171.hr.hr.cox.net
182.59.70.225 static-mum-182.59.70.225.mtnl.net.in

The suitably sorted input table is cat CPV-Dig-Input-OneCol-Sorted.txt:

abts-tn-dynamic-70.160.49.171.airtelbroadband.in
customer-static-201-216-208.145.iplannetworks.net
static-mum-182.59.70.225.mtnl.net.in

Two of the rows in the longer list above are _not_ in the lower, short list.
They're the ones for which post-processing is needed for removal.

That's the bailiwick of the join command:

join -1 1 -2 2 CPV-Dig-Input-OneCol-Sorted.txt CPV-Dig-Output-Sort-RemoveDots-NoDupes.txt > CPV-joined-TwoCols.txt

cat CPV-joined-TwoCols.txt [the two fields are separated by spaces; tabs would be better]:

abts-tn-dynamic-70.160.49.171.airtelbroadband.in 171.49.160.70
customer-static-201-216-208.145.iplannetworks.net 201.216.208.145
static-mum-182.59.70.225.mtnl.net.in 182.59.70.225

This only works if the sorting is done on the _second_ column of the two-column list, which is the one on which the
target, two-column file is being joined. Opening CPV-joined-TwoCols.txt with LibreOffice Calc replaces those spaces
with tabs; the file can then be copied & pasted into Leafpad for subsequent analysis.
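
Both manual steps can also be scripted. A sketch using the same file names ('sort -k2,2' sorts on the second, hostname
column, which is what 'join -2 2' expects; "sorted-on-hostname.txt" is a made-up intermediate name):

$ sort -k2,2 CPV-Dig-Output-Sort-RemoveDots-NoDupes.txt > sorted-on-hostname.txt
$ join -1 1 -2 2 CPV-Dig-Input-OneCol-Sorted.txt sorted-on-hostname.txt | tr ' ' '\t' > CPV-joined-TwoCols.txt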

I'd take a victory lap now if I had somehow kept dig from putting periods at the end of the ARPA and hostname fields;
they're difficult to remove, and while they remain, removing duplicate lines and joining won't work.

Will someone tell me how the "join" output has its columns reversed respective to the target table ?

The "man join" page tells me that "The default join field is the first, delimited by blanks" ... that puts the
hostname field first in the output, followed by the untested field of the target table. It would be silly to print
the second field of the matching row of the target file, as it is identical to the join field. That's what happened
here, it seems.

AttachmentSize
CPV-ThreeNone.txt 136 bytes
CPV-Dig-Input-OneCol-Sorted.txt 136 bytes
CPV-Dig-Output-Sort-RemoveDots-NoDupes.txt 286 bytes
Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

> cat CPV-ThreeNone-Output.txt | more [and remove the intervening rows & trailing dots with LibreOffice Calc]

I do not think you realize how much time of your life you could save by seriously learning (say for ~10 hours) GNU's text-processing commands. I understand you want to remove a dot per line if it is its last character:
$ sed 's/\.$//' CPV-ThreeNone-Output.txt

I do not know what you call an "intervening row".

cat CPV-joined-TwoCols.txt [the two fields are separated by spaces; tabs would be better]

Same remark as above:
$ tr -s ' ' '\t' < CPV-joined-TwoCols.txt
(If the separator is always one single space, the option -s is useless.)

Will someone tell me how the "join" output has its columns reversed respective to the target table ?

There is no "target table" in a join.

The "man join" page tells me that...

The man pages of the GNU commands are incomplete specifications. As written at the end of 'man join', you want to read 'info join'. In this case, you want to read about join's -o option, which allows you to specify its output format. The 'info' pages are not only more complete, they are also more user-friendly (more explanations, examples, etc.).
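
For instance (a sketch against your two files; -o takes a comma-separated list of FILENUM.FIELD specifiers, with 0 standing for the join field itself):

$ join -1 1 -2 2 -o 2.1,0 CPV-Dig-Input-OneCol-Sorted.txt CPV-Dig-Output-Sort-RemoveDots-NoDupes.txt

prints the IPv4 address first and the joined hostname second, i.e., the same column order as in CPV-Dig-Output-Sort-RemoveDots-NoDupes.txt.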

amenex
Offline
Joined: 01/03/2015

Magic Banana, teaching, says:

> > cat CPV-ThreeNone-Output.txt | more [and remove the intervening rows & trailing dots with LibreOffice Calc]

> I do not think you realize how much time of your life you could save by seriously learning (say for ~10 hours) GNU's text-processing commands.

Magic Banana, continuing:

> I do not know what you call an "intervening row".

Read on for a hands-on demonstration.

> > cat CPV-joined-TwoCols.txt [the two fields are separated by spaces; tabs would be better]

> Same remark as above:
>$ tr -s ' ' '\t' < CPV-joined-TwoCols.txt
> (If the separator is always one single space, the option -s is useless.)

Leafpad dodges the scripting language hurdles with search-and-replace-all; see below.

amenex responds with more homework:

time tr -sc 0-9\\n ' ' < CPV-ThreeNone.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print "-x "$i "." $++i "." $++i "." $++i "\t" "-x "$i "." $--i "." $--i "." $--i "\t" "-x "$++i "." $++i "." $++i "." $(i - 3) } else print "\t" }' | tee twixt - | dig -f twixt | grep -A1 "ANSWER SECTION:" - [copied & pasted from console into CPV-MB-Output02.txt, which follows below]

;; ANSWER SECTION:
171.49.160.70.in-addr.arpa. 86400 IN PTR ip70-160-49-171.hr.hr.cox.net.
--
;; ANSWER SECTION:
70.160.49.171.in-addr.arpa. 28800 IN PTR abts-tn-dynamic-70.160.49.171.airtelbroadband.in.
--
;; ANSWER SECTION:
145.208.216.201.in-addr.arpa. 86400 IN PTR customer-static-201-216-208.145.iplannetworks.net.
--
;; ANSWER SECTION:
201.216.208.145.in-addr.arpa. 604800 IN PTR 145.208.EARLY-REGISTRATION.of.SURFnet.invalid.
--
;; ANSWER SECTION:
225.70.59.182.in-addr.arpa. 86400 IN PTR static-mum-182.59.70.225.mtnl.net.in.

real 0m2.595s
user 0m0.020s
sys 0m0.012s

The fields, ";; ANSWER SECTION:" and "--" can be removed in Leafpad with the search-and-replace-all function:
See CPV-MB-Output02.txt & CPV-MB-Output03.txt

The varying-content fields between the two address fields can be deleted and replaced by tabs in LibreOffice Calc:
See CPV-MB-Output04.txt

MB's sed script "sed 's/\.$//' CPV-MB-Output04.txt" removes the trailing period (.) from the second column (only!):
See CPV-MB-Output05.txt
There's an unscripted way of doing this, too. Highlight the trailing dot and the following blank line in CPV-MB-Output03.txt,
followed by pasting that combination of dot and two [end-of-line?] characters into Leafpad's search-and-replace-all function.
All the blank lines and that pesky trailing dot will disappear ... except the final dot. That final dot can also be handled
by adding [two carriage returns ?] (i.e., pressing "Enter" twice) at the end of the last row in the Leafpad file beforehand:
See CPV-MB-Output03-Tr.txt

Going back to Leafpad ... the "dot[tab]" fields can be removed by highlighting one of the ".\t" and then replacing all with "\t":
See CPV-MB-Output06.txt and this reference: https://askubuntu.com/questions/525358/how-to-replace-tabs-for-spaces-in-gedit
[Aside: this forum software doesn't like tabs as much as Leafpad does; I used \t to represent those invisible tabs.]

These are essential steps to be completed before concatenating the outputs, sorting, and removing duplicates, followed
by application of the "join" command. Leafpad handles these steps quickly, if not in the blink of an eye.

Magic Banana continues:

> > Will someone tell me how the "join" output has its columns reversed respective to the target table ?

> There is no "target table" in a join.

> > The "man join" page tells me that...

> The man pages of the GNU commands are incomplete specifications. As written at the end of 'man join',
> you want to read 'info join'. In this case, you want to read about join's -o option, which allows you
> to specify its output format. The 'info' pages are not only more complete, they are also more user-friendly
> (more explanations, examples, etc.).

This is where I found out the "join" syntax used in my previous post: https://www.howtoforge.com/tutorial/linux-join-command/
where it's said:

>> Now, if you want the second field of each line to be the common field for join, you can tell this to the tool by
>> using the -1 and -2 command line options. While the former represents the first file, the latter refers to the
>> second file. These options require a numeric argument that refers to the joining field for the corresponding file.
>
>> For example, in our case, the command will be:
>
>> join -1 2 -2 2 file1 file2
>
>> And here's the output of this command: https://www.howtoforge.com/images/linux_join_command/join-custom-fields.png
>
>> Note that in case the position of common field is same in both files ...

I had been thinking of consecutive actions, whereas join does them in parallel.

AttachmentSize
CPV-MB-Output02.txt 443 bytes
CPV-MB-Output03.txt 435 bytes
CPV-MB-Output04.txt 361 bytes
CPV-MB-Output05.txt 356 bytes
CPV-MB-Output06.txt 351 bytes
CPV-MB-Output01.txt 533 bytes
CPV-MB-Output03-Tr.txt 422 bytes
Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

tee twixt - | dig -f twixt

That should be 'tee twixt | dig -f -' (or only 'dig -f -' if you do not want to save "twixt"), for the same reason I have explained to you four times. If 'dig' does not understand "-" as "standard input", write 'tee twixt | dig -f /dev/stdin'.

... | grep -A1 "ANSWER SECTION:" - [copied & pasted from console into CPV-MB-Output02.txt, which follows below]
(...)
The fields, ";; ANSWER SECTION:" and "--" can be removed in Leafpad with the search-and-replace-all function
(...)
The varying-content fields between the two address fields can be deleted and replaced by tabs in LibreOffice Calc
(...)
Highlight the trailing dot and the following blank line in CPV-MB-Output03.txt, followed by pasting that combination of dot and two [end-of-line?] characters into Leafpad's search-and-replace-all function.
All the blank lines and that pesky trailing dot will disappear ... except the final dot. That final dot can also be handled by adding [two carriage returns ?] (i.e., pressing "Enter" twice) at the end of the last row in the Leafpad file beforehand
(...)
Going back to Leafpad ... the "dot[tab]" fields can be removed by highlighting one of the ".\t" and then replacing all with "\t"

... or you write two lines of AWK that do all that and that you can re-execute whenever you want, to post-process dig's output:
#!/usr/bin/awk -f
# Each ";; ANSWER SECTION:" marker starts a new record; on the line that
# follows it, $1 is the in-addr.arpa name and $5 the PTR target.
BEGIN { RS = ";; ANSWER SECTION:" }
# Print both names, tab-separated, stripping the trailing dot dig appends.
{ print gensub(/\.$/, "", 1, $1) "\t" gensub(/\.$/, "", 1, $5) }
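
Assuming the script is saved as, say, grep-URLs-after-dig.awk and turned executable, dig's output can be piped straight into it, with no text editor involved:

$ dig -f twixt | ./grep-URLs-after-dig.awk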

Leafpad handles these steps quickly, if not in the blink of an eye.

Copy-pasting things between the terminal, a text editor and LibreOffice, highlighting things, typing, ... and doing all that all over again whenever you get new data to process. That is not quick. That is a waste of time.

It takes some time (maybe ~10h) to learn the basics of the text-processing commands (starting with the simplest use case; not with a real-world problem you face), but it is worth it. Thanks to 'info' (where you can, for instance, discover that 'grep' has an option --no-group-separator), there is no need to know by heart every option. Only what feature you can expect from a given command, the regular expressions (for 'grep', 'sed' and 'awk'), etc.
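
For instance, that --no-group-separator option would have deleted the "--" lines that were removed by hand in Leafpad above:

$ dig -f twixt | grep -A1 --no-group-separator "ANSWER SECTION:"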

This is where I found out the "join" syntax used in my previous post: https://www.howtoforge.com/tutorial/linux-join-command/

That tutorial does not exemplify the use of the -o option, to format join's output.

amenex
Offline
Joined: 01/03/2015

Magic Banana wrote:

>> tee twixt - | dig -f twixt

> That should be 'tee twixt | dig -f -'

In the meantime I had been forced to change 'tee twixt -' to 'tee twixt' in the process of troubleshooting ...

> ... (or only 'dig -f -' if you do not want to save "twixt")...

My dig scripts had been curtailed by a 'dig -' error: the twixt file was getting chopped off mid-output, ending not in
lines containing '-x [ipv4]' three times, but in an entry cut short before one of the -x fields finished (at 65.5 kB).

I divided my ~30,000 row input file into ten ~3,000 row files; the first three of those completed with no errors, but
covered only about half of each subdivided input file, always with twixt = 65.5 kB. The last of the twixt files ends
thusly:

>>> -x 113.203.238.179 -x 179.238.203.113 -x 203.238.179.113
>>> -x 46.143.247.52 -x 52

It appears to me that the awk script runs first, and in a very short time, followed one-by-one by the dig and grep scripts.
'tee twixt' apparently fills up the buffer, leading me to skip that step and go directly to your 'dig -f -' suggestion.

> for the same reason I have explained to you four times. If 'dig' does not understand "-" as "standard input", write:
> 'tee twixt | dig -f /dev/stdin'

Right now, the following syntax looks better and processes the entirety of the subdivided input file:

time tr -sc 0-9\\n ' ' < Input-shuf-01.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print "-x "$i "." $++i "." $++i "." $++i "\t" "-x "$i "." $--i "." $--i "." $--i "\t" "-x "$++i "." $++i "." $++i "." $(i - 3) } else print "\t" }' | dig -f - | grep -A1 "ANSWER SECTION:" &> Output-shuf-01.txt

By the way, worried that my 'puter was getting hacked, I shuffled the input file before subdividing it. I think the "hacker"
was the truncated twixt file(s).

Aside: While scrolling through and updating the output file, I saw two output hostnames that are simply "." but it's worse
than that: they are actually blank, as that dot is the trailing period, later to be removed in post-processing. I've seen
the ".", sometimes offset by spaces which do not reproduce well here, at times in my domains' Current Visitor outputs. Try
these two IPv4 addresses: 117.185.15.103 and 211.136.127.125

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

My dig scripts had been curtailed by a 'dig -' error: the twixt file was getting chopped off mid-output, ending

I have been explaining the problem (and its solution) to you many times now: the commands on both sides of a pipe, |, run in parallel: you cannot have command1 write a regular file and reliably have, somewhere else in the command line, command2 read it. The file may not even exist yet when command2 tries to read it. Use pipes, not regular files.
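
Schematically (the same command shapes as above, with comments):

# broken: dig opens the regular file "twixt", possibly before tee has finished writing it
... | tee twixt | dig -f twixt
# correct: dig reads the very same requests from the pipe, as they arrive
... | tee twixt | dig -f -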

I divided my ~30,000 row input file into ten ~3,000 row files; the first three of those completed with no errors, but covered only about half of each subdivided input file

That is no reliable solution. Obviously. Trying to fix things without understanding the problem does not lead anywhere. And to understand the problem, you need to understand how pipes work.

It appears to me that the awk script runs first, and in a very short time, followed one-by-one by the dig and grep scripts.

For the n-th time: they run in parallel. Not in a sequence.

Right now, the following syntax looks better and processes the entirety of the subdivided input file

Again: do not subdivide the input file.

Also, I do not understand how you can deem the syntax of the referred command line "better" than giving a name (e.g., "find-IPv4-addr.awk") to the AWK script I gave you in https://trisquel.info/forum/how-can-one-capture-failed-resolve-output-console-during-nmap-scan#comment-141556 , giving another name (e.g., "grep-URLs-after-dig.awk") for the two-line AWK script in https://trisquel.info/forum/how-can-one-capture-failed-resolve-output-console-during-nmap-scan#comment-141639 and writing (after turning both scripts executable):
$ ./find-IPv4-addr.awk hostname_list | dig -f - | ./grep-URLs-after-dig.awk
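
Turning both scripts executable is a one-time step:

$ chmod +x find-IPv4-addr.awk grep-URLs-after-dig.awk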

amenex
Offline
Joined: 01/03/2015

Oops ... still not quite time for a victory lap ...

Some time ago, we worked out that I could run several awk commands like this concurrently in separate terminal windows:
> time sudo nmap -sS -p3389 -T4 --host-timeout 300 --min-hostgroup 25 -iL Input-IPv4s.txt > nMap-IPv4s-to-HNs.txt
and even:
> time sudo nmap -sS -p3389 -T4 --host-timeout 300 --min-hostgroup 25 -iL Input-CIDRs.txt > nMap-IPv4s-to-HNs.txt

Those never collided with each other ...

Now I was planning to run concurrently several commands of the form:

time tr -sc 0-9\\n ' ' < Input01.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print "-x "$i "." $++i "." $++i "." $++i "\t" "-x "$i "." $--i "." $--i "." $--i "\t" "-x "$++i "." $++i "." $++i "." $(i - 3) } else print "\t" }' | dig -f - | grep -A1 "ANSWER SECTION:" &> Output01-Dig.txt

time tr -sc 0-9\\n ' ' < Input02.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print "-x "$i "." $++i "." $++i "." $++i "\t" "-x "$i "." $--i "." $--i "." $--i "\t" "-x "$++i "." $++i "." $++i "." $(i - 3) } else print "\t" }' | dig -f - | grep -A1 "ANSWER SECTION:" &> Output02-Dig.txt
... up to
time tr -sc 0-9\\n ' ' < Input10.txt | awk '{ k = 0; for (i = 0; k < 4 && ++i <= NF; ) { if ($i < 256) ++k; else k = 0 } if (k == 4) { i -= 3; print "-x "$i "." $++i "." $++i "." $++i "\t" "-x "$i "." $--i "." $--i "." $--i "\t" "-x "$++i "." $++i "." $++i "." $(i - 3) } else print "\t" }' | dig -f - | grep -A1 "ANSWER SECTION:" &> Output10-Dig.txt

But they'd all be using the same standard output ... though, so long as that isn't tied up with the same filename like twixt ...
I've now gotten seven scripts running unimpeded ... and three more after those ... totalling about ninety minutes. Nice memory
control.
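
For the record, the ten runs need not occupy ten terminal windows; one loop can background them all (a sketch:
"make-dig-requests.awk" is a hypothetical name for the long awk one-liner above, saved as an executable script):

for n in 01 02 03 04 05 06 07 08 09 10; do
  tr -sc 0-9\\n ' ' < Input$n.txt | ./make-dig-requests.awk | dig -f - | grep -A1 "ANSWER SECTION:" &> Output$n-Dig.txt &
done
wait   # returns once all ten background pipelines have finished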

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

Some time ago, we worked out that I could run several awk commands like this concurrently in separate terminal windows

Those command lines have nothing to do with awk.

Now I was planning to run concurrently several commands of the form

Again: I see no reason to divide the input. GNU's text-processing commands can process, line by line, arbitrarily large inputs. They are not spreadsheet programs!
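
A toy demonstration that line-by-line processing needs no subdividing ('seq' merely stands in for a huge input; memory use stays flat because awk holds one line at a time):

$ seq 100000000 | awk '{ n++ } END { print n }'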