Find the instances of each of a list of strings and print each set in a separate file
Working towards a list of multi-address hostnames (PTR's) in a long file of PTR records (column $1)
and their IPv4 addresses (column $2), I'll soon need to process the combined list of all those
outputs so as to produce a relatively large number of separate files listing all the IPv4 addresses
found for each PTR.
Let's say that the list of multi-address hostnames is PTRList.txt (attached).
The following join command provides a sorted list of all those hostnames and their IPv4 addresses:
join -a 1 -1 1 -2 1 <(sort PTRList.txt) <(sort -u IPv4.May2020.37.nMapoG.txt) | sort -nrk 2 | awk '{print $1"\t"$3"\t"$2}' '-' > Temp.0716.01.txt
The severely truncated file IPv4.May2020.37.nMapoG.txt is also attached. The third column of the
output file is the number of instances of each multi-address PTR.
The join script outputs each PTR and its several IPv4 addresses as a group, but each group needs to be
diverted to a separate filename with ".txt" appended to the unique PTR.
George Langford
Attachment | Size |
---|---|
PTRList.txt | 1.34 KB |
IPv4.May2020.37.nMapoG.txt | 60.34 KB |
Giving -1 1 and -2 1 as options to join is useless: 1 is their default argument. I also doubt you want -a 1, which makes no difference on the given input. You want to give -k 2,2 to sort, not -k 2 (if not all addresses started with two digits, the output would be wrong). '-' as an argument of awk is useless. I am getting tired of always telling you the exact same things...
As far as I understand what you are trying to achieve, any output file would always have the same PTR repeated in the first column and the same "number of instances" (whatever that is; it is not the number of addresses) in the last column. That is a waste of disk space. Also the last sort in your command line becomes useless (the "number of instances" being always the same in a file). The command line below only writes addresses in files whose names are the "number of instances", followed by a comma, followed by the PTR:
$ mkdir out; sort -u IPv4.May2020.37.nMapoG.txt | awk 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $2 >> "out/" a[$1] "," $1 }' PTRList.txt -
Because the "number of instances" starts the name of a file, 'ls -v out' sorts the file using that number (and so do most file managers).
All that said, using many small files is usually a terrible idea, performance-wise. You may want to use the output of your join (minus its useless options).
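To see what the command line above produces, here is a minimal sketch run on made-up data. The host names, addresses and scratch directory are hypothetical and not taken from the attached files. The parentheses around the output file name are an addition for portability, since some awk implementations parse an unparenthesized concatenation after >> differently.

```shell
# Hypothetical sample data in a scratch directory (not the attached files).
demo=$(mktemp -d)
printf 'a.example 2\nb.example 1\n' > "$demo/PTRList.txt"
printf 'a.example 10.0.0.1\nb.example 10.0.0.9\na.example 10.0.0.2\na.example 10.0.0.1\nc.example 10.0.0.7\n' > "$demo/addresses.txt"

(
  cd "$demo" || exit 1
  mkdir out
  # First file: remember the count given for each PTR.
  # Standard input (-): for each known PTR, append the address to out/<count>,<PTR>.
  sort -u addresses.txt |
    awk 'FILENAME == ARGV[1] { a[$1] = $2 }
         FILENAME == ARGV[2] && $1 in a { print $2 >> ("out/" a[$1] "," $1) }' PTRList.txt -
)

ls "$demo/out"
cat "$demo/out/2,a.example"
```

The duplicate address is removed by sort -u, and c.example is skipped because it is not in the list, so only two files are written.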
Were I to pick -a 2 in my join command, the output file would be twenty times as big as the
file output with -a 1, and I would have a great deal of editing to do to clean it up.
Yes, either -a 1 or -a 2 ultimately gives the same net output. I prefer the succinct one.
I suspect that
$ mkdir out; sort -u IPv4.May2020.37.nMapoG.txt | awk 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $2 >> "out/" a[$1] "," $1 }' PTRList.txt -
Is meant to produce something similar to my preamble script:
join -a 1 -1 1 -2 1 <(sort PTRList.txt) <(sort -u IPv4.May2020.37.nMapoG.txt) | sort -nrk 2 | awk '{print $1"\t"$3"\t"$2}' '-' > Temp.0716.01.txt
but I haven't managed to figure out how to state the two required arguments in Magic Banana's script.
The actual IPv4.May2020.37.nMapoG.txt will be a concatenation of 38 nmap output files,
each one 60MB to 80MB, totalling about 3.5GB, and containing about 1GB of No_DNS data
that would be better left out of the results.
George Langford
Yes, either -a 1 or -a 2 ultimately gives the same net output. I prefer the succinct one.
The real "succinct one", as you write, would be without option -a. Neither -a 1 nor -a 2. That is what I meant.
Is meant to produce something similar to my preamble script
As I wrote: it "writes addresses in files whose names are the "number of instances", followed by a comma, followed by the PTR".
I haven't managed to figure out how to state the two required arguments in Magic Banana's script.
The two files? They are in the command line I gave.
Magic Banana, on the subject of the -a argument of join:
The real "succinct one", as you write, would be without option -a. Neither -a 1 nor -a 2. That is what I meant.
Which is absolutely correct; subconsciously I was using -a as an either/or choice, but it's also useful
to make the choice in the present analysis because unmatched lines indicate errors.
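A small sketch of the difference, on hypothetical and already sorted sample files (the names ptr.txt and addr.txt are made up for illustration): without -a, join prints only the PTRs found in both files, while -a 1 also prints the lines of the first file that found no match.

```shell
demo=$(mktemp -d)
# Hypothetical, already sorted sample files.
printf 'a.example 2\nb.example 3\n' > "$demo/ptr.txt"
printf 'a.example 10.0.0.1\na.example 10.0.0.2\n' > "$demo/addr.txt"

# Plain join: only the PTRs present in both files.
plain=$(join "$demo/ptr.txt" "$demo/addr.txt")
# With -a 1: unpairable lines of the first file are printed as well.
with_a1=$(join -a 1 "$demo/ptr.txt" "$demo/addr.txt")

printf '%s\n---\n%s\n' "$plain" "$with_a1"
```

Here the only extra line with -a 1 is "b.example 3", the PTR that has no address in the second file.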
Magic Banana wondered about my inability to recognize those two arguments:
The two files? They are in the command line I gave.
The two files in the script that I immediately recognize as my own are PTRList.txt and IPv4.May2020.37.nMapoG.txt,
I now appreciate that the first argument is the (presumably first) PTR in PTRList.txt, but that second one still
goes over my head. Let's look at Magic Banana's awk command:
awk 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $2 >> "out/" a[$1] "," $1 }' PTRList.txt
Column $1 of PTRList.txt holds the multi-address PTR's; Column $2 holds the corresponding number of instances,
so ARGV[1] has to be www.newsgeni.us and ARGV[2] ought to be 10.
That makes the first trial command:
./MB.suggestion.bin www.newsgeni.us 10
Which elicits the following responses:
./MB.suggestion.bin: line 1: $: command not found
awk: cmd. line:1: (FILENAME=- FNR=2) fatal: can't redirect to `out/2,lo0-100.NYCMNY-VFTTP-421.verizon-gni.net' (No such file or directory)
I tried giving the two files as arguments instead, but I got the exact same responses as with the first two arguments.
Here's the text of MB.suggestion.txt (from which MB.suggestion.bin was made):
$ mkdir out; sort -u IPv4.May2020.37.nMapoG.txt | awk 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $2 >> "out/" a[$1] "," $1 }' PTRList.txt -
Just in case, I'll sort PTRList.txt before running ./MB.suggestion.bin in the second trial command:
sort PTRList.txt > PTRListSort.txt ; ./MB.suggestion.sort.bin lo0-100.NYCMNY-VFTTP-421.verizon-gni.net 2
PTRList.txt was changed to PTRListSort.txt in the script, and MB.suggestion.sort.bin was made executable beforehand.
Terminal responses:
./MB.suggestion.sort.bin: line 1: $: command not found
awk: cmd. line:1: (FILENAME=- FNR=2) fatal: can't redirect to `out/2,lo0-100.NYCMNY-VFTTP-421.verizon-gni.net' (No such file or directory)
Now lo0-100.NYCMNY-VFTTP-421.verizon-gni.net in the error response is the same as the first argument of
MB.suggestion.sort.bin. That's progress.
George Langford
Let's look at Magic Banana's awk command
You did not copy the second argument, -, which is essential: it is the standard input, as always with GNU commands. Here: the output of sort -u IPv4.May2020.37.nMapoG.txt.
so ARGV[1] has to be www.newsgeni.us and ARGV[2] ought to be 10.
No. ARGV contains the arguments given to AWK. Here: ARGV[1] is PTRList.txt and ARGV[2] is -, the standard input. Excerpt from 'man awk':
ARGV Array of command line arguments. The array is indexed from 0 to ARGC - 1.
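A minimal sketch that makes this visible, assuming a one-line hypothetical PTRList.txt in a scratch directory: awk prints the arguments it received and then, for every input line, the value of FILENAME.

```shell
demo=$(mktemp -d)
printf 'a.example 2\n' > "$demo/PTRList.txt"   # hypothetical one-line list

# Print the arguments awk received, then the source of every input line.
result=$(
  cd "$demo" || exit 1
  printf 'a.example 10.0.0.1\n' |
    awk 'BEGIN { for (i = 1; i < ARGC; i++) print "ARGV[" i "] = " ARGV[i] }
         { print FILENAME ": " $0 }' PTRList.txt -
)
printf '%s\n' "$result"
```

The lines read from the pipe are reported with FILENAME equal to -, which is why the test FILENAME == ARGV[2] selects them.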
./MB.suggestion.bin: line 1: $: command not found
awk: cmd. line:1: (FILENAME=- FNR=2) fatal: can't redirect to `out/2,lo0-100.NYCMNY-VFTTP-421.verizon-gni.net' (No such file or directory)
You apparently copied a command line, including the prompt ($, which is not a command, as the error says), not the script in https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150649
Also, a shell script is not a binary, contrary to what the extension you chose suggests. The usual extension is "sh", but there is no need to give an extension.
If you had properly copied the script, you would have got the help message (because the test [ -z "$3" ] passes: the third argument is empty). It would have informed you that, after the two files, you must give the output directory. In the script, mkdir -p "$3" creates that directory (and even its parent directories) if it does not exist.
. could be a default value for that third argument, complementing the script in this way:
#!/bin/sh
if [ -z "$2" ]
then
    printf "Usage: $0 PTR_list IPv4_addresses [output_dir]
Both files must have two fields. The first field must be the PTR and must be unique in PTR_list.
"
    exit
fi
out=.
if [ -n "$3" ]
then
    mkdir -p "$3"
    out="$3"
fi
sort -u "$2" | awk -v out="$out/" 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $2 >> out a[$1] "," $1 }' "$1" -
Just in case, I'll sort PTRList.txt before running ./MB.suggestion.bin
It is useless.
Now lo0-100.NYCMNY-VFTTP-421.verizon-gni.net in the error response is the same as the first argument of MB.suggestion.sort.bin. That's progress.
No it is not. You should try to understand what you are executing instead of doing random things such as sorting "just in case". Read the help message I wrote: the first argument is called "PTR_list". Not "PTR". It is a file. The rest of the message confirms it: "Both files...". Example of a call of the script (I named it "join-and-group-by-ptr": give meaningful names!), which here writes the files in the directory "out":
$ ./join-and-group-by-ptr PTRList.txt IPv4.May2020.37.nMapoG.txt out
There appears to be a misunderstanding:
I'm looking to reconcile the list of PTR's and the number of each PTR's occurrences (PTRList.txt)
and
the (severely truncated) output of the nmap scan(s) of the randomly filled out 3rd & 4th octets of
a set of CIDR/16 prefixes (IPv4.May2020.37.nMapoG.txt).
On the other hand, Magic Banana's script is looking at only one of the two files; which one ?
I've tried to save the scripts as MB.suggestion02.txt & MB.suggestion03.txt, cp them to *.bin
and *.sh, respectively, before chmod +x MB.suggestion02.bin and chmod +x Mb.suggestion03.sh
before lastly executing them with
./mb.suggestion02.bin filename01 - or ./MB.suggestion03.sh filename02 -
The error messages repeatedly say for either combination of filenames:
bash: ./MBsuggestion03.sh: No such file or directory or ./MBsuggestion02.bin: No such file or directory
No new directories appear. In my alternative scripting, I created the two necessary directories beforehand.
George Langford
There appears to be a misunderstanding
Maybe. If only you would give an example of the expected output...
On the other hand, Magic Banana's script is looking at only one of the two files
No, it is not. I do not write scripts taking several files in arguments to only use one.
I've tried to save the scripts as MB.suggestion02.txt & MB.suggestion03.txt, cp them to *.bin and *.sh, respectively
Directly write the script in a file bearing the name you want to execute it. I repeat: you want a meaningful name (pick only one script, depending on whether you want the default value for the output directory). I proposed "join-and-group-by-ptr" at the end of https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150667
The execution will be the same, whatever the extension. The file name does not even need an extension. For GNU/Linux systems, extensions do not "exist". Only whole file names do. Now, for users, if there is an extension, it is supposed to indicate the file format: users expect .jpg files to be images and not sounds, for instance. In the same way, .bin files are expected to be binaries and .sh files are expected to be shell scripts. Shell scripts are not binaries. They are plain text.
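A quick way to check that the extension plays no role in the execution, as a sketch with a deliberately misleading, hypothetical file name (it assumes the scratch directory allows executing files):

```shell
demo=$(mktemp -d)
# A shell script deliberately given a misleading (hypothetical) name.
printf '#!/bin/sh\necho "I am a shell script"\n' > "$demo/picture.jpg"
chmod +x "$demo/picture.jpg"

# The system looks at the execute permission and the #! line, not at ".jpg".
result=$("$demo/picture.jpg")
printf '%s\n' "$result"
```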
./mb.suggestion02.bin filename01 - or ./MB.suggestion03.sh filename02 -
Again: - is the standard input. If you do not redirect it (apparently the case here), it is the keyboard: the script is here expecting you to type the second file.
I do not know how to be clearer on how to call the script: a help message specifies the usage and I gave you an example of a call (using the files you attached) at the end of https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150667
chmod +x MB.suggestion02.bin and chmod +x Mb.suggestion03.sh before lastly executing them with ./mb.suggestion02.bin filename01 - or ./MB.suggestion03.sh filename02 -
The error messages repeatedly say for either combination of filenames:
bash: ./MBsuggestion03.sh: No such file or directory or ./MBsuggestion02.bin: No such file or directory
The error messages say neither "./MBsuggestion02.bin" nor "./MBsuggestion03.sh" exist. But if you had really executed the command you wrote immediately before, the messages would be about "./mb.suggestion02.bin" and "./MB.suggestion03.sh". You would get the same error message if you named the scripts "MB.suggestion02.bin" and "Mb.suggestion03.sh", as the previous sentence suggests.
If you cannot type the same file name twice, you will never manage to execute any script. The letter case matters. And no character can be skipped. Fortunately, auto-completion makes it easy to correctly and efficiently input file names: you should use it.
No new directories appear.
With two arguments, the script in https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150649 displays the help message (because it requires three arguments) and the one in https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150667 would write files in the working directory (the default I chose if the third argument is missing).
In my alternative scripting, I created the two necessary directories beforehand.
Two necessary directories? Your original post gave a command line with one single output and you wrote that "each group [of IPv4 addresses] needs to be diverted to a separate filename with ".txt" appended to the unique PTR". How does that make two directories?
As I have already explained, the script I wrote creates the output directory if it is missing.
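For illustration, a sketch of how mkdir -p behaves, with a hypothetical nested directory name: it creates the missing parent directories and does not complain when the directory already exists, so there is no need to create anything beforehand.

```shell
demo=$(mktemp -d)
out="$demo/results/by-ptr"      # hypothetical nested output directory

mkdir -p "$out"                 # creates results/ and results/by-ptr/
mkdir -p "$out"                 # second call: already there, still succeeds
status=$?
echo "exit status of the second call: $status"
```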
Magic Banana said:
Maybe.
In response to my concern:
There appears to be a misunderstanding
and then requested:
If only you would give an example of the expected output...
Here are a couple of the output files:
- CountsFiles/yellowipsdirty.singlehop.com.3.txt
- yellowipsdirty.singlehop.com 96.127.177.1
- yellowipsdirty.singlehop.com 99.198.113.0
- yellowipsdirty.singlehop.com 99.198.113.5
And another one:
- CountsFiles/zimbra.themicrobuttery.com.8.txt
- zimbra.themicrobuttery.com 98.152.78.28
- zimbra.themicrobuttery.com 98.152.78.64
- zimbra.themicrobuttery.com 98.152.78.71
- zimbra.themicrobuttery.com 98.152.78.91
- zimbra.themicrobuttery.com 98.152.78.92
- zimbra.themicrobuttery.com 98.152.78.93
- zimbra.themicrobuttery.com 98.152.78.167
- zimbra.themicrobuttery.com 98.152.78.240
The line numbers and indents were added by the incomprehensible composition code.
Magic Banana then went on:
Again: - is the standard input. If you do not redirect it (apparently the case here), it is the keyboard:
the script is here expecting you to type the second file.
At last I comprehend:
cp MB.suggestion02.txt MB.suggestion02; sudo chmod +x MB.suggestion02
./MB.suggestion02 PTRList.txt IPv4.May2020.37.nMapoG.txt
Success ! I'll add helpful comments before I forget how Magic Banana's excellent script is to be used.
The outputs are files in the working directory but will benefit from being placed by themselves in a
subdirectory. They would also benefit from adding the PTR in the first column.
Farther down in Magic Banana's response, it's stated:
[The output directory] is the working directory (the default I chose if the third argument is missing).
Guess what ? That works, too:
./MB.suggestion02 PTRList.txt IPv4.May2020.37.nMapoG.txt MBsOutputFiles
Magic Banana's script rejected a duplicate line; mine didn't; I corrected that by applying "sort -u" to the 2nd file in
the join command (I checked; it doesn't draw any complaint from the terminal):
join -a 1 <(sort PTR-files/www.newsgeni.us.txt) <(sort -u IPv4.May2020.100.nMapoG.txt) | sort -Vk 2
See this morning's detailed alternative set of scripts which won't get a medal for brevity.
The error messages say neither "./MBsuggestion02.bin" nor "./MBsuggestion03.sh" exist.
Not correct; it was the second input file that was missing.
We're back on track; Magic Banana's teaching efforts are working, and I'm learning to do some of this scripting
task myself. Adding the ability to use arguments is a task for another day, once the last third of the nmap scan
scripts finishes their two-to-three day forays into the wide, wide world.
Thank you again !
George Langford
The line numbers and indents were added by the incomprehensible composition code.
It is comprehensible: https://trisquel.info/en/filter/tips
I'll add helpful comments before I forget how Magic Banana's excellent script is to be used.
Again (it must be at least the fourth post in which I write that): a help message is displayed if you do not call the script with enough arguments. For instance, with your terrible name:
$ ./MB.suggestion02
Usage: ./MB.suggestion02 PTR_list IPv4_addresses [output_dir]
Both files must have two fields. The first field must be the PTR and must be unique in PTR_list.
If the message is not meaningful to you, you can modify it: just edit the argument of printf.
The outputs are files in the working directory but will benefit from being placed by themselves in a subdirectory.
Again: the first version of the script, in https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150649, forces you to specify an output directory (no default).
Magic Banana's script rejected a duplicate line; mine didn't
In your original post, it did. That is why I did the same, as I explained in https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150709
I corrected that by applying "sort -u" to the 2nd file in the join command (I checked; it doesn't draw any complaint from the terminal)
You do not want to use your solution, which is much slower (among other reasons): https://trisquel.info/forum/find-instances-each-list-strings-and-print-each-set-separate-file#comment-150709
I'm learning to do some of this scripting task myself.
You repeatedly make the same mistakes (useless commands, wrong arguments given to sort -k, etc.). You obviously have a hard time reading documentation or what I write. You are finally confirming that the command line I gave in my first reply is perfectly fine (the scripts I then gave just ease its reuse: a help message and positional arguments rather than hard-coded ones). Despite that correct answer right from the start, look at the length of this thread...
Adding the ability to use arguments is a task for another day
It is trivial: just write "$1" for the first argument given to the script, "$2" for the second, etc.
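As a sketch, sh -c can stand in for a tiny script to show the positional arguments at work. The name demo-script is hypothetical; the two file names are only passed as text here and need not exist.

```shell
# sh -c runs the quoted text as a tiny script; the word after it becomes $0,
# and the following words become $1, $2, ...
result=$(sh -c 'printf "list=%s addresses=%s\n" "$1" "$2"' demo-script PTRList.txt IPv4.May2020.37.nMapoG.txt)
printf '%s\n' "$result"
```

Keeping the double quotes around "$1" and "$2" matters as soon as a file name contains a space.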
As I've stated before, there are redundancies in my scripts, but they force me to
check the order of the columns in the original files. Once the script functions as
I expect it should, I relax and press on. Time flies when one's making progress.
The whole exercise in this posting is exactly to create a whole mess of files,
each one listing the (sometimes millions of) IP addresses claiming the same PTR
record. These text files will be linked to the Recent Visitor data spreadsheets
(which Magic Banana has helpfully taught me to code in HTML), thereby keeping the
basic presentation within reason. The entire set of illustrative address listings
would make one webpage hopelessly immense, defeating the purpose of calling attention
to the abuse of IPv4 and IPv6 address space.
The address listings ought to remain in text form so that exploring the contents of
those many like-named servers is done only by experienced and wary observers.
The number-of-instances column in PTRList.txt is retained only as a check on the
accuracy of any script that creates individual text files for all the multi-address
PTR records. Concatenating them into one file would make such a list a very slow-loading page.
The final join between the listing of the PTR's in the overall PTRList.txt file will
dictate which PTR's make the final cut. The counts can be restored later.
The initial step is to extract all the multi-address PTR's after concatenating the
outputs of the nmap scripts that queried the 50,000 CIDR/16 blocks extracted from
the Current Visitor data. Those scripts are already written & tested. Next, join the
PTRList.txt file (no need for that superfluous Column $3) with the Recent Visitor
data to discover which (of the multi-address PTR's resolved by the nmap scans) are found
in the gratuitously looked up hostnames cataloged by the Webalizer analyses of the
reporting domains. Those many Internet addresses can be readily checked with dig -x
or nslookup; most of the ones that haven't been defensively changed already on the
servers should resolve to their hostnames as listed in the Recent Visitor data. The last
step is to publish these correlations to facilitate the defense of domains undergoing
attack by hosts that cannot be resolved, except by tedious perusing of search engine
data, and therefore are unblockable. Uploading the text files listing the known
addresses of the reported multi-address PTR records will be a slow process, but the
separate listings linked in the presentation webpage will be more accessible on a
one-file-at-a-time basis than an all-encompassing master list that might quickly
become obsolete.
George Langford
As I've stated before, there are redundancies in my scripts, but they force me to check the order of the columns in the original files.
That is what help messages are for:
#!/bin/sh
if [ -z "$3" ]
then
    printf "Usage: $0 PTR_list IPv4_addresses output_dir
Both files must have two fields. The first field must be the PTR and must be unique in PTR_list.
"
    exit
fi
mkdir -p "$3"
sort -u "$2" | awk -v out="$3/" 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $2 >> out a[$1] "," $1 }' "$1" -
Also, you can write comments after "#". Writing 'join -a 1 -1 1 -2 1' rather than only 'join', one expects the options to have a purpose.
With both apologies and thanks to Magic Banana, yesterday I set about to find an independent solution
within which I can follow the logic.
Here is the series of scripts that accomplish the stated task. They are based on the previously
attached PTRListSort.txt and IPv4.May2020.37.nMapoG.txt as illustrative data.
The first script makes the 42 files that will hold the multi-address PTR's and their IPv4 addresses:
awk '{print $1,$2}' 'PTRList.txt' | awk '{print "touch CountsFiles/"$1"."$2".txt ;"}' '-' > Script.MakeFiles.txt
The second script makes 42 additional files, each one ready to contain just one PTR:
awk '{print $1,$2}' 'PTRList.txt' | awk '{print "touch PTR-files/"$1".txt ; "}' > Script.MakePTR-files.txt
The third script writes the PTR names into their just-created files:
awk '{print "echo "$1}' 'PTRList.txt' > Temp0718A.txt ;
awk '{print " > PTR-files/"$1".txt ;"}' 'PTRList.txt' > Temp0718B.txt ;
paste -d ' ' Temp0718A.txt Temp0718B.txt > Script.Fill.PTR-files.txt ; rm Temp0718A.txt Temp0718B.txt
The fourth script creates and lists 42 individual scripts, each one of which collects and writes the joined
address data to its just created file. Note that the first file in the join command contains only one line:
awk '{print "join -a 1 <(sort PTR-files/"$1".txt) <(sort IPv4.May2020.37.nMapoG.txt) | sort -Vk 2 >"}' 'PTRList.txt' > Temp0719A.txt;
awk '{print "CountsFiles/"$1"."$2".txt ;"}' 'PTRList.txt' > Temp0719B.txt;
paste -d ' ' Temp0719A.txt Temp0719B.txt > Script.Fill.CountsFiles.txt; rm Temp0719A.txt Temp0719B.txt
The fifth script makes all the above scripts executable
(the txt extension could be bin, run, sh (Magic Banana's preference)... the terminal doesn't care):
sudo chmod +x Script.MakeFiles.txt; chmod +x Script.MakePTR-files.txt; chmod +x Script.Fill.PTR-files.txt; chmod +x Script.Fill.CountsFiles.txt
The last script executes all the above scripts in one go, one after the other;
don't try this at home until you're exactly sure what's about to happen:
./Script.MakeFiles.txt; ./Script.MakePTR-files.txt; ./Script.Fill.PTR-files.txt; ./Script.Fill.CountsFiles.txt
George Langford
I executed that, out of curiosity. It essentially does the same as the command line I have given since the beginning of this thread:
$ sort -u IPv4.May2020.37.nMapoG.txt | awk 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $2 >> "out/" a[$1] "," $1 }' PTRList.txt -
"Essentially" because:
- Your solution outputs in PTR-files files that each contain one single line: the PTR (which is also in the file name); what is the point?
- The files output in CountsFiles contain duplicates, e.g., "low.lowe001.net 96.125.160.252" is twice in CountsFiles/low.lowe001.net.2.txt; in your original post, you used 'sort -u IPv4.May2020.37.nMapoG.txt' to remove duplicates: that is why I did the same in my solution;
- Every line has two fields, but the first one is always the same PTR (which is also in the file name); what is the point?
- Executing it takes 0.3s on my system, against 0.01s for mine;
- 'ls -v' would not sort the files in CountsFiles by "number of instances"; in your original post, your use of 'sort -nrk 2' (the argument should have been 2,2) suggests you want to sort by "number of instances"; that is why the names of the output files in my solution start with the "number of instances".
Now, if you want the duplicates, if you insist on the file names you chose and if you really want the repeated PTR in a first field (which only looks like a waste of disk space), it is trivial to adapt my solution:
awk 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $1, $2 >> "CountsFiles/" $1 "." a[$1] ".txt" }' PTRList.txt IPv4.May2020.37.nMapoG.txt
One single program is called. Against more than a dozen to create your Script.* files that then execute 252 other commands (more generally: 6 times the number of PTRs).
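A sketch of that adapted command on made-up data (hypothetical host names, addresses and scratch directory). Two details are additions and not part of the command above: the file name is built in a variable, and close() is called after each print. The assumption behind close() is that, with very many PTRs, an awk implementation other than GNU awk may hit the limit on simultaneously open files; since >> appends, closing and reopening is safe.

```shell
demo=$(mktemp -d)
mkdir "$demo/CountsFiles"
printf 'a.example 2\nb.example 1\n' > "$demo/PTRList.txt"
printf 'a.example 10.0.0.1\nb.example 10.0.0.9\na.example 10.0.0.2\nc.example 10.0.0.7\n' > "$demo/addresses.txt"

(
  cd "$demo" || exit 1
  awk 'FILENAME == ARGV[1] { a[$1] = $2 }
       FILENAME == ARGV[2] && $1 in a {
         file = "CountsFiles/" $1 "." a[$1] ".txt"
         print $1, $2 >> file
         close(file)   # addition: keep at most one output file open
       }' PTRList.txt addresses.txt
)

cat "$demo/CountsFiles/a.example.2.txt"
```

Calling close() on every line costs some speed, so it is only worth keeping if the open-file limit is actually reached.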
Magic Banana added responses to questions he didn't know I'd asked; our notes apparently crossed in the ether !
My inefficient (but clear to me) scripts needed that extra collection of single-line PTR's because I use them
in my one-at-a-time join command; afterwards, they're not needed any more, but take up a lot of inefficiently
used disk unit storage.
I've fixed the duplicates issue in my join command.
Those unwieldy tables of PTR's and their multiple addresses will soon be scrolling down the screen so far
that the filename may disappear from view; and the -Vk 2 option puts the IPv4's in visually searchable order.
2GB/60kB times 0.3 seconds is seriously longer than times 0.01 second: 2.8 hours vs. 5.6 minutes. Touché.
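The estimate can be reproduced as follows. It assumes that the running time grows linearly with the size of the input, which is only a rough approximation.

```shell
result=$(awk 'BEGIN {
  ratio = 2e9 / 60e3    # 2 GB of real data over 60 kB of sample data
  # 0.3 s and 0.01 s are the timings measured on the sample file.
  printf "%.1f hours vs. %.1f minutes\n", 0.3 * ratio / 3600, 0.01 * ratio / 60
}')
printf '%s\n' "$result"
```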
Magic Banana condensed his already slender script to the utmost:
Now, if you want the duplicates [No !], if you insist on the file name[s?] you chose and if you really want the
repeated PTR in a first field (which only looks like a waste of disk space), it is trivial to adapt my solution:
awk 'FILENAME == ARGV[1] { a[$1] = $2 } FILENAME == ARGV[2] && $1 in a { print $1, $2 >> "CountsFiles/" $1 "." a[$1] ".txt" }' PTRList.txt IPv4.May2020.37.nMapoG.txt
It works ! Thanks for your script's astounding brevity, for your accurate scripting, and for being so diligent.
In the final reckoning, IPv4.May2020.37.nMapoG.txt (60kB) becomes IPv4.May2020.100.nMapoG.txt (2GB).
PTRList.txt can keep the same name.
George Langford