Syntax of an if ... then ... action statement

15 replies [Last post]
amenex
Offline
Joined: 01/03/2015

Every so often I come across IP addresses in the hostname column of the Current Visitor data collected with Webalizer;
it's unclear whether these addresses were never resolved by Apache's hostname-lookup function or are actually hostnames
in their own right.

After some searching with nmap, I've generated a long list of pointers (i.e. hostnames) and IPv4 address pairs, from
which I'd like to select the pairs in which the pointer is not the same as the address.

The attached file SourceFiles-113A.txt lists some obvious examples; there are others in the actual 70,000+ pair list.

Here's my present (unsuccessful) effort to create a suitable script:
awk '{print $1"\t"$2}' 'SourceFiles-113A.txt' | if [ "$1" != "$2" ]; then awk '{print $1"\t"$2}' '-' fi

"Why is this important?" you might ask ...

Here's a "ferinstance" in which the pointer (PTR) is the first string and the address (IPv4) is the second string:
10.64.10.153 51.15.213.51
dig -x 51.15.213.51 returns in the answer section: PTR 10.64.10.153
dig -x 10.64.10.153 returns no answer
dig 10.64.10.153 returns in the answer section: PTR 10.64.10.153
dig 51.15.213.51 returns in the answer section: A 51.15.213.51

Another one:
5.189.130.207 5.189.130.210
dig -x 5.189.130.207 returns in the answer section: PTR mx-c27.ox6dev.com
dig -x 5.189.130.210 returns in the answer section: PTR 5.189.130.207
dig 5.189.130.207 returns in the answer section: A 5.189.130.207
dig 5.189.130.210 returns in the answer section: A 5.189.130.210

On the other hand, dig mx-c27.ox6dev.com returns: A 5.189.130.207

Hurricane Electric's BGP service returns this result: 5.189.130.207 resolves to mx-c27.ox6dev.com.
The 5.189.128.0/20 CIDR block in which these 5.189.130.0/24 addresses reside also contains pointers indicated
only by dashes ("-"); several of these obfuscated pointers have A records, so when a hostname lookup is
performed by Apache on the IP addresses of those hidden A records, they become untraceable and all the
Webalizer data says is ("-"). My longer list of PTR-IPv4 pairs has nearly two thousand "-" PTR records.

I also found one pointer with three addresses:
The PTR "108.61.10.10" has three IPv4 addresses ==> 149.28.202.234, 149.28.207.67 and 149.28.209.68

Attachment Size
SourceFiles-113A.txt 1.56 KB
Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

$1, $2, ... in a shell script are its arguments, on the command line (technically called "positional parameters"). Well, unless you used 'set' before in the script. You do not want to do that: shells are slow. Filtering the lines where the first two fields differ is extremely easy with awk:
$ awk '$1 != $2'
In the AWK "program", $1, $2, ... are the fields of the current record.
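A minimal sketch of that filter on a few made-up pairs (the second and third pairs are invented for illustration and are not taken from the attachment):

```shell
# Pointer and address separated by a tab; the middle pair is identical.
differing=$(
    printf '10.64.10.153\t51.15.213.51\n5.189.130.210\t5.189.130.210\nmx.example.org\t192.0.2.7\n' |
    awk '$1 != $2'    # keep only the lines whose first two fields differ
)
printf '%s\n' "$differing"
```

With a real file, the file name simply goes after the awk program, and the result can be redirected wherever it is needed.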

amenex
Offline
Joined: 01/03/2015

Thank you.

Along the way I worked out a way of extracting IP address data from a file of mixed IP addresses and alphanumeric
pointers: Convert all the letters of the alphabet to tabs with sed, then (thanks to Mrs. L) condensing the tabs with
LibreOffice Calc., condensing [by hand] the ensuing mess to tab-delimited IPv4 addresses (discarding the IPv6 rows
in the process), expanding those addresses to the 24 permutations of 1234, and lastly using a 24-line nMap script to
look up the PTR's of those mostly fictitious IPv4 addresses. Finally, join the nMap list to the Current Visitor data
to resolve not only previously unresolved IPv4 addresses, but also hostnames that had been gratuitously converted
from long-lost IP addresses by the Apache servers.

I gave up trying to print a series of script steps with awk and instead now use the yes command to print multiple
copies of portions of the intended script, paste to attach adjacent columns of the script command, and Leafpad
for repairs and minor additions. It's like making sausage: don't watch too closely, because all the inefficiencies
of the executable script are hidden from view and the outcome works really fast anyway.

I also found out that grep and awk share grep's failing that it will select any string that carries the desired
pattern, but there are a couple of simple commands that will discard the excess matches; see
https://stackoverflow.com/questions/17960758/how-to-use-awk-to-extract-a-line-with-exact-match
awk '$1=="42.64.uzpak.uz"' 42.64.uzpak.uz.txt or sed -n '/\b42.64.uzpak.uz\b/p' 42.64.uzpak.uz.txt

George Langford

Attachment Size
42.64.uzpak_.uz_.txt 40 Bytes
Magic Banana

Offline
Joined: 07/24/2010

Convert all the letters of the alphabet to tabs with sed, then (thanks to Mrs. L) condensing the tabs with LibreOffice Calc., condensing [by hand] the ensuing mess ...

That looks like what 'tr -s A-z \\t' does, but you apparently love to lose your time with manual work...
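A small sketch of the tr approach on an invented line. In the C locale the range A-z also covers the six punctuation characters that sit between Z and a in ASCII, so the explicit A-Za-z below is an assumption about what was intended:

```shell
# Invented input: letters mixed with two IPv4 addresses.
line='abc198.51.100.7xyz203.0.113.9'

# Turn every run of ASCII letters into a single tab (-s squeezes the repeats).
squeezed=$(printf '%s\n' "$line" | LC_ALL=C tr -s 'A-Za-z' '\t')
printf '%s\n' "$squeezed"
```

The result starts with a tab and has one tab between the two addresses, with no spreadsheet step in between.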

I gave up trying to print a series of script steps with awk and instead now use the yes command to print multiple copies of portions of the intended script, paste to attach adjacent columns of the script command, and Leafpad for repairs and minor additions.

That must be the worst way to implement a loop.
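For comparison, a plain shell loop over a directory can stand in for a generated script with one line per file. This is only a sketch: the temporary directory, the file names and the added column are hypothetical stand-ins for the real per-domain files.

```shell
# Hypothetical working directory with three small per-domain files.
dir=$(mktemp -d)
for n in 1 2 3; do
    printf 'host%s.example\t192.0.2.%s\n' "$n" "$n" > "$dir/domain00$n.txt"
done

# One loop instead of one generated line per file:
# append the file name as an extra column to every row.
for f in "$dir"/domain*.txt; do
    awk -v name="${f##*/}" '{ print $0 "\t" name }' "$f"
done > "$dir/all.txt"

combined=$(cat "$dir/all.txt")
rm -r "$dir"
printf '%s\n' "$combined"
```

The loop body is written once, so a syntax mistake only has to be fixed in one place, whatever the number of files.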

grep's failing that it will select any string that carries the desired pattern

It is not "failing". It is what grep does by default. And there is an option to do what you apparently want: --word-regexp (-w).
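A short sketch of the difference, using three invented lines; -F is added here so that the dots are taken literally:

```shell
# The wanted name, plus two longer strings that merely contain it.
sample=$(printf '%s\n' '42.64.uzpak.uz' 'x42.64.uzpak.uz' '142.64.uzpak.uz')

# Default: any line containing the string is selected.
loose=$(printf '%s\n' "$sample" | grep -F -c '42.64.uzpak.uz')

# With -w the match may not touch a letter, digit or underscore on either side.
strict=$(printf '%s\n' "$sample" | grep -F -w -c '42.64.uzpak.uz')

printf 'without -w: %s, with -w: %s\n' "$loose" "$strict"
```

Note that -w is still not an exact field comparison: a line where the name is followed by a dot or a hyphen would be selected too, which is where awk's `$1 == "..."` test is stricter.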

amenex
Offline
Joined: 01/03/2015

Magic Banana commented: That must be the worst way to implement a loop.

I did mention the making of sausage.

This approach gives me the least trouble with syntax as I become more familiar with scripting.
Today I wrote two 89-row scripts to (a) Create 89 basic files to present over 120,000 lines of
data and then (b) Populate those 89 files with three extra columns. Each script, once free
of mistakes, ran in the blink of an eye. "chmod +x filename" is now a familiar step.

Now I would like to create a script to list all the instances of one of the seven elements of
each of those 120,000 rows of data, which would facilitate study of the patterns in the data.
I found reference to while loops based on the getopts command, but I haven't found out if Trisquel
supports that command.
Reference: https://www.lifewire.com/pass-arguments-to-bash-script-2200571
while getopts u:d:p:f: option, etc.

Thank you for paying attention !

George Langford

Magic Banana

Offline
Joined: 07/24/2010

I would like to create a script to list all the instances of one of the seven elements

What are an "instance" and an "element"? Without a small input and the related expected output, it is not understandable.

I haven't found out if Trisquel supports that command.

getopts has been built in any POSIX-compliant shell for the past 12 years or so: https://pubs.opengroup.org/onlinepubs/9699919799/

But I doubt you need it.

amenex
Offline
Joined: 01/03/2015

OK; Attached find a sampling wherein the columns are aligned thusly, left to right:
hostname...IPv4 Address...domainNNN.txt...Visits Per Domain...Autonomous System Number...Country Code...Topic
The first two columns are for the visiting agent; the third through seventh columns are specific to the domains visited.

I foresee having just one argument, albeit a variable, but a different one for each column as in the illustrations above.
Note that I'm taking advantage of that sometimes pesky feature of grep.

All is not lost; there is a "info getopt" page, even though there aren't any man pages for getopt or getopts.

Here are some grep commands that illustrate what I want the script to run:
grep -e "hn.kd.ny.adsl" FerInstance.txt | sort -Vk 2 > Temp09222020A.txt ==> 23.4 kB
grep -e "2.61." FerInstance.txt | sort -Vk 2 > Temp09222020B.txt ==> 714 bytes
grep -e ".pppoe.khakasnet.ru" FerInstance.txt | sort -Vk 2 > Temp09222020C.txt ==> 714 bytes

Attachment Size
FerInstance.txt 44.45 KB
Magic Banana

Offline
Joined: 07/24/2010

As I have told you so many times:

  • in a regular expression, '.' means "any single character", '\.' means a dot (or you can execute 'grep -F' to grep fixed strings rather than regular expressions);
  • you almost certainly want to use 'sort -k 2,2' (sort using the sole second field) and not sort -k 2 (sort using the whole line starting from its second field).
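The second point can be seen on two invented rows that share the same address. This sketch leaves out -V and fixes the locale to C, which is an assumption made only to keep the comparison simple:

```shell
# Two invented rows with the same value in field 2.
rows=$(printf 'b.example\t192.0.2.1\tA\na.example\t192.0.2.1\tB\n')

# -k 2: the key runs from field 2 to the end of the line,
# so the third column decides the order here.
to_end=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2)

# -k 2,2: the key is field 2 only; ties fall back to comparing whole lines.
field_only=$(printf '%s\n' "$rows" | LC_ALL=C sort -k 2,2)

printf '%s\n---\n%s\n' "$to_end" "$field_only"
```

The two commands list the same rows in opposite orders, because only the first one lets the columns after the address take part in the comparison.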

With those fixes, it looks like the script you want is simply:

#!/bin/sh
grep -F "$1" | sort -Vk 2,2
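Because that script reads standard input and writes standard output, the file to search and the destination are both chosen when it is called. A sketch, with the same pipeline wrapped in a function (filter_rows is a hypothetical name) and invented input so that the example is self-contained:

```shell
# Same pipeline as the one-line script, as a function for illustration.
filter_rows() {
    grep -F "$1" | sort -Vk 2,2
}

# Invented rows; in the thread the input would be FerInstance.txt.
input=$(printf 'x.hn.kd.ny.adsl\t198.51.100.20\ny.hn.kd.ny.adsl\t198.51.100.3\nother.example\t203.0.113.5\n')

# Input arrives through a pipe or "<"; output goes wherever ">" points.
selected=$(printf '%s\n' "$input" | filter_rows 'hn.kd.ny.adsl')
printf '%s\n' "$selected"
```

With the script saved to a file, the equivalent call would look like `sh script.sh 'hn.kd.ny.adsl' < FerInstance.txt > result.txt`, where script.sh and result.txt are placeholder names.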

All is not lost; there is a "info getopt" page, even though there aren't any man pages for getopt or getopts.

getopt and getopts are different. As I wrote, "getopts has been *built in* any POSIX-compliant shell for the past 12 years" (emphasis added): it is not a separate command; you will find its specification in the manual of your shell, as in 'man sh'.

amenex
Offline
Joined: 01/03/2015

Following up, based on the much clearer explanation here:
https://www.baeldung.com/linux/use-command-line-arguments-in-bash-script

I prepared a successful script:
while getopts h:i:d:v:a:c:t: flag
do
case "{$flag}"
in
h) Hostname=${OPTARG};;
i) IPv4=${OPTARG};;
d) DomainNumber=${OPTARG};;
v) VisitsPerDomain=${OPTARG};;
a) ASNumber=${OPTARG};;
c) CountryCode=${OPTARG};;
t) Topic=${OPTARG};;
esac
done
grep -e "${OPTARG}" FerInstance.txt | sort -Vk 2 > ${OPTARG}.txt

I ran it with three different arguments, the same ones as in my earlier posting:
sh userReg-flags.sh -h 'hn.kd.ny.adsl'
sh userReg-flags.sh -i '2.61.'
sh userReg-flags.sh -h '.pppoe.khakasnet.ru'

The outputs are all the same as before, except for their different names, with minor blemishes such as
the doubled dots in the IPv4.txt output filename and the fact that the partial hostname's leading dot
makes that one a hidden file. In future I'll remind myself to leave out the leading dot in partial PTR's.

It's a huge help that the script is quite logical.

George Langford

Magic Banana

Offline
Joined: 07/24/2010

I prepared a successful script

It makes no sense: none of the shell variables you define with getopts is ever used. See my previous post for what you probably want.

If you really want options to specify the columns to search in, then getopts may be appropriate (or not, if you always want to search one single field). But grep is certainly not, because it does not have any concept of "field": you want to use awk.
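One way this could look is sketched below. Everything here is an assumption for illustration: the function name search_field is hypothetical, the option letters are borrowed from the posted script, the column numbers follow the column order described for FerInstance.txt, and matching is done on fixed substrings with awk's index().

```shell
# Hypothetical sketch: each option selects the column to search in.
search_field() {
    OPTIND=1                 # reset so the function can be called repeatedly
    col=0
    pattern=
    while getopts h:i:d:v:a:c:t: flag; do
        case "$flag" in
            h) col=1 ;;      # hostname
            i) col=2 ;;      # IPv4 address
            d) col=3 ;;      # domain file
            v) col=4 ;;      # visits per domain
            a) col=5 ;;      # autonomous system number
            c) col=6 ;;      # country code
            t) col=7 ;;      # topic
            *) return 2 ;;   # unknown option or missing argument
        esac
        pattern=$OPTARG      # if several options are given, the last one wins
    done
    [ "$col" -gt 0 ] || { echo 'usage: search_field -h|-i|-d|-v|-a|-c|-t TEXT' >&2; return 2; }
    # index() searches for a fixed substring, so dots are literal.
    awk -v col="$col" -v pat="$pattern" 'index($col, pat)'
}

# Invented input: "2.61." occurs in column 1 of one row and in column 2 of the other.
printf 'h2.61.example\t198.51.100.1\nhost.example\t2.61.0.9\n' | search_field -i '2.61.'
```

Here the option actually changes the result: -i selects only the second row, while -h with the same text would select only the first.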

minor blemishes such as the doubled dots in the IPv4.txt output filename and the fact that the partial hostname's leading dot makes that one a hidden file.

That is a reason why scripts usually write on the standard output (beside avoiding overwriting by mistake an existing file, being able to pipe the output, etc.). I did that in my previous post. When calling the script, you can redirect the output wherever you want.

amenex
Offline
Joined: 01/03/2015

Magic Banana misunderstands:
I prepared a successful script.
It makes no sense: none of the shell variables you define with getopts is ever used. See my previous post for what you probably want.

On the contrary: I can pick any one or more of the named single-letter arguments, provide the appropriate value(s),
place those arguments after the name of the script on the command line, and the script collects the response and
places that response into the working directory. There are no redundant or unnecessary steps or verbiage in the
script. The same script works for any combination of the shell variables, and the same grep command does all the
work of outputting the result.

My homework assignment is to apply the script online from the browser ... Magic Banana's suggestions seem
particularly appropriate:
That is a reason why scripts usually write on the standard output (beside avoiding overwriting by mistake an existing file, being
able to pipe the output, etc.). I did that in my previous post. When calling the script, you can redirect the output wherever you want.

Magic Banana continues:
awk '$1=="42.64.uzpak.uz"' 42.64.uzpak.uz.txt or sed -n '/\b42.64.uzpak.uz\b/p' 42.64.uzpak.uz.txt

The FerInstance.txt file is the concatenation of three of eighty-nine separate files in a subdirectory; application
of the suggested awk or sed commands might end up just as complex as the userReg-flags.sh script, which I like
because it works, it's logical, and it's plenty fast enough at that.

George Langford

Magic Banana

Offline
Joined: 07/24/2010

I insist: it makes no sense. Substitute the option with any other (for instance give an IP address after -h) or even remove the whole case structure and you will see that your script behaves the same.

amenex
Offline
Joined: 01/03/2015

Magic Banana is correct; I "adjusted" the script in anticipation of his constructive criticism:
while getopts n: flag
do
case "{$flag}"
in
n) Input=${OPTARG};;
esac
done
grep -e "${OPTARG}" FerInstance.txt | sort -u > ${OPTARG}.txt

Which collects the appropriate data whatever the input and names the output file appropriately, for example:
sh userReg-flags.Single.sh -n hn.kd.ny.adsl

So does Magic Banana's sed script:
sed -n '/\hn.kd.ny.adsl\b/p' FerInstance.txt | sort -u > hn.kd.ny.adsl.sed.txt

Ignoring the complexities and lurking redundancies of the userReg-flags.Single.sh script, the user input
for the sed script is less complex than it is for the userReg-flags.Single.sh script.

Inserting the sed script in place of the grep script in the userReg-flags.Single.sh script hasn't yet worked
for me, probably because I haven't yet made sense of sed's options.

The awk script presupposes that the user knows the location of the particular argument in the target file; grep and sed don't care.

I tried using awk $0 in place of awk $1 (for the hostname column) but awk cannot pick out the hostname's string in the
selected rows of the target file, even though it can in a continuous string.

George Langford

Magic Banana

Offline
Joined: 07/24/2010

Having one single option that is mandatory means plain useless typing of the option (and a uselessly more complicated script, which still defines a Shell variable that is never used): you just want the one-line script I wrote in https://trisquel.info/forum/syntax-if-then-action-statement#comment-152436

So does Magic Banana's sed script

I very much doubt I ever wrote that.

I tried using awk $0 in place of awk $1 (for the hostname column) but awk cannot pick out the hostname's string in the selected rows of the target file, even though it can in a continuous string.

To select the rows where the first field is "hn.kd.ny.adsl":

$ awk '$1 == "hn.kd.ny.adsl"'
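A sketch of the same test with the hostname supplied from a shell variable through awk's -v option, so that the awk program itself never has to be edited. The rows are invented, and the variable names wanted and host are arbitrary:

```shell
# Invented rows; only the first has exactly the wanted hostname in field 1.
rows=$(printf 'hn.kd.ny.adsl\t198.51.100.1\nx.hn.kd.ny.adsl\t198.51.100.2\nhn1kd2ny3adsl\t198.51.100.3\n')

# Hand the value to awk with -v instead of pasting it into the program text.
wanted='hn.kd.ny.adsl'
exact=$(printf '%s\n' "$rows" | awk -v host="$wanted" '$1 == host')
printf '%s\n' "$exact"
```

Only the first row is selected: == compares the whole field, so neither the longer hostname nor the look-alike with digits in place of the dots gets through.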

amenex
Offline
Joined: 01/03/2015

amenex: So does Magic Banana's sed script
Magic Banana: I very much doubt I ever wrote that.

The referenced script was built on the basic form constructed by Magic Banana,
for which he deserves full credit but no blame.

...uselessly more complicated script, which still defines a Shell variable that is never used

Here it is, with comments:
while getopts n: flag # getopts requires at least this much
do
case "{$flag}"
in
n) Input=${OPTARG};; # this statement tells grep how to format its query as well as formatting the output
esac
done
grep -e "${OPTARG}" FerInstance.txt | sort -u > ${OPTARG}.txt # input is declared once but is used three times

Before I added the comments, it was just 139 bytes ...

Here's the initiating command:
sh userReg-flags.Single.sh -n hn.kd.ny.adsl
It's 43 bytes.

George Langford

Magic Banana

Offline
Joined: 07/24/2010

The referenced script was built on the basic form constructed by Magic Banana, for which he deserves full credit but no blame.

That is essentially grep implemented in sed. Well, the regular expression uses "\h" and I do not know if it is available in grep... since I have no idea what it means!

n) Input=${OPTARG};; # this statement tells grep how to format its query as well as formatting the output

This statement is useless: you can even remove the whole case structure and you will not see any difference in behavior. For the third time, I repeat: the defined shell variable (Input) is never read! grep's output cannot be "formatted": it selects lines (or parts of them with -o).
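For reference, here is a sketch of what the single-option script would look like if the variable were actually read. In the posted version, `case "{$flag}"` compares the literal text `{n}` with the pattern `n)`, so the branch never runs, and the final grep relies on OPTARG still holding a value after the loop, which may depend on the shell. The function name find_rows is a hypothetical stand-in for the script, and -F is an addition so that dots are literal:

```shell
# Hypothetical stand-in for userReg-flags.Single.sh, reading standard input.
find_rows() {
    OPTIND=1
    Input=
    while getopts n: flag; do
        case "$flag" in          # "$flag", not "{$flag}", so that n) can match
            n) Input=$OPTARG ;;  # remember the argument of -n
            *) return 2 ;;
        esac
    done
    # Read the variable set in the loop rather than OPTARG.
    grep -F -e "$Input" | sort -u
}

printf 'b.hn.kd.ny.adsl\na.hn.kd.ny.adsl\nhn1kd2ny3adsl\n' | find_rows -n hn.kd.ny.adsl
```

Even in this form the -n option adds nothing over a plain positional argument, which is the point made above.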

The whole script can be one line long. The line I gave you yesterday. Well, maybe with sort called with different options.

sh userReg-flags.Single.sh -n hn.kd.ny.adsl

And for the n-th time (n tending to infinity), a dot in a regular expression means "any single character". Just try to add to your input a line containing "hn1kd2ny3adsl", for instance, and you will see your grep (without option -F) selecting it.
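That experiment can be sketched on two lines, counting the matches with and without -F:

```shell
# The real name plus the look-alike line suggested above.
lines=$(printf 'hn.kd.ny.adsl\nhn1kd2ny3adsl\n')

# As a regular expression, each dot matches any single character.
as_regex=$(printf '%s\n' "$lines" | grep -c -e 'hn.kd.ny.adsl')

# As a fixed string (-F), each dot matches only a dot.
as_fixed=$(printf '%s\n' "$lines" | grep -c -F -e 'hn.kd.ny.adsl')

printf 'regex: %s line(s), fixed string: %s line(s)\n' "$as_regex" "$as_fixed"
```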