A sed script to replace one HTML string with a different one
Whereas Magic Banana's admonition to write HTML with a text editor is encouraging me
actually to do so, I've reached a degree of exasperation with my lack of geek-like
resources even though I've managed to produce a mostly working HTML page containing
the essence of my results.
That said, I'm not quite at the middle of another stage of the process, one in which
I'm adding links to another series of files containing additional pertinent data. I've
managed to replace about 400 of 900+ strings that display the multi-addressed PTRs in
the database with another (much longer) set of strings that contain the links to those
data files that contain additional detail about the domains that those PTRs are visiting.
The good news is that I created a set of scripts that extract the data from the master
database (which would fill the full width of the display and be 3500 rows deep) and place
it in a subdirectory. That task is complete and took just a few seconds after a day or
so of script-writing.
The present task runs afoul of the control strings necessary for HTML.
Here's the script that creates the actual string-replacement script:
awk '{print "cat "$5" sed '\''s/"$1$2$3"/"$4"/g'\'' '\>' "$6}' 'Source-04.txt' > Outcome-07.txt
where Source-04.txt is: 'pea dhcp-163.net1.bg slashpea' 'Success' Target-0802A.txt Target-0803A.txt
and Target-0802A.txt is: peadhcp-163.net1.bgslashpea
In order to comprehend this, you'll have to imagine "less than" p "greater than" wherever
you read "pea" and "forward slash" "less than" p "greater than" wherever you read "slashpea."
The script ensuing from the awk command as Outcome-07.txt is:
cat Target-0802A.txt sed 's/'peadhcp-163.net1.bgslashpea'/'Success'/g' > Target-0803A.txt
Which elicits the following complaint and fails to produce a modified Target-0803A.txt:
bash: p: No such file or directory
The files Target-0802A.txt and Target-0803A.txt both exist; Target-0803A.txt is empty in
order to protect Target-0802A.txt from obliteration by a non-functional script Outcome-07.txt.
The script Outcome-07.txt appears to be looking for an imaginary file ...
Once the script is satisfied, Target-0803A.txt should read "Success" but by then I can create
the true successor script by using awk to print all its component parts just like Source-04.txt,
but with more (and longer) components.
The last step is to replace $3 in the new Source-05.txt with the individual PTRs read from a
list of the actual remaining 500+ PTRs that are to be linked. That's a future scripting task
which can be trivial, as it has been before.
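A minimal sketch of that last step, offered only as one possible approach: instead of generating one shell command per PTR, awk can write a sed script file that is then applied with sed -f, which sidesteps the shell-quoting trouble. It assumes literal <p> tags where this post writes "pea" and "slashpea"; the file names PTR-list.txt, Table.txt, and links.sed, the second hostname, and the link path MatchedPTRs/NAME.txt are all hypothetical.

```shell
tmp=$(mktemp -d)
# Hypothetical inputs: a list of PTRs and a table fragment that mentions them.
printf '%s\n' 'dhcp-163.net1.bg' 'host-7.example.net' > "$tmp/PTR-list.txt"
printf '%s\n' '<p>dhcp-163.net1.bg</p>' '<p>host-7.example.net</p>' > "$tmp/Table.txt"

# Write one sed s-command per PTR. Dots become [.] so they match only a dot;
# "|" is the delimiter, so the "/" in </p> and in the path needs no escaping.
awk '{
    pat = $1
    gsub(/\./, "[.]", pat)
    printf "s|<p>%s</p>|<p><a href=\"MatchedPTRs/%s.txt\">%s</a></p>|g\n", pat, $1, $1
}' "$tmp/PTR-list.txt" > "$tmp/links.sed"

# Apply every substitution in one pass over the table.
linked=$(sed -f "$tmp/links.sed" "$tmp/Table.txt")
printf '%s\n' "$linked"
```

Because the sed commands live in a file, bash never parses the HTML, so no "<", ">", or quote character needs protecting from the shell.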
I've made a copy of a portion of the developing webpage, which is just a table at this stage,
and that's the first attachment, presented as a text file. Then there are 33 more files with
the actual data that belong in a subdirectory named "MatchFilesMay2020". These are the ones
that all should already be linked in the table. Another set of 33 files with the quantity
portion of their filenames removed is also attached; they are a different set of data. They
belong in a different subfolder, "MatchedPTRs". The forum software has renamed all the domain-like
files with underscores; those may have to be reconciled with the webpage file to complete the
links.
George Langford
Was it necessary to attach dozens of files, about one megabyte of size?
Ignacio Agullo inquired:
Was it necessary to attach dozens of files, about one megabyte of size?
Yes; they're all different, with differing goals, impacts, patterns, and the like.
I'd also like to encourage others to attempt similar analyses. It's taking me a
couple of months to gather the data and put it into an order which can be examined
to find out why and how so many attacks are being made by servers located at
addresses which cannot be traced. These results show that they can be examined for
country of origin, degree of obfuscation, location of additional addresses, etc.
There are other months in the year; one person cannot possibly keep up with the task;
yet there are hundreds of folks picking up the traces left behind in the headers of
malicious messages; you can find out for yourselves by putting one of the PTR records
(a.k.a. hostnames) in an Internet search engine, enclosed in quotation marks, and then
gathering the IP addresses gleaned from malicious Internet traffic by the many folks
who monitor such traffic. That's another webpage like this excerpt that can be generated.
George Langford
Regarding the Table-html-excerpt.txt file:
After converting it back to HTML and trying out the links, I discovered an easily corrected
error in about two-thirds of them. In Leafpad the correction is to search for the string
"../../ScoreCards" and replace it with "../ScoreCards".
That should fix all the broken links.
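For anyone who prefers the command line to Leafpad, the same correction might be sketched with sed as follows. The sample line and the output file name Table-fixed.txt are hypothetical; only the two path strings come from the post above.

```shell
tmp=$(mktemp -d)
# Hypothetical one-line stand-in for the table file.
printf '%s\n' '<a href="../../ScoreCards/example.txt">example</a>' > "$tmp/Table-html-excerpt.txt"

# "|" is the delimiter so the slashes in the path can stay unescaped;
# the dots are escaped so they match only literal dots.
sed 's|\.\./\.\./ScoreCards|../ScoreCards|g' "$tmp/Table-html-excerpt.txt" > "$tmp/Table-fixed.txt"
fixed=$(cat "$tmp/Table-fixed.txt")
printf '%s\n' "$fixed"
```

Writing to a second file rather than editing in place keeps the original available for comparison.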
By the way: the linked files are all plain text, without any scripts, and contain no further
links to anywhere else. That said, you can test the names with "dig hostname" and the
IP addresses with "dig -x IPaddress". Many of the hostnames come back as being on the server
"92.242.140.21", which is a catchall address used by a fellow who maintains a site called
"barefruit error handling" or "unallocated.barefruit.co.uk", but that's not where these
ofttimes malicious servers are. "whois IPaddress" will tell you where they are located
and what their autonomous system number (ASN) is.
George Langford
These days, I do not have time to decipher your posts. Anyway, a pipe is missing in:
cat Target-0802A.txt sed 's/'peadhcp-163.net1.bgslashpea'/'Success'/g' > Target-0803A.txt
And it is equivalent to, simply:
sed 's/peadhcp-163.net1.bgslashpea/Success/g' Target-0802A.txt > Target-0803A.txt
And, as I have told you many times, a non-escaped dot in a regular expression means "any single character". As a consequence, it should probably be:
sed 's/peadhcp-163\.net1\.bgslashpea/Success/g' Target-0802A.txt > Target-0803A.txt
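A small sketch of what the escaping changes, using a made-up second line (dhcp-163Xnet1Ybg) that is not a real hostname and a hypothetical file name sample.txt:

```shell
tmp=$(mktemp -d)
printf '%s\n' 'dhcp-163.net1.bg' 'dhcp-163Xnet1Ybg' > "$tmp/sample.txt"

# Unescaped dots match any character, so both lines are rewritten.
loose=$(sed 's/dhcp-163.net1.bg/Success/' "$tmp/sample.txt")
# Escaped dots match only a literal dot, so only the first line is rewritten.
strict=$(sed 's/dhcp-163\.net1\.bg/Success/' "$tmp/sample.txt")

printf 'unescaped:\n%s\nescaped:\n%s\n' "$loose" "$strict"
```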
Also, there is no way you got that command line returning:
bash: p: No such file or directory
Bash tried here to execute a command named "p". Such a single "p" was certainly right after the prompt.
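Another reading would also produce this message, assuming the real command line held literal <p> and </p> where the post writes "pea" and "slashpea": the inner single quotes end the quoted string, so bash sees a bare <p and treats it as "read standard input from a file named p". A minimal sketch of that reading, run in a hypothetical scratch directory where no file named p exists:

```shell
tmp=$(mktemp -d)
printf '%s\n' '<p>dhcp-163.net1.bg</p>' > "$tmp/Target-0802A.txt"

# The quotes around <p>...</p> close and reopen the string, leaving the
# angle brackets bare. bash parses "<p" as an input redirection and gives
# up before sed ever runs.
msg=$(cd "$tmp" && bash -c "sed 's/'<p>dhcp-163.net1.bg</p>'/'Success'/g' Target-0802A.txt" 2>&1) || true
printf '%s\n' "$msg"
```

The exact prefix of the message varies between bash versions, but it ends in "p: No such file or directory".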
After learning how to count characters in bash:
https://linuxhint.com/length_of_string_bash/
I found out that the offending character in the sed expression is the forward slash preceding
the second "p" in the string to be replaced. The attached text file demonstrates this result.
I have yet to see whether or not my substitution will work in the real world.
Your corrections were a key factor in this small accomplishment; thanks again !
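For reference, sed accepts a delimiter other than "/" in the s command, which lets the forward slash in the closing tag stand as an ordinary character. A minimal sketch, assuming the target file holds a literal <p>dhcp-163.net1.bg</p> (the scratch directory is hypothetical):

```shell
tmp=$(mktemp -d)
printf '%s\n' '<p>dhcp-163.net1.bg</p>' > "$tmp/Target-0802A.txt"

# With "|" as the delimiter, the "/" in </p> needs no escaping, and the
# whole expression sits inside one pair of single quotes so bash never
# sees the angle brackets.
sed 's|<p>dhcp-163\.net1\.bg</p>|Success|g' "$tmp/Target-0802A.txt" > "$tmp/Target-0803A.txt"
cat "$tmp/Target-0803A.txt"
```

The alternative is to keep "/" as the delimiter and write the closing tag as <\/p>, which works but is harder to read.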
After a lot of tries, I decided simply to bypass sed's difficulty with the pesky slashpea code,
supplied as ExemplarScripts29.txt
I constructed this with a series of awk commands followed by the paste command, plus some editing
in Leafpad to replace troublesome stuff like "'s" and "g'" wherein I filled in the leading and
trailing "'" characters, and a space between the "a and href" that I plugged with "a--href".
Most troublesome is the "IPv4", which even "3334" hasn't fixed. Everything appears to be an unknown
option to s, which is where it stands now.
George Langford
Attachment | Size |
---|---|
MBdemonstration.txt | 151 bytes |
ExemplarScripts29.txt.txt | 698 bytes |
Trying another tack with awk ...
see: https://stackoverflow.com/questions/50244876/how-to-use-gsub-in-awk-to-find-and-replace-and-txt-characters-within
where it's said:
echo "./file_name.txt|1230" | awk '{gsub(/\.\/|\.txt/,"")}1'
file_name|1230
In the present task, taking just one exemplar PTR, see the attached file, which also shows bash's response.
The character that's flagged is the end-parenthesis, but that's part of the standard gsub syntax.
George Langford
Attachment | Size |
---|---|
exemplar-script-awk-gsub.txt | 239 bytes |
There are three non-escaped single quotes on this command line.
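The attached command line is not reproduced in the thread, so what follows is only a sketch of one way to keep the quoting balanced: the awk program stays inside a single pair of single quotes, and the HTML strings are handed over with -v. It uses index() and substr() rather than gsub(), so the search string is matched literally and neither dots nor slashes need escaping. The link path and the file name cell.txt are hypothetical.

```shell
tmp=$(mktemp -d)
old='<p>dhcp-163.net1.bg</p>'
new='<p><a href="../ScoreCards/dhcp-163.net1.bg.txt">dhcp-163.net1.bg</a></p>'
printf '%s\n' '<td><p>dhcp-163.net1.bg</p></td>' > "$tmp/cell.txt"

result=$(awk -v old="$old" -v new="$new" '
{
    out = ""
    # Copy text up to each literal match, append the replacement,
    # then continue with the remainder of the line.
    while ((i = index($0, old)) > 0) {
        out = out substr($0, 1, i - 1) new
        $0 = substr($0, i + length(old))
    }
    print out $0
}' "$tmp/cell.txt")
printf '%s\n' "$result"
```

One caveat: awk interprets backslash escapes in values passed with -v, so this form suits strings like these that contain no backslashes.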
In my forays into the mysteries of Fortran fifty years ago, mistakes in the coding often generated
error messages that had no discernible relation to the mistakes, but instead pointed to their
consequences. That's what I suspect is happening as a consequence of attempting to use the command
line to edit HTML.
The attached file has five versions of the offending script and the bash responses:
(1) Is the script that was the first that I tried, with escaped dots; ")" was flagged.
(2) All the dots are escaped; also the single quotes; "(" was flagged.
(3) Target file was altered to eliminate the p's; so was the script; ")" was flagged.
(4) Target file still altered to eliminate the p's; single quotes escaped; "(" was flagged.
(5) All the HTML-sensitive characters were translated; `/\<' (That's "<" - GL) was flagged.
Ref:https://stackoverflow.com/questions/12873682/short-way-to-escape-html-in-bash
George Langford
Attachment | Size |
---|---|
TrisquelScript-08072020.txt | 1.28 KB |