A sed script to replace one HTML string with a different one

8 replies [Last post]
amenex
Offline
Joined: 01/04/2015

Whereas Magic Banana's admonition to write HTML with a text editor is encouraging me
actually to do so, I've reached a degree of exasperation with my lack of geek-like
resources even though I've managed to produce a mostly working HTML page containing
the essence of my results.

That said, I'm not quite at the middle of another stage of the process, one in which
I'm adding links to another series of files containing additional pertinent data. I've
managed to replace about 400 of 900+ strings that display the multi-addressed PTRs in
the database with another (much longer) set of strings that contain the links to those
data files that contain additional detail about the domains that those PTRs are visiting.

The good news is that I created a set of scripts that extract the data from the master
database (which would fill the full width of the display and be 3500 rows deep) and place
it in a subdirectory. That task is complete and took just a few seconds after a day or
so of script-writing.

The present task runs afoul of the control strings necessary for HTML.

Here's the script that creates the actual string-replacement script:
awk '{print "cat "$5" sed '\''s/"$1$2$3"/"$4"/g'\'' '\>' "$6}' 'Source-04.txt' > Outcome-07.txt

where Source-04.txt is: 'pea dhcp-163.net1.bg slashpea' 'Success' Target-0802A.txt Target-0803A.txt

and Target-0802A.txt is: peadhcp-163.net1.bgslashpea

In order to comprehend this, you'll have to imagine "less than" p "greater than" wherever
you read "pea" and "forward slash" "less than" p "greater than" wherever you read "slashpea."

The script ensuing from the awk command as Outcome-07.txt is:
cat Target-0802A.txt sed 's/'peadhcp-163.net1.bgslashpea'/'Success'/g' > Target-0803A.txt

Which elicits the following complaint and fails to produce a modified Target-0803A.txt:
bash: p: No such file or directory

The files Target-0802A.txt and Target-0803A.txt both exist; Target-0803A.txt is empty in
order to protect Target-0802A.txt from obliteration by a non-functional script Outcome-07.txt.

The script Outcome-07.txt appears to be looking for an imaginary file ...

Once the script is satisfied, Target-0803A.txt should read "Success" but by then I can create
the true successor script by using awk to print all its component parts just like Source-04.txt,
but with more (and longer) components.

The last step is to replace $3 in the new Source-05.txt with the individual PTRs read from a
list of the actual remaining 500+ PTRs that are to be linked. That's a future scripting task
which can be trivial, as it has been before.

I've made a copy of a portion of the developing webpage, which is just a table at this stage,
and that's the first attachment, presented as a text file. Then there are 33 more files with
the actual data that belong in a subdirectory named "MatchFilesMay2020". These are the ones
that all should already be linked in the table. Another set of 33 files with the quantity
portion of their filenames removed is also attached; they are a different set of data. They
belong in a different subfolder, "MatchedPTRs". The forum software has renamed all the domain-like
files with underscores; those may have to be reconciled with the webpage file to complete the
links.

George Langford

Attachment  Size
Table-html-excerpt.txt  17.65 KB
ip-66-70-185.eu_.3.txt  89 bytes
default-rdns.vocus_.co_.nz_.18099.txt  696.82 KB
125.mtsnet.ru_.12.txt  345 bytes
ip-54-39-190.eu_.8.txt  239 bytes
dedic-center.ru_.116.txt  3.5 KB
123-51-215-0.ll_.static.sparqnet.net_.4.txt  201 bytes
ip-54-39-184.eu_.9.txt  267 bytes
dedicated.vsys_.host_.100.txt  3.34 KB
121.dhcp_.apogeetelecom.com_.7.txt  290 bytes
ip-54-39-179.eu_.3.txt  87 bytes
dedicated-assignments-only.fuse_.net_.27.txt  1.34 KB
120.mtsnet.ru_.6.txt  173 bytes
ip-54-39-178.eu_.5.txt  145 bytes
dc113.kdata_.vn_.9.txt  277 bytes
113.mtsnet.ru_.3.txt  87 bytes
ip-54-38-90.eu_.6.txt  164 bytes
dallas-tx-datacenter.serverpoint.com_.6.txt  302 bytes
111.mtsnet.ru_.6.txt  174 bytes
ip-54-38-43.eu_.6.txt  166 bytes
daimon.alastyr.com_.2.txt  60 bytes
111.14.103.jeruk1_.ats-com.net_.4.txt  179 bytes
ip-54-38-42.eu_.7.txt  194 bytes
dailytopoffer.com_.4.txt  136 bytes
109-198-197-x.dynamic.b-domolink.net_.16.txt  837 bytes
ip-54-38-41.eu_.6.txt  166 bytes
cust.uvtnet.cz_.123.txt  3.36 KB
109-198-192-x.dynamic.b-domolink.net_.6.txt  312 bytes
ip-54-38-40.eu_.8.txt  221 bytes
customer.vivid-hosting.net_.215.txt  8.92 KB
100.mtsnet.ru_.3.txt  84 bytes
ip-66-70-185.eu_.txt  515 bytes
default-rdns.vocus_.co_.nz_.txt  1.03 KB
125.mtsnet.ru_.txt  44 bytes
ip-54-39-190.eu_.txt  42 bytes
dedic-center.ru_.txt  84 bytes
123-51-215-0.ll_.static.sparqnet.net_.txt  126 bytes
ip-54-39-184.eu_.txt  41 bytes
dedicated.vsys_.host_.txt  971 bytes
121.dhcp_.apogeetelecom.com_.txt  57 bytes
ip-54-39-179.eu_.txt  44 bytes
dedicated-assignments-only.fuse_.net_.txt  61 bytes
120.mtsnet.ru_.txt  40 bytes
ip-54-39-178.eu_.txt  44 bytes
dc113.kdata_.vn_.txt  380 bytes
113.mtsnet.ru_.txt  39 bytes
ip-54-38-90.eu_.txt  587 bytes
dallas-tx-datacenter.serverpoint.com_.txt  63 bytes
111.mtsnet.ru_.txt  45 bytes
ip-54-38-43.eu_.txt  382 bytes
daimon.alastyr.com_.txt  136 bytes
111.14.103.jeruk1_.ats-com.net_.txt  55 bytes
ip-54-38-42.eu_.txt  461 bytes
dailytopoffer.com_.txt  129 bytes
109-198-197-x.dynamic.b-domolink.net_.txt  68 bytes
ip-54-38-41.eu_.txt  165 bytes
cust.uvtnet.cz_.txt  80 bytes
109-198-192-x.dynamic.b-domolink.net_.txt  62 bytes
ip-54-38-40.eu_.txt  255 bytes
customer.worldstream.nl_.txt  573 bytes
103.140.104-static.rdns_.serverhub.com_.txt  1.15 KB
ip-54-38-38.eu_.txt  245 bytes
customer.vivid-hosting.net_.txt  106 bytes
100.mtsnet.ru_.txt  40 bytes

Ignacio Agulló
Offline
Joined: 07/30/2019

Was it necessary to attach dozens of files, about one megabyte of size?

amenex
Offline
Joined: 01/04/2015

Ignacio Agullo inquired:
Was it necessary to attach dozens of files, about one megabyte of size?

Yes; they're all different, with differing goals, impacts, patterns, and the like.

I'd also like to encourage others to attempt similar analyses. It's taking me a
couple of months to gather the data and put it into an order which can be examined
to find out why and how so many attacks are being made by servers located at
addresses which cannot be traced. These results show that they can be examined for
country of origin, degree of obfuscation, location of additional addresses, etc.

There are other months in the year; one person cannot possibly keep up with the task;
yet there are hundreds of folks picking up the traces left behind in the headers of
malicious messages; you can find out for yourselves by putting one of the PTR records
(a.k.a. hostnames) in an Internet search engine, enclosed in quotation marks, and then
gathering the IP addresses gleaned from malicious Internet traffic by the many folks
who monitor such traffic. That's another webpage like this excerpt that can be generated.

George Langford

amenex
Offline
Joined: 01/04/2015

Regarding the Table-html-excerpt.txt file:

After converting it back to HTML and trying out the links, I discovered an easily corrected
error in about two-thirds of them. In Leafpad the correction is to search for the string
"../../ScoreCards" and replace it with "../ScoreCards".

That should fix all the broken links.
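
For anyone who would rather make that correction from the command line, here is a minimal sketch with sed. The sample row and the file name in it are made up for illustration; only the two ScoreCards strings come from the post.

```shell
# Hypothetical sample row; the real input would be the HTML version of
# Table-html-excerpt.txt.
sample='<a href="../../ScoreCards/example.txt">example</a>'

# "|" is used as sed's delimiter so the slashes need no escaping; the dots
# are escaped so that they match literal dots only.
fixed=$(printf '%s\n' "$sample" | sed 's|\.\./\.\./ScoreCards|../ScoreCards|g')
printf '%s\n' "$fixed"
```

Run against a file instead of a variable, the same sed expression would take the file name as its last argument, with the output redirected to a new file.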

By the way: The linked files are all plain text without any scripts and contain no more
links to anywhere else. That said, you can test the names with "dig hostname" and the
IP addresses with "dig -x IPaddress". Many of the hostnames come back as being on the server
"92.242.140.21", which is a catchall address used by a fellow who maintains a site called
"barefruit error handling" or "unallocated.barefruit.co.uk", but it's not where these
oftentimes malicious servers are. "whois IPaddress" will tell you where they are located
and what their autonomous system number (ASN) is.

George Langford

Magic Banana

Offline
Joined: 07/24/2010

These days, I do not have time to decipher your posts. Anyway, a pipe is missing in:
cat Target-0802A.txt sed 's/'peadhcp-163.net1.bgslashpea'/'Success'/g' > Target-0803A.txt
And it is equivalent to, simply:
sed 's/peadhcp-163.net1.bgslashpea/Success/g' Target-0802A.txt > Target-0803A.txt
And, as I have told you many times, a non-escaped dot in a regular expression means "any single character". As a consequence, it should probably be:
sed 's/peadhcp-163\.net1\.bgslashpea/Success/g' Target-0802A.txt > Target-0803A.txt
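
Because Outcome-07.txt is itself generated by awk, the generator could print this simpler form directly. The following is only a sketch under assumptions: the tab-separated layout, the file names Source-04.tsv and Outcome-07.sh, and the use of "|" as sed's delimiter are illustrative choices, and the pattern is written with the literal tags that the post spells as "pea" and "slashpea".

```shell
# Work in a scratch directory so no real file is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical tab-separated source: pattern, replacement, input file, output file.
printf '%s\t%s\t%s\t%s\n' '<p>dhcp-163.net1.bg</p>' 'Success' \
    'Target-0802A.txt' 'Target-0803A.txt' > Source-04.tsv
printf '<p>dhcp-163.net1.bg</p>\n' > Target-0802A.txt

# Escape the dots of the pattern, then print one complete sed command per row.
# \047 is awk's octal escape for a single quote; "|" serves as sed's delimiter.
awk -F '\t' '{
    pattern = $1
    gsub(/\./, "\\.", pattern)
    print "sed \047s|" pattern "|" $2 "|g\047 " $3 " > " $4
}' Source-04.tsv > Outcome-07.sh

cat Outcome-07.sh
bash Outcome-07.sh
cat Target-0803A.txt
```

This sketch would break if a pattern or replacement contained "|", "&", or a single quote, so those would need their own escaping.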

Also, there is no way you got that command line returning:
bash: p: No such file or directory
Bash tried here to execute a command named "p". Such a single "p" was certainly right after the prompt.
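
Another possible reading of that message, assuming the command actually typed contained the literal tags that the post spells as "pea" and "slashpea": the inner single quotes close and reopen the quoting, so the tag is left unquoted and bash parses its "less than" sign as an input redirection from a file named p. A small sketch that reproduces this in a scratch directory:

```shell
# Sketch only: work in an empty scratch directory so that no file named "p" exists.
cd "$(mktemp -d)" || exit 1
printf '<p>dhcp-163.net1.bg</p>\n' > Target-0802A.txt

# The inner quote pairs close and reopen the quoting, so <p> is left unquoted
# and bash reads "<p" as "take standard input from the file p".
unquoted=$(LC_ALL=C bash -c "sed 's/'<p>dhcp-163.net1.bg</p>'/'Success'/g' Target-0802A.txt" 2>&1) || true
printf '%s\n' "$unquoted"

# With one pair of quotes around the whole sed program, bash leaves <p> alone.
# "|" is used as the delimiter because the pattern itself contains a slash.
quoted=$(sed 's|<p>dhcp-163\.net1\.bg</p>|Success|g' Target-0802A.txt)
printf '%s\n' "$quoted"
```

The first command should fail with a complaint about p before sed ever runs; the second should print the replaced line.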

amenex
Offline
Joined: 01/04/2015

After learning how to count characters in bash:
https://linuxhint.com/length_of_string_bash/

I found out that the offending character in the sed expression is the forward slash preceding
the second "p" in the string to be replaced. The attached text file demonstrates this result.
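
The slash matters because sed's s command uses it as the delimiter between pattern, replacement, and flags, so an extra slash inside the pattern ends the pattern early. Two common ways around that, sketched with the literal tags that the post spells as "pea" and "slashpea":

```shell
line='<p>dhcp-163.net1.bg</p>'

# Option 1: escape the slash of </p> so it is not taken as the s/// delimiter.
escaped=$(printf '%s\n' "$line" | sed 's/<p>dhcp-163\.net1\.bg<\/p>/Success/g')
printf '%s\n' "$escaped"

# Option 2: pick another delimiter (here "|") so that no slash needs escaping.
other_delim=$(printf '%s\n' "$line" | sed 's|<p>dhcp-163\.net1\.bg</p>|Success|g')
printf '%s\n' "$other_delim"
```

Option 2 tends to be easier to read for HTML and file paths, provided the chosen delimiter does not occur in the pattern or the replacement.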

I have yet to see whether or not my substitution will work in the real world.

Your corrections were a key factor in this small accomplishment; thanks again!

After a lot of tries, I decided simply to bypass sed's difficulty with the pesky slashpea code,
supplied as ExemplarScripts29.txt

I constructed this with a series of awk commands followed by the paste command, plus some editing
in Leafpad to replace troublesome stuff like "'s" and "g'" wherein I filled in the leading and
trailing "'" characters, and a space between the "a and href" that I plugged with "a--href".

Most troublesome is the "IPv4" which even "3334" hasn't fixed. Everything appears to be an
"unknown option to s", which is where it stands now.

George Langford

Attachment  Size
MBdemonstration.txt  151 bytes
ExemplarScripts29.txt.txt  698 bytes

amenex
Offline
Joined: 01/04/2015

Trying another tack with awk ...

see: https://stackoverflow.com/questions/50244876/how-to-use-gsub-in-awk-to-find-and-replace-and-txt-characters-within
where it's said:
echo "./file_name.txt|1230" | awk '{gsub(/\.\/|\.txt/,"")}1'
file_name|1230

In the present task, taking just one exemplar PTR, see the attached file, which also shows bash's response.

The character that's flagged is the end-parenthesis, but that's part of the standard gsub syntax.

George Langford

Attachment  Size
exemplar-script-awk-gsub.txt  239 bytes

Magic Banana

Offline
Joined: 07/24/2010

There are three non-escaped single quotes on this command line.
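
A sketch of how the quoting could be kept balanced, assuming the goal is the same replacement as before: the whole awk program sits inside one pair of single quotes, and any single quote needed inside it is written as \047. The surrounding td tags in the sample line are made up for illustration.

```shell
line='<td><p>dhcp-163.net1.bg</p></td>'

# The whole awk program sits inside ONE pair of single quotes, so bash never
# sees the parentheses of gsub(). The slash of </p> is escaped because "/"
# also delimits the regular expression.
result=$(printf '%s\n' "$line" | awk '{ gsub(/<p>dhcp-163\.net1\.bg<\/p>/, "Success") } 1')
printf '%s\n' "$result"

# If the replacement itself needs a single quote, \047 (its octal code) avoids
# opening or closing the shell quoting by accident.
quoted=$(printf '%s\n' "$line" | awk '{ gsub(/<p>dhcp-163\.net1\.bg<\/p>/, "\047Success\047") } 1')
printf '%s\n' "$quoted"
```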

amenex
Offline
Joined: 01/04/2015

In my forays into the mysteries of Fortran fifty years ago, mistakes in the coding often generated
error messages that had no discernible relation to the mistakes, but instead pointed to their
consequences. That's what I suspect is happening as a consequence of attempting to use the command
line to edit HTML.

The attached file has five versions of the offending script and the bash responses:
(1) The first script that I tried, with escaped dots; ")" was flagged.
(2) All the dots are escaped, and so are the single quotes; "(" was flagged.
(3) The target file was altered to eliminate the p's, and so was the script; ")" was flagged.
(4) The target file is still altered to eliminate the p's; single quotes escaped; "(" was flagged.
(5) All the HTML-sensitive characters were translated; `/\&lt' (That's "<" - GL) was flagged.
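
One way to take bash's quoting out of the picture altogether, offered only as a sketch: put the sed command in its own file and run it with sed -f, so the shell never parses the HTML. The file name replace.sed is hypothetical, and the pattern uses the literal tags that the earlier posts spell as "pea" and "slashpea".

```shell
# Work in a scratch directory so no real file is touched.
cd "$(mktemp -d)" || exit 1
printf '<p>dhcp-163.net1.bg</p>\n' > Target-0802A.txt

# The quoted heredoc delimiter ('EOF') stops bash from interpreting anything
# between the two EOF lines, so the sed command is stored exactly as written.
cat > replace.sed <<'EOF'
s|<p>dhcp-163\.net1\.bg</p>|Success|g
EOF

sed -f replace.sed Target-0802A.txt > Target-0803A.txt
cat Target-0803A.txt
```

A script file like this can hold one s command per line, which may suit a list of several hundred PTRs better than one long command line.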

Ref: https://stackoverflow.com/questions/12873682/short-way-to-escape-html-in-bash

George Langford

Attachment  Size
TrisquelScript-08072020.txt  1.28 KB