A sed script to replace one HTML string with a different one

8 replies
Joined: 01/03/2015

Whereas Magic Banana's admonition to write HTML with a text editor is encouraging me
actually to do so, I've reached a degree of exasperation with my lack of geek-like
resources even though I've managed to produce a mostly working HTML page containing
the essence of my results.

That said, I'm not quite at the middle of another stage of the process, one in which
I'm adding links to another series of files containing additional pertinent data. I've
managed to replace about 400 of 900+ strings that display the multi-addressed PTRs in
the database with another (much longer) set of strings that contain the links to those
data files that contain additional detail about the domains that those PTRs are visiting.

The good news is that I created a set of scripts that extract the data from the master
database (which would fill the full width of the display and be 3500 rows deep) and place
it in a subdirectory. That task is complete and took just a few seconds after a day or
so of script-writing.

The present task runs afoul of the control strings necessary for HTML.

Here's the script that creates the actual string-replacement script:
awk '{print "cat "$5" sed '\''s/"$1$2$3"/"$4"/g'\'' '\>' "$6}' 'Source-04.txt' > Outcome-07.txt

where Source-04.txt is: 'pea dhcp-163.net1.bg slashpea' 'Success' Target-0802A.txt Target-0803A.txt

and Target-0802A.txt is: peadhcp-163.net1.bgslashpea

In order to comprehend this, you'll have to imagine "less than" p "greater than" wherever
you read "pea" and "forward slash" "less than" p "greater than" wherever you read "slashpea."

The script ensuing from the awk command as Outcome-07.txt is:
cat Target-0802A.txt sed 's/'peadhcp-163.net1.bgslashpea'/'Success'/g' > Target-0803A.txt

Which elicits the following complaint and fails to produce a modified Target-0803A.txt:
bash: p: No such file or directory

The files Target-0802A.txt and Target-0803A.txt both exist; Target-0803A.txt is an empty output
file, so that a non-functional Outcome-07.txt cannot obliterate Target-0802A.txt.

The script Outcome-07.txt appears to be looking for an imaginary file ...
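
One possible reading of that message, assuming "pea" really is a literal <p> in the generated script: the single quotes carried over from Source-04.txt end sed's quoted expression early, so bash sees a bare <p and tries to read input from a file named p. The sketch below reproduces this in a scratch directory. The file content is a stand-in rather than the real data, and the missing pipe has been added so that only the quoting is at issue.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for Target-0802A.txt with the real tags restored (an assumption).
printf '%s\n' '<p>dhcp-163.net1.bg</p>' > Target-0802A.txt

# The generated command, kept in a double-quoted string so this demo itself
# is safe. When bash runs it, the inner single quotes close the quoting,
# and "<p" is parsed as "redirect input from a file named p".
broken="cat Target-0802A.txt | sed 's/'<p>dhcp-163.net1.bg</p>'/'Success'/g'"

# Run it in a child bash and keep only the error text.
err=$(LC_ALL=C bash -c "$broken" 2>&1 >/dev/null || true)
printf 'bash reports: %s\n' "$err"
```

If that reading is right, the complaint comes from bash before sed ever starts, which would explain why it names a file rather than a sed problem.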

Once the script is satisfied, Target-0803A.txt should read "Success" but by then I can create
the true successor script by using awk to print all its component parts just like Source-04.txt,
but with more (and longer) components.

The last step is to replace $3 in the new Source-05.txt with the individual PTRs read from a
list of the actual remaining 500+ PTRs that are to be linked. That's a future scripting task
which can be trivial, as it has been before.
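
A minimal sketch of how that last step might look, assuming the PTRs sit one per line in a list file and appear in the table between <p> and </p>. The names PTR-list.txt, Table.html, and Link-edits.sed, and the link layout MatchedPTRs/<PTR>.txt, are all hypothetical. Instead of one cat-and-sed line per PTR, awk writes a file of sed commands that is applied in a single pass.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs: a list of PTRs and a two-line excerpt of the table.
printf '%s\n' 'dhcp-163.net1.bg' '125.mtsnet.ru' > PTR-list.txt
printf '%s\n' '<p>dhcp-163.net1.bg</p>' '<p>125xmtsnet.ru</p>' > Table.html

# For each PTR, print one sed command. The copy used as the pattern gets its
# dots escaped; the copies used in the replacement are left verbatim.
# '|' is the delimiter, so the slashes in tags and paths need no escaping.
awk '{
    pattern = $0
    gsub(/\./, "\\.", pattern)
    printf "s|<p>%s</p>|<p><a href=\"MatchedPTRs/%s.txt\">%s</a></p>|g\n", pattern, $0, $0
}' PTR-list.txt > Link-edits.sed

# Apply every generated command in one pass; Table.html itself is untouched.
sed -f Link-edits.sed Table.html > Table-linked.html
cat Table-linked.html
```

Because the sed commands live in a file, bash never parses the angle brackets or quotes in them. The sketch assumes that no PTR contains "|" or "&", which holds for ordinary hostnames.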

I've made a copy of a portion of the developing webpage, which is just a table at this stage,
and that's the first attachment, presented as a text file. Then there are 33 more files with
the actual data that belong in a subdirectory named "MatchFilesMay2020". These are the ones
that all should already be linked in the table. Another set of 33 files with the quantity
portion of their filenames removed is also attached; they are a different set of data. They
belong in a different subfolder, "MatchedPTRs". The forum software has renamed all the domain-like
files with underscores; those may have to be reconciled with the webpage file to complete the

George Langford

Table-html-excerpt.txt 17.65 KB
ip-66-70-185.eu_.3.txt 89 bytes
default-rdns.vocus_.co_.nz_.18099.txt 696.82 KB
125.mtsnet.ru_.12.txt 345 bytes
ip-54-39-190.eu_.8.txt 239 bytes
dedic-center.ru_.116.txt 3.5 KB
123-51-215-0.ll_.static.sparqnet.net_.4.txt 201 bytes
ip-54-39-184.eu_.9.txt 267 bytes
dedicated.vsys_.host_.100.txt 3.34 KB
121.dhcp_.apogeetelecom.com_.7.txt 290 bytes
ip-54-39-179.eu_.3.txt 87 bytes
dedicated-assignments-only.fuse_.net_.27.txt 1.34 KB
120.mtsnet.ru_.6.txt 173 bytes
ip-54-39-178.eu_.5.txt 145 bytes
dc113.kdata_.vn_.9.txt 277 bytes
113.mtsnet.ru_.3.txt 87 bytes
ip-54-38-90.eu_.6.txt 164 bytes
dallas-tx-datacenter.serverpoint.com_.6.txt 302 bytes
111.mtsnet.ru_.6.txt 174 bytes
ip-54-38-43.eu_.6.txt 166 bytes
daimon.alastyr.com_.2.txt 60 bytes
111.14.103.jeruk1_.ats-com.net_.4.txt 179 bytes
ip-54-38-42.eu_.7.txt 194 bytes
dailytopoffer.com_.4.txt 136 bytes
109-198-197-x.dynamic.b-domolink.net_.16.txt 837 bytes
ip-54-38-41.eu_.6.txt 166 bytes
cust.uvtnet.cz_.123.txt 3.36 KB
109-198-192-x.dynamic.b-domolink.net_.6.txt 312 bytes
ip-54-38-40.eu_.8.txt 221 bytes
customer.vivid-hosting.net_.215.txt 8.92 KB
100.mtsnet.ru_.3.txt 84 bytes
ip-66-70-185.eu_.txt 515 bytes
default-rdns.vocus_.co_.nz_.txt 1.03 KB
125.mtsnet.ru_.txt 44 bytes
ip-54-39-190.eu_.txt 42 bytes
dedic-center.ru_.txt 84 bytes
123-51-215-0.ll_.static.sparqnet.net_.txt 126 bytes
ip-54-39-184.eu_.txt 41 bytes
dedicated.vsys_.host_.txt 971 bytes
121.dhcp_.apogeetelecom.com_.txt 57 bytes
ip-54-39-179.eu_.txt 44 bytes
dedicated-assignments-only.fuse_.net_.txt 61 bytes
120.mtsnet.ru_.txt 40 bytes
ip-54-39-178.eu_.txt 44 bytes
dc113.kdata_.vn_.txt 380 bytes
113.mtsnet.ru_.txt 39 bytes
ip-54-38-90.eu_.txt 587 bytes
dallas-tx-datacenter.serverpoint.com_.txt 63 bytes
111.mtsnet.ru_.txt 45 bytes
ip-54-38-43.eu_.txt 382 bytes
daimon.alastyr.com_.txt 136 bytes
111.14.103.jeruk1_.ats-com.net_.txt 55 bytes
ip-54-38-42.eu_.txt 461 bytes
dailytopoffer.com_.txt 129 bytes
109-198-197-x.dynamic.b-domolink.net_.txt 68 bytes
ip-54-38-41.eu_.txt 165 bytes
cust.uvtnet.cz_.txt 80 bytes
109-198-192-x.dynamic.b-domolink.net_.txt 62 bytes
ip-54-38-40.eu_.txt 255 bytes
customer.worldstream.nl_.txt 573 bytes
103.140.104-static.rdns_.serverhub.com_.txt 1.15 KB
ip-54-38-38.eu_.txt 245 bytes
customer.vivid-hosting.net_.txt 106 bytes
100.mtsnet.ru_.txt 40 bytes
Ignacio Agulló
Joined: 07/30/2019

Was it necessary to attach dozens of files, about one megabyte of size?

Joined: 01/03/2015

Ignacio Agullo inquired:
Was it necessary to attach dozens of files, about one megabyte of size?

Yes; they're all different, with differing goals, impacts, patterns, and the like.

I'd also like to encourage others to attempt similar analyses. It's taking me a
couple of months to gather the data and put it into an order which can be examined
to find out why and how so many attacks are being made by servers located at
addresses which cannot be traced. These results show that they can be examined for
country of origin, degree of obfuscation, location of additional addresses, etc.

There are other months in the year; one person cannot possibly keep up with the task;
yet there are hundreds of folks picking up the traces left behind in the headers of
malicious messages; you can find out for yourselves by putting one of the PTR records
(a.k.a. hostnames) in an Internet search engine, enclosed in quotation marks, and then
gathering the IP addresses gleaned from malicious Internet traffic by the many folks
who monitor such traffic. That's another webpage like this excerpt that can be generated.

George Langford

Joined: 01/03/2015

Regarding the Table-html-excerpt.txt file:

After converting it back to HTML and trying out the links, I discovered an easily corrected
error in about two-thirds of them. In Leafpad the correction is to search for the string
"../../ScoreCards" and replace it with "../ScoreCards".

That should fix all the broken links.
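
The same correction can be sketched with sed instead of Leafpad. The file names and the sample link below are hypothetical stand-ins for the converted excerpt; only the two path strings come from the post.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical one-line stand-in for the converted Table-html-excerpt file.
printf '%s\n' '<a href="../../ScoreCards/example.txt">example</a>' > Table-excerpt.html

# '|' is the delimiter, so the slashes in the path are ordinary characters.
# The dots in the pattern are escaped so that they match only literal dots.
sed 's|\.\./\.\./ScoreCards|../ScoreCards|g' Table-excerpt.html > Table-fixed.html
cat Table-fixed.html
```

Writing to a second file keeps the original intact until the result has been checked.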

By the way: The linked files are all plain text without any scripts and contain no more
links to anywhere else. That said, you can test the names with "dig hostname" and the
IP addresses with "dig -x IPaddress". Many of the hostnames come back as on the server,
"" which is a catchall address used by a fellow who maintains a site called
"barefruit error handling" or "unallocated.barefruit.co.uk" but it's not where these
ofttimes malicious servers are. "whois IPaddress" will tell you where they are located
and what their autonomous system number (ASN) is.

George Langford

Magic Banana
Joined: 07/24/2010

These days, I do not have time to decipher your posts. Anyway, a pipe is missing in:
cat Target-0802A.txt sed 's/'peadhcp-163.net1.bgslashpea'/'Success'/g' > Target-0803A.txt
And it is equivalent to, simply:
sed 's/peadhcp-163.net1.bgslashpea/Success/g' Target-0802A.txt > Target-0803A.txt
And, as I have told you many times, a non-escaped dot in a regular expression means "any single character". As a consequence, it should probably be:
sed 's/peadhcp-163\.net1\.bgslashpea/Success/g' Target-0802A.txt > Target-0803A.txt
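
The effect of the unescaped dots can be seen in a small sketch. The second sample line is an invented look-alike with "x" where the dots should be.

```shell
# Two sample lines: the real hostname and an invented look-alike.
sample='dhcp-163.net1.bg
dhcp-163xnet1xbg'

# Unescaped dots match any single character, so both lines are rewritten.
loose=$(printf '%s\n' "$sample" | sed 's/dhcp-163.net1.bg/Success/g')

# Escaped dots match only literal dots, so the look-alike is left alone.
strict=$(printf '%s\n' "$sample" | sed 's/dhcp-163\.net1\.bg/Success/g')

printf 'unescaped:\n%s\nescaped:\n%s\n' "$loose" "$strict"
```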

Also, there is no way you got that command line returning:
bash: p: No such file or directory
Bash tried here to execute a command named "p". Such a single "p" was certainly right after the prompt.

Joined: 01/03/2015

After learning how to count characters in bash, I found out that the offending character in
the sed expression is the forward slash preceding the second "p" in the string to be replaced.
The attached text file demonstrates this result.
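
Two common ways around that slash are sketched below, assuming the line really contains <p> and </p>. With "/" as the delimiter, an unescaped slash in </p> ends the pattern early, and sed then reads the leftover text as flags, which is where a complaint about an unknown option to s comes from.

```shell
# Stand-in line with the real tags restored (an assumption about the data).
line='<p>dhcp-163.net1.bg</p>'

# Option 1: escape the slash so sed does not take it as the end of the pattern.
escaped=$(printf '%s\n' "$line" | sed 's/<p>dhcp-163\.net1\.bg<\/p>/Success/g')

# Option 2: use another delimiter; any character absent from the text will do.
piped=$(printf '%s\n' "$line" | sed 's|<p>dhcp-163\.net1\.bg</p>|Success|g')

printf '%s\n%s\n' "$escaped" "$piped"
```

The second form tends to be easier to read once the replacement also contains paths or closing tags.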

I have yet to see whether or not my substitution will work in the real world.

Your corrections were a key factor in this small accomplishment; thanks again!

After a lot of tries, I decided simply to bypass sed's difficulty with the pesky slashpea code,
supplied as ExemplarScripts29.txt

I constructed this with a series of awk commands followed by the paste command, plus some editing
in Leafpad to replace troublesome stuff like "'s" and "g'" wherein I filled in the leading and
trailing "'" characters, and a space between the "a and href" that I plugged with "a--href".

Most troublesome is the "IPv4" which even "3334" hasn't fixed. Everything appears to be an
"unknown option to s", which is where it stands now.

George Langford

MBdemonstration.txt 151 bytes
ExemplarScripts29.txt.txt 698 bytes
Joined: 01/03/2015

Trying another tack with awk ...

see: https://stackoverflow.com/questions/50244876/how-to-use-gsub-in-awk-to-find-and-replace-and-txt-characters-within
where it's said:
echo "./file_name.txt|1230" | awk '{gsub(/\.\/|\.txt/,"")}1'
file_name|1230

In the present task, taking just one exemplar PTR, see the attached file, which also shows bash's response.

The character that's flagged is the end-parenthesis, but that's part of the standard gsub syntax.

George Langford

exemplar-script-awk-gsub.txt 239 bytes
Magic Banana
Joined: 07/24/2010

There are three non-escaped single quotes on this command line.
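
The attached command line is not reproduced here, so the following is only a sketch of the general shape, assuming the goal is still to replace <p>dhcp-163.net1.bg</p> with Success. The whole awk program sits inside a single pair of single quotes, and no other single quote appears within it.

```shell
# Stand-in line with the real tags restored (an assumption about the data).
line='<p>dhcp-163.net1.bg</p>'

# One pair of single quotes around the whole program keeps bash away from the
# angle brackets and parentheses. Inside the /.../ regex, the slash of </p>
# is escaped, and the dots are escaped so that they match literally.
result=$(printf '%s\n' "$line" |
    awk '{ gsub(/<p>dhcp-163\.net1\.bg<\/p>/, "Success") } 1')
printf '%s\n' "$result"

# If the output itself needs single quotes, \047 (the octal code for ')
# can stand in for them without ending the shell quoting.
quoted=$(printf '%s\n' "$line" |
    awk '{ gsub(/<p>.*<\/p>/, "\047Success\047") } 1')
printf '%s\n' "$quoted"
```

An odd number of single quotes on a command line leaves bash waiting for the closing one, or parsing part of the awk program as shell syntax, so counting them is a quick first check.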

Joined: 01/03/2015

In my forays into the mysteries of Fortran fifty years ago, mistakes in the coding often generated
error messages that had no discernible relation to the mistakes, but instead pointed to their
consequences. That's what I suspect is happening as a consequence of attempting to use the command
line to edit HTML.

The attached file has five versions of the offending script and the bash responses:
(1) Is the script that was the first that I tried, with escaped dots; ")" was flagged.
(2) All the dots are escaped; also the single quotes; "(" was flagged.
(3) Target file was altered to eliminate the p's; so was the script; ")" was flagged.
(4) Target file still altered to eliminate the p's; single quotes escaped; "(" was flagged.
(5) All the HTML-sensitive characters were translated; `/\&lt' (That's "<" - GL) was flagged.
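
The attachment is not reproduced here, so the cause of the fifth message is unknown. One thing worth ruling out when entities such as &lt; appear in a sed replacement is that an unescaped & there stands for the whole matched text. A minimal sketch, using a stand-in line:

```shell
# Stand-in line with the real tags restored (an assumption about the data).
line='<p>dhcp-163.net1.bg</p>'

# Unescaped: the & in the replacement is swapped for the matched text "<".
unescaped=$(printf '%s\n' "$line" | sed 's|<|&lt;|g')

# Escaped: \& is a literal ampersand, giving the entity that was intended.
escaped=$(printf '%s\n' "$line" | sed 's|<|\&lt;|g')

printf '%s\n%s\n' "$unescaped" "$escaped"
```

Both commands keep the entire sed expression inside one pair of single quotes, so bash does not treat the & as a request to run something in the background.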


George Langford

TrisquelScript-08072020.txt 1.28 KB