Remove BOM from UTF-8 files

8 replies
Sabrinakitty
Joined: 06/17/2020

Hello
BOM: Byte order mark
Wikipedia article: https://en.wikipedia.org/wiki/Byte_order_mark
The UTF-8 representation of the BOM is the (hexadecimal) byte sequence 0xEF,0xBB,0xBF.
The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use.
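To check whether a given file really starts with those three bytes, its first bytes can be dumped in hexadecimal. This is a minimal sketch that assumes GNU coreutils (`head -c`, `od`). `has_bom` is a hypothetical helper name, not an existing command.

```bash
#!/bin/bash
# has_bom FILE: exit status 0 if FILE begins with the UTF-8 BOM (EF BB BF).
# Hypothetical helper; assumes head -c and od from GNU coreutils.
has_bom() {
    local first
    # Read the first 3 bytes, print them as hex, drop spaces and newlines.
    first=$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "efbbbf" ]
}
```

Usage: `has_bom textUTF8BOM.txt && echo "starts with a BOM"`. An empty or shorter file simply yields a non-matching string, so the function returns a failure status for it.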

I have 700 log files copied from a Windows PC, all of them encoded in UTF-8 with a BOM. I need to remove the BOM (0xEF,0xBB,0xBF) from the beginning of each file. How can I accomplish this in Trisquel? I'm attaching a sample text file encoded in UTF-8 with a BOM.

Thank you!

Attachment: textUTF8BOM.txt (363 bytes)
loldier
Joined: 02/17/2016

That line seems to work as intended. Now we need Magic Banana to tell us how to process the remaining 699 log files automatically.

Attachments: sample_log01.png, sample_log02.png
Malsasa
Joined: 12/01/2016

Can't we just use a loop for that? For example, I use this command to process multiple files. It deletes every line containing a hash sign (#) from each file in the current directory whose name matches file*.txt. What do you think?

for filename in file*.txt; do sed -i '/#/d' $filename; done

loldier
Joined: 02/17/2016

No doubt a for loop would do the trick.

https://en.wikipedia.org/wiki/For_loop

Magic Banana
Joined: 07/24/2010

That should work. Of course, loldier's sed command must replace the one that deletes every line containing "#". Also, writing "$filename" instead of $filename allows the file name to contain characters such as spaces, which have a "special" meaning for the shell: unquoted, a name with spaces would be split into several arguments.
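Combining the loop with a BOM-stripping sed command and a quoted file name gives something like the sketch below. The exact command loldier posted is not reproduced in this thread, so this substitutes a commonly used expression. It assumes GNU sed (for `-i` and the `\xHH` escapes in the regex); `strip_bom_dir` is a hypothetical name, and `LC_ALL=C` makes sed compare raw bytes whatever the user's locale is.

```bash
#!/bin/bash
# strip_bom_dir DIR: remove a leading UTF-8 BOM from every *.log file in DIR.
# Hypothetical helper; assumes GNU sed (-i in place, \xHH escapes in regex).
strip_bom_dir() {
    local f
    for f in "$1"/*.log; do
        # If no file matches, the pattern stays literal: skip it.
        [ -e "$f" ] || continue
        # Only line 1 is edited, and ^ anchors the match at its very start.
        LC_ALL=C sed -i '1s/^\xEF\xBB\xBF//' "$f"
    done
}
```

Usage, with an assumed location for the logs: `strip_bom_dir "$HOME/Documents/bom"`. Files that have no BOM are rewritten unchanged.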

loldier
Joined: 02/17/2016

There's a utility called bomstrip ("tool to strip Byte-Order Marks from UTF-8 text files"):

sudo apt install bomstrip

https://www.ueber.net/who/mjl/projects/bomstrip/

http://muzso.hu/2011/11/08/using-awk-sed-to-detect-remove-the-byte-order-mark-bom
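If installing an extra package is not an option, the BOM can also be stripped without sed: check the first three bytes and, if they match, keep everything from the fourth byte on. This is a sketch assuming GNU coreutils (`head -c`, `tail -c`, `od`). `strip_bom_file` is a hypothetical helper name, and it rewrites each file through a temporary copy.

```bash
#!/bin/bash
# strip_bom_file FILE: drop the first 3 bytes of FILE if they are the
# UTF-8 BOM (EF BB BF). Hypothetical helper; no sed involved.
strip_bom_file() {
    local f=$1 tmp
    # Hex-dump the first 3 bytes; leave the file alone unless they are EF BB BF.
    if [ "$(head -c 3 -- "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 prints the file starting at its 4th byte.
        # Copying back with cat (instead of mv) keeps the file's permissions.
        tail -c +4 -- "$f" > "$tmp" && cat -- "$tmp" > "$f"
        rm -f -- "$tmp"
    fi
}
```

Usage: `for f in "$HOME"/Documents/bom/*.log; do strip_bom_file "$f"; done`, where the directory is again an assumed path.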

Sabrinakitty
Joined: 06/17/2020

Thanks, everyone!
This code did the job:
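#!/bin/bash
# Strip a leading UTF-8 BOM (U+FEFF) from every .log file in ~/Documents/bom.
# Assumes bash 4.2+ and a UTF-8 locale, so $'\uFEFF' expands to the BOM bytes.
for filename in "$HOME"/Documents/bom/*.log; do
    sed -i $'1s/^\uFEFF//' "$filename"
done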
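To confirm that the loop caught every file, one can list the .log files that still contain a line beginning with the BOM bytes. This is a sketch assuming GNU grep; `list_bom_files` is a hypothetical name. Running grep in the C locale makes it match the three bytes one by one.

```bash
#!/bin/bash
# list_bom_files DIR: print the *.log files in DIR that still have a line
# starting with the UTF-8 BOM (normally only line 1 could have one).
# Hypothetical helper; assumes GNU grep.
list_bom_files() {
    local bom
    # Build the 3-byte BOM with octal escapes (portable printf).
    bom=$(printf '\357\273\277')
    LC_ALL=C grep -l -- "^$bom" "$1"/*.log
}
```

After a successful cleanup, `list_bom_files "$HOME/Documents/bom"` should print nothing (grep then exits with status 1).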

Malsasa
Joined: 12/01/2016

Amazing. Glad to know that worked! Thanks to lcerf and enduzzer for explaining my command; I learned something new.