GNU tar observations

6 replies [Latest post]
Avron

I am a translator!

Online
Joined: 08/18/2020

I'd like to use GNU tar for backup to a disk connected via USB2 and I did a number of tries on a "small" directory with 5718 files, total size 15 GB.

I get the size by "du -sh ." which also matches with the resulting tar file.

I did several tries; the time taken, as reported by time, varied from 9 minutes 19 seconds to 9 minutes 49 seconds (I am not running anything else, but I do have a number of programs open, like evolution, gajim and the quassel client).

This seems to mean that the speed is about 15 000 MB / 600 s = 25 MB/s. "sudo hdparm -t" gives a read speed of around 10 MB/s for this disk, so I find it surprising that the write speed would actually be higher. Or could the disk notice that it is writing the same data as the previous tar file and be faster because of that?

Besides, I noticed that, during the copy, GNU tar seems to pause for slightly less or slightly more than 1 minute (I saw roughly 57 s to 70 s). By pauses I mean that "ls -l" on the tar file shows no increase; if I use checkpoints to show progress, the checkpoint display stops and, when it resumes, the size copied has not increased by more than one step; and if I use --totals=USR1 and send signals, nothing is displayed during the pause. So GNU tar seems to be totally suspended. Whether or not I use these options, the total time is in the same range.
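For reference, a checkpoint/totals invocation of that kind would look roughly like this (archive path, source directory and checkpoint interval are placeholders, not the exact ones used):

tar --create --file=/media/usbdisk/test.tar \
    --checkpoint=1000 --checkpoint-action=echo \
    --totals=USR1 photos

# from another terminal, sending SIGUSR1 makes tar print the bytes written so far
kill -USR1 $(pidof tar)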

Any clue why the write speed is apparently higher than expected and why GNU tar makes such pauses?

Any suggestion on tar to make the backup faster (I want to copy about 350 GB) is also welcome.

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

If a large proportion of the data is not already compressed (notice that picture, sound, video and LibreOffice files are usually already compressed), you can gain speed and space by compressing with GNU Gzip or zstd (install the eponymous package first). GNU tar uses them when called with the -z and --zstd options, respectively. Algorithms providing higher compression ratios would make the CPU the bottleneck.
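For instance, assuming the zstd package is installed, the two variants would look roughly like this (archive names and paths are placeholders):

tar -czf backup.tar.gz /path/to/directory       # gzip
tar --zstd -cf backup.tar.zst /path/to/directory    # zstd, usually faster for a similar ratio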

Avron

I am a translator!

Online
Joined: 08/18/2020

Thanks, but what takes the largest amount of space (photos) cannot be compressed much. I had tried with --gzip: the size was slightly reduced, but it took nearly 50% longer.

I tried tar to an internal disk instead (the data to archive are on a RAID volume, managed with mdadm, made of internal SATA HDDs; the archive goes to an internal SATA SSD), and in that case GNU tar does not pause at all. So the pauses only occur with the external disk connected via USB2.

Ark74

I am a member!

I am a translator!

Online
Joined: 07/15/2009

Do you need to archive your files in a tar file?

For large and "fast" transfers ("backups" if you will), I prefer to use rsync.
Also, using internal SATA connectors will give you better performance than USB.

You might want to check if your USB connection has UASP support [1] to make sure you have the best performance; otherwise you might have a low-speed bus bottleneck.
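For example, one way to check (the output will of course differ on your system) is to look at which kernel driver the disk's USB interface uses:

lsusb -t
# "Driver=uas" means UASP is in use;
# "Driver=usb-storage" means the older Bulk-Only Transport, which is slower.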

Cheers! o/

[1] https://en.wikipedia.org/wiki/USB_Attached_SCSI

Avron

I am a translator!

Online
Joined: 08/18/2020

My internal SATA connectors are all already in use and I have no spare PCIe slot; I guess I could remove my sound card and add one (more) PCIe SATA adapter.

I had not thought of rsync for a local copy; if it works for that too, I can use it. Thanks for the suggestion.

Do you have any suggestion about which options to use to do an exact copy (e.g. keep all dates, owner, group, symlinks)?

I have already used "rsync -a" to copy a directory and its contents to a remote computer, and it preserved dates. I had also tried "scp -pr": the manual says "-p" will preserve dates, but when I tried it, it did not. That made me a bit worried about the manual.

Ark74

I am a member!

I am a translator!

Online
Joined: 07/15/2009

rsync -Aax will do.
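Spelled out, with example paths (the trailing slash on the source copies its contents rather than the directory itself):

# -a  archive mode: keeps symlinks, permissions, times, owner and group
# -A  also keeps ACLs
# -x  stays on one filesystem (does not cross mount points)
rsync -Aax /path/to/source/ /media/usbdisk/backup/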

Avron

I am a translator!

Online
Joined: 08/18/2020

Thanks, it works well!