Backing up /home dir, when permission is unimportant

8 replies [Latest post]
nadebula.1984
Offline
Joined: 05/01/2018

If I use tar (in combination with a compression program such as lz4 or zstd), extracting individual files can be extremely inefficient. Opening a tar archive several TB in size takes hours; skipping past the unwanted files takes several more.
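The problem is that a compressed tar stream is solid: to reach one member, tar must decompress and read everything before it. A tiny sketch (gzip stands in for lz4/zstd, and all file names are made up for the demo):

```shell
# Build a small solid archive, then pull one member back out.
mkdir -p demo/home
echo "hello" > demo/home/notes.txt
echo "bulk"  > demo/home/bulk.dat
tar -czf backup.tar.gz -C demo home
# Even for one file, tar streams through the whole compressed archive;
# -O writes the member to stdout instead of recreating it on disk.
tar -xzf backup.tar.gz home/notes.txt -O
```

On a multi-TB archive, that sequential scan is exactly the multi-hour wait described above.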

Therefore, when permissions are unimportant, I'd like to try archiving/compressing programs that support non-solid archives, such as 7zz (the officially released 7-Zip for GNU/Linux).

Another problem is symbolic links. My /home directory doesn't contain any at the moment, but I may need them in the future. Therefore I'd like to find an archiver/compressor that fully supports *nix file-system features as well as (at least partially) non-solid archiving/compression.

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

Isn't GNU zip (which is fast) compression enough? If you mostly have large text files (the typical case where higher compression ratios can be obtained) rather than already-compressed data (movies, music, pictures — most of /home's usage on most desktop systems, I believe), you could compress them on your system rather than only in the backup. If you process those text files from the terminal, you can start your command line with zcat, bzcat or xzcat (depending on the algorithm used; XZ provides the highest compression ratio of the three), or with zgrep, bzgrep or xzgrep if your processing starts by selecting lines with a regular expression.
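A minimal sketch of that kind of pipeline (the file name is invented for the demo):

```shell
# Work on a compressed text file without ever keeping a decompressed copy.
printf 'alpha\nbeta\nalpha\n' > log.txt
gzip -f log.txt                  # replaces log.txt with log.txt.gz
zcat log.txt.gz | wc -l          # stream-decompress into any pipeline
zgrep -c '^alpha' log.txt.gz     # or start directly with a regex selection
```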

Also, with regular backups (and backups should be regular!), most of the space to gain comes from not backing up what has not been modified since the last backup. Back In Time, installed by default in Trisquel, does that through hard links.
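The hard-link trick can be seen in isolation (directory names invented for the demo):

```shell
# Two snapshot directories sharing one unmodified file via a hard link,
# the technique Back In Time relies on: the data is stored only once.
mkdir -p snap1 snap2
echo "unchanged" > snap1/file.txt
ln snap1/file.txt snap2/file.txt    # second name, same inode, no extra data
[ snap1/file.txt -ef snap2/file.txt ] && echo "same inode"
```

rsync offers the same idea for whole snapshot trees through its --link-dest option.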

nadebula.1984
Offline
Joined: 05/01/2018

tar was and still is good; it merely lacks flexibility when it comes to large archives (TB in size). However, if a program creates non-solid compressed archives, it has to do both the archiving and the compressing, which some consider incompatible with the Unix philosophy. See the "Solid compression" article on Wikipedia.

And those archive formats mainly designed for Losedows are not suitable either, because they are very unlikely to support *nix file-system features. Again, such an archive format has to be free/open and have a free/libre software implementation.

Therefore I found two formats promising: FreeArc and dar. The former has been discontinued for a long time, so I learned to use dar. Its documentation says that files are compressed individually, meaning the archive is non-solid, which can provide much more flexibility than tar.

One downside is that dar supports very few compression algorithms; among them, the "-zlzop-1" option seems the most useful for me.
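For the record, a hedged sketch of such a dar run (all names invented for the demo; I assume dar's documented -c/-x/-R/-g/-Q options and the lzop-1 compression spec quoted above):

```shell
# Skipped politely if dar is not installed.
command -v dar >/dev/null 2>&1 || { echo "dar not installed"; exit 0; }
mkdir -p src restore
echo "hello" > src/notes.txt
# Non-solid archive: dar compresses each file individually, so a single
# file can be restored later without streaming the whole archive.
dar -c home_backup -R src -zlzop-1 -Q
# Restore one file only (path relative to the -R root at creation time):
dar -x home_backup -R restore -g notes.txt -Q
cat restore/notes.txt
```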

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

You still do not explain what occupies your /home. I repeat: if, like on most desktop systems, it mostly contains compressed pictures (JPEG, PNG, etc.), compressed music (Ogg, FLAC, etc.), compressed videos (WebM, H.264, etc.), and so on, then there is essentially no space to gain by compressing the archive containing those files with a generic algorithm. If your /home contains large plain-text files, then what is the reason not to compress them separately on the system?

nadebula.1984
Offline
Joined: 05/01/2018

The fact that I want a quick compression program implies that at least part of my /home contents is compressible. And even if such a utility only marginally reduces the total size of the archive (e.g. from 1 TB to 900 GB) while having no observable impact on total transfer time (i.e., it is smart enough to avoid wasting time on file types that are already compressed), it is still desirable.

Therefore I'd like to try "dar -zlzop-1" and see what happens. LZO is not as efficient as lz4 or zstd, but the latter two are not yet supported by dar.

gzip is too slow for me, whereas xz is, in my view, a poorly designed format, structurally flawed and unsuitable for long-term archiving. If you want a higher compression ratio at the cost of speed, lzip is far better.

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

My point is that if, in addition to files in already-compressed formats, we are talking about plain text files (you apparently do not want to give that information), you may be able to compress them individually in your /home. That way, you save space both in your /home and in your backups, and the backup system would not need to compress anything anymore. It could be Back In Time, Trisquel's default, which uses hard links for files that are unmodified between snapshots. And depending on the algorithm you choose to compress the text files, even reading them from disk may be faster.

ZIP is an option (as I showed) but, you are right, more recent formats such as Zstandard may now be considered sufficiently mature. Going on with my small benchmark: 'zstd HIGGS.csv' only takes one minute and seven seconds (6.5 times faster than gzip!) to make HIGGS.csv.zst, which weighs 2.5 GB (a little less than HIGGS.csv.gz but 25% more than HIGGS.csv.bz2 and almost twice the size of HIGGS.csv.xz) and decompresses very fast (2.4 times faster than zgrep!):
$ time zstdgrep -c 0 HIGGS.csv.zst
11000000
18.40user 5.51system 0:21.77elapsed 109%CPU (0avgtext+0avgdata 3368maxresident)k
3679392inputs+0outputs (5major+743minor)pagefaults 0swaps

nadebula.1984
Offline
Joined: 05/01/2018

I do have lots of compressible files (several TB), but very few of them (about one thousandth by size) are text files. I have learned that certain algorithms, such as PPMd, are very efficient at compressing text files, but they are basically irrelevant here.

Fortunately I've found that the dar format combined with the LZO compressor (the -zlzop-1 option) is good enough, though I need some time to further study dar's usage (quite different from tar's).

If permissions and features such as symbolic links are irrelevant, and if I wish to share the archive with Losedows useds, I'd use the 7z format with the Deflate compressor (the one used in PK-ZIP) to guarantee cross-platform compatibility.
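A hedged sketch of that kind of 7zz call (archive and directory names invented for the demo; I assume 7zz's -m0= method selection and -ms solid-mode switch, as documented for the .7z format):

```shell
# Skipped politely if 7zz is not installed.
command -v 7zz >/dev/null 2>&1 || { echo "7zz not installed"; exit 0; }
mkdir -p documents
echo "hi" > documents/a.txt
# -m0=Deflate selects the Deflate method (the one classic PK-ZIP used);
# -ms=off keeps the archive non-solid (files compressed independently).
7zz a -m0=Deflate -ms=off share.7z documents
7zz t share.7z    # integrity test; the archive opens on Losedows too
```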

lanun
Offline
Joined: 04/01/2021

> I do have lots of compressible files (several TB), but very few of them (about one thousandth in capacity) are text files.

Do you consider spreadsheets and databases to be text files?

Magic Banana

I am a member!

I am a translator!

Offline
Joined: 07/24/2010

The Open Document Format (ODF), the one LibreOffice uses, is ZIP-compressed: there is essentially no space to save by compressing it further. For databases, I do not know; it probably depends on the database management system.
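This is easy to check, since an ODF document is an ordinary ZIP container that unzip can list directly. A sketch with a hand-made stand-in file (a real .odt saved by LibreOffice lists the same way, with content.xml, styles.xml, and so on):

```shell
# Skipped politely if Info-ZIP's zip/unzip are not installed.
command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1 \
  || { echo "zip/unzip not installed"; exit 0; }
# Every ODF file starts with this uncompressed "mimetype" member.
printf 'application/vnd.oasis.opendocument.text' > mimetype
zip -q fake.odt mimetype
unzip -l fake.odt    # lists the members, like any ZIP archive
```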