
Thread: Rsync vs Tar/Gz for backups

  1. #1

    Rsync vs Tar/Gz for backups

    Dear all
    I have the following backup script that packs the contents of my current folders onto an external hard disk (in my company we have enough hard disks, so there is no real need to do incremental backups).

    Code:
    echo '----- Backup Started '`date` >>/root/backup/backuperrors.txt
    tar -zcvf /media/a9f299d7-fcbc28b3f3c0/user-host`date '+%d-%B-%Y'`.tar.gz /etc /root /home 2>> /root/backup/backuperrors.txt

    I would then like to ask you:
    a. Is there any problem with tar/gz and large archives? Let's say the created output file is 500GB in size; do tar or gzip have limits on the file size they can handle?

    b. Would it be better to use rsync
    with the -c flag for doing checksum checks while "copying"
    and the -z flag to compress the files?

    c. How does a backup with rsync work with the -z flag? Does it create a single compressed file like tar.gz, or does it compress file by file?

    d. Are there any better ways to use rsync for compressed backups?

    I would like to thank you in advance for your help

    Best Regards
    Alex

  2. #2
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    25,777

    Re: Rsync vs Tar/Gz for backups

    I think you should first try to understand what rsync does. It is rather different from tar. And when you say
    > c. How does a backup with rsync work with the -z flag? Does it create a single compressed file like tar.gz, or does it compress file by file?
    my guess is that you did not understand what it says in the rsync man page:
    > -z, --compress    compress file data during the transfer
    Which means, imho, that no compressed files are created; compression happens during the transfer (useful, because rsync often runs between different systems, and compressing over a network may bring better performance).

    In short, tar (tape archiver, although almost nobody nowadays uses tape as the place to put the archive) bundles several files into one big file (in earlier times on a tape).

    rsync/rsyncd (you need both) synchronises files (complete directory trees) between places (often on different systems). Think, e.g., of having an address book on different systems, where you changed one and want the other synced with it. But also think of backup (I, e.g., rsync the complete /home of one system to a backup place on another system). The gain of using rsync is that only changed files (or even just the changed parts of them) are transferred. Another gain is that you can walk through the directory tree on the backup system and find everything in the same place for inspection and/or restore.
    Henk van Velden
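
    To make the difference described above concrete, here is a minimal sketch contrasting the two tools. The function names and any paths passed to them are hypothetical. Note that -z on rsync only compresses data in transit, so the destination holds ordinary, uncompressed files.

```shell
#!/bin/sh
# Hypothetical helper names; source and destination paths are placeholders.

# tar: bundle a directory tree into ONE compressed archive file.
make_archive() {   # usage: make_archive SRC_DIR ARCHIVE.tar.gz
    tar -czf "$2" -C "$1" .
}

# rsync: mirror a directory tree as ordinary files and directories.
# -a preserves permissions, times, symlinks and so on.
# -z compresses data during the transfer only; nothing compressed is stored.
mirror_tree() {    # usage: mirror_tree SRC_DIR DEST_DIR
    rsync -az "$1"/ "$2"/
}
```

    After mirror_tree the destination can be browsed and restored from with normal file tools, whereas the tar archive has to be unpacked (or listed with tar -tzf) first.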

  3. #3

    Re: Rsync vs Tar/Gz for backups

    hcvv wrote:
    > rsync/rsyncd (you need both)

    my guess is that you did not understand what it says in the rsync man page

    rsyncd is the rsync daemon and you only need it if you want to use it.
    rsync will work perfectly happily without it.

    But that niggle apart, everything else was good advice.

  4. #4

    Re: Rsync vs Tar/Gz for backups

    You are right. I was too hasty, most probably tempted by the fact that rsync also does checksum checks on the files it handles (which I thought might be a very useful feature when creating large compressed tar files).
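
    For the tar.gz route, a comparable integrity check can be added by hand. The following is only a sketch with hypothetical function names: it stores a SHA-256 checksum next to the archive and later verifies both that checksum and gzip's own built-in CRC.

```shell
#!/bin/sh
# Hypothetical helpers: record a checksum next to the archive, verify it later.

write_checksum() {   # usage: write_checksum ARCHIVE.tar.gz
    ( cd "$(dirname "$1")" &&
      sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

verify_archive() {   # usage: verify_archive ARCHIVE.tar.gz
    # gzip -t decompresses the whole stream and checks its CRC.
    gzip -t "$1" &&
    # sha256sum -c re-computes the hash and compares it with the stored one.
    ( cd "$(dirname "$1")" &&
      sha256sum -c --quiet "$(basename "$1").sha256" )
}
```

    verify_archive returns a non-zero status if either check fails, so it can be used directly in an if statement in the backup script.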

  5. #5

    Re: Rsync vs Tar/Gz for backups

    I would like to add one more point in this nice discussion.
    I was looking at hard disk prices and found that they are quite low these days (in other words, not too expensive considering how crucial my data are).

    So the question then is:
    What if I have an external hard disk of 3TB and run rsync once per week (say, on Sundays), rsyncing my system to the external hard disk? If I am not wrong, this will create a cloned version of my system on the external hard disk. Is that true?

    B.R
    Alex

  6. #6
    Join Date
    Aug 2010
    Location
    Chicago suburbs
    Posts
    13,185
    Blog Entries
    3

    Re: Rsync vs Tar/Gz for backups

    Originally posted by alaios:
    > d. Are there any better ways to use rsync for compressed backups?

    Currently, I am using "dar", which you would have to install from the repos.

    I have been persuaded that it is better, though I never ran into problems with "tar".

    The differences:

    1: "dar" compresses files individually in the archive. This is supposed to ensure that a bad disk sector in the archive will only affect one file, instead of having effects that leak into all subsequent files.

    2: "dar" creates a multi file archive, as needed, for large archives.

    The main disadvantage of "dar": the command is not on the usual rescue CD or DVD media, so you either need to build a special-purpose CD, or plan on installing the system before you can recover the backed-up data. I use the second of those choices, as I only back up "/home" anyway.
    openSUSE Leap 15.1; KDE Plasma 5;
    testing Leap 15.2Alpha
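
    For readers who stay with tar, point 2 (a multi-file archive) can be approximated with standard tools. This sketch does not use dar at all; it swaps in a different technique, piping tar through split, and it does not provide the per-file compression of point 1. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; dar itself is not used here.

# Write a compressed tar stream as fixed-size pieces PREFIX.aa, PREFIX.ab, ...
archive_in_pieces() {   # usage: archive_in_pieces SRC_DIR PREFIX PIECE_SIZE
    tar -czf - -C "$1" . | split -b "$3" - "$2."
}

# Concatenate the pieces in name order and unpack them into TARGET_DIR.
restore_pieces() {      # usage: restore_pieces PREFIX TARGET_DIR
    mkdir -p "$2" && cat "$1".* | tar -xzf - -C "$2"
}
```

    All pieces are needed for a restore, since together they form one gzip stream.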

  7. #7

    Re: Rsync vs Tar/Gz for backups

    alaios wrote:
    > How about if I have an external hard disk of 3TB and having rsync
    > running once per week? (like Sundays) and rsyncing my system to the
    > external hard disk. If I am not wrong this will create on external hard
    > disk a cloned version of my system. Would not that be true?


    Yes, that will work. You can even use different disks on alternate weeks
    or whatever to get more than one backup. It's also possible to use rsync
    to create many incremental backups on one disk. I use a program called
    dirvish - http://www.dirvish.org/
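
    A weekly clone along these lines could look roughly like the sketch below. The function name, the script path in the crontab comment and the schedule are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper; destination and sources are supplied by the caller.

clone_to_disk() {   # usage: clone_to_disk DEST_DIR SRC_DIR...
    dest=$1; shift
    mkdir -p "$dest" || return 1
    for src in "$@"; do
        # -a: archive mode (permissions, times, symlinks, recursion).
        # --delete: remove files from the copy that no longer exist in the
        # source, so the copy stays a mirror rather than an accumulation.
        # No trailing slash on $src: the directory itself lands inside DEST.
        rsync -a --delete "$src" "$dest"/ || return 1
    done
}

# Example crontab line (Sundays at 02:00), assuming a script that calls the
# function above is saved at the hypothetical path /root/backup/clone.sh:
# 0 2 * * 0  /root/backup/clone.sh
```

    Because of --delete, a file removed by mistake disappears from the clone at the next run, which is one reason to alternate disks or keep several generations.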

  8. #8
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    25,777

    Re: Rsync vs Tar/Gz for backups

    I like (and use) the method that is automated and configurable by rsnapshot (http://www.rsnapshot.org/).

    In fact I reprogrammed it myself, but the idea is that you make an rsync copy and then use cp -al to create a new generation of the backup (say backup0 to backup1) that uses hard links and thus almost no space. A new rsync then synchronises backup0 (only changed files are copied and removed files are deleted), while at the same time backup1 is still complete.

    You can thus create cycles. E.g. I have a cycle of 10 and run a backup each week, which means that I can go back up to 10 weeks in time. But it is easy (and that is where rsnapshot helps you) to create several cycles, e.g. daily, weekly, etc.

    (And yes, rsyncd is not needed in all circumstances, but it helps when you use different systems.)
    Henk van Velden
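
    The rotation described above can be sketched as follows. This is not the poster's script nor rsnapshot itself, only an illustration under stated assumptions: the function name is hypothetical, the generations are named backup0, backup1, ... as in the post, cp is GNU cp (for -l), and the backup filesystem supports hard links.

```shell
#!/bin/sh
# Hypothetical helper; keeps KEEP generations: backup0 (newest) .. backup(KEEP-1).

snapshot_rotate() {   # usage: snapshot_rotate SRC_DIR BACKUP_ROOT KEEP
    src=$1; root=$2; keep=$3
    [ "$keep" -ge 2 ] || return 1

    if [ -d "$root/backup0" ]; then
        # Drop the oldest generation, then shift the others up by one.
        rm -rf "$root/backup$((keep - 1))"
        i=$((keep - 2))
        while [ "$i" -ge 1 ]; do
            if [ -d "$root/backup$i" ]; then
                mv "$root/backup$i" "$root/backup$((i + 1))"
            fi
            i=$((i - 1))
        done
        # Freeze the current state as backup1 using hard links only.
        cp -al "$root/backup0" "$root/backup1"
    fi

    mkdir -p "$root/backup0"
    # Bring backup0 up to date. By default rsync writes a changed file as a
    # new file and renames it into place, so the hard-linked copy in backup1
    # keeps its old content.
    rsync -a --delete "$src"/ "$root/backup0"/
}
```

    Calling snapshot_rotate /home /media/disk/home 10 once a week would give a cycle of 10 as in the post; the paths here are placeholders. rsync's --inplace option should not be combined with this scheme, since it would modify the linked older copies as well.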

  9. #9

    Re: Rsync vs Tar/Gz for backups

    Hi. I think your answer is even better.

    So I will go and purchase the external hard disk, and then I would ask for some help configuring it.
    How easy is it to recover a specific cycle? And how easily can things fail, so that after one year of operation the cycles cannot be recovered? (In that extreme case I would guess that only the first backup could be restored, the one made a year before.)

    I am ordering the hard disk today and I will come back for more help tomorrow or at the weekend.

    Regards
    A
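
    On the recovery question: with the hard-link scheme described earlier in the thread, every generation is a complete directory tree, so restoring from a given cycle is an ordinary copy. A minimal sketch, assuming generations named backup0, backup1, ... and a hypothetical function name:

```shell
#!/bin/sh
# Hypothetical helper; assumes generations named backup0, backup1, ...

restore_from_cycle() {   # usage: restore_from_cycle BACKUP_ROOT N REL_PATH TARGET_DIR
    # -a keeps permissions and timestamps and copies directories recursively.
    cp -a "$1/backup$2/$3" "$4"/
}
```

    Each generation holds its own hard link to every file, and the data of a file is only freed when its last link is removed. Deleting or losing one generation therefore does not make the others unreadable; what the generations do share is the disk itself, so a disk failure affects all of them.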

  10. #10

    Re: Rsync vs Tar/Gz for backups

    hcvv wrote:
    > I like (and use) the method that is automated and configurable by
    > 'rsnapshot' (http://www.rsnapshot.org/)


    Yes, rsnapshot is an alternative to dirvish. There are more (not sure
    whether that is a or a )

    > In fact I reprogrammed it myself, but the idea is that you make an
    > rsync copy and then use cp -al to create a new generation of the
    > backup (say backup0 to backup1) that uses hard links and thus almost no
    > space. A new rsync then synchronises backup0 (only changed files are
    > copied and removed files are deleted), while at the same time backup1 is
    > still complete.


    That's how dirvish works, except it uses rsync to do everything instead
    of using cp for part of the work. I'm not saying there's anything wrong
    with using cp, BTW.

    > You can thus create cycles. E.g. I have a cycle of 10 and run a backup
    > each week, which means that I can go back up to 10 weeks in time.
    > But it is easy (and that is where rsnapshot helps you) to create
    > several cycles, e.g. daily, weekly, etc.


    Yes, and you can have different cycles for different parts of your
    filesystems etc etc.
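
    Doing everything with rsync, without the cp step, is possible with rsync's --link-dest option. Whether dirvish works exactly like this is an assumption on my part; the sketch below only shows the option itself, with a hypothetical function name and caller-chosen snapshot names.

```shell
#!/bin/sh
# Hypothetical helper; snapshot names (e.g. dates) are chosen by the caller.

linkdest_snapshot() {   # usage: linkdest_snapshot SRC_DIR BACKUP_ROOT NEW_NAME [PREV_NAME]
    src=$1; root=$2; new=$3; prev=${4:-}
    mkdir -p "$root" || return 1
    if [ -n "$prev" ] && [ -d "$root/$prev" ]; then
        # Files unchanged since PREV become hard links into PREV; changed or
        # new files are copied. A relative --link-dest path is interpreted
        # relative to the destination directory, hence "../".
        rsync -a --link-dest="../$prev" "$src"/ "$root/$new"/
    else
        # First run: nothing to link against, make a plain full copy.
        rsync -a "$src"/ "$root/$new"/
    fi
}
```

    Each call produces a new, complete-looking tree, for example linkdest_snapshot /home /media/disk week02 week01 (placeholder paths), and old snapshots can be removed independently of each other.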
