tar.gzip demystified

Dear all.

I have been trying to work out a backup plan. As I want to keep it simple, I am going to use only a tar.gz archive of some files that are important.

My question then is the following:
-I have a 100 GB hard disk with only 20 GB of free space.
-I would like to back up the remaining 80 GB to an external hard disk.
-I run my scripts, which end up saving a 75 GB file (due to compression) to my external hard disk.

–>Then comes the time to check the contents of my archive (just to make sure that I can recover what is inside the 75 GB file). Do you know whether tar.gz needs to decompress the 75 GB file into some /tmp space on my hard disk in order to show me its contents? In that case it would not be easy at all to ever look at what is inside it, as my hard disk does not have 80 GB free (only 20 GB).

Best Regards
Alex

No, tar ztvf does not need temp space to store the expanded archive since it uses a pipeline to stream the uncompressed output to tar.
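To make that concrete, here is a minimal sketch (with made-up file names) showing that listing an archive is just a pipeline and writes nothing to /tmp:

```shell
# Create a small test archive, then list it without extracting.
mkdir -p demo
echo "some data" > demo/file.txt
tar -czf demo.tar.gz demo

# Both forms stream the decompressed bytes straight into tar,
# which only reads the member headers to print the listing:
tar -ztvf demo.tar.gz
gzip -dc demo.tar.gz | tar -tvf -
```

The second form is exactly what `tar -z` does internally; disk usage stays at the size of the compressed archive.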

Hi,

I would recommend that you do not use .tar.gz as an archive format.

You are compressing the entire archive as a single entity and a subsequent corruption of the archive could render its contents completely inaccessible. It is better to individually compress the files before adding them to the archive such that a corruption of the archive is likely to only affect one, or a limited number, of files.
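One way to do that with standard tools (a sketch of the general idea, not necessarily neildarlow's exact method, and with made-up file names):

```shell
# Compress each file individually first, then archive the
# already-compressed files with an *uncompressed* tar.
mkdir -p pictures
echo "one" > pictures/a.txt
echo "two" > pictures/b.txt

# gzip each regular file in place (a.txt -> a.txt.gz, ...)
find pictures -type f -exec gzip {} \;

# Plain tar, no -z: damage to the archive now tends to affect
# only the member(s) at the damaged offset.
tar -cf pictures.tar pictures
tar -tf pictures.tar
```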

I also recommend using a proper archiving program that maintains a catalog of the archive and is capable of verifying the integrity of the archive without having to perform extraction and compare. For a simple case like yours I would suggest dar and use of parchive with it. These are available as packages in the openSUSE repositories and are not hard to configure.

80GB of data is surely valuable to you and I certainly would not trust the .tar.gz format to keep it safe.

Regards,
Neil Darlow

Thanks a lot for your comments.
I have tried dargui and found that it produces only one file at the end. What I didn’t like about dar is that, once the compression/packaging process is over, I want to be able to see what is inside my archive and recover a few files if needed.

That is why I decided on tar.gzip, as I can double-click on it at any time and see inside. I also find myself more familiar with this type of ‘product’.
If you can suggest something else, please feel free to comment again :slight_smile:

Regards
Alex

neildarlow wrote:

> You are compressing the entire archive as a single entity and a
> subsequent corruption of the archive could render its contents
> completely inaccessible.

The best way to protect against corruption of a backup is to have
another backup. In a different place. Corruption is a lot less likely
than complete failure (of the device or in a fire or a theft).

> It is better to individually compress the files before adding them to
> the archive such that a corruption of the archive is likely to only
> affect one, or a limited number, of files.

That can be an extremely wasteful technique with small files, since the
filesystem may well use a 4 KB block or bigger to store each file.

But having said that, if your data is worth backing up then the space
saved by compression should not be an overriding priority.

Hi,

@djh-novell: There is always the option to store small files or certain filetypes uncompressed.

@alaios: dar can break its archive into slices of a user-defined size e.g. 4GB for burning to DVDs. It is also possible to list the contents of the archive to see what is inside. Any negative impression you have gained of dar is, I believe, due to limitations of dargui. I would suggest consulting the dar documentation to see what it is capable of.
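If you do stay with plain tar, you can approximate dar's slicing by piping through split. This is a sketch with a deliberately tiny 1 KB slice size for demonstration; in practice you would use something like 4G:

```shell
# Build an archive and cut it into fixed-size slices on the fly.
mkdir -p stuff
head -c 3000 /dev/urandom > stuff/big.bin
tar -czf - stuff | split -b 1k - slices.tar.gz.

# split names the slices slices.tar.gz.aa, .ab, ... in order,
# so a shell glob restores the stream for listing or extraction:
cat slices.tar.gz.* | tar -tzf -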

Regards,
Neil Darlow

I am sorry that what I will write is partially off-topic, but I must say it anyway.
I have tried almost every backup technology, even tar.gz, and what I found simplest and most reliable is:

  1. Your root directory, excluding the home partition or any Windows partition (just the system), is a good candidate for tar.gz.
  2. For anything else, rsync is by far the simplest, most reliable and fastest way. Of course, external drives are now very, very cheap.
    (I store that system.tar.gz in a directory under /home, so this file also gets picked up by rsync.)

This may not be the ultimate truth, but it is what my experience selected as “the best” way.

So I made a one-line rsync script; I plug the external drive in and run the command.

On 2011-04-08 14:06, alaios wrote:
> That is why I decided for tar.gzip as I can any time double click on it
> and see inside.

I suggest you try that with the big targz archive. Some tools do a full
expansion before letting you look inside a file.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

On 2011-04-08 13:46, Dave Howorth wrote:
> neildarlow wrote:
>
>> You are compressing the entire archive as a single entity and a
>> subsequent corruption of the archive could render its contents
>> completely inaccessible.
>
> The best way to protect against corruption of a backup is to have
> another backup. In a different place. Corruption is a lot less likely
> than complete failure (of the device or in a fire or a theft).

If corruption hits both archives, you have nothing. The tgz format is known to
have that problem; it is not a good format for backups.

A good backup should have error protection.
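That fragility is easy to demonstrate: damage the tail of a .tar.gz and gzip can no longer verify the stream. A sketch with deliberately tiny files:

```shell
# Build a tiny archive, then simulate media damage by truncating it.
mkdir -p data
echo "hello" > data/a.txt
echo "world" > data/b.txt
tar -czf backup.tar.gz data

# Chop off the last 10 bytes, which destroys the gzip trailer,
# much as a bad sector at the end of the file would.
size=$(stat -c %s backup.tar.gz)
truncate -s $((size - 10)) backup.tar.gz

# gzip -t now reports the archive as corrupt (non-zero exit).
gzip -t backup.tar.gz 2>/dev/null || echo "archive is corrupt"
```

With per-file compression or a format with built-in recovery data, the same damage would cost at most the affected members.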


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

I use tar and gzip (and gpg) to make backups on our servers at work.
The only time I have a problem is when there is a hardware problem,
meaning a problem with the HD I’m reading from or the USB drive I’m writing to.
tar and gzip aren’t corrupting anything; it’s what they are reading and writing that’s the problem. Garbage in, garbage out…

robin_listas is right about having error checking, use something like this.


tar -czf /mnt/usbdrive/backup.tar.gz /home/
BACKRES="$?"

# Backup failed?
if [ "${BACKRES}" -gt 0 ]
then
    # Write a message to /var/log/messages
    logger "Backing up /home has failed."
fi

You can get an alert on a backup failure in a lot of different ways; I just picked writing to /var/log/messages because it was easy.
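In the same spirit, you can also verify that the archive you just wrote is actually readable before trusting it. A sketch with a made-up archive name: gzip -t checks the compressed stream, and tar -tzf additionally walks every member header.

```shell
# Stand-in for the real backup written to the USB drive.
mkdir -p home-demo
echo "data" > home-demo/file.txt
tar -czf backup-demo.tar.gz home-demo

# A non-zero exit from either command means the backup is bad.
if gzip -t backup-demo.tar.gz && tar -tzf backup-demo.tar.gz > /dev/null
then
    echo "backup verified"
else
    logger "Verification of backup-demo.tar.gz failed."
fi
```

This catches write errors immediately, while the original data is still available to redo the backup.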

Good luck,
Hiatt

On 2011-04-10 17:36, jthiatt08 wrote:

> tar and gzip aren’t corrupting anything it’s what they are reading in
> and writing that’s the problem. Garbage in, garbage out…

Yes, that is correct. But media errors do develop on backup media. It
happens. The problem is that when it happens, a tgz copes badly with the
situation.

> robin_listas is right about having error checking, use something like
> this.

Not error checking, but error correction. They are different things.
And it is much better if it is native (internal) to the format used.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)