
Thread: Zip or tar causing loss?

  1. #1

Zip or tar causing loss?

I have encountered this twice, and I want to know whether you have also experienced data loss because of zip+tar.

    I created a huge tgz file with
    tar -f my.tgz -v -c -z ....

    The purpose was backup, and the file was around 50 GB.

After a few months, I could not restore the content. There was an error message, probably from the zip library. After attempting some rescue measures, I understood that the file could not be recovered.

I decided to check the hard drive on which the file resided. badblocks in non-destructive and destructive modes reported no errors, and there was no reason to believe that the hardware was at fault.
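
For reference, the checks I ran looked roughly like this (the device name is only a placeholder for the actual drive):

# non-destructive read-write test (preserves the existing data)
sudo badblocks -nsv /dev/sdX
# destructive write test (erases everything on the drive)
sudo badblocks -wsv /dev/sdX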

    Now I do not use zip for large files.

It is surprising that such old and well-tested software can (perhaps) fail.

  2. #2

Re: Zip or tar causing loss?

The -z option tells tar to filter the archive through gzip, not zip.
gzip and zip are two totally different things.
I suspect that you messed something up, possibly by trying to unzip the .tgz file when you should simply have extracted it with 'tar -zxf file.tgz'.
I've never seen GNU tar screw up, so it's very unlikely that it's at fault here.

  3. #3

Re: Zip or tar causing loss?

Why archive it in the first place? It just adds another complication to the mix, IMO.

  4. #4

Re: Zip or tar causing loss?

I used gzip, evidently. I didn't know that the -z option invokes gzip, which is different from zip.

I did the tar-gzip correctly, as usual. I checked the size of the tgz as it was created; it was as expected.

    Both errors occurred while dealing with huge files.

    One of the hard drives on which this occurred is still in use and does not cause any problems. I also checked the RAM with memtest.

    Now I do not trust the tar+gzip process any more.



The tar+gzip process is very convenient for saving space and archiving folders; that's why I used it. Probably by "Why archive?" you meant that I could have used

    zip -r ....

I shall try it for huge files; I haven't used it much.

  5. #5

Re: Zip or tar causing loss?

    ZStefan wrote:
    > I have encountered this twice, and want to know if you also have
    > experienced data loss because of zip+tar.
    >
    > I created a huge tgz file with
    > tar -f my.tgz -v -c -z ....
    >
    > The purpose was backup, and the file was around 50 GB.
    >
    > After a few months, I could not restore the content. There was an error
    > message probably from zip library. After attempting some rescue
    > measures, I understood that the file could not be recovered.
    >
    > I decided to check the hard drive on which the file resided. badblocks
    > in non-destructive and destructive modes reported no errors, and there
    > was no reason to believe that hardware is at fault.
    >
    > Now I do not use zip for large files.
    >
    > It is surprising that such an old and tested software can fail
    > (perhaps).


    Like noident, I think it is unlikely tar messed up. It's just as likely
    that your filesystem or memory or disk messed up, and likeliest of all
    is that the human messed up.

    But we can't help you decide what went wrong, because you haven't told
    us anything about it! What system are you using, what version of tar?
    What filesystem? What hardware are you using? What exactly was the error
message? You say "there was no reason to believe that hardware is at
fault", but you don't tell us what investigations you conducted to reach
that conclusion, other than badblocks, and I hope that wasn't all!

    I regularly use tar and gzip for big files and don't have problems.

  6. #6

Re: Zip or tar causing loss?


    > Now I do not trust the tar+gzip process any more.


That you're using tar tells me you are not new to computers, so you are
probably familiar with the random problems that can creep into things.
'tar' and 'gzip' are both made to work on streams of data, which has at
least one huge advantage: size doesn't matter (at one point tar had an
8 GB file-size limit, though that was overcome a decade or so ago, I
believe). Anyway, huge chunks of the world use this same combination
for all of their storage, and you will have better luck with tar+gzip or
tar+bzip2 than with zip for large files, for one big reason: zip does
not support storing files >= 2 GiB (zip64 does, but I'm not sure how
common that is now).

The long and short of this is that before you rule something in or out,
you should test it. If creation of the tar file works, then extraction
later should also work, barring corruption that happened in the meantime
(obviously not the fault of 'tar', since the data are, or should be, at
rest). Having a checksum of the data before and after is a good way to
see whether anything has changed. You could run this test at any time to
see if tar+gzip can handle your data, since creating the archive and then
immediately extracting it quickly proves the technology one way or the
other. You could even do this without taking any disk space:

tar -zcvf - /path/to/archive | tar -ztvf -
echo $?

If the last line printed is '0' then all was well (note that '$?' only
reflects the second tar in the pipeline; in bash, 'set -o pipefail'
beforehand will also catch a failure in the first one). The above commands
create an archive but pipe the output directly to a second 'tar' command
that decompresses and lists the contents without writing data anywhere:
everything is read from disk and then basically thrown away while being
tested.
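
If you do want to keep the archive on disk, a minimal sketch of the checksum idea (file names here are just examples):

# record a checksum right after creating the archive
sha256sum my.tgz > my.tgz.sha256
# months later, verify the archive before attempting to restore it
sha256sum -c my.tgz.sha256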

    > The tar+gzip process is very convenient to save space and archive
    > folders, that's why I used it. Probably by "Why archive?" you mean
    > that I could have used


The person asking probably meant: why create an archive in the first
place? The downside of any type of archive is that a single random
one-time hardware problem (and hardware is again my vote in your case,
since you seem to know what you're doing with the tar command in general)
can corrupt the entire archive. It happens: look at all of the download
failures people have with the openSUSE ISOs, for some reason, on
otherwise-reliable Internet connections. Bigger just makes it more
obvious, which is why checksums exist. Storing files individually removes
the risk of a single byte affecting gigabytes of data. It's a tradeoff,
but one that I make as well (rsync backs up my stuff; no archiving with
'tar', because it just means more work to create/extract the archive).

    > zip -r ....


Huge files are not the strength of the 'zip' format. Chances are that
most of your sensitive data stored out online somewhere is being saved
with 'tar' rather than 'zip'; it's worth going with 'tar' if possible,
because its track record is just that good.

    Good luck.

  7. #7

Re: Zip or tar causing loss?

    On 2011-12-07 06:36, ZStefan wrote:
    >
    > I have encountered this twice, and want to know if you also have
    > experienced data loss because of zip+tar.
    >
    > I created a huge tgz file with
    > tar -f my.tgz -v -c -z ....


    That's not "zip", but gzip. Different programs.

    >
    > The purpose was backup, and the file was around 50 GB.


    A lot.

    > It is surprising that such an old and tested software can fail
    > (perhaps).


    It is known. Gzipped tar backups have a single point of failure: a problem
    decompressing renders the entire archive useless.

    The procedure could be improved with a check step: compare the backup with
    the original before saying "done".
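
Something along these lines, for example (paths are only illustrative):

# create the backup, then compare it against the originals before calling it done
tar -czf backup.tgz -C /srv data
tar -dzf backup.tgz -C /srv && echo "backup matches the originals"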

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 11.4 x86_64 "Celadon" at Telcontar)

  8. #8

Re: Zip or tar causing loss?

    On 2011-12-07 14:56, ab wrote:
    > Huge files is not the strength of the 'zip' format. Chances are that
    > most of your sensitive data out online somewhere are being saved with
    > 'tar' more than 'zip'; it's worth going with 'tar' if possible because
    > its track record is just that good.


Good backup/archiving software should have a forward recovery method
integrated, i.e. some amount of redundancy in the data, so that errors can
be recovered from. A tgz doesn't have this. A plain zip is better, because
a failure doesn't corrupt the entire archive, just one file. Another
method is to compress files first, then use tar or cpio. The rar format
does have error recovery, but it is commercial.
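
A rough sketch of the compress-first idea (directory names are just examples):

# compress every file individually, in place (each file becomes file.gz)
gzip -r /srv/data
# archive the already-compressed files; a corrupt region now damages at
# most one member instead of the whole backup
tar -cf backup.tar -C /srv data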

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 11.4 x86_64 "Celadon" at Telcontar)

  9. #9

Re: Zip or tar causing loss?

At one time I was involved with a system that had 400 Linux servers. On each server every static file was checksummed by AIDE to detect intruders. Occasionally, a server would report an unexpected change to a file; it was never an intrusion, it was always bad RAM or a bad disk.

For SATA/IDE drives, rewriting a bad block will normally cause the drive's firmware to map it out and replace it with a good one. You should check the SMART readouts on your drives for unrecoverable errors.
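
For example, with the smartmontools package installed (the device name is just an example):

sudo smartctl -H /dev/sda   # overall health self-assessment
sudo smartctl -A /dev/sda   # attribute table; watch Reallocated_Sector_Ct and Offline_Uncorrectable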

Perhaps verify backups immediately after creating them (I always used to do this with tapes; for some reason I've lost the habit now that I use disk). You could also store a checksum to track whether they've been corrupted after the fact.

You could run a RAM test, but I've never had a great deal of success with them. Maybe it would be better to loop doing backups and verifies. Are you sure which machine the corruption is occurring on?

As others have suggested, tar.gz is not a great way to back up - a bad block prevents access to the rest of the archive. Now that disk is cheap I tend to just rsync to more than one removable disk (there are attempts at Apple Time Machine-style archives for Linux - I haven't tried any of them, though).
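
A sketch of that approach, with mount points that are only examples:

rsync -a --delete /home/me/ /mnt/backupdisk1/me/
rsync -a --delete /home/me/ /mnt/backupdisk2/me/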

  10. #10

Re: Zip or tar causing loss?

    On 2011-12-08 05:46, mchnz wrote:
    > At one time I was involved with a system that had 400 Linux servers. On
    > each server every static file was checksummed by AIDE to detect
    > intruders. Occasionally, a server would report a file had changed that
    > was unexpected - it was never an intrusion, it was always bad RAM or
    > disk.


It doesn't need to be bad hardware; it can be chance. Cosmic rays, for
instance, flipping a bit.

Years ago, in MS-DOS, one of the options was to enable verify mode for
anything written to disk. Writes were then always verified; it was
understood that writes could fail. Now we can't do that.

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 11.4 x86_64 "Celadon" at Telcontar)

