Page 2 of 7
Results 11 to 20 of 61

Thread: Bit rot is real

  1. #11
    Join Date
    Aug 2010
    Location
    Chicago suburbs
    Posts
    15,011
    Blog Entries
    3

    Re: Bit rot is real

    ZStefan wrote:
    > Well, maybe the error correction on hardware or firmware levels has its limits, since I have seen files with bit rot.

    A memory error could cause that. Low-end computers do not have error detection for memory, but they still have it for hard drives. And I'm not sure about SSD devices.
    openSUSE Leap 15.2; KDE Plasma 5.18.5;

  2. #12
    Join Date
    Nov 2009
    Location
    West Virginia Sector 13
    Posts
    16,129

    Re: Bit rot is real

    Also, there is a finite error rate on reading and writing data. When you get into huge files or disks, the errors become more evident. Also, correction algorithms are not perfect. For example, each sector has a checksum, but a checksum does not uniquely identify the data. It can only signal that the data does not match the checksum. A mismatch will normally trigger a reread (possibly many) of the sector, but it is still possible for the data to be wrong while its checksum happens to match the recorded one. Then you have bad data but a matching checksum.

    OK, this would be rare, but it is mathematically possible. But when you speak of billions of bytes, it is something you have to face.
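
    The "mathematically possible" part can be shown with a small sketch. It assumes a 16-bit CRC as the check (Python's `binascii.crc_hqx`, CRC-CCITT polynomial, initial value 0), which is not necessarily what any particular drive uses; `crc16` and `find_collision` are names made up for this example. There are 2**24 possible 3-byte blocks but only 2**16 check values, so two different blocks must share one.

```python
import binascii


def crc16(data: bytes) -> int:
    """16-bit CRC (CCITT polynomial) from the standard library, initial value 0."""
    return binascii.crc_hqx(data, 0)


def find_collision():
    """Walk through 3-byte blocks in order until two different ones share a CRC."""
    seen = {}
    for n in range(2 ** 24):
        block = n.to_bytes(3, "big")
        key = crc16(block)
        if key in seen:
            return seen[key], block
        seen[key] = block
    return None  # not reached: 2**24 blocks cannot all have distinct 16-bit CRCs


first, second = find_collision()
print(first.hex(), second.hex(), crc16(first) == crc16(second))
```

    A longer check value makes such a match far less likely for random damage, but any check shorter than the data itself leaves some collisions.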

  3. #13

    Re: Bit rot is real

    nrickert wrote:
    > A memory error could cause that. Low-end computers do not have error detection for memory, but they still have it for hard drives. And I'm not sure about SSD devices.

    I check all my computers with memtest from time to time, and it has found no errors. But, of course, there are transient RAM errors. I still think those two cases were magnetic media errors.

  4. #14

    What is left?

    The ext4 filesystem only has an option for checksumming file metadata. btrfs can also checksum file data, but the filesystem is experimental (its own roadmap has no end, and they use phrases like "... is no longer unstable"). A simple backup may do a disservice, since the bit rot will be backed up too. AppArmor does not monitor files. RAID is not recommended for data storage.

    What tools are left to at least detect file corruption automatically? Is the only way left to calculate and monitor checksums myself?
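
    For reference, the do-it-yourself route asked about here could look roughly like the sketch below. It is a minimal illustration, not an existing tool: the function names `file_sha256`, `build_manifest` and `changed_files` are made up for this example, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            manifest[os.path.relpath(path, root)] = file_sha256(path)
    return manifest


def changed_files(root, manifest):
    """List files that are missing or whose digest differs from the stored one."""
    changed = []
    for rel_path, expected in sorted(manifest.items()):
        path = os.path.join(root, rel_path)
        if not os.path.isfile(path) or file_sha256(path) != expected:
            changed.append(rel_path)
    return changed
```

    The manifest would be saved somewhere safe (for example as JSON) and `changed_files` run periodically. Note that a mismatch alone cannot tell bit rot from a legitimate edit; comparing modification times as well would help separate the two.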

  5. #15

    Re: Bit rot is real

    gogalthorp wrote:
    > Also, there is a finite error rate on reading and writing data. When you
    > get into huge files or disks, the errors become more evident. Also,
    > correction algorithms are not perfect. For example, each sector has a checksum


    No, a sector does not have a checksum; it has an ECC!

    As nrickert says, on normal desktop machines it's the main memory and the
    buses that don't have ECC. That's where errors can arise more easily.
    Those are the first things to fix on any machine intended to store data.

    > OK, this would be rare, but it is mathematically possible. But when you
    > speak of billions of bytes, it is something you have to face.


    Right, but it's also worth bearing in mind that computers do work and
    that many, many large companies have stored much, much data over many
    years without having to invoke mystical methods to preserve the data.

    An awful lot of literature has been published about the errors that
    can and do occur and the best ways to minimise their impact. Best to
    read that rather than dubious blogs and forum postings.

  6. #16
    Join Date
    Nov 2009
    Location
    West Virginia Sector 13
    Posts
    16,129

    Re: Bit rot is real

    ECC is a type of checksum. It can only correct small changes, and it is not a unique one-to-one value for the data. It cannot uniquely match the data; otherwise we could have humongous compression. But maybe it is 42 after all.
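
    The "can only correct small changes" limit is easy to see on a toy code. The sketch below uses a Hamming(7,4) code purely as an illustration; it is an assumption for this example, since real drives use much stronger codes, and the helper names `encode`, `decode` and `flip` are made up. One flipped bit is repaired, while two flipped bits are silently "repaired" into wrong data.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the suspected bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def flip(word, *positions):
    """Return a copy of word with the bits at the given 0-based positions inverted."""
    c = list(word)
    for pos in positions:
        c[pos] ^= 1
    return c


data = [1, 0, 1, 1]
word = encode(data)
print("one bit flipped, data recovered:", decode(flip(word, 2)) == data)
print("two bits flipped, data recovered:", decode(flip(word, 2, 5)) == data)
```

    Longer ECC fields push the limit much further out, but the principle is the same: beyond the number of errors the code was designed for, the decoder can hand back wrong data without noticing.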

  7. #17

    Old, simple example of corruption going unnoticed

    Here is a simplified example that I know of how simple checksumming may fail to detect errors.

    This is for old hard drives, now not in use. Sector size is 512 bytes and checksum size is 2 bytes. A simple checksumming that was used is insensitive to byte swap.

    Now consider two consecutive bytes on hard drive. The bit content of the first and second bytes are
    00000000 10000000

    If, because of bit rot, two bits in the bytes change their values, the new content will be
    10000000 00000000

    This is equivalent to byte swap and will go unnoticed.

    I can imagine that similar things can happen even with modern, more advanced checks.
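
    The example above can be reproduced in a few lines. This is a sketch under an assumption: the post does not say which 2-byte checksum is meant, so a plain sum of all bytes modulo 2**16 is used here (the name `additive_checksum` is made up for the illustration).

```python
def additive_checksum(sector: bytes) -> int:
    """Assumed toy scheme: sum of all bytes, kept to 2 bytes (mod 2**16)."""
    return sum(sector) % 65536


# A 512-byte sector whose first two bytes match the example; the rest is zeros.
original = bytes([0b00000000, 0b10000000]) + bytes(510)
rotted = bytes([0b10000000, 0b00000000]) + bytes(510)  # the two bits have flipped

print("contents differ:", original != rotted)
print("checksums match:", additive_checksum(original) == additive_checksum(rotted))
```

    Any checksum that just adds bytes together ignores their order, which is why the swapped pair passes.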

  8. #18
    Join Date
    Feb 2009
    Location
    Spain
    Posts
    25,547

    Re: Bit rot is real

    On 2014-01-25 06:46, ZStefan wrote:
    > Now, I wonder whether there are tools to reveal and to fight the bit rot
    > without switching to btrfs? Maybe some utilities that would calculate
    > the md5sums of important files automatically, and compare them from time
    > to time?


    par2.


    Generates parity files
    Uses Reed-Solomon algorithm to produce a RAID-like data protection and
    recovery system
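
    To give a feel for the "RAID-like" idea, here is a deliberately simplified
    sketch. It swaps in plain XOR parity for Reed-Solomon, so it can rebuild
    only one lost block, whereas par2's Reed-Solomon recovery data can repair
    several. The function names are made up for the illustration and are not
    part of par2.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """Parity block: the XOR of all data blocks."""
    return xor_blocks(blocks)


def recover(blocks, parity, lost_index):
    """Rebuild the block at lost_index from the surviving blocks and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_blocks(survivors + [parity])


blocks = [b"bit ", b"rot ", b"is  ", b"real"]
parity = make_parity(blocks)
print("block 2 rebuilt correctly:", recover(blocks, parity, 2) == blocks[2])
```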

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 12.3 x86_64 "Dartmouth" at Telcontar)

  9. #19
    Join Date
    Aug 2010
    Location
    Chicago suburbs
    Posts
    15,011
    Blog Entries
    3

    Re: Old, simple example of corruption going unnoticed

    ZStefan wrote:
    > Here is a simplified example that I know of how simple checksumming may fail to detect errors.
    >
    > This is for old hard drives, now not in use. Sector size is 512 bytes and checksum size is 2 bytes. A simple checksumming that was used is insensitive to byte swap.
    >
    > Now consider two consecutive bytes on hard drive. The bit content of the first and second bytes are
    > 00000000 10000000

    Those old hard drives used CRC16 check sums, which would have detected this error.
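
    This is easy to check. The sketch below assumes the CRC-CCITT polynomial with initial value 0, as implemented by Python's `binascii.crc_hqx`; the exact CRC16 variant used by those drives is not stated here. The two flipped bits are only 8 bit positions apart, and a 16-bit CRC catches every error burst that fits within 16 bits.

```python
import binascii

# A 512-byte sector whose first two bytes match the quoted example.
original = bytes([0b00000000, 0b10000000]) + bytes(510)
rotted = bytes([0b10000000, 0b00000000]) + bytes(510)  # the "byte swap" corruption

crc_original = binascii.crc_hqx(original, 0)
crc_rotted = binascii.crc_hqx(rotted, 0)
print(f"{crc_original:#06x} {crc_rotted:#06x} differ={crc_original != crc_rotted}")
```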
    openSUSE Leap 15.2; KDE Plasma 5.18.5;

  10. #20
    Join Date
    Jun 2008
    Location
    Kansas City Area, Missouri, USA
    Posts
    7,236

    Re: Bit rot is real

    On 01/30/2014 07:56 AM, nrickert wrote:
    >
    > ZStefan;2620584 Wrote:
    >> Here is a simplified example that I know of how simple checksumming may
    >> fail to detect errors.
    >>
    >> This is for old hard drives, now not in use. Sector size is 512 bytes
    >> and checksum size is 2 bytes. A simple checksumming that was used is
    >> insensitive to byte swap.
    >>
    >> Now consider two consecutive bytes on hard drive. The bit content of the
    >> first and second bytes are
    >> 00000000 10000000

    >
    > Those old hard drives used CRC16 check sums, which would have detected
    > this error.


    @ZStefan: I know your mind is probably made up and we should not bother you with
    facts, but we may still influence some other readers of this thread. Before you
    contribute to FUD, please read up on how ECC works.

    A legacy disk with 512-byte sectors uses a 50-byte ECC field, and the newer
    4K-byte sector disks use 100 bytes. Given the number of errors that such ECC
    lengths can correct, any procedure that could be devised to "refresh" the files
    would greatly increase the probability of error. This increase would be due to
    the extra load on the disk drives causing earlier failure, and to the cosmic-ray
    and other mechanisms that affect consumer-grade hardware without ECC memory or
    ECC-protected data paths.

