Page 1 of 7
Results 1 to 10 of 61

Thread: Bit rot is real

  1. #1

    Bit rot is real

    I have recently read an article about bit rot:
    http://arstechnica.com/information-t...n-filesystems/

    Bit rot is the phenomenon in which a bit in a file stored on a hard drive arbitrarily switches its value.

    The author says that bit rot is quite common:

    "It's not uncommon at all to see five, or 10, or 50 checksum errors on a disk that's been in service for a few years"

    And I am afraid I have encountered it many times. Some of the files had to be discarded.

    Bit rot cannot be easily detected in ext4 and many other filesystems, since in most cases the filesystem's integrity is not violated. A simple backup of the file made after the bit rot will naturally also be corrupted, so unless special measures are taken, both the file and its backups end up corrupted.

    The basic way to find out whether bit rot has taken place is to calculate a checksum of the file and compare it with one recorded while the file was known to be good.
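    A minimal sketch of the checksum idea using Python's standard `hashlib`: the file is hashed in chunks so that multi-GB files need not fit in memory, and flipping a single bit yields a completely different digest. The helper names `file_md5` and `flip_bit` and the 1 MiB chunk size are illustrative assumptions, not taken from any particular tool.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def flip_bit(path: str, byte_offset: int, bit: int) -> None:
    """Simulate bit rot (hypothetical helper): flip one bit of the file in place."""
    with open(path, "r+b") as f:
        f.seek(byte_offset)
        value = f.read(1)[0]
        f.seek(byte_offset)
        f.write(bytes([value ^ (1 << bit)]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "data.bin")
        with open(p, "wb") as f:
            f.write(os.urandom(5 * 1024 * 1024))  # 5 MiB of random data
        before = file_md5(p)
        flip_bit(p, byte_offset=123456, bit=3)
        print(before != file_md5(p))  # True: one flipped bit changes the digest
```

    MD5 is adequate for spotting accidental corruption like this; it is not meant to resist deliberate tampering.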

    Now I wonder: are there tools to reveal and fight bit rot without switching to btrfs? Maybe some utility that automatically calculates the md5sums of important files and compares them from time to time?
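    A utility of the kind asked about can be sketched with only the Python standard library. This is a hypothetical sketch, not an existing tool: `build_manifest` records an MD5 digest for every file under a directory, `save_manifest` stores the result as JSON, and `verify` later reports files that are missing or whose digest has changed. Since the files in question are never modified after acquisition, any CHANGED entry points to corruption. For a quick manual check, GNU `md5sum` can also write a checksum list and verify it later with `md5sum -c`.

```python
import hashlib
import json
import os
from typing import Dict, List


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of one file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of(full)
    return manifest


def save_manifest(manifest: Dict[str, str], path: str) -> None:
    """Write the manifest as JSON; keep it OUTSIDE the tree being checked."""
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)


def verify(root: str, manifest_path: str) -> List[str]:
    """Return one line per file that is missing or whose digest changed."""
    with open(manifest_path) as f:
        expected = json.load(f)
    problems = []
    for rel, digest in sorted(expected.items()):
        full = os.path.join(root, rel)
        if not os.path.exists(full):
            problems.append(f"MISSING  {rel}")
        elif md5_of(full) != digest:
            problems.append(f"CHANGED  {rel}")
    return problems
```

    Typical use would be to call `save_manifest(build_manifest(...), ...)` right after acquiring the data and to run `verify` periodically, for example from a cron job. Keeping the manifest outside the checked directory avoids hashing the manifest itself.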

  2. #2
    Join Date
    May 2010
    Location
    Space Colony Lagrange Point 22° à, 77° Ƅ, 56° ɤ, 99° ɜ
    Posts
    3,166

    Re: Bit rot is real

    ZStefan wrote:
    >
    > I have recently read an article about bit rot:
    > http://tinyurl.com/nc5763k
    >

    The article probably blows things out of proportion.

    --
    GNOME 3.10.2
    openSUSE 13.1 (Bottle) (x86_64) 64-bit
    Kernel Linux 3.11.6-4-desktop

  3. #3
    Join Date
    Feb 2009
    Location
    Spain
    Posts
    25,547

    Re: Bit rot is real

    On 2014-01-25 06:46, ZStefan wrote:
    >
    > I have recently read an article about bit rot:
    > http://tinyurl.com/nc5763k


    Interesting.

    Two snags, though: it requires duplicating hardware, i.e., using the
    RAID version of btrfs. And I have seen btrfs fail completely.

    For many years it has also been possible to store data with enough
    redundancy to recover damaged sectors, without using RAID. This is
    routinely used for things such as transmissions from remote space
    missions, which rely on a single data stream (a retransmission, even if
    possible, could take hours to request, so error detection alone is
    insufficient). It is called forward error correction (FEC).
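    To make forward error correction concrete, here is a toy Hamming(7,4) code: each 4-bit nibble is stored as 7 bits, and any single flipped bit can be located and corrected from the stored bits alone, with no second copy. This only illustrates the principle; real drives and space links use much stronger codes (e.g. Reed-Solomon or LDPC), and the function names are hypothetical.

```python
from typing import List


def encode(d: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(c: List[int]) -> List[int]:
    """Correct at most one flipped bit in a 7-bit codeword; return the data bits."""
    c = list(c)   # do not modify the caller's list
    b = [0] + c   # 1-indexed view of the codeword
    s1 = b[1] ^ b[3] ^ b[5] ^ b[7]
    s2 = b[2] ^ b[3] ^ b[6] ^ b[7]
    s3 = b[4] ^ b[5] ^ b[6] ^ b[7]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    stored = encode(data)      # [0, 1, 1, 0, 0, 1, 1]
    stored[4] ^= 1             # simulate one rotted bit (position 5)
    print(decode(stored) == data)  # True
```

    The price is 3 extra bits per 4 data bits, and two flipped bits in the same codeword would be miscorrected; stronger codes trade more redundancy for more correction power.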

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 12.3 x86_64 "Dartmouth" at Telcontar)

  4. #4
    Join Date
    Nov 2009
    Location
    West Virginia Sector 13
    Posts
    16,129

    Re: Bit rot is real

    If you see this a lot, then you should check the drive: smartctl -a /dev/sdX, where X is the drive letter.
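    For repeated checks, the attribute table printed by `smartctl -A /dev/sdX` (also part of `smartctl -a`) can be filtered with a short script. The sketch below assumes the usual ten-column ATA attribute layout and watches only a few commonly monitored counters; the function names are hypothetical, and NVMe drives, which report health differently, are not handled. It only shows what the drive itself has noticed, such as remapped or unreadable sectors.

```python
import subprocess
from typing import Dict

# Counters of sectors the drive has remapped or failed to read.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_attributes(text: str) -> Dict[str, int]:
    """Extract raw values of the watched attributes from `smartctl -A` output.

    Assumed layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    found = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in WATCHED:
            try:
                found[parts[1]] = int(parts[9])  # first token of RAW_VALUE
            except ValueError:
                pass  # vendor-specific raw format; skip it
    return found


def read_smart(device: str) -> str:
    """Run smartctl (normally needs root) and return its standard output."""
    # smartctl's exit status is a bit mask that can be non-zero even when the
    # output is usable, so check=True is deliberately not used here.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout
```

    For example, `parse_attributes(read_smart("/dev/sda"))` returns a dictionary of counters; any non-zero value is worth a closer look.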

  5. #5
    Join Date
    Feb 2009
    Location
    Spain
    Posts
    25,547

    Re: Bit rot is real

    On 2014-01-25 20:06, gogalthorp wrote:
    >
    > If you see this a lot then you should check the drive .smartctl -a
    > /dev/sdX where X is the drive number


    No, smartctl cannot detect this type of problem. It detects sectors
    that are completely damaged: sectors where you write something and read
    back something else entirely, every time you try.

    This problem is way more subtle, like a single bit changing in gigabytes
    of data. It simply cannot be detected unless the filesystem (or the
    hardware) is designed for it.
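    Roughly, "designed for it" means the filesystem stores a small checksum for every block, so a single flipped bit in gigabytes of data is both detected and localized to one block on the next read. Btrfs does this with CRC32C by default; the sketch below uses `zlib.crc32` from Python's standard library as a stand-in, and the 4 KiB block size and function names are illustrative assumptions.

```python
import os
import zlib
from typing import List

BLOCK = 4096  # illustrative block size


def block_checksums(data: bytes) -> List[int]:
    """CRC32 of each BLOCK-sized slice, computed when the data is written."""
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]


def bad_blocks(data: bytes, sums: List[int]) -> List[int]:
    """Indices of blocks whose current CRC no longer matches the stored one."""
    return [i for i, s in enumerate(block_checksums(data)) if s != sums[i]]


if __name__ == "__main__":
    data = bytearray(os.urandom(64 * BLOCK))
    sums = block_checksums(bytes(data))
    data[10 * BLOCK + 100] ^= 0x01  # one flipped bit inside block 10
    print(bad_blocks(bytes(data), sums))  # [10]
```

    A CRC catches every single-bit error, so the damaged block is always identified; what the filesystem can do about it depends on whether a second copy exists.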

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 12.3 x86_64 "Dartmouth" at Telcontar)

  6. #6
    Join Date
    Nov 2009
    Location
    West Virginia Sector 13
    Posts
    16,129

    Re: Bit rot is real

    Sure, bit rot happens, but it is rare; if you see it a lot, something in the hardware is going south.
    A bit change should still make a checksum fail, and that should be recorded by SMART.
    It never hurts to check in any case.

  7. #7

    Automation is needed

    Quote Originally Posted by gogalthorp
    Sure bit rot happens but it is rare if you see it a lot something in the hardware is going south.
    A bit change still should set a checksum wrong and that should be recorded by smart
    Never hurts to check in any case.

    I am looking for an automated system, so that I don't need to keep track of the checksums of data files myself. By data files I mean files that will not be modified after acquisition.

    smartctl does not read files, nor does it create and compare checksums.

  8. #8
    Join Date
    Feb 2009
    Location
    Spain
    Posts
    25,547

    Re: Bit rot is real

    On 2014-01-26 06:56, gogalthorp wrote:
    >
    > Sure bit rot happens but it is rare if you see it a lot something in the
    > hardware is going south.


    You will never know when this happens, so you cannot know how frequent
    it is.

    > A bit change still should set a checksum wrong and that should be
    > recorded by smart


    No, it is not. Or not always.

    In the article's demonstration, a single bit is changed in one side of
    the mirror using system tools. As far as the disk hardware is
    concerned, the data is absolutely correct, so no detection is possible.

    But the advanced features of btrfs detected and corrected the error
    automatically.

    That's the point.
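    The self-healing behaviour described above can be sketched as follows, assuming two mirrored copies plus a checksum stored apart from the data. The class and its methods are hypothetical and are not btrfs internals: on read, each copy is verified against the checksum, the first good copy is returned, and any copy that differs is rewritten from it. Plain mirroring without a checksum can only see that the two sides differ, not which one is correct.

```python
import zlib
from typing import List


class MirroredBlock:
    """A block stored on two 'disks' plus a separately kept checksum."""

    def __init__(self, data: bytes) -> None:
        self.copies: List[bytearray] = [bytearray(data), bytearray(data)]
        self.crc = zlib.crc32(data)

    def read(self) -> bytes:
        good = None
        for copy in self.copies:
            if zlib.crc32(bytes(copy)) == self.crc:
                good = bytes(copy)
                break
        if good is None:
            raise OSError("all copies fail the checksum; data is lost")
        # Repair: overwrite any copy that does not match the verified data.
        for i, copy in enumerate(self.copies):
            if bytes(copy) != good:
                self.copies[i] = bytearray(good)
        return good


if __name__ == "__main__":
    blk = MirroredBlock(b"important data")
    blk.copies[0][3] ^= 0x04  # silent bit flip on the first disk
    print(blk.read())                      # b'important data', from the second copy
    print(blk.copies[0] == blk.copies[1])  # True: the bad copy was repaired
```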

    > Never hurts to check in any case.


    Oh, I do. No errors.



    Actually, if you look carefully at the output from modern disks, you
    will see some figures that show the underlying error rate, which is
    pretty high. These errors are continuously corrected; the hardware is
    designed for this. The signals the head reads are simply very close to
    the noise level. With bigger disk sizes, the chances of errors that
    cannot be corrected (even with nothing else going wrong) increase.
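    A back-of-the-envelope calculation shows why disk size matters. It assumes the unrecoverable read error (URE) rate of 1 per 10^14 bits commonly quoted on consumer drive datasheets and treats errors as independent; both are simplifications, and the datasheet figure is usually stated as an upper bound, so real rates may well be lower. The chance of at least one URE when reading a whole disk is then 1 - (1 - r)^bits.

```python
import math


def p_unrecoverable(capacity_tb: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error over a full read.

    Assumes independent errors at a constant per-bit rate (a simplification).
    """
    bits = capacity_tb * 1e12 * 8  # decimal terabytes, as drive vendors count
    # 1 - (1 - r)^n, computed stably with log1p/expm1
    return -math.expm1(bits * math.log1p(-ure_per_bit))


for tb in (0.5, 1, 2, 4):
    print(f"{tb:>4} TB: {p_unrecoverable(tb):.1%}")
```

    Under these assumptions the probability roughly doubles with each doubling of capacity at first, from about 3.9% for 0.5 TB to about 27.4% for 4 TB.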


    --
    Cheers / Saludos,

    Carlos E. R.
    (from 12.3 x86_64 "Dartmouth" at Telcontar)

  9. #9

    Re: Bit rot is real

    ZStefan wrote:
    > I have recently read an article about bit rot:
    > http://tinyurl.com/nc5763k
    >
    > Bit rot is the phenomenon in hard drives when a bit arbitrarily switches
    > its value in a file on HD.
    >
    > The author says that bit rot is quite common:


    Err, this is all total rubbish.

    Yes magnetic media get bit errors. That's why magnetic media have used
    error-correction since the Ark. If a bit rots in a sector, the drive
    detects and corrects it and there is *no data corruption*. That's why
    filesystems haven't bothered much historically, because the medium is
    reliable.

    Sure, if you manually change a bit in a file, you will see that a bit
    has been changed. But if a bit arbitrarily changes state in a sector,
    the hardware will detect and correct it.

  10. #10

    Re: Bit rot is real

    Well, maybe error correction at the hardware or firmware level has its limits, since I have seen files with bit rot. I still keep one such file, a few GB in size, in which some bits are evidently wrong. All of these files were saved to HDs and read back years later. Unfortunately, I didn't even think of creating md5 sums of them immediately after acquisition.

    While a file's content can change for various reasons, in two cases I suspected a defect of magnetic origin and investigated. I checked the HDs in a variety of ways, from non-destructive software tools to the manufacturer's own diagnostic tools. All tests reported healthy HDs. How does one explain the bit rot, then? The data on the HDs was backed up regularly, but the content was never visually checked. I discarded the HDs.

    Another interesting statement from the article:

    "It's a common misconception to think that RAID protects data from corruption since it introduces redundancy. The reality is exactly the opposite: traditional RAID increases the likelihood of data corruption since it introduces more physical devices with more things to go wrong." The article explains this further, arguing that a non-catastrophic failure of one disk in a RAID array leads to data corruption.

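    The "more physical devices" part of that argument is simple probability. If each disk independently has some small chance p of returning silently corrupted data over a given period, the chance that at least one of n disks does so is 1 - (1 - p)^n, which grows with n. The value of p below is made up purely for illustration; it is not a measured rate.

```python
def p_any_corrupt(p: float, n: int) -> float:
    """Chance that at least one of n independent disks has a silent error."""
    return 1 - (1 - p) ** n


P = 0.01  # hypothetical per-disk probability, for illustration only
for n in (1, 2, 4, 8):
    print(n, round(p_any_corrupt(P, n), 4))
```

    For small p the result is close to n times p. Whether such an error then reaches the files depends on whether the array can tell which copy is good, which a traditional RAID without checksums cannot.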
    I don't use RAID, and I advise colleagues not to use RAID for storage and to use RAID0 only for speed. "For storage, use single disks and make backups instead of relying on RAID's abilities," I tell them. I have read somewhere that Google does not use RAID for data storage, but this information may be old or wrong.
