Bit rot is real
I have recently read an article about bit rot:
http://arstechnica.com/information-t...n-filesystems/
Bit rot is the phenomenon in which a bit in a file stored on a hard drive spontaneously switches its value.
The author says that bit rot is quite common:
"It's not uncommon at all to see five, or 10, or 50 checksum errors on a disk that's been in service for a few years"
And I am afraid I have encountered it many times. Some of the files had to be discarded.
Bit rot cannot be easily detected in ext4 and many other filesystems, since in most cases the filesystem's integrity is not violated. A backup made after the bit has flipped will naturally contain the corrupted file too, so unless special measures are taken, the corruption silently propagates into the backups.
The basic way to find out whether bit rot has taken place is to calculate the checksums of the file.
Now, I wonder whether there are tools to reveal and fight bit rot without switching to btrfs. Maybe some utility that calculates the md5sums of important files automatically and compares them from time to time?
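A minimal sketch of such a utility in Python, assuming one directory tree of files that should never change: it records an MD5 digest per file in a JSON manifest once, and later reports every file whose digest no longer matches. The script name, the init/check subcommands and the manifest layout are made up for this example.

```python
import hashlib
import json
import os
import sys

CHUNK = 1 << 20  # read 1 MiB at a time so multi-GB files need little memory


def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def build_manifest(root):
    """Map every file under root (by relative path) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest


def verify(root, manifest):
    """Return sorted relative paths that are missing or whose digest changed."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.exists(full) or md5_of_file(full) != expected:
            bad.append(rel)
    return sorted(bad)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: bitcheck.py init|check DIR MANIFEST.json
    cmd, root, manifest_path = sys.argv[1:4]
    if cmd == "init":
        with open(manifest_path, "w") as f:
            json.dump(build_manifest(root), f, indent=1, sort_keys=True)
    else:
        with open(manifest_path) as f:
            for rel in verify(root, json.load(f)):
                print("CHANGED:", rel)
```

Keep the manifest outside the checked directory (otherwise it checksums itself) and run the check periodically, e.g. from cron. From the shell, md5sum FILES > list followed later by md5sum -c list does the same comparison. Either way this only detects the damage; the good copy still has to come from a backup taken before the corruption.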
-
Re: Bit rot is real
ZStefan wrote:
>
> I have recently read an article about bit rot:
> http://tinyurl.com/nc5763k
The article probably blows things out of proportion.
--
GNOME 3.10.2
openSUSE 13.1 (Bottle) (x86_64) 64-bit
Kernel Linux 3.11.6-4-desktop
-
Re: Bit rot is real
On 2014-01-25 06:46, ZStefan wrote:
>
> I have recently read an article about bit rot:
> http://tinyurl.com/nc5763k
Interesting.
Two snags, though: it requires duplicating hardware, i.e., using the RAID
version of btrfs. And I have seen btrfs fail completely.
For many years it has also been possible to store data with enough
redundancy to recover damaged sectors, without using RAID. This is
routinely done for transmissions from remote space missions, which use a
single data stream (a retransmission, even if possible, could take hours
to request, so error detection alone is insufficient). It is called
forward error correction (FEC).
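To show the principle of forward error correction, here is a toy Hamming(7,4) code in Python: 3 parity bits are stored with every 4 data bits, which is enough to locate and flip back any single corrupted bit from one copy alone. This is only an illustration; real space links and disk firmware use much stronger codes (Reed-Solomon, for instance), and the function names are made up for this sketch.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold [p1, p2, d1, p4, d2, d3, d4]; each parity
    bit makes the XOR over the positions it covers equal to 0.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Repair at most one flipped bit; return (data_bits, error_position).

    error_position is the 1-based position that was fixed, or 0 if none.
    """
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # the syndrome spells out the bad position
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])  # [0, 1, 1, 0, 0, 1, 1]
    word[4] ^= 1                           # simulate bit rot at position 5
    print(hamming74_decode(word))          # ([1, 0, 1, 1], 5)
```

The price is storage: 3 extra bits for every 4, which is why practical codes protect much larger blocks at a far lower overhead.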
--
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
-
Re: Bit rot is real
If you see this a lot, then you should check the drive: smartctl -a /dev/sdX, where X is the drive letter.
-
Re: Bit rot is real
On 2014-01-25 20:06, gogalthorp wrote:
>
> If you see this a lot then you should check the drive .smartctl -a
> /dev/sdX where X is the drive number
No, smartctl cannot detect this type of problem. It detects sectors
that are completely damaged: sectors where you write something and read
back something else entirely, every time you try.
This problem is way more subtle, like a single bit changing in gigabytes
of data. It simply cannot be detected unless the filesystem (or the
hardware) is designed for it.
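What "designed for it" means can be sketched in a few lines: a checksumming filesystem such as btrfs keeps a checksum for every data block, apart from the data, and verifies it on each read. The Python below imitates that idea with zlib.crc32 (btrfs itself defaults to crc32c) over an assumed 4 KiB block size; one flipped bit in 16 MiB is caught and pinned to a single block.

```python
import zlib

BLOCK = 4096  # assumed block size for this sketch


def block_checksums(data):
    """CRC32 of every BLOCK-sized slice of data, to be stored separately."""
    return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]


def find_bad_blocks(data, stored):
    """Indices of blocks whose current CRC32 no longer matches the stored one."""
    return [i for i, crc in enumerate(block_checksums(data)) if crc != stored[i]]


if __name__ == "__main__":
    data = bytearray(16 * 1024 * 1024)  # 16 MiB of zeros stands in for a file
    stored = block_checksums(bytes(data))
    data[12_345_678] ^= 0x10            # flip a single bit
    print(find_bad_blocks(bytes(data), stored))  # [3014]
```

Detection only tells you which block is bad; repairing it still needs a second copy or redundant parity.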
--
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
-
Re: Bit rot is real
Sure, bit rot happens, but it is rare. If you see it a lot, something in the hardware is going south.
A bit change should still make a checksum fail, and that should be recorded by SMART.
It never hurts to check in any case.
-
Automation is needed
 Originally Posted by gogalthorp
Sure bit rot happens but it is rare if you see it a lot something in the hardware is going south.
A bit change still should set a checksum wrong and that should be recorded by smart
Never hurts to check in any case.
I am looking for an automated system, so that I don't have to keep track of the checksums of data files myself. By data files I mean files that are not going to be modified after acquisition.
smartctl does not read files, and it does not create or compare checksums.
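Since such data files are never supposed to change, any content change is suspect. A slightly smarter checker can still separate an intentional edit from a silent flip by also recording size and modification time: if the content changed while neither of those did, nothing wrote to the file through the filesystem. A hedged Python sketch (function names and record layout are hypothetical); it is only a heuristic, because copy tools that preserve timestamps can fool it.

```python
import hashlib
import os


def snapshot(path):
    """Record size, modification time (ns) and SHA-256 digest of a file."""
    st = os.stat(path)
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha256": h.hexdigest()}


def classify(old, new):
    """Compare two snapshots of the same file.

    Content that changed while size and mtime stayed identical points to
    silent corruption rather than a normal write.
    """
    if old["sha256"] == new["sha256"]:
        return "ok"
    if old["size"] == new["size"] and old["mtime_ns"] == new["mtime_ns"]:
        return "suspected bit rot"
    return "modified"
```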
-
Re: Bit rot is real
On 2014-01-26 06:56, gogalthorp wrote:
>
> Sure bit rot happens but it is rare if you see it a lot something in the
> hardware is going south.
You will never know when this happens, so you cannot know how frequent
it is.
> A bit change still should set a checksum wrong and that should be
> recorded by smart
No, it is not. Or not always.
In the article's demonstration, a single bit is changed on one side of
the mirror using system tools. As far as the disk hardware is
concerned, the data is absolutely correct, so it cannot detect anything.
But the advanced features of btrfs detected and corrected the error
automatically.
That's the point.
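The self-healing in that demonstration can be modelled in a few lines of Python. This is a toy under stated assumptions, not btrfs code: two in-memory "disks", one block, and a CRC32 kept apart from the data. On read, the copy whose checksum matches wins and the other side is rewritten from it; a plain mirror without checksums could only see that the copies differ, not which one is right.

```python
import zlib


def write_mirrored(block):
    """Store a block on two 'disks' and return them with the block's CRC32."""
    return [bytearray(block), bytearray(block)], zlib.crc32(block)


def read_and_heal(mirrors, checksum):
    """Return the data from a copy whose checksum matches; fix the others.

    Raises OSError when no copy matches the stored checksum.
    """
    good = next((m for m in mirrors if zlib.crc32(m) == checksum), None)
    if good is None:
        raise OSError("all copies are corrupt")
    for i, m in enumerate(mirrors):
        if zlib.crc32(m) != checksum:
            mirrors[i] = bytearray(good)  # rewrite the bad side from the good one
    return bytes(good)


if __name__ == "__main__":
    mirrors, csum = write_mirrored(b"important data" * 100)
    mirrors[1][42] ^= 0x04  # flip one bit on the second disk only
    read_and_heal(mirrors, csum)
    print(mirrors[0] == mirrors[1])
```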
> Never hurts to check in any case.
Oh, I do. No errors.
Actually, if you look carefully at the output from modern disks, you
will see some figures that show the underlying error rate, which is
pretty high. These errors are continuously corrected; the hardware is
designed for this. The signals the head reads are simply very close to
the noise level. With bigger disks, the chance of an error that cannot
be corrected (even with nothing else going wrong) increases.
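A back-of-the-envelope calculation shows why size matters. Assuming independent errors at a constant rate r of unrecoverable read errors per bit, the chance of at least one while reading n bits is 1 - (1 - r)^n. The default of one error per 10^14 bits below is an assumed spec-sheet style figure, not a number from this thread or a measurement.

```python
import math


def p_unreadable(capacity_tb, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error in one full read.

    Assumes independent errors at a constant per-bit rate (an assumption),
    so P = 1 - (1 - r)**n with n bits read.
    """
    bits = capacity_tb * 1e12 * 8  # decimal terabytes, as drives are sold
    # log1p/expm1 keep precision when r is tiny and n is huge
    return -math.expm1(bits * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    for tb in (0.5, 1, 4, 12):
        print(f"{tb:>4} TB: {p_unreadable(tb):.1%}")
```

Under those assumptions a full read of a 4 TB disk has roughly a 27% chance of hitting at least one uncorrectable error, against under 8% for 1 TB.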
--
Cheers / Saludos,
Carlos E. R.
(from 12.3 x86_64 "Dartmouth" at Telcontar)
-
Re: Bit rot is real
ZStefan wrote:
> I have recently read an article about bit rot:
> http://tinyurl.com/nc5763k
>
> Bit rot is the phenomenon in hard drives when a bit arbitrarily switches
> its value in a file on HD.
>
> The author says that bit rot is quite common:
Err, this is all total rubbish.
Yes magnetic media get bit errors. That's why magnetic media have used
error-correction since the Ark. If a bit rots in a sector, the drive
detects and corrects it and there is *no data corruption*. That's why
filesystems haven't bothered much historically, because the medium is
reliable.
Sure, if you manually change a bit in a file, you will see that a bit
has been changed. But if a bit arbitrarily changes state in a sector,
the hardware will detect and correct it.
-
Re: Bit rot is real
Well, maybe error correction at the hardware or firmware level has its limits, since I have seen files with bit rot. I still keep one such file, a few GB in size, in which some bits are evidently wrong. All the files were saved on HDs and then read back years later. Unfortunately, I didn't even think of creating md5 sums of them immediately after acquisition.
While a file's content can change for various reasons, in two cases I suspected a defect of magnetic origin and investigated. I checked the HDs in a variety of ways, from gentle, non-destructive tools to the manufacturer's own utilities. All tests showed healthy HDs. How does one explain bit rot then? The data on the HDs was backed up regularly, but the content was not visually checked. I discarded the HDs.
Another interesting statement from the article:
"It's a common misconception to think that RAID protects data from corruption since it introduces redundancy. The reality is exactly the opposite: traditional RAID increases the likelihood of data corruption since it introduces more physical devices with more things to go wrong." There is more explanation in the article, reasoning that non-catastrophic failure of a disk in RAID leads to data corruption.
I don't use RAID and advise colleagues not to use RAID for storage, and only to use RAID0 for speed. "For storage, use single disks and make backups instead of relying on RAID's abilities," I say to them. I have read somewhere that Google does not use RAIDs for data storage, but this information may be old or wrong.