
Thread: Hard drive failing with Btrfs error correction running. How to minimise damage?

  1. #1

    My desktop machine crashed last night. I left it doing an update, and the next day there was an amber warning light on the panel and my network was down. I looked at the network from another machine on the LAN and was told there was an address conflict.
    I decided to reboot, and the boot stalled on what the boot screen described as rebuilding a Btrfs file, or words to that effect. It had not corrected the errors after 9 minutes, so I chickened out and turned the machine off while I had a think. The network problem appears to have been a red herring caused by DHCP being messed up during boot, as the whole network was OK once the failing machine was turned off.

    It is clear that a hard drive is in trouble, but my problem is that my installation notes were lost in a house move. I believe there are possibly two RAIDs set up, but in hardware, so it is difficult to find out. As I recall, the first drives were set up with the root partition on Btrfs and the home partition on XFS. There was also a very large RAID drive just for data. I am sure the data partition is OK and it is fully backed up. What I am not so confident of is the home partition, and the backups I have on my NAS seem too old, so I would like to try to recover it if possible.

    My options appear to be:-
    Use GParted to interrogate the partitions to the extent possible, possibly with help here, to see what we have in the box.
    Look for warning lights on the drives themselves and, if only one is winking, clone it and see if it can be fixed on boot. (I have not seen any RAID warnings anywhere!)
    Reboot the machine, wait and pray.
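
    As a read-only alternative for seeing what is in the box, here is a minimal Python sketch that lists the block devices a Linux live system can see, with their sizes and sector sizes. It assumes the standard sysfs layout under /sys/block; the function name `list_block_devices` and its `sys_block` parameter are made up for illustration. It only reads attribute files and never touches the disks themselves.

```python
from pathlib import Path


def _read_int(path):
    """Return the integer stored in a sysfs attribute file, or None if absent."""
    if not path.is_file():
        return None
    return int(path.read_text().strip())


def list_block_devices(sys_block="/sys/block"):
    """Return (name, size_bytes, logical_sector, physical_sector) per device.

    Only sysfs attribute files are read; nothing is written anywhere.
    """
    devices = []
    for dev in sorted(Path(sys_block).iterdir()):
        size_units = _read_int(dev / "size")
        logical = _read_int(dev / "queue" / "logical_block_size")
        physical = _read_int(dev / "queue" / "physical_block_size")
        if size_units is None or logical is None or physical is None:
            continue  # skip entries that lack the expected attributes
        # The kernel reports "size" in 512-byte units whatever the sector size.
        devices.append((dev.name, size_units * 512, logical, physical))
    return devices
```

    Calling `list_block_devices()` from a live or rescue system returns one tuple per visible device. Behind a hardware RAID controller the array usually shows up as a single logical drive, so the individual member disks may not appear here at all.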

    I would be grateful for some guidance, please, as I am sure more damage will be done every time I try to boot.

  2. #2

    In reply to Budgie2 (post #1):
    Get a live or rescue system and follow https://en.opensuse.org/SDB:BTRFS
    AMD Athlon 4850e (2009), openSUSE 13.1, KDE 4, Intel i3-4130 (2014), i7-6700K (2016), i5-8250U (2018), AMD Ryzen 5 3400G (2020), openSUSE Tumbleweed, KDE Plasma 5

  3. #3

    In reply to karlmistelberger (post #2):
    Hi, and very many thanks for the link. Before I received your help I spent some time trying to research what I had done, and although my ms notes are lost I have some history on websites and was able to call up the RAID configuration GUI. It is quite difficult to follow, but it appears I have all eight 1 TB drives set up as a RAID 5 array and one has failed.

    In consequence the RAID has, I assume, been working in the background, but the system still allowed me to boot. At this point the Btrfs system must have had a fit chasing a moving system. So there is some fault tolerance built into Btrfs and also into RAID 5. What should I do next?
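
    To make the RAID 5 part of that fault tolerance concrete, here is a toy Python sketch of single-parity reconstruction, assuming plain XOR parity over one stripe of eight members. The block contents and the helper name `xor_blocks` are invented for illustration. Real controllers rotate parity across the members and work on far larger stripes, and nothing here is specific to the ServeRAID firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across 8 members: 7 data blocks plus 1 parity block.
data_blocks = [bytes([i] * 4) for i in range(1, 8)]
parity = xor_blocks(data_blocks)

# Simulate losing one member: its block is recomputed from the six
# surviving data blocks and the parity block.
lost_index = 3
survivors = data_blocks[:lost_index] + data_blocks[lost_index + 1:]
rebuilt = xor_blocks(survivors + [parity])

# RAID 5 gives up one member's worth of space to parity.
usable_tb = (8 - 1) * 1
```

    The recomputed block is byte-for-byte identical to the lost one, which is why an array with one failed member can keep serving data, and why a second failure before the replacement is in place would be fatal.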

    I have now identified the dead drive but am struggling to find an exact replacement. Just a thought: I have several similar-spec 2 TB drives available. How tolerant would the RAID controller be of a replacement which is larger, using only the space required? Any hardware specialists out there?

    Meanwhile I shall search for a replacement of the same type and size, but I don't hold out much hope, so I need to work out a plan for what to do next. Plenty of reading through the weekend, I fear.

  4. #4

    In reply to Budgie2 (post #3):
    Hi
    As long as the disk has the same sector size (e.g. 512 bytes logical/physical), create the partition with the same sector count as the smaller drive you're matching and it should be fine....
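
    The sector-matching check can be written out as a small Python sketch. The function name `matching_partition` is made up, and the sector counts are only the commonly seen LBA totals for 1 TB and 2 TB drives with 512-byte sectors, used here as illustrative figures; the real values should be read from the drives themselves. It ignores partition-table overhead and alignment, and a hardware controller may well handle the sizing on its own.

```python
def matching_partition(old_sectors, old_sector_size, new_sectors, new_sector_size):
    """Return the sector count to use on the larger disk, or raise if unsuitable."""
    if old_sector_size != new_sector_size:
        raise ValueError("sector sizes differ; sector counts are not comparable")
    if new_sectors < old_sectors:
        raise ValueError("replacement disk is smaller than the original")
    return old_sectors


# Illustrative figures only (assumed, not taken from the drives in this thread).
OLD_1TB_SECTORS = 1_953_525_168
NEW_2TB_SECTORS = 3_907_029_168

sectors_to_use = matching_partition(OLD_1TB_SECTORS, 512, NEW_2TB_SECTORS, 512)
unused_bytes = (NEW_2TB_SECTORS - sectors_to_use) * 512
```

    With these figures roughly half of the 2 TB drive would simply sit unused, which is the price of matching the smaller members.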

    A hindsight comment.... Remember RAID does not equal BACKUP/RESTORE... so you should have backup disk(s) that are the same size as or greater than your RAID configuration.... For me, when creating a RAID setup I normally purchase a spare disk to have around.
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    SUSE SLE, openSUSE Leap/Tumbleweed (x86_64) | GNOME DE
    If you find this post helpful and are logged into the web interface,
    please show your appreciation and click on the star below... Thanks!

  5. #5

    In reply to malcolmlewis (post #4):
    Hi Malcolm,
    I think the spare went to family some time ago! Fortunately I have found a new identical drive online at much less than the original price, and it should be here within a couple of days. Meanwhile I will leave the machine alone. When the disc arrives and I put it in, do I then issue a "rebuild" instruction or will it just work? I am terrified I will issue the wrong command and wipe the lot. So many choices/options in the RAID controller!

  6. #6

    In reply to Budgie2 (post #5):
    Hi
    I would check the array with mdadm (assuming you have used software RAID?), or look at using the YaST partitioner to remove the degraded drive and add the new one in...

  7. #7

    In reply to malcolmlewis (post #6):
    Hi Malcolm,
    No software RAIDs here. I am using IBM ServeRAID hardware.

  8. #8

    In reply to Budgie2 (post #7):
    Hi
    Then likely you need to perform removal/replacement at the RAID BIOS level, or does it support hotswap?

  9. #9

    In reply to malcolmlewis (post #8):
    Hi Malcolm,
    All singing and dancing, and I recall it does hot swap, so once I get the drive I will plug it in and try. The BIOS controller can use a CLI or a small GUI which runs from the BIOS, but I am spoiled for choice with options and am not confident with the terminology. The term "rebuild" could mean rebuilding the array, but does it keep the file system intact? In my distant memory, rebuild meant you lost the existing array, so I will not try that. I was hoping for repair!!!
    More when I get the drive.
    Regards,
    Budge

  10. #10

    In reply to Budgie2 (post #9):
    Hi
    I would suggest going in and removing the drive from the array, shutting down, replacing the drive, then bringing the system up and re-attaching the new drive to the array. Read the IBM documentation for the controller, but I would expect you need to format and rebuild, all via the controller, before starting the system.
