Forum Discussion

Sandshark
Sensei
Aug 12, 2024

Can you safely stop a re-sync?

While I've successfully halted the re-sync that happens as part of the ReadyNAS scrub (by echoing idle to the appropriate array's sync_action), I now find myself in a position that has caused others to lose their arrays, so I'm looking for advice.

One of the drives in my backup unit had >500 ATA errors (having previously had no SMART errors at all), which caused it to be kicked out of the array (XRAID 5). Unfortunately, XRAID then started a re-sync with that drive, which is going to last for days. I want to replace the drive, not try to get the clearly bad one back into the array. But pulling the drive is, as best I know, a bad idea, whether power is on or off. Everything I Google says it's a bad idea to do this for any MDADM RAID, not just a ReadyNAS one, since I don't know that the ATA errors didn't affect the data on one of the other drives (it was in the middle of a backup job when it happened).

So, is there a safe way to stop the re-sync before I do the swap, or am I better off letting it either fail or complete before I replace the drive? Given it's a backup, I can recover if I lose the array, but it's a lot of data to transfer.
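For reference, "echoing idle to sync_action" means writing to the md driver's sysfs control file, /sys/block/<array>/md/sync_action. Below is a minimal Python sketch of reading and writing it, assuming the data volume is md127 (a hypothetical name; /proc/mdstat shows the real one) and that it runs as root. As I understand the kernel's md documentation, writing "idle" stops the current resync but md may start one again later, while "frozen" is meant to keep it from restarting. The helper names are made up for illustration, and this is not a ReadyNAS-supported procedure.

```python
from pathlib import Path

# Subset of words the md driver accepts in sync_action, per the kernel's md docs.
# "idle" stops the running resync/recovery/check; "frozen" is documented as also
# keeping md from starting a new one automatically.
ACTIONS = {"idle", "frozen", "check", "repair"}


def sync_action_file(md_name: str, sysfs_root: str = "/sys") -> Path:
    """Path of the sync_action control file for an md array such as md127."""
    return Path(sysfs_root) / "block" / md_name / "md" / "sync_action"


def read_sync_action(md_name: str, sysfs_root: str = "/sys") -> str:
    """Current activity of the array, e.g. 'resync', 'recover', 'check' or 'idle'."""
    return sync_action_file(md_name, sysfs_root).read_text().strip()


def set_sync_action(md_name: str, action: str, sysfs_root: str = "/sys") -> None:
    """Same effect as: echo <action> > /sys/block/<md_name>/md/sync_action (run as root)."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported sync_action: {action!r}")
    sync_action_file(md_name, sysfs_root).write_text(action + "\n")
```

Typical use would be read_sync_action("md127") to confirm a resync is running, then set_sync_action("md127", "idle"). The sysfs_root parameter exists only so the functions can be exercised against a scratch directory instead of the live /sys tree.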

4 Replies

  • StephenB
    Guru - Experienced User

    Sandshark wrote:

    since I don't know that the ATA errors didn't affect the data on one of the other drives (it was in the middle of a backup job when it happened).

    Just to clarify this: when you are re-syncing a drive, its contents are being rebuilt from the other drives in the array. The data on the other drives won't be changed by the resync. It's the other way around.

    When data is written to a drive, the associated parity block on another drive needs to be updated too. If the parity blocks on the remaining drives in the array somehow didn't get updated when the problem drive dropped out, then there would be some data loss. But that damage (if it exists) has already been done. Letting the resync finish won't repair it.

    Sandshark wrote:

    So, is there a safe way to stop the re-sync before I do the swap, or am I better off letting it either fail or complete before I replace the drive?  Given it's a backup, I can recover if I lose the array, but it's a lot of data to transfer.


    I think you can safely pull the problem drive now, and then hot-insert the replacement when it arrives.  The main reason to consider doing that is that it reduces the stress on the remaining drives in the array.
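StephenB's parity point can be shown with a toy RAID 5 stripe. In RAID 5 the parity block is the byte-wise XOR of the stripe's data blocks, so any one missing block can be recomputed from the others; if a data block is rewritten but its parity update is lost, a later rebuild silently produces wrong data. This Python sketch is a simplified model (one stripe, no chunk size or parity rotation, made-up block contents), not mdadm's implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 parity for one stripe)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recompute a single missing data block from parity and the surviving blocks."""
    return xor_blocks(parity, *surviving)


# Toy stripe on a 4-disk RAID 5: three 4-byte data blocks plus parity.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1, d2)

# The disk holding d1 drops out: its block is rebuilt exactly.
assert rebuild_missing([d0, d2], parity) == d1

# d0 is rewritten, but the matching parity update never lands.
d0_new = b"ZZZZ"
rebuilt = rebuild_missing([d0_new, d2], parity)  # parity is now stale
print(rebuilt == d1)  # False: the "rebuilt" d1 is wrong
```

The failure is silent: the XOR still produces a block of the right size, and nothing in the parity itself says it is stale.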

    • Sandshark
      Sensei

      StephenB: So if the data on one of the other drives is corrupt, it won't try to recover from the rest of the array, including the drive it is re-syncing, even though the status is shown as "re-syncing", not "re-building", and the performance page only shows reads except during the period it ran the backups? Will it at least give me some kind of error that there is a parity issue? It may end up a moot point. The completion estimate updated, and it should complete before the replacement drive gets here and I can run it through some tests. The drives are all "low mileage", so the stress isn't an issue. This is a new situation for me. I typically replace drives before they create this kind of issue, and the couple I haven't replaced all died completely, with no re-sync attempted.

      ranaabid: This is my backup, and I'm trying to avoid the need to do a data recovery, which I can do from my (RAID6) primary NAS if needed. It'll just take a few days, during which I'll have no backup of the data not also backed up off-site. I only back up critical data off-site, and this backup includes all of my media files, which are obviously non-critical.

      • StephenB
        Guru - Experienced User

        Sandshark wrote:

        So if the data on one of the other drives is corrupt, it won't try to recover from the rest of the array, including the drive it is re-syncing, even though the status is shown as "re-syncing", not "re-building" and the performance page only shows reads except during the period it ran the backups?   

        With RAID 5/6 the mdadm resync assumes the data blocks are OK unless there is a read error. I believe it only rewrites parity blocks when it finds a discrepancy with no read error. That would explain the lack of writes.
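A toy model of the resync behaviour StephenB describes (he hedges it with "I believe", and this sketch is not md's actual code): every block in each stripe is read, parity is recomputed from the data blocks, and parity is written back only where it disagrees. That pattern would be consistent with a performance graph showing almost nothing but reads. The mismatch counter is loosely analogous to md's /sys/block/<array>/md/mismatch_cnt, which the kernel documents as a count of sectors rewritten (or, for "check", that would have been rewritten); that file may be the closest md gets to the "parity issue" report Sandshark asks about. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 parity for one stripe)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


@dataclass
class Stripe:
    data: list[bytes]
    parity: bytes


@dataclass
class ResyncStats:
    reads: int = 0
    writes: int = 0
    mismatches: int = 0  # counted per stripe here; md's mismatch_cnt counts sectors


def toy_resync(stripes: list[Stripe]) -> ResyncStats:
    """Trust the data blocks; rewrite parity only where it disagrees with them."""
    stats = ResyncStats()
    for stripe in stripes:
        stats.reads += len(stripe.data) + 1  # every data block plus the parity block
        expected = xor_blocks(*stripe.data)
        if expected != stripe.parity:
            stats.mismatches += 1
            stripe.parity = expected  # the only write this model ever issues
            stats.writes += 1
    return stats


# Three healthy stripes, then make one parity block stale.
stripes = [Stripe([b"AA", b"BB", b"CC"], xor_blocks(b"AA", b"BB", b"CC")) for _ in range(3)]
stripes[1].parity = b"\x00\x00"
print(toy_resync(stripes))  # ResyncStats(reads=12, writes=1, mismatches=1)
```

Reads scale with the whole array while writes scale only with the number of mismatched stripes, which is why a mostly consistent array shows a resync as a long read-only pass.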

        If enabled, Netgear's bit-rot protection should also kick in if there is an error in the file checksums, but that would only happen when the file is read later on (not by mdadm of course, since it is running below BTRFS).
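The bit-rot protection StephenB mentions works at the filesystem layer: a checksum is stored when a block is written and verified when it is read back, so corruption underneath goes unnoticed until something reads the file. A minimal Python sketch of that idea follows. BTRFS uses CRC32C by default; Python's standard library has no CRC32C, so zlib.crc32 stands in here, and the ToyStore class is purely illustrative (it only detects corruption, it does not model any repair step).

```python
import zlib


class ChecksumError(Exception):
    """Raised when a block's stored checksum no longer matches its contents."""


class ToyStore:
    """Checksums each block on write and verifies it on read (illustration only)."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.checksums: dict[int, int] = {}

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data
        self.checksums[block_no] = zlib.crc32(data)  # stand-in for BTRFS's CRC32C

    def read(self, block_no: int) -> bytes:
        data = self.blocks[block_no]
        if zlib.crc32(data) != self.checksums[block_no]:
            raise ChecksumError(f"block {block_no}: checksum mismatch")
        return data


store = ToyStore()
store.write(0, b"media file contents")
store.blocks[0] = b"media file c0ntents"  # simulate corruption below the filesystem
try:
    store.read(0)
except ChecksumError as err:
    print(err)  # block 0: checksum mismatch
```

Nothing complains when the block is altered; the error only surfaces on read, which matches StephenB's point that the checksum check happens later and not during the mdadm resync.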

        Not sure how that changes the calculus on whether you should pull the disk with errors now, or whether you should wait.  I guess if data is wrong on the disk that dropped out (but the parity block was calculated on the correct data), then updating the parity blocks would lock in the error.
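StephenB's "lock in the error" scenario, in the same toy XOR-parity terms: parity was computed from the correct data, so before any parity rewrite the good block can still be reconstructed from the other members; once a resync that trusts the data blocks recomputes parity from the bad block, that route is gone. The block contents are made up, and this models his hypothesis rather than confirmed md behaviour.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 parity for one stripe)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


d0, d1_good, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks(d0, d1_good, d2)  # parity computed from the correct data

d1_bad = b"BxBB"  # what the flaky drive actually holds

# Before any parity rewrite, the correct block is still recoverable from the others.
recovered_before = xor_blocks(parity, d0, d2)

# A resync that trusts the data blocks recomputes parity from the bad block...
parity = xor_blocks(d0, d1_bad, d2)

# ...after which the other members can only reproduce the bad data.
recovered_after = xor_blocks(parity, d0, d2)

print(recovered_before == d1_good, recovered_after == d1_bad)  # True True
```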

  • Hi Sandshark,

    Always make sure to keep a current backup before performing these kinds of operations to minimize data loss risk. If you're unsure or uncomfortable with stopping the re-sync, consulting with a data recovery specialist might be a wise choice.

    Hope this helps, and good luck with the drive replacement!
