
Forum Discussion

Chappy316
Aspirant
Nov 11, 2021

Chances of Recovery

Hoping to find some sliver of light at the end of what appears to be a very dark tunnel.

It appears that I have lost two of the four drives within a very short time, and I am wondering whether there is any chance of recovering anything that was on them.

 

I had a drive that started throwing errors. Unfortunately, I dragged my feet and did not replace it as promptly as I should have. This backup was not accessed very often, and within the last couple of weeks we lost two of the four drives within a few days of each other without knowing it.

 

Is there any way to get the drives running long enough to recover some or all of the data, so I can begin the rebuild that I should have started when I first noticed errors on the first drive?

 

This time around, if there is some way to restore the volume, even temporarily, I will buy an external backup drive and copy off the most important files first.

Thanks to everyone for any and all help. Sadly, I may have learned my lesson the hard way. Shame on me, but if this is the way it goes, at least I can say I learned.

23 Replies

Replies have been turned off for this discussion
  • Every situation is different. Problems like this have to be considered on a case-by-case basis; it depends on how badly the disks have failed, etc.

     

    In some cases, disk cloning tools designed specifically for failing disks can help.

  • It appears that Matt-I is having issues similar to mine. To avoid cluttering up his thread, I am following up here.

     

    Someone else tagged rn_enthusiast in his thread. I will try the suggestions he made there to see what we can come up with. Hopefully things work out for the best.

    • rn_enthusiast
      Virtuoso

      Hi Chappy316 

       

      Your raid is broken because you have a dual disk failure in a raid 5, which can only survive the loss of a single disk.
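To illustrate why one failed disk is survivable and two are not, here is a minimal Python sketch of RAID 5 style XOR parity. It is a simplification under stated assumptions: real RAID 5 implementations rotate parity across the members and work in much larger chunks, and the helper names `xor_blocks`, `make_stripe` and `rebuild` are hypothetical, made up for this example.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: Sequence[bytes]) -> List[bytes]:
    """Return the data blocks followed by one parity block (XOR of the data)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe: Sequence[Optional[bytes]], missing: Sequence[int]) -> List[Optional[bytes]]:
    """Recover missing stripe members from the survivors.

    XOR parity provides one equation per byte position, so at most one
    unknown (one missing member) can be solved for.
    """
    if len(missing) > 1:
        raise ValueError("parity can rebuild at most one missing member")
    rebuilt = list(stripe)
    if missing:
        survivors = [blk for i, blk in enumerate(stripe) if i not in missing]
        rebuilt[missing[0]] = xor_blocks(survivors)
    return rebuilt


# Four "disks": three data blocks plus one parity block, like a 4-bay stripe.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC"])

one_lost = list(stripe)
one_lost[2] = None  # a single failed disk
print(rebuild(one_lost, [2])[2])

two_lost = list(stripe)
two_lost[2] = two_lost[3] = None  # dual disk failure
try:
    rebuild(two_lost, [2, 3])
except ValueError as err:
    print("cannot rebuild:", err)
```

With one member missing, XORing the three survivors reproduces it (the first print shows `b'CCCC'`). With two members missing there is one parity equation and two unknowns, which is why the volume is declared dead rather than degraded.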

       

      Disk 3 started to show signs of failure back in December 2020. I don't think you were notified, as it seems the alert system failed to send you messages (it was either not configured or misconfigured).

      [20/12/01 22:47:30 EST] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.

      The logs show that disk 3 got steadily worse throughout the year until it was eventually kicked from the raid.

      [21/06/13 17:25:27 EDT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
      [21/06/13 17:25:31 EDT] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from ONLINE to FAILED.

      In July, it appears the disk was pulled from the bay and re-added, which initiated a raid resync.

      [21/07/08 19:23:58 EDT] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD4000FYYZ-01UL1B2 Serial:WD-WMC130D08XJ7 was removed from Channel 3 of the head unit.
      [21/07/08 19:26:45 EDT] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD4000FYYZ-01UL1B2 Serial:WD-WMC130D08XJ7 was added to Channel 3 of the head unit.
      [21/07/08 19:26:58 EDT] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.

      However, the disk was too far gone and the resync never completed. From that point on, the raid remained non-redundant.

      Then in October, disk 4 was kicked out of the raid and the volume was declared "dead" (dual disk failure).

      [21/10/04 18:03:11 EDT] notice:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Degraded to Dead.
      [21/10/04 18:04:09 EDT] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 4 (Internal) changed state from ONLINE to FAILED.
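If you want to pull a timeline like the one above out of your own logs, a minimal Python sketch could filter for the message codes quoted in this thread. It assumes every line has exactly the shape shown above (`[YY/MM/DD HH:MM:SS TZ] level:facility:CODE text`, with the year first, as the December 2020 entry suggests). The regex, the `timeline` helper and the chosen set of codes are illustrative assumptions, not a NETGEAR tool, and where the log file lives on the NAS is not covered here.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches lines shaped like the excerpts quoted in this thread, e.g.
# [21/06/13 17:25:27 EDT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health ...
LINE_RE = re.compile(
    r"^\[(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tz>\w+)\] "
    r"(?P<level>\w+):(?P<facility>\w+):(?P<code>\w+) (?P<text>.*)$"
)

# Message codes that appear in this thread and matter for a failing array.
INTERESTING = {
    "LOGMSG_SENT_ALERT_MESG_FAILED",
    "LOGMSG_HEALTH_VOLUME",
    "LOGMSG_ZFS_DISK_STATUS_CHANGED",
    "LOGMSG_DELETE_DISK",
    "LOGMSG_ADD_DISK",
    "LOGMSG_RESILVERSTARTED_VOLUME",
}


class Event(NamedTuple):
    date: str  # converted to YYYY-MM-DD, assuming YY/MM/DD in the log
    time: str
    level: str
    code: str
    text: str


def timeline(lines: Iterable[str]) -> List[Event]:
    """Keep only the interesting, well-formed log lines, in their original order."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["code"] not in INTERESTING:
            continue
        yy, mm, dd = m["date"].split("/")
        events.append(Event(f"20{yy}-{mm}-{dd}", m["time"], m["level"], m["code"], m["text"]))
    return events


sample = [
    "[21/06/13 17:25:27 EDT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.",
    "[21/10/04 18:04:09 EDT] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 4 (Internal) changed state from ONLINE to FAILED.",
]
for e in timeline(sample):
    print(e.date, e.time, e.level, e.text)
```

Lines that do not match the pattern are skipped silently, so a different log format would simply produce an empty timeline rather than an error.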

      Disk 4 is not 100% healthy, but not too bad either. It has 19 bad sectors, and I don't see any real complaints about the disk in the kernel logs. However, it is still not a healthy disk, and it clearly encountered a failure on the 4th of October.

       

      Below is the current state of your disk 3 and disk 4.

      ---> Disk 3
      Device:             sdb
      Controller:         0
      Channel:            2
      Model:              WDC WD4000FYYZ-01UL1B2
      Serial:             WD-WMC130D08XJ7
      Firmware:           01.01K03
      Class:              SATA
      RPM:                7200
      Sectors:            7814037168
      Pool:               data-0
      PoolType:           RAID 5
      PoolState:          5
      PoolHostId:         2fe5a296
      Health data 
        ATA Error Count:                2554
        Reallocated Sectors:            1300
        Reallocation Events:            125
        Spin Retry Count:               0
        Current Pending Sector Count:   864
        Uncorrectable Sector Count:     148
        Temperature:                    45
        Start/Stop Count:               33
        Power-On Hours:                 36077
        Power Cycle Count:              33
        Load Cycle Count:               3
      
      
      ---> Disk 4
      Device:             sdc
      Controller:         0
      Channel:            3
      Model:              WDC WD4000FYYZ-01UL1B2
      Serial:             WD-WMC130D3MD5Z
      Firmware:           01.01K03
      Class:              SATA
      RPM:                7200
      Sectors:            7814037168
      Pool:               data-0
      PoolType:           RAID 5
      PoolState:          5
      PoolHostId:         2fe5a296
      Health data 
        ATA Error Count:                0
        Reallocated Sectors:            0
        Reallocation Events:            0
        Spin Retry Count:               0
        Current Pending Sector Count:   19
        Uncorrectable Sector Count:     19
        Temperature:                    44
        Start/Stop Count:               32
        Power-On Hours:                 37414
        Power Cycle Count:              32
        Load Cycle Count:               3
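The comparison between the two disks above can be made explicit with a small Python sketch that lists every nonzero media-error counter. The field names mirror the "Health data" labels, but the `DiskHealth`/`warning_signs` names are hypothetical, and flagging any nonzero value is a deliberate simplification, not a NETGEAR or WD threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiskHealth:
    """A few of the counters from the 'Health data' sections above."""
    name: str
    ata_errors: int
    reallocated: int
    pending: int
    uncorrectable: int


def warning_signs(d: DiskHealth) -> List[str]:
    """Describe every nonzero media-error counter (simplified: any nonzero value is flagged)."""
    checks = [
        ("reallocated sectors", d.reallocated),
        ("pending sectors", d.pending),
        ("uncorrectable sectors", d.uncorrectable),
        ("ATA errors", d.ata_errors),
    ]
    return [f"{count} {label}" for label, count in checks if count > 0]


# Values copied from the Disk 3 and Disk 4 health data above.
disk3 = DiskHealth("Disk 3", ata_errors=2554, reallocated=1300, pending=864, uncorrectable=148)
disk4 = DiskHealth("Disk 4", ata_errors=0, reallocated=0, pending=19, uncorrectable=19)

for disk in (disk3, disk4):
    print(disk.name, "->", ", ".join(warning_signs(disk)) or "no warning signs")
```

Disk 3 trips all four counters, while disk 4 only shows its 19 pending/uncorrectable sectors, which matches the reasoning for cloning disk 4 rather than relying on disk 3.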

       

      Disks 1 and 2 are healthy as they are. Disk 3 is likely a write-off at this stage (but keep it until the data is recovered). The best option here is to clone disk 4 and use the healthy clone to re-assemble the raid. That should absolutely be possible, given that disk 4 is not completely dead, and I see no sign that it is.


      My advice would be to opt for paid data recovery support from Netgear and let them help clone disk 4 and re-assemble the raid. You could possibly even re-assemble the raid manually with the current disk 4, but I would not risk it, as it is not a fully healthy disk. I think a paid support contract is worth it here, as the chances of a successful recovery are very high, in my opinion.

       

      I would also advise getting the alerts set up and working. That way you will be notified by email when disks are failing and won't end up in the same situation again. Backups are of course also important, and I am sure many people here on the forum can give advice on that and on which strategies they use.

       

       

      Cheers

      • Chappy316
        Aspirant

        Hey rn_enthusiast,

         

        For starters, thank you very much for the help and insight to hopefully resolving this problem.

         

        Just some follow up then a couple questions.

        Disk 3 was removed from the array in July, with some guidance from other users on this forum, to try to jump-start it back to life. To make the array fully redundant again, it was suggested that I start looking at replacement/upgrade options. I did not realize that the resync never fully completed. While I was deciding on a replacement, we unfortunately ended up where we are now.

        In the future, I'm guessing your suggestion would be not to pull a drive that is possibly (or, from what it looked like, likely) failing, to avoid the risk of breaking the array?

        Also, I am getting an external backup solution (some sort of external USB drive) for the most sensitive files on the array. Is there a brand or size you would recommend? I was looking at a WD Essentials, as I have always used their internal drives in personal builds and never had any major issues.

         

        So, a couple of questions on trying to recover what is left.

        What is the process for going through NetGear paid support to clone Disk 4 (and/or Disk 3, for that matter), and do you have a rough idea of the cost? (I know you are a former employee, so it's just a question, nothing I would hold you to. Just looking for a rough idea.)

        Is the cloning process something I could do myself at home? If so, would that be a faster and cheaper first attempt at fixing the array? Also, could I cause any additional damage by making an initial attempt at home and then having to fall back on NetGear?

        When cloning Disk 4 (and/or Disk 3), either at home or with NetGear, can the drive size be upgraded at the same time? The reason I originally came here was to look into expanding the size of the array. They are currently 4 TB drives; could I clone one or both of them to larger drives and restart the array with more space? The ultimate plan is to upgrade the whole array, but I was cautioned against it in July. The state of Disk 3 scared off a couple of users, who feared that another drive in the array might be close to end of life. Looks like they were right.

        If upgrading the drive size is not an option, would I need to buy a 4 TB drive that "matches" what is currently in the array, or would any drive of similar size be acceptable to clone to, regardless of brand or model?
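For a plain RAID 5 set, the usual rules are that usable space is (number of disks - 1) times the smallest disk, and that a sector-for-sector clone target needs at least as many sectors as the source, so "similar" is not enough if the new drive is even slightly smaller. A minimal Python sketch of both rules follows. It models plain RAID 5 only, not NETGEAR's X-RAID expansion behaviour; the helper names are hypothetical, and the 512-byte logical sector size is an assumption consistent with the sector count reported for these 4 TB drives.

```python
# Assumption: 512-byte logical sectors (7814037168 x 512 bytes is about 4.0 TB).
SECTOR_BYTES = 512
WD4000FYYZ_SECTORS = 7814037168  # from the disk details earlier in the thread


def raid5_usable(member_sizes):
    """Usable capacity of a plain RAID 5 set, in the same unit as the inputs.

    One member's worth of space holds parity, and every member only
    contributes as much as the smallest member does.
    """
    if len(member_sizes) < 3:
        raise ValueError("RAID 5 needs at least three members")
    return (len(member_sizes) - 1) * min(member_sizes)


def can_clone_onto(source_sectors, target_sectors):
    """A sector-for-sector clone fits only if the target is at least as large."""
    return target_sectors >= source_sectors


four_tb_set = [WD4000FYYZ_SECTORS] * 4
# Hypothetical: one member replaced by a drive twice as large.
mixed_set = [WD4000FYYZ_SECTORS] * 3 + [2 * WD4000FYYZ_SECTORS]

print(raid5_usable(four_tb_set) * SECTOR_BYTES)
print(raid5_usable(mixed_set) * SECTOR_BYTES)
```

Both lines print 12002361090048 bytes (about 12 TB): in plain RAID 5, a single larger clone adds no space until every member has been upgraded.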

         

        As far as alerts go, apparently they don't want to play well with Gmail, or I am missing something simple. I tried setting them up with my account so I can receive them, but Google won't allow the simple one-button sign-in, even after I turned on access for less secure apps. Entering the email credentials manually doesn't help either, as it throws an SMTP error when sending a test message.
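One way to tell whether the SMTP error comes from the NAS or from the account is to try the same credentials from a computer. Below is a minimal sketch using Python's standard `smtplib`, assuming Gmail's usual submission server `smtp.gmail.com` on port 587 with STARTTLS. The helper names are hypothetical, and if the account uses 2-Step Verification, Google generally expects an app password rather than the normal account password.

```python
import smtplib
import ssl
from email.message import EmailMessage

# Assumed Gmail submission settings (STARTTLS on port 587).
SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 587


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text test message, similar in spirit to the NAS test alert."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SMTP test"
    msg.set_content("If you can read this, the SMTP credentials work.")
    return msg


def send_test(sender: str, password: str, recipient: str) -> None:
    """Log in and send one message.

    Raises smtplib.SMTPAuthenticationError if the credentials are refused,
    other smtplib.SMTPException subclasses for protocol errors, and OSError
    for network problems.
    """
    msg = build_test_message(sender, recipient)
    context = ssl.create_default_context()
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT, timeout=30) as server:
        server.starttls(context=context)
        server.login(sender, password)
        server.send_message(msg)
```

Calling `send_test("you@gmail.com", "your-app-password", "you@gmail.com")` (placeholder values) from a computer either delivers the message or raises an exception whose text usually explains why the login was rejected; if it works there but not on the NAS, the problem is more likely in the NAS alert settings.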

         

        Again, thank you for any and all insight into this.

        Chris
