Forum Discussion
halbertn
Mar 27, 2024 (Aspirant)
RN 316 - Raid 5 config, Data Volume Dead
Hello. I have a ReadyNAS 316 with 6 HDDs running in a RAID 5 configuration. The other day disk 2 went from ONLINE to FAILED and the system entered a degraded state, which forced a resync. While it was resyncing, disk 3 also failed and went from ONLINE to FAILED. That caused disk 2 to change from RESYNC to ONLINE, but the volume as a whole moved from degraded to failed, and the system now reports "Volume: Volume data is Dead." At this point I have powered down the system.
These hard drives are 5-6 years old and non-enterprise. In hindsight, I should *not* have allowed the resync to occur; I should have backed up the NAS first. Resyncing the RAID array taxed disk 3, which then failed and took down the whole volume, a sequence I'm learning is quite common. At this point I have lost 2 of my 6 disks, which makes recovery via the RAID 5 redundancy alone impossible.
I believe the data on disk 2 is gone. However, it may still be possible to recover disk 3 despite it being marked FAILED.
Based on my preliminary research, what I need to do is clone each drive to preserve whatever data is recoverable. If I'm lucky and can access and clone disk 3 (along with disks 1, 4, 5, and 6) without running into other issues, I *should* be able to rebuild/recover the RAID array, since I would then have enough drives to do so.
I may consider professional data recovery, but I first want to assess whether I could do this on my own and what the costs would look like. I know professional recovery can be very expensive and is not guaranteed.
Now I need to formulate a plan. But some questions first:
- The RAID 5 configuration was set up using NETGEAR's X-RAID. Does this mean rebuilding the array outside of NETGEAR's ReadyNAS hardware is impossible? I think X-RAID is proprietary to NETGEAR?
- Can I say with certainty that disk 2 is useless for data recovery, and therefore skip cloning it? It's the drive that FAILED first, which started the resync that never completed. I assume whatever data it held for recovering the RAID array is gone or unusable after the incomplete resync. Am I wrong?
- Cloning every drive to preserve its data would require purchasing 6 new drives of older models that are compatible with the ReadyNAS 316. Before I make that investment, I want some confidence or indication that the data is recoverable. What's the best way to start? I'm thinking I might just attempt to clone disk 3, then put the clone in its place in the ReadyNAS 316.
Appreciate all help!
PS. Pasting the last batch of logs that led to this failure state:
Mar 27, 2024 01:00:17 AM Volume: Volume data is Dead.
Mar 26, 2024 10:23:57 PM Disk: Detected increasing ATA error count: [334] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0, WD-WMC4N1002183] 2 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 10:22:45 PM Disk: Disk in channel 2 (Internal) changed state from RESYNC to ONLINE.
Mar 26, 2024 10:22:34 PM Disk: Disk in channel 3 (Internal) changed state from ONLINE to FAILED.
Mar 26, 2024 10:22:34 PM Volume: Volume data health changed from Degraded to Dead.
Mar 26, 2024 08:17:57 PM Disk: Detected increasing reallocated sector count: [1967] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 53 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 08:15:46 PM Disk: Detected increasing reallocated sector count: [1959] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 52 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 08:09:11 PM Disk: Detected increasing reallocated sector count: [1954] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 51 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 08:00:48 PM Disk: Detected increasing reallocated sector count: [1948] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 50 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 07:59:22 PM Disk: Detected increasing reallocated sector count: [1923] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 49 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 07:56:36 PM Disk: Detected increasing reallocated sector count: [1758] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 48 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 07:55:23 PM Disk: Detected increasing reallocated sector count: [1757] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 47 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 07:52:13 PM Disk: Detected increasing reallocated sector count: [1718] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 46 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 07:50:55 PM Disk: Detected increasing reallocated sector count: [1713] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 45 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 07:48:37 PM Disk: Detected increasing reallocated sector count: [1696] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 44 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 06:54:00 PM Disk: Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WMC4N0956058 was added to Channel 2 of the head unit.
Mar 26, 2024 06:53:43 PM Disk: Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WMC4N0956058 was removed from Channel 2 of the head unit.
Mar 26, 2024 06:52:29 PM Disk: Detected increasing reallocated sector count: [1663] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 43 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 06:52:28 PM Disk: Disk in channel 2 (Internal) changed state from ONLINE to FAILED.
Mar 26, 2024 06:52:22 PM Volume: Volume data health changed from Redundant to Degraded.
Mar 26, 2024 06:44:27 PM Disk: Detected increasing reallocated sector count: [1439] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 42 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 06:43:14 PM Disk: Detected increasing reallocated sector count: [1407] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 41 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 06:13:26 PM Disk: Detected increasing reallocated sector count: [1361] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 40 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 04:37:29 PM Disk: Detected increasing reallocated sector count: [1360] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 39 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 04:20:57 PM Disk: Detected increasing reallocated sector count: [1326] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 38 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 04:18:32 PM Disk: Detected increasing reallocated sector count: [1325] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 37 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 03:37:44 PM Disk: Detected increasing reallocated sector count: [1317] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 36 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 01:38:10 PM Disk: Detected increasing reallocated sector count: [1314] on disk 3 (Internal) [WDC WD30EFRX-68EUZN0 WD-WMC4N1002183] 35 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
Mar 26, 2024 11:21:25 AM Volume: Scrub started for volume data.
Mar 26, 2024 10:12:34 AM Volume: Disk test failed on disk in channel 5, model WDC_WD3003FZEX-00Z4SA0, serial WD-WCC130725037.
Mar 26, 2024 10:12:33 AM Volume: Disk test failed on disk in channel 3, model WDC_WD30EFRX-68EUZN0, serial WD-WMC4N1002183.
Mar 26, 2024 10:12:33 AM Volume: Disk test failed on disk in channel 2, model WDC_WD30EFRX-68EUZN0, serial WD-WMC4N0956058.
18 Replies
halbertn wrote:
- The RAID 5 configuration was set up using NETGEAR's X-RAID. Does this mean rebuilding the array outside of NETGEAR's ReadyNAS hardware is impossible? I think X-RAID is proprietary to NETGEAR?
X-RAID is built on Linux software RAID (mdadm). All X-RAID adds is software that automatically manages expansion.
Rebuilding the array and mounting the data volume is certainly possible on a standard Linux PC that has mdadm and btrfs (the file system used by the NAS) installed.
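To give a rough idea, this is approximately what that looks like on a Linux system. The device names, and the assumption that the data array lives on the third partition of each disk, are mine - check with lsblk first:

sudo mdadm --examine /dev/sd[b-g]3          # read-only; shows each member's role, state, and event count
sudo mdadm --assemble --readonly --run /dev/md127 /dev/sd[b-g]3   # assemble read-only from the members you trust
sudo mkdir -p /mnt/data
sudo mount -o ro /dev/md127 /mnt/data       # mount the btrfs volume read-only and copy the data off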
halbertn wrote:
However, it may still be possible to recover disk 3 despite it being marked FAILED.
Mar 26, 2024 08:17:57 PM Disk: Detected increasing reallocated sector count: [1967] on disk 3 (Internal)
This says ~2000 failed sectors on disk 3. There could easily be more that haven't been detected yet.
Plus you have these errors:
Mar 26, 2024 10:12:34 AM Volume: Disk test failed on disk in channel 5, model WDC_WD3003FZEX-00Z4SA0, serial WD-WCC130725037.
Mar 26, 2024 10:12:33 AM Volume: Disk test failed on disk in channel 3, model WDC_WD30EFRX-68EUZN0, serial WD-WMC4N1002183.
Mar 26, 2024 10:12:33 AM Volume: Disk test failed on disk in channel 2, model WDC_WD30EFRX-68EUZN0, serial WD-WMC4N0956058.
suggesting that disk 5 also is at risk.
I've helped quite a few people deal with failed volumes over the years. Honestly your odds of success aren't good.
BTW, did you download the full log zip file from the NAS before you powered it down?
halbertn wrote:
Based on my preliminary research, what I need to do is clone each drive to preserve whatever data is recoverable. If I'm lucky and can access and clone disk 3 (along with disks 1, 4, 5, and 6) without running into other issues, I *should* be able to rebuild/recover the RAID array, since I would then have enough drives to do so.
When you try to recover from failing disks, the recovery can stress the disks - which can push them to complete failure. The benefit of cloning is that it can limit that additional damage.
There is a downside. RAID recovery can only help recover/repair failed sectors when it can detect which sectors have failed. Cloning hides that information. A failed sector in the source will result in garbage data on the corresponding sector of the clone, and there is no way RAID recovery software can tell that the sector on the clone is bogus.
Still, in your case cloning makes sense.
There is a related path with similar plusses and minuses. That is to image the disks, and then work with the images. That can be done in a ReadyNAS VM or using RAID recovery software. If you have a large enough destination disk, it can hold multiple images.
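If you go the image route, the images can later be attached as loop devices on a Linux PC and handled just like disks. File names here are only placeholders:

sudo losetup -fP --show /mnt/bigdisk/disk1.img   # attach an image; prints the loop device, e.g. /dev/loop0
lsblk /dev/loop0                                 # the ReadyNAS partitions show up as loop0p1, loop0p2, loop0p3
# repeat for each image, then assemble the data partitions (loopNp3) with mdadm, read-only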
halbertn wrote:
Can I say with certainty that disk 2 is useless for data recovery, and therefore skip cloning it?
It's hard to say whether disk 2 is in worse shape than disk 3 or not. The fact that the system started to resync means that disk 2 came back online after the degraded message.
It is possible that disk 2 has completely failed, but you can't conclude that yet.
halbertn wrote:
Cloning every drive to preserve its data would require purchasing 6 new drives of older models that are compatible with the ReadyNAS 316.
No. It would be a big mistake to try to purchase old hard drive models from the HCL (hardware compatibility list).
Current Seagate IronWolf drives and WD Red Plus drives are all compatible with your NAS. So are all enterprise-class drives.
Avoid desktop drives and WD Red models, even if they are on the compatibility list. Most desktop drives in the 2-6 TB size range are now SMR drives that are not good choices for your NAS. Regrettably current WD Red drives are also SMR (but not Red Plus or Red Pro). These drives often have very similar model numbers to older CMR drives on the compatibility list.
Plus in recent years Netgear really didn't test all the drives they added to the HCL. Many were added based on tests of different models that Netgear believed were similar.
Bottom line: you can't trust the HCL, and it is best ignored. Just get WD Red Plus, Seagate IronWolf, or enterprise-class disks.
halbertn wrote:
Before I make that investment, I want some confidence or indication that the data is recoverable. What's the best way to start? I'm thinking I might just attempt to clone disk 3, then put the clone in its place in the ReadyNAS 316.
I'd start with asking yourself how much your data is worth to you. That would establish a ceiling on how much you are willing to pay to recover it.
It is also worth asking yourself what you will do if the recovery fails. If you plan to bring the NAS back on line anyway (starting fresh), then investing in new disks wouldn't be wasted money, as you need to replace the failed disks anyway.
As far as where to start - I'd connect each drive to a PC (either with a USB adapter/dock or with SATA) and test it with WD's dashboard utility. That will give you more information on how many disks are failing. Label the disks by slot number as you remove them from the NAS.
As I mentioned above, I think successful recovery will be problematic even for a professional. In addition to dealing with multiple disk failures, it sounds like you have no experience with data recovery. How much experience do you have with the Linux command-line interface? Have you ever worked with mdadm or btrfs from the command line? It is very easy to do more damage (making recovery even more difficult) if you don't really know what you are doing.
halbertn wrote:
These hard drives are 5-6 years old and non-enterprise. In hindsight, I should *not* have allowed the resync to occur; I should have backed up the NAS first. Resyncing the RAID array taxed disk 3, which then failed and took down the whole volume, a sequence I'm learning is quite common. At this point I have lost 2 of my 6 disks, which makes recovery via the RAID 5 redundancy alone impossible.
Did you do something to trigger the resync? Or did it happen on its own?
FWIW, I think the lesson here is that RAID isn't enough to keep data safe - you definitely need a backup plan in place to do that. Disks can fail without warning (as can a NAS or any storage device). Waiting until there's a failure to make a backup often means waiting too long.
- halbertn (Aspirant)
The data is all personal/unique and important enough to attempt recovery. Not the end of the world if it is lost, but valuable enough to try. My thinking is to spend a little on an initial but safe evaluation of my drives. If it does look possible to proceed with data recovery, I'll pause and decide whether to continue on my own or consult a professional, based on their price estimates.
I was unable to pull the latest logs from the RN 316 admin dashboard. Although I was able to navigate around the dashboard, each time I tried to download the logs, I was presented with a generic web service error. I don’t recall the error…I want to say it was a 503 error code. The web service was not able to respond to my request to download the logs.
Question: are you familiar enough with the internals of how the RN 316 stores and manages its logs? My thinking is that it should have its own internal storage to collect logs (rather than using the HDDs that make up the RAID array). If so, I could remove and label all the drives, power up the RN, connect to it, and attempt to either download the logs from the web portal or SSH in and grab the logs from the terminal. What do you think? Is this worth pursuing?
I do, however, have logs from a day earlier, prior to the failure of the NAS. It's worth noting from those logs that disk 3 has multiple warning reports.
Yes, the resync happened on its own. A scheduled scrub was running, which caused disk 2 to "lose sync" and automatically triggered the resync. That is what led to disk 3's reported failure and then the data volume dying.
Question: since the NAS reported that disk 2 was offline and in a resync state, does this imply that any data on that disk needed to preserve the data volume has been lost or become invalid? It had only been resyncing for about 3 of the 27 hours it estimated for completion. I'm asking because I want to know whether I can even consider disk 2 as a candidate for restoring the data from the NAS.
I don't have experience in data recovery. I am very comfortable working in a Linux environment and on the command line. I'm not experienced with mdadm or btrfs, so I would need to study up on those.
The total volume size was about 18 TB. I do see 20+ TB drives on the market that are reasonably priced. I'd prefer to proceed with your suggestion of imaging each drive and storing the images on a single large hard drive before attempting the data recovery process. At least I will then have an image of each drive saved and can decide how to proceed with data recovery using the original hard drives.
Questions:
- You suggested I connect each drive to a PC running Western Digital's Dashboard utility to check the status of the drive. Are you referring to this: https://support-en.wd.com/app/answers/detailweb/a_id/31759/~/download%2C-install%2C-test-drive-and-update-firmware-using-western-digital
- Assuming the tool recognizes the connected drive from the NAS, what should I be looking for?
- I understand I can run various tests, but I’m not sure I want to run them at the risk of stressing the drives further.
Here’s my proposed plan. Please let me know what you think:
- Purchase a 20 TB+ hard drive for storing the image of each cloned drive. Even if data recovery is impossible, I can use this drive for other purposes, so I'm willing to make this expense up front.
- Connect disk 3 (the drive that was in a FAILED state) to a PC running WD Dashboard.
- If the drive is recognized by the dashboard, proceed with cloning the drive.
- Question: I'm learning that ddrescue for Linux is the best tool for cloning. Assuming I get this far, I'd need to reboot the machine into my Linux environment. Is this the right tool and environment to use for cloning? Alternatively, I could set up ddrescue-gui in advance to be prepared to do the clone in a Windows environment.
Thank you!
halbertn wrote:
I understand I can run various tests, but I'm not sure I want to run them at the risk of stressing the drives further.
That is a risk. You could limit the test to the short self-test (which normally only takes about 2 minutes). If that fails, the drive has certainly failed. Though success wouldn't mean the drive is OK, given the brevity of the test.
You can run essentially the same test using smartctl on a Linux system.
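For reference, something along these lines (sdX is whatever the drive shows up as on your PC):

sudo smartctl -t short /dev/sdX      # start the ~2 minute short self-test
sudo smartctl -l selftest /dev/sdX   # check the result once it has finished
sudo smartctl -A /dev/sdX            # attributes: watch Reallocated_Sector_Ct and Current_Pending_Sector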
halbertn wrote:
Question: are you familiar enough with the internals of how the RN 316 stores and manages its logs?
Many of the files in the zip are extracted from the systemd journal. There are some others - mdstat.log is a copy of /etc/mdstat, lsblk is just the output of lsblk, and so on. If you can log in via SSH (or tech support mode), you can create the zip file manually with rnutil create_system_log -o </path/filename>. Use root as the username when you log in.
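For example (the output path below is just an illustration; any location you can copy from afterwards will do):

ssh root@<nas-ip>                                  # use the admin password; SSH has to be enabled in the admin UI first
rnutil create_system_log -o /root/system_log.zip   # then copy the zip off with scp or via a share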
halbertn wrote:
Question: since the NAS reported that disk 2 was offline and in a resync state, does this imply that any data on that disk needed to preserve the data volume has been lost or become invalid? It had only been resyncing for about 3 of the 27 hours it estimated for completion. I'm asking because I want to know whether I can even consider disk 2 as a candidate for restoring the data from the NAS.
It depends on whether you were doing a lot of other writes to the volume during the resync. If the scrub/resync is all that was happening, then the data being written to the disk would have been identical to what was already on the disk. Likely there was some other activity. But I think it is reasonable to try recovery with disk 2 if you can clone it. There is no harm in trying.
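One way to judge how far behind disk 2 is - once the disks (or their clones) are attached to a Linux PC - is to compare the md superblocks, which is read-only and doesn't touch the data. The partition number is my assumption; check with lsblk:

sudo mdadm --examine /dev/sdX3 | grep -E 'Update Time|Events|Device Role|Array State'
# a member whose Events count is well behind the others dropped out earlier and holds stale data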
halbertn wrote:
Question: I'm learning that ddrescue for Linux is the best tool for cloning. Assuming I get this far, I'd need to reboot the machine into my Linux environment. Is this the right tool and environment to use for cloning? Alternatively, I could set up ddrescue-gui in advance to be prepared to do the clone in a Windows environment.
You would need to install ddrescue. In general, installing stuff on the NAS will require some changes to the apt config.
More info on that is here:
Or you could just boot the PC from a Linux live-boot flash drive.
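If you do go the ddrescue route from a live environment, the basic invocation looks something like this. Device and file names are assumptions - triple-check which argument is the source and which is the destination before running it:

sudo apt install gddrescue     # Debian/Ubuntu package name; the binary it installs is ddrescue
sudo ddrescue -d -r3 /dev/sdX /mnt/bigdisk/disk3.img /mnt/bigdisk/disk3.map
# -d reads the source with direct access, -r3 retries bad areas three times,
# and the map file lets you stop and resume without starting over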