
Re: ReadyNAS RN10400 - one disk failure quickly leads to array loss and admin account lockout

EMddx
Aspirant

ReadyNAS RN10400 - one disk failure quickly leads to array loss and admin account lockout

Hello all,

   I've had my ReadyNAS RN10400 for about 4 years, and have been happy with it until today. My concern is not with the failed hard drive (I know they will all fail eventually), but with how quickly the system devolved into data loss. The firmware was the latest, updated about three weeks ago. The array is RAID5 with four 2 TB SATA drives. If I did not have multiple backups of the NAS data, I'd be quite upset by now.

   Further, I'm a bit upset that the NetGear support lasts for 90 days. That seems extremely short, considering these devices are purpose built to hold your family files / pictures / videos for years. 

   I work in IT, and have been server / network admin for decades. I understand disk arrays, but what I saw today was a bit crazy.

 

Timeline:

1) Three weeks ago, I updated the firmware to the latest version and uploaded some GoPro videos from holiday season events.
2) Two weeks ago, I noticed a failed hard drive reported in slot 3, so I ordered two more drives of the exact same make, model, and drive firmware.
3) The drives arrived today, and I powered up my NAS in preparation for replacing the slot 3 disk.
4) I was temporarily happy when I saw the slot 3 drive reporting as 'healthy' and the logs showing the array status changing from Degraded to Healthy (following a 40 minute resync). 'Great!' I say, 'problem solved'. Not quite! It goes downhill from here.
5) After viewing some old recordings, with everything working fine, I noticed that the drive in slot 2 was now reported as 'failed'. Yikes! Now I'm worried, because I've got two questionable drives out of a total of four.
6) I made my first questionable call and decided to replace the slot 2 disk with one of my new drives.
7) The logs showed the resync moving forward; all was well, and I just had to wait another 30 minutes.
8) Of course, the slot 1 disk now says failed too! The array is now 'offline'. I did a shutdown via the software button, put the original slot 2 disk back in place of the new one, and powered up.
9) The system took 10 minutes to finish startup, and in the device's web interface I saw the entire array shown in red. Five minutes later, there was a message about removing old or failed volumes (the web interface now shows two volumes instead of one, but both have the red dot of failure). Lastly, when I try to log on again, my local 'admin' account does not work at all.

Talk about a house of cards! NetGear tech support, do you have any advice for me? If anyone else has something to try, let me know.

    I'm close to doing a full reset procedure on the ReadyNAS, then checking all the hard drives in depth, creating a new RAID5 array, and uploading all the videos, etc. from the recovery disks.

    thanks for your attention,    - E.M.

 

 

Model: RN10400|ReadyNAS 100 Series 4- Bay (Diskless)
Message 1 of 6
Marc_V
NETGEAR Employee Retired

Re: ReadyNAS RN10400 - one disk failure quickly leads to array loss and admin account lockout

@EMddx

 

Welcome to the Community!

 

Sorry to hear what happened. Glad to hear as well that you have a backup of your data somewhere other than the NAS, which is the best protection.

 

It seems your disks all met the same fate at the same time, which is very unfortunate. Were you able to download the logs from your NAS? If you contact Support for assistance, a multiple-drive failure may require Data Recovery services. There is also a chance that the array only hit a few bumps and went out of sync, in which case remounting or rebuilding it is possible, provided the disks are still healthy.

 

If you want to DIY: since the re-sync of Disk 3 completed, Disk 2 or Disk 1 might need to be cloned to get the array back to degraded mode. You could then add a new drive and re-sync to get back to normal RAID 5 protection. Alternatively, the easiest method would be to replace the failed drives and do a Factory reset then send the backup data back to the NAS.

 

Cloning can be done thru SSH or using apps recommended by the Community. 

 

HTH

 


Regards

Message 2 of 6
StephenB
Guru

Re: ReadyNAS RN10400 - one disk failure quickly leads to array loss and admin account lockout


@Marc_V wrote:

Alternatively, the easiest method would be to replace the failed drives and do a Factory reset then send the backup data back to the NAS.

 


That's what I would do.  Even if you do manage to repair the array, there could be some file system corruption that would be hard to find.  That's especially likely if you are cloning drives with disk errors.

 

But first I'd download the log zip file from the NAS, and look at the SMART stats for each disk.  The built-in thresholds that the NAS uses to report disk issues are much higher than I like.  While looking at that, check system.log and kernel.log for disk or btrfs errors.  I'd also test the drives by connecting them to a Windows PC (either with SATA or a USB adapter/dock), and test them with vendor tools (SeaTools for Seagate; Lifeguard for Western Digital).  I recommend starting with the long non-destructive test, and following up with the write zeros test (called "erase" in one of those tools).
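As a rough illustration of that first pass over the logs, here is a minimal Python sketch. It assumes the SMART attributes are available in the table layout printed by `smartctl -A` (the layout of the files inside the ReadyNAS log zip may differ), and the watched attribute names, the zero limit, and the log keywords are illustrative choices of this sketch, not values taken from the thread.

```python
import re

# Attributes often watched for early signs of trouble. This set and the
# default limit of 0 are assumptions of this sketch, not NAS thresholds.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

# Hypothetical keyword list for log lines that deserve a closer look.
LOG_PATTERN = re.compile(r"btrfs.*error|i/o error|medium error", re.IGNORECASE)


def suspicious_smart_attributes(smartctl_output, limit=0):
    """Return {attribute: raw_value} for watched attributes above `limit`."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the tenth column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > limit:
            flagged[name] = int(raw)
    return flagged


def suspicious_log_lines(log_text):
    """Return the lines of a log (e.g. kernel.log) that match the keywords."""
    return [line for line in log_text.splitlines() if LOG_PATTERN.search(line)]
```

Passing the text of kernel.log or system.log to `suspicious_log_lines` gives a short list to read by hand; it is a filter for attention, not a verdict on disk health.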

 

After you've verified disk health for the disks you want to use, proceed with the factory install, reconfigure the NAS, and restore the data from the backup.  BTW, the factory install will happen automatically if you've erased the disks.

 


@Marc_V wrote:

 

It seems that your disks were having the same fate, it's just that they all failed at the same time which is very unfortunate,

Yes, this does happen more often than many people think.  One aspect is that the drives are generally the same model, installed at the same time, and are subject to identical loads and environmental conditions.

 

Another (more relevant in my opinion) is that bad sectors are only found when those sectors are read or written.  In most NAS, a lot of the data isn't accessed that often, so there can be latent problems that aren't detected for a long time.  Then you try to do a resync (which reads or writes every sector in the volume), and discover that there's a problem.
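To put a rough number on why a resync is where such problems surface, here is a back-of-the-envelope Python sketch. It assumes a simple model in which every bit read fails independently at a fixed unrecoverable-read-error rate; the 1-in-10^14 figure is a commonly quoted consumer-drive specification used here purely as an assumption, and real latent sector errors are not independent, so treat the result as an illustration only.

```python
import math


def rebuild_error_probability(surviving_disks, disk_bytes, error_rate_per_bit):
    """Chance of at least one unrecoverable read error while reading every
    surviving disk in full, under an independent per-bit error model."""
    bits_read = surviving_disks * disk_bytes * 8
    # 1 - (1 - rate) ** bits_read, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-error_rate_per_bit))


# Four-bay RAID5 with 2 TB disks: a rebuild reads the three surviving disks.
p = rebuild_error_probability(surviving_disks=3, disk_bytes=2e12,
                              error_rate_per_bit=1e-14)  # assumed spec rate
print(f"{p:.0%}")  # prints "38%"
```

Under these assumptions roughly one rebuild in three would hit at least one unreadable sector, which fits the point that rarely read data can hide problems until every sector has to be read.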

 

On this last possibility - there are maintenance tasks you can schedule for the NAS on the volume tab.  That includes a disk test, as well as a scrub (which is a good disk exerciser).  It's also good to run a balance from time to time (not to verify disk health). With BTRFS, free space can remain allocated - which makes it not available.  Running a balance will (among other things) deallocate that free space.
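On the NAS itself these tasks are scheduled from the volume tab. For readers curious what the scrub and balance correspond to on a generic Linux system with btrfs, the following Python sketch builds the equivalent `btrfs` command lines. The `/data` mount point and the 50% usage filter are assumptions of this sketch, and it only prints the commands unless `dry_run` is turned off.

```python
import subprocess


def btrfs_maintenance_commands(mount_point="/data", data_usage_percent=50):
    """Build scrub and balance commands for a btrfs mount point.

    mount_point and data_usage_percent are illustrative defaults.
    """
    # Scrub reads all data and metadata and verifies checksums (-B: foreground).
    scrub = ["btrfs", "scrub", "start", "-B", mount_point]
    # Balance with a usage filter rewrites only data chunks that are at most
    # this full, which releases allocated but mostly empty chunks.
    balance = ["btrfs", "balance", "start",
               f"-dusage={data_usage_percent}", mount_point]
    return [scrub, balance]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(btrfs_maintenance_commands())
```

The dry-run default is deliberate: both operations are long-running and I/O heavy, so it is worth seeing exactly what would be executed first.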

 

Message 3 of 6
EMddx
Aspirant

Re: ReadyNAS RN10400 - one disk failure quickly leads to array loss and admin account lockout

Thanks very much for your input and reply, sorry for the slow response!  Turns out both "replacement" drives had similar, very intermittent issues that would only show up in extended testing. I'm working on getting replacements, and being careful to not even look at my backups until I have the NAS ready to accept data transfer. Best Wishes!

Message 4 of 6
EMddx
Aspirant

Re: ReadyNAS RN10400 - one disk failure quickly leads to array loss and admin account lockout

Thanks again, it's looking like bad drives are the culprit, not NetGear's fault at all.

Message 5 of 6
StephenB
Guru

Re: ReadyNAS RN10400 - one disk failure quickly leads to array loss and admin account lockout


@EMddx wrote:

Thanks again, it's looking like bad drives are the culprit, not NetGear's fault at all.


Thanks for following up.

Message 6 of 6