ReadyNas 104 Disk Errors

briskik
Aspirant

I have had a ReadyNAS 104 for several years, and it has worked really well. Last week I logged into it, saw disk errors in the logs, and have monitored it closely over the past few days. The errors started on Disk 1, and now I am seeing errors on Disk 3. The Disk 3 errors have been ongoing for the past few days, while Disk 1 seems to have quieted down. I have 4 x 3TB drives in a RAID 5, and with 2 drives throwing errors, I am concerned that I may lose the RAID.

All 4 disks are still green and online. I'm on firmware version 6.9.3.

The drives have ATA errors and rewritten sectors.

Here are my disk errors:

Disk: Detected high uncorrectable error count: [12264] on disk 3 (Internal) [ST3000DM001-1CH166, W1F5QEWW]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Disk: Detected increasing pending sector: count [12264] on disk 3 (Internal) [ST3000DM001-1CH166, W1F5QEWW] 679 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Any advice? Do I let the drives fail on their own or do something manually?

Model: RN104|ReadyNAS 100 Series 4-Bay
Message 1 of 16
StephenB
Guru

Re: ReadyNas 104 Disk Errors


@briskik wrote:

Any advice? Do I let the drives fail on their own or do something manually?

I'd back up the data first, and then replace disk 3. If that works without data loss, then replace disk 1.

I recommend NAS-purposed drives for the RN104 (not desktop drives). So if you prefer Seagate, replace them with 3 TB IronWolf drives (ST3000VN007). The Western Digital equivalent is the WD30EFRX.

Since you have a pair of disks to replace, you could also take the opportunity to increase your storage by getting larger drives. For instance, if you purchased a pair of 8 TB NAS-purposed drives, your storage would increase from 9 TB (~8.2 TiB) to 14 TB (~12.7 TiB).
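The 9 TB and 14 TB figures can be checked with a short calculation. Below is a minimal Python sketch, assuming an X-RAID-style layered volume with single redundancy: each size "layer" spans every disk at least that large, one disk's worth of each layer goes to parity or mirroring, and a layer present on only one disk is unused. The function names are hypothetical, and this is not NETGEAR's code.

```python
# Hypothetical helpers; assumes an X-RAID-style layered volume with single redundancy.
TB = 10**12   # drive vendors use decimal terabytes
TIB = 2**40   # operating systems often report binary tebibytes


def layered_capacity_tb(disk_sizes_tb):
    """Usable capacity of a single-redundancy layered volume.

    Each size layer spans every disk at least that large. With two or more
    member disks, one disk's worth of the layer goes to redundancy (RAID-5
    parity for 3+ disks, a mirror for 2). A layer on only one disk is unused.
    """
    sizes = sorted(disk_sizes_tb)
    usable = 0
    previous = 0
    for i, size in enumerate(sizes):
        layer = size - previous      # extra space this disk adds over the smaller ones
        members = len(sizes) - i     # disks that are at least this large
        if layer > 0 and members >= 2:
            usable += layer * (members - 1)
        previous = size
    return usable


def tb_to_tib(tb):
    """Convert decimal terabytes to binary tebibytes."""
    return tb * TB / TIB


for disks in ([3, 3, 3, 3], [3, 3, 8, 8]):
    cap = layered_capacity_tb(disks)
    print(f"{disks}: {cap} TB (~{tb_to_tib(cap):.1f} TiB)")
```

Under the same assumption, replacing only one disk with a larger one adds nothing (`[3, 3, 3, 8]` still gives 9 TB), which is why the suggestion is to buy larger drives as a pair.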

 

Message 2 of 16
briskik
Aspirant

Re: ReadyNas 104 Disk Errors

Thanks for the quick reply!

I have most of the important data backed up, but not all of it. When I try to browse my share to back up everything, it times out. When I try to log into the UI, it sits on the NETGEAR ReadyNAS page for over 10 minutes, then moves to the "Connecting to the ReadyNAS Admin Page..." page, with a blue bar sliding across as if it is trying to load. Once the bar completes, it says the ReadyNAS Admin Page is offline.

Last night I pressed the power button twice, let it gracefully power down, and powered it back up this morning.

How do you recommend replacing drives in this condition? I'm unable to log into the UI at the moment. If I physically pull out a drive that is still online, I don't think I'll be able to log in to manage the replacement drive, add it to the RAID, and start the rebuild.

Message 3 of 16
Sandshark
Sensei

Re: ReadyNas 104 Disk Errors

What is the status on the display and in RAIDar? Do you have SSH enabled so you can check the status that way?

It sounds likely that drive 3 is making it hard or impossible for the NAS to boot properly. You could remove drive 3 and try booting in read-only mode to find out, but doing a bit more sleuthing beforehand is a better approach. Definitely do not remove drive 3 and just boot normally. If you guess wrong, your volume will likely be destroyed.

Message 4 of 16
briskik
Aspirant

Re: ReadyNas 104 Disk Errors

I'm fortunate enough to be able to log back in occasionally and sporadically. The status on the LCD display is normal, with no errors. When I log into the UI, it shows the status of the NAS as green, and all disks are green. I can't use Windows Explorer to browse the share anymore. I checked the logs, and over the past couple of days only Disk 3 has been reporting errors.

I'm sure I can enable SSH and connect with PuTTY. I've never SSH'd into it before; what should I be looking for?

Current summary: I have a 4 x 3TB RAID 5 with disk 3 throwing lots of errors, the UI only sporadically lets me in (and responds very slowly when it does), I have a replacement drive in hand, and I'm unable to browse the network shares.

Message 5 of 16
Sandshark
Sensei

Re: ReadyNas 104 Disk Errors

Probably not the best time to learn Linux commands, but cat /proc/mdstat will show you the status of the arrays, and top will show you which processes are running (and whether any are taking a large chunk of CPU time). smartctl --all /dev/sda (followed by sdb, sdc, and sdd) will give detailed SMART data.
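For someone new to /proc/mdstat, the key part is the status line: something like [4/4] [UUUU] means all four members are up, and an underscore marks a missing or failed member. During a rebuild, a progress line such as "recovery = 14.3%" also appears. The Python sketch below pulls those fields out. It assumes the usual Linux md layout of the file; parse_mdstat and missing_slots are hypothetical helper names, and the position of an underscore is the md role number, which is not guaranteed to match the NAS bay number.

```python
import re

# Array header, e.g. "md127 : active raid5 sda3[0] sdb3[1] ..."
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)(?:\s+\([^)]*\))?(?:\s+(raid\d+|linear))?")
# Member status, e.g. "... algorithm 2 [4/3] [UU_U]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Rebuild progress, e.g. "[==>....]  recovery = 14.3% (...)"
PROGRESS_RE = re.compile(r"(recovery|resync|reshape|check)\s*=\s*([\d.]+)%")


def parse_mdstat(text):
    """Return {array: {state, level, total, active, slots, rebuild}} from /proc/mdstat text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            arrays[current] = {"state": header.group(2), "level": header.group(3)}
            continue
        if current is None:
            continue
        status = STATUS_RE.search(line)
        if status:
            arrays[current].update(
                total=int(status.group(1)),   # members the array should have
                active=int(status.group(2)),  # members currently working
                slots=status.group(3),        # "U" = up, "_" = missing/failed
            )
        progress = PROGRESS_RE.search(line)
        if progress:
            arrays[current]["rebuild"] = (progress.group(1), float(progress.group(2)))
    return arrays


def missing_slots(array):
    """md role numbers marked "_" (these need not match the NAS bay numbers)."""
    return [i for i, c in enumerate(array.get("slots", "")) if c == "_"]
```

On the NAS itself, parse_mdstat(open("/proc/mdstat").read()) would apply it to the live file, assuming a Python interpreter is installed there; otherwise the output of cat /proc/mdstat can be copied to a PC and parsed there.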

Message 6 of 16
briskik
Aspirant

Re: ReadyNas 104 Disk Errors

First, I want to thank everyone in the community for being so helpful and responsive!

The UI has become much more responsive, and I was able to back up the random things I had forgotten. So I should now have all the data on the NAS backed up.

Current status: 2 of the 4 disks in my RAID 5 are throwing errors, but the system is not failing them out. The UI is much more responsive. The system is "green", the RAID data volume is online, and all 4 disks are online. The CIFS share is only intermittently responsive, so I'm unable to stream content off the NAS as I normally do.

I'm just concerned that as time passes, both of these drives will keep throwing errors and fail at the same time. The errors started about 2 weeks ago, and I'm living on borrowed time. I'm hoping only one fails at a time, giving me time to replace it and let the RAID rebuild.

 

Any advice for my situation?

 

Here are my last ~15 log messages (newest first); the same error messages go back for weeks:

Tue Jul 9 2019 6:59:07
Disk: Detected high command timeouts: [270] on disk 3 (Internal) [ST3000DM001-1CH166, W1F5QEWW]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 6:40:33
Disk: Detected high command timeouts: [269] on disk 3 (Internal) [ST3000DM001-1CH166, W1F5QEWW]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 6:34:10
Disk: Detected increasing ATA error count: [194] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 65 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 6:34:10
Disk: Detected increasing ATA error count: [194] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 65 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 6:34:10
Disk: Detected high uncorrectable error count: [4704] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 6:34:10
Disk: Detected increasing pending sector: count [4704] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 187 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 6:01:11
Disk: Detected high command timeouts: [268] on disk 3 (Internal) [ST3000DM001-1CH166, W1F5QEWW]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:52:22
Disk: Detected increasing ATA error count: [193] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 64 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:52:22
Disk: Detected increasing ATA error count: [193] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 64 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:52:22
Disk: Detected high uncorrectable error count: [4728] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:52:22
Disk: Detected increasing pending sector: count [4728] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 186 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:46:18
Disk: Detected increasing ATA error count: [192] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 63 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:46:18
Disk: Detected increasing ATA error count: [192] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 63 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:46:18
Disk: Detected high uncorrectable error count: [4736] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

Tue Jul 9 2019 5:46:18
Disk: Detected increasing pending sector: count [4736] on disk 1 (Internal) [ST3000DM001-1CH166, W1F5QFS2] 186 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
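With this many near-identical lines, it can help to pull the counters out per disk. Below is a minimal Python sketch; the regular expression is written against the message wording quoted in this thread (ReadyNAS OS 6.9.3) and may not match other firmware versions, and summarize_disk_log is a hypothetical helper. Keying on the serial number as well as the bay number helps make sure the right physical disk gets pulled.

```python
import re
from collections import defaultdict

# Written against the ReadyNAS OS 6.9.x messages quoted in this thread;
# other firmware versions may word them differently (assumption).
LOG_RE = re.compile(
    r"Detected (?P<kind>[\w ]+?):? (?:count )?\[(?P<count>\d+)\] "
    r"on disk (?P<disk>\d+) \(\w+\) \[(?P<model>[^,\]]+), (?P<serial>[^\]]+)\]"
)


def summarize_disk_log(text):
    """Map (bay, serial) -> {message kind: [counts in the order they appear]}."""
    summary = defaultdict(lambda: defaultdict(list))
    for m in LOG_RE.finditer(text):
        key = (int(m["disk"]), m["serial"])
        summary[key][m["kind"]].append(int(m["count"]))
    # Convert the nested defaultdicts to plain dicts for printing/comparison.
    return {key: dict(kinds) for key, kinds in summary.items()}


if __name__ == "__main__":
    sample = (
        "Disk: Detected high command timeouts: [270] on disk 3 (Internal) "
        "[ST3000DM001-1CH166, W1F5QEWW]. This condition often indicates an impending failure.\n"
        "Disk: Detected increasing pending sector: count [4704] on disk 1 (Internal) "
        "[ST3000DM001-1CH166, W1F5QFS2] 187 times in the past 30 days.\n"
    )
    for (bay, serial), kinds in sorted(summarize_disk_log(sample).items()):
        print(f"disk {bay} ({serial}): {kinds}")
```

Because this log lists the newest entries first, the first count in each list is the most recent reading.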

 

 

Message 7 of 16
StephenB
Guru

Re: ReadyNas 104 Disk Errors

FWIW, the ST3000DM001 disks have a reputation for working badly in RAID arrays. Quite a few issues with them have been reported here over the years. So over time I suggest replacing all of them with NAS-purposed drives (WD Red or Seagate IronWolf). They tend to work out much better.

But disk 1 and disk 3 are the immediate concerns. Both should be replaced; I wouldn't wait for them to completely die. I'm not sure which one is more at risk, since both have thousands of pending sectors/uncorrectable errors. So back up everything you can while you are waiting for the replacement disks.
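To see those counts directly rather than through the NAS log, the attribute table printed by smartctl -A /dev/sda (also part of the --all output) can be scanned for the attributes commonly watched on failing disks: 5, 187, 188, 197 and 198. Below is a minimal Python sketch. It assumes the standard ATA attribute table layout, with the raw value in the tenth column; the watch list is a common convention rather than a NETGEAR threshold, and which SMART attribute the NAS log's "uncorrectable error count" is derived from is not confirmed in this thread.

```python
import re

# SMART attributes commonly watched on a failing disk (IDs and names as smartctl prints them).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    188: "Command_Timeout",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into {id: {name, raw}}."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th and may be followed by extra text such as "(0 16 0 0 0)".
        if len(parts) >= 10 and parts[0].isdigit():
            raw = re.match(r"\d+", parts[9])
            attrs[int(parts[0])] = {
                "name": parts[1],
                "raw": int(raw.group()) if raw else None,
            }
    return attrs


def nonzero_watched(attrs):
    """Return {name: raw} for watched attributes whose raw value is above zero."""
    return {
        name: attrs[i]["raw"]
        for i, name in WATCHED.items()
        if i in attrs and attrs[i]["raw"]
    }
```

Running it over each of sda through sdd and comparing the results side by side gives a rough sense of which disk is in worse shape.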

 

One thing you could try that might work out better than two resyncs:

When you get the two replacements, you can try cloning disks 1 and 3 to the new ones. You'd need to connect two disks to a PC to do that (either via SATA or using USB adapters/docks). You'd use cloning software that supports sector-by-sector copying (CloneZilla is one).
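Sector-by-sector cloning of a failing disk only helps if unreadable sectors are skipped rather than aborting the whole copy. The Python sketch below is a simplified, hypothetical illustration of that idea, not how CloneZilla works internally: it copies in fixed-size blocks, zero-fills any block whose read raises OSError, and returns the offsets of those blocks so you know where damage may be. Dedicated tools such as GNU ddrescue handle this far more carefully (retries, splitting failed areas into smaller reads, a map file). FlakySource is only a test double that simulates a disk with bad sectors.

```python
import io


def clone_with_skips(src, dst, size, block_size=4096):
    """Copy `size` bytes from src to dst (seekable binary file objects) block by block.

    A block whose read raises OSError is written as zeros instead of aborting
    the copy; its starting offset is returned so the damage can be located.
    """
    bad_offsets = []
    offset = 0
    while offset < size:
        length = min(block_size, size - offset)
        src.seek(offset)
        try:
            data = src.read(length)
        except OSError:
            bad_offsets.append(offset)
            data = b""
        # Pad an unreadable (or short) block with zeros so later blocks keep their offsets.
        data += bytes(length - len(data))
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_offsets


class FlakySource(io.BytesIO):
    """Test double: an in-memory "disk" whose reads fail at the given offsets."""

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad = set(bad_offsets)

    def read(self, size=-1):
        if self.tell() in self.bad:
            raise OSError("simulated unreadable sector")
        return super().read(size)
```

The zero-filled blocks are exactly where files on the clone would end up corrupted, which is why a read-only boot and a fresh backup are suggested afterwards.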

 

Then you'd power down the NAS, insert the two clones, and boot the system up in read-only mode. There will likely be some file corruption (some sectors probably won't copy correctly), but you could then do a backup (living with some damaged files). Follow that up with a factory default to build a clean file system, and restore the files from the backup.

 

Message 8 of 16
briskik
Aspirant

Re: ReadyNas 104 Disk Errors

Thanks for the advice. I've been monitoring the NAS over the past few weeks, and nothing has changed. Disk 3 seems to be having more errors than disk 1.

Your CloneZilla idea seems like it could work, at least in theory.

Regarding your comment "But disk 1 and disk 3 are the immediate concerns. Both should be replaced - I wouldn't wait for them to completely die": I have the replacement drives ready to go. How do I go about replacing them without losing the RAID? Do I really just pull out one of the "online" drives, and let the system rebuild the raid?

Still running version 6.9.3.

Message 9 of 16
StephenB
Guru

Re: ReadyNas 104 Disk Errors


@briskik wrote:

Do I really just pull out one of the "online" drives, and let the system rebuild the raid?

Well, you could try replacing disk 3 first, but if read errors happen on disk 1 during the resync, you will lose the volume.
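Why a read error on disk 1 during the resync is fatal comes down to how single parity works: each stripe stores the XOR of its data blocks, so any one missing block can be recomputed from the others, but two missing blocks in the same stripe cannot. Below is a toy Python sketch of that arithmetic. It ignores RAID-5 parity rotation, chunk sizes, and how the Linux md driver actually handles errors, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Data blocks followed by one parity block (no rotation, unlike real RAID-5)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_stripe(stripe):
    """Recompute the single member marked None; fail if more than one is unreadable."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(
            f"{len(missing)} members unreadable; single parity can restore only one"
        )
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired
```

In a 4-disk stripe, marking only disk 3's member as None rebuilds it; marking disk 1's member as unreadable too raises ValueError, the toy equivalent of losing the volume mid-resync.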

 

Powering down the system and cloning disk 1 would be one way to avoid that risk - you'd replace disk 1 with the clone, and then power up the NAS.  The NAS should still show your volume, and you should be able to access your files.  After confirming that, hot-swap disk 3 and let it resync.

Message 10 of 16
briskik
Aspirant

Re: ReadyNas 104 Disk Errors

If I go the CloneZilla route, I'd shut down the NAS, take disk 3 out, connect it to my desktop along with the new replacement drive (I have spare SATA data and power cables), and use CloneZilla to make an exact copy of disk 3 on the new drive. Then, once it's complete, I'd put the new drive in the NAS, power it up normally, and be good to go?

And my other (slightly risky) option is, while the NAS is on, to pull drive 3 out, put in the replacement drive, and then go to the Volumes tab and resync the RAID to the new drive.

Message 11 of 16
StephenB
Guru

Re: ReadyNas 104 Disk Errors

Those are the two options. There are risks with option 1 as well, especially if there are bad sectors that can't be copied over to the clone. That will result in file system corruption (files with errors in them, including folders).

If CloneZilla runs into problems doing the cloning, then you'd need to fall back to option 2.

Message 12 of 16
briskik
Aspirant

Re: ReadyNas 104 Disk Errors

Sounds good. I'm going to make sure my backups are as complete as possible before I start.

After I clone disk 3, put it in the NAS, and power the NAS up, it's just a regular boot, correct?

Message 13 of 16
StephenB
Guru

Re: ReadyNas 104 Disk Errors


@briskik wrote:

After I clone disk 3, put it in the NAS, and power the NAS up, it's just a regular boot, correct?


You can just do a regular boot.  Though perhaps it's better to boot the NAS in read-only mode first.  If that works, it'd be a good opportunity to refresh your backup.

Message 14 of 16
briskik
Aspirant

Re: ReadyNas 104 Disk Errors

I attempted to use CloneZilla to clone disk 3 to a new disk, and after a day it failed with a warning that the disk has bad sectors. I also attempted the same process with disk 1, and got the same errors as with disk 3.

I enabled SSH and ran some of the above commands to see which processes were using CPU/memory.

So, with all the original disks back in the NAS and it running "normally", I hot-swapped disk 3 for a new disk. After about 10 hours, with the rebuild more than 14% complete, the LCD showed the message "data DEAD".

What options do I have now? Is there any way to see which files/data the RAID rebuild is failing on, so that I can remove that data and allow the rebuild to keep working?

Message 15 of 16
StephenB
Guru

Re: ReadyNas 104 Disk Errors


@briskik wrote:

What options do I have now? Is there any way to see which files/data the RAID rebuild is failing on, so that I can remove that data and allow the rebuild to keep working?


Your only option now is RAID recovery. You can try Netgear's service ( https://kb.netgear.com/69/ReadyNAS-Data-Recovery-Diagnostics-Scope-of-Service ).

You could also try ReclaiMe software, though you'd need a way to connect all the disks to a Windows PC (for instance, a SATA enclosure). https://www.reclaime.com/

Hindsight is always 20-20. That said, you should have backed up the NAS before you hot-swapped disk 3.

Message 16 of 16