
Re: ReadyNAS - A RAID that is not a RAID

jelockwood1
Guide

ReadyNAS - A RAID that is not a RAID

I have owned and used ReadyNAS models for many, many years and have also used them at work. I personally own three: an NV+ and two Pro Business Edition models. At work we have three 1100 models. I am also starting to plan the purchase of a new, bigger model.

However, I am currently extremely hacked off with the firmware for the old Sparc-based models, i.e. the NV+ and 1100. Yes, I am aware these are very old. However, at a minimum it should be an absolute given that they perform the basic task of RAID: if a single drive fails, the unit carries on running, although obviously without any further protection until the failed drive is replaced.

Unfortunately (to put it mildly) this is not the case!

I have now seen several instances of a drive failing and then taking down the entire unit. That is, the unit 'locks up' so that FrontView is not accessible and RAIDar also loses sight of it; often even a simple PING will fail. It also seems to go into a reboot loop.

In the latest case, affecting an 1100 model, the unit was running the very latest 4.1.14 firmware and had initially been set up from factory default with that version. It had 4 x 2TB drives fitted (the biggest it supports). Obviously the initial format had been successful. A drive later failed and the unit locked up as I had seen before. Because it had locked up and FrontView was not accessible, I could not use the software to access the detailed report of which drive had failed and how it had failed. Worse still, even though the 1100 has four LEDs, one for each hard disk, these did not indicate which drive had failed.

After several attempted reboots with no success, the LEDs then seemed to indicate that drive four had failed. However, this was a lie! Since the unit was effectively dead, I fortunately had backups of everything on it, and it was not worth paying NetGear for an out-of-warranty support case, I took the risk of pulling out drive four, which the LEDs were now implying was the faulty one. This did allow the unit to boot successfully with just the remaining three drives. I then fitted a replacement drive in bay four and waited while it did the rebuild. Unfortunately the rebuild failed and the unit locked up again, although admittedly this time the ReadyNAS was not entirely at fault.

To cut a long story short, it turned out to be drive 2, not drive 4, that was faulty. I eventually determined this by running a hard disk surface scan on a computer for each drive. The original drives 1, 3 and 4 had zero bad blocks, but drive 2 had a single bad block. It also turned out that the brand new replacement disk had 21+ bad blocks.
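For anyone wanting to repeat that kind of check, here is a minimal sketch of a read-only surface scan in Python. It is not the tool used above (the post does not name one); the function name `surface_scan` and the 4096-byte block size are assumptions for illustration. It simply tries to read every block once and records the offsets where the operating system reports a read error. Reads may be served from the OS cache, so a dedicated disk utility is still the better choice for a real diagnosis.

```python
import os

def surface_scan(path, block_size=4096):
    """Read every block of `path` once.

    Returns (blocks_read, bad_offsets), where bad_offsets lists the byte
    offsets of blocks that raised a read error. Hypothetical helper,
    read-only: nothing is written to the device or file.
    """
    bad_offsets = []
    blocks_read = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # total size in bytes
        offset = 0
        while offset < size:
            try:
                os.lseek(fd, offset, os.SEEK_SET)
                os.read(fd, min(block_size, size - offset))
                blocks_read += 1
            except OSError:
                # Unreadable block: note it and carry on with the next one.
                bad_offsets.append(offset)
            offset += block_size
    finally:
        os.close(fd)
    return blocks_read, bad_offsets
```

Pointing it at a whole drive (for example a device node, which normally needs administrator rights) would give one list of bad offsets per drive, which is enough to tell a clean disk from one with a single bad block.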

Now the big problems here are as follows -

1. If a single drive fails, the whole point of RAID including X-RAID as used in this case is to provide protection and allow the unit to continue to work. THIS DID NOT HAPPEN.
2. If a drive fails the unit should indicate this in software i.e. FrontView and should be able to do so because the whole point of RAID is that the unit should still be working. THIS DID NOT HAPPEN.
3. If LEDs are provided to show the status of the drives they should do so, and do so accurately. THIS DID NOT HAPPEN.
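To illustrate point 1, here is a minimal sketch of why a single-parity layout should survive the loss of one disk. It assumes a plain RAID 5 style XOR parity scheme; the thread does not describe how X-RAID is implemented internally, and the block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per data disk (illustrative contents only).
data = [b"disk1AAA", b"disk2BBB", b"disk3CCC"]

# The parity block stored on the fourth disk.
parity = xor_blocks(data)

# Simulate losing disk 2: only disks 1, 3 and the parity block remain.
survivors = [data[0], data[2], parity]

# XOR of the survivors reproduces the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Because any one missing block can be recomputed from the others, the array has everything it needs to keep serving data in a degraded state, which is the behaviour expected in the list above.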

I feel and hope that the above critical bugs only affect the old Sparc models. As mentioned I have seen this happen now several times on several different units with various versions of the 4.1.x firmware including the very latest. I have not seen it happen on my Pro Business Edition models which run newer generation firmware. I have also had drives fail on these Pro models but then it did what it is supposed to - it carried on running and allowed me to swap out the drive when this happened.

So, NetGear seem to have made a RAID that is completely useless.

While on the topic of the 4.1.14 firmware, I will also mention that one of the fixes specifically listed as included in it is for the Time Machine capacity issue, yet the issue does not seem to be fixed. The maximum capacity offered for Time Machine is still only 2TB, even with four empty 2TB drives (the maximum) giving potentially 5.5TB of usable space. Again, this is on a factory-default unit running 4.1.14 firmware.
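As a rough check on the 5.5TB figure, here is the arithmetic as a short sketch. It assumes a single-parity layout (one disk's worth of space goes to redundancy), drives sold in decimal terabytes, and capacity reported in binary units; filesystem overhead is ignored. The `usable_capacity` helper is hypothetical.

```python
def usable_capacity(disk_count, disk_bytes):
    """Usable bytes in a single-parity array: one disk's worth is redundancy."""
    return (disk_count - 1) * disk_bytes

raw = usable_capacity(4, 2 * 10**12)   # four 2TB drives, decimal terabytes

decimal_tb = raw / 10**12              # as marketed: 6.0 TB
binary_tib = raw / 2**40               # as many systems report it: about 5.46

print(f"{decimal_tb:.1f} TB decimal, {binary_tib:.2f} TiB binary")
```

Either way, the usable space is well above the 2TB ceiling that Time Machine is being offered.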
Message 1 of 4
mdgm-ntgr
NETGEAR Employee Retired

Re: ReadyNAS - A RAID that is not a RAID

Disks can and do fail at any time and in various ways. Especially if the disk failure manages to go undetected this can cause issues. Did you get any email alerts about SMART errors to warn you that this disk was having issues?

The Sparc models are far more limited than newer models in how they check to see if a disk is bad. Disk 4 may have failed to respond in a reasonable amount of time, or perhaps you saw another pattern? See: http://kb.netgear.com/app/answers/detail/a_id/21437/~/what-does-this-light-pattern-mean%3F-(led)

Newer models have a Disk Test boot option that can be used to help with checking if disks are bad. Unfortunately this option is not available for Sparc.

Handling of disk failures does improve over time. Updates to e.g. smartmontools can help as they add profiles for newer drives.
Message 2 of 4
jelockwood1
Guide

Re: ReadyNAS - A RAID that is not a RAID

mdgm wrote:
Disks can and do fail at any time and in various ways. Especially if the disk failure manages to go undetected this can cause issues. Did you get any email alerts about SMART errors to warn you that this disk was having issues?

The Sparc models are far more limited than newer models in how they check to see if a disk is bad. Disk 4 may have failed to respond in a reasonable amount of time or perhaps you saw another pattern?: http://kb.netgear.com/app/answers/detail/a_id/21437/~/what-does-this-light-pattern-mean%3F-(led)

Newer models have a Disk Test boot option that can be used to help with checking if disks are bad. Unfortunately this option is not available for Sparc.

Handling of disk failures does improve over time. Updates to e.g. smartmontools can help as they add profiles for newer drives.


I have now received a replacement for the replacement drive, and this time the drive was good. As a reminder, the ReadyNAS had locked up when a single drive failed; at first it did not indicate which drive had failed, and later it apparently blamed the wrong drive (drive 4). In actuality there was a single bad block on drive 2. With the new replacement drive, it successfully completed a factory default initialisation using the original drives 1, 3 and 4 plus the new drive 2.

I have to confess I had not turned on email notifications, but as the entire ReadyNAS had locked up I am not sure they would have worked anyway. I have now turned on email notifications and tested them.

I still say there is absolutely no excuse for a RAID system to completely lock up with just a single drive failure. Drive 4 has worked as part of the latest factory default setup, so drive 4 is apparently good. I have now checked the SMART status for all four drives, including the new drive 2, and none of them (and that includes the original drive 4) shows any ATA or block errors.
Message 3 of 4
jaffacake
Tutor

Re: ReadyNAS - A RAID that is not a RAID

For reference, I recently had a drive failure in my old NV+ which was running the latest firmware.

I got numerous email warnings that it was about to die:


Reallocated sector count has increased in the last day.

Disk 4:
Previous count: 34
Current count: 77

ATA error count has increased in the last day.

Disk 4:
Previous count: 0
Current count: 12

Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.
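The daily comparison behind warnings like the ones quoted above can be sketched in a few lines of Python. This is not the ReadyNAS code; the `smart_warnings` function and the attribute names are assumptions for illustration, and only the counts are taken from the emails.

```python
def smart_warnings(previous, current):
    """Compare two snapshots of per-disk SMART counters.

    Both arguments map a disk number to a dict of counter name -> value.
    Returns one warning string for every counter that has increased.
    """
    warnings = []
    for disk in sorted(current):
        for attribute, now in current[disk].items():
            before = previous.get(disk, {}).get(attribute, 0)
            if now > before:
                warnings.append(
                    f"Disk {disk}: {attribute} rose from {before} to {now}"
                )
    return warnings

# Counts taken from the emails above; the key names are hypothetical.
previous = {4: {"reallocated_sector_count": 34, "ata_error_count": 0}}
current = {4: {"reallocated_sector_count": 77, "ata_error_count": 12}}

for line in smart_warnings(previous, current):
    print(line)
```

A counter that keeps climbing from one day to the next is the signal to have a replacement disk ready before the drive actually fails.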


When it did finally fail, over a month after the initial warning, everything stayed online quite happily.


Disk fail event occurred on SATA channel 4.

If the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead.


[Sun Sep 28 10:00:09 WEST 2014]


I'm very confident in my ReadyNAS storage which is why I happily bought another one earlier this year.
Message 4 of 4