Replacing Failed Drive Possible Bug?

Yesterday I replaced one of the 3TB drives in my RN312. It was in the second bay. I left the NAS powered up, removed the drive from bay 2, and inserted an identical new drive into the same bay. Here are the log entries that show what happened during this process:

[13/10/09 17:14:23 CST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WMC4N0369713 has been removed from Channel '2' of the head unit.
[13/10/09 17:14:31 CST] warning:volume:LOGMSG_HEALTH_VOLUME Volume 'data' health changed from 'REDUNDANT' to 'DEGRADED'.
[13/10/09 17:14:33 CST] notice:disk:LOGMSG_DISK_FAIL_HALT System will be shut down in 30 minutes because of disk failure.
[13/10/09 17:16:48 CST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WMC4N0512222 has been added to Channel '2' of the head unit.
[13/10/09 17:17:05 CST] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Rebuilding started for Volume 'data'.
[13/10/09 17:47:57 CST] info:system:LOGMSG_READYNASD_ABORTED_NOINFO ReadyNASOS service or process was restarted.
[13/10/09 17:47:58 CST] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume 'data' is 'DEGRADED'.
[13/10/09 17:47:58 CST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[13/10/09 17:48:13 CST] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Rebuilding started for Volume 'data'.
[13/10/10 01:00:04 CST] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume 'data' is 'DEGRADED'.
[13/10/10 02:37:17 CST] notice:volume:LOGMSG_RESILVERCOMPLETE_VOLUME Volume 'data' has been rebuilt.
[13/10/10 02:37:19 CST] notice:volume:LOGMSG_HEALTH_VOLUME Volume 'data' health changed from 'DEGRADED' to 'REDUNDANT'.
[13/10/10 02:37:21 CST] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel '2' (Internal) changed state from RESYNC to ONLINE.
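
To put numbers on the timeline in these entries, here is a minimal Python sketch that parses a few of the lines and measures how long after the shutdown warning each later event occurred. It assumes the timestamps are in YY/MM/DD order (consistent with the October 2013 post dates) and that every line follows the `[timestamp TZ] severity:category:EVENT message` layout seen above. The `first_seen` helper and the regular expression are illustrative, not part of ReadyNAS OS.

```python
import re
from datetime import datetime

# Log lines copied from the entries above (only those needed for the timing).
LOG = """\
[13/10/09 17:14:33 CST] notice:disk:LOGMSG_DISK_FAIL_HALT System will be shut down in 30 minutes because of disk failure.
[13/10/09 17:16:48 CST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WMC4N0512222 has been added to Channel '2' of the head unit.
[13/10/09 17:17:05 CST] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Rebuilding started for Volume 'data'.
[13/10/09 17:47:57 CST] info:system:LOGMSG_READYNASD_ABORTED_NOINFO ReadyNASOS service or process was restarted.
[13/10/10 02:37:17 CST] notice:volume:LOGMSG_RESILVERCOMPLETE_VOLUME Volume 'data' has been rebuilt.
"""

# Assumed layout: [YY/MM/DD HH:MM:SS TZ] severity:category:EVENT message
LINE_RE = re.compile(
    r"^\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \w+\] (\w+):(\w+):(\w+) (.*)$"
)


def first_seen(log_text):
    """Map each event name to the timestamp of its first occurrence."""
    seen = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that does not look like a log entry
        stamp = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
        seen.setdefault(match.group(4), stamp)
    return seen


events = first_seen(LOG)
warned = events["LOGMSG_DISK_FAIL_HALT"]
for name, stamp in events.items():
    if name != "LOGMSG_DISK_FAIL_HALT":
        print(f"{name}: {stamp - warned} after the shutdown warning")
```

The gap between the warning and the service restart is a little over 33 minutes, which fits a shutdown at the 30-minute mark followed by a manual restart.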

The process went fine, although it took a long time. The unexpected part was the warning given when I removed the failing drive: the system warned that it would shut down in 30 minutes. I inserted a new drive about 2 minutes later, the disk was recognized, and the rebuild started. Then, about 30 minutes later and in the middle of the rebuild, the system shut down. This freaked me out initially, but I restarted the system and the volume rebuild continued normally until it finished successfully.

I guess the possible bug is this: why did the system still shut down, even though it had a valid drive in bay 2 and the rebuild had started? I would have expected a shutdown only if there were a drive failure and the failure state had continued for the full 30 minutes.
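
A minimal sketch of that expected behavior, written as a small Python state machine. This is not how ReadyNAS OS implements its shutdown timer; the `DiskFailShutdownPolicy` class and its method names are hypothetical, and the only values taken from the log are the 30-minute grace period and the point at which the rebuild started.

```python
class DiskFailShutdownPolicy:
    """Hypothetical policy: shut down only if the failure lasts the whole grace period."""

    def __init__(self, grace_seconds=30 * 60):
        self.grace_seconds = grace_seconds
        self.deadline = None  # None means no shutdown is pending

    def on_disk_failed(self, now):
        # Start the countdown once; repeated failure events do not extend it.
        if self.deadline is None:
            self.deadline = now + self.grace_seconds

    def on_rebuild_started(self, now):
        # A replacement disk is resyncing, so the pending shutdown is cancelled.
        self.deadline = None

    def should_shut_down(self, now):
        return self.deadline is not None and now >= self.deadline


# Timeline from the log, in seconds after the shutdown warning at 17:14:33.
policy = DiskFailShutdownPolicy()
policy.on_disk_failed(now=0)
policy.on_rebuild_started(now=152)  # rebuild started at 17:17:05
print(policy.should_shut_down(now=30 * 60))
```

With the rebuild event cancelling the countdown, the check at the 30-minute mark no longer asks for a shutdown. Without that event, the same check would.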

Your thoughts?

fastfwd

Virtuoso

Oct 10, 2013

TobyE wrote:

The unexpected part was the warning given when I removed the failing drive: the system warned that it would shut down in 30 minutes. I inserted a new drive about 2 minutes later, the disk was recognized, and the rebuild started. Then, about 30 minutes later and in the middle of the rebuild, the system shut down. This freaked me out initially, but I restarted the system and the volume rebuild continued normally until it finished successfully.

I guess the possible bug is this: why did the system still shut down, even though it had a valid drive in bay 2 and the rebuild had started? I would have expected a shutdown only if there were a drive failure and the failure state had continued for the full 30 minutes.

Your thoughts?

Yeah, sounds like a bug to me, too. I imagine that it's not THAT big a deal, aside from the brief unavailability of the shares on the NAS, because the shutdown is orderly and the resync process apparently continues after the restart with no ill effects.

I replaced a drive in my Pro Pioneer this week, but I don't have it configured to shut down on a single-drive failure, so I don't know whether the older OS4 systems have the same behavior as your OS6 system.

TobyE
Guide
Oct 10, 2013
You are right. Not that big of a deal, but it sure freaked me out!
routier1642
Aspirant
Oct 11, 2013
I had the same experience yesterday, and it freaked me out too!

OS4 had nice, relaxed messages like "Data Unprotected", and "Re-syncing".
OS6 has replaced these with panic-inducing and misleading messages like "DATA DEGRADED!" and, when you replace the disk, "Recover data".
This is a crock, because the data has not been degraded and no data has been lost (they're talking about loss of redundancy, but that's a different thing).

What's more, I had to deal with this BS "System will shut down in 30 minutes due to disk failure", which didn't go away even after I replaced the disk.
Why is there a shut-down in the first place? Isn't the WHOLE IDEA of a fault-tolerant system that it CAN KEEP GOING AFTER A DISK FAILURE?

AAGGH!! :evil:
I'm angry. What has happened to this product??
routier1642
Aspirant
Oct 11, 2013
The best way to avoid the grief is to swap disks while the system is shut down.
It panics for a couple of minutes, but then figures it out and re-syncs.

So hot-swapping is now no longer really possible.
In other words, they've gone backwards.

NetGear need to lift their game - my next NAS may not be one of theirs.
mdgm-ntgr
NETGEAR Employee Retired
Oct 11, 2013
What version of ReadyNAS OS was this on? 6.1.3 or older firmware?

Thanks for reporting this. I have passed it on to some contacts at NetGear, so it should be looked into soon. If it can be reproduced, one would expect a bug of this nature to be fixed in the next release.

Under System > Settings > Alerts there is the option to shut down the NAS when a disk fails (I believe this option is disabled by default). As there is the possibility of multiple disk failures, some users like their NAS units to shut down in the event of a disk failure, to minimise the risk of a second disk failure before the array can be rebuilt with a replacement disk (which may need to be purchased, so might not be inserted for a day or two). Of course, if you store important data primarily on the ReadyNAS, you should back up that data. Whilst the ReadyNAS has good features to help minimise the risk of data loss, backups are still important.

Obviously, when the rebuild with a replacement disk has already started before the 30 minutes are up, it is not good that the NAS still shuts down.
TobyE
Guide
Oct 11, 2013
mdgm-ntgr wrote:
What version of ReadyNAS OS was this on? 6.1.3 or older firmware?

It is on 6.1.3

mdgm-ntgr wrote:
Under System > Settings > Alerts there is the option to shut down the NAS when a disk fails (I believe this option is disabled by default).

The setting is "If Disk fails or no longer responds". I don't think I changed this, and it is enabled. I'm not sure.
StephenB
Guru - Experienced User
Oct 12, 2013
TobyE wrote:
The setting is "If Disk fails or no longer responds". I don't think I changed this, and it is enabled. I'm not sure.
It is not enabled on my RN102, and I am certain I did not set it. Probably most users should not enable this mode.
mdgm-ntgr
NETGEAR Employee Retired
Oct 12, 2013
Still, it should not shut down the system if the failed disk has already been replaced and a rebuild has started. So there is an issue that needs addressing.
routier1642
Aspirant
Oct 12, 2013
There certainly is.
And I checked my settings - the "Shut down the system when a disk fails.." setting was NOT enabled.
But it did it anyway....
routier1642
Aspirant
Oct 12, 2013
Oh - it did this under ReadyNAS 6, the version before 6.1.3.