

nlaredo
Aspirant

Ultra6 rebuild always ends with "complete" and then "Dead"

I have an Ultra6 where drive 1 failed, so I bought a replacement, and the array started rebuilding. After popping up a dialog that the rebuild was complete, it told me that my array was dead because of a second drive failure (drive 2). I was a little confused: drive 1 had just completed its rebuild successfully, so it seemed I had only one failed drive, drive 2. But on restarting the system, it said drive 2 was fine (and had logged 6 errors) and then restarted the resync on drive 1 from scratch. It completed this again, then told me drive 2 had failed again and the array was completely dead again. In the meantime, during this resync, I was able to perform a (nearly) complete data backup via readynas replicate to a new rn516 (the status reporting on replicate jobs is not that great, but the sizes of the two arrays match up now).

Now, I restarted the ultra6 again, and it told me disk 2 was fine and that it was restarting the rebuild on disk 1 from scratch again, and it had logged another six errors on disk 2. It completed this resync again in a few hours at 80+MB/sec, and at the end popped up a dialog that the rebuild was complete, followed by a dialog that I was running in degraded mode due to a drive failure (disk 2 marked dead again). A reboot caused it to mark disk 2 as ok again, and then it started a full rebuild on disk 1 again (despite this being the Nth time that disk 1 completed its rebuild).

At this point I'm pretty confident that there is some bug in the readynas software that is sabotaging the end of the rebuild.

Initial config was 6x ST32000542AS (on the compatibility list); the current config has disk 1 replaced with a new WD30EFRX (3TB, while all other drives are 2TB).

After rebuild completes (each time) "successfully" the UI shows Ch 1 with a green circle with a + in it and I have no idea what that means (I am guessing "spare" but I have never had a spare), and shows CH 2 as dead.

Drive 1 before replacement was really and truly dead with some clicks and whirs of death, so it had to be replaced, but drive 2 still lets me mirror the whole array via readynas replicate, and it appears to still support a full rebuild to the completion dialog, but never far enough so that drive 1 gets marked good so I can move on to replacing drive 2. Am I just hosed? I may actually be on my fifth or sixth recovery cycle now (currently 68%, time to finish 2 hr 12 min, speed 75.4MB/sec) of drive 1 and this is really smelling like a software issue in readynas software.

Any ideas? Suggestions? Should I just give up on the ultra6 (potentially losing some files) now that I've got a "failed" backup on the rn516 that is larger than the size of the data on the ultra6? I know I could shut down, pop drive 2 out, clone it ignoring errors, pop the clone back into the ultra6, and complete the repair operation, but it seems like broken software if I have to resort to measures like that.
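For reference, the "clone it ignoring errors" route is commonly done with GNU ddrescue on a separate Linux machine. This is a sketch only: the device names and the map-file path are placeholders, the helper name `clone_cmd` is made up for illustration, and it only prints the command so nothing is written until you run it yourself.

```shell
# Build (but do not run) a GNU ddrescue command that copies a failing disk
# onto a replacement, skipping unreadable areas and recording them in a map file.
# All three arguments are placeholders supplied by the caller.
clone_cmd() {
    src=$1      # failing disk, e.g. /dev/sdX
    dst=$2      # replacement disk of equal or larger size, e.g. /dev/sdY
    map=$3      # map file where ddrescue records the unreadable areas
    # -f: allow overwriting a block device; -n: skip the slow scraping pass
    printf 'ddrescue -f -n %s %s %s\n' "$src" "$dst" "$map"
}

# Example: clone_cmd /dev/sdX /dev/sdY rescue.map
```

Double-check which device is the source and which is the target before running the printed command; getting them backwards overwrites the disk you are trying to save.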
Message 1 of 8
nlaredo
Aspirant

Re: Ultra6 rebuild always ends with "complete" and then "Dea

I'm having a really bad evening:

$ grep -A2 ^md2 System*/mdstat.log
System_log-ultra6-20130822-224602/mdstat.log:md2 : active raid5 sdb5[1] sda5[6] sdf5[5] sde5[4] sdd5[3] sdc5[2]
System_log-ultra6-20130822-224602/mdstat.log- 9743966080 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
System_log-ultra6-20130822-224602/mdstat.log- [================>....] recovery = 83.1% (1620384512/1948793216) finish=90.6min speed=60383K/sec
--
System_log-ultra6-20130822-232558/mdstat.log:md2 : active raid5 sdb5[1] sda5[6] sdf5[5] sde5[4] sdd5[3] sdc5[2]
System_log-ultra6-20130822-232558/mdstat.log- 9743966080 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
System_log-ultra6-20130822-232558/mdstat.log- [==================>..] recovery = 90.5% (1764500096/1948793216) finish=57.2min speed=53695K/sec
--
System_log-ultra6-20130823-001805/mdstat.log:md2 : active raid5 sdb5[1] sda5[6] sdf5[5] sde5[4] sdd5[3] sdc5[2]
System_log-ultra6-20130823-001805/mdstat.log- 9743966080 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
System_log-ultra6-20130823-001805/mdstat.log- [===================>.] recovery = 99.0% (1930460544/1948793216) finish=6.9min speed=44163K/sec
--
System_log-ultra6-20130823-002303/mdstat.log:md2 : active raid5 sdb5[1] sda5[6] sdf5[5] sde5[4] sdd5[3] sdc5[2]
System_log-ultra6-20130823-002303/mdstat.log- 9743966080 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
System_log-ultra6-20130823-002303/mdstat.log- [===================>.] recovery = 99.7% (1944840292/1948793216) finish=1.3min speed=47614K/sec
--
System_log-ultra6-20130823-002523/mdstat.log:md2 : active raid5 sdb5[1](F) sda5[6](S) sdf5[5] sde5[4] sdd5[3] sdc5[2]
System_log-ultra6-20130823-002523/mdstat.log- 9743966080 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/4] [__UUUU]
System_log-ultra6-20130823-002523/mdstat.log-
--
System_log-ultra6-20130823-002612/mdstat.log:md2 : active raid5 sdb5[1](F) sda5[6](S) sdf5[5] sde5[4] sdd5[3] sdc5[2]
System_log-ultra6-20130823-002612/mdstat.log- 9743966080 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/4] [__UUUU]
System_log-ultra6-20130823-002612/mdstat.log-


Turns out my entire 9TB array at home got nuked as "dead" due to 9 bad sectors at the end of one of the remaining 5 drives during the rebuild process of the 6th drive.
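The point where the array drops from [6/5] to [6/4] can be picked out of the snapshots mechanically. A minimal sketch, assuming each `mdstat.log` has the usual two-line layout seen in the grep output (device line, then the line with the `[total/active]` counter); the function name `md_state` is made up for illustration.

```shell
# Print the [total/active] counter and any failed (F) or spare (S) members
# for one md device in one mdstat snapshot.
# Usage: md_state md2 path/to/mdstat.log
md_state() {
    dev=$1
    file=$2
    awk -v dev="$dev" '
        # Device line, e.g. "md2 : active raid5 sdb5[1](F) sda5[6](S) ..."
        $1 == dev && $2 == ":" {
            flagged = ""
            for (i = 3; i <= NF; i++)
                if ($i ~ /\((F|S)\)$/) flagged = flagged " " $i
            found = 1
            next
        }
        # The next line carries the counter, e.g. "[6/4] [__UUUU]"
        found {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^\[[0-9]+\/[0-9]+\]$/) counter = $i
            print dev, counter, (flagged == "" ? "no flagged members" : "flagged:" flagged)
            exit
        }
    ' "$file"
}

# Example over all collected snapshots:
#   for f in System*/mdstat.log; do printf '%s: ' "$f"; md_state md2 "$f"; done
```

Run against the last two snapshots above, this would report sdb5 as failed and sda5 as a spare, which matches the UI showing channel 1 with a "+" and channel 2 as dead.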

*sigh*
Message 2 of 8
StephenB
Guru

Re: Ultra6 rebuild always ends with "complete" and then "Dea

Was this in addition to failed drive 1? If so, it is expected, though it would be nice if the NAS had a recovery mode where it would ignore those sectors. You could try cloning the remaining bad drive.

On the RN516 - are you sure the backup is failed? How far off is the size?
Message 3 of 8
nlaredo
Aspirant

Re: Ultra6 rebuild always ends with "complete" and then "Dea

Yes, my biggest complaint here is that netgear failed to give me a user option to ignore the drive errors and complete the rebuild anyway and just tell me which files are now corrupted by the roughly ~4k of bad blocks vs declaring the whole array dead...

On the RN516, the backup is *larger* than the original ultra6 data used. I am collecting a full list of files on the RN516 backup today so I can compare it with a full list of files on the ultra6 after I finish rebuilding it outside of the ultra6 in a dedicated linux machine. At least in linux, as I get the list of bad sectors, I can literally force a rewrite of those sectors with zeros to get them reallocated, until the rebuild succeeds without the drive throwing any errors. I guess I could root the ultra6 and put the right tools on there (like hdparm) if they aren't already there, but I didn't want to void any remaining warranty.
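The file-list comparison can be sketched with find, sort, and comm. This is an illustration rather than the exact procedure used here: the helper names `list_files` and `missing_from` are made up, the share paths are whatever the two boxes expose, and file names containing newlines would confuse it.

```shell
# Print a sorted list of files (paths relative to the given directory).
list_files() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Print paths that appear in the first list file but not in the second.
missing_from() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Example (paths are placeholders):
#   list_files /mnt/ultra6 > ultra6.list
#   list_files /mnt/rn516  > rn516.list
#   missing_from ultra6.list rn516.list   # on the ultra6 but not in the backup
```

Swapping the two arguments to `missing_from` shows files that exist only in the backup, which may help explain a backup that is larger than the source.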
Message 4 of 8
StephenB
Guru

Re: Ultra6 rebuild always ends with "complete" and then "Dea

nlaredo wrote:
Yes, my biggest complaint here is that netgear failed to give me a user option to ignore the drive errors and complete the rebuild anyway and just tell me which files are now corrupted by the roughly ~4k of bad blocks vs declaring the whole array dead...
I agree it would be nice. Cloning the bad drives does the same thing, but requires that you already have the replacement drives.

BTW, scrubbing (prior to the main failure) would have uncovered/reallocated the bad blocks too, so perhaps setting up a scrubbing schedule after you get it rebuilt would be a good idea.
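On a generic Linux md array with ssh access, a check pass can be started through sysfs. Whether the ReadyNAS scrub option uses exactly this mechanism is an assumption; the sketch below only shows the standard md interface, and the function names are made up for illustration. The optional second argument exists so the functions can be pointed at a scratch directory instead of the real /sys.

```shell
# Start a consistency check on an md array by writing "check" to sync_action.
# $1: md device name (md2 in the logs above); $2: sysfs root, normally /sys.
md_check() {
    dev=$1
    sysfs=${2:-/sys}
    echo check > "$sysfs/block/$dev/md/sync_action"
}

# Report the mismatch counter left behind by the last check.
md_mismatches() {
    dev=$1
    sysfs=${2:-/sys}
    cat "$sysfs/block/$dev/md/mismatch_cnt"
}

# Example (as root): md_check md2 && cat /proc/mdstat
```

Progress of a running check shows up in /proc/mdstat in the same style as the recovery lines earlier in the thread.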

nlaredo wrote:
On the RN516, the backup is *larger* than the original ultra6 data used.
If you are talking about bytes written, then it could be due to the different file system. How large a discrepancy are you talking about?

nlaredo wrote:
...I guess I could root the ultra6 and put the right tools on there (like hdparm) if they aren't already there, but I didn't want to void any remaining warranty.
Enabling ssh (and adding tools) doesn't void the warranty, though of course if you do something which destroys the OS, etc., Netgear won't help.
Message 5 of 8
nlaredo
Aspirant

Re: Ultra6 rebuild always ends with "complete" and then "Dea

For anyone following along, the array is completely rebuilt now. I ended up taking drive 2 to work and analyzing it. It had one total bad sector on the entire surface. I rewrote that one sector manually with zero and then put the drive back into the readynas. The rebuild completed successfully while I slept last night.
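The post does not say which tool was used for the rewrite. One common way, using hdparm (named earlier in the thread), is sketched below. The device name and sector number are placeholders, the helper name `sector_fix_cmds` is made up, and it only prints the commands, since the write is destructive to whatever was stored in that sector.

```shell
# Print (not run) the hdparm commands to confirm a sector is unreadable
# and then overwrite it with zeros so the drive can remap it.
# $1: disk device (placeholder), $2: LBA of the bad sector (placeholder).
sector_fix_cmds() {
    disk=$1
    lba=$2
    # Read first: an I/O error here confirms the sector really is bad.
    printf 'hdparm --read-sector %s %s\n' "$lba" "$disk"
    # Destructive: the contents of this sector are replaced with zeros.
    printf 'hdparm --yes-i-know-what-i-am-doing --write-sector %s %s\n' "$lba" "$disk"
}

# Example: sector_fix_cmds /dev/sdX 123456
```

After the write, repeating the read command should succeed, and the drive's SMART reallocated or pending sector counts should reflect the change.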

I have now manually kicked off another readynas replicate job, so after this finishes (hopefully successfully without incident) and a full system backup of the ultra6 also finishes after that, I will likely write off the balance of the ultra6 warranty and attempt to graft os 6 into the ultra6 as described in another forum. I imagine this should allow me to continue to use readynas replicate beyond the 45 day trial. Worst case, the price of readynas replicate that I would be saving covers a big chunk of the cost of a RN516 enclosure if I decide to buy a second bare one of those instead (if my love of ecc ddr3 sdram ends up being stronger than my love of saving money).
Message 6 of 8
nlaredo
Aspirant

Re: Ultra6 rebuild always ends with "complete" and then "Dea

I don't even see the option in my RN516 to enable scrubbing, though I have now done so in my Ultra6. Did the option disappear in OS6? Is it automagic now?
Message 7 of 8
StephenB
Guru

Re: Ultra6 rebuild always ends with "complete" and then "Dea

You can manually scrub (click on the volume on the volumes page), but I don't think there is a scheduled scrub.

For a while the volume was being rebuilt the first weekend of every month, but I don't think that happened on my RN102 this month - though I was running a beta firmware, which might account for it.
Message 8 of 8