Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks

SarahS3074
Aspirant

ReadyNAS Pro 6 volume lost after trouble with 2 disks #26195581

We have a ReadyNAS Pro 6 with six 2TB drives but only ~500GB of data in a few shares, and now our volume is completely inaccessible (at first it was stuck in life support mode no matter what we did). We are running 4.2.28 with XRAID2 (so it's configured as RAID 6).

 

Here's the chronology of events:

1) drive 5 reported errors and eventually died altogether

2) drive 6 started reporting errors but wasn't dead when we pulled 5 and replaced it

3) the array started rebuilding but eventually failed with an error; drive 6 was marked dead, we noticed 5 was marked as a hot spare, and the volume went into read-only life support mode (very puzzling, as 4 drives are plenty for at least RAID5 and we're using FLEX RAID2 or something, so it doesn't make sense)

4) we don't have another drive yet so we reseated 5 and booted back up and it began to rebuild and we were able to work again for about 3 hours, drive 6 was just marked dead

5) the rebuild failed again and yet again the unit went into life support mode with 5 as a hot spare so we shut down and this time reseated both 5 and 6 and booted back up

6) the unit failed a volume check now (and has ever since) and again put the volume into life support mode with 5 as a hot spare

7) we shut down and pulled drive 6 and reseated 5 and booted back up and had the same problem

8) we shut down again and reseated 5 and put 6 back in and booted back up and now the volume isn't being recognized at all (and hasn't been since)

 

We still have the old failed drive 5. I'm going to try to put that back in to see if that makes a damn bit of difference but am not hopeful. The total data footprint here is laughable. It's like 500GB TOPS and we have 6 2TB drives in this sucker.

 

So we NEVER pulled a drive while a rebuild was happening, ever. Now the unit is totally down and tech support is totally pessimistic about being able to help because 2 drives have failed. We now need level 3 support at an additional $200 (and we already added a support contract for ~$300 to get back in the good graces of tech support) for a "maybe but not likely able to help". They suggested we call the number once we had decided to go forward with level 3 support, and said the tech helping us was going home for the weekend. When I called, the automated line hung up as soon as we told it we were calling about a ReadyNAS storage product, and said to file a support ticket online. SO we have NO HELP THIS ENTIRE WEEKEND and this is the repository of all data for 2 companies. We are dead in the water but do have backups that we're looking into now. Does anyone have any suggestions on how to move forward? If this were a regular server I was maintaining, Linux or Windows, with a RAID 6 array with failed drives, I'd be doing some Linux scripting with dd and attempting to rebuild the data, given the amount we need is SO SMALL ANY 1 DRIVE SHOULD CONTAIN A COMPLETE FRIGGING COPY WITH OVER 70% FREE SPACE!

Message 1 of 9

All Replies
mdgm-ntgr
NETGEAR Employee Retired

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks

X-RAID2 would use RAID-5, unless you chose the dual-redundant option when creating the volume (or converted to dual-redundancy at some point). Only in 12-bay devices would it default to RAID-6.

 

The text "(with dual-redundancy)" shows in Frontview under Volumes > Volume Settings for dual-redundant X-RAID2 volumes.
 

X-RAID2 is a better version of X-RAID than what we shipped in our legacy Sparc devices. X-RAID2 supports vertical expansion without requiring all disks to be replaced to achieve it.

Chronology:

2) When you replace a disk, there is a resync that puts heavy stress on all disks. If a disk is failing it can easily finish it off. This is one reason why maintaining regular backups (i.e. never storing important data on just the one device) is important.

3) X-RAID2 would use RAID-5 by default

4) Before you did this, in my view it would be advisable to contact support and wait for our advice. The more you do to try to fix such a problem yourself the less likely we are going to be able to help.

 

Seeing as you've done some syncs already, the old failed drive 5 would be out of sync with the rest of the disks, and putting it back in would only make things worse.

 

Dual-disk failure does make things difficult especially considering you have tried to fix the problem yourself. This is a data recovery situation and the only thing that covers data recovery is a "data recovery" contract, which at the moment I think is $200 USD for the initial diagnostics (up to 1hr of work performed). Data recovery may be unsuccessful.

 

We use mdadm, LVM2 and EXT4 on your ReadyNAS Pro Business Edition.

 

The disks are synced at a lower level than the filesystem. Furthermore with a redundant RAID-5 volume, parity is evenly distributed amongst the disks. With two disk failures you would have complete data loss if you can't find a way to get one of the failed disks back online. It is important to get professional advice to maximise the chances of such an attempt being successful. The XOR calculations to rebuild a RAID-5 volume require n-1 working disks (where n is the number of disks in a redundant volume). You certainly can't take any single given disk and get at all of the data.
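To illustrate the n-1 point, here is a minimal Python sketch of XOR parity on a single stripe. It is a toy model only: the four-disk layout and the block contents are made up, and real mdadm RAID-5 rotates parity across disks and works on much larger chunks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-disk RAID-5 set: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

def rebuild(stripe, missing_index):
    """Recompute one missing block from the n-1 surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != missing_index]
    return xor_blocks(survivors)

# One disk lost: the missing block is fully recoverable.
one_lost = rebuild(stripe, 1)

# Two disks lost (indexes 1 and 2): the survivors only give B XOR C,
# which is not enough to recover either B or C on its own.
two_lost = xor_blocks([stripe[0], stripe[3]])
```

With one block missing, `one_lost` equals the original `b"BBBB"`. With two missing, `two_lost` is only the XOR of the two lost blocks, which is why a second failed disk has to be brought back (or cloned) before a RAID-5 volume can be read.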

RAID is not designed to replace backups. Redundant RAID levels are designed to keep the system running without downtime in the event of a disk failure (or with some RAID levels more than one disk failure e.g. RAID-6 protects against two disk failures). Too many disk failures, accidental file deletions, fire, flood and theft etc. are just some of the problems that RAID can't protect against.

Message 2 of 9
SarahS3074
Aspirant

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks

Well for what it's worth arrays md0 and md1 (the core array for the partition housing the main /etc and other folders for the OS, and the array for the swap partitions respectively) both show as RAID 6 arrays. But somehow md2 is RAID 5 (which is and was the problem). Because of the chronology of events, it's clear to me that it WASN'T RAID 5 Thursday morning when we pulled the dead drive 5 out and installed the new one and a resync began. That morning, BEFORE we pulled drive 5 we saw drive 6 show up as dead as well (or dying, I can't remember at this point). Nonetheless, we're in the predicament that we're in and we are faced with attempting to clone the now dead drive 6 to a good drive and put it in and see what we can do about reconstructing the RAID 5 array without drive 5.

 

In short, this shouldn't have happened with 4 perfectly working drives in a RAID 6 array. I should've been getting life support errors only until it realized a perfectly good drive had been inserted, leaving it just 1 drive down. This has happened to us before and wasn't an issue at all. If something, ANYTHING, in Frontview had told me the RAID array was no longer RAID 6, I would've stopped immediately and performed an emergency backup. In hindsight I should've suspected this when I saw the newly inserted drive 5 show up as a hot spare for whatever stupid reason. I clearly remember checking Frontview and seeing RAID 6 there (which it's probably reading from md0, not md2 as it should).

 

Never once have I complained that RAID should be our backup, by the way, the need for offsite backup is NOT lost on me whatsoever and is why we are able to have SOMETHING in place at this point.

 

Thanks for the help, and for what it's worth Netgear support is quite useful when they're open and actively responding.

Message 3 of 9
mdgm-ntgr
NETGEAR Employee Retired

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks #26195581

The OS partition is 4GB and on md0 and we use RAID-1 for a good balance between performance and redundancy. Your swap is on md1 and this is using RAID-6.
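For anyone with shell access who wants to check the level of each md array themselves, the rough Python sketch below reads text in the usual /proc/mdstat layout. The sample text is illustrative only (device names and block counts are made up, not taken from this unit), and the parser ignores inactive arrays, which are listed without a RAID level.

```python
import re

# Illustrative sample in the usual /proc/mdstat layout. Device names and
# block counts are invented for the example, not taken from any real unit.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid6] [raid5] [raid4]
md2 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      9743324160 blocks level 5, 64k chunk, algorithm 2 [6/4] [UUUU__]
md1 : active raid6 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      2096896 blocks level 6, 64k chunk, algorithm 2 [6/4] [UUUU__]
md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      4193268 blocks [6/4] [UUUU__]
unused devices: <none>
"""

# Matches lines such as "md2 : active raid5 sda3[0] ...".
ARRAY_LINE = re.compile(r"^(md\d+)\s*:\s*(\w+)\s+(raid\d+)\s")

def raid_levels(mdstat_text):
    """Return {array name: RAID personality} for every active array found."""
    levels = {}
    for line in mdstat_text.splitlines():
        match = ARRAY_LINE.match(line)
        if match:
            levels[match.group(1)] = match.group(3)
    return levels

levels = raid_levels(SAMPLE_MDSTAT)
```

On a live system the same function could be fed the contents of /proc/mdstat instead of the sample string. The data volume is the array to look at (md2 in this thread), not md0 or md1.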

 

Your volume most certainly is RAID-5 and was created back in August 2010. The volume level wouldn't have been changed from RAID-6 to RAID-5. We don't support doing that, and it didn't happen.

 

X-RAID2 uses RAID-5 in the Pro 6 unless you deliberately choose to use dual-redundancy (either setting it up as such using RAIDar during the 10 minute countdown after choosing to do a factory reset; or if you have a redundant volume with less than 6 disks and choose the option in Frontview for the next added disk to convert the volume to dual-redundancy before adding a disk to an empty drive bay).

 

If your volume was using Flex-RAID RAID-6 or X-RAID2 (with dual redundancy) it would have been clearly stated under Volumes > Volume Settings in Frontview.

 

Your expansion.log (from the logs zip file) consistently shows X_level: 5 which proves your volume has always been RAID-5.

 

My X-RAID2 (with dual-redundancy) volume shows X_level: 6
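If you want to check this yourself, a small Python sketch like the one below collects every X_level value that appears in a log. The only thing it assumes about expansion.log is that entries contain the text `X_level: <number>` as quoted above; the surrounding sample lines are placeholders, not the real log format.

```python
import re

# Only the "X_level: <n>" token comes from the thread. Everything else in
# the sample is placeholder text, not the real expansion.log layout.
X_LEVEL = re.compile(r"X_level:\s*(\d+)")

def x_levels(log_text):
    """Return the set of distinct X_level values found in the log text."""
    return {int(value) for value in X_LEVEL.findall(log_text)}

sample_log = (
    "(placeholder entry) X_level: 5\n"
    "(placeholder entry) X_level: 5\n"
    "(placeholder entry without a level)\n"
)

found = x_levels(sample_log)
```

A result containing a single value means the recorded level never changed. A set such as `{5, 6}` would indicate that the level had differed at some point.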

It's possible that with the volume not being usable you might have seen info for the OS volume or swap incorrectly but then you would also have seen that the volume size shown wasn't for the data volume.

 

Once whatever data can be recovered has been recovered, it would be advisable to use X-RAID2 (with dual-redundancy) or Flex-RAID RAID-6 going forward.

Message 4 of 9
SarahS3074
Aspirant

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks #26195581

Front view stated RAID 6 and I swear it was RAID 6. On the 17th drive 6 failed and was reported as dead before we pulled 5 and replaced it later that same morning (5 had failed days before). We were able to use everything while it was resyncing which could only be possible in a RAID 6 environment with dual redundancy. I understand that automatically switching back to RAID 5 from RAID 6 isn't supported and shouldn't have happened but this was not the way this device was set up back in 2010. The device was working with drives 5 and 6 down before attempting to reboot it a few times in the attempt to get the new drive 5 to be added to the array. In fact even in a RAID 5 environment it makes no sense that this wasn't working or why the volume scan was failing or why the resyncing was failing when disks 1 thru 4 were (and still are) fine and 5 was replaced with a good drive. It's not worth arguing at this point but we're not stupid. This device has always had 6 drives and was always intended to be set up X-RAID2 with dual redundancy. This will be the way it's set up once we give up with messing with the unit and factory reset it and I will never take lightly the messages stating the volume going into life support mode or the logs saying a drive has become a spare. And again, we never ever intended this device to simultaneously be our production units and a backup at the same time, we just didn't expect it to fail so epically when only 2 of 6 drives have trouble.
Message 5 of 9
mdgm-ntgr
NETGEAR Employee Retired

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks #26195581

The status log said

2!!Thu Dec 17 04:31:56 EST 2015!!root!!Detected increasing uncorrectable errors[16] on disk 6 [ST2000DM001-1CH164, ZXXXXXXX] in the past 30 days. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.

That indicates that the system warned you that the disk may be about to fail. It was not marked as dead. There is a difference.
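As a rough illustration of how such a line can be picked apart, the Python sketch below splits the entry quoted above on its `!!` separators. The field names are my own labels (the meaning of the leading number is assumed, not documented here), and the pattern for the disk warning is written against this one message only.

```python
import re

# The status line exactly as quoted above.
STATUS_LINE = (
    "2!!Thu Dec 17 04:31:56 EST 2015!!root!!Detected increasing uncorrectable "
    "errors[16] on disk 6 [ST2000DM001-1CH164, ZXXXXXXX] in the past 30 days. "
    "This often indicates an impending failure. Please be prepared to replace "
    "this disk to maintain data redundancy."
)

# Written against this one message; other messages may be worded differently.
DISK_WARNING = re.compile(r"uncorrectable errors\[(\d+)\] on disk (\d+)")

def parse_status_line(line):
    """Split a '!!'-separated status entry into labelled fields."""
    # Field names are hypothetical labels; "code" is the leading number,
    # whose exact meaning is an assumption.
    code, timestamp, user, message = line.split("!!", 3)
    entry = {"code": code, "timestamp": timestamp, "user": user, "message": message}
    match = DISK_WARNING.search(message)
    if match:
        entry["error_count"] = int(match.group(1))
        entry["disk"] = int(match.group(2))
    return entry

entry = parse_status_line(STATUS_LINE)
```

For this line the parsed entry reports disk 6 with an error count of 16, and the message text is a warning about an impending failure rather than a "dead" state.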

So disk 6 was failing, and disk 5 had failed. You replaced disk 5 and the heavy stress of the resync likely contributed to finishing disk 6 off.


I can assure you that the expansion log is accurate. It's in our interests as much as yours for us to have accurate information about the history of the system, and your volume has clearly always been RAID-5. There are entries dating from 2010 through to the present consistently indicating your volume is using RAID-5.

RAID-5 can withstand a single disk failure; it is not designed for problems with multiple disks.

 

Whatever you thought the RAID level used was, the log makes it clear.

One big giveaway would have been the volume capacity as a RAID-5 volume will have a higher volume capacity than a RAID-6 volume with the same disks.
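The arithmetic behind that, as a small Python sketch. It is a simplification: it ignores the space set aside for the OS and swap partitions and the difference between TB and TiB, so the figures shown in Frontview would be somewhat lower.

```python
def usable_tb(disks, disk_tb, parity_disks):
    """Approximate usable capacity of a parity RAID volume, in TB."""
    if disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) * disk_tb

# Six 2TB disks, as in this thread.
raid5_tb = usable_tb(disks=6, disk_tb=2, parity_disks=1)  # single redundancy
raid6_tb = usable_tb(disks=6, disk_tb=2, parity_disks=2)  # dual redundancy
```

So six 2TB disks give roughly 10TB as RAID-5 and roughly 8TB as RAID-6, a difference of one disk's worth of space.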

Message 6 of 9
SarahS3074
Aspirant

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks #26195581

I'm well aware of what RAID 5 can and cannot withstand. I'm unsure which post makes that unclear. I assure you the logs visible in Frontview never mentioned the danger due to the system having only single redundancy. It only shows the generic message that single redundancy means imminent failure but dual redundancy allows a second failure. I'm not lying when I tell you a page in Frontview stated XRAID-2 and RAID 6, NOT 5.
When the good disk 5 was installed and the system was up and running while disk 6 was failing during the resync it went into life support mode and the good disk 5 became a spare according to the health page of front view. So did that mean there was nothing we could do? Replacing disk 6 while 5 didn't yet have the partition information synced would've toasted the system anyway, am I right? It was when we pulled disk 6 and attempted to start up without it inserted that we lost the volume altogether.
Message 7 of 9
StephenB
Guru

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks #26195581


@SarahS3074 wrote:


When the good disk 5 was installed and the system was up and running while disk 6 was failing during the resync it went into life support mode and the good disk 5 became a spare according to the health page of front view. So did that mean there was nothing we could do? Replacing disk 6 while 5 didn't yet have the partition information synced would've toasted the system anyway, am I right? It was when we pulled disk 6 and attempted to start up without it inserted that we lost the volume altogether.

I think by that point there was not much you could do.  You might have tried cloning disk 6 before you replaced disk 5.  That might have resulted in some data corruption, but probably would have avoided the volume failure.
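To show what "cloning" a failing disk means in practice, here is a minimal Python sketch of a block-by-block copy that zero-fills anything it cannot read, similar in spirit to `dd conv=noerror,sync`. The function name and block size are my own choices, it is not the procedure NETGEAR support used, and a purpose-built tool such as GNU ddrescue is the better choice for a real recovery.

```python
import os

def clone_skipping_errors(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that could not be read.
    """
    bad_offsets = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the source
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                chunk = src.read(want)
            except OSError:
                # Unreadable region: remember where it was and carry on.
                chunk = b""
                bad_offsets.append(offset)
            # Pad failed or short reads so later data keeps its offset.
            dst.write(chunk.ljust(want, b"\x00"))
            offset += want
    return bad_offsets
```

The zero-filled blocks are where the "some data corruption" would come from: the clone lets the array assemble, but whatever lived in the unreadable sectors is gone. Always clone to a separate disk and never write to the failing one.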

 

In hindsight (always easy) periodic volume scrubs might have surfaced the disk problems sooner.

Message 8 of 9
Danthem
NETGEAR Employee

Re: ReadyNAS Pro 6 volume lost after trouble with 2 disks #26195581

This case is resolved and the data was eventually successfully recovered. This thread can be marked as "Solved" unless SarahS3074 wants to add anything more?

 

-Daniel

Message 9 of 9