

tcleland
Aspirant
Feb 13, 2013

Rescuing from an epic fail

Hello folks. I've got a tough one here.

My lab runs a Pro, an 1100, and two NV+ ReadyNAS devices. In general things work well and I'm comfortable with all common management tasks. This problem just arose with one of the NV+ devices which is configured very simply using 'share' security mode. Its primary use is as a data-storage buffer that can be accessed by multiple analysis computers, though in reality it is mostly accessed by one. All of the drives are Seagate 1TB drives, though the specific model numbers vary (presently, two Barracuda ST31000 and two newer Barracuda ST1000DM003 drives). RAIDiator version is 4.1.7. Yes, I upgraded the other SPARC-based ReadyNASs in the lab to 4.1.10 but hadn't gotten to this one yet.

A few days ago, I got an alert that the SMART error count for disk 1 was rising.

Reallocated sector count has increased in the last day.

Disk 1:
Previous count: 5
Current count: 6

Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.



This is routine, so I hot-swapped it out for a new drive. I got the usual series of email notifications that all was well:

Disk add event occurred on SATA channel 1.
...
RAID sync started on volume C.
...
Disk initialization successfully finished.
...
RAID sync finished on volume C. The volume is now fully redundant.


Then, three days later, I got this email from the NAS:

Access to the disk on channel (??) is producing I/O errors. Although the array is still redundant, please replace this drive as soon as possible, as it is likely to fail soon.


I got this email four times within a minute, then nothing more. Note that it couldn't even identify which disk was bad. I've never seen this error before. Frontview was unresponsive, and RAIDar said that all disks were just fine (green). I reasoned that the new Disk 1 was the likely culprit, so I went to the NAS and replaced it with another new disk (hotplug). Things didn't look good when the ReadyNAS didn't react to the new disk being inserted. The Web interface was down, but RAIDar still saw the NAS -- oddly, RAIDar claimed that all four disks were fine (green light), so I couldn't use that info to confirm that disk 1 was the problem. I shut down and rebooted using the front power button (didn't require a forced shutdown - it shut down nicely). When it rebooted, though, I got only "missing shares" alerts:

The paths for the shares listed below could not be found. Typically, this occurs when the ReadyNAS is unable to access the data volume.

siwei
media
backup


These are the three existing shares -- the two defaults and the one we use. When I access the ReadyNAS over CIFS (normal Windows use), I can see the "siwei" share as a folder, seemingly perfectly normally, but when I double-click on that folder icon Windows tells me that it cannot access the share. Bad.

Moreover, when I look at the Frontview admin interface for the NV+ over https, it seems to be working mostly fine, but there are anomalies. It sees all four drives in the Health tab, and claims that they are all in good condition (though RAIDar did this with the crashed drive in place too). The volume is also green. All four disks give me reasonable SMART+ data, and the NV+ did recognize the model number of the second replacement for Disk 1 (which differed from the model number of the first replacement that presumably caused all the trouble).

But.

The Shares|Share Settings tab tells me that no shares exist. The Volumes|Volume Settings|RAID Settings tab, oddly, tells me that two of the four disks have "0 MB free" while the other two each have "928 GB free" -- all four are 931 GB drives. The disk in the slot that I replaced twice (#1) was one of those that claims 928 GB free. In a probably-related effect, the "Disk space" bar at the top lists 0 MB (0%) of 0 MB used. The home screen also lists:

"Offline, RAID Level , disks, 0% of 0 MB used"

That is, it doesn't report the RAID level (it was X) or the number of disks (4, which it well knows in the Status tab), and it gives the anomalous 0% of 0 MB reading.

At this point, I stopped fiddling, and wrote this post.

I'm assuming that the bad disk that I added was REALLY bad somehow and hosed something in the ReadyNAS firmware interface. However, the ReadyNAS did tell me that the disks were fully redundant (when the first, presumably bad disk was part of the array), so it should have been able to rebuild itself given my replacement of any disk.

One possibility is that a second, different disk had just happened to fail (in the second failure event), and that I therefore replaced the wrong one. I still have the disk that I first used to replace the disk with SMART errors and can replace it if a guru says that's the way to go. Unfortunately the original disk with the SMART errors is en route to Seagate for warranty replacement. But I am troubled that the ReadyNAS couldn't tell me which disk was the troublemaker after installation of the first replacement disk, which makes me think that something at a higher level got hosed.

I would very much like to know (a) what's going on, if a guru can parse out the most likely scenario here, (b) whether there is any chance of data recovery, and (c) if not, whether a factory reset is the only way out of this hole. Or, perhaps, not even that?

Thanks folks. Here's the executive summary.

* Four disks in a ReadyNAS: 1,2,3,4.
* SMART errors in disk 1. Routine. Replaced with new disk "1A"
* 3 days later, bizarre new errors arise and the ReadyNAS cannot identify which drive is the problem. I assume that it is drive 1A, so replace it with another new drive, 1B.
* ReadyNAS now boots but is clearly not in a good way, and data are inaccessible. I write this post.

Thom

3 Replies

  • OS re-install won't help an offline volume. He will likely require support.

    His system is from 2007, so phone support WILL cost money. Email/Online is free though.
  • Thanks, folks. Let me summarize/infer so I'm sure I get what you are thinking.

    Does "offline volume" here refer to the fact that I can't access the existing share under CIFS? Or does it refer to Frontview's insistence that 0 MB are free on two of the disks, meaning that they are functionally offline so the volume cannot even be constructed?

    From EBrown's suggestion, I interpreted that somehow the Bad Drive (or possibly some coincident other cause) managed to hose the firmware copies on two of the other disks so that they don't play nicely with the system and can't even identify the bad disk.

    Question: if two disks go bad simultaneously, will Frontview report two bad drives by number, or will it report "Drive ???" as I saw in this case? If the firmware got hosed on two drives (judging by the fact that Frontview now shows two drives with "0 MB free"), and if this inability to identify the bad drives when there is more than one is normal Frontview behavior, and if two drives are hosed, then of course you can't build the volume or rebuild the image onto the fourth disk -- and that would sort of explain everything. In which case it would seem that if I could fix the firmware image on the drives, then they would get back with the program and, crossing fingers, fix everything. Is this right thinking? If so, I get that reinstalling the OS would be a good way to go. Am I correct in interpreting that, if all goes well, my volumes and data would be intact after this OS reinstallation?

    Or, counterpoint, why would this not work (as in "OS reinstall won't help an offline volume")? Will/could it hurt anything, or is it worth a try? Please help me sort this out.

    Thanks very much for your help & input -

    Thom
