RTSwiss (Aspirant)
Jan 30, 2013
loss of array after drive failure; problems with ST3500320AS
At the top, a modest complaint about the operation of the forum. I started a message several hours ago, and then got interrupted. To be on the safe side I copied the text out to Notepad but left the browser idling in the forum. When I returned and tried entering additional text the system allowed me to proceed without any problem, so I assumed my session was still active, and proceeded to complete a relatively long post. When I hit "Preview" the system required me to log in again, and when I did the entire post was lost. Surely the system can be set up so that if you have been logged out of a session you are prevented from continuing to write a post.
I am not sure in exactly which forum this belongs. Device is a 4+ year old ReadyNAS NV+, originally shipped with 2x ST3500630NS, both still going strong after 40,000+ hours. I added a third Seagate drive at the beginning, and the device has gone through 2-3 of them in four years. Mostly they have been Seagate ST3500320NS, which are on the HCL but which the machine seems to chew up. They also tend to fail without warning, which happened to me last fall and happened again this morning. So I do wonder just how NV+ (V1) compatible these really are.
First indication of failure was inability to access shares. Device responded to ping, but could not be seen by Raidar or accessed via Frontview. From a failure last fall of a power supply, following replacement of which the device refused to boot (it started, then reported "Checking FS" and hung there), I learned that it also contained a bad drive (which Netgear insisted was unrelated to the PS failure), and which, I recall, was determined by checking its behaviour drive by drive. I recalled, perhaps incorrectly, that one could remove the drives one by one to see if that cured the problem. So I removed drive 3, a 630NS, and the problem persisted; I replaced that and removed drive 1 (the 320AS), and the machine booted. It did not, however, show any drive LEDs; after booting the LED displayed "Drive C: 0/0MB free"; the device could be pinged; it could be seen on Raidar (showing drives 2 and 3 present, but no volume); but could not be accessed by Frontview.
I then added a replacement 500 GB drive (another 320AS), and restarted again, with the same results -- no lit drive LEDs, visible on Raidar (showing 3 drives and no volume) -- but now it could be accessed by Frontview. But the only available shares are those on a pair of attached USB devices used to make partial backups of the NAS. The Volumes tab under Frontview shows 463 GB free on drives 1 (320AS) and 2 (630NS not diagnostically moved) and 0 GB free on drive 3, the 630NS that was removed while trying to identify the bad drive. I should add that before any of this happened, the device reported a capacity of 907 GB with around 500 GB used.
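As a rough sanity check on the 907 GB and 463 GB figures: assuming the three 500 GB drives formed a single-parity array (which is how X-RAID on the NV+ is generally described), the usable space is that of two drives. The sketch below is arithmetic only. The function name is made up for illustration, and the gap between its results and the reported figures is assumed, not confirmed, to be OS partition and filesystem overhead.

```python
GB = 10**9    # decimal gigabyte, as printed on drive labels
GIB = 2**30   # binary gigabyte, which many tools also label "GB"


def single_parity_usable_bytes(drive_sizes_bytes):
    """Usable bytes in a single-parity array: (n - 1) * smallest member."""
    if len(drive_sizes_bytes) < 2:
        raise ValueError("need at least two drives for parity")
    return (len(drive_sizes_bytes) - 1) * min(drive_sizes_bytes)


drives = [500 * GB] * 3              # three nominal 500 GB drives
usable = single_parity_usable_bytes(drives)
per_drive_gib = drives[0] / GIB      # raw size of one drive in binary units
usable_gib = usable / GIB            # raw usable array size in binary units

print(f"per drive: {per_drive_gib:.1f} GiB, usable array: {usable_gib:.1f} GiB")
```

Both raw numbers come out slightly above what the device reported, which is at least consistent with a three-drive single-parity volume having existed before the failure.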
So my question is this. Did I, by removing one of the good drives and trying to restart the machine, irretrievably destroy the array, the volume, and the shares that it contained? I would have thought that, unless the machine tried to write to the drives while checking the file system, reinsertion of the non-failed drive would have allowed the array to be recovered. That's what happened following my PS and drive failure last autumn, but at least thus far it does not seem so in this case.
Any insight any of the experts out there might offer would be appreciated.
Thanks.
-- Ted
14 Replies
- StephenB (Guru - Experienced User)
Since you purchased less than 5 years ago, then you likely are still covered by the warranty - so you should probably enter a support ticket. They might be able to restore your data (perhaps).
If you installed a new drive, then the system tries to rebuild the array. If you pull a drive, and reinstall it with the system running, then the system will treat it like a new drive, and again try to rebuild the array.
Have you monitored the SMART+ stats? Often that is an easy way to see which drive has failed.
- RTSwiss (Aspirant)
Sadly, a purchase 4+ years ago was covered only by a 3-year warranty (Netgear was good, however, about an out-of-warranty replacement of a power supply, apparently a known problem at the time the unit was bought).
I did not replace any drive with the system running. I did force a shutdown, then restarted it with drive 3 removed, and got a "Checking FS" message in the LED and a stalled system. So I powered down, replaced drive 3 and removed drive 1, and then powered it up again. It then booted, went out and found the UPS and the two attached USB drives, but reported "Drive C: 0/0MB Free". I don't know whether it tries to write to the drives during bootstrap, but if it doesn't I would have thought the drive data were intact. And it is puzzling to me that the one drive that has remained in the system (drive 2) is now reported as empty.
I do not religiously monitor the SMART data, but the last time I looked, probably several weeks ago, there were no reported errors or reallocated sectors on any drive. And the drive in question had been in the machine for only about four months (and is still under its Seagate warranty), so it does seem to me that the NV+ does not play particularly well with this particular drive, though it is on the HCL. And I thought the machine was supposed to generate alerts if drive errors begin to mount.
Thanks for the thoughts. I may try opening a support ticket if I do not get help in the forum.
- StephenB (Guru - Experienced User)
Your power down procedure should not have resulted in a rebuilt array.
And you should have gotten email and log alerts on reallocated sectors, though I think most SMART parameters are ignored. In general that's a good thing (many of them turned out to be unrelated to risk of failure). Though sometimes they are helpful in sorting out what's going on.
It is true that there are some drives on the HCL that don't seem to work out that well. WDC Green and Seagate ST2000DL003 for instance.
One next step might be to test the drives with Seatools on a PC.
Do you have a current backup? If so, then you could do a factory reset - which will wipe your configuration, add-ons, and data. Though I'd test with Seatools first.
- RTSwiss (Aspirant)
The drives currently in the device by slot are (1) replacement ST3500320AS (third such drive to be installed on the device); (2)-(3) original ST3500630NS. The Smart data reported for each look normal, with an "LP stat events" (I've never been able to figure out what that means) of 1 reported for drive (1), no errors for (2), and an "ATA Error Count" of 1, which has been stable at that value for at least a year, for drive 3. Raidar can find the device and reports the presence of those three drives but no volume; none of the drive LEDs on the device itself is lit; and the volume tab on frontview reports three available disks but no volumes on the device other than the attached USB drives. When the device was rebooted following the drive tests described below, it issued the following warning:
"The paths for the shares listed below could not be found. Typically, this occurs when the ReadyNAS is unable to access the data volume.
media data backup"
I powered down and checked all drives with Seatools. Both original 630NS's pass; the replacement 320AS also passes. (The 320AS previously removed from slot 1 passed the short generic test but not the long.) I do have backups, but they are not entirely complete, and I would like to verify that I will be unable to recover the preexisting array before I take that step, because it does not seem as though anything I have done ought to have caused the current condition, and I would like to try and figure that out first. I have also looked at the logs, and with the information they provide here's a recap of what happened.
(1) Last Monday the logs reported the successful completion of two backup jobs.
(2) On Tuesday the shares became unavailable to any machine on the network, and the device could not be found by Raidar or accessed via Frontview. It did respond to being pinged. Pressing the power button did not illuminate the front panel display, and the device would not respond to a shutdown command from the panel. There are no log entries associated with any of this, and there is no indication of a previous accumulation of Smart errors.
(3) At that point I pulled the plug, and the log entry following the last backup job reports "Improper shutdown detected, . . . " When I restarted the device it started to report booting, then displayed "Checking FS", but thereafter reported no progress. From my experience with a failure last fall, hanging at that point suggested the presence of a bad drive, so I shut it down, again having to unplug it, removed drive 3, and tried restarting again. It responded in the same way, so I unplugged again, reinserted drive 3, removed drive 1 (which Seatools later tested to be defective), and restarted again. This time it came up, but on each restart since then it has issued the warning quoted above and has then reported "System is up," but in each instance the condition has been as described in the first paragraph of this post: found on Raidar, accessible by Frontview, but no Volume (C: or otherwise) present, device drive LEDs unlit. The only other log entries reflect the device being powered down or restarted, except for a couple of error messages reflecting the inability to start scheduled backups due to the unavailability of the shares to be backed up.
At no time was a drive removed or inserted while the machine was powered on. So it seems as though I ought to have been protected against the loss of a single drive, and it does not seem as though anything I have done should have caused the loss of that array, or the puzzling behaviour of the device since then.
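The reasoning in that paragraph can be written down as a toy model. This is a minimal sketch assuming plain single-parity behaviour (an array survives with exactly one member missing) and, as a simplification, treating the drive that later failed the Seatools long test as contributing nothing. The function and labels are illustrative only and say nothing about what the ReadyNAS firmware actually does at boot.

```python
def array_state(total_members, healthy_present):
    """Classify a single-parity array by how many healthy members are present."""
    missing = total_members - healthy_present
    if missing == 0:
        return "healthy"
    if missing == 1:
        return "degraded"   # still readable, but no redundancy left
    return "failed"         # two or more members gone: parity cannot cover it


# Healthy original members present at each boot attempt described above.
attempts = {
    "drive 3 pulled, failing drive 1 still in": 1,  # only drive 2 counts
    "drive 1 pulled, drives 2 and 3 in": 2,
    "blank replacement in slot 1, drives 2 and 3 in": 2,
}

for label, healthy in attempts.items():
    print(f"{label}: {array_state(3, healthy)}")
```

Under these assumptions the last two configurations should come up degraded rather than with no volume at all, which is why the observed behaviour looks puzzling.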
Any other thoughts, anyone?
Thanks.
- StephenB (Guru - Experienced User)
I think it would be hard to sort out what caused this w/o logs (and even then it might not be possible).
Did support tell you that they won't help because you are out-of-warranty? If not, maybe try submitting a request.
It is possible that the OS partition is full - so you might try to do an OS reinstall from the boot menu. That will reset your IP configuration to use DHCP and also reset your admin password to the factory default. Data and add-ons should be unaffected.
- RTSwiss (Aspirant)
Logs will probably not be helpful. Nothing remarkable in them beyond what I have already described.
I'm not sure about the OS reinstall. Does it reside on the drives or in non-volatile memory of some sort?
I have put in a request with tech support, which has acknowledged receipt but not yet said yea or nay.
Finally, as one added experiment, I removed all existing drives and installed a single 1TB Seagate 524AS. The machine came up, found the drive AND the volume, all visible on Raidar, but when I tried getting in via Frontview I found to my surprise that the admin password had been reset to the netgear default, and the device wanted to update the firmware immediately. (I did not let it.)
At this point I'd be happy to ditch the existing drives (I can use them for spares for a Duo elsewhere in the house) and install 1TB's. But I would really like to try and sort out what happened, and to recover the data from this array and copy it to the new drives. Does the fact that the machine works normally with a completely fresh drive in it shed any light on the hypothesis that the OS partition was full? That would only make sense if that resided on the hard drive.
Thanks.
- StephenB (Guru - Experienced User)
What firmware are you running on the NAS?
RTSwiss wrote: I'm not sure about the OS reinstall. Does it reside on the drives or in non-volatile memory of some sort?
An install image is kept in the NAS flash. When you do a factory default from the boot menu, or install empty drives (as you did with your 524AS), the NAS creates an OS partition on the drives, and boots from them.
An OS reinstall is an option in the boot menu which reinstalls the OS from the flash onto the OS partition. It doesn't wipe the existing partition, but it does do some cleanup. Like the clean install you just did, the admin password is reset to netgear default; also the IP configuration is reset to DHCP. Other settings and add-ons are preserved, and the data partition is not wiped.
- RTSwiss (Aspirant)
4.1.6. Sorry, but how do you initiate the OS reinstall? If it is somewhere on the Frontview menu I couldn't find it.
And the bad news is that I did not realize that installing a new drive would reset the system. Among other things the prior logs have disappeared. Have they been preserved on the old array?
- StephenB (Guru - Experienced User)
Again, the OS partition is on the drives, so the logs, etc are on the other array.
The boot menu does the OS install, details for each ReadyNas model are here: http://www.readynas.com/kb/faq/boot/how ... _boot_menu - "Duo / NV / NV+ / X6/600" is your section.
- RTSwiss (Aspirant)
Thank you. I'm going to wait a day or two to see how tech support responds. I'll report back. I very much appreciate the help.