yeneric
Aspirant
Sep 01, 2012

Failed Drive - Can't Boot! HELP! #19341286

Hi everyone,

I hope someone out there can give me some guidance, as I no longer have access to my data and am somewhat stressed, to say the least...

I've got the ReadyNAS NV and it's been working flawlessly for nearly 6 years! Here's the situation... My setup is 3x 2TB Seagate drives using X-RAID running RAIDiator 4.1.9. Starting Friday Aug 24th I received a number of email alerts:

    Aug-24 5:51:09am: RAID event detected; Access to the disk on channel (??) is producing I/O errors. Although the array is still redundant, please replace this drive as soon as possible, as it is likely to fail soon.
    Aug-24 5:51:09am: RAID event detected; Access to the disk on channel (??) is producing I/O errors. Although the array is still redundant, please replace this drive as soon as possible, as it is likely to fail soon.
    Aug-24 5:51:09am: RAID event detected; Access to the disk on channel (??) is producing I/O errors. Although the array is still redundant, please replace this drive as soon as possible, as it is likely to fail soon.
    Aug-24 5:51:10am: RAID event detected; Access to the disk on channel (??) is producing I/O errors. Although the array is still redundant, please replace this drive as soon as possible, as it is likely to fail soon.
    Aug-24 5:51:09am: Disk failure detected.; Disk fail event occurred on SATA channel 3.
    Aug-24 5:51:10am: Hotplug disk event detected; Disk add event occurred on SATA channel 3.
    Aug-24 6:59:04am: RAID event detected; RAID sync started on volume C.
    Aug-24 7:00am: New SMART disk errors detected!; ATA error count has increased in the last day. Disk 1: Previous count: 0 Current count: 3183
    Aug-24 4:26:09pm: RAID event detected; RAID sync finished on volume C. The volume is now fully redundant.
    Aug-25 4:00am: New SMART disk errors detected!; Reallocated sector count has increased in the last day. Disk 1: Previous count: 0 Current count: 2 ATA error count has increased in the last day. Disk 1: Previous count: 3183 Current count: 3191 Reallocated sector count has increased in the last day. Disk 3: Previous count: 0 Current count: 934 Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.
    Aug-26 4:00am: New SMART disk errors detected!; Reallocated sector count has increased in the last day. Disk 1: Previous count: 2 Current count: 4 ATA error count has increased in the last day. Disk 1: Previous count: 3191 Current count: 3201 Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.

Unfortunately, I was out of town away from email and didn't receive these alerts until the 26th and would still be away for another 5 days. I was able to access my home network remotely and log into frontview and shut down the ReadyNAS until my return yesterday. However, upon arriving home, I noted that the power button LED was still on despite my shutdown attempt (phasing in and out.) I unplugged the power and powered back on but RAIDar was unable to detect it. I reset again and could ping the ReadyNAS by IP address, but not by name. This time RAIDar was able to detect it but with a blue status light and last column contained "bad disks detected".

I shut down again, removed each disk, and ran Seagate's SeaTools for Windows via an eSATA connection. Disk 1 failed both the generic short and long hard drive tests, Disk 2 passed the short test (I didn't try the long one) and Disk 3 passed both short and long tests. Now when I boot up the ReadyNAS, the power button LED just fades in and out with no drive LED or activity LED indicators flashing. I've tried starting up with Disk 1 removed, thinking that maybe its problems were causing the issues and that 2 and 3 could run without redundancy long enough for me to back up my files, but I get the same problem. After many successive tries I've been unable to even get RAIDar to detect my ReadyNAS, nor have I been able to ping it at all, whether by name or IP. After hunting around the forums, I suspect I *may* have (but am not sure) unintentionally held the power button long enough when turning on to trigger the "skip volume check" mode, which I now know comes with warnings.

I'm scared to death of losing my and my family's documents, pictures, history, etc... and am hoping with every ounce of my being that I've not lost it all. Any advice, recommendations, words of encouragement or assistance in any form would be greatly appreciated.

BTW, I'm willing to accept my stupidity for not setting up a backup for my RAIDed setup. I did have one, but it died a long while ago and I hadn't replaced it yet due to other priorities and a sense of over-confidence established by the consistently solid performance of my ReadyNAS over the years. I'm desperately hoping I won't be paying for this mistake for years to come with permanent data loss and will actively ensure I never put myself through this again!!!

I've opened a ticket with my.netgear.com (ticket #19341286) so I'm sitting here holding my breath, crossing my fingers and knocking on so much wood I fear my knuckles will never be quite the same.

Thoughts, ideas.... please?!

Thanks!

19 Replies

  • So I've found a bit of time to do some more poking around and here's my progress thus far (I'm not sure I can really call it progress yet as I've still got no access to any of my files, but I've at least got some more information)...

    I've picked up two additional 2TB drives, thinking that I can at minimum clone the two drives that passed all the diagnostic tests and then try to somehow mount them.

    I connected each of the disks to my desktop (running Windows 7) and here's what I found when checking them out in Computer Management --> Storage --> Disk Management:

      Disk 1 -- This is the one that has SMART errors and fails all the Seagate SeaTools tests (both Windows and DOS versions) and seems to need replacement
      There are a number of healthy partitions displayed, looking similar to disk 3

      Disk 2 -- passed all SeaTools tests
      All space is unallocated, no partitions and I keep getting prompted to initialise -- this was the big surprise and somewhat disheartening.

      Disk 3 -- passed all SeaTools tests
      There are a number of healthy partitions displayed, looking similar to disk 1

    So first I cloned disk 3 using the EaseUS ToDo Backup Advanced Server 5 Trial. It has a sector by sector option and it seemed to work out ok. It completed and the cloned disk has a bunch of healthy looking partitions on it according to Windows. I actually attempted to clone disk 1 first, but the app just kind of sat there and didn't do anything so I figured it wasn't going to happen.

    Then when I tried to clone Disk 2 (the uninitialised one), I couldn't because it was uninitialised. I didn't think initialising it would help my cause so after some research, I downloaded a Knoppix image and used 'dd' to clone disk 2. It took a heck of a long time, but it eventually completed. The cloned disk is also entirely unallocated; strangely though, it does seem to be initialised (I think with MBR, as Windows gives the option to convert to GPT).

    I'm currently running a complete scan on my clone of disk 2 with EaseUS Partition Recovery to see if that might work and it'll likely take a long time. It's a little past a quarter of the way through and I'm getting interesting results. It's found 9 partitions so far: 2 EXT3, 5 "FAT12" and 3 NTFS. I can't imagine this is correct, but maybe there are patterns it's looking for that coincidentally match? Anyway, I'll let it finish and see if there are more found that look a bit more reasonable.

    I think my next step once this scan of disk 2 (clone) and subsequent fiddling is done might be to see if I can get a clone of my damaged disk 1 using the Knoppix/dd method to see if that fares any better than in Windows. Maybe I'm being optimistic, but in theory if I can somehow get disk 1 back, maybe a disk 1 and 3 combo would have me sailing away into the sunset with all my data. sigh.

    Anyway, if there are any good suggestions given any of this new info, please do tell as I'm not totally immersed in data recovery and I'm sure there are things to try that may not be obvious to a relative newb.

    I'll try to exhaust a few more paths and hopefully there are a few things left to try before being forced down the last resort path (and brutally more expensive) of professional recovery.

    Thanks for any ideas that may be on the way!
  • mdgm-ntgr
    NETGEAR Employee Retired
    It's possible disk 2 might be fine. When you last did a factory default or initial setup were there two or fewer disks installed? It's possible disk 2 is the parity disk. If this is the case it's normal for there to be no partition table on the disk and for the disk to appear to be uninitialised to your PC. Data recovery using a parity disk would only be possible in a ReadyNAS.
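
To illustrate why a parity disk on its own looks like meaningless data to a PC, here is a toy sketch of generic XOR parity (the RAID 4/5 idea). The thread does not describe how X-RAID actually lays out parity, so this is an assumption for illustration only, and the byte values are made up.

```shell
#!/bin/sh
# Illustration only: generic XOR parity over two one-byte "data disks".
# The values are invented; real arrays apply the same idea block by block.

d1=$(( 0xA5 ))                 # byte stored on data disk A
d2=$(( 0x3C ))                 # byte stored on data disk B
parity=$(( d1 ^ d2 ))          # byte stored on the parity disk

# If disk A is lost, XOR of the two survivors gives its contents back.
rebuilt=$(( parity ^ d2 ))

printf 'parity=0x%02X rebuilt=0x%02X original=0x%02X\n' "$parity" "$rebuilt" "$d1"
```

Under this model the parity disk holds only the XOR of the other disks, so it carries no partition table or file system of its own and is only useful together with the surviving data disks.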

    To attempt to clone a damaged disk use dd_rescue not dd.

    Rabbie wrote:

    It is better to do the clone job by following the instructions below:
    CLONE DISK:
    Use a Knoppix 6.2 Live CD for this guide. Can be found at http://www.knoppix.net
    Using dd_rescue command allows you to copy data from one drive to another block for block. This is especially useful for recovering a failed drive. Often when a drive fails, the drive is still accessible, it has just surpassed the S.M.A.R.T. error threshold. dd_rescue allows you to ignore the bad sectors and continue cloning the bad drive to a new healthy drive.

    1) Connect your old drive and new drive to your PC
    2) Boot up using your Linux live CD
    3) Launch a terminal window.
    4) Run fdisk -l to make sure the system sees both of the hard drives.
    5) Run hdparm -i /dev/sdx on both of the drives to find which drive is your source drive and which drive is your destination drive
    6) Once you know which drive is which you can start the clone process.
    dd_rescue /dev/sdX /dev/sdY (where /dev/sdX is the source disk and /dev/sdY is the destination drive)
    7) You will see the process start, just keep an eye on it, it might take a few hours for the clone job to finish, depending on the size of the drive.

    Once the process is complete, there will be no notification, the transfer will just stop and you will see the terminal prompt again.

    If you see a lot of errors, or see that no more data is being shown as succxfer, it means the drive got marked faulty by the kernel. At this point reboot the system and make sure you know which drive is which again, as it is possible the lettering might switch. Run the dd_rescue command again, but this time with the -r option. This will start the cloning again, but this time from the back of the drive, and will make sure to get the data that has not been cloned yet.
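
The two invocations described in the quoted guide can be sketched as a small helper that only prints the command for review before anything is run. This is a rough sketch and not part of the original guide: the function name clone_cmd and the device names are placeholders, and it uses only the plain form and the -r option mentioned above.

```shell
#!/bin/sh
# Sketch only: clone_cmd is a hypothetical helper and the device names
# are placeholders for the drives identified with fdisk -l and hdparm -i.
# It prints the dd_rescue command line; it never runs it.

clone_cmd() {
    src=$1
    dst=$2
    reverse=${3:-}            # pass "reverse" to add the -r option

    # Cloning a disk onto itself would destroy the data being rescued.
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_cmd SOURCE DEST [reverse] (SOURCE and DEST must differ)" >&2
        return 1
    fi

    if [ "$reverse" = "reverse" ]; then
        echo "dd_rescue -r $src $dst"
    else
        echo "dd_rescue $src $dst"
    fi
}

# First pass, front to back:
clone_cmd /dev/sdX /dev/sdY
# Second pass after a reboot, from the back of the drive:
clone_cmd /dev/sdX /dev/sdY reverse
```

Printing the command first gives a chance to re-check which device is the source, since the drive letters can change between boots.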
  • Thanks for the feedback mdgm!

    It was initially a 3x500GB array and then I switched it live (x-raid) to a 3x2TB array. I'm fairly certain I swapped the disks out in order starting with disk 1. I'm not sure if that would land the parity disk as disk 2 or not. So I suppose if disks 2 and 3 are actually good still it could be a hardware problem with my ReadyNAS since it still can't boot with just disks 2 and 3.

    I appreciate the tip on dd_rescue. I'll likely give it a whirl tomorrow, but I figure I should let this partition scan complete since it's come this far, if only to see what else it comes up with. I go to bed now with somewhat more positive thoughts -- thanks!
  • mdgm-ntgr
    NETGEAR Employee Retired
    It's possible (not sure) that simply replacing the failed disk with a new disk might have resolved the problem. Probably too late to try that now though (well I wouldn't try it before cloning the good disks)
  • Status update and a few questions:

    First status:
    • I've now got three new 2TB drives. I used dd to clone the two error-free drives, then tried adding a new clean drive to the ReadyNAS alongside the two good drives, but nothing happened. I then tried dd_rescue to clone the failed drive. dd_rescue crashed twice after more than 20 hours, so I tried ddrescue (no underscore) and was more successful: it recovered 1.89TB of the failed 2TB drive. I thought this was quite the achievement, but with a striped volume the disk on its own doesn't get me very far. I tried a few utilities that claimed to read RAID 5 volumes, but nothing worked. I tried putting the disks back in the ReadyNAS in every combination (all 3, 1&2, 2&3, 1&3), also to no avail. My ReadyNAS just won't boot. It starts to boot, and once I even picked it up in RAIDar with status "booting" before it disappeared again and just sat there with the blue power light pulsing.


    Now questions (any help is gratefully accepted):

    • What are the chances that there are logs on the ReadyNAS that might explain what happened? (i.e. when one drive failed, did something else get corrupt? Did it start to re-initialise my other two drives? Is there just some kind of firmware corruption? Hardware issue? )

    • If such logs exist, how might I gain access to them given that I can't boot? Would sticking in a separate clean disk allow it to boot? If so, would that reset old logs and settings?



    Basically, what I'm trying to do is determine the best next step. If there's any indication from the log data that the ReadyNAS itself may have issues due to failure or corruption, I'd have a pretty good chance of getting stuff back by finding another machine to pop my disks into, whereas if the logs indicated that it recognised my failed volume as new disks and did bad things to them, then I know I've likely got some pricey recovery process ahead.

    I'm a little disappointed in the Netgear support though I should've expected as much since I do have a much older model. Their response is that they can't help me because my product is out of support and EOL. :-( I still feel like they could chime in with a few suggestions other than take your drives to a local IT technician. sigh.

    Anyway, I guess I'm going to start looking into getting quotes on the recovery process. If there are any suggestions out there please throw them my way and thanks for the comments and assistance thus far!
  • mdgm-ntgr
    NETGEAR Employee Retired
    There's a 2GB OS partition and the data volume is on a separate partition. If the partition table and the OS partition are fine, the OS partition "might" be able to be mounted and the logs would be in there. If your data is important I would suggest contacting a data recovery company.
  • Hi all,

    GREAT NEWS!!! I'm currently in the process of copying photos of my two children's births from my cloned NAS drives connected to my PC onto an external hard drive!!!

    So after my last post, I connected my drives again to my PC and booted into Knoppix from a CD (my native OS is Win7) and tried to browse the devices. I could read all the OS files and log files but they didn't mean all that much to me. I saw all kinds of symlinks to my shares, but obviously no data as they would be on other partitions. I'm no guru when it comes to this stuff so I get why I can't access it, but just didn't have the extra bit of knowledge to get me there. That and I didn't even know the state of my drives.

    I buckled and decided to call a data recovery place. All the places I called had a minimum price tag of $1500. The upper range was from $3000 to $15,000!!! So I sent in my original drives to the 1500-3000 place. At this point I was fairly confident that recovery was possible since I was able to recover 99.4% of the failed drive and all I needed was two good drives, right?

    So I got an email this evening with my quote...

    One drive was found to have problems internal to the Head Disk Assembly. Another drive was found to have severe platter degradation. We will need to do whatever is required to get a read and create good images of both failed drives (this will improve the chances for a successful recovery rather than focusing on one drive only). We will then rebuild the RAID and try to put the files and directories back together to get them as close as possible to their original state. At the end of the process, we will provide you with a file list to review so you can approve the completion of work.
    The cost of the recovery will be $2,690 and the estimated time to complete your case is 9 business days.


    That's all I needed to go over my options again. It wasn't only the quote; the explanation just didn't sit right with me. Before I got my second wind to try again myself, I was trying to devise a strategy to take my cloned drives into another place to see if they could verify anything that this company was saying. In any case, I decided to give it another whirl myself and boy am I glad I did.

    Here's what I did to mount my volume under Knoppix (based mostly on this post):


    • First I connected all three drives to my desktop computer (clones of my originals made with dd_rescue for the good two, and ddrescue in reverse for the bad one). I guess I'm lucky to have had the spare SATA ports.

    • Then I booted up with a Knoppix CD downloaded online.

    • I opened a root shell (from the Knoppix menu).

    • I typed the following sequence of commands:

      • lvm pvscan

      • lvm vgchange -ay c

      • lvm lvs

      • mkdir test

      • mount -o ro /dev/mapper/c-c test


    • At this point, the "test" directory I'd just created contains all my shares!!!


    For completeness, here is the output of the entire ordeal including a listing of my precious shares:
    root@Microknoppix:/home/knoppix# lvm pvscan
    PV /dev/sdd5 VG c lvm2 [463.50 GiB / 0 free]
    PV /dev/sde5 VG c lvm2 [463.50 GiB / 0 free]
    PV /dev/sdd6 VG c lvm2 [1.36 TiB / 0 free]
    PV /dev/sde6 VG c lvm2 [1.36 TiB / 5.00 GiB free]
    Total: 4 [3.63 TiB] / in use: 4 [3.63 TiB] / in no VG: 0 [0 ]
    root@Microknoppix:/home/knoppix# lvm vgchange -ay c
    1 logical volume(s) in volume group "c" now active
    root@Microknoppix:/home/knoppix# lvm lvs
    LV VG Attr LSize Pool Origin Data% Move Log Copy% Convert
    c c -wn-a--- 3.63t
    root@Microknoppix:/home/knoppix# mkdir test
    root@Microknoppix:/home/knoppix# mount -o ro /dev/mapper/c-c test
    root@Microknoppix:/home/knoppix# cd test
    root@Microknoppix:/home/knoppix/test# ll
    total 104
    -rw------- 1 root root 7168 Aug 26 16:46 aquota.group
    -rw------- 1 root root 8192 Aug 26 16:46 aquota.user
    drwxrwx--- 7 nobody nogroup 4096 Jul 29 10:25 backup_orig
    drwx------ 3 nobody nogroup 4096 Dec 17 2006 backups
    drwxrwx--- 81 1003 libuuid 8192 Aug 24 13:00 docs
    drwxrwx--- 16 nobody nogroup 4096 Mar 5 2011 fsbackup
    drwxr-xr-x 3 98 98 4096 Sep 22 2006 home
    drwx------ 2 root root 16384 Sep 22 2006 lost+found
    drwxrwxr-x 6 nobody nogroup 4096 Sep 17 2011 media
    drwxr-xr-x 264 nobody nogroup 12288 Jul 29 10:25 music
    drwxr-xr-x 40 nobody nogroup 4096 Jul 29 10:25 pictures
    drwxrwx--- 10 1003 libuuid 4096 Aug 11 17:45 software
    drwxr-xr-x 13 nobody nogroup 4096 Nov 8 2010 video
    drwxrwxrwx 6 nobody nogroup 4096 Apr 9 2007 vssdb
    root@Microknoppix:/home/knoppix/test#
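
The command sequence above can be collected into a small script. This is a minimal sketch under stated assumptions, not a tested recovery tool: the volume group and logical volume are both named "c" as in the output above, the run and mount_readonly helpers are hypothetical names, and by default the script only prints each command. Setting DRY_RUN=0 in a root shell would execute them.

```shell
#!/bin/sh
# Sketch only: "c" is the volume group / logical volume name from the
# pvscan and lvs output above; MNT is the mount point used in the post.
# Commands are printed, not executed, unless DRY_RUN=0 is set.

VG=${VG:-c}
LV=${LV:-c}
MNT=${MNT:-test}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

mount_readonly() {
    run lvm pvscan &&                  # list the LVM physical volumes found
    run lvm vgchange -ay "$VG" &&      # activate the volume group
    run lvm lvs &&                     # confirm the logical volume is visible
    run mkdir -p "$MNT" &&
    run mount -o ro "/dev/mapper/$VG-$LV" "$MNT"   # read-only mount
}

mount_readonly
```

The && chaining stops at the first failing step, and the read-only mount keeps the clones from being modified while files are copied off.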


    I've only got a 500GB external available at this moment so I'm copying the critical things. I guess I'll politely decline the quote and ask the data recovery folks to ship my drives back. I'll have to pay the return shipping, but I haven't had to pay anything else at this point. As long as I get my critical files off, and I'll know soon enough (less than an hour and a half left in the copy), I'm good to go. My photos are now complete and I'm just waiting on my documents. :D
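
With the volume mounted read-only, copying a share and then comparing the copy against the source is a cheap way to confirm nothing went missing on the way. A minimal sketch, assuming POSIX cp and diff are available; the function name copy_and_verify and the destination path in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: copy one share directory and verify the copy.
# Pass the source without a trailing slash so the directory itself is copied.

copy_and_verify() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1

    # -R recurses, -p preserves timestamps and permissions where possible.
    cp -Rp "$src" "$dst"/ || return 1

    # diff -r exits non-zero if any file is missing or differs.
    name=$(basename "$src")
    if diff -r "$src" "$dst/$name" >/dev/null; then
        echo "verified: $name"
    else
        echo "MISMATCH: $name" >&2
        return 1
    fi
}
```

For example, copy_and_verify test/pictures /media/external would copy the pictures share from the mount point used above to a placeholder external-drive path and report whether the two trees match.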

    Once I receive my drives back I guess I'll try to recover all the remaining data back onto the good disks and then I can try to rebuild my ReadyNAS from scratch. At least now I've got the additional disks I can permanently mount in my PC and mirror from the NAS so this doesn't happen again!!!

    Just as a side note I did see and skim the above post that helped me so much before; however, with my limited knowledge of Linux I didn't really get it and saw too many dissimilarities (like ReadyNAS version, disk images, etc...) that I didn't pay much attention. I came back to it when I got my quote and spent some more time researching the various parts of the processes and letting the whole thing sink in a bit more. Anyway, thanks to everyone who chimed in on this thread and all those who've posted before that provided a little piece of the puzzle in my head. A big shout out goes to mjw who was the poster of the thread linked above that really got me out of this mess.

    I hope from here on in is a good news story, but things are looking bright so far. I hope this info can assist someone else if they run into an unfortunate circumstance similar to mine.

    Cheers!
  • mdgm-ntgr
    NETGEAR Employee Retired
    Great news!

    When you rebuild the NAS, I'd suggest updating to 4.1.10: http://www.readynas.com/RAIDiator_4_1_10_Notes and do a factory default (wipes all data, settings, everything) on that firmware.

    I hope you keep your backup up to date in future. Might want to use some of the money you saved by not having to pay a data recovery company on setting up a good backup strategy.
  • Makes the second (backup) NAS and disks look inexpensive in comparison.
