Hey all,

I have an NV+ that is about 2 months old. I fired it up and loaded it with all of our data, backed up all files from our PCs, etc. It has 4 x 2TB WD Green Caviar drives at this point. All is well.

Last Friday drive #3 shows up dead via email alert, so I do an RMA with WD and order a second hot spare from Newegg so I won't have to wait a week praying my data is safe next time around. Today I received the replacement and I am attempting to return the NAS to its happy XRAID status. I had powered off the drive last week after receiving this message instructing me to do so:

The X-RAID engine has failed to start. To ensure data integrity, please refrain from using the NAS and contact Support immediately.

So here I am, new drive installed, awaiting some instruction.

Powering up the NAS with the brand spanking new 2TB WD Green Caviar in slot 3 now shows it as "not present" in RAIDar and in the admin pages. Not sure why this is happening, but I could use some advice here.

When doing the last reboot, I chose "Perform volume scan on next boot", and the volume scan found no errors. However, the data volume is inaccessible and my shares are obviously missing as well.

I can't access the rescan option; it is greyed out. How can I add this guy? Hope I am missing something very basic here.

Thanks in advance,
Mike

Please contact tech support (http://www.ReadyNAS.com/support) and seek their assistance. Post your case number in this thread.

What version of RAIDiator are you running?

I have 4.1.7 - How do I attach the NAS log zip file to my case? Case #: 15693780

If/when they need the logs they will give you instructions on how to send them.

OK, thanks for the quick response. I read that I may have done a bad thing by installing the new drive and booting it up again. Should I power down, remove the new drive, or do anything else that might be necessary to help sort this out with minimal damage?

XRAID should allow me to lose one drive and still retain my data, yes? I purchased the NAS to prevent data loss! Ahh!

Hmm, I just noticed after doing a rescan in RAIDar that the drive in slot #3 [newly added today] is showing up with a normal green light [1863GB]. But when I choose Setup and load the web configuration page, it shows as Not Present. Page cache issues?

Help swapping "dead" drive? [Case #15693780]


PapaBear1
Apprentice
May 30, 2011
I would get back with tech support. It is extremely unusual for a slot to go bad, but hardware failure is not impossible.

To answer your question about backups: for a long time, when I only had a relatively small amount of data on my NV+, I backed up using Windows copy/paste to a USB drive attached to my wired PC. I did this on a routine basis. That was not the fastest process, and later I put a 2TB drive in my desktop and backed up to it. I use my NAS more as a server.

Eventually that became too time-consuming a chore as the amount of data grew, and when the NV+ was about three years old, I added an NVX to my network and converted my NV+ to a backup target. Once I had the initial backup done, I converted the backup jobs to rsync and it only takes minutes every night now, starting at midnight. As I have over 1.5TB of data now, manual copy/paste is no longer an option.

When backing up NAS to NAS, rsync is the way to go, as the incremental transfers only the changed portions of a file, not the entire file. It synchronizes the data between the two units, and since only the changes are copied over, it is very fast. (Warning: don't use rsync for the initial backup, because the checking/verification process takes forever when copying data the first time.)
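
To make the "only the changed portions" point concrete, here is a minimal Python sketch of the idea. It is a simplification and not rsync's actual algorithm (rsync uses a rolling checksum so it can also match data that has shifted position within a file). The function names and the tiny block size are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # deliberately tiny for the demo; a hypothetical value


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Split data into fixed-size blocks and hash each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indexes of blocks in `new` that differ from, or do not exist in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or old_digests[index] != digest
    ]


if __name__ == "__main__":
    old = b"AAAABBBBCCCCDDDD"
    new = b"AAAABxBBCCCCDDDDEEEE"
    # Only the blocks listed here would need to be sent to the backup target.
    print(changed_blocks(old, new))
```

Only the blocks whose digests differ would have to cross the network, which is why a nightly job over mostly unchanged data finishes quickly.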
mjw3786
Aspirant
May 30, 2011
I fear I am going to be SOL here. Looks like from what I have read in other cases, support won't help me if my drives are not compatible/on the HCL. I have WD Green Caviars [2TB] but they're not the correct model number. That's what I get for glossing over the HCL before scrambling to find a solution to back up my files.

Is there really NOTHING that can be done here then? Other options for data retrieval from the remaining healthy disks?
Bigbearf
Aspirant
May 30, 2011
@mjw3786
I have not read the entire thread but I have a RNPP and six WD20EADS drives. I recall updating firmware a while back and I had some problems. I had escalating LLCs and eventually ran the Dos utilities TLER ON & WDIDLE /D on the affected drive and it stopped the increasing LLCs. I then had a drive show up as "DEAD" and I do not remember exactly how this happened but to make a long story short here is the saga.

I went to BB and got two new WD drives with the proper firmware revision and then shut down the RNPP and pulled the dead drive. As suggested by mdgm I ran the vendor tools and found no problems. Next I ran the DOS utilities listed above and just for kicks put it back in the RNPP and flipped the power toggle and watched as the unit resynced the "DEAD" disk thinking that it would not work. After resyncing the "DEAD" hard drive showed no SMART errors and has functioned without problems for over 10 months. I still have one of the new hard drives available just in case but why don't you try the DOS utilities. What do you have to lose? Please check my posts for the "how to" steps and post your results.

Hope this helps.
bigbearf
PapaBear1
Apprentice
May 30, 2011
mjw3786 - what are the WD drives that you have installed?

Do you still have access to the volume?

What size is the data?

mjw3786

Aspirant

May 30, 2011

BigBearf wrote:
@mjw3786
I have not read the entire thread but I have a RNPP and six WD20EADS drives. I recall updating firmware a while back and I had some problems. I had escalating LLCs and eventually ran the Dos utilities TLER ON & WDIDLE /D on the affected drive and it stopped the increasing LLCs. I then had a drive show up as "DEAD" and I do not remember exactly how this happened but to make a long story short here is the saga.

I went to BB and got two new WD drives with the proper firmware revision and then shut down the RNPP and pulled the dead drive. As suggested by mdgm I ran the vendor tools and found no problems. Next I ran the DOS utilities listed above and just for kicks put it back in the RNPP and flipped the power toggle and watched as the unit resynced the "DEAD" disk thinking that it would not work. After resyncing the "DEAD" hard drive showed no SMART errors and has functioned without problems for over 10 months. I still have one of the new hard drives available just in case but why don't you try the DOS utilities. What do you have to lose? Please check my posts for the "how to" steps and post your results.

Hope this helps.
bigbearf

Please bear with me as I am not familiar with the DOS tools you mentioned. This post gives me hope though, so thank you for that!

A quick Google search shows that WD no longer makes the wdidle utility available to consumers. Where is a safe place to get these tools? I assume I would need to connect the drives to a PC directly to run the utility? Is there any issue with doing this that might damage the data on disk?

One other question: should I first try running the tools on my original "dead" disk before attempting to get the NAS to recognize the new disks? I feel like the original disk may not be bad, but I suppose running the vendor tools can tell me this, yes?

PapaBear, I had originally installed 4 of the WD20EADS-00W4B0 [2TB WD Green Caviar - 64MB Cache]. When the slot 3 drive showed dead, I tried hot removing and adding it back with no luck. I tried rebooting the NAS a couple of times but continued to get "XRAID engine failed to start" and "shares missing" to go along with 3 "healthy" drives and a 0MB/0MB free message as well. Argh.

I RMA'd the "dead" drive to WD and they sent out a WD20EARS-00MVWB0 as a replacement. It looks like a recertified "green" drive with a black label. I tried hot adding this with no difference in results. While I was waiting for the RMA disk to arrive, I also ordered a second "hot spare" so as not to make this waiting process an issue in the future. This drive came from newegg and is also a WD20EARS-00MVWB0, this one being new with the standard green label. All of these are 2TB, in case that wasn't clear.

The volume is inaccessible: "The paths for the shares listed below could not be found. Typically, this occurs when the ReadyNAS is unable to access the data volume. media backup"

I also see this when entering FrontView: "A SATA reset has been performed on one or more of your disks that may have affected the RAID parity integrity. It is recommended that you perform a RAID volume resync from the RAID Settings tab ( accessible in the Volumes page => Volume tab in FrontView ). The resync process will run in the background, and you can continue to use the ReadyNAS in the meantime."

Obviously the resync volume button is greyed out and will not allow a resync. I think I need to go to the DOS tools stage now?

Here is the data size info, I believe:

Ch 1 : WDC WD20EADS-00W4B0 [1862 GB] 0 MB free
Ch 2 : WDC WD20EADS-00W4B0 [1862 GB] 0 MB free
Ch 4 : WDC WD20EADS-00W4B0 [1862 GB] 1860 GB free

If you guys can point me in the direction of the tools I need to download, I would greatly appreciate it. Thanks for taking the time to help me figure this out.

mjw3786

Aspirant

May 30, 2011

Finally found your thread where the TLER discussion is taking place. I assume it's here, yes? viewtopic.php?f=7&t=36560

As recommended by c3po, I downloaded my NAS logs and searched kernel.log for TLER:

May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 0, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hdc: WDC WD20EADS-00W4B0 (s/n:WD-WCAVY-------), ATA DISK drive (ATAEXT)
May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 1, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hde: WDC WD20EADS-00W4B0 (s/n:WD-WCAVY-------), ATA DISK drive (ATAEXT)
May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 2, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hdg: WDC WD20EARS-00MVWB0 (s/n:WD-WCAZA-------), ATA DISK drive (ATAEXT)
May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 3, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hdi: WDC WD20EADS-00W4B0 (s/n:WD-WCAVY-------), ATA DISK drive (ATAEXT)

Now I'm assuming this is the NAS reading the status from the drive and not a limitation of the hardware, is that correct? Another forum discussion of TLER seems to indicate that this utility will not work on the newer WD20EADS drives with the 64MB cache, which is what I've got. Hmm. Thoughts?

Running the WD Lifeguard extended test now, so I think I'll go enjoy my day and hope for good news upon my return.
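
For anyone repeating the kernel.log search, here is a rough Python sketch that pairs each "does not support TLER" line with the drive identification line that follows it. It assumes, based only on the ordering in the excerpt above, that the first number after "TLER," is a channel index and that the next `hdX:` line describes the same drive. The function name `tler_report` is hypothetical.

```python
import re

# Lines copied from the kernel.log excerpt in this post.
SAMPLE = """\
May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 0, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hdc: WDC WD20EADS-00W4B0 (s/n:WD-WCAVY-------), ATA DISK drive (ATAEXT)
May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 1, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hde: WDC WD20EADS-00W4B0 (s/n:WD-WCAVY-------), ATA DISK drive (ATAEXT)
May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 2, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hdg: WDC WD20EARS-00MVWB0 (s/n:WD-WCAZA-------), ATA DISK drive (ATAEXT)
May 25 18:09:09 MULTIBEAST_8TB kernel: WD drive does not support TLER, 3, 3530
May 25 18:09:09 MULTIBEAST_8TB kernel: hdi: WDC WD20EADS-00W4B0 (s/n:WD-WCAVY-------), ATA DISK drive (ATAEXT)
"""

TLER_RE = re.compile(r"WD drive does not support TLER, (\d+),")
DRIVE_RE = re.compile(r"kernel: (hd[a-z]): (WDC \S+)")


def tler_report(lines):
    """Return (channel, device, model) for each drive flagged as lacking TLER.

    Assumption: the drive line right after a TLER line refers to the same drive.
    """
    report = []
    pending_channel = None
    for line in lines:
        match = TLER_RE.search(line)
        if match:
            pending_channel = int(match.group(1))
            continue
        match = DRIVE_RE.search(line)
        if match and pending_channel is not None:
            report.append((pending_channel, match.group(1), match.group(2)))
            pending_channel = None
    return report


if __name__ == "__main__":
    for channel, device, model in tler_report(SAMPLE.splitlines()):
        print(f"channel {channel}: {device} {model}")
```

To run it against a real log, replace `SAMPLE.splitlines()` with the lines of the kernel.log file from the downloaded log zip.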

Bigbearf
Aspirant
May 31, 2011
@mjw3786
Please view my member posts. Here is a link that may help. viewtopic.php?f=64&t=32319&p=184386&hilit=WDIDLE3#p184386
There is a HOW TO in this link as I recall. The WD drives may not support TLER but they do support WDIDLE tools. If you run the TLER on an unsupported drive it will just give you a "Does not support TLER" response but it will not hurt anything.
Basically here are the steps.
1. Make a DOS disk including the tools
2. Attach your drives via PATA to SATA connector.
3. Insert Floppy and reboot machine.
4. Run DOS utilities and then check to see if they are correct. Very easy to do.
5. Exit DOS utilities and remove WD Hard drive.
6. Reinsert into ReadyNAS.
Hope this helps. Please post results.
bigbearf
mjw3786
Aspirant
May 31, 2011
Removed the problem drive, installed it in this PC as a secondary, and ran the vendor tools. The result was "passed" and the drive appears fine with no errors.

So I spent way too long trying to make a bootable CD with the WDIDLE tool on it. Took a few tries due to having 64-bit Windows 7 and no floppy disk.

THEN, I could not get WDIDLE to recognize my disk because I have a stupid Gateway with the BIOS setting to change the SATA mode from Onboard to RAID [or in my case, RAID to onboard] greyed out from the factory. So my options at this point are flash BIOS with unpredictable results, or try another machine. No use in potentially destroying a machine for days to troubleshoot a "dead" disk right?

I put the WD20EADS in a different PC as the only disk, booted to CD and ran wdidle3 /d. It ran for approx 1 second, showed the drive model and serial numbers, then a message stating that the timer had been disabled. I think this is good, but the program did not return and spat out a "could not read from drive a:" error, which was probably an issue with the boot CD. So I exited out of the program, powered off the machine, inserted that disk into the NAS [powered off] and started it up.

Same issue now. RAIDar shows the drive as normal and fine. FrontView reports the drive as not present and my shares are missing since the volume choked as it has been. So is the issue with these "incompatible drives" related to a setting in the drive that is detected early on and remembered so the NAS will ignore it from that point forward? This seems very strange to me, but I am not familiar with how the RAID controller communicates with the hardware.

Mon May 30 21:49:39 EDT 2011 A SATA reset has been performed on one or more of your disks that may have affected the RAID parity integrity. It is recommended that you perform a RAID volume resync from the RAID Settings tab ( accessible in the Volumes page => Volume tab in FrontView ). The resync process will run in the background, and you can continue to use the ReadyNAS in the meantime.
Mon May 30 21:47:06 EDT 2011 System is up.
Mon May 30 21:47:01 EDT 2011 The paths for the shares listed below could not be found. Typically, this occurs when the ReadyNAS is unable to access the data volume. media backup

I will let it run for another hour to see if anything happens, but it appears nothing has changed. I did discover that this PC has 2 open screwless bays for SATA hard disks. That has made it easy to work with the vendor tools. What are my options now? Should I try using the vendor tools to zero fill one of my brand new disks to see if that improves things? Should I connect them to the other PC to run WDIDLE on those too before I zero fill/try them in the ReadyNAS?

I'm starting to feel like this is not going to end well, but maybe the support team have some other suggestions. Hope it's not "unsupported drives, have a nice day" :-/

mjw3786

Aspirant

May 31, 2011

Some more ideas:

Was looking through the log from startup a few minutes ago to look for possible issues unrelated to the drive and found this:

May 30 21:45:24 MULTIBEAST_8TB kernel: NAND device: Manufacture ID: 0xec, Chip ID: 0x76 (Samsung NAND 64MiB 3,3V 8-bit)
May 30 21:45:24 MULTIBEAST_8TB kernel: Samsung NAND flash rev C 
May 30 21:45:24 MULTIBEAST_8TB kernel: size of table 4096
May 30 21:45:24 MULTIBEAST_8TB kernel: table is there 0x8
May 30 21:45:24 MULTIBEAST_8TB kernel: bad block 2592 replacing by 4095
May 30 21:45:24 MULTIBEAST_8TB kernel: total bad block 1
May 30 21:45:24 MULTIBEAST_8TB kernel: bad 2592 = 4095 bad 4095 = -1 Total bad block number 1
May 30 21:45:24 MULTIBEAST_8TB kernel: retlen = 0x0200
May 30 21:45:24 MULTIBEAST_8TB kernel: VPD checksum = 0x10f7
May 30 21:45:24 MULTIBEAST_8TB kernel: ECC is ON
May 30 21:45:24 MULTIBEAST_8TB kernel: Creating 2 MTD partitions on "NAND 64MiB 3,3V 8-bit":
May 30 21:45:24 MULTIBEAST_8TB kernel: 0x00000000-0x00100000 : "P0 flash partition 1"
May 30 21:45:24 MULTIBEAST_8TB kernel: 0x00100000-0x03ffc000 : "P0 flash partition 2"
May 30 21:45:24 MULTIBEAST_8TB kernel: NEON flash: probing 8-bit flash bus
May 30 21:45:24 MULTIBEAST_8TB kernel: CFI: Found no NEON flash device at location zero
May 30 21:45:24 MULTIBEAST_8TB kernel: NEON flash: unknown flash device, mfr id 0x1, dev id 0x0
May 30 21:45:25 MULTIBEAST_8TB kernel: NEON flash: Found no Atmel device at location zero
May 30 21:45:25 MULTIBEAST_8TB kernel: This board is not supported.

Is that referring to the system RAM? I did upgrade the RAM to a 1GB chip, but wouldn't I have other issues if that was bad? Hmm that's the first thing I saw. Then this:

May 30 21:45:25 MULTIBEAST_8TB kernel: X_RAID_START
May 30 21:45:25 MULTIBEAST_8TB kernel: startstop  XRAID command = start, flash_cache=0
May 30 21:45:25 MULTIBEAST_8TB kernel: X_RAID clean shutdown indicator: 0x7.
May 30 21:45:25 MULTIBEAST_8TB kernel: dj_raid: invalid raid superblock magic on hdi
May 30 21:45:25 MULTIBEAST_8TB kernel: 0 2 2 1 0 0 0 0
May 30 21:45:25 MULTIBEAST_8TB kernel: 0 1 0 0
May 30 21:45:25 MULTIBEAST_8TB kernel: 1 0 0 0
May 30 21:45:25 MULTIBEAST_8TB kernel: 0 0 0 0
May 30 21:45:25 MULTIBEAST_8TB kernel: 0 0 0 0
May 30 21:45:25 MULTIBEAST_8TB kernel: Update time for sb 1 = 4dd5b15f.
May 30 21:45:25 MULTIBEAST_8TB kernel: Update time for sb 2 = 4dd5b15f.
May 30 21:45:25 MULTIBEAST_8TB kernel: Update time for sb 3 = 4d85745d.
May 30 21:45:25 MULTIBEAST_8TB kernel: Update time for sb 4 = 0.
May 30 21:45:25 MULTIBEAST_8TB kernel: recent_ID = 1, select_ID=1, most_ID=2 right_mac=3
May 30 21:45:25 MULTIBEAST_8TB kernel: Selected sb 1, ctime=4dd5b15f, id=f5f2ab98.
May 30 21:45:25 MULTIBEAST_8TB kernel: Use this image: 1

Not sure what the "RAID superblock" is, but that probably shouldn't be invalid. That sounds important. And lastly:
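
One way to read the "Update time" values: if they are hexadecimal Unix timestamps (an assumption on my part; nothing in this thread documents the dj_raid log format), they can be decoded with a few lines of Python. The helper name `decode_hex_epoch` is hypothetical.

```python
from datetime import datetime, timezone

# Values copied from the "Update time for sb N" lines above; sb 4 reports 0.
UPDATE_TIMES = {1: "4dd5b15f", 2: "4dd5b15f", 3: "4d85745d", 4: "0"}


def decode_hex_epoch(value: str):
    """Interpret a hex string as seconds since the Unix epoch, in UTC.

    Returns None for 0, which presumably means no timestamp was recorded.
    """
    seconds = int(value, 16)
    if seconds == 0:
        return None
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    for sb, raw in UPDATE_TIMES.items():
        print(f"sb {sb}: {raw} -> {decode_hex_epoch(raw)}")
```

Under that assumption, sb 1 and sb 2 decode to 20 May 2011 and sb 3 to 20 March 2011 (both UTC), which would suggest one superblock copy is about two months behind the others while the fourth has no timestamp at all. Whether "sb N" maps directly to a drive slot is not clear from the log.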

May 30 21:45:54 MULTIBEAST_8TB kernel: sata_hotplug: /sbin/hotplug retry hdiUser mode helper start.
May 30 21:45:54 MULTIBEAST_8TB kernel: done do_sata_hotplug
May 30 21:45:57 MULTIBEAST_8TB kernel: IOERR: lp_stat=0x51, rq_sec=0x20, d_block=0x20, n_sect=0x20
May 30 21:45:57 MULTIBEAST_8TB kernel: IN MD: LP_ACTIVE=0, lp_error=64
May 30 21:45:57 MULTIBEAST_8TB kernel: ==== SATA init channel 3
May 30 21:45:57 MULTIBEAST_8TB kernel: After INIT SATA channel 3, retry=28, sata=113, status=50
May 30 21:45:57 MULTIBEAST_8TB kernel: Failing this request
May 30 21:45:57 MULTIBEAST_8TB kernel: Buffer I/O error on device hdi, logical block 2 (swapper)
May 30 21:45:57 MULTIBEAST_8TB kernel: Buffer I/O error on device hdi, logical block 3 (swapper)
May 30 21:45:57 MULTIBEAST_8TB kernel: Drive failed on channel 3, remove this drive.
May 30 21:45:57 MULTIBEAST_8TB kernel: Dump hwif 8041efb8 structure, 0-8041d398
May 30 21:45:57 MULTIBEAST_8TB kernel: 1-8041daa0|1-8041e1a8|1-8041e8b0|1-8041efb8
May 30 21:45:57 MULTIBEAST_8TB kernel: 0-8041f6c0|0-8041fdc8|0-804204d0|1-80420bd8
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->name---------------------ide4
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->hwgroup------------------81e95b40
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->irq----------------------35
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->present------------------1
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->hold---------------------1
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->noprobe^I^I^I0
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->true_device^I^I1
May 30 21:45:57 MULTIBEAST_8TB kernel:  hwif->state0^I^I^I0
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[0].name----------hdi
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[0].present-------1
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[0].id_read-------1
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[0].noprobe^I0
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[0].dead^I^I0
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[0].id^I^I81e96940
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[1].present-------0
May 30 21:45:57 MULTIBEAST_8TB kernel:   hwif->drives[1].id_read-------0
May 30 21:45:57 MULTIBEAST_8TB kernel: Warning:chn 3, there is a pending retry on req 80ff69b8, current req 80ff6568
May 30 21:45:57 MULTIBEAST_8TB kernel: hwif->irq = 35, retry 1
May 30 21:45:57 MULTIBEAST_8TB kernel: sata_hotplug: /sbin/hotplug retry hdiUser mode helper start.
May 30 21:45:57 MULTIBEAST_8TB kernel: hwif->irq = 35, fail 1
May 30 21:45:57 MULTIBEAST_8TB kernel: sata_hotplug: /sbin/hotplug fail hdiUser mode helper start.
May 30 21:45:57 MULTIBEAST_8TB kernel: done do_sata_hotplug

Someone must have an idea what this indicates, right? How can the drive be "Dead" or "Not present" if it tests fine in the vendor tool? Windows was excited for me to format it when I popped it in to run the tool. :)
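
To pull the drive-related complaints out of a long kernel.log without reading every line, something like the following Python sketch may help. The patterns are taken only from the messages quoted in this post; other firmware versions may word them differently, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# A few lines copied from the log excerpts in this post.
SAMPLE = """\
May 30 21:45:25 MULTIBEAST_8TB kernel: dj_raid: invalid raid superblock magic on hdi
May 30 21:45:57 MULTIBEAST_8TB kernel: Failing this request
May 30 21:45:57 MULTIBEAST_8TB kernel: Buffer I/O error on device hdi, logical block 2 (swapper)
May 30 21:45:57 MULTIBEAST_8TB kernel: Buffer I/O error on device hdi, logical block 3 (swapper)
May 30 21:45:57 MULTIBEAST_8TB kernel: Drive failed on channel 3, remove this drive.
"""

SUPERBLOCK_RE = re.compile(r"invalid raid superblock magic on (\w+)")
IO_ERROR_RE = re.compile(r"Buffer I/O error on device (\w+), logical block \d+")
FAILED_RE = re.compile(r"Drive failed on channel (\d+)")


def summarize(lines):
    """Collect bad-superblock devices, I/O error counts per device, and failed channels."""
    bad_superblock = []
    io_errors = Counter()
    failed_channels = []
    for line in lines:
        match = SUPERBLOCK_RE.search(line)
        if match:
            bad_superblock.append(match.group(1))
            continue
        match = IO_ERROR_RE.search(line)
        if match:
            io_errors[match.group(1)] += 1
            continue
        match = FAILED_RE.search(line)
        if match:
            failed_channels.append(int(match.group(1)))
    return {
        "bad_superblock": bad_superblock,
        "io_errors": dict(io_errors),
        "failed_channels": failed_channels,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

A summary like this only shows which device the NAS is complaining about; it does not say whether the fault lies with the disk, the slot, or something else in the unit.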

mdgm-ntgr
NETGEAR Employee Retired
May 31, 2011
Did you run the memory test (http://www.readynas.com/forum/faq.php#Is_there_a_way_I_can_verify_if_my_memory_is_good%3F) twice after upgrading the memory? Memory upgrades are not supported and incompatible memory can cause problems.
