
Re: Disk Failure Detected...

bluewomble
Aspirant

Re: Disk Failure Detected...

Yes, it does seem a problem with these drives specifically.

I've had 3 'failures', all while doing large file transfers.

Since then, I have run the ReadyNAS's own extended disk test and run the SeaTools long drive test on all 6 drives... no problems have yet been found with _any_ of the disks.

I've done an OS reinstall at the request of the Netgear support guys and since then (touch wood) everything has been running normally... I might try doing some more large file transfers to see if I can stress the system though.

These don't feel like normal disk failures to me... feels more like a firmware / software bug in either the disks or the NAS (or the combination of the two).
Message 26 of 145
CitizenPlain
Aspirant

Re: Disk Failure Detected...

Yeah. I'm with Kevin & BlueWombie. Doesn't seem like a regular failure to me either and we've all got the same setup (Ultra 6 Plus and the same drives). The drives keep working just fine and without errors after they're reinserted and checked.

Are you guys also all using OSX? I wondered if it was some sort of weird conflict with doing a file transfer and Time Machine trying to back up at the same time. I thought I observed that on my second failure earlier (was using the machine when the drive died -- other two times I wasn't in the room when I got the alert), but I can't tell if it was coincidental or not.
Message 27 of 145
imlucid
Aspirant

Re: Disk Failure Detected...

bluewomble wrote:
Yes, it does seem a problem with these drives specifically.

I've had 3 'failures', all while doing large file transfers.

Since then, I have run the ReadyNAS's own extended disk test and run the SeaTools long drive test on all 6 drives... no problems have yet been found with _any_ of the disks.

I've done an OS reinstall at the request of the Netgear support guys and since then (touch wood) everything has been running normally... I might try doing some more large file transfers to see if I can stress the system though.

These don't feel like normal disk failures to me... feels more like a firmware / software bug in either the disks or the NAS (or the combination of the two).


Just curious, what mechanism are you using to do the large file transfers?

I'm using cifs and afp shares at home but my mirrored NAS backed up 7TB via rsync and haven't had a single issue on it (same exact setup as the one that has had 3 failures, 4 Seagate drives, 2 Hitachi).

The CIFS share I use as a mount for iTunes and download TV Shows and Movies into it, the AFP mount I use for FCP editing, mostly read only as I have a local scratch disk.

Kevin
Message 28 of 145
imlucid
Aspirant

Re: Disk Failure Detected...

CitizenPlain wrote:
Yeah. I'm with Kevin & BlueWombie. Doesn't seem like a regular failure to me either and we've all got the same setup (Ultra 6 Plus and the same drives). The drives keep working just fine and without errors after they're reinserted and checked.

Are you guys also all using OSX? I wondered if it was some sort of weird conflict with doing a file transfer and Time Machine trying to back up at the same time. I thought I observed that on my second failure earlier (was using the machine when the drive died -- other two times I wasn't in the room when I got the alert), but I can't tell if it was coincidental or not.


I'm using OSX (Leopard and SnowLeopard).

I do have Time Machine backups set up to this NAS as well (but not to my mirrored backup NAS)... I have no idea if TM was trying to run at the time but have seen TM errors.

Kevin
Message 29 of 145
bluewomble
Aspirant

Re: Disk Failure Detected...

I'm also using Macs (running Snow Leopard).. I have 4 machines all doing time machine backups and I'm using afp to connect to the macs and rsync to another nas... I have an Ultra 6 rather than a 6 Plus, but otherwise the same.
Message 30 of 145
CitizenPlain
Aspirant

Re: Disk Failure Detected...

imlucid wrote:
Just curious, what mechanism are you using to do the large file transfers?


My incidents have been while copying through the Finder from an external drive to an AFP-connected share on the ReadyNAS.
Message 31 of 145
evanhatesspam
Aspirant

Re: Disk Failure Detected...

On the Mac OS X question: I will eventually be using my new ReadyNAS with a Mac or two, but not until a clean RAID-6 volume is built...
Message 32 of 145
Greg_Staten
Tutor

Re: Disk Failure Detected...

More evidence that there may be something funky going on here, firmware wise.

Have a Pro Pioneer and after updating it this evening to the latest firmware, suddenly channels 2, 3, 4, and 5 report as dead. Haven't even had a SMART error up until now. Everything was fine until I did the firmware update.

If that isn't odd, here's what happened immediately after the firmware update:

1. RAIDar shows all drives as green.
2. Frontview simultaneously shows channels 2-5 as DEAD.
3. I can browse the shares and successfully both load files off the drives and move files around.

About ten minutes later RAIDar also reported the drives had failed, as did the front panel on the ReadyNAS Pro Pioneer. The shares were no longer accessible.

Then I rebooted the device. After the reboot the same situation as above happens again. Good in RAIDar, DEAD in FrontView, shares accessible.

Ran a volume scan and it completed with no errors. Yet the same conditions apply: RAIDar thinks things are fine, FrontView thinks I'm screwed. My confidence is quite low at the moment...

-greg
Message 33 of 145
mdgm-ntgr
NETGEAR Employee Retired

Re: Disk Failure Detected...

Greg you probably should start a new thread, open an Online tech support case and post your case number there.
Message 34 of 145
CitizenPlain
Aspirant

Re: Disk Failure Detected...

Reading back through this, there seems to be a commonality in our disks being erroneously reported as dead and the use of OSX.

Summary: (user: details)

bluewombie: Ultra 6 (not plus), OSX Snowleopard, AFP
imlucid: Pro Pioneer, OSX Snow Leopard and Leopard, Time Machine
citizenplain (me): Ultra 6 Plus, OSX Snow Leopard, AFP file transfer, Time Machine
evanhatesspam: Ultra 6 Plus

None of us have had abnormal results from any disk checks. All of us are having trouble with the Seagate ST2000DL003-9VT166 drives.

Didn't catch if user bluewombie has Time Machine enabled. Not sure how evanhatesspam is accessing his NAS.

I updated to the new firmware (RAIDiator-x86 4.2.16) recently, but haven't had a need to do a large file transfer since updating. The updates described in the change log for this version didn't sound like they specifically address this problem. We'll see if it makes a difference.

Anything I'm missing here or any other details we could add to this? Bluewombie, did you ever get anywhere with tech support?
Message 35 of 145
bluewomble
Aspirant

Re: Disk Failure Detected...

CitizenPlain: Yes, that looks like a good summary. I am also using Time Machine.

My conversations with tech support weren't enormously helpful... the guy recommended:
1. Running the ReadyNAS extended disk test (passed)
2. Running SeaTools long drive test on each disk (they all passed)
3. Reinstalling the OS (done)

Since I followed these steps, I haven't _yet_ had any further problems... though I'm reluctant to say that the issue is fixed because it seems to me that none of the above steps _ought_ to fix the problem.

Since then I have also installed the new firmware update (RAIDiator-x86 4.2.16).

I have tried to do some fairly large file transfers, but haven't yet managed to make the problem reappear.
Message 36 of 145
imlucid
Aspirant

Re: Disk Failure Detected...

Disk 3 reported "failed" a week ago, now disk 4...

Still no reported failures on my mirrored rsync unit...
Kevin
Message 37 of 145
imlucid
Aspirant

Re: Disk Failure Detected...

Hmm, interesting new developments...

So after the drive reported failure I shut down, pulled the drive, replaced it and booted.

Drive still shows missing so I do the same steps again waiting a bit longer and on boot it starts resyncing the drive. So far so good.

After it was around 1.6% done or so, I checked to see if I could mount cifs via finder, no problem.

I tried to load the frontview web admin and at first it wouldn't load, but after a couple of tries it loads but doesn't fill in any text. Then the drive's fans go on full blast and what appears to be something like a progress bar on the LED on the front of the NAS starts flashing from left to right a few times and the NAS then reboots...

Interesting.

So I do the same steps again and once again right when I'm trying to load the web UI, the server reboots again with same symptoms...

I'm now going to leave it alone as it is trying to resync again (3rd time a charm?).

Kevin
Message 38 of 145
diode
Aspirant

Re: Disk Failure Detected...

I have just experienced the same issue with my Ultra4 and these Seagate hard drives.
I have performed a factory reset and this has fixed it for now but there has to be a bug somewhere.
Message 39 of 145
theali3n
Aspirant

Re: Disk Failure Detected...

Just had the same problem with Ultra4 Plus.

Also running all OSX in the house. The NAS has been up for weeks now. I've copied all my media on there. "today" was the 1st time I've really had a chance to use it. Besides copying several multi-gig isos around everything seemed to be ok. I enabled ReadyNAS backup today and kicked off a backup. It was about 70gig into a 250gig backup and I got a failure email from disk3.

I have not done any of the steps yet in this thread, but something weird is going on and I specifically bought off the HCL having had issues with NV/NV+ and WD EARS drives a while ago.

from DMESG:

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata5.00: failed command: FLUSH CACHE EXT
ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: hard resetting link
ata5: link is slow to respond, please be patient (ready=0)
ata5: COMRESET failed (errno=-16)
ata5: limiting SATA link speed to 1.5 Gbps
ata5: hard resetting link
ata5: COMRESET failed (errno=-16)
ata5: reset failed, giving up
ata5.00: disabled
ata5.00: device reported invalid CHS sector 0
ata5: EH complete
sd 4:0:0:0: [sdc] Unhandled error code
sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 a9 e8 08 00 00 00 08 00
end_request: I/O error, dev sdc, sector 2850555904
sd 4:0:0:0: [sdc] Unhandled error code
sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sdc] CDB: Read(10): 28 00 00 12 89 80 00 00 08 00
end_request: I/O error, dev sdc, sector 1214848
md/raid1:md0: sdc1: rescheduling sector 1212736
sd 4:0:0:0: [sdc] Unhandled error code
sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 90 00 50 00 00 02 00
end_request: I/O error, dev sdc, sector 9437264
end_request: I/O error, dev sdc, sector 9437264
md: super_written gets error=-5, uptodate=0
md/raid:md2: Disk failure on sdc5, disabling device.
<1>md/raid:md2: Operation continuing on 3 devices.
sd 4:0:0:0: [sdc] Unhandled error code
sd 4:0:0:0: [sdc] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 4:0:0:0: [sdc] CDB: Write(10): 2a 00 00 00 00 48 00 00 02 00
end_request: I/O error, dev sdc, sector 72
end_request: I/O error, dev sdc, sector 72
md: super_written gets error=-5, uptodate=0
md/raid1:md0: Disk failure on sdc1, disabling device.
<1>md/raid1:md0: Operation continuing on 3 devices.
RAID conf printout:
--- level:5 rd:4 wd:3
disk 0, o:1, dev:sda5
disk 1, o:1, dev:sdb5
disk 2, o:0, dev:sdc5
disk 3, o:1, dev:sdd5
RAID conf printout:
--- level:5 rd:4 wd:3
disk 0, o:1, dev:sda5
disk 1, o:1, dev:sdb5
disk 3, o:1, dev:sdd5
RAID1 conf printout:
--- wd:3 rd:4
disk 0, wo:0, o:1, dev:sda1
disk 1, wo:0, o:1, dev:sdb1
disk 2, wo:1, o:0, dev:sdc1
disk 3, wo:0, o:1, dev:sdd1
RAID1 conf printout:
--- wd:3 rd:4
disk 0, wo:0, o:1, dev:sda1
disk 1, wo:0, o:1, dev:sdb1
disk 3, wo:0, o:1, dev:sdd1
md/raid1:md0: redirecting sector 1212736 to other mirror: sdb1
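
For anyone reading the log above: the `CDB` bytes and the `end_request` sector numbers describe the same requests. Here is a minimal Python sketch, assuming the standard SCSI 10-byte READ/WRITE layout (opcode, flags, 4-byte big-endian LBA, group, 2-byte transfer length, control). The function name `decode_cdb10` is made up for illustration.

```python
def decode_cdb10(hex_bytes: str) -> dict:
    """Decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as space-separated hex."""
    cdb = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(cdb) != 10:
        raise ValueError("expected exactly 10 bytes")
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)"}
    return {
        "op": opcodes.get(cdb[0], hex(cdb[0])),
        "lba": int.from_bytes(cdb[2:6], "big"),     # starting logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # transfer length in blocks
    }

# CDB strings copied from the dmesg output above
for cdb in ("28 00 a9 e8 08 00 00 00 08 00",
            "28 00 00 12 89 80 00 00 08 00",
            "2a 00 00 90 00 50 00 00 02 00"):
    print(decode_cdb10(cdb))
```

The decoded LBAs (2850555904, 1214848 and 9437264) match the sectors in the `end_request: I/O error` lines, which is consistent with ordinary reads and writes to sdc being rejected once the device had been disabled, rather than with damage at particular addresses.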

This sux 😞 It's not cheap to put this stuff together.
Message 40 of 145
theali3n
Aspirant

Re: Disk Failure Detected...

Disk 2 has just done the same thing.

It's got to be something with the backplane/raid, firmware and/or drive incompatibility. Maybe the drives should be marked as such in the Compatibility guide.

Note. I plugged in my USB drive and was moving files from that to the raid disk locally. CP jobs were started from the console. Large files +/-2gig each.

My drive 3 had been stable since my previous post. I pray for a firmware upgrade...
Message 41 of 145
paul4321
Aspirant

Re: Disk Failure Detected...

So I have been running into the exact same issues, error logs and lack of solid evidence showing that the disks are bad.
However, I'm not running Netgear at all. I created my own Linux based NAS solution (based on some specs of commercial NAS solutions).

Hardware specs:
Supermicro X7SPA-H-D525 integrated D525 processor
Intel® ICH9R Express Chipset
4 x 2 TB ST2000DL003-9VT1 disks
Raid 5

I have had two failures in the past 3 weeks, and in both situations, I could shutdown the box, re-insert the disk and start it up again.
This would trigger a resync and everything looks fine.
Other sites I have read have made the following recommendations:
Try new SATA cables
Try a more powerful power supply (this wouldn't apply for Netgear users)
Try new disks
Check for overheating

I have been monitoring the temperature of the disks, and all 4 are below 40 deg C (within the appropriate range).
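
A rough way to automate that temperature check is to parse the attribute table printed by `smartctl -A`. The sketch below assumes the usual ten-column smartmontools layout, with attribute 194 (`Temperature_Celsius`) carrying the temperature as the first number of its raw value. The 40 deg C limit comes from the paragraph above; the sample table is made up for illustration and is not output from any drive in this thread.

```python
import re

def disk_temperature(smartctl_output: str):
    """Return the raw value of SMART attribute 194 in deg C, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value begins at column 10.
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None

# Illustrative sample only (made-up values in smartctl's usual column layout)
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       512
194 Temperature_Celsius     0x0022   036   041   000    Old_age   Always       -       36 (0 18 0 0 0)
"""

temp = disk_temperature(SAMPLE)
print(temp, "OK" if temp is not None and temp < 40 else "check cooling")
```

On a real box the text would come from running `smartctl -A` against each disk device instead of the sample string.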

Another site mentioned that each disk would fail once, and only once. So once they all failed for the first time, you should not see any more issues.

My take on all this...
Switching disks would probably fix this issue. Since these disks are classified as "low power/green" disks, it could be that under increased stress the disks don't draw enough power to keep up, causing a temporary hardware failure in the drive. Later SMART and manufacturer tests don't stress the disk's power draw in the same way.

So if it continues, I will buy new SATA cables... if it still continues, I will buy new drives...
Message 42 of 145
imlucid
Aspirant

Re: Disk Failure Detected...

Well, I've had the same disk reporting failures more than once, so I don't think it's a matter of fail once and you're good...
Message 43 of 145
paul4321
Aspirant

Re: Disk Failure Detected...

So last night when my 4th disk was re-syncing, I decided to tax the system as heavily as possible. Within 30 min disks 2 and 3 both failed at the same time. After a hard reboot (and a completely lost RAID array), all 4 disks looked, acted and tested normal (minus all my lost data, which was OK because I have it backed up).
Message 44 of 145
paul4321
Aspirant

Re: Disk Failure Detected...

Similar threads:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922
http://forum.qnap.com/viewtopic.php?f=182&t=39893&start=30

Some strong words on the Seagate site itself:
http://forums.seagate.com/t5/Barracuda-XT-Barracuda-Barracuda/ST2000DL003-Barracuda-Green-not-detected-at-BIOS/td-p/87154/page/7
Message 45 of 145
Royan
Aspirant

Re: Disk Failure Detected...

Add another case...

ReadyNas Pro 2 (RAIDiator-x86 4.2.17)
two Seagate ST2000DL003-9VT166 (CC32) disks.
One disk reported itself as dead and 'SMART+' greyed out

In my case it happened while running the 'bliss' plugin.
The plugin had a lot to do, and there was a lot of disk activity when it happened.

I didn't have to yank the drive.
After a restart and running smartctl from the shell, it all of a sudden decided the disk was ok, and started rebuilding the volume.

Now I'm thinking of replacing both disks (3 weeks old!), just in case it happens again...
So then it's just a case of finding a set of disks that are on the HCL, are available from my retailers and don't have any black marks against them in the forum...:)

Edit:
Oh, and I'm not running a Mac, and there were no errors in the SMART data/logs...

brgds
Royan
Message 46 of 145
Upstate
Aspirant

Re: Disk Failure Detected...

...and add another one.

The Google gods led me here and I just had this same exact issue on a month old 2100.

Disk 2 gave up the ghost, hot swapped in another disk that wasn't recognized. Swapped in the reported Dead disk and with a cold restart was happily resyncing. A warm reboot did not do a thing.

My unit has the ST32000644NS and like everyone else FV reported the drive as dead, with no SMART errors and this is on 4.2.17.

And in keeping with the theme of the thread, I also have these same errors repeating in the system log:


Aug 2 16:44:04 HPNAS1 kernel: ata2: hard resetting link
Aug 2 16:44:04 HPNAS1 kernel: ata2: link is slow to respond, please be patient (ready=0)
Aug 2 16:44:04 HPNAS1 kernel: ata2: COMRESET failed (errno=-16)
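
To see how often each port goes through this reset cycle, the log can be tallied per ATA port. Here is a minimal Python sketch, assuming only the message formats quoted in this thread; `count_link_events` and the event labels are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# Patterns taken from the kernel messages quoted in this thread
PATTERNS = {
    "hard_reset": re.compile(r"(ata\d+): hard resetting link"),
    "comreset_failed": re.compile(r"(ata\d+): COMRESET failed"),
    "flush_timeout": re.compile(r"(ata\d+)\.\d+: failed command: FLUSH CACHE EXT"),
    "speed_limited": re.compile(r"(ata\d+): limiting SATA link speed"),
}

def count_link_events(log_text: str) -> Counter:
    """Count (port, event) pairs in dmesg or syslog text."""
    counts = Counter()
    for line in log_text.splitlines():
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(match.group(1), event)] += 1
    return counts

# The three lines posted above
LOG = """\
Aug 2 16:44:04 HPNAS1 kernel: ata2: hard resetting link
Aug 2 16:44:04 HPNAS1 kernel: ata2: link is slow to respond, please be patient (ready=0)
Aug 2 16:44:04 HPNAS1 kernel: ata2: COMRESET failed (errno=-16)
"""

for (port, event), n in sorted(count_link_events(LOG).items()):
    print(port, event, n)
```

Because the patterns use `search`, the same function works on raw dmesg output and on syslog lines with a timestamp prefix. For reference, errno 16 on Linux is EBUSY.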


And for the final kicker, we are also running OSX and AFP. Time Machine was enabled on 7/29 and the dying disk issue occurred on 8/2.

Judging from the similarities posted I have disabled TM on our macs and will also do so on the NAS. Interestingly my other 2100 that is RSync'd with the first has no issues at all and is also not using AFP or Timemachine.

I am going to pilfer through the logs on the Macs and see what TM was doing at the time of the supposed disk failure and will report back any interesting findings.

Fits in nicely with:

CitizenPlain wrote:
Reading back through this, there seems to be a commonality in our disks being erroneously reported as dead and the use of OSX.

Summary: (user: details)

bluewombie: Ultra 6 (not plus), OSX Snowleopard, AFP
imlucid: Pro Pioneer, OSX Snow Leopard and Leopard, Time Machine
citizenplain (me): Ultra 6 Plus, OSX Snow Leopard, AFP file transfer, Time Machine
evanhatesspam: Ultra 6 Plus

None of us have had abnormal results from any disk checks. All of us are having trouble with the Seagate ST2000DL003-9VT166 drives.

Didn't catch if user bluewombie has Time Machine enabled. Not sure how evanhatesspam is accessing his NAS.

I updated to the new firmware (RAIDiator-x86 4.2.16) recently, but haven't had a need to do a large file transfer since updating. The updates described in the change log for this version didn't sound like they specifically address this problem. We'll see if it makes a difference.

Anything I'm missing here or any other details we could add to this? Bluewombie, did you ever get anywhere with tech support?



Updated...

Summary: (user: details)

bluewombie: Ultra 6 (not plus), OSX Snowleopard, AFP
imlucid: Pro Pioneer, OSX Snow Leopard and Leopard, Time Machine
citizenplain (me): Ultra 6 Plus, OSX Snow Leopard, AFP file transfer, Time Machine
evanhatesspam: Ultra 6 Plus
upstate: 2100 v2, OSX Snow Leopard and Leopard, AFP file transfer, Time Machine
Message 47 of 145
PiddeP
Aspirant

Re: Disk Failure Detected...

I've experienced the exact same problem, but with another Seagate model (ST31500341AS). They worked flawlessly with my Duo but when I switched to an Ultra 2 the problems started.

My system.log is filled with cycles like this one:
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1: limiting SATA link speed to 1.5 Gbps
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1.00: failed command: FLUSH CACHE EXT
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Aug 11 10:48:06 nas-8B-21-2C kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1.00: status: { DRDY }
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1: hard resetting link
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1.00: configured for UDMA/133
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1.00: retrying FLUSH 0xea Emask 0x4
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1.00: device reported invalid CHS sector 0
Aug 11 10:48:06 nas-8B-21-2C kernel: ata1: EH complete
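
The `SStatus 113 SControl 310` pair in the log above can be read field by field. Here is a minimal sketch, assuming the register layout from the Serial ATA specification (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11) and that the kernel prints both values in hexadecimal. The helper name `decode_sata_reg` is hypothetical.

```python
def decode_sata_reg(hex_value: str) -> dict:
    """Split an SStatus/SControl value (hex string) into its DET, SPD and IPM fields."""
    value = int(hex_value, 16)
    return {
        "det": value & 0xF,          # device detection
        "spd": (value >> 4) & 0xF,   # speed: negotiated (SStatus) or allowed limit (SControl)
        "ipm": (value >> 8) & 0xF,   # interface power management
    }

print("SStatus ", decode_sata_reg("113"))   # values from the log line above
print("SControl", decode_sata_reg("310"))
```

SPD = 1 in both registers means generation 1 (1.5 Gbps): the link came back up at 1.5 Gbps because the kernel had already restricted it to that speed, which matches the `limiting SATA link speed to 1.5 Gbps` line at the start of the cycle. DET = 3 in SStatus means a device was detected and communication was established.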

Two days ago disk 2 turned "dead"/grey in Frontview, but came back alive after I pulled it out, rebooted and then put it back in. After a few hours it had re-synced.

This has now happened again, but now disk 2 is still showing in Frontview as dead, without being present in the system! I can't make the system identify the slot as empty, and then re-sync as I insert the drive.

What should I do? Could a reboot with both of the disks inserted do any harm?

Cheers,

Peter
Message 48 of 145
PiddeP
Aspirant

Re: Disk Failure Detected...

PiddeP wrote:

This has now happened again, but now disk 2 is still showing in Frontview as dead, without being present in the system! I can't make the system identify the slot as empty, and then re-sync as I insert the drive.


Update: I booted up with only disk 1 inserted, then inserted a new WDC disk, which seems to re-sync correctly.
Message 49 of 145
jah313
Tutor

Re: Disk Failure Detected...

You can add me as well...

Ultra 6+

with 6: ST2000DL003

DISK 4 failed today while copying data from an NV+. I am sending them all back to Newegg and getting Hitachi drives.
Message 50 of 145