
Re: Disk failure detected.

backstr
Aspirant

Disk failure detected.

Hi,

I have a new Ultra 4 (running 4.2.16). A few hours after sync finished on the second disk (only two installed) I got a disk failure warning. Both drives in the array were new (they had been in an NV+ for the previous week). I replaced the failed disk with a spare, but it was not detected. I removed the disk and tried the failed disk in slot 3 and it was detected. I rebooted the Ultra and after the reboot, both the 'failed' drive and the spare were detected in slot 2. Any thoughts on whether I have a bad drive that messed up slot 2, or a bad slot 2?

Here's the system.log from the disk failure:

Apr 13 20:42:11 RNDU4000 RAIDiator: RAID sync finished on volume C.
Apr 13 20:42:16 RNDU4000 RAIDiator: RAID sync finished on volume C. \n\n[Wed Apr 13 20:42:09 CDT 2011]
Apr 13 20:42:19 RNDU4000 kernel: md: md1: recovery done.
Apr 13 20:42:20 RNDU4000 kernel: RAID conf printout:
Apr 13 20:42:20 RNDU4000 kernel: --- level:5 rd:2 wd:2
Apr 13 20:42:20 RNDU4000 kernel: disk 0, o:1, dev:sda2
Apr 13 20:42:20 RNDU4000 kernel: disk 1, o:1, dev:sdb2
Apr 13 20:46:14 RNDU4000 cnid_dbd[1469]: Set syslog logging to level: LOG_NOTE
Apr 13 21:11:15 RNDU4000 cnid_dbd[4221]: Set syslog logging to level: LOG_NOTE
Apr 13 21:32:43 RNDU4000 cnid_dbd[6788]: Set syslog logging to level: LOG_NOTE
Apr 13 23:38:42 RNDU4000 kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Apr 13 23:38:42 RNDU4000 kernel: ata2.00: failed command: FLUSH CACHE EXT
Apr 13 23:38:42 RNDU4000 kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Apr 13 23:38:42 RNDU4000 kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 13 23:38:42 RNDU4000 kernel: ata2.00: status: { DRDY }
Apr 13 23:38:42 RNDU4000 kernel: ata2: hard resetting link
Apr 13 23:38:42 RNDU4000 kernel: ata2: link is slow to respond, please be patient (ready=0)
Apr 13 23:38:42 RNDU4000 kernel: ata2: COMRESET failed (errno=-16)
Apr 13 23:38:42 RNDU4000 kernel: ata2: hard resetting link
Apr 13 23:38:42 RNDU4000 kernel: ata2: link is slow to respond, please be patient (ready=0)
Apr 13 23:38:42 RNDU4000 kernel: ata2: COMRESET failed (errno=-16)
Apr 13 23:38:42 RNDU4000 kernel: ata2: hard resetting link
Apr 13 23:38:42 RNDU4000 kernel: ata2: link is slow to respond, please be patient (ready=0)
Apr 13 23:38:42 RNDU4000 kernel: ata2: COMRESET failed (errno=-16)
Apr 13 23:38:42 RNDU4000 kernel: ata2: limiting SATA link speed to 1.5 Gbps
Apr 13 23:38:42 RNDU4000 kernel: ata2: hard resetting link
Apr 13 23:38:42 RNDU4000 kernel: ata2: COMRESET failed (errno=-16)
Apr 13 23:38:42 RNDU4000 kernel: ata2: reset failed, giving up
Apr 13 23:38:42 RNDU4000 kernel: ata2.00: disabled
Apr 13 23:38:42 RNDU4000 kernel: ata2.00: device reported invalid CHS sector 0
Apr 13 23:38:42 RNDU4000 kernel: ata2: EH complete
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 d3 a4 86 40 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 3550774848
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 90 00 50 00 00 02 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 9437264
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 9437264
Apr 13 23:38:42 RNDU4000 kernel: md: super_written gets error=-5, uptodate=0
Apr 13 23:38:42 RNDU4000 kernel: md/raid:md2: Disk failure on sdb5, disabling device.
Apr 13 23:38:42 RNDU4000 kernel: <1>md/raid:md2: Operation continuing on 1 devices.
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 11 6c f8 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 1142008
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: sdb1: rescheduling sector 1139896
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 50 09 28 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 5245224
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: sdb1: rescheduling sector 5243112
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 37 98 70 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 3643504
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: sdb1: rescheduling sector 3641392
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 37 98 98 00 00 10 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 3643544
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: sdb1: rescheduling sector 3641432
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 50 08 d8 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 5245144
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: sdb1: rescheduling sector 5243032
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 11 6c f8 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 1142008
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Unhandled error code
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 11 6c f8 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 1142008
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: Disk failure on sdb1, disabling device.
Apr 13 23:38:42 RNDU4000 kernel: <1>md/raid1:md0: Operation continuing on 1 devices.
Apr 13 23:38:42 RNDU4000 kernel: RAID conf printout:
Apr 13 23:38:42 RNDU4000 kernel: --- level:5 rd:2 wd:1
Apr 13 23:38:42 RNDU4000 kernel: disk 0, o:1, dev:sda5
Apr 13 23:38:42 RNDU4000 kernel: disk 1, o:0, dev:sdb5
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: redirecting sector 1139896 to other mirror: sda1
Apr 13 23:38:42 RNDU4000 kernel: RAID conf printout:
Apr 13 23:38:42 RNDU4000 kernel: --- level:5 rd:2 wd:1
Apr 13 23:38:42 RNDU4000 kernel: disk 0, o:1, dev:sda5
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: redirecting sector 5243112 to other mirror: sda1
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: redirecting sector 3641392 to other mirror: sda1
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: redirecting sector 3641432 to other mirror: sda1
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: redirecting sector 5243032 to other mirror: sda1
Apr 13 23:38:42 RNDU4000 kernel: RAID1 conf printout:
Apr 13 23:38:42 RNDU4000 kernel: --- wd:1 rd:2
Apr 13 23:38:42 RNDU4000 kernel: disk 0, wo:0, o:1, dev:sda1
Apr 13 23:38:42 RNDU4000 kernel: disk 1, wo:1, o:0, dev:sdb1
Apr 13 23:38:42 RNDU4000 kernel: RAID1 conf printout:
Apr 13 23:38:42 RNDU4000 kernel: --- wd:1 rd:2
Apr 13 23:38:42 RNDU4000 kernel: disk 0, wo:0, o:1, dev:sda1
Apr 13 23:38:43 RNDU4000 RAIDiator: Disk failure detected.
Apr 13 23:38:43 RNDU4000 RAIDiator: If the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead. If this disk is a part of a RAID 10 volume,your volume is still protected if more than half of the disks alive. But another failure of disks been marked may render that volume dead. It is recommended that you replace the failed disk as soon as possible to maintain optimal protection of your volume.
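
For anyone who wants to pull the key events out of a log like the one above, here is a rough Python sketch. It counts the "I/O error" lines per device, lists the md "Disk failure" events, and decodes the 10-byte Read(10)/Write(10) CDBs into a starting sector and a block count. The function names (`decode_cdb10`, `summarize`) and the regular expressions are my own and only cover the message formats seen in this particular log; the byte offsets assume the standard 10-byte READ(10)/WRITE(10) layout (bytes 2-5 are the LBA, bytes 7-8 the transfer length).

```python
import re
from collections import Counter

# A few lines copied verbatim from the system.log above.
SAMPLE = """\
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 d3 a4 86 40 00 00 08 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 3550774848
Apr 13 23:38:42 RNDU4000 kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 90 00 50 00 00 02 00
Apr 13 23:38:42 RNDU4000 kernel: end_request: I/O error, dev sdb, sector 9437264
Apr 13 23:38:42 RNDU4000 kernel: md/raid:md2: Disk failure on sdb5, disabling device.
Apr 13 23:38:42 RNDU4000 kernel: md/raid1:md0: Disk failure on sdb1, disabling device.
"""

# Patterns are assumptions based only on the lines in this thread.
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
MD_FAIL = re.compile(r"md/raid\d*:(md\d+): Disk failure on (\w+), disabling device")
CDB_10 = re.compile(r"CDB: (Read|Write)\(10\): ((?:[0-9a-f]{2} ?){10})")


def decode_cdb10(hex_bytes):
    """Return (lba, block_count) from a 10-byte READ(10)/WRITE(10) CDB."""
    raw = bytes.fromhex(hex_bytes.strip())
    if len(raw) != 10:
        raise ValueError("expected a 10-byte CDB")
    lba = int.from_bytes(raw[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(raw[7:9], "big")  # bytes 7-8: transfer length
    return lba, blocks


def summarize(log_text):
    """Collect I/O error counts, md failure events, and decoded CDBs."""
    io_errors = Counter()   # device name -> number of I/O error lines
    md_failures = []        # (array, member partition) in log order
    commands = []           # (operation, lba, block_count) in log order
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
        match = MD_FAIL.search(line)
        if match:
            md_failures.append((match.group(1), match.group(2)))
        match = CDB_10.search(line)
        if match:
            lba, blocks = decode_cdb10(match.group(2))
            commands.append((match.group(1), lba, blocks))
    return io_errors, md_failures, commands


if __name__ == "__main__":
    errors, failures, cmds = summarize(SAMPLE)
    print("I/O errors per device:", dict(errors))
    print("md failures:", failures)
    print("decoded commands:", cmds)
```

On the sample lines, the decoded LBAs (3550774848 and 9437264) line up with the sector numbers in the "end_request" lines that follow them, which is a quick sanity check that the errors are failed commands to sdb rather than garbage in the log.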

Thanks,
Bill
Message 1 of 8
Jedi_Knight
Tutor

Re: Disk failure detected.

I would suggest testing the failed disks with the disk manufacturer's diagnostic tool. Could you tell us the Make/Model/Size/Firmware of the drives you are using?
Message 2 of 8
backstr
Aspirant

Re: Disk failure detected.

It is a Seagate/ST2000DL003-9VT166/2TB/CC32. Unfortunately I cannot run the Seagate tools because I'm a Mac user.

As a side note, I originally had this drive and an identical one in an NV+. I was getting poor read/write performance over wired gigabit ethernet (no better than wireless) and resyncs were taking 12-14 hours. I blamed the slow processor in the NV+, returned it and upgraded to the Ultra 4.

Strangely, performance was not much better, and the resync took just as long. But after the reported drive failure I ended up with the other ST2000DL003 and a new ST32000542AS. The resync of the ST32000542AS took 7 hours and R/W performance over GE more than doubled.

I'm returning the suspect ST2000DL003 and getting a replacement though. I don't think I have the time to round up a PC. The ST32000542AS is reporting enormous reallocated sector counts (it has grown from 21 yesterday to 251 this morning to 789 at present), so I'm expecting it to fail as well.

Is a 66% failure rate in two weeks typical? Sure hope it is just bad luck.

Thanks,
Bill
Message 3 of 8
imlucid
Aspirant

Re: Disk failure detected.

You may want to check out the following thread:
http://www.readynas.com/forum/viewtopic.php?f=65&t=51496

Basically we are noticing a trend with the following configurations:
Mac OS access
Seagate ST2000DL003-9VT166 2TB drives

A number of us have had failed drive detections only with these Seagate drives. After shutting down the NAS, pulling the drive, reinserting it, and letting it resync, the drives appear to recover with no issues.

One user has run lower-level diagnostics and seen no errors.
All drives show no SMART errors after the resync.

I've had 4 different drives give me the alert within a month, none have done so a second time so far.

Kevin
Message 4 of 8
backstr
Aspirant

Re: Disk failure detected.

Thanks, I noticed that thread shortly after starting this one. It does sound exactly like my experience.

I was planning to reinstall the drive to see if the failure reproduced, but with an apparent failure looming on the ST32000542AS I thought it better not to risk it.
Message 5 of 8
nixlimited
Aspirant

Re: Disk failure detected.

Crap, I am just about to rebuild my Ultra 6 NAS with ST2000DL003 drives (x3) and I have a Mac household. And this rebuild is after I lost, simultaneously, 3x WD WD20EADS drives (thread: http://www.readynas.com/forum/viewtopic.php?f=66&t=52261). Sounds like I may be in for a rocky ride...

The other interesting thing to note about this thread is that much like my experience, inserting a new disk precipitated a disk failure. As someone pointed out in another thread, re-syncing is disk-intensive, but still, I shouldn't have disks fail every time I insert a new disk. I do not read about this issue with other vendor NAS devices.
Message 6 of 8
JMehring
Apprentice

Re: Disk failure detected.

Stay away from these drives if you have a Mac OS client. I have had 3 replacement ReadyNAS units and now 8 drive failures within 3 weeks! So it's either a Netgear problem or a drive problem, but it would be better to be safe and stay away from the drive even though it's on the compatibility list.

I have had multiple drive failures at the same time... I have tried both current and beta ReadyNAS firmware... Netgear says it is a SATA problem (that's why they replaced the ReadyNAS).

I eventually thought it was an AFP problem, since I only started experiencing the problems when I introduced a Mac mini running Lion to the NAS. At that point I used AFP as my sharing preference. But after the second-to-last failure I thought maybe it was AFP that was making my drives fail (I tried both 4.2.17 and the 4.2.19tX beta). So guess what, I disabled all AFP access and connected my Mac to the NAS via NFS. Darn; 12 hours later and 13 hours later I had 2 more drive failures.

I hope they fix my problems, since I just got this ReadyNAS about a month ago and have had at LEAST 8 drive failures and 3 ReadyNAS replacements; I'm really worried I will lose all my data (I'm afraid to even back it up since I get failures during backups too).

I really hope this ReadyNAS is not a piece of crap; I am thinking it is not, since it only seems to be us Mac users having problems with this drive. Netgear, I hope you will stand by your product and get me working, and if it's the drives, I expect you will replace them with another type, since I bought them because they were on your compatibility list!
Message 7 of 8
mdgm-ntgr
NETGEAR Employee Retired

Re: Disk failure detected.

JMehring, If you have had a bad batch of drives (e.g. drives damaged when a courier dropped them) that is hardly NetGear's fault. If however it is the NAS units that have been faulty that's a different matter. I would suggest you open your own thread with your most recent/current case/RMA number in the thread title (i.e. subject of the first post of the thread).
Message 8 of 8