Forum Discussion
bluewomble (Aspirant)
Mar 13, 2011
Disk Failure Detected...
I've recently purchased a ReadyNAS Ultra 6 along with six 2 TB Seagate ST2000DL003 disks (which are on the HCL).
I've set up the NAS in a dual-redundancy X-RAID2 configuration and have started copying all the data over the network from my old ReadyNAS NV to the new Ultra 6...
About halfway through copying (on 6th March), I got a disk failure detected (on channel 4). I powered down the NAS, took the disk out and reinserted it, assuming there might be some kind of connection problem... When I powered back up it detected the disk, tested it and started to resync (which takes about 24 hours)... I left it alone while it did that and then it seemed to be OK, so I started copying the rest of my data across. There is nothing in the SMART+ log for disk 4 which would indicate that there was ever a problem with that disk.
A few minutes ago, I got another disk failure (this time on channel 2). Exactly the same story... powered down and then back up again, the disk comes back to life and the NAS starts testing it and resyncing it... again, there is nothing in the SMART+ log for disk 2 which indicates (to me at least) that there was ever a problem.
After both occasions, I've downloaded the system logs from the NAS, but I'm not sure what to do with them. Is there something in the logs which would show what exactly failed?
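For what it's worth, here's the rough Python sketch I used to pull the interesting lines out of the downloaded system.log. The path and the pattern list are just my own guesses, based on the kernel messages quoted further down; adjust them to wherever your log bundle is extracted.

```python
#!/usr/bin/env python3
"""Pull the disk-failure-related lines out of a ReadyNAS system.log.

Rough sketch only: the log path and the pattern list are guesses based on
the kernel messages quoted below, not anything official from NETGEAR."""

import re
import sys

# Messages that (in the logs below) mark the failure sequence: a command
# timeout, repeated link resets, the device being disabled, then md
# dropping the member from the array.
PATTERNS = [
    r"exception Emask",          # ATA command timed out / errored
    r"failed command:",          # the command that triggered it (FLUSH CACHE EXT here)
    r"COMRESET failed",          # SATA link reset did not succeed
    r"reset failed, giving up",  # libata gave up on the link
    r"ata\d+\.\d+: disabled",    # the device was taken offline
    r"end_request: I/O error",   # block-layer I/O errors
    r"Disk failure on",          # md (RAID) kicking the disk out
    r"RAIDiator: Disk failure",  # the ReadyNAS event itself
]
MATCHER = re.compile("|".join(PATTERNS))

def scan(path: str) -> None:
    with open(path, errors="replace") as log:
        for line in log:
            if MATCHER.search(line):
                print(line.rstrip())

if __name__ == "__main__":
    # e.g.  python3 scan_log.py system.log
    scan(sys.argv[1] if len(sys.argv) > 1 else "system.log")
```

Running that over the two extracts below picks out exactly the sequence quoted: FLUSH CACHE EXT timing out, the COMRESET attempts failing, ata2.00/ata4.00 being disabled, and md dropping the partitions from the RAID 1 and RAID 5 sets.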
Any idea what's going on here? Have I got a couple of dud disks which need to be sent back, or is there something else going on? If they are dud, I'd need to be able to prove to the retailer that they were... the only indication I have of a problem is that the ReadyNAS Ultra 6 _said_ that they had failed... but they both seem to be working fine now.
Thanks,
Ash.
P.S. Here's the SMART+ report from disk 2:
SMART Information for Disk 2
Model: ST2000DL003-9VT166
Serial: 5YD2196G
Firmware: CC32
SMART Attribute              Value
Spin Up Time                 0
Start Stop Count             12
Reallocated Sector Count     0
Power On Hours               151
Spin Retry Count             0
Power Cycle Count            12
Reported Uncorrect           0
High Fly Writes              0
Airflow Temperature Cel      42
G-Sense Error Rate           0
Power-Off Retract Count      6
Load Cycle Count             12
Temperature Celsius          42
Current Pending Sector       0
Offline Uncorrectable        0
UDMA CRC Error Count         0
Head Flying Hours            221474283585687
ATA Error Count              0
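Since every counter in that report looks clean, here's a small sketch of the check I've been running against saved smartctl output to spot the attributes that would actually point at a dying disk (reallocated/pending sectors, reported uncorrectables, CRC errors). It assumes a text file containing `smartctl -A /dev/sdX` output in smartmontools' attribute-table layout, which may not match the SMART+ page field-for-field, and the attribute shortlist is just my own choice.

```python
#!/usr/bin/env python3
"""Flag the SMART attributes that usually point at real media or link trouble.

Sketch only: it expects a text file containing `smartctl -A /dev/sdX` output
(smartmontools' attribute-table layout), and the attribute list is just a
personal shortlist, not an official one."""

import sys

# Raw values that should stay at (or very near) zero on a healthy drive.
CRITICAL = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

def check(path: str) -> int:
    problems = 0
    with open(path) as dump:
        for line in dump:
            fields = line.split()
            # Attribute rows start with the numeric ID, e.g. "  5 Reallocated_Sector_Ct ..."
            if len(fields) >= 10 and fields[0].isdigit() and fields[1] in CRITICAL:
                try:
                    raw = int(fields[9])
                except ValueError:
                    continue
                print(f"{fields[1]:<24} raw={raw} {'OK' if raw == 0 else 'WARNING'}")
                problems += raw != 0
    return problems

if __name__ == "__main__":
    # e.g.  smartctl -A /dev/sdb > disk2.txt && python3 check_smart.py disk2.txt
    sys.exit(1 if check(sys.argv[1] if len(sys.argv) > 1 else "disk2.txt") else 0)
```

For disk 2 all of those counters are zero, which is exactly why the report above doesn't explain the failure.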
This looks like the appropriate section of system.log for the failure which occurred today:
Mar 13 20:00:09 ultranas ntpdate[11162]: step time server 194.238.48.3 offset 0.310812 sec
Mar 13 20:16:27 ultranas kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 13 20:16:27 ultranas kernel: ata2.00: failed command: FLUSH CACHE EXT
Mar 13 20:16:27 ultranas kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 13 20:16:27 ultranas kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Mar 13 20:16:27 ultranas kernel: ata2.00: status: { DRDY }
Mar 13 20:16:27 ultranas kernel: ata2: hard resetting link
Mar 13 20:16:33 ultranas kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 13 20:16:37 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:16:37 ultranas kernel: ata2: hard resetting link
Mar 13 20:16:43 ultranas kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 13 20:16:47 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:16:47 ultranas kernel: ata2: hard resetting link
Mar 13 20:16:53 ultranas kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 13 20:17:23 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:17:23 ultranas kernel: ata2: limiting SATA link speed to 1.5 Gbps
Mar 13 20:17:23 ultranas kernel: ata2: hard resetting link
Mar 13 20:17:28 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:17:28 ultranas kernel: ata2: reset failed, giving up
Mar 13 20:17:28 ultranas kernel: ata2.00: disabled
Mar 13 20:17:28 ultranas kernel: ata2.00: device reported invalid CHS sector 0
Mar 13 20:17:28 ultranas kernel: ata2: EH complete
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 0
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 90 00 50 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 9437264
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 9437264
Mar 13 20:17:28 ultranas kernel: **************** super written barrier kludge on md2: error==IO 0xfffffffb
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 72
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 72
Mar 13 20:17:28 ultranas kernel: **************** super written barrier kludge on md0: error==IO 0xfffffffb
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 51 8f 30 00 00 28 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 5345072
Mar 13 20:17:28 ultranas kernel: raid1: sdb1: rescheduling sector 5342960
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 90 00 50 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 9437264
Mar 13 20:17:28 ultranas kernel: md: super_written gets error=-5, uptodate=0
Mar 13 20:17:28 ultranas kernel: raid5: Disk failure on sdb5, disabling device.
Mar 13 20:17:28 ultranas kernel: raid5: Operation continuing on 5 devices.
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 72
Mar 13 20:17:28 ultranas kernel: md: super_written gets error=-5, uptodate=0
Mar 13 20:17:28 ultranas kernel: raid1: Disk failure on sdb1, disabling device.
Mar 13 20:17:28 ultranas kernel: raid1: Operation continuing on 5 devices.
Mar 13 20:17:28 ultranas kernel: RAID5 conf printout:
Mar 13 20:17:28 ultranas kernel: --- rd:6 wd:5
Mar 13 20:17:28 ultranas kernel: disk 0, o:1, dev:sda5
Mar 13 20:17:28 ultranas kernel: disk 1, o:0, dev:sdb5
Mar 13 20:17:28 ultranas kernel: disk 2, o:1, dev:sdc5
Mar 13 20:17:28 ultranas kernel: disk 3, o:1, dev:sdd5
Mar 13 20:17:28 ultranas kernel: disk 4, o:1, dev:sde5
Mar 13 20:17:28 ultranas kernel: disk 5, o:1, dev:sdf5
Mar 13 20:17:28 ultranas kernel: RAID5 conf printout:
Mar 13 20:17:28 ultranas kernel: --- rd:6 wd:5
Mar 13 20:17:28 ultranas kernel: disk 0, o:1, dev:sda5
Mar 13 20:17:28 ultranas kernel: disk 2, o:1, dev:sdc5
Mar 13 20:17:28 ultranas kernel: disk 3, o:1, dev:sdd5
Mar 13 20:17:28 ultranas kernel: disk 4, o:1, dev:sde5
Mar 13 20:17:28 ultranas kernel: disk 5, o:1, dev:sdf5
Mar 13 20:17:28 ultranas kernel: RAID1 conf printout:
Mar 13 20:17:28 ultranas kernel: --- wd:5 rd:6
Mar 13 20:17:28 ultranas kernel: disk 0, wo:0, o:1, dev:sda1
Mar 13 20:17:28 ultranas kernel: disk 1, wo:1, o:0, dev:sdb1
Mar 13 20:17:28 ultranas kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 13 20:17:28 ultranas kernel: disk 3, wo:0, o:1, dev:sdd1
Mar 13 20:17:28 ultranas kernel: disk 4, wo:0, o:1, dev:sde1
Mar 13 20:17:28 ultranas kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 13 20:17:28 ultranas kernel: RAID1 conf printout:
Mar 13 20:17:28 ultranas kernel: --- wd:5 rd:6
Mar 13 20:17:28 ultranas kernel: disk 0, wo:0, o:1, dev:sda1
Mar 13 20:17:28 ultranas kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 13 20:17:28 ultranas kernel: disk 3, wo:0, o:1, dev:sdd1
Mar 13 20:17:28 ultranas kernel: disk 4, wo:0, o:1, dev:sde1
Mar 13 20:17:28 ultranas kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 13 20:17:28 ultranas kernel: raid1: sdf1: redirecting sector 5342960 to another mirror
Mar 13 20:17:32 ultranas RAIDiator: Disk failure detected.\n\nIf the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead. If this disk is a part of a RAID 6 volume, your volume is still protected if this is your first failure. A 2nd disk failure will make your volume unprotected. It is recommended that you replace the failed disk as soon as possible to maintain optimal protection of your volume.\n\n[Sun Mar 13 20:17:29 WET 2011]
Mar 13 20:20:24 ultranas kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
and here is what looks like the relevant part of the log from the failure on 6th March:
Mar 6 16:00:07 nas-EA-A6-42 ntpdate[12452]: step time server 62.84.188.34 offset -0.103568 sec
Mar 6 18:48:21 nas-EA-A6-42 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4.00: failed command: FLUSH CACHE EXT
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 6 18:48:22 nas-EA-A6-42 kernel: res 40/00:00:b8:f7:0e/00:00:00:00:00/40 Emask 0x4 (timeout)
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4.00: status: { DRDY }
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:48:27 nas-EA-A6-42 kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 6 18:48:32 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:48:32 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:48:37 nas-EA-A6-42 kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 6 18:48:42 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:48:42 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:48:47 nas-EA-A6-42 kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 6 18:49:17 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:49:17 nas-EA-A6-42 kernel: ata4: limiting SATA link speed to 1.5 Gbps
Mar 6 18:49:17 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: reset failed, giving up
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4.00: disabled
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: EH complete
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 72
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 72
Mar 6 18:49:22 nas-EA-A6-42 kernel: **************** super written barrier kludge on md0: error==IO 0xfffffffb
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 93 9e 80 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9674368
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Disk failure on sdd5, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Operation continuing on 5 devices.
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 34 c5 68 48 00 00 80 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885352520
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 34 c6 f0 c8 00 00 50 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885453000
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 28 c8 00 00 38 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513160
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 29 10 00 00 10 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513232
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 29 28 00 00 10 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513256
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 29 40 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513280
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 93 88 48 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9668680
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 93 a1 90 00 00 10 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9675152
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 34 c5 38 48 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885340232
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 34 c5 64 48 00 00 80 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885351496
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 34 c6 f1 18 00 00 30 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885453080
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 80 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 8388680
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 8388680
Mar 6 18:49:22 nas-EA-A6-42 kernel: **************** super written barrier kludge on md1: error==IO 0xfffffffb
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 31 8d 58 00 00 28 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 3247448
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid1: sdd1: rescheduling sector 3245336
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB:
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10)Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 72
Mar 6 18:49:22 nas-EA-A6-42 kernel: :md: super_written gets error=-5, uptodate=0
Mar 6 18:49:22 nas-EA-A6-42 kernel: 2a
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid1: Disk failure on sdd1, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid1: Operation continuing on 5 devices.
Mar 6 18:49:22 nas-EA-A6-42 kernel: 00 00 80 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 8388680
Mar 6 18:49:22 nas-EA-A6-42 kernel: md: super_written gets error=-5, uptodate=0
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Disk failure on sdd2, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Operation continuing on 5 devices.
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID1 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- wd:5 rd:6
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, wo:0, o:1, dev:sda1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 3, wo:1, o:0, dev:sdd1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, wo:0, o:1, dev:sde1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID1 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- wd:5 rd:6
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, wo:0, o:1, dev:sda1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, wo:0, o:1, dev:sde1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 3, o:0, dev:sdd5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf5
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf5
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 3, o:0, dev:sdd2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf2
Mar 6 18:49:23 nas-EA-A6-42 kernel: raid1: sdb1: redirecting sector 3245336 to another mirror
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf2
Mar 6 18:49:53 nas-EA-A6-42 RAIDiator: Disk failure detected.\n\nIf the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead. If this disk is a part of a RAID 6 volume, your volume is still protected if this is your first failure. A 2nd disk failure will make your volume unprotected. It is recommended that you replace the failed disk as soon as possible to maintain optimal protection of your volume.\n\n[Sun Mar 6 18:49:51 WET 2011]
144 Replies
Replies have been turned off for this discussion
- roger_armstrong (Aspirant): Does anyone know from experience which other 2 or 3 TB drives work reliably in an Ultra 6+? We're backing up over a TB every night to the ReadyNAS, so we really need rock-solid drives.
- PapaBear1 (Apprentice): For 3 TB disks I have had very good service from 4 Hitachi HDS5C3030ALA630 drives (2 each in 2 NVX units, plus 2x 1 TB Seagates in each as well). These are listed on most websites as PN 0S03230 as well as PN 0F12460 and PN 0S03228 (listed as the retail version). If you go to the Newegg website and click on the images of the drive and zoom in on the labels, you will see they are the same model. (Normal search only brings up 0S03230 and 0F12460, but if you enter "Hitachi 0S03228" in the search box it will show up, and it is in stock.) The four I got from Newegg are PN 0S03230, and I also got one from Amazon. (One of the 5 is a spare still in the sealed static bag.) Hitachi uses an opaque silver static bag, so you cannot read the drive label until you open the sealed bag. On the label on the bag only the PN appears; on the drive label, the model number but not the PN appears.
Mine have been in service for about 6 or 7 months without problems, and the 1 TB Seagates have been in service for 18 months, but one is now throwing ATA errors in the SMART info, although I have had no data problems.
FWIW I am running 4.2.17 on both units and have since about a month after its release. NAS2 (NVX Pioneer) is the nightly backup of NAS1 (NVX Business Edition), which is my server. I have a home network normally consisting of 2 custom-built Win 7 desktops (mine) and 1 Win 7 HP laptop, all three running 64-bit Home Premium. From time to time an old (2004) HP laptop with XP Home and an old (2003) HP D530 desktop with XP Pro are also on the network for special purposes only.
- ferg1 (Guide):
capaust wrote:
roger.armstrong wrote: Can anyone confirm the theory that reverting to 4.2.17 fixes the problem? Or that TimeMachine is to blame?
I'm not sure about 4.2.17, but our problems with ST2000DL003 drives had nothing to do with Time Machine. We use our ReadyNAS as a file server and just having 25 employees accessing files at the same time caused drives to fail. As with many others here, a couple of resyncs and restarts would 'fix' the problem temporarily, but they would inevitably fail again. It appears that the problem has more to do with loads placed on the drives. Under regular daily use, they would fail intermittently, but I could consistently get them to fail if I ran a large file transfer using RichCopy.
I would concur with seeing this problem under high loads. I have seen the problem with TM and also with multiple concurrent large (1 GB+) file transfers. TM does do a lot of disk access, as there are a lot of very small (symlinked) files. I also saw this multiple times when initially going from a new 4-disc volume (RAID 5) to a 6-disc one (RAID 6). That was enough to cause me to factory reset the unit and start again with a new 6-disc volume.
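If anyone wants to reproduce that load in a repeatable way rather than by ad-hoc copies, a rough sketch along these lines keeps a sustained sequential write going against a share mounted locally. The mount point, file size and file count are only placeholders, not recommendations.

```python
#!/usr/bin/env python3
"""Generate a sustained sequential write load against a mounted NAS share.

Rough sketch for reproducing the load-related failures discussed in this
thread. The mount point, file size and file count are placeholders."""

import os
import time

SHARE = "/mnt/readynas/loadtest"   # assumed local mount point of the NAS share
FILE_SIZE_GB = 4                   # each file ~4 GiB, comparable to a large media file
FILE_COUNT = 25                    # ~100 GiB total
CHUNK = 8 * 1024 * 1024            # write in 8 MiB chunks

def write_one(path: str) -> float:
    """Write one large file of incompressible data and force it out to the NAS."""
    block = os.urandom(CHUNK)
    start = time.time()
    with open(path, "wb") as out:
        for _ in range(FILE_SIZE_GB * 1024 // 8):   # number of 8 MiB chunks
            out.write(block)
        out.flush()
        os.fsync(out.fileno())     # push it to the NAS, not just the client's page cache
    return time.time() - start

if __name__ == "__main__":
    os.makedirs(SHARE, exist_ok=True)
    for i in range(FILE_COUNT):
        name = os.path.join(SHARE, f"load_{i:03d}.bin")
        took = write_one(name)
        print(f"{name}: {FILE_SIZE_GB} GiB in {took:.0f}s "
              f"({FILE_SIZE_GB * 1024 / took:.0f} MiB/s)")
```

Watching system.log on the NAS while something like that runs should show whether the ata exception / COMRESET sequence lines up with the heavy writes.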
With the current high price of hard drives, I'm hoping for a firmware fix in the new year, as I cannot afford to replace the existing drives yet.
Cheers
Ferg
- thestumper (Aspirant): It's load-related. Time Machine generates load, but my last failure happened with Time Machine disabled. I was copying a large amount of data (music and video files) to the NAS when it happened. Just a drag-and-drop copy; nothing fancy. I just pinged support again. I took a break over the holidays because I had enough stress, but I'm taking it up in earnest again. We're going on 8 pages of problems here, so they need to do something, but I'm not holding my breath, because if Netgear can't fix it, they're liable due to the HCL posting. My guess is that if they can't find a fix, they'll procrastinate until someone takes action that's more serious than complaining on a forum or hassling the support techs :) Honestly, I may end up selling the unit and just using the drives as stand-alone externals. I can stream video and music from my Mac on a couple, and use a couple as Time Machine targets. Not optimal, but if I can't use the unit, maybe someone can.
Or maybe disks will become cheap again some day :)
- ferg1 (Guide): I've just had an additional drive fail when I had to power off due to a power cut. I have a RAID 6 Pro with 4 of these ST2000DL003 drives and 2 of a different type. The unit was previously in a single-failed-drive state which I've been talking to Support about. When booted back up, the "failed disc" started resyncing. It reached 65% and then froze. I cannot access the unit. RAIDar now indicates that two drives have failed. Frontview is unreachable. Note that the additional failed drive is also an ST2000DL003.
Of these four drives, each one has been RMA'ed at least once (some twice). I must be into double figures of disc failures.
- Pian (Aspirant):
ferg wrote: I've just had an additional drive fail when I had to power off due to a power cut. I have a RAID 6 Pro with 4 of these ST2000DL003 drives and 2 of a different type. The unit was previously in a single-failed-drive state which I've been talking to Support about. When booted back up, the "failed disc" started resyncing. It reached 65% and then froze. I cannot access the unit. RAIDar now indicates that two drives have failed. Frontview is unreachable. Note that the additional failed drive is also an ST2000DL003.
Of these four drives, each one has been RMA'ed at least once (some twice). I must be into double figures of disc failures.
The scariest post I've read in quite a long time.
I'm in the ridiculous position of trying not to use data on my ReadyNAS in case I get a double failure too. And with disks not yet coming down in price, it's as if I can hear a bomb ticking ...
:(
- ferg1 (Guide):
Pian wrote:
ferg wrote: I've just had an additional drive fail when I had to power off due to a power cut. I have a RAID 6 Pro with 4 of these ST2000DL003 drives and 2 of a different type. The unit was previously in a single-failed-drive state which I've been talking to Support about. When booted back up, the "failed disc" started resyncing. It reached 65% and then froze. I cannot access the unit. RAIDar now indicates that two drives have failed. Frontview is unreachable. Note that the additional failed drive is also an ST2000DL003.
The scariest post I've read in quite a long time.
I'm in the ridiculous position of trying not to use data on my ReadyNAS in case I get a double failure too. And with disks not yet coming down in price, it's as if I can hear a bomb ticking ...
:(
I hear you. Luckily for me, I went with RAID 6 when I realised just how unreliable the original five ST2000DL003 drives were, by purchasing a sixth, different disc! Still, it's highly probable that a third disc will fail, and then I'm back to the task of putting 6GB of backup back.
If discs were not so expensive now, I would be at the shop putting these ST2000DL003s in the bonfire.
- ferg1 (Guide): Unfortunately, Level 2 have told me that the drive has known issues that they just cannot deal with. They recommend either exchanging the drive or waiting for a firmware fix from Seagate.
Not a happy customer.
- bokvast (Aspirant): That just won't do! How are we supposed to just exchange the drives? What store in their right mind would take back the drives? Just leave them and buy new ones? Don't know about you guys, but I'm definitely not made of money!
Do SOMETHING, Netgear!!
- jrpNAS (Aspirant):
ferg wrote: Unfortunately, Level 2 have told me that the drive has known issues that they just cannot deal with. They recommend either exchanging the drive or waiting for a firmware fix from Seagate.
I wrote to Seagate and asked when I can expect new firmware.
They replied:
"We do not have news of any upcoming firmware update for those barracuda drives.
Usually, it is announced ahead of time."