Forum Discussion
bluewomble
Mar 13, 2011 (Aspirant)
Disk Failure Detected...
I've recently purchased a ReadyNAS Ultra 6 along with six 2 TB Seagate ST2000DL003 disks (which are on the HCL).
I've set up the NAS in a dual-redundancy X-RAID2 configuration and have started copying all the data over the network from my old ReadyNAS NV to the new Ultra 6...
About halfway through copying (on 6th March), I got a disk failure detected (on channel 4). I powered down the NAS, took the disk out and reinserted it, assuming there might be some kind of connection problem... When I powered back up it detected the disk, tested it and started to resync (which takes about 24 hours)... I left it alone while it did that and then it seemed to be OK, so I started copying the rest of my data across. There is nothing in the SMART+ log for disk 4 which would indicate that there was ever a problem with that disk.
A few minutes ago, I just got another disk failure (this time on channel 2). Exactly the same story... powered down and then back up again, the disk comes back to life and the NAS starts testing it and resyncing it... again, there is nothing in the SMART+ log for disk 2 which indicates (to me at least) that there was ever a problem.
After both occasions, I've downloaded the system logs from the NAS, but I'm not sure what to do with them. Is there something in the log which would show what exactly failed?
Any idea what's going on here? Have I got a couple of dud disks which need to be sent back, or is there something else going on? If they are dud, I'd need to be able to prove to the retailer that they were... the only indication I have of a problem is that the ReadyNAS ultra 6 _said_ that they had failed... but they both seem to be working fine now.
Thanks,
Ash.
P.S. Here's the SMART+ report from disk 2:
SMART Information for Disk 2
Model: ST2000DL003-9VT166
Serial: 5YD2196G
Firmware: CC32
SMART Attribute
Spin Up Time 0
Start Stop Count 12
Reallocated Sector Count 0
Power On Hours 151
Spin Retry Count 0
Power Cycle Count 12
Reported Uncorrect 0
High Fly Writes 0
Airflow Temperature Cel 42
G-Sense Error Rate 0
Power-Off Retract Count 6
Load Cycle Count 12
Temperature Celsius 42
Current Pending Sector 0
Offline Uncorrectable 0
UDMA CRC Error Count 0
Head Flying Hours 221474283585687
ATA Error Count 0
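As a rough cross-check of the report above, the following Python sketch parses the "name value" lines and lists any watched counter that is non-zero. The choice of watched attributes and the function names are assumptions made for illustration; they are not part of RAIDiator or of any SMART tool.

```python
# Counters whose non-zero raw values are often read as warning signs.
# This selection is an assumption for illustration, not an official list.
WATCHED = (
    "Reallocated Sector Count",
    "Spin Retry Count",
    "Reported Uncorrect",
    "Current Pending Sector",
    "Offline Uncorrectable",
    "UDMA CRC Error Count",
    "ATA Error Count",
)

# Attribute lines copied from the SMART+ report for disk 2.
REPORT = """\
Spin Up Time 0
Start Stop Count 12
Reallocated Sector Count 0
Power On Hours 151
Spin Retry Count 0
Power Cycle Count 12
Reported Uncorrect 0
High Fly Writes 0
Airflow Temperature Cel 42
G-Sense Error Rate 0
Power-Off Retract Count 6
Load Cycle Count 12
Temperature Celsius 42
Current Pending Sector 0
Offline Uncorrectable 0
UDMA CRC Error Count 0
Head Flying Hours 221474283585687
ATA Error Count 0
"""


def parse_smart_report(text):
    """Map attribute name -> integer raw value from 'Name Value' lines."""
    values = {}
    for line in text.splitlines():
        # The value is the last space-separated token on the line.
        name, _, raw = line.strip().rpartition(" ")
        if name and raw.isdigit():
            values[name] = int(raw)
    return values


def nonzero_watched(values):
    """Return only the watched counters that have moved off zero."""
    return {k: v for k, v in values.items() if k in WATCHED and v != 0}


print(nonzero_watched(parse_smart_report(REPORT)))
```

An empty result only means that none of the watched counters has moved; it does not by itself show that the disk is healthy.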
This looks like the appropriate section of system.log for the failure which occurred today:
Mar 13 20:00:09 ultranas ntpdate[11162]: step time server 194.238.48.3 offset 0.310812 sec
Mar 13 20:16:27 ultranas kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 13 20:16:27 ultranas kernel: ata2.00: failed command: FLUSH CACHE EXT
Mar 13 20:16:27 ultranas kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 13 20:16:27 ultranas kernel: res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Mar 13 20:16:27 ultranas kernel: ata2.00: status: { DRDY }
Mar 13 20:16:27 ultranas kernel: ata2: hard resetting link
Mar 13 20:16:33 ultranas kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 13 20:16:37 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:16:37 ultranas kernel: ata2: hard resetting link
Mar 13 20:16:43 ultranas kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 13 20:16:47 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:16:47 ultranas kernel: ata2: hard resetting link
Mar 13 20:16:53 ultranas kernel: ata2: link is slow to respond, please be patient (ready=0)
Mar 13 20:17:23 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:17:23 ultranas kernel: ata2: limiting SATA link speed to 1.5 Gbps
Mar 13 20:17:23 ultranas kernel: ata2: hard resetting link
Mar 13 20:17:28 ultranas kernel: ata2: COMRESET failed (errno=-16)
Mar 13 20:17:28 ultranas kernel: ata2: reset failed, giving up
Mar 13 20:17:28 ultranas kernel: ata2.00: disabled
Mar 13 20:17:28 ultranas kernel: ata2.00: device reported invalid CHS sector 0
Mar 13 20:17:28 ultranas kernel: ata2: EH complete
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 0
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 90 00 50 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 9437264
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 9437264
Mar 13 20:17:28 ultranas kernel: **************** super written barrier kludge on md2: error==IO 0xfffffffb
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 72
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 72
Mar 13 20:17:28 ultranas kernel: **************** super written barrier kludge on md0: error==IO 0xfffffffb
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 51 8f 30 00 00 28 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 5345072
Mar 13 20:17:28 ultranas kernel: raid1: sdb1: rescheduling sector 5342960
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 90 00 50 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 9437264
Mar 13 20:17:28 ultranas kernel: md: super_written gets error=-5, uptodate=0
Mar 13 20:17:28 ultranas kernel: raid5: Disk failure on sdb5, disabling device.
Mar 13 20:17:28 ultranas kernel: raid5: Operation continuing on 5 devices.
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Unhandled error code
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 13 20:17:28 ultranas kernel: sd 1:0:0:0: [sdb] CDB: Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 13 20:17:28 ultranas kernel: end_request: I/O error, dev sdb, sector 72
Mar 13 20:17:28 ultranas kernel: md: super_written gets error=-5, uptodate=0
Mar 13 20:17:28 ultranas kernel: raid1: Disk failure on sdb1, disabling device.
Mar 13 20:17:28 ultranas kernel: raid1: Operation continuing on 5 devices.
Mar 13 20:17:28 ultranas kernel: RAID5 conf printout:
Mar 13 20:17:28 ultranas kernel: --- rd:6 wd:5
Mar 13 20:17:28 ultranas kernel: disk 0, o:1, dev:sda5
Mar 13 20:17:28 ultranas kernel: disk 1, o:0, dev:sdb5
Mar 13 20:17:28 ultranas kernel: disk 2, o:1, dev:sdc5
Mar 13 20:17:28 ultranas kernel: disk 3, o:1, dev:sdd5
Mar 13 20:17:28 ultranas kernel: disk 4, o:1, dev:sde5
Mar 13 20:17:28 ultranas kernel: disk 5, o:1, dev:sdf5
Mar 13 20:17:28 ultranas kernel: RAID5 conf printout:
Mar 13 20:17:28 ultranas kernel: --- rd:6 wd:5
Mar 13 20:17:28 ultranas kernel: disk 0, o:1, dev:sda5
Mar 13 20:17:28 ultranas kernel: disk 2, o:1, dev:sdc5
Mar 13 20:17:28 ultranas kernel: disk 3, o:1, dev:sdd5
Mar 13 20:17:28 ultranas kernel: disk 4, o:1, dev:sde5
Mar 13 20:17:28 ultranas kernel: disk 5, o:1, dev:sdf5
Mar 13 20:17:28 ultranas kernel: RAID1 conf printout:
Mar 13 20:17:28 ultranas kernel: --- wd:5 rd:6
Mar 13 20:17:28 ultranas kernel: disk 0, wo:0, o:1, dev:sda1
Mar 13 20:17:28 ultranas kernel: disk 1, wo:1, o:0, dev:sdb1
Mar 13 20:17:28 ultranas kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 13 20:17:28 ultranas kernel: disk 3, wo:0, o:1, dev:sdd1
Mar 13 20:17:28 ultranas kernel: disk 4, wo:0, o:1, dev:sde1
Mar 13 20:17:28 ultranas kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 13 20:17:28 ultranas kernel: RAID1 conf printout:
Mar 13 20:17:28 ultranas kernel: --- wd:5 rd:6
Mar 13 20:17:28 ultranas kernel: disk 0, wo:0, o:1, dev:sda1
Mar 13 20:17:28 ultranas kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 13 20:17:28 ultranas kernel: disk 3, wo:0, o:1, dev:sdd1
Mar 13 20:17:28 ultranas kernel: disk 4, wo:0, o:1, dev:sde1
Mar 13 20:17:28 ultranas kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 13 20:17:28 ultranas kernel: raid1: sdf1: redirecting sector 5342960 to another mirror
Mar 13 20:17:32 ultranas RAIDiator: Disk failure detected.\n\nIf the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead. If this disk is a part of a RAID 6 volume, your volume is still protected if this is your first failure. A 2nd disk failure will make your volume unprotected. It is recommended that you replace the failed disk as soon as possible to maintain optimal protection of your volume.\n\n[Sun Mar 13 20:17:29 WET 2011]
Mar 13 20:20:24 ultranas kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
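The `CDB:` lines in the excerpt above can be tied back to the `end_request` sector numbers. In a 10-byte SCSI READ(10)/WRITE(10) command, bytes 2 to 5 hold the big-endian logical block address and bytes 7 to 8 the transfer length in blocks. A minimal Python sketch of that decoding (the helper name `decode_rw10` is made up for this example):

```python
def decode_rw10(cdb_hex):
    """Decode a READ(10)/WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    op = "Read(10)" if cdb[0] == 0x28 else "Write(10)"
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return op, lba, blocks


# CDBs copied from the log excerpt above.
for cdb in (
    "2a 00 00 90 00 50 00 00 02 00",
    "2a 00 00 00 00 48 00 00 02 00",
    "28 00 00 51 8f 30 00 00 28 00",
):
    print(decode_rw10(cdb))
```

The decoded addresses (9437264, 72 and 5345072) match the sector numbers in the `end_request: I/O error` lines that follow each CDB in the log.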
and here is what looks like the relevant part of the log from the failure on 6th March:
Mar 6 16:00:07 nas-EA-A6-42 ntpdate[12452]: step time server 62.84.188.34 offset -0.103568 sec
Mar 6 18:48:21 nas-EA-A6-42 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4.00: failed command: FLUSH CACHE EXT
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 6 18:48:22 nas-EA-A6-42 kernel: res 40/00:00:b8:f7:0e/00:00:00:00:00/40 Emask 0x4 (timeout)
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4.00: status: { DRDY }
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:48:27 nas-EA-A6-42 kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 6 18:48:32 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:48:32 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:48:37 nas-EA-A6-42 kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 6 18:48:42 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:48:42 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:48:47 nas-EA-A6-42 kernel: ata4: link is slow to respond, please be patient (ready=0)
Mar 6 18:49:17 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:49:17 nas-EA-A6-42 kernel: ata4: limiting SATA link speed to 1.5 Gbps
Mar 6 18:49:17 nas-EA-A6-42 kernel: ata4: hard resetting link
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: reset failed, giving up
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4.00: disabled
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: EH complete
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 72
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 72
Mar 6 18:49:22 nas-EA-A6-42 kernel: **************** super written barrier kludge on md0: error==IO 0xfffffffb
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 93 9e 80 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9674368
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Disk failure on sdd5, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Operation continuing on 5 devices.
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 34 c5 68 48 00 00 80 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885352520
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 34 c6 f0 c8 00 00 50 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885453000
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 28 c8 00 00 38 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513160
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 29 10 00 00 10 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513232
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 29 28 00 00 10 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513256
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 91 29 40 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9513280
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 93 88 48 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9668680
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 93 a1 90 00 00 10 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 9675152
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 34 c5 38 48 00 00 08 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885340232
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 34 c5 64 48 00 00 80 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885351496
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 34 c6 f1 18 00 00 30 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 885453080
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10): 2a 00 00 80 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 8388680
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 8388680
Mar 6 18:49:22 nas-EA-A6-42 kernel: **************** super written barrier kludge on md1: error==IO 0xfffffffb
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Read(10): 28 00 00 31 8d 58 00 00 28 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 3247448
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid1: sdd1: rescheduling sector 3245336
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB:
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Mar 6 18:49:22 nas-EA-A6-42 kernel: sd 3:0:0:0: [sdd] CDB: Write(10)Write(10): 2a 00 00 00 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 72
Mar 6 18:49:22 nas-EA-A6-42 kernel: :md: super_written gets error=-5, uptodate=0
Mar 6 18:49:22 nas-EA-A6-42 kernel: 2a
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid1: Disk failure on sdd1, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid1: Operation continuing on 5 devices.
Mar 6 18:49:22 nas-EA-A6-42 kernel: 00 00 80 00 48 00 00 02 00
Mar 6 18:49:22 nas-EA-A6-42 kernel: end_request: I/O error, dev sdd, sector 8388680
Mar 6 18:49:22 nas-EA-A6-42 kernel: md: super_written gets error=-5, uptodate=0
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Disk failure on sdd2, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Operation continuing on 5 devices.
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID1 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- wd:5 rd:6
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, wo:0, o:1, dev:sda1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 3, wo:1, o:0, dev:sdd1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, wo:0, o:1, dev:sde1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID1 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- wd:5 rd:6
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, wo:0, o:1, dev:sda1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, wo:0, o:1, dev:sdc1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, wo:0, o:1, dev:sde1
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, wo:0, o:1, dev:sdf1
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 3, o:0, dev:sdd5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf5
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf5
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 3, o:0, dev:sdd2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf2
Mar 6 18:49:23 nas-EA-A6-42 kernel: raid1: sdb1: redirecting sector 3245336 to another mirror
Mar 6 18:49:23 nas-EA-A6-42 kernel: RAID5 conf printout:
Mar 6 18:49:23 nas-EA-A6-42 kernel: --- rd:6 wd:5
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 0, o:1, dev:sda2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 1, o:1, dev:sdb2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 2, o:1, dev:sdc2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 4, o:1, dev:sde2
Mar 6 18:49:23 nas-EA-A6-42 kernel: disk 5, o:1, dev:sdf2
Mar 6 18:49:53 nas-EA-A6-42 RAIDiator: Disk failure detected.\n\nIf the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead. If this disk is a part of a RAID 6 volume, your volume is still protected if this is your first failure. A 2nd disk failure will make your volume unprotected. It is recommended that you replace the failed disk as soon as possible to maintain optimal protection of your volume.\n\n[Sun Mar 6 18:49:51 WET 2011]
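On the question of whether the log shows what failed: one way to condense an excerpt like the ones above is to pull out, per ATA port, the command that failed, how many link resets failed, and which md members were then dropped. The Python sketch below does this with two regular expressions written against the message formats seen in these excerpts only; the function name and the returned keys are assumptions for illustration.

```python
import re
from collections import Counter

# Patterns written against the kernel messages quoted in this thread.
ATA_RE = re.compile(r"kernel: ata(\d+)(?:\.\d+)?: (.*)")
MD_RE = re.compile(r"kernel: (raid\d+): Disk failure on (\w+), disabling device")

# A few lines copied from the 6th March excerpt.
SAMPLE = """\
Mar 6 18:48:21 nas-EA-A6-42 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 6 18:48:22 nas-EA-A6-42 kernel: ata4.00: failed command: FLUSH CACHE EXT
Mar 6 18:48:32 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:48:42 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:49:17 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: COMRESET failed (errno=-16)
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4: reset failed, giving up
Mar 6 18:49:22 nas-EA-A6-42 kernel: ata4.00: disabled
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Disk failure on sdd5, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid1: Disk failure on sdd1, disabling device.
Mar 6 18:49:22 nas-EA-A6-42 kernel: raid5: Disk failure on sdd2, disabling device.
"""


def summarize(log_text):
    """Condense ATA error-handling and md failure lines into a small dict."""
    failed_commands = {}
    comreset_failures = Counter()
    disabled_ports = set()
    md_failures = []
    for line in log_text.splitlines():
        ata = ATA_RE.search(line)
        if ata:
            port, msg = int(ata.group(1)), ata.group(2).strip()
            if msg.startswith("failed command:"):
                failed_commands.setdefault(port, []).append(msg.split(": ", 1)[1])
            elif msg.startswith("COMRESET failed"):
                comreset_failures[port] += 1
            elif msg == "disabled":
                disabled_ports.add(port)
            continue
        md = MD_RE.search(line)
        if md:
            md_failures.append((md.group(1), md.group(2)))
    return {
        "failed_commands": failed_commands,
        "comreset_failures": dict(comreset_failures),
        "disabled_ports": sorted(disabled_ports),
        "md_failures": md_failures,
    }


print(summarize(SAMPLE))
```

For both excerpts the sequence is the same: a FLUSH CACHE EXT command times out, four link resets fail, the kernel disables the port, and only then do the md arrays drop the disk's partitions.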
144 Replies
- roger_armstrong (Aspirant)
We back up 5 ESX servers (with ghettoVCB) to our Ultra 6+ with 6 of these ST2000DL003s every day. After losing several disks, we factory reset and rebuilt with X-RAID2 with dual redundancy and removed the Time Machine backups to the NAS. That's been running peacefully for a month or so, but our trust in it is very limited...
It's a pity because they are great devices - the NFS performance is great for ESX backup and even for ESX NFS datastores, but it would be really important for Netgear to take a clear position on this to reestablish trust. I, for one, cannot recommend ReadyNAS to our customers as long as Netgear includes what appear to be dysfunctional configurations in its HCL.
- alcester (Aspirant)
Hi All,
I'm currently working with Support to find out why a 2 to 4 disk X-RAID2 expansion on an Ultra 4+ is only including 3 disks. I did have a drive reported as Dead during restriping, but a reboot allowed it to continue. The drive that was reported as Dead is not included in the RAID array but shows as an Available disk. All SMART tests and the unit's own Boot Menu Disk Test show no errors.
All 4 drives are these same 2TB Seagate units, but my firmware is different from what I can see in all the above posts (I have CC3C, not CC32). Having not read this forum until after choosing my drives from the HCL, I was feeling very doomed, but if my firmware version is newer, might there be some light at the end of the tunnel?
- e3henri (Aspirant)
What happens after the failure?
Is the data lost or will it just recover after the drive is detected again? Will the volumes remain?
I have a setup with no redundancy:
A Netgear ReadyNAS Ultra 2 RNDU2120 (one 2TB Seagate was included) + an additional Seagate drive.
This is configured with one raid1 volume per drive.
Yes, I know I will lose half of my data in case one of the drives fails, but the data is not critical.
I need all the 4TB and I couldn't afford an Ultra 4 with raid5.
So I'm prepared to lose data in case of a real disk crash, but is my data also in danger due to this disk being incorrectly detected as dead?
I really hope your answer is that the volumes are just rebuilt without data loss :)
I'm not using a Mac. Since I added the second drive I have copied about 3TB of data back and forth within 24 hours with no problem while streaming to a media player at the same time.
I know that there are also issues with some WD disks that require the idle time to be changed in the drive in order to wake up fast enough, or else the RAID functionality in the ReadyNAS will drop drives.
But the Seagate issue is more about heavy load and not starting from idle (I suppose).
Strange that the drive is actually delivered with a seagate drive if this problem is common.
What about the fix that was somewhat promised for Q1 2012? Was that a fix in the ReadyNAS or in the Seagate firmware?
- jrpNAS (Aspirant)
e3henri wrote: What happens after the failure?
Is the data lost or will it just recover after the drive is detected again? Will the volumes remain?
When the drive dies, all data disappears.
e3henri wrote: Strange that the drive is actually delivered with a seagate drive if this problem is common.
What about the fix that was somewhat promised for Q1 2012? Was that a fix in the Readynas or in the Seagate firmware?
Seagate does not care, because at the moment there are more buyers than sellers.
Seagate's support does not know anything about a firmware update.
- CitizenPlain (Aspirant)
Disregard the post below! It didn't work. I just had a disk fail under these settings.
Summary:
The good: I think I found a fix for this problem.
The bad: It can slow the NAS down significantly.
The (possible) fix: Disable jumbo frames, enable full data journaling.
Does it work: Probably? :D -- Update: No. It doesn't. I just had a disk fail.
Details, history, background:
I wanted to give an update on my latest experiences with this issue. I first posted here on March 27, 2011. After my last incident I posted about, I managed to complete a Time Machine backup and copy all the data I needed to copy to the ReadyNAS Ultra 6 Plus. For 8 months or so, it was simply handling incremental time machine backups from one Macbook Pro, copying the occasional movie or folder of photos, and that was about it. No disk failures. No heavy activity, either. Everything seemed fine.
Two weeks ago, I added a new Mac Pro to the network and attempted to complete a Time Machine backup. Over the course of 5 or 6 days of it simply trying to complete its first backup of about 220 GB, I had three disk failures. The backup simply didn't complete without the RAID reporting a disk as failed. Each time it was a different disk. Each time, the power was cycled, the NAS rebooted, and the disk rebuilt in about 12 hours, and I'd restart Time Machine.
What I changed:
I wondered whether disabling some features that make the NAS go faster would solve the problem. So, under:
System > Performance > UNchecked "Disable full data journaling."
Network > Interfaces > Ethernet 1 tab > (scroll down) > Performance Settings > UNchecked "Enable jumbo frames."
also, for good measure, I disabled the iTunes Streaming Server, which I realized I was never using. (Services > Streaming Services > iTunes Streaming Server).
That's it. That's all I changed. Left "Disk write cache" enabled as before.
Results:
Good stuff: No disk failures. (yet.) My Time Machine backup completed without a disk failure, which it wasn't doing before. (in fact, it wasn't even completing at all before) I've tried copying a big folder over from one machine (~300 GB, mix of large and small files) while initiating a Time Machine backup on the other machine and had no failures, which was previously prime failure time.
Bad stuff: It seems noticeably slower, though I didn't empirically test it. Watching the network IO rate during an AFP file copy of lots of small files, it can be down to 2 or 3 MB/s. Large files (~1GB) will still copy at around 40 to 60 MB/s, though this is still slower than the 80 or 90 MB/s I'd previously get.
Conclusion:
I actually don't care if my ReadyNAS doesn't go crazy fast. Data security / no drive failures is more important to me. I mostly use the NAS as a backup solution for Time Machine, work projects, photos, videos, music, movies, etc. Files get copied there, freeing up space somewhere else, and they sit there for months or years. I don't actually work on projects off the drive (I do video and animation) so the IO speed isn't that big of a deal.
Remaining Questions / Anybody wanna help?:
Because I changed two things, I don't know if it was full data journaling, jumbo frames, or the combination of both that's seemingly fixing the problem. (surely the iTunes server had nothing to do with it.) It would be nice if someone tried just disabling one or the other and seeing if they don't have any more problems. It seems feasible that just doing one or the other thing could solve the problem and the NAS might not take such a performance hit, though I honestly don't have the time or inclination to test it.
If someone wants to try it both ways and can work with the NAS not going at full tilt speed, that would be great. i.e., dear forum reader, go and either turn on data journaling or off jumbo frames and then report back if you have/don't-have a disk failure.
I also didn't do a reliable/repeatable benchmark to check the actual difference in speeds this is causing. Someone might want to take a look at that. Would be nice to know what effect data journaling vs jumbo frames vs both has.
... and, of course, it's completely feasible that I'll have a failure tomorrow and there's absolutely no relation to all the stuff I'm saying. (Update: I just did. It doesn't work.) This just seemed noteworthy enough at the time to mention here.
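For anyone who wants the repeatable benchmark mentioned above, here is a minimal Python sketch that times a sequential write of a fixed amount of data into a directory and reports MB/s. The function name, the test file name and the sizes are assumptions for illustration; point `target_dir` at wherever the NAS share is mounted on the client. It measures only large sequential writes, not the many-small-files case.

```python
import os
import tempfile
import time


def measure_write_throughput(target_dir, total_mb=64, chunk_mb=1):
    """Write total_mb of random data into target_dir and return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    count = total_mb // chunk_mb
    path = os.path.join(target_dir, "throughput_test.bin")  # hypothetical name
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out before timing stops
    elapsed = time.perf_counter() - start
    os.remove(path)
    return (count * chunk_mb) / elapsed


if __name__ == "__main__":
    # A local temporary directory stands in for the mounted share here.
    with tempfile.TemporaryDirectory() as scratch:
        print(f"{measure_write_throughput(scratch, total_mb=16):.1f} MB/s")
```

Running it several times under each combination of the two settings (journaling on or off, jumbo frames on or off) and comparing the averages would separate their effects on speed, though not on the disk failures.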
Looking forward to getting this all sorted out without having to buy an entire batch of new drives. Thanks for reading.
- jrpNAS (Aspirant)
I removed that full data journaling at the end of December. The next day one of the disks died, but after that everything is working without any problems.
- CitizenPlain (Aspirant)
Thanks, jrpNAS.
Just to confirm, by "removed that full data journaling," do you mean that you unchecked "disable full data journaling," thus enabling it? Also, what level of load has your NAS been getting since your last failure? Light, occasional use or more frequent heavy use?
- skywalker1215 (Aspirant)
CitizenPlain, thanks for looking into this so much!
At Christmas I purchased an NV+ V2 and, like everyone else here, I've had randomly occurring disk failures. I'm running three 3TB drives from 3 different manufacturers (Hitachi, WD, Seagate) and all 3 have randomly failed at various times, and all 3 were brand new. A simple restart of the device and a resync and it will work again for a day or two.
We've been doing large file transfers and a Time Machine backup from a Mac Pro, so I'm hoping you are right! I will try unchecking "Disable full data journaling" and disabling jumbo frames. I will post my results when I can.
- jrpNAS (Aspirant)
Yes, unchecked "disable full data journaling."
I have always used the NAS the same way: Time Machine, EyeTV, storage, not a very heavy load.
Before Lion I didn't have big problems with the NAS, only a few disk "deaths." After Lion that started to happen a couple of times a week.
Sorry for my bad English.
- jrpNAS (Aspirant)
Yesterday I unchecked "Enable jumbo frames" and today one disk died.
Feels like I have four Seagates too many.