
Losing volume on error

PeterZaitsev
Aspirant

Losing volume on error

I reset the system to factory defaults yesterday and it was syncing overnight. To my surprise I found it hung in the morning, and after a reboot it is unable to find any volumes:

Aug 26 22:11:24 ReadyNAS1 kernel: ata4: hard resetting link
Aug 26 22:11:24 ReadyNAS1 kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 26 22:11:24 ReadyNAS1 kernel: ata4.00: configured for UDMA/133
Aug 26 22:11:24 ReadyNAS1 kernel: ata4.00: device reported invalid CHS sector 0
Aug 26 22:11:24 ReadyNAS1 last message repeated 10 times
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 50 38 79 48 00 00 04 00 00 00
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408392 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408400 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408408 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408416 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408424 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408432 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408440 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408448 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408456 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: md/raid:md2: read error not correctable (sector 5631408464 on sdd3).
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 50 38 7d 48 00 00 04 00 00 00
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] Unhandled error code
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Aug 26 22:11:24 ReadyNAS1 kernel: sd 3:0:0:0: [sdd] CDB: Read(16): 88 00 00 00 00 01 50 38 81 48 00 00 04 00 00 00
Aug 26 22:11:24 ReadyNAS1 kernel: ata4: EH complete
Aug 26 22:11:24 ReadyNAS1 kernel: md: md2: recovery done.
Aug 26 22:11:52 ReadyNAS1 kernel: quiet_error: 374 callbacks suppressed
Aug 26 22:11:52 ReadyNAS1 kernel: lost page write due to I/O error on dm-0
Aug 26 22:11:52 ReadyNAS1 RAIDiator: RAID sync finished on volume C. The array is still in degraded mode, however. This can be caused by a disk sync failure or failed disks in a multi-parity disk array.

On reboot:

Aug 27 10:42:24 ReadyNAS1 kernel: bio: create slab <bio-1> at 1
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid1:md0: not clean -- starting background reconstruction
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid1:md0: active with 6 out of 6 mirrors
Aug 27 10:42:24 ReadyNAS1 kernel: md0: detected capacity change from 0 to 4293906432
Aug 27 10:42:24 ReadyNAS1 kernel: md: resync of RAID array md0
Aug 27 10:42:24 ReadyNAS1 kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Aug 27 10:42:24 ReadyNAS1 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
Aug 27 10:42:24 ReadyNAS1 kernel: md: using 128k window, over a total of 4193268 blocks.
Aug 27 10:42:24 ReadyNAS1 kernel: md0: unknown partition table
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdb2>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdc2>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdd2>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sde2>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdf2>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sda2>
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: device sda2 operational as raid disk 0
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: device sdf2 operational as raid disk 5
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: device sde2 operational as raid disk 4
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: device sdd2 operational as raid disk 3
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: device sdc2 operational as raid disk 2
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: device sdb2 operational as raid disk 1
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: allocated 6372kB
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md1: raid level 6 active with 6 out of 6 devices, algorithm 2
Aug 27 10:42:24 ReadyNAS1 kernel: md1: detected capacity change from 0 to 2147221504
Aug 27 10:42:24 ReadyNAS1 kernel: md1: unknown partition table
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdb3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdc3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdd3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sde3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdf3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sda3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: kicking non-fresh sdd3 from array!
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdd3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdd3)
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sda3 operational as raid disk 0
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sde3 operational as raid disk 4
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sdc3 operational as raid disk 2
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sdb3 operational as raid disk 1
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: allocated 6372kB
Aug 27 10:42:24 ReadyNAS1 kernel: md: md2 stopped.
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sda3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sda3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdf3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdf3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sde3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sde3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdc3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdc3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdb3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdb3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdb3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdc3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdd3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sde3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sdf3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: bind<sda3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: kicking non-fresh sdd3 from array!
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdd3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdd3)
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sda3 operational as raid disk 0
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sde3 operational as raid disk 4
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sdc3 operational as raid disk 2
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: device sdb3 operational as raid disk 1
Aug 27 10:42:24 ReadyNAS1 kernel: md/raid:md2: allocated 6372kB
Aug 27 10:42:24 ReadyNAS1 kernel: md: md2 stopped.
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sda3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sda3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdf3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdf3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sde3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sde3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdc3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdc3)
Aug 27 10:42:24 ReadyNAS1 kernel: md: unbind<sdb3>
Aug 27 10:42:24 ReadyNAS1 kernel: md: export_rdev(sdb3)

So as you can see, the md2 array is gone for some reason, and as a result the volume is not being detected:

ReadyNAS1:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid6 sda2[0] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
2096896 blocks super 1.2 level 6, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md0 : active raid1 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1]
4193268 blocks super 1.2 [6/6] [UUUUUU]


Now, I can believe a disk failure could be causing this, but isn't this exactly what RAID was supposed to protect me from?
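Before doing anything to the array, I also want to check whether sdd itself is actually failing, since the read errors in the log all point at it. A rough check (device name taken from the log above; these need root):

```shell
# Overall SMART health verdict for the suspect disk
smartctl -H /dev/sdd

# Look at the attributes that usually betray a dying disk:
# reallocated sectors, pending sectors, offline uncorrectable
smartctl -A /dev/sdd | grep -Ei 'reallocated|pending|uncorrectable'

# Check what md itself recorded on that member's superblock
mdadm --examine /dev/sdd3
```

If the pending/uncorrectable counts are climbing, the disk should be replaced rather than forced back into the array.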
PeterZaitsev
Aspirant

Re: Losing volume on error

OK, I see what is going on here. A second disk was detected as failed while the resync had not yet completed:

ReadyNAS1:~# mdadm -Q --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Fri Aug 26 13:51:11 2011
Raid Level : raid5
Used Dev Size : -1
Raid Devices : 6
Total Devices : 5
Persistence : Superblock is persistent

Update Time : Fri Aug 26 22:11:26 2011
State : active, FAILED, Not Started
Active Devices : 4
Working Devices : 5
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 64K

Name : 001F33EABA01:2
UUID : 01a26106:50b297a8:1d542f0a:5c9b74c6
Events : 83

Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 8 19 1 active sync /dev/sdb3
2 8 35 2 active sync /dev/sdc3
3 0 0 3 removed
4 8 67 4 active sync /dev/sde3
5 8 83 5 spare rebuilding /dev/sdf3

What I need now, I guess, is to change that "removed" device back to "active sync". Any idea how to do that?
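What I'm thinking of trying, based on the mdadm man page (untested on this box, so treat it as a sketch, not a recipe), is a forced reassembly so mdadm accepts the non-fresh sdd3 despite its older event count:

```shell
# Stop the partially-assembled, failed array first
mdadm --stop /dev/md2

# Force-assemble with all six members; --force tells mdadm to
# accept the out-of-date sdd3 superblock instead of kicking it
mdadm --assemble --force /dev/md2 \
    /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3

# Confirm the array came up and watch the rebuild progress
cat /proc/mdstat
```

The obvious caveat: any writes that happened after sdd3 fell behind are lost on that member, so a resync/check afterwards is mandatory.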
PeterZaitsev
Aspirant

Re: Losing volume on error

OK, I got it repaired now. Here is the post describing the fix: http://www.mysqlperformanceblog.com/201 ... id5-array/
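For the record, the rough shape of the verification I ran afterwards (from memory; see the linked post for the actual recovery steps):

```shell
# Confirm all six members are back and the state is clean
mdadm -Q --detail /dev/md2

# Kick off a consistency check over the whole array; mismatches
# show up in /sys/block/md2/md/mismatch_cnt when it finishes
echo check > /sys/block/md2/md/sync_action

# Watch the check/resync progress
cat /proc/mdstat
```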