Jophus (Luminary)
Jul 03, 2018
RN316 disk randomly failed without warning
Just got an email notification that the relatively new replacement disk in my 6-disk array has failed, without any warning or errors. Is there a way to see why it failed? The email notification came at 2:01 pm, and all I can see in the logs is:
STATUS
[18/07/03 14:01:42 AEST] warning:volume:LOGMSG_HEALTH_VOLUME Volume Raid-6 health changed from Redundant to Degraded.
[18/07/03 14:01:49 AEST] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from ONLINE to FAILED.
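To line these two events up against other logs, each STATUS line can be split into fields. Below is a minimal Python sketch. It assumes the `[YY/MM/DD HH:MM:SS TZ] level:component:CODE message` layout inferred from just these two lines, which is not a documented ReadyNAS format. `parse_status_line` and `StatusEvent` are hypothetical names:

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout inferred from the two STATUS lines above (an assumption, not a documented format):
# [YY/MM/DD HH:MM:SS TZ] level:component:EVENT_CODE message
STATUS_RE = re.compile(
    r"^\[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (?P<tz>\w+)\] "
    r"(?P<level>\w+):(?P<component>\w+):(?P<code>\w+) (?P<message>.*)$"
)


class StatusEvent(NamedTuple):
    when: datetime  # naive datetime; the zone name is kept separately in `tz`
    tz: str
    level: str
    component: str
    code: str
    message: str


def parse_status_line(line: str) -> Optional[StatusEvent]:
    """Split one ReadyNAS-style status line into its fields, or return None."""
    m = STATUS_RE.match(line.strip())
    if m is None:
        return None
    # "18/07/03" is read as YY/MM/DD, i.e. 3 July 2018.
    when = datetime.strptime(m["ts"], "%y/%m/%d %H:%M:%S")
    return StatusEvent(when, m["tz"], m["level"], m["component"], m["code"], m["message"])


degraded = parse_status_line(
    "[18/07/03 14:01:42 AEST] warning:volume:LOGMSG_HEALTH_VOLUME "
    "Volume Raid-6 health changed from Redundant to Degraded."
)
failed = parse_status_line(
    "[18/07/03 14:01:49 AEST] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED "
    "Disk in channel 3 (Internal) changed state from ONLINE to FAILED."
)
print((failed.when - degraded.when).total_seconds())  # 7.0
```

Parsed this way, the log shows the volume reported as degraded 7 seconds before the disk itself was marked FAILED.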
DISKINFO
Device: sdf
Controller: 0
Channel: 2
Model:
Serial:
Firmware:
Class: SATA
Sectors: 5860533168
Pool: Raid-6
PoolType: RAID 6
PoolState: 3
PoolHostId: 540eddc2
Health data
ATA Error Count: 0
VOLUME
Disk sdf:
HostID: 2fe73ef2
Flags: 0x0
Size: 5860533168 (2794 GB)
Free: 14
Controller 0
Channel: 2
Model:
Serial:
Firmware:
Class: SATA (2)
SMART data
Latest Self Test: Passed
DMESG (note: the errors only start about 6.5 minutes after the disk was taken offline)
[Sun Jul 1 06:13:34 2018] usb 4-2: DVB: adapter 0 frontend 0 frequency 0 out of range (45000000..860000000)
[Tue Jul 3 14:08:18 2018] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Tue Jul 3 14:08:18 2018] ata3.00: failed command: FLUSH CACHE EXT
[Tue Jul 3 14:08:18 2018] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 5
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[Tue Jul 3 14:08:18 2018] ata3.00: status: { DRDY }
[Tue Jul 3 14:08:18 2018] ata3: hard resetting link
[Tue Jul 3 14:08:24 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:24 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:28 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:28 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:08:28 2018] ata3: hard resetting link
[Tue Jul 3 14:08:34 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:34 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:38 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:38 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:08:38 2018] ata3: hard resetting link
[Tue Jul 3 14:08:44 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:44 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:09:13 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:09:13 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:09:13 2018] ata3: limiting SATA link speed to 1.5 Gbps
[Tue Jul 3 14:09:13 2018] ata3: hard resetting link
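The dmesg lines can be summarised per port to show how long after the FAILED event the kernel started reporting errors, and what the libata error handler did. Below is a rough Python sketch. It assumes `dmesg -T`-style timestamps as pasted here, and it assumes `ata3` is the port of the failed disk, since the STATUS log says channel 3. `summarize_ata` and `AtaSummary` are hypothetical names. The `dmesg -T` times are derived from the kernel clock and can drift from wall-clock time, so treat the computed gap as approximate:

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# "dmesg -T" style line, e.g. "[Tue Jul 3 14:08:18 2018] ata3: hard resetting link"
DMESG_RE = re.compile(
    r"^\[\w{3} (?P<mon>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<year>\d{4})\] (?P<msg>.*)$"
)


@dataclass
class AtaSummary:
    first_event: Optional[datetime] = None
    failed_commands: List[str] = field(default_factory=list)
    hard_resets: int = 0
    softreset_failures: int = 0
    speed_limit: Optional[str] = None


def summarize_ata(lines: List[str], port: str = "ata3") -> AtaSummary:
    """Collect libata error-handling messages for one port (e.g. 'ata3')."""
    port_re = re.compile(rf"^{re.escape(port)}(?:\.\d+)?: (?P<rest>.*)$")
    s = AtaSummary()
    for line in lines:
        m = DMESG_RE.match(line.strip())
        if m is None:
            continue  # e.g. the untimestamped "res 40/00:..." continuation line
        p = port_re.match(m["msg"])
        if p is None:
            continue  # other devices (USB tuner, do_marvell_9170_recover, ...)
        when = datetime.strptime(
            f"{m['mon']} {m['day']} {m['time']} {m['year']}", "%b %d %H:%M:%S %Y"
        )
        if s.first_event is None:
            s.first_event = when
        rest = p["rest"]
        if rest.startswith("failed command:"):
            s.failed_commands.append(rest.split(":", 1)[1].strip())
        elif rest == "hard resetting link":
            s.hard_resets += 1
        elif rest.startswith("softreset failed"):
            s.softreset_failures += 1
        elif rest.startswith("limiting SATA link speed to "):
            s.speed_limit = rest[len("limiting SATA link speed to "):]
    return s


pasted = """\
[Sun Jul 1 06:13:34 2018] usb 4-2: DVB: adapter 0 frontend 0 frequency 0 out of range (45000000..860000000)
[Tue Jul 3 14:08:18 2018] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Tue Jul 3 14:08:18 2018] ata3.00: failed command: FLUSH CACHE EXT
  res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[Tue Jul 3 14:08:18 2018] ata3: hard resetting link
[Tue Jul 3 14:08:24 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:28 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:08:28 2018] ata3: hard resetting link
[Tue Jul 3 14:09:13 2018] ata3: limiting SATA link speed to 1.5 Gbps
"""
summary = summarize_ata(pasted.splitlines())
disk_failed = datetime(2018, 7, 3, 14, 1, 49)  # FAILED time from the STATUS log
print(summary.hard_resets, summary.softreset_failures, summary.speed_limit)  # 2 1 1.5 Gbps
print(summary.first_event - disk_failed)  # 0:06:29
```

The demo uses a subset of the excerpt. Run over the full dmesg excerpt, the same function should count four hard resets and three soft-reset failures on ata3, ending with the link being limited to 1.5 Gbps.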
- mdgm-ntgr (NETGEAR Employee, Retired)
Do you see any errors for the disk in smart_history.log ?
- Jophus (Luminary)
2018-06-04 20:23:14 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 0 0 0 0
2018-06-05 14:32:54 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 1 0 0 0
Nope. These are the entries for this drive: a single CMD_TIMEOUT on 05/06/2018 at 2:32 PM, one month ago.
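To spot changes like this across many drives without reading the log by eye, the rows can be compared per serial number. Below is a minimal Python sketch. It assumes each smart_history.log row is `date time model serial` followed by whitespace-separated integer counters, as in the two rows above. The log does not spell out what each counter column means. The poster identifies the non-zero one as CMD_TIMEOUT, so the sketch reports columns only by zero-based index. `parse_row` and `counter_increases` are hypothetical helpers:

```python
from datetime import datetime
from typing import Dict, List, NamedTuple, Tuple


class SmartRow(NamedTuple):
    when: datetime
    model: str
    serial: str
    counters: Tuple[int, ...]


def parse_row(line: str) -> SmartRow:
    """Assumed row layout: date, time, model, serial, then integer counters."""
    date, time, model, serial, *counts = line.split()
    when = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return SmartRow(when, model, serial, tuple(int(c) for c in counts))


def counter_increases(lines: List[str]) -> List[Tuple[str, datetime, int, int]]:
    """Return (serial, when, column index, increase) for every counter that went up.

    A drive's first row is compared against all zeros, so counters that are
    already non-zero when the drive first appears are reported as well.
    """
    last: Dict[str, Tuple[int, ...]] = {}
    changes: List[Tuple[str, datetime, int, int]] = []
    for line in lines:
        if not line.strip():
            continue
        row = parse_row(line)
        prev = last.get(row.serial, (0,) * len(row.counters))
        for i, (old, new) in enumerate(zip(prev, row.counters)):
            if new > old:
                changes.append((row.serial, row.when, i, new - old))
        last[row.serial] = row.counters
    return changes


history = [
    "2018-06-04 20:23:14 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 0 0 0 0",
    "2018-06-05 14:32:54 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 1 0 0 0",
]
print(counter_increases(history))
# [('ZFQ02SC4', datetime.datetime(2018, 6, 5, 14, 32, 54), 4, 1)]
```

Column 4 (the fifth counter) is the one read above as CMD_TIMEOUT. The mapping of the other columns is not documented in this thread.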
- mdgm-ntgr (NETGEAR Employee, Retired)
Have you checked e.g. kernel.log or systemd-journal.log?
Have you checked the disk using SeaTools?
- mdgm-ntgr (NETGEAR Employee, Retired)
Sometimes disks do fail even when the key SMART counts are all zero.
The possibility of disk failure is one of the reasons why backups are important. Yes, redundant RAID levels can provide some protection, but they're not a replacement for backups.
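If SSH access to the NAS is enabled, the drive's current SMART attribute table can also be read directly with smartmontools' `smartctl -A`. Below is a minimal Python sketch. It assumes `smartctl` is installed and run as root. The attribute IDs (5, 187, 188, 197, 198) are a commonly watched set and may not match the counters ReadyNAS records in smart_history.log. The helper names are hypothetical, and `SAMPLE` is a made-up table that only shows the output layout, not data from the poster's drive. All-zero values still do not prove a disk is healthy:

```python
import subprocess
from typing import Dict

# Commonly watched attribute IDs; this selection is an assumption, not ReadyNAS's list.
KEY_ATTRIBUTES = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    188: "Command_Timeout",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def parse_attributes(text: str) -> Dict[int, str]:
    """Map attribute ID -> RAW_VALUE string from `smartctl -A` table output."""
    raw: Dict[int, str] = {}
    for line in text.splitlines():
        cols = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if len(cols) == 10 and cols[0].isdigit():
            raw[int(cols[0])] = cols[9].strip()
    return raw


def key_counts(text: str) -> Dict[str, int]:
    """Leading integer of each key attribute's raw value; missing IDs are skipped."""
    raw = parse_attributes(text)
    return {
        name: int(raw[attr_id].split()[0])
        for attr_id, name in KEY_ATTRIBUTES.items()
        if attr_id in raw
    }


def read_smart(device: str) -> str:
    # smartctl's exit status is a bit mask, so a non-zero code is not treated as fatal.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout


# Made-up excerpt showing the table layout only (not from the poster's drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
print(key_counts(SAMPLE))
```

On the NAS itself, something like `key_counts(read_smart("/dev/sdf"))` would apply the same check to the device named in the DISKINFO output, assuming that device name is still current.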