
Jophus
Luminary
Jul 03, 2018

RN316 disk randomly failed without warning

I just got an email notification that the relatively new replacement disk in my 6-disk array has failed, without any prior warning or errors. Is there a way to see why it failed? The email notification arrived at 2:01pm, and all I can see in the logs is:

 

STATUS

[18/07/03 14:01:42 AEST] warning:volume:LOGMSG_HEALTH_VOLUME Volume Raid-6 health changed from Redundant to Degraded.
[18/07/03 14:01:49 AEST] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from ONLINE to FAILED.

 

DISKINFO

Device: sdf
Controller: 0
Channel: 2
Model:
Serial:
Firmware:
Class: SATA
Sectors: 5860533168
Pool: Raid-6
PoolType: RAID 6
PoolState: 3
PoolHostId: 540eddc2
Health data
ATA Error Count: 0

 

VOLUME

Disk sdf:
HostID: 2fe73ef2
Flags: 0x0
Size: 5860533168 (2794 GB)
Free: 14
Controller 0
Channel: 2
Model:
Serial:
Firmware:
Class: SATA (2)
SMART data
Latest Self Test: Passed

 

DMESG (note: the errors only appear about 7 minutes after the disk was taken offline)

[Sun Jul 1 06:13:34 2018] usb 4-2: DVB: adapter 0 frontend 0 frequency 0 out of range (45000000..860000000)
[Tue Jul 3 14:08:18 2018] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Tue Jul 3 14:08:18 2018] ata3.00: failed command: FLUSH CACHE EXT
[Tue Jul 3 14:08:18 2018] ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 5
res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[Tue Jul 3 14:08:18 2018] ata3.00: status: { DRDY }
[Tue Jul 3 14:08:18 2018] ata3: hard resetting link
[Tue Jul 3 14:08:24 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:24 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:28 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:28 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:08:28 2018] ata3: hard resetting link
[Tue Jul 3 14:08:34 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:34 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:38 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:38 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:08:38 2018] ata3: hard resetting link
[Tue Jul 3 14:08:44 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:08:44 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:09:13 2018] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Tue Jul 3 14:09:13 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:09:13 2018] ata3: limiting SATA link speed to 1.5 Gbps
[Tue Jul 3 14:09:13 2018] ata3: hard resetting link
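For anyone wanting to summarise a dmesg excerpt like the one above, here is a minimal Python sketch. It is not a NETGEAR tool; the function name `summarize_ata_events` and the event labels are made up for illustration, and it assumes English day and month names in the timestamps. It counts the libata messages per port and measures the gap between the FAILED entry in the status log (14:01:49) and the first ata3 error, assuming both logs use the same clock.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as "[Tue Jul 3 14:08:18 2018] ata3.00: failed command: FLUSH CACHE EXT"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<port>ata\d+)(?:\.\d+)?:\s+(?P<msg>.*)$")

# A few lines copied from the dmesg output in this post.
SAMPLE = """\
[Tue Jul 3 14:08:18 2018] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[Tue Jul 3 14:08:18 2018] ata3.00: failed command: FLUSH CACHE EXT
[Tue Jul 3 14:08:18 2018] ata3: hard resetting link
[Tue Jul 3 14:08:28 2018] ata3: softreset failed (1st FIS failed)
[Tue Jul 3 14:08:28 2018] ata3: hard resetting link
[Tue Jul 3 14:09:13 2018] ata3: limiting SATA link speed to 1.5 Gbps
"""

def summarize_ata_events(dmesg_text):
    """Return (counts, first_seen): per-port event counts and first timestamp per port."""
    counts = {}
    first_seen = {}
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an ataN line (e.g. usb or do_marvell_9170_recover messages)
        port, msg = m.group("port"), m.group("msg")
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        first_seen.setdefault(port, ts)
        # Labels below are illustrative groupings, not official kernel categories.
        if msg.startswith("failed command"):
            kind = "failed command"
        elif "softreset failed" in msg:
            kind = "softreset failed"
        elif "hard resetting link" in msg:
            kind = "hard reset"
        elif "limiting SATA link speed" in msg:
            kind = "link speed limited"
        else:
            kind = "other"
        counts.setdefault(port, Counter())[kind] += 1
    return counts, first_seen

counts, first_seen = summarize_ata_events(SAMPLE)

# Time of the "ONLINE to FAILED" entry from the STATUS log above.
failed_at = datetime(2018, 7, 3, 14, 1, 49)
gap_seconds = (first_seen["ata3"] - failed_at).total_seconds()
print(dict(counts["ata3"]), gap_seconds)
```

Repeated "softreset failed" and "hard resetting link" entries on a single port show that the drive on that port stopped responding, but they do not by themselves say whether the disk, the cable/backplane or the controller is at fault.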

8 Replies

  • mdgm-ntgr
    NETGEAR Employee Retired

    Do you see any errors for the disk in smart_history.log ?

    • Jophus
      Luminary

      2018-06-04 20:23:14 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 0 0 0 0
      2018-06-05 14:32:54 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 1 0 0 0

       

      Nope... this is the drive and these are the only entries: a single CMD_TIMEOUT on 05/06/2018 at 2:32PM, about one month ago.

      • mdgm-ntgr
        NETGEAR Employee Retired

        Have you checked the other logs, e.g. kernel.log or systemd-journal.log?

         

        Have you checked the disk using SeaTools?

  • mdgm-ntgr
    NETGEAR Employee Retired

    Sometimes disks do fail without showing any non-zero values for the key SMART counts.

     

    The possibility of disk failure is one of the reasons why backups are important. Yes, redundant RAID levels can provide some protection but they're not replacements for backups.
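
The smart_history.log entries quoted earlier in the thread can be checked for non-zero counters with a small Python sketch like the one below. It assumes each line is whitespace-separated as date, time, model, serial and then integer counters, and that the model name contains no spaces. The thread only identifies the fifth counter (CMD_TIMEOUT), so the sketch reports counters by position rather than guessing the other column names. The function name `nonzero_counters` is hypothetical.

```python
# The two entries posted in the thread.
SAMPLE = """\
2018-06-04 20:23:14 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 0 0 0 0
2018-06-05 14:32:54 ST3000DM007-1WY10G ZFQ02SC4 0 0 0 0 1 0 0 0
"""

def nonzero_counters(log_text):
    """Return a list of (timestamp, serial, {position: value}) for rows with non-zero counters."""
    results = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or too short to hold date, time, model, serial and a counter
        date, time, model, serial, *raw = fields
        try:
            counters = [int(value) for value in raw]
        except ValueError:
            continue  # e.g. a header row; skipped rather than interpreted
        # Positions are 1-based so that position 5 is the counter the poster calls CMD_TIMEOUT.
        nonzero = {i + 1: v for i, v in enumerate(counters) if v != 0}
        if nonzero:
            results.append((f"{date} {time}", serial, nonzero))
    return results

hits = nonzero_counters(SAMPLE)
print(hits)
```

An empty result would mean every recorded counter was zero, which, as noted above, does not rule out a failing disk.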
