
Forum Discussion

BBSUP
Aspirant
Oct 14, 2020

RN212 volume degraded after reboot after replacing disk

Set-up: Raid1 (X-RAID), 2 x WD Nas Red 2TB disks

 

On 20 August the volume status changed to degraded and only one disk remained visible.

Due to circumstances, the disk wasn't replaced until 12 October. After booting with the new disk, the synchronisation process started and finished about 8 hours later.

 

Today the NAS installed the latest firmware (6.10.3), but after the reboot the new disk isn't visible anymore! I performed another reboot; same issue. Chances are slim that this new disk is dead already.

 

What log can be checked to see if the issue is with the enclosure instead of the disk? Anything that can be done remotely about this? Or should a replacement enclosure be requested? Device still under warranty.

 

 

9 Replies

  • I just performed a quick test with Data Lifeguard Diagnostics on the "faulty" disk, and it passed.

    The extended test is running now, but I wouldn't be surprised if that passes as well, which would make it very likely that there's an issue with the enclosure itself.

  • StephenB
    Guru - Experienced User

    BBSUP wrote:

    Chances are slim that this new disk is dead already.

     


    Disks can fail at any time.  Historically, disk reliability has followed a "bathtub curve", with new disks and very old disks being the most likely to fail.  Generally I test my disks in a Windows PC with vendor tools before installing them - running the full non-destructive test, followed by the full write test.  I have sometimes had new disks fail those tests.  If the disk passes the extended test, you might also try the write test (since the disk needs to be resynced anyway if it is good).

     

    What disk model did you purchase?  The WD20EFAX is SMR, and I don't recommend it for ReadyNAS.  If you got that, I suggest exchanging it for a WD20EFRX.
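
    If you're not sure which model you received and don't want to pull the drive, smartctl over SSH will show it - a sketch only, and /dev/sdX needs to be whichever device the new disk comes up as:

    # identity block includes the exact model (EFRX vs EFAX) and firmware
    smartctl -i /dev/sdX
    # optional: kick off the drive's own extended self-test (non-destructive, runs in the background)
    smartctl -t long /dev/sdX
    # review the self-test log and error counters afterwards
    smartctl -a /dev/sdX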

     


    BBSUP wrote:

     

    What log can be checked to see if the issue is with the enclosure instead of the disk? Anything that can be done remotely about this? Or should a replacement enclosure be requested? Device still under warranty.

     


    You can download the log zip file.  Disk-info.log might give you some info, though generally "dead" disks aren't listed there.  You can also look for i/o errors in system.log and kernel.log.

     

    It's possible you'll see some errors that point to the disk itself - it is unlikely that you'll see anything that points to the enclosure.  A failing SATA interface in the chassis can't be distinguished from a failing SATA interface in the disk itself.
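
    Over SSH, something along these lines should surface them - the log path is the usual OS 6 location, so treat this as a sketch:

    # recent kernel messages mentioning the second SATA port or I/O errors
    dmesg | grep -iE "ata2|i/o error|SError"
    # the same search against the kernel.log that ships in the log zip (path assumed)
    grep -iE "ata2|i/o error|SError" /var/log/kernel.log | tail -n 50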

     

    What you could do is power down the NAS and remove the new disk (best to label it).  Then move the older disk into the other bay, and see if the system boots.  If it does, it's not the enclosure.

    • BBSUP
      Aspirant

      StephenB wrote:

      What disk model did you purchase?  The WD20EFAX is SMR, and I don't recommend it for ReadyNAS.  If you got that, I suggest exchanging it for a WD20EFRX.


      The original disk was a WD20EFRX; the replacement is indeed a WD20EFAX. Our vendor doesn't offer the WD20EFRX anymore, so to get CMR we'd need the WD2002FFSX (WD Red Pro) instead. But performance isn't really an issue on that particular device - it just stores an Acronis backup which is then uploaded to the cloud.

       

       

      StephenB wrote:

      What you could do is power down the NAS and remove the new disk (best to label it).  Then move the older disk into the other bay, and see if the system boots.  If it does, it's not the enclosure.


      In the worst case, yes. The NAS isn't located here, and due to Covid-19 onsite visits are limited.

      • StephenB
        Guru - Experienced User

        BBSUP wrote:

        What disk model did you purchase?  The WD20EFAX is SMR, and I don't recommend it for ReadyNAS.  If you got that, I suggest exchanging it for a WD20EFRX.


        The original disk was a WD20EFRX; the replacement is indeed a WD20EFAX. Our vendor doesn't offer the WD20EFRX anymore, so to get CMR we'd need the WD2002FFSX (WD Red Pro) instead. But performance isn't really an issue on that particular device - it just stores an Acronis backup which is then uploaded to the cloud.

         


        Personally, I won't use SMR in a RAID array, and don't know what WD was thinking when they silently switched to SMR drives a while back.  Write speeds are highly variable, and can slow down to a crawl.  This potentially can lead to timeouts (which could cause a drive to drop out of the array).

         

        WD changed their lineup (in response to customer backlash), and put only SMR disks into the WD Red line.  CMR disks are now in the WD Red Plus lineup (and of course the WD Red Pro).  The WD20EFRX is still current, and is in the Red Plus line.  

         

        You might want to push back on your supplier - WD no longer positions the WD20EFAX as a replacement for the EFRX.  I get that performance isn't a primary concern, but several folks here have had issues with the 2-6 TB EFAX drives.

         

        An alternative to the Red Pro is the 2 TB Seagate IronWolf - you can mix that with the existing Red.

         

         

    • BBSUP
      Aspirant

      StephenB wrote:

      You can download the log zip file.  Disk-info.log might give you some info, though generally "dead" disks aren't listed there.  You can also look for i/o errors in system.log and kernel.log.

      Only the active disk is visible in disk_info.log - the same is true via SSH, for that matter. No related errors found in system.log, afaics.
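
      For reference, the SSH check was along these lines (commands approximate, device/array names assumed):

      cat /proc/mdstat                # each md device lists only one member
      lsblk                           # the second disk doesn't show up at all
      mdadm --detail /dev/md127       # the data volume reports a degraded state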

      Kernel.log shows a lot of errors though.

       

      Aug 20, the day the first disk "failed"

      Aug 20 02:42:00 ATTS-NAS-01 kernel: ata2: SATA link down (SStatus 1 SControl 310)
      Aug 20 02:42:00 ATTS-NAS-01 kernel: ata2: EH complete
      Aug 20 02:42:00 ATTS-NAS-01 kernel: ata2: exception Emask 0x10 SAct 0x0 SErr 0x4000000 action 0xe frozen
      Aug 20 02:42:00 ATTS-NAS-01 kernel: ata2: irq_stat 0x00000040, connection status changed
      Aug 20 02:42:00 ATTS-NAS-01 kernel: ata2: SError: { DevExch }
      Aug 20 02:42:00 ATTS-NAS-01 kernel: ata2: limiting SATA link speed to 1.5 Gbps
      Aug 20 02:42:00 ATTS-NAS-01 kernel: ata2: hard resetting link

      .....

      Aug 20 02:53:05 ATTS-NAS-01 kernel: ata2: EH complete
      Aug 20 07:52:13 ATTS-NAS-01 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
      Aug 21 21:51:35 ATTS-NAS-01 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x10200 action 0xe frozen
      Aug 21 21:51:40 ATTS-NAS-01 kernel: ata1: irq_stat 0x00400000, PHY RDY changed
      Aug 21 21:51:40 ATTS-NAS-01 kernel: ata1: SError: { Persist PHYRdyChg }
      Aug 21 21:51:40 ATTS-NAS-01 kernel: ata1: hard resetting link
      Aug 21 21:51:40 ATTS-NAS-01 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
      Aug 21 21:51:40 ATTS-NAS-01 kernel: ata1.00: configured for UDMA/133
      Aug 21 21:51:40 ATTS-NAS-01 kernel: ata1: EH complete
      Aug 24 01:11:34 ATTS-NAS-01 kernel: al_eth_0 mdio failed to take ownership. MDIO info reg: 0x00000007

       

      Today before the reboot following the firmware update

      Oct 12 15:33:44 ATTS-NAS-01 kernel: md1: detected capacity change from 0 to 536281088
      Oct 12 15:33:44 ATTS-NAS-01 kernel: md: resync of RAID array md1
      Oct 12 15:33:44 ATTS-NAS-01 kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
      Oct 12 15:33:44 ATTS-NAS-01 kernel: md: using maximum available idle IO bandwidth (but not more than 1000 KB/sec) for resync.
      Oct 12 15:33:44 ATTS-NAS-01 kernel: md: using 128k window, over a total of 523712k.
      Oct 12 15:33:44 ATTS-NAS-01 kernel: Adding 523708k swap on /dev/md1. Priority:-1 extents:1 across:523708k
      Oct 12 15:34:09 ATTS-NAS-01 kernel: md: md1: resync done.
      Oct 12 15:34:09 ATTS-NAS-01 kernel: RAID1 conf printout:
      Oct 12 15:34:09 ATTS-NAS-01 kernel: --- wd:2 rd:2
      Oct 12 15:34:09 ATTS-NAS-01 kernel: disk 0, wo:0, o:1, dev:sda2
      Oct 12 15:34:09 ATTS-NAS-01 kernel: disk 1, wo:0, o:1, dev:sdb2
      Oct 12 15:35:34 ATTS-NAS-01 kernel: md: md0: recovery done.
      Oct 12 15:35:34 ATTS-NAS-01 kernel: RAID1 conf printout:
      Oct 12 15:35:34 ATTS-NAS-01 kernel: --- wd:2 rd:2
      Oct 12 15:35:34 ATTS-NAS-01 kernel: disk 0, wo:0, o:1, dev:sda1
      Oct 12 15:35:34 ATTS-NAS-01 kernel: disk 1, wo:0, o:1, dev:sdb1
      Oct 12 22:13:40 ATTS-NAS-01 kernel: md: md127: recovery done.
      Oct 12 22:13:40 ATTS-NAS-01 kernel: RAID1 conf printout:
      Oct 12 22:13:40 ATTS-NAS-01 kernel: --- wd:2 rd:2
      Oct 12 22:13:40 ATTS-NAS-01 kernel: disk 0, wo:0, o:1, dev:sda3
      Oct 12 22:13:40 ATTS-NAS-01 kernel: disk 1, wo:0, o:1, dev:sdb3

       

      Today after the firmware reboot: both disks still listed

      Oct 14 09:58:53 ATTS-NAS-01 kernel: ata2.00: ATA-10: WDC WD20EFAX-68FB5N0, 82.00A82, max UDMA/133
      Oct 14 09:58:53 ATTS-NAS-01 kernel: ata2.00: 3907029168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
      Oct 14 09:58:53 ATTS-NAS-01 kernel: ata1.00: ATA-9: WDC WD20EFRX-68EUZN0, 82.00A82, max UDMA/133
      Oct 14 09:58:53 ATTS-NAS-01 kernel: ata1.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
      Oct 14 09:58:53 ATTS-NAS-01 kernel: ata2.00: configured for UDMA/133
      Oct 14 09:58:53 ATTS-NAS-01 kernel: ata1.00: configured for UDMA/133
      Oct 14 09:58:53 ATTS-NAS-01 kernel: scsi 0:0:0:0: Direct-Access ATA WDC WD20EFRX-68E 0A82 PQ: 0 ANSI: 5
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 0:0:0:0: [sda] 4096-byte physical blocks
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 0:0:0:0: [sda] Write Protect is off
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      Oct 14 09:58:53 ATTS-NAS-01 kernel: scsi 1:0:0:0: Direct-Access ATA WDC WD20EFAX-68F 0A82 PQ: 0 ANSI: 5
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 1:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 1:0:0:0: [sdb] 4096-byte physical blocks
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 1:0:0:0: [sdb] Write Protect is off
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      Oct 14 09:58:53 ATTS-NAS-01 kernel: pci 0000:00:04.1: [1c36:8011] type 00 class 0x100000
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sdb: sdb1 sdb2 sdb3
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sda: sda1 sda2 sda3
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
      Oct 14 09:58:53 ATTS-NAS-01 kernel: sd 0:0:0:0: [sda] Attached SCSI disk

       

      Today when the volume became degraded again

      Line 3175: Oct 14 09:58:55 ATTS-NAS-01 kernel: ata2.00: irq_stat 0x08000000, interface fatal error
      Line 3176: Oct 14 09:58:55 ATTS-NAS-01 kernel: ata2: SError: { UnrecovData Proto LinkSeq }
      Line 3179: res 40/00:e8:b8:03:94/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
      Line 3187: Oct 14 09:58:57 ATTS-NAS-01 kernel: [eth rx] warn: failed to change state, error -110
      Line 3213: Oct 14 09:59:49 ATTS-NAS-01 kernel: ata2.00: irq_stat 0x08000000, interface fatal error
      Line 3214: Oct 14 09:59:49 ATTS-NAS-01 kernel: ata2: SError: { UnrecovData Proto LinkSeq }
      Line 3217: res 40/00:00:40:e9:08/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
      Line 3224: Oct 14 09:59:49 ATTS-NAS-01 kernel: ata2.00: irq_stat 0x08000000, interface fatal error
      Line 3225: Oct 14 09:59:49 ATTS-NAS-01 kernel: ata2: SError: { UnrecovData Proto LinkSeq }
      Line 3228: res 40/00:a8:40:33:0a/00:00:00:00:00/40 Emask 0x12 (ATA bus error)
      Line 3242: Oct 14 09:59:56 ATTS-NAS-01 kernel: blk_update_request: I/O error, dev sdb, sector 668480
      Line 3248: Oct 14 09:59:56 ATTS-NAS-01 kernel: blk_update_request: I/O error, dev sdb, sector 72
      Line 3249: Oct 14 09:59:56 ATTS-NAS-01 kernel: md: super_written gets error=-5
      Line 3259: Oct 14 09:59:57 ATTS-NAS-01 kernel: ata2: SError: { DevExch }
      Line 3272: Oct 14 09:59:57 ATTS-NAS-01 kernel: md: super_written gets error=-5
      Line 3306: Oct 14 09:59:58 ATTS-NAS-01 kernel: ata2: SError: { DevExch }

       

    • BBSUP
      Aspirant


      StephenB wrote:

      You can download the log zip file.  Disk-info.log might give you some info, though generally "dead" disks aren't listed there.  You can also look for i/o errors in system.log and kernel.log.


       

      Disk-info.log: only active disk

      System.log: nothing relevant afaics

      Kernel.log: a whole bunch of errors. I copy/pasted some in a previous post, but that post seems to have disappeared. Moderator, please release it ;)

      • StephenB
        Guru - Experienced User

        BBSUP wrote:


        Kernel.log: a whole bunch of errors. 


        And they do look like interface errors.

         

        The safest way to test whether this is the chassis is to insert a scratch disk into that bay (by itself) and try a factory install.  That eliminates the chance of data loss.
