
Forum Discussion

JDFuzz
Aspirant
Oct 26, 2020

Volume Degraded after disk replacement

Hi, I hope somebody can help me!

 

I got a message one day saying one of my two disks had failed (both 4TB WD Reds).

 

So I replaced the bad one with a 4TB Seagate IronWolf, as the newer WD Reds had bad reviews.

 

After re-syncing the first time, the volume was still degraded. I had this message in my logs:

"Volume: The resync operation finished on volume data. However, the volume is still degraded."

 

Every time I shut this thing down and it boots up again, it begins to resync, which takes all day. At this point I'm not sure what to do. I'd try a factory reset after backing up my data, but I do not have the storage capacity to do that without the NAS.

 

I saw a similar post asking for logs, so below is disk_info and mdstat.

 

Thank you!

 

disk_info

Device:             sda
Controller:         0
Channel:            0
Model:              WDC WD40EFRX-68WT0N0
Serial:             WD-WCC4E3RDS3R9
Firmware:           82.00A82W
Class:              SATA
RPM:                5400
Sectors:            7814037168
Pool:               data
PoolType:           RAID 1
PoolState:          3
PoolHostId:         1177b65c
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   2163
  Uncorrectable Sector Count:     0
  Temperature:                    34
  Start/Stop Count:               4057
  Power-On Hours:                 23338
  Power Cycle Count:              754
  Load Cycle Count:               4080

Device:             sdb
Controller:         0
Channel:            1
Model:              ST4000VN008-2DR166
Serial:             ZGY7LFFH
Firmware:           SC60
Class:              SATA
RPM:                5980
Sectors:            7814037168
Pool:               data
PoolType:           RAID 1
PoolState:          3
PoolHostId:         1177b65c
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  End-to-End Errors:              0
  Command Timeouts:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    30
  Start/Stop Count:               8
  Power-On Hours:                 45
  Power Cycle Count:              5
  Load Cycle Count:               51

mdstat

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md1 : active raid1 sdb2[1] sda2[0]
      523264 blocks super 1.2 [2/2] [UU]
      
md127 : active raid1 sdb3[2](S) sda3[0]
      3902166784 blocks super 1.2 [2/1] [U_]
      
md0 : active raid1 sdb1[2] sda1[0]
      4190208 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>
/dev/md/0:
           Version : 1.2
     Creation Time : Tue Jan  3 18:50:48 2017
        Raid Level : raid1
        Array Size : 4190208 (4.00 GiB 4.29 GB)
     Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Mon Oct 26 20:57:19 2020
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : unknown

              Name : 1177b65c:0  (local to host 1177b65c)
              UUID : 9fc60477:074fbd3b:2f491de7:c513f18b
            Events : 5324

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       2       8       17        1      active sync   /dev/sdb1
/dev/md/data-0:
           Version : 1.2
     Creation Time : Tue Jan  3 18:50:48 2017
        Raid Level : raid1
        Array Size : 3902166784 (3721.40 GiB 3995.82 GB)
     Used Dev Size : 3902166784 (3721.40 GiB 3995.82 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Mon Oct 26 20:48:18 2020
             State : clean, degraded 
    Active Devices : 1
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 1

Consistency Policy : unknown

              Name : 1177b65c:data-0  (local to host 1177b65c)
              UUID : da7276bb:de92440c:3630b8ff:39b87372
            Events : 5565

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       -       0        0        1      removed

       2       8       19        -      spare   /dev/sdb3

4 Replies

  • StephenB
    Guru - Experienced User

    This confirms that the data volume is degraded:

    md127 : active raid1 sdb3[2](S) sda3[0]
          3902166784 blocks super 1.2 [2/1] [U_]

    It should say [UU].
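
    For reference, that degraded state can be spotted mechanically: any status bracket in /proc/mdstat containing an underscore (a missing member), such as [U_], means the array is degraded, while healthy arrays show [UU]. A minimal sketch — the temp-file path and sample text here are illustrative, taken from the mdstat posted above:

```shell
#!/bin/sh
# Spot degraded md arrays: report any status bracket containing "_"
# (a missing member), e.g. [U_]; healthy arrays show [UU] instead.
check_mdstat() {
    grep -oE '\[U*_+U*\]' "$1" | sort -u
}

# Sample using the mdstat output from this thread:
cat > /tmp/mdstat.txt <<'EOF'
md1 : active raid1 sdb2[1] sda2[0]
      523264 blocks super 1.2 [2/2] [UU]
md127 : active raid1 sdb3[2](S) sda3[0]
      3902166784 blocks super 1.2 [2/1] [U_]
EOF
check_mdstat /tmp/mdstat.txt    # prints [U_]
```

    On the NAS itself you would point this at /proc/mdstat rather than a saved copy.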

     

    But it's hard to say why, since the new disk looks healthy.  Try downloading the full log zip file, and look for errors in system.log and kernel.log.

     

    You might also look at the bottom of volume.log, and see if there is anything useful listed there (in the === maintenance history === section).

     

    You shouldn't post the full log zip here.  But you can ask the mods ( JohnCM_S and Marc_V ) to review them for you.  What you need to do is first upload the log zip to cloud storage (google drive, onedrive, dropbox, etc).  Then send a private message (PM) to the mods, using the envelope icon in the upper right of the forum page.  Include a download link to the log zip, and also a link to this forum thread.
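
    Since the maintenance history sits at the very end of volume.log, it can also be pulled out with a simple sed range from the marker line to end-of-file. A sketch — the file paths and sample entries here are illustrative, assuming the stock log layout:

```shell
#!/bin/sh
# Print everything from the "=== maintenance history ===" marker to
# the end of a volume.log file (the section referred to above).
maintenance_history() {
    sed -n '/=== maintenance history ===/,$p' "$1"
}

# Sample with a trimmed, made-up volume.log:
cat > /tmp/volume.log <<'EOF'
... earlier log entries ...
=== maintenance history ===
data  resilver  2020-10-26 10:42:41  2020-10-26 20:48:23  degraded
EOF
maintenance_history /tmp/volume.log
```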

    • Sandshark
      Sensei

      One possibility is that the new drive is unhealthy, though not totally dead.  It does happen.  I recommend testing it in a PC (connected via a USB dock or internal SATA) with SeaTools.

    • JDFuzz
      Aspirant

      Hey, thanks for taking the time to reply!

       

      It turns out that my essential files were only around 300GB, which I could offload and then perform a factory reset.

       

      After the fresh install and resync, it runs perfectly now!

       

      So can we safely rule out hardware at this point?

       

      The only thing I had noticed prior to all this is that my Videos share was showing I was using hundreds of thousands of GBs, much more than I actually have on the NAS. I didn't pay it any attention as it didn't affect anything.

       

      I would love to find out what happened here, for others or for future me. Luckily I downloaded the logs prior to the reset.

       

      Below is the bottom of my volume.log

      === maintenance history ===
      device      operation  start_time           end_time             result     details                                                         
      ----------  ---------  -------------------  -------------------  ---------  ----------------------------------------------------------------
      data        resilver   2018-03-22 19:29:43  2018-03-23 05:38:19  completed                                                                  
      data        resilver   2020-10-19 19:23:52  2020-10-19 19:24:03  degraded                                                                   
      data        resilver   2020-10-25 00:00:51                                                                                                  
      data        resilver   2020-10-25 11:19:14  2020-10-26 06:28:40  degraded                                                                   
      data        disk test  2020-10-26 09:41:47                                                                                                  
      data        resilver   2020-10-26 10:42:41  2020-10-26 20:48:23  degraded                                                                   
      • StephenB
        Guru - Experienced User

        I suggest running the disk test from the volume settings wheel (after the volume is fully synced).  There might still be an issue with one of the two disks.

         

        The incorrect size of the video share suggests that there was some btrfs file system corruption on the first disk.  

         

        As far as the logs go, I suggest examining system.log and kernel.log for btrfs or disk error messages.  Look in the time window when the last resilvering was being done - 10:42:41 to 20:48:23 on 26 Oct.
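
        That kind of scan can be sketched as a grep over the saved logs. The "Mon DD HH:MM:SS" line prefix and the file paths below are assumptions about the log format; adjust the patterns to what the actual files contain:

```shell
#!/bin/sh
# Keep only one date's lines from a kernel log, then keep only lines
# mentioning btrfs, an ATA link, or an error.
scan_kernel_log() {
    grep -E "^$2 " "$1" | grep -iE 'btrfs|ata[0-9]|error'
}

# Sample with a made-up kernel.log in the assumed format:
cat > /tmp/kernel.log <<'EOF'
Oct 26 10:45:01 nas kernel: md: resync of RAID array md127
Oct 26 11:02:13 nas kernel: BTRFS error (device md127): parent transid verify failed
Oct 25 09:00:00 nas kernel: BTRFS info: disk space caching is enabled
EOF
scan_kernel_log /tmp/kernel.log "Oct 26"
```

        Within the matching lines, anything timestamped between 10:42:41 and 20:48:23 on Oct 26 would fall inside the last resilver window.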
