Sandshark
Sensei
May 21, 2022

RAID re-sync with BTRFS scrub

It's been so long since I had a ReadyNAS with a multi-tier XRAID volume that I'm not sure of this. Am I correct that the standard UI-initiated or scheduled scrub does a RAID re-sync on all RAID groups of a multi-group volume? I seem to remember that it did, so you'd see the first group in re-sync and the others pending if you looked at mdstat early in the scrub.

 

I ask because I have two groups in FlexRAID. I created the second group manually (only some of the drives are larger), as XRAID-like as I could, and I just noticed that the second layer isn't getting a re-sync with a scrub. I expect that's because there is a database somewhere, which I don't know how to update, that tells the OS which layers belong to the volume, even though both RAID groups show on the Volumes page. But maybe it's just a bug.

 

I can always initiate a re-sync of the other group via SSH, if I feel it's needed, but I'm curious to know if this is a limitation of my manual expansion (as detailed here: How-to-do-incremental-vertical-expansion-in-FlexRAID-mode ) or something else.
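For reference, a manual re-sync of a single RAID group can be sketched through the standard Linux md sysfs interface. This is a minimal sketch, not the ReadyNAS method: the function name `request_md_sync` and the array name `md126` are illustrative, and whether the ReadyNAS scrub uses this same interface is an assumption. Writing `repair` to `sync_action` is the generic md request that the kernel logs as a "requested-resync".

```python
from pathlib import Path


def request_md_sync(array: str, action: str = "repair",
                    sysfs_root: str = "/sys/block") -> str:
    """Write a sync action to <sysfs_root>/<array>/md/sync_action.

    'repair' rewrites inconsistent stripes, 'check' only counts mismatches,
    and 'idle' stops a running sync. Returns the path that was written.
    """
    if action not in ("check", "repair", "idle"):
        raise ValueError(f"unsupported sync action: {action}")
    target = Path(sysfs_root) / array / "md" / "sync_action"
    if not target.exists():
        raise FileNotFoundError(f"{target} not found; is {array} an md array?")
    target.write_text(action + "\n")
    return str(target)
```

Run as root over SSH, `request_md_sync("md126")` would ask md to re-sync that group only. The `sysfs_root` parameter exists purely so the function can be tried against a scratch directory first.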

4 Replies

  • StephenB
    Guru - Experienced User

    I looked at the log zip from the last scrub from one of my NAS, and it did resync both RAID groups:

     

    Mar 01 01:00:11 NAS readynasd[5859]: Scrub started for volume data.
    Mar 01 01:00:11 NAS kernel: md: requested-resync of RAID array md127
    Mar 03 03:31:56 NAS kernel: md: md127: requested-resync done.
    Mar 03 03:31:56 NAS kernel: md: requested-resync of RAID array md126
    Mar 03 13:21:26 NAS kernel: md: md126: requested-resync done.
    Mar 03 13:21:30 NAS readynasd[5859]: Scrub completed for volume data'.
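The per-group timing can be pulled out of a log excerpt like the one above with a short script. This is a rough sketch that assumes the line format shown here; the function name `resync_durations` is illustrative, and since the log lines carry no year, the `year` default of 2022 is an assumption.

```python
import re
from datetime import datetime

STAMP = r"(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2})"
START = re.compile(STAMP + r" .*md: requested-resync of RAID array (md\d+)")
DONE = re.compile(STAMP + r" .*md: (md\d+): requested-resync done")

SAMPLE = """\
Mar 01 01:00:11 NAS readynasd[5859]: Scrub started for volume data.
Mar 01 01:00:11 NAS kernel: md: requested-resync of RAID array md127
Mar 03 03:31:56 NAS kernel: md: md127: requested-resync done.
Mar 03 03:31:56 NAS kernel: md: requested-resync of RAID array md126
Mar 03 13:21:26 NAS kernel: md: md126: requested-resync done.
"""


def _stamp(text, year):
    # The log has no year field, so the caller supplies one (assumption).
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def resync_durations(lines, year=2022):
    """Map each md array to the time between its resync start and done lines."""
    started, durations = {}, {}
    for line in lines:
        line = line.strip()
        m = START.match(line)
        if m:
            started[m.group(2)] = _stamp(m.group(1), year)
            continue
        m = DONE.match(line)
        if m and m.group(2) in started:
            begin = started.pop(m.group(2))
            durations[m.group(2)] = _stamp(m.group(1), year) - begin
    return durations


for array, took in resync_durations(SAMPLE.splitlines()).items():
    print(array, took)
```

An array that appears in `SAMPLE` with a start line but no matching done line is simply left out of the result, which is one quick way to spot a group that never finished.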

    FWIW, there is nothing in the log files for the btrfs scrub. 

     

    The btrfs scrub does seem to be happening (my RN202 has JBOD volumes, and scheduled scrubs on those take several hours to complete, with no md messages).  But I don't find any logging for btrfs, perhaps because it is completing with no errors?

     

    • Sandshark
      Sensei

      Thanks.  Looking at the log, I now see it did do both syncs.  The difference from what I (at least thought I) remembered is that it doesn't have one going and one pending at the beginning.  It (now?) requests the second sync after the first completes.

       

      And having looked at it, I see it's struggling with one drive, though all that drive's SMART shows is one pending sector.  All the errors but one were correctable, so I guess that's associated with the pending sector, but you'd think the OS would alert you to something being amiss when there are a dozen errors like this:

      [Wed May 11 14:50:56 2022] md/raid:md127: read error corrected (8 sectors at 2704135552 on sdf3).

       

      The drive in question being one of the older, smaller ones, I'm going to replace it with a bigger one.  But this removed drive won't get relegated to a backup unit, as I have done with the ones replaced just to increase storage.

       

      But the fact that they were corrected should serve notice to those not running periodic scrubs.  If not for the scrub, these would not have been corrected, and likely would continue to accumulate.
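Since the OS does not raise an alert for these, a small script can tally the corrected read errors per partition from saved kernel log text. This is a minimal sketch that assumes the message format quoted above; the function name `corrected_reads_by_device` is illustrative, and the only sample line is the one from this post.

```python
import re
from collections import Counter

CORRECTED = re.compile(
    r"md/raid:(md\d+): read error corrected "
    r"\((\d+) sectors at (\d+) on (\w+)\)"
)

SAMPLE = [
    "[Wed May 11 14:50:56 2022] md/raid:md127: read error corrected "
    "(8 sectors at 2704135552 on sdf3).",
]


def corrected_reads_by_device(lines):
    """Count 'read error corrected' messages per (array, partition) pair."""
    counts = Counter()
    for line in lines:
        m = CORRECTED.search(line)
        if m:
            counts[(m.group(1), m.group(4))] += 1
    return counts


for (array, part), n in corrected_reads_by_device(SAMPLE).items():
    print(f"{array} {part}: {n} corrected read error(s)")
```

Feeding it the full kernel log from a log zip, rather than this single line, would show whether the errors cluster on one drive, as they did here.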

      • StephenB
        Guru - Experienced User

        Sandshark wrote:

        Thanks.  Looking at the log, I now see it did do both syncs.  The difference from what I (at least thought I) remembered is that it doesn't have one going and one pending at the beginning.  It (now?) requests the second sync after the first completes.

         


        FWIW, I looked at mdstat partway through the scrub, and saw this:

         

        Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
        md126 : active raid5 sda4[0] sdd4[3] sdc4[2] sdb4[1]
              11717352576 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
              	resync=DELAYED
              
        md127 : active raid5 sda3[5] sdd3[7] sdc3[6] sdb3[4]
              17567012352 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
              [=========>...........]  resync = 45.1% (2643152640/5855670784) finish=1293.6min speed=41386K/sec
        

         

        So I think the ReadyNAS application is requesting both at the beginning, but the md driver is delaying the second one until the first one completes, since both arrays sit on the same physical disks.
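The same state can be read from mdstat with a script instead of by eye. This is a rough sketch that assumes the layout shown in the output above; the function name `sync_states` is illustrative, and on a live system the text would come from reading `/proc/mdstat`.

```python
import re

ARRAY = re.compile(r"^(md\d+) : ")
PROGRESS = re.compile(r"(resync|recovery|check|reshape)\s*=\s*([\d.]+)%")
DELAYED = re.compile(r"(resync|recovery|check|reshape)\s*=\s*DELAYED")

SAMPLE = """\
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md126 : active raid5 sda4[0] sdd4[3] sdc4[2] sdb4[1]
      11717352576 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
        resync=DELAYED

md127 : active raid5 sda3[5] sdd3[7] sdc3[6] sdb3[4]
      17567012352 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [=========>...........]  resync = 45.1% (2643152640/5855670784) finish=1293.6min speed=41386K/sec
"""


def sync_states(mdstat_text):
    """Return {array: 'idle' | 'delayed' | '<operation> <percent>%'}."""
    states = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line.strip())
        if header:
            # A new array stanza starts; assume idle until told otherwise.
            current = header.group(1)
            states[current] = "idle"
            continue
        if current is None:
            continue
        if DELAYED.search(line):
            states[current] = "delayed"
        else:
            m = PROGRESS.search(line)
            if m:
                states[current] = f"{m.group(1)} {m.group(2)}%"
    return states


for array, state in sync_states(SAMPLE).items():
    print(array, state)
```

Polling this during a scrub would show whether every RAID group of the volume is queued, which is the question from the original post.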

         


        Sandshark wrote:

        And having looked at it, I see it's struggling with one drive, though all that drive's SMART shows is one pending sector.  All the errors but one were correctable, so I guess that's associated with the pending sector, but you'd think the OS would alert you to something being amiss when there are a dozen errors like this:

        [Wed May 11 14:50:56 2022] md/raid:md127: read error corrected (8 sectors at 2704135552 on sdf3).

         


        I agree that the OS should give more alerts on disk issues than it does.  IMO, all disk errors should be shown in the UI log (along with increases in the SMART error counts).
