
Re: RN424 Resynch Ended Degraded

SDBloco
Aspirant

RN424 Resynch Ended Degraded

Drives Installed: 2x Seagate ST4000DM006, 4 TB SATA 6Gb/s 64MB Cache, 7200 rpm.

Installed OS Version: ReadyNAS 6.10.2.

Description: In an effort to expand the capacity of this NAS system, I ordered two additional Seagate drives of the same model, size, and specifications. However, I received two incorrect drives: Seagate ST4000DM004 units, which differ only in their slower 5400 rpm spin rate.

After installing the first of the new drives in slot 3 of the RN424 chassis, I received a message that Resynching had started on the volume (see below for a complete copy of the relevant logs). I also received a warning that I should not mix different disk types, noting that the RPM values are different. This was my first indication that the disks shipped to me were not what I ordered. I removed the offending disk within 4 minutes, and am in the process of returning it to the vendor. I will be getting the correct disks soon.

My questions relate to the resulting state of the RAID-5 (X-RAID) volume, which continued to resynch after I removed the offending drive. This resynch took 18 hours, and left the volume in a degraded state. Neither of the two original drives is showing any ATA or Sector errors.

First, has the data on the volume been damaged? I have a complete and current backup, but would prefer to avoid the hassle of restoring 2.5 TB. (For the moment, I have blocked users from making changes to data on the drive.)

Second, after reading the Netgear ReadyNAS support forums, it appears that the only way to force another resynch of the existing pair of drives is to pull one from the chassis and reinsert it. Can I initiate a resynch from the WebUI admin interface? If not, can this be done via SSH and the OS CLI?
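For reference, my understanding is that the data volume on ReadyNAS OS 6 sits on a standard mdadm array, so I assume the state could at least be inspected, and a replacement member added, over SSH with ordinary mdadm commands. A rough sketch only, where /dev/sdc3 is just my guess at the partition a replacement disk would carry:

    cat /proc/mdstat                            # overall state, e.g. [3/2] [UU_]
    mdadm --detail /dev/md127                   # per-member detail for the data array
    mdadm --manage /dev/md127 --add /dev/sdc3   # add the replacement partition; a resync starts

I have not tried any of this and would rather hear from someone who knows the ReadyNAS specifics before touching the array.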

Third, if I force another resynch, will that damage the data on the volume since it is currently in a degraded (but apparently useable) state?

Have I missed anything that I can do to fix the volume state short of a full resynch? I have scrubbed the volume, and a disk test did not report anything notable. I do not see that defragging or balancing will have any impact.
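In case it matters, my understanding is that the scrub shown in the log below is a btrfs scrub of the data volume. Assuming the volume is mounted at /data (which I have not verified), its result should also be visible over SSH with:

    btrfs scrub status /data     # summary of the last scrub, including any errors found
    btrfs device stats /data     # per-device read/write/corruption counters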

Any and all advice will be appreciated.

Steve

 

Jan 17, 2020 01:47:15 PM	Disk: Disk Model:ST4000DM004-2CV104 Serial:WFN2XKKY was added to Channel 3 of the head unit.
Jan 17, 2020 01:47:24 PM	Volume: Resyncing started for Volume data.
Jan 17, 2020 01:47:24 PM	Volume: It is not recommended to mix different disk types. Current volume is using SATA 7200 RPM drives. Please replace the disk in channel 3 (Internal) to match the rest of the disks on volume data for best performance.
Jan 17, 2020 01:49:25 PM	Volume: It is not recommended to mix different disk types. Current volume is using SATA 7200 RPM drives. Please replace the disk in channel 3 (Internal) to match the rest of the disks on volume data for best performance.
Jan 17, 2020 01:50:15 PM	Disk: Disk Model:ST4000DM004-2CV104 Serial:WFN2XKKY was removed from Channel 3 of the head unit.
Jan 17, 2020 10:13:43 PM	System: Antivirus scanner definition file was updated to 59.25698.
Jan 18, 2020 07:56:20 AM	Volume: The resync operation finished on volume data. However, the volume is still degraded.
Jan 18, 2020 07:56:20 AM	Volume: Volume data health changed from Redundant to Degraded.
Jan 18, 2020 10:13:23 PM	System: Antivirus scanner definition file was updated to 59.25699.
Jan 18, 2020 11:40:15 PM	Volume: Scrub started for volume data.
Jan 19, 2020 12:08:45 AM	Volume: Scrub completed for volume data.
Jan 19, 2020 01:00:11 AM	Volume: Volume data is Degraded.
Jan 19, 2020 02:00:14 AM	Volume: Disk test started for volume data.
Jan 19, 2020 08:04:49 AM	Volume: Disk test completed for volume data.
Jan 19, 2020 10:13:23 PM	System: Antivirus scanner definition file was updated to 59.25700.
Jan 20, 2020 10:10:26 AM	Volume: Volume data is Degraded.

MDSTAT.LOG:

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md127 : active raid5 sda3[0] sdb3[1]
      7804333568 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      
md1 : active raid1 sda2[0] sdb2[1]
      523264 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sda1[0] sdb1[1]
      4190208 blocks super 1.2 [3/2] [UU_]
      
unused devices: <none>
/dev/md/0:
           Version : 1.2
     Creation Time : Wed Dec 13 03:18:04 2017
        Raid Level : raid1
        Array Size : 4190208 (4.00 GiB 4.29 GB)
     Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
      Raid Devices : 3
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Mon Jan 20 22:26:33 2020
             State : clean, degraded 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : unknown

              Name : 0a43991e:0  (local to host 0a43991e)
              UUID : 3eac258d:af34d0a3:71118549:35c346a8
            Events : 6233

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       -       0        0        2      removed
/dev/md/1:
           Version : 1.2
     Creation Time : Fri Jan 17 13:50:17 2020
        Raid Level : raid1
        Array Size : 523264 (511.00 MiB 535.82 MB)
     Used Dev Size : 523264 (511.00 MiB 535.82 MB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Mon Jan 20 22:14:16 2020
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : unknown

              Name : 0a43991e:1  (local to host 0a43991e)
              UUID : 9752c51d:781a55d2:c42912aa:23434426
            Events : 19

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
/dev/md/data-0:
           Version : 1.2
     Creation Time : Wed Dec 13 03:18:04 2017
        Raid Level : raid5
        Array Size : 7804333568 (7442.79 GiB 7991.64 GB)
     Used Dev Size : 3902166784 (3721.40 GiB 3995.82 GB)
      Raid Devices : 3
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Mon Jan 20 22:25:23 2020
             State : clean, degraded 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 64K

Consistency Policy : unknown

              Name : 0a43991e:data-0  (local to host 0a43991e)
              UUID : e6277874:c66b9c7c:ba8ee70c:a6513d6c
            Events : 6269

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       -       0        0        2      removed

 

Model: RN424|ReadyNAS 424 – High-performance Business Data Storage - 4-Bay
Message 1 of 4
Sandshark
Sensei

Re: RN424 Resynch Ended Degraded

While removing the drive as you did was a really bad idea, it does not appear you have lost any data. The RAID has been re-configured as a non-redundant RAID5 instead of a redundant RAID1, and it should become redundant again once you install a replacement drive and another resync completes.
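Once the replacement disk is in and the resync finishes, /proc/mdstat should show all three members present again, roughly like this (the sdc3 device name is just an assumption for the new disk):

    md127 : active raid5 sda3[0] sdb3[1] sdc3[3]
          7804333568 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]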

Message 2 of 4
SDBloco
Aspirant

Re: RN424 Resynch Ended Degraded

While I agree that my choice to pull the drive during the resynch was a bad idea, I am discouraged that my choices at that point in time were exceedingly limited: I could have let the resynch complete, then removed the drive, forcing the volume into a non-redundant state until a new drive was inserted and resynched. Or I could have powered down the whole system; I am not sure what that would have done to the synch state on reboot. What I expected to happen was that when the drive was removed, the resynch operation would stop and that I (the user) would receive information about the status of the system in a timely manner. This did not happen.

I am convinced that Netgear could improve this situation with some updates to the ReadyNAS OS. As I read the logs, the OS accepted the drive when it was inserted, then validated it against the existing pair of drives, finally issuing me a warning about the different spin rates. At the same time (the order of activity is unclear to me), it automatically started the resynch, which is normally fine, but should not happen if there are potential or actual drive compatibility issues. I see this as a design flaw. Additionally, so far as I know, there is no way to interrupt the resynch once it is started. I see this as a weak user interface design, though I accept that RAID technology may not allow for such a feature--I am not an expert in RAID systems.

In this use case, a better design would test the new drive, log it, then make a simple decision: if the new drive is fully compatible, start the resynch. If the new drive is not fully compatible, pause the process, provide the user with information about the issue, and then allow the user to authorize the resynch or prevent it from starting. Since the drive incompatibility was logged in my case, the information needed to make such a go/no-go test is clearly available within the OS; it is just not being used to gate the resynch process.
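To make the idea concrete, here is a rough sketch of the gate I have in mind, written with generic Linux tools rather than anything ReadyNAS-specific; the device names and the smartctl-based check are purely my assumptions:

    #!/bin/sh
    # Hypothetical go/no-go gate; not actual ReadyNAS OS code.
    NEW=/dev/sdc    # newly inserted disk (assumed device name)
    REF=/dev/sda    # an existing member of the volume
    new_rpm=$(smartctl -i "$NEW" | sed -n 's/^Rotation Rate: *//p')
    ref_rpm=$(smartctl -i "$REF" | sed -n 's/^Rotation Rate: *//p')
    if [ "$new_rpm" = "$ref_rpm" ]; then
        echo "New disk matches the volume; starting resynch."
        # mdadm --manage /dev/md127 --add ${NEW}3
    else
        echo "New disk differs ($new_rpm vs $ref_rpm); hold the resynch and ask the user."
    fi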

When I teach students user interface design, I stress that getting the user to approve long or potentially problematic operations before they start is the best way to gain their trust in a system. Such approvals do require a modicum of information, enabling the user to make an informed choice. I do not believe that the resynch functionality within the ReadyNAS OS meets this standard today.

Message 3 of 4
StephenB
Guru

Re: RN424 Resynch Ended Degraded


@SDBloco wrote:

As I read the logs, the OS accepted the drive when it was inserted, then validated it against the existing pair of drives, finally issuing me a warning about the different spin rates. At the same time (the order of activity is unclear to me), it automatically started the resynch, which is normally fine, but should not happen if there are potential or actual drive compatibility issues. I see this as a design flaw.

FWIW, in my opinion the warning is overstated.  RAID works perfectly well with mismatched drive speeds - I ran my Pro-6 for several years with mismatched RPM and had no problem at all.  The performance of the RAID array is gated by the slowest drive (all things being equal, seek time will be longer on the slower drives).  But RPM is only one of several factors that can limit drive performance. So IMO the drive was compatible. The RAID software doesn't require that the drives in the array perform identically (which is a good thing, since they won't - even if they are the same model).

 

@Sandshark has raised a concern about mismatched spin-up times.  Though I agree that could create issues, I'd say that spin-up time isn't just a function of RPM (a faster RPM drive might not take longer to spin up than a slower RPM one).  And the NAS should handle mismatched spin-up times anyway - if it doesn't, then there are bugs that should be fixed.

 


@SDBloco wrote:

Additionally, so far as I know, there is no way to interrupt the resynch once it is started. I see this as a weak user interface design, though I accept that RAID technology may not allow for such a feature--I am not an expert in RAID systems.

 


When the array is being expanded, your data is being moved onto it (the entire array is being re-organized). You can't just stop that process; you need to undo it somehow.

 

It can be safely undone after the process completes, and it would be good if Netgear provided more tools in that area. But it would be difficult to support cancellation of an expansion while it is in progress - users will manipulate the disks as they will (not as Netgear anticipates), and there are other scenarios (power cuts and failures). Personally, I think it would be unwise for them to attempt it. There's a reason why we recommend updating your backup before manipulating disks.
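One thing that is straightforward to check over SSH is whether a resync or expansion is still running before you touch any disks; a sketch, with the device name taken from the mdstat log above:

    cat /proc/mdstat              # shows a progress bar and ETA while a sync/reshape is running
    mdadm --detail /dev/md127     # reports a Rebuild Status or Reshape Status line while one is active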

Message 4 of 4