Forum Discussion
Sandshark
Sep 04, 2017 · Sensei - Experienced User
EDA500 on RN516 - Scrub very slow
This problem has been previously reported by another user on an earlier version of the OS: ReadyNAS-516-2x-EDA500-Scrub-on-EDA500-very-slow. It persists in OS6.7.5. As scheduled, my main data vo...
kohdee
Oct 27, 2017 · NETGEAR Expert
A scrub kicks off both a btrfs scrub and an mdadm resync. Five disks relying on one eSATA connection to perform heavy recalculations on two fronts (RAID and filesystem) makes for a very intensive, very slow operation. You could move your EDA500 disks to your head unit and let the operations continue there, then move them back.
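If you want to confirm over SSH that both operations really are running, something like this should show them (a minimal sketch; the /data mount point is just an example, so substitute your actual data volume path):

cat /proc/mdstat          # mdadm view: shows the resync and its percent complete
btrfs scrub status /data  # btrfs view: shows scrub progress and error counts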
- Sandshark · Oct 27, 2017 · Sensei - Experienced User
Does it do them concurrently? Maybe that's the issue, but 54 days still seems like a very long time. Why would running two processes take less CPU time than me running just one via SSH? It didn't take anywhere near that long for the original sync, so why should a resync take that long? I'll have to kick off another and see what /proc/mdstat says about resync progress while this is going on. Maybe the two processes are fighting over access to the same area of the array and that slows them both down, but would that not also occur on the main array?
I have to admit I did not let it complete, but I did let it go more than two days to see if it was just the reported progress that was wrong. Maybe it would have sped up at some point. After two days, I Googled how to find the progress in SSH, and found that the progress shown in SSH was identical to that shown in the GUI. I then trusted that the progress report and my resulting time-to-go estimate were accurate, and aborted it.
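For anyone following along, this is roughly how I watched it over SSH -- a sketch assuming a /data mount point (yours may differ):

# re-check both progress reports every 10 minutes to see if the rate ever picks up
watch -n 600 'cat /proc/mdstat; btrfs scrub status /data'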
As far as moving the array to the main chassis for this, that's just not a real solution. I keep everything I need daily access to on the main array and computer backups and such on the EDA500 (actually, now two of them).
- mdgm-ntgr · Oct 29, 2017 · NETGEAR Employee Retired
I believe it is concurrent.
In the initial sync we can do things faster as there's no existing data to sync across. If you replace a disk in your EDA500, you'll find the rebuild sync takes longer than the initial sync did when the volume was created.
The larger the disk capacity, the longer things will take, as there's more to check. 54 days for a scrub does still seem like a very long time, even in an EDA500.
If moving the disks to the main chassis is not practical, you may find that additional main units work better for you than EDA500 units.
I would think a volume in any of our current main units would significantly outperform one in the EDA500.
- Sandshark · Oct 29, 2017 · Sensei - Experienced User
OK, so this is what top looks like with a GUI-initiated scrub:
top - 16:48:32 up 7 days, 22:33, 1 user, load average: 4.71, 1.55, 0.64
Tasks: 314 total, 1 running, 313 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 1.1 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16297764 total, 15755896 used, 541868 free, 11572 buffers
KiB Swap: 1569788 total, 0 used, 1569788 free. 14749048 cached Mem

  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM    TIME+ COMMAND
 2006 root  20   0       0      0     0 S  4.0  0.0  0:05.13 md123_raid5
15024 root  20   0       0      0     0 D  1.0  0.0  0:00.57 md123_resync
 4226 root  20   0    6344   1728  1600 S  0.3  0.0 10:42.07 wsdd2
 4452 root  20   0  661376  14428  9676 S  0.3  0.1  2:00.16 zerotier-one
    1 root  20   0  136976   7264  5144 S  0.0  0.0  0:09.02 systemd
    2 root  20   0       0      0     0 S  0.0  0.0  0:00.18 kthreadd
    3 root  20   0       0      0     0 S  0.0  0.0  0:12.67 ksoftirqd/0
    5 root   0 -20       0      0     0 S  0.0  0.0  0:00.00 kworker/0:0H
    7 root  20   0       0      0     0 S  0.0  0.0  2:40.28 rcu_sched
    8 root  20   0       0      0     0 S  0.0  0.0  0:00.00 rcu_bh
    9 root  rt   0       0      0     0 S  0.0  0.0  0:02.24 migration/0
   10 root  rt   0       0      0     0 S  0.0  0.0  0:01.88 watchdog/0
   11 root  rt   0       0      0     0 S  0.0  0.0  0:01.86 watchdog/1
   12 root  rt   0       0      0     0 S  0.0  0.0  0:01.69 migration/1
   13 root  20   0       0      0     0 S  0.0  0.0  0:09.43 ksoftirqd/1
   15 root   0 -20       0      0     0 S  0.0  0.0  0:00.00 kworker/1:0H
And here it is with the scrub initiated via SSH:
top - 16:58:13 up 4 min, 1 user, load average: 1.90, 0.98, 0.42
Tasks: 316 total, 1 running, 315 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 13.8 sy, 0.0 ni, 86.1 id, 0.1 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 16297764 total, 983956 used, 15313808 free, 11252 buffers
KiB Swap: 1569788 total, 0 used, 1569788 free. 565484 cached Mem

  PID USER  PR  NI    VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
   71 root  20   0       0     0     0 S 11.0  0.0 0:07.33 kworker/u8:3
 1035 root  20   0       0     0     0 S 10.0  0.0 0:07.31 kworker/u8:7
 1056 root  20   0       0     0     0 S  9.6  0.0 0:07.77 kworker/u8:10
   28 root  20   0       0     0     0 S  9.0  0.0 0:07.83 kworker/u8:1
   43 root  20   0       0     0     0 S  7.3  0.0 0:06.73 kworker/u8:2
 1054 root  20   0       0     0     0 S  6.7  0.0 0:05.52 kworker/u8:9
 5609 root  20   0   32168   204    16 S  3.0  0.0 0:03.04 btrfs
 1777 root   0 -20       0     0     0 S  1.3  0.0 0:01.54 kworker/2:1H
 4219 root  20   0    6344  1764  1628 S  0.7  0.0 0:00.90 wsdd2
 1745 root   0 -20       0     0     0 S  0.3  0.0 0:00.24 kworker/0:1H
 5673 root  20   0   28892  3068  2424 R  0.3  0.0 0:00.14 top
    1 root  20   0  136976  7136  5100 S  0.0  0.0 0:01.51 systemd
    2 root  20   0       0     0     0 S  0.0  0.0 0:00.00 kthreadd
    3 root  20   0       0     0     0 S  0.0  0.0 0:00.00 ksoftirqd/0
    4 root  20   0       0     0     0 S  0.0  0.0 0:00.00 kworker/0:0
    5 root   0 -20       0     0     0 S  0.0  0.0 0:00.00 kworker/0:0H
    6 root  20   0       0     0     0 S  0.0  0.0 0:00.23 kworker/u8:0
Here is what it looks like if I start a scrub via the GUI and cancel it (but not the resync) via SSH:
top - 17:02:18 up 8 min, 1 user, load average: 2.73, 1.85, 0.90
Tasks: 305 total, 1 running, 304 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 1.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16297764 total, 1024016 used, 15273748 free, 11252 buffers
KiB Swap: 1569788 total, 0 used, 1569788 free. 579124 cached Mem

  PID USER  PR  NI    VIRT   RES   SHR S %CPU %MEM   TIME+ COMMAND
 2000 root  20   0       0     0     0 S  4.0  0.0 0:05.24 md123_raid5
 4219 root  20   0    6344  1764  1628 S  0.7  0.0 0:01.72 wsdd2
 6623 root  20   0       0     0     0 D  0.7  0.0 0:01.13 md123_resync
    7 root  20   0       0     0     0 S  0.3  0.0 0:00.20 rcu_sched
 4680 nut   20   0   17260  1508  1112 S  0.3  0.0 0:00.77 usbhid-ups
    1 root  20   0  136976  7136  5100 S  0.0  0.0 0:01.54 systemd
    2 root  20   0       0     0     0 S  0.0  0.0 0:00.00 kthreadd
    3 root  20   0       0     0     0 S  0.0  0.0 0:00.00 ksoftirqd/0
    4 root  20   0       0     0     0 S  0.0  0.0 0:00.00 kworker/0:0
    5 root   0 -20       0     0     0 S  0.0  0.0 0:00.00 kworker/0:0H
    8 root  20   0       0     0     0 S  0.0  0.0 0:00.00 rcu_bh
    9 root  rt   0       0     0     0 S  0.0  0.0 0:00.01 migration/0
   10 root  rt   0       0     0     0 S  0.0  0.0 0:00.00 watchdog/0
   11 root  rt   0       0     0     0 S  0.0  0.0 0:00.00 watchdog/1
   12 root  rt   0       0     0     0 S  0.0  0.0 0:00.01 migration/1
   13 root  20   0       0     0     0 S  0.0  0.0 0:00.00 ksoftirqd/1
So, yes, there is a resync in progress when the scrub is initiated via the GUI that is not there when I do it via SSH. When I cancel the scrub via SSH, very little changes. If I resume the scrub with the resync still ongoing, it looks the same as if I never cancelled it. If I initiate just the scrub via SSH, then all of those kworker tasks are busy doing the scrub -- tasks that don't even make the top ten processes when the resync is also ongoing. Clearly, something about having an ongoing resync is seriously affecting the scrub on the EDA500. It's not CPU availability -- the resync takes little CPU. So, it must be the I/O channel. My best guess is that the resync process is keeping the eSATA port multiplier "locked" to one drive, so the scrub process cannot access any others.
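For reference, the SSH experiments above use the stock btrfs-progs commands plus /proc/mdstat; the /data mount point is just an example:

btrfs scrub start /data    # start just the scrub, with no GUI-triggered resync
btrfs scrub cancel /data   # cancel a running scrub (leaves the md resync alone)
btrfs scrub resume /data   # resume a previously cancelled scrub
cat /proc/mdstat           # check whether the md resync is still running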
BTW, here is what cat /proc/mdstat reports on the sync:
md123 : active raid5 sdm3[0] sdq3[4] sdp3[3] sdo3[5] sdn3[1]
      7794659328 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
      [>....................]  resync = 3.5% (69154404/1948664832) finish=1209.6min speed=25895K/sec
So the resync in and of itself is also not the issue; it will complete within a reasonable time (this is for an array half the size of the other, but even double this is reasonable).
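The estimate is at least self-consistent: (1948664832 - 69154404) KB remaining at 25895 KB/sec works out to about 72,600 seconds, or 1209.7 minutes -- right at the reported finish=1209.6min, i.e. roughly 20 hours.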
I don't know the solution -- maybe doing the processes sequentially instead of concurrently. But it is definitely an unacceptable situation that needs attention. Any excuse that "a second NAS is a better solution" is just that -- an excuse. Netgear sold the product, and the OS should play well with it. I could accept a task taking 3x or even maybe 4x longer. 25x or more is just insane.
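If anyone wants to experiment with forcing the operations to run more or less sequentially, one sketch would be the standard Linux md throttle knobs: choke the resync while the scrub runs, then restore the default. The values shown are the usual kernel defaults, and I can't say whether ReadyNAS OS overrides them, so treat this as an experiment, not a fix:

sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max   # note the current values first
sysctl -w dev.raid.speed_limit_max=1000    # throttle the md resync to near-idle
btrfs scrub start -B /data                 # -B runs the scrub in the foreground until it finishes
sysctl -w dev.raid.speed_limit_max=200000  # restore the usual default so the resync can finish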