Forum Discussion
btaroli
Jan 15, 2017 - Prodigy
6.6.1 Scrub Hammers CPU
And when I say hammers, I mean it runs at nice -1 and spawns 6-8 kernel worker threads, each vying to consume 100% of a CPU core. So rabid is this consumption that all other background processes, including one's third-party apps and Time Machine backups, just cease to function.
So with all the attention to being a good neighbor during resyncs and whatnot, why is scrub such a terrible neighbor? I'd love to enable it to run every month or quarter, but I can't stand to have my NAS more or less inoperable for my needs while it's running.
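For anyone who wants to reproduce the observation, here is a minimal sketch of how to snapshot the priorities involved (the grep pattern just matches the process names that show up in the top output further down):
ps -eo pid,ni,pri,stat,comm | grep -E 'btrfs|kworker|md'   # nice/priority of the scrub and the kworkers
top -b -n 1 | head -15                                     # one batch-mode snapshot of the biggest CPU consumers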
36 Replies
- mdgm-ntgrNETGEAR Employee Retired
Which model is this on?
Can you send in your logs (see the Sending Logs link in my sig)?
- btaroliProdigy
Sure, I'd be glad to. This is on a 528.
Just for the sake of completeness, here's top with just background running...
top - 03:34:19 up 3 days, 49 min,  2 users,  load average: 0.01, 0.03, 0.06
Tasks: 241 total,   2 running, 239 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.7 sy,  0.0 ni, 98.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16303964 total, 15086028 used,  1217936 free,     2468 buffers
KiB Swap:  3139580 total,        0 used,  3139580 free. 13522476 cached Mem

  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
11971 root      20   0 3584900 356660 27084 S   4.0  2.2 182:46.02 /apps/dvblink-tv-server/dvblink_server
10318 root      20   0  233308  31464  5660 S   1.0  0.2  40:27.23 /usr/bin/python /apps/dropboxmanager/web/manage.py run+
 5138 root      20   0  992516  12028  8796 S   0.3  0.1  18:19.58 /opt/p2p/bin/leafp2p -n
 5419 root      20   0   28788   3060  2468 R   0.3  0.0   7:33.54 top
22590 root      20   0       0      0     0 S   0.3  0.0   0:00.30 [kworker/2:6]
    1 root      20   0  202460   6504  4516 S   0.0  0.0   0:38.41 /sbin/init
    2 root      20   0       0      0     0 S   0.0  0.0   0:00.10 [kthreadd]
    3 root      20   0       0      0     0 S   0.0  0.0   0:06.33 [ksoftirqd/0]
    5 root       0 -20       0      0     0 S   0.0  0.0   0:00.00 [kworker/0:0H]
    7 root      20   0       0      0     0 R   0.0  0.0   0:57.85 [rcu_sched]
    8 root      20   0       0      0     0 S   0.0  0.0   0:00.00 [rcu_bh]
    9 root      rt   0       0      0     0 S   0.0  0.0   0:00.21 [migration/0]
   10 root      rt   0       0      0     0 S   0.0  0.0   0:00.83 [watchdog/0]
   11 root      rt   0       0      0     0 S   0.0  0.0   0:00.85 [watchdog/1]
   12 root      rt   0       0      0     0 S   0.0  0.0   0:00.24 [migration/1]
   13 root      20   0       0      0     0 S   0.0  0.0   0:03.26 [ksoftirqd/1]
   15 root       0 -20       0      0     0 S   0.0  0.0   0:00.00 [kworker/1:0H]
And here's what it looks like shortly after kicking off a scrub.
top - 03:39:04 up 3 days, 53 min,  2 users,  load average: 2.67, 0.65, 0.26
Tasks: 249 total,   7 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us, 95.1 sy,  0.0 ni,  3.5 id,  0.0 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem:  16303964 total, 15111680 used,  1192284 free,     2468 buffers
KiB Swap:  3139580 total,        0 used,  3139580 free. 13537484 cached Mem

  PID USER      PR  NI    VIRT    RES   SHR S  %CPU %MEM     TIME+ COMMAND
14083 root      20   0       0      0     0 R  69.1  0.0   0:10.26 [kworker/u8:2]
18741 root      20   0       0      0     0 R  59.1  0.0   0:10.26 [kworker/u8:0]
22107 root      20   0       0      0     0 R  55.5  0.0   0:08.28 [kworker/u8:1]
23262 root      20   0       0      0     0 R  55.1  0.0   0:08.74 [kworker/u8:9]
23253 root      20   0       0      0     0 R  51.5  0.0   0:05.02 [kworker/u8:7]
 9426 root      20   0       0      0     0 R  50.1  0.0   0:09.49 [kworker/u8:5]
 2455 root      20   0       0      0     0 S  18.9  0.0   1:37.18 [md126_raid6]
23229 root      19  -1   40340    212    12 S  10.0  0.0   0:02.30 btrfs scrub start /data
18976 root      20   0       0      0     0 S   4.3  0.0   0:02.26 [kworker/u8:8]
11971 root      20   0 3584900 356660 27084 S   3.7  2.2 182:56.16 /apps/dvblink-tv-server/dvblink_server
 5335 root      19  -1 1589984  59196 12228 S   2.0  0.4   5:41.17 /usr/sbin/readynasd -v 3 -t
10318 root      20   0  233308  31464  5660 S   1.7  0.2  40:29.84 /usr/bin/python /apps/dropboxmanager/web/manage.py run+
23226 root      39  19       0      0     0 D   1.3  0.0   0:00.35 [md126_resync]
 2335 root       0 -20       0      0     0 S   0.7  0.0   0:01.65 [kworker/1:1H]
 5138 root      20   0  992516  12028  8796 S   0.7  0.1  18:20.82 /opt/p2p/bin/leafp2p -n
22590 root      20   0       0      0     0 S   0.7  0.0   0:01.27 [kworker/2:6]
 2340 root       0 -20       0      0     0 S   0.3  0.0   0:27.69 [kworker/0:1H]
- FramerVNETGEAR Employee Retired
Hi btaroli,
Have you sent your logs already?
I will send an inquiry to our subject matter expert about it if you have.
Regards,
- btaroliProdigy
I haven't had the AV service enabled for years now. I was hoping to do so after it was replaced recently, but the false-positive issue reversed that.
I gather 6.7.1 fixes that, but baby steps I shall take in re-enabling functions. :D
- ctechsApprentice
We experience the "bad neighbor" scrub behavior too - I don't dare schedule a scrub during business hours: the responsiveness of the ReadyNAS over SMB suffers pretty severely. CPU load average 2.5-3 during a scrub on a ReadyNAS 516. OS 6.7.1, have never used the antivirus feature.
Can this be tamed down?
- LaserbaitLuminary
Exactly! It'd be great to be able to throttle the scrub somehow. Maybe a selectable priority, or limit it to a single thread/core.
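For what it's worth, btrfs scrub does take I/O-priority flags, though they address disk contention rather than the CPU side; a sketch using the /data volume from the top output above (per the btrfs-scrub man page, the ioprio settings are reportedly only effective with the CFQ I/O scheduler):
btrfs scrub start -c 3 /data    # -c 3 = idle I/O class for the scrubbing thread
btrfs scrub status /data        # check progress without restarting anything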
- btaroliProdigy
Well, this is confirmed to still be happening in 6.7.1. I observe that overall I/O and wait time seem OK. Indeed, the journal entries that pop up as the job starts suggest it's throttling on I/O rate. However, the kernel worker processes still monopolize the CPU cores/threads. A certain amount of application CPU usage seems OK, but if you have anything like PLEX transcodes or AFP-based Time Machine, which cause a fair amount of CPU activity themselves, then these other processes get starved to the point of being almost unusable.
Some of this is just Btrfs behavior, which I can compare to similar operations I do on even newer kernels on other Linux machines. But when you have a server environment where there is an expectation of responsiveness from applications, it can be problematic. On this front, the only thing I would consider a standard OS issue is Time Machine backups. These are CPU intensive and will be seriously delayed if not fail outright -- based on previous painful experiences. In this run I'm not allowing TM to even trigger until the scrub finishes. I know for a fact I'll wind up having to trash my whole backup archive if I let it try and fail.
As for PLEX, I can work around the issue by enabling direct play and disabling transcode in the client config. But hopefully we get to a point where scrubs will gracefully butt out when other activity requires CPU attention.
- btaroliProdigy
Did you verify the scrub is still running? How much data is in your volume? If memory serves, it only scrubs data, not free space. I'll have to check the action of -A as well, because it seems you picked up kernel threads, and I wonder if it includes user-space stuff too? Not familiar with that option, and can't check until my toddler is down for his nap.
- LaserbaitLuminary
Yes, scrub is running, currently 4.85% completed.
- btaroliProdigy
https://manpages.debian.org/stretch/procps/ps.1.en.html
-A is a synonym for -e, selecting all processes
What you seem to be missing is an option for output format. I generally use -f, but there are others.
If you only see the name of the program being run, then ALL Btrfs commands show up as "btrfs". You need to see the command arguments in order to know which function is being called, such as scrub.
So try "ps -Af"
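Something like this will pick out just the scrub along with its arguments (the [b] in the pattern is a common trick to keep grep from matching its own process):
ps -Af | grep '[b]trfs'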
- btaroliProdigy
In the case of PLEX and DVBLink it's definitely the transcoding that was most affected. Recordings and other operations seemed OK.
Time Machine (really the AFP file service) would get so starved for cycles that my backup would corrupt and get flagged to be replaced (as TM is wont to do) whenever it attempted to run during a scrub.
I fully appreciate the description of the scrub processes you present here, but I can tell you that until I forced them into idle with ionice I was having exactly the problems described. And once I had done so, the behaviors stopped.
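What I ran was along these lines (a sketch; -c 3 is the idle scheduling class, and pgrep -f matches against the full command line):
for pid in $(pgrep -f 'btrfs scrub'); do
    ionice -c 3 -p "$pid"    # push the scrub into the idle I/O class
done
ionice -p $(pgrep -f 'btrfs scrub' | head -1)    # verify: should now print "idle"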
I was rather hoping to let the scrub finish, since this volume has never been scrubbed fully, and it's 5 days in now at about 55%. But I'm happy to restart the process to compare the ioprio on the threads as reported and detail/verify the outcome of the tweaks. I've been through a few iterations during this run, but I am in a state now where things are humming along nicely, even if the scrub is taking a while -- which is fine by me.
The effect on the other processes (whether it's CPU, IO, or a combination of both) is real and extremely annoying, however. So something does need adjustment here.
I did observe on Fedora that btrfs scrub behaves as described on the btrfs wiki and in your description. The base process is in ioprio B4, as is one of the threads; the thread doing the effective I/O is in idle. Of course, on this system effective CPU utilization is under 40% (four threads) and iowait is almost nil. Debian is of course quite different in how kernel threads operate, but this is what I see on Fedora.
I'll restart the scrub on the NAS later tonight and see what I see there to compare. Not totally apples to apples, but I think it's odd that btrfs only stopped getting in the way of other services once it was ionice'd directly.
I'll report back once I've had a chance to observe it again without manual adjustments.
- SkywalkerNETGEAR Expert
It's worth noting that the resources consumed by btrfs scrub can vary widely depending on the files it's scrubbing at the time. Very fragmented files will result in a lot more I/O wait time and more CPU consumption. Also, a ReadyNAS volume scrub includes an MD RAID scrub, which adds to the I/O load. MD is generally pretty good at yielding to other I/O consumers, but it still uses a minimum of 30MB/sec for that, plus the CPU usage for parity calculation.
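If the impact matters more than the duration, the MD portion can be bounded with the standard sysctls; a sketch (run as root; values are in KB/s, and speed_limit_min is the bandwidth MD reserves even under competing I/O):
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
echo 5000 > /proc/sys/dev/raid/speed_limit_min    # shrink MD's guaranteed scrub/resync bandwidth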
- Michael_OzLuminary
Still on 6.7.4.
Started scrub.
Three threads as described above: the main process & two threads (shown with the H command in htop)
 PPID   PID USER     IO  IORR IOWR  IO PRI  NI  VIRT  RES SHR S CPU% MEM%   TIME+ Command
    1 29003 root  41462 41462    0  B3  19  -1 32180  200  16 S  5.6  0.0 0:26.28 `- btrfs scrub start /N316AR6
    1 29005 root      0     0    0  B3  19  -1 32180  200  16 S  0.0  0.0 0:00.03 |  `- btrfs
    1 29004 root  41462 41462    0  id  19  -1 32180  200  16 D  5.6  0.0 0:26.26 |  `- btrfs
I'm reading that as the IO/IORR/IOWR & CPU usage is accounted for in the main process too, rather than both doing I/O & CPU.
Note the active thread is at idle iopriority, nice -1 (as most things spawned from readynasd seem to inherit -1).
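Those classes can be double-checked straight from the shell; for example, with the PIDs from the listing above:
ionice -p 29003    # main process: prints "best-effort: prio 3"
ionice -p 29004    # the thread doing the I/O: prints "idle"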
However, sorting by CPU (K command to show kernel threads), here is everything with CPU > 0.
 PPID   PID USER     IO  IORR IOWR  IO PRI  NI  VIRT  RES  SHR S CPU% MEM%    TIME+ Command
    2 29066 root      0     0    0  B4  20   0     0    0    0 R 39.9  0.0  3:52.37 kworker/u8:10
    2 29043 root      0     0    0  B4  20   0     0    0    0 R 38.2  0.0  4:47.15 kworker/u8:9
    2 28656 root      0     0    0  B4  20   0     0    0    0 R 37.0  0.0  4:37.22 kworker/u8:2
    2 28793 root      0     0    0  B4  20   0     0    0    0 R 37.0  0.0  4:42.18 kworker/u8:3
    2 30086 root      0     0    0  B4  20   0     0    0    0 R 35.8  0.0  1:05.82 kworker/u8:14
    2 28903 root   57.2  57.2    0  B4  20   0     0    0    0 S 33.4  0.0  4:37.18 kworker/u8:8
    2 28071 root      0     0    0  B4  20   0     0    0    0 R 33.4  0.0  4:27.62 kworker/u8:0
    2 29095 root   19.2  19.2    0  B4  20   0     0    0    0 R 31.6  0.0  1:44.06 kworker/u8:13
    2 28902 root      0     0    0  B4  20   0     0    0    0 R 31.0  0.0  4:56.74 kworker/u8:7
    2  1515 root      0     0    0  B4  17  -1     0    0    0 R 25.6  0.0 35h04:21 md127_raid6
    2 27507 root      0     0    0  B4  20   0     0    0    0 R 17.9  0.0  4:31.39 kworker/u8:1
    2 29067 root      0     0    0  B4  20   0     0    0    0 R 17.9  0.0  4:30.13 kworker/u8:11
    1 29003 root  45490 45490    0  B3  19  -1 32180  200   16 S  6.0  0.0  1:12.96 btrfs scrub start /N316AR6
    1 29004 root  45517 45517    0  id  19  -1 32180  200   16 D  6.0  0.0  1:12.87 btrfs
    2 29001 root      0     0    0  ??  39  19     0    0    0 R  3.0  0.0  2:22.40 md127_resync
28517 28522 root      0     0    0  B4  20   0 29460 3688 3004 R  3.0  0.2  0:56.68 htop
    2  1402 root      0     0    0  B0   0 -20     0    0    0 S  1.8  0.0  0:54.53 kworker/1:1H
    2  1401 root      0     0    0  B0   0 -20     0    0    0 S  1.2  0.0  0:50.00 kworker/0:1H
    2  1359 root      0     0    0  B0   0 -20     0    0    0 S  0.6  0.0  3:29.12 kworker/2:1H
    2  1389 root      0     0    0  B0   0 -20     0    0    0 S  0.6  0.0  3:34.20 kworker/3:1H
    2     7 root      0     0    0  B4  20   0     0    0    0 R  0.6  0.0  4:10.65 rcu_sched
    1  3703 nut       0     0    0  B4  20   0 17240 1296  932 S  0.6  0.1  1:31.42 /lib/nut/usbhid-ups -a UPS
Note all the CPU chewed up by kworker threads, nice 0, doing little IO. (kworker IO increased later: similar numbers as above, but with more threads doing IO.)
btaroli's media processes (inferring from the above iopriority B7/B6 - I don't know what they are doing) will have higher nice values (lower CPU priority) and be CPU constrained. Those at B4 will have nice 0, round-robin with all those kworkers, so ~1/12th of the CPU (?5/17th?).
How changing the iopriority of the other two threads can change this I can't fathom ATM.
Perhaps media threads with an 'interactive' workload should run at nice -1 for now.
Longer term there should be a gap between the current -1 processes (readynasd etc.) and the default worker thread priority (currently 0), so intermediate-priority things can fit in the middle?
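In the meantime, bumping one interactive service by hand is straightforward; a sketch using btaroli's dvblink server from the top output earlier in the thread (assuming that's the process being starved):
renice -n -1 -p $(pgrep -f dvblink_server)    # match readynasd's nice -1 so the nice-0 kworkers yield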
I'll repeat this on 6.7.5 when I get around to it, I'm currently juggling 10TB disks upgrading...nothing happens fast...