

btaroli
Prodigy

6.6.1 Scrub Hammers CPU

And when I say hammers, I mean it runs with -1 prio and causes 6-8 kernel worker threads each vying to consume 100% of CPU. So rabid is this consumption that all other background processes, including one's third party apps and Time Machine backups, just cease to function.

 

So with all the attention to being a good neighbor during resyncs and whatnot, why is scrub such a terrible neighbor? I'd love to enable it to run every month or quarter, but I can't stand to have my NAS more or less inoperable for my needs while it's running.
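
For anyone who wants to reproduce the observation, a minimal sketch over SSH as root (the grep pattern just matches the kernel workers and the md126 RAID device that show up in the top output further down):

# show the nice value (NI) and scheduling class the scrub process runs with
ps -o pid,ni,cls,cmd -C btrfs
# one-shot snapshot of the kernel worker threads eating CPU during the scrub
top -b -n 1 | grep -E 'kworker|md126'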

Model: RN51600|ReadyNAS 516 6-Bay
Message 1 of 37
mdgm-ntgr
NETGEAR Employee Retired

Re: 6.6.1 Scrub Hammers CPU

Which model is this on?

Can you send in your logs (see the Sending Logs link in my sig)?

Message 2 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

Sure, I'd be glad to. This is on a 528.

 

Just for the sake of completeness, here's top with just background running...

 

top - 03:34:19 up 3 days, 49 min,  2 users,  load average: 0.01, 0.03, 0.06
Tasks: 241 total,   2 running, 239 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.7 sy,  0.0 ni, 98.8 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  16303964 total, 15086028 used,  1217936 free,     2468 buffers
KiB Swap:  3139580 total,        0 used,  3139580 free. 13522476 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                
11971 root      20   0 3584900 356660  27084 S   4.0  2.2 182:46.02 /apps/dvblink-tv-server/dvblink_server                 
10318 root      20   0  233308  31464   5660 S   1.0  0.2  40:27.23 /usr/bin/python /apps/dropboxmanager/web/manage.py run+
 5138 root      20   0  992516  12028   8796 S   0.3  0.1  18:19.58 /opt/p2p/bin/leafp2p -n                                
 5419 root      20   0   28788   3060   2468 R   0.3  0.0   7:33.54 top                                                    
22590 root      20   0       0      0      0 S   0.3  0.0   0:00.30 [kworker/2:6]                                          
    1 root      20   0  202460   6504   4516 S   0.0  0.0   0:38.41 /sbin/init                                             
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.10 [kthreadd]                                             
    3 root      20   0       0      0      0 S   0.0  0.0   0:06.33 [ksoftirqd/0]                                          
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/0:0H]                                         
    7 root      20   0       0      0      0 R   0.0  0.0   0:57.85 [rcu_sched]                                            
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 [rcu_bh]                                               
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.21 [migration/0]                                          
   10 root      rt   0       0      0      0 S   0.0  0.0   0:00.83 [watchdog/0]                                           
   11 root      rt   0       0      0      0 S   0.0  0.0   0:00.85 [watchdog/1]                                           
   12 root      rt   0       0      0      0 S   0.0  0.0   0:00.24 [migration/1]                                          
   13 root      20   0       0      0      0 S   0.0  0.0   0:03.26 [ksoftirqd/1]                                          
   15 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 [kworker/1:0H]                                         

And here's what it looks like shortly after kicking off a scrub.

 

top - 03:39:04 up 3 days, 53 min,  2 users,  load average: 2.67, 0.65, 0.26
Tasks: 249 total,   7 running, 242 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us, 95.1 sy,  0.0 ni,  3.5 id,  0.0 wa,  0.0 hi,  0.9 si,  0.0 st
KiB Mem:  16303964 total, 15111680 used,  1192284 free,     2468 buffers
KiB Swap:  3139580 total,        0 used,  3139580 free. 13537484 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                
14083 root      20   0       0      0      0 R  69.1  0.0   0:10.26 [kworker/u8:2]                                         
18741 root      20   0       0      0      0 R  59.1  0.0   0:10.26 [kworker/u8:0]                                         
22107 root      20   0       0      0      0 R  55.5  0.0   0:08.28 [kworker/u8:1]                                         
23262 root      20   0       0      0      0 R  55.1  0.0   0:08.74 [kworker/u8:9]                                         
23253 root      20   0       0      0      0 R  51.5  0.0   0:05.02 [kworker/u8:7]                                         
 9426 root      20   0       0      0      0 R  50.1  0.0   0:09.49 [kworker/u8:5]                                         
 2455 root      20   0       0      0      0 S  18.9  0.0   1:37.18 [md126_raid6]                                          
23229 root      19  -1   40340    212     12 S  10.0  0.0   0:02.30 btrfs scrub start /data                                
18976 root      20   0       0      0      0 S   4.3  0.0   0:02.26 [kworker/u8:8]                                         
11971 root      20   0 3584900 356660  27084 S   3.7  2.2 182:56.16 /apps/dvblink-tv-server/dvblink_server                 
 5335 root      19  -1 1589984  59196  12228 S   2.0  0.4   5:41.17 /usr/sbin/readynasd -v 3 -t                            
10318 root      20   0  233308  31464   5660 S   1.7  0.2  40:29.84 /usr/bin/python /apps/dropboxmanager/web/manage.py run+
23226 root      39  19       0      0      0 D   1.3  0.0   0:00.35 [md126_resync]                                         
 2335 root       0 -20       0      0      0 S   0.7  0.0   0:01.65 [kworker/1:1H]                                         
 5138 root      20   0  992516  12028   8796 S   0.7  0.1  18:20.82 /opt/p2p/bin/leafp2p -n                                
22590 root      20   0       0      0      0 S   0.7  0.0   0:01.27 [kworker/2:6]                                          
 2340 root       0 -20       0      0      0 S   0.3  0.0   0:27.69 [kworker/0:1H]                                         
Message 3 of 37
FramerV
NETGEAR Employee Retired

Re: 6.6.1 Scrub Hammers CPU

Hi btaroli,

 

Have you sent your logs already?

 

I will send an inquiry to our subject matter expert about it if you have.

Regards,

Message 4 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

Sent them attn to mdgm the same time I posted my reply. 

Message 5 of 37
FramerV
NETGEAR Employee Retired

Re: 6.6.1 Scrub Hammers CPU

Hi btaroli,

 

Okay, I will give mdgm a heads up about your case.

Regards,

Message 6 of 37
mdgm-ntgr
NETGEAR Employee Retired

Re: 6.6.1 Scrub Hammers CPU

There have been some changes in 6.7.0, I think. Do you still have this problem on ReadyNASOS 6.7.0-T158 (Beta 1)?

Message 7 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

I'd certainly be willing to try it. Given some pain in the last upgrade or two, I've been a little shy about rushing into betas. I'll take a look at the release notes and forum posts on this one and then give it a try if it's looking fairly quiet on the problem front. 

Message 8 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU

I'm seeing this same issue on my RN316 running 6.6.1 with 6x4TB drives in RAID 5. All 4 cores are running at 95-100%. Did you update to 6.7.1 yet? If so, did that improve the scrub efficiency?

Model: RN31600|ReadyNAS 300 Series 6- Bay
Message 9 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

I'm going to try 6.7.1 tonight after reading through the forums to see if there have been any post-upgrade issues. I only learned about 6.7.1 after seeing mentions of it over at DVBLogic's user forums...

Message 10 of 37
Retired_Member
Not applicable

Re: 6.6.1 Scrub Hammers CPU

If you like, try running the scrub with the antivirus service disabled.

Message 11 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

I haven't had the AV service enabled for years now. I was hoping to do so after it was replaced recently, but the false positive issue reversed that.

I gather 6.7.1 fixes that, but baby steps I shall take in re-enabling functions. 😄
Message 12 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU


@Retired_Member wrote:

If you like, try running the scrub with the antivirus service disabled.


I have never used the AntiVirus on my ReadyNAS's.

Message 13 of 37
ctechs
Apprentice

Re: 6.6.1 Scrub Hammers CPU

We experience the "bad neighbor" scrub behavior too - I don't dare schedule a scrub during business hours: the responsiveness of the ReadyNAS over SMB suffers pretty severely. CPU load average 2.5-3 during a scrub on a ReadyNAS 516. OS 6.7.1, have never used the antivirus feature.

 

Can this be tamed down?

Model: RN51600|ReadyNAS 516 6-Bay
Message 14 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU

Exactly! It'd be great to be able to throttle the scrub somehow.  Maybe a selectable priority, or limit it to a single thread/core.
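
For what it's worth, the btrfs command line does expose IO-priority options on scrub; whether they help with the CPU-bound kworker threads on this firmware is untested. A hedged sketch, run as root over SSH, with /data being the data volume as in the earlier output:

# start a scrub with idle IO priority (class 3) instead of the default best-effort
btrfs scrub start -c 3 /data
# check progress and error counts at any time
btrfs scrub status /data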

Message 15 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

Well, this is confirmed to still be happening in 6.7.1. I observe that overall I/O and wait time seem OK. Indeed, the journal entries that pop up as the job starts suggest it's throttling on I/O rate. However, the kernel worker processes still monopolize the CPU cores/threads. A certain amount of application CPU usage seems OK, but if you have anything like Plex transcodes or AFP-based Time Machine, which cause a fair amount of CPU activity themselves, then those processes get starved to the point of being almost unusable.

 

Some of this is just Btrfs behavior, which I can compare to similar operations I do on even newer kernels on other Linux machines. But when you have a server environment where there is an expectation of responsiveness from applications, it can be problematic. On this front, the only thing I would consider a standard OS issue is Time Machine backups. These are CPU intensive and, based on previous painful experiences, will be seriously delayed if they don't fail outright. In this run I'm not allowing TM to even trigger until the scrub finishes. I know for a fact I'll wind up having to trash my whole backup archive if I let it try and fail. 

 

As for Plex, I can work around the issue by enabling direct play and disabling transcoding in the client config. But hopefully we get to a point where scrubs gracefully butt out when other activity requires CPU attention. 
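
One related workaround, as a minimal sketch (assuming SSH access and /data as the volume; btrfs remembers the scrub position, so a cancelled scrub resumes from where it stopped):

# pause the scrub before a Time Machine backup window
btrfs scrub cancel /data
# later, continue from where it left off
btrfs scrub resume /data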

Message 16 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU

That's unfortunate to hear.  I was hoping that it would be somewhat better.   I think it's going on day four for my scrub at about 90% complete now.

Are you using compression on any of your volumes/shares?

Message 17 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU

I am running compression on one of my shares.  I think I'll copy that data off to a new uncompressed share.  I'll run another scrub to see what happens.

Message 18 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

Hmm. I've read that IO throttling changed in more recent releases. I started a scrub while transcoding a show using Plex. In the past, this would have immediately resulted in complaints that the server couldn't keep up with the transcode. In two hours I've had one such complaint. But overall it's been doing alright. 

Message 19 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU

I just updated to 6.7.5 a few days ago, so I'm interested to see what happens with my next scrub. I also will be adding an EDA500 with 5x4TB drives in a few days, so this will be interesting. 

Message 20 of 37
StephenB
Guru

Re: 6.6.1 Scrub Hammers CPU


@Laserbait wrote:

I just updated to 6.7.5 a few days ago, so I'm interested to see what happens with my next scrub.  


I just ran one on my 526X. It took about 39 hours for 4x6TB RAID-5. The file indexing feature in 6.8.0 was also actively indexing.

Message 21 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU

6.8.0? Is that out already?

 

Message 22 of 37
StephenB
Guru

Re: 6.6.1 Scrub Hammers CPU


@Laserbait wrote:

6.8.0? Is that out already?

 


It's at Beta 2.

Message 23 of 37
btaroli
Prodigy

Re: 6.6.1 Scrub Hammers CPU

from another thread... once you start the scrub, find the process running "btrfs scrub ...." and check it with ionice (ionice -p ####) to see if it's set to idle. If it's "none", then try "ionice -p #### -c 3" (which sets it to class 3/idle) and see how that affects performance. I noticed a significant improvement in CPU-heavy tasks (e.g. transcoding) while the scrub runs.
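
Concretely, something along these lines (a sketch; pgrep and ionice are assumed to be available on the firmware, and #### stands for whatever PID your scrub gets):

# find the PID of the running "btrfs scrub start" process
pgrep -af 'btrfs scrub'
# show its current IO scheduling class
ionice -p ####
# drop it to the idle class (3) so foreground IO wins
ionice -c 3 -p ####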

 

Maybe 6.8.0 fixes this, but I'm a bit leery of beta releases of late. It would be nice to know if this is addressed there though.

Message 24 of 37
Laserbait
Luminary

Re: 6.6.1 Scrub Hammers CPU

I'm currently on 6.7.5 and running a scrub. I do not see a process that shows btrfs scrub.

 

 ps -A | grep -i btrfs
 1356 ?        00:00:00 btrfs-worker
 1358 ?        00:00:00 btrfs-worker-hi
 1359 ?        00:00:00 btrfs-delalloc
 1360 ?        00:00:00 btrfs-flush_del
 1361 ?        00:00:00 btrfs-cache
 1362 ?        00:00:00 btrfs-submit
 1363 ?        00:00:00 btrfs-fixup
 1364 ?        00:00:00 btrfs-endio
 1365 ?        00:00:00 btrfs-endio-met
 1366 ?        00:00:00 btrfs-endio-met
 1367 ?        00:00:00 btrfs-endio-rai
 1368 ?        00:00:00 btrfs-endio-rep
 1369 ?        00:00:00 btrfs-rmw
 1370 ?        00:00:00 btrfs-endio-wri
 1371 ?        00:00:00 btrfs-freespace
 1372 ?        00:00:00 btrfs-delayed-m
 1373 ?        00:00:00 btrfs-readahead
 1374 ?        00:00:00 btrfs-qgroup-re
 1375 ?        00:00:00 btrfs-extent-re
 1376 ?        00:00:00 btrfs-cleaner
 1377 ?        00:00:26 btrfs-transacti
 1487 ?        00:00:00 btrfs-worker
 1488 ?        00:00:00 btrfs-worker-hi
 1489 ?        00:00:00 btrfs-delalloc
 1490 ?        00:00:00 btrfs-flush_del
 1491 ?        00:00:00 btrfs-cache
 1492 ?        00:00:00 btrfs-submit
 1493 ?        00:00:00 btrfs-fixup
 1494 ?        00:00:00 btrfs-endio
 1495 ?        00:00:00 btrfs-endio-met
 1496 ?        00:00:00 btrfs-endio-met
 1497 ?        00:00:00 btrfs-endio-rai
 1498 ?        00:00:00 btrfs-endio-rep
 1499 ?        00:00:00 btrfs-rmw
 1500 ?        00:00:00 btrfs-endio-wri
 1501 ?        00:00:00 btrfs-freespace
 1503 ?        00:00:00 btrfs-delayed-m
 1504 ?        00:00:00 btrfs-readahead
 1505 ?        00:00:00 btrfs-qgroup-re
 1506 ?        00:00:00 btrfs-extent-re
 2884 ?        00:02:07 btrfs-cleaner
 2885 ?        09:55:28 btrfs-transacti
26453 ?        00:17:54 btrfs <defunct>
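
An easier way to confirm whether a scrub is actually still running, as a sketch (/data is assumed to be the data volume mount point, as elsewhere in this thread):

# ask btrfs directly rather than hunting through ps
btrfs scrub status /data
# or match the full command line, since ps -A only shows truncated names
pgrep -af 'btrfs scrub'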

Message 25 of 37