6.6.1 Scrub Hammers CPU
And when I say hammers, I mean it runs with -1 prio and causes 6-8 kernel worker threads each vying to consume 100% of CPU. So rabid is this consumption that all other background processes, including one's third party apps and Time Machine backups, just cease to function.
So with all the attention to being a good neighbor during resyncs and whatnot, why is scrub such a terrible neighbor? I'd love to enable it to run every month or quarter, but I can't stand to have my NAS more or less inoperable for my needs while it's running.
Re: 6.6.1 Scrub Hammers CPU
Which model is this on?
Can you send in your logs (see the Sending Logs link in my sig)?
Re: 6.6.1 Scrub Hammers CPU
Sure, I'd be glad to. This is on a 528.
Just for the sake of completeness, here's top with just background running...
top - 03:34:19 up 3 days, 49 min, 2 users, load average: 0.01, 0.03, 0.06
Tasks: 241 total, 2 running, 239 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.4 us, 0.7 sy, 0.0 ni, 98.8 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16303964 total, 15086028 used, 1217936 free, 2468 buffers
KiB Swap: 3139580 total, 0 used, 3139580 free. 13522476 cached Mem

  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
11971 root  20   0 3584900 356660 27084 S  4.0  2.2 182:46.02 /apps/dvblink-tv-server/dvblink_server
10318 root  20   0  233308  31464  5660 S  1.0  0.2  40:27.23 /usr/bin/python /apps/dropboxmanager/web/manage.py run+
 5138 root  20   0  992516  12028  8796 S  0.3  0.1  18:19.58 /opt/p2p/bin/leafp2p -n
 5419 root  20   0   28788   3060  2468 R  0.3  0.0   7:33.54 top
22590 root  20   0       0      0     0 S  0.3  0.0   0:00.30 [kworker/2:6]
    1 root  20   0  202460   6504  4516 S  0.0  0.0   0:38.41 /sbin/init
    2 root  20   0       0      0     0 S  0.0  0.0   0:00.10 [kthreadd]
    3 root  20   0       0      0     0 S  0.0  0.0   0:06.33 [ksoftirqd/0]
    5 root   0 -20       0      0     0 S  0.0  0.0   0:00.00 [kworker/0:0H]
    7 root  20   0       0      0     0 R  0.0  0.0   0:57.85 [rcu_sched]
    8 root  20   0       0      0     0 S  0.0  0.0   0:00.00 [rcu_bh]
    9 root  rt   0       0      0     0 S  0.0  0.0   0:00.21 [migration/0]
   10 root  rt   0       0      0     0 S  0.0  0.0   0:00.83 [watchdog/0]
   11 root  rt   0       0      0     0 S  0.0  0.0   0:00.85 [watchdog/1]
   12 root  rt   0       0      0     0 S  0.0  0.0   0:00.24 [migration/1]
   13 root  20   0       0      0     0 S  0.0  0.0   0:03.26 [ksoftirqd/1]
   15 root   0 -20       0      0     0 S  0.0  0.0   0:00.00 [kworker/1:0H]
And here's what it looks like shortly after kicking off a scrub.
top - 03:39:04 up 3 days, 53 min, 2 users, load average: 2.67, 0.65, 0.26
Tasks: 249 total, 7 running, 242 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 95.1 sy, 0.0 ni, 3.5 id, 0.0 wa, 0.0 hi, 0.9 si, 0.0 st
KiB Mem: 16303964 total, 15111680 used, 1192284 free, 2468 buffers
KiB Swap: 3139580 total, 0 used, 3139580 free. 13537484 cached Mem

  PID USER  PR  NI    VIRT    RES   SHR S %CPU %MEM     TIME+ COMMAND
14083 root  20   0       0      0     0 R 69.1  0.0   0:10.26 [kworker/u8:2]
18741 root  20   0       0      0     0 R 59.1  0.0   0:10.26 [kworker/u8:0]
22107 root  20   0       0      0     0 R 55.5  0.0   0:08.28 [kworker/u8:1]
23262 root  20   0       0      0     0 R 55.1  0.0   0:08.74 [kworker/u8:9]
23253 root  20   0       0      0     0 R 51.5  0.0   0:05.02 [kworker/u8:7]
 9426 root  20   0       0      0     0 R 50.1  0.0   0:09.49 [kworker/u8:5]
 2455 root  20   0       0      0     0 S 18.9  0.0   1:37.18 [md126_raid6]
23229 root  19  -1   40340    212    12 S 10.0  0.0   0:02.30 btrfs scrub start /data
18976 root  20   0       0      0     0 S  4.3  0.0   0:02.26 [kworker/u8:8]
11971 root  20   0 3584900 356660 27084 S  3.7  2.2 182:56.16 /apps/dvblink-tv-server/dvblink_server
 5335 root  19  -1 1589984  59196 12228 S  2.0  0.4   5:41.17 /usr/sbin/readynasd -v 3 -t
10318 root  20   0  233308  31464  5660 S  1.7  0.2  40:29.84 /usr/bin/python /apps/dropboxmanager/web/manage.py run+
23226 root  39  19       0      0     0 D  1.3  0.0   0:00.35 [md126_resync]
 2335 root   0 -20       0      0     0 S  0.7  0.0   0:01.65 [kworker/1:1H]
 5138 root  20   0  992516  12028  8796 S  0.7  0.1  18:20.82 /opt/p2p/bin/leafp2p -n
22590 root  20   0       0      0     0 S  0.7  0.0   0:01.27 [kworker/2:6]
 2340 root   0 -20       0      0     0 S  0.3  0.0   0:27.69 [kworker/0:1H]
Re: 6.6.1 Scrub Hammers CPU
Hi btaroli,
Have you sent your logs already?
If you have, I will send an inquiry to our subject matter expert about it.
Regards,
Re: 6.6.1 Scrub Hammers CPU
Sent them attn to mdgm the same time I posted my reply.
Re: 6.6.1 Scrub Hammers CPU
Hi btaroli,
Okay, I will give mdgm a heads up about your case.
Regards,
Re: 6.6.1 Scrub Hammers CPU
There have been some changes in 6.7.0, I think. Do you still have this problem on ReadyNASOS 6.7.0-T158 (Beta 1)?
Re: 6.6.1 Scrub Hammers CPU
I'd certainly be willing to try it. Given some pain in the last upgrade or two, I've been a little hesitant to rush into betas. I'll take a look at the release notes and forum posts on this one and then give it a try if it's looking fairly quiet on the problem front.
Re: 6.6.1 Scrub Hammers CPU
I'm seeing this same issue on my RN316 running 6.6.1 with 6x4TB drives in a RAID 5. All 4 cores are running at 95-100%. Did you update to 6.7.1 yet? If so, did that improve the scrub efficiency?
Re: 6.6.1 Scrub Hammers CPU
I'm going to try 6.7.1 tonight after reading through the forums to see if there have been any post-upgrade issues. I only learned about 6.7.1 after seeing mentions of it over at DVBLogic's user forums...
Re: 6.6.1 Scrub Hammers CPU
If you like, try running the scrub with the antivirus service disabled.
Re: 6.6.1 Scrub Hammers CPU
I gather 6.7.1 fixes that, but baby steps I shall take in re-enabling functions. 😄
Re: 6.6.1 Scrub Hammers CPU
@Retired_Member wrote: If you like, try running the scrub with the antivirus service disabled.
I have never used the antivirus feature on my ReadyNAS units.
Re: 6.6.1 Scrub Hammers CPU
We experience the "bad neighbor" scrub behavior too - I don't dare schedule a scrub during business hours: the responsiveness of the ReadyNAS over SMB suffers pretty severely. CPU load average 2.5-3 during a scrub on a ReadyNAS 516. OS 6.7.1, have never used the antivirus feature.
Can this be tamed down?
Re: 6.6.1 Scrub Hammers CPU
Exactly! It'd be great to be able to throttle the scrub somehow. Maybe a selectable priority, or limit it to a single thread/core.
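For anyone comfortable with SSH, the btrfs command itself does expose an I/O-priority knob when you start a scrub by hand. A rough sketch, assuming the data volume is mounted at /data (note this only governs disk scheduling, not how hard the kernel worker threads hit the CPU):

# Start the scrub in the idle I/O class so it yields to other disk activity
btrfs scrub start -c 3 /data

# Check progress at any time
btrfs scrub status /data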
Re: 6.6.1 Scrub Hammers CPU
Well, this is confirmed to still be happening in 6.7.1. I observe that overall I/O and wait time seem OK; indeed, the journal entries that pop up as the job starts suggest it's throttling on I/O rate. However, the kernel worker processes still monopolize the CPU cores/threads. A certain amount of application CPU usage seems OK, but if you have anything like Plex transcodes or AFP-based Time Machine, which cause a fair amount of CPU activity themselves, then those processes get starved to the point of being almost unusable.
Some of this is just Btrfs behavior, which I can compare to similar operations I do on even newer kernels on other Linux machines. But when you have a server environment where there is an expectation of responsiveness from applications, it can be problematic. On this front, the only thing I would consider a standard OS issue is Time Machine backups. These are CPU intensive and will be seriously delayed, if not fail outright, based on previous painful experiences. In this run I'm not allowing TM to even trigger until the scrub finishes; I know for a fact I'd wind up having to trash my whole backup archive if I let it try and fail.
As for Plex, I can work around the issue by enabling direct play and disabling transcoding in the client config. But hopefully we get to a point where scrubs will gracefully butt out when other activity requires CPU attention.
Re: 6.6.1 Scrub Hammers CPU
That's unfortunate to hear. I was hoping that it would be somewhat better. I think it's going on day four for my scrub at about 90% complete now.
Are you using compression on any of your volumes/shares?
Re: 6.6.1 Scrub Hammers CPU
I am running compression on one of my shares. I think I'll copy that data off to a new uncompressed share. I'll run another scrub to see what happens.
Re: 6.6.1 Scrub Hammers CPU
Hmm. I've read that I/O throttling changed in more recent releases. I started a scrub while transcoding a show using Plex. In the past, this would have immediately resulted in complaints that the server couldn't keep up with the transcode. In two hours I've had one such complaint, but overall it's been doing alright.
Re: 6.6.1 Scrub Hammers CPU
I just updated to 6.7.5 a few days ago, so I'm interested to see what happens with my next scrub. I will also be adding an EDA500 with 5x4TB drives in a few days, so this will be interesting.
Re: 6.6.1 Scrub Hammers CPU
@Laserbait wrote:
I just updated to 6.7.5 a few days ago, so I'm interested to see what happens with my next scrub.
I just ran one on my 526X. It took about 39 hours for 4x6TB RAID-5. The file indexing feature in 6.8.0 was also actively indexing.
Re: 6.6.1 Scrub Hammers CPU
@Laserbait wrote:
6.8.0? Is that out already?
It's at Beta 2.
Re: 6.6.1 Scrub Hammers CPU
From another thread... once you start the scrub, find the process running "btrfs scrub ..." and check it with ionice (ionice -p ####) to see if it's set to idle. If it's "none", then try "ionice -p #### -c 3" (which sets it to class 3/idle) and see how this affects performance. I noticed a significant improvement in CPU-heavy tasks (e.g. transcoding) while the scrub runs.
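A minimal sketch of that suggestion, assuming the scrub is already running, you're root over SSH, and <PID> is whatever the first command prints:

# Find the PID of the running scrub command
ps -ef | grep "[b]trfs scrub"

# Show its current I/O scheduling class
ionice -p <PID>

# If it reports "none", drop it to the idle class (class 3)
ionice -c 3 -p <PID>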
Maybe 6.8.0 fixes this, but I'm a bit leery of beta releases of late. It would be nice to know if this is addressed there though.
Re: 6.6.1 Scrub Hammers CPU
I'm currently on 6.7.5 and running a scrub. I do not see a process that shows btrfs scrub.
ps -A | grep -i btrfs
1356 ? 00:00:00 btrfs-worker
1358 ? 00:00:00 btrfs-worker-hi
1359 ? 00:00:00 btrfs-delalloc
1360 ? 00:00:00 btrfs-flush_del
1361 ? 00:00:00 btrfs-cache
1362 ? 00:00:00 btrfs-submit
1363 ? 00:00:00 btrfs-fixup
1364 ? 00:00:00 btrfs-endio
1365 ? 00:00:00 btrfs-endio-met
1366 ? 00:00:00 btrfs-endio-met
1367 ? 00:00:00 btrfs-endio-rai
1368 ? 00:00:00 btrfs-endio-rep
1369 ? 00:00:00 btrfs-rmw
1370 ? 00:00:00 btrfs-endio-wri
1371 ? 00:00:00 btrfs-freespace
1372 ? 00:00:00 btrfs-delayed-m
1373 ? 00:00:00 btrfs-readahead
1374 ? 00:00:00 btrfs-qgroup-re
1375 ? 00:00:00 btrfs-extent-re
1376 ? 00:00:00 btrfs-cleaner
1377 ? 00:00:26 btrfs-transacti
1487 ? 00:00:00 btrfs-worker
1488 ? 00:00:00 btrfs-worker-hi
1489 ? 00:00:00 btrfs-delalloc
1490 ? 00:00:00 btrfs-flush_del
1491 ? 00:00:00 btrfs-cache
1492 ? 00:00:00 btrfs-submit
1493 ? 00:00:00 btrfs-fixup
1494 ? 00:00:00 btrfs-endio
1495 ? 00:00:00 btrfs-endio-met
1496 ? 00:00:00 btrfs-endio-met
1497 ? 00:00:00 btrfs-endio-rai
1498 ? 00:00:00 btrfs-endio-rep
1499 ? 00:00:00 btrfs-rmw
1500 ? 00:00:00 btrfs-endio-wri
1501 ? 00:00:00 btrfs-freespace
1503 ? 00:00:00 btrfs-delayed-m
1504 ? 00:00:00 btrfs-readahead
1505 ? 00:00:00 btrfs-qgroup-re
1506 ? 00:00:00 btrfs-extent-re
2884 ? 00:02:07 btrfs-cleaner
2885 ? 09:55:28 btrfs-transacti
26453 ? 00:17:54 btrfs <defunct>
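One guess: ps -A only prints the short command name, so the userspace scrub process never matches the string "btrfs scrub" literally. Matching against full command lines, or asking btrfs itself, may be more reliable; a quick sketch, with the volume path /data assumed:

# Search full command lines instead of short names
ps -ef | grep "[s]crub"

# Or query the scrub state directly
btrfs scrub status /data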