
Scrub slows to a crawl -- is it SMR causing it?

Sandshark
Sensei

I have a converted ReadyData5200 as my primary NAS.  So, it's got a 2.67GHz quad-core (eight-thread) Xeon X3450 and 8GB of dual-channel RAM, which make it pretty snappy.  A few days ago, it started the first scrub since I had to replace one 6TB WD Red, and I was one of those who ended up with an SMR drive before it became common knowledge that WD had made the switch.  The scrub usually takes a couple of days.  After 2 days, it was at only 82%, and I said to myself that's probably the penalty for SMR, but not really too bad.  I expected it would complete soon.  But then at the end of 3 days, it was only at 87%.  Oddly, SSH showed that the MDADM re-sync had completed, which had not previously happened so far ahead (if at all ahead) of the BTRFS scrub.  Overall network access was clearly being affected far more than during past scrubs, and top showed a lot more kworker and BTRFS activity than I remembered.

 

I then looked at the main log and saw that snapshots seemed to be occurring properly, but only the first set of snapshot trims that usually accompany them had occurred in that 3-day span.  rsync backup jobs were also very bogged down.

 

I issued a btrfs scrub cancel /data from SSH and noticed a flurry of activity in top.  All of the "backed up" snapshot deletions now showed in the log, and the backups completed (very quickly, because they were for shares with little or no change since the last one).  After the flurry died down (<10 minutes), I issued btrfs scrub resume /data, and the scrub is back to progressing at a normal rate.  Network access is also back to typical speed (only slightly slowed by the ongoing scrub).  In just over an hour, the scrub has gone from 87% to 94% complete -- more progress than it made in a full day before I interrupted it.  It's that significant.
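
To put rough numbers on that difference, here is a small sketch of the arithmetic. It assumes the 82% and 87% readings were 24 hours apart and treats "just over an hour" as exactly 1.0 hour, so the resumed rate is, if anything, slightly overstated. The function name is only for illustration.

```python
def scrub_rate(start_pct, end_pct, hours):
    """Average scrub progress in percentage points per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (end_pct - start_pct) / hours


# Day 2 to day 3 of the bogged-down scrub: 82% -> 87% (assumed 24 h apart).
stalled = scrub_rate(82, 87, 24)

# After cancel/resume: 87% -> 94% in roughly one hour (assumed 1.0 h).
resumed = scrub_rate(87, 94, 1.0)

# How many times faster the scrub ran after the backlog cleared.
speedup = resumed / stalled

print(f"stalled: {stalled:.2f} points/hour")
print(f"resumed: {resumed:.2f} points/hour")
print(f"speedup: about {speedup:.0f}x")
```

Under those assumptions the scrub went from about 0.2 points per hour to about 7, a factor of thirty or so.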

 

Is this apparent "task backup" and its effect on a scrub something inherent in BTRFS and/or how ReadyNAS OS uses it (all BTRFS tasks seem to have the same priority of 20 and niceness of 0), perhaps pushed into this situation by the slightly longer scrub due to the SMR drive (assuming that actually extended it)?  Or is the SMR drive itself to blame, and it's its background data re-arranging that backed up?  Since one of the reasons I switched to rack-mount is that I had similar issues with scrubs and even balances on an EDA500, where access is bottlenecked by the eSATA interface, I think it's not just the SMR drive.  But is there anything that can be done about it (by me, or in an OS update)?  If my NAS has this issue with its CPU and RAM, those with lesser hardware are clearly going to see it worse.
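
For anyone who wants to repeat the manual workaround, the cancel-and-resume sequence I described can be sketched roughly as below. This is only an illustration of what I did by hand, not a fix: the function names and the settle_seconds default are made up for the example (the default is a guess based on the roughly ten minutes I saw), and /data is the mount point on my system.

```python
import subprocess
import time


def scrub_cmd(action, mount="/data"):
    """Build the argv for a btrfs scrub subcommand on the given mount point."""
    if action not in ("status", "cancel", "resume"):
        raise ValueError(f"unsupported action: {action}")
    return ["btrfs", "scrub", action, mount]


def pause_and_resume(mount="/data", settle_seconds=600,
                     run=subprocess.run, sleep=time.sleep):
    """Cancel the scrub, wait for the backlog to drain, then resume it.

    settle_seconds is an assumed value, not a documented or tuned one.
    run and sleep are injectable so the sequence can be dry-run tested.
    """
    steps = []
    for action in ("cancel", "resume"):
        cmd = scrub_cmd(action, mount)
        run(cmd, check=True)  # raises CalledProcessError if btrfs fails
        steps.append(cmd)
        if action == "cancel":
            # Give the queued snapshot deletions and backups time to finish.
            sleep(settle_seconds)
    return steps


# On the NAS itself (as root) this would be called as:
#     pause_and_resume("/data")
```

A fixed wait is crude; watching top or the log until the activity dies down, as I did, is probably the safer signal for when to resume.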

 

Yeah, I know the WD Reds aren't rated for a 12-drive NAS.  But I moved them from an RN516 and they seemed to be operating just fine up until this.  The one failure was not premature.

Message 1 of 3
StephenB
Guru

Re: Scrub slows to a crawl -- is it SMR causing it?



This sounds like it might be SMR.  The straight mdadm resync should be handled fairly well, since it is sequential.  But if you mix in other activity (like the btrfs scrub combined with snapshot deletions), the caching strategies don't work well.  

 

But I have no way to test this, since I don't have an SMR internal drive.  Not sure if you've seen this testing: https://www.servethehome.com/wd-red-smr-vs-cmr-tested-avoid-red-smr/

Message 2 of 3
Sandshark
Sensei

Re: Scrub slows to a crawl -- is it SMR causing it?

I had not seen that particular article, but was aware of the basic conclusions.

 

It seems pretty clear that the SMR drive is at least part of the problem.  But with many more users potentially using SMR drives because they are unaware of this issue, I just wonder if Netgear can and will do something that can help.  As I said, this seems very much in line with performance with an EDA500, where a drive I/O bottleneck starts to domino out of control as more and more processes with equal priority are spawned and try to get their "share of the pie".  My NAS is powerful enough that readynasd didn't get locked out, but I have seen that happen on a lesser NAS, cutting off GUI access.

 

I think I'm going to replace the 6TB EFAX with a larger one that's not SMR and just keep the current one as an emergency backup.

Message 3 of 3