

nickjames
Luminary

Volume operations causing slowness across SMB

Greetings!

 

I'm trying to understand the behavior that I'm experiencing: whether or not it's expected and, most importantly, how I can improve it.

We are a graphic design shop. We use a lot of Adobe Illustrator files (*.ai, *.eps, *.pdf, etc.) that range from just a few hundred KB all the way up to 4GB per file.

 

The problem we have experienced is with opening those files, editing them, and then saving them back to the NAS using Illustrator. I understand that this takes time even if the file were on the local machine; however, occasionally when a file is being saved back to the NAS, other users on the NAS will report slower-than-usual file/directory access (SMB). Once the file finishes saving, the issue goes away.

 

When I look at the performance graphs within the webUI of the NAS during business hours, the average is probably under 200 operations per second, with occasional spikes between 500-600 operations per second maybe 2-3 times a day, if that. I would not expect this to be considered high utilization.

 

I have snapshots occurring only twice a day (12am and 12pm), and the users are not complaining about slowness during those times.

 

That said, here is a screenshot from the last occurrence of this happening. Business hours are 7am-5pm, so anything outside of this timeframe is more than likely a backup of some sort. I'm specifically looking at what is going on just before 10am.

 

[Screenshot 008.png: webUI performance graph from the last occurrence]

 

 

 

- Do you think the NAS is really being pushed? I don't, but I need to be sure.
- How can I improve this experience?

 

 

 

Thanks in advance!

Nick

Model: RN51600|ReadyNAS 516 6-Bay
Message 1 of 23
jak0lantash
Mentor

Re: Volume operations causing slowness across SMB

A lot of small files leads to a lot of "access". Two snapshots per day may lead to fragmentation, therefore even more access. You may want to look at a faster RAID level, such as RAID10, or SSDs. Also look at balance and defrag on your volume.

600 IOPS is significant on a RAID5 volume of 6 mechanical HDDs.
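
To put rough numbers on that (a back-of-the-envelope sketch; the per-drive IOPS figure and the read/write mix are assumptions, not measurements from this thread), RAID5 pays a 4x penalty on random writes, so a mechanical array saturates well below its raw IOPS:

```python
# Back-of-the-envelope RAID5 IOPS estimate. The per-drive figure and
# the 30% write mix are assumptions, not measurements from this thread.
PER_DRIVE_IOPS = 75   # rough random IOPS for a 5400-class SATA drive
N_DRIVES = 6          # as assumed above (the OP later clarifies it's 4)
WRITE_PENALTY = 4     # RAID5 random write: read data + read parity,
                      # then write data + write parity
write_fraction = 0.3  # assumed share of writes in the workload

raw_iops = PER_DRIVE_IOPS * N_DRIVES  # back-end IOPS the spindles can do

# Reads cost 1 back-end IO each; writes cost WRITE_PENALTY.
effective = raw_iops / ((1 - write_fraction) + write_fraction * WRITE_PENALTY)
print(f"~{effective:.0f} sustainable front-end IOPS")  # ~237 here
```

On those assumptions, 600 IOPS spikes sit well above what the spindles can comfortably sustain.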

Message 2 of 23
Retired_Member
Not applicable

Re: Volume operations causing slowness across SMB

With SMB connections there is an app, "SMB plus", which could help avoid fragmentation right from the start, when files are written to your volume by your users. Just make sure you set the "Preallocation" option to "enable" after the installation is complete.

 

The benefit would be faster reading of files during user access and faster defragmentation operations in general.

Message 3 of 23
jak0lantash
Mentor

Re: Volume operations causing slowness across SMB

I don't think this would help with fragmentation due to snapshots, and it won't help reduce access time either. But it's worth a try, to measure the difference in user experience.
Message 4 of 23
Retired_Member
Not applicable

Re: Volume operations causing slowness across SMB

Very well put, jak0lantash. However, I got the impression that user experience is key here.

 

Together with a change in the current snapshot policy, nickjames, it could make a significant difference over time. Let me recommend choosing different points in time for snapshots. Or better (assuming there are no night shifts), I would do only one snapshot at the very end of your business day, which could be 24:00. ...And don't do them on weekends (assuming your users are not working through the weekends, I hope 🙂).

 

 

Message 5 of 23
jak0lantash
Mentor

Re: Volume operations causing slowness across SMB

The OP could also try with an SSD and compare the user experience, to try to confirm whether the issue is related to storage throughput and/or IOPS.

Message 7 of 23
nickjames
Luminary

Re: Volume operations causing slowness across SMB

Thanks for all the replies.

 

I guess more information on the setup would have been helpful as well:

- WD Red drives, 4TB each x4 (RAID5)

- Under 10 users accessing the drive at a given moment

- Not using any SSDs currently; we are actually thinking about swapping to a Synology setup for SSD caching (their decision, not mine, but I'm supportive)

- The file structure consists of 10 folders or so. The main "Projects" folder that everyone works out of is 700GB and contains 22,000 folders in its root (this does not include subfolders). Could the file/directory structure be improved? i.e., folders used to break up the alphabet: A-D, E-H, I-J, etc.

 

I have the snapshots set up, from my understanding, pretty ideally, and I don't think that is the problem, but then again, maybe I don't understand how snapshots work. Originally we were taking snapshots once an hour because we wanted hourly protection. This turned into a performance problem, and I could see it right away on the performance graphs in the webUI. Since then, I have reverted the snapshots to only twice a day (12am/12pm). We've had problems where a file was incorrectly edited and we needed to restore it from earlier that day, so that is why we have this interval.

 

Users are not complaining about slowness between 12pm-1pm (which is the snapshot/prune timeframe); they are complaining about it when a user opens a 2GB Illustrator file, edits the file and then writes it back to the NAS. Accessing the volume during this save procedure is difficult for the other user(s) on the network. I was hoping that from the screenshot I provided, we could get a better idea as to what should be expected/unexpected with the given setup (you don't have the right RAID, you don't have enough memory, the NAS is too small, etc. - what is the bottleneck?).

 

This is a Monday-Friday, 7am-5pm shop. The NAS works great now that I have the snapshots where I need them. It's just the random times when the NAS is "slow" and "sluggish" outside the hours of 12pm-1pm, which is a given due to the snapshot being taken.

 

As @jak0lantash said, "600 IOPS is significant on a RAID5 volume of 6 mechanical HDDs" - if this holds true, perhaps we have outgrown the mechanical hard drives? Maybe SSD caching is the next step? That is what I was hoping to find out with my post. Is 600 IOPS the realistic maximum for mechanical drives? Should I set that as my ceiling? How can I improve this?

 

Thanks in advance.

 

 

Message 8 of 23
cpu8088
Virtuoso

Re: Volume operations causing slowness across SMB

WD Red is a very slow drive at 5x00 RPM, not suitable for commercial use. You need enterprise-grade drives.

 

RAID 5 with 4 disks is slow. Consider a RAID 0 array with 2 drives.

 

Use another NAS for a daily nighttime backup instead of relying on snapshots.

 

 

 

Message 9 of 23
nickjames
Luminary

Re: Volume operations causing slowness across SMB

Thanks for the reply, @cpu8088; however, I don't see the snapshots being a problem at 10am. Am I misunderstanding how snapshots are made/utilized? My assumption is that when the snapshot is being created (12pm), the volume is busy making those snapshot(s), but once it finishes, it shouldn't be using the disk, correct? Later that same hour, closer to 1pm, the snapshots are pruned. In my book, 12pm-1pm will see slow access times as expected due to this schedule, but any time before 12pm should not be affected in terms of performance due to snapshots, right?

 

Noted on the Red drives being slow -- what enterprise drives do you suggest?

I thought Red was best for NAS systems, or is it just a fancy marketing name/logo?

Message 10 of 23
cpu8088
Virtuoso

Re: Volume operations causing slowness across SMB

Snapshot actions involve reads and writes on the same disk. If you use a separate NAS for backup, the actions are read-only on the source and then read/write on the disks in the separate NAS.

 

If your disk has a mechanical or electrical failure, your snapshots will be gone, so snapshots are not a good form of backup.

 

Regarding enterprise drives, I use WD Re or Gold. Red is just a modified desktop Green with some vibration control. Red Pro and Se are better; Re or Gold is top.

 

With all the defrag, scrub, and balancing that Btrfs involves, the Reds' life span is very short.

Message 11 of 23
nickjames
Luminary

Re: Volume operations causing slowness across SMB

Thanks @cpu8088. Noted.

 

I did forget to mention, we do back up via rsync nightly to another RN516. We use this as the actual backup but like the snapshots for same-day issues. I suppose we could pull the previous day's file off that device, but a midday snapshot is still needed.
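
For reference, the nightly job amounts to something along these lines (a sketch only; the hostname and paths are hypothetical, and the real setup may use the ReadyNAS backup-job UI rather than a script):

```python
# Sketch of a nightly rsync push to the second RN516. The hostname and
# paths are hypothetical; scheduled outside business hours, e.g. 1am.
import subprocess

subprocess.run(
    [
        "rsync",
        "-a",        # archive mode: recurse, preserve perms/times
        "--delete",  # mirror deletions to the backup copy
        "/data/Projects/",
        "backup-nas:/data/Projects/",  # the second RN516, over SSH
    ],
    check=True,  # raise if rsync exits nonzero
)
```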

 

How does the midday snapshot/prune between 12pm-1pm affect disk usage at 10am, though?

Message 12 of 23
mdgm-ntgr
NETGEAR Employee Retired

Re: Volume operations causing slowness across SMB

We do have an additional RAID option for 6-bays in 6.7.x, namely RAID-50. 8-bays and above now also have the option to use RAID-60.

 

If you've got a high workload RAID-5 probably isn't the best choice. I'd consider using RAID-50.

Message 13 of 23
StephenB
Guru

Re: Volume operations causing slowness across SMB

Also, if you aren't using NIC bonding, you might want to try that. Network congestion at peak usage times might also be a factor.


@mdgm wrote:

 

If you've got a high workload RAID-5 probably isn't the best choice. I'd consider using RAID-50.


There is some basic information on RAID-50 here:  http://www.techrepublic.com/blog/the-enterprise-cloud/raid-50-offers-a-balance-of-performance-storag...  If you try that, you'll go with a 6 disk array (with 4 disks you'd use RAID-10 - which is also an option if you are replacing the drives anyway).  
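
To make the capacity side of that trade-off concrete, here is a quick sketch (assuming 6x4TB drives and textbook write penalties; the real IOPS benefit of RAID-50 comes from splitting writes across two parity groups):

```python
# Rough capacity/write-penalty comparison for the layouts discussed.
# Assumes 6 x 4TB drives and textbook penalties; real-world IOPS also
# depend on how many RAID groups share the load.
DRIVES, SIZE_TB = 6, 4

layouts = {
    # name: (usable fraction of raw capacity, random-write penalty)
    "RAID-5":  ((DRIVES - 1) / DRIVES, 4),
    "RAID-50": ((DRIVES - 2) / DRIVES, 4),  # two 3-disk RAID-5 groups
    "RAID-10": (0.5, 2),
}
for name, (frac, penalty) in layouts.items():
    print(f"{name:8} usable {DRIVES * SIZE_TB * frac:.0f} TB, "
          f"write penalty {penalty}x")
```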

 

I'm not keen on RAID-0 for a production shop that needs high availability - so I'd disagree with @cpu8088 on that idea. But I agree that Red Pros or Golds will give better performance on small file transfers.

 

 

But since you have 2 free slots, you could alternatively switch to Flex-RAID and add a RAID-1 SSD volume (2x1TB), shifting your main project share to the SSD volume. You can do that without rebuilding the current volume. That's also cheaper than replacing all your mechanical disks.

 

Message 14 of 23
TeknoJnky
Hero

Re: Volume operations causing slowness across SMB

A lot of good discussion and recommendations here.

 

As Stephen mentioned, I suspect your first bottleneck is network.

 

Most x86 nas devices are capable of faster than gbit ethernet.

 

I suspect that what is happening during your sluggish period is that the file being saved is fully utilizing the network connection, causing the other connections to be sluggish.

 

It's similar to how, if you've ever used BitTorrent without speed restrictions or tried to stream HD video across a slow internet connection, it's hard for any other connections to get through at the same time.

 

So I would look into a switch that supports NIC bonding; bonding can nearly double your throughput across multiple connections.

 

Regarding the folder with 22k subfolders: I would definitely recommend splitting those up into smaller groups as you mentioned. For my media storage I have 0-9, ABCDEFG, HIJKLMN, OPQRSTU, VWXYZ folders to split my movies across.
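
As an illustration of that kind of split (a sketch only; the mount path and bucket boundaries here are made up, so test on a copy of the data before touching a live share):

```python
# Bucket a flat "Projects" directory into alphabetical subfolders,
# similar to the scheme described above. Illustrative sketch only.
import shutil
from pathlib import Path

PROJECTS = Path("/mnt/nas/Projects")  # hypothetical mount point
BUCKETS = ["ABCDEFG", "HIJKLMN", "OPQRSTU", "VWXYZ"]
bucket_dirs = {f"{b[0]}-{b[-1]}" for b in BUCKETS} | {"0-9"}

def bucket_for(name: str) -> str:
    first = name[0].upper()
    for b in BUCKETS:
        if first in b:
            return f"{b[0]}-{b[-1]}"  # e.g. "A-G"
    return "0-9"  # digits and anything else

for entry in sorted(PROJECTS.iterdir()):
    if not entry.is_dir() or entry.name in bucket_dirs:
        continue  # skip plain files and already-created bucket folders
    dest = PROJECTS / bucket_for(entry.name)
    dest.mkdir(exist_ok=True)
    shutil.move(str(entry), str(dest / entry.name))
```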

 

But the 22k folders should only affect performance when a user accesses that project folder. Consider what happens: a user opens the project folder, and the NAS has to respond with all the files/folders in it - it has to read the directory entries off the array, then process and send that information across the network; the local PC then has to read that info and display it in Explorer or whatever application dialog requested the folder/file list.

 

Frankly I am amazed anyone can find anything in a folder with 22k items in it.

 

However, the 22k folders should not really have any bearing on the performance of saving already-opened files, or on accessing other parts of the array filesystem.

 

Finally, another option that I don't think has been mentioned, but which might also help alleviate the performance issues, is putting more RAM in your NAS.

 

Extra RAM will be used as additional cache/buffer to help cushion against high workloads.

 

Message 15 of 23
ctechs
Apprentice

Re: Volume operations causing slowness across SMB

As others have stated, WD Reds are about the slowest NAS drives around. Nothing wrong with them, and they're priced accordingly, but they're relatively slow. We use WD SE drives in our 516.

 

Given that the slowness you report seems to involve long sequential writes (e.g. saving a 2GB file) - even with the Red drives, with a ReadyNAS 516 your first limitation is almost certainly a network bottleneck if you're only using a single NIC on the 516. You need to get a managed switch that can do LACP and bond both gigabit NICs on the 516. That way, when one of your (gigabit, I'm assuming) network clients is saving a huge file and saturating one of the gigabit links (only ~100MB/sec), there's at least some hope for some file I/O to still happen over the second link. But even bonded gigabit NICs will net you 200MB/sec theoretical peak, which isn't terribly fast if you're doing big file I/O.
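
The arithmetic behind those figures, as a quick sketch (the ~94% payload efficiency is an assumption about typical TCP/IP and Ethernet framing overhead):

```python
# Quick gigabit throughput arithmetic. The overhead factor is an
# assumption about typical TCP/IP + Ethernet framing costs.
LINK_BPS = 1_000_000_000          # one gigabit link
raw_mb_s = LINK_BPS / 8 / 1e6     # 125 MB/s on the wire
payload = raw_mb_s * 0.94         # ~117 MB/s after protocol overhead
print(f"raw {raw_mb_s:.0f} MB/s, usable ~{payload:.0f} MB/s per link, "
      f"~{2 * payload:.0f} MB/s across a 2-link bond (and only when "
      f"multiple clients are active)")
```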

 

If large file manipulation is the name of the game and you have the resources, you'll probably want a 528X: use 7x00RPM drives like the Gold (or Se, or Red Pro), set them up with RAID-50, and connect it to a switch with 10-gigabit capabilities. Maybe even hook up some of the heavy-user clients with 10-gigabit NICs too.

Message 16 of 23
nickjames
Luminary

Re: Volume operations causing slowness across SMB

Thank you everyone for taking a moment to comment.

 

I never thought about bonding the NICs, to be honest, as I thought the disk speed would be the bottleneck before the NIC. At any rate, I know how to set up LAGs and am using them elsewhere on the network switches (GS748Tv5), so why not set up bonding since the network supports it.

In terms of the disks (Red vs. Gold/Pro), I will keep that in mind moving forward.

 

A lot of good ideas here in terms of RAID-50 vs. setting up a second, RAID-1 volume using SSDs for the production folder. I will take this into account as well. This might be a lot cheaper than buying all new Gold/Pro disks. The new RAID-1 volume seems like the easier solution, perhaps, too.

I will keep you guys posted. I appreciate all the information.

 

I will also look into the RAM situation. I know RAM comes into play here as well; I just never thought about opening up the device to upgrade.

Message 17 of 23
ctechs
Apprentice

Re: Volume operations causing slowness across SMB

Even though they are on the slower side, 4TB Reds can still sustain over 130MB/s sequential writes, which means your NIC is maxed out every time a user writes a big file to disk, squeezing out competing traffic. Definitely the cheapest/easiest thing you can do to improve responsiveness is to bond the NICs, since you already have managed switches.

 

http://hdd.userbenchmark.com/WD-Red-4TB-2013/Rating/3525

Message 18 of 23
TeknoJnky
Hero

Re: Volume operations causing slowness across SMB

Another thing to consider regarding disks:

 

As mentioned above, many disks can easily max out gigabit on single-file saves.

 

But what separates the big dogs from the little dogs is performance under mixed/multiple-user loads, which is much more difficult for physical disks to handle well.

 

I.e., saving one large file fast is easy.

 

Saving a large file while reading/writing multiple other users' smaller files is more complex and difficult, which is what you saw earlier with the ~600 IOPS spikes.

 

That is why enterprise and SAS disks are often utilized in data centers/heavy-duty servers: they are better able to handle complex multiuser workloads. (Note: while similar, SAS drives are different from, and generally far more expensive than, the SATA drives used in the NAS.)

 

So take that into consideration when you upgrade the drives.

 

But definitely make use of network bonding, since it sounds like you already have the hardware for it.

 

Message 19 of 23
nickjames
Luminary

Re: Volume operations causing slowness across SMB

I got the NIC bonding all set up tonight. I went with LACP Layer 2+3. I was moving a few files around just for fun and was getting between 98-116MB/sec via SMB. I thought I would see more, though? My test PC is on switch A, which has a (gigabit) LACP LAG to switch B, where the RN516 is.

 

I guess I should have tested before bonding the NICs to see what I was getting. What should I expect to see?

Message 20 of 23
jak0lantash
Mentor

Re: Volume operations causing slowness across SMB

This is expected. LACP is good for multiple devices.
To determine which Ethernet link to send a packet through, LACP calculates a hash. Here you chose L2+L3. Because you're transferring data from a single client to a single NAS, the hash inputs are always the same, therefore the hash is always the same, therefore the Ethernet link used is always the same. That's why you're capped at the throughput of a single link.

If you want to test the gain in throughput, you need to test from multiple clients simultaneously.
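
For the curious, the link selection works roughly like this (a sketch of the idea only; real switches and bonding drivers use their own hash formulas):

```python
# Rough sketch of layer 2+3 transmit hashing on a 2-link LACP bond.
# Real implementations (e.g. the Linux bonding driver) use their own
# formulas; this just shows why a single client <-> NAS pair always
# lands on the same physical link.

def xmit_link(src_mac: str, dst_mac: str, src_ip: str, dst_ip: str,
              n_links: int = 2) -> int:
    """Pick a link index from the L2+L3 header fields."""
    key = hash((src_mac, dst_mac, src_ip, dst_ip))
    return key % n_links

# Same client and NAS -> same inputs -> same link, for every packet:
print(xmit_link("aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:ff",
                "192.168.1.10", "192.168.1.2"))
# A second client hashes independently and may land on the other link:
print(xmit_link("aa:bb:cc:dd:ee:02", "aa:bb:cc:dd:ee:ff",
                "192.168.1.11", "192.168.1.2"))
```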
Message 21 of 23
jak0lantash
Mentor

Re: Volume operations causing slowness across SMB

Also, the throughput is only as good as the slowest link in the path. So if your test PC is connected to switch A via a single Ethernet cable, you won't see more than 1Gbps (so <125MB/s, minus overhead).
Message 22 of 23
StephenB
Guru

Re: Volume operations causing slowness across SMB


@jak0lantash wrote:
This is expected. LACP is good for multiple devices.
To determine which Ethernet link to send a packet through, LACP calculates a hash. Here you chose L2+L3. Because you're transferring data from a single client to a single NAS, the hash inputs are always the same, therefore the hash is always the same, therefore the Ethernet link used is always the same. That's why you're capped at the throughput of a single link.

If you want to test the gain in throughput, you need to test from multiple clients simultaneously.

Exactly so.  LACP is working as designed.  If you aggregate 1 gbps links, the maximum flow to each client connection through the bond is intentionally limited to 1 gbps.  That prevents packet loss at layers 1 and 2. 

 

Your original use case was that when one "primary" user was copying a large file, all other users were blocked until that was complete. LACP should reduce how often that happens, though for each of the other users there's still a 50-50 chance that LACP's hash will put them on the same NIC as the "primary" user creating the congestion. For those users the bandwidth starvation will still happen.

 

You could try a different aggregation mode (with the NAS using ALB, perhaps) - that has different downsides, but might work out better.

 

10-gigabit Ethernet in the server is the best technical answer, but that requires at least a 52x NAS (and of course a 10-gigabit switch).

Message 23 of 23