ReadyNAS 4312X SMB shares intermittently hang/freeze

Question

ReadyNAS 4312X, running firmware version 6.10.10.

Disks are 6TB 7200rpm SATA drives, a mix of WD and Toshiba.

Network connection is bond0 over eth4/eth5 (the two 10Gbps NICs), configured as LACP (802.3ad) with Layer 2 hashing. The upstream switches are Meraki MS355s, configured for link aggregation.

The NAS is joined to Active Directory. SMBv3 encryption was set to "Desired"; I changed it to "Disabled", but that made no difference, so I'll probably set it back to "Desired".

Every so often, and quite randomly, the SMB share becomes unreachable. For example, Python code reading from or writing to the NAS fails with "OSError: [Errno 22] Invalid argument"; another tool reports "Can't find the file" for a file it read successfully on the previous iteration.
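One way to put numbers on these failures from the client side is to wrap each I/O call so every failure is logged with its errno and the total time spent failing is recorded. This is a minimal sketch, not the poster's actual code: `retry_on_oserror` and its parameters are hypothetical, and it assumes the drops surface in Python as `OSError` (which covers both Errno 22 and `FileNotFoundError`). It does not fix anything; it only makes the outage length measurable.

```python
import errno
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def retry_on_oserror(
    op: Callable[[], T],
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[T, int, float]:
    """Call op(); on OSError, log it, wait `delay` seconds and try again.

    Returns (result, failures, seconds_spent_failing). Re-raises the last
    OSError once `attempts` consecutive calls have failed.
    """
    failures = 0
    first_failure = 0.0
    while True:
        try:
            result = op()
        except OSError as exc:  # also catches FileNotFoundError, PermissionError, ...
            failures += 1
            if failures == 1:
                first_failure = time.monotonic()
            name = errno.errorcode.get(exc.errno, "?")
            print(f"{time.strftime('%H:%M:%S')} attempt {failures} failed: "
                  f"errno {exc.errno} ({name}): {exc.strerror}")
            if failures >= attempts:
                raise
            sleep(delay)
        else:
            # Time from the first failure until the call finally succeeded.
            elapsed = (time.monotonic() - first_failure) if failures else 0.0
            return result, failures, elapsed
```

For example, the existing `with open(...)` read could be moved into a small function `read_one(path)` and passed as `lambda: read_one(path)`. A nonzero `failures` count with an elapsed time of roughly five to ten seconds would line up with the outage length reported in the thread.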

top and iostat on the NAS show reasonable iowait (<4%), tps in the mid-teens per disk at peak, and per-disk read/write rates well under SATA maximums. Truncated sample:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.48    6.93    4.05    0.68    0.00   83.86

Device:     tps   kB_read/s   kB_wrtn/s   kB_read   kB_wrtn
sda       15.40      121.60      669.40       608      3347
sdb       15.20      115.20      669.40       576      3347
...  (12 disks in total)

I don't see any errors in the /var/log/samba directory on the NAS, and the switches are not reporting flapping or interface issues. Drop counters on the NICs are 0; the rx_no_dma_resources counter on each is in the hundreds of thousands, though it doesn't seem to be rising. I've already increased the ring buffer sizes with ethtool.
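Whether rx_no_dma_resources is really static can be checked by taking two `ethtool -S` snapshots some time apart, ideally spanning one of the drops. This is a minimal sketch, assuming Python 3 and ethtool are available wherever it runs (not confirmed for ReadyNAS OS) and that it runs as root; `watch`, `counter_deltas`, and the extra counter name `rx_missed_errors` are hypothetical choices, and counter names vary by NIC driver.

```python
import subprocess
import time
from typing import Dict, Iterable, Optional

# rx_no_dma_resources comes from the post; rx_missed_errors is an assumed
# extra counter (present on some Intel drivers). Missing names report None.
WATCHED = ("rx_no_dma_resources", "rx_missed_errors")


def parse_ethtool_stats(text: str) -> Dict[str, int]:
    """Turn `ethtool -S <iface>` output into {counter_name: value}."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.strip().rpartition(":")
        if not sep:
            continue
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # e.g. the "NIC statistics:" header line
    return stats


def read_stats(iface: str) -> Dict[str, int]:
    """Run `ethtool -S iface` and parse the result."""
    out = subprocess.run(["ethtool", "-S", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_stats(out)


def counter_deltas(before: Dict[str, int], after: Dict[str, int],
                   names: Iterable[str]) -> Dict[str, Optional[int]]:
    """How much each named counter grew between two snapshots (None if absent)."""
    return {n: after[n] - before[n] if n in before and n in after else None
            for n in names}


def watch(ifaces=("eth4", "eth5"), names=WATCHED, interval=60.0) -> None:
    """Snapshot each interface, wait `interval` seconds, then print the growth."""
    before = {i: read_stats(i) for i in ifaces}
    time.sleep(interval)
    for i in ifaces:
        print(i, counter_deltas(before[i], read_stats(i), names))
```

Calling `watch()` while the Python job is running would show whether the counter moves during load. Any growth that coincides with an outage would be worth following up on the NIC/driver side rather than in Samba.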

I'm looking for ideas at this point. If more information would help, I'll gather it.

EMF2 · Answer

Further information:

When the connection drops, it's down for five or ten seconds for all clients at the same time. If a client isn't actively reading from or writing to the share, it doesn't even notice.

Date and time are configured to sync with the DCs, and they match.

There are no other apps on the device. NFS and rsync are enabled to support some backup jobs, but they are almost completely unused.

Audit logs are turned on, but they contain only logins/logoffs, which don't seem to correlate with the drops.

This network configuration predates the dropping-share problem, but it might still somehow be relevant. Bond0 carries subinterfaces for access segmentation and service control:

bond0 (native VLAN 6): IP address 192.168.6.X/255.255.255.0/192.168.6.1, DNS name nas1.sec.domain.local
bond0.2 (trunk VLAN 2): IP address 192.168.2.X/255.255.255.0/192.168.2.1, DNS name nas1.domain.local
bond0.7 (trunk VLAN 7): IP address 192.168.7.X/255.255.255.0/192.168.7.1, DNS name nas1.dev.domain.local

SSH, HTTPS, and similar services are blocked by access lists at the switches on every VLAN except VLAN 6, and VLAN 6 is reachable only through a firewalled management portal.
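Since every client loses the share for five to ten seconds at once, a timestamped record of exactly when the NAS stops accepting connections could help line the drops up against switch and NAS logs. This is a minimal sketch, assuming SMB is served on TCP port 445 and that `nas1.domain.local` (from the interface list) is the name clients use; `smb_port_open` and `monitor` are hypothetical helpers. A successful TCP connect only shows that something is accepting connections on port 445, not that smbd can actually serve files, so treat it as a coarse signal.

```python
import socket
import time


def smb_port_open(host: str, port: int = 445, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failure
        return False


def monitor(host: str, port: int = 445, interval: float = 0.5) -> None:
    """Poll host:port forever and print a timestamped line on each up/down change."""
    was_up = True
    down_since = 0.0
    while True:
        up = smb_port_open(host, port)
        now = time.monotonic()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if was_up and not up:
            down_since = now
            print(f"{stamp} {host}:{port} unreachable", flush=True)
        elif up and not was_up:
            print(f"{stamp} {host}:{port} reachable again after "
                  f"{now - down_since:.1f}s", flush=True)
        was_up = up
        time.sleep(interval)
```

Running `monitor("nas1.domain.local")` from a client on each VLAN would show whether all subinterfaces drop together. The timing resolution is roughly the connect timeout plus the polling interval, which is enough to distinguish a five-second outage.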

StephenB · Answer

Just wondering: is this happening when the client is starting to communicate with the NAS? Or does it appear to be happening mid-flight?

EMF2 · Answer

I know this is going to seem like a non-answer, but it's not: Yes to both, sort of.

Sometimes it happens in the middle of a large file. But the Python process I mentioned is processing literally tens of thousands of 2.5MB files, and when it breaks, it stays broken for the next few seconds of attempts. One could be driving the other, though: if the small-file processing is running while a large file is transferring, the large transfer will *also* fail.

At the risk of leading the witness, this looks like one of two things. It could be some sort of load issue or service crash on the NAS, where the system just can't keep up with demand, but nothing in the logs or in the iostat/vmstat/netstat output supports that. Or it could be a network issue, where the connection is somehow being renegotiated. Nothing in the switch logs shows flapping, and the NAS logs don't mention any service interruptions, but I'll freely admit I might not be looking in the right places, especially on the NAS.

StephenB · Answer

EMF2 wrote:
But the Python process I mention is processing literally tens of thousands of 2.5MB files, and when it breaks, it breaks for the next few seconds of attempts.

There are some Linux limits that might be kicking in. The maximum number of open files is 65536; is there any chance the script is leaving some open?
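To test the open-files theory on the NAS side rather than the client, the descriptor count of each smbd process can be compared with that process's own limit as reported in /proc. Samba normally forks one smbd per client connection, so the per-process count is the one that matters. This is a minimal sketch, assuming it runs as root on the NAS with Python 3 available (not confirmed for ReadyNAS OS); all helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Tuple


def open_fd_count(pid: int) -> int:
    """Number of file descriptors currently open in process `pid`."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid: int) -> Tuple[str, str]:
    """(soft, hard) 'Max open files' limits for `pid`; either may be 'unlimited'."""
    for line in Path(f"/proc/{pid}/limits").read_text().splitlines():
        if line.startswith("Max open files"):
            parts = line.split()  # ['Max', 'open', 'files', soft, hard, 'files']
            return parts[3], parts[4]
    raise ValueError(f"no 'Max open files' line for pid {pid}")


def pids_named(name: str) -> List[int]:
    """PIDs whose /proc/<pid>/comm equals `name` (e.g. 'smbd')."""
    pids = []
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() == name:
                pids.append(int(entry.name))
        except OSError:
            continue  # process exited or is not readable
    return pids


def report(name: str = "smbd") -> None:
    """Print open-fd count vs. soft/hard limit for every process called `name`."""
    for pid in pids_named(name):
        try:
            count = open_fd_count(pid)
            soft, hard = max_open_files(pid)
        except OSError as exc:  # usually PermissionError when not root
            print(f"{name} pid {pid}: cannot read ({exc})")
            continue
        print(f"{name} pid {pid}: {count} open fds (soft {soft}, hard {hard})")
```

Calling `report()` during a heavy run of the Python job would show whether any smbd process is approaching its limit.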
If you go into the share settings -> network access -> SMB -> advanced, you'll see a "Strict Sync" setting. You could try turning that on and see whether it resolves the issue. (Performance might slow down, though.)

EMF2 · Answer

The way the Python process works, it should close each file right after processing it (it uses "with open(filename, 'w') as f:"). lsof shows counts in the low 20,000 range. Unfortunately, "Strict Sync" is already enabled on this particular share.

A couple of my users just raised my stress level: both lost their connection to the share entirely at the same time, and they had to reboot their machines to get it back. That makes me suspect even more strongly that something on the NAS is going sideways.
