Forum Discussion
jlehtinen (Aspirant)
Mar 29, 2013
ESXi reports "All Paths Down" for ReadyNAS hosted NFS share
Hiya - looking for some feedback from the community on an issue I'm seeing. Thanks in advance for any insights.
Some background:
We're using two ReadyNAS 3200's to host virtual machines via NFS.
ESXi hosts are running ESXi 5.1.
The ReadyNAS units are running 4.2.19, and have "adaptive load balancing" set on the NICs.
Issue:
I'm seeing some of the ESXi hosts report that NFS shares enter the "All Paths Down" state for 6-7 seconds before leaving that state and reconnecting. This happens for BOTH ReadyNAS units, and on 9 ESXi hosts - with no solid pattern in which host is impacted OR which ReadyNAS shows as "All Paths Down". It DOES appear to be related to the current load on the ReadyNAS. For example, if I start a backup job, I can expect to see this error on at least 3-4 ESXi hosts. I believe this has been happening for a while without anyone noticing - but it caused a HUGE issue 2 weeks ago, when one of the ReadyNAS units entered and exited the "All Paths Down" state nonstop while backups were running. (I opened a support case with Netgear and submitted the logs, but they could not explain why this happened.)
Current theory:
From what I can tell, adaptive load balancing causes the ReadyNAS to change which MAC address (and NIC) receives a certain share of the overall traffic. My guess is that when I run backups (or do anything else load intensive), the ReadyNAS attempts to load balance some of the traffic going to the ESXi hosts. The resulting change in the MAC address reported to the ESXi host causes ESXi to report "All Paths Down" briefly, until the new MAC address/NIC resolves correctly.
The issue we experienced must have been due to a glitch or bug in the load balancing, which caused the ReadyNAS to fail to "stabilize" the load balancing correctly. I was only able to stabilize the unit by power cycling it.
Questions:
1.) Does this sound like a plausible theory? My current thinking is I should disable load balancing and go to active-backup configuration to see if this resolves the issue.
2.) Will a firmware update resolve this issue? I reviewed the firmware patch notes and none of them mention NFS stability with NIC teaming.
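One way to check the theory independently of ESXi is to probe the ReadyNAS from another machine on the same network while a backup runs and see whether dropouts of a similar length show up. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the unit accepts TCP connections on the standard NFS port 2049. A successful TCP connect shows the port is reachable, not that NFS I/O is being served, so this is a rough cross-check and not a replica of how ESXi detects APD.

```python
import socket
import time
from typing import Callable, List, Tuple

NFS_PORT = 2049  # standard NFS port; assumed to be what the ReadyNAS serves NFS on


def tcp_probe(host: str, port: int = NFS_PORT, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (start, end) outage windows."""
    outages = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe opens a window
        elif ok and start is not None:
            outages.append((start, ts))  # first good probe closes it
            start = None
    if start is not None:
        # still unreachable at the last sample
        outages.append((start, samples[-1][0]))
    return outages


def monitor(host: str, duration_s: float, interval_s: float = 1.0,
            probe: Callable[[str], bool] = tcp_probe) -> List[Tuple[float, bool]]:
    """Probe host every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        samples.append((time.time(), probe(host)))
        time.sleep(interval_s)
    return samples

# Example (hypothetical address), run during a backup window:
#   for start, end in find_outages(monitor("192.0.2.10", duration_s=3600)):
#       print(f"unreachable for {end - start:.1f}s starting {time.ctime(start)}")
```

If outage windows of roughly 6-7 seconds line up with the APD timestamps in the ESXi logs, that would point at the network/NIC layer; if the probe stays clean while ESXi still reports APD, the NFS service itself under load becomes the more likely suspect.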
22 Replies
- jlehtinen (Aspirant)
As an update...
I have:
  - replaced the switch that is used to connect the ReadyNAS units to the ESXi hosts.
  - disabled load balancing on the ReadyNAS units.
  - upgraded firmware to the most recent version.
The issue still persists and we're still getting the APD error in ESXi.
- diegoaze (Aspirant)
Hi jlehtinen,
Have you, by any chance, got this issue sorted? We have had the same issue here for as long as I can remember.
We use ReadyNAS 4200 units as NFS datastores for VMs running virtual backup appliances.
During the backup window, the ReadyNAS is under intensive I/O and I believe it can't handle the load. It happens if more than 2 backups are running at the same time.
We have 10Gb NICs set up as a team, with the Active/Backup policy.
I have upgraded the firmware over and over, but none of the versions has fixed the issue. The currently installed version is RAIDiator 4.2.23.
The ESXi hosts are running 5.1 U1, but the issue happened when the hosts were running 5.0 as well.
We have 3 ReadyNAS devices attached to ESXi hosts and the issue happens on all of them, on some more often than on others.
- chirpa (Luminary)
Maybe try this: viewtopic.php?f=118&t=65374
- jlehtinen (Aspirant)
chirpa wrote: Maybe try this: http://www.readynas.com/forum/viewtopic ... 18&t=65374
chirpa, thanks for the reply. Unless I'm missing something, that's a post on how to adjust settings to maximize IOPS for a very specific NIC configuration (4 NIC MPIO) in an environment that is using iSCSI.
It doesn't really relate to the situation described in this post.
- jlehtinen (Aspirant)
diegoaze wrote: Hi jlehtinen,
Have you, by any chance, got this issue sorted? We are having same issue here since ever.
We use ReadyNAS 4200 as NFS datastores for VMs running virtual backup appliances.
During backup window, ReadyNAS is under intensive I/O and I believe it cant handle the load. It will happen if more than 2 backups are running at the same time.
We have 10GB NICs setup as a Team, with Active/Backup policy.
I have upgraded firmware on it over and over, but none of them have fixed the issue. Currently installed version is RAIDiator 4.2.23.
ESXi are running 5.1 U1, but the issue happened when hosts were running 5.0 as well.
We have 3 ReadyNAS devices attached to ESXi hosts and the issue happens on all of them, ones more often then the others.
diegoaze
No luck on resolving it. A tech had me do (another) chassis replacement on one of the 3200's, so I've been monitoring that unit.
The storage with the replaced chassis has actually been OK so far, but I'm not sure if that's a coincidence.
This issue seems to get worse over time - the longer the storage is on, and the more data gets pushed on/off, the worse they get - to the point where a reboot of the storage is required to resolve connectivity. It is possible that the unit with the replacement chassis is working better because it was just rebooted, and the issues will manifest again over time.
- jlehtinen (Aspirant)
Just an update: the unit where I completed a chassis replacement started giving this error again on the ESXi hosts while backups were running.
- diegoaze (Aspirant)
In my case I don't really think it is an issue with the chassis or a hardware fault, as it happens on 3 boxes that are connected to ESXi hosts.
We have another 3 units that are not connected to vSphere, but I am not sure whether the issue also happens there. We use them to replicate data from the first 3 units, using the ReadyNAS Replicate plugin.
Are there any logs I can look at for further troubleshooting? I am really not good with Linux stuff.
- jlehtinen (Aspirant)
I don't think it's related to hardware either, as it happens on all 3 of our ReadyNAS units. However, chassis replacement is what the support technicians recommended. So far I've done a chassis replacement on two of the units in the last year and it hasn't helped.
You can get at the logs in FrontView by going to Status > Logs > Download all logs. I'm no Linux expert either, so I don't really have a more specific direction for troubleshooting. I feel like it's got to be a configuration issue either with NFS or with the NICs...
- diegoaze (Aspirant)
I already had a look at the logs downloaded from FrontView but could not see much. I was expecting to see an NFS-specific log of some sort.
I contacted Netgear 2 days ago and was told to change the load balance mode to TLB (Transmit Load Balance). I was also told my array was built a long time ago, when the device was running a pretty old firmware version. The recommendation from Support was to back up everything and factory reset the appliance. I should then recreate the array using X-RAID (which it already uses, BTW) and copy the data back. The only issue here is that our ReadyNASes (6 in total) range from 6TB to 12TB of used space each. It will take ages for me to accomplish this. And the Netgear tech told me this is something to be done every 1 or 2 years.
I have already changed the network load balance option (on one device only) and will see what happens. The next step would be to disable teaming altogether!
- jlehtinen (Aspirant)
Our units were originally in Adaptive Load Balancing. I changed them to Active Backup and still get the error. I have not tried disabling teaming completely.
I was also told by tech support that a complete re-build on these units is expected to be completed every 1-2 years. This was news to me. I think they switched from MBR to GPT partitions, but the disks maintain the MBR partitioning unless you do a factory reset and re-build the array.
I've got a plan in place to try a full rebuild on one of our units, but it's going to take a while. I don't know how you could do a yearly re-build on these unless you keep one storage array as a "spare" and use it to store data while you are re-building another array.
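To put a rough number on how long a rebuild cycle might take, the copy time can be estimated from the used space and a sustained transfer rate. This is a back-of-the-envelope sketch only: the 100 MB/s figure is an assumption (nobody in the thread reports measured throughput), the function name is made up, and the estimate ignores the factory reset and array resync time.

```python
def copy_hours(terabytes: float, throughput_mb_s: float) -> float:
    """Hours needed to copy `terabytes` (decimal TB) at a sustained rate in MB/s."""
    megabytes = terabytes * 1_000_000  # 1 TB = 1,000,000 MB (decimal units)
    return megabytes / throughput_mb_s / 3600


ASSUMED_RATE_MB_S = 100.0  # assumption, not a measured value from this thread

# 6 TB and 12 TB are the used-space range mentioned earlier in the thread.
for tb in (6, 12):
    one_way = copy_hours(tb, ASSUMED_RATE_MB_S)
    # Data has to be copied off the unit and then back onto it.
    print(f"{tb} TB: {one_way:.1f} h one way, {2 * one_way:.1f} h off and back")
```

At that assumed rate, a 6 TB unit needs about 16.7 hours in each direction and a 12 TB unit about 33.3 hours, before counting the rebuild itself, which is why a spare array to hold the data looks close to mandatory for a yearly cycle.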