
Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

jlehtinen
Aspirant

ESXi reports "All Paths Down" for ReadyNAS hosted NFS share

Hiya - looking for some feedback from the community on an issue I'm seeing. Thanks in advance for any insights.

Some background:
We're using two ReadyNAS 3200's to host virtual machines via NFS.
ESXi hosts are running ESXi 5.1.
The ReadyNAS units are running 4.2.19, and have "adaptive load balancing" set on the NICs.

Issue:
I'm seeing some of the ESXi hosts report that NFS shares enter "All Paths Down" state for 6-7 seconds, before exiting this status and reconnecting. This happens for BOTH ReadyNAS units, and on 9 ESXi hosts - with no solid pattern on which host is impacted OR which ReadyNAS shows as "All Paths Down". It DOES appear to be related to the current load on the ReadyNAS. For example, if I start a backup job, I can expect to see this error on 3-4 ESXi hosts at least. I believe this has been happening for awhile without anyone noticing - but it caused a HUGE issue 2 weeks ago, when one of the ReadyNAS units entered/exited "All Paths Down" state nonstop while backups were running. (I opened a support case with Netgear and submitted the logs but they could not explain why this happened.)

Current theory:
From what I can tell, adaptive load balancing causes the ReadyNAS to change what MAC address (and NIC) is receiving traffic for a certain percentage of the overall traffic. It's my guess that when I run backups (or do anything else load intensive), the ReadyNAS attempts to load balance some of the traffic going to the ESXi hosts. The resulting change to the MAC address being reported to the ESXi host causes ESXi to report "all paths down" briefly before the new MAC address/NIC resolves correctly.

The issue we experienced must have been due to a glitch or bug in the load balancing, which caused the ReadyNAS to fail to "stabilize" the load balancing correctly. I was only able to stabilize the unit by power cycling it.

Questions:
1.) Does this sound like a plausible theory? My current thinking is I should disable load balancing and go to active-backup configuration to see if this resolves the issue.

2.) Will a firmware update resolve this issue? I reviewed the firmware patch notes and none of them mention NFS stability with NIC teaming.
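
For anyone who wants to put numbers on the "6-7 seconds" pattern described above, here is a minimal sketch that pairs each "enter APD" event with the matching "exit" event and reports how long each outage lasted. Everything in it is an assumption for illustration: the sample events are made up, the host and datastore names are hypothetical, and the timestamps are expected to be copied by hand from the ESXi event list rather than parsed from any particular log format.

```python
from datetime import datetime

# Hypothetical events, copied by hand from the ESXi event list.
# Each entry: (timestamp, host, datastore, "enter" or "exit").
events = [
    ("2013-05-01 22:00:05", "esx01", "nas1", "enter"),
    ("2013-05-01 22:00:11", "esx01", "nas1", "exit"),
    ("2013-05-01 22:03:40", "esx02", "nas2", "enter"),
    ("2013-05-01 22:03:47", "esx02", "nas2", "exit"),
]

def apd_durations(events):
    """Pair each 'enter' with the next 'exit' for the same host and datastore."""
    open_since = {}
    outages = []
    # ISO-style timestamps sort correctly as plain strings.
    for stamp, host, datastore, kind in sorted(events):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        key = (host, datastore)
        if kind == "enter":
            # Keep the earliest unmatched 'enter' for this host/datastore pair.
            open_since.setdefault(key, when)
        elif kind == "exit" and key in open_since:
            start = open_since.pop(key)
            outages.append((host, datastore, start, (when - start).total_seconds()))
    return outages

for host, datastore, start, seconds in apd_durations(events):
    print(f"{host} {datastore} {start} {seconds:.0f}s")
```

If the durations cluster tightly around the same value regardless of host, that would point at a timeout or failover interval rather than random packet loss, though this sketch alone cannot tell the two apart.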
Message 1 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

As an update...

I have:
- replaced the switch that is used to connect the ReadyNAS units to the ESXi hosts.
- disabled load balancing on the ReadyNAS units.
- upgraded firmware to most recent version.

Issue still persists and we're still getting the APD error in ESXi.
Message 2 of 23
diegoaze
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Hi jlehtinen,

Have you, by any chance, got this issue sorted? We have had the same issue here for as long as I can remember.

We use ReadyNAS 4200 units as NFS datastores for VMs running virtual backup appliances.

During the backup window, the ReadyNAS is under intensive I/O and I believe it can't handle the load. It happens if more than 2 backups are running at the same time.

We have 10Gb NICs set up as a team, with the Active/Backup policy.

I have upgraded the firmware on it over and over, but none of the versions has fixed the issue. The currently installed version is RAIDiator 4.2.23.

The ESXi hosts are running 5.1 U1, but the issue happened when the hosts were running 5.0 as well.

We have 3 ReadyNAS devices attached to ESXi hosts and the issue happens on all of them, some more often than others.
Message 3 of 23
chirpa
Luminary

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Maybe try this: viewtopic.php?f=118&t=65374
Message 4 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

chirpa wrote:
Maybe try this: http://www.readynas.com/forum/viewtopic ... 18&t=65374


chirpa, thanks for the reply. Unless I'm missing something, that's a post on how to adjust settings to maximize IOPS for a very specific NIC configuration (4 NIC MPIO) in an environment that is using iSCSI.

It doesn't really relate to the situation described in this post.
Message 5 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

diegoaze wrote:
Hi jlehtinen,

Have you, by any chance, got this issue sorted? We have had the same issue here for as long as I can remember.

We use ReadyNAS 4200 units as NFS datastores for VMs running virtual backup appliances.

During the backup window, the ReadyNAS is under intensive I/O and I believe it can't handle the load. It happens if more than 2 backups are running at the same time.

We have 10Gb NICs set up as a team, with the Active/Backup policy.

I have upgraded the firmware on it over and over, but none of the versions has fixed the issue. The currently installed version is RAIDiator 4.2.23.

The ESXi hosts are running 5.1 U1, but the issue happened when the hosts were running 5.0 as well.

We have 3 ReadyNAS devices attached to ESXi hosts and the issue happens on all of them, some more often than others.



diegoaze

No luck on resolving it. A tech had me do (another) chassis replacement on one of the 3200's, so I've been monitoring that unit.

The storage with the replaced chassis has actually been OK so far, but I'm not sure if that's a coincidence.

This issue seems to get worse over time - the longer the storage is on, and the more data gets pushed on/off, the worse they get - to the point where a reboot of the storage is required to resolve connectivity. It is possible that the unit with the replacement chassis is working better because it was just rebooted, and the issues will manifest again over time.
Message 6 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Just an update, the unit where I completed a chassis replacement started giving this error again in the ESXi hosts while backups were running.
Message 7 of 23
diegoaze
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

In my case I don't really think it is an issue with the chassis or a hardware fault, as it happens on 3 boxes that are connected to ESXi hosts.

We have another 3 units that are not connected to vSphere, but I am not sure whether the issue also happens there. We use them to replicate data from the first 3 units, using the ReadyNAS Replicate plugin.

Are there any logs I can look at for further troubleshooting? I am really not good with Linux stuff.
Message 8 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

I don't think it's related to hardware either, as it happens on all 3 of our ReadyNAS units. However, chassis replacement is what the support technicians recommended. So far I've done a chassis replacement on two of the units in the last year and it hasn't helped.

You can get at the logs in FrontView by going to Status > Logs > Download all logs. I'm no Linux expert either, so I don't really have a more specific direction for troubleshooting. I feel like it's got to be a configuration issue either with NFS or with the NICs...
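
As a starting point for digging through that download, here is a rough sketch that walks an unpacked copy of the log archive and prints every line mentioning a keyword of interest. It assumes the archive has already been extracted to a directory of plain-text files; the directory name and the keyword list are guesses for illustration, not names known to appear in ReadyNAS logs, so adjust them to whatever actually shows up.

```python
from pathlib import Path

# Assumed keywords; adjust to whatever actually shows up in your logs.
KEYWORDS = ("nfs", "rpc", "bond", "link is down")

def find_lines(log_dir, keywords=KEYWORDS):
    """Yield (file name, line number, text) for lines containing any keyword."""
    for path in sorted(Path(log_dir).rglob("*")):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file has odd bytes in it.
        with path.open("r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                lowered = line.lower()
                if any(word in lowered for word in keywords):
                    yield path.name, number, line.rstrip()
```

Usage would look something like `for name, number, text in find_lines("readynas_logs"): print(f"{name}:{number}: {text}")`, where `readynas_logs` is a hypothetical folder name. Comparing the timestamps of any hits against the times ESXi reported the errors is the useful part; the script only narrows down where to look.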
Message 9 of 23
diegoaze
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

I already had a look at the logs downloaded from FrontView but could not see much. I was expecting to see an NFS-specific log of some sort.

I contacted Netgear 2 days ago and was told to change the load balancing mode to TLB (Transmit Load Balancing). I was also told my array was built a long time ago, when the device was running a pretty old firmware version. The recommendation from Support was to back up everything and factory reset the appliance. I should then recreate the array using X-RAID (it already is, BTW) and copy the data back. The only issue here is that each of our ReadyNASes (6 in total) holds between 6TB and 12TB of used space. It will take ages for me to accomplish this. And the Netgear tech told me this is something to be done every 1 or 2 years.

I have already changed the network load balancing option (on one device only) and will see what happens. The next step would be to disable teaming altogether!
Message 10 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Our units were originally in Adaptive Load Balancing. I changed them to Active Backup and still get the error. I have not tried disabling teaming completely.

I was also told by tech support that a complete re-build on these units is expected to be completed every 1-2 years. This was news to me. I think they switched from MBR to GPT partitions, but the disks maintain the MBR partitioning unless you do a factory reset and re-build the array.

I've got a plan in place to try a full rebuild on one of our units, but it's going to take awhile. I don't know how you could do a yearly re-build on these unless you keep one storage array as a "spare" and use it to store data while you are re-building another array.
Message 11 of 23
StephenB
Guru

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

jlehtinen wrote:
...I was also told by tech support that a complete re-build on these units is expected to be completed every 1-2 years. This was news to me. I think they switched from MBR to GPT partitions, but the disks maintain the MBR partitioning unless you do a factory reset and re-build the array.

I've got a plan in place to try a full rebuild on one of our units, but it's going to take awhile. I don't know how you could do a yearly re-build on these unless you keep one storage array as a "spare" and use it to store data while you are re-building another array.

First of all, I don't buy that response. You shouldn't need to rebuild the RAID array under normal circumstances. As you point out, there is a lot of down time involved. Proposing it as routine maintenance seems crazy to me.

Though if you want to protect your data from loss, you should have a backup plan in place, so it would always be possible to rebuild the NAS if need be. Backing up to a second NAS is one approach, backing up to a cloud service is also possible. I do both (the cloud gives me good disaster recovery, but I am not sure I can count on it).

There are certainly some cases where a factory reset / rebuild is needed (or a good idea). There's a helpful article on the subject here: http://www.rnasguide.com/2011/06/22/why ... -readynas/
Message 12 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Yeah, I'm unsure how I feel about it. If it's true, then it undermines the concept of the ReadyNAS brand being enterprise grade.

The tech I spoke to said that the yearly rebuild is recommended for data center environments, or when the storage is under consistent heavy load. If I remember correctly he said it's recommended because the arrays pick up file system corruption that causes performance and stability issues over time. If you're just using the ReadyNAS to store relatively static data and your I/O load is low, maybe it's a different story.

The article you linked is good - it also notes that a factory re-set and full rebuild is needed to take advantage of most of the 'major' firmware upgrades. To stay on the topic of this thread (NFS instability), I'm curious to see if a full factory reset on newest firmware will resolve the issue.
Message 13 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

FYI - I set up a NetApp FAS2220 in our environment and do not experience this issue on that storage. This seems to indicate it's an issue specific to the ReadyNAS units. I'm currently in process of RMA'ing several drives, and then will complete a full rebuild on the ReadyNAS to see if this corrects the issue.
Message 14 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

As an update...

I completed a full rebuild on my 2 ReadyNAS 3200's. This included replacing any drives with errors, upgrading the firmware to the newest version, resetting the device to factory defaults, then re-configuring the volume and shares.

Since completing this, I have noticed fewer instances of this error. It no longer happens at 'random', or when I'm running backups. This is an improvement.

However, there are still issues whenever the arrays complete a RAID scrub. The last time a scrub ran, my monitoring software registered 200-300 "storage is not accessible" events on the ESXi hosts. These stopped immediately once the RAID scrub finished. This seems to indicate that the arrays are not able to maintain consistent network connectivity while under the load caused by the scrub.

So far this flakey connectivity has not caused data loss or corruption in my environment. However, it means I have to hand-hold the environment whenever a RAID scrub is scheduled.

IMO, these units are not stable enough for a VMware environment. I've had zero issues with my NetApp FAS2220 that is using the exact same infrastructure and is actually running a heavier load.
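
The scrub correlation described above can be checked roughly with a sketch like the one below, which counts how many monitoring alerts fall inside a given scrub window. The alert timestamps and the window boundaries are hypothetical sample values, not figures from this environment, and the function name is made up for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"

def share_inside_window(event_times, window_start, window_end):
    """Return (events inside the window, total events)."""
    start = datetime.strptime(window_start, FMT)
    end = datetime.strptime(window_end, FMT)
    stamps = [datetime.strptime(t, FMT) for t in event_times]
    inside = sum(1 for s in stamps if start <= s <= end)
    return inside, len(stamps)

# Hypothetical sample values, not taken from the thread.
alerts = ["2014-03-02 01:15", "2014-03-02 03:40", "2014-03-02 09:05", "2014-03-04 14:30"]
inside, total = share_inside_window(alerts, "2014-03-02 01:00", "2014-03-02 10:00")
print(f"{inside} of {total} alerts fell inside the scrub window")
```

If nearly all alerts land inside the window across several scrubs, that supports the load explanation; a single scrub is not enough to rule out coincidence.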
Message 15 of 23
gavind
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Hi jlehtinen, just to confirm, currently, what hardware are you using for a VMware environment? Coz I'm currently shopping around.
Message 16 of 23
cyrill1
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Hi jlehtinen! Did you manage to sort out the problem? I am experiencing the same with RN4220s/ESXi 5.5u1 - in my case it becomes unavailable under random circumstances. Moreover, the device management web interface isn't available (the device is offline) and ssh sessions drop just after I provide correct logon credentials.
Message 17 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Sorry I didn't see your posts... I haven't been around these forums in a long time.

@gavind:
For storage I've got 2x ReadyNAS 3200's, 1 NetApp FAS2220, and 1 NetApp FAS2520.

Personally I like the NetApps better. They have a ton of functionality even w/ basic licensing. You get tons of data on performance, usage, and other metrics without having to install hackjob 3rd party mods or other weird crap. There's much better support and detailed documentation. NetApp also treats their product like it's enterprise grade, so you won't get techs recommending you should reboot a production storage array in the middle of business hours. :wink:

To be fair, ReadyNAS does some things well, and they're cheaper... so it all depends on if you want a rock-solid platform, or if you need to save some $$$.



@cyrill1:

No, status hasn't changed since my last post. The ReadyNAS 3200's are stable unless there's a RAID scrub running, and then they get flakey. It hasn't caused issues yet, ESXi seems to be able to cope with the I/O dropping in/out, and I haven't seen data loss yet. I'm not happy about the situation but there's nothing I can do as long as I need to keep the units running. For your case, I think you might have some other issue, as my problems were all related to connectivity loss while the unit was under high load. You might have problems with a bad NIC, cable, etc.
Message 18 of 23
mdgm-ntgr
NETGEAR Employee Retired

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

For those of you with OS6 NAS Units which version of ESXi are you running now? Can you downgrade to build 1331820 if you are running a newer version?

Is this RAID scrub issue you are facing on a 3220 or a 3200? What firmware version are you running (version number please)?
Message 19 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Both units are ReadyNAS 3200. I have one on 4.2.26, other is still 4.2.24. Both have same issue.
Message 20 of 23
mdgm-ntgr
NETGEAR Employee Retired

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

jlehtinen what build of ESXi are you using?
Message 21 of 23
jlehtinen
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

Most are: 5.1.0 799733
I have a few: 5.1.0 1483097
Message 22 of 23
clariion
Aspirant

Re: ESXi reports "All Paths Down" for ReadyNAS hosted NFS sh

I had the same problem with a ReadyNAS RN102.

I tried everything, even using the firmware 6.4.1 beta.

In the end, the solution was to stop using NFS and switch to iSCSI.

Now the performance is great and no hard reset is necessary to make the NFS share visible again to the ESX host.

I think there is an NFS problem in the ReadyNAS software.

Hope this will help someone.

Message 23 of 23