mobocracy (Aspirant), Jun 13, 2015
RN2100, 2120 & 3220 all going offline under heavy I/O
I work for an SMB consultancy and have several clients who use RN2100/2120/3220 NAS devices. Since about last fall I've had recurring problems with devices going offline (iSCSI offline, no web access, no ping) during heavy disk I/O. About 90% of the time we use them as backup storage, mostly with Veeam, as iSCSI mounts. Before last fall I had several devices with at least two years of regular, problem-free use, so it's really puzzling that the same usage patterns we've had for years would suddenly start causing problems.
While troubleshooting, I've found that the devices seem to crash after several hours of heavy disk I/O. There is no network response at all. The devices look fine physically (normal status LEDs, including traffic lights on the network links) but can't be accessed without a power cycle. They seem OK when they power back up and usually work normally for a few weeks to a couple of months before it recurs. About 20% of the time a device will crash again immediately on the next job cycle, typically while Veeam is recovering from the previous failed backup job (this rollback is very disk-intensive).
I've made two adjustments that seem to have extended, but not eliminated, the time between crashes: disabling the 9000-byte MTU (jumbo frames) and disabling all link aggregation and trunking. Before last fall we had been using 9000-byte MTUs and adaptive load balancing ever since the devices were set up, which makes the failures even more puzzling.
We have not been able to tie the problem to a specific hardware/OS combination (we have a mix of 2008 R2 and 2012 R2 servers, multiple switch vendors, and multiple server vendors), and none of the units that hang show SMART errors or diagnosable physical disk problems.
We considered whether changes in the backup software might be contributing, but we have several installs that use servers with lots of internal disk (e.g., R710s with 8x4TB) and these never have problems.
I set up a brand-new 3220 yesterday, and with backups disabled it hung after about 2 hours of copying the backup repository from a 2120. I configured the 3220 with a 9000-byte MTU but no link aggregation. The device went offline just as the others have: no ping, no web interface, no iSCSI, and not visible to RAIDar.
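Since the units give no warning and log very little, one way to pin down exactly when a unit stops responding (so the time can be lined up against the backup job schedule) is to probe it from another machine. The following is a minimal Python sketch, assuming the NAS answers on TCP 3260 (the standard iSCSI target port) and 443 (the HTTPS admin page). The address 192.0.2.10 and the names `probe`, `check_once`, and `watch` are hypothetical placeholders, not anything NETGEAR provides. It only tests TCP connections, not ICMP ping.

```python
import socket
import time
from datetime import datetime

# Hypothetical NAS address (documentation range); replace with the unit's IP.
# 3260 is the standard iSCSI target port, 443 the HTTPS admin UI (assumed).
NAS_HOST = "192.0.2.10"
PORTS = {"iscsi": 3260, "https": 443}


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the service as down.
        return False


def check_once(host, ports, last, probe_fn=probe):
    """Probe each port once; return log lines only for services whose state changed.

    `last` maps service name -> last known state and is updated in place.
    """
    now = datetime.now().isoformat(timespec="seconds")
    lines = []
    for name, port in ports.items():
        up = probe_fn(host, port)
        if up != last.get(name):
            lines.append(f"{now} {host} {name}:{port} {'UP' if up else 'DOWN'}")
            last[name] = up
    return lines


def watch(host=NAS_HOST, ports=PORTS, interval=30.0):
    """Poll forever, printing a timestamped line whenever a service goes up or down."""
    last = {}
    while True:
        for line in check_once(host, ports, last):
            print(line, flush=True)
        time.sleep(interval)
```

Calling `watch("<NAS IP>")` on a machine that stays up during backups, with output redirected to a file, would at least give a precise timestamp for each hang to compare against the backup job and switch logs. Logging only state changes keeps the file small over weeks of polling.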
mdgm-ntgr (NETGEAR Employee, Retired)
Have you opened a support case?
If so, do you have a case number?
If not, I would open one for the 3220.

mobocracy (Aspirant)
mdgm wrote: Have you opened a support case?
If so, do you have a case number?
If not, I would open one for the 3220.
I opened one last November/December for the first unit I had problems with (a 2100). It wasn't a very satisfactory experience: the technician didn't have much to go on, since logging on these devices seems pretty sparse. He did replace the unit under warranty, because the only thing we could agree on was that a total stoppage of all network I/O wasn't normal, especially since the unit had been working normally for years (it had about 4 months left on its warranty).
What I'd really like is a way to collect more detailed logging from any of these units, ideally kernel log errors from the network or disk subsystem, and off-host if possible.

mdgm-ntgr (NETGEAR Employee, Retired)
Please send me your logs (see the Sending logs link in my signature).
jpcorzo (Aspirant)
Good morning all,
Was this issue ever resolved? We have a ReadyNAS 3220 at one of our medical practices and have started seeing the same behavior. Our setup consists of two ReadyNAS 3220 units; the primary is the affected one and goes offline completely, so we have to manually power-cycle it and reset the network connections. Any assistance or advice would be appreciated.
Thank you