R7000P crashes with Readyshare usage

dav0dav0
Guide

R7000P crashes with Readyshare usage

Nighthawk router running fine with 2.4 and 5GHz users.

Hook up 2 TB USB drive (Western Digital, nothing exotic) formatted as NTFS (probably MBR, but not sure).

Basic readyshare read/write with drive works fine.

But if I do a lot of reading or writing via a single script file (no other users touching the files) -- not particularly large files -- router gets confused after a couple of minutes and:

  • No longer lets me log in to it
  • Kicks some network devices off
  • Will not let any devices join either network
  • Runs traffic from existing users fairly slowly
  • Puts disk into an infinite loop (disk access light blinks constantly).

Running latest firmware with a mix of windows and mac and android devices.

Error logs show nothing (not even failed network-join events)

Seems to be "router's file system gets confused by ReadyShare usage, and it's good-bye router". Only remedy is to reboot the router and abandon ReadyShare.
Ideas?  Note: this isn't a particularly fast drive, I've been using the back (USB 2) port...maybe I should focus on the front (USB 3) port??
Anybody else seeing this?
I've got an ancient Netgear 3400 router that serves up the same drive just fine. Seems to be something in this device's firmware. Hard to believe they got it wrong after all these years.

Did reflash from a previous firmware rev--same symptom.

Model: R7000P|Nighthawk AC2300 Smart WiFi Dual Band Gigabit Router
Message 1 of 10

All Replies
dav0dav0
Guide

Re: R7000P crashes with Readyshare usage

Looks like I've answered my own question via experimentation.

The USB port on the back says USB 2.0 and the manual advises "put slower devices like printers there".
This should be re-interpreted as "don't you dare to attach any disks there!!!"

Plugged the disk drive into the front panel USB 3.0 socket and it runs like a champ...24 hours straight (my script is copying a LOT of files) with no glitches.

I suspect what's happening with the rear socket is this:

  • USB 2.0 does good handshake with USB drive
  • At some point (20 minutes of use?) the drive somehow overruns the speed or vocabulary of the USB 2.0 port
  • At that point, the USB multiplexor chip in the router goes berserk and has tons of low-level error conditions
  • Router CPU gets waaaaaay too busy servicing all those low-level error conditions which it can't really do anything about
  • Router goes into life-support mode where it can't do anything new (just handle existing connections and streams), and doesn't have the time or code to log hardware errors.

The front socket has a different chip (and a different code-path), and doesn't get caught up with any of this...it just hums along dutifully.

Model: R7000P|Nighthawk AC2300 Smart WiFi Dual Band Gigabit Router
Message 2 of 10
dav0dav0
Guide

Re: R7000P crashes with Readyshare usage

Another update...
Did a full low-level disk check of the NTFS/MBR partition to make sure there weren't any corrupted entries.  All good.

I have been using the USB disk from the front port for a while now, and it's looking like there's another level of issue with this router's firmware.

When the router first connects to the disk, it spends a couple of minutes reading something from the disk.  I'm going to assume it's the directory structure, to make accesses faster once everything is in memory.

But...it appears that after a few days (where I'm barely using the disk, but banging pretty hard on the WiFi) the router loses its mind, stops serving internet, won't let me log in to the administrative console, and eventually is doing nothing but blinking its LEDs happily.  Only solution is to reboot both it and my cable modem.  I'm going to guess this is a memory leak, or some other resource constraint.
The perhaps interesting factoid is the disk holds about 1,250,000 files in 1.8 TB.
I'm fooling around with the configuration (e.g., disconnect ethernet ports, disconnect disk) to troubleshoot...but really, I'm flying blind.
As this same disk drive works fine in readyshare mode with a 10-year-old Netgear $40 WiFi router, it seems hard to believe that Netgear messed things up on this model.
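For scale, the figures above work out to an average of about 1.44 MB per file (1.8 TB / 1,250,000 files, decimal units), so the load on the router is mostly about file count, not file size. A minimal sketch for checking this on any drive, run from a PC against the mounted share. The drive letter in the usage note is hypothetical, and the function names are mine, not anything from Netgear:

```python
import os


def summarize_tree(root):
    """Walk root and return (file_count, total_bytes) for readable files."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable: leave it out of both totals.
                continue
            file_count += 1
    return file_count, total_bytes


def average_file_size(file_count, total_bytes):
    """Average size in bytes, or 0.0 for an empty tree."""
    return total_bytes / file_count if file_count else 0.0
```

Usage would be something like `summarize_tree("Z:\\")` with the share mapped to a drive letter. Be warned that walking over a million files through the router is itself a bulk operation, so it is safer to run this with the drive plugged directly into the PC.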

 

Message 3 of 10
dav0dav0
Guide

And yet another update

Still running with same configuration, tried various experiments to see what the issue was. Clearly, using the front USB port is way more likely to be successful over the long run than using the back one.

For whatever reason, the configuration got way more unstable after 9 months of only having to hard-reboot every 3 weeks or so... the Netgear "freezes" as described above were occurring more than daily. Noticed that the problem seemed to be triggered by Windows clients (there are only two of them...every other device is a phone or tablet) coming out of sleep/hibernate mode. Also noticed that the USB hard disk spun up at random times for no particular reason (probably some sort of router housekeeping of ReadyShare).

Here are the things that offered no improvement:

  • Turning off WMM
  • Turning off QOS
  • Putting a powered USB hub between the router and the disk
  • Swapping out the power supply
  • Power cycling the router every night at 1 am (no usage for hours before or after), letting it cool off

Here's what made a BIG difference:  turning off MU-MIMO and the ReadyShare DLNA Media server features (both very nicely hidden in the menu tree).  Even with that, I noticed that the router was rebooting itself on a daily basis.  Sometimes when a Windows client came out of hibernation, sometimes when there hadn't been any user activity for hours.

To quote Miss Anne Elk, I have a theory!  This router must run an enormous amount of software, given the time it takes to come back online after a boot, and how long it takes to log in as an administrator when the network is under load.  Evidently, this issue is made much more significant when a USB disk is attached with a lot of files. My theory is that the ReadyShare Media Server features make the processing load much worse (it generates a SQLite database that can be 50 MB or more), and that MU-MIMO has to do a lot of work whenever it sees a new or returning client -- and too often, the router wraps itself around the axle and becomes unresponsive.  Once you turn those features off, the daily reboots that still occur are relatively transparent (happening either when nobody is looking, or happening in a way that just makes the network less responsive for a minute or so). They are, I believe, the result of the router successfully recovering when its backlog gets to be too big. (I'm being nice here: it's just as likely that the Netgear software has some sort of watchdog function that triggers a reboot when the processor hits a bug, a buffer overflow, an out of memory situation, or just vectors off to never-never land.)

Now that I'm with this configuration, let's see how long I can run without a router freeze/hang.

 

Message 4 of 10
dav0dav0
Guide

Re: And yet another update

Configuration info:
- Two Win10 clients
- Android phone
- 2 iPads
- 1 iPhone
- 2 TB hard drive, with separate power supply

Running latest firmware on Netgear R7000P.

 

Message 5 of 10
dav0dav0
Guide

Re: And yet another update

Some additional evidence supporting my "processor overload" hypothesis:

  • the ReadyShare file system appears to be very poorly multithreaded: with a single process using it, it's pretty snappy...but if you have multiple file operations simultaneously, it slows to a crawl.  File deletion operations contend very seriously with file write operations, slowing its effective I/O to less than one I/O per second, even on small files.
  • during nearly any kind of ReadyShare slowdown, you can't log in to the router as an admin.  Even putting up the login credentials popup may take 30 seconds or not work at all, and even if you can enter your credentials you'll almost never actually get logged in.
  • attempting to delete 3500 files in one directory while doing a simple file copy into another directory has frozen the router at least once.

It seems that the router has a priority system, where it serves existing users' network traffic above all else.  If it has other things to do, they'll be easily slowed down.  As things start to get messy, it simply stops responding to any new requests (including traffic from a new client).

 

Bottom line: to lower risk of instability, do not attempt to have more than one process using ReadyShare for bulk operations.  It's simply too crummy an implementation of NTFS -- and who knows what kinds of detritus the file system leaves behind on the disk when it's running into trouble.
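One way to enforce the "one bulk process at a time" rule when all the scripts run on the same PC is a simple lock file. This is a minimal sketch under that assumption: the lock lives on the PC's local disk (not on the ReadyShare drive), and `exclusive_bulk_lock` and its parameters are names I made up for illustration:

```python
import contextlib
import os
import time


@contextlib.contextmanager
def exclusive_bulk_lock(lock_path, poll_seconds=5.0, sleep=time.sleep):
    """Block until this process is the only holder of lock_path."""
    while True:
        try:
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            sleep(poll_seconds)
    try:
        # Record the holder's PID so a stale lock can be traced by hand.
        os.write(fd, str(os.getpid()).encode())
    finally:
        os.close(fd)
    try:
        yield
    finally:
        os.remove(lock_path)
```

Each script would wrap its copy or delete loop in `with exclusive_bulk_lock(path):`, so a second script waits until the first one finishes. If a script is killed mid-run, the lock file stays behind and has to be deleted manually.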

 

Crap software courtesy of NetGear.

Message 6 of 10
dav0dav0
Guide

Re: And yet another update

OK, some solid news here...and some important warnings for people experiencing this kind of issue:

  • Whenever the router goes loopy, if you power cycle it you risk creating damaged files/directories on disk.  This is always the case with file systems...but when you do it on a Mac or PC, the file system knows it needs to recover and does so in the background.  With a router, the file system isn't all that smart and may leave all kinds of rubbish scribbled on the disk.
  • SO, before you connect a drive to a router, make sure it passes the most thorough file system check you can find.  Any errors that the router's file system runs across may cause it to do all kinds of unfortunate things.
  • In my case, when I disconnected the drive from the router and connected it to the PC, it worked fine under a trivial load.  But evidently the drive's USB adaptor/controller (USB3 to SATA) was flaky, even when it was brand new.  So there was even more rubbish on the disk.  Files that couldn't be deleted, attributes set wrong, 60 GB of orphaned stuff.  Fortunately, I had good data to replace the old.

It is more than possible that all the craziness I've been experiencing has as its root cause flaky disk hardware.  The drive is fine, but the controller is definitely not (all kinds of USB errors, wouldn't pass a file system check, etc.)

 

Are there things to complain about wrt Netgear?  Well, let's start with useless logs that don't tell you anything, crummy error handling, etc. But that's to be expected from all but the best software...just frustrating in its ability to generate wild goose chases.

 

I'll update you with performance/reliability info as I experience it with corrected hardware.

 

Message 7 of 10
dav0dav0
Guide

Re: And yet another update

Well, it only took a day to demonstrate that Nighthawk itself is still failing.

Failure mode is interesting:

  • Runs for days as long as you don't use the network
  • Most common failure is a minute or so after client PCs come out of sleep/hibernate mode (network runs fine for the first few seconds)
  • When the failure occurs, (1) pings to router are fine, (2) pings to 8.8.8.8 time out at all clients, and (3) the readyshare disk is spun up.

The reason for #3 is either (1) some weird Netgear bug spontaneously making a request of the disk, or (2) Windows networking making some sort of request as part of refreshing its network map about connected shares.  Note that this symptom occurs on three separate disks and controllers I've tested, so I don't suspect the drive itself.
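To catch the moment of failure instead of noticing it afterwards, the two ping checks above can be run from a client on a timer. A rough sketch, with assumptions flagged: `192.168.1.1` is only a guess at the router's LAN address (substitute your own), the state labels are my own, and a zero exit code from `ping` is treated as success, which can be optimistic on Windows:

```python
import platform
import subprocess

ROUTER_IP = "192.168.1.1"  # assumption: replace with your router's LAN address
WAN_IP = "8.8.8.8"         # the public address used in the post


def ping_once(host, timeout_s=5):
    """Send one echo request; True if the ping command exits with code 0."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def classify(router_ok, wan_ok):
    """Map the two ping results onto the states described in the post."""
    if router_ok and wan_ok:
        return "healthy"
    if router_ok:
        # Router answers on the LAN but nothing gets out: the failure above.
        return "router-up-wan-down"
    return "router-unreachable"
```

Calling `classify(ping_once(ROUTER_IP), ping_once(WAN_IP))` once a minute and writing the result to a file with a timestamp would give a record to line up against when the PCs woke from sleep.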

Message 8 of 10
dav0dav0
Guide

Re: And yet another update

Well, I don't want to jinx this, but I believe all the above issues have been resolved.  I've been running for 5 days now with 100% functionality, no reboots, and a more responsive/reliable administrative app (NetGear's Genie on the PC).

The big change: doing a factory-reset on the router and manually reconfiguring it to my needs (i.e., not doing a backup/restore of settings).

Seems like something somewhere in the router's little mind had gotten corrupted, causing all kinds of confusion. What may have caused this was its moving to a totally new network (when it was new, it was on one ISP...then it got moved to a new ISP without making any changes). Note that during all my wild-goose chases I had updated the firmware, and the confusion persisted because the problem was buried somewhere in the configuration/settings data.

 

So the morals of my story are:

  1. When troubleshooting, start with a factory reset and brand new settings
  2. When moving a router across networks, see #1
Message 9 of 10
dav0dav0
Guide

Re: R7000P crashes with Readyshare usage

Well, I guess I did jinx it.

Everything runs fine as long as you're not doing anything serious with ReadyShare.  But as soon as you try to write more than a couple of thousand files, the file system goes slower and slower until it simply times out with various weird error messages.  If you're trying to read with a different process while you're writing, the read stream will be very choppy indeed.  All other network traffic will continue as normal for a while, but if you keep throwing a lot of writes at it, eventually the whole network will stop working. Only recourse is power-down, as the Netgear admin app stops responding as well.

If you have a big directory tree to copy, put the copy on pause every five minutes or so, letting the processor do whatever it wants for a couple of minutes.
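The pause-and-resume routine can be scripted so nobody has to babysit the copy. A minimal sketch, assuming a plain file-by-file copy is acceptable: the five-minute burst and two-minute rest mirror the numbers above, and the function name and parameters are mine (`clock` and `sleep` are only injectable to make it testable):

```python
import os
import shutil
import time


def copy_tree_paced(src, dst, work_seconds=300, rest_seconds=120,
                    clock=time.monotonic, sleep=time.sleep):
    """Copy src into dst file by file, resting after each burst of work."""
    copied = 0
    burst_start = clock()
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            shutil.copy2(os.path.join(dirpath, name),
                         os.path.join(target_dir, name))
            copied += 1
            if clock() - burst_start >= work_seconds:
                # Give the router time to catch up before the next burst.
                sleep(rest_seconds)
                burst_start = clock()
    return copied
```

The pause is only checked between files, so one very large file can stretch a burst past the limit. Whether two minutes of rest is enough is a guess based on my experience, not a measured figure.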

This is just a rubbish implementation, with dozens of bugs in the UI alone.

 

Message 10 of 10