R9000 DNS Performance

isaki
Apprentice

Greetings!

 

I am currently running stock firmware (1.0.2.40) on my X10 and noticed some strange behavior when doing updates on my Ubuntu machines (both virtual and physical). After running strace on apt and trying various configurations to see the differences (no router; router connected with the machine hardwired; router connected with the machine on WiFi; router connected, hardwired, with custom DNS settings), I determined that the DNS service provided by my R9000 is the issue.

 

Here are the worthwhile tests and the results. These tests were all performed on an MBP running Ubuntu 16 Server under VMware Fusion (so I could roll back state using snapshots and ensure my tests were always the same), connected via Cat6 to the router with the WiFi adapter disabled. The VM was configured to use bridge mode, so the DNS server for the VM was my MBP.

 

It is worth noting that I see this with other things as well; I just liked this test because it was easy to reproduce and debug, and it was easy to see when the problem was happening (being stuck at 0% for 10-20s is pretty obvious).

 

Test 1: Normal Operation

 

This test uses WiFi (5GHz) to connect to the X10. The X10 is connected via cat6 to my cable modem.

 

Result: Hangs when attempting to do updates (it eventually works).

 

Test 2: No router in play

This test used a hardwired connection from my machine to my cable modem.

 

Result: No hang; everything is super fast. This used the DNS servers provided by my ISP via DHCP.

 

Test 3: Hardwired Operation

This is a variant of Test 1, but hardlined to the router with the WiFi disabled.

 

Result: Same as Test 1.

 

Test 4: Google DNS on MBP

This test was done using the physical configuration of Test 3, but I overrode the DHCP-provided DNS servers with 8.8.8.8 and 8.8.4.4.

 

Result: Same as Test 2, no hangs and everything super fast.

 

Test 5: ISP DNS on MBP

This is the same test as Test 4, but using my ISP's DNS. These were taken from the Netgear Genie UI so I ensured I was using the same DNS servers as the router was using, only eliminating the router proxy.

 

Result: Super fast, no hangs.

 

Solutions?

 

I'd like either to fix whatever is making this suck or to disable the DNS proxy entirely, so that my router passes the DNS servers it was given by my ISP straight through in its DHCP responses. I'm tired of having random timeouts in games, web browsing, etc. due to this issue.

 

NOTE: I have a BestBuy warranty on this, so I can get a new one for free if we suspect a hardware problem.

Model: R9000|Nighthawk X10 AD7200 Smart WiFi Router
Message 1 of 31
isaki
Apprentice

Re: R9000 DNS Performance

That doesn't explain why I see the same behavior when I'm hardwired.

Message 2 of 31
isaki
Apprentice

Re: R9000 DNS Performance

Let me address everything said thus far:

 

The performance issue does not exist if I override the DNS servers at the network adapter level for both 5G and Ethernet. Changing the DNS servers at the router will not solve my problem as my ISP's DNS servers are just as responsive as Google's. The problem is specifically related to some interaction with the DNS server capabilities and the subsequent socket creation.

 

The reason I say this is that a bunch of nslookups against my ISP, Google, and the router all return in statistically equivalent timeframes. I did an additional test using Java's JNDI framework and got similar results (we are talking milliseconds of difference, not the 10-30 seconds seen in the apt-get call).
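A comparison like this can be approximated with a short script. This is only a minimal sketch, not the test that was actually run: `time_lookups` and its `resolver` parameter are hypothetical names, and by default it goes through the OS resolver (`socket.getaddrinfo`), so it measures whichever DNS server the machine is currently configured to use rather than a specific one.

```python
import socket
import statistics
import time


def time_lookups(hostname, repeats=5, resolver=socket.getaddrinfo):
    """Return (median_seconds, failures) for repeated lookups of hostname."""
    durations = []
    failures = 0
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            resolver(hostname, 80)
        except OSError:
            # socket.gaierror is a subclass of OSError; count it as a failed lookup
            failures += 1
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), failures


# Example call (the hostname is a placeholder for whatever host apt contacts):
#   median, failed = time_lookups("archive.ubuntu.com")
```

Running it once with the router as the configured DNS server and once with the override in place gives two medians to compare. Millisecond-level differences would match what is described here; multi-second medians would point at retries after a failed answer.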

 

The only thing that is weird is that Google and my ISP happen to alternate the first A record between one of three servers, whereas the answer returned by the router has those three near the bottom of the list.

 

It is entirely possible that the router is not respecting the A record order provided by the DNS servers and is returning slow/high-latency addresses for the CNAME first, causing the seemingly slow behavior. Once it gets going (i.e. makes a connection that doesn't time out during the TCP handshake), it is lightning fast (though, as per your own ticket, not as fast as it could be).

 

As to maximum bandwidth, I ran a few tests for you (300/35 is the advertised speed):

 

Direct off modem (windows 7): 290/36

5G (MacOS 10.13.2): 261/36

Ethernet (gigabit) (windows 7): 253/37

 

So you are probably right that Ethernet and 5G are having performance issues; however, they are not significant enough to explain the observed problem. In fact, I would expect worse performance for apt-get using external DNS instead of the router at the VM level, as it would involve more internet-based traffic; I get the exact opposite.

Message 3 of 31
isaki
Apprentice

Re: R9000 DNS Performance

How do I escalate to tech support? I mean, I could just use my Best Buy warranty, get a new one, register it, and get free 90-day support, but that seems like a whole lot of work for what should be a simple form to submit bugs.

Message 4 of 31
isaki
Apprentice

Re: R9000 DNS Performance

@Case850 No, this is still unresolved, even on the current patch (1.0.3.10).

Message 5 of 31
isaki
Apprentice

Re: R9000 DNS Performance

@Case850

 

As part of the testing (I left it out), I used both my old LinkSys running DD-WRT and my old Apple AirPort router as additional testing avenues; they did not introduce any performance issues, so I'm not sure what the Edge Router would give me. I've toyed with the idea of throwing DD-WRT on the R9000 as a test, but at that point I'd rather do a hardware exchange first, as it is way less effort.

 

Additionally, my R9000 drops my XBox Live connection, both with port forwarding (with UPnP disabled and static DHCP reservations) and with UPnP (with standard DHCP, with both reserved and nonreserved MAC based DHCP assignments). This is probably unrelated as I've already set my XBox 360 to use 8.8.8.8 and 8.8.4.4 for DNS.

 

I can exchange my R9000 for another R9000 (I have an extended replacement plan on it) and I may do so today. If that doesn't work, I should be able to get paid support and/or throw SVoxel DD-WRT on there for further testing.

Message 6 of 31
isaki
Apprentice

Re: R9000 DNS Performance

@Case850

 

Something is deleting my posts. I found the root cause and it keeps getting deleted. I know it is being deleted because it lives for a short time (I see it, reload the page, and it's suddenly gone), and I'm getting badges for reply counts, so I know it hit the server.

 

The TL;DR is that the R9000 is sending an invalid packet in reply to a DNS lookup failure instead of NXDOMAIN (resulting in a FORMERR which, for some things, causes them to sleep for X seconds and try again a few times before giving up). I have all the data to prove this, but posting it gets my post deleted, so I'll find another way to get it to you.

Message 7 of 31
schumaku
Guru

Re: R9000 DNS Performance

@isaki Make a short document and add it as a PDF for example.

 

@Case850 As far as I'm aware the R9000/R8900 project engineer is reading here.

Message 8 of 31
isaki
Apprentice

Re: R9000 DNS Performance

@schumaku

 

Thanks for the idea! I'll put something together and add the PDF. Thank you!

Message 9 of 31
isaki
Apprentice

Re: R9000 DNS Performance

From my Mac:

 

 

Server: 10.0.1.1

Address: 10.0.1.1#53

 

version.bind text = "dnsmasq-2.39"

 

This is on the latest and greatest official R9000 firmware, so not sure why this is the case.
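For anyone who wants to repeat the version check without nslookup, here is a minimal sketch of the query involved: a TXT question for `version.bind` in the CHAOS class. `build_query` is a hypothetical helper name; the type and class numbers (16 and 3) and the wire layout come from RFC 1035.

```python
import struct


def build_query(name, qtype, qclass, query_id=0x1234):
    """Encode a single-question DNS query in RFC 1035 wire format."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


TYPE_TXT = 16  # TXT record
CLASS_CH = 3   # CHAOS class, used for version.bind

packet = build_query("version.bind", TYPE_TXT, CLASS_CH)
```

Sending `packet` over UDP to port 53 of the router (10.0.1.1 in the output above) with a `socket.SOCK_DGRAM` socket should produce a reply whose TXT data carries the version string, assuming the server answers CHAOS queries at all.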

 

I'll attach the PDF with my packet captures and what I've determined to be the issue later tonight. It is possible that the version of dnsmasq running on the router is the cause of this problem.

 

Message 10 of 31
isaki
Apprentice

Re: R9000 DNS Performance

@Case850

 

I got it for a relatively cheap price on a special deal, and it had the performance specs I needed to hold up to my device and performance load, which my router at the time (an AirPort Extreme) could not.

 

That being said, why I purchased the R9000 is ultimately irrelevant to the discussion at hand; it was expensive and I expect it to work (especially for something that I can't turn off and is as fundamental as DNS; I wish I could make it pass through my ISP's DNS and not be in the mix at all).

 

Attached is my packet capture overview and diagnosis. I have the raw captures if required.

Message 11 of 31
schumaku
Guru

Re: R9000 DNS Performance


@Case850 wrote:

What is the reason for purchasing the R9000?


Would you please stop stalking every R9000/R8900 community member?

Message 12 of 31
schumaku
Guru

Re: R9000 DNS Performance


@Case850 wrote:

The DNS issue could be resolved with a simple upgrade to dnsmasq 2.78. I guess now that the current version has been reported to have a vulnerability, it might force Netgear into action..


So you state that every earlier dnsmasq - or specifically this 2.3x we have on the Nighthawks - does have the issue @isaki has shown? I strongly doubt it - but see below.

 

@Case850 wrote:

The R8900 & R9000 are expensive because they include the now dead 802.11ad technology and the 10 Gbps LAN interface.

802.11ad certainly has a higher impact on the price than the SFP+ cage and wiring, which do not require many components. However, this is not relevant in this DNS bug context.

 

@Case850 wrote:

The R8900 & R9000 have more bugs because the integration between the Annapurna CPU and the Qualcomm interfaces has proven difficult.

About the same drivers should be used for both ARM architectures.

 

@Case850 wrote:

However the R7800 which is an all Qualcomm Router is a much more stable product.

When you do an nslookup for a non-existent name on a client resolving through the R7800, does your system (e.g. Windows) also throw a format error?

 

C:\>nslookup aaa.bbb.ccc
Server: UnKnown
Address: 192.168.1.1

*** aaa.bbb.ccc was not found by UnKnown: Format error.

 

Message 13 of 31
schumaku
Guru

Re: R9000 DNS Performance

This thread is not about the dnsmasq vulnerability. The only thing I would like to understand is whether Netgear has created an additional systematic mess in the Nighthawk code or whether only the R8900/R9000 are affected by the subject problem.

You could easily prove how much better your R7800 is - at least on one point of the long list.
Message 14 of 31
schumaku
Guru

Re: R9000 DNS Performance

We aren't talking about the vulnerability here - most Netgear and many other routers have dnsmasq on board - but rather about the wrong return code on an attempt to resolve non-existent names.
Message 15 of 31
isaki
Apprentice

Re: R9000 DNS Performance

As @schumaku has stated, I don't really care about the vulnerability (I mean I do, but that isn't causing me any grief at the moment). Please do the following on your R7800 and post the result:

 

nslookup this.is.not.a.real.host

 

Do you get NXDOMAIN or FORMERR? If you get the former, then your R7800 doesn't have the bug. If you get the latter, it's a wider issue with Netgear firmware.

 

Also, out of curiosity, does the R7800 support link teaming for NAS?

Message 16 of 31
isaki
Apprentice

Re: R9000 DNS Performance

@Case850

 

Look man, I don't know what your deal is, but it is clear you have something against the R9000. I'm glad you love your R7800 and that nothing else is as good as what you have. Good for you; you spent good money on a product and I'm glad you are happy with your purchase.

 

However, nothing you are saying is helping to address my issue. I have an R9000. I can't return it (I can exchange it for hardware issues, but only for another R9000). And even if I could, I don't think I would.

 

Additionally, you left off the most important part of the nslookup output (the last line of the output below):

 

Server: 10.0.1.1

Address: 10.0.1.1#53

 

** server can't find this.is.not.a.host: FORMERR

 

I am willing to attribute the fact that you left this off to an honest mistake on your part; please share the entire output so we can see if this is a widespread firmware issue or if it is specific to the R9000 firmware.

 

Also, you mentioned earlier that running open source firmware "hides" issues with the R9000. On the contrary, if open source firmware works better than factory, that shows that the R9000 works well with open source hardware drivers (which, as someone who uses and contributes to various open source projects, is something that makes me like the R9000 even more) and that the factory firmware is severely lacking; it isn't the hardware. If it helps, think of Netgear's Genie as Windows and OpenWRT and its variants as Linux (which it is). One chooses the best operating system for the job, and in this case, that may very well be Linux instead of the factory, closed source, buggy firmware.

Message 17 of 31
isaki
Apprentice

Re: R9000 DNS Performance

Server failed! That is FORMERR! You have the same issue!

 

This is awesome news because it means it is a fundamental problem in Netgear's firmware. They will have to fix it. I think. I hope.

 

Thank you sir!

Message 18 of 31
isaki
Apprentice

Re: R9000 DNS Performance

I find it ridiculous that I can't inform NetGear of their own bug without paying them for support. I'm going to have to use the BestBuy warranty to get a new serial number to get the ability to submit the bug. I've pulled the source code for the firmware from their site and am going to see if I can just hand them a patch.

Message 19 of 31
schumaku
Guru

Re: R9000 DNS Performance

Issue does still exist on a R9000 test firmware with dnsmasq v2.78. Reply code is wrong: Should be 0011 (3) for no such name, is 0010 (2) on names which can't be resolved.
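For reference, the reply code is the low four bits of the second 16-bit word of the DNS header. Per RFC 1035, 1 is FORMERR, 2 is SERVFAIL and 3 is NXDOMAIN. Below is a minimal sketch for reading it out of a captured response; `response_rcode` is a hypothetical helper name, and the packet bytes are assumed to start at the DNS header (no UDP/IP framing).

```python
import struct

RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def response_rcode(packet):
    """Return (rcode, name) from the 12-byte header of a DNS response."""
    if len(packet) < 12:
        raise ValueError("DNS header is 12 bytes; packet is too short")
    _query_id, flags = struct.unpack("!HH", packet[:4])
    rcode = flags & 0x000F  # low four bits of the flags word
    return rcode, RCODE_NAMES.get(rcode, "RCODE%d" % rcode)
```

With this mapping, the 0010 (2) seen on the wire decodes as SERVFAIL, and the expected 0011 (3) as NXDOMAIN.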

Message 20 of 31
isaki
Apprentice

Re: R9000 DNS Performance

I gave myself root access via telnet via the Genie Debug UI. I then killed dnsmasq and restarted it with the default options plus --no-daemon and -q so I could watch it work on the console.

 

dnsmasq: #####query domain is:foo.foo
dnsmasq: ppp1 not enable, return
dnsmasq: forwarded foo.foo to 8.8.8.8
dnsmasq: get reply with response code NXDOMAIN from 8.8.8.8 for domain foo.foo, treat as SERVFAIL

However, I noticed this only occurs for failed FQDN lookups. Simple lookups with the default domain result in the proper NXDOMAIN response:

 

$ nslookup foo
Server:		10.0.1.1
Address:	10.0.1.1#53

** server can't find foo: NXDOMAIN

$ nslookup foo.foo
Server:		10.0.1.1
Address:	10.0.1.1#53

** server can't find foo.foo: SERVFAIL

$ 

If we do the same thing from the router, you can see it gets NXDOMAIN in all cases:

 

root@R9000:~# nslookup foomatic
Server:  8.8.8.8
Address: 8.8.8.8 google-public-dns-a.google.com

nslookup: getaddrinfo('foomatic') failed: Name or service not known
root@R9000:~# nslookup foomatic.fun.com
Server:  8.8.8.8
Address: 8.8.8.8 google-public-dns-a.google.com

nslookup: getaddrinfo('foomatic.fun.com') failed: Name or service not known
root@R9000:~# 

So yeah, this definitely seems like a bug in dnsmasq or the way NetGear is running it. I need to dig through the manpage to see if there is a setting to fix this and I can just hack the /etc/init.d script on the router that starts dnsmasq until such time as NetGear gets a patch out.

Message 21 of 31
isaki
Apprentice

Re: R9000 DNS Performance

So, I have the dnsmasq source code and I'm looking at the /etc/dnsmasq.conf on the router. It looks like the NetGear firmware has enabled something called try-all-ns. If you look at the 2.38 tag in the dnsmasq git repo, you can see that the following behavior was patched in:

 

+       // If strict-order and try-all-ns are set, treat NXDOMAIN as a failed request
+       if( (daemon->options & OPT_ORDER) && (daemon->options && OPT_TRY_ALL_NS)
+           && header->rcode == NXDOMAIN ) header->rcode = SERVFAIL;
+

However, it does not appear as though strict-order has been specified (and it does not appear to be the default based on the manpage):

 

root@R9000:/etc# cat dnsmasq.conf 
# filter what we send upstream
domain-needed
bogus-priv
localise-queries

no-negcache

cache-size=0
no-hosts
try-all-ns
root@R9000:/etc# 
root@R9000:/etc# ps -w | grep dnsmasq | grep -v grep
 6795 root        252 S   /usr/sbin/dnsmasq --except-interface=lo -r /tmp/resolv.conf 
root@R9000:/etc# 

 

The 'try-all-ns' option is not even in the manpage; according to the README associated with this patch request this is a very strange edge case that is not intended for general use.

 

Date: Thu, 07 Dec 2006 00:41:43 -0500
From: Bob <REDACTED>
Subject: dnsmasq suggestion
To: simon@thekelleys.org.uk


Hello,

I recently needed a feature in dnsmasq for a very bizarre situation. I 
placed a list of name servers in a special resolve file and told dnsmasq 
to use that. But I wanted it to try requests in order and treat NXDOMAIN 
requests as a failed tcp connection. I wrote the feature into dnsmasq 
and it seems to work. I prepared a patch in the event that others might 
find it useful as well.

Thanks and keep up the good work.

--Bob

 

For the savvy, you will notice the accepted patch actually has a bug; there is a boolean AND where a bitwise AND should be used. This was fixed in a later patch.
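To make the difference concrete, here is a small sketch of the two conditions. The flag values are hypothetical (the real constants are defined in the dnsmasq source), and the `rcode == NXDOMAIN` part of the test is left out so that only the option check is compared.

```python
# Hypothetical flag values for illustration only.
OPT_ORDER = 0x01        # strict-order
OPT_TRY_ALL_NS = 0x02   # try-all-ns


def rewrite_as_in_patch(options):
    """Mirror of the quoted condition: the second test uses logical AND ('&&')."""
    return bool(options & OPT_ORDER) and bool(options and OPT_TRY_ALL_NS)


def rewrite_as_intended(options):
    """Both tests use bitwise AND, so each flag is actually checked."""
    return bool(options & OPT_ORDER) and bool(options & OPT_TRY_ALL_NS)
```

With the logical AND, the second test is true whenever any option at all is set, so the condition collapses to "strict-order is set" and the try-all-ns flag is never really consulted. Both versions still require strict-order.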

Message 22 of 31
isaki
Apprentice

Re: R9000 DNS Performance

Here is my proposed solution:

 

  • (REQUIRED) Remove try-all-ns from /etc/dnsmasq.conf
  • (OPTIONAL, SECURITY) Upgrade to a modern version of dnsmasq
  • (OPTIONAL) If trying all servers all the time is what NetGear really wants to do, replace the try-all-ns with the proper all-servers directive (man dnsmasq(8))
  • (OPTIONAL) Provide users via the UI in ADVANCED the ability to change between sequential and parallel DNS requests (i.e. whether or not all-servers is specified either on the command line or in dnsmasq.conf).

 

The problem now is how do I get NetGear to actually work on this without having to return my current router.

 

Any ideas @schumaku?

Message 23 of 31
isaki
Apprentice

Re: R9000 DNS Performance

After a bit of poking, parallel requests are not really a "good internet citizen" thing to do anyway, so perhaps they should not be enabled by default and should come with a warning if someone wants to turn them on. Again, I leave the question of what to replace try-all-ns with up to the NetGear devs. However, regardless of the outcome of sequential vs parallel, 'try-all-ns' is not the right thing to do in all but the most esoteric situations.

Message 24 of 31
isaki
Apprentice

Re: R9000 DNS Performance

A quick update: the all-servers directive is not available on dnsmasq 2.38, so an upgrade would be required to use it. I have resolved the issue for myself, and this will work for anyone who wants a fix now. Note that this will likely not survive a firmware update; in fact, I have backed up the original file and will put it back prior to upgrading so that any migration/upgrade scripts that have to run don't have an issue with a non-standard file.

 

  1. Go to http://www.routerlogin.net/debug.htm
  2. Enable telnet access.
  3. Telnet to the router's address (do nslookup routerlogin.net if you aren't sure).
  4. Login with your admin password.
  5. cp /etc/dnsmasq.conf /etc/dnsmasq.conf.bak
  6. Remove the try-all-ns option from /etc/dnsmasq.conf
  7. Run dnsmasq -test to verify there are no issues (there should be no output).
  8. Reboot (for some reason, using /etc/init.d/dnsmasq causes the process to run as the wrong user even though your user is root, it shows up as guest in the process table; thus reboot).

For Step 6 above, you can use vi and edit the file by hand, or you can use the following one-liner:

 

grep -v try-all-ns /etc/dnsmasq.conf.bak > /etc/dnsmasq.conf
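To preview what that one-liner does before touching the router, the same filter can be sketched off-box. `strip_option` is a hypothetical helper name, and the sample text is the dnsmasq.conf shown earlier in the thread.

```python
def strip_option(config_text, option="try-all-ns"):
    """Drop every line containing `option`, like `grep -v option`."""
    kept = [line for line in config_text.splitlines() if option not in line]
    return "\n".join(kept) + "\n"


original = """# filter what we send upstream
domain-needed
bogus-priv
localise-queries

no-negcache

cache-size=0
no-hosts
try-all-ns
"""
cleaned = strip_option(original)
```

Like `grep -v`, this matches substrings, so a comment line that mentions try-all-ns would be dropped as well. The stock file shown earlier has no such line, so only the option itself is removed.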

I called NetGear and they have escalated this to their engineering org, so hopefully we see a fix in the firmware soon!

 

Note that it is not required to disable telnet; it is automatically disabled on reboot.

 

Message 25 of 31