jkberry924
Jun 01, 2024 (Follower)
WAX630 - connected devices disconnect, "reconnect", cannot pull/renew DHCP without restarting radios
Buckle up. This one is wild. I'll preface this with the fact that I've been in all realms of IT for about 20 years. This problem has me literally chasing a ghost in my network, and short of "firmware bug" I'm borderline stumped.
Setup:
WAX630E (FW v10.8.6.3) - Trunked back to pfSense firewall
SSID1: VLAN 1010, 5/6 GHz
SSID2: Multi-PSK (2), VLAN 1921, 1922, 2.4/5 GHz
SSID3: VLAN 1722, 5/6 GHz
Backhaul: Tagged, VLAN 1720 (mgmt)
pfSense:
DHCP pools x.5-x.50 on all 4 wireless VLANs, as well as several other Ethernet VLANs.
DNS Resolver for all VLANs.
I'll try to shorten this, but it's been a journey. Throughout all this troubleshooting I've gone back and forth between an AP issue and a firewall (DNS/DHCP) issue, and I'm settling on AP/firmware.

A few months back, the wife's older MacBook would say it was connected to the network (SSID2, VLAN 1921) but not be able to browse the internet. Granted, there would be weeks between her uses and this was the only affected device at the time, so I'd finagle it to get back online, and if I couldn't, I'd connect it to VLAN 1922 (same SSID, different PSK). This was the first symptom. Over time we noticed some other oddities, but I believe those wound up being Apple's 'Private IP Address' BS. Eventually the issue started occurring pretty regularly with the wife's iPhone and MacBook, and then it began happening to the Apple TV as well, essentially anything on that VLAN.

What I concluded was that at seemingly random times these devices would fall off the network, and although they still showed an active DHCP lease and IP, they would not have network connectivity. After disconnecting/reconnecting/rebooting, the devices would authenticate to the AP but not be able to pull a new IP from DHCP. Then this began happening daily, but only on VLAN 1921. The other VLAN on the same SSID, as well as the other SSIDs, never showed this issue. Over the months I made plenty of small changes and tweaks that I thought fixed it, but each fix turned out to be temporary. What I now know is that applying each of those changes restarted the radios, and the devices would reconnect. For instance, when I turned 'DHCP broadcast to unicast' off, the radios would restart and devices would reconnect. The next time, I'd turn it back on, and they'd again reconnect.
Fast forward to this weekend: I decided I was going to dive in and fix this issue instead of dealing with the symptoms. I factory-reset the AP and rebuilt the config manually. Devices seemed to reconnect, but within an hour or so the issue surfaced again, and then some. Again, only VLAN 1921 was affected, but a new issue surfaced that affected all wireless VLANs, something related to DNS. When I set public DNS statically, devices worked, mostly. But for whatever reason, seemingly random domains were not resolvable from wireless clients, yet were resolvable from Ethernet clients. This maddening new symptom turned out to be the AP changing the destination port for DNS from UDP/53 to UDP/12000. I'm not even joking, and I have the logs to show it. Leaving the wireless devices, the lookups were destined for UDP/53, and sniffing the traffic on the backhaul showed the port had changed to UDP/12000. I confirmed this in the pfSense logs, and even put a band-aid fix in place to NAT UDP/12000 -> UDP/53, and it worked. At this point I downgraded the firmware to a 10.5.x release from last summer and confirmed the issue was still present. I assumed hardware failure and bought a brand new WAX630E to replace it.
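For anyone wanting to repeat the client-side vs. backhaul comparison, here is a rough sketch of how the two captures could be tallied by destination port. It assumes text output from `tcpdump -n` in the usual `IP SRC.SPORT > DST.DPORT:` shape; the function name `dest_ports`, the resolver address `192.0.2.53`, and the sample lines are all made up for illustration and are not taken from the actual captures.

```python
import re
from collections import Counter

# Assumed `tcpdump -n` line shape: "... IP SRC.SPORT > DST.DPORT: ..."
FLOW_RE = re.compile(r"IP (\S+)\.(\d+) > (\S+)\.(\d+):")


def dest_ports(lines, resolver_ip):
    """Count destination ports of packets sent to resolver_ip."""
    ports = Counter()
    for line in lines:
        m = FLOW_RE.search(line)
        if m and m.group(3) == resolver_ip:
            ports[int(m.group(4))] += 1
    return ports


# Hypothetical sample lines (not real capture data).
CLIENT_SIDE = [
    "IP 192.168.1.13.51234 > 192.0.2.53.53: 4660+ A? example.com. (29)",
    "IP 192.168.1.13.51235 > 192.0.2.53.53: 4661+ AAAA? example.com. (29)",
]
BACKHAUL_SIDE = [
    "IP 192.168.1.13.51234 > 192.0.2.53.12000: UDP, length 29",
    "IP 192.168.1.13.51235 > 192.0.2.53.12000: UDP, length 29",
]

if __name__ == "__main__":
    print("client side:", dict(dest_ports(CLIENT_SIDE, "192.0.2.53")))
    print("backhaul   :", dict(dest_ports(BACKHAUL_SIDE, "192.0.2.53")))
```

If the client-side tally shows only port 53 while the backhaul tally shows a different port for the same resolver, the rewrite is happening somewhere between the radio and the uplink.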
After receiving the new device, I configured it manually to the same setup used previously and validated that the DNS issue was resolved: DNS traffic was not altered, and the seemingly random domains that were not resolvable from wireless now were. However, within a couple of hours, the exact same issue presented itself, with devices on VLAN 1921 losing connectivity and not being able to reconnect. On the firewall side, I have completely ripped out everything related to that VLAN and recreated it, using different VLAN numbers, IP space, everything, and I did the same on the AP. The issue followed that specific VLAN regardless. Again, no other SSIDs/VLANs are affected, and the issue doesn't follow the devices. I've moved those devices to other SSIDs and other devices to that one, and the issue stays at SSID2:VLAN1921. I also removed the multi-PSK configuration and left the SSID standalone, but got the same issue. Finally, I deleted all the other SSIDs, added them to SSID2 as additional multi-PSKs, and tagged the VLANs appropriately. In this configuration ALL devices on all VLANs were affected. Looking at the logs, this is the most telling line I can find that maybe points to something firmware related, or at least confirms the issue is with the radios somehow: the last part of a successful authentication, followed by 'get_cur_channel:'
nddmp[6423]: alarm : seqNo-[2981], level-[OTHER], info-[{"ip":"192.168.1.13","mac":"BC-D0-74-01-05-A1","hname":"[REDACTED_HOSTNAME]","txR":"6 Mbps","rxR":"270.80 Mbps","ssid":"Nothing","dOs":"Mac OS X","dType":"Laptop/PC","mode":"11AX","status":"Authenticated","bssid":"E0-46-EE-54-65-01","chnl":"100","chWidth":"20/40/80/160 MHz","rssi":"36","state":"QOS/HT/VHT/HE","type":"wpa2","idle":"0","time":"00:00:00","txB":"594","rxB":"0","pmf":"none","vlanID":"1921","Username":"Guest","radio":"1"}]
'get_cur_channel: None of the interfaces are in UP state for wifi2'
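To make lines like the one above easier to compare across clients and VLANs, here is a minimal sketch that pulls the JSON payload out of the `info-[...]` part of an `nddmp` alarm line. It assumes the payload is always valid JSON wrapped exactly as in the line quoted above; the helper names `parse_alarm` and `summarize` are hypothetical, and `SAMPLE` is a shortened copy of that line with only some of its fields.

```python
import json
import re

# Assumes the payload is a JSON object wrapped as: info-[{...}]
ALARM_RE = re.compile(r"info-\[(\{.*\})\]")

# Shortened copy of the alarm line quoted above (subset of fields).
SAMPLE = (
    'nddmp[6423]: alarm : seqNo-[2981], level-[OTHER], '
    'info-[{"ip":"192.168.1.13","mac":"BC-D0-74-01-05-A1",'
    '"hname":"[REDACTED_HOSTNAME]","status":"Authenticated",'
    '"chnl":"100","vlanID":"1921","radio":"1"}]'
)


def parse_alarm(line):
    """Return the alarm payload as a dict, or None if the line has none."""
    m = ALARM_RE.search(line)
    if m is None:
        return None
    return json.loads(m.group(1))


def summarize(line, fields=("mac", "status", "vlanID", "chnl", "radio")):
    """Keep only the fields useful for spotting a per-VLAN pattern."""
    info = parse_alarm(line)
    if info is None:
        return None
    return {key: info.get(key) for key in fields}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running this over the full log and grouping by `vlanID` and `radio` might show whether the `get_cur_channel` message always follows authentications on the same radio.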
I am exhausted, and I welcome any questions, suggestions, or kind words lol. I still have the old AP with the weird DNS issue, and have all the logs.
1 Reply
- MarkTonnaer (Aspirant)
I am not sure if this information can help you, but I have been struggling since June with WAX620 devices. My network has 3 APs and 4 VLANs. I have tried a lot and have been searching for weeks and weeks. Netgear support is involved, and in my case one device was also swapped for another.
I use Multi-PSK in a simple way to route to 3 VLANs (the 4th is the management VLAN).
The problems I experienced were that the APs crashed and rebooted constantly. They also constantly lost their connection to the Insight portal. I switched to local management mode and reconfigured the 3 APs manually. At this moment I am running the latest FW and the problems have come back. When I connect our Chromecast, the AP it is connected to crashes and reboots, or sometimes I have to cut the power to reboot the device manually.
When I connect my Apple iPad to the WiFi and open my email, the same thing happens (to another AP).
I decided to make a test WiFi network, link it directly to VLAN10, and not use Multi-PSK for that SSID. When I connect the iPad (Air 4th gen) to that SSID there is no problem and everything is stable.
Before I did this, I also tested it with a USB-Ethernet adapter, and that worked as well (just to confirm that it was not related to the rest of my network).
I am not sure if this information will help you with the root cause, but I am convinced that there is a Multi-PSK related bug in the latest FW for the WAX devices (I have the WAX620).
Netgear is aware, and I asked them to research this problem and give feedback as soon as possible. I am curious whether they will find something.