Forum Discussion
JinTu
Aug 26, 2021 | Star
LM1200 repeatedly dropping link
I recently purchased an LM1200-100NAS to serve as an LTE failover connection for my homelab and noticed a strange issue while setting up my pfSense-based router to use the LM1200 as a secondary WAN c...
labellama
Mar 18, 2022 | Tutor
JohnPeng et al.
What a troubling issue indeed. Please read below for what I've discovered. If you've been following this topic for months, you'll find it quite interesting.
The AT&T hotspot SIM works fine using Bridge Mode in my LM1200: no link dropping and no lost IP addresses. This we know already.
All of my testing was done using a MikroTik hAP ac2 running 7.1.3 software for reference.
With the Verizon hotspot SIM from our business account, the LM1200 in Bridge Mode receives an IP address, then drops the Ethernet link approximately 26 seconds later and of course loses the IP address. Approximately 23 seconds after that, the Ethernet link recovers and the cycle repeats. This is the first time I've seen the behavior documented to this level of detail.
I decided to leave the hAP ac2 running all night, using the LM1200 with the Verizon SIM and with no devices connected to the router. Curiously, the Verizon LTE connection never dropped while no other devices were connected. Further, the ZeroTier VPN connection stayed stable and I was able to remotely access the hAP ac2 without issue. This behavior was most curious, and so began my attempt at a theory of why. It had something to do with devices connected behind the NAT of the hAP ac2. Sure enough, when I connected a device to the router the next morning, the LM1200's connection began the drop-and-reconnect behavior described above.
What is different when the router itself connects via the Verizon IP vs. when a device connects through the NAT of the router? Should be nothing, right? A proper NAT translates the device IPs behind it, and Verizon shouldn't know any better. How could Verizon know? After digging back into the IP header theory from my college days, I had my answer (it took a few days): the TTL value was wrong. A router decrements the TTL of every packet it forwards, so traffic from devices behind the hAP ac2 leaves with a TTL one lower than traffic the router originates itself.
I added a rule in the hAP ac2 that adds 1 to the TTL of all IP packets passing through the NAT. After doing that, the Bridge Mode connection became solid: no cycling of the Ethernet link, and no IP addresses releasing and renewing. Of course, upon removing the rule that adds 1 to the TTL, the connection is dysfunctional again as described above. After dozens of tests, the behavior is controllable and predictable based upon the TTL change.
Ok, so you're thinking the Netgear programmers just need to fix this by adding 1 to the TTL when the LM1200 is in Bridge Mode. Wrong!
Just for giggles (does anyone really giggle in these moments?) I decided to really go for it and see what the heck was going on, and whether I could determine whose issue it was now that I'd confirmed what it was. No need to blame Netgear if they were not at fault. Blaming them is easy, and it would be wrong.
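To make the arithmetic concrete, here is a minimal sketch of what the rule changes. It is a toy model, not the hAP ac2 configuration: the starting TTL of 64 and the names `forward`, `DEFAULT_TTL`, and `adjust` are assumptions for illustration, and whether Verizon actually keys on the TTL value is my inference from the tests, not something Verizon has confirmed.

```python
# Toy model only: how one forwarding hop changes the IPv4 TTL.
# DEFAULT_TTL = 64 is an assumed initial value (common on many hosts).
DEFAULT_TTL = 64


def forward(ttl: int, adjust: int = 0) -> int:
    """Return the TTL a packet carries after one router hop.

    A router decrements the TTL by 1 when it forwards a packet.
    `adjust` is a hypothetical parameter modelling an extra rule on the
    same router (for example +1) applied to forwarded traffic.
    """
    if ttl <= 1:
        raise ValueError("TTL expired: the router would drop this packet")
    return ttl - 1 + adjust


# Traffic the router originates itself is not forwarded, so TTL is unchanged.
router_own_traffic = DEFAULT_TTL
# Traffic from a device behind the NAT is forwarded once.
lan_device_plain = forward(DEFAULT_TTL)
# Same traffic with the "add 1 to TTL" rule in place.
lan_device_fixed = forward(DEFAULT_TTL, adjust=1)

print(router_own_traffic, lan_device_plain, lan_device_fixed)
```

With the rule in place, forwarded packets leave with the same TTL as packets the router sends on its own, which matches the overnight test where the link stayed up with no devices connected.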
Since the hAP ac2 has a USB port and supports the Verizon USB730L functioning as a pass-through bridge device, I decided to run an experiment with the same Verizon SIM used in the testing with the LM1200. Surprisingly, both the USB730L and the LM1200 exhibit the same behavior: the connection is unstable with an unmanipulated TTL, and adding 1 to the TTL stabilizes it.
Verizon's network used to work properly without TTL manipulation with the USB730L and the hAP ac2. At some point it stopped working. I thought it was a bug in the hAP ac2's software and figured they would hopefully fix it someday. It wasn't the hAP ac2's software. Something has changed in Verizon's network. I'm not sure when it changed; it's been a while. This is Verizon's issue, not Netgear's.
Unfortunately, I don't know how to get this to the right folks at Verizon. Most people who hear this are going to think it's network gibberish when attempting a support call. I tried to get help from Netgear regarding the device and was promised a call back the next day, the call never came and my Case #: 45696214 was closed in 7 days because I was unresponsive. This is where Netgear is at fault. I'm utterly disappointed in Netgear support, closing the case with no call back is absolutely unacceptable.
Hopefully this helps someone.
--Mike
- JinTu, Mar 20, 2022 | Star
labellama, this is an interesting discovery. I just did a quick check, and by default my pfSense-based router sets the TTL (and hop limit for IPv6) to 64 for the gateway health checks, but this varies for other traffic on failover. What is the minimum TTL/hop limit that you are seeing on egress from the hAP ac2 after making this change?
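For anyone wanting to answer that from a packet capture, the value sits at a fixed offset in the IP header. A rough sketch, assuming you already have the raw IP packet bytes (no Ethernet framing); the function name `ttl_or_hop_limit` and the sample packets are made up for illustration:

```python
def ttl_or_hop_limit(packet: bytes) -> int:
    """Return the IPv4 TTL or IPv6 hop limit from a raw IP packet."""
    if not packet:
        raise ValueError("empty packet")
    version = packet[0] >> 4  # high nibble of the first byte
    if version == 4:
        if len(packet) < 20:
            raise ValueError("truncated IPv4 header")
        return packet[8]  # TTL is the 9th byte of the IPv4 header
    if version == 6:
        if len(packet) < 40:
            raise ValueError("truncated IPv6 header")
        return packet[7]  # hop limit is the 8th byte of the IPv6 header
    raise ValueError(f"unknown IP version {version}")


# Hand-built sample headers (documentation addresses, checksum left as zero).
sample_v4 = bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 63, 1, 0, 0,
                   192, 0, 2, 1, 192, 0, 2, 2])
sample_v6 = bytes([0x60, 0, 0, 0, 0, 0, 59, 64]) + bytes(32)

print(ttl_or_hop_limit(sample_v4), ttl_or_hop_limit(sample_v6))
```

Running this over packets captured on the WAN side, and taking the minimum, would give the number asked about above.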