M4300 - Switches Reboot from Chrome when accessing by FQDN name - All the way back to 12.0.7.9.

TL;DR: My desktop Chrome browser has the magical ability to force any M4300 stack in my environment to reboot just by accessing it by FQDN, e.g. http://switchname.company.com. Other posts and threads from the past indicate that users had "random" reboots over a period of months, and then they would stop.

For many months now we have had random cases where the environment becomes unstable. Switch stacks will reboot, or we find them in a split-brain scenario, i.e. both members are master. I have opened support cases in the past when I experienced these events, thinking it was spanning tree.
We would upload logs, everything would look fine, then momentum would slow down and the problem would go away.

Last Friday around 4pm, three of our four M4300 stacks went down. After I attempted to log in, I went into the server room and found them all in a split-brain situation. I had to power-cycle both switches and take one half offline. I would then work to get things stable. When things were not rebooting, I would return to my desk and monitor. As soon as I refreshed my connection to the switches, the switch would go down, with a cascade effect on any VMs depending on that switch.

I then spent the entire weekend migrating all our critical VMs to our offsite DR location. All the while, one of our switches continued to reboot every 30 minutes, slowing all progress down. After a full day working on this remotely on Sunday, I had to come in and work at the office until 4:30 am Monday. I finally got the last switch to stop rebooting by completely disconnecting the uplink from our core stack.

Later on Monday, at 10:30, I returned to the office and started to look into what might be causing this. The network crashed again. This time it was our core switch, which impacted all VPN/RDP users. Thinking that it was an STP issue again, we removed EVERYTHING from the core stack except for Infoblox (DHCP/DNS) and our internet connection. We pulled every single wire from our floor and edge environment. We then proceeded to network only the users with workstations at the office. We added back one other critical M4300 switch managing VMs. Both the core and the VM switch rebooted a few more times throughout the evening, and then we were stable.

Tuesday, I returned to the office and started to monitor our environment by attempting to connect to our core stack, and the switch rebooted again. Assuming the problem was still the environment, I pulled our VM switch off the core.

Troubleshoot, troubleshoot… I waited a few hours, looking into other problems. I attempted again to log into the web UI of the core. Network drops… go into server room, switch returns, log in, and nothing looks amiss. Hour goes by, return to my desk, attempt to log in, network goes out. Bingo! Every time I attempt to connect to any of our switches by name, e.g. http://switch01.company.com, the switch crashes and restarts. Try this on any other M4300 in our environment: same thing.

We took 3 x M4300 that we had unused and set them up in a lab. I connected the server room laptop and my desk workstation to one switch at a time and started monitoring by ping. I could connect from both devices by IP on the web interface. I could connect by name on the laptop. When I tried by name from my desktop: instant crash. Move to the next switch… repeat. Next switch… repeat.
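For anyone repeating the lab test, here is a minimal sketch of the "monitor by ping" step that records when the switch drops and when it comes back, so the timestamps can be lined up with browser activity. The function names `ping_once` and `watch` are made up for this example, and it assumes the system `ping` command is on the PATH. It only sends ICMP and never touches the web UI.

```python
import platform
import subprocess
import time


def ping_once(host, timeout_s=2.0):
    """Send one ICMP echo via the system ping command; True if it exits with 0."""
    # Windows uses -n for the packet count, Linux and macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def watch(probe, checks, interval_s=1.0, clock=time.time, sleep=time.sleep):
    """Call probe() `checks` times and return (timestamp, "UP" or "DOWN")
    for the first sample and for every change of state after it."""
    events = []
    last = None
    for i in range(checks):
        state = "UP" if probe() else "DOWN"
        if state != last:
            events.append((clock(), state))
            last = state
        if i < checks - 1:
            sleep(interval_s)
    return events
```

Usage would look like `watch(lambda: ping_once("192.0.2.10"), checks=600)`, where 192.0.2.10 is a placeholder for the switch's management IP. One caveat: on Windows, `ping` can exit with 0 on a "destination host unreachable" reply, so treat the result as a rough signal only.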

Note: The repeated reboots were due to my desktop Chrome session randomly and/or constantly refreshing the failed connections. If I leave my desktop connected, the crashes just keep happening at random.

Break fix attempts:
Change password – After password change = Crash. After restart login confirmed on laptop with new password – Try again from desktop = Crash.

HTTP Disable – Blocked access but not a good fix.

HTTP port change – Moved from port 80 to 1025. On desktop I had to modify the URL to http://switch01.company.com:1025/v1/base/cheetah_login.html = Crash.

Fix (Tried one time as of this writing):

Modified http://switch01.company.com:1025/v1/base/cheetah_login.html to http://%IPaddress%/v1/base/cheetah_login.html.
This then loaded the login page as I would expect. Attempting to go back to the FQDN login no longer caused the reboot.
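The URL change above (same path, IP in place of the name) can be sketched as a small helper for rebuilding bookmarks or scripted checks. This is only an illustration: `with_host` is a made-up name, 192.0.2.10 is a placeholder address, and the sketch assumes an IPv4 address and a URL without embedded credentials. It keeps whatever port the original URL had.

```python
from urllib.parse import urlsplit, urlunsplit


def with_host(url, new_host):
    """Return `url` with its hostname replaced by `new_host`.

    The scheme, port, path, query and fragment are left unchanged.
    """
    parts = urlsplit(url)
    # Re-attach the port only if the original URL specified one.
    netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# 192.0.2.10 is a placeholder, not an address from this environment.
print(with_host("http://switch01.company.com:1025/v1/base/cheetah_login.html", "192.0.2.10"))
```

This only rewrites the string; it does not explain why loading the page by IP first stopped the FQDN crash.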

Break it again:

Opened up Chrome and kept spamming Ctrl+Shift+T to reopen all of my recent history. Moved a tab into a new page, launched a new FQDN connection. After many tries the crash began to happen again.

I used this opportunity to get more logs, do a firmware upgrade, crash it again, and collect more logs.

Next I am going to try and repro this on different hardware workstations and laptops using the above break method.

5 Replies

Waywatcher
Aspirant
May 05, 2021
Netgear Testing:
New Name - Changed the bottom switch name to xxx-test. Rebooted. Updated hosts on desktop. Attempted to access xxx-test.company.com and was able to access the webpage without a crash. Opened multiple windows and on the 3rd window I signed in. CRASH!
After the restart I canceled the retry on the browser that crashed it. Moved over to another window that was not using the cheetah_login.html path. Refresh = crash #2.
After power off continued to crash.

Netgear Testing:
PW Change - Changed the password, saved config, restarted the switch. Confirmed password held. Desktop refresh = Crash. Reboot and desktop refresh = Crash. Powered the machine off completely. After reboot and desktop refresh = Crash.

After a full power down and refresh = Crash.

Netgear Testing:
Factory Default - Reset bottom switch to factory default. Changed IP to 1company IP. Crash resumed. Changed subnet on switch to 255.255.255.0. Could not discover the switch from desktop. Set desktop to the same subnet; it was flagged as out of range for our subnet and did not help me connect to the switch. Reverted both subnets on switch and desktop, crash resumed.

Netgear Testing:
Mid Switch - 12.0.2.20 and 12.0.4.9 = No crash. Multiple attempts to repro the crash without luck.
Bot Switch - Still factory default, FW changed to 12.0.7.17. The web address is still using the older value, without the /v1/ that we are seeing in 12.0.8.15 and up = Crash.
I moved back to the mid switch to see whether it was still safe, and could not force a crash.
2nd try on 12.0.7.17 = crash.
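To narrow down where the crash was introduced, the firmware results in this reply can be kept as data and bracketed. This is a rough sketch and the helper names are invented for the example. It only includes the three versions tested above. Versions are compared as integer tuples because plain string comparison would sort "12.0.10.x" before "12.0.4.x".

```python
def parse_version(text):
    """Turn a dotted version such as '12.0.7.17' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Results reported in this reply: True means the crash was reproduced.
observed = {
    "12.0.2.20": False,
    "12.0.4.9": False,
    "12.0.7.17": True,
}


def bracket(results):
    """Return (newest version without a crash, oldest version with a crash)."""
    good = [parse_version(v) for v, crashed in results.items() if not crashed]
    bad = [parse_version(v) for v, crashed in results.items() if crashed]
    return (max(good) if good else None, min(bad) if bad else None)


last_good, first_bad = bracket(observed)
print("newest clean:", last_good, "oldest crashing:", first_bad)
```

Any release between those two would be the next candidate to test.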
- schumaku
  Guru - Experienced User
  May 06, 2021
  LaurentMa please.
- Waywatcher
  Aspirant
  May 06, 2021
  Netgear Testing:
  New Name - Changed the bottom switch name. Rebooted. Updated hosts on desktop. Attempted to access the new FQDN and was able to access the webpage without a crash. Opened multiple windows and on the 3rd window I signed in. CRASH!
  After the restart I canceled the retry on the browser that crashed it. Moved over to another window that was not using the cheetah_login.html path. Refresh = crash #2.
  After power off continued to crash.
  
  Netgear Testing:
  PW Change - Changed the password, saved config, restarted the switch. Confirmed password held. Desktop refresh = Crash. Reboot and desktop refresh = Crash. Powered the machine off completely. After reboot and desktop refresh = Crash.
  
  After a full power down and refresh = Crash.
  
  Netgear Testing:
  Factory Default - Reset bottom switch to factory default. Changed IP to 1company IP. Crash resumed. Changed subnet on switch to 255.255.255.0. Could not discover the switch from desktop. Set desktop to the same subnet; it was flagged as out of range for our subnet and did not help me connect to the switch. Reverted both subnets on switch and desktop, crash resumed.
  
  Netgear Testing:
  Mid Switch - 12.0.2.20 and 12.0.4.9 = No crash. Multiple attempts to repro the crash without luck.
  Bot Switch - Still factory default, FW changed to 12.0.7.17. The web address is still using the older value, without the /v1/ that we are seeing in 12.0.8.15 and up = Crash.
  I moved back to the mid switch to see whether it was still safe, and could not force a crash.
  2nd try on 12.0.7.17 = crash.
  
  Model: XSM4348CS|M4300-48X - Stackable Managed Switch with 48x10GBASE-T
  - Waywatcher
    Aspirant
    May 06, 2021
    Update:
    I created a stack in our test environment to get some data points. When I attempt to access the stack via FQDN, it goes into split-brain mode: both members become master and no reboot happens. Both switches will sit there indefinitely, unable to be accessed. You must reboot both to get the stack functional again.
    Tested twice.
