Forum Discussion
jmozdzen
Mar 23, 2018Tutor
M4300-24x: spontaneous reboots despite latest firmware
Hi *, we're running a (until today) single M4300-24x 10G switch, with latest firmware (M4300-24X ProSAFE 20-port 10GBASE-T and 4-port 10G combo, 12.0.2.20, 1.0.0.9). Both with earlier and current ...
- Sep 20, 2018
I suggest you open a chat or an online support ticket with NETGEAR Support, describe your concern, and include the logs you have posted so they can be analyzed. If the switch is declared faulty, be ready to submit a .doc or .pdf copy of the Proof of Purchase or Sales Invoice for warranty verification. Then, if the hardware warranty is still valid, an online replacement will follow.
Regards,
DaneA
NETGEAR Community Team
jmozdzen
Mar 29, 2018Tutor
Hi DaneA,
sorry for the late reply, I've been on the road.
> From the logs you have posted, there might be a loop within the existing network where the M4300-24x switch is
> connected, since there are changes in the Spanning Tree Topology that occurred.
I don't think those are indicators for loops, but rather for actual topology changes (many LAGs across our switching landscape, and servers rebooting from time to time). I guess the reports came from the upstream switch because of the 24X's reboot.
> Let us isolate the problem. Kindly answer the questions below:
>
> a. How is everything connected? It would be best that you post an image or screenshot of your detailed network diagram.
That'd be too much detail ("everything"). Concerning the M4300, it's been that single 24X chassis with a two-port LACP uplink to a core switch (via 1 Gbps; it's the LAG 1 you see in the messages) and a number of servers connected to the 24X, again two ports per server, LACP, mostly 10 Gbps, some 1 Gbps links.
The -24X basically is a top-of-rack switch connecting dual-attached 10G servers (some still waiting to be upgraded from 1G) to the core switch.
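For anyone reproducing this setup: a two-port LAG on the M4300's FASTPATH-style CLI looks roughly like the sketch below. This is a sketch only, with placeholder port/LAG numbers; verify the exact syntax against the M4300 CLI reference manual for your firmware.

```shell
# Sketch (assumed FASTPATH-style syntax, placeholder ports 1/0/1-1/0/2):
# add two physical ports to LAG 1 for an LACP uplink.
(M4300-24X) #configure
(M4300-24X) (Config)#interface 1/0/1
(M4300-24X) (Interface 1/0/1)#addport lag 1
(M4300-24X) (Interface 1/0/1)#exit
(M4300-24X) (Config)#interface 1/0/2
(M4300-24X) (Interface 1/0/2)#addport lag 1
```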
> b. On the web-GUI of the M4300-24x switch, go to Switching > STP > Basic > STP Configuration and check the Force
> Protocol Version selected. On the STP Configuration, what is the Force Protocol Version selected?
Rapid spanning tree (IEEE 802.1w).
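(For the record, the same setting can be checked from the CLI instead of the web GUI; the command below exists in NETGEAR's FASTPATH-style CLI, but the exact output layout varies by firmware, so treat this as a sketch.)

```shell
# Show the spanning-tree summary, which includes the force version
(M4300-24X) #show spanning-tree summary
# Look for a line such as "Force Version ... IEEE 802.1w"
```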
We have now switched to a dual-chassis configuration (two 24X units side by side); let's see if we still get the reboots. But as these only occur with that multi-week delay, it'll likely be six weeks before I report back... if earlier, then it's because I got another unexpected reboot, which wouldn't be very appreciated ;) But of course I'll answer any upcoming questions before that.
Regards,
Jens
ttrue
Mar 30, 2018Aspirant
Can you download the crashlog from the switch to see what has possibly caused the crash?
The file is usually nvram:crash-log which you can copy via tftp/etc to your pc
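As a concrete sketch of that copy (the TFTP server address and target filename are placeholders; check the M4300 CLI reference for the exact `copy` syntax on your firmware):

```shell
# Copy the crash log off the switch to a TFTP server
# (192.0.2.10 and the filename are placeholders)
(M4300-24X) #copy nvram:crash-log tftp://192.0.2.10/m4300-crash-log.txt
```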
I believe that the bugfix "M4300-12X12F (XSM4324S) - device rebooted itself without any reason, customer wants to know the root cause" was actually from a ticket I had opened, as I had a couple of switches either crash and reload, or crash so that remote management was broken.
I was able to see that the crash file contained some information about the LLDP process, which is supposedly fixed in the 12.0.2.20 build.
***************************************************
* Start FASTPATH Stack Information *
***************************************************
pid: 627
TID: 0x961df300
Task Name: lldpTask
si_signo: 11
si_errno: 0
si_code: 1
si_addr: 0x58f
I should also note that I had to open a new ticket, as I ran into issues with OpenFlow when upgrading from an older build to the 12.0.2.20 firmware. The fix for now was to disable OpenFlow, as the case is still being worked on.
***************************************************
* Start FASTPATH Stack Information *
***************************************************
pid: 1771
TID: 0x97027300
Task Name: openflowProtoTask
si_signo: 11
si_errno: 0
si_code: 1
si_addr: (nil)
Date/Time: 1/1/1970 0:01:05
SW ver: 12.0.2.20
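For readers decoding these stack dumps: the `si_signo`/`si_code` fields follow the POSIX `siginfo_t` convention, so `si_signo: 11` is a segmentation fault. A small Python sketch to decode them (the `si_code` mapping below is the common Linux one for SIGSEGV, which is an assumption about the switch's platform):

```python
import signal

# si_signo 11 from the crash log is the POSIX signal number
sig = signal.Signals(11)
print(sig.name)  # SIGSEGV: invalid memory reference

# Common Linux si_code values for SIGSEGV (assumed mapping,
# not taken from NETGEAR documentation)
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped to object)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}
print(SEGV_CODES.get(1, "unknown"))  # si_code 1 in both dumps above
```

So both dumps (lldpTask at address 0x58f, openflowProtoTask at a nil address) point at dereferences of unmapped, near-null pointers.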
I'm curious what your crash log may contain, as in my case the issues I ran into all seemed to point to something in the crash log. It may be helpful for you/support to take a look at that file.
- jmozdzenApr 09, 2018Tutor
Hi ttrue,
thank you for taking the time to respond.
> Can you download the crashlog from the switch to see what has possibly caused the crash?
> The file is usually nvram:crash-log which you can copy via tftp/etc to your pc
Instead of that file, I see a directory named "crashlogs":
--- cut here ---
(M4300-24X) #dir
1 drwx 3576 Mar 25 2018 15:43:14 .
1049 drwx 0 Nov 13 2017 05:59:32 ..
[...]
72 drwx 160 Jan 01 2016 00:00:01 crashlogs
[...]
--- cut here ---
Judging from the time stamp of that directory, it is likely empty - but I've not found a way to verify that.
Regards,
Jens
- ttrueApr 09, 2018Aspirant
I don't think the size of that folder is actually an indicator of the crash.
The switch that crashed for me looks the same way, but the command that downloads the logs grabs the file you need.
Here is what mine looks like that crashed.
70 drwx 488 Jan 01 1970 00:00:25 crashlogs
1022 drwx 0 Jan 01 1970 00:00:36 ramcp
1003 -rw- 3833 Mar 11 2018 13:15:23 fastpathRun.cfg
76 -rw- 596 Jan 01 1970 00:00:40 hpc_broad.cfg
796 -rw- 495 Jan 01 2016 00:11:14 coredump_log.old
986 -rw- 493 Mar 03 2018 15:18:06 coredump_log.txt
The file that matches up with the crash time is coredump_log.txt.
If you have a TFTP server, use the copy nvram:crash-log command to grab that file, as it may contain a bit more information on what caused your crash. The switches I have been using have been good so far with the 12.0.2.20 firmware (I have noticed they released 12.0.4.8, but it seems to be for the new model, and there weren't a lot of fixes that looked important).
- jmozdzenApr 09, 2018Tutor
Hi ttrue,
it may well be that the core dump is lost because I not only restarted the switch multiple times since the crash, but also turned it into a stack member.
> use the copy nvram:crash-log command to grab
That command succeeds, but leaves me with a zero-byte file on the TFTP server (access to the file is OK; I double-checked by copying the running configuration to the same destination). That sounds reasonable, since I have no core dump in my "dir" output, which was the reason I was looking at the "crashlogs" directory in the first place. BTW, what caught my eye was the time stamp, not the size.
Seems like I'll have to wait for the next occurrence of the problem, if it happens again at all. Now that I have a redundant stack, rebooting a single chassis shouldn't hurt, so "Murphy" says no spontaneous reboots anymore ;)
Thanks again,
Jens