Mar 23, 2018
Solved

M4300-24x: spontaneous reboots despite latest firmware

Hi *,

we're running a (until today) single M4300-24x 10G switch, with the latest firmware (M4300-24X ProSAFE 20-port 10GBASE-T and 4-port 10G combo, 12.0.2.20, 1.0.0.9). With both the earlier and the current firmware, we have experienced unexpected reboots every few weeks (typically after about four weeks of uptime, but I've seen this after two weeks, too).

 

The remote syslog does not report anything special, e.g. for the most recent reboot (approx. Mar 22 15:34):

 

--- cut here ---

Mar 22 12:57:26 s-22455-02-05 TRAPMGR[trapTask]: traputil.c(721) 56609 %% Link Up: lag 10
Jan 1 00:01:11 s-22455-02-05 TRAPMGR[dot1s_task]: traputil.c(763) 416 %% Spanning Tree Topology Change Received: MSTID: 0 lag 1
Jan 1 00:01:12 s-22455-02-05 TRAPMGR[dot1s_task]: traputil.c(763) 417 %% Spanning Tree Topology Change Received: MSTID: 0 lag 1
Jan 1 00:01:14 s-22455-02-05 TRAPMGR[dot1s_task]: traputil.c(763) 418 %% Spanning Tree Topology Change Received: MSTID: 0 lag 1
Jan 1 00:01:15 s-22455-02-05 TRAPMGR[dot1s_task]: traputil.c(763) 419 %% Spanning Tree Topology Change Received: MSTID: 0 lag 1
Jan 1 00:01:16 s-22455-02-05 TRAPMGR[dot1s_task]: traputil.c(763) 420 %% Spanning Tree Topology Change Received: MSTID: 0 lag 1
Jan 1 00:01:18 s-22455-02-05 TRAPMGR[dot1s_task]: traputil.c(763) 421 %% Spanning Tree Topology Change Received: MSTID: 0 lag 1
Jan 1 00:01:46 s-22455-02-05 TRAPMGR[SNMPCfgTask]: traputil.c(763) 422 %% Cold Start: Unit: 0
Mar 23 06:38:36 s-22455-02-05 CLI_WEB[emWeb]: emweb_common_custom.c(162) 467 %% HTTP Session 11 started for user admin connected from 192.168.103.12

---- cut here ---

 

So no pointer either before or after the reboot.
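
One thing the excerpt does show is that the timestamps fall back from Mar 22 to Jan 1 and that a "Cold Start" trap follows, presumably because the switch clock starts from a default value after boot until time is synchronized again. The following is a minimal Python sketch, not a NETGEAR tool, that flags both patterns in a collected syslog file. The function name `find_reboot_hints` and the hint labels are made up for illustration, and it assumes the plain "Mon DD HH:MM:SS host message" line format seen above.

```python
import re

# Assumed line format, taken from the excerpt: "Mar 22 12:57:26 host message..."
LINE_RE = re.compile(
    r"^(?P<mon>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+(?P<msg>.*)$"
)
MONTHS = {
    name: number
    for number, name in enumerate(
        "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1
    )
}


def find_reboot_hints(lines):
    """Return (kind, line) tuples for likely reboot indicators.

    "clock_reset": the timestamp jumped backwards to Jan 1; the reported line
                   is the last message logged before the jump.
    "cold_start":  the message contains a "Cold Start" trap.
    Note: a genuine Dec 31 -> Jan 1 rollover would also be flagged.
    """
    hints = []
    prev_key = None
    prev_line = None
    for raw in lines:
        line = raw.strip()
        match = LINE_RE.match(line)
        if not match or match["mon"] not in MONTHS:
            continue
        key = (MONTHS[match["mon"]], int(match["day"]), match["time"])
        if prev_key is not None and key < prev_key and key[:2] == (1, 1):
            hints.append(("clock_reset", prev_line))
        if "Cold Start" in match["msg"]:
            hints.append(("cold_start", line))
        prev_key, prev_line = key, line
    return hints


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python find_reboots.py switch-syslog.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for kind, text in find_reboot_hints(handle):
            print(kind, "|", text)
```

The last message before a "clock_reset" hint gives the latest time at which the switch was known to be up, which can help narrow down the reboot window when nothing else is logged.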

 

There has been no special (network / system) activity around the reboots, nor a pattern we could detect. Nobody was logged on to the switch at the time of reboot.

 

The release notes from 12.0.2.20 suggested that somebody reported similar behaviour: "M4300-12X12F (XSM4324S) - device rebooted itself without any reason, customer wants to know the root cause."

 

Any ideas on how to proceed?

 

Regards,

Jens

  • DaneA
    Sep 20, 2018

    jmozdzen,

     

    I suggest that you open a chat or online support ticket with NETGEAR Support, describe your concern, and include the logs you have posted so that they can be analyzed. If the switch is declared faulty, be ready to submit a .doc or .pdf copy of its Proof of Purchase or Sales Invoice for warranty verification. If the hardware warranty is still valid, an online replacement will follow.

     

     

    Regards,

     

    DaneA

    NETGEAR Community Team

8 Replies

  • DaneA
    NETGEAR Employee Retired

    Hi jmozdzen,

     

    From the logs you have posted, there might be a loop within the existing network where the M4300-24x switch is connected, since Spanning Tree topology changes occurred.

     

    Let us isolate the problem.  Kindly answer the questions below:

     

    a. How is everything connected?  It would be best that you post an image or screenshot of your detailed network diagram. 

    b. On the web GUI of the M4300-24X switch, go to Switching > STP > Basic > STP Configuration. Which Force Protocol Version is selected there?

     

    The release notes from 12.0.2.20 suggested that somebody reported similar behaviour: "M4300-12X12F (XSM4324S) - device rebooted itself without any reason, customer wants to know the root cause."

    This is one of the fixes that should take effect after upgrading the firmware of the M4300-12X12F (XSM4324S) switch to v12.0.2.20.

     

     

    Regards,


    DaneA

    NETGEAR Community Team

    • jmozdzen
      Tutor

      Hi DaneA,

       

      sorry for the late reply, I've been on the road.

       

      > From the logs you have posted, there might be a loop within the existing network where the M4300-24x switch is

      > connected since there are changes in the Spanning Tree Topology that occurred.

       

      I don't think those are indicators of loops, but rather of actual topology changes (many LAGs across our switching landscape, and servers rebooting from time to time). I guess the reports came from the upstream switch because of the 24X's reboot.

       

      > Let us isolate the problem.  Kindly answer the questions below:

      >

      > a. How is everything connected?  It would be best that you post an image or screenshot of your detailed network diagram. 

       

      That'd be too much detail ("everything"). Concerning the M4300, it's been that single 24X chassis with a two-port LACP uplink to a core switch (via 1 Gbps, it's the LAG 1 you see in the messages) and a number of servers connected to the 24X, again two ports per server, LACP, mostly 10 Gbps, some 1 Gbps links.

       

      The -24X basically is a top-of-rack switch connecting dual-attached 10G servers (some still waiting to be upgraded from 1G) to the core switch.

       

      > b. On the web-GUI of the M4300-24X switch, go to Switching > STP > Basic > STP Configuration and check the Force

      > Protocol Version selected.  On the STP Configuration, what is the Force Protocol Version selected?

       

      Rapid spanning tree (IEEE 802.1w).

       

      We have now switched to a dual-chassis configuration (two 24X units side by side), so let's see if we still get the reboots. But as these only occur with that multi-week delay, it'll likely be 6 weeks before I report back... if earlier, then because I got another unexpected reboot, which wouldn't be much appreciated ;) But of course I'll answer any upcoming questions before that.

       

      Regards,

      Jens

      • ttrue
        Aspirant

        Can you download the crash log from the switch to see what may have caused the crash?

        The file is usually nvram:crash-log, which you can copy to your PC via TFTP or similar.

         

        I believe that the bugfix "M4300-12X12F (XSM4324S) - device rebooted itself without any reason, customer wants to know the root cause" was actually a ticket I had opened, as I had a couple of switches that either crashed and reloaded, or crashed in a way that broke remote management.

         

        I was able to see that the crash file contained some information about the LLDP process, which is supposedly fixed in the 12.0.2.20 build.

         

        ***************************************************
        * Start FASTPATH Stack Information *
        ***************************************************
        pid: 627
        TID: 0x961df300
        Task Name: lldpTask
        si_signo: 11
        si_errno: 0
        si_code: 1
        si_addr: 0x58f

         

        I should also note that I had to open a new ticket, as I ran into issues with OpenFlow when upgrading from an older build to the 12.0.2.20 firmware. The fix for this was to disable OpenFlow for now, as the case is still being worked on.

         

        ***************************************************
        * Start FASTPATH Stack Information *
        ***************************************************
        pid: 1771
        TID: 0x97027300
        Task Name: openflowProtoTask
        si_signo: 11
        si_errno: 0
        si_code: 1
        si_addr: (nil)
        Date/Time: 1/1/1970 0:01:05
        SW ver: 12.0.2.20

         

        I'm curious what your crash log may contain, as in my case the issues I have run into all seemed to point to something in the crash log. It may be helpful for you and support to take a look at that file.
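
Once the crash log has been copied off the switch, the "FASTPATH Stack Information" blocks shown above can be summarized with a few lines of Python. This is only a sketch under assumptions: it assumes the file consists of `key: value` lines after each header, as in the two excerpts, and that the firmware uses Linux signal numbering, where signal 11 is SIGSEGV (an invalid memory access). The names `parse_crash_log` and `summarize` are invented for this example.

```python
import re

# Assumption: Linux signal numbering, where 11 is SIGSEGV.
SIGNAL_NAMES = {11: "SIGSEGV"}

HEADER = "Start FASTPATH Stack Information"
# "key: value" lines such as "Task Name: lldpTask" or "Date/Time: 1/1/1970 0:01:05"
FIELD_RE = re.compile(r"^\s*(?P<key>[A-Za-z_/ ]+?):\s*(?P<value>.+?)\s*$")


def parse_crash_log(text):
    """Split crash-log text into one dict of fields per stack-information block."""
    records = []
    current = None
    for line in text.splitlines():
        if HEADER in line:
            current = {}
            records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first header
        match = FIELD_RE.match(line)
        if match:
            current[match["key"]] = match["value"]
    return records


def summarize(record):
    """One-line summary: crashing task, signal name, faulting address."""
    signo = int(record.get("si_signo", 0))
    name = SIGNAL_NAMES.get(signo, f"signal {signo}")
    task = record.get("Task Name", "unknown task")
    addr = record.get("si_addr", "?")
    return f"{task}: {name} at {addr}"


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python crashlog_summary.py crash-log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for record in parse_crash_log(handle.read()):
            print(summarize(record))
```

Grouping the summaries by task name makes it easier to tell whether repeated reboots point at the same process (for example `lldpTask` or `openflowProtoTask` in the excerpts above) before passing the file on to support.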
