

bbs2web
Guide
May 25, 2025

M4300-12X12F unstable on 12.0.19.x

We run approximately 15 Netgear M4300 stacks and hit a problem while upgrading their firmware. We have since paused updating the remaining stacks.

 

The following stacks upgraded without problem and remain stable afterwards:

  • 2 x M4300-24X24F + 2 x M4300-28G
  • 2 x M4300-8X8F
  • 2 x M4300-24X

However, we had a problem with the following stack, where the 2nd unit rebooted regularly (1-5 hours of uptime between restarts):

  • 2 x M4300-12X12F

 

Unit 1 has remained stable but unit 2 reports the following issue:

May 23 02:54:40 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1294) 25321 %% CPLD(0x30): Unable to read @reg(0x50).
May 23 02:54:40 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1294) 25322 %% CPLD(0x30): Unable to read @reg(0x02).
May 23 02:54:40 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1322) 25323 %% CPLD(0x30): Unable to write data(0xa5)@reg(0x02).
May 23 02:54:40 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1294) 25324 %% CPLD(0x30): Unable to read @reg(0x10).
May 23 02:54:41 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1294) 25325 %% CPLD(0x30): Unable to read @reg(0x11).
May 23 02:54:41 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1294) 25326 %% CPLD(0x30): Unable to read @reg(0x19).
May 23 02:54:41 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1294) 25327 %% CPLD(0x30): Unable to read @reg(0x50).
May 23 02:54:41 zanjnb01-swd1l5-02-1 BOXSERV[boxs Req]: boxs.c(1452) 25328 %% Unit 2 power supply 1 FAILURE event (4) occurred.
May 23 02:54:41 zanjnb01-swd1l5-02-1 TRAPMGR[boxs Req]: traputil.c(795) 25329 %% Power supply state change alarm: Power Supply Unit: 2  ID: 1  Event: 4 - Failure
May 23 02:54:41 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_utils.c(1294) 25330 %% CPLD(0x30): Unable to read @reg(0x1a).
May 23 02:54:41 zanjnb01-swd1l5-02-2 BSP[unitMgrTask]: cpu_boxs.c(1259) 25331 %% FAN Module interrupt received on the unit 2 fan 1 fail

 

 

We interpreted the above as a bad PSU in unit 2, so we replaced the PSU with a spare and replaced the power cable. When the unit still restarted, we moved it to another socket on a different PDU. The message about the PSU having failed is, however, most probably a red herring, since the fan is also reported as having failed. My gut tells me that the stack master loses sync with the unit, which sporadically reboots, and then reports the various components as having failed.
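One rough way to test the red-herring theory is to check whether the component failure events only appear after the CPLD read/write errors have started. The Python sketch below does that for log lines in the format shown above. The regular expression, the `summarize` function, and the result keys are illustrative assumptions, not part of any Netgear tool, and the line format is inferred only from this excerpt.

```python
import re

# Assumed line layout, inferred from the excerpt in this post:
#   <timestamp> <host> <COMPONENT>[<task>]: <file>(<line>) <seq> %% <message>
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<component>\w+)\[(?P<task>[^\]]+)\]:\s+"
    r"(?P<src>\S+)\((?P<line>\d+)\)\s+"
    r"(?P<seq>\d+)\s+%%\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count CPLD access errors and component failure events.

    Also reports how many failure events carry a higher sequence
    number than the first CPLD error, i.e. were logged after it.
    """
    cpld_seqs = []
    failure_seqs = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip anything that does not fit the assumed layout
        seq = int(match["seq"])
        msg = match["msg"]
        if msg.startswith("CPLD(") and "Unable to" in msg:
            cpld_seqs.append(seq)
        elif "fail" in msg.lower():
            failure_seqs.append(seq)

    first_cpld = min(cpld_seqs) if cpld_seqs else None
    after = [s for s in failure_seqs
             if first_cpld is not None and s > first_cpld]
    return {
        "cpld_errors": len(cpld_seqs),
        "component_failures": len(failure_seqs),
        "failures_after_first_cpld_error": len(after),
    }
```

Calling `summarize(open("switch.log"))` (the file name is a placeholder) on a saved log returns the three counts. If every failure event follows the first CPLD error, that is consistent with the components being flagged as a side effect of the unit becoming unreadable, although it does not prove it.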

 

 

The configuration on this problematic switch stack is fairly simple, in that it defines a variety of LAGs balanced between the two stack members, defines a couple of layer 2 VLANs that are distributed over LAGs and has two interfaces configured to do VLAN stacking (double VLAN). The other stacks all have a similar configuration.

 

We are not doing layer 3 on these switch stacks, except for management of the devices themselves.

 

Is this a known issue?

3 Replies

  • With both Power Supply -and- Fan control related errors I suspect a hardware issue on a specific internal controller. 

     

    Consider challenging Netgear Support via HTTPS://my.netgear.com/ or contacting the Netgear AV team directly, led by LaurentMa (no idea why this new community platform won't find his certainly existing account, but that's another issue I have been reporting since it was deployed, ChristineT).

     

    Looks like a hardware warranty replacement is required.

     

    Regards,

    -Kurt.

  • So did we. However, we swapped the PSU in unit 2, replaced the power cable, and routed it to another power distribution panel in the rack.

     

    Downgrading to 12.0.17.12 has however stabilised things. It was a little better on 12.0.17.16 (lasting almost 12 hours but then restarting again 3 hours later), but stable for 1+ day now on 12.0.17.12.

    It was restarting regularly (every 1-5 hours) when running 12.0.19.10.


    PS: The M4300-12X12F is unfortunately single corded. I believe the warnings shown in the logs are red herrings: when the 2nd unit restarts, monitoring of it simply hits read errors and then marks certain components as failed (e.g. PSU, fans, ports).

     

  • Stable for 52+ hours after downgrading firmware. In our testing, a stack of 2 x M4300-12X12F switches had unit 2 restarting randomly after 1-5 hours of uptime when running:

    • 12.0.19.10
    • 12.0.19.7
    • 12.0.17.16

    Stable with:

    • 12.0.17.12
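
The version lists above can be used to narrow down where the behaviour changed. The short Python sketch below compares the versions numerically rather than as strings, so that 12.0.19.7 sorts before 12.0.19.10. The function names are hypothetical, and the result only reflects the releases tested in this thread on this one stack.

```python
def parse_version(version):
    """Turn '12.0.17.12' into (12, 0, 17, 12) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def regression_window(stable, unstable):
    """Return (newest stable release, oldest unstable release after it).

    The second element is None if no unstable release is newer than
    the newest stable one.
    """
    newest_stable = max(stable, key=parse_version)
    newer_unstable = [v for v in unstable
                      if parse_version(v) > parse_version(newest_stable)]
    first_bad = min(newer_unstable, key=parse_version) if newer_unstable else None
    return newest_stable, first_bad


# Observations reported in this thread for the 2 x M4300-12X12F stack.
stable = ["12.0.17.12"]
unstable = ["12.0.19.10", "12.0.19.7", "12.0.17.16"]
window = regression_window(stable, unstable)
```

With these inputs `window` is `("12.0.17.12", "12.0.17.16")`, which suggests the change arrived somewhere after 12.0.17.12 and no later than 12.0.17.16, assuming no intermediate releases behave differently.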