
Latest firmware (11.0.0.28) kills all switches

dialsc
Guide


Hello,

 

I just upgraded the firmware on all of our M5300 switches, standalone units as well as stacks. Until then, all of them were running version 11.0.0.25, a beta version I got from Netgear support. After the upgrade, all the switches started to crash. It looks like the switches start to crash as soon as a port configured for Port Authentication (802.1X) is in use. I had to revert to the previous firmware version on all the switches.

 

Anyone seeing the same behavior?

 

Greez,

 

dialsc

Model: M5300-28G_PoE+ (GSM7228PSv1h2)|ProSAFE 24+4 L2+ Gigabit Stackable Managed Switch, M5300-28G3 (GSM7328Sv2h2)|ProSAFE 24-port Managed L3 Gigabit Stackable Switch, M5300-28GF3 (GSM7328FSv2)|ProSAFE 24F+4 Layer 3 Gigabit Stackable Managed Switch
Message 1 of 19
LaurentMa
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

Hi dialsc

 

Sorry for your issue here. We are trying to reproduce it in our labs with Tech Support teams and will report back here our findings.

 

Regards,

Message 2 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

Hi Laurent,

 

Very nice, thanks. Let me know if I can be of any assistance with that.

 

Greez,

 

dialsc

Message 3 of 19
CaseyH
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

@dialsc,

 

Can you PM your tech-support file so I can reproduce this in the lab?

 

To get the tech-support file you need to:

 

1. Console or telnet to your switch.

2. Log in.

3. Type "enable".

4. Type "show tech-support".

5. Log in to the GUI.

6. Go to "Maintenance\Upload\HTTP File Upload" and select "Tech Support" from the File Type drop-down box.

 

 

Thank you,

Casey

Message 4 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

Hi,

 

right now I'm in the process of recovering our Hyper-V farm which totally crashed due to the switches rebooting... 

 

Once I have finished that, I will separate one switch, update the firmware, and try to reproduce the problem. I will then do what you asked.

 

Okay for you?

 

Greez,

 

dialsc

Message 5 of 19
CaseyH
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

@dialsc

 

That would be fine, or if you just send me your current tech-support file, I can start looking into what might have happened.

Message 6 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

Hey,

 

Unfortunately, I have not received any response from Casey in the last two or three weeks. Would it be possible for someone else to take a look at this? I can reproduce this problem 100% of the time on the M5300 switches by:

 

  1. Unplug all ports which are configured for port authentication (802.1X)
  2. Stop the RADIUS service on the two RADIUS servers we have
  3. Plug one device into one port configured for port authentication (802.1X)

Here's an extract of the console logs showing what happens on the switch while forcing the problem (from the reboot after the firmware 11.0.0.28 was selected as the boot firmware):

:

Uboot M5300 VerNo=1.0.0.5

starting pid 30, tty '': '/etc/init.d/rcS'

starting pid 37, tty '/dev/ttyS0': '/etc/rc.d/rc.fastpath'

 

Checking for application

Starting Operational code ...

         Rel 11, Ver 0, Maint Lev 0, Bld No 28

Uncompressing apps.lzma

SyncDB Running...

<9> Jan  1 00:00:21 0.0.0.0-0 General[fp_main_task]: unitmgr.c(6477) 1 %% Reboot 1 (0x1)

DMA pool size: 8388608

hpc - No stack ports. Starting in stand-alone mode.

 

<10> Jan  1 00:00:29 0.0.0.0-1 General[fp_main_task]: bootos.c(197) 10 %% Event(0xaaaaaaaa) started!

<9> Jan  1 00:00:30 0.0.0.0-1 SIM[Cnfgr_Thread ]: sim_util.c(3787) 11 %% Switch was reset due to operator intervention.

(Unit 1)>

Applying Global configuration, please wait ...

Applying Interface configuration, please wait ...

 

 

User:

Config file 'startup-config' created successfully .

<15> Feb 15 13:26:52 AS-0003-0001-1 CLI_WEB[emWeb]: cli_txtcfg.c(810) 274 %% Text Configuration of length 10610 (420 CMDS) compressed to save 0% Flash

 

<14> Feb 15 13:26:52 AS-0003-0001-1 UNITMGR[emWeb]: unitmgr.c(6778) 275 %% Configuration propagation successful for config type 0

<15> Feb 15 13:26:53 AS-0003-0001-1 CLI_WEB[umWorkerTask]: cli_txtcfg.c(810) 276 %% Text Configuration of length 10610 (420 CMDS) compressed to save 0% Flash

 

<15> Feb 15 13:28:22 AS-0003-0001-1 SNTP[SNTP]: sntp_client.c(1900) 277 %% SNTP: system clock synchronized on Wed Feb 15 12:28:22 2017 UTC. Indicates that SNTP has successfully synchronized the time of the box with the server (192.168.48.6).

<13> Feb 15 13:28:53 AS-0003-0001-1 TRAPMGR[trapTask]: traputil.c(735) 278 %% Link Up: 1/0/2

<13> Feb 15 13:29:03 AS-0003-0001-1 TRAPMGR[trapTask]: traputil.c(735) 279 %% Link Down: 1/0/2

<13> Feb 15 13:29:08 AS-0003-0001-1 TRAPMGR[trapTask]: traputil.c(735) 280 %% Link Up: 1/0/2

<15> Feb 15 13:29:19 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 281 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<15> Feb 15 13:29:19 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 282 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<15> Feb 15 13:29:19 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 283 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<15> Feb 15 13:29:19 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 284 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<15> Feb 15 13:29:19 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 285 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<15> Feb 15 13:29:20 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 286 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<14> Feb 15 13:29:20 AS-0003-0001-1 RADIUS[radius_task]: radius.c(1395) 287 %% RADIUS: MS attribute type =14

 

<14> Feb 15 13:29:20 AS-0003-0001-1 RADIUS[radius_task]: radius.c(1395) 288 %% RADIUS: MS attribute type =15

 

<13> Feb 15 13:29:20 AS-0003-0001-1 DOT1X[dot1xTask]: dot1x_radius.c(933) 289 %% Client 70:5A:0F:13:82:1B authenticated successfully on the port 1/0/2

 

<15> Feb 15 13:29:28 AS-0003-0001-1 SNTP[SNTP]: sntp_client.c(1900) 290 %% SNTP: system clock synchronized on Wed Feb 15 12:29:28 2017 UTC. Indicates that SNTP has successfully synchronized the time of the box with the server (192.168.48.6).

<15> Feb 15 13:31:38 AS-0003-0001-1 SNTP[SNTP]: sntp_client.c(1900) 291 %% SNTP: system clock synchronized on Wed Feb 15 12:31:38 2017 UTC. Indicates that SNTP has successfully synchronized the time of the box with the server (192.168.48.6).

<13> Feb 15 13:32:02 AS-0003-0001-1 TRAPMGR[trapTask]: traputil.c(735) 292 %% Link Down: 1/0/2

<13> Feb 15 13:32:09 AS-0003-0001-1 TRAPMGR[trapTask]: traputil.c(735) 293 %% Link Up: 1/0/2

<15> Feb 15 13:32:09 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 294 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<15> Feb 15 13:32:39 AS-0003-0001-1 RADIUS[dot1xTask]: radius_api.c(911) 295 %% RADIUS: radiusAccessRequestMsgSend(): Server index 255

 

<15> Feb 15 13:32:43 AS-0003-0001-1 SNTP[SNTP]: sntp_client.c(1900) 296 %% SNTP: system clock synchronized on Wed Feb 15 12:32:43 2017 UTC. Indicates that SNTP has successfully synchronized the time of the box with the server (192.168.48.6).

 

 

 

 

 

 

 

 

Switching software SIGSEGV Handler

This build was configured to copy this crash information to

  a file.

Symbols already loaded.

 

<14> Feb 15 13:32:54 AS-0003-0001-1 DHCP6S[tDhcp6sTask]: dhcp6s_main.c(306) 297 %% Failed to receive data on DHCPv6 client socket.

 

<14> Feb 15 13:32:54 AS-0003-0001-1 DHCP6S[tDhcp6sTask]: dhcp6s_main.c(453) 298 %% Failed to receive data on DHCPv6 server socket.

 

<13> Feb 15 13:32:54 AS-0003-0001-1 IPV6[ip6MapRadvdTask]: ip6_radvd.c(113) 299 %% ip6MapRadvdTask: error selecting

 

<13> Feb 15 13:32:54 AS-0003-0001-1 IPV6[ip6MapNbrDiscTa]: ip6map.c(1068) 300 %% ip6MapNDiscTask: error selecting

 

<13> Feb 15 13:32:54 AS-0003-0001-1 RPCSRV[tRpcsrv.01000]: rpcsrv_task.c(259) 301 %% Server socket receive error: inst=3 sockid=1000 total_seen=1 errno=4 (Interrupted system call)

<13> Feb 15 13:32:54 AS-0003-0001-1 RPCSRV[tRpcsrv.00021]: rpcsrv_task.c(259) 302 %% Server socket receive error: inst=2 sockid=21 total_seen=1 errno=4 (Interrupted system call)

<13> Feb 15 13:32:54 AS-0003-0001-1 RPCSRV[tRpcsrv.00002]: rpcsrv_task.c(259) 303 %% Server socket receive error: inst=1 sockid=2 total_seen=1 errno=4 (Interrupted system call)

<13> Feb 15 13:32:54 AS-0003-0001-1 RPCSRV[tRpcsrv.00001]: rpcsrv_task.c(259) 304 %% Server socket receive error: inst=0 sockid=1 total_seen=1 errno=4 (Interrupted system call)

unable to open pipe file

: Interrupted system call

../../../../src/mgmt/broadcom/cli_web_mgr/cli_web_util.c: 1475: Failed to create cliWebIORedirectHandle

<14> Feb 15 13:32:55 AS-0003-0001-1 General[webJavaTask]: web_java_socket.c(5492) 305 %% Error on select.

 

<9> Feb 15 13:32:55 AS-0003-0001-1 SIM[SimAddrConflict]: sim.c(4392) 306 %% Failed to read conflicting ARP packet from stack - errno Resource temporarily unavailable

osapiInputTask() stdin read failed: Interrupted system call

starting pid 1291, tty '': '/etc/rc.d/rc.reboot'

syncing filesystems....This may take a few moments

Rebooting system!

Sent SIGKILL to all processes

Requesting system reboot

Restarting system.

 

 

 

 

 

 

 

 

 

Uboot M5300 VerNo=1.0.0.5

starting pid 30, tty '': '/etc/init.d/rcS'

starting pid 37, tty '/dev/ttyS0': '/etc/rc.d/rc.fastpath' 

New crash information files detected!

 

Checking for application

Starting Operational code ...

         Rel 11, Ver 0, Maint Lev 0, Bld No 28

Uncompressing apps.lzma

SyncDB Running...

File: , Line: 0, Reboot 2 (0x2)

 

<9> Jan  1 00:00:21 0.0.0.0-0 General[fp_main_task]: (0) 1 %% Reboot 2 (0x2)

DMA pool size: 8388608

hpc - No stack ports. Starting in stand-alone mode.

 

<10> Jan  1 00:00:30 0.0.0.0-1 General[fp_main_task]: bootos.c(197) 9 %% Event(0xaaaaaaaa) started!

<9> Jan  1 00:00:30 0.0.0.0-1 SIM[Cnfgr_Thread ]: sim_util.c(3790) 10 %% Switch was reset due to a software exception.

(Unit 1)>

Applying Global configuration, please wait ...

Applying Interface configuration, please wait ...
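
For anyone comparing their own console capture against the one above, here is a minimal Python sketch that pulls out the markers visible in this log: RADIUS access requests, the SIGSEGV handler banner, and the reset reason printed after the reboot. The regular expression is only inferred from the lines in this capture, and `summarize_capture` is a hypothetical helper name, not a Netgear tool.

```python
import re

# Matches the syslog-style lines in the capture above, for example:
# <13> Feb 15 13:32:09 AS-0003-0001-1 TRAPMGR[trapTask]: traputil.c(735) 293 %% Link Up: 1/0/2
# The layout is an assumption inferred from this one capture.
LOG_LINE = re.compile(
    r"^<(?P<pri>\d+)>\s+(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<component>\w+)\[(?P<task>[^\]]*)\]:\s+(?P<src>\S+)\s+(?P<seq>\d+)\s+%%\s+(?P<msg>.*)$"
)


def summarize_capture(capture):
    """Return a small summary of crash-related markers in a console capture."""
    summary = {"radius_requests": 0, "sigsegv": False, "reset_reasons": []}
    for raw in capture.splitlines():
        line = raw.strip()
        # The crash banner is a plain console line, not a syslog-style line.
        if "SIGSEGV Handler" in line:
            summary["sigsegv"] = True
            continue
        match = LOG_LINE.match(line)
        if match is None:
            continue  # boot banners, prompts, blank lines
        msg = match.group("msg")
        if match.group("component") == "RADIUS" and "radiusAccessRequestMsgSend" in msg:
            summary["radius_requests"] += 1
        elif msg.startswith("Switch was reset due to"):
            summary["reset_reasons"].append(msg.rstrip("."))
    return summary
```

Calling `summarize_capture(open("console.log").read())` on a saved capture (the file name is just an example) gives a quick way to tell a reset "due to operator intervention" from one "due to a software exception", as seen in the two boots above.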

Message 7 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

I would like to bring this one to Netgear's attention again. This issue still exists and it would be great if someone could look into it. I'm pretty sure I'm not the only one using port authentication on this device, so I'm not the only one potentially affected by this issue.

 

Message 8 of 19
LaurentMa
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

Hi dialsc

Let me ring the bell for you. I am sorry that the investigation has gone nowhere so far. It might mean the issue was impossible to reproduce, but let me find out.

We will be back to you with the answer.

Regards,
Message 9 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

Hi Laurent,

 

Would you mind ringing that bell again? Two weeks later, still nothing has happened.

 

Greetings,

 

dialsc

Message 10 of 19
LaurentMa
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

Hi dialsc

Thank you for the heads-up today. Yes, this case is hard to reproduce and currently sits with the Technical Support organization. We will get you an update as soon as possible.

 

Regards,

Message 11 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

Hi Laurent,

 

Let me know if I can help you. It takes me approx. 5 minutes to reproduce this problem... 😉

 

Best,

 

dialsc

Message 12 of 19
OrbitIT
Aspirant

Re: Latest firmware (11.0.0.28) kills all switches

We are experiencing the same issue and can reproduce it with a switch that is configured for MAC-Based Port Authentication and has a RADIUS server configured that it cannot communicate with. We have a switch configured this way with nothing connected to it; when a device is connected to a port configured for MAC-Based Port Authentication, the switch shows the following message on the console and then reboots:

 

<9> Jan 1 00:01:40 192.168.11.85-1 SIM[SimAddrConflict]: sim.c(4392) 256 %% Failed to read conflicting ARP packet from stack - errno Resource temporarily unavailable
osapiInputTask() stdin read failed: Interrupted system call
starting pid 601, tty '': '/etc/rc.d/rc.reboot'

syncing filesystems....This may take a few moments
Rebooting system!
Sent SIGKILL to all processes
Requesting system reboot
Restarting system.
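
To compare captures like this one across switches, a minimal Python sketch can report which syslog-style message was printed last before each `rc.reboot` start. The function name `last_log_before_reboot` is hypothetical, and the line formats are assumed from the console output quoted in this thread.

```python
import re

# Console line printed when the reboot script starts, as seen in the captures in this thread.
REBOOT_LINE = re.compile(r"^starting pid \d+, tty '[^']*': '/etc/rc\.d/rc\.reboot'")


def last_log_before_reboot(capture):
    """For each rc.reboot start, return the last '<pri> ...' line seen before it."""
    results = []
    last_log = None
    for raw in capture.splitlines():
        line = raw.strip()
        if line.startswith("<"):
            last_log = line  # remember the most recent syslog-style line
        elif REBOOT_LINE.match(line):
            results.append(last_log)  # None if no log line preceded this reboot
            last_log = None
    return results
```

In both captures posted here, the last such line before the reboot comes from `SIM[SimAddrConflict]`, which makes it a convenient marker to search for.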

Model: M5300-52G-PoE+ (GSM7252PSv1h2)|ProSAFE 48+4 L2+ PoE Stackable Managed Switch
Message 13 of 19
LaurentMa
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

Hi OrbitIT

 

Welcome to the Community!

 

I am sorry for your issue. Yes, it looks like the same one. Please contact Technical Support and log this issue as well. If the switch is registered, go to the online portal at my.netgear.com and go through the case creation process by clicking the "Call us" button. Once in this section, please fill out the case summary, mentioning this post on the Community and the escalation issue #14398.

 

So far we can reproduce it, and we escalated the issue to our Engineering team for debugging.

 

We hope to resolve this issue very quickly. In the meantime, please use the previous version of the M5300 firmware.

 

By logging your issue with Technical Support, you will be able to receive the corrective patch faster.

 

Regards,

Message 14 of 19
OrbitIT
Aspirant

Re: Latest firmware (11.0.0.28) kills all switches

We did contact Technical Support and created a case. I was told that they were looking into this issue and would follow up with us yesterday, but we have still had no word.

Model: M5300-52G-PoE+ (GSM7252PSv1h2)|ProSAFE 48+4 L2+ PoE Stackable Managed Switch
Message 15 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

I was contacted by a support engineer and informed that development has meanwhile been able to reproduce the problem as well and that they are working on it right now. So let's see... 😉

Message 16 of 19
LaurentMa
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

Please make sure your case is updated with escalation issue #14398. This way, the Tech Support team will provide you with the corrective patch as soon as it's available.

The good news is that our Engineering team, to which this issue was escalated, came up with a new Engineering build fixing this issue today. Per our processes, it is now quickly going through assessment and testing before the image can be transferred to the Tech Support team in charge of the issue. I sincerely hope you will receive a patch in the next few days. We strive to go even faster than that.

If the patch also works well for you, we'll integrate the fix into the next public maintenance release with a new firmware version.


Message 17 of 19
dialsc
Guide

Re: Latest firmware (11.0.0.28) kills all switches

Hello,

 

I would like to inform you that my problem was fixed with the beta firmware (11.0.0.30) I got from support. Thank you very much for your help!

 

Best.

 

dialsc

Message 18 of 19
LaurentMa
NETGEAR Expert

Re: Latest firmware (11.0.0.28) kills all switches

This is good news. We are going to merge this into our next maintenance release after some more internal testing on our side.

 

Thanks again,

Message 19 of 19