bmomjian
Guide

Occasional wireless traffic hang

I am seeing occasional hangs on my WAP network. Ssh to the WAP normally shows 95% idle and 0% sirq in top:

Tue Jan 22 23:00:01 UTC 2019
 23:00:01 up 1 day, 23:54, load average: 3.01, 3.01, 3.04
Mem: 78828K used, 176300K free, 0K shrd, 0K buff, 23848K cached
CPU:   4% usr   0% sys   0% nic  95% idle   0% io   0% irq   0% sirq
Load average: 3.01 3.02 3.05 1/66 15057
  PID  PPID USER     STAT   VSZ %MEM CPU %CPU COMMAND
15057 15054 root     R     1408   1%   0   5% top -b -n1
 1405  1179 root     S    13876   5%   1   0% /usr/sbin/snmpd -f -c /tmp/snmpd.c
 1179     1 root     S    12044   5%   0   0% /usr/sbin/dman
 1172     1 root     S     4992   2%   0   0% /usr/sbin/mapd

 

Since the hang 20 minutes ago, there has been almost no WAP network traffic, but ssh to the WAP shows that top reports 50% idle and 50% sirq (ksoftirqd):

 

Tue Jan 22 23:20:45 UTC 2019
23:20:45 up 2 days, 15 min, load average: 4.02, 4.04, 3.74
Mem: 78904K used, 176224K free, 0K shrd, 0K buff, 23872K cached
CPU:   0% usr   0% sys   0% nic  50% idle   0% io   0% irq  50% sirq
Load average: 4.03 4.04 3.74 2/66 19837
  PID  PPID USER     STAT   VSZ %MEM CPU %CPU COMMAND
    9     2 root     RW       0   0%   1  50% [ksoftirqd/1]
 1405  1179 root     S    13876   5%   0   0% /usr/sbin/snmpd -f -c /tmp/snmpd.c
 1179     1 root     S    12044   5%   0   0% /usr/sbin/dman
 1172     1 root     S     4992   2%   0   0% /usr/sbin/mapd


Any idea why this is happening?   It is happening regularly.  I am on firmware 3.9.0.3 and recently did a factory default reset.

 

Model: WAC730|3x3 Wireless-AC Access Points
Message 1 of 37

Accepted Solutions
bmomjian
Guide

Re: Occasional wireless traffic hang

I am happy to report that firmware 3.9.1.0 has been released, and I think it fixes the problem reported in this thread.  The release notes, dated August 13, 2019, mention these two fixes:

 

    1. Addressed intermittent, rarely encountered access point hang issues.
    2. Fixed various stability and connectivity issues.

 

I think this closes the issue.  I will keep my monitoring in place for another year to verify the fix.  Thanks to all who helped.

 


Model: WAC730|3x3 Wireless-AC Access Points
Message 37 of 37

All Replies
bmomjian
Guide

Re: Occasional wireless traffic hang

As an update to this report, the 50% sirq continued for 17 hours, until I rebooted the WAP.  You can see the dramatic change at exactly 2 days of uptime in these hourly 'top' reports:

 

CPU:   0% usr   4% sys   0% nic  95% idle   0% io   0% irq   0% sirq
CPU:   0% usr   4% sys   0% nic  95% idle   0% io   0% irq   0% sirq
CPU:   0% usr   4% sys   0% nic  95% idle   0% io   0% irq   0% sirq
CPU:   0% usr   4% sys   0% nic  95% idle   0% io   0% irq   0% sirq
CPU:   0% usr   4% sys   0% nic  95% idle   0% io   0% irq   0% sirq
CPU:   4% usr   0% sys   0% nic  95% idle   0% io   0% irq   0% sirq
CPU:   0% usr   0% sys   0% nic  50% idle   0% io   0% irq  50% sirq
CPU:   0% usr   0% sys   0% nic  50% idle   0% io   0% irq  50% sirq
CPU:   0% usr   4% sys   0% nic  45% idle   0% io   0% irq  50% sirq
CPU:   0% usr   0% sys   0% nic  50% idle   0% io   0% irq  50% sirq
CPU:   0% usr   0% sys   0% nic  50% idle   0% io   0% irq  50% sirq
CPU:   4% usr   4% sys   0% nic  40% idle   0% io   0% irq  50% sirq

After 17 hours of 50% sirq but before the WAP reboot, I disconnected every wifi device from the WAP, and verified there were no connected devices from the WAP dashboard, but the WAP was still showing 50% sirq.  I just rebooted the WAP and it is back to 0% sirq, and not slow.

 

I will keep monitoring 'top' after the reboot and get an alert if sirq% gets high.  I am curious to see if it gets a high sirq% at exactly two days of uptime again. Does something special happen to the WAP at two days of uptime?  I could automatically reboot the WAP when the sirq% gets high, but that hardly seems like a clean fix.
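
For anyone who wants to set up similar monitoring, here is a rough sketch of the check, not the script actually used in this thread. It assumes the BusyBox `top` summary line has the format shown in the reports above; the 25% threshold and the function names are made up for illustration.

```python
import re

# Matches "<number>% <name>" pairs in the BusyBox top summary line, e.g.
# "CPU:   0% usr   0% sys   0% nic  50% idle   0% io   0% irq  50% sirq"
CPU_FIELD = re.compile(r"(\d+)%\s+(\w+)")


def parse_cpu_line(line):
    """Return e.g. {'usr': 0, ..., 'sirq': 50}, or None if this is not a CPU line."""
    if not line.lstrip().startswith("CPU:"):
        return None
    return {name: int(pct) for pct, name in CPU_FIELD.findall(line)}


def sirq_is_high(top_output, threshold=25):
    """True if any CPU summary line reports sirq at or above the threshold.

    The default of 25 is an arbitrary value (assumption) between the 0% seen
    when the WAP is healthy and the 50% seen during a hang.
    """
    for line in top_output.splitlines():
        fields = parse_cpu_line(line)
        if fields is not None and fields.get("sirq", 0) >= threshold:
            return True
    return False


healthy = "CPU:   4% usr   0% sys   0% nic  95% idle   0% io   0% irq   0% sirq"
hung = "CPU:   0% usr   0% sys   0% nic  50% idle   0% io   0% irq  50% sirq"
print(sirq_is_high(healthy), sirq_is_high(hung))
```

The input text could come from a `top -b -n1` run on the WAP (that invocation is visible in the first process list), fetched over ssh by whatever means the monitoring host already uses.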

Model: WAC730|3x3 Wireless-AC Access Points
Message 2 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Looks very familiar - WAC730, 3.9.0.3, ... slow or intermittent wireless access only after some uptime

 

WAC730-1# uptime
00:26:48 up 3 days, 1:47, load average: 4.07, 4.08, 4.06


WAC730-1# top

Mem: 77560K used, 177568K free, 0K shrd, 0K buff, 20776K cached
CPU:   0% usr   0% sys   0% nic  49% idle   0% io   0% irq  50% sirq
Load average: 4.03 4.04 4.05 3/70 4210
  PID  PPID USER     STAT   VSZ %MEM CPU %CPU COMMAND
    9     2 root     RW       0   0%   1  47% [ksoftirqd/1]

...

Message 3 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

Ah, interesting.  I have a Debian server with ssh access to the WAP so I can monitor the 'top' output and get an alert when sirq gets high.  Once it happens again, I will try reverting to a previous firmware, maybe 3.8.3.0, and see if it happens again. 

 

My family has been complaining about wifi hangs and disconnects for about six months, and that matches the time I installed the 3.9.0.3 firmware.  It will take me perhaps another week to come to a conclusion on this.  I will keep reporting on my progress.

Message 4 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Another odd one - skipping a MAC address without any indication in the product logs:

 

WAC730-1# dmesg
processpmq: skip entry with mc/bc address 41:4e:36:84:48:6e
wl1: wlc_bmac_processpmq: skip entry with mc/bc address 41:4e:36:84:48:6e
wl1: wlc_bmac_processpmq: skip entry with mc/bc address 41:4e:36:84:48:6e
...

 

It's a valid device, an HTC phone.

Message 5 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

@RaghuHR please.

Message 6 of 37
RaghuHR
NETGEAR Expert

Re: Occasional wireless traffic hang

@schumaku  what is the actual MAC address of your HTC phone? Do you have detailed logs for us?

Message 7 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Priority should be on the sirq issue started by the OP. That MAC message popped up while I was peeking and poking at the system (silly me - after a reboot). Will capture the logs and share by PM.

Message 8 of 37
schumaku
Guru

Re: Occasional wireless traffic hang


@bmomjian wrote:

Ah, interesting.  I have a Debian server with ssh access to the WAP so I can monitor the 'top' output and get an alert when sirq gets high.  Once it happens again, ...


Before reverting, go to Monitoring -> Logs -> Save As ..., put the .tar on any cloud share, and send the link to @RaghuHR for Netgear inspection.

Message 9 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

OK, I will grab the logs once it happens again, though I have sent 50% sirq logs to you before and you said it looked fine.

Model: WAC730|3x3 Wireless-AC Access Points
Message 10 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

Last time I got the 50% sirq at exactly 48 hours of uptime.  I have passed 48 hours since the recent reboot and the sirq% is still zero.  I will report back and grab the logs as soon as my logging informs me that sirq has increased.

Model: WAC730|3x3 Wireless-AC Access Points
Message 11 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

 

Mem: 77772K used, 177356K free, 0K shrd, 0K buff, 20872K cached
CPU:   0% usr   0% sys   0% nic  49% idle   0% io   0% irq  50% sirq
Load average: 4.00 4.01 4.05 6/71 7314
  PID  PPID USER     STAT   VSZ %MEM CPU %CPU COMMAND
    9     2 root     RW       0   0%   1  50% [ksoftirqd/1]
 1111     1 root     S    11656   5%   0   0% /usr/sbin/dman
 1518  1111 root     S    13880   5%   0   0% /usr/sbin/snmpd -f -c /tmp/snmpd.conf udp:161,udp6:161 -LO 0
 1626  1111 root     R     7196   3%   0   0% /usr/sbin/cportald
 1104     1 root     R     4984   2%   0   0% /usr/sbin/mapd
 1597  1111 root     S     4496   2%   0   0% /usr/bin/mini_httpd-ssl -S -E /etc/mini_httpd.pem -D -p1 80 -p2 443 -t 5 -s 1 -u root -c ./*.cgi|*.cgi|cgi-bin/
 1593  1111 root     S     4440   2%   0   0% /usr/bin/mini_httpd-ssl -D -p1 80 -p2 443 -t 5 -s 1 -u root -c ./*.cgi|*.cgi|cgi-bin/* -n 20 -Y ALL:!aNULL:!eNU
 1541     1 root     R     4312   2%   0   0% /usr/sbin/dhcpdump brtrunk
 1415  1111 root     S     4228   2%   0   0% /usr/sbin/mapqosd
  986     1 root     S     3736   1%   0   0% /usr/sbin/tspec
 1542  1111 root     S     2244   1%   0   0% /usr/sbin/hostapd /tmp/hostapd.conf.wlan0 /tmp/hostapd.conf.wlan1
 1073     1 root     S     1696   1%   0   0% /usr/sbin/asengd
 1416  1111 root     S     1540   1%   0   0% /usr/sbin/lldpd
 8774  8770 root     S     1488   1%   0   0% -splash
 1066     1 root     S     1424   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_ae.ko
 1427  1111 root     S     1416   1%   0   0% /sbin/udhcpc -l /tmp/udhcpc.lease.brtrunk --foreground -i brtrunk -H WAC730-1 -s /usr/share/udhcpc/dhcp_client.
    1     0 root     S     1412   1%   0   0% init
15908  8774 root     R     1412   1%   0   0% top
 1133  1111 root     S     1412   1%   0   0% /sbin/getty -L 115200 ttyS0
  940     1 root     S     1400   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_wds.ko

Lucky me had to move the WAC730 away from the Insight switch to an elderly GS510TP in attempting to check if the latent IGMP Multicast issues (still can't get any Apple Drive Bonjour announcements e.g. for Time Machine) are caused by the new Insight switch firmware 1.0.4.16 or the WAC505/510 on 5.0.10.2 ... sorry for the slightly off topic, but now the 50% sirq is obviously gone again for a while.

Message 12 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

That looks just like mine, exactly 50%.  How long has this WAP been up?  Also, is this from a WAC730?  What firmware version?

Model: WAC730|3x3 Wireless-AC Access Points
Message 13 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Same WAC730 as above again, up for a few days, 3.9.0.3 as before.

Message 14 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

Well, the most recent time it happened was after 48 hours, but now I am up for 5 days, 3:34 and still at 0% sirq. I am monitoring hourly.  Can you do the dump requested earlier in the thread and email it to him?  I don't know when my failure is going to happen again.

Model: WAC730|3x3 Wireless-AC Access Points
Message 15 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Afraid, different priorities required moving the WAC730 to a different PoE source 8-(
Message 16 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

OK, please let us know if you see it again.

Model: WAC730|3x3 Wireless-AC Access Points
Message 17 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Of course I will do so! The captured screen came from the ssh session that was open before I powered off the WAC730. Silly me, I know.
Message 18 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

Uh, I made the same mistake last time too.  :-(  We will get this!

Message 19 of 37
schumaku
Guru

Re: Occasional wireless traffic hang


@bmomjian wrote:

Uh, I made the same mistake last time too.  :-(  We will get this!


We're lucky 8-)

 

Mem: 80920K used, 174208K free, 0K shrd, 0K buff, 22372K cached
CPU:   0% usr   0% sys   0% nic  49% idle   0% io   0% irq  50% sirq
Load average: 4.00 4.01 4.05 2/70 6976
  PID  PPID USER     STAT   VSZ %MEM CPU %CPU COMMAND
    9     2 root     RW       0   0%   1  50% [ksoftirqd/1]
 1540     1 root     S     4312   2%   0   0% /usr/sbin/dhcpdump brtrunk
 1518  1111 root     S    13880   5%   0   0% /usr/sbin/snmpd -f -c /tmp/snmpd.conf udp:161,udp6:161 -LO 0
 1111     1 root     S    11996   5%   0   0% /usr/sbin/dman
 1625  1111 root     S     7196   3%   0   0% /usr/sbin/cportald
 1104     1 root     S     4984   2%   0   0% /usr/sbin/mapd
 1596  1111 root     S     4496   2%   0   0% /usr/bin/mini_httpd-ssl -S -E /etc/mini_httpd.pem -D -p1 80 -p2 443 -t 5 -s 1 -u root -c ./*.cgi|*.cgi|cgi-bin/
 1592  1111 root     S     4440   2%   1   0% /usr/bin/mini_httpd-ssl -D -p1 80 -p2 443 -t 5 -s 1 -u root -c ./*.cgi|*.cgi|cgi-bin/* -n 20 -Y ALL:!aNULL:!eNU
 1415  1111 root     S     4228   2%   0   0% /usr/sbin/mapqosd
  986     1 root     S     3736   1%   0   0% /usr/sbin/tspec
 5892  1111 root     S     2252   1%   0   0% /usr/sbin/hostapd /tmp/hostapd.conf.wlan0 /tmp/hostapd.conf.wlan1
 1073     1 root     S     1696   1%   0   0% /usr/sbin/asengd
 1416  1111 root     S     1540   1%   0   0% /usr/sbin/lldpd
 6893  6886 root     S     1488   1%   0   0% -splash
 1066     1 root     S     1424   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_ae.ko
 1427  1111 root     S     1416   1%   1   0% /sbin/udhcpc -l /tmp/udhcpc.lease.brtrunk --foreground -i brtrunk -H WAC730-1 -s /usr/share/udhcpc/dhcp_client.
    1     0 root     S     1412   1%   0   0% init
 6894  6893 root     R     1412   1%   0   0% top
 1133  1111 root     S     1412   1%   0   0% /sbin/getty -L 115200 ttyS0
  940     1 root     S     1400   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_wds.ko
  972     1 root     S     1400   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_rfscan.ko
  957     1 root     D     1400   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_dl2tunnel.ko
  891     1 root     D     1400   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_l2tunnel.ko
  892     1 root     D     1400   1%   0   0% insmod /lib/modules/2.6.36.4/extra/wlext_l2tunnel.ko
 6886  1342 root     S     1100   0%   0   0% /usr/sbin/dropbear -E -F -d /tmp/dss.key -r /tmp/rsa.key
 1342  1111 root     S     1044   0%   0   0% /usr/sbin/dropbear -E -F -d /tmp/dss.key -r /tmp/rsa.key
 1676  1111 root     S      952   0%   0   0% /usr/sbin/sntp -s 3600 0 time-b.netgear.com
 6952  1111 root     S      836   0%   0   0% /usr/sbin/mDNSResponderPosix -f /tmp/bonjour_services
 1512  1111 root     S      716   0%   0   0% /usr/sbin/syslogd
 1145  1111 root     S      644   0%   0   0% /usr/bin/eapd
 1045     1 root     S      628   0%   0   0% /usr/bin/ifmon
 1509     1 root     S      628   0%   0   0% ifmon

Kernel:

 

WAC730-1# dmesg
ndefined error)
wl0: wlc_iovar_op: BCME -1 (Undefined error)
wl1: wlc_iovar_op: BCME -1 (Undefined error)
wl0: wlc_iovar_op: BCME -1 (Undefined error)
wl0: wlc_iovar_op: BCME -1 (Undefined error)
wl1: wlc_iovar_op: BCME -1 (Undefined error)
...

 

Let's see ...

 

WAC730-1# cat /proc/interrupts
           CPU0       CPU1
 27:         36          0         GIC  mpcore_gtimer
 32:          0          0         GIC  L2C
117:       2640          0         GIC  serial
163:          0    5951497         GIC  wlan0
169:          0    3683950         GIC  wlan1
179:    3041369          0         GIC  eth0
IPI:      70538      69585
LOC:   19178805   19043112
Err:          0
WAC730-1# cat /proc/interrupts
           CPU0       CPU1
 27:         36          0         GIC  mpcore_gtimer
 32:          0          0         GIC  L2C
117:       2640          0         GIC  serial
163:          0    5951611         GIC  wlan0
169:          0    3683950         GIC  wlan1
179:    3041389          0         GIC  eth0
IPI:      70538      69585
LOC:   19179335   19043607
Err:          0
WAC730-1# cat /proc/interrupts
           CPU0       CPU1
 27:         36          0         GIC  mpcore_gtimer
 32:          0          0         GIC  L2C
117:       2640          0         GIC  serial
163:          0    5952063         GIC  wlan0
169:          0    3683950         GIC  wlan1
179:    3041456          0         GIC  eth0
IPI:      70538      69586
LOC:   19181330   19045488
Err:          0

So let's focus on 163, 169, 179, and LOC.
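
To make the comparison between snapshots less eyeball-driven, something like the following could be used. This is only a sketch: it assumes the `/proc/interrupts` layout shown above (a CPU header row, then `label: count count ...`), the function names are mine, and the sample strings are excerpts of the first two snapshots.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: [count per CPU]}."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())          # header row, e.g. "CPU0 CPU1"
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():       # stop at the controller/name columns
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return table


def interrupt_delta(before, after):
    """Per-CPU increase of every counter present in both snapshots."""
    return {
        label: [new - old for old, new in zip(before[label], after[label])]
        for label in after
        if label in before
    }


# Excerpts of the first two snapshots from this post.
FIRST = """\
           CPU0       CPU1
163:          0    5951497         GIC  wlan0
169:          0    3683950         GIC  wlan1
179:    3041369          0         GIC  eth0
LOC:   19178805   19043112
"""
SECOND = """\
           CPU0       CPU1
163:          0    5951611         GIC  wlan0
169:          0    3683950         GIC  wlan1
179:    3041389          0         GIC  eth0
LOC:   19179335   19043607
"""

changes = interrupt_delta(parse_interrupts(FIRST), parse_interrupts(SECOND))
for label, counts in changes.items():
    print(label, counts)
```

Between these two snapshots wlan0 advances by 114 on CPU1, eth0 by 20 on CPU0, and wlan1 does not move at all. The time between the snapshots was not recorded, so no rate can be derived from this.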

 

WAC730-1# cat /proc/softirqs
                CPU0       CPU1
      HI:          0          0
   TIMER:    9445575    9445441
  NET_TX:         81         64
  NET_RX:    2104183      12945
   BLOCK:          0          0
BLOCK_IOPOLL:          0          0
 TASKLET:    4446053  109704001
   SCHED:    9030682    7939783
 HRTIMER:          0          0
     RCU:    1467424    1531885
WAC730-1# cat /proc/softirqs
                CPU0       CPU1
      HI:          0          0
   TIMER:    9445734    9445600
  NET_TX:         81         64
  NET_RX:    2104196      12945
   BLOCK:          0          0
BLOCK_IOPOLL:          0          0
 TASKLET:    4446080  109735563
   SCHED:    9030841    7939791
 HRTIMER:          0          0
     RCU:    1467443    1531908
WAC730-1# cat /proc/softirqs
                CPU0       CPU1
      HI:          0          0
   TIMER:    9445990    9445856
  NET_TX:         81         64
  NET_RX:    2104214      12945
   BLOCK:          0          0
BLOCK_IOPOLL:          0          0
 TASKLET:    4446116  109786311
   SCHED:    9031088    7939803
 HRTIMER:          0          0
     RCU:    1467498    1531957

Hm... dropping a PM to @RaghuHR  with the logs.
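
The same kind of comparison works for the `/proc/softirqs` snapshots. Again just a sketch under the assumption that the layout is as pasted above; the helper names are hypothetical and the sample strings are excerpts of the first two snapshots.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs text into {name: [count per CPU]}."""
    table = {}
    for line in text.splitlines():
        if ":" not in line:               # skips the "CPU0 CPU1" header row
            continue
        name, _, counts = line.partition(":")
        table[name.strip()] = [int(n) for n in counts.split()]
    return table


def fastest_growing(before, after):
    """Return (name, cpu_index, increase) for the counter that grew the most."""
    best = ("", -1, 0)
    for name, new_counts in after.items():
        old_counts = before.get(name, [])
        for cpu, (old, new) in enumerate(zip(old_counts, new_counts)):
            if new - old > best[2]:
                best = (name, cpu, new - old)
    return best


# Excerpts of the first two snapshots from this post.
FIRST = """\
                CPU0       CPU1
   TIMER:    9445575    9445441
  NET_RX:    2104183      12945
 TASKLET:    4446053  109704001
   SCHED:    9030682    7939783
"""
SECOND = """\
                CPU0       CPU1
   TIMER:    9445734    9445600
  NET_RX:    2104196      12945
 TASKLET:    4446080  109735563
   SCHED:    9030841    7939791
"""

print(fastest_growing(parse_softirqs(FIRST), parse_softirqs(SECOND)))
```

In this excerpt the TASKLET counter on CPU1 grows by 31562 while TIMER grows by only 159 on each CPU, which fits with ksoftirqd/1 being the busy thread. The snapshots alone do not say which driver is scheduling those tasklets.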

Message 20 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

Great, thanks!  It has not happened here since my last report.

Model: WAC730|3x3 Wireless-AC Access Points
Message 21 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Thank you @RaghuHR for the confirmation. Any indication about the root cause? Please share a quick bugfix firmware ASAP.

 

Would downgrading the firmware be an option?

Message 22 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Experiencing the high sirq load almost every other day, here at the home office as well as on your customer sites. Not nice, as connectivity, reliability, and customer happiness are impacted.

Message 23 of 37
schumaku
Guru

Re: Occasional wireless traffic hang

Got a WAC7xx beta FW v3.9.0.15 under NDA for standalone usage, so I can't share it (talk to @DaneA). I was asked to provide feedback in the public community.

 

WAC730-1# uptime
14:52:09 up 1 day, 20:17, load average: 3.00, 3.01, 3.04

No unexpected high sirq load leading to a partial DoS situation so far - testing and monitoring continue.

 

TIA,

-Kurt

Message 24 of 37
bmomjian
Guide

Re: Occasional wireless traffic hang

Were you getting high sirqs before the update?  Any idea on a cause?  I am monitoring sirq and have not seen any spikes since my earlier report.

 

Model: WAC730|3x3 Wireless-AC Access Points
Message 25 of 37