Forum Discussion

ThirtyReset
Feb 11, 2016

System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

I have a pair of 6-bay legacy ReadyNAS units here (a Pro Pioneer and an Ultra 6 Plus). Both have been hardware-upgraded to the same spec (E7600, 8GB RAM, 6x 3TB Seagate ST3000DM001), and both have been exhibiting problems since I upgraded to 6.4.1.

Since moving from 6.2.4 to 6.4.1, I've had problems with the system fan running at full speed constantly, compared to a more tolerable level under 6.2.4.  I've tried the various sensors.conf changes from other threads without much luck.  I finally resolved this by replacing the 120mm fan with a Noctua, but I'm not sure why I had to go that route.  Something clearly changed with the fan behavior in 6.4.
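
For anyone comparing notes, this is roughly how I've been checking what the fan is actually doing. The hwmon index and attribute names vary by board, so treat the sysfs paths as an example rather than gospel:

# What lm-sensors reports for fans and temps
sensors

# Raw fan RPM and PWM duty cycle straight from sysfs
# (hwmon0/fan1 is a guess for these boards; check which hwmon node is yours)
cat /sys/class/hwmon/hwmon0/fan1_input
cat /sys/class/hwmon/hwmon0/pwm1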

In attempting to fix the fan problem, I updated to 6.4.2 RC1 and then, most recently, 6.4.2, and now have a new problem: every 2-3 days the NAS basically goes unresponsive.  If I can get in via SSH, the load averages have shot up into the 20s and 30s, and the unit eventually becomes completely unresponsive and requires a hard reset.
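
In case it helps anyone catch theirs in the act, here's the sort of thing I've started running from a once-a-minute cron job to snapshot the load before the box locks up completely. The threshold and log path are just my own choices:

#!/bin/sh
# Snapshot the load average and top CPU consumers when the 1-minute load gets high.
# THRESHOLD and the log path are arbitrary; tune to taste.
THRESHOLD=10
LOAD=$(cut -d' ' -f1 /proc/loadavg)
# Compare as integers by dropping the fractional part
if [ "${LOAD%.*}" -ge "$THRESHOLD" ]; then
    {
        date
        cat /proc/loadavg
        ps -eo pid,stat,pcpu,comm --sort=-pcpu | head -15
        echo "---"
    } >> /var/log/load-watch.log
fi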

Looking at the journal afterward, the only things that jump out so far are a few instances of kernel call traces that seem potentially related to the NIC teaming around the time things go nuts.
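
If anyone wants to pull the same thing from their own unit, the journal makes it fairly easy to slice out the kernel messages around an event. Something along these lines should work (the timestamps here just bracket my last hang; adjust the window to yours):

# Kernel messages in a window around the hang
journalctl -k --since "2016-02-10 09:00" --until "2016-02-10 09:30"

# Or search the whole kernel log for the checksum complaints
journalctl -k | grep -A3 "hw csum failure"

Here's one of the traces: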

Feb 10 09:09:37 nas1 kernel: bond0: hw csum failure
Feb 10 09:09:37 nas1 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O    4.1.16.x86_64.1 #1
Feb 10 09:09:37 nas1 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E....26/2010
Feb 10 09:09:37 nas1 kernel:  ffff88021189f001 ffff88024fc03b18 ffffffff889aa075 0000000000000001
Feb 10 09:09:37 nas1 kernel:  ffff88024327b000 ffff88024fc03b38 ffffffff888abe3d ffffffff88899c10
Feb 10 09:09:37 nas1 kernel:  ffff88024e948700 ffff88024fc03b78 ffffffff888a13a4 ffff88024fc03b88
Feb 10 09:09:37 nas1 kernel: Call Trace:
Feb 10 09:09:37 nas1 kernel:  <IRQ>  [<ffffffff889aa075>] dump_stack+0x45/0x57
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888abe3d>] netdev_rx_csum_fault+0x3d/0x40
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88899c10>] ? csum_block_add_ext+0x30/0x30
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888a13a4>] __skb_checksum_complete+0xc4/0xd0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8890d198>] icmp_rcv+0x1b8/0x360
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da428>] ip_local_deliver_finish+0x58/0x170
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da712>] ip_local_deliver+0xa2/0xb0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da3d0>] ? ip_rcv_finish+0x2f0/0x2f0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da1e9>] ip_rcv_finish+0x109/0x2f0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da988>] ip_rcv+0x268/0x370
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8864721d>] ? bond_handle_frame+0x7d/0x200
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888a719c>] __netif_receive_skb_core+0x51c/0x750
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa1c1>] __netif_receive_skb+0x21/0x70
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa368>] netif_receive_skb_internal+0x28/0x90
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa3dc>] netif_receive_skb_sk+0xc/0x10
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88745e93>] SkY2Poll+0xeb3/0x14b0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88087b32>] ? ttwu_do_wakeup+0x12/0x80
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa71c>] net_rx_action+0x12c/0x2c0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8806b72a>] __do_softirq+0xda/0x1f0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8806ba26>] irq_exit+0x76/0xa0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff880052c0>] do_IRQ+0x60/0x100
Feb 10 09:09:37 nas1 kernel:  [<ffffffff889b232b>] common_interrupt+0x6b/0x6b
Feb 10 09:09:37 nas1 kernel:  <EOI>  [<ffffffff8800c9eb>] ? mwait_idle+0x5b/0x90
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8800d29a>] arch_cpu_idle+0xa/0x10
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8809c0ba>] cpu_startup_entry+0x18a/0x2a0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff889a3902>] rest_init+0x72/0x80
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2c02f>] start_kernel+0x4dd/0x4ea
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2b93f>] ? set_init_arg+0x58/0x58
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2b568>] x86_64_start_reservations+0x2a/0x2c
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2b627>] x86_64_start_kernel+0xbd/0xc1
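
Reading up on "hw csum failure", it sounds like the NIC's receive checksum offload may be handing the bond a packet with a bad checksum (the SkY2Poll frame in the trace appears to point at the sky2 driver). My next experiment is to turn that offload off on the slave interfaces with ethtool. To be clear, this is a guess on my part, not a confirmed fix, and your slave interface names may differ:

# Show the current offload settings on a slave NIC
# (eth0/eth1 are just my slave names; substitute yours)
ethtool -k eth0

# Disable receive checksum offload on both bond slaves
ethtool -K eth0 rx off
ethtool -K eth1 rx off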

The Ultra 6 Plus chassis is a spare I used for experimenting with OS 6 before moving my main Pro Pioneer over. I've since carefully migrated my disks, in order, from the Ultra 6 Plus chassis to the Pro chassis, and the issue followed, which leads me to believe it's not tied to specific hardware but is something at the OS level.

Any suggestions?  I've just about got the wife convinced that I should replace these with a nice new 516, given that OS 6 still seems touchy during upgrades, but I'd rather not replace them outright if there's something simple I'm missing.  Are there known issues with NIC teaming on legacy hardware and OS 6 that I've missed in my searches (the new forum isn't the greatest for finding things)?  Something else I should check?  Is anyone else seeing the same problem?

Worst case, everything on the NAS is backed up, so I can simply reset to defaults and rebuild, or even roll back to 6.2.4 if need be.

21 Replies

  • Forgot to mention: Anti-Virus is disabled, per other recommendations I've seen for 6.4.x.

    • mdgm-ntgr (NETGEAR Employee Retired)

      Which teaming method are you using? Some teaming methods are better than others.

      • ThirtyReset (Tutor)

        Using 802.3ad LACP, Layer 3+4.  This was rock solid for the better part of 8 months on 6.2.x.  The system instability started with 6.4.2 RC1 and continues in 6.4.2.  I haven't tried shutting off bonding yet to see if it makes a difference.  I moved to 6.4.2 RC1 when it came out hoping to resolve the fan issues, so I've been chasing this off and on for a month or two now.  I just stumbled onto this error in the logs recently while looking for some common thread.

        I guess I can disable bonding and see if the problem goes away; it's been occurring pretty reliably, at least twice a week.  Any other suggestions?  I have the logs downloaded if anyone wants to take a look.
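
        For anyone following along, the bond state is easy to eyeball before and after any change. Something like this should show it (bond0 is just my interface name):

        # Mode, LACP partner details, and per-slave status
        cat /proc/net/bonding/bond0

        # Transmit hash policy in use (should read layer3+4 on my setup)
        cat /sys/class/net/bond0/bonding/xmit_hash_policy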

        As I noted, my fan problem was solved by replacing the stock 120mm fan with a Noctua.  I'm not really sure why that was necessary, since the fans were nice and quiet under 6.2.x, so something clearly changed there, but the new fan has at least made it tolerable to be in the same room as the server again.
