
ThirtyReset
Tutor

System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

I have a pair of 6-bay legacy ReadyNAS units here (a Pro Pioneer and an Ultra 6 Plus), both of which have been hardware upgraded to match the same spec (E7600, 8GB RAM, 6x3TB Seagate ST3000DM001) and both of which have been exhibiting problems since I upgraded to 6.4.1.

 

Since moving from 6.2.4 to 6.4.1, I've had problems with the system fan running at full speed constantly, compared to a more tolerable level under 6.2.4.  I've tried the various sensors.conf changes from other threads without much luck.  I finally resolved this by replacing the 120mm fan with a Noctua, but I'm not sure why I had to go that route in the end.  Something clearly changed with the fan behavior in 6.4.
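For what it's worth, the kernel exposes raw fan readings through sysfs hwmon, which is handy for checking whether the fan is actually being driven differently between firmware versions. This is just a generic Linux sketch, not ReadyNAS-specific code; the sysfs paths vary by kernel and sensor driver:

```python
import glob

def read_fan_rpms(pattern="/sys/class/hwmon/hwmon*/fan*_input"):
    """Return {sysfs path: RPM} for every fan input the kernel exposes."""
    rpms = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            rpms[path] = int(f.read().strip())
    return rpms
```

Comparing the output of this across firmware versions (or against what the front panel reports) can show whether the fan is really being commanded faster or the readings are just being misinterpreted.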

 

In attempting to fix the fan problem, I updated to 6.4.2 RC1 and then most recently 6.4.2, and now have a new problem: every 2-3 days the NAS basically goes unresponsive.  If I can get in via SSH, the load averages have shot up into the 20s and 30s, and the unit eventually becomes completely unresponsive and requires a hard reset.

 

Looking at the journal afterward, the only things I can see so far that jump out are a few instances of kernel call traces that seem potentially related to the NIC teaming around the time things go nuts:

 

Feb 10 09:09:37 nas1 kernel: bond0: hw csum failure
Feb 10 09:09:37 nas1 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O    4.1.16.x86_64.1 #1
Feb 10 09:09:37 nas1 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E....26/2010
Feb 10 09:09:37 nas1 kernel:  ffff88021189f001 ffff88024fc03b18 ffffffff889aa075 0000000000000001
Feb 10 09:09:37 nas1 kernel:  ffff88024327b000 ffff88024fc03b38 ffffffff888abe3d ffffffff88899c10
Feb 10 09:09:37 nas1 kernel:  ffff88024e948700 ffff88024fc03b78 ffffffff888a13a4 ffff88024fc03b88
Feb 10 09:09:37 nas1 kernel: Call Trace:
Feb 10 09:09:37 nas1 kernel:  <IRQ>  [<ffffffff889aa075>] dump_stack+0x45/0x57
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888abe3d>] netdev_rx_csum_fault+0x3d/0x40
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88899c10>] ? csum_block_add_ext+0x30/0x30
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888a13a4>] __skb_checksum_complete+0xc4/0xd0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8890d198>] icmp_rcv+0x1b8/0x360
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da428>] ip_local_deliver_finish+0x58/0x170
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da712>] ip_local_deliver+0xa2/0xb0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da3d0>] ? ip_rcv_finish+0x2f0/0x2f0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da1e9>] ip_rcv_finish+0x109/0x2f0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888da988>] ip_rcv+0x268/0x370
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8864721d>] ? bond_handle_frame+0x7d/0x200
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888a719c>] __netif_receive_skb_core+0x51c/0x750
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa1c1>] __netif_receive_skb+0x21/0x70
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa368>] netif_receive_skb_internal+0x28/0x90
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa3dc>] netif_receive_skb_sk+0xc/0x10
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88745e93>] sky2_poll+0xeb3/0x14b0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88087b32>] ? ttwu_do_wakeup+0x12/0x80
Feb 10 09:09:37 nas1 kernel:  [<ffffffff888aa71c>] net_rx_action+0x12c/0x2c0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8806b72a>] __do_softirq+0xda/0x1f0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8806ba26>] irq_exit+0x76/0xa0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff880052c0>] do_IRQ+0x60/0x100
Feb 10 09:09:37 nas1 kernel:  [<ffffffff889b232b>] common_interrupt+0x6b/0x6b
Feb 10 09:09:37 nas1 kernel:  <EOI>  [<ffffffff8800c9eb>] ? mwait_idle+0x5b/0x90
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8800d29a>] arch_cpu_idle+0xa/0x10
Feb 10 09:09:37 nas1 kernel:  [<ffffffff8809c0ba>] cpu_startup_entry+0x18a/0x2a0
Feb 10 09:09:37 nas1 kernel:  [<ffffffff889a3902>] rest_init+0x72/0x80
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2c02f>] start_kernel+0x4dd/0x4ea
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2b93f>] ? set_init_arg+0x58/0x58
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2b568>] x86_64_start_reservations+0x2a/0x2c
Feb 10 09:09:37 nas1 kernel:  [<ffffffff88e2b627>] x86_64_start_kernel+0xbd/0xc1

 

The Ultra 6 Plus chassis is a spare unit I used for experimenting with OS 6 before moving my main Pro Pioneer over, so I carefully migrated my disks in order from the Ultra 6 Plus chassis to the Pro chassis, and the issue followed, which leads me to believe it's not related to specific hardware but rather something at the OS level.


Any suggestions?  I've just about got the wife convinced that I should replace these with a nice new 516, given that OS 6 still seems touchy at times during upgrades, but I'd rather not replace them outright if there's something simple I'm missing.  Are there known issues with NIC teaming on legacy hardware and OS 6 that I've missed in my searches (the new forum isn't the greatest for finding things)?  Something else I should check?  Anyone else seeing the same problem?

 

Worst case scenario, everything on the NAS is backed up, so I can simply reset to defaults and rebuild, or even roll back to 6.2.4 if need be.

Message 1 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Forgot to mention, Anti-Virus is disabled per other recommendations I've seen for 6.4.x.

Message 2 of 22
mdgm-ntgr
NETGEAR Employee Retired

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Which teaming method are you using? Some teaming methods are better than others.

Message 3 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Using 802.3ad LACP, Layer 3+4.  This was rock solid for the better part of 8 months on 6.2.x.  The system instability started with 6.4.2 RC1 and continues in 6.4.2.  I haven't tried shutting off bonding yet to see if it makes a difference.  I moved to 6.4.2 RC1 when it came out to hopefully resolve the fan issues, so I've been chasing this off and on for a month or two now, I guess.  I just recently stumbled onto this error in the logs while looking for a common thread.

 

I guess I can disable it and see if the problem goes away; it's been pretty reliably occurring at least twice a week.  Any other suggestions?  I have the logs downloaded if anyone wants to take a look.

 

My fan problem was solved, as I noted, by replacing the stock 120mm fan with a Noctua.  I'm not really sure why I had to do this, as the fans were nice and quiet under 6.2.x as well, so something changed there, but the new fan has at least made it tolerable to be in the same room as the server again.

Message 4 of 22
mdgm-ntgr
NETGEAR Employee Retired

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

You can send your logs in (see the Sending Logs link in my sig).

Message 5 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Thanks much for the offer, will send off shortly.  There are some errors in there from earlier today when I was setting up an external drive to back up the last of my data (faster to reconstitute from one location than from the several places that feed the NAS), so you can ignore those, of course.

Message 6 of 22
mdgm-ntgr
NETGEAR Employee Retired

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Does your switch support Layer 3+4 for LACP and have you configured it to use that?

Message 7 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Good question. 

 

The switch is configured for LACP and the bonded interfaces show up correctly in the switch configuration.  I don't see a setting for Layer 3+4 on the switch side, but as I said, this seemed to be working fine for many, many months before I moved to 6.4.2 RC1; I never noticed a hiccup.

 

The switch in question is an HP ProCurve 1800-24G, if that makes a difference to you (sorry, not a NETGEAR!).  I can't find any specifics online yet about Layer 3+4 support, and it's been a while since I configured that, so I don't remember what I based that choice on.  I can switch it over to Layer 2 if you think it might make a difference.

 

 

Message 8 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Okay, it's only a layer 2 switch like I thought, so I've changed the bond over to layer 2 for the hashing.  Not sure why I had it set up as layer 3+4 all those months, or for that matter why it's only now started to cause problems, but I'll change it up and go from there.  Hopefully this solves it and there are no more hangs...
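For anyone else following along: the kernel reports the bond's actual state in /proc/net/bonding/&lt;iface&gt;, so over SSH you can confirm what the firmware really configured. A small sketch to pull out the relevant lines (the interface name bond0 is an assumption):

```python
def parse_bond_status(text: str) -> dict:
    """Extract the bonding mode and transmit hash policy from the
    contents of /proc/net/bonding/<iface>."""
    info = {}
    for line in text.splitlines():
        if line.startswith("Bonding Mode:"):
            info["mode"] = line.split(":", 1)[1].strip()
        elif line.startswith("Transmit Hash Policy:"):
            info["hash_policy"] = line.split(":", 1)[1].strip()
    return info

# Typical usage on the NAS (path assumes the team is named bond0):
# with open("/proc/net/bonding/bond0") as f:
#     print(parse_bond_status(f.read()))
```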

Message 9 of 22
mdgm-ntgr
NETGEAR Employee Retired

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

The datasheet indicates that it is a Layer 2 switch, so Layer 2 may well be worth a try.

Message 10 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Yep, I should have known better, but most of my networking experience was in a former life at this point 🙂

 

Anyways, let's hope this resolves the system hangs and go from there.  Thanks for taking a peek at the logs.

Message 11 of 22
StephenB
Guru

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus


@mdgm wrote:

Does your switch support Layer 3+4 for LACP and have you configured it to use that?


It really doesn't matter if the switch supports layer 3+4 or not.  It is just an algorithm choice for deciding which NIC will transmit each packet.  It is not negotiated, and the switch can use a different algorithm from the NAS.

 

However, layer 3+4 can result in packet loss, since in some cases the NAS might end up trying to send more than 1 gigabit to a single client.  That will create congestion, and the switch will have to drop packets when its queues fill up.

 

Layer 2 or Layer 2+3 are safer choices.
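To make the difference concrete, here is a rough Python sketch of the two transmit hash policies as described in the Linux bonding documentation. It is deliberately simplified (the real kernel code differs in details, e.g. which MAC bytes it uses), but it shows why layer 3+4 can push more than one link's worth of traffic at a single client while layer 2 cannot:

```python
def layer2_hash(src_mac: bytes, dst_mac: bytes, n_slaves: int) -> int:
    """layer2 policy: all traffic between a given MAC pair sticks to one slave."""
    # Simplified: XOR the last octet of each MAC, modulo the slave count.
    return (src_mac[-1] ^ dst_mac[-1]) % n_slaves

def layer34_hash(src_ip: int, dst_ip: int,
                 src_port: int, dst_port: int, n_slaves: int) -> int:
    """layer3+4 policy: different TCP/UDP flows to the SAME host can land
    on different slaves, so one client can be offered more than one link."""
    h = (src_port ^ dst_port) ^ ((src_ip ^ dst_ip) & 0xFFFF)
    return h % n_slaves

# Two flows from the NAS to the same client (addresses are made up):
nas_mac, client_mac = bytes.fromhex("001122334455"), bytes.fromhex("0011223344aa")
nas_ip, client_ip = 0xC0A80001, 0xC0A80064   # 192.168.0.1 -> 192.168.0.100

# layer2 always picks the same slave for this client...
slave_a = layer2_hash(nas_mac, client_mac, 2)
slave_b = layer2_hash(nas_mac, client_mac, 2)

# ...while layer3+4 can split the client's flows across both NICs.
flow1 = layer34_hash(nas_ip, client_ip, 50000, 445, 2)
flow2 = layer34_hash(nas_ip, client_ip, 50001, 445, 2)
```

With two slaves, the two flows above hash to different NICs under layer 3+4 but to the same NIC under layer 2, which is exactly the congestion scenario described.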

 

Message 12 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

StephenB, thanks for the clarification - I thought that was the case but wasn't too sure.   I've set it back to Layer 2 for now, but I'm not yet convinced that this is the source of my system instability.  Do you think congestion issues triggered by this could cause the spike in load that's making the NAS unresponsive?

Message 13 of 22
StephenB
Guru

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus


@ThirtyReset wrote:

StephenB, thanks for the clarification - I thought that was the case but wasn't too sure.   I've got it set back to Layer 2 for now, but I'm not convinced yet that this could be the source of my system instability.  Do you think that issues with congestion triggered by this could cause the spike in load I'm seeing that's making the NAS unresponsive?


I'm not convinced this is causing the instability either.  You could look at the switch stats and see whether there actually is congestion (usually there are stats on packet queues).

 

But the ethernet checksum error is troubling in its own right (and it's on rx, so the transmit hashing algorithm choice isn't related to that).

Message 14 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

No noticeable congestion on the switch, but the statistics on the ProCurve aren't all that informative either.  Agreed about the checksum error.  I'm making a mental note to monitor the status of my switch; it may be indicative of some issue on that front.

 

I've had some trouble with my Mac Pro and the mDNSResponder service losing DNS connectivity with the NAS (where I was running dnsmasq until recently); I'm suspicious that may be related.

 

Of course, that brings me back to still trying to figure out the stability issue.  If it holds to its normal pattern, it should lock up on me again sometime this weekend.  I'm in the process of backing up the last chunk of data that wasn't backed up (only because that data is itself a backup from other sources, mind you - backing up to an external drive as an added precaution, plus for ease of restoration later).  Once that's done, I'm tempted to factory default this and give it a fresh installation and rebuild, but I'm not there yet.

 

Of course, I'm also temptingly eyeing the RN516 right now... I can't complain, my Pro Pioneer still does an amazingly bang-up job (especially after the CPU was upgraded a few years back).  Really, only dealing with this backup makes me wish for something a bit more modern with USB 3.0 or at least eSATA 🙂

Message 15 of 22
StephenB
Guru

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

One possibility is to do a direct connect to the PC, and try backing up to a PC-attached drive.

 

That's the fastest backup method, and will also give you more evidence on how the main issue relates to the network.

 

The bad news is that it takes the NAS off-line.

Message 16 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

So the good news is the backup finished several days ago.  We've had some family activity the last week (birth of my daughter!), so this hasn't been high on the list, but now that things are settling in at home and baby loves to have my Squeezebox radios running in the background all day, I need to sort this out again.

 

Still seeing hanging behavior similar to what I described previously.  It hits a load average of 4.0x, at which point I can usually SSH in and do a few things before it becomes completely unresponsive.   On at least one occasion, it appeared that CrashPlan was chewing up a lot of memory, but the system was not yet into swap.  Killing off CrashPlan made no difference, however.
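Since the hang builds up over several minutes, a tiny watcher run periodically (from cron, say) can timestamp when the load starts climbing, which makes it easier to line the journal up with the event afterward. A sketch only; the threshold and log path here are arbitrary choices, not anything ReadyNAS-specific:

```python
import os
from datetime import datetime

def log_if_load_high(threshold=4.0, logfile="/tmp/load_watch.log", loadavg=None):
    """Append a timestamped line when the 1-minute load average exceeds
    `threshold`. Returns True if a line was written, False otherwise."""
    # Allow the sample to be injected for testing; otherwise read the system's.
    load1 = (loadavg if loadavg is not None else os.getloadavg())[0]
    if load1 > threshold:
        with open(logfile, "a") as f:
            f.write(f"{datetime.now().isoformat()} load1={load1:.2f}\n")
        return True
    return False
```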

 

This last time it went unresponsive, I got some more info out of the logs: a general protection fault involving the nv6lcd and vpd modules:

 

Feb 20 13:35:59 nas1 kernel: general protection fault: 0000 [#3] SMP
Feb 20 13:35:59 nas1 kernel: Modules linked in: nv6lcd(O) vpd(PO)
Feb 20 13:35:59 nas1 kernel: CPU: 1 PID: 20911 Comm: avahi-publish-s Tainted: P      D    O    4.1.16.x86_64.1 #1
Feb 20 13:35:59 nas1 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E....26/2010
Feb 20 13:35:59 nas1 kernel: task: ffff8800a5852010 ti: ffff8800714c4000 task.ti: ffff8800714c4000
Feb 20 13:35:59 nas1 kernel: RIP: 0010:[<ffffffff8813e11c>]  [<ffffffff8813e11c>] __fget_light+0x5c/0x70
Feb 20 13:35:59 nas1 kernel: RSP: 0018:ffff8800714c7ae0  EFLAGS: 00010202
Feb 20 13:35:59 nas1 kernel: RAX: 0000000000000000 RBX: ffff8800714c7b8c RCX: ffff8800714c7b68
Feb 20 13:35:59 nas1 kernel: RDX: 1d00010004060008 RSI: 0000000000004000 RDI: 0000000000000003
Feb 20 13:35:59 nas1 kernel: RBP: ffff8800714c7ae8 R08: ffff8800714c4000 R09: 0000000000000000
Feb 20 13:35:59 nas1 kernel: R10: 00000000000002ef R11: 0000000000000000 R12: 0000000000000000
Feb 20 13:35:59 nas1 kernel: R13: 0000000000000000 R14: ffff8800714c7b74 R15: 0000000000000000
Feb 20 13:35:59 nas1 kernel: FS:  00007f3e28c37700(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
Feb 20 13:35:59 nas1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 13:35:59 nas1 kernel: CR2: 00007f3bb768a000 CR3: 00000001d5f82000 CR4: 00000000000406e0
Feb 20 13:35:59 nas1 kernel: Stack:
Feb 20 13:35:59 nas1 kernel:  ffffffff8813e13e ffff8800714c7f08 ffffffff881369bd ffff8800714c7b08
Feb 20 13:35:59 nas1 kernel:  ffff8801d63b9000 ffff8801d63b9000 ffff8800714c8000 ffff8800a5852010
Feb 20 13:35:59 nas1 kernel:  0000000000000000 0000000000000000 0000000000000000 0000000000000000
Feb 20 13:35:59 nas1 kernel: Call Trace:
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8813e13e>] ? __fdget+0xe/0x10
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881369bd>] do_sys_poll+0x22d/0x5a0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8889bc9c>] ? skb_free_head+0x6c/0x80
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8889bd73>] ? skb_release_data+0xc3/0xd0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8889be0c>] ? __kfree_skb+0x2c/0x80
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8893c3d3>] ? unix_stream_recvmsg+0x433/0x780
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8813c1c1>] ? touch_atime+0x71/0x160
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8812b071>] ? pipe_read+0x281/0x2e0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff88123896>] ? vfs_read+0x126/0x150
Feb 20 13:35:59 nas1 kernel:  [<ffffffff88136dfd>] SyS_poll+0x6d/0x100
Feb 20 13:35:59 nas1 kernel:  [<ffffffff889b1857>] system_call_fastpath+0x12/0x6a
Feb 20 13:35:59 nas1 kernel: Code: 0f 45 c1 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 50 08 31 c0 3b 3a 73 ea ... 00 55
Feb 20 13:35:59 nas1 kernel: RIP  [<ffffffff8813e11c>] __fget_light+0x5c/0x70
Feb 20 13:35:59 nas1 kernel:  RSP <ffff8800714c7ae0>
Feb 20 13:35:59 nas1 kernel: ---[ end trace fe1f1bb67fb99dce ]---
Feb 20 13:35:59 nas1 kernel: general protection fault: 0000 [#4] SMP
Feb 20 13:35:59 nas1 kernel: Modules linked in: nv6lcd(O) vpd(PO)
Feb 20 13:35:59 nas1 kernel: CPU: 1 PID: 20911 Comm: avahi-publish-s Tainted: P      D    O    4.1.16.x86_64.1 #1
Feb 20 13:35:59 nas1 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E....26/2010
Feb 20 13:35:59 nas1 kernel: task: ffff8800a5852010 ti: ffff8800714c4000 task.ti: ffff8800714c4000
Feb 20 13:35:59 nas1 kernel: RIP: 0010:[<ffffffff88120337>]  [<ffffffff88120337>] filp_close+0x17/0x80
Feb 20 13:35:59 nas1 kernel: RSP: 0018:ffff8800714c7888  EFLAGS: 00010286
Feb 20 13:35:59 nas1 kernel: RAX: ffff88024ea71000 RBX: 1d00ffffffffffff RCX: 0000000000000100
Feb 20 13:35:59 nas1 kernel: RDX: 0000000000000001 RSI: ffff88024eb9b540 RDI: 1d00ffffffffffff
Feb 20 13:35:59 nas1 kernel: RBP: ffff8800714c78a8 R08: 0000000000000000 R09: 0000000000000001
Feb 20 13:35:59 nas1 kernel: R10: 0000000000000001 R11: dead000000200200 R12: 000000000044487f
Feb 20 13:35:59 nas1 kernel: R13: 0000000000000001 R14: ffff8801e1a2eb00 R15: ffff88024eb9b540
Feb 20 13:35:59 nas1 kernel: FS:  0000000000000000(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
Feb 20 13:35:59 nas1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 13:35:59 nas1 kernel: CR2: 00007f3bb768a000 CR3: 0000000008d5b000 CR4: 00000000000406e0
Feb 20 13:35:59 nas1 kernel: Stack:
Feb 20 13:35:59 nas1 kernel:  ffff8800714c7890 0000000000000001 000000000044487f 0000000000000001
Feb 20 13:35:59 nas1 kernel:  ffff8800714c78f8 ffffffff8813e684 ffff8800a5852010 0000000000000000
Feb 20 13:35:59 nas1 kernel:  ffffea0006ccffe8 ffff8800a5852010 ffff88024eb9b540 ffff8800a5852648
Feb 20 13:35:59 nas1 kernel: Call Trace:
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8813e684>] put_files_struct+0x94/0x100
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8813e795>] exit_files+0x45/0x50
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8806aa57>] do_exit+0x747/0x9a0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff88006d4f>] oops_end+0x8f/0xd0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff88006ed3>] die+0x53/0x80
Feb 20 13:35:59 nas1 kernel:  [<ffffffff880045da>] do_general_protection+0xda/0x160
Feb 20 13:35:59 nas1 kernel:  [<ffffffff889b2d42>] general_protection+0x22/0x30
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8813e11c>] ? __fget_light+0x5c/0x70
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8813e13e>] ? __fdget+0xe/0x10
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881369bd>] do_sys_poll+0x22d/0x5a0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8889bc9c>] ? skb_free_head+0x6c/0x80
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8889bd73>] ? skb_release_data+0xc3/0xd0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8889be0c>] ? __kfree_skb+0x2c/0x80
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8893c3d3>] ? unix_stream_recvmsg+0x433/0x780
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8813c1c1>] ? touch_atime+0x71/0x160
Feb 20 13:35:59 nas1 kernel:  [<ffffffff8812b071>] ? pipe_read+0x281/0x2e0
Feb 20 13:35:59 nas1 kernel:  [<ffffffff88123896>] ? vfs_read+0x126/0x150
Feb 20 13:35:59 nas1 kernel:  [<ffffffff88136dfd>] SyS_poll+0x6d/0x100
Feb 20 13:35:59 nas1 kernel:  [<ffffffff889b1857>] system_call_fastpath+0x12/0x6a
Feb 20 13:35:59 nas1 kernel: Code: ff ff ff eb f5 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 ... 48 8b
Feb 20 13:35:59 nas1 kernel: RIP  [<ffffffff88120337>] filp_close+0x17/0x80
Feb 20 13:35:59 nas1 kernel:  RSP <ffff8800714c7888>
Feb 20 13:35:59 nas1 kernel: ---[ end trace fe1f1bb67fb99dcf ]---
Feb 20 13:35:59 nas1 kernel: Fixing recursive fault but reboot is needed!

 

Similar to previous scenarios, the shell in my SSH session became unresponsive, though I was able to access the web interface and initiate a shutdown.  However, that shutdown never completed; the LCD hung at the "Rebooting" notification.

 

In the logs there are at least 4 or 5 of these that have occurred.  There are a LOT of messages about the fan speed being below the minimum fan speed, but the fans are running fine and the system is staying appropriately cool as far as I can tell.  I'm also seeing a lot of NTP time adjustments.  I can grab and send in the logs again if you have any ideas, but since my offloaded backups are all complete now, I could also just factory reset this and start over.  I'm skeptical that this is any kind of hardware issue, since I was seeing the same behavior with these exact disks in my backup Ultra 6 Plus.

 

Chris

Message 17 of 22
mdgm-ntgr
NETGEAR Employee Retired

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

You could try putting 4.2.28 back on the box and seeing if you still have the same problem.

Message 18 of 22
mdgm-ntgr
NETGEAR Employee Retired

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Any update?

Message 19 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Sorry, I've been a bit distracted lately.  Rather than nuking my box all the way back to 4.2.28, I did a factory reset back to a fresh 6.4.2 install.  I put minimal apps/services back on this fresh re-install - just htop, Transmission, and the latest Logitech Media Server.  I was good for about 2 days, and then the same behavior resumed - loss of connectivity, command-line sessions hanging and never recovering, and more kernel GPFs in the logs.

 

At this point, I'm giving strong thought to following the procedure floating around the forums to reset back to 6.2.4, where everything was rock solid.  I'm not sure what specifically is wrong in 6.4.x, but I'm seeing other posts where people are experiencing the same issues on Pro 6, Ultra 6 Plus, etc. units, so I don't think it's something specific to my hardware.  I have a spare Ultra 6 Plus here that is also set up with 6.4.2; I may leave it up and running and see how it behaves, but I think at this point there's still something in 6.4.x that's not happy on the Pro Pioneer/Pro 6/Ultra 6 Plus hardware...

Message 20 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Note that I now have everything on this NAS backed up to external drives, so I can easily rebuild if needed.  I already had backups of everything, of course, but my media files in particular were spread across a few spots; I've centralized them in one volume so I can more easily rsync the data back onto a reset image.

 

So if anyone has other suggestions on what to try, let me know.  I'm willing to put a little bit of time into being a guinea pig for this if it helps others.

Message 21 of 22
ThirtyReset
Tutor

Re: System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus

Just a quick update: I followed the procedure on these forums to roll my ReadyNAS Pro Pioneer back to 6.2.4 a few weeks ago.  I've been running stable with no issues, no fan problems, and no crashes for at least 2 weeks now, whereas on 6.4.2 I was crashing every 2-3 days.

 

I'll keep an eye on 6.4.x progress, but I'd have to say I don't think 6.4.x is a good fit for the legacy hardware yet...

Message 22 of 22