Forum Discussion
ThirtyReset
Feb 11, 2016, Tutor
System hang, fan issues with 6.4.x on Pioneer Pro, Ultra 6 Plus
I have a pair of 6-bay legacy ReadyNAS units here (a Pro Pioneer and an Ultra 6 Plus), both of which have been hardware upgraded to match the same spec (E7600, 8GB RAM, 6x3TB Seagate ST3000DM001) and...
StephenB
Feb 12, 2016, Guru - Experienced User
mdgm wrote:
Does your switch support Layer 3+4 for LACP and have you configured it to use that?
It really doesn't matter if the switch supports layer 3+4 or not. It is just an algorithm choice for deciding which NIC will transmit each packet. It is not negotiated, and the switch can use a different algorithm from the NAS.
However, layer 3+4 can result in packet loss, since in some cases the NAS might end up trying to send more than 1 Gbit/s to a single client. That will create congestion, and the switch will have to drop packets when its queues fill up.
Layer 2 or Layer 2+3 are safer choices.
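To illustrate the difference, here is a simplified sketch of two of the Linux bonding driver's transmit hash policies (based on the formulas in the kernel's bonding documentation; real kernels mix in additional fields, and all MACs, IPs, and ports below are made up for the example):

```python
# Simplified sketch of the Linux bonding driver's "layer2" and
# "layer3+4" xmit_hash_policy options. Real kernels include extra
# fields; the addresses and ports here are hypothetical.

NUM_SLAVES = 2  # two-NIC LACP bond, as on the ReadyNAS

def layer2_slave(src_mac: bytes, dst_mac: bytes) -> int:
    """layer2: XOR of the last MAC bytes, modulo slave count.
    Every packet to a given peer MAC leaves on the same NIC."""
    return (src_mac[5] ^ dst_mac[5]) % NUM_SLAVES

def layer3_4_slave(src_ip: int, dst_ip: int,
                   src_port: int, dst_port: int) -> int:
    """layer3+4: ports XORed with the low IP bits, modulo slave count.
    Two flows to the SAME client can land on DIFFERENT NICs."""
    return ((src_port ^ dst_port) ^ ((src_ip ^ dst_ip) & 0xFFFF)) % NUM_SLAVES

nas_ip, client_ip = 0xC0A80102, 0xC0A80110   # 192.168.1.2 -> 192.168.1.16
nas_mac = bytes.fromhex("0024b2000001")
client_mac = bytes.fromhex("0024b2000002")

# Two TCP connections from the same client, different ephemeral ports:
flows = [(445, 50000), (445, 50001)]          # (src_port, dst_port)

for sport, dport in flows:
    # layer2 picks the same NIC for both flows; layer3+4 splits them
    print("layer2 ->", layer2_slave(nas_mac, client_mac),
          "| layer3+4 ->", layer3_4_slave(nas_ip, client_ip, sport, dport))
```

Under layer2 both flows share one NIC, so that client can never be sent more than 1 Gbit/s. Under layer3+4 the two flows hash to different NICs, so the NAS can transmit up to 2 Gbit/s toward a client whose switch port is only 1 Gbit/s, and the switch has to queue and eventually drop the excess.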
ThirtyReset
Feb 12, 2016, Tutor
StephenB, thanks for the clarification - I thought that was the case but wasn't too sure. I've got it set back to Layer 2 for now, but I'm not convinced yet that this could be the source of my system instability. Do you think that issues with congestion triggered by this could cause the spike in load I'm seeing that's making the NAS unresponsive?
- StephenB, Feb 13, 2016, Guru - Experienced User
ThirtyReset wrote:
StephenB, thanks for the clarification - I thought that was the case but wasn't too sure. I've got it set back to Layer 2 for now, but I'm not convinced yet that this could be the source of my system instability. Do you think that issues with congestion triggered by this could cause the spike in load I'm seeing that's making the NAS unresponsive?
I'm not convinced this is causing the instability either. You could look at the switch stats and see if there actually is congestion (usually there are stats on packet queues).
But the ethernet checksum error is troubling in its own right (and it is rx, so the xmit hashing algorithm choice isn't related to that).
- ThirtyReset, Feb 13, 2016, Tutor
No noticeable congestion on the switch, but the statistics on the ProCurve aren't all that informative either. Agreed about the checksum error. Making a mental note to monitor the status of my switch; the error may be indicative of some issue on that front.
I've had some trouble with my Mac Pro and the mDNSResponder service losing DNS connectivity with the NAS (where I was running dnsmasq until recently); I'm suspicious that may be related.
Of course, that brings me back to still trying to figure out the stability issue. If it holds to normal patterns, it should lock up on me again sometime this weekend. I'm in the process of backing up the last chunk of data that wasn't backed up (only because that data was itself a backup from other sources, mind you; I'm backing up to an external drive as an added precaution, plus for ease of restoration later). Once that's done, I'm tempted to factory default this and give it a fresh installation and rebuild, but I'm not there yet.
Of course, I'm also temptingly eyeing RN516s right now... I can't complain; my Pro Pioneer still does an amazingly bang-up job (especially after the CPU was upgraded a few years back). Really, only dealing with this backup right now makes me wish for something a bit more modern with USB 3.0 or at least eSATA :)
- StephenB, Feb 13, 2016, Guru - Experienced User
One possibility is to do a direct connect to the PC, and try backing up to a PC-attached drive.
That's the fastest backup method, and will also give you more evidence on how the main issue relates to the network.
The bad news is that it takes the NAS off-line.
- ThirtyReset, Feb 20, 2016, Tutor
So the good news is the backup finished several days ago. We've had some family activity the last week (birth of my daughter!), so this hasn't been high on the list, but now that things are settling in at home and baby loves to have my Squeezebox radios running in the background all day, I need to sort this out again.
Still seeing hanging behavior similar to previous descriptions. It hits a load average of 4.0x, at which point I can usually ssh in and do a few things before it becomes completely unresponsive. On at least one occasion, it appeared that CrashPlan was chewing up a lot of memory, but the system was not swapping yet. Killing off CrashPlan made no difference, however.
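One way to capture the run-up to a hang before the shell dies (a minimal sketch, assuming Python 3 is available on the box, which ReadyNAS OS 6 may not ship; the path and interval are arbitrary choices):

```python
import datetime
import os
import time

def log_load(path="/tmp/loadavg.log", interval=60, iterations=None):
    """Append the 1/5/15-minute load averages to a file periodically,
    so the climb toward 4.0 survives even after the ssh session hangs.
    iterations=None means run until killed."""
    n = 0
    while iterations is None or n < iterations:
        one, five, fifteen = os.getloadavg()  # reads /proc/loadavg on Linux
        stamp = datetime.datetime.now().isoformat()
        with open(path, "a") as f:
            f.write(f"{stamp} {one:.2f} {five:.2f} {fifteen:.2f}\n")
        n += 1
        time.sleep(interval)

if __name__ == "__main__":
    log_load()
```

Running it under nohup (or from cron) and reading the file back after a reboot would show whether the load spike is gradual or sudden, which might help narrow down the trigger.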
This last time it went unresponsive, I got some more info out of the logs: a general protection fault, with the nv6lcd and vpd modules loaded:
Feb 20 13:35:59 nas1 kernel: general protection fault: 0000 [#3] SMP
Feb 20 13:35:59 nas1 kernel: Modules linked in: nv6lcd(O) vpd(PO)
Feb 20 13:35:59 nas1 kernel: CPU: 1 PID: 20911 Comm: avahi-publish-s Tainted: P D O 4.1.16.x86_64.1 #1
Feb 20 13:35:59 nas1 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E....26/2010
Feb 20 13:35:59 nas1 kernel: task: ffff8800a5852010 ti: ffff8800714c4000 task.ti: ffff8800714c4000
Feb 20 13:35:59 nas1 kernel: RIP: 0010:[<ffffffff8813e11c>] [<ffffffff8813e11c>] __fget_light+0x5c/0x70
Feb 20 13:35:59 nas1 kernel: RSP: 0018:ffff8800714c7ae0 EFLAGS: 00010202
Feb 20 13:35:59 nas1 kernel: RAX: 0000000000000000 RBX: ffff8800714c7b8c RCX: ffff8800714c7b68
Feb 20 13:35:59 nas1 kernel: RDX: 1d00010004060008 RSI: 0000000000004000 RDI: 0000000000000003
Feb 20 13:35:59 nas1 kernel: RBP: ffff8800714c7ae8 R08: ffff8800714c4000 R09: 0000000000000000
Feb 20 13:35:59 nas1 kernel: R10: 00000000000002ef R11: 0000000000000000 R12: 0000000000000000
Feb 20 13:35:59 nas1 kernel: R13: 0000000000000000 R14: ffff8800714c7b74 R15: 0000000000000000
Feb 20 13:35:59 nas1 kernel: FS: 00007f3e28c37700(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
Feb 20 13:35:59 nas1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 13:35:59 nas1 kernel: CR2: 00007f3bb768a000 CR3: 00000001d5f82000 CR4: 00000000000406e0
Feb 20 13:35:59 nas1 kernel: Stack:
Feb 20 13:35:59 nas1 kernel: ffffffff8813e13e ffff8800714c7f08 ffffffff881369bd ffff8800714c7b08
Feb 20 13:35:59 nas1 kernel: ffff8801d63b9000 ffff8801d63b9000 ffff8800714c8000 ffff8800a5852010
Feb 20 13:35:59 nas1 kernel: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Feb 20 13:35:59 nas1 kernel: Call Trace:
Feb 20 13:35:59 nas1 kernel: [<ffffffff8813e13e>] ? __fdget+0xe/0x10
Feb 20 13:35:59 nas1 kernel: [<ffffffff881369bd>] do_sys_poll+0x22d/0x5a0
Feb 20 13:35:59 nas1 kernel: [<ffffffff8889bc9c>] ? skb_free_head+0x6c/0x80
Feb 20 13:35:59 nas1 kernel: [<ffffffff8889bd73>] ? skb_release_data+0xc3/0xd0
Feb 20 13:35:59 nas1 kernel: [<ffffffff8889be0c>] ? __kfree_skb+0x2c/0x80
Feb 20 13:35:59 nas1 kernel: [<ffffffff8893c3d3>] ? unix_stream_recvmsg+0x433/0x780
Feb 20 13:35:59 nas1 kernel: [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel: [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel: [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel: [<ffffffff8813c1c1>] ? touch_atime+0x71/0x160
Feb 20 13:35:59 nas1 kernel: [<ffffffff8812b071>] ? pipe_read+0x281/0x2e0
Feb 20 13:35:59 nas1 kernel: [<ffffffff88123896>] ? vfs_read+0x126/0x150
Feb 20 13:35:59 nas1 kernel: [<ffffffff88136dfd>] SyS_poll+0x6d/0x100
Feb 20 13:35:59 nas1 kernel: [<ffffffff889b1857>] system_call_fastpath+0x12/0x6a
Feb 20 13:35:59 nas1 kernel: Code: 0f 45 c1 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 48 8b 50 08 31 c0 3b 3a 73 ea ... 00 55
Feb 20 13:35:59 nas1 kernel: RIP [<ffffffff8813e11c>] __fget_light+0x5c/0x70
Feb 20 13:35:59 nas1 kernel: RSP <ffff8800714c7ae0>
Feb 20 13:35:59 nas1 kernel: ---[ end trace fe1f1bb67fb99dce ]---
Feb 20 13:35:59 nas1 kernel: general protection fault: 0000 [#4] SMP
Feb 20 13:35:59 nas1 kernel: Modules linked in: nv6lcd(O) vpd(PO)
Feb 20 13:35:59 nas1 kernel: CPU: 1 PID: 20911 Comm: avahi-publish-s Tainted: P D O 4.1.16.x86_64.1 #1
Feb 20 13:35:59 nas1 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E....26/2010
Feb 20 13:35:59 nas1 kernel: task: ffff8800a5852010 ti: ffff8800714c4000 task.ti: ffff8800714c4000
Feb 20 13:35:59 nas1 kernel: RIP: 0010:[<ffffffff88120337>] [<ffffffff88120337>] filp_close+0x17/0x80
Feb 20 13:35:59 nas1 kernel: RSP: 0018:ffff8800714c7888 EFLAGS: 00010286
Feb 20 13:35:59 nas1 kernel: RAX: ffff88024ea71000 RBX: 1d00ffffffffffff RCX: 0000000000000100
Feb 20 13:35:59 nas1 kernel: RDX: 0000000000000001 RSI: ffff88024eb9b540 RDI: 1d00ffffffffffff
Feb 20 13:35:59 nas1 kernel: RBP: ffff8800714c78a8 R08: 0000000000000000 R09: 0000000000000001
Feb 20 13:35:59 nas1 kernel: R10: 0000000000000001 R11: dead000000200200 R12: 000000000044487f
Feb 20 13:35:59 nas1 kernel: R13: 0000000000000001 R14: ffff8801e1a2eb00 R15: ffff88024eb9b540
Feb 20 13:35:59 nas1 kernel: FS: 0000000000000000(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
Feb 20 13:35:59 nas1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 20 13:35:59 nas1 kernel: CR2: 00007f3bb768a000 CR3: 0000000008d5b000 CR4: 00000000000406e0
Feb 20 13:35:59 nas1 kernel: Stack:
Feb 20 13:35:59 nas1 kernel: ffff8800714c7890 0000000000000001 000000000044487f 0000000000000001
Feb 20 13:35:59 nas1 kernel: ffff8800714c78f8 ffffffff8813e684 ffff8800a5852010 0000000000000000
Feb 20 13:35:59 nas1 kernel: ffffea0006ccffe8 ffff8800a5852010 ffff88024eb9b540 ffff8800a5852648
Feb 20 13:35:59 nas1 kernel: Call Trace:
Feb 20 13:35:59 nas1 kernel: [<ffffffff8813e684>] put_files_struct+0x94/0x100
Feb 20 13:35:59 nas1 kernel: [<ffffffff8813e795>] exit_files+0x45/0x50
Feb 20 13:35:59 nas1 kernel: [<ffffffff8806aa57>] do_exit+0x747/0x9a0
Feb 20 13:35:59 nas1 kernel: [<ffffffff88006d4f>] oops_end+0x8f/0xd0
Feb 20 13:35:59 nas1 kernel: [<ffffffff88006ed3>] die+0x53/0x80
Feb 20 13:35:59 nas1 kernel: [<ffffffff880045da>] do_general_protection+0xda/0x160
Feb 20 13:35:59 nas1 kernel: [<ffffffff889b2d42>] general_protection+0x22/0x30
Feb 20 13:35:59 nas1 kernel: [<ffffffff8813e11c>] ? __fget_light+0x5c/0x70
Feb 20 13:35:59 nas1 kernel: [<ffffffff8813e13e>] ? __fdget+0xe/0x10
Feb 20 13:35:59 nas1 kernel: [<ffffffff881369bd>] do_sys_poll+0x22d/0x5a0
Feb 20 13:35:59 nas1 kernel: [<ffffffff8889bc9c>] ? skb_free_head+0x6c/0x80
Feb 20 13:35:59 nas1 kernel: [<ffffffff8889bd73>] ? skb_release_data+0xc3/0xd0
Feb 20 13:35:59 nas1 kernel: [<ffffffff8889be0c>] ? __kfree_skb+0x2c/0x80
Feb 20 13:35:59 nas1 kernel: [<ffffffff8893c3d3>] ? unix_stream_recvmsg+0x433/0x780
Feb 20 13:35:59 nas1 kernel: [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel: [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel: [<ffffffff881356b0>] ? __pollwait+0xf0/0xf0
Feb 20 13:35:59 nas1 kernel: [<ffffffff8813c1c1>] ? touch_atime+0x71/0x160
Feb 20 13:35:59 nas1 kernel: [<ffffffff8812b071>] ? pipe_read+0x281/0x2e0
Feb 20 13:35:59 nas1 kernel: [<ffffffff88123896>] ? vfs_read+0x126/0x150
Feb 20 13:35:59 nas1 kernel: [<ffffffff88136dfd>] SyS_poll+0x6d/0x100
Feb 20 13:35:59 nas1 kernel: [<ffffffff889b1857>] system_call_fastpath+0x12/0x6a
Feb 20 13:35:59 nas1 kernel: Code: ff ff ff eb f5 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 ... 48 8b
Feb 20 13:35:59 nas1 kernel: RIP [<ffffffff88120337>] filp_close+0x17/0x80
Feb 20 13:35:59 nas1 kernel: RSP <ffff8800714c7888>
Feb 20 13:35:59 nas1 kernel: ---[ end trace fe1f1bb67fb99dcf ]---
Feb 20 13:35:59 nas1 kernel: Fixing recursive fault but reboot is needed!
Similar to previous scenarios, the shell in my ssh session became unresponsive, though I was able to access the web interface and initiate a shutdown. However, that shutdown never completed; the LCD hung at the "Rebooting" notification.
In the logs there are at least 4 or 5 of these that have occurred. There are a LOT of messages about the fan speed being below the minimum fan speed, but the fans are running fine and the system is staying appropriately cool as far as I can tell. I'm also seeing a lot of NTP time adjustments. I can grab and send in the logs again if you have any ideas, but since my offloaded backups are all complete now, I could also just factory reset this and start over. I doubt this is any kind of hardware issue, since I was seeing the same behavior with these exact disks in my backup Ultra 6 Plus.
Chris
- mdgm-ntgr, Feb 23, 2016, NETGEAR Employee Retired
You could try putting 4.2.28 back on the box and seeing if you still have the same problem.
- mdgm-ntgr, Feb 26, 2016, NETGEAR Employee Retired
Any update?
- ThirtyReset, Feb 26, 2016, Tutor
Sorry, I've been a bit distracted lately. Rather than nuking my box all the way back to 4.2.28, I did a factory reset back to a fresh 6.4.2 install. I put minimal apps/services back on this fresh re-install: just htop, Transmission, and the latest Logitech Media Server. I was good for about 2 days, and then the same behavior resumed: loss of connectivity, command-line sessions hanging and never recovering, and more kernel GPFs in the logs.
At this point, I'm giving strong thought to following the procedure floating around the forums to reset back to 6.2.4, where everything was rock solid. I'm not sure what specifically is wrong in 6.4.x, but I'm seeing other posts where people are experiencing the same issues on the Pro 6, Ultra 6 Plus, etc., so I don't think it's something specific to my hardware. I have a spare Ultra 6 Plus here that is also set up with 6.4.2; I may leave it up and running and see how it behaves, but I think at this point there's still something in 6.4.x that's not happy on the Pro Pioneer/Pro 6/Ultra 6 Plus hardware...
- ThirtyReset, Feb 26, 2016, Tutor
Note that I now have everything on this NAS backed up to external drives, so I can easily rebuild if needed. I already had backups of everything, of course, but my media files in particular were spread across a few spots; I've now centralized them in one volume so I can more easily rsync the data back onto a reset image.
So if anyone has other suggestions on what to try, let me know. I'm willing to put a little bit of time into being a guinea pig for this if it helps others.
- ThirtyReset, Mar 18, 2016, Tutor
Just a quick update, I followed the procedure on these forums to roll my ReadyNAS Pro Pioneer back to 6.2.4 a few weeks ago. I'm running stable with no issues, no fan problems, and no crashes for at least 2 weeks now whereas on 6.4.2 I was crashing every 2-3 days.
I'll keep an eye on 6.4.x progress but I would have to say I don't think 6.4.x is a good fit for the legacy hardware yet...