ReadyNAS NV+ lockups

Question

I have an NV+ with the latest firmware (4.1.8) and 4 2TB WD Green SATA II hard drives (WD20EARS) in an X-RAID configuration. I primarily use the NAS to store movies, music, and TV shows that everybody in the family accesses with XBMC on their local machines.

Everything had been working fine for about 10 months, but over the past week it's been going a little nuts. TV shows would freeze right in the middle of an episode and we'd have to end the task to force XBMC to close. I would frequently have to reboot the NAS, after which it would work for anywhere from 5 minutes to 5 hours before locking up again.

I've resynced the volume (a couple of times now), rebooted with volume scans, etc., but it has never reported any errors except once, last Wednesday, when I got 4 of these messages all at once:

Access to the disk on channel (??) is producing I/O errors. Although the array is still redundant, please replace this drive as soon as possible, as it is likely to fail soon.

I haven't received another error since and all the drive lights are green. So I downloaded all the logs and noticed a lot of these messages:


May 19 12:10:37 NAS kernel: ::hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
May 19 12:10:37 NAS kernel: hdc: drive_cmd: error=0x04 { DriveStatusError }
May 19 12:10:37 NAS kernel: ide: failed opcode was: 0xef
May 19 12:10:37 NAS kernel: hde: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
May 19 12:10:37 NAS kernel: hde: drive_cmd: error=0x04 { DriveStatusError }
May 19 12:10:37 NAS kernel: ide: failed opcode was: 0xef
May 19 12:10:37 NAS kernel: hdg: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
May 19 12:10:37 NAS kernel: hdg: drive_cmd: error=0x04 { DriveStatusError }
May 19 12:10:37 NAS kernel: ide: failed opcode was: 0xef

There are lots of those. And lots of these too:


 Need to re-set all of the sata channel  f
May 18 17:53:16 NAS kernel: SDW_STATUS on drive 0 = 0x50
May 18 17:53:16 NAS kernel: SDW_STATUS on drive 1 = 0x50
May 18 17:53:16 NAS kernel: SDW_STATUS on drive 2 = 0x51
May 18 17:53:16 NAS kernel: SDW_STATUS on drive 3 = 0x50
May 18 17:53:16 NAS kernel: Chn=2,this channel failed 90 (0/0/90/0) times,need to take channel 7 offline.
May 18 17:53:16 NAS kernel: Reset sata channel 0
May 18 17:53:16 NAS kernel: Need to disable irq on 0
May 18 17:53:16 NAS kernel: ==== SATA init channel 0
May 18 17:53:16 NAS kernel: After INIT SATA channel 0, retry=366, sata=113, status=50
May 18 17:53:16 NAS kernel: Need to enable irq on 0
May 18 17:53:16 NAS kernel: Reset sata channel 1
May 18 17:53:16 NAS kernel: Need to disable irq on 1
May 18 17:53:16 NAS kernel: ==== SATA init channel 1
May 18 17:53:16 NAS kernel: After INIT SATA channel 1, retry=0, sata=113, status=50
May 18 17:53:16 NAS kernel: Need to enable irq on 1
May 18 17:53:16 NAS kernel: Reset sata channel 2
May 18 17:53:16 NAS kernel: ==== SATA init channel 2
May 18 17:53:16 NAS kernel: After INIT SATA channel 2, retry=449, sata=113, status=50
May 18 17:53:16 NAS kernel: Reset sata channel 3
May 18 17:53:16 NAS kernel: Need to disable irq on 3
May 18 17:53:16 NAS kernel: ==== SATA init channel 3
May 18 17:53:16 NAS kernel: After INIT SATA channel 3, retry=364, sata=113, status=50
May 18 17:53:16 NAS kernel: Need to enable irq on 3
May 18 17:53:16 NAS kernel: failed channel image=0
May 18 17:53:16 NAS kernel: failed_channel image=0
May 18 17:53:16 NAS kernel: Release waiting_for_dma flag 2
May 18 17:53:16 NAS kernel: Release the req pointer 80ff4510
May 18 17:53:16 NAS kernel: Timer out1  needIO=1, X_mode=0 hwgroup=81e95f40, irq=34
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 1 00000000 0****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 2 00000000 0****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 3 00000000 1****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 4 00000000 0****
May 18 17:53:16 NAS kernel: Enable IRQ 34, on 8041e8b0
May 18 17:53:16 NAS kernel: Another IRQ 34, on 81e95f40
May 18 17:53:16 NAS kernel: enable_irq(34) unbalanced from f811c118
May 18 17:53:16 NAS kernel: Warning:chn 2, there is a pending retry on req 80ff4510, current req 80ff4510
May 18 17:53:16 NAS kernel: hwif->irq = 34, retry 1
May 18 17:53:16 NAS kernel: sata_hotplug: /sbin/hotplug retry hdgUser mode helper start.
May 18 17:53:16 NAS kernel: done do_sata_hotplug
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 0 00000000****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 1 00000000****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 2 8041e948****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 3 00000000****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 4 00000000****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 5 00000000****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 6 00000000****
May 18 17:53:16 NAS kernel: ****Waiting_intr for chn 7 00000000****
May 18 17:53:16 NAS kernel: ----- lp_timer_expiry:0, chn=2,status=0x51 -----

I don't know what any of it means... is one of my drives failing? How do I know which one?
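
One way to narrow this down is to count which device or drive index reports a status byte with the error bit set. The following is a minimal sketch, not a ReadyNAS tool: it assumes the log lines look like the excerpts above, and the function name and regular expressions are made up for illustration. How `hdc`/`hde`/`hdg` or "drive 2" map to physical bays is not established by these logs and would have to be confirmed separately.

```python
import re
from collections import Counter

# Matches e.g. "kernel: hdc: drive_cmd: status=0x51 { ... }"
# (also tolerates the stray "::" seen before "hdc" in one excerpt).
IDE_STATUS = re.compile(r"kernel: :*(hd[a-z]): \w+: status=(0x[0-9a-fA-F]{2})")
# Matches e.g. "kernel: SDW_STATUS on drive 2 = 0x51"
SDW_STATUS = re.compile(r"SDW_STATUS on drive (\d+) = (0x[0-9a-fA-F]{2})")

ERR_BIT = 0x01  # lowest bit of the ATA status byte signals an error


def count_error_statuses(lines):
    """Count status lines per device whose status byte has the error bit set."""
    counts = Counter()
    for line in lines:
        for pattern, label in ((IDE_STATUS, "{}"), (SDW_STATUS, "drive {}")):
            match = pattern.search(line)
            if match and int(match.group(2), 16) & ERR_BIT:
                counts[label.format(match.group(1))] += 1
    return counts


# Lines copied from the excerpts in this post.
sample = [
    "May 19 12:10:37 NAS kernel: ::hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }",
    "May 19 12:10:37 NAS kernel: hde: drive_cmd: status=0x51 { DriveReady SeekComplete Error }",
    "May 19 12:10:37 NAS kernel: hdg: drive_cmd: status=0x51 { DriveReady SeekComplete Error }",
    "May 18 17:53:16 NAS kernel: SDW_STATUS on drive 1 = 0x50",
    "May 18 17:53:16 NAS kernel: SDW_STATUS on drive 2 = 0x51",
]

counts = count_error_statuses(sample)
for device, n in sorted(counts.items()):
    print(device, n)
```

Run against the full downloaded log rather than this handful of lines, a count that is much higher for one device than for the others would be a reasonable hint about where to look first, though not proof of a failing disk.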

arjoseph1 · Answer

Check the disk SMART logs and find which disk has "Re-allocated sectors", "Current Pending Sector", and ATA errors. Or you may contact support for a much better diagnosis. Once you get a case #, please edit the subject of your post and add the case # at the end. Thanks.
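
If the SMART data is available as text in the attribute-table layout that `smartctl -A` prints, the two counters mentioned above can be picked out with a few lines of Python. This is only a sketch under that assumption: whether the NV+ exposes its SMART log in this layout is not confirmed here, the function name is hypothetical, and the sample report below is invented purely to show the format, not taken from any real drive.

```python
# Attribute names as smartctl prints them for the two counters named above.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector"}


def nonzero_watched_attributes(report_text):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    found = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit() and int(raw) > 0:
                found[fields[1]] = int(raw)
    return found


# Invented example input, for illustration of the layout only.
sample_report = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
"""

flagged = nonzero_watched_attributes(sample_report)
print(flagged)
```

A disk whose report yields a non-empty result is the one to look at first; an empty result does not rule out a problem, since ATA errors are logged separately.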

turick · Answer

Well, based on this, I took a stab at thinking channel 2 (drive 3) was the culprit:

May 18 17:53:16 NAS kernel: SDW_STATUS on drive 0 = 0x50
May 18 17:53:16 NAS kernel: SDW_STATUS on drive 1 = 0x50
May 18 17:53:16 NAS kernel: SDW_STATUS on drive 2 = 0x51
May 18 17:53:16 NAS kernel: SDW_STATUS on drive 3 = 0x50

and

May 19 12:10:37 NAS kernel: ::hdc: drive_cmd: status=0x51 { DriveReady SeekComplete Error }

So I pulled the drive and ran Western Digital's LifeGuard Diagnostics tool and did an extended test. Within a few minutes it had already reported bad sectors and asked if I wanted to try to repair them. I said yes, but it failed.

I preemptively ordered a new WD20EARS drive a couple of days ago, so when it comes in I'll fire it up and see if my problems are solved.
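
For anyone else reading these logs: the difference between 0x50 and 0x51 is just the lowest bit of the ATA status byte, which is the error flag. The sketch below decodes a status byte into the same words the kernel prints in braces. The bit layout is the standard ATA status register; the function name is made up for illustration, and this says nothing about why the error bit was set.

```python
# ATA status register bits, highest to lowest, labeled the way the
# kernel messages in this thread spell them.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(value):
    """Return the names of the bits that are set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if value & mask]


print(hex(0x50), decode_status(0x50))
print(hex(0x51), decode_status(0x51))
```

So drives 0, 1, and 3 reported "ready, seek complete", while drive 2 reported the same plus the error flag, which is what made it stand out.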

turick · Answer

So the new WD20EARS came and I popped it in. I noticed that there are numbers after the model number which apparently indicate firmware versions, etc., and the new drive's didn't match the old drives'. I still had nothing but problems and lockups. I ended up pulling the drive out and the NAS worked fine. I RMAed the original drive and got an identical one back. It locked up a couple of times while resyncing the volume, but since the last reboot it's been stable for several days.

I wish the firmware did a better job of reporting the errors. It's kind of crappy that the whole unit takes a crap and locks up and that's how you know a drive is failing. The underlying OS should remain stable and report what's going on.

turick · Answer

So I'm getting real discouraged with my NV+. It has worked OK for a couple of months, but recently it just started not responding for 10-20 seconds at a time -- we'd be watching a video while my daughter was streaming music... our video will stop and her music will start skipping, then it just starts working again.

Then out of nowhere it became completely unresponsive. I reboot it and it hangs on quota checks. If I leave my browser up and running it might connect to the page after about 15-20 minutes of loading. At one point I was able to navigate to the point where I could download the logs, and after another 15 minutes of waiting, it downloaded a zip file with 2 txt files inside, neither of which was actually a system log that would show me any I/O errors or anything like that.

I'm very aggravated with my NAS and it's been constant work to keep it alive. I just double-checked the compatibility list, and although I was sure my drives were on there when I bought the NAS a year ago, they're not on there now. I have WD Caviar Green WD20EARS drives, and the closest drive on the list is WD Caviar Green WD20EARX.

Could this be causing all my issues? Does anybody else have constant issues like this with drives on the compatibility list?

StephenB · Answer

I suggest opening a support ticket. I'd also check the SMART+ stats on the disks, and look at the Ethernet stats for transmission/reception errors.
