Forum Discussion

BtrieveBill
Aspirant
Jul 27, 2016
Solved

NAS Slow, Reboot Slow, Drive Light Blinking

My ReadyNAS 516 has been unreasonably slow lately.  The system is not sharing files properly, both reads and writes are slow, and even the web interface is sluggish.  The drive array reports a valid state, although a few drives are starting to report errors.

 

Today, when I rebooted the unit, the reboot process was really slow (it's been about 15 minutes and we are still only at 41%), and while I was watching the box reboot, I noticed that the drive light for one of the disks (#2 from the top) kept blinking constantly and steadily -- whereas the other drives would blink only very sporadically and usually only once at a time.

 

The web console does not indicate any major issues with the RAID volume at all, and it only reports minor issues on the disks.  Here is the piece from the VOLUME.LOG file regarding this drive, which I extracted before the reboot:

        sdb:
            HostID: 0ed5fea8
            Flags: 0x0
            Size: 7814037168 (3726 GB)
            Free: 4054
            Controller 0
            Channel: 1
            Model: WDC WD4001FAEX-00MJRA0
            Serial: WD-WCC131021590
            Firmware: 01.01L01
            SMART data
                Reallocated Sectors:            3
                Reallocation Events:            3
                Spin Retry Count:               0
                Current Pending Sector Count:   11
                Uncorrectable Sector Count:     14
                Temperature:                    50
                Start/Stop Count:               9
                Power-On Hours:                 25400
                Power Cycle Count:              9
                Load Cycle Count:               6
                Latest Self Test:               Passed

 

However, drive 2 has far fewer issues than drive 6:

        sdf:
            HostID: 0ed5fea8
            Flags: 0x0
            Size: 7814037168 (3726 GB)
            Free: 4054
            Controller 0
            Channel: 5
            Model: WDC WD4001FAEX-00MJRA0
            Serial: WD-WMC1F0561108
            Firmware: 01.01L01
            SMART data
                Reallocated Sectors:            20
                Reallocation Events:            3
                Spin Retry Count:               0
                Current Pending Sector Count:   238
                Uncorrectable Sector Count:     234
                Temperature:                    46
                Start/Stop Count:               3
                Power-On Hours:                 5025
                Power Cycle Count:              3
                Load Cycle Count:               0
                Latest Self Test:               Passed

 

And drive 6 seems to have no similar issues with the blinking.

 

I do have one spare drive already in-house, so I can replace one of the drives.  Do I replace #2?  Obviously, I need to wait for it to finish booting, though, right?  Or is it better to just power it down again, replace the drive, and let the RAID volume rebuild itself?

 


24 Replies

  • Update: It suddenly jumped from 41% to 94% in the boot process -- and has been sitting at 94% now for another 10 minutes already.  Activity light for drive #2 is still blinking steadily and continuously.  Others are steady.

  • StephenB
    Guru - Experienced User

    BtrieveBill wrote:

     

    I do have one spare drive already in-house, so I can replace one of the drives.  Do I replace #2?  Obviously, I need to wait for it to finish booting, though, right?  Or is it better to just power it down again, replace the drive, and let the RAID volume rebuild itself?

     


    I'd wait for the resync to finish, and then re-check the SMART stats.

     

    But based on the stats you've posted so far, I'd replace drive 6 next.  Current Pending Sectors happen on failed reads, Reallocated Sectors happen on failed writes.  Both are bad, and I generally sum them when I'm assessing disk condition.  So you have 14 bad sectors on drive 2, but 258 bad sectors on drive 6.

     

    • BtrieveBill
      Aspirant

      I would have also opted to replace Drive 6, if it were an option.  However, Drive 2 was the one blinking incessantly, and even though it had fewer errors, it was apparently the squeakiest wheel today.  Further, the reboot NEVER finished.  It hung at 94% for over 90 minutes.

       

      I finally gave up on the reboot and powered down the ReadyNAS entirely a second time, replaced Drive 2, and rebooted.  As advertised, it booted up in about 5 minutes, detected the degraded array, and immediately started the rebuild process.  The system is now working substantially better; even with the RAID rebuild running, it is turning out better performance than I was getting all this week.  I can now send that drive back to WD, get the replacement, and then swap out drive 6 later on.  (Strangely, drive 6 was the only drive that had been replaced once before, and the new drive 6 started spewing errors after about a week.  This makes me wonder whether there is a problem with the SATA controller or cabling, and whether drive 6 is really OK.)

       

      Lessons learned:

      1) Don't assume that the system is working properly just because the Web console shows all drives are green.

      2) Don't assume that the drive with the most errors is the one with the biggest problem. 

      3) Ignore the data in the logs and just replace the drive that is blinking out of sync with everyone else.

      4) Always have at least one spare drive on standby.

       

      • Hopchen
        Prodigy

        1) Don't assume that the system is working properly just because the Web console shows all drives are green.

         

        I believe the green dot is more of an indication of whether the disks are online or not.  Hover your mouse over a disk to see more detailed info.

         

        2) Don't assume that the drive with the most errors is the one with the biggest problem.

         

        Definitely never assume this.  A disk with only very few errors can cause big issues.

        3) Ignore the data in the logs and just replace the drive that is blinking out of sync with everyone else.

         

        Don't ignore the logs! :)  Those are really important.  Rather, always trust the logs.  Pull logs regularly and inspect them.  I suggest you also set up email alerts to warn you about things such as disk failures.

         

        4) Always have at least one spare drive on standby.

         

        Yup, very good idea. And always have an up-to-date backup.

    • StephenB
      Guru - Experienced User

      omicron_persei8 wrote:


      There should be a log entry and at least an email.


      If email alerts were configured, then there would have been an email, at least for the reallocated sectors.  The NAS might not send an email (or log an entry) for increasing pending sectors; if it doesn't, it should.

       

      FWIW, everyone should make sure that their email alerts work, and should take prompt action on disk errors.

  • sdb errors are below thresholds.  It would have been good if those thresholds were lower, but I don't think any email was generated for sdb.
    • StephenB
      Guru - Experienced User

      omicron_persei8 wrote:
      sdb errors are below thresholds.

      AFAIK there are no thresholds.

       

      If I'm wrong on that, it would be useful to know what those thresholds are (and ideally how to set them to 0).

    • StephenB
      Guru - Experienced User

      omicron_persei8 wrote:
      If your HDD gets 1000 reallocated sectors at one time, no mail...

      I've always received them (though I haven't had any reallocated sectors on my OS 6 system).

       

      But with OS 4.2

      ReadyNas PRO wrote:
      Reallocated sector count has increased in the last day.

      Disk  2:
        Previous count: 0
        Current count: 10

      Growing SMART errors indicate a disk that may fail soon.  If the errors continue to increase, you should be prepared to replace the disk.

       

      • Retired_Member

        My understanding is that disk alerts and their thresholds are handled differently in RAIDiator 4.2 and ReadyNAS OS6.

        I have seen these email alerts for "small" amounts of errors on RAIDiator 4.2, but never on ReadyNAS OS6, and the KB article posted by omicron_persei8 would confirm that.

  • How's NETGEAR responsible for your multiple HDD failures?  Do you think they invented RAID?
    I do agree, though, that just showing all HDDs red in this situation can be quite confusing.  In a sense it's correct, since they all have inactive volumes, but I'm sure there would be a more user-friendly way to show a dead RAID.
    • BtrieveBill
      Aspirant

      I never implied that NetGear was responsible for the drives.  My point is simply this:  A RAID6 array, by its very definition, is supposed to allow up to two drives to completely fail within the array and yet still retain full functionality (albeit not full performance).  This is why companies purchase RAID solutions in the first place -- hard drives WILL fail; it is only a matter of when.  The RAID array is supposed to keep that failure from wiping out the data.  In this case, we had ONE drive fail, and it was replaced, and everything was starting to work normally once again.

       

      My complaint about the ReadyNAS is the RAID implementation itself.  What would cause the entire data volume to just "go away" in the middle of the night when usage is non-existent?  Further, how can it present only a very cryptic error message, where your only option is to tear down and rebuild the array and volume, losing all data in the process?  In effect, all redundancy was lost.  This is no longer RAID, but should be classified as JBOD. 

      • StephenB
        Guru - Experienced User

        BtrieveBill wrote:

        I never implied that NetGear was responsible for the drives.


        I agree, I didn't see that in your post.  Just understandable frustration, and wanting some explanation on the inactive volume status. 


        BtrieveBill wrote:

        What would cause the entire data volume to just "go away" in the middle of the night when usage is non-existent?


        I don't know the cause of the "inactive volume" issue, but we do see it here fairly regularly.  It's possibly a bug, but there could be some other problem the NAS found (or thought it found) with the array - perhaps something with the event counters???

         

        It is possible that the array is intact and might be mountable by Netgear support.  

         

        BTW, if you are the original purchaser and bought the NAS between 1 June 2014 and 31 May 2016, you have free chat support.
