Forum Discussion

BtrieveBill
Aspirant
Jul 27, 2016
Solved

NAS Slow, Reboot Slow, Drive Light Blinking

My ReadyNAS 516 has been unreasonably slow lately.  The system is not sharing files properly, both reads and writes are slow, and even the web interface is sluggish.  The drive array reports a valid state, although a few drives are starting to report errors.

 

Today, when I rebooted the unit, the reboot process was really slow (it's been about 15 minutes and we are still only at 41%), and while I was watching the box reboot, I noticed that the drive light for one of the disks (#2 from the top) kept blinking constantly and steadily -- whereas the other drives would blink only very sporadically and usually only once at a time.

 

The web console does not indicate any major issues with the RAID volume at all, and it only reports minor issues on the disks.  Here is the piece from the VOLUME.LOG file regarding this drive, which I extracted before the reboot:

        sdb:
            HostID: 0ed5fea8
            Flags: 0x0
            Size: 7814037168 (3726 GB)
            Free: 4054
            Controller 0
            Channel: 1
            Model: WDC WD4001FAEX-00MJRA0
            Serial: WD-WCC131021590
            Firmware: 01.01L01
            SMART data
                Reallocated Sectors:            3
                Reallocation Events:            3
                Spin Retry Count:               0
                Current Pending Sector Count:   11
                Uncorrectable Sector Count:     14
                Temperature:                    50
                Start/Stop Count:               9
                Power-On Hours:                 25400
                Power Cycle Count:              9
                Load Cycle Count:               6
                Latest Self Test:               Passed

 

However, drive 2 has far fewer issues than drive 6:

        sdf:
            HostID: 0ed5fea8
            Flags: 0x0
            Size: 7814037168 (3726 GB)
            Free: 4054
            Controller 0
            Channel: 5
            Model: WDC WD4001FAEX-00MJRA0
            Serial: WD-WMC1F0561108
            Firmware: 01.01L01
            SMART data
                Reallocated Sectors:            20
                Reallocation Events:            3
                Spin Retry Count:               0
                Current Pending Sector Count:   238
                Uncorrectable Sector Count:     234
                Temperature:                    46
                Start/Stop Count:               3
                Power-On Hours:                 5025
                Power Cycle Count:              3
                Load Cycle Count:               0
                Latest Self Test:               Passed

 

And drive 6 seems to have no similar issues with the blinking.

 

I do have one spare drive already in-house, so I can replace one of the drives.  Do I replace #2?  Obviously, I need to wait for it to finish booting, though, right?  Or is it better to just power it down again, replace the drive, and let the RAID volume rebuild itself?

 


24 Replies

  • Update: It suddenly jumped from 41% to 94% in the boot process -- and has been sitting at 94% now for another 10 minutes already.  Activity light for drive #2 is still blinking steadily and continuously.  Others are steady.

  • StephenB
    Guru - Experienced User

    BtrieveBill wrote:

     

    I do have one spare drive already in-house, so I can replace one of the drives.  Do I replace #2?  Obviously, I need to wait for it to finish booting, though, right?  Or is it better to just power it down again, replace the drive, and let the RAID volume rebuild itself?

     


    I'd wait for the resync to finish, and then re-check the SMART stats.

     

    But based on the stats you've posted so far, I'd replace drive 6 next.  Current Pending Sectors happen on failed reads, Reallocated Sectors happen on failed writes.  Both are bad, and I generally sum them when I'm assessing disk condition.  So you have 14 bad sectors on drive 2, but 258 bad sectors on drive 6.

     

    • BtrieveBill
      Aspirant

      I would have also opted to replace Drive 6, if it were an option.  However, Drive 2 was the one blinking incessantly, and even though it had fewer errors, it was apparently the squeakiest wheel today.  Further, the reboot NEVER finished.  It hung at 94% for over 90 minutes.

       

      I finally gave up on the reboot and powered down the ReadyNAS entirely a second time, replaced Drive 2, and rebooted.  As advertised, it booted up in about 5 minutes, detected the degraded array, and immediately started the rebuild process.  The system is now working substantially better; even with the RAID rebuild running, it is turning out better performance than I was getting all this week.  I can now send that drive back to WD, get the replacement, and then swap out drive 6 later on.  (Strangely, drive 6 was the only drive that had been replaced once before, and the new drive 6 started spewing errors after about a week.  This makes me wonder whether there is a problem with the SATA controller or cabling, and whether drive 6 is really OK.)

       

      Lessons learned:

      1) Don't assume that the system is working properly just because the Web console shows all drives are green.

      2) Don't assume that the drive with the most errors is the one with the biggest problem. 

      3) Ignore the data in the logs and just replace the drive that is blinking out of sync with everyone else.

      4) Always have at least one spare drive on standby.

       

      • Hopchen
        Prodigy

        1) Don't assume that the system is working properly just because the Web console shows all drives are green.

         

        I believe the green dot is more of an indication of whether the disks are online or not.  Hover your mouse over a disk to see more detailed info.

         

        2) Don't assume that the drive with the most errors is the one with the biggest problem.

         

        Definitely never assume this.  A disk with only very few errors can cause big issues.

        3) Ignore the data in the logs and just replace the drive that is blinking out of sync with everyone else.

         

        Don't ignore the logs! :)  Those are really important.  Rather, always trust the logs.  Pull logs regularly and inspect them.  I suggest you also set up email alerts to warn you about things such as disk failures.

         

        4) Always have at least one spare drive on standby.

         

        Yup, very good idea. And always have an up-to-date backup.

    • StephenB
      Guru - Experienced User

      omicron_persei8 wrote:


      There should be a log entry and at least an email.


      If email alerts were configured, then there would have been an email, at least for the reallocated sectors.  The NAS might not send an email (or log an entry) for increasing pending sectors; if it doesn't, it should.

       

      FWIW, everyone should make sure that their email alerts work, and should take prompt action on disk errors.

  • sdb errors are below thresholds.  It would have been good if those thresholds were lower, but I don't think any email was generated for sdb.
    • StephenB
      Guru - Experienced User

      omicron_persei8 wrote:
      sdb errors are below thresholds.

      AFAIK there are no thresholds.

       

      If I'm wrong on that, it would be useful to know what those thresholds are (and ideally how to set them to 0).

    • StephenB
      Guru - Experienced User

      omicron_persei8 wrote:
      If your HDD gets 1000 reallocated sectors at one time, no mail...

      I've always received them (though I haven't had any reallocated sectors on my OS 6 system).

       

      But with OS 4.2

      ReadyNas PRO wrote:
      Reallocated sector count has increased in the last day.

      Disk  2:
        Previous count: 0
        Current count: 10

      Growing SMART errors indicate a disk that may fail soon.  If the errors continue to increase, you should be prepared to replace the disk.

       

      • Retired_Member

        My understanding is that disk alerts and their thresholds are handled differently in RAIDiator 4.2 and ReadyNAS OS6.

        I have seen these email alerts for "small" amounts of errors on RAIDiator 4.2, but never on ReadyNAS OS6, and the KB article posted by omicron_persei8 would confirm that.

  • How's NETGEAR responsible for your multiple HDD failures?  Do you think they invented RAID?
    I do agree, though, that just showing all HDDs red in this situation can be quite confusing.  In a sense it's correct, since they all have inactive volumes, but I'm sure there would be a more user-friendly way to show a dead RAID.
    • BtrieveBill
      Aspirant

      I never implied that NetGear was responsible for the drives.  My point is simply this:  A RAID6 array, by its very definition, is supposed to allow up to two drives to completely fail within the array and yet still retain full functionality (albeit not full performance).  This is why companies purchase RAID solutions in the first place -- hard drives WILL fail; it is only a matter of when.  The RAID array is supposed to keep that failure from wiping out the data.  In this case, we had ONE drive fail, and it was replaced, and everything was starting to work normally once again.

       

      My complaint about the ReadyNAS is the RAID implementation itself.  What would cause the entire data volume to just "go away" in the middle of the night when usage is non-existent?  Further, how can it present only a very cryptic error message, where your only option is to tear down and rebuild the array and volume, losing all data in the process?  In effect, all redundancy was lost.  This is no longer RAID, but should be classified as JBOD. 

      • StephenB
        Guru - Experienced User

        BtrieveBill wrote:

        I never implied that NetGear was responsible for the drives.


        I agree, I didn't see that in your post.  Just understandable frustration, and wanting some explanation on the inactive volume status. 


        BtrieveBill wrote:

        What would cause the entire data volume to just "go away" in the middle of the night when usage is non-existent?


        I don't know the cause of the "inactive volume" issue, but we do see it here fairly regularly.  It's possibly a bug, but there could be some other problem the NAS found (or thought it found) with the array - perhaps something with the event counters???

         

        It is possible that the array is intact and might be mountable by Netgear support.  

         

        BTW, if you are the original purchaser and bought the NAS between 1 June 2014 and 31 May 2016, you have free chat support.
