
Forum Discussion

Oversteer71
May 01, 2017

Remove inactive volumes to use the disk. Disk #1,2,3,4.

Firmware 6.6.1

 

I had 4 x 1TB drives in my system and planned to upgrade one disk a month for four months to achieve a 4 x 4TB system.  The initial swap of the first drive seemed to go well, but when I came back after about 7 hours I had this warning message on the Admin page under Volumes.  I see several threads about this problem being fixed, but none detail the actual method that was used.  I would prefer NOT to lose the data on the NAS, so I'm curious what the process is here.  I have the log files downloaded as well.

 

[Screenshot: Readynas error.jpg]


As a side note, I attempted to buy premium support, but on the last page, after entering credit card info, there is no option to hit "next" or "enter".  It just hits a dead end.

 

Thank you

25 Replies

Replies have been turned off for this discussion
  • jak0lantash
    Mentor

    Before starting:

    I can't see your screenshot as it wasn't approved by a moderator yet.

    Maybe you would like to upvote this "idea": https://community.netgear.com/t5/Idea-Exchange-for-ReadyNAS/Change-the-incredibly-confusing-error-message-quot-remove/idi-p/1271658

     

    Do you know if any drive is showing errors, like reallocated sectors, pending sectors, or ATA errors?  From the GUI, look under System / Performance and hover the cursor over the disk beside the disk number (or look in disk_info.log from the log bundle).

    In dmesg.log, do you see any error containing "md127" (start from the end of the file)?
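    If SSH is enabled on the NAS, the same checks can be run from the command line.  This is only a sketch; the /dev/sdX names are assumptions, so match them against the serial numbers in disk_info.log first:

    # SMART attributes for one disk (repeat for each of sda..sdd)
    smartctl -A /dev/sda | grep -Ei 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'

    # ATA error log for the same disk
    smartctl -l error /dev/sda

    # md127 messages, either live or from the downloaded dmesg.log
    dmesg | grep md127 | tail -n 40
    grep md127 dmesg.log | tail -n 40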

    • StephenB
      Guru - Experienced User

      jak0lantash wrote:

      I can't see your screenshot as it wasn't approved by a moderator yet.

       

      I just took care of that.
    • Oversteer71
      Guide

      Thanks for the fast replies.

      On disks 2, 3 and 4 (the original 1TB drives) I show 10, 0 and 4 ATA errors respectively.  Disk 1, the new 4TB, also shows 0.

       

      Here's what I found towards the end of the dmesg.log file:

      [Sun Apr 30 20:06:20 2017] md: md127 stopped.
      [Sun Apr 30 20:06:21 2017] md: bind<sda3>
      [Sun Apr 30 20:06:21 2017] md: bind<sdc3>
      [Sun Apr 30 20:06:21 2017] md: bind<sdd3>
      [Sun Apr 30 20:06:21 2017] md: bind<sdb3>
      [Sun Apr 30 20:06:21 2017] md: kicking non-fresh sda3 from array!
      [Sun Apr 30 20:06:21 2017] md: unbind<sda3>
      [Sun Apr 30 20:06:21 2017] md: export_rdev(sda3)
      [Sun Apr 30 20:06:21 2017] md/raid:md127: device sdb3 operational as raid disk 1
      [Sun Apr 30 20:06:21 2017] md/raid:md127: device sdc3 operational as raid disk 2
      [Sun Apr 30 20:06:21 2017] md/raid:md127: allocated 4280kB
      [Sun Apr 30 20:06:21 2017] md/raid:md127: not enough operational devices (2/4 failed)
      [Sun Apr 30 20:06:21 2017] RAID conf printout:
      [Sun Apr 30 20:06:21 2017] --- level:5 rd:4 wd:2
      [Sun Apr 30 20:06:21 2017] disk 1, o:1, dev:sdb3
      [Sun Apr 30 20:06:21 2017] disk 2, o:1, dev:sdc3
      [Sun Apr 30 20:06:21 2017] md/raid:md127: failed to run raid set.
      [Sun Apr 30 20:06:21 2017] md: pers->run() failed ...
      [Sun Apr 30 20:06:21 2017] md: md127 stopped.
      [Sun Apr 30 20:06:21 2017] md: unbind<sdb3>
      [Sun Apr 30 20:06:21 2017] md: export_rdev(sdb3)
      [Sun Apr 30 20:06:21 2017] md: unbind<sdd3>
      [Sun Apr 30 20:06:21 2017] md: export_rdev(sdd3)
      [Sun Apr 30 20:06:21 2017] md: unbind<sdc3>
      [Sun Apr 30 20:06:21 2017] md: export_rdev(sdc3)
      [Sun Apr 30 20:06:21 2017] systemd[1]: Started udev Kernel Device Manager.
      [Sun Apr 30 20:06:21 2017] systemd[1]: Started MD arrays.
      [Sun Apr 30 20:06:21 2017] systemd[1]: Reached target Local File Systems (Pre).
      [Sun Apr 30 20:06:21 2017] systemd[1]: Found device /dev/md1.
      [Sun Apr 30 20:06:21 2017] systemd[1]: Activating swap md1...
      [Sun Apr 30 20:06:21 2017] Adding 1046524k swap on /dev/md1. Priority:-1 extents:1 across:1046524k
      [Sun Apr 30 20:06:21 2017] systemd[1]: Activated swap md1.
      [Sun Apr 30 20:06:21 2017] systemd[1]: Started Journal Service.
      [Sun Apr 30 20:06:21 2017] systemd-journald[1020]: Received request to flush runtime journal from PID 1
      [Sun Apr 30 20:07:09 2017] md: md1: resync done.
      [Sun Apr 30 20:07:09 2017] RAID conf printout:
      [Sun Apr 30 20:07:09 2017] --- level:6 rd:4 wd:4
      [Sun Apr 30 20:07:09 2017] disk 0, o:1, dev:sda2
      [Sun Apr 30 20:07:09 2017] disk 1, o:1, dev:sdb2
      [Sun Apr 30 20:07:09 2017] disk 2, o:1, dev:sdc2
      [Sun Apr 30 20:07:09 2017] disk 3, o:1, dev:sdd2
      [Sun Apr 30 20:07:51 2017] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
      [Sun Apr 30 20:07:56 2017] mvneta d0070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
      [Sun Apr 30 20:07:56 2017] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

       

       

      • jak0lantash
        Mentor

        Well, that's not a very good start.  sda is not in sync with sdb and sdc, and sdd is not in the RAID array.  In other words, a dual disk failure (one that you removed, one dead).  One disk failed before the RAID array finished rebuilding the new one.

        You can check the channel numbers, the device names and serial numbers in disk_info.log (channel number starts at zero). 
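        If you're comfortable with SSH, you can also confirm the array state directly.  This is just a rough sketch; md127 and the sdX3 partitions are taken from your dmesg.log, so adjust if your layout differs:

        # current state of all md arrays (md127 is the data volume here)
        cat /proc/mdstat

        # per-member metadata: roles and array state should agree on healthy members
        mdadm --examine /dev/sd[abcd]3 | grep -E '^/dev/|Array State|Device Role'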

        This is a tricky situation, but you can try the following:

        1. Gracefully shut down the NAS from the GUI.

        2. Remove the new drive you inserted (it's not in sync anyway).

        3. Re-insert the old drive.

        4. Boot the NAS.

        5. If it boots OK and the volume is accessible, make a full backup and/or replace the disk that is not in sync with a brand new one (a quick post-boot check is sketched below).
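        Once it's back up, a sanity check over SSH would look something like this (the /data mount point is the usual ReadyNAS OS6 location; treat it as an assumption for your unit):

        # the data array should appear again, possibly degraded
        cat /proc/mdstat

        # the data volume should be mounted and readable
        df -h /data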

         

        You have two disks with ATA errors, which is not very good.  Resyncing the RAID array puts strain on all the disks, which can push a damaged or old disk to its limits.

         

        Alternatively, you can contact NETGEAR for a Data Recovery contract.  They can assess the situation and assist you with recovering your data.

         

        Thanks StephenB for approving the screenshot.

  • Since the failure happened while you were replacing the first drive, you now need to boot with the four original drives first.  If the RAID still doesn't start with all the original drives, then you won't be able to rebuild onto a new drive.  Your best chance is to contact NETGEAR Support.
    • Oversteer71
      Guide

      Operationally and statistically this doesn't make any sense.  The drives stay active all the time with backups and streaming media, so I'm not sure why doing a disk upgrade would cause abnormal stress.  But even if that is the case and drive D suddenly died, I replaced drive A with the original, fully functional drive, which should recover the system.  Also, the NAS is on a power-conditioning UPS, so a power failure was not the cause.

       

      Based on the MANY threads on this same topic, I don't think this is the result of a double drive failure.  I think there is a firmware or hardware issue that is making the single most important feature of a RAID 5 NAS unreliable.

       

      Even if I could figure out how to pay Netgear for support on this, I don't have any confidence that this same thing won't happen next time, so I'm not sure it's worth the investment.


      Thank you for your assistance though.  It was greatly appreciated.

      • StephenB
        Guru - Experienced User

        Oversteer71 wrote:

         I'm not sure why doing a disk upgrade would cause abnormal stress.

        I just wanted to comment on this aspect.  Disk replacement (and also volume expansion) requires every sector on every disk in the data volume to either be read or written.  If there are as-yet undetected bad sectors, they certainly can turn up during the resync process.

         

        The disk I/O is also likely higher during a resync than during normal operation (though that depends on the normal operating load for the NAS).
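        One way to reduce that risk before a future disk swap is to run a long SMART self-test on each disk first.  A minimal sketch over SSH (device names are assumptions):

        # start an extended self-test; it runs in the background on the drive itself
        smartctl -t long /dev/sda

        # review the result once the estimated run time has passed
        smartctl -l selftest /dev/sda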

         

        As far as replacing the original drive A goes: if there have been any updates to the volume (including automatic updates like snapshots), then that usually won't help.  There are event counters on each drive, and if they don't match, mdadm won't assemble the volume (unless you force it to).
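        For reference, those event counters can be compared over SSH.  A minimal sketch, assuming the same sdX3 partitions shown in the dmesg.log above:

        # the Events value should match across members of a healthy array
        mdadm --examine /dev/sd[abcd]3 | grep -E '^/dev/|Events'

        # forcing assembly with mismatched counters is possible but risky, and is
        # best left to support (or done only after imaging the disks first):
        # mdadm --assemble --force /dev/md127 /dev/sdb3 /dev/sdc3 /dev/sdd3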

         

         

         

        That said, I have seen too many unexplained "inactive volumes" threads here, so I also am not convinced that double-drive failure is the only cause of this problem.