Disk in channel 1 (Internal) changed state from ONLINE to FAILED (RN202, 2x WD Red 3TB, OS 6.10.2)

Greetings,

For some time now I have been getting the error mentioned in the subject, and I am trying to find out whether the failing disk needs replacing or if there's anything I can do to fix this. Data is still there, as I'm in RAID-1 configuration, and the second disk is fine.

I've skimmed through the logs and I do see I/O errors being reported, but I'm looking for a second, more experienced opinion. I'm attaching a couple of examples; can anyone make anything of them? Please let me know if you need more logs! Thanks in advance :-)

dmesg.log

[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533160
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533160
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566645, async page read
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533128
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533128
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566641, async page read
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860532992
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566624, async page read

system.log

$ grep mdadm system.log
Jan 20 23:33:27 ReadyNAS mdadm[1851]: NewArray event detected on md device /dev/md0
Jan 20 23:33:27 ReadyNAS mdadm[1851]: DegradedArray event detected on md device /dev/md0
Jan 20 23:33:27 ReadyNAS mdadm[1851]: NewArray event detected on md device /dev/md1
Jan 20 23:33:27 ReadyNAS mdadm[1851]: DegradedArray event detected on md device /dev/md1
Jan 20 23:33:27 ReadyNAS mdadm[1851]: NewArray event detected on md device /dev/md127
Jan 20 23:33:36 ReadyNAS mdadm[1851]: DegradedArray event detected on md device /dev/md127
Jan 20 23:34:13 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md0, component device recovery
Jan 20 23:34:18 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md1, component device recovery
Jan 20 23:34:24 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md127, component device recovery
Jan 20 23:34:33 ReadyNAS mdadm[1851]: RebuildFinished event detected on md device /dev/md1, component device recovery
Jan 20 23:34:33 ReadyNAS mdadm[1851]: SpareActive event detected on md device /dev/md1, component device /dev/sda2
Jan 20 23:36:29 ReadyNAS mdadm[1851]: RebuildFinished event detected on md device /dev/md0, component device recovery
Jan 20 23:36:29 ReadyNAS mdadm[1851]: SpareActive event detected on md device /dev/md0, component device /dev/sda1
Jan 20 23:41:15 ReadyNAS mdadm[1851]: Fail event detected on md device /dev/md0, component device /dev/sda1
Jan 20 23:41:16 ReadyNAS mdadm[1851]: FailSpare event detected on md device /dev/md127, component device /dev/sda3
Jan 20 23:41:16 ReadyNAS mdadm[1851]: RebuildFinished event detected on md device /dev/md127, component device recovery
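
For anyone trying to follow the sequence of mdadm events above, here is a minimal Python sketch that groups them per array. It assumes the syslog line format shown in this excerpt; the `timeline` helper and `EVENT_RE` pattern are illustrative names, not part of any ReadyNAS tooling, and only a few of the lines are reproduced as sample input.

```python
import re
from collections import defaultdict

# A few lines copied from the system.log excerpt above.
LOG = """\
Jan 20 23:34:13 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md0, component device recovery
Jan 20 23:34:24 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md127, component device recovery
Jan 20 23:36:29 ReadyNAS mdadm[1851]: SpareActive event detected on md device /dev/md0, component device /dev/sda1
Jan 20 23:41:15 ReadyNAS mdadm[1851]: Fail event detected on md device /dev/md0, component device /dev/sda1
Jan 20 23:41:16 ReadyNAS mdadm[1851]: FailSpare event detected on md device /dev/md127, component device /dev/sda3
"""

# Assumed line layout: "<Mon DD HH:MM:SS> <host> mdadm[pid]: <Event> event
# detected on md device <md>[, component device <component>]"
EVENT_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s\d{2}:\d{2}:\d{2}) \S+ mdadm\[\d+\]: "
    r"(?P<event>\w+) event detected on md device (?P<md>[^\s,]+)"
    r"(?:, component device (?P<component>\S+))?$"
)


def timeline(log_text):
    """Return {md device: [(time, event, component or None), ...]} in log order."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events[match["md"]].append(
                (match["time"], match["event"], match["component"])
            )
    return dict(events)


for md, entries in timeline(LOG).items():
    print(md)
    for time, event, component in entries:
        print(f"  {time}  {event:<15} {component or ''}")
```

Read this way, the excerpt shows /dev/md0 starting a rebuild at 23:34:13, activating sda1 at 23:36:29, and reporting a Fail on sda1 again at 23:41:15.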

StephenB
Jan 27, 2020
liouk wrote:

Here's my disk_info.log as well:

Device: sda
Controller: 0
Channel: 0
Model: WDC_WD30EFRX-68EUZN0
Serial: WD-WCC4N1XHR82L
Firmware: 82.00A82
Class: SATA
Sectors: 5860533168
Pool: data
PoolType: RAID 1
PoolState: 3
PoolHostId: 1165483a
Health data
  ATA Error Count: 0

This is all you see for disk 1? My guess is yes, as that is consistent with what you posted before.
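
As a cross-check, the sector numbers in the dmesg lines at the top of the thread can be compared against the "Sectors" value in disk_info.log. The sketch below is only arithmetic on numbers already posted; it assumes dmesg reports 512-byte sectors and that the "logical block" in the Buffer I/O lines is a 4096-byte block, which is consistent with the factor of 8 between the two values in the log. The `describe` helper is a hypothetical name.

```python
SECTOR_BYTES = 512   # assumption: dmesg sector numbers are 512-byte units
BLOCK_BYTES = 4096   # assumption: "logical block" is a 4 KiB block

TOTAL_SECTORS = 5860533168  # "Sectors" from disk_info.log

# (sector, logical block) pairs as they appear in the dmesg excerpt
ERRORS = [
    (5860533160, 732566645),
    (5860533128, 732566641),
    (5860532992, 732566624),
]


def describe(sector, total=TOTAL_SECTORS):
    """Return (4 KiB block index, number of sectors between here and end of disk)."""
    block = sector * SECTOR_BYTES // BLOCK_BYTES
    return block, total - sector


for sector, logged_block in ERRORS:
    block, from_end = describe(sector)
    print(
        f"sector {sector}: block {block} (log says {logged_block}), "
        f"{from_end} sectors before the end of the disk"
    )
```

Under those assumptions the computed block numbers match the logged ones, and all three failed reads fall within the last 176 sectors of the drive. That is just an observation about where the reads failed; it doesn't say anything about the cause on its own.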

Reformatting a line from the earlier pdf:

time:              2020-01-20 23:39:10
model:             WDC WD30EFRX-68EUZN0
serial:            WD-WCC4N1XHR82L
realloc_sect:      41
realloc_evnt:      7
spin_retry_cnt:    0
ioedc:             -1
cmd_timeouts:      -1
pending_sect:      0
uncorrectable_err: 0
ata_errors:        0

You can see there were 41 reallocated sectors reported on the 20th, and that count was increasing regularly for some months.
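
To make that reading of the row a bit more mechanical, here is a small Python sketch that separates the counters into "nonzero" and "not reported". It is only a rule of thumb under two assumptions that are not from NETGEAR documentation: that any count above zero deserves a look, and that -1 means the value was not reported. `flag_counters` is a hypothetical helper name.

```python
# Counter values from the 2020-01-20 23:39:10 row of the SMART history.
ROW = {
    "realloc_sect": 41,
    "realloc_evnt": 7,
    "spin_retry_cnt": 0,
    "ioedc": -1,
    "cmd_timeouts": -1,
    "pending_sect": 0,
    "uncorrectable_err": 0,
    "ata_errors": 0,
}


def flag_counters(row):
    """Split counters into nonzero ones and ones that look unreported."""
    flagged = {}
    missing = []
    for name, value in row.items():
        if value < 0:
            missing.append(name)      # assumption: -1 means "not reported"
        elif value > 0:
            flagged[name] = value     # assumption: any nonzero count is notable
    return flagged, missing


flagged, missing = flag_counters(ROW)
print("nonzero counters:", flagged)
print("not reported:", missing)
```

For this row the nonzero counters are realloc_sect and realloc_evnt. A single row can't show the upward trend mentioned above; that would need the same check run across the full history.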

I believe that disk 1 has failed. If you can connect it to a Windows PC (either with a USB adapter/dock or with SATA), you can test it with WD's Lifeguard program. FWIW, I'd replace it even if it passes Lifeguard.

If you installed it at the same time as disk 2, it is likely still covered by the manufacturer's warranty. The power-on hours suggest it has been installed for about 18 months, and the warranty is three years (though if the NAS is powered down a lot, the disks could be a lot older). If it is covered, you can get an RMA, but the replacement disk will be recertified (not new). Personally, I generally purchase a new disk and keep the replacement disk as an emergency spare.

7 Replies


Sandshark
Sensei
Jan 26, 2020
What you want to look at is the SMART stats, which I suspect will show significant issues. All of those rebuild events are creating quite a lot of activity on both of your drives, raising the chances of the second one failing and you losing all your data. It sounds like you don't have a backup of the data, and you should look into remedying that situation, as RAID alone is not enough to keep your data safe and you don't even currently have a sound RAID.
- liouk
  Aspirant
  Jan 26, 2020
  Hi Sandshark ,
  
  Thanks for your response! I'm not quite sure how to read the SMART stats, so I'm attaching two logfiles I could find relevant info in -- from the little I can gather, it doesn't look too bad.
  
  Regarding your comments -- actually I'm using my ReadyNAS as a backup for everything else, and you're right, I don't have a backup of that. Can you elaborate on why my RAID is not good enough? Do you mean that using only 2 disks isn't sufficient, or is there something else in my configuration?
  
  Again, thanks for the response!
  smart_history_and_volume.pdf (36 KB)
  - StephenB
    Guru - Experienced User
    Jan 27, 2020
    liouk wrote:
    
    Thanks for your response! I'm not quite sure how to read the SMART stats, so I'm attaching two logfiles I could find relevant info in -- from the little I can gather, it doesn't look too bad.
    
    I don't like the command timeouts. Can you post disk_info.log? That will give the full SMART stats. You are definitely getting errors on disk 1 (sda); that's what your first set of btrfs errors is telling you.
    
    liouk wrote:
    
    Can you elaborate on why my RAID is not good enough? Do you mean that using only 2 disks isn't sufficient, or is there something else in my configuration?
    
    He's saying that RAID is never enough to keep your data (or anyone else's data) safe. RAID is helpful, but there are a lot of scenarios where it can fail, and you will lose your data if it does.
    
    liouk wrote:
    
    Regarding your comments -- actually I'm using my ReadyNAS as a backup for everything else,
    
    If everything on the NAS is stored on another device, then you aren't depending on RAID alone to keep it safe - since the primary copy is still on the original device. That's good.
    
    If that's the case, then backing up the ReadyNAS itself might also be worth considering, as it gives you more recovery options if something catastrophic happens. I like to have three copies of everything I care about myself (including the original). A couple of times (before I had a ReadyNAS), I had PC hard disks fail, and then discovered that my USB backup had disk errors, so I lost some data. I haven't lost anything since I started keeping three copies.
