Disk in channel 1 (Internal) changed state from ONLINE to FAILED (RN202, 2x WD Red 3TB, OS 6.10.2)
Greetings,
For some time now I have been getting the error mentioned in the subject, and I am trying to find out whether the failing disk needs replacing or if there's anything I can do to fix this. Data is still there, as I'm in RAID-1 configuration, and the second disk is fine.
I've skimmed through the logs and I do see I/O errors being reported; however, I'm looking for a second, more experienced opinion. I'm attaching a couple of examples -- can anyone make anything out of this? Please let me know if you need more logs! Thanks in advance 🙂
dmesg.log
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533160
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533160
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566645, async page read
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533128
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533128
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566641, async page read
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860532992
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566624, async page read
system.log
$ grep mdadm system.log
Jan 20 23:33:27 ReadyNAS mdadm[1851]: NewArray event detected on md device /dev/md0
Jan 20 23:33:27 ReadyNAS mdadm[1851]: DegradedArray event detected on md device /dev/md0
Jan 20 23:33:27 ReadyNAS mdadm[1851]: NewArray event detected on md device /dev/md1
Jan 20 23:33:27 ReadyNAS mdadm[1851]: DegradedArray event detected on md device /dev/md1
Jan 20 23:33:27 ReadyNAS mdadm[1851]: NewArray event detected on md device /dev/md127
Jan 20 23:33:36 ReadyNAS mdadm[1851]: DegradedArray event detected on md device /dev/md127
Jan 20 23:34:13 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md0, component device recovery
Jan 20 23:34:18 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md1, component device recovery
Jan 20 23:34:24 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md127, component device recovery
Jan 20 23:34:33 ReadyNAS mdadm[1851]: RebuildFinished event detected on md device /dev/md1, component device recovery
Jan 20 23:34:33 ReadyNAS mdadm[1851]: SpareActive event detected on md device /dev/md1, component device /dev/sda2
Jan 20 23:36:29 ReadyNAS mdadm[1851]: RebuildFinished event detected on md device /dev/md0, component device recovery
Jan 20 23:36:29 ReadyNAS mdadm[1851]: SpareActive event detected on md device /dev/md0, component device /dev/sda1
Jan 20 23:41:15 ReadyNAS mdadm[1851]: Fail event detected on md device /dev/md0, component device /dev/sda1
Jan 20 23:41:16 ReadyNAS mdadm[1851]: FailSpare event detected on md device /dev/md127, component device /dev/sda3
Jan 20 23:41:16 ReadyNAS mdadm[1851]: RebuildFinished event detected on md device /dev/md127, component device recovery
Accepted Solutions
@liouk wrote:
Here's my disk_info.log as well:
Device: sda
Controller: 0
Channel: 0
Model: WDC_WD30EFRX-68EUZN0
Serial: WD-WCC4N1XHR82L
Firmware: 82.00A82
Class: SATA
Sectors: 5860533168
Pool: data
PoolType: RAID 1
PoolState: 3
PoolHostId: 1165483a
Health data
  ATA Error Count: 0
This is all you see for disk 1? My guess is yes, as that is consistent with what you posted before.
Reformatting a line from the earlier pdf:
time                model                serial               realloc_sect realloc_evnt spin_retry_cnt ioedc      cmd_timeouts pending_sect uncorrectable_err ata_errors
------------------- -------------------- -------------------- ------------ ------------ -------------- ---------- ------------ ------------ ----------------- ----------
2020-01-20 23:39:10 WDC WD30EFRX-68EUZN0 WD-WCC4N1XHR82L      41           7            0              -1         -1           0            0                 0
You can see there were 41 reallocated sectors reported on the 20th, and that count had been increasing steadily for some months.
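As a rough illustration of reading that row, here is a minimal Python sketch that splits it into named counters and lists the ones above zero. The column names are copied from the table; the helper names (`parse_history_row`, `nonzero_counters`) are made up for this example, and treating -1 as "not reported" is an assumption, not something the log states.

```python
# Counter columns, in the order they appear in the history table.
COLUMNS = [
    "realloc_sect", "realloc_evnt", "spin_retry_cnt", "ioedc",
    "cmd_timeouts", "pending_sect", "uncorrectable_err", "ata_errors",
]


def parse_history_row(row: str) -> dict:
    """Split one history row into a dict, reading the counters from the right."""
    fields = row.split()
    counters = [int(value) for value in fields[-len(COLUMNS):]]
    head = fields[:-len(COLUMNS)]  # date, time, model words, serial
    record = {
        "time": " ".join(head[:2]),
        "model": " ".join(head[2:-1]),
        "serial": head[-1],
    }
    record.update(zip(COLUMNS, counters))
    return record


def nonzero_counters(record: dict) -> dict:
    """Return counters above zero. Assumption: -1 means 'not reported'."""
    return {name: record[name] for name in COLUMNS if record[name] > 0}


row = "2020-01-20 23:39:10 WDC WD30EFRX-68EUZN0 WD-WCC4N1XHR82L 41 7 0 -1 -1 0 0 0"
record = parse_history_row(row)
print(record["serial"], nonzero_counters(record))
```

For this row the only counters above zero are the reallocated sectors and reallocation events, which is what the diagnosis here rests on.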
I believe that disk 1 has failed. If you can connect it to a Windows PC (either with a USB adapter/dock or with SATA), you can test it with WD's Lifeguard program. FWIW, I'd replace it even if it passes Lifeguard.
If you installed it at the same time as disk 2, it is likely still covered by the manufacturer's warranty: the power-on hours suggest it's been installed for about 18 months, and the warranty is three years (though if the NAS is powered down a lot, the disks could be a lot older). If it is covered, you can get an RMA, but the replacement disk will be recertified (not new). Personally, I generally purchase a new disk and keep the replacement disk as an emergency spare.
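The 18-month estimate can be reproduced with simple arithmetic. This is only a sketch: the 13446 figure is the Power-On Hours value that disk_info.log reports for disk 2 (sdb) in this thread, since disk 1 reported no such value, and the average month length is an approximation.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30.44  # average month length, an approximation


def powered_on_months(power_on_hours: int) -> float:
    """Convert SMART power-on hours into months of powered-on time."""
    return power_on_hours / HOURS_PER_DAY / DAYS_PER_MONTH


months = powered_on_months(13446)  # Power-On Hours reported for sdb
print(f"{months:.1f} months of powered-on time")
```

This counts only the time the disk was actually powered, so a NAS that is switched off for part of each day will be older in calendar terms than the number suggests.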
All Replies
What you want to look at are the SMART stats, which I suspect will show significant issues. All of those rebuild events are creating quite a lot of activity on both of your drives, raising the chances of the second one failing and you losing all your data. It sounds like you don't have a backup of the data, and you should look into remedying that, as RAID alone is not enough to keep your data safe, and you don't even have a sound RAID at the moment.
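To see at a glance which arrays went through a rebuild and which components were kicked out, the mdadm lines from system.log can be grouped per md device. The following is a minimal Python sketch; the regular expression is written against the log format shown in this thread only, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Pattern based on the mdadm monitor lines posted in this thread.
EVENT_RE = re.compile(
    r"mdadm\[\d+\]: (?P<event>\w+) event detected on md device (?P<md>/dev/md\d+)"
    r"(?:, component device (?P<component>\S+))?"
)


def events_by_array(lines):
    """Group (event, component) pairs by md device, in log order."""
    grouped = defaultdict(list)
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            grouped[match.group("md")].append(
                (match.group("event"), match.group("component"))
            )
    return dict(grouped)


def failed_components(grouped):
    """Return, per array, the components named in Fail or FailSpare events."""
    failed = {}
    for md, events in grouped.items():
        parts = [comp for event, comp in events if event in ("Fail", "FailSpare")]
        if parts:
            failed[md] = parts
    return failed


sample = [
    "Jan 20 23:34:24 ReadyNAS mdadm[1851]: RebuildStarted event detected on md device /dev/md127, component device recovery",
    "Jan 20 23:41:15 ReadyNAS mdadm[1851]: Fail event detected on md device /dev/md0, component device /dev/sda1",
    "Jan 20 23:41:16 ReadyNAS mdadm[1851]: FailSpare event detected on md device /dev/md127, component device /dev/sda3",
    "Jan 20 23:41:16 ReadyNAS mdadm[1851]: RebuildFinished event detected on md device /dev/md127, component device recovery",
]

grouped = events_by_array(sample)
print(failed_components(grouped))
```

In the lines sampled here, every failed component is a partition of sda, which matches the I/O errors reported for that disk.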
Hi @Sandshark ,
Thanks for your response! I'm not quite sure how to read the SMART stats, so I'm attaching two logfiles I could find relevant info in -- from the little I can gather, it doesn't look too bad.
Regarding your comments -- actually I'm using my ReadyNAS as a backup for everything else, and you're right, I don't have a backup of that. Can you elaborate on why my RAID is not good enough? Do you mean that using only 2 disks isn't sufficient, or is there something else in my configuration?
Again, thanks for the response!
@liouk wrote:
Thanks for your response! I'm not quite sure how to read the SMART stats, so I'm attaching two logfiles I could find relevant info in -- from the little I can gather, it doesn't look too bad.
I don't like the command timeouts. Can you post disk_info.log? That will give the full SMART stats. You are definitely getting errors on disk 1 (sda); that's what your first set of btrfs errors are telling you.
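One way to confirm which device the kernel is complaining about is to tally the block-layer error lines from the dmesg.log excerpt at the top of the thread. This is a minimal Python sketch; the regular expression targets only the line format shown there, and the 8-sectors-per-block ratio (512-byte sectors, 4096-byte blocks) is an assumption.

```python
import re
from collections import Counter

# Matches the block-layer error lines from the dmesg.log excerpt.
IO_ERR_RE = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)

# Assumption: 512-byte sectors and 4096-byte blocks, i.e. 8 sectors per block.
SECTORS_PER_BLOCK = 8


def failing_sectors(dmesg_text: str) -> Counter:
    """Count how often each (device, sector) pair appears in I/O error lines."""
    return Counter(
        (match.group("dev"), int(match.group("sector")))
        for match in IO_ERR_RE.finditer(dmesg_text)
    )


sample = """\
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533160
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533160
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566645, async page read
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533128
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860533128
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566641, async page read
[Mon Jan 20 23:49:02 2020] blk_update_request: I/O error, dev sda, sector 5860532992
[Mon Jan 20 23:49:02 2020] Buffer I/O error on dev sda, logical block 732566624, async page read
"""

errors = failing_sectors(sample)
for (dev, sector), count in sorted(errors.items()):
    block = sector // SECTORS_PER_BLOCK
    print(f"{dev}: sector {sector} (block {block}) seen {count} time(s)")
```

Every error in the excerpt is on sda, and the computed block numbers line up with the "logical block" values in the Buffer I/O error lines, which is consistent with the assumed ratio for this log.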
@liouk wrote:
Can you elaborate on why my RAID is not good enough? Do you mean that using only 2 disks isn't sufficient, or is there something else in my configuration?
He's saying that RAID is never enough to keep your data (or anyone else's data) safe. RAID is helpful, but there are a lot of scenarios where it can fail, and you will lose your data if it does.
@liouk wrote:
Regarding your comments -- actually I'm using my ReadyNAS as a backup for everything else,
If everything on the NAS is stored on another device, then you aren't depending on RAID alone to keep it safe - since the primary copy is still on the original device. That's good.
If that's the case, then backing up the ReadyNAS itself might also be worth considering, as it gives you more recovery options if something catastrophic happens. I like to have three copies of everything I care about (including the original). A couple of times (before I had a ReadyNAS), I had PC hard disks fail, and then discovered that my USB backup had disk errors, so I lost some data. I haven't lost anything since I started keeping three copies.
@StephenB thanks for the reply!
Here's my disk_info.log as well:
Device: sda
Controller: 0
Channel: 0
Model: WDC_WD30EFRX-68EUZN0
Serial: WD-WCC4N1XHR82L
Firmware: 82.00A82
Class: SATA
Sectors: 5860533168
Pool: data
PoolType: RAID 1
PoolState: 3
PoolHostId: 1165483a
Health data
  ATA Error Count: 0

Device: sdb
Controller: 0
Channel: 1
Model: WDC WD30EFRX-68EUZN0
Serial: WD-WCC4N1NA90XX
Firmware: 82.00A82W
Class: SATA
RPM: 5400
Sectors: 5860533168
Pool: data
PoolType: RAID 1
PoolState: 3
PoolHostId: 1165483a
Health data
  ATA Error Count: 0
  Reallocated Sectors: 0
  Reallocation Events: 0
  Spin Retry Count: 0
  Current Pending Sector Count: 0
  Uncorrectable Sector Count: 0
  Temperature: 31
  Start/Stop Count: 329
  Power-On Hours: 13446
  Power Cycle Count: 329
  Load Cycle Count: 328
I understand the risks when using RAID and backups -- I actually use the NAS to back up data I already have on other devices, plus to store data that I do not mind losing, but don't want to permanently store on my primary PC. Thanks for the insights though! I'm now considering backing up my NAS once more, so that I end up with three copies as well.
This is all you see for disk 1? My guess is yes, as that is consistent with what you posted before.
Yes indeed, this is all there is in the log for disk 1.
If you installed it at the same time as disk 2, it is likely still covered by the manufacturer's warranty: the power-on hours suggest it's been installed for about 18 months, and the warranty is three years (though if the NAS is powered down a lot, the disks could be a lot older). If it is covered, you can get an RMA, but the replacement disk will be recertified (not new). Personally, I generally purchase a new disk and keep the replacement disk as an emergency spare.
It's actually much older than 18 months, 4+ years now -- but you're right, I'm powering it down frequently when it's not in use. Not sure if this is recommended, maybe this is an anti-pattern.
I've already purchased a new disk based on all your comments here -- if I get a chance I might run it through Lifeguard and see what happens.
Thanks for all the support @StephenB and @Sandshark !
@liouk wrote:
It's actually much older than 18 months, 4+ years now -- but you're right, I'm powering it down frequently when it's not in use. Not sure if this is recommended, maybe this is an anti-pattern.
My main NAS is on 24x7, but my backups are all on a power schedule - generally on for an hour or two each day.