Forum Discussion

Tutor

Jul 26, 2021

Solved

Remove inactive volumes after hard drive upgrade RN104

I have an iMac with a RN104 to keep track of all my design/art files. It was starting to fill up so I upgraded and bought 2 x 4tb hard drives to replace the ones in my RN104. I put in 1 x 4tb dri...

rn_enthusiast

Jul 26, 2021

Thanks for the logs Esoteric

So, here are the events...

You replaced disk 2 (which was also a dying disk with tons of ATA errors). I can see that had been going on for a while, so I suspect you don't have email alerts set up? In any case, whether by luck or by intention, you replaced the bad disk and the raid started to sync.

[21/07/22 20:51:22 WEST] notice:disk:LOGMSG_SMART_ATA_ERR_30DAYS_WARN Detected increasing ATA error count: [13955] on disk 2 (Internal) [WDC WD20EFRX-68EUZN0, WD-WCC4M1808373] 11455 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[21/07/22 20:51:29 WEST] notice:disk:LOGMSG_SMART_ATA_ERR_30DAYS_WARN Detected increasing ATA error count: [13955] on disk 2 (Internal) [WDC WD20EFRX-68EUZN0, WD-WCC4M1808373] 11455 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[21/07/22 20:52:12 WEST] notice:disk:LOGMSG_SMART_ATA_ERR_30DAYS_WARN Detected increasing ATA error count: [13956] on disk 2 (Internal) [WDC WD20EFRX-68EUZN0, WD-WCC4M1808373] 11456 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[21/07/22 20:56:20 WEST] notice:disk:LOGMSG_SMART_ATA_ERR_30DAYS_WARN Detected increasing ATA error count: [13956] on disk 2 (Internal) [WDC WD20EFRX-68EUZN0, WD-WCC4M1808373] 11456 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[21/07/22 20:57:46 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD20EFRX-68EUZN0 Serial:WD-WCC4M1808373 was removed from Channel 2 of the head unit.
[21/07/22 20:57:54 WEST] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
[21/07/22 20:58:45 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model: ST4000VN008-2DR166 Serial:ZGY94XWV was added to Channel 2 of the head unit.
[21/07/22 20:59:28 WEST] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.

The raid successfully synced. BTW - StephenB in my experience it is pretty normal for an RN104 to take this long for a raid sync.

[21/07/23 12:46:10 WEST] notice:volume:LOGMSG_RESILVERCOMPLETE_VOLUME Volume data is resynced.
[21/07/23 12:46:11 WEST] notice:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Degraded to Redundant.
[21/07/23 12:46:11 WEST] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 2 (Internal) changed state from RESYNC to ONLINE.

You correctly waited until the raid had synced, and you then replaced disk 1 with a new, larger disk.

[21/07/23 13:54:23 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was removed from Channel 1 of the head unit.
[21/07/23 13:54:25 WEST] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
[21/07/23 13:56:20 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model: ST4000VN008-2DR166 Serial:ZGY94KA2 was added to Channel 1 of the head unit.
[21/07/23 13:56:36 WEST] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.

At this point, you are still good.

But then 3 mins later we see multiple disks being pulled and added - at this point the raid would have stopped, since that is essentially a multiple disk failure during a raid resync. Do you know why this happened? Were you pulling these disks in and out?

[21/07/23 13:59:55 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1977774 was added to Channel 3 of the head unit.
[21/07/23 13:59:56 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1977774 was removed from Channel 3 of the head unit.
[21/07/23 14:03:02 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was added to Channel 4 of the head unit.
[21/07/23 14:03:44 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was removed from Channel 4 of the head unit.
[21/07/23 14:04:44 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model: ST4000VN008-2DR166 Serial:ZGY94KA2 was removed from Channel 1 of the head unit.
[21/07/23 14:05:09 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was added to Channel 1 of the head unit.
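
If it helps to see the sequence at a glance, the add/remove lines above can be turned into a small timeline. This is only a rough Python sketch: the regular expression is my own guess at the line layout based on the excerpts in this post (it is not an official ReadyNAS log format), and `disk_events` and `RESYNC_START` are names made up for the example.

```python
import re
from datetime import datetime

# The six disk events from the excerpt above, copied verbatim.
LOG = """\
[21/07/23 13:59:55 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1977774 was added to Channel 3 of the head unit.
[21/07/23 13:59:56 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1977774 was removed from Channel 3 of the head unit.
[21/07/23 14:03:02 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was added to Channel 4 of the head unit.
[21/07/23 14:03:44 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was removed from Channel 4 of the head unit.
[21/07/23 14:04:44 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model: ST4000VN008-2DR166 Serial:ZGY94KA2 was removed from Channel 1 of the head unit.
[21/07/23 14:05:09 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was added to Channel 1 of the head unit.
"""

# Start of the resync onto the new disk 1, from the earlier excerpt.
RESYNC_START = datetime(2021, 7, 23, 13, 56, 36)

# Assumed line layout: [yy/mm/dd hh:mm:ss TZ] level:disk:LOGMSG_ADD_DISK ...
EVENT = re.compile(
    r"\[(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \w+\] "
    r"\w+:disk:LOGMSG_(?P<kind>ADD|DELETE)_DISK .*?"
    r"Serial:(?P<serial>\S+) was (?:added to|removed from) "
    r"Channel (?P<channel>\d+)"
)


def disk_events(text):
    """Return (timestamp, kind, channel, serial) for each add/remove line."""
    events = []
    for line in text.splitlines():
        match = EVENT.search(line)
        if match:
            ts = datetime.strptime(match["ts"], "%y/%m/%d %H:%M:%S")
            events.append(
                (ts, match["kind"], int(match["channel"]), match["serial"])
            )
    return events


events = disk_events(LOG)
for ts, kind, channel, serial in events:
    delay = int((ts - RESYNC_START).total_seconds())
    print(f"+{delay:>4}s  channel {channel}  {kind:<6}  {serial}")
```

The first column is the number of seconds since the resync started, which makes it easy to see how soon after the disk 1 swap the other channels began changing.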

Following this, we see multiple drives again being pulled and re-added, several reboots and shutdown - even adding back in the old bad disk 2. I assume this was part of the troubleshooting as you indicated in your original post.

[21/07/23 14:16:24 WEST] info:system:LOGMSG_READYNASD_ABORTED_NOINFO ReadyNASOS service or process was restarted.
[21/07/23 14:16:57 WEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[21/07/23 14:20:09 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1977774 was added to Channel 1 of the head unit.
[21/07/23 14:20:14 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1972758 was removed from Channel 1 of the head unit.
[21/07/23 14:20:15 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD10EFRX-68PJCN0 Serial:WD-WCC4J1977774 was removed from Channel 3 of the head unit.
[21/07/23 14:20:57 WEST] notice:system:LOGMSG_SYSTEM_HALT The system is shutting down.
[21/07/23 14:24:36 WEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[21/07/23 14:25:11 WEST] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from RESYNC to ONLINE.
[21/07/23 14:26:28 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model: ST4000VN008-2DR166 Serial:ZGY94KA2 was added to Channel 4 of the head unit.
[21/07/23 14:30:43 WEST] notice:system:LOGMSG_SYSTEM_REBOOT The system is rebooting.
[21/07/23 14:34:16 WEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[21/07/23 14:35:25 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model: ST4000VN008-2DR166 Serial:ZGY94KA2 was removed from Channel 4 of the head unit.
[21/07/23 14:35:31 WEST] notice:system:LOGMSG_SYSTEM_REBOOT The system is rebooting.
[21/07/23 14:38:48 WEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[21/07/23 14:41:24 WEST] notice:system:LOGMSG_SYSTEM_HALT The system is shutting down.
[21/07/23 14:52:00 WEST] notice:disk:LOGMSG_SMART_ATA_ERR_30DAYS_WARN Detected increasing ATA error count: [13997] on disk 2 (Internal) [WDC WD20EFRX-68EUZN0, WD-WCC4M1808373] 11307 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[21/07/23 14:52:00 WEST] notice:disk:LOGMSG_SMART_ATA_ERR_30DAYS_WARN Detected increasing ATA error count: [13997] on disk 2 (Internal) [WDC WD20EFRX-68EUZN0, WD-WCC4M1808373] 11307 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[21/07/23 14:52:05 WEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[21/07/23 14:53:49 WEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD20EFRX-68EUZN0 Serial:WD-WCC4M1808373 was removed from Channel 2 of the head unit.
[21/07/23 14:53:55 WEST] notice:disk:LOGMSG_ADD_DISK Disk Model: ST4000VN008-2DR166 Serial:ZGY94XWV was added to Channel 2 of the head unit

The raid died when disks were being pulled during the raid sync. So my question is: why were these disks pulled just 3 minutes after disk 1 was replaced with a new, larger disk? You added a new disk 1 at 21/07/23 13:56:20 WEST but then started to pull drives just 3 minutes later. Do you remember this, or what caused you to take that action?

I also observe the NAS being on a very old firmware. While that isn't the cause it should be updated whenever you get the raid back up and running.

ReadyNASOS!!version=6.6.1,time=1482880160,arch=arm,descr=ReadyNASOS
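
To put a date on that firmware, the version line can be split into its fields. A small Python sketch follows; it assumes the `time=` field is a Unix timestamp in seconds, which is my reading of the value and not something the log states.

```python
from datetime import datetime, timezone

version_line = "ReadyNASOS!!version=6.6.1,time=1482880160,arch=arm,descr=ReadyNASOS"

# Drop the "ReadyNASOS!!" prefix, then split the key=value pairs.
pairs = version_line.split("!!", 1)[1].split(",")
fields = dict(pair.split("=", 1) for pair in pairs)

# Assumption: "time" is seconds since the Unix epoch (UTC).
built = datetime.fromtimestamp(int(fields["time"]), tz=timezone.utc)

print(fields["version"], built.isoformat())
```

Under that assumption the timestamp falls in late December 2016, so the firmware was over four years old at the time of this thread.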

What is needed at this point is some delicate manual raid assembly with all the disks that were in the NAS at 21/07/23 13:56:36 WEST. The new disk 1 added just prior isn't going to help, as the raid sync on that disk never finished (other disks were pulled 3 mins later and the raid stopped working). However, the remaining disks that were in the NAS at that time should be enough. The raid can be assembled in degraded mode and likely saved without too much trouble, but ReclaiMe isn't going to help you here. It needs manual raid assembly.

My advice to you would be to bite the bullet and pay Netgear Support for a data recovery contract. Let their Level 3 team try to save the raid. As the lads have said already, please make sure you have backups of important data in the future.

Cheers

Esoteric

Tutor

Jul 26, 2021

Hi rn_enthusiast
I sent you my logs. Let me know if you need anything else from me.

StephenB
Guru - Experienced User
Jul 26, 2021
rn_enthusiast wrote:

BTW - StephenB in my experience it is pretty normal for an RN104 to take this long for a raid sync.

14-15 hours to resilver 3x1TB RAID-5 is (in my experience) excessive.

Resilvering 2x1TB RAID-1 on my RN102 takes about two hours. Based on that, resilvering 3x1TB RAID-5 should be about three on the RN104 - perhaps a bit more, since the RN104 is computing the parity blocks. I'd have expected ~14 hours for 3x4TB, but not 3x1.

However, the 4 TB Ironwolf certainly isn't SMR, so there is no concern on the drive choice.

rn_enthusiast wrote:

But then 3 mins later we see multiple disks being pulled and added - at this point the raid would have stopped, since that is essentially a multiple disk failure during a raid resync. Do you know why this happened? Were you pulling these disks in and out?

...
The raid died when disks were being pulled during the raid-sync

Agreed, thx for going through the logs.

Esoteric: You can find more info on the data recovery contract here: https://kb.netgear.com/69/ReadyNAS-Data-Recovery-Diagnostics-Scope-of-Service
- rn_enthusiast
  Virtuoso
  Jul 26, 2021
  Yea I was thinking 4TB disk but of course it would only have used 1TB for the raid at the time it was added.
  14 hours might be on the long end, but I suspect the parity calculation hurts the RN104's CPU more than we think, given it is a single core ARM :)
Esoteric
Tutor
Jul 26, 2021
Thanks for going through this rn_enthusiast

To answer your question, that's when it said my volumes were inactive. A couple minutes after I put the second new drive in. It was syncing fine, then bam, said volumes inactive. So I pulled them out at this point.

With the recovery... It's $180 per hour after the first hour. If it took that long to sync will that mean I'll have to pay 14 x $180! Or is it quicker to fix than that?

Thanks again for all your help and my wake up call.

All the best,
-Justin-
- rn_enthusiast
  Virtuoso
  Jul 26, 2021
  Esoteric wrote:
  Thanks for going through this rn_enthusiast
  
  To answer your question, that's when it said my volumes were inactive. A couple minutes after I put the second new drive in. It was syncing fine, then bam, said volumes inactive. So I pulled them out at this point.
  
  With the recovery... It's $180 per hour after the first hour. If it took that long to sync will that mean I'll have to pay 14 x $180! Or is it quicker to fix than that?
  
  Thanks again for all your help and my wake up call.
  
  All the best,
  -Justin-
  Ahh OK, I see why you pulled them out so. The troubleshooting essentially started there. So there must be a reason the raid had problems during the sync after adding the new disk 1. I reckon Netgear Support will spot that, but the data should be salvageable, in my opinion, so it's worth going for a data recovery contract and letting their Level 3 team look at it. When I worked there, they weren't strict with the hourly rate TBH, but that might have changed. I don't reckon it needs that much time, because the work shouldn't (on paper) be too extensive here. It's at least worth inquiring with them.
  
  Cheers
  - Esoteric
    Tutor
    Jul 26, 2021
    One last thing rn_enthusiast. I'm looking online to do the Data Recovery Diagnostics - Scope of Service. When I went through the contact journey it just gives me the option to only purchase a ProSUPPORT OnCall 24x7, Category 1, 1-Year call support.
    
    Is there a way to contact them to just purchase the Data Recovery Diagnostics?
    
    You have been such a help with all of this, and I truly appreciate it!