Forum Discussion
Chappy316
Nov 11, 2021 Aspirant
Chances of Recovery
Hoping to (hopefully) find some sliver of light at the end of what appears to be a very dark tunnel at this time. It appears that I have lost two of the four drives (in a very short time period) ...
rn_enthusiast
Nov 12, 2021 Virtuoso
Hi Chappy316
Your raid is broken because you have a dual disk failure in a raid 5.
Disk 3 started to show signs of failure back in December 2020. I don't think you were notified as it seems the alert system failed to send you messages (either not configured or misconfigured).
[20/12/01 22:47:30 EST] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.
Evident in the logs is that disk 3 was steadily getting worse throughout the year and eventually the disk was kicked from the raid.
[21/06/13 17:25:27 EDT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
[21/06/13 17:25:31 EDT] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from ONLINE to FAILED.
In July, it appears the disk was pulled from the bay and re-added which initiated a raid re-sync.
[21/07/08 19:23:58 EDT] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD4000FYYZ-01UL1B2 Serial:WD-WMC130D08XJ7 was removed from Channel 3 of the head unit.
[21/07/08 19:26:45 EDT] notice:disk:LOGMSG_ADD_DISK Disk Model:WDC WD4000FYYZ-01UL1B2 Serial:WD-WMC130D08XJ7 was added to Channel 3 of the head unit.
[21/07/08 19:26:58 EDT] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.
However, the disk was in too bad a state and the resync never completed, so the raid stayed non-redundant from that point on.
Then in October, disk 4 was kicked out of the raid and the volume was declared "dead" (dual disk failure).
[21/10/04 18:03:11 EDT] notice:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Degraded to Dead.
[21/10/04 18:04:09 EDT] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 4 (Internal) changed state from ONLINE to FAILED.
Disk 4 is not 100% healthy, but it is not too bad either. It shows 19 pending/uncorrectable sectors, and I don't see any real complaints about the disk in the kernel logs. However, it is still not a healthy disk, and it clearly encountered a failure on the 4th of October.
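For anyone who wants to pull this kind of timeline out of their own logs, here is a minimal sketch in Python. It assumes the entries look exactly like the ones quoted in this thread, one entry per line; the function names are made up for illustration and are not part of any Netgear tool.

```python
import re
from datetime import datetime

# One log entry as quoted in this thread, e.g.
# [21/06/13 17:25:27 EDT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data ...
ENTRY = re.compile(
    r"^\[(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) [A-Z]+\] "
    r"(\w+):(\w+):(\w+) (.*)$"
)
HEALTH = re.compile(r"health changed from (\w+) to (\w+)")


def parse_entries(log_text):
    """Return (time, level, area, code, text) tuples; skip lines that don't match."""
    entries = []
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        stamp, level, area, code, text = match.groups()
        # Timestamps are yy/mm/dd; the timezone label is dropped (naive datetime).
        when = datetime.strptime(stamp, "%y/%m/%d %H:%M:%S")
        entries.append((when, level, area, code, text))
    return entries


def volume_health_changes(entries):
    """Keep only the volume health transitions, as (time, old, new)."""
    changes = []
    for when, _level, _area, code, text in entries:
        found = HEALTH.search(text)
        if code == "LOGMSG_HEALTH_VOLUME" and found:
            changes.append((when, found.group(1), found.group(2)))
    return changes


# Entries copied from the posts above.
SAMPLE_LOG = """\
[20/12/01 22:47:30 EST] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.
[21/06/13 17:25:27 EDT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
[21/06/13 17:25:31 EDT] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from ONLINE to FAILED.
[21/10/04 18:03:11 EDT] notice:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Degraded to Dead.
"""

for when, old, new in volume_health_changes(parse_entries(SAMPLE_LOG)):
    print(when.date(), old, "->", new)
```

The point of filtering on the health transitions is that the gap between "Degraded" and "Dead" (June to October here) is the window in which a replacement disk would have saved the volume.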
Below is the current state of your disk 3 and disk 4.
---> Disk 3
Device: sdb
Controller: 0
Channel: 2
Model: WDC WD4000FYYZ-01UL1B2
Serial: WD-WMC130D08XJ7
Firmware: 01.01K03
Class: SATA
RPM: 7200
Sectors: 7814037168
Pool: data-0
PoolType: RAID 5
PoolState: 5
PoolHostId: 2fe5a296
Health data
  ATA Error Count: 2554
  Reallocated Sectors: 1300
  Reallocation Events: 125
  Spin Retry Count: 0
  Current Pending Sector Count: 864
  Uncorrectable Sector Count: 148
  Temperature: 45
  Start/Stop Count: 33
  Power-On Hours: 36077
  Power Cycle Count: 33
  Load Cycle Count: 3

---> Disk 4
Device: sdc
Controller: 0
Channel: 3
Model: WDC WD4000FYYZ-01UL1B2
Serial: WD-WMC130D3MD5Z
Firmware: 01.01K03
Class: SATA
RPM: 7200
Sectors: 7814037168
Pool: data-0
PoolType: RAID 5
PoolState: 5
PoolHostId: 2fe5a296
Health data
  ATA Error Count: 0
  Reallocated Sectors: 0
  Reallocation Events: 0
  Spin Retry Count: 0
  Current Pending Sector Count: 19
  Uncorrectable Sector Count: 19
  Temperature: 44
  Start/Stop Count: 32
  Power-On Hours: 37414
  Power Cycle Count: 32
  Load Cycle Count: 3
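To make the comparison between the two disks concrete, here is a rough sketch in Python that sorts a disk into "healthy", "suspect" or "failing" from the counters above. The threshold of 100 and the three labels are assumptions picked for illustration only; they are not Netgear's or WD's criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskHealth:
    name: str
    ata_errors: int
    reallocated: int
    pending: int
    uncorrectable: int


def classify(disk, failing_threshold=100):
    """Rough triage. The threshold is an illustrative assumption, not a vendor rule."""
    # Pending and uncorrectable counts can refer to the same sectors,
    # so take the worst single counter instead of summing them.
    worst = max(disk.reallocated, disk.pending, disk.uncorrectable)
    if worst == 0 and disk.ata_errors == 0:
        return "healthy"
    if worst >= failing_threshold or disk.ata_errors >= failing_threshold:
        return "failing"
    return "suspect"


# Counters copied from the disk info above.
disk3 = DiskHealth("disk 3", ata_errors=2554, reallocated=1300,
                   pending=864, uncorrectable=148)
disk4 = DiskHealth("disk 4", ata_errors=0, reallocated=0,
                   pending=19, uncorrectable=19)

for disk in (disk3, disk4):
    print(disk.name, classify(disk))
```

With these numbers disk 3 lands in "failing" and disk 4 in "suspect", which matches the reading in this post: disk 4 is worth cloning, disk 3 is not worth trusting.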
Disk 1 and 2 are healthy as is. Disk 3 is likely a write-off at this stage (but keep it until data is recovered). The best option here is to clone disk 4 and use a healthy cloned disk to re-assemble the raid. That should absolutely be possible given that disk 4 is not completely dead, which I don't see it being.
My advice would be to opt for some paid data recovery support with Netgear and let them help clone disk 4 and re-assemble the raid. You could possibly even manually re-assemble the raid with the current disk 4, but I would not risk it as it is not a fully healthy disk. I think a paid support contract is worth it here, as the chances of a successful recovery are very high, in my opinion.
I would also advise that you get the alerts set up and working. This way you will be notified by email if disks are failing and you won't end up in the same situation again. Backups are of course also important, and I am sure many people here on the forum can give advice on that and on which strategies they use.
Cheers
Chappy316
Nov 13, 2021 Aspirant
Hey rn_enthusiast,
For starters, thank you very much for the help and insight to hopefully resolving this problem.
Just some follow up then a couple questions.
Disk 3 was removed from the array in July with some guidance from other users on this forum to hopefully jump start it back to life. The suggestion was made, to hopefully make this fully redundant again, to start searching for replacement/upgrade options. I did not realize that it never fully reinitialized. In the process of determining what I wanted for a replacement, we got to where we are now, unfortunately.
In the future, I would guess your suggestion would be to not pull a drive that is potentially, or likely from what it looked like, failing to avoid a chance of breaking the array?
Also, I am getting an external backup solution (some sort of external USB) for the highly sensitive files in the array. Do you have one here that you would recommend brand or size wise? I was looking at a WD Essentials as I have always used their internal drives in personal builds in the past and never had any major issues.
So a couple of questions on trying to recover what is left.
What is the process of going through NetGear for paid support to clone Disk 4 (and/or Disk 3 for that matter), and do you have a rough idea of the cost of this process? (I know you are a former employee, so it's just a question, nothing I would hold you to. Just looking for a rough idea.)
Is the cloning process something I could do at home myself? If yes, would that be a faster and cheaper first attempt to fixing the array? Also, is there any more damage that can be caused if I made an initial attempt at home and then had to revert to NetGear?
When attempting to clone Disk 4 (and or Disk 3) either at home or with NetGear, can the drive size be upgraded at that time? The initial reason I came here was to look into expanding the size of the array. Currently they are 4tb drives, could I clone one or both of them to larger drives and restart the array with more size? The ultimate plan is to upgrade the whole array but I was cautioned away from it in July. The status of Disk 3 scared away a couple users in fear that another drive in the array may be close to end of life. Looks like they were right.
If upgrading in HDD size is not the case, would I need to buy a 4tb drive that "matches" what is currently in the array or would anything of similar size be acceptable to clone to regardless of brand or model?
As far as alerts go, apparently it doesn't want to play well with gmail or I am missing something simple. I tried getting it set up with my account so I can receive them and Google won't allow the simple one button sign in, even after I turned on the ability to use less secure apps. Attempting to manually enter the email credentials doesn't help either as it throws an SMTP error when sending a test message.
Again, thank you for any and all insight into this.
Chris
- rn_enthusiast Nov 13, 2021 Virtuoso
Hi Chappy316
As for external USBs, I have had WD elements USB drives attached to my NAS for a while with no problems. StephenB and Sandshark might know more about USB compatibility in general but I have had good success with WD elements 2TB and 4TB, personally.
The cost of the data recovery contract was, I think, somewhere in the region of $100-$150, but it has been a good few years since I worked there. More extensive data recovery work could require extra cost from what I remember. Maybe StephenB knows the price better? In any case, it will be far cheaper than any regular data recovery service.
As for doing it yourself, you theoretically can. Netgear aren't doing anything magical here but it requires a bit of knowledge. The other advantages of using Netgear to do it, would be that they can use the NAS itself for the cloning process. Makes it easier for you. One would need to examine the raid super blocks to ensure that you are cloning and using the correct disk to re-assemble the raid. Based on logs, it looks like disk 4 is the one we need to use and clone for the raid assembly but examination of the raid super-block would still be prudent. Next will be to monitor the cloning process and assess whether the clone was fully successful and then manually assemble the raid array using disk 1, 2 and the cloned disk 4.
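To give an idea of what "examining the raid super blocks" looks like in practice: `mdadm --examine` prints an `Events` counter for each member, and a member that dropped out earlier shows a lower count than the ones that stayed in. Below is a minimal sketch in Python that compares those counters. The sample outputs and their numbers are hypothetical (they are not from this NAS), and the helper names are made up for illustration.

```python
import re


def event_count(examine_output):
    """Pull the 'Events : N' counter out of `mdadm --examine` output."""
    match = re.search(r"^\s*Events\s*:\s*(\d+)", examine_output, re.MULTILINE)
    if match is None:
        raise ValueError("no Events line found")
    return int(match.group(1))


def stale_members(counts):
    """Map each member that is behind the newest one to how far behind it is."""
    newest = max(counts.values())
    return {name: newest - n for name, n in counts.items() if n < newest}


# Hypothetical, heavily trimmed examples of `mdadm --examine` output.
# The numbers are invented for illustration; they are NOT from this thread.
SAMPLES = {
    "disk 1": "     Raid Level : raid5\n         Events : 48210\n",
    "disk 2": "     Raid Level : raid5\n         Events : 48210\n",
    "disk 4": "     Raid Level : raid5\n         Events : 48195\n",
}

counts = {name: event_count(text) for name, text in SAMPLES.items()}
print(stale_members(counts))
```

A small gap, as in this invented example, suggests the member dropped out recently and is the one to clone and use; a very large gap suggests a member that has been out of the array for a long time.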
The issue is that it requires some knowledge and/or experience to do this. There are pitfalls: cloning in the wrong direction or incorrectly re-assembling the raid can lead to total data loss. In any case, I would imagine that you would need Netgear to at least help assemble the raid, so having them also start the cloning process (which takes many hours to finish) would probably make sense. I don't imagine that would add a lot to the cost of the work, and it makes it safer for you. The replacement drive can be a drive of the same size or larger. I am 99% sure of this. Either should do fine, and I don't imagine the clone process will cause any issues with a larger target drive, but checking with Netgear is the best thing here.
As for the alerts, I don't use gmail myself (due to privacy stance and tinfoil hats and so on :) ). But the NAS email service is like a forwarding agent that essentially logs in to your gmail account and sends the mail to yourself. This is something I think gmail might block by default. I am sure it is possible to get working, and the two guys that I tagged earlier in the post probably know more about this than I do. I will let them chime in on that.
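The two pitfalls around the clone itself (wrong direction, target too small) are easy to check before anything is written. Here is a minimal sketch in Python; the `Drive` class, the helper name and the new drive's serial and sector count are hypothetical, while the disk 4 values come from the disk info earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    label: str
    serial: str
    sectors: int


def clone_plan_problems(source, target):
    """Return a list of reasons not to start the clone (empty list = looks OK)."""
    problems = []
    if source.serial == target.serial:
        problems.append("source and target are the same drive")
    if target.sectors < source.sectors:
        missing = source.sectors - target.sectors
        problems.append(f"target is {missing} sectors too small")
    return problems


# Disk 4 as reported in this thread.
failing_disk4 = Drive("disk 4", "WD-WMC130D3MD5Z", 7814037168)
# Hypothetical larger replacement; serial and size are placeholders.
new_drive = Drive("new drive", "HYPOTHETICAL-0001", 15628053168)

print(clone_plan_problems(failing_disk4, new_drive))   # intended direction
print(clone_plan_problems(new_drive, failing_disk4))   # reversed by mistake
```

Note that the size check only catches a reversed clone when the new drive is larger. With two drives of equal size, the serial numbers are the only thing that tells source from target, so they are worth writing down before starting.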
Cheers
- Chappy316 Nov 15, 2021 Aspirant
rn_enthusiast wrote: As for doing it yourself, you theoretically can. Netgear aren't doing anything magical here but it requires a bit of knowledge. The other advantages of using Netgear to do it, would be that they can use the NAS itself for the cloning process. Makes it easier for you. One would need to examine the raid super blocks to ensure that you are cloning and using the correct disk to re-assemble the raid. Based on logs, it looks like disk 4 is the one we need to use and clone for the raid assembly but examination of the raid super-block would still be prudent. Next will be to monitor the cloning process and assess whether the clone was fully successful and then manually assemble the raid array using disk 1, 2 and the cloned disk 4.
The issue is that it requires some knowledge and/or experience to do this. There are pitfalls: cloning in the wrong direction or incorrectly re-assembling the raid can lead to total data loss. In any case, I would imagine that you would need Netgear to at least help assemble the raid, so having them also start the cloning process (which takes many hours to finish) would probably make sense. I don't imagine that would add a lot to the cost of the work, and it makes it safer for you. The replacement drive can be a drive of the same size or larger. I am 99% sure of this. Either should do fine, and I don't imagine the clone process will cause any issues with a larger target drive, but checking with Netgear is the best thing here.
This may sound a little scary, but it's all a dice roll to an extent, right? I have a friend who I very much trust with things of this nature, so I picked his brain. He seems fully confident that we can clone the drive. I would rather make an attempt here first versus paying the $200 consult fee to find out we can't do anything through NetGear. From the browsing I have done, it appears it's a $200 consult/first hour charge and then $150 an hour after that. This information was gathered from their Q&A pages as well as my (less than successful) chat with support last night.
He does have a couple things he wanted me to ask/verify with the help I have received here.
1) He knows we will have to do a bit-by-bit clone but wanted to make sure we can use an equal size or larger drive. You aren't the only one that is nearly 100% certain that we can do that. The existing drive is 4tb so I will need a fresh 4tb or larger drive to clone to. Ideally we want to increase the size of the array so I will be looking to get a larger drive as long as we can go that route. Regardless, they will all be upgraded but obviously recovery is the first priority, upgrading is second.
2) Should we power down the rest of the array to save power up time and stress on the drives that are still good? I don't want to remove or power down anything that I don't have to for fear of decreasing the chances of recovery that we are already limited to.
3) If we can manage to clone Disk 4 successfully, will it be as simple as remounting the drive in the array? Ideally yes, but I feel like that may not be the case from the sounds of it. What do you mean when you say we would need to manually assemble the array?
4) Should I consider getting two replacement drives and attempt to clone Disk 3 as well to maximize our chances of recovery? Given that Disk 3 is in much worse shape, is that even a potential option? I will ultimately need two drives but is cloning the more damaged disk even an option?
Thanks again everyone! You have all been more than helpful at this point.
- StephenB Nov 15, 2021 Guru - Experienced User
Chappy316 wrote:
1) He knows we will have to do a bit-by-bit clone but wanted to make sure we can use an equal size or larger drive.
Yes.
Chappy316 wrote:
2) Should we power down the rest of the array to save power up time and stress on the drives that are still good?
It's reasonable to do that now.
Chappy316 wrote:
3) If we can manage to clone Disk 4 successfully, will it be as simple as remounting the drive in the array? What do you mean when you say we would need to manually assemble the array?
You will begin by mounting the drive with the NAS powered down. Then power up. But the array will probably be out of sync (there will be changes to the volume that were never written to disk 4). In that case there would be some steps with mdadm and btrfs to force the array to mount. Likely there will be some file corruption/loss too. So you would need to enable ssh, and manually run some commands to do that.
Chappy316 wrote:
4) Should I consider getting two replacement drives and attempt to clone Disk 3 as well to maximize our chances of recovery? Given that Disk 3 is in much worse shape, is that even a potential option? I will ultimately need two drives but is cloning the more damaged disk even an option?
I'd get two replacement drives. But I wouldn't attempt to clone disk 3. If you are successful, you will still have a degraded array. You can hot-insert a blank disk 3, in order to recover from that part. Though I'd urge you to make a backup to external storage before you do that.
- StephenB Nov 13, 2021 Guru - Experienced User
Chappy316 wrote:
As far as alerts go, apparently it doesn't want to play well with gmail or I am missing something simple. I tried getting it set up with my account so I can receive them and Google won't allow the simple one button sign in, even after I turned on the ability to use less secure apps. Attempting to manually enter the email credentials doesn't help either as it throws an SMTP error when sending a test message.
For gmail, you can use these settings:
- set gmail to allow less secure apps
- set gmail to not use two-factor authentication
- smtp server: smtp.gmail.com
- smtp port: 465
- tls: checked
- username and from fields set to full email address (including gmail.com).
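If the NAS test message keeps failing, it can help to try the same settings from a PC to see whether the problem is on the gmail side or the NAS side. Here is a minimal sketch using Python's standard `smtplib`, assuming the server and port listed above; the function names are made up, and the address and password are placeholders. Whether gmail accepts the login depends on the account's security settings.

```python
import smtplib
from email.message import EmailMessage

# Settings from the list above.
SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 465  # TLS from the first byte, hence SMTP_SSL below


def build_test_message(address):
    """Build a message sent from the account to itself, like the NAS alerts."""
    message = EmailMessage()
    message["From"] = address
    message["To"] = address
    message["Subject"] = "NAS alert test"
    message.set_content("Test message sent with the NAS alert SMTP settings.")
    return message


def send_test_message(address, password):
    """Return 'sent' or a short description of what went wrong."""
    try:
        with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT, timeout=30) as server:
            server.login(address, password)
            server.send_message(build_test_message(address))
    except smtplib.SMTPAuthenticationError as error:
        return f"login rejected: {error.smtp_code}"
    except (smtplib.SMTPException, OSError) as error:
        return f"could not send: {error}"
    return "sent"


if __name__ == "__main__":
    # Placeholders: use the full address (including gmail.com) and your password.
    print(send_test_message("your.name@gmail.com", "your-password"))
```

A "login rejected" result points at the gmail account settings rather than the NAS, while a connection error points at the network or the server/port values.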
- Chappy316 Nov 13, 2021 Aspirant
StephenB wrote:
For gmail, you can
- set gmail to allow less secure apps
- set gmail to not use two-factor authentication
- smtp server: smtp.gmail.com
- smtp port: 465
- tls: checked
- username and from fields set to full email address (including gmail.com).
For whatever reason, those settings give me an error that simply says "Cannot send a test message. Check SMTP server settings." I attempted the simple one click with less secure apps enabled and that didn't work, so I tried manually entering the info to no avail.
- Sandshark Nov 13, 2021 Sensei - Experienced User
I use other NAS for my backup and haven't connected a current model USB drive to one in years, so I can make no recommendation. Heck, I don't even own one.
Typically, you can use a larger drive for a cloning process, but I don't know Netgear's specific requirements.
The recovery process will recover your files to an external device, so getting that USB drive is quite important and it can't just be for "important files", you need sufficient storage to hold everything you want to recover. Ultimately, you're going to want to do a factory default and restore all files from backup to get a "clean" system.