Forum Discussion
halbertn
Mar 27, 2024, Aspirant
RN 316 - Raid 5 config, Data Volume Dead
Hello. I have a ReadyNas 316 with 6x hd running in a raid 5 configuration. The other day I had disk2 go from ONLINE to FAILED and the system entered a degraded state. This forced a resync. As it was ...
Sandshark
Mar 30, 2024, Sensei
If you use a VM, the images must be write enabled -- VirtualBox requires that they are. They will behave like real drives in a real NAS, except be slower. So they, too, are likely to be out of sync and require a forced re-assembly.
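(For reference, a minimal sketch of checking that an attached clone is writable in VirtualBox, assuming the images have already been converted to .vdi; the file name disk1.vdi is just a placeholder:)
# Sketch only; disk1.vdi is a placeholder file name.
VBoxManage showmediuminfo disk disk1.vdi                 # look at the medium type field
VBoxManage modifymedium disk disk1.vdi --type normal     # make sure it is a normal, writable disk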
halbertn
Apr 03, 2024, Aspirant
Recap: I had 6 HDDs in a RAID 5 configuration; the array failed and the volume was reported as dead.
Cause of failure: while running a scrub, disk2 fell out of sync forcing the raid array to resync it. During resync, disk3 failed.
Update:
I’ve begun the process of cloning each of my 6 drives to an image using ddrescue. Prior to performing the clones, I ran a SMART test to get the status of each drive.
Disk 1, 4, 6 - test passed
Disk 2, 3, 5 - test failed
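(For reference, a minimal sketch of the kind of SMART check involved here, assuming smartmontools is used; /dev/sdb is just a placeholder device name:)
# Sketch only; /dev/sdb is a placeholder device name.
smartctl --test=short /dev/sdb     # start a short self-test
smartctl --all /dev/sdb            # afterwards, review the self-test log and SMART attributes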
Assuming I would be successful in cloning disks 1, 4, and 6, I began by cloning the failed disks first.
Disk 2 - clone was successful! No errors
Disk 3 - Stopped cloning after reaching 66% due to too many errors. Gave up on this disk.
Disk 5 - cloned 99.99% of the drive.
Here is the output from ddrescue on disk5:
root@mint:~# ddrescue --verbose --idirect --no-scrape /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
GNU ddrescue 1.23
About to copy 3000 GBytes from '/dev/sdb' to '/media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 58624 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
ipos: 6716 kB, non-trimmed: 0 B, current rate: 8874 B/s
opos: 6716 kB, non-scraped: 3072 B, average rate: 104 MB/s
non-tried: 0 B, bad-sector: 1024 B, error rate: 170 B/s
rescued: 3000 GB, bad areas: 2, run time: 8h 28s
pct rescued: 99.99%, read errors: 3, remaining time: 1s
time since last successful read: 0s
Finished
root@mint:~# ddrescue --verbose --idirect -r3 --no-scrape /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
GNU ddrescue 1.23
About to copy 3000 GBytes from '/dev/sdb' to '/media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 58624 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 3000 GB, tried: 4096 B, bad-sector: 1024 B, bad areas: 2
Current status
ipos: 6716 kB, non-trimmed: 0 B, current rate: 0 B/s
opos: 6716 kB, non-scraped: 3072 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 1024 B, error rate: 256 B/s
rescued: 3000 GB, bad areas: 2, run time: 13s
pct rescued: 99.99%, read errors: 6, remaining time: n/a
time since last successful read: n/a
Finished
root@mint:~# ddrescue --verbose --idirect -r3 /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
GNU ddrescue 1.23
About to copy 3000 GBytes from '/dev/sdb' to '/media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 58624 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 3000 GB, tried: 4096 B, bad-sector: 1024 B, bad areas: 2
Current status
ipos: 6716 kB, non-trimmed: 0 B, current rate: 0 B/s
opos: 6716 kB, non-scraped: 0 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 4096 B, error rate: 256 B/s
rescued: 3000 GB, bad areas: 1, run time: 1m 4s
pct rescued: 99.99%, read errors: 30, remaining time: n/a
time since last successful read: n/a
Finished
Here is its map log file:
# Mapfile. Created by GNU ddrescue version 1.23
# Command line: ddrescue --verbose --idirect -r3 /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
# Start time: 2024-04-02 13:25:45
# Current time: 2024-04-02 13:26:36
# Finished
# current_pos current_status current_pass
0x00667E00 + 3
# pos size status
0x00000000 0x00667000 +
0x00667000 0x00001000 -
0x00668000 0x2BAA0E0E000 +
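(To put the mapfile in sector terms, a quick sketch of the arithmetic, using the 512-byte sector size reported above; the single bad region sits at offset 0x00667000 with size 0x1000:)
# Convert the one remaining bad region in the mapfile into sectors (512-byte sectors).
echo $(( 0x00667000 / 512 ))   # first unreadable sector: 13112
echo $(( 0x1000 / 512 ))       # unreadable sectors: 8 (one 4 KiB block)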
Question: what should I do with this drive or image? Should I attempt to fix the image by mounting it and running a filesystem check (chkdsk)? Or should I proceed with rebuilding the RAID array as-is after I have finished cloning disks 1, 4, and 6?
Thank you!
- StephenB, Apr 03, 2024, Guru - Experienced User
halbertn wrote:
Question: what should I do with this drive or image? Should I attempt to fix the image by mounting it and running chkdsk to fix it? Or should I proceed with rebuilding the raid array as is after I have finished cloning disk 1,4,6?
I don't see any point to chkdsk, as there are no errors in the image (just some sectors that failed to copy).
So I would continue cloning, and then attempt to assemble the RAID array.
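(If the assembly is attempted straight from the image files on a Linux host, rather than inside a ReadyNAS VM, a rough sketch of exposing them as block devices first; the paths are placeholders:)
# Sketch: map each cloned image to a loop device, scanning its partition table.
losetup --find --show --partscan /path/to/disk1.img    # prints e.g. /dev/loop0
losetup --find --show --partscan /path/to/disk2.img    # repeat for each image
# The data partitions then appear as /dev/loopXp3 and can be examined before assembly:
mdadm --examine /dev/loop0p3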
- halbertn, Apr 03, 2024, Aspirant
I have been able to successfully clone 100% of disks 1, 2, 4, and 6.
Disk 5 is at 99.9% with a missing 4 KB block. I now have my 5 of 6 disk images and am ready to proceed with RAID reassembly. The simplest way to re-assemble the RAID will be to use a ReadyNAS VM in a Linux environment.
Before I proceed, I think I should do the following:
- Purchase another 20TB hard drive.
- Back up the 5 cloned images onto this new drive.
My reasoning: I know the RAID re-assembly will write to each image. With a backup, I can revert to the images if I ever need to and never touch my physical hard drives.
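(A minimal sketch of that backup step, with placeholder paths; the checksums confirm the copies match before the originals get written to during re-assembly:)
SRC=/path/to/hdd_images     # placeholder: where the ddrescue images live
DST=/path/to/backup20tb     # placeholder: mount point of the new 20TB drive
rsync -av --progress "$SRC"/ "$DST"/hdd_images/
( cd "$SRC" && sha256sum *.img ) > /tmp/src.sha256
( cd "$DST"/hdd_images && sha256sum *.img ) > /tmp/dst.sha256
diff /tmp/src.sha256 /tmp/dst.sha256 && echo "backup verified"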
Question: At this point, what do you think my odds of success are if I proceed to re-assemble the RAID array?
Question: Are there any other unknowns that may curb my success?
One item lurking in the back of my mind is that the whole RAID failed while attempting to resync disk 2, which makes me question the data integrity of disk 2. However,
StephenB wrote:
It depends on whether you were doing a lot of other writes to the volume during the resync. If the scrub/resync is all that was happening, then the data being written to the disk would have been identical to what was already on the disk. Likely there was some other activity. But I think it is reasonable to try recovery with disk 2 if you can clone it. There is no harm in trying.
I do want to pause a moment to assess my progress and chance of success with the new information I have gathered from reaching the milestone of cloning 5 of 6 drives. I'm about to invest in another 20TB drive to back up my images, so I want to reassess my odds at this point. Better yet, if there are any additional tools I can run to gather more data to assess my success rate, now would be the best time to run them.
StephenB wrote:
I've helped quite a few people deal with failed volumes over the years. Honestly your odds of success aren't good.
What do you think StephenB...Have my odds gone up now?
Thank you!
- StephenB, Apr 03, 2024, Guru - Experienced User
halbertn wrote:
What do you think StephenB...Have my odds gone up now?
If the 5 successful clones only lost one block on disk 5, then the odds have gone up.
Whether that missing block is a real issue or not depends on what sector it is. If it's free space, then it would have no effect whatsoever.
One thing to consider is that after the dust settles you could use the 2x20TB drives in the RN316, giving you 20 TB of RAID-1 storage.
- Sandshark, Apr 04, 2024, Sensei
You can always make another image of the 4 healthy drives, so backing up those images only saves you time in case you need to start over.
- StephenB, Apr 04, 2024, Guru - Experienced User
Sandshark wrote:
You can always make another image of the 4 healthy drives, so backing up those images only saves you time in case you need to start over.
FWIW, it's not clear how healthy they are, given that several failed the self test (per the logs).
That said, I agree that disk 5 is the most important one to back up, so if there is enough storage on the PC for that image, halbertn could skip the others.
Though 2x20TB is a reasonable way to start over on the NAS, and halbertn will likely also need storage to offload the files.
- halbertn, Apr 05, 2024, Aspirant
StephenB wrote:
Sandshark wrote:
You can always make another image of the 4 healthy drives, so backing up those images only saves you time in case you need to start over.
FWIW, it's not clear how healthy they are, given that several failed the self test (per the logs).
That said, I agree that disk 5 is the most important one to back up, so if there is enough storage on the PC for that image, halbertn could skip the others.
Since my plan is to use VirtualBox to run the ReadyNAS OS VM, I will need the extra hard drive storage so that I can convert each .img into a .vdi and store them. Unfortunately, VirtualBox only supports .vdi. (A sketch of the conversion command follows at the end of this post.)
Though 2x20TB is a reasonable way to start over on the NAS, and halbertn will likely also need storage to offload the files.
Yes, I like this idea. I've already ordered my second drive.
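(A minimal sketch of the raw-image-to-VDI conversion mentioned above; file names are placeholders:)
# Convert a raw ddrescue image into a VDI that VirtualBox can attach.
VBoxManage convertfromraw disk1.img disk1.vdi --format VDI
Note that this roughly doubles the storage needed per disk while both copies exist, which is why the extra drive matters.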
- halbertn, Apr 07, 2024, Aspirant
StephenB Sandshark
I finished converting all my .img files to .vdi so that I could mount them using VirtualBox. I assigned each drive as follows:
ReadyNasVM.vdk - SATA port 0
Disk1.vdi - SATA port 1
Disk2.vdi - SATA port 2
Disk4.vdi - SATA port 4
Disk5.vdi - SATA port 5
Disk6.vdi - SATA port 6
I intentionally skipped SATA port 3, as that is the slot disk 3 went in, but my disk 3 clone was bad, so I'm ignoring that slot for now.
I booted up the VM in VirtualBox. However, my RAID 5 volume is not recognized. I have an error stating "Remove inactive volumes to use the disk. Disk #1, 2, 4, 5, 6". I also have two data volumes, both of which are inactive.
I'm including a screenshot below my post. I also have a zip of the latest logs, but I can't attach zips to this post. If you can point me to which log you'd like to review, I can include that in my next post.
Any ideas on what went wrong? Or am I missing a step to rebuild the RAID array? (I assumed ReadyNAS would do it automatically on boot.)
- StephenB, Apr 07, 2024, Guru - Experienced User
halbertn wrote:
I'm including a screenshot below my post. I also have a zip of the latest logs, but I can't attach zips to this post. If you can point me to which log you'd like to review, I can include that in my next post.
Likely the volume is out of sync.
The best approach is to get me the full log zip. Do that in a private message (PM) using the envelope link in the upper right of the forum page. Put the log zip into cloud storage, and include a link in the PM. Make sure the permissions are set so anyone with the link can download.
- halbertn, Apr 07, 2024, Aspirant
StephenB, I sent you a DM with a link to the logs.zip. I tried to also include the message below, but I don't think it was formatted properly. Including it here in case you have trouble reading it:
If you look at systemd-journal.log beginning at line 3390, you'll see the following:
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdd3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdf3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sde3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdc3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdb3>
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sdb3 operational as raid disk 0
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sde3 operational as raid disk 5
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sdf3 operational as raid disk 4
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sdd3 operational as raid disk 3
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: allocated 6474kB
Apr 06 19:24:02 nas-homezone start_raids[1295]: mdadm: failed to RUN_ARRAY /dev/md/data-0: Input/output error
Apr 06 19:24:02 nas-homezone start_raids[1295]: mdadm: Not enough devices to start the array.
Apr 06 19:24:02 nas-homezone systemd[1]: Started MD arrays.
Apr 06 19:24:02 nas-homezone systemd[1]: Reached target Local File Systems (Pre).
Apr 06 19:24:02 nas-homezone systemd[1]: Reached target Swap.
Apr 06 19:24:02 nas-homezone systemd[1]: Starting udev Coldplug all Devices...
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: not enough operational devices (2/6 failed)
Apr 06 19:24:02 nas-homezone kernel: RAID conf printout:
Apr 06 19:24:02 nas-homezone kernel: --- level:5 rd:6 wd:4
Apr 06 19:24:02 nas-homezone kernel: disk 0, o:1, dev:sdb3
Apr 06 19:24:02 nas-homezone kernel: disk 3, o:1, dev:sdd3
Apr 06 19:24:02 nas-homezone kernel: disk 4, o:1, dev:sdf3
Apr 06 19:24:02 nas-homezone kernel: disk 5, o:1, dev:sde3
Notice that sdc3 is not included in the RAID array. The device sdc matches disk2.vdi, which was a clone of disk 2 - if you recall, disk 2 was the drive that fell out of sync in the original ReadyNAS hardware unit. That forced a resync against disk 2, which led to disk 3 failing and the volume dying.
I wonder if this means disk2 is also bad and unusable for rebuilding the array?
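(One way to answer that from the clones themselves is to compare the md superblock event counters: a member whose Events count is far behind the others is stale, while one that is close usually survives a forced assembly. A sketch, using the sdX3 names from the journal excerpt above:)
# Sketch: check event counts and roles on each data partition inside the VM.
mdadm --examine /dev/sd[b-f]3 | grep -E '^/dev/|Events|Update Time|Array State'
# If sdc3's Events count is close to the others, a forced start that includes it
# (and simply omits the missing disk 3) may still bring the array up degraded:
mdadm --stop /dev/md127
mdadm --assemble --force --run /dev/md127 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3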