Forum Discussion
halbertn
Mar 27, 2024, Aspirant
RN 316 - Raid 5 config, Data Volume Dead
Hello. I have a ReadyNas 316 with 6x hd running in a raid 5 configuration. The other day I had disk2 go from ONLINE to FAILED and the system entered a degraded state. This forced a resync. As it was ...
Sandshark
Mar 30, 2024, Sensei
If you use a VM, the images must be write enabled -- VirtualBox requires that they are. They will behave like real drives in a real NAS, except be slower. So they, too, are likely to be out of sync and require a forced re-assembly.
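(For reference, a minimal sketch of checking that an attached clone is writable in VirtualBox, assuming the images have already been converted to .vdi; the file name disk1.vdi is just a placeholder:)
# Sketch only; disk1.vdi is a placeholder file name.
VBoxManage showmediuminfo disk disk1.vdi                 # look at the medium type field
VBoxManage modifymedium disk disk1.vdi --type normal     # make sure it is a normal, writable disk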
halbertn
Apr 03, 2024, Aspirant
Recap: I had 6 HDDs in a RAID 5 configuration; the array failed and the volume was reported as dead.
Cause of failure: while running a scrub, disk2 fell out of sync forcing the raid array to resync it. During resync, disk3 failed.
Update:
I’ve begun the process of cloning each of my 6 drives to an image using ddrescue. Prior to performing the clones, I ran a SMART test to get the status of each drive.
Disk 1, 4, 6 - test passed
Disk 2, 3, 5 - test failed
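(For reference, a minimal sketch of the kind of SMART check involved here, assuming smartmontools is used; /dev/sdb is just a placeholder device name:)
# Sketch only; /dev/sdb is a placeholder device name.
smartctl --test=short /dev/sdb     # start a short self-test
smartctl --all /dev/sdb            # afterwards, review the self-test log and SMART attributes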
Assuming I would be successful in cloning disks 1, 4, and 6, I began by cloning the failed disks first.
Disk 2 - clone was successful! No errors
Disk 3 - Stopped cloning after reaching 66% due to too many errors. Gave up on this disk.
Disk 5 - cloned 99.99% of the drive.
Here is the output from ddrescue on disk5:
root@mint:~# ddrescue --verbose --idirect --no-scrape /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
GNU ddrescue 1.23
About to copy 3000 GBytes from '/dev/sdb' to '/media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 58624 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
ipos: 6716 kB, non-trimmed: 0 B, current rate: 8874 B/s
opos: 6716 kB, non-scraped: 3072 B, average rate: 104 MB/s
non-tried: 0 B, bad-sector: 1024 B, error rate: 170 B/s
rescued: 3000 GB, bad areas: 2, run time: 8h 28s
pct rescued: 99.99%, read errors: 3, remaining time: 1s
time since last successful read: 0s
Finished
root@mint:~# ddrescue --verbose --idirect -r3 --no-scrape /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
GNU ddrescue 1.23
About to copy 3000 GBytes from '/dev/sdb' to '/media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 58624 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 3000 GB, tried: 4096 B, bad-sector: 1024 B, bad areas: 2
Current status
ipos: 6716 kB, non-trimmed: 0 B, current rate: 0 B/s
opos: 6716 kB, non-scraped: 3072 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 1024 B, error rate: 256 B/s
rescued: 3000 GB, bad areas: 2, run time: 13s
pct rescued: 99.99%, read errors: 6, remaining time: n/a
time since last successful read: n/a
Finished
root@mint:~# ddrescue --verbose --idirect -r3 /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
GNU ddrescue 1.23
About to copy 3000 GBytes from '/dev/sdb' to '/media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log'
Starting positions: infile = 0 B, outfile = 0 B
Copy block size: 128 sectors Initial skip size: 58624 sectors
Sector size: 512 Bytes
Press Ctrl-C to interrupt
Initial status (read from mapfile)
rescued: 3000 GB, tried: 4096 B, bad-sector: 1024 B, bad areas: 2
Current status
ipos: 6716 kB, non-trimmed: 0 B, current rate: 0 B/s
opos: 6716 kB, non-scraped: 0 B, average rate: 0 B/s
non-tried: 0 B, bad-sector: 4096 B, error rate: 256 B/s
rescued: 3000 GB, bad areas: 1, run time: 1m 4s
pct rescued: 99.99%, read errors: 30, remaining time: n/a
time since last successful read: n/a
Finished
Here is its map log file:
# Mapfile. Created by GNU ddrescue version 1.23
# Command line: ddrescue --verbose --idirect -r3 /dev/sdb /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/hdd_images/disk5.log /media/mint/1d4d6331-7e59-4156-824d-16f53d438b19/disk5.log
# Start time: 2024-04-02 13:25:45
# Current time: 2024-04-02 13:26:36
# Finished
# current_pos current_status current_pass
0x00667E00 + 3
# pos size status
0x00000000 0x00667000 +
0x00667000 0x00001000 -
0x00668000 0x2BAA0E0E000 +
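(To put the mapfile in sector terms, a quick sketch of the arithmetic, using the 512-byte sector size reported above; the single bad region sits at offset 0x00667000 with size 0x1000:)
# Convert the one remaining bad region in the mapfile into sectors (512-byte sectors).
echo $(( 0x00667000 / 512 ))   # first unreadable sector: 13112
echo $(( 0x1000 / 512 ))       # unreadable sectors: 8 (one 4 KiB block)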
Question: what should I do with this drive or image? Should I attempt to fix the image by mounting it and running a filesystem check (chkdsk)? Or should I proceed with rebuilding the RAID array as-is after I have finished cloning disks 1, 4, and 6?
Thank you!
- StephenB, Apr 03, 2024, Guru - Experienced User
halbertn wrote:
Question: what should I do with this drive or image? Should I attempt to fix the image by mounting it and running chkdsk to fix it? Or should I proceed with rebuilding the raid array as is after I have finished cloning disk 1,4,6?
I don't see any point to chkdsk, as there are no errors in the image (just some sectors that failed to copy).
So I would continue cloning, and then attempt to assemble the RAID array.
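(If the assembly is attempted straight from the image files on a Linux host, rather than inside a ReadyNAS VM, a rough sketch of exposing them as block devices first; the paths are placeholders:)
# Sketch: map each cloned image to a loop device, scanning its partition table.
losetup --find --show --partscan /path/to/disk1.img    # prints e.g. /dev/loop0
losetup --find --show --partscan /path/to/disk2.img    # repeat for each image
# The data partitions then appear as /dev/loopXp3 and can be examined before assembly:
mdadm --examine /dev/loop0p3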
- halbertn, Apr 03, 2024, Aspirant
I have been able to successfully clone 100% of disks 1, 2, 4, and 6.
Disk 5 is at 99.9% with a missing 4 KB block. I now have my 5 of 6 disk images and am ready to proceed with RAID reassembly. The simplest way to re-assemble the RAID will be to use a ReadyNAS VM in a Linux environment.
Before I proceed, I think I should do the following:
- Purchase another 20TB hard drive.
- Back up the 5 cloned images onto this new drive.
My reasoning: I know the RAID re-assembly will write to each image. With a backup, I can revert to the images if I ever need to and never touch my physical hard drives.
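(A minimal sketch of that backup step, with placeholder paths; the checksums confirm the copies match before the originals get written to during re-assembly:)
SRC=/path/to/hdd_images     # placeholder: where the ddrescue images live
DST=/path/to/backup20tb     # placeholder: mount point of the new 20TB drive
rsync -av --progress "$SRC"/ "$DST"/hdd_images/
( cd "$SRC" && sha256sum *.img ) > /tmp/src.sha256
( cd "$DST"/hdd_images && sha256sum *.img ) > /tmp/dst.sha256
diff /tmp/src.sha256 /tmp/dst.sha256 && echo "backup verified"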
Question: At this point, what do you think my odds of success are if I proceed to re-assemble the RAID array?
Question: Are there any other unknowns that may curb my success?
One item lurking in the back of my mind is that the whole RAID failed while attempting to resync disk 2, which makes me question the data integrity of disk 2. However,
StephenB wrote:
It depends on whether you were doing a lot of other writes to the volume during the resync. If the scrub/resync is all that was happening, then the data being written to the disk would have been identical to what was already on the disk. Likely there was some other activity. But I think it is reasonable to try recovery with disk 2 if you can clone it. There is no harm in trying.
I do want to pause a moment to assess my progress and chance of success with the new information I have gathered from reaching the milestone of cloning 5 of 6 drives. I'm about to invest in another 20TB drive to back up my images, so I want to reassess my odds at this point. Better yet, if there are any additional tools I can run to gather more data to assess my success rate, now would be the best time to run them.
StephenB wrote:
I've helped quite a few people deal with failed volumes over the years. Honestly your odds of success aren't good.
What do you think StephenB...Have my odds gone up now?
Thank you!
- StephenB, Apr 03, 2024, Guru - Experienced User
halbertn wrote:
What do you think StephenB...Have my odds gone up now?
If the 5 successful clones only lost one block on disk 5, then the odds have gone up.
Whether that missing block is a real issue or not depends on what sector it is. If it's free space, then it would have no effect whatsoever.
One thing to consider is that after the dust settles you could use the 2x20TB drives in the RN316, giving you 20 TB of RAID-1 storage.
- Sandshark, Apr 04, 2024, Sensei
You can always make another image of the 4 healthy drives, so backing up those images only saves you time in case you need to start over.
- StephenB, Apr 04, 2024, Guru - Experienced User
Sandshark wrote:
You can always make another image of the 4 healthy drives, so backing up those images only saves you time in case you need to start over.
FWIW, it's not clear how healthy they are, given that several failed the self test (per the logs).
That said, I agree that disk 5 is the most important one to back up, so if there is enough storage on the PC for that image, halbertn could skip the others.
Though 2x20TB is a reasonable way to start over on the NAS, and halbertn will likely also need storage to offload the files.
- halbertn, Apr 05, 2024, Aspirant
StephenB wrote:
Sandshark wrote:
You can always make another image of the 4 healthy drives, so backing up those images only saves you time in case you need to start over.
FWIW, it's not clear how healthy they are, given that several failed the self test (per the logs).
That said, I agree that disk 5 is the most important one to back up, so if there is enough storage on the PC for that image, halbertn could skip the others.
Since my plan is to use VirtualBox to run the ReadyNAS OS VM, I will need the extra hard drive storage so that I can convert each .img into a .vdi and store them. Unfortunately, VirtualBox only supports .vdi. (A sketch of the conversion command follows at the end of this post.)
Though 2x20TB is a reasonable way to start over on the NAS, and halbertn will likely also need storage to offload the files.
Yes, I like this idea. I've already ordered my second drive.
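(A minimal sketch of the raw-image-to-VDI conversion mentioned above; file names are placeholders:)
# Convert a raw ddrescue image into a VDI that VirtualBox can attach.
VBoxManage convertfromraw disk1.img disk1.vdi --format VDI
Note that this roughly doubles the storage needed per disk while both copies exist, which is why the extra drive matters.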
- halbertn, Apr 07, 2024, Aspirant
StephenB Sandshark
I finished converting all my .img files to .vdi so that I could mount them using VirtualBox. I assigned each drive as follows:
ReadyNasVM.vdk - SATA port 0
Disk1.vdi - SATA port 1
Disk2.vdi - SATA port 2
Disk4.vdi - SATA port 4
Disk5.vdi - SATA port 5
Disk6.vdi - SATA port 6
I intentionally skipped SATA port 3, as that is the slot disk 3 went in, but my disk 3 clone was bad, so I'm ignoring that slot for now.
I booted up the VM in VirtualBox. However, my RAID 5 volume is not recognized. I have an error stating "Remove inactive volumes to use the disk. Disk #1, 2, 4, 5, 6". I also have two data volumes, both of which are inactive.
I'm including a screenshot below my post. I also have a zip of the latest logs, but I can't attach zips to this post. If you can point me to which log you'd like to review, I can include that in my next post.
Any ideas on what went wrong? Or am I missing a step to rebuild the RAID array? (I assumed ReadyNAS would do it automatically on boot.)
- StephenB, Apr 07, 2024, Guru - Experienced User
halbertn wrote:
I'm including a screenshot below my post. I also have a zip of the latest logs, but I can't attach zips to this post. If you can point me to which log you'd like to review, I can include that in my next post.
Likely the volume is out of sync.
The best approach is to get me the full log zip. Do that in a private message (PM) using the envelope link in the upper right of the forum page. Put the log zip into cloud storage, and include a link in the PM. Make sure the permissions are set so anyone with the link can download.
- halbertn, Apr 07, 2024, Aspirant
StephenB, I sent you a DM with a link to the logs.zip. I tried to also include the message below, but I don't think it was formatted properly. Including it here in case you have trouble reading it:
If you look at systemd-journal.log beginning at line 3390, you'll see the following:
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdd3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdf3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sde3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdc3>
Apr 06 19:24:02 nas-homezone kernel: md: bind<sdb3>
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sdb3 operational as raid disk 0
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sde3 operational as raid disk 5
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sdf3 operational as raid disk 4
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: device sdd3 operational as raid disk 3
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: allocated 6474kB
Apr 06 19:24:02 nas-homezone start_raids[1295]: mdadm: failed to RUN_ARRAY /dev/md/data-0: Input/output error
Apr 06 19:24:02 nas-homezone start_raids[1295]: mdadm: Not enough devices to start the array.
Apr 06 19:24:02 nas-homezone systemd[1]: Started MD arrays.
Apr 06 19:24:02 nas-homezone systemd[1]: Reached target Local File Systems (Pre).
Apr 06 19:24:02 nas-homezone systemd[1]: Reached target Swap.
Apr 06 19:24:02 nas-homezone systemd[1]: Starting udev Coldplug all Devices...
Apr 06 19:24:02 nas-homezone kernel: md/raid:md127: not enough operational devices (2/6 failed)
Apr 06 19:24:02 nas-homezone kernel: RAID conf printout:
Apr 06 19:24:02 nas-homezone kernel: --- level:5 rd:6 wd:4
Apr 06 19:24:02 nas-homezone kernel: disk 0, o:1, dev:sdb3
Apr 06 19:24:02 nas-homezone kernel: disk 3, o:1, dev:sdd3
Apr 06 19:24:02 nas-homezone kernel: disk 4, o:1, dev:sdf3
Apr 06 19:24:02 nas-homezone kernel: disk 5, o:1, dev:sde3
Notice that sdc3 is not included in the RAID array. The device sdc matches disk2.vdi, which was a clone of disk 2 - if you recall, disk 2 was the drive that fell out of sync in the original ReadyNAS hardware unit. That forced a resync against disk 2, which led to disk 3 failing and the volume dying.
I wonder if this means disk2 is also bad and unusable for rebuilding the array?
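(One way to answer that from the clones themselves is to compare the md superblock event counters: a member whose Events count is far behind the others is stale, while one that is close usually survives a forced assembly. A sketch, using the sdX3 names from the journal excerpt above:)
# Sketch: check event counts and roles on each data partition inside the VM.
mdadm --examine /dev/sd[b-f]3 | grep -E '^/dev/|Events|Update Time|Array State'
# If sdc3's Events count is close to the others, a forced start that includes it
# (and simply omits the missing disk 3) may still bring the array up degraded:
mdadm --stop /dev/md127
mdadm --assemble --force --run /dev/md127 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3 /dev/sdf3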