
RN316 - Rejoin drive to dead volume?

nifter
Aspirant

RN316 - Rejoin drive to dead volume?

I've been using my RN316 (OS 6.10.7) for many years.

X-Raid with 4TB+4TB +4TB+6TB drives.

I got a new 10 TB drive I was going to use for backup. I had trouble getting the RN to detect it via eSATA or in a USB dock, so I plugged it into bay 6 of the NAS to see if it would detect it. It was in there for about 15 seconds, then I pulled it out.

The array said degraded, but I didn't see how those 15 seconds would have caused any significant changes.

I continued copying data from my RN to other targets using TeraCopy to verify all files, without any issue.

When I started to run out of space, I did something stupid. I thought the RAID should be able to continue with one drive failed, so I pulled the 6TB drive. It was only out for about 30 seconds before I saw the errors on the web console and LCD. I quickly reinserted the drive.

 

Now the data volume says "Volume is inactive or dead," but I can still access it. I see nearly everything. Some folders are empty, and some return an I/O error when I attempt to ls them over SSH.

The volumes page has a new "data-0" volume (14.54 TB) and a "data-1" volume (1.82 TB), with the original data volume shown as inactive.

 

I know it's a long shot, but is there any Linux/BTRFS magic I could use to rejoin that 6TB drive to the volume?

I'm Linux-savvy.

I have a log bundle.

 

Message 1 of 6
StephenB
Guru

Re: RN316 - Rejoin drive to dead volume?


@nifter wrote:

 

I got a new 10 TB drive I was going to use for backup. I had trouble getting the RN to detect it via eSATA or in a USB dock, so I plugged it into bay 6 of the NAS to see if it would detect it. It was in there for about 15 seconds, then I pulled it out.

The array said degraded, but I didn't see how those 15 seconds would have caused any significant changes.

 


Bad idea if you were running XRAID.  The NAS would have tried to automatically add the 10 TB drive to the array. Once the NAS thinks the drive is part of the array, the volume will be flagged as degraded whenever that drive is missing.
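If you want to confirm that from SSH, the array's member count gives it away: once the NAS grafts a drive in, the RAID device count grows and any missing member shows up as removed. A quick check, assuming the data array is md127 (as it typically is on OS 6):

mdadm --detail /dev/md127 | grep -E 'Raid Devices|removed'   # a device count higher than the number of drives installed means a member was added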

 


@nifter wrote:

 

When I started to run out of space, I did something stupid. I thought the RAID should be able to continue with one drive failed, so I pulled the 6TB drive. It was only out for about 30 seconds before I saw the errors on the web console and LCD. I quickly reinserted the drive.

 


If the volume was already identified as degraded when you did this, then the NAS likely saw this as a two-drive failure, and therefore marked the volume as dead.

 


@nifter wrote:

 

Now the data volume says "Volume is inactive or dead," but I can still access it. I see nearly everything. Some folders are empty, and some return an I/O error when I attempt to ls them over SSH.

The volumes page has a new "data-0" volume (14.54 TB) and a "data-1" volume (1.82 TB), with the original data volume shown as inactive.

 

I know it's a long shot, but is there any Linux/BTRFS magic I could use to rejoin that 6TB drive to the volume?

I'm Linux-savvy.

I have a log bundle.

 


You could try manually removing the 10 TB drive from the array with mdadm.  That should restore the volume to degraded status, which would allow the 6 TB drive to be resynced.
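One way to do that, as a sketch only: since the 10 TB drive is no longer physically present, mdadm's detached keyword targets members whose device nodes have disappeared (assuming the data array is md127; depending on how the slot is recorded, these may report there is nothing to do):

mdadm /dev/md127 --fail detached     # mark any absent members as failed
mdadm /dev/md127 --remove detached   # then drop them from the array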

 

@Sandshark has played with these commands more than I have, so hopefully he will chime in.


 

Message 2 of 6
Sandshark
Sensei

Re: RN316 - Rejoin drive to dead volume?

The NAS has labeled two parts of the former "data" array as data-0 and data-1 because it can't have multiple volumes named "data".  Those are also the names given to the MDADM RAID groups of a two-group volume named "data".

 

I don't know of anything that can fix your situation, as the commands would normally need to operate on the unmounted "data" volume.  If you had come here before you removed that second drive, I could have helped.  But look at my post Reducing-RAID-size-removing-drives-WITHOUT-DATA-LOSS-is-possible and post the results of the commands I use at the beginning to verify the configuration.  You'll need at least these:

cat /proc/mdstat
btrfs filesystem show /data
btrfs filesystem show /data-0
btrfs filesystem show /data-1
mdadm --detail /dev/md127
mdadm --detail /dev/md126 (if md126 shows up in the btrfs command responses)

and the same for any other RAID volumes that show up.

 

From there, I'll see if it looks like something is possible, but I don't have high hopes, and whatever you try will not be something I've tried before.

Message 3 of 6
nifter
Aspirant

Re: RN316 - Rejoin drive to dead volume?

@Sandshark Thanks for having a look.

admin@Wingnut:/$ cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sdf2[3] sdd2[2] sdc2[1] sdb2[0]
1044480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

md126 : active raid1 sde4[1](F) sda4[0]
1953373888 blocks super 1.2 [2/1] [U_]

md127 : active raid5 sdc3[0] sdb3[2] sdd3[1]
15608667136 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/3] [UUU__]

md0 : active raid1 sdf1[5] sdc1[0] sdb1[2] sdd1[1]
4190208 blocks super 1.2 [5/4] [UUUU_]

unused devices: <none>

root@Wingnut:/# btrfs filesystem show /data
Label: '43f65d04:data' uuid: 893ff390-861c-441d-b39c-8e6c707e0e1d
Total devices 2 FS bytes used 8.14TiB
devid 1 size 14.54TiB used 9.80TiB path /dev/md127
devid 2 size 1.82TiB used 1.00GiB path /dev/md126

root@Wingnut:/# btrfs filesystem show /data-0
ERROR: not a valid btrfs filesystem: /data-0
root@Wingnut:/# btrfs filesystem show /data-1
ERROR: not a valid btrfs filesystem: /data-1


root@Wingnut:/# mdadm --detail /dev/md127
/dev/md127:
Version : 1.2
Creation Time : Fri Jul 3 15:58:26 2015
Raid Level : raid5
Array Size : 15608667136 (14885.58 GiB 15983.28 GB)
Used Dev Size : 3902166784 (3721.40 GiB 3995.82 GB)
Raid Devices : 5
Total Devices : 3
Persistence : Superblock is persistent

Update Time : Wed May 25 20:33:37 2022
State : clean, FAILED
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

Consistency Policy : unknown

Name : 43f65d04:data-0 (local to host 43f65d04)
UUID : e8efdcdc:ce0dd1ae:d5321061:c45a5391
Events : 44167

Number Major Minor RaidDevice State
0 8 35 0 active sync /dev/sdc3
1 8 51 1 active sync /dev/sdd3
2 8 19 2 active sync /dev/sdb3
- 0 0 3 removed
- 0 0 4 removed
root@Wingnut:/# mdadm --detail /dev/md126
/dev/md126:
Version : 1.2
Creation Time : Sat May 14 16:01:02 2022
Raid Level : raid1
Array Size : 1953373888 (1862.88 GiB 2000.25 GB)
Used Dev Size : 1953373888 (1862.88 GiB 2000.25 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Wed May 25 13:56:57 2022
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0

Consistency Policy : unknown

Number Major Minor RaidDevice State
0 8 4 0 active sync
- 0 0 1 removed

1 8 68 - faulty

 

Message 4 of 6
Sandshark
Sensei

Re: RN316 - Rejoin drive to dead volume?

OK, from that I can see you have four drives installed: sdb, sdc, sdd, and sdf. sdf is the one that's not part of the array (along with one that's currently not installed). I think you pulled drive 1, which was sda at the time; when you re-inserted it, it became sdf (that's normal for a removed and replaced/exchanged drive). But to be sure which drive is which, look at the results of get_disk_info.  Note that the channels start at 0, not 1, so the device on channel 0 is bay 1.  If my assumption above is wrong, then post the results and I'll modify the commands below.
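If get_disk_info leaves any doubt, you can also match each /dev/sdX name to a physical drive by serial number. A quick sketch, assuming smartctl is available (it normally is on OS 6):

for d in /dev/sd[a-f]; do echo "== $d"; smartctl -i $d | grep -i serial; done   # compare the serials to the drive labels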

 

You have two MDADM RAID groups. md127 is the main one: RAID5 of what should be 5 drives but has only 3. md126 is the second layer: RAID1 with one of the two drives.

 

So md126 is OK for now: one of two RAID drives is recoverable, and that's enough for RAID1. But since the BTRFS volume concatenates both RAID groups, you can't get anything from it by itself.
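If you want to see how the filesystem is spread across the two groups, and assuming the volume is still mounted at /data and your btrfs-progs is new enough to have it, this shows the per-device allocation:

btrfs device usage /data   # how much of md126 vs. md127 the filesystem has actually allocated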

 

But md127 is missing two components, so the RAID5 is dead.

 

Your best bet at this point is to contact NETGEAR support, as I believe they can recover at least most of the volume (there may be errors in some files).

 

If that's not an option and you want to try it yourself, recognizing that anything you do could make matters worse instead of better, the first thing to try is this:

 

mdadm --assemble --force /dev/md127 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sdf3
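Note that mdadm won't assemble an array that is already running, so you may first need to stop it (mdadm --stop /dev/md127, after unmounting anything on it). It's also worth comparing the event counters on the members before forcing anything, so you know how far out of sync the re-added drive is. A sketch, assuming the partitions are as above:

mdadm --examine /dev/sd[b-f]3 | grep -E '^/dev/|Events'   # members close in event count are good candidates for a forced assemble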

 

If it doesn't work, cycle power and see if cat /proc/mdstat now shows the contents of md1 (the OS partition) to be md1 : active raid10 sda2[0] sdb2[1] sdc2[2] sdd2[3]. If it does, then try the assemble command again with the updated device names:

 

mdadm --assemble --force /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

 

If that doesn't work, and the drives are now in normal order, then try

 

mdadm /dev/md127 --re-add /dev/sda3

 

And if it says that's not possible:

mdadm /dev/md127 --add /dev/sda3
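Either a re-add or an add should kick off a resync onto that drive; you can watch its progress with:

watch -n 10 cat /proc/mdstat   # shows a progress bar while the array rebuilds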

 

If this works, you'll have a degraded but accessible volume from which you can copy the files (possibly with a few errors).  From there, the best thing would be to do a factory default and start fresh, though there are other steps (rather complicated, like those described in the post I referenced above) that might restore the volume to the state it was in before you added the 10TB drive.
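For the copy itself, something like rsync will skip unreadable files and keep going rather than stopping at the first I/O error (the destination path here is just a placeholder):

rsync -aHv --log-file=/tmp/copy.log /data/ /mnt/backup/   # failures are reported at the end and in the log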

Message 5 of 6
nifter
Aspirant

Re: RN316 - Rejoin drive to dead volume?

@Sandshark Thanks for the help. I decided to try the ReclaiMe software, and it seems to be working. Data is being recovered, just very, very slowly.

Message 6 of 6