Forum Discussion

Aspirant

Dec 17, 2015

Solved

Volume scan failed to run properly #26188803

I have a ReadyNAS Ultra 4 with four disks in X-RAID 2. One disk had several ATA errors and I replaced the old 1.5 TB disk with a new 4 TB disk. After reboot the NAS started with sync, and after about...

Volume scan failed to run properly Ultra 4

JimTho

Jan 30, 2016

Hallelujah!

I have now managed to get my data volume back up! Unfortunately, Netgear L2 tech did not manage to, so I had to do this myself.

As promised I will give you the results. Note that I am not a trained IT engineer, so I got this from Google searches and dedication. If you decide to do this, it is at your own risk. I take no responsibility for whether it will work on your system.

I had only 4 SATA ports on my PC, so I had to install Linux on a USB stick in order to get all 4 disks connected. I chose Knoppix Linux, but I am sure most Linux distributions would do.

Get Linux installed on the computer:

I installed Knoppix Linux to a USB stick. This was not trivial, as I had a new Z170 motherboard and a regular USB boot would not work with either Universal-USB-Installer-1.9.6.3 or unetbootin-windows-613. I ended up attaching a SATA DVD drive, burned a Knoppix DVD, booted from the DVD and installed Knoppix on the USB stick.

Removed the DVD-RW drive and attached all 4 NAS drives to the PC. Booted up in Knoppix (USB) and started to see if I could access the drives. I noticed that Knoppix displayed one mounted volume and a few volumes that were not mounted (these turned out to be LVM physical volumes). I could access the files on the mounted RAID volume, which turned out to be the NAS OS.

I performed several different commands after some Google search, being careful not to run anything that could make changes to the drive, in case I needed to perform some data recovery.

First I checked the partition tables on all four drives using gdisk:

knoppix@Microknoppix:~$ sudo gdisk /dev/sda

They were identical, not shown as I did this in four different windows.

Then I wanted to check the RAID setup and ran:

knoppix@Microknoppix:~$ sudo mdadm --detail --scan

ARRAY /dev/md/4 metadata=1.2 name=A021B7C18D0C:4 UUID=d6301b60:0ce2f767:558c574f:db007ccb

ARRAY /dev/md/1 metadata=1.2 name=A021B7C18D0C:1 UUID=d2791ec8:5adda84e:c7463c2e:c0f2016b

ARRAY /dev/md/0 metadata=1.2 name=A021B7C18D0C:0 UUID=a218f0a3:1b607e2e:953b087b:04ed9c99

INACTIVE-ARRAY /dev/md3 metadata=1.2 name=A021B7C18D0C:3 UUID=5aa62eb3:fa4e39b8:213486da:d587542d

ARRAY /dev/md/2 metadata=1.2 name=A021B7C18D0C:2 UUID=829ccffc:55683ba6:36bb7959:6eed3523

From this I figured out there was an inactive array md3.
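The same check can be done mechanically when there are many arrays. The following is only a rough sketch, assuming the text of `mdadm --detail --scan` has already been captured into a string; the function name `inactive_arrays` is made up for illustration and is not part of mdadm.

```python
def inactive_arrays(scan_output: str) -> list[str]:
    """Return device paths of arrays that mdadm reports as inactive."""
    devices = []
    for line in scan_output.splitlines():
        fields = line.split()
        # Inactive arrays are listed with the keyword INACTIVE-ARRAY
        # instead of ARRAY; the device path is the second field.
        if len(fields) >= 2 and fields[0] == "INACTIVE-ARRAY":
            devices.append(fields[1])
    return devices


# Two lines copied from the scan output above.
sample = """\
ARRAY /dev/md/4 metadata=1.2 name=A021B7C18D0C:4 UUID=d6301b60:0ce2f767:558c574f:db007ccb
INACTIVE-ARRAY /dev/md3 metadata=1.2 name=A021B7C18D0C:3 UUID=5aa62eb3:fa4e39b8:213486da:d587542d
"""
print(inactive_arrays(sample))  # ['/dev/md3']
```

This only reads text, so it does not touch the disks.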

Then I used e2fsck to check the partition:

knoppix@Microknoppix:~$ e2fsck /dev/md3

e2fsck 1.42.13 (17-May-2015)

e2fsck: Invalid argument while trying to open /dev/md3

 

The superblock could not be read or does not describe a valid ext2/ext3/ext4

filesystem. If the device is valid and it really contains an ext2/ext3/ext4

filesystem (and not swap or ufs or something else), then the superblock

is corrupt, and you might try running e2fsck with an alternate superblock:

   e2fsck -b 8193 <device>

or

   e2fsck -b 32768 <device>

This made me think there was a problem with the superblocks on the partitions, which turned out not to matter. After searching for answers I decided to stop the arrays and start them again:

knoppix@Microknoppix:~$ sudo mdadm --stop --scan

knoppix@Microknoppix:~$ sudo mdadm --assemble --scan

mdadm: /dev/md/4 has been started with 2 drives (out of 3).

mdadm: restoring critical section

mdadm: /dev/md/3 has been started with 4 drives.

mdadm: /dev/md/2 has been started with 4 drives.

mdadm: /dev/md/1 has been started with 4 drives.

mdadm: /dev/md/0 has been started with 4 drives.

mdadm: Found some drive for an array that is already active: /dev/md/4

mdadm: giving up.
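Note that in this output md4 came up with 2 drives out of 3, i.e. degraded, while the other arrays started with all their drives. A small sketch of how that could be picked out of the assemble messages automatically, assuming the message wording matches what is shown above (the name `degraded_arrays` is hypothetical):

```python
import re

# Matches lines such as:
#   mdadm: /dev/md/4 has been started with 2 drives (out of 3).
STARTED = re.compile(
    r"mdadm: (?P<dev>\S+) has been started with (?P<have>\d+) drives?"
    r"(?: \(out of (?P<want>\d+)\))?"
)


def degraded_arrays(assemble_output: str) -> dict[str, tuple[int, int]]:
    """Map each degraded array to (drives present, drives expected)."""
    result = {}
    for line in assemble_output.splitlines():
        match = STARTED.search(line)
        # The "(out of N)" part only appears when drives are missing.
        if match and match.group("want"):
            result[match.group("dev")] = (
                int(match.group("have")),
                int(match.group("want")),
            )
    return result


# Two lines copied from the assemble output above.
sample = """\
mdadm: /dev/md/4 has been started with 2 drives (out of 3).
mdadm: /dev/md/3 has been started with 4 drives.
"""
print(degraded_arrays(sample))  # {'/dev/md/4': (2, 3)}
```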

Then I used lvmdiskscan to see if I could see the volumes and if there was a problem with any of them:

knoppix@Microknoppix:~$ sudo lvmdiskscan

/run/lvm/lvmetad.socket: connect failed: No such file or directory

WARNING: Failed to connect to lvmetad. Falling back to internal scanning.

/dev/ram0 [       4.00 MiB]

/dev/md0   [       4.00 GiB]

/dev/ram1 [       4.00 MiB]

/dev/md1   [   1023.88 MiB]

/dev/ram2 [       4.00 MiB]

/dev/md2   [       4.08 TiB] LVM physical volume

/dev/ram3 [       4.00 MiB]

/dev/md3   [     931.50 GiB] LVM physical volume

/dev/ram4 [       4.00 MiB]

/dev/md4   [       3.64 TiB] LVM physical volume

/dev/ram5 [       4.00 MiB]

/dev/ram6 [       4.00 MiB]

/dev/ram7 [       4.00 MiB]

/dev/ram8 [       4.00 MiB]

/dev/ram9 [       4.00 MiB]

/dev/ram10 [       4.00 MiB]

/dev/ram11 [       4.00 MiB]

/dev/ram12 [       4.00 MiB]

/dev/ram13 [       4.00 MiB]

/dev/ram14 [       4.00 MiB]

/dev/ram15 [       4.00 MiB]

/dev/sde1 [       4.46 GiB]

/dev/sde2 [     24.82 GiB]

/dev/sdf1 [       4.46 GiB]

0 disks

21 partitions

0 LVM physical volume whole disks

3 LVM physical volumes

There were 3 LVM physical volumes listed. I followed up with lvdisplay to see the logical volume:

knoppix@Microknoppix:~$ sudo lvdisplay

/run/lvm/lvmetad.socket: connect failed: No such file or directory

WARNING: Failed to connect to lvmetad. Falling back to internal scanning.

--- Logical volume ---

LV Path               /dev/c/c

LV Name               c

VG Name               c

LV UUID               DHaiSO-OE5j-wbTe-rW1L-Zh1L-DNFP-vbPjvA

LV Write Access       read/write

LV Creation host, time ,

LV Status             NOT available

LV Size               6.80 TiB

Current LE             111404

Segments               3

Allocation             inherit

Read ahead sectors     auto
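As a sanity check on the lvdisplay output above, the number of logical extents ("Current LE") times the extent size should give the reported LV size. The extent size is not shown in this output, so the 64 MiB used below is an assumption, chosen because it reproduces the 6.80 TiB figure; `lv_size_tib` is just an illustrative helper.

```python
MIB_PER_TIB = 1024 * 1024


def lv_size_tib(extents: int, extent_size_mib: int) -> float:
    """Size of a logical volume in TiB from its extent count."""
    return extents * extent_size_mib / MIB_PER_TIB


# 111404 is "Current LE" from lvdisplay; 64 MiB is an ASSUMED extent size.
size = lv_size_tib(111404, 64)
print(f"{size:.2f} TiB")  # 6.80 TiB
```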

From here I assumed the volume c was not available. Followed up with lvscan:

knoppix@Microknoppix:~$ sudo lvscan

/run/lvm/lvmetad.socket: connect failed: No such file or directory

WARNING: Failed to connect to lvmetad. Falling back to internal scanning.

inactive         '/dev/c/c' [6.80 TiB] inherit

Hmm. The data volume (c) was inactive, even though I had already reassembled the arrays with mdadm --assemble --scan. I searched the web further and came across this post, which solved the case: http://pissedoffadmins.com/os/mount-unknown-filesystem-type-lvm2_member.html

knoppix@Microknoppix:~$ modprobe dm-mod

knoppix@Microknoppix:~$ sudo vgchange -ay

/run/lvm/lvmetad.socket: connect failed: No such file or directory

WARNING: Failed to connect to lvmetad. Falling back to internal scanning.

1 logical volume(s) in volume group "c" now active
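One way to double-check the result would be to run lvscan again and confirm that no volume is still listed as inactive. The sketch below parses lvscan text in the format shown earlier; the `ACTIVE` keyword is how lvscan marks activated volumes (it does not appear in the output posted here), and `lv_states` is a made-up helper name.

```python
import re

# lvscan prints one line per logical volume, for example:
#   inactive         '/dev/c/c' [6.80 TiB] inherit
LVSCAN_LINE = re.compile(
    r"^\s*(?P<state>ACTIVE|inactive)\b.*?'(?P<path>[^']+)'"
)


def lv_states(lvscan_output: str) -> dict[str, str]:
    """Map each logical volume path to 'ACTIVE' or 'inactive'."""
    states = {}
    for line in lvscan_output.splitlines():
        match = LVSCAN_LINE.match(line)
        if match:  # warning lines such as the lvmetad notice are skipped
            states[match.group("path")] = match.group("state")
    return states


# Lines copied from the lvscan output earlier in this post.
sample = """\
WARNING: Failed to connect to lvmetad. Falling back to internal scanning.
inactive         '/dev/c/c' [6.80 TiB] inherit
"""
print(lv_states(sample))  # {'/dev/c/c': 'inactive'}
```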

Voila! The volume came up and I then managed to mount it! I put all the disks back in the Netgear NAS and it booted normally. I am now transferring files to the other backup Netgear NAS as we speak. I guess this will take a bit. Also the 4th disk is now resyncing.

Sat Jan 30 17:04:37 CET 2016 System is up.

Sat Jan 30 17:04:37 CET 2016 Volume C is approaching capacity: 88% used 878G available

Sun Jan 17 12:15:59 CET 2016 System is up.

Sun Jan 17 12:15:59 CET 2016 The paths for the shares listed below could not be found. Typically, this occurs when the ReadyNAS is unable to access the data volume. Squeezeboxserver Documents Video media Photos Music

Sun Jan 17 12:15:41 CET 2016 Volume scan failed to run properly.

I hope this can be useful for others, including Netgear L2 support, which in my opinion should have been able to address this issue in the first place rather than leaving me to search the web for possible solutions. If I am able to figure this out (though I have a PhD in genetics and have been around computers for 25 years), an engineer at Netgear definitely should have been able to fix it easily. In my view this qualifies for a refund! It is also surprising that Netgear does not log their service work to provide proof/documentation of what was done.

I am happy I figured it out, and hope this can be useful for someone else in a similar situation.

JimTho

Aspirant

Jan 27, 2016

Netgear Support has not got back to me regarding the SSH log files as requested. Another 7 days with no good news. I am not sure if this is normal, but either way it is a bad sign for my case.

I have started to search for answers and have found some possible solutions/things to check out.

The intention now is to perform diagnostic "tests" within the next few days.

Too bad I am not able to get information from Netgear about what they did while the unit was in "debug mode", as this would help me with the troubleshooting.

I hope for good news when I start the troubleshooting myself.

mdgm-ntgr
NETGEAR Employee Retired
Feb 01, 2016
You purchased a per incident support contract and support advised they would need a data recovery contract in place to escalate this to L3 the day after the case was opened. When you declined to purchase it they continued to provide what support they could.

Data recovery attempts inherently may be unsuccessful. It's important to back up your data if you value it. No important data should be stored on just the one device, no matter which device that is. See Preventing Catastrophic Data Loss

L3 support handles data recovery cases. It is not something which L2 is trained to perform. If you don't know what you're doing when attempting data recovery you can make things worse. I can see some strange commands in your list of commands. I see you attempted to run a filesystem check on one of the RAID layers even though there is a layer between the RAID layers and the filesystem. In this case that command shouldn't have done any damage as it recognised that there wasn't an EXT filesystem directly on the RAID layer, but with other commands you can do damage.

Every case is different and what "worked" for you may not be appropriate for others. Indeed it's possible from the list of commands that you provided that you could make the problem worse.

With problems like this one needs to carefully identify why one of the RAID layers failed to start and which of the disks to try and bring online. If you bring the wrong disks online then you can cause problems.

One needs to examine why md3 failed to start and whether to leave out the partition on one of the disks when starting the array or not.

You've had some expansion on this system and have a triple layer array which can complicate things considerably when it comes to data recovery.

Now you may get fortunate and find that blindly entering commands you've found on the web works fine, but then again you may not.

Your logs attached to the case show that one of your disks failed very badly.

If not in April, certainly by the alert in September it should have been clear that the disk needed replacing or at the very least that it was advisable to update your backup. A common cause of getting into data recovery situations is leaving a disk in the NAS that needs replacing for a long time after it should be replaced.
JimTho
Aspirant
Feb 01, 2016
Thank you mdgm for making time to comment on this thread. I honestly appreciate what you and others do by participating in the discussion forum. Further, I respect your opinion and can live with the fact that we might not see things the same way. I have shared my experience, the way I experienced it. You might criticize me, which you of course are allowed to do. Given this is the first comment on this thread, something like “I am happy to see that you managed to get your data back” from you would have warmed my heart a bit before you started a rather long criticism of me and my wrongdoing. No hard feelings ... :smileywink:
mdgm wrote:
You purchased a per incident support contract and support advised they would need a data recovery contract in place to escalate this to L3 the day after the case was opened. When you declined to purchase it they continued to provide what support they could.

I honestly did not know what type of support existed when I called Netgear. It was suggested by a forum member to check with Support if a “pay-per incident” would be appropriate, which I did over the phone. Netgear Tech Support confirmed over the phone that this would be a good first go at the problem before they charged my credit card. Again I appreciate that you take time to go through the correspondence, but maybe you do not know the story as well as I do. Remember that I have been part of this experience since mid-December. Besides, even if we were equally involved in the case we might end up with a different experience, which is a normal human response. However, I do not accept that I “declined” L3 support, as Tech Support agreed that a “pay-per incident” would be a good start. Otherwise I would not have paid 60 euro for nothing.

mdgm wrote:
Data recovery attempts inherently may be unsuccessful. It's important to back up your data if you value it. No important data should be stored on just the one device, no matter which device that is. See Preventing Catastrophic Data Loss

Absolutely, and I cannot agree more. That is why I had, and still have, a dedicated (identical) NAS to perform backups using rsync. If you read carefully, you would see that I indeed had that in place. Also, some of my “very” important data had a backup in the cloud (not mentioned as I did not see it as very relevant at the time).

mdgm wrote:
L3 support handles data recovery cases. It is not something which L2 is trained to perform. If you don't know what you're doing when attempting data recovery you can make things worse. I can see some strange commands in your list of commands. I see you attempted to run a filesystem check on one of the RAID layers even though there is a layer between the RAID layers and the filesystem. In this case that command shouldn't have done any damage as it recognised that there wasn't an EXT filesystem directly on the RAID layer, but with other commands you can do damage.

Again, thank you for bringing this up. I believe you are right. I have had the intention while posting on the board to be transparent – my intention is to get feedback to help me and others. And, you have made me aware that using the e2fsck command is not trivial and should be avoided in this case.

What still puzzles me, and maybe you can clear this up for me, is what L2 engineers are trained to do. I was told that they could remotely log in and make changes to see if they could get the volume up. Is this wrong?

As you know, since you have gone through the correspondence, L2 engineer has synced the partitions from one disk in the array to a new disk. The new disk was a part of an expanding array. In your opinion that is a job that L2 is qualified to do, right? However, it is not expected that they should be able to run lvdisplay and lvscan to investigate a problem with the volume? Followed up with running modprobe and the vgchange? You see, I do not know these things – and I believe my disappointment with Netgear Support, and don’t take this personally, comes down to communication. From my communication with Tech Support over the phone and in writing I was under the impression that the above diagnostic could be expected from L2 – hence I paid and now I am disappointed. Do you think it is wrong to express this disappointment on the forum?

mdgm wrote:
Now you may get fortunate and find that blindly entering commands you've found on the web works fine, but then again you may not.

Now, I am sure you do not suggest that I blindly entered commands and by chance managed to get the volume back up? If so what do you think would be the odds for that to have a happy ending?

mdgm wrote:
Your logs attached to the case show that one of your disks failed very badly.

If not in April, certainly by the alert in September it should have been clear that the disk needed replacing or at the very least that it was advisable to update your backup. A common cause of getting into data recovery situations is leaving a disk in the NAS that needs replacing for a long time after it should be replaced.

It is ok to look back and be wise; sometimes I do that. However, I did not know of the problems with the disks in April or in September. I should have, but I did not. When I did find out, that’s when the real trouble started, right? Again, this is something I learned from. I did not have email warnings set up, which I should have. I learned from this incident, which is a philosophy of mine. I make mistakes, and I learn from them. I gladly share my mistakes for others to learn from them, even though it might make me look silly.

Finishing your comment with what I failed to do to put myself in this situation in the first place is hardly constructive. Remember, I managed to get my volume up and save my 6.8 TB of data, something L2 Tech did not. TATA - a celebration for me! Even if you feel that everything is my fault with the outcome in this case, it still does not change the fact that I did indeed have a bad experience with Tech Support. I believe that this comes down to communication and my expectations as a paying customer. My expectations, which I got from my dialog with Tech Support, were not met. I have learned from this, hopefully so did Tech Support. And maybe also some of the many forum members, if they visit this thread. :smileyhappy:
StephenB
Guru - Experienced User
Feb 01, 2016
JimTho wrote:

I managed to get my volume up and save my 6.8 TB of data.

And of course that is great news.

JimTho wrote:

I have learned from this, hopefully so did Tech Support.

My personal experience with tech support is pretty limited. I had a PSU fail on my NV+ last year, and tech support dealt with that swiftly and professionally. Some Netgear folks in the forum have also helped me directly from time to time (particularly with beta hardware and software). That includes both mdgm and skywalker (and probably others over the years).

My overall impression from posts here is that most people who do use tech support are happy with the outcome. But that is not to say that there isn't room for improvement.

I think overall your lessons from this are pretty balanced - some improvements in your own practices, and some areas where you think Netgear should have done better.
mdgm-ntgr
NETGEAR Employee Retired
Feb 02, 2016
Some of the advice I have to provide is general in nature and for the benefit of others who may come across the thread. It is certainly commendable that you were able to get back your data, but if others follow the steps you did they may not be so fortunate. Data recovery you attempt yourself is done at your own risk and may reduce the chances of a professional data recovery attempt being successful.

Now, I understand the problem was not resolved by support, but any support we provide is on a best effort basis. There is never any guarantee that the problem will be resolved or on how long it will take. The unit was out of warranty so if you had not purchased support then support would have proceeded to close the case rather than provide the support they did. The cloning of the partition table is an example of work that we did for you. You paid for support and it was provided.

Pay Per Incident is designed for dealing with a one off problem and does not cover data recovery. Now when per incident support is purchased it may/may not be immediately obvious whether it will lead to data recovery. If soon after per incident support is purchased it is determined that data recovery is needed and you promptly purchase that then we may be able to refund per incident support, but in this case there was quite a bit of support done under per incident.

I understand that you see it differently. You could ask support to raise your request for a refund with customer service, who could then review your case and come to their own conclusion.

Once it is determined that data recovery contract is needed then the case can't be escalated to L3 for troubleshooting until that contract is purchased.

Before you had the call where you purchased the per incident contract I see you were emailed and advised to purchase a data recovery contract.

Well you tried running e2fsck on the raid layer, whereas if you run that it has to be run on the filesystem. Best to run it read-only too at first as a filesystem check may not be advisable in some instances.
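To illustrate the point about layers: on this system the stack is md RAID arrays, then LVM, then the filesystem on the logical volume (/dev/c/c in the lvdisplay output earlier in the thread). A minimal sketch of building a read-only check command for the right layer, assuming the filesystem on the logical volume is ext2/3/4; `readonly_fsck_command` is a hypothetical helper, and the sketch only builds the argument list rather than running anything.

```python
def readonly_fsck_command(device: str) -> list[str]:
    """Build an e2fsck command that opens the filesystem read-only."""
    name = device.rsplit("/", 1)[-1]
    # Refuse md devices: here they hold LVM physical volumes, not a
    # filesystem, so a filesystem check on them is the wrong layer.
    if name.startswith("md") or "/dev/md/" in device:
        raise ValueError(f"{device} is a RAID layer, not a filesystem")
    # -n opens the filesystem read-only and answers "no" to all prompts.
    return ["e2fsck", "-n", device]


print(readonly_fsck_command("/dev/c/c"))  # ['e2fsck', '-n', '/dev/c/c']
```

Whether a check is advisable at all still depends on the state of the array, so this is not a recommendation to run it.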

If you have an up to date backup then in cases like this an alternative would be to wipe the NAS and restore from backup rather than purchase support. Data recovery is there for those who don't value their data enough to store it on multiple devices at all times. At about $200 USD per hour of work performed (there is a Euro price too, not sure what that is) it does cost a bit, but then data recovery requires a lot of highly skilled work.

L2s may be able to do some basic troubleshooting using SSH and fix some problems that L3 has told them how to fix, but more complex problems such as data recovery are for L3s.

Reviewing the case notes the system was determined to be in the middle of expansion and that data recovery would be needed. One of the RAID layers wasn't coming online and it would require some investigation to determine how best to proceed to try and bring it online as safely as possible.

It's my experience that Advanced L2s will try to do what they can to resolve a problem and move the case forward, but they know when they have reached their limits.

Some of the commands you entered showed a lack of knowledge as to how things work (e.g. trying to run a filesystem check on a RAID layer) and also if your list is a complete list then checking to see if it was safe to try and bring up md3 wasn't done either. You may have got out of this fine this time, but others doing the same may not be so fortunate and as this is a public community I have to allow for the fact that others are likely to see this thread.

It's best not to learn how we do RAID, LVM etc. when doing data recovery. It's far better to power down, remove your disks (label order) and put some scratch disks in (must not be from your array) and set up a new volume and then power down, remove the scratch disks (label order) and experiment on the array that uses the scratch disks and doesn't matter, if you must.

I was pointing out that the problem could have been avoided in the first place if the faulty disk had been replaced in a timely manner back in April/September, or at least regular backups done so that could be used instead of data recovery if running into problems with the volume/array.
JimTho
Aspirant
Feb 11, 2016
mdgm wrote:
It's best not to learn how we do RAID, LVM etc. when doing data recovery. It's far better to power down, remove your disks (label order) and put some scratch disks in (must not be from your array) and setup a new volume and then power down, remove the scratch disks (label order) and experiment on the array that uses the scratch disks and doesn't matter, if you must.

That is ok, but luckily for me I found a solution that worked, in my case. And, I also learned from your feedback as I shared my experience.

mdgm wrote:
I was pointing out that the problem could have been avoided in the first place if the faulty disk had been replaced in a timely manner back in April/September, or at least regular backups done so that could be used instead of data recovery if running into problems with the volume/array.

In this case the problem, as stated before, was not due to a faulty disk, but because I replaced a drive. It could just as well have been a fully functional drive. I rebooted the NAS because the GUI told me to "Please reboot your ReadyNAS device to continue with the update process."
In a mindless moment I thought this had to do with the expansion/new disk (I had not been doing this for some years) and clicked reboot in the GUI. Now the update process the GUI reported was an automatic firmware update that took place at the same time. If the OS had been constructed better or been more "idiot-proof", it should not have been possible to reboot the device short of pulling the power cord.

I see that you have not addressed all my questions, but you have provided a long reply. I will leave this post as is, and move on to secure my two NAS-units to make the data ready for the next disaster.