
RN104 fw:6.9.0 fail startup after RAID expansion

Whompin105
Aspirant

RN104 fw:6.9.0 fail startup after RAID expansion

Had 3x 4TB drives and was getting close to hitting capacity.  Added a 4th 4TB drive and waited a couple of days for resync.  At some point I saw RETRY STARTUP, and I cannot power off without pulling the plug.  I've already tried removing the added drive, a read-only reboot, and a boot menu OS reinstall, but I still get a failed startup.  RAIDar shows the device and 4 drives but says "Management Service is Offline", and I'm unable to get to the admin page.   I can download logs, but I'm unsure how to interpret them.  Where should I look for clues? I guess I should add that it would be a real pain to lose the data, which also doesn't seem to be accessible in the current state. TIA

Message 1 of 15

All Replies
mdgm
Virtuoso

Re: RN104 fw:6.9.0 fail startup after RAID expansion

It's strongly recommended to update your regular backup before expanding a data volume. No important data should be stored on just one device. When you remove a disk, the volume becomes degraded and your data is at heightened risk until the RAID array is rebuilt.

 

A bit late for this now, but checking e.g. smart_history.log for clues as to whether any of the installed disks are failing or have failed is a good idea when choosing which disk to replace first. Disks can and do fail at any time, however.

You'd want to check mdstat.log to see if the data volume RAID layers md126 and md127 have been started.

 

You'd also want to check btrfs.log to see if the data volume is recognised there, and check whether it's mounted, e.g. in volume.log.

If the data volume is mounted, that's a good sign. If it's not, the management service probably failed to start due to a problem with the RAID or the data volume, which would suggest a data recovery situation.
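
If you're comfortable at a shell, a quick way to skim those files once you've extracted the downloaded log zip is something along these lines (illustrative only - the exact contents of the files vary a little between firmware versions):

# data volume RAID layers (no output here means they never started)
grep -A 10 -E 'md12[67]' mdstat.log
# does btrfs recognise the data volume?
grep -iE 'data|uuid' btrfs.log
# was the data volume actually mounted?
grep -i data volume.log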

Message 2 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

I really appreciate the reply.  Thank you.

OK, so I looked at smart_history and can see one of the drives had 8 pending sectors, 8 uncorrectable errors and 9 ATA errors, which were all logged after the 4th drive was added (horizontal expansion) and the boot failed.  I also used the boot menu disk check before getting your response, and it indicated errors with disk 1.  I removed disk 1 and can get the device to boot and let me into the admin page.

RAIDar indicates an inactive RAID5 data-0 volume with 10.9 of 10.9 TB used (the actual data I risk losing is about 7TB) and an inactive "RAID level unknown" data volume with 0 MB of 0 MB used. The admin page says to remove inactive volumes to use disks #2, 3 and 4.  I was prompted for a firmware update to 6.10.4, after which the device will now boot with all 4 drives installed, but it still shows an inactive volume.  It seems the updated firmware allows it to finish booting and detects the disk errors rather than freezing up, so now the admin page shows "remove inactive volumes to use disk #1,2,3,4."

Anything else I can look into from here?  I'm not in a position to pursue paid data recovery, but would happily spend a bit of time trying out other options and learning some things in the process.
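
In case it helps anyone reading later: the SMART numbers above are the sort of thing you can pull out of the extracted log zip with a quick grep, roughly like this (the exact wording in smart_history.log may differ between firmware versions):

# per-disk pending/uncorrectable sector counts and ATA error totals
grep -iE 'pending|uncorrect|ata err|realloc' smart_history.log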

Message 3 of 15
StephenB
Guru

Re: RN104 fw:6.9.0 fail startup after RAID expansion

Can you copy/paste mdstat.log into a reply here?

Message 4 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sdd2[3] sdc2[2] sdb2[1] sda2[0]
1044480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sdd1[7] sdc1[4] sda1[6] sdb1[5]
4190208 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>
/dev/md/0:
Version : 1.2
Creation Time : Tue Aug 19 04:37:49 2014
Raid Level : raid1
Array Size : 4190208 (4.00 GiB 4.29 GB)
Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent

Update Time : Mon May 17 15:46:29 2021
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Consistency Policy : unknown

Name : 0e3603c2:0 (local to host 0e3603c2)
UUID : 5fea52a3:c212a53f:5d24e9bb:6426d7ec
Events : 69127

Number Major Minor RaidDevice State
4 8 33 0 active sync /dev/sdc1
7 8 49 1 active sync /dev/sdd1
5 8 17 2 active sync /dev/sdb1
6 8 1 3 active sync /dev/sda1

Message 5 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

I have tried to, but my response post keeps disappearing - let's give it another shot:

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sdd2[3] sdc2[2] sdb2[1] sda2[0]
1044480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sdd1[7] sdc1[4] sda1[6] sdb1[5]
4190208 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>
/dev/md/0:
Version : 1.2
Creation Time : Tue Aug 19 04:37:49 2014
Raid Level : raid1
Array Size : 4190208 (4.00 GiB 4.29 GB)
Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent

Update Time : Mon May 17 15:46:29 2021
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Consistency Policy : unknown

Name : 0e3603c2:0 (local to host 0e3603c2)
UUID : 5fea52a3:c212a53f:5d24e9bb:6426d7ec
Events : 69127

Number Major Minor RaidDevice State
4 8 33 0 active sync /dev/sdc1
7 8 49 1 active sync /dev/sdd1
5 8 17 2 active sync /dev/sdb1
6 8 1 3 active sync /dev/sda1

Model: RN104|ReadyNAS 100 Series 4-Bay
Message 6 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

I have tried to, but whenever I reload the page the reply post disappears.  I've tried sending it in a PM.

Message 7 of 15
StephenB
Guru

Re: RN104 fw:6.9.0 fail startup after RAID expansion

This tells us that the OS partition has expanded to all four disks.  It's not clear why the horizontal expansion failed. 

 

The volume is likely out of sync at this point, but there could be other things wrong.  

 

Have you ever used the Linux command line interface?

 

 


@Whompin105 wrote:

I have tried to, but whenever I reload the page the reply post disappears.  I've tried sending it in a PM.


There is an automatic spam filter that sometimes kicks in.  Periodically the mods review the quarantine queue and release false positives.

 

I can also release them, so you can PM me if it happens again.

Message 8 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

I'm fairly comfortable on a Linux terminal, but I haven't used the CLI to interface with the Netgear NAS before.  It looks like I can SSH into the NAS as root, but I haven't played around enough with btrfs to know where to start diagnosing the filesystem.

Message 9 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

I think that filter must just not like me very much.  I'm comfortable using the Linux CLI, and I am able to SSH into my ReadyNAS as root.  I 

Message 10 of 15
StephenB
Guru

Re: RN104 fw:6.9.0 fail startup after RAID expansion

You can try downloading the full log zip file from the NAS web UI and looking through the various files for errors.

 

Similarly, you can explore the logs via ssh using journalctl.

 

The NAS uses mdadm to create the RAID array, and then creates the btrfs file system on top of that.
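
If you do poke around over ssh, something along these lines gives a read-only picture of where things stand (nothing here modifies the array; md127 is the usual name for the data layer on these units, so substitute whatever your own logs show):

# kernel messages from the current boot, filtered for RAID/disk activity
journalctl -k -b | grep -iE 'md|raid|btrfs|ata'
# anything logged at error priority this boot
journalctl -b -p err
# current state of the md arrays
cat /proc/mdstat
mdadm --detail /dev/md127
# whether btrfs can see the data volume at all
btrfs filesystem show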

Message 11 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

Digging through all the log files to find anything resembling warnings or errors takes some time since I don't know where to look, but grep helped me find a few things.  In kernel.log I see what looks like an attempt to assemble the RAID 5 array, where it binds 4 devices but then has a line that says:

"nas kernel: md: kicking non-fresh sda3 from array!" 

I'm not sure exactly what this means, but I assume there's some issue with one of the disks, so it unbinds that one and then continues with the remaining 3 disks.

"nas kernel: md/raid:md127: raid level 5 active with 3 out of 4 devices, algorithm 2"

"nas kernel: md127: detected capacity change from 0 to 7991637573632"

"nas kernel: md: reshape of RAID array md127"

Then during the reshape it appears there is an ATA error, with not-correctable errors on several sectors of sdd3, followed by:

"nas kernel: md/raid:md127: Disk failure on sdd3, disabling device."

 

So it appears that 1 of 4 disks is initially ignored due to its "non-fresh" status, and then reshaping fails due to errors on one of the remaining 3 disks, so the RAID array never gets built, btrfs can't mount, and I can't access my data volume.  Any ideas about what might cause the non-fresh disk to be kicked from the array?
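
For anyone following along, those kernel lines came out of kernel.log in the downloaded log zip with a grep roughly like this (adjust the patterns to whatever you're hunting for):

# md assembly/reshape messages and ATA errors around the failed boot
grep -iE 'md:|md/raid|ata' kernel.log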

Message 12 of 15
StephenB
Guru

Re: RN104 fw:6.9.0 fail startup after RAID expansion


@Whompin105 wrote:

Any ideas about what might cause the non-fresh disk to be kicked from the array?


Non-fresh means it's not in sync (meaning some writes never made it to the disk). So the real question is why it's not in sync.  Was the NAS forcibly shut down before this (or did it suffer a power failure)?

 

It is possible to force the array to assemble anyway, though being out of sync could result in some file system corruption/loss.
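
Roughly speaking, that's done over ssh with mdadm, something like the sketch below - but the member partitions here are placeholders, so take the actual names from your own mdstat.log/kernel.log, leave out the failing/out-of-sync members, and only do this once you've accepted the risk:

# stop the partially assembled data layer first, if it exists
mdadm --stop /dev/md127
# force-assemble from the members that are still consistent
# (the sdX3 names are placeholders - use the ones from your own logs)
mdadm --assemble --force /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3
# check the result before trying to mount anything
cat /proc/mdstat
mdadm --detail /dev/md127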

Message 13 of 15
Whompin105
Aspirant

Re: RN104 fw:6.9.0 fail startup after RAID expansion

OK, so here is my best guess as to what happened.  The 4th disk was added for horizontal expansion, but before the expansion completed the power flickered out, so the 4th drive is out of sync (it shows 29k events as opposed to 35k on the other disks).  When power was restored, disk 1 had ATA errors preventing boot.  After removing disk 1, booting, and upgrading the firmware, the NAS could boot with all 4 disks installed, but it fails to assemble the array due to the out-of-sync disk 4 and errors on disk 1. 

From an ssh session I was able to use mdadm to force-assemble the data array using disks 1-3, leaving off the 4th, out-of-sync disk.  I'm not sure if the data is all intact due to the possibly bad disk 1, but I have the volume mounted now and am trying to back up what I can to a USB external drive.  I'm only getting ~25 MB/s, so it's going to take 3 days to back up.  This feels slow to me (like half of what I would expect). I'm using an 8TB WD Elements NTFS external, and the data consists mostly of large media files.  I used the web interface backup function to initiate the transfer.

 

After the backup is complete, the question is what to do next. The array is degraded with disk 4 not included, but also possibly damaged due to the disk 1 errors.  Do I chuck disk 1, buy another disk, do a factory reset, and restore my backup?  Or can I do something to get disk 4 in sync and then replace disk 1 with a new disk, potentially never losing access to the data in the process?

 

 

Message 14 of 15
StephenB
Guru

Re: RN104 fw:6.9.0 fail startup after RAID expansion


@Whompin105 wrote:

After the backup is complete, the question is what to do next. The array is degraded with disk 4 not included, but also possibly damaged due to the disk 1 errors.  Do I chuck disk 1, buy another disk, do a factory reset, and restore my backup?  Or can I do something to get disk 4 in sync and then replace disk 1 with a new disk, potentially never losing access to the data in the process?

 


Both are reasonable. If there is evidence of file system corruption as you do the backup, then the factory reset is the way to go. 

 

If not, then you can always do the factory reset later on if you find evidence of corruption (as long as you keep the backup up to date). 
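
If you want a check that goes beyond watching the copy for errors, a read-only btrfs scrub verifies the checksums on the data volume without writing anything (this assumes the volume is mounted at /data, the usual mount point on these units, and it will take a while on ~7TB):

# read-only scrub in the foreground; checksum errors also show up in the kernel log
btrfs scrub start -B -r /data
btrfs scrub status /data
dmesg | grep -i btrfs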

Message 15 of 15