itachi2
Tutor

PRO6 "Remove inactive volumes" after interrupted balance

Firmware 6.9.3, RAID 6, 6 drives.  I let the volume get really full before my autoscheduled balance kicked on and ground everything to a halt.  Forced reboot gives all red drives and the "Remove inactive volumes" message.

 

I'm fine with opening a support ticket for recovery attempts, even read-only would be fine.

 

Attempted recovery mount has been running for an hour with little to no disk activity but is at 100% for one of my cores.  Logs are available.

 

Thanks much!

Model: ReadyNAS RNDP6000v2|ReadyNAS Pro 6 Chassis only
Message 1 of 13

StephenB
Guru

Re: PRO6 "Remove inactive volumes" after interrupted balance

Paid support isn't available, since OS 6 isn't supported on the Pro-6.

 

Perhaps ask @JohnCM_S or @Marc_V via PM if they are willing to review the logs.

Message 2 of 13
JohnCM_S
NETGEAR Employee Retired

Re: PRO6 "Remove inactive volumes" after interrupted balance

Hi itachi2,

 

You can upload the logs to a file-sharing site and then PM me the download link so we can review them.

 

Regards,

Message 3 of 13
JohnCM_S
NETGEAR Employee Retired

Re: PRO6 "Remove inactive volumes" after interrupted balance

Hi itachi2,

 

It appears that the volume was already full when the balance ran, which left it no room to rewrite the data. When the file system comes up, it tries to finish a large backlog of pending tasks, but there is no free space, so it just locks up.

 

You can try booting the NAS in read-only mode and check whether you can access the data that way. Otherwise, you will have to look for a third-party data recovery service to help you recover the data.

 

As @StephenB mentioned, the Pro 6 with OS6 firmware is not supported by NETGEAR.

 

Regards,

Message 4 of 13
JohnCM_S
NETGEAR Employee Retired

Re: PRO6 "Remove inactive volumes" after interrupted balance

Hi itachi2,

 

We’d greatly appreciate hearing your feedback letting us know if the information we provided has helped resolve your issue or if you need further assistance.

 

If your issue is now resolved, we encourage you to mark the appropriate reply as the “Accept as Solution” or post what resolved it and mark it as solution so others can be confident in benefiting from the solution. 
 
The NETGEAR community looks forward to hearing from you and being a helpful resource in the future!
 
Regards,

Message 5 of 13
itachi2
Tutor

Re: PRO6 "Remove inactive volumes" after interrupted balance

I can't be 100% sure, but I think my last reply to this thread may have been deleted.  Prior to this message I have 3 posts, 2 of which are replies.  Those two are probably the missing message in question and my edit of that message for grammar and style.  In that message I thanked both Stephen and John for their help and suggested other ways they could assist users who either can't afford or are not eligible for support.

I also mentioned a few btrfs and fsck commands as potential avenues for self-repair, and suggested that, given the depth of experience with Linux and btrfs here, it would be even more useful to nudge or guide people in a relatively safe direction to get the original data online again, rather than backing up and restoring for a few days (even at GB speeds).  I opined that plenty of disclaimers could be given to absolve the adviser, forum and company of any ill effects that may arise from the use or misuse of the proffered advice.

I mean no disrespect or insult with either this description of my mystery post or that post itself.  The help I was given was on a purely volunteer and altruistic level to begin with and I am appreciative.  The advice to mount in read-only is solid, and if luck is with you and time is no issue, that would allow most people in similar situations to recover their data.  However, the hobbyist in me can only think, "This doesn't _fix_ the problem that happened."  Though specifics would be unique to each volume, there is a methodical flowchart that one could follow depending on the error messages and output from commands given.

There have been participants in threads in the past who did go above and beyond to share the benefit of their breadth of knowledge to the folks in need.  Maybe I'm naive, but I don't see a problem with even a company rep suggesting a particular mount, fsck or btrfs command with the imperative to make sure that data is backed up first and the usual disclaimers regarding said commands and data loss and liability, etc.

And I also understand that this is in "Using your ReadyNAS," not "Data Recovery 101 for the Home Hacker," but I would argue that even with good use practices, sometimes it is useful, necessary, and edifying to go under the hood to tinker with the guts of the OS and filesystem, and that knowledge guided by wisdom will never be a bad thing.

So I'll mark the answer given as a solution, but not the solution I was hoping for, and it is a safe solution that uses days of time when an hour or two with well-crafted commands could potentially repair the damaged filesystem.

In any case, my upgraded Pro 6 has been my favorite NAS to use, and I look forward to restoring my data to continue using it.  Thanks again to all the forum writers and company reps for doing what they can.

Message 6 of 13
StephenB
Guru

Re: PRO6 "Remove inactive volumes" after interrupted balance


@itachi2 wrote:

I can't be 100% sure, but I think my last reply to this thread may have been deleted. 

 

...

Though specifics would be unique to each volume, there is a methodical flowchart that one could follow depending on the error messages and output from commands given.

 


It was just caught by the spam filter. Mods generally check the quarantine manually and release false positives, but there sometimes is too much spam for that to be practical. 

 

I do take your point on the value of a methodical flowchart to troubleshoot the mount failure problem.  It's not a problem I've had first hand, and unfortunately I haven't seen enough posted on the solutions to feel comfortable providing specific steps.  

 

The missing post was: 

 


@itachi2 wrote:

John and Stephen,

 

Thank you both for checking my logs and for the input regarding my issue.  Like I said, I was able to mount md127 and md0 in recovery with seemingly no ill effects.  I thought about deleting some things from my data volume in recovery and rebooting to see if it could complete the boot, but I will only attempt that after I ensure a good backup to external drives. My original plan was to back up, then erase the old Data volume and create a new one anyway, so there is nothing to lose by mucking with the drive at that point.

 

Currently copying everything across the network to some USB 3.0 CIFS shares mounted on my Windows 7 box, which is much faster than the ~25MB/s afforded by USB 2.0.  And it seems to be going okay - I should be able to get the bulk of the data off soon.

 

If you were inclined, I would have liked to have heard the benefit of your collective experience with some other possibilities for further self-help and repair... i.e. "If it were my NAS, I might attempt a mount with these options: x, y, and z" or "You may want to explore the commands 'btrfs-zero-log' or 'btrfs rescue' or 'btrfsck --repair [--init-extent-tree]' or 'btrfs check --repair' but I, the forum and Netgear are not responsible for any loss or damage that may occur from the use of these commands..."  The suggestion for booting in read-only is a good one, and I may have completely overlooked that when I was exploring my rescue options.  I wound up mounting /data as read-only anyway in recovery for safety's sake.

 

Well, I wound up rambling a bit...  Thanks again for the assist.


 

Message 7 of 13
itachi2
Tutor

Re: PRO6 "Remove inactive volumes" after interrupted balance

Stephen,

 

Thanks for reading.  I wasn't necessarily talking about a literal flowchart, but more of a general branching process that one can use depending on the errors/problems and success or failure of each remediating step.  The problem-solving techniques that come from research, practice, and experience. 

In any case, after I backed up most everything using a readonly mount and some Windows share destinations with CIFS, I decided to try a few commands to test.

 

btrfs check /dev/md/Data-0
btrfs check --repair /dev/md/Data-0
mount -t btrfs -o skip_balance /dev/md/Data-0 /mnt/data
btrfs balance cancel /mnt/data

The first "check" command is optional but I wanted to see the number and types of errors without committing changes.  Thankfully it seemed to be just one: "root 259 inode 1753733 errors 400, nbytes wrong" 
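A minimal sketch of how the per-inode error lines from a read-only check could be tallied before deciding on a repair. The function name and the log path are hypothetical, and the pattern assumes the lines start exactly like the one quoted above; other btrfs-progs versions may format them differently.

```shell
# Summarise per-inode error lines from a saved `btrfs check` log.
# Usage: summarize_check_log /path/to/check.log   (hypothetical helper)
summarize_check_log() {
    # Lines of interest are assumed to look like:
    #   root 259 inode 1753733 errors 400, nbytes wrong
    awk '/^root [0-9]+ inode [0-9]+ errors / {
             code = $6
             sub(/,$/, "", code)          # drop the trailing comma after the code
             count++
             print "root=" $2, "inode=" $4, "code=" code
         }
         END { print "inode error lines: " (count + 0) }' "$1"
}

# Example, run against the UNMOUNTED volume (device path as used in this thread):
#   btrfs check /dev/md/Data-0 > /tmp/check.log 2>&1
#   summarize_check_log /tmp/check.log
```

This only reads a saved log, so it changes nothing on disk; it is a way to see how many such lines there are before considering --repair.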

Relevant pages:

https://btrfs.wiki.kernel.org/index.php/Btrfsck

https://btrfs.wiki.kernel.org/index.php/Problem_FAQ (which says that "such errors should be fixable with 'btrfs check --repair'")

 

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs%285%29#MOUNT_OPTIONS

"Mount Options" directly above for the commands to mount without resuming the balance and to cancel the balance operation in progress.

Anyway, the check threw up lots of errors regarding qgroup counts and extent buffer leaks but did give me the all-important "err is 0"

Also note that the btrfsck page above mentions "check --repair" as a *last resort* after 5 preliminary recovery methods, and even then with a recommended btrfs-tools > 4.0.  You should probably use the booted firmware's version of the tools over ssh vs. the altered busybox telnet shell available through the boot menu as that is more likely to have an updated version.

In any case, I think I have my original file system back, though I can't be completely sure all files are whole--everything seems to be acting fine, though... or at least as badly as I had configured it before all this.  I had 60GB of space left, but I am hoping to leave a bit more for the system to use prior to the next balance - I already backed up a TB of archive data that doesn't need to stay live and will clear that space up.

Message 8 of 13
viperhansa
Virtuoso

Re: PRO6 "Remove inactive volumes" after interrupted balance

@itachi2 

 

Hi, can you go through the steps and commands you used for this:

"Thank you both for checking my logs and the input regarding my issue.  Like I said, I was able to mount the md127 and md0 in recovery with seemingly no ill effects. "

 

It would be nice to have those steps, since there seem to be more "inactive volumes" cases happening lately...

And for reference, there IS some kind of paid support/recovery service for those of us who run OS6 on a legacy NAS.

I myself used it 10 months ago.

 

Best regards

 

// Hans.

Message 9 of 13
itachi2
Tutor

Re: PRO6 "Remove inactive volumes" after interrupted balance

Hans,

Nothing special, just commands basically the same since the RAIDiator days.  Note that some of the below commands assume the "techsupport" boot mode which is just BusyBox over telnet with a line to Netgear if they want to jump in.  User: root, pw: infr8ntdebug.  If you can work over ssh on a running system, I highly recommend that as it will be faster and you'll have all the updated tools and filesystem choices in a "full" Linux system available to you.

I'm pretty sure I stole these or similar from the forum, Stack Overflow or Super User.  I'm absolutely no expert...

*** Only probably necessary in techsupport recovery, but see if you have
*** /dev/md/, /dev/md0, and /dev/md127 already detected & assembled
*** ready to mount - I only used --assemble --scan which was likely redundant
echo DEVICE partitions > /etc/mdadm/mdadm.conf
mdadm --examine --scan >> /etc/mdadm/mdadm.conf
mdadm --assemble --scan
*** End recovery commands
mkdir /mnt/data
mount -t btrfs -o ro /dev/md127 /mnt/data
*** mkdir /mnt/sys
*** mount -t btrfs /dev/md0 /mnt/sys

So really not much to it - just the "-o ro" for readonly allowed me to mount and copy my stuff to some Windows shares I mounted using CIFS.  md0 is the 4GB system partition, md127 is data, and md1 is swap I believe.  Just leave md0 or your system partition unmounted for now unless you can't boot to ssh...  And if you can boot to ReadyNAS OS there's no need to mount your system!
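Before copying anything off, it can be worth confirming that the volume really is mounted read-only. A small sketch, assuming the standard /proc/mounts layout; the function name is hypothetical and the second argument exists only so the check can be tried against a saved copy of the mounts table.

```shell
# Return 0 if MOUNTPOINT is mounted with the "ro" option, 1 otherwise.
# Usage: is_mounted_ro MOUNTPOINT [MOUNTS_FILE]   (hypothetical helper)
is_mounted_ro() {
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v mp="$1" '
        $2 == mp { opts = $4; found = 1 }   # the last matching entry wins
        END {
            if (!found) exit 1
            n = split(opts, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") exit 0
            exit 1
        }' "${2:-/proc/mounts}"
}

# Example:
#   is_mounted_ro /mnt/data && echo "safe to copy from /mnt/data"
```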

What I learned later were the commands above to be able to mount data without resuming the balance, the somewhat risky nature of running btrfs check (on unmounted data volume) to repair the filesystem, and the "btrfs balance cancel /mnt/data" which let me reboot normally without resuming the balance operation.  While I had the volume mounted in r/w I deleted a few things that I backed up just in case the balance decided to kick in again after reboot...

Please research any commands before you run them to make sure they're applicable and know what they do in your case.  Like, you usually won't need to configure mdadm.conf or assemble your RAID in normal mode!
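One way to tell whether the mdadm lines are needed at all is to look at what the kernel has already assembled. A rough sketch that lists the arrays from /proc/mdstat; the function name is hypothetical, and the optional argument is only there so it can be run against a saved copy of the file.

```shell
# Print the md arrays the kernel already knows about, one per line as "name state".
# Usage: list_md_arrays [MDSTAT_FILE]   (hypothetical helper)
list_md_arrays() {
    # Array lines in /proc/mdstat look like:
    #   md127 : active raid6 sda3[0] sdb3[1] ...
    awk '$1 ~ /^md[0-9]+$/ && $2 == ":" { print $1, $3 }' "${1:-/proc/mdstat}"
}

# Example:
#   list_md_arrays
```

If the data array is listed as active, the assemble step was probably redundant, as noted above; if it is missing or inactive, that is when the recovery commands come into play.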

 

So, if I were doing this over again, I would:

Boot up normally to an unmounted data volume but running ReadyNAS OS.

See if I can mount data volume read-only and back up everything necessary, incl ".apps" and related folders.  This may involve just copying everything "by hand" over the command line or using Midnight Commander.

If volume won't mount, check btrfs problems FAQ for other recovery mounts and tools. 

Run the btrfs check command without fixing to see what errors exist from my hard shutdown of the NAS 😞

If errors are minor, run btrfs check --repair to fix data volume (1 hour and up for each run for me)

Mount with -o skip_balance to avoid immediate balance resume.

Run "btrfs balance cancel [path to mounted volume]" to stop balance from restarting on next mount.
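The repair half of the steps above can be written down as a checklist generator. This is only a sketch: the function name is hypothetical, it prints the commands rather than running them, and each step is meant to be run by hand after reading the output of the previous one.

```shell
# Print the repair steps for DEVICE and MOUNTPOINT without executing anything.
# Usage: print_repair_plan DEVICE MOUNTPOINT   (hypothetical helper)
print_repair_plan() {
    dev=$1
    mnt=$2
    cat <<EOF
# 1. Inspect errors without writing anything (volume must be unmounted)
btrfs check $dev
# 2. Only if the errors look minor and a backup exists
btrfs check --repair $dev
# 3. Mount without resuming the interrupted balance
mkdir -p $mnt
mount -t btrfs -o skip_balance $dev $mnt
# 4. Cancel the paused balance so it does not restart on the next mount
btrfs balance cancel $mnt
EOF
}

# Example with the paths used in this thread:
#   print_repair_plan /dev/md/Data-0 /mnt/data
```

Printing instead of executing is deliberate: whether to go from step 1 to step 2 depends on what the check reports, and that decision should not be automated.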

 

Lessons learned:

Either free up some space on the volume prior to the scheduled balance or run the balance manually.

Set up regular backup for the NAS for disasters and "volume abuse"

Techsupport boot mode is rarely necessary to use and most everything can be fixed in a running system...

Find out which forums the btrfs experts hang out in, in case I need them 🙂
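The first lesson could be turned into a pre-balance check along these lines. It is a sketch only: the function name and the 85% threshold are made up for illustration, and df is just an approximation on btrfs ("btrfs filesystem usage" gives a more detailed picture).

```shell
# Read `df -P` output on stdin; exit 0 if usage is at or below MAX_USED percent,
# 1 if it is above, 2 if no data line was found.
# Usage: df -P /data | has_headroom 85   (hypothetical helper and threshold)
has_headroom() {
    # df -P prints a header, then: filesystem blocks used available capacity% mountpoint
    awk -v max="$1" '
        NR == 2 { pct = $5; sub(/%/, "", pct); seen = 1 }
        END {
            if (!seen) exit 2
            exit ((pct + 0 > max + 0) ? 1 : 0)
        }'
}

# Example:
#   df -P /data | has_headroom 85 || echo "free up space before the next balance"
```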

 

I hope all this helps - it's more about the concepts than the copypaste here because one could potentially make things worse with a btrfs check.  That's why the readonly mount is a good thing if you need to make a current backup before the repair attempts.  I always go with the assumption that it's a goner and everything needs to be duplicated or restored before I try to fix anything.  Even if it turns out to be a minor repair, you have no way of knowing how serious it is before you start.

 

Message 10 of 13
viperhansa
Virtuoso

Re: PRO6 "Remove inactive volumes" after interrupted balance

@itachi2 

 

Thank you for a great answer!!

I'm definitely not an expert at all, so answers like this will surely help me and others who end up in the same situation!

Also as a last resort to try to repair and maybe save some data after the cause is lost, so to speak.

I have learned it the hard way and used paid support to get some data back.

 

So, again, THANK You!

 

regards

 

// Hans 

Message 11 of 13
StephenB
Guru

Re: PRO6 "Remove inactive volumes" after interrupted balance

FWIW, one potential cause of this problem is lost (cached) writes when the system is powered down. 

 

These lost writes can result in one or more RAID groups in the volume failing to assemble (an mdadm error), or a btrfs file system error, or both.  

 

While @itachi2's commands worked for him (and I think will be helpful for others), there is no guarantee that they will work for everyone, and there is also no guarantee that there will be no data loss or file corruption.  
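One symptom of lost writes at the RAID layer is that the member disks no longer agree on their event counters. A rough sketch for comparing them, assuming the "Events : N" line that mdadm --examine prints for each member; the function name and the partition glob in the example are assumptions, not something taken from this thread.

```shell
# Read concatenated `mdadm --examine` output on stdin and report whether
# every member carries the same event count.
# Exit status: 0 = all match, 1 = mismatch, 2 = no Events lines seen.
events_consistent() {
    awk '$1 == "Events" && $2 == ":" { seen[$3]++; n++ }
         END {
             if (n == 0) { print "no Events lines found"; exit 2 }
             kinds = 0
             for (e in seen) kinds++
             if (kinds == 1) { print "event counts match across " n " members"; exit 0 }
             print "event counts differ across " n " members"
             exit 1
         }'
}

# Example (the member partitions are an assumption; check /proc/mdstat for yours):
#   for p in /dev/sd[a-f]3; do mdadm --examine "$p"; done | events_consistent
```

Matching counters do not prove the file system is healthy; they only suggest that a failure to mount is more likely a btrfs problem than an mdadm one.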

 

 

 

Message 12 of 13
viperhansa
Virtuoso

Re: PRO6 "Remove inactive volumes" after interrupted balance

@StephenB 

I agree, it is not for everyone or for every situation.

But if it's a lost cause, you have nothing to lose and much to gain if it works.

 

And about the write cache, that could be the case for some users.

Not in my case: I have around 10-11 hours of UPS supply just for my NAS.  🙂
And it was shut down gracefully.

Message 13 of 13