ReadyNAS RN214 check root fs - booting 47% - stalled

Chapi
Aspirant


I have an RN214 running V6.10.2 (downgraded from V6.10.3 a month or two ago).

I had to unplug the unit after it became unresponsive (again), but this time, instead of rebooting normally, it displayed the message "check rootfs" followed by "Booting" with an increasing percentage that stalled at 47%. The unit does not respond to ping, the disk activity light is off, and the power button is flashing and unresponsive.

From similar experiences reported on this board, it looks like the 4 GB main partition is corrupted and that someone needs to go into the unit in tech support mode to restore or fix it.

I would like to retain the data currently in the /data partition if at all possible.

Background: for the last while, the unit has frequently become unresponsive (cannot connect, does not respond to ping, disk activity light is off, power button is flashing and unresponsive).

When I am logged in via ssh and watching journalctl, the unit becomes unresponsive after a few messages like "btrfs_transaction invoked oom-killer". When that happens, the two processes using by far the most memory are two readynasd processes.

The last time it went unresponsive, I was running a btrfs balance from the web interface, because I had added a third and eventually a fourth drive much later than the first two. Before the balance, I had used snapper to remove a lot of daily snapshots, as there were indications in the community that these might have caused the OOM problems I was seeing.

I have fairly recent logs downloaded via the web interface, and I had an ssh window open to the NAS running journalctl, showing what was happening on the NAS at the time it went unresponsive.

 

What can I do to get this working again and (hopefully) access the /data partition?

 

Thanks

Model: RN214|4 BAY Desktop ReadyNAS Storage
Message 1 of 11

StephenB
Guru

Re: ReadyNAS RN214 check root fs - booting 47% - stalled

The btrfs errors cause me to suspect the /data filesystem is corrupted (though it does sound like root is messed up also).

 

Do you have a backup of the data? 

Message 2 of 11
Sandshark
Sensei

Re: ReadyNAS RN214 check root fs - booting 47% - stalled

I have found readynasd to be very problematic when it is having difficulty accessing the volume, be that during a sync, a scrub, or because of a problem with the volume as you seem to be experiencing.  It will ultimately reach 100% CPU usage and lock up the NAS.  This behavior has been reported to Netgear via the forum multiple times, and mostly ignored.  When not completely ignored, they say they can't reproduce it.

 

My speculation is that it just keeps stacking up failed drive access attempts until it runs out of space and/or throughput.  It has no "escape clause" that would at least exit and report the problem.

 

Of course, even reporting a problem would likely not solve your issue; though it may give you more clues as to how to solve it.

 

I agree with your assessment that somebody likely needs to go in via tech support mode and try to fix it.  Are you looking for instructions on how to do that yourself or how to get support from Netgear?

Message 3 of 11
Chapi
Aspirant

Re: ReadyNAS RN214 check root fs - booting 47% - stalled

@StephenB This now-failing NAS is used to back up 3 other NAS units that I own. I am exploiting btrfs snapshots to keep multiple versions of files that changed on the 3 other NAS units.

There are a few files on the /data partition that I'd really like to get to but, assuming none of the other 3 NAS units dies on me, I have no critical data on the failing NAS that I cannot recreate.

 

Should I have seen warnings about the btrfs errors? Was there a way to check the health of the btrfs /data partition before it died on me?

 

Thanks.

Message 4 of 11
Chapi
Aspirant

Re: ReadyNAS RN214 check root fs - booting 47% - stalled

 

@Sandshark I am preferably looking for instructions on how to repair/replace the main filesystem so I can fix it myself if it ever happens again.

I bought that NAS used, so I have no formal support. I would have to check how much Netgear charges for looking into this and fixing it (assuming they charge only if the repair of the main filesystem is successful)...

 

I have seen posts from members of this community fixing similar problems so that is an option too...

 

What is the purpose of readynasd? Is there supposed to be more than 1 active at any given time?

 

Thanks

Message 5 of 11
StephenB
Guru

Re: ReadyNAS RN214 check root fs - booting 47% - stalled

If the NAS is used, then Netgear won't provide paid support.  readynasd is the Netgear NAS application itself.

 

Have you tried booting up the NAS read-only?  Probably worth doing before you try going into tech support mode.

 

Though if this isn't critical, I'd suggest just doing a factory default and rebuilding the NAS.

 

 

Message 6 of 11 (accepted solution)
Chapi
Aspirant

Re: ReadyNAS RN214 check root fs - booting 47% - stalled

@StephenB I just tried booting in read-only (RO) mode and it did boot. I can access the NAS with the /data partition mounted read-only, and I got the few files that I needed from it...

I looked at the log; the last entry was "starting Balance"... I was running a balance because the NAS started with 2 disks; after it was getting full, a third disk was added, and after it was getting full again, a 4th disk was added.

I'm not thrilled about doing a factory default and rebuilding the NAS, as I will lose snapshots of files, but considering the problems I had recently, it's likely what I need to do...

What do I need to save before doing the factory reset other than the obvious (userids, groups, shares)? And, going forward, what should I check/back up to improve problem determination/likelihood of fix/keeping the NAS healthy? When should I update the OS to the latest (6.10.3)?

Do you recommend running a memory test and a disk test before doing the factory reset?

 

Thanks

Message 7 of 11
StephenB
Guru

Re: ReadyNAS RN214 check root fs - booting 47% - stalled


@Chapi wrote:

 

I'm not thrilled about doing a factory default and rebuilding the NAS, as I will lose snapshots of files, but considering the problems I had recently, it's likely what I need to do...

 


It is a nuisance for sure.  


@Chapi wrote:

 

 

What do I need to save before doing the factory reset other than the obvious (userids, groups, shares)? 

 

Do you recommend running a memory test and a disk test before doing the factory reset?

 


 

I think you should look at the log zip file next - checking for disk errors and btrfs errors.  disk-info.log, kernel.log and system.log would be good to check.  If you see disk issues, then of course you need to deal with them.  I wouldn't run a memory test.

 

You can save the configuration and then restore it after the reset.  However, that might not get all settings, so it is useful to document what you have.  Screenshots of the various pages are one way.  Also, make sure you reinstall any apps before you restore the configuration.

 


@Chapi wrote:

 

And, going forward, what should I check/back up to improve problem determination/likelihood of fix/keeping the NAS healthy? When should I update the OS to the latest (6.10.3)?

On the first question, I do recommend running the maintenance functions on a regular schedule.  Personally I run each every three months (on a staggered schedule).  Also, download the log zip files every couple of months and take a look at the disk SMART stats in disk-info.log.  Of course, don't let the volume get more than about 80% full.
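
The 80% guideline can also be checked from a script. The following is only a minimal sketch, not a Netgear tool: it assumes Python 3 is available wherever it runs (on the NAS over ssh, or on a PC with the share mounted), and the function names are made up for illustration. The `/data` path is the data volume mount point mentioned in this thread. Note that the numbers the operating system reports for a btrfs volume can be approximate, so treat the result as a rough indicator.

```python
import os
import shutil


def volume_fill_percent(path: str) -> float:
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(percent_used: float, threshold: float = 80.0) -> bool:
    """True when usage exceeds the threshold (80% per the guideline above)."""
    return percent_used > threshold


if __name__ == "__main__":
    # Assumption: the data volume is mounted at /data; fall back to "/"
    # so the script still runs on a machine without that mount point.
    path = "/data" if os.path.isdir("/data") else "/"
    pct = volume_fill_percent(path)
    status = "over" if over_threshold(pct) else "within"
    print(f"{path}: {pct:.1f}% used ({status} the 80% guideline)")
```

Run from a scheduled job, a check like this could give an early warning before a balance or snapshot cleanup becomes urgent.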

 

Also, I don't recommend using smart snapshots.  I suggest custom snapshots, with retention explicitly set.  Smart snapshots have indefinite retention, and eventually will fill the volume.

 

On the update question - no real advice there.  I currently am running 6.10.3 (current stable release) and have had no issues with it.  But staying with the long term stable release is reasonable too.

 

 

Message 8 of 11
Chapi
Aspirant

Re: ReadyNAS RN214 check root fs - booting 47% - stalled


@StephenB wrote:

I think you should look at the log zip file next - checking for disk errors and btrfs errors.  disk-info.log, kernel.log and system.log would be good to check.  If you see disk issues, then of course you need to deal with them.  I wouldn't run a memory test.

 

disk-info.log:

The health data looks good. The only curious thing is that the newest disk (11141 power-on hours (POH)) shows 2 command timeouts and the second-newest disk (13588 POH) shows 1 command timeout. The 2 oldest disks (16181 POH) show 0 command timeouts.

 

kernel.log:

A few days before the balance, I saw these errors:

Jun 10 21:16:05 RN4 kernel: BTRFS info (device md127): qgroup_rescan_init failed with -115
Jun 10 21:17:11 RN4 kernel: BTRFS info (device md127): qgroup_rescan_init failed with -115
Jun 10 23:51:41 RN4 kernel: BTRFS info (device md127): qgroup scan completed (inconsistency flag cleared)
Jun 12 12:35:52 RN4 kernel: BTRFS info (device md127): qgroup scan completed (inconsistency flag cleared)

 

After balance was started I saw these:

Jun 16 21:52:03 RN4 kernel: BTRFS info (device md127): relocating block group 17222848217088 flags data
Jun 16 21:52:16 RN4 kernel: BTRFS info (device md127): found 31 extents
Jun 16 21:52:23 RN4 kernel: BTRFS info (device md127): found 31 extents
Jun 16 21:52:23 RN4 kernel: BTRFS info (device md127): relocating block group 17206742089728 flags data
Jun 16 21:52:30 RN4 kernel: BTRFS info (device md127): found 28 extents
Jun 16 21:52:33 RN4 kernel: BTRFS info (device md127): found 28 extents
Jun 16 21:52:33 RN4 kernel: BTRFS info (device md127): relocating block group 17159497449472 flags data
Jun 16 21:52:39 RN4 kernel: BTRFS info (device md127): found 27 extents
Jun 16 21:52:42 RN4 kernel: BTRFS info (device md127): found 27 extents
Jun 16 21:52:42 RN4 kernel: BTRFS info (device md127): relocating block group 17158423707648 flags data
Jun 16 21:52:43 RN4 kernel: BTRFS info (device md127): found 2 extents
Jun 16 21:52:46 RN4 kernel: BTRFS info (device md127): found 2 extents
Jun 16 21:52:46 RN4 kernel: BTRFS info (device md127): relocating block group 17157349965824 flags data
Jun 16 21:52:55 RN4 kernel: BTRFS info (device md127): found 18 extents
Jun 16 21:52:57 RN4 kernel: BTRFS info (device md127): found 18 extents
Jun 16 21:52:57 RN4 kernel: BTRFS info (device md127): relocating block group 17150907514880 flags data
Jun 16 21:52:57 RN4 kernel: BTRFS info (device md127): relocating block group 17149833773056 flags data
Jun 16 21:52:59 RN4 kernel: BTRFS info (device md127): found 4 extents
Jun 16 21:53:01 RN4 kernel: BTRFS info (device md127): found 4 extents
Jun 16 21:53:01 RN4 kernel: BTRFS info (device md127): relocating block group 17128358936576 flags data
Jun 16 21:53:03 RN4 kernel: BTRFS info (device md127): found 7 extents
Jun 16 21:53:05 RN4 kernel: BTRFS info (device md127): found 7 extents
Jun 16 21:53:06 RN4 kernel: BTRFS info (device md127): relocating block group 17127285194752 flags data
Jun 16 21:53:06 RN4 kernel: BTRFS info (device md127): relocating block group 17126211452928 flags data
Jun 16 21:53:06 RN4 kernel: BTRFS info (device md127): relocating block group 17125137711104 flags data
Jun 16 21:53:07 RN4 kernel: BTRFS info (device md127): found 1 extents
Jun 16 21:53:09 RN4 kernel: BTRFS info (device md127): found 1 extents
Jun 16 21:53:09 RN4 kernel: BTRFS info (device md127): relocating block group 17124063969280 flags data
Jun 16 21:53:16 RN4 kernel: BTRFS info (device md127): found 31 extents
Jun 16 21:53:19 RN4 kernel: BTRFS info (device md127): found 31 extents
Jun 16 21:53:19 RN4 kernel: BTRFS info (device md127): relocating block group 17122990227456 flags data
Jun 16 21:53:19 RN4 kernel: BTRFS info (device md127): relocating block group 17121916485632 flags data
Jun 16 21:53:19 RN4 kernel: BTRFS info (device md127): found 1 extents
Jun 16 21:53:21 RN4 kernel: BTRFS info (device md127): found 1 extents
Jun 16 21:53:22 RN4 kernel: BTRFS info (device md127): relocating block group 17120842743808 flags data
Jun 16 21:53:22 RN4 kernel: BTRFS info (device md127): relocating block group 17119769001984 flags data
Jun 16 21:53:22 RN4 kernel: BTRFS info (device md127): relocating block group 17118695260160 flags data
Jun 16 21:53:22 RN4 kernel: BTRFS info (device md127): relocating block group 17117621518336 flags data
Jun 16 21:53:22 RN4 kernel: BTRFS info (device md127): relocating block group 16986625015808 flags data
Jun 16 21:53:28 RN4 kernel: BTRFS info (device md127): found 22 extents
Jun 16 21:53:31 RN4 kernel: BTRFS info (device md127): found 22 extents
Jun 16 21:53:32 RN4 kernel: BTRFS info (device md127): relocating block group 16985551273984 flags data
Jun 16 21:53:33 RN4 kernel: BTRFS info (device md127): found 13 extents
Jun 16 21:53:36 RN4 kernel: BTRFS info (device md127): found 13 extents
Jun 16 21:53:36 RN4 kernel: BTRFS info (device md127): relocating block group 16984477532160 flags data
Jun 16 21:53:39 RN4 kernel: BTRFS info (device md127): found 6 extents
Jun 16 21:53:42 RN4 kernel: BTRFS info (device md127): found 6 extents
Jun 16 21:53:42 RN4 kernel: BTRFS info (device md127): relocating block group 16981256306688 flags data
Jun 16 21:53:43 RN4 kernel: BTRFS info (device md127): found 3 extents
Jun 16 21:53:45 RN4 kernel: BTRFS info (device md127): found 3 extents
Jun 16 21:53:45 RN4 kernel: BTRFS info (device md127): relocating block group 16978035081216 flags data
Jun 16 21:53:47 RN4 kernel: BTRFS info (device md127): found 4 extents
Jun 16 21:53:49 RN4 kernel: BTRFS info (device md127): found 4 extents
Jun 16 21:53:49 RN4 kernel: BTRFS info (device md127): relocating block group 16974813855744 flags data
Jun 16 21:53:52 RN4 kernel: BTRFS info (device md127): found 9 extents
Jun 16 21:53:54 RN4 kernel: BTRFS info (device md127): found 9 extents
Jun 16 21:53:55 RN4 kernel: BTRFS info (device md127): relocating block group 16969445146624 flags data
Jun 16 21:54:01 RN4 kernel: BTRFS info (device md127): found 30 extents
Jun 16 21:54:03 RN4 kernel: BTRFS info (device md127): found 30 extents
Jun 16 21:54:03 RN4 kernel: BTRFS info (device md127): relocating block group 16966223921152 flags data
Jun 16 21:54:10 RN4 kernel: BTRFS info (device md127): found 33 extents
Jun 16 21:54:13 RN4 kernel: BTRFS info (device md127): found 33 extents
Jun 16 21:54:13 RN4 kernel: BTRFS info (device md127): relocating block group 14499838951424 flags metadata|dup
Jun 16 22:39:27 RN4 kernel: apache2 invoked oom-killer: gfp_mask=0x26000c0, order=0, oom_score_adj=0
Jun 16 22:39:27 RN4 kernel: apache2 cpuset=/ mems_allowed=0
Jun 16 22:39:27 RN4 kernel: CPU: 2 PID: 10435 Comm: apache2 Tainted: P O 4.4.184.alpine.1 #1

and the NAS locked up around Jun 16 23:52:36 after many oom-killer messages.

 

system.log shows many of these on the same day:

 

Jun 14 18:44:05 RN4 snapperd[24243]: special btrfs cmpDirs
Jun 14 18:44:06 RN4 snapperd[24243]: btrfs_read_and_process_send_stream failed
Jun 14 18:44:06 RN4 snapperd[24243]: THROW: btrfs send/receive error
Jun 14 18:44:06 RN4 snapperd[24243]: special btrfs cmpDirs failed, btrfs send/receive error
Jun 14 19:04:20 RN4 snapperd[24243]: special btrfs cmpDirs
Jun 14 19:04:21 RN4 snapperd[24243]: btrfs_read_and_process_send_stream failed
Jun 14 19:04:21 RN4 snapperd[24243]: THROW: btrfs send/receive error
Jun 14 19:04:21 RN4 snapperd[24243]: special btrfs cmpDirs failed, btrfs send/receive error
Jun 21 17:10:20 RN4 readynasd[2056]: Failed to set BTRFS_IOC_SET_DEV_TYPE to /dev/md/data-0:3:0: 30

 

I interpret the above as meaning that the disks are good, but that I have btrfs errors that require a factory reset of the NAS to be safe.
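
For anyone repeating this kind of log review, here is a minimal Python sketch that pulls the oom-killer events and the btrfs failure lines out of kernel.log and system.log. It is only an illustration: the regular expression is based solely on the line format visible in the excerpts above, the function name `summarize` is made up, and other firmware versions may format their logs differently.

```python
import re

# Matches syslog-style lines as they appear in the excerpts above, e.g.
# "Jun 16 22:39:27 RN4 kernel: apache2 invoked oom-killer: ..."
# "Jun 14 18:44:06 RN4 snapperd[24243]: THROW: btrfs send/receive error"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<source>[^:\[\s]+)(?:\[\d+\])?:\s(?P<msg>.*)$"
)


def summarize(lines):
    """Return (oom_events, btrfs_failures) found in an iterable of log lines."""
    oom_events = []      # (timestamp, process that triggered the OOM killer)
    btrfs_failures = []  # (timestamp, source, message) mentioning a failure
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # continuation lines, blank lines, unknown formats
        msg = m.group("msg")
        oom = re.match(r"(\S+) invoked oom-killer", msg)
        if oom:
            oom_events.append((m.group("stamp"), oom.group(1)))
        elif "btrfs" in msg.lower() and re.search(r"fail|error", msg, re.I):
            btrfs_failures.append((m.group("stamp"), m.group("source"), msg))
    return oom_events, btrfs_failures


# Sample lines copied from the excerpts in this post.
SAMPLE = [
    "Jun 10 21:16:05 RN4 kernel: BTRFS info (device md127): qgroup_rescan_init failed with -115",
    "Jun 16 21:52:16 RN4 kernel: BTRFS info (device md127): found 31 extents",
    "Jun 16 22:39:27 RN4 kernel: apache2 invoked oom-killer: gfp_mask=0x26000c0, order=0, oom_score_adj=0",
    "Jun 14 18:44:06 RN4 snapperd[24243]: THROW: btrfs send/receive error",
]

if __name__ == "__main__":
    ooms, failures = summarize(SAMPLE)
    for stamp, proc in ooms:
        print(f"OOM killer triggered by {proc} at {stamp}")
    for stamp, source, msg in failures:
        print(f"{stamp} [{source}] {msg}")
```

To run it against a real log, replace `SAMPLE` with the lines of an extracted log file, for example `open("kernel.log", errors="replace")`. The routine "relocating block group" and "found N extents" lines from the balance are skipped on purpose, since they are progress messages rather than failures.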

 

 

 

 

@StephenB wrote:

You can save the configuration and then restore it after the reset.  However, that might not get all settings, so it is useful to document what you have.  Screenshots of the various pages are one way.  Also, make sure you reinstall any apps before you restore the configuration.

 


@Chapi wrote:

 

And, going forward, what should I check/back up to improve problem determination/likelihood of fix/keeping the NAS healthy? When should I update the OS to the latest (6.10.3)?

On the first question, I do recommend running the maintenance functions on a regular schedule.  Personally I run each every three months (on a staggered schedule).  Also, download the log zip files every couple of months and take a look at the disk SMART stats in disk-info.log.  Of course, don't let the volume get more than about 80% full.

So run balance, scrub and defragment every 3 months or so. From reading this community, doesn't defragment cause snapshots to potentially use more space?

 

 

Also, I don't recommend using smart snapshots.  I suggest custom snapshots, with retention explicitly set.  Smart snapshots have indefinite retention, and eventually will fill the volume.

 

Thanks for the advice

 

 


 

 


 

Message 9 of 11
StephenB
Guru

Re: ReadyNAS RN214 check root fs - booting 47% - stalled


@Chapi wrote:

 

I interpret the above as meaning that the disks are good but that I have btrfs errors that require a factory reset of the NAS to be safe

 

Yes, I agree.

 


@Chapi wrote:
From reading this community, doesn't defragment cause snapshots to potentially use more space?

If you modify (e.g., rewrite) some blocks in a file, then the updated version in the main share becomes fragmented, and the original version remains in the snapshot(s).  The updated version shares the unmodified blocks with the original version.  

 

Defragmenting the file will change that - there won't be any shared blocks after the defragmentation, so the total space used will go up.
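
The space effect described above can be put into numbers with a small model. This is a deliberately simplified sketch, not a description of btrfs internals: it considers one file and one snapshot of its original version, counts space in whole blocks, and assumes that defragmenting rewrites the whole live file. The function name is made up for illustration.

```python
def blocks_used(file_blocks: int, modified_blocks: int, defragmented: bool) -> int:
    """Blocks occupied by one live file plus one snapshot of its original version."""
    if not 0 <= modified_blocks <= file_blocks:
        raise ValueError("modified_blocks must be between 0 and file_blocks")
    if defragmented:
        # The live file was rewritten in full, so it shares nothing with
        # the snapshot: both copies are stored completely.
        return 2 * file_blocks
    # The snapshot keeps every original block; the live file only adds
    # the blocks that were rewritten and shares the rest.
    return file_blocks + modified_blocks


if __name__ == "__main__":
    print(blocks_used(1000, 10, defragmented=False))  # 1010
    print(blocks_used(1000, 10, defragmented=True))   # 2000
```

In this model, a file that is rewritten in full on every save (`modified_blocks == file_blocks`) already uses twice the space before any defragmentation, which is why defragmenting makes no difference for such files.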

 

In practice, this comes up mostly with live databases and torrent files - the best option for shares holding that content is to keep snapshots off.  Updating media file tags is another way you can easily end up modifying some blocks in the file.

 

Usually Microsoft Word, Excel, etc. will rewrite the entire file every time you save.  BTRFS doesn't check if a re-written block is identical or not, so with those files the new version won't share any blocks with the older ones.

 

So the net here - your understanding is correct, but the practical implications often aren't that significant.  

 

An alternative is not to defrag the volume, but instead selectively enable autodefrag for some shares.  Or just don't defrag.

Message 10 of 11
Chapi
Aspirant

Re: ReadyNAS RN214 check root fs - booting 47% - stalled

@StephenB Sorry for the late reply; work got in the way.

Thank you for all the answers and recommendations. The NAS is now resyncing after the factory reset.

I'll then "restore" the configuration manually, to be sure the saved configuration does not reintroduce a problem that may have been present in the NAS.

 

Thanks again...

Message 11 of 11