

VolkerB
Aspirant

Admin page unavailable after cancelled backup job & hard reboot, shares are working

Hopefully someone can help me get out of this deadlock.

 

I recently added an rsync backup job to sync the local /media share with a USB drive connected to the rear USB socket of the RN214 box. After starting it, I found out that - by mistake - the NAS was creating a directory /media/media/... on the USB drive (essentially duplicating all my data), so I cancelled the backup job via the ReadyNAS admin page. After refreshing the page, I was presented with the progress bar and eventually a notification similar to the one described in https://kb.netgear.com/26883/ReadyNAS-OS-6-Admin-Page-is-offline. A graceful restart did not work, the LCD was not lighting up. So I pulled the plug, forcing a cold restart.

 

After this, the device came back up again; however, the boot progress notification is stuck at 95% and the upper line of the LCD says "fvbackup-q.servi". I can ping the device and the shares are accessible (RW). Unfortunately, the admin page http://rn214/admin/ is still offline (same error as mentioned above), and the power button never stops blinking.

 

I have not set up SSH access (dammit!), so I can't log on and view logs/running processes. NetGear RAIDar reported the management service to be offline and was unable to retrieve logfiles. Diagnostics yielded the snippet below (slightly abbreviated; there were a lot more entries saying "Failed to UPDATE dictionary").

 

I can reboot the device using RAIDar. In that case no progress bar is shown at all (only "Booting..."), and the "fvbackup-q.servi" information appears when the power button is pressed once. Shares are available; the admin page says "Connecting..." and then ends up with the offline notification mentioned above.

 

I then tried the boot menu's "OS Reinstall" option, which successfully recovered the admin page. The log says "Volume: System volume root's usage is 99%. This condition should not occur under normal conditions. Contact technical support.", and downloading the logfiles (http://rn214/dbbroker) fails with this XML file:

 

<xs:nml xmlns:xs="http://www.netgear.com/protocol/transaction/NMLSchema-0.9" src="browser" dst="nas" locale="en-us">
<xs:transaction ref-id="" type="0">
<xs:response ref-id="opid" status="failure">
<xs:error>
<xs:error-code>
<![CDATA[ 12008010002 ]]>
</xs:error-code>
<xs:error-cause>
<![CDATA[ Can't create zipped log ]]>
</xs:error-cause>
<xs:error-details>
<![CDATA[ Error in dlowload log ]]>
</xs:error-details>
</xs:error>
</xs:response>
</xs:transaction>
</xs:nml>

Any advice on how to proceed?

 

Many thanks in advance!

 

Successfully completed diagnostics
System
No errors found.
Logs
2021-09-22 11:34:39: Assertion 'f' failed at ../src/journal/journal-file.c:1674, function journal_file_post_change(). Aborting.
2021-09-22 11:02:36: ufsd: "mount" (sda2): is mounted as NTFS at 2021-09-22 09:02:36
2021-09-22 11:00:19: ufsd: "umount" (sda1): is unmounted at 2021-09-22 09:00:19
2021-09-22 07:26:17: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-22 05:26:17
2021-09-22 07:25:57: ufsd: "umount" (sda1): is unmounted at 2021-09-22 05:25:57
2021-09-22 07:25:37: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-22 05:25:37
2021-09-22 07:20:02: ufsd: "umount" (sda2): is unmounted at 2021-09-22 05:20:02
2021-09-22 07:19:50: ufsd: "umount" (sda1): is unmounted at 2021-09-22 05:19:50
2021-09-22 07:19:31: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-22 05:19:31
2021-09-22 07:19:12: ufsd: "umount" (sda1): is unmounted at 2021-09-22 05:19:12
2021-09-22 07:18:02: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-22 05:18:02
2021-09-22 07:17:16: ufsd: "umount" (sda1): is unmounted at 2021-09-22 05:17:16
2021-09-22 07:05:39: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-22 05:05:39
2021-09-22 01:01:17: ufsd: "umount" (sda1): is unmounted at 2021-09-21 23:01:17
2021-09-21 07:44:33: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-21 05:44:33
2021-09-21 01:00:57: ufsd: "umount" (sda1): is unmounted at 2021-09-20 23:00:57
2021-09-20 10:22:24: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-20 08:22:24
2021-09-20 10:22:06: ufsd: "umount" (sda1): is unmounted at 2021-09-20 08:22:06
2021-09-20 10:19:31: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-20 08:19:31
2021-09-20 10:18:29: ufsd: "umount" (sda1): is unmounted at 2021-09-20 08:18:29
2021-09-20 08:13:42: ufsd: "mount" (sda1): is mounted as NTFS at 2021-09-20 06:13:42
2021-09-03 09:55:17: ufsd: "umount" (sdc1): is unmounted at 2021-09-03 07:55:17
System Management
2021-09-22 11:37:21: Failed to UPDATE dictionary
2021-09-22 11:37:21: Failed to UPDATE dictionary
2021-09-22 11:37:03: Failed to start ReadyNAS System Daemon.
2021-09-22 11:36:43: Failed to start ReadyNAS System Daemon.
2021-09-22 11:36:17: DB (main) schema version: new ==> 24
2021-09-22 11:36:17: DB (queue) schema version: new ==> 0
2021-09-22 11:36:16: DB sanity check failed! Trying backup readynasd_2021_09_22_110945.db.lz4.
2021-09-22 11:36:16: DB sanity check failed! Trying backup readynasd_2021_09_22_110945.db.
Model: RN21400|ReadyNAS 214 Series 4- Bay (Diskless)
Message 1 of 13


All Replies
Sandshark
Sensei

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working

Did you at any point disconnect or power down the USB drive before the backup job said it was done? I ask because I suspect that it continued after that. But once the USB drive wasn't connected, it began to copy files to the mount point in the OS partition instead of the USB drive that should have been mounted there. That'll fill the OS partition in a hurry.

 

The message you are seeing is the location where the OS crashed, undoubtedly due to the too-full OS partition.  If you are now able to enable SSH after the OS re-install, you need to go in and clear out any files that were copied to the mount point directory.
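
If SSH can be enabled after the reinstall, the check and cleanup would look roughly like this. This is only a sketch: it assumes the drive normally mounts under /media/<label> (adjust USB_HDD_1 to your drive's label) and that the standard mountpoint utility is present; make sure nothing is mounted there before deleting anything.

# how full is the root (OS) partition?
df -h /

# with the USB drive unplugged, the mount point is just a directory on the OS partition;
# anything stored inside it is what filled the partition
mountpoint /media/USB_HDD_1      # should report "is not a mountpoint"
du -sh /media/USB_HDD_1          # size of the stray copies

# only after confirming nothing is mounted there:
rm -rf /media/USB_HDD_1/*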

 

The fact that your files were copied to media/media is the way the rsync backup jobs are designed. I disagree that's the way it should be, but it is. The work-around is to go back into the backup job configuration after it's created and put a single forward slash "/" as the source path.

Message 2 of 13
VolkerB
Aspirant

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working


@Sandshark wrote:

Did you at any point disconnect or power down the USB drive before the backup job said it was done? I ask because I suspect that it continued after that. But once the USB drive wasn't connected, it began to copy files to the mount point in the OS partition instead of the USB drive that should have been mounted there. That'll fill the OS partition in a hurry.

OMG. That is absolutely possible. rsync backup jobs on the RN214 have always been a mystery to me, but they are still the only choice for a one-click, incremental, no-brainer backup to an attached USB3 device that excludes btrfs snapshots and a couple of other unnecessary directories. Once I became aware of the /media/media problem, I wanted to avoid 4TB of redundant data being copied only to be deleted once the job was finished. Hence I cancelled the job and did not really pay attention to the HDD access LED on the NAS. I then probably ejected the USB device (I don't really remember if I did), or it could be that at that point the admin page was already unresponsive.

 

Since I was stupid enough not to enable SSH and the power button shutdown did not work, I had to pull the plug. Later I learned about RAIDar, which I could have used to force a shutdown as well.

 

After the forced restart, the admin GUI just showed the progress bar similar to https://kb.netgear.com/26883/ReadyNAS-OS-6-Admin-Page-is-offline, with no way around it. I resorted to the OS reinstall boot menu option, after which I could log onto the admin GUI successfully. Then I saw the "Volume: System volume root's usage is 99%. This condition should not occur under normal conditions. Contact technical support." log message - but sure enough, I don't have any technical support contract.

 

So the master plan was to enable SSH shell access for admin, which failed (blocked GUI again and all other kinds of weird behaviour). After a couple of OS reinstalls, suddenly a "firmware image unpack failure" (or similar) was displayed on the LCD panel and the admin GUI came up completely non-localized, with just the message tokens/placeholders instead of the real messages. Still no SSH access.

 

That was the moment when I called it a day and performed a complete factory default reset, knowing that I had at least backed up all the data (without the snapshots, unfortunately, since the NTFS partition on the external USB drive does not support them). Currently, the device is resyncing its data (at ~10%).

 


@Sandshark wrote:

The message you are seeing is the location where the OS crashed, undoubtedly due to the too-full OS partition.  If you are now able to enable SSH after the OS re-install, you need to go in and clear out any files that were copied to the mount point directory.

That was the plan. But it seems that my iterations and an OS partition filled to the brim did not allow this configuration change. Lesson learned. Enabling SSH was the very first thing I did after the RN214 came up in its pristine state.

 


@Sandshark wrote:

The fact that your files were copied to media/media is the way the rsync backup jobs are designed.  I disagree that's the way it should be, but it is.  The work-around is that you need to go back into the backup job configuration after it's created and put a single forward slash "/"  as the source path.


Usually I'm aware of those rsync subtleties, but that's what happens if you're copying files while working on other stuff in parallel...

 

The big question now is: if I run an rsync backup job in the future and find out that things are going haywire, what is the recommended way of cancelling it safely so that this disaster can't happen again? Does it suffice to wait until the job is shown as "Cancelled"? Is there a logfile to have a look at?

 

That was quite a miserable day to say the least. Nevertheless many thanks for your explanations, I learned a lot about rsync again.

Model: RN21400|ReadyNAS 214 Series 4- Bay (Diskless)
Message 3 of 13
Sandshark
Sensei

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working

For future aborts, I recommend you check the log (available in the backup job menu) to see that it says the job was cancelled. The log is only updated at completion (it doesn't show progress), so that should be a sure way to know it's done.

 

There are other ways we could have helped you clean out the system, but it's complicated.  Since you had a full backup, that's really the road of least resistance in this kind of case.

Message 4 of 13
VolkerB
Aspirant

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working


@Sandshark wrote:

For future aborts, I recommend you check the log (available in the backup job menu) to see that it says the job was cancelled. The log is only updated at completion (it doesn't show progress), so that should be a sure way to know it's done.

 

There are other ways we could have helped you clean out the system, but it's complicated.  Since you had a full backup, that's really the road of least resistance in this kind of case.


One last thing and on a sidenote:

 

What would be the recommended way to restore my backup? NB: it's around 4TB of data on an external USB3 box with an NTFS filesystem. I could abuse the "Backup" GUI option (a kind of backwards rsync, *sigh*), do a trivial copy-paste in the NAS' file browser (will this also copy hidden/system files? At least they are not shown there), or probably even rsync in an SSH root shell, e.g. something like:

su
rsync -avh /media/USB_HDD_1/home/ /data/home &

to restore the home folders (I don't know if activating checksums with -c is really necessary). At least @eton had success doing it that way (see his post in https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/How-to-restore-a-nas-with-rsync/td-..., I modified the directories to match my RN214). I would somehow need to send the command to the background, since I don't want to keep the SSH console window open for hours. Dragging the files over the network from a remote PC would be the very last resort.

 

Thanks again!

Model: RN21400|ReadyNAS 214 Series 4- Bay (Diskless)
Message 5 of 13
VolkerB
Aspirant

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working

Addendum: just sending rsync to the background with "&" does not work. The process will stop once the SSH terminal window is closed. Some people had success with nohup, redirecting stdin, stdout and stderr to /dev/null. This seems quite complicated to me.
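
For reference, the nohup variant would be something along these lines (a sketch only, not tested on the RN214; the log file name is just an example):

# survive the SSH session closing: detach from the terminal,
# read stdin from /dev/null and append all output to a log
nohup rsync -avh /media/USB_HDD_1/home/ /data/home </dev/null >>~/rsync.log 2>&1 &

# later: check whether it is still running and follow the log
pgrep -af rsync
tail -f ~/rsync.log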

 

Then there is the faction that recommends installing the screen package with

apt-get update
apt-get install screen

and then

screen -S rsync
rsync -avh /media/USB_HDD_1/home/ /data/home

in your SSH terminal to create a screen session named "rsync", start the restore, and hit CTRL+a, d to detach the session. Once the restore is finished, you would run

screen -r rsync

in the SSH terminal to reconnect.

 

I would probably send stdout and stderr to a file, so there is a way of diagnosing things in case something goes wrong:

rsync -avh /media/USB_HDD_1/home/ /data/home >~/rsync.log 2>&1
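
The two approaches should combine nicely, by the way: inside the screen session, piping through tee would let me watch progress and keep the log at the same time (an untested sketch):

screen -S rsync
rsync -avh /media/USB_HDD_1/home/ /data/home 2>&1 | tee ~/rsync.log
# detach with CTRL+a, d as before; reattach with "screen -r rsync"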

 

What do you think? I'm a bit hesitant though to install $STUFF on my RN214 box - if that works at all...

Message 6 of 13
Sandshark
Sensei

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working

Restore using an rsync backup job will take a lot longer than just using a standard internal-to-internal one, which seems to be little more than a cp -a.
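
If you do it by hand over SSH instead of via a backup job, the plain-copy equivalent would be roughly the following (a sketch, reusing the paths from the rsync example above; the trailing "/." makes cp copy the directory's contents, hidden files included, without creating an extra subfolder):

# copy everything from the USB drive into the share, preserving permissions, dates and symlinks
cp -a /media/USB_HDD_1/home/. /data/home/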

Message 7 of 13
VolkerB
Aspirant

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working


@Sandshark wrote:

Restore using an rsync backup job will take a lot longer than just using a standard internal-to-internal one, which seems to be little more than a cp -a.


OK. I have just started a good plain old local copy job from the connected external USB box to a NAS share, forgoing any fancy checksums *). Let's hope the data arrives intact and the whole process is not going to take an eternity.

 

Thanks again!

 

*) ... for now. I can still rsync with -c later, use TotalCommander from the Windows PC, use Meld, or mess with hashes.

Message 8 of 13
StephenB
Guru

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working


@VolkerB wrote: I can still rsync -c later 

The rsync checksums are only needed when you're doing an incremental restore but think you might already have corrupted files on the target.  If the file

  • exists on the target
  • has the same size
  • has the same file date

then rsync compares the two checksums and updates the file if they differ. Checksums are used instead of direct comparisons to preserve bandwidth on the network link (which doesn't apply in your case anyway).
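
In rsync terms the difference boils down to the -c flag; a quick sketch with the paths used earlier in this thread:

# default quick check: a file is skipped if its size and modification time already match on the target
rsync -avh /media/USB_HDD_1/home/ /data/home

# -c: decide based on whole-file checksums instead of size+time, so silently corrupted
# files on the target get re-copied (much slower, since every file is read on both sides)
rsync -avhc /media/USB_HDD_1/home/ /data/home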

 

If your goal is to simply verify the folder contents after the backup, I suggest just using

diff -qr /path0 /path1

 

Message 9 of 13
VolkerB
Aspirant

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working


 @Sandshark wrote:

Did you at any point disconnect or power down the USB drive before the backup job said it was done?  I ask because I suspect that it continued after that.  But once the USB drive wasn't connected, it began to copy files to the mount point in the OS partion instead of the USB drive that should have been mounted there.  That'll fill the OS partition in a hurry.

OK, come to think of it, and to avoid trouble in the future... Please forgive me if the following is a potentially stupid question:

 

Assume I want to back up a couple of shares to an external USB drive connected to the ReadyNAS. I want to use rsync, because that allows hassle-free incremental updates. So I connect the drive and enable RSYNC R/W access for that drive in the admin page's share section, like this:

20210924_170406_capture.png

Now assume I want to create a backup job for all home shares, like this:

20210924_170815_capture.png

As destination, I point the NAS to the remote rsync server on 127.0.0.1 (which is essentially the local machine accessing the USB drive via rsync):

20210924_170928_capture.png

I exclude the /admin/snapshot directory, because it would confuse the NTFS filesystem, and I set the --delete option to remove remote files that no longer exist locally:

20210924_171027_capture.png

So far, so good.

 

This backup job seems to be bound to the share HDS5C3020ALA632 as the destination. Now assume that drive is not connected but I accidentally hit the backup button on the NAS. I hope that in this case it does NOT write to the OS partition but rather fails the operation in the first place, as any sane person (and every Linux OS I have worked with) would do.

 

Am I right? Thanks again for your patience. I'm getting a bit paranoid after three sleepless nights. ;-)

 

P.S.: I hope the screenshots show up in my post. I can see them while editing, but after posting there's just the yellow triangle...

Message 10 of 13
Sandshark
Sensei

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working

I've not tried it, but I think the answer may be that it will write to the OS partition in that case. I know rsync can't tell whether a directory that's intended as a mount point actually has something mounted to it, but does something in the NAS-specific software check first?

 

The screen grabs are there.  A moderator has to approve them, but that's obviously already taken place.

 

I think this is worth some time with my "sandbox" NAS, so I'll give it a try this weekend if I have the time. You may have more fully explained why so many users have issues with a too-full OS partition. We've seen instances that were clearly related to an unmounted USB drive, but I never thought about the backup button being a trigger.

 

Or, if you want to try it, do so with a share that has very little in it, so the OS partition won't fill up and you can still get in with SSH, check the content of the mount point with nothing mounted, and delete anything that is there.

Message 11 of 13
VolkerB
Aspirant

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working


@Sandshark wrote:

[external media absent, so rsync] will write to the OS partition in that case. I know rsync can't tell whether a directory that's intended as a mount point actually has something mounted to it, but does something in the NAS-specific software check first?

I vaguely remember that I had a couple of sanity checks in my backup.sh on the Linux box, and it was failing early if the share was not mounted. There were entries similar to

//rn214/media /media/rn214/media cifs noauto,users,credentials=/home/admin/.smbcredentials,iocharset=utf8,uid=1000,gid=1000,file_mode=0770,dir_mode=0770  0 0
//rn214/admin /media/rn214/admin cifs noauto,users,credentials=/home/anja/.smbcredentials,iocharset=utf8,uid=1000,gid=1000,file_mode=0770,dir_mode=0770 0 0

in fstab and I was checking the backup target directory with

if [ ! -d "$TARGET" ]; then
  echo "# Error: Target directory '$TARGET' does not exist." 2>&1 | tee -a "$LOG"
  exit -1
fi

which seemed to work.
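
Of course, a plain directory-exists test would not help on the NAS itself, since the mount point directory exists whether or not the drive is attached. A stricter guard would test for an actual mount; a sketch, assuming the drive appears under /media/<label> and that the standard mountpoint utility is available:

TARGET=/media/USB_HDD_1   # assumed label, adjust to the real one

# the directory exists even with nothing attached, so test for a mounted filesystem instead
if ! mountpoint -q "$TARGET"; then
  echo "Error: '$TARGET' is not a mounted filesystem, aborting backup." >&2
  exit 1
fi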


@Sandshark wrote:

You may have more fully explained why so many have issues with a too full OS partition.  We've seen instances that were clearly related to an unmounted USB drive, but I never thought about the backup button being a trigger.


IMHO that's the purpose of this button in the first place: allow a hassle-free, no-brainer backup.

  1. Connect external drive.
  2. Press button.
  3. Wait until process is finished/email received.
  4. Eject drive, disconnect.
  5. Enjoy life.

If doing so puts you in constant danger of filling your OS root volume to the brim and having to run a factory reset when SSH is not available, that would severely spoil my fun, to say the least. Losing all snapshots and spending more than a full day restoring 5TB of data is nothing I want to happen to me again anytime soon.

 

On a side note: I eventually came up with a quite elegant (OK, translation: not totally clumsy) way of backing up the entire data volume of the NAS with a single backup job. So assume a big disk is attached as ST8000VN0022 via USB, with rsync enabled:

20210924_185049_capture.png

Let's then declare volume:data as the rsync source and add the slash "/" to the path, so no useless "data" directory is created at the target:

20210924_185111_capture.png

The rsync target is of course the external HDD on localhost:

20210924_185117_capture.png

I had to enable the "Multiple Files Systems" option (otherwise the shares underneath "data" weren't included in the backup) and add "home/admin/snapshot" etc. to the rsync ignore list, because I had btrfs and smart snapshots enabled for the users' homes:

20210924_185124_capture.png

My dry test on a blank HDD worked well, all directories & files were equal. So the big question is: am I missing something totally obvious here, or is this a viable way to back up the contents of a ReadyNAS to an external disk of at least equal size?


@Sandshark wrote:

Or, if you want to try it, do so with a share that has very little in it, so the OS partition won't fill and you can still get in with SSH and check the content of the mount point with nothing mounted and delete anything that is there.


*LOL* Thanks, but no thanks. I really learned my lesson these days and don't have the guts for further experimentation. If this NAS misbehaves to a similar extent again, I'll store my data on punchcards, trust me.

Message 12 of 13
Sandshark
Sensei

Re: Admin page unavailable after cancelled backup job & hard reboot, shares are working

Since I don't actually back up to USB (I have a backup NAS), I can't be 100% sure, but you appear to have found all the "secrets" for accomplishing a full volume backup.

Message 13 of 13