Forum Discussion

Guide

Sep 21, 2016

Solved

RN104 immediately "out of memory 390" error after 6.2.2 -> 6.2.5 -> 6.4.2 upgrade

I decided to upgrade my 2+ year old 6.2.2 firmware to a newer stable release. The latest in the 6.4 branch seemed OK: it was about 9 months old, so I assumed it was proven. I wasn't wrong, I just missed all t...

Sep 23, 2016

I'm not out of the woods yet, but I've made considerable progress.

I was able to boot in "volume read-only" mode, which was reassuring. At least I'd be able to back up the data...

Next, I tried to find out what was actually happening in the system, so I logged in via SSH and watched the system utilization. As soon as the data volume was mounted, the btrfs-cleaner process appeared. It pegged the CPU at 100%, which wouldn't be a problem in itself, but it also consumed more and more memory, and when the system started to swap, it ground to a halt. I ran another test: booted in "volume read-only" mode, logged in as root and executed `mount -o remount,rw /data`. The outcome was the same: btrfs-cleaner ate all the memory, swapping started, the system halted. Again, the finger pointed at either snapshots (too many to process etc.) or quotas (a newly introduced feature).
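
For anyone wanting to watch the same symptom from an SSH session, here is a minimal sketch that prints available memory and swap in use. The function name `mem_snapshot` is made up for illustration, and it assumes the kernel exposes `MemAvailable` in `/proc/meminfo` (older kernels do not, in which case the first value will print as 0).

```shell
# Print available memory and swap in use (both in kB) from a meminfo-style file.
# The optional argument defaults to /proc/meminfo; passing another path is only
# there so the function can be tried on saved output.
mem_snapshot() {
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { total = $2 }
        $1 == "SwapFree:"     { free  = $2 }
        END { printf "avail_kb=%d swap_used_kb=%d\n", avail, total - free }
    ' "${1:-/proc/meminfo}"
}

# Example watch loop, left commented out so sourcing this has no side effects:
# while sleep 5; do date +%T; mem_snapshot; done
```

A steadily growing `swap_used_kb` right after the volume is remounted read-write would match the behaviour described above.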

After another "volume read-only" mode boot-up, I executed this:

mount -o remount,rw /data;btrfs quota disable /data

and the system stayed OK, no btrfs-cleaner, nothing. So I remounted all other btrfs filesystems (get the list with `mount | grep btrfs`). In my case:
- the list of btrfs volumes:-

/dev/md127 on /data type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=5,subvol=/)
/dev/md127 on /apps type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=257,subvol=/.apps)
/dev/md127 on /home type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=256,subvol=/home)
/dev/md127 on /var/ftp/home type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=256,subvol=/home)
/dev/md127 on /run/nfs4/data/Shared type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=275,subvol=/Shared)
/dev/md127 on /run/nfs4/home type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=256,subvol=/home)

- the commands to remount and disable quotas:-

mount -o remount,rw /home;btrfs quota disable /home
mount -o remount,rw /apps;btrfs quota disable /apps
mount -o remount,rw /var/ftp/home;btrfs quota disable /var/ftp/home
mount -o remount,rw /run/nfs4/home;btrfs quota disable /run/nfs4/home
mount -o remount,rw /run/nfs4/data/Shared;btrfs quota disable /run/nfs4/data/Shared

Note that /run/nfs4/data/Shared is my NFS share, so you almost certainly won't have it. Your list may be very different.
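
Since the list differs per system, a small dry-run helper could build the command pairs from the `mount` output instead of typing them by hand. This is only a sketch: the function names are hypothetical, it assumes mount points without spaces, and it only prints the commands rather than running them. It also joins the two commands with `&&` instead of `;`, so that the quota step is skipped if the remount fails; that is my choice, not what was typed above.

```shell
# Read `mount` output on stdin and print each btrfs mount point, one per line.
# Assumes the usual "<dev> on <dir> type <fstype> (<opts>)" layout.
btrfs_mountpoints() {
    awk '$2 == "on" && $4 == "type" && $5 == "btrfs" { print $3 }'
}

# Dry run: for each mount point on stdin, print (do not execute) the
# remount + quota-disable pair.
print_quota_disable_cmds() {
    while IFS= read -r mp; do
        printf 'mount -o remount,rw %s && btrfs quota disable %s\n' "$mp" "$mp"
    done
}

# Usage (prints commands only; review them before running any by hand):
# mount | btrfs_mountpoints | print_quota_disable_cmds
```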

Still no btrfs-cleaner, and the system was still running OK. So I removed all snapshots (note: I was not using them at all; your setup might be different, but you may need to do the same in order to restore the system).
How to remove snapshots: first you need a list of configs. They're named with numbers, and their count corresponds to the btrfs volume count:-

ls /etc/snapper/configs

You can also find out which config is for which volume:-

root@kostka:~# grep VOLUME /etc/snapper/configs/*
0:SUBVOLUME="/data/Backup"
1:SUBVOLUME="/data/Documents"
2:SUBVOLUME="/data/Music"
3:SUBVOLUME="/data/Pictures"
4:SUBVOLUME="/data/Shared"
5:SUBVOLUME="/data/Videos"

Then execute for each config:

snapper -c <config> list

It'll show you something like:

Type   | #   | Pre # | Date                             | User | Cleanup | Description | Userdata             
-------+-----+-------+----------------------------------+------+---------+-------------+----------------------
single | 0   |       |                                  | root |         | current     |                      
single | 13  |       | Thu 31 Mar 2016 12:00:41 AM CEST | root |         |             | snapshot=c_1459375241
single | 43  |       | Sat 30 Apr 2016 12:00:25 AM CEST | root |         |             | snapshot=c_1461967225
single | 74  |       | Tue 31 May 2016 12:00:49 AM CEST | root |         |             | snapshot=c_1464645648
single | 104 |       | Thu 30 Jun 2016 12:00:25 AM CEST | root |         |             | snapshot=c_1467237624
single | 134 |       | Sat 30 Jul 2016 12:00:20 AM CEST | root |         |             | snapshot=c_1469829619
single | 135 |       | Sun 31 Jul 2016 12:00:51 AM CEST | root |         |             | snapshot=c_1469916051
single | 141 |       | Sat 06 Aug 2016 12:00:58 AM CEST | root |         |             | snapshot=c_1470434458
single | 148 |       | Sat 13 Aug 2016 12:00:05 AM CEST | root |         |             | snapshot=c_1471039205
single | 155 |       | Sat 20 Aug 2016 12:00:25 AM CEST | root |         |             | snapshot=c_1471644025
single | 158 |       | Tue 23 Aug 2016 12:00:08 AM CEST | root |         |             | snapshot=c_1471903208
..
cut = there were daily snapshots
..
single | 181 |       | Thu 15 Sep 2016 12:00:21 AM CEST | root |         |             | snapshot=c_1473890421

Now delete all of them except the current one. The command is `snapper -c <config> delete 1-<no. of the latest snapshot>`, in my case e.g. `snapper -c 0 delete 1-181`. They were all zero-size, so it took less than a second.
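
To avoid reading the highest number off the table by eye for every config, the step can be sketched as a dry run that parses the `snapper list` output. The function names are hypothetical, and the parsing assumes the snapshot number is in the second `|`-separated column, as in the output shown above; other snapper versions may lay the table out differently.

```shell
# Read `snapper -c <config> list` output on stdin and print the highest
# snapshot number found in the second column (0 if there are no snapshots).
latest_snapshot() {
    awk -F'|' '
        NF > 2 {
            n = $2
            gsub(/[ \t]/, "", n)            # strip column padding
            if (n ~ /^[0-9]+$/ && n + 0 > max) max = n + 0
        }
        END { print max + 0 }
    '
}

# Dry run: print (do not execute) the delete command for one config,
# given that config's list output on stdin. Prints nothing if only the
# "current" entry (number 0) exists.
print_delete_cmd() {
    config=$1
    last=$(latest_snapshot)
    if [ "$last" -gt 0 ]; then
        printf 'snapper -c %s delete 1-%s\n' "$config" "$last"
    fi
}

# Usage:
# snapper -c 0 list | print_delete_cmd 0
```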

I then rebooted to normal mode. The current state of the system is:-
- Snapshots deleted and all snapshotting disabled.
- Quotas disabled. This is almost certainly unsupported by Netgear. Some things may be broken. A firmware upgrade may re-enable them, or fail because they are disabled, etc.
- I'm now balancing, defragging and scrubbing the volume. I haven't found out which order is best; the guidance in Netgear's KB #26941 is to scrub rarely, defrag occasionally and balance regularly. My 6 TB volume took 10.5 hours to balance and 4.5 hours to defrag, and the scrub took 13 hours to reach 30%.
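
As a rough worked example of what the scrub figure implies, the sketch below extrapolates total and remaining time from elapsed hours and percent done. It assumes progress is linear, which a real scrub may not be, and `estimate_hours` is just an illustrative name.

```shell
# Estimate total and remaining hours from elapsed hours ($1) and percent
# done ($2), assuming linear progress.
estimate_hours() {
    awk -v e="$1" -v p="$2" 'BEGIN {
        if (p <= 0 || p > 100) { print "percent must be in (0, 100]"; exit 1 }
        total = e * 100 / p
        printf "total=%.1f remaining=%.1f\n", total, total - e
    }'
}

# estimate_hours 13 30   # prints: total=43.3 remaining=30.3
```

So on these numbers the full scrub would take somewhere around 43 hours, if the rate holds.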

Next steps:-
- Verify data integrity.
- Enable quotas and see if the system comes up normally.

iany

Guide

Sep 21, 2016

From one of the more helpful posts (5th update at https://community.netgear.com/t5/Using-your-ReadyNAS/Readynas-104-won-t-boot-Error-354-out-of-memory-After-upgrade-to/td-p/1033804/page/8):

The Guide also mentions some common reasons why problems might be encountered:

Systems that are completely full.
Systems that have high filesystem fragmentation.
Systems that have large quantities of hourly, daily, monthly snapshots.

The first and last of these should be easy for you to verify before you update the firmware. The middle one may usually (but not always) be somewhat related to the other two, but advanced users could get a good indication by looking at the metadata usage in btrfs.log. If the metadata usage is huge then this would suggest that the way the system was configured and/or used was far from ideal.

It appears that all systems encountering this problem are affected by one or more of the issues described in these bullet points.

Some suggestions going forward would be to keep volume usage under 80%, run regular scheduled volume maintenance (defrag & balance) and to only use bit-rot protection and snapshots on shares which are suited to using those, not on every share.
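
A quick way to check the 80% suggestion before a firmware update could look like the sketch below. The function names are hypothetical, and it relies on `df -P`, whose numbers on btrfs are only approximate; the btrfs tools' own usage reports are probably a better guide for metadata.

```shell
# Read `df -P <path>` output on stdin and print the use percentage of the
# first filesystem line as a bare number.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage is at or above the limit (default 80, per the advice above).
check_usage() {
    limit=${1:-80}
    pct=$(usage_percent)
    if [ -z "$pct" ]; then
        echo "no df data"
        return 2
    fi
    if [ "$pct" -ge "$limit" ]; then
        echo "WARN: ${pct}% used (limit ${limit}%)"
        return 1
    fi
    echo "OK: ${pct}% used"
}

# Usage:
# df -P /data | check_usage 80
```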

iany

Guide

Sep 25, 2016

I enabled quotas again. N.B.: You don't need to set them up again; they were already set up and I had only disabled them, not removed their configuration.

btrfs quota enable /data
btrfs quota enable /home
btrfs quota enable /apps
btrfs quota enable /var/ftp/home
btrfs quota enable /run/nfs4/home
btrfs quota enable /run/nfs4/data/Shared

Then I rebooted and watched for unusual/unwanted processes, e.g. btrfs-cleaner eating my memory etc. ;) Long story short, the machine has now been running for 30 hours without any problems.

The last step was to set up a volume maintenance schedule:-
- Disk test seems to be an extended offline test, the kind you run with `smartctl -t long /dev/sdX`. I run this weekly.
- Balance will run monthly.
- Defrag will run quarterly.
- I don't run scrub as I don't use snapshots.

My last update here, I hope :)
- mdgm-ntgr
  NETGEAR Employee Retired
  Sep 25, 2016
  Scrubbing is there for if you use bit-rot protection not snapshots.
  - StephenB
    Guru - Experienced User
    Sep 26, 2016
    mdgm wrote:
    
    Scrubbing is there for if you use bit-rot protection not snapshots.
    
    Scrubbing still reads all the data on the disks even if bit-rot protection is off. So it does provide some assurance that the drives and the file system are ok.
    
    I schedule each of the functions - disk test, balance, defrag, and scrub - once every three months (spreading them out over the quarter).