

iany
Guide
Sep 21, 2016
Solved

RN104 immediately "out of memory 390" error after 6.2.2 -> 6.2.5 -> 6.4.2 upgrade

I decided to upgrade my 2+ year-old 6.2.2 firmware to a newer stable one. The latest release in the 6.4 branch seemed OK: about 9 months old, so I assumed it had been proven. I wasn't wrong, I just missed all t...
    iany
    Sep 23, 2016

    I'm not out of the woods yet, but I've made considerable progress.

    I was able to boot in "volume read-only" mode, which was reassuring. At least I'd be able to back up the data...

    So I tried to find out what was actually happening. I logged in via SSH and watched system utilization. As soon as the data volume was mounted, a btrfs-cleaner process appeared. It pegged the CPU at 100%, which wouldn't be a problem in itself, but it also consumed more and more memory, and once the system started to swap, it ground to a halt. I ran another test: I booted in "volume read-only" mode, logged in as root and executed `mount -o remount,rw /data`. The outcome was the same: btrfs-cleaner ate all the memory, swapping started and the system halted. Again, the finger pointed at either snapshots (too many to process, etc.) or quotas (a newly introduced feature).
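    To watch this from a second SSH session while remounting, a minimal sketch like the one below can help. It assumes a Linux /proc/meminfo with MemFree and SwapFree fields and that pgrep is available; the function names are hypothetical helpers, not ReadyNAS tools.

```shell
#!/bin/bash
# Hypothetical helper: print free RAM and free swap (kB) from a meminfo file.
# Defaults to /proc/meminfo; another path can be passed for testing.
mem_swap_free_kb() {
    local f="${1:-/proc/meminfo}"
    awk '/^MemFree:/ {m=$2} /^SwapFree:/ {s=$2} END {print m, s}' "$f"
}

# Hypothetical helper: report whether a btrfs-cleaner kernel thread exists.
cleaner_running() {
    if pgrep -x btrfs-cleaner >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Sample every N seconds (default 5) until interrupted with Ctrl-C.
watch_cleaner() {
    local interval="${1:-5}"
    while true; do
        printf '%s cleaner=%s free/swap_free(kB)=%s\n' \
            "$(date +%T)" "$(cleaner_running)" "$(mem_swap_free_kb)"
        sleep "$interval"
    done
}
```

    Run `watch_cleaner 5` in one session and the remount in another; SwapFree steadily dropping while cleaner=yes would match the behaviour described above.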

    After another "volume read-only" mode boot-up, I executed this:

    mount -o remount,rw /data;btrfs quota disable /data

    and the system stayed OK: no btrfs-cleaner, nothing. So I remounted all the other btrfs mounts read-write and disabled quotas on each (get the list with `mount | grep btrfs`). In my case:
    - the list of btrfs mounts:-

    /dev/md127 on /data type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=5,subvol=/)
    /dev/md127 on /apps type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=257,subvol=/.apps)
    /dev/md127 on /home type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=256,subvol=/home)
    /dev/md127 on /var/ftp/home type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=256,subvol=/home)
    /dev/md127 on /run/nfs4/data/Shared type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=275,subvol=/Shared)
    /dev/md127 on /run/nfs4/home type btrfs (rw,noatime,nodiratime,nodatasum,nospace_cache,subvolid=256,subvol=/home)

    - the commands to remount and disable quotas:-

    mount -o remount,rw /home;btrfs quota disable /home
    mount -o remount,rw /apps;btrfs quota disable /apps
    mount -o remount,rw /var/ftp/home;btrfs quota disable /var/ftp/home
    mount -o remount,rw /run/nfs4/home;btrfs quota disable /run/nfs4/home
    mount -o remount,rw /run/nfs4/data/Shared;btrfs quota disable /run/nfs4/data/Shared

    Note that /run/nfs4/data/Shared is my NFS share, so you almost certainly won't have it. Your list may be very different.
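    Since the list differs between systems, the remount/quota-disable lines can be generated from `mount` output instead of typed by hand. A minimal sketch, assuming the `<device> on <mountpoint> type btrfs (...)` format shown above and mount points without spaces. It only prints the commands (a dry run), chains them with `&&` so quotas are only touched if the remount succeeded, and also emits a line for /data, which was already handled. btrfs quotas are a per-filesystem setting and all these mounts are subvolumes of /dev/md127, so the repeated `quota disable` calls are probably no-ops; the sketch keeps them to mirror the steps above.

```shell
#!/bin/bash
# Hypothetical helper: read `mount`-style lines on stdin and print, for each
# btrfs mount point, a remount + quota-disable command. Nothing is executed.
# Fields: $1=device $2="on" $3=mountpoint $4="type" $5=fstype
btrfs_fix_cmds() {
    awk '$2 == "on" && $4 == "type" && $5 == "btrfs" {
        printf "mount -o remount,rw %s && btrfs quota disable %s\n", $3, $3
    }'
}
```

    Review the output of `mount | btrfs_fix_cmds` first; once it looks right, it can be pasted into the root shell.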

    Still no btrfs-cleaner, and the system was still running OK. So I removed all snapshots (note: I was not using them at all; your setup might be different, but you may need to do the same in order to restore the system).
    How to remove snapshots: first you need the list of snapper configs. They're just numbers, one per share (btrfs subvolume):-

    ls /etc/snapper/configs

    You can also find out which config belongs to which subvolume:-

    root@kostka:~# grep VOLUME /etc/snapper/configs/*
    0:SUBVOLUME="/data/Backup"
    1:SUBVOLUME="/data/Documents"
    2:SUBVOLUME="/data/Music"
    3:SUBVOLUME="/data/Pictures"
    4:SUBVOLUME="/data/Shared"
    5:SUBVOLUME="/data/Videos"

    Then execute for each config:

    snapper -c <config> list
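    Rather than running that by hand per config, a small loop can list everything in one go. This is a sketch assuming, as above, that every file in /etc/snapper/configs is one config named after the file; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print snapper config names (the file names in the
# config directory; 0..5 on the system above).
snapper_configs() {
    local dir="${1:-/etc/snapper/configs}" f
    for f in "$dir"/*; do
        [ -f "$f" ] && basename "$f"
    done
}

# Run `snapper -c <config> list` for every config, with a header per config.
list_all_snapshots() {
    local c
    for c in $(snapper_configs "$1"); do
        echo "== config $c =="
        snapper -c "$c" list
    done
}
```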

    It'll show you something like:

    Type   | #   | Pre # | Date                             | User | Cleanup | Description | Userdata             
    -------+-----+-------+----------------------------------+------+---------+-------------+----------------------
    single | 0   |       |                                  | root |         | current     |                      
    single | 13  |       | Thu 31 Mar 2016 12:00:41 AM CEST | root |         |             | snapshot=c_1459375241
    single | 43  |       | Sat 30 Apr 2016 12:00:25 AM CEST | root |         |             | snapshot=c_1461967225
    single | 74  |       | Tue 31 May 2016 12:00:49 AM CEST | root |         |             | snapshot=c_1464645648
    single | 104 |       | Thu 30 Jun 2016 12:00:25 AM CEST | root |         |             | snapshot=c_1467237624
    single | 134 |       | Sat 30 Jul 2016 12:00:20 AM CEST | root |         |             | snapshot=c_1469829619
    single | 135 |       | Sun 31 Jul 2016 12:00:51 AM CEST | root |         |             | snapshot=c_1469916051
    single | 141 |       | Sat 06 Aug 2016 12:00:58 AM CEST | root |         |             | snapshot=c_1470434458
    single | 148 |       | Sat 13 Aug 2016 12:00:05 AM CEST | root |         |             | snapshot=c_1471039205
    single | 155 |       | Sat 20 Aug 2016 12:00:25 AM CEST | root |         |             | snapshot=c_1471644025
    single | 158 |       | Tue 23 Aug 2016 12:00:08 AM CEST | root |         |             | snapshot=c_1471903208
    ..
    cut = there were daily snapshots
    ..
    single | 181 |       | Thu 15 Sep 2016 12:00:21 AM CEST | root |         |             | snapshot=c_1473890421

    Now delete all of them except the current one (#0). The command is `snapper -c <config> delete 1-<number of the latest snapshot>`; in my case, e.g., `snapper -c 0 delete 1-181`. They were all zero-size, so it took less than a second.
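    The "latest snapshot number" can be read from the `list` output rather than by eye. A sketch, assuming the column layout shown above (Type first, # second, two header lines); newer snapper versions may order columns differently. It only prints the delete command, so nothing is removed until you run it.

```shell
#!/bin/bash
# Hypothetical helper: read `snapper -c <config> list` output on stdin and
# print the highest snapshot number from the "#" column (0 if none).
latest_snapshot() {
    awk -F'|' 'NR > 2 { n = $2 + 0; if (n > max) max = n } END { print max + 0 }'
}

# Print (dry run) the delete command for one config, or nothing if the
# config has no snapshots besides 0 ("current").
delete_cmd() {
    local c="$1" last
    last=$(snapper -c "$c" list | latest_snapshot)
    [ "$last" -gt 0 ] && echo "snapper -c $c delete 1-$last"
}
```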

    I then rebooted in normal mode. The current state of the system is:-
    - Snapshots deleted and all snapshotting disabled.
    - Quotas disabled. This is almost certainly unsupported by Netgear, so some things may be broken, and a firmware upgrade may re-enable quotas or fail because they are disabled.
    - I'm now balancing, defragging and scrubbing the volume. I haven't found the best order for these; the guidance in Netgear's KB #26941 is to scrub rarely, defrag occasionally and balance regularly. On my 6 TB volume the balance took 10.5 hours, the defrag 4.5 hours, and the scrub had only reached 30% after 13 hours.
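    For running these from the shell, a sketch like this times each step. It assumes the stock btrfs-progs subcommands (`balance start`, `filesystem defragment -r`, `scrub start -B`); what the ReadyNAS GUI actually runs may differ (it may use balance filters, for instance). The order is whatever you pass in, since no best order is known here, and it defaults to a dry run that only prints the commands.

```shell
#!/bin/bash
# Hypothetical helper: run (or, with the default DRY_RUN=1, just print)
# maintenance steps on a volume in the order given, timing each real run.
maintain() {
    local vol="$1" step cmd start
    shift
    for step in "$@"; do
        case "$step" in
            balance) cmd="btrfs balance start $vol" ;;
            defrag)  cmd="btrfs filesystem defragment -r $vol" ;;
            scrub)   cmd="btrfs scrub start -B $vol" ;;  # -B: stay in foreground
            *) echo "unknown step: $step" >&2; return 1 ;;
        esac
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            start=$(date +%s)
            $cmd || return 1   # unquoted on purpose: split into arguments
            echo "$step took $(( $(date +%s) - start )) s"
        fi
    done
}
```

    For example, `DRY_RUN=0 maintain /data balance defrag scrub` runs all three in that order; given the run times above, it's worth starting it inside screen or with nohup so an SSH disconnect doesn't interrupt it.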

    Next steps:-
    - Verify data integrity.
    - Enable quotas and see if the system comes up normally.
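    One way to verify data integrity is a checksum manifest: build it from a known-good copy (for example, a backup taken while in read-only mode) and check the NAS copy against it. This matters here because the mounts above use nodatasum, which disables btrfs data checksums for newly written files, so a scrub may not be able to verify much of the file data. A sketch assuming GNU coreutils/findutils (sha256sum, sort -z, xargs -r); keep the manifest outside the directory being hashed. The function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write a SHA-256 manifest (relative paths) of every
# regular file under a directory.
make_manifest() {
    local dir="$1" out="$2"
    (cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum) > "$out"
}

# Hypothetical helper: re-check a directory against a manifest. Prints only
# failures and returns non-zero if any file is missing or differs.
check_manifest() {
    local dir="$1" manifest="$2"
    # Manifest is read via stdin so a relative path still works after cd.
    (cd "$dir" && sha256sum --quiet -c -) < "$manifest"
}
```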
