Michael_Oz
Feb 21, 2019 · Luminary
Hang/Crash after 6.9.5 upgrade - Defrag
I recently upgraded to 6.9.5, 9th Feb.
I have Defrag (& tests & balance) scheduled on two volumes, they have run without issues on 6.9.3 forever.
A disk test of V1 completed 19th Feb.
A Defrag ...
JohnCM_S
Feb 22, 2019 · NETGEAR Employee Retired
Hi Michael_Oz,
You may provide the logs to me. Please upload them to Google Drive, then PM me the download link.
Regards,
Michael_Oz
Feb 22, 2019 · Luminary
Thanks John. Logs PM'd.
I have disabled volume schedules.
I ran disk test on both volumes, both completed, no errors.
- JohnCM_S · Feb 27, 2019 · NETGEAR Employee Retired
Thank you for providing the logs. We will look at the logs soon.
- JohnCM_S · Mar 05, 2019 · NETGEAR Employee Retired
Hi Michael_Oz,
There is a lot of metadata on your NAS and the unit only has 2GB RAM.
=== filesystem /RN316AV1 ===
Data, single: total=14.03TiB, used=14.01TiB
System, DUP: total=32.00MiB, used=1.84MiB
Metadata, DUP: total=29.50GiB, used=24.90GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
It probably does not have enough RAM and swap to accomplish the cleanup. L3 has cancelled the defrag.
This issue can be resolved by backing up, doing a factory default and restoring the data to a fresh clean volume.
Regards,
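For anyone wanting to read the same figures on their own unit, they come from the btrfs space report and can be pulled over SSH. A minimal sketch, assuming SSH is enabled and using the /RN316AV1 mount point shown above (substitute your own volume's mount point):

# Per-chunk allocation: Data, Metadata, System and GlobalReserve
btrfs filesystem df /RN316AV1

# Wider view, including unallocated space on each member disk
btrfs filesystem usage /RN316AV1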
- Hopchen · Mar 05, 2019 · Prodigy
That is a lot of metadata for a volume that size. Do you remember when you did a factory reset last, and what firmware version you were on?
- Michael_Oz · Mar 05, 2019 · Luminary
Thanks John.
> L3 has cancelled the defrag.
What does that mean? The system was hung the first time and OOM the second.
One would think software would maintain its metadata.
As I mentioned it had been happily defragging on schedule UNTIL the first defrag on 6.9.5.
Now it COULD be that it just happened to have that one extra nybble to put it over the edge the day after the upgrade, but Occam's razor says no.
> backing up, doing a factory default and restoring the data to a fresh clean volume.
Is this going into the Product Brochure? 'BTW you will need to reset & reload your NAS every year or so'
That volume was created 2017/08/05 (on 6.7.5) with data reloaded from a backup of a previous volume. It, and the NAS, is basically idle 99.9% of the time; the most it does is take & delete empty snapshots and ponder the meaning of UPS signals.
It appears the metadata is volume specific, so surely deleting/recreating the volume is all that is required?
Why factory reset? (and hence need to do the other volume too)
I had also regularly scrubbed the volumes, until a while ago.
I started a scrub 6 days ago which is now at 10%, is a scrub going to tidy up the metadata?
If not, surely a metadata_tidyup routine is called for rather than a factory reset; that is not a 21st century solution.
I'm monitoring memory use, it is growing slightly, but is only ~30%.
> That is a lot of metadata for a volume that size. Do you remember when you did a factory reset last, and what firmware version you were on?
I don't think I have factory reset since a couple shortly after getting the unit.
[2014/11/03 22:12:40] Factory default initiated due to new disks (no RAID, no partitions)!
[2014/11/03 22:12:58] Defaulting to X-RAID2 mode, RAID level 1
[2014/11/03 22:13:13] Factory default initiated on ReadyNASOS 6.1.6.
[2015/09/18 21:15:14] Updated from ReadyNASOS 6.1.6 to 6.1.6.
[2015/09/25 01:31:45] Updated from ReadyNASOS 6.1.6 () to 6.2.4 (ReadyNASOS).
[2016/03/04 22:44:33] Updated from ReadyNASOS 6.2.4 (ReadyNASOS) to 6.4.2 (ReadyNASOS).
[2016/05/25 16:24:48] Updated from ReadyNASOS 6.4.2 (ReadyNASOS) to 6.5.0 (ReadyNASOS).
[2016/07/12 18:46:10] Updated from ReadyNASOS 6.5.0 (ReadyNASOS) to 6.5.1 (ReadyNASOS).
[2016/11/11 13:35:30] Updated from ReadyNASOS 6.5.1 (ReadyNASOS) to 6.6.0 (ReadyNASOS).
[2016/11/17 23:05:03] Updated from ReadyNASOS 6.6.0 (ReadyNASOS) to 6.6.1-T200 (Beta 1).
[2016/12/20 14:54:50] Updated from ReadyNASOS 6.6.1-T200 (Beta 1) to 6.6.1-T220 (Beta 3).
[2017/01/12 14:18:27] Updated from ReadyNASOS 6.6.1-T220 (Beta 3) to 6.6.1 (ReadyNASOS).
[2017/03/02 07:04:04 UTC] Updated from ReadyNASOS 6.6.1 (ReadyNASOS) to 6.7.0-T169 (Beta 2).
[2017/03/03 06:47:43 UTC] Updated from ReadyNASOS 6.7.0-T169 (Beta 2) to 6.7.0-T172 (ReadyNASOS).
[2017/03/03 23:44:36 UTC] Updated from ReadyNASOS 6.7.0-T172 (ReadyNASOS) to 6.7.0-T180 (Beta 3).
[2017/03/16 09:47:38 UTC] Updated from ReadyNASOS 6.7.0-T180 (Beta 3) to 6.7.0-T206 (Beta 4).
[2017/03/29 23:56:09 UTC] Updated from ReadyNASOS 6.7.0-T206 (Beta 4) to 6.7.0 (ReadyNASOS).
[2017/05/15 06:42:29 UTC] Updated from ReadyNASOS 6.7.0 (ReadyNASOS) to 6.7.1 (ReadyNASOS).
[2017/06/11 02:41:54 UTC] Updated from ReadyNASOS 6.7.1 (ReadyNASOS) to 6.7.4 (ReadyNASOS).
[2017/07/14 07:02:22 UTC] Updated from ReadyNASOS 6.7.4 (ReadyNASOS) to 6.7.5 (ReadyNASOS).
[2017/09/30 01:38:36 UTC] Updated from ReadyNASOS 6.7.5 (ReadyNASOS) to 6.8.1 (ReadyNASOS).
[2018/03/30 05:23:48 UTC] Updated from ReadyNASOS 6.8.1 (ReadyNASOS) to 6.9.3 (ReadyNASOS).
[2019/02/08 23:09:27 UTC] Updated from ReadyNASOS 6.9.3 (ReadyNASOS) to 6.9.5 (ReadyNASOS).
- StephenB · Mar 06, 2019 · Guru - Experienced User
Michael_Oz wrote:
It appears the metadata is volume specific, so surely deleting/recreating the volume is all that is required?
Why factory reset? (and hence need to do the other volume too)
If the other volume doesn't have the problem, then deleting/recreating this volume should be enough.
Michael_Oz wrote:
As I mentioned it had been happily defragging on schedule UNTIL the first defrag on 6.9.5.
Now it COULD be that it just happened to have that one extra nybble to put it over the edge the day after the upgrade, but Occam's razor says no.
Personally I don't see this as particularly relevant. The metadata growth almost certainly happened over time. Almost certainly the upgrade to 6.9.5 also contributed to the timing of the failure, but the issue was hiding under the surface already. The failure mode is also unusual - not something I recall seeing here before.
But it also appears to me that the volume is essentially completely full:
Data, single: total=14.03TiB, used=14.01TiB
Another thing you could try is to offload as many files as you can (deleting them from the volume). You could then try doing a balance (not a scrub). If you have ssh enabled, you can do a "partial" balance - which would complete more quickly, and not require as much memory. If that frees up enough space, you could follow that up with a full balance.
FWIW, I would use ssh here, as that does give more control over options, and more ability to cancel operations. Though that does depend on your linux skills.
Michael_Oz wrote:
surely a metadata_tidyup routine ...
FWIW, the BTRFS folks seem to agree with you there
https://btrfs.wiki.kernel.org/index.php/FAQ wrote:
If you have full up metadata, and more than 1 GiB of space free in data, as reported by btrfs fi df, then you should be able to free up some of the data allocation with a partial balance:
# btrfs balance start /mountpoint -dlimit=3
We know this isn't ideal, and there are plans to improve the behavior.
Note you might not have more than 1 GiB of free space in data.
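As a concrete illustration of the partial-balance route StephenB and the FAQ describe, an SSH session might look roughly like the sketch below. It is a sketch only: the mount point is taken from the output quoted earlier, and the -dlimit / -dusage values are examples to tune, not recommendations.

# Check how much free space remains in the data allocation first
btrfs filesystem df /RN316AV1

# Partial balance: relocate at most 3 data chunks (the FAQ's example)
btrfs balance start -dlimit=3 /RN316AV1

# Alternatively, only rewrite data chunks that are under 5% used;
# this is usually quick and needs far less memory than a full balance
btrfs balance start -dusage=5 /RN316AV1

# Watch progress, or cancel cleanly if memory use climbs too far
btrfs balance status /RN316AV1
btrfs balance cancel /RN316AV1

Running it interactively like this, rather than through the scheduled job, also makes it easy to keep an eye on memory with free or top and to stop before the NAS gets into trouble.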
- Michael_Oz · Mar 27, 2019 · Luminary
JohnCM_S wrote:
Hi Michael_Oz,
There is a lot of metadata on your NAS and the unit only has 2GB RAM.
=== filesystem /RN316AV1 ===
Data, single: total=14.03TiB, used=14.01TiB
System, DUP: total=32.00MiB, used=1.84MiB
Metadata, DUP: total=29.50GiB, used=24.90GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
It probably does not have enough RAM and swap to accomplish the cleanup. L3 has cancelled the defrag.
This issue can be resolved by backing up, doing a factory default and restoring the data to a fresh clean volume.
Regards,
For the record.
I backed up, deleted & recreated the volume, no factory default, restored.
Note that as a solution it is a PITA: as well as the time & effort, you lose all your backup & restore jobs.
Defrag now runs to completion.
Size info:
Data, single: total=8.19TiB, used=8.12TiB
System, DUP: total=64.00MiB, used=960.00KiB
Metadata, DUP: total=10.00GiB, used=5.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Note the previous data:
Data, single: total=14.03TiB, used=14.01TiB
the difference must be the snapshots.
I previously configured snapshots everywhere, as a nice to have.
I will only be doing it selectively now.
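As an aside, the snapshots behind that extra space can be listed over SSH before deciding which shares keep them. A sketch, assuming the per-share snapshots are ordinary btrfs snapshots under the volume's mount point (the path is the one quoted earlier):

# List only snapshot subvolumes on the data volume; per-share
# ReadyNAS snapshots should show up here with their paths
btrfs subvolume list -s /RN316AV1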
- StephenB · Mar 28, 2019 · Guru - Experienced User
Michael_Oz wrote:
I previously configured snapshots everywhere, as a nice to have.
I will only be doing it selectively now.
I use them on almost every share. Shares that have a lot of "churn" should have snapshots disabled - in particular shares where there are a lot of file deletions and/or files that are modified in place (torrents or databases).
I suggest using custom snapshots, and then explicitly setting retention. With the default "smart" snapshots, the monthly snapshots are never pruned, and over time they will take up a lot of space.
I've found that using 3 months retention on most shares (and 2 weeks retention on one share used for PC image backups), I end up with about 5% of the available space going for snapshots. So you could start with those values, and tweak them as desired to control the overall space. I also check the "only make snapshots on changes" option.
BTW, when you defrag the share that has snapshots, the storage needed for the snapshots will often rise. When a file that is in a snapshot and a share is modified, BTRFS will fragment the copy in the main folder. That allows the unmodified blocks to be held in common by the snapshots and the main folder. If you defrag that file, then the unmodified blocks aren't held in common any longer, so the storage needed by the snapshot goes up.
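To make that trade-off concrete, a manual defragment of a single share over SSH might look like the sketch below. The share path is hypothetical, and the same caveat applies: extents currently shared with snapshots get rewritten, so the space charged to those snapshots can grow afterwards.

# Recursively defragment one share (hypothetical path); blocks shared
# with existing snapshots are duplicated in the process
btrfs filesystem defragment -r /RN316AV1/some-share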