Michael_Oz
Feb 21, 2019 · Luminary
Hang/Crash after 6.9.5 upgrade - Defrag
I recently upgraded to 6.9.5, 9th Feb.
I have Defrag (& tests & balance) scheduled on two volumes, they have run without issues on 6.9.3 forever.
A disk test of V1 completed 19th Feb.
A Defrag ...
JohnCM_S
Feb 22, 2019 · NETGEAR Employee Retired
Hi Michael_Oz,
You may provide the logs to me. Please upload them to Google Drive, then PM me the download link.
Regards,
Michael_Oz
Feb 22, 2019 · Luminary
Thanks John. Logs PM'd.
I have disabled volume schedules.
I ran disk test on both volumes, both completed, no errors.
- JohnCM_S · Feb 27, 2019 · NETGEAR Employee Retired
Thank you for providing the logs. We will look at the logs soon.
- JohnCM_S · Mar 05, 2019 · NETGEAR Employee Retired
Hi Michael_Oz,
There is a lot of metadata on your NAS and the unit only has 2GB RAM.
=== filesystem /RN316AV1 ===
Data, single: total=14.03TiB, used=14.01TiB
System, DUP: total=32.00MiB, used=1.84MiB
Metadata, DUP: total=29.50GiB, used=24.90GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
It probably does not have enough RAM and swap to accomplish the cleanup. L3 has cancelled the defrag.
This issue can be resolved by backing up, doing a factory default and restoring the data to a fresh clean volume.
Regards,
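For anyone wanting to read the same figures on their own unit, they come from the btrfs space report and can be pulled over SSH. A minimal sketch, assuming SSH is enabled and using the /RN316AV1 mount point shown above (substitute your own volume's mount point):

# Per-chunk allocation: Data, Metadata, System and GlobalReserve
btrfs filesystem df /RN316AV1

# Wider view, including unallocated space on each member disk
btrfs filesystem usage /RN316AV1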
- Hopchen · Mar 05, 2019 · Prodigy
That is a lot of metadata for a volume that size. Do you remember when you did a factory reset last, and what firmware version you were on?
- Michael_Oz · Mar 05, 2019 · Luminary
Thanks John.
> L3 has cancelled the defrag.
What does that mean? The system was hung the first time and OOM the second.
One would think software would maintain its metadata.
As I mentioned it had been happily defragging on schedule UNTIL the first defrag on 6.9.5.
Now it COULD be that it just happened to have that one extra nybble to put it over the edge the day after the upgrade, but Occam's razor says no.
> backing up, doing a factory default and restoring the data to a fresh clean volume.
Is this going into the Product Brochure? 'BTW you will need to reset & reload your NAS every year or so'
That volume was created 2017/08/05 (on 6.7.5) with data reloaded from a backup of a previous volume. It, and the NAS, is basically idle 99.9% of the time; the most it does is take & delete empty snapshots and ponder the meaning of UPS signals.
It appears the metadata is volume specific, so surely deleting/recreating the volume is all that is required?
Why factory reset? (and hence need to do the other volume too)
I had also regularly scrubbed the volumes, until a while ago.
I started a scrub 6 days ago which is now at 10%, is a scrub going to tidy up the metadata?
If not, surely a metadata_tidyup routine is called for rather than a factory reset; that is not a 21st century solution.
I'm monitoring memory use, it is growing slightly, but is only ~30%.
> That is a lot of metadata for a volume that size. Do you remember when you did a factory reset last, and what firmware version you were on?
I don't think I have factory reset since a couple shortly after getting the unit.
[2014/11/03 22:12:40] Factory default initiated due to new disks (no RAID, no partitions)!
[2014/11/03 22:12:58] Defaulting to X-RAID2 mode, RAID level 1
[2014/11/03 22:13:13] Factory default initiated on ReadyNASOS 6.1.6.
[2015/09/18 21:15:14] Updated from ReadyNASOS 6.1.6 to 6.1.6.
[2015/09/25 01:31:45] Updated from ReadyNASOS 6.1.6 () to 6.2.4 (ReadyNASOS).
[2016/03/04 22:44:33] Updated from ReadyNASOS 6.2.4 (ReadyNASOS) to 6.4.2 (ReadyNASOS).
[2016/05/25 16:24:48] Updated from ReadyNASOS 6.4.2 (ReadyNASOS) to 6.5.0 (ReadyNASOS).
[2016/07/12 18:46:10] Updated from ReadyNASOS 6.5.0 (ReadyNASOS) to 6.5.1 (ReadyNASOS).
[2016/11/11 13:35:30] Updated from ReadyNASOS 6.5.1 (ReadyNASOS) to 6.6.0 (ReadyNASOS).
[2016/11/17 23:05:03] Updated from ReadyNASOS 6.6.0 (ReadyNASOS) to 6.6.1-T200 (Beta 1).
[2016/12/20 14:54:50] Updated from ReadyNASOS 6.6.1-T200 (Beta 1) to 6.6.1-T220 (Beta 3).
[2017/01/12 14:18:27] Updated from ReadyNASOS 6.6.1-T220 (Beta 3) to 6.6.1 (ReadyNASOS).
[2017/03/02 07:04:04 UTC] Updated from ReadyNASOS 6.6.1 (ReadyNASOS) to 6.7.0-T169 (Beta 2).
[2017/03/03 06:47:43 UTC] Updated from ReadyNASOS 6.7.0-T169 (Beta 2) to 6.7.0-T172 (ReadyNASOS).
[2017/03/03 23:44:36 UTC] Updated from ReadyNASOS 6.7.0-T172 (ReadyNASOS) to 6.7.0-T180 (Beta 3).
[2017/03/16 09:47:38 UTC] Updated from ReadyNASOS 6.7.0-T180 (Beta 3) to 6.7.0-T206 (Beta 4).
[2017/03/29 23:56:09 UTC] Updated from ReadyNASOS 6.7.0-T206 (Beta 4) to 6.7.0 (ReadyNASOS).
[2017/05/15 06:42:29 UTC] Updated from ReadyNASOS 6.7.0 (ReadyNASOS) to 6.7.1 (ReadyNASOS).
[2017/06/11 02:41:54 UTC] Updated from ReadyNASOS 6.7.1 (ReadyNASOS) to 6.7.4 (ReadyNASOS).
[2017/07/14 07:02:22 UTC] Updated from ReadyNASOS 6.7.4 (ReadyNASOS) to 6.7.5 (ReadyNASOS).
[2017/09/30 01:38:36 UTC] Updated from ReadyNASOS 6.7.5 (ReadyNASOS) to 6.8.1 (ReadyNASOS).
[2018/03/30 05:23:48 UTC] Updated from ReadyNASOS 6.8.1 (ReadyNASOS) to 6.9.3 (ReadyNASOS).
[2019/02/08 23:09:27 UTC] Updated from ReadyNASOS 6.9.3 (ReadyNASOS) to 6.9.5 (ReadyNASOS).
- StephenB · Mar 06, 2019 · Guru - Experienced User
Michael_Oz wrote:
It appears the metadata is volume specific, so surely deleting/recreating the volume is all that is required?
Why factory reset? (and hence need to do the other volume too)
If the other volume doesn't have the problem, then deleting/recreating this volume should be enough.
Michael_Oz wrote:
As I mentioned it had been happily defragging on schedule UNTIL the first defrag on 6.9.5.
Now it COULD be that it just happened to have that one extra nybble to put it over the edge the day after the upgrade, but Occam's razor says no.
Personally I don't see this as particularly relevant. The metadata growth almost certainly happened over time. Almost certainly the upgrade to 6.9.5 also contributed to the timing of the failure, but the issue was hiding under the surface already. The failure mode is also unusual - not something I recall seeing here before.
But it also appears to me that the volume is essentially completely full:
Data, single: total=14.03TiB, used=14.01TiB
Another thing you could try is to offload as many files as you can (deleting them from the volume). You could then try doing a balance (not a scrub). If you have ssh enabled, you can do a "partial" balance - which would complete more quickly, and not require as much memory. If that frees up enough space, you could follow that up with a full balance.
FWIW, I would use ssh here, as that does give more control over options, and more ability to cancel operations. Though that does depend on your linux skills.
Michael_Oz wrote:
surely a metadata_tidyup routine ...
FWIW, the BTRFS folks seem to agree with you there
https://btrfs.wiki.kernel.org/index.php/FAQ wrote:
If you have full up metadata, and more than 1 GiB of space free in data, as reported by btrfs fi df, then you should be able to free up some of the data allocation with a partial balance:
# btrfs balance start /mountpoint -dlimit=3
We know this isn't ideal, and there are plans to improve the behavior.
Note you might not have more than 1 GiB of free space in data.
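As a concrete illustration of the partial-balance route StephenB and the FAQ describe, an SSH session might look roughly like the sketch below. It is a sketch only: the mount point is taken from the output quoted earlier, and the -dlimit / -dusage values are examples to tune, not recommendations.

# Check how much free space remains in the data allocation first
btrfs filesystem df /RN316AV1

# Partial balance: relocate at most 3 data chunks (the FAQ's example)
btrfs balance start -dlimit=3 /RN316AV1

# Alternatively, only rewrite data chunks that are under 5% used;
# this is usually quick and needs far less memory than a full balance
btrfs balance start -dusage=5 /RN316AV1

# Watch progress, or cancel cleanly if memory use climbs too far
btrfs balance status /RN316AV1
btrfs balance cancel /RN316AV1

Running it interactively like this, rather than through the scheduled job, also makes it easy to keep an eye on memory with free or top and to stop before the NAS gets into trouble.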
- Michael_Oz · Mar 27, 2019 · Luminary
JohnCM_S wrote:
Hi Michael_Oz,
There is a lot of metadata on your NAS and the unit only has 2GB RAM.
=== filesystem /RN316AV1 ===
Data, single: total=14.03TiB, used=14.01TiB
System, DUP: total=32.00MiB, used=1.84MiB
Metadata, DUP: total=29.50GiB, used=24.90GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
It probably does not have enough RAM and swap to accomplish the cleanup. L3 has cancelled the defrag.
This issue can be resolved by backing up, doing a factory default and restoring the data to a fresh clean volume.
Regards,
For the record.
I backed up, deleted & recreated the volume, no factory default, restored.
Note that as a solution it is a PITA: as well as the time & effort, you lose all your backup & restore jobs.
Defrag now runs to completion.
Size info:
Data, single: total=8.19TiB, used=8.12TiB
System, DUP: total=64.00MiB, used=960.00KiB
Metadata, DUP: total=10.00GiB, used=5.59GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Note the previous data:
Data, single: total=14.03TiB, used=14.01TiB
the difference must be the snapshots.
I previously configured snapshots everywhere, as a nice to have.
I will only be doing it selectively now.
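As an aside, the snapshots behind that extra space can be listed over SSH before deciding which shares keep them. A sketch, assuming the per-share snapshots are ordinary btrfs snapshots under the volume's mount point (the path is the one quoted earlier):

# List only snapshot subvolumes on the data volume; per-share
# ReadyNAS snapshots should show up here with their paths
btrfs subvolume list -s /RN316AV1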
- StephenB · Mar 28, 2019 · Guru - Experienced User
Michael_Oz wrote:
I previously configured snapshots everywhere, as a nice to have.
I will only be doing it selectively now.
I use them on almost every share. Shares that have a lot of "churn" should have snapshots disabled - in particular shares where there are a lot of file deletions and/or files that are modified in place (torrents or databases).
I suggest using custom snapshots, and then explicitly setting retention. With the default "smart" snapshots, the monthly snapshots are never pruned, and over time they will take up a lot of space.
I've found that using 3 months retention on most shares (and 2 weeks retention on one share used for PC image backups), I end up with about 5% of the available space going for snapshots. So you could start with those values, and tweak them as desired to control the overall space. I also check the "only make snapshots on changes" option.
BTW, when you defrag the share that has snapshots, the storage needed for the snapshots will often rise. When a file that is in a snapshot and a share is modified, BTRFS will fragment the copy in the main folder. That allows the unmodified blocks to be held in common by the snapshots and the main folder. If you defrag that file, then the unmodified blocks aren't held in common any longer, so the storage needed by the snapshot goes up.
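To make that trade-off concrete, a manual defragment of a single share over SSH might look like the sketch below. The share path is hypothetical, and the same caveat applies: extents currently shared with snapshots get rewritten, so the space charged to those snapshots can grow afterwards.

# Recursively defragment one share (hypothetical path); blocks shared
# with existing snapshots are duplicated in the process
btrfs filesystem defragment -r /RN316AV1/some-share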