Forum Discussion
Matthias1111
Jan 10, 2021 · Aspirant
Volume dead or inactive after balancing - works with readonly
Hello, my ReadyNAS 104 reports that the Volume is inactive or dead after I performed "balancing disks". The data volume is accessible in read-only mode (boot menu), but I am really concerned about t...
Matthias1111
Jan 12, 2021 · Aspirant
Sounds good. :-) PM sent.
Just some additional info on what I did in the meantime:
Thanks to this post, I identified a single snapshot that was not shown in the GUI but did show up via SSH:
btrfs subvolume list -s /data
ID 8763 gen 86996 cgen 60428 top level 324 otime 2020-06-20 11:00:20 path home/Tinchen/.snapshots/1767/snapshot
ID 12844 gen 89355 cgen 80343 top level 5 otime 2020-12-19 20:52:18 path Bilder
After deleting this snapshot, the calculated free disk space was accurate and also shown correctly in the GUI:
root@NAS:~# btrfs subvolume delete /data/home/Tinchen/.snapshots/1767/snapshot/
Delete subvolume (no-commit): '/data/home/Tinchen/.snapshots/1767/snapshot'
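Side note for anyone reading along (not ReadyNAS-specific documentation, just how btrfs-progs behaves): the "(no-commit)" in that output means the command returns before the cleanup transaction has been committed, so the reclaimed space can take a while to appear. If you want the delete to wait for the commit, there is a flag for that:
btrfs subvolume delete -c /data/home/Tinchen/.snapshots/1767/snapshot/   # -c waits for the transaction commit before returning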
Space before deleting the snapshot:
root@NAS:~# btrfs fi usage /data
Overall:
Device size: 7.27TiB
Device allocated: 5.70TiB
Device unallocated: 1.56TiB
Device missing: 0.00B
Used: 5.12TiB
Free (estimated): 2.14TiB (min: 1.36TiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 74.03MiB (used: 0.00B)
Space after deleting the snapshot:
root@NAS:~# btrfs fi usage /data
Overall:
Device size: 7.27TiB
Device allocated: 5.70TiB
Device unallocated: 1.56TiB
Device missing: 0.00B
Used: 3.17TiB
Free (estimated): 4.10TiB (min: 3.31TiB)
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 57.19MiB (used: 0.00B)
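For reference (my own addition, and it assumes quotas are enabled on the volume): the space each snapshot holds exclusively can also be inspected directly, which helps when deciding which snapshots are worth deleting. The qgroup IDs of the form 0/<ID> correspond to the subvolume IDs shown by btrfs subvolume list:
btrfs qgroup show /data   # rfer = data referenced by the subvolume, excl = data exclusive to that subvolume/snapshot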
The only thing that looks a little strange is that 1.56 TiB of unallocated space is listed. But maybe that's by design?
Finally, I also ran a disk test. I just got the message "Volume: Disk test completed for volume data." without any error message or anything like that. That's good, right?
rn_enthusiast
Jan 12, 2021 · Virtuoso
Thanks for the logs, Matthias1111.
The NAS experienced several out-of-memory conditions, which likely caused the issue/crash in the end.
It also seems to have been induced by quota calculation. Example below. This happened over and over, by the way.
Jan 10 00:12:18 NAS kernel: Hardware name: Marvell Armada 370/XP (Device Tree)
Jan 10 00:12:18 NAS kernel: [<c0015270>] (unwind_backtrace) from [<c001173c>] (show_stack+0x10/0x18)
Jan 10 00:12:18 NAS kernel: [<c001173c>] (show_stack) from [<c03849d0>] (dump_stack+0x78/0x9c)
Jan 10 00:12:18 NAS kernel: [<c03849d0>] (dump_stack) from [<c00d5e20>] (dump_header+0x4c/0x1b4)
Jan 10 00:12:18 NAS kernel: [<c00d5e20>] (dump_header) from [<c00a09a0>] (oom_kill_process+0xd0/0x45c)
Jan 10 00:12:18 NAS kernel: [<c00a09a0>] (oom_kill_process) from [<c00a10b0>] (out_of_memory+0x310/0x374)
Jan 10 00:12:18 NAS kernel: [<c00a10b0>] (out_of_memory) from [<c00a49d4>] (__alloc_pages_nodemask+0x6e0/0x7dc)
Jan 10 00:12:18 NAS kernel: [<c00a49d4>] (__alloc_pages_nodemask) from [<c00cb4c0>] (__read_swap_cache_async+0x70/0x1a0)
Jan 10 00:12:18 NAS kernel: [<c00cb4c0>] (__read_swap_cache_async) from [<c00cb600>] (read_swap_cache_async+0x10/0x34)
Jan 10 00:12:18 NAS kernel: [<c00cb600>] (read_swap_cache_async) from [<c00cb788>] (swapin_readahead+0x164/0x17c)
Jan 10 00:12:18 NAS kernel: [<c00cb788>] (swapin_readahead) from [<c00bd4fc>] (handle_mm_fault+0x83c/0xc04)
Jan 10 00:12:18 NAS kernel: [<c00bd4fc>] (handle_mm_fault) from [<c0017cb8>] (do_page_fault+0x134/0x2b0)
Jan 10 00:12:18 NAS kernel: [<c0017cb8>] (do_page_fault) from [<c00092b0>] (do_DataAbort+0x34/0xb8)
Jan 10 00:12:18 NAS kernel: [<c00092b0>] (do_DataAbort) from [<c00123fc>] (__dabt_usr+0x3c/0x40)
Jan 10 00:12:18 NAS kernel: Out of memory: Kill process 1113 (mount) score 1 or sacrifice child
Jan 10 00:12:18 NAS kernel: Killed process 1113 (mount) total-vm:5400kB, anon-rss:0kB, file-rss:1764kB
Jan 10 00:12:18 NAS kernel: mount: page allocation failure: order:0, mode:0x2600040
Jan 10 00:12:18 NAS kernel: CPU: 0 PID: 1113 Comm: mount Tainted: P O 4.4.190.armada.1 #1
Jan 10 00:12:18 NAS kernel: Hardware name: Marvell Armada 370/XP (Device Tree)
Jan 10 00:12:18 NAS kernel: [<c0015270>] (unwind_backtrace) from [<c001173c>] (show_stack+0x10/0x18)
Jan 10 00:12:18 NAS kernel: [<c001173c>] (show_stack) from [<c03849d0>] (dump_stack+0x78/0x9c)
Jan 10 00:12:18 NAS kernel: [<c03849d0>] (dump_stack) from [<c00a2570>] (warn_alloc_failed+0xec/0x118)
Jan 10 00:12:18 NAS kernel: [<c00a2570>] (warn_alloc_failed) from [<c00a4a44>] (__alloc_pages_nodemask+0x750/0x7dc)
Jan 10 00:12:18 NAS kernel: [<c00a4a44>] (__alloc_pages_nodemask) from [<c00d0d58>] (allocate_slab+0x88/0x280)
Jan 10 00:12:18 NAS kernel: [<c00d0d58>] (allocate_slab) from [<c00d253c>] (___slab_alloc.constprop.13+0x250/0x35c)
Jan 10 00:12:18 NAS kernel: [<c00d253c>] (___slab_alloc.constprop.13) from [<c00d2828>] (kmem_cache_alloc+0xac/0x168)
Jan 10 00:12:18 NAS kernel: [<c00d2828>] (kmem_cache_alloc) from [<c0306114>] (ulist_alloc+0x1c/0x54)
Jan 10 00:12:18 NAS kernel: [<c0306114>] (ulist_alloc) from [<c03040d0>] (resolve_indirect_refs+0x1c/0x6d4)
Jan 10 00:12:18 NAS kernel: [<c03040d0>] (resolve_indirect_refs) from [<c0304b54>] (find_parent_nodes+0x3cc/0x6b0)
Jan 10 00:12:18 NAS kernel: [<c0304b54>] (find_parent_nodes) from [<c0304eb8>] (btrfs_find_all_roots_safe+0x80/0xfc)
Jan 10 00:12:18 NAS kernel: [<c0304eb8>] (btrfs_find_all_roots_safe) from [<c0304f7c>] (btrfs_find_all_roots+0x48/0x6c)
Jan 10 00:12:18 NAS kernel: [<c0304f7c>] (btrfs_find_all_roots) from [<c03089ec>] (btrfs_qgroup_prepare_account_extents+0x58/0xa0)
Jan 10 00:12:18 NAS kernel: [<c03089ec>] (btrfs_qgroup_prepare_account_extents) from [<c029b714>] (btrfs_commit_transaction+0x49c/0x9b4)
Jan 10 00:12:18 NAS kernel: [<c029b714>] (btrfs_commit_transaction) from [<c0284ac4>] (btrfs_drop_snapshot+0x420/0x6bc)
Jan 10 00:12:18 NAS kernel: [<c0284ac4>] (btrfs_drop_snapshot) from [<c02f6338>] (merge_reloc_roots+0x120/0x220)
Jan 10 00:12:18 NAS kernel: [<c02f6338>] (merge_reloc_roots) from [<c02f7138>] (btrfs_recover_relocation+0x2c8/0x370)
Jan 10 00:12:18 NAS kernel: [<c02f7138>] (btrfs_recover_relocation) from [<c0298f00>] (open_ctree+0x1df0/0x2168)
Jan 10 00:12:18 NAS kernel: [<c0298f00>] (open_ctree) from [<c026f578>] (btrfs_mount+0x458/0x690)
Jan 10 00:12:18 NAS kernel: [<c026f578>] (btrfs_mount) from [<c00dbdc0>] (mount_fs+0x6c/0x14c)
Jan 10 00:12:18 NAS kernel: [<c00dbdc0>] (mount_fs) from [<c00f4490>] (vfs_kern_mount+0x4c/0xf0)
Jan 10 00:12:18 NAS kernel: [<c00f4490>] (vfs_kern_mount) from [<c026ea68>] (mount_subvol+0xf4/0x7ac)
Jan 10 00:12:18 NAS kernel: [<c026ea68>] (mount_subvol) from [<c026f2f4>] (btrfs_mount+0x1d4/0x690)
Jan 10 00:12:18 NAS kernel: [<c026f2f4>] (btrfs_mount) from [<c00dbdc0>] (mount_fs+0x6c/0x14c)
Jan 10 00:12:18 NAS kernel: [<c00dbdc0>] (mount_fs) from [<c00f4490>] (vfs_kern_mount+0x4c/0xf0)
Jan 10 00:12:18 NAS kernel: [<c00f4490>] (vfs_kern_mount) from [<c00f71a4>] (do_mount+0xa30/0xb60)
Jan 10 00:12:18 NAS kernel: [<c00f71a4>] (do_mount) from [<c00f74fc>] (SyS_mount+0x70/0xa0)
Jan 10 00:12:18 NAS kernel: [<c00f74fc>] (SyS_mount) from [<c000ec40>] (ret_fast_syscall+0x0/0x40)
It is a well-established fact that quotas carry a lot of calculation overhead when deleting snapshots. The RN104 has 512 MB of RAM, so it is already resource-starved, and deleting many snapshots in a row, or deleting big snapshots, can tip the unit over the edge.
You can run into a race condition where the filesystem needs to update (commit btrfs transactions) but the quota module has hogged all the resources, and then you end up in limbo. The OS re-install disabled quotas (which I didn't know it would do), and that likely allowed the filesystem to actually finish the clean-up. That explains why the OS re-install "fixed" it... it didn't really fix anything; it just disabled quotas, and that left room for the other filesystem transactions to go through.
I would advise keeping a lower number of snapshots in general on these units. Disabling quotas before you delete snapshots, or when running things like a balance, will help keep the unit afloat. Then just re-enable quotas afterwards.
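For anyone who wants to do that over SSH, here is a rough sketch of the sequence (the admin GUI is the supported way to toggle quotas and run a balance on a ReadyNAS, so treat these raw btrfs commands as an illustration only; /data and the snapshot path are placeholders):
btrfs quota disable /data                          # turn quotas off before the heavy work
btrfs subvolume delete /data/path/to/old-snapshot  # delete snapshots as needed (placeholder path)
btrfs balance start /data                          # full balance; or let the scheduled/GUI balance run instead
btrfs quota enable /data                           # re-enable afterwards; this typically starts a quota rescan, which itself takes time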
Matthias1111
Jan 12, 2021 · Aspirant
Hello rn_enthusiast,
thanks a lot for analyzing my logs and for the very good explanation. Now that I have also taken a closer look at the kernel log shortly before the crash (Jan 9 21:52), I fully agree with you. The NAS is practically "screaming" for more memory here. I didn't see this the first time around because I was only looking for btrfs errors.
To be honest, I simply was not aware that balancing with quotas turned on (which, as far as I can tell, is the default) could cause such a memory issue in combination with many snapshots. My expectation of Netgear would be that they test their devices under all circumstances and, for example, limit the number of snapshots the system or user can take. But that is of course no criticism of you.
Once again, thanks for your help. I'll follow your advice in the future, and I'm happy that there is no need to factory-reset the device.
Best, Matthias
rn_enthusiast
Jan 12, 2021 · Virtuoso
No problem :)
I agree that the RN102 and RN104 are underpowered. I think Netgear realised this quickly enough and thus bumped up the specs on the newer RN202/204 and RN212/214 models, while letting the 100 series go "end-of-life" (they stopped making them a good few years back). I didn't spot any signs of data corruption in the logs, so that is good. Here is a link that tells a little bit more about BTRFS quotas:
https://btrfs.wiki.kernel.org/index.php/Quota_support
There is a "known issues" section that is worth reading. Good luck onwards.