
Forum Discussion

Matthias1111
Aspirant
Jan 10, 2021
Solved

Volume dead or inactive after balancing - works with read-only

Hello,

my ReadyNAS 104 reports that the volume is inactive or dead after I performed "balancing disks". The data volume is accessible in read-only mode (via the boot menu), but I am really concerned about the root cause, since all disks (3 HDDs in RAID 5) are reported healthy.

 

Is there anybody who can take a look at the logs (which I would send via PM)? I would really appreciate that.

Thanks in advance 

Matthias

  • Thanks for the logs Matthias1111 


    The NAS experienced several out-of-memory conditions, which likely caused the crash in the end.
    The OOM kills also seem to be induced by quota calculation. Example below. This happened over and over, by the way.

    Jan 10 00:12:18 NAS kernel: Hardware name: Marvell Armada 370/XP (Device Tree)
    Jan 10 00:12:18 NAS kernel: [<c0015270>] (unwind_backtrace) from [<c001173c>] (show_stack+0x10/0x18)
    Jan 10 00:12:18 NAS kernel: [<c001173c>] (show_stack) from [<c03849d0>] (dump_stack+0x78/0x9c)
    Jan 10 00:12:18 NAS kernel: [<c03849d0>] (dump_stack) from [<c00d5e20>] (dump_header+0x4c/0x1b4)
    Jan 10 00:12:18 NAS kernel: [<c00d5e20>] (dump_header) from [<c00a09a0>] (oom_kill_process+0xd0/0x45c)
    Jan 10 00:12:18 NAS kernel: [<c00a09a0>] (oom_kill_process) from [<c00a10b0>] (out_of_memory+0x310/0x374)
    Jan 10 00:12:18 NAS kernel: [<c00a10b0>] (out_of_memory) from [<c00a49d4>] (__alloc_pages_nodemask+0x6e0/0x7dc)
    Jan 10 00:12:18 NAS kernel: [<c00a49d4>] (__alloc_pages_nodemask) from [<c00cb4c0>] (__read_swap_cache_async+0x70/0x1a0)
    Jan 10 00:12:18 NAS kernel: [<c00cb4c0>] (__read_swap_cache_async) from [<c00cb600>] (read_swap_cache_async+0x10/0x34)
    Jan 10 00:12:18 NAS kernel: [<c00cb600>] (read_swap_cache_async) from [<c00cb788>] (swapin_readahead+0x164/0x17c)
    Jan 10 00:12:18 NAS kernel: [<c00cb788>] (swapin_readahead) from [<c00bd4fc>] (handle_mm_fault+0x83c/0xc04)
    Jan 10 00:12:18 NAS kernel: [<c00bd4fc>] (handle_mm_fault) from [<c0017cb8>] (do_page_fault+0x134/0x2b0)
    Jan 10 00:12:18 NAS kernel: [<c0017cb8>] (do_page_fault) from [<c00092b0>] (do_DataAbort+0x34/0xb8)
    Jan 10 00:12:18 NAS kernel: [<c00092b0>] (do_DataAbort) from [<c00123fc>] (__dabt_usr+0x3c/0x40)
    Jan 10 00:12:18 NAS kernel: Out of memory: Kill process 1113 (mount) score 1 or sacrifice child
    Jan 10 00:12:18 NAS kernel: Killed process 1113 (mount) total-vm:5400kB, anon-rss:0kB, file-rss:1764kB
    Jan 10 00:12:18 NAS kernel: mount: page allocation failure: order:0, mode:0x2600040
    Jan 10 00:12:18 NAS kernel: CPU: 0 PID: 1113 Comm: mount Tainted: P O 4.4.190.armada.1 #1
    Jan 10 00:12:18 NAS kernel: Hardware name: Marvell Armada 370/XP (Device Tree)
    Jan 10 00:12:18 NAS kernel: [<c0015270>] (unwind_backtrace) from [<c001173c>] (show_stack+0x10/0x18)
    Jan 10 00:12:18 NAS kernel: [<c001173c>] (show_stack) from [<c03849d0>] (dump_stack+0x78/0x9c)
    Jan 10 00:12:18 NAS kernel: [<c03849d0>] (dump_stack) from [<c00a2570>] (warn_alloc_failed+0xec/0x118)
    Jan 10 00:12:18 NAS kernel: [<c00a2570>] (warn_alloc_failed) from [<c00a4a44>] (__alloc_pages_nodemask+0x750/0x7dc)
    Jan 10 00:12:18 NAS kernel: [<c00a4a44>] (__alloc_pages_nodemask) from [<c00d0d58>] (allocate_slab+0x88/0x280)
    Jan 10 00:12:18 NAS kernel: [<c00d0d58>] (allocate_slab) from [<c00d253c>] (___slab_alloc.constprop.13+0x250/0x35c)
    Jan 10 00:12:18 NAS kernel: [<c00d253c>] (___slab_alloc.constprop.13) from [<c00d2828>] (kmem_cache_alloc+0xac/0x168)
    Jan 10 00:12:18 NAS kernel: [<c00d2828>] (kmem_cache_alloc) from [<c0306114>] (ulist_alloc+0x1c/0x54)
    Jan 10 00:12:18 NAS kernel: [<c0306114>] (ulist_alloc) from [<c03040d0>] (resolve_indirect_refs+0x1c/0x6d4)
    Jan 10 00:12:18 NAS kernel: [<c03040d0>] (resolve_indirect_refs) from [<c0304b54>] (find_parent_nodes+0x3cc/0x6b0)
    Jan 10 00:12:18 NAS kernel: [<c0304b54>] (find_parent_nodes) from [<c0304eb8>] (btrfs_find_all_roots_safe+0x80/0xfc)
    Jan 10 00:12:18 NAS kernel: [<c0304eb8>] (btrfs_find_all_roots_safe) from [<c0304f7c>] (btrfs_find_all_roots+0x48/0x6c)
    Jan 10 00:12:18 NAS kernel: [<c0304f7c>] (btrfs_find_all_roots) from [<c03089ec>] (btrfs_qgroup_prepare_account_extents+0x58/0xa0)
    Jan 10 00:12:18 NAS kernel: [<c03089ec>] (btrfs_qgroup_prepare_account_extents) from [<c029b714>] (btrfs_commit_transaction+0x49c/0x9b4)
    Jan 10 00:12:18 NAS kernel: [<c029b714>] (btrfs_commit_transaction) from [<c0284ac4>] (btrfs_drop_snapshot+0x420/0x6bc)
    Jan 10 00:12:18 NAS kernel: [<c0284ac4>] (btrfs_drop_snapshot) from [<c02f6338>] (merge_reloc_roots+0x120/0x220)
    Jan 10 00:12:18 NAS kernel: [<c02f6338>] (merge_reloc_roots) from [<c02f7138>] (btrfs_recover_relocation+0x2c8/0x370)
    Jan 10 00:12:18 NAS kernel: [<c02f7138>] (btrfs_recover_relocation) from [<c0298f00>] (open_ctree+0x1df0/0x2168)
    Jan 10 00:12:18 NAS kernel: [<c0298f00>] (open_ctree) from [<c026f578>] (btrfs_mount+0x458/0x690)
    Jan 10 00:12:18 NAS kernel: [<c026f578>] (btrfs_mount) from [<c00dbdc0>] (mount_fs+0x6c/0x14c)
    Jan 10 00:12:18 NAS kernel: [<c00dbdc0>] (mount_fs) from [<c00f4490>] (vfs_kern_mount+0x4c/0xf0)
    Jan 10 00:12:18 NAS kernel: [<c00f4490>] (vfs_kern_mount) from [<c026ea68>] (mount_subvol+0xf4/0x7ac)
    Jan 10 00:12:18 NAS kernel: [<c026ea68>] (mount_subvol) from [<c026f2f4>] (btrfs_mount+0x1d4/0x690)
    Jan 10 00:12:18 NAS kernel: [<c026f2f4>] (btrfs_mount) from [<c00dbdc0>] (mount_fs+0x6c/0x14c)
    Jan 10 00:12:18 NAS kernel: [<c00dbdc0>] (mount_fs) from [<c00f4490>] (vfs_kern_mount+0x4c/0xf0)
    Jan 10 00:12:18 NAS kernel: [<c00f4490>] (vfs_kern_mount) from [<c00f71a4>] (do_mount+0xa30/0xb60)
    Jan 10 00:12:18 NAS kernel: [<c00f71a4>] (do_mount) from [<c00f74fc>] (SyS_mount+0x70/0xa0)
    Jan 10 00:12:18 NAS kernel: [<c00f74fc>] (SyS_mount) from [<c000ec40>] (ret_fast_syscall+0x0/0x40)


    It is a well-established fact that quotas carry a lot of calculation overhead when deleting snapshots. The RN104 has only 512 MB of RAM, so it is already resource-starved, and deleting many snapshots in a row (or large snapshots) can tip the unit over the edge.

    You can run into a race condition where the filesystem needs to update (commit btrfs transactions) but the quota module has hogged all the resources, and then you end up in limbo. The OS reinstall disabled quotas (which I didn't know it would), and that likely allowed the filesystem to actually finish the clean-up. That explains why the OS reinstall "fixed" it: it didn't really fix anything, it just disabled quotas, which made room for the other filesystem transactions to take place.

     

    I would advise keeping a lower number of snapshots on these units in general. Disabling quotas before you delete snapshots, or before running things like a balance, will help keep the unit afloat. Then just re-enable quotas afterwards.
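For anyone finding this later, the disable/balance/re-enable sequence above can be sketched as a small shell script. This is my own hedged sketch, not an official NETGEAR procedure: it assumes SSH root access and that the data volume is mounted at /data (check `btrfs filesystem show` for your actual mount point). It only prints the commands unless you explicitly set DRY_RUN=0 on the NAS itself.

```shell
#!/bin/sh
# Hedged sketch of the advice above, not an official procedure.
# Assumptions: SSH root access, data volume mounted at /data.
VOL=${VOL:-/data}
DRY_RUN=${DRY_RUN:-1}   # leave at 1 to only print the commands

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run btrfs quota disable "$VOL"    # stop qgroup accounting before heavy work
run btrfs balance start "$VOL"    # or: delete your snapshots here instead
run btrfs quota enable "$VOL"     # re-enable once the operation finishes
```

Quota accounting will be rebuilt after re-enabling, so the GUI's snapshot usage figures may take a while to reappear.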

12 Replies

Replies have been turned off for this discussion
  • StephenB
    Guru - Experienced User

    If the volume is read-only, then you should immediately make sure you have an up-to-date backup.  The data is definitely at risk.

     

    You can send a download link to the logs via PM (private message) to one of the mods ( JohnCM_S or Marc_V ).  Others here might also offer to take a look.  You might start by looking in system.log and kernel.log for btrfs and disk i/o errors.  
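    A minimal sketch of that log triage, assuming you have extracted the downloaded log zip into a local directory (kernel.log and system.log are the usual file names in the ReadyNAS log bundle; the directory path is an assumption you'll need to adjust):

```shell
#!/bin/sh
# Grep the extracted ReadyNAS logs for the usual suspects.
# The patterns are illustrative; add your own as needed.
scan_logs() {
    dir=$1
    for pat in 'BTRFS' 'I/O error' 'Out of memory'; do
        echo "== $pat =="
        grep -ihE "$pat" "$dir"/kernel.log "$dir"/system.log 2>/dev/null
    done
    return 0
}

scan_logs "${1:-.}"   # e.g.: sh scan.sh ~/Downloads/nas-logs
```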

    • Matthias1111
      Aspirant

      Hello Stephen,

      thanks for the fast reply. I asked Marc for help. 

       

      Best Matthias

      • Matthias1111
        Aspirant

        One additional remark: since I have backups of my NAS data, I tried the boot menu option "reinstall OS". This worked fine and the volume is back online now. The only oddity is that the yellow segment for snapshot consumption is missing from the capacity bar (see screenshot). But the snapshots are still there and accessible, and new ones are taken automatically as configured. What do you think? Is my NAS healthy, or should I factory-reset it? (I would prefer to avoid that, because it takes a lot of time to reconfigure the NAS and transfer all the data back.)
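        If the yellow snapshot segment disappeared because quotas got disabled (as the accepted answer explains), `btrfs qgroup show` on the data volume would report that quotas are not enabled. A hedged, read-only way to check over SSH; /data is an assumption, and the script only prints the command unless it is actually running on the NAS:

```shell
#!/bin/sh
# Read-only check: does the volume still have qgroup (quota) accounting?
# /data is an assumption; adjust to your mount point. Prints the command
# instead of running it unless ON_NAS=1, since this only makes sense on
# the NAS itself.
VOL=${VOL:-/data}
cmd="btrfs qgroup show $VOL"
if [ "${ON_NAS:-0}" = "1" ]; then
    $cmd
else
    echo "run on the NAS: $cmd"
fi
```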

         
