
Forum Discussion

Piglet
Luminary
Jul 09, 2015

Aborting a balance

I'm running 6.2.2 on an RN104 with 3x3TB and scheduled a balance a couple of days ago.

I had 25GB free, which seems to cause the balance to take a very long time.

After about 20h the display still said "0% done", so I figured I'd restart and try to make some more free space available.

However, after selecting Restart from the web interface I'm now stuck at the "See you soon" display, because the system seems to be holding off the restart until the balance is done. :(

All services seem to have shut down. I can't access the NAS via ssh or file sharing any more. But the disks are working hard, so it's not locked up.

I've now waited 48h after the restart attempt and it still hasn't restarted. The balance has been running 3 days...

Any ideas on how long it will take to complete the balance? Any suggestions of what I can do? I obviously can't wait several weeks for it to complete but I'm hesitant to force a restart by pulling the plug after reading stories of the NAS becoming unbootable.

34 Replies

  • mdgm-ntgr
    NETGEAR Employee Retired
    A balance can be cancelled but with SSH stopped and no access to the web interface there's nothing that can be done.

    You had 25GB free? If only a small amount of space is unused, it's recommended to free up some space before running a balance.
      • rugene
        Guide

        Similar issues with balance for me on an RN314.

        I used the "btrfs fi balance cancel /data" command to kill the operation (the "status" subcommand only reports progress).

        Running RAID5 X-RAID; 6.4.0; 4x 4TB Red drives; 2.2TB free space.

        Balance hangs the system every time.

         

        I've also stopped scrub and defrag until this issue gets resolved.

         

        not good...

  • Yes, I had 25GB free, and I realised it wasn't enough once the balance had got nowhere after 20h. My mistake was to do a restart before freeing up more space. I assumed it would halt or pause the balance, not pause the shutdown and make the NAS inaccessible. :(

    I guess I'm left with the dilemma of either cutting the power and risking data loss or waiting an unknown time for the balance to finish.

    Is there any way to estimate the time it requires? Even if it had to rewrite every single byte on the drives, one would think it would only take a couple of days, not the several weeks that the initial "0% done" after 20h suggested.

    Right now it's about 70h since the balance started.
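
As a rough check on the "couple of days" intuition above, here is a minimal sketch of the arithmetic. The 3x3TB figure is from the original post; the 50 MB/s sustained throughput is purely an assumed number for illustration, not a measurement from an RN104.

```python
def full_rewrite_hours(disk_count: int, disk_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream every byte once at a sustained throughput."""
    total_bytes = disk_count * disk_tb * 1e12  # decimal terabytes, as drives are sold
    seconds = total_bytes / (throughput_mb_s * 1e6)
    return seconds / 3600


# 3x3TB as in the original post; 50 MB/s is an assumed figure, not a measurement.
hours = full_rewrite_hours(3, 3.0, 50.0)
print(f"{hours:.0f} h, about {hours / 24:.1f} days")
```

This only bounds a purely sequential rewrite. A balance relocates data chunk by chunk and updates metadata as it goes, so with very little free space it can run far slower than this estimate.
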
  • So here's what happened.

     

    I waited close to two weeks for the balance to complete but when it hadn't done so I finally pulled the cord.

     

    It rebooted normally after that and automatically restarted the balance (at 0%), which I stopped gracefully now that I had ssh access again.

     

    I found that the filesystem had been damaged when I was trying to delete some files and the filesystem suddenly turned read-only. I gather this is a feature of btrfs, so when something unexpected happens it prevents more damage. A reboot resets the filesystem to the normal read/write.

     

    In order to fix the damaged filesystem I have tried a full scrub (which took 2 days and reported no errors) as well as a new balance (with smaller -dusage values) but the problem still remains.
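
A balance with small -dusage values, as mentioned above, only relocates data chunks that are at most that percentage full, so it needs much less working space than a full balance. The sketch below just builds the command lines for a stepped run; the helper name and the thresholds 5, 10, 25, 50 are assumptions for illustration, and /data is the mount point quoted earlier in this thread. It does not execute anything.

```python
import shlex


def stepped_balance_commands(mount_point: str, usage_steps: list[int]) -> list[str]:
    """Return one `btrfs balance start` command line per usage threshold."""
    commands = []
    for pct in sorted(usage_steps):
        if not 0 <= pct <= 100:
            raise ValueError(f"usage must be 0..100, got {pct}")
        args = ["btrfs", "balance", "start", f"-dusage={pct}", mount_point]
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands


# Thresholds are illustrative; start low so each pass has little to move.
for line in stepped_balance_commands("/data", [5, 10, 25, 50]):
    print(line)
```

Running the lowest threshold first frees nearly empty chunks cheaply, which gives later passes more room to work with.
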

     

    The kernel.log messages end with the following:

    Jul 21 18:15:54 Nasse kernel: WARNING: at fs/btrfs/super.c:255 __btrfs_abort_transaction+0xa4/0xf0()
    Jul 21 18:15:54 Nasse kernel: btrfs: Transaction aborted (error -2)
    Jul 21 18:15:54 Nasse kernel: Modules linked in: vpd(P)
    Jul 21 18:15:54 Nasse kernel: Backtrace: 
    Jul 21 18:15:54 Nasse kernel: [<c003c51c>] (dump_backtrace+0x0/0x110) from [<c061e304>] (dump_stack+0x18/0x20)
    Jul 21 18:15:54 Nasse kernel:  r6:000000ff r5:c02d1bec r4:c46c3c78 r3:00000000
    Jul 21 18:15:54 Nasse kernel: [<c061e2ec>] (dump_stack+0x0/0x20) from [<c006ab4c>] (warn_slowpath_common+0x54/0x70)
    Jul 21 18:15:54 Nasse kernel: [<c006aaf8>] (warn_slowpath_common+0x0/0x70) from [<c006ac0c>] (warn_slowpath_fmt+0x38/0x40)
    Jul 21 18:15:54 Nasse kernel:  r8:0000160d r7:c063f74c r6:d4d9f000 r5:d4e2c680 r4:fffffffe
    Jul 21 18:15:54 Nasse kernel: r3:00000009
    Jul 21 18:15:54 Nasse kernel: [<c006abd4>] (warn_slowpath_fmt+0x0/0x40) from [<c02d1bec>] (__btrfs_abort_transaction+0xa4/0xf0)
    Jul 21 18:15:54 Nasse kernel:  r3:fffffffe r2:c0753e18
    Jul 21 18:15:54 Nasse kernel: [<c02d1b48>] (__btrfs_abort_transaction+0x0/0xf0) from [<c02e24d8>] (__btrfs_free_extent+0x5a0/0x8cc)
    Jul 21 18:15:54 Nasse kernel:  r8:d9f2a240 r7:00000000 r6:001020d8 r5:00000000 r4:00000000
    Jul 21 18:15:54 Nasse kernel: [<c02e1f38>] (__btrfs_free_extent+0x0/0x8cc) from [<c02e6638>] (run_clustered_refs+0xa1c/0xe90)
    Jul 21 18:15:54 Nasse kernel: [<c02e5c1c>] (run_clustered_refs+0x0/0xe90) from [<c02ea530>] (btrfs_run_delayed_refs+0xbc/0x528)
    Jul 21 18:15:54 Nasse kernel: [<c02ea474>] (btrfs_run_delayed_refs+0x0/0x528) from [<c02f9fbc>] (btrfs_commit_transaction+0x90/0x8f0)
    Jul 21 18:15:54 Nasse kernel: [<c02f9f2c>] (btrfs_commit_transaction+0x0/0x8f0) from [<c02f3840>] (transaction_kthread+0x1b0/0x1c4)
    Jul 21 18:15:54 Nasse kernel: [<c02f3690>] (transaction_kthread+0x0/0x1c4) from [<c00843f0>] (kthread+0x8c/0x94)
    Jul 21 18:15:54 Nasse kernel: [<c0084364>] (kthread+0x0/0x94) from [<c006e1f8>] (do_exit+0x0/0x6a8)
    Jul 21 18:15:54 Nasse kernel:  r6:c006e1f8 r5:c0084364 r4:d46d3cc0
    Jul 21 18:15:54 Nasse kernel: ---[ end trace 464cda4a3b14cdb0 ]---
    Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in __btrfs_free_extent:5645: errno=-2 No such entry
    Jul 21 18:15:54 Nasse kernel: BTRFS info (device md127): forced readonly
    Jul 21 18:15:54 Nasse kernel: BTRFS debug (device md127): run_one_delayed_ref returned -2
    Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in btrfs_run_delayed_refs:2688: errno=-2 No such entry

     

    Any suggestions to repair this? I've read about btrfsck but I wanted to check if there are other options before doing that, since it seems like a last resort.

     

    Also, I'd like to suggest that ReadyNAS always pauses any btrfs operation like balance before it reboots instead of waiting for it to complete, in order to prevent things like this from happening.
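
For anyone digging through a long kernel.log for the same symptom, here is a small sketch that pulls out only the BTRFS status lines and flags the switch to read-only. The sample lines are copied from the log above; the regular expression and the function name are my own assumptions about the format, based only on those lines.

```python
import re

# Matches lines such as "BTRFS error (device md127) in func:123: message".
BTRFS_LINE = re.compile(
    r"BTRFS (?P<level>\w+) \(device (?P<device>\w+)\)"
    r"(?: in (?P<function>\w+):\d+)?: (?P<message>.*)$"
)

SAMPLE = """\
Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in __btrfs_free_extent:5645: errno=-2 No such entry
Jul 21 18:15:54 Nasse kernel: BTRFS info (device md127): forced readonly
Jul 21 18:15:54 Nasse kernel: BTRFS debug (device md127): run_one_delayed_ref returned -2
Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in btrfs_run_delayed_refs:2688: errno=-2 No such entry
"""


def btrfs_events(log_text: str) -> list[dict]:
    """Return level, device, function and message for each BTRFS log line."""
    events = []
    for line in log_text.splitlines():
        match = BTRFS_LINE.search(line)
        if match:
            events.append(match.groupdict())
    return events


events = btrfs_events(SAMPLE)
for event in events:
    print(event["level"], event["device"], event["function"], event["message"])

# True when the filesystem was switched to read-only in this excerpt.
went_readonly = any(e["message"] == "forced readonly" for e in events)
print("forced readonly:", went_readonly)
```
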

     

    • mdgm-ntgr
      NETGEAR Employee Retired

      Do you have a backup? If not I would suggest backing up your data (if you can) as the next step.

       

      We have a change coming in a future firmware release to attempt to cancel any running balances before shutting down.

       

      Please send your logs in (see the Sending Logs link in my sig)

      • Piglet
        Luminary

        Good to hear that future versions might avoid this problem. As for my current issue, I don't have a backup, but I'm working on finding space for the data and making copies at the moment.

         

        I have sent you the logs.

         
