I'm running 6.2.2 on an RN104 with 3x3TB and scheduled a balance a couple of days ago. I had 25GB free, which seems to cause the balance to take a very long time. After about 20h the display still said "0% done", so I figured I'd restart and try to make some more free space available.

However, after selecting Restart from the web interface I'm now stuck at the "See you soon" display because the system seems to hold off the restart until the balance is done. :( All services seem to have shut down. I can't access the NAS via ssh or file sharing any more. But the disks are working hard, so it's not locked up.

I've now waited 48h after the restart attempt and it still hasn't restarted. The balance has been running 3 days...

Any ideas on how long it will take to complete the balance? Any suggestions of what I can do? I obviously can't wait several weeks for it to complete but I'm hesitant to force a restart by pulling the plug after reading stories of the NAS becoming unbootable.

The command to stop a balance is:

    btrfs fi balance cancel /data

(where /data is the path)

If you want to see the current status of running balance operations, use:

    btrfs fi balance status /data

EKroboter wrote:
Should this be a quick kill? My terminal has been like this for the past few minutes:

I'm not sure. I've only done it once and as far as I remember it was fairly quick, but not instantaneous. I'm guessing it has to finish up the current chunk before it can cancel gracefully.


Aborting a balance | NETGEAR Communities

34 Replies


mdgm-ntgr
NETGEAR Employee Retired
Jul 09, 2015
A balance can be cancelled but with SSH stopped and no access to the web interface there's nothing that can be done.

You had 25GB free? If there's only a small amount of space unused then it would be recommended to free up some space before running a balance if you want to run one.
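A pre-flight check along these lines could catch the low-space case before a balance is started. This is only a sketch: the function name and the 100 GB threshold in the comment are assumptions, not values from NETGEAR, and `df` gives only an approximate figure on btrfs (`btrfs filesystem df` gives more detail).

```shell
# Sketch: succeed only if the mount point has at least min_gb GB available.
# has_min_free_gb is a hypothetical helper name used for illustration.
has_min_free_gb() {
    mount_point="$1"
    min_gb="$2"
    # -P forces one line per filesystem, -k reports sizes in KB.
    avail_kb=$(df -Pk "$mount_point" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 2
    [ "$((avail_kb / 1024 / 1024))" -ge "$min_gb" ]
}

# Example, run as root on the NAS (100 is an assumed threshold):
#   has_min_free_gb /data 100 && btrfs balance start /data
```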
- mschaffl
  Aspirant
  Nov 02, 2015
  - rugene
    Guide
    Nov 04, 2015
    Similar issues with balance for me on an RN314.
    I used the "btrfs fi balance cancel /data" command to kill the operation.
    Running RAID5 X-RAID; 6.4.0; 4 x 4TB Red drives; 2.2TB free space.
    Balance hangs the system every time.
    
    I've also stopped scrub and defrag until this issue gets resolved.
    
    not good...
Piglet
Luminary
Jul 09, 2015
Yes, I had 25GB free and I came to realise that it wasn't enough after the balance wasn't getting anywhere after 20h. My mistake was to do a restart before freeing up more space. I assumed it would halt or pause the balance, not pause the shutdown and make the NAS inaccessible. :(

I guess I'm left with the dilemma of either cutting the power and risking data loss or waiting an unknown time for the balance to finish.

Is there any way to estimate the time it requires? Even if it had to rewrite every single byte on the drives, one would think it would only take a couple of days. Not several weeks as the initial "0% done" after 20h indicated.

Right now it's about 70h since the balance started.
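The "couple of days" intuition can be checked with rough arithmetic, and the progress line from the balance status command can be extrapolated once at least one chunk is done. The sketch below assumes a sustained throughput figure (50 MB/s is a guess, not a measurement) and assumes the progress line has the form "N out of about M chunks balanced ..." as printed by btrfs-progs. Both function names are made up for illustration.

```shell
# Rough lower bound: hours needed to rewrite total_gb at mb_per_s sustained.
full_rewrite_hours() {
    total_gb="$1"
    mb_per_s="$2"
    echo $(( total_gb * 1000 / mb_per_s / 3600 ))
}

# Extrapolate remaining hours from a balance status progress line.
# Prints "unknown" while no chunk has completed (the "0% done" case).
balance_eta_hours() {
    status_text="$1"
    elapsed_hours="$2"
    printf '%s\n' "$status_text" | awk -v elapsed="$elapsed_hours" '
        / out of about / {
            finished = $1; total = $5
            if (finished > 0)
                printf "%.1f\n", elapsed * (total - finished) / finished
            else
                print "unknown"
        }'
}

# 3 x 3 TB = 9000 GB at an assumed 50 MB/s:
full_rewrite_hours 9000 50    # prints 50
```

A linear extrapolation like this is crude, since chunks differ in how full they are, but it shows why "0% done" after 20h gives no usable estimate at all.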
mdgm-ntgr
NETGEAR Employee Retired
Jul 09, 2015
Sent you a PM.

Do you have a backup?
BaJohn
Virtuoso
Jul 10, 2015
Just to say, I tried to provide info on how long a balance takes.
See http://www.readynas.com/forum/viewtopic.php?f=21&t=80244 for what it is worth.
Unless there are other problems or no free space it is usually very quick.

Piglet
Luminary
Jul 23, 2015
So here's what happened.

I waited close to two weeks for the balance to complete but when it hadn't done so I finally pulled the cord.

It rebooted normally after that and automatically restarted the balance (at 0%), which I stopped gracefully now that I had ssh access again.

I discovered that the filesystem had been damaged when I tried to delete some files and it suddenly turned read-only. I gather this is a btrfs safety feature: when something unexpected happens, the filesystem goes read-only to prevent further damage. A reboot returns it to normal read/write mode.

In order to fix the damaged filesystem I have tried a full scrub (which took 2 days and reported no errors) as well as a new balance (with smaller -dusage values) but the problem still remains.
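For reference, a balance with small -dusage values can be run in steps, so that each pass only rewrites data chunks that are at most that percentage full and therefore finishes sooner. The following is a sketch of that idea, not the exact commands used here: the function name, the BTRFS override variable and the step values are assumptions, and it uses the `btrfs balance` spelling rather than `btrfs fi balance`.

```shell
# Sketch: balance data chunks in increasing usage steps, stopping at the
# first failure. BTRFS is a hypothetical override for the binary to run.
balance_in_steps() {
    mount_point="$1"
    shift
    for usage in "$@"; do
        echo "balancing data chunks at most ${usage}% full"
        "${BTRFS:-btrfs}" balance start -dusage="$usage" "$mount_point" || return 1
    done
}

# Example, run as root on the NAS (step values are assumptions):
#   balance_in_steps /data 5 10 20
```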

The kernel.log messages end with the following:

Jul 21 18:15:54 Nasse kernel: WARNING: at fs/btrfs/super.c:255 __btrfs_abort_transaction+0xa4/0xf0()
Jul 21 18:15:54 Nasse kernel: btrfs: Transaction aborted (error -2)
Jul 21 18:15:54 Nasse kernel: Modules linked in: vpd(P)
Jul 21 18:15:54 Nasse kernel: Backtrace: 
Jul 21 18:15:54 Nasse kernel: [<c003c51c>] (dump_backtrace+0x0/0x110) from [<c061e304>] (dump_stack+0x18/0x20)
Jul 21 18:15:54 Nasse kernel:  r6:000000ff r5:c02d1bec r4:c46c3c78 r3:00000000
Jul 21 18:15:54 Nasse kernel: [<c061e2ec>] (dump_stack+0x0/0x20) from [<c006ab4c>] (warn_slowpath_common+0x54/0x70)
Jul 21 18:15:54 Nasse kernel: [<c006aaf8>] (warn_slowpath_common+0x0/0x70) from [<c006ac0c>] (warn_slowpath_fmt+0x38/0x40)
Jul 21 18:15:54 Nasse kernel:  r8:0000160d r7:c063f74c r6:d4d9f000 r5:d4e2c680 r4:fffffffe
Jul 21 18:15:54 Nasse kernel: r3:00000009
Jul 21 18:15:54 Nasse kernel: [<c006abd4>] (warn_slowpath_fmt+0x0/0x40) from [<c02d1bec>] (__btrfs_abort_transaction+0xa4/0xf0)
Jul 21 18:15:54 Nasse kernel:  r3:fffffffe r2:c0753e18
Jul 21 18:15:54 Nasse kernel: [<c02d1b48>] (__btrfs_abort_transaction+0x0/0xf0) from [<c02e24d8>] (__btrfs_free_extent+0x5a0/0x8cc)
Jul 21 18:15:54 Nasse kernel:  r8:d9f2a240 r7:00000000 r6:001020d8 r5:00000000 r4:00000000
Jul 21 18:15:54 Nasse kernel: [<c02e1f38>] (__btrfs_free_extent+0x0/0x8cc) from [<c02e6638>] (run_clustered_refs+0xa1c/0xe90)
Jul 21 18:15:54 Nasse kernel: [<c02e5c1c>] (run_clustered_refs+0x0/0xe90) from [<c02ea530>] (btrfs_run_delayed_refs+0xbc/0x528)
Jul 21 18:15:54 Nasse kernel: [<c02ea474>] (btrfs_run_delayed_refs+0x0/0x528) from [<c02f9fbc>] (btrfs_commit_transaction+0x90/0x8f0)
Jul 21 18:15:54 Nasse kernel: [<c02f9f2c>] (btrfs_commit_transaction+0x0/0x8f0) from [<c02f3840>] (transaction_kthread+0x1b0/0x1c4)
Jul 21 18:15:54 Nasse kernel: [<c02f3690>] (transaction_kthread+0x0/0x1c4) from [<c00843f0>] (kthread+0x8c/0x94)
Jul 21 18:15:54 Nasse kernel: [<c0084364>] (kthread+0x0/0x94) from [<c006e1f8>] (do_exit+0x0/0x6a8)
Jul 21 18:15:54 Nasse kernel:  r6:c006e1f8 r5:c0084364 r4:d46d3cc0
Jul 21 18:15:54 Nasse kernel: ---[ end trace 464cda4a3b14cdb0 ]---
Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in __btrfs_free_extent:5645: errno=-2 No such entry
Jul 21 18:15:54 Nasse kernel: BTRFS info (device md127): forced readonly
Jul 21 18:15:54 Nasse kernel: BTRFS debug (device md127): run_one_delayed_ref returned -2
Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in btrfs_run_delayed_refs:2688: errno=-2 No such entry
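
The "forced readonly" line above is the one that marks the switch to read-only. A small filter like the following could pull the affected devices out of a saved kernel log. It is a sketch that assumes the message format shown in the log above; the function name is made up and the log path is left as a parameter.

```shell
# Sketch: list devices that btrfs reported as forced read-only in a log file.
forced_readonly_devices() {
    log_file="$1"
    sed -n 's/.*BTRFS info (device \([^)]*\)): forced readonly.*/\1/p' "$log_file" | sort -u
}

# Example: forced_readonly_devices kernel.log
```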

Any suggestions to repair this? I've read about btrfsck but I wanted to check if there are other options before doing that, since it seems like a last resort.

Also, I'd like to suggest that ReadyNAS always pauses any btrfs operation like balance before it reboots instead of waiting for it to complete, in order to prevent things like this from happening.
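
To illustrate the suggestion, a pre-reboot step might look roughly like this. It is a hypothetical sketch, not how ReadyNAS OS actually works: the function name and the BTRFS override variable are invented, and it relies on the exit status documented in btrfs-balance(8), where the status subcommand exits 0 when no balance is running.

```shell
# Hypothetical pre-reboot helper: cancel a running balance, if any.
stop_balance_before_reboot() {
    mount_point="$1"
    btrfs_cmd="${BTRFS:-btrfs}"
    # Exit status 0 from "balance status" means no balance is running;
    # anything else is treated here as "a balance may be running".
    if "$btrfs_cmd" balance status "$mount_point" >/dev/null 2>&1; then
        echo "no balance running on $mount_point"
        return 0
    fi
    echo "cancelling balance on $mount_point"
    # Cancel returns once the chunk currently being relocated is finished.
    "$btrfs_cmd" balance cancel "$mount_point"
}

# Example, run as root before rebooting: stop_balance_before_reboot /data
```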

mdgm-ntgr
NETGEAR Employee Retired
Jul 23, 2015
Do you have a backup? If not I would suggest backing up your data (if you can) as the next step.

We have a change coming in a future firmware release to attempt to cancel any running balances before shutting down.

Please send your logs in (see the Sending Logs link in my sig)
- Piglet
  Luminary
  Jul 24, 2015
  Good to hear that future versions might avoid this problem. As for my current issue, I don't have a backup, but I'm working on finding space for the data and making copies at the moment.
  
  I have sent you the logs.
