Aborting a balance
I had only 25GB free, which seems to make the balance take a very long time.
After about 20h the display still said "0% done", so I figured I'd restart and try to make some more free space available.
However, after selecting Restart from the web interface I'm now stuck at the "See you soon" display because the system seems to wait with the restart until the balance is done. 😞
All services seem to have shut down; I can't access the NAS via ssh or file sharing anymore. But the disks are working hard, so it's not locked up.
I've now waited 48h after the restart attempt and it still hasn't restarted. The balance has been running 3 days...
Any ideas on how long it will take to complete the balance? Any suggestions of what I can do? I obviously can't wait several weeks for it to complete but I'm hesitant to force a restart by pulling the plug after reading stories of the NAS becoming unbootable.
Re: Aborting a balance
You had only 25GB free? If there's only a small amount of unused space, it's recommended to free up some space before running a balance.
Re: Aborting a balance
I guess I'm left with the dilemma of either cutting the power and risking data loss or waiting an unknown time for the balance to finish.
Is there any way to estimate the time it requires? Even if it had to rewrite every single byte on the drives, one would think it would only take a couple of days, not several weeks as the initial "0% done" after 20h indicated.
Right now it's about 70h since the balance started.
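For anyone trying to estimate this later: once ssh is reachable, `btrfs balance status` reports progress in chunks ("N out of about M chunks balanced"), and a rough remaining-time figure can be computed from the elapsed time. A minimal sketch; the status line and hour count below are made-up examples, not output from this NAS:

```shell
# Rough remaining-time estimate from `btrfs balance status` output.
# In real use STATUS would come from:  btrfs balance status /data
STATUS="12 out of about 436 chunks balanced (30 considered), 97% left"
ELAPSED_HOURS=20   # hours since the balance was started

echo "$STATUS" | awk -v h="$ELAPSED_HOURS" '{
    d = $1   # chunks balanced so far
    t = $5   # estimated total chunks
    if (d > 0)
        printf "~%.0f hours remaining\n", h * (t - d) / d
    else
        print "no progress yet; cannot estimate"
}'
```

The estimate assumes chunks take roughly equal time, which is optimistic on a nearly full filesystem, but it at least distinguishes "days" from "weeks".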
Re: Aborting a balance
See http://www.readynas.com/forum/viewtopic.php?f=21&t=80244 for what it is worth.
Unless there are other problems or no free space, it is usually very quick.
Followup
So here's what happened.
I waited close to two weeks for the balance to complete but when it hadn't done so I finally pulled the cord.
It rebooted normally after that and automatically restarted the balance (at 0%), which I stopped gracefully now that I had ssh access again.
I found that the filesystem had been damaged: while I was trying to delete some files, it suddenly turned read-only. I gather this is a btrfs feature that prevents further damage when something unexpected happens; a reboot returns the filesystem to normal read/write.
To fix the damaged filesystem I have tried a full scrub (which took 2 days and reported no errors) as well as a new balance (with smaller -dusage values), but the problem remains.
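For reference, the filtered balance I mean runs the -dusage filter in increasing steps, so each pass only rewrites data chunks below the given fill level. A sketch of that approach; the volume path, the usage steps, and the overridable BTRFS variable (added so the loop can be dry-run) are my own choices, not ReadyNAS defaults:

```shell
#!/bin/sh
# Incremental balance: start with a low -dusage filter so only nearly
# empty chunks are rewritten, then raise it step by step. Each step is
# far cheaper than a full, unfiltered balance.
VOL=${VOL:-/data}      # ReadyNAS data volume (adjust if different)
BTRFS=${BTRFS:-btrfs}  # overridable so the sketch can be dry-run

for usage in 5 10 25 50; do
    echo "balancing data chunks with usage <= ${usage}%"
    "$BTRFS" balance start -dusage="$usage" "$VOL" || break
done
```

Stopping at the first failure matters here: on a filesystem that keeps aborting transactions, retrying with a larger filter only rewrites more data against a broken extent tree.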
The kernel.log messages end with the following:
Jul 21 18:15:54 Nasse kernel: WARNING: at fs/btrfs/super.c:255 __btrfs_abort_transaction+0xa4/0xf0()
Jul 21 18:15:54 Nasse kernel: btrfs: Transaction aborted (error -2)
Jul 21 18:15:54 Nasse kernel: Modules linked in: vpd(P)
Jul 21 18:15:54 Nasse kernel: Backtrace:
Jul 21 18:15:54 Nasse kernel: [<c003c51c>] (dump_backtrace+0x0/0x110) from [<c061e304>] (dump_stack+0x18/0x20)
Jul 21 18:15:54 Nasse kernel:  r6:000000ff r5:c02d1bec r4:c46c3c78 r3:00000000
Jul 21 18:15:54 Nasse kernel: [<c061e2ec>] (dump_stack+0x0/0x20) from [<c006ab4c>] (warn_slowpath_common+0x54/0x70)
Jul 21 18:15:54 Nasse kernel: [<c006aaf8>] (warn_slowpath_common+0x0/0x70) from [<c006ac0c>] (warn_slowpath_fmt+0x38/0x40)
Jul 21 18:15:54 Nasse kernel:  r8:0000160d r7:c063f74c r6:d4d9f000 r5:d4e2c680 r4:fffffffe
Jul 21 18:15:54 Nasse kernel:  r3:00000009
Jul 21 18:15:54 Nasse kernel: [<c006abd4>] (warn_slowpath_fmt+0x0/0x40) from [<c02d1bec>] (__btrfs_abort_transaction+0xa4/0xf0)
Jul 21 18:15:54 Nasse kernel:  r3:fffffffe r2:c0753e18
Jul 21 18:15:54 Nasse kernel: [<c02d1b48>] (__btrfs_abort_transaction+0x0/0xf0) from [<c02e24d8>] (__btrfs_free_extent+0x5a0/0x8cc)
Jul 21 18:15:54 Nasse kernel:  r8:d9f2a240 r7:00000000 r6:001020d8 r5:00000000 r4:00000000
Jul 21 18:15:54 Nasse kernel: [<c02e1f38>] (__btrfs_free_extent+0x0/0x8cc) from [<c02e6638>] (run_clustered_refs+0xa1c/0xe90)
Jul 21 18:15:54 Nasse kernel: [<c02e5c1c>] (run_clustered_refs+0x0/0xe90) from [<c02ea530>] (btrfs_run_delayed_refs+0xbc/0x528)
Jul 21 18:15:54 Nasse kernel: [<c02ea474>] (btrfs_run_delayed_refs+0x0/0x528) from [<c02f9fbc>] (btrfs_commit_transaction+0x90/0x8f0)
Jul 21 18:15:54 Nasse kernel: [<c02f9f2c>] (btrfs_commit_transaction+0x0/0x8f0) from [<c02f3840>] (transaction_kthread+0x1b0/0x1c4)
Jul 21 18:15:54 Nasse kernel: [<c02f3690>] (transaction_kthread+0x0/0x1c4) from [<c00843f0>] (kthread+0x8c/0x94)
Jul 21 18:15:54 Nasse kernel: [<c0084364>] (kthread+0x0/0x94) from [<c006e1f8>] (do_exit+0x0/0x6a8)
Jul 21 18:15:54 Nasse kernel:  r6:c006e1f8 r5:c0084364 r4:d46d3cc0
Jul 21 18:15:54 Nasse kernel: ---[ end trace 464cda4a3b14cdb0 ]---
Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in __btrfs_free_extent:5645: errno=-2 No such entry
Jul 21 18:15:54 Nasse kernel: BTRFS info (device md127): forced readonly
Jul 21 18:15:54 Nasse kernel: BTRFS debug (device md127): run_one_delayed_ref returned -2
Jul 21 18:15:54 Nasse kernel: BTRFS error (device md127) in btrfs_run_delayed_refs:2688: errno=-2 No such entry
Any suggestions on how to repair this? I've read about btrfsck, but I wanted to check whether there are other options before trying it, since it seems like a last resort.
Also, I'd like to suggest that ReadyNAS always pause any running btrfs operation, such as a balance, before rebooting instead of waiting for it to complete, to prevent situations like this.
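Until something like that ships, anyone with ssh access can do the pause by hand around a reboot. A minimal sketch, assuming the data volume is mounted at /data (the usual ReadyNAS mount point; adjust if yours differs):

```shell
#!/bin/sh
# Pause a running balance before a reboot, resume it afterwards.
VOL=/data

# Before shutting down: pausing preserves the balance's progress, so it
# can continue later instead of restarting from 0%.
btrfs balance pause "$VOL" || echo "no balance running (or pause failed)"

# ... reboot here ...

# Once the system is back up:
btrfs balance resume "$VOL"
```

Note that `btrfs balance pause` itself waits for the chunk currently being relocated to finish, so it may take a little while to return.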
Re: Followup
Do you have a backup? If not I would suggest backing up your data (if you can) as the next step.
We have a change coming in a future firmware release to attempt to cancel any running balances before shutting down.
Please send your logs in (see the Sending Logs link in my sig)
Re: Followup
Good to hear that future versions might avoid this problem. As for my current issue: I don't have a backup, but I'm working on finding space for the data and making copies at the moment.
I have sent you the logs.
Re: Followup
I have now copied all my data off the NAS.
Unfortunately I lost a 400GB directory after moving it to a connected USB disk using the web interface. After leaving the copy running overnight I woke up to a frozen NAS and had to cut the power to reboot it. The logs indicated it had gone into read-only mode again; the directory I had moved had been deleted from the NAS, but the USB disk appeared empty. Using recovery software I eventually found about 200GB of the files, though most of them without filenames, so it will take a long time to piece together what is what.
I ran "btrfs check" on the disk:
# btrfs check /dev/md127
Checking filesystem on /dev/md127
UUID: 34bda540-18c4-4437-b708-f7d6d81b53c3
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 1762329360808 bytes used err is 0
total csum bytes: 1047206540
total tree bytes: 2745073664
total fs tree bytes: 1211465728
total extent tree bytes: 228753408
btree space waste bytes: 484263440
file data blocks allocated: 67038373376000
 referenced 5557936832512
Btrfs v3.17.3
After doing that I tried once again to delete files, but it again triggered the errors leading to the read-only state.
Is there anything else I could try before giving up and reformatting everything?
Re: Followup
The 6.4 firmware update continues to screw everything up; it's becoming the worst update ever from Netgear.
During a disk balance task, our 516 completely locks up. No frontview access, no ssh, no ping responses. Nothing.
After manually restarting the device, the disk balance starts all over again and eventually locks up the system (sometimes at 40%, sometimes at 62%; it's completely random).
I have no way to cancel this job from frontview. I have ssh access but I need further instructions.
Re: Followup
The command to stop a balance is:
btrfs fi balance cancel /data
(where /data is the path)
If you want to see the current status of running balance operations, use:
btrfs fi balance status /data
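Be aware the cancel is not instant: btrfs finishes the chunk it is currently relocating before stopping, so the status command can keep reporting "is running, cancel requested" for a while. A small watch-loop sketch, assuming the volume is /data (the 30-second interval is arbitrary):

```shell
# Poll until a cancelled balance has fully stopped. `btrfs balance
# status` keeps printing "... is running, cancel requested" until the
# chunk currently being relocated is finished.
VOL=/data
while btrfs balance status "$VOL" | grep -q "is running"; do
    echo "balance still winding down on $VOL ..."
    sleep 30
done
echo "balance stopped on $VOL"
```

Once the loop exits, the status command will instead report that no balance is found on the volume.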
Re: Followup
Thanks man!
Should this be a quick kill? My terminal has been like this for the past few minutes:
Welcome to ReadyNASOS 6.4.0
Last login: Mon Oct 19 11:56:23 2015 from 192.168.1.73
root@NAS-EK:~# btrfs fi balance cancel /data
And the frontview is completely unresponsive.
However, the second command shows:
Last login: Mon Oct 19 12:00:30 2015 from 192.168.1.73
root@NAS-EK:~# btrfs fi balance status /data
Balance on '/data' is running, cancel requested
2 out of about 436 chunks balanced (1132 considered), 100% left
root@NAS-EK:~#
Re: Followup
@EKroboter wrote: Should this be a quick kill? My terminal has been like this for the past few minutes:

I'm not sure. I've only done it once and as far as I remember it was fairly quick, but not instantaneous. I'm guessing it has to finish the current chunk before it can cancel gracefully.
Re: Followup
Thanks. I'll wait it out. In the meantime the NAS and all the shares are fully accessible, so at least we can get some work done. The frontview won't load, though.
Re: Followup
It eventually stopped, and now frontview is accessible and fast again. No connection issues so far; performance seems on par with yesterday.
I cancelled all scheduled defrag, scrub and balance tasks for good measure.
Re: Followup
I had the same problem with my 314. Glad there was a way to cancel the disk balance.
Is there a way to disable disk balance?
Re: Followup
The ssh command worked for me, and I also disabled scheduled disk maintenance (defrag, scrub and balance) for the time being.
Re: Aborting a balance
Similar issues with balance for me on an RN314.
I used the "btrfs fi balance cancel /data" command to kill the operation.
Running RAID5 X-RAID; 6.4.0; four 4TB Red drives; 2.2TB free space.
Balance hangs the system every time.
I've also stopped scrub and defrag until this issue gets resolved.
Not good...
Re: Aborting a balance
Add another RN314 to the sad list of hung systems after a balance.
6.4.1 can't come soon enough.
Re: Aborting a balance
Took me 4 attempts to kill the balance!!! Grrr!
Re: Aborting a balance
I have a similar issue using OS 6.4.1 on a 314. A balance is running, and has been for over 12 hours (to get 5% complete).
The machine is barely responsive: copies from a Windows PC take a very long time, and frontview sometimes responds, sometimes not.
When the machine was running the 6.2 series of the OS, balance performance was OK.
A performance improvement here would be nice, please.
Re: Aborting a balance
quickly_now, if you can download your logs, can you send those in please (see the Sending Logs link in my sig)?