Re: Running defrag, losing 1GB a minute

joe_schmo · ‎2016-10-12

I am running defrag for the first time in a while... and my free space is dropping like crazy.

Size  Used  Avail  Use%  Mounted on
30T   28T   632G   98%   /data

When this started, I had over 1TB of free space. And it's going down now, about 1GB per minute. Any ideas what is happening? There's no one writing or adding data to the disks at all.

joe_schmo · ‎2016-10-12

Also, there are no snapshots. Just wanted to clarify that up front, so that can't be the cause.

mdgm-ntgr · ‎2016-10-12

Sounds like some of your files probably use CoW.

Do you (or did you) have bit-rot protection enabled on any of your shares? This is separate to CoW but we link enabling/disabling bit-rot protection to enabling/disabling CoW.

You should free up space before running volume maintenance on a volume that is this full.

Which firmware are you running?

What does your initrd.log look like?

joe_schmo · ‎2016-10-12

my initrd.log:

[2014/01/15 14:52:32] Factory default initiated by button!
[2014/01/15 14:52:47] Defaulting to X-RAID2 mode, RAID level 5
[2014/01/15 14:53:03] Factory default initiated on ReadyNASOS 6.0.4.
[2014/01/15 14:49:34] Updated from ReadyNASOS 6.0.4 to 6.1.5.
[2014/01/15 16:02:20] Updated from ReadyNASOS 6.1.5 to 6.1.6.
[2014/04/14 05:09:27] Updated from ReadyNASOS 6.1.6 (1389750453) to 6.1.7 (1396977042).
[2014/05/24 11:33:24] Updated from ReadyNASOS 6.1.7 (1396977042) to 6.1.8 (1398980083).
[2014/10/28 16:39:30] Updated from ReadyNASOS 6.1.8 (1398980083) to 6.1.9 (1409791183).

I have Bitrot Protection (Copy on Write) enabled on all shares. Not sure what you mean by Bitrot being separate to CoW, since it's the same option.

Firmware 6.4.2.

What's the link between what I am seeing and bitrot?

mdgm-ntgr · ‎2016-10-12

They are not the same thing but we link enabling them/disabling them.

Running a defrag won't uncow files. It will break the CoW link between current data and snapshots, but even if not using snapshots you will see an increase in volume usage if you are using CoW.

joe_schmo · ‎2016-10-13

Thanks for the info... have some more questions:

most importantly:

- Can I turn off CoW to regain space?

And

- Why does defragging cause space to go down, with regards to CoW?

- How much of an increase in volume usage do I experience using CoW?

StephenB · ‎2016-10-13

@joe_schmo wrote:

- How much of an increase in volume usage do I experience using CoW?

This question is wrongly put. CoW was saving you some space, but the defrag eliminated the savings. CoW itself can only improve storage efficiency, it can't hurt it.

I'm not clear on exactly why your free space is dropping that quickly with no snapshots. I am thinking there might be some that you can't see from the UI. However, your volume is extremely full, and it is also possible that the btrfs allocator is needing to grab free chunks as it defrags your files.

If you purchased between 1 June 2014 and 31 May 2016, then you have lifetime chat support. You could use that to ask Netgear support to look for hidden snapshots.

If there are no snapshots, then offloading some data and then doing a balance would be good next steps.

joe_schmo · ‎2016-10-13

@StephenB wrote:
@joe_schmo wrote:

- How much of an increase in volume usage do I experience using CoW?

This question is wrongly put. CoW was saving you some space, but the defrag eliminated the savings. CoW itself can only improve storage efficiency, it can't hurt it.

OK, that makes sense... so basically, I shouldn't have done a defrag on a CoW share?

I'm not clear on exactly why your free space is dropping that quickly with no snapshots. I am thinking there might be some that you can't see from the UI. However, your volume is extremely full, and it is also possible that the btrfs allocator is needing to grab free chunks as it defrags your files.

So, do you think that after defrag finishes I will reclaim space? I don't want to continue defragging if I'm going to keep losing space, but I do want to continue if it will come back when it's done.

If there are no snapshots, then offloading some data and then doing a balance would be good next steps.

I can unload some data, but I've done a balance now and it had no effect on the free space.

I guess I could create a new share without CoW and copy the data there... just really confused on CoW being such a disaster of a feature.

joe_schmo · ‎2016-10-13

I've identified 2 shares that have something...

In the Admin UI, it says ShareA Consumes 25.1TB, while du -h shows 24TB.

In the Admin UI, it says ShareB Consumes 2.8TB, while du -h shows 2.1TB.

For ShareA, I definitely don't need CoW, so, I am thinking of renaming it to ShareA2, creating a new ShareA with CoW disabled and then rsyncing the data over.

Would that work?

For ShareB, I think I want CoW on, because it's data that only gets added, not really edited... but if I did the same thing for that share as I did for ShareA, would I reclaim that missing 1.1TB?

StephenB · ‎2016-10-13

Let's review what CoW actually does:

CoW simply allows multiple versions of a file to use only one copy of the datablocks they have in common. In a snapshot context, if a 300 MB video file (for example) is in both a snapshot and the main share, then all the video data is held in common. So both files share all the datablocks, and only 300 MB of data is actually on the disk. This will normally be contiguous.

If you then edit a tagfield in the main file, then the datablock holding that tagfield has changed. So one datablock is different, all the others are still common. The file in the snapshot remains contiguous. But in the main share, a new datablock is substituted for the original. The main share file is therefore fragmented into two (or possibly three) sections.

Now you run a defrag on the main share. The only way to defrag this video file is to replicate all the shared blocks - taking 600 MB of total space, instead of the original 300 MB.
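The 300 MB to 600 MB arithmetic above can be illustrated with a toy model. This is only a sketch: the 1 MB block size, the list-of-blocks file representation, and the `disk_usage_mb` and `defrag` helpers are all hypothetical, and it does not reflect how btrfs tracks extents internally. It only counts how many distinct blocks are referenced.

```python
# Toy model of shared data blocks; not how btrfs stores extents internally.
BLOCK_MB = 1  # hypothetical block size, chosen to keep the numbers simple


def disk_usage_mb(*files):
    """Space used is the number of distinct blocks referenced by any file."""
    unique_blocks = set()
    for blocks in files:
        unique_blocks.update(blocks)
    return len(unique_blocks) * BLOCK_MB


def defrag(blocks):
    """Rewrite every block to a fresh contiguous location, which unshares it."""
    return [("defrag", i) for i in range(len(blocks))]


# A 300 MB file held in a snapshot; the main share references the same blocks.
snapshot_file = [("orig", i) for i in range(300)]
main_file = list(snapshot_file)
usage_shared = disk_usage_mb(snapshot_file, main_file)

# Editing a tag field replaces one block in the main share only.
main_file[0] = ("edit", 0)
usage_after_edit = disk_usage_mb(snapshot_file, main_file)

# Defragmenting the main file gives it private copies of every block.
main_file = defrag(main_file)
usage_after_defrag = disk_usage_mb(snapshot_file, main_file)

print(usage_shared, usage_after_edit, usage_after_defrag)
```

In this model the edit costs one extra block, while the defrag costs a full second copy, because the snapshot still holds the original blocks.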

Now the question which is unanswered:

Why are so many GBs of new datablock usage showing up in your system when you have no snapshots?

mdgm says "CoW can create them anyway" - he might well be right, but I don't see how. I think it is likely that you have snapshots that you can't see. This has happened to some users as a side-effect of software updates. I think you need to sort out if you have these hidden snapshots [or not] before you start making changes.

What do you see with

btrfs fi df /data/ShareA

btrfs fi df /data/ShareB

Also, do you see any paths containing .snapshots with

btrfs subvolume list /data
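On a long listing, the check for `.snapshots` paths can be scripted. The following is a minimal sketch, assuming each line of `btrfs subvolume list` ends with `path <subvolume path>`; the `snapshot_subvolumes` helper is hypothetical, and the sample lines are copied from the listing posted later in this thread.

```python
def snapshot_subvolumes(listing):
    """Return subvolume paths that contain a '.snapshots' component."""
    paths = []
    for line in listing.splitlines():
        # Assumption: each line ends with "path <subvolume path>".
        _, sep, path = line.partition(" path ")
        path = path.strip()
        if sep and ".snapshots" in path.split("/"):
            paths.append(path)
    return paths


sample = """\
ID 467 gen 1463441 top level 5 path ShareB
ID 34534 gen 1463441 top level 5 path ShareA
ID 34535 gen 1463113 top level 34534 path ShareA/.snapshots
"""
print(snapshot_subvolumes(sample))
```

A match on a bare `ShareA/.snapshots` entry only shows that the container subvolume exists. Whether it actually holds any snapshots still has to be checked separately.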

joe_schmo · ‎2016-10-13

root@NAS:~# btrfs fi df /data/ShareA/
Data, single: total=28.45TiB, used=27.89TiB
System, DUP: total=8.00MiB, used=3.43MiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=279.00GiB, used=37.39GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=32.00KiB

root@NAS:~# btrfs fi df /data/ShareB/
Data, single: total=28.45TiB, used=27.89TiB
System, DUP: total=8.00MiB, used=3.43MiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=279.00GiB, used=37.39GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=3.59MiB

root@NAS:~# btrfs subvolume list /data
ID 256 gen 1177180 top level 5 path home
ID 259 gen 1462117 top level 5 path .apps
ID 260 gen 7 top level 5 path .vault
ID 266 gen 1030073 top level 5 path Backup
ID 267 gen 1463111 top level 5 path ._share
ID 268 gen 1463113 top level 5 path .timemachine
ID 270 gen 1171396 top level 256 path home/joe
ID 467 gen 1463441 top level 5 path ShareB
ID 9341 gen 3032 top level 256 path home/userA
ID 13841 gen 8115 top level 256 path home/userB
ID 14001 gen 1463057 top level 5 path ShareC
ID 14009 gen 563759 top level 5 path .purge
ID 14010 gen 170284 top level 256 path home/userC
ID 14011 gen 1029948 top level 5 path ShareD
ID 14012 gen 1029937 top level 256 path home/userD
ID 26342 gen 1177181 top level 256 path home/userE
ID 34534 gen 1463441 top level 5 path ShareA
ID 34535 gen 1463113 top level 34534 path ShareA/.snapshots

Wow, thanks for the detailed reply.... So I took your commands and ran them, see above.

I also started copying ShareB to a new Share. As I am moving the data, my free space is actually increasing.

Most shares have the same size between du -h and the Consumed column in the UI. Some have larger consumed space. But one share has du -h showing 900GB more than the Consumed column. I get how it could be reversed, but that is odd.

StephenB · ‎2016-10-13

Great.

Is there anything in /ShareA/.snapshots ?

mdgm-ntgr · ‎2016-10-13

You have a huge amount allocated to metadata. This suggests that you may well have had snapshots on this system at some point.

It could well be that the amount of metadata has been reduced greatly from what it once was (e.g. if snapshots were deleted) but a huge amount is still allocated to that.

Once you've freed up some space to get volume usage back down to e.g. 80-85% a balance could bring down the metadata allocation to a more reasonable level.

joe_schmo · ‎2016-10-13

@StephenB wrote:
Great.
Is there anything in /ShareA/.snapshots ?

No, no files at all, it's an empty folder.

joe_schmo · ‎2016-10-13

@mdgm wrote:
You have a huge amount allocated to metadata. This suggests that you may well have had snapshots on this system at some point.

It could well be that the amount of metadata has been reduced greatly from what it once was (e.g. if snapshots were deleted) but a huge amount is still allocated to that.

Once you've freed up some space to get volume usage back down to e.g. 80-85% a balance could bring down the metadata allocation to a more reasonable level.

Yeah, it's possible that I had snapshots way back, but I've had this NAS for almost 3 years, so it would have been a while ago.

From what it appears above, there's only 279GB of metadata. From what I can tell, I have 3TB of free space marked as allocated. For example, I've moved 1TB of data over to the new share (on the same partition), and I've increased my free space from 503GB to 942GB. The balance operation is not doing anything for free space.

I am kind of curious now, where the space that I am freeing up is allocated (as in, can I run a command or look somewhere to see it)?
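One place to look is the gap between `total` (allocated) and `used` in the `btrfs fi df` output posted earlier. The sketch below parses lines in that format and prints the allocated-but-unused amount per type. The `parse_fi_df` helper and its regular expression are hypothetical and assume the exact line layout shown in this thread. As far as I understand, these figures are logical sizes, so a DUP profile would occupy twice as much raw disk space.

```python
import re

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

# Assumed layout: "Metadata, DUP: total=279.00GiB, used=37.39GiB"
LINE = re.compile(
    r"^(?P<kind>\w+), (?P<profile>\w+): "
    r"total=(?P<total>[\d.]+)(?P<tunit>[KMGT]iB|B), "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]iB|B)$"
)


def parse_fi_df(text):
    """Return (kind, profile, total_bytes, used_bytes) for each parsed line."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip prompts and anything else that does not match
        total = float(m["total"]) * UNITS[m["tunit"]]
        used = float(m["used"]) * UNITS[m["uunit"]]
        rows.append((m["kind"], m["profile"], total, used))
    return rows


sample = """\
Data, single: total=28.45TiB, used=27.89TiB
Metadata, DUP: total=279.00GiB, used=37.39GiB
"""
for kind, profile, total, used in parse_fi_df(sample):
    slack_gib = (total - used) / 2**30
    print(f"{kind:<9}{profile:<7} allocated but unused: {slack_gib:.2f} GiB")
```

With the numbers posted above, the gap comes to roughly 573 GiB for data and 242 GiB for metadata. That gap is what a balance could in principle return to unallocated space.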

Really appreciate all of the help on this from both of you.

mdgm-ntgr · ‎2016-10-13

A balance moves data and metadata around so that chunks are emptied and can be returned to unallocated space. That's the entire point of a balance.
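The idea can be pictured with a toy model in which each chunk is just a fill fraction and a balance repacks the contents into as few chunks as possible. This is a sketch of the concept only: the `balance` function and the fill fractions are hypothetical, and it ignores filters, profiles, and the free space a real balance needs in order to write the relocated chunks.

```python
import math


def balance(chunk_usage):
    """Repack partially filled chunks; return (chunks_kept, chunks_freed)."""
    used = sum(chunk_usage)   # total content, measured in whole chunks
    kept = math.ceil(used)    # fewest chunks that can hold that content
    return kept, len(chunk_usage) - kept


# Five allocated chunks, most of them nearly empty (hypothetical fill fractions).
chunks = [0.9, 0.1, 0.2, 0.05, 0.25]
kept, freed = balance(chunks)
print(f"{len(chunks)} chunks allocated, {kept} needed, {freed} returned to unallocated")
```

The amount of stored data does not change in this model. Only the number of chunks allocated to hold it goes down.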

On a very full system a balance may get stuck and not work so it is important to free up space first.

Allocated space is only a problem if the space on the data volume is fully allocated and data or metadata needs more allocated to it.

joe_schmo · ‎2016-10-14

How much space do you think I need to free up? The answer to that will dictate where I move data temporarily.

StephenB · ‎2016-10-14

@joe_schmo wrote:

How much space do you think I need to free up? The answer to that will dictate where I move data temporarily.

I'd free up about 5 TB. That will bring you into the 80-85% range.
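The 5 TB figure can be checked against the df output from the first post. This is a rough sketch: `tb_to_free` is a hypothetical helper, the df values are rounded, and used space is assumed to be the volume size minus the 632G reported as available at that time.

```python
def tb_to_free(size_tb, used_tb, target_pct):
    """TB to remove so that used/size drops to target_pct percent."""
    target_used = size_tb * target_pct / 100
    return max(0.0, used_tb - target_used)


size_tb = 30.0           # "Size" column from the df output (rounded)
used_tb = 30.0 - 0.632   # assumption: size minus the 632G reported as available

for pct in (85, 80):
    print(f"{pct}% full: free about {tb_to_free(size_tb, used_tb, pct):.1f} TB")
```

Under those assumptions, reaching 85% takes a little under 4 TB and reaching 80% a little over 5 TB. That matches the estimate above.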

Longer term, you should consider either expanding the volume (e.g., 8 TB drives), adding on an EDA500, or possibly getting a second NAS for some of the data.

joe_schmo · ‎2016-10-15

OK, so I am moving some data around to get the free space up... when I do a balance, from the numbers I posted above, I can expect to gain no more than 279GB of space though, right?

mdgm-ntgr · ‎2016-10-17

A balance will move around both data and metadata. You'd expect some space to be returned to unallocated space from both of these.

joe_schmo · ‎2016-10-19

Awesome. Final question... it appears that this is progressing at 1% per day. Any way to speed it up? Will freeing even more space make it go faster?

btrfs balance status -v /data
Balance on '/data' is running
1527 out of about 29508 chunks balanced (1528 considered), 95% left
Dumping filters: flags 0x7, state 0x1, force is off
DATA (flags 0x0): balancing
METADATA (flags 0x0): balancing
SYSTEM (flags 0x0): balancing
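For a rough idea of how long the balance has left, the progress line in the status output can be turned into an estimate. This is a minimal sketch: `balance_progress` is a hypothetical helper, the chunk total is itself reported as approximate, and the 1% per day rate is simply the observation quoted above, with no guarantee that it stays constant.

```python
import re


def balance_progress(status_text):
    """Extract (chunks_done, chunks_total) from the balance status text."""
    m = re.search(r"(\d+) out of about (\d+) chunks balanced", status_text)
    if m is None:
        raise ValueError("no progress line found")
    return int(m.group(1)), int(m.group(2))


status = "1527 out of about 29508 chunks balanced (1528 considered), 95% left"
done, total = balance_progress(status)
percent_done = 100 * done / total

pct_per_day = 1.0  # observed rate quoted in the post; an assumption for the estimate
days_left = (100 - percent_done) / pct_per_day
print(f"{percent_done:.1f}% done, roughly {days_left:.0f} days left at {pct_per_day}% per day")
```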