Forum Discussion
Platypus69
Feb 19, 2021 Luminary
Cannot copy files to RN316 although I have 22TB free...
Hi all. I have the following RN316: Firmware 6.10.4, running 6 x 10TB IronWolf HDDs, X-RAID, 21.9TB free / 23.4TB used. History: Last year I replaced all the 8TB IronWolf HDDs (from memory) one by...
Platypus69
Mar 06, 2021 Luminary
Ha! Ha! Ha!
Clearly you are 1,000,000% correct in saying "The total=22.95TB doesn't mean what you think it means."
Yes I understand that md127 + md126 should equal 45.35TiB, but everything else is confusing... Does the "Data" label in the UI denote the amount of storage space consumed, or is it just referring to the Data pool, or something else?
Why do I ask? Well...
So this is what I did yesterday:
- Moved some files off the new share I created in Feb 2021 and an older one (May 2020).
- Turned off the remaining 2 snapshots that I had for OneDrive and DropBox
- Deleted all the recent 2021 daily snapshots that had been created for these two shares.
- I decided to keep the monthly snapshots for these shares. As discussed before there are only about 2 x 19 of them going back 2 years. (Now I am not sure how the RN316 implements snapshots. If I have only ever added files and hardly ever modified or deleted them, does the snapshot make a copy of the 16GB worth of files, or does it just maintain metadata in the file system and only "move" a file into the snapshot when you modify or delete it, if you know what I mean?)
- Before I went to sleep last night I kicked off a Defrag.
- Today I kicked off a Balance through the UI.
Operations performed yesterday and kicked off today
But right now 3.5 hours later here is what the UI is showing:
So obviously the confusion/concern is that Data dropped from 23.40TB to 13.26TB!
Is this being recalculated dynamically as the Balance performs its "dark magic"? Do I have to wait for it to finish to see where these values end up? Or is it accurate right now?
Is this expected? Obviously something has happened, and is happening. How exciting :(
My concern of course is that I have "lost data", as I am pretty sure I have NOT moved off 10TB - I don't have enough free spare storage. :)
Recall that I:
- Turned off Smart Snapshots on about 5 shares which had very little data. I never had snapshots on my main shares that I use for Videos, Photos, Software and Backups, which probably account for 90% of my files. I only had a number of snapshots for the OneDrive and DropBox shares, but remember they are both limited to 16GB as I am only using the free tiers.
- Moved 500GB of the files that I copied across to the RN316 in February 2021, and some older files from May 2020 (so before I swapped out the last 10TB HDD).
So this clearly is well under 10TB. Not even 1TB.
So why the big decrease in size reported for Data?
Apologies if the answer is really simple and I am being dumb...
StephenB
Mar 06, 2021 Guru - Experienced User
Platypus69 wrote:
Yes I understand that md127 + md126 should equal to 45.35TiB, but everything else is confusing... Does the "Data" label in the UI denote the amount of storage space consumed or it just referring to the Data pool, or something else?
Going back to where you started, I'm going to slightly revise the original report, which hopefully will help provide clarity.
Label: 'data' uuid: ...
Total devices 2 FS bytes used 23.43TiB
devid 1 size 18.17TiB allocated 18.17TiB path /dev/md127
devid 2 size 27.28TiB allocated 5.29TiB path /dev/md126
=== filesystem /data ===
Data, single: allocated=23.43TiB, used=23.42TiB
System, RAID1: allocated=32.00MiB, used=2.99MiB
Metadata, RAID1: allocated=5.85GiB, used=5.84GiB
Metadata, DUP: allocated=10.50GiB, used=10.01GiB
GlobalReserve, single: allocated=512.00MiB, used=33.05MiB
=== subvolume /data ===
Looking at line 4, md126 has a size of 27.28 TiB, but only 5.29 TiB was allocated. Per line 3, md127 has a size of 18.17 TiB, but all of it was allocated. This totals 23.46 TiB allocated.
Looking at line 6, the volume had 23.43 TiB allocated for data. Since we have 23.46 total allocation, that means .03 TiB (~30 GiB) was allocated to system and metadata. Of course there are rounding errors, since the reports aren't exact - so if you add up the system and metadata stuff on lines 7-10, it is a bit less than that.
Now, looking again at line 6, you had 23.43 TiB of allocated space, but 23.42 TiB of used space. That means you had .01 TiB of space that is allocated but not used. Although it is natural to label this "allocated but not used" bucket as "unused", personally I think it is more appropriate to label it "unusable". This unusable space is there because BTRFS allocates space in blocks. There will be some lost (unusable) space in many blocks.
I've been careful not to use the word "free", because that concept is a bit slippery with btrfs. The Web UI is labeling unallocated space as "free" - which is reasonable, but sometimes misleading. What you really have is allocated (used and "unusable") and unallocated.
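Spelled out as arithmetic, using the numbers from the report quoted above (a sketch only - the figures in the report are themselves rounded):

```shell
# Space accounting for the volume, all figures in TiB, taken from the
# btrfs output quoted earlier in this thread.
total_alloc=$(awk 'BEGIN { printf "%.2f", 18.17 + 5.29 }')  # md127 + md126 allocations
overhead=$(awk 'BEGIN { printf "%.2f", 23.46 - 23.43 }')    # system + metadata share
unusable=$(awk 'BEGIN { printf "%.2f", 23.43 - 23.42 }')    # allocated but not used by data
echo "total allocated:            ${total_alloc} TiB"
echo "system+metadata overhead:   ${overhead} TiB"
echo "unusable (alloc, not used): ${unusable} TiB"
```

The same three buckets - used, unusable, unallocated - are what the Web UI is collapsing into "used" and "free".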
As rn_enthusiast points out, the balance was failing because the file system
- is set up to duplicate metadata (so there is a copy on both md126 and md127)
- and there was no unallocated space at all on md127
Your deletions apparently did reclaim some unallocated space, and it looks like the balance is now doing its job. But what exactly does "doing its job" mean?
As Sandshark said (quoting the man page), "The primary purpose of the balance feature is to spread block groups across all devices". There is a useful side effect though: a balance will also consolidate the allocated space, so there is less "unusable" space. So even if you only have one device (your original md127), it can be useful to run balances from time to time.
So when your balance completes, you should expect to see more unallocated space on md127, and more allocated space on md126. You should look at the unallocated space you end up with on both volumes when it's done.
But as rn_enthusiast says, "Running balance from GUI isn't really a full balance. The GUI will use parameters during the balance so it only balances parts of the volume." So what's that about? Well, mostly it's about how long the balance takes. A full balance (with no filter parameters) will take several days on your system. Lots of users complained about the run time, so Netgear added in some filters - which speed it up, but at the cost of not balancing completely. What these parameters do is focus the balance on chunks that have unusable space.
For instance, rn_enthusiast also suggested running the balance from the command line:
btrfs balance start -dusage=10 /data
The -d is a filter for data blocks (not metadata or system). The usage=10 tells the balance to only process blocks that have 10% (or less) used space - in other words, only process blocks that are 90% or more unusable. That will run more quickly, and it will be easier for the system to consolidate the unusable space - converting it back to the unallocated space you needed. The system needs some working space in order for the consolidation to happen, and setting the dusage low reduces that space. FWIW, I'd have suggested starting with -dusage=0, as there often are some allocated blocks that end up completely empty, and the system can convert them back to unallocated without needing any working space.
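As a sketch, the stepped approach looks like the loop below - start with the cheapest filter and widen it on each pass. The DRY_RUN guard is my addition so the script just prints the commands; on the NAS itself you would run it as root with DRY_RUN unset:

```shell
# Stepped btrfs balance: low dusage passes need little working space and
# free up empty/nearly-empty chunks first, making later passes easier.
DRY_RUN=1
for usage in 0 10 30 50; do
    cmd="btrfs balance start -dusage=${usage} /data"
    if [ -n "$DRY_RUN" ]; then
        echo "$cmd"              # dry run: show what would be executed
    else
        $cmd || break            # stop if a pass fails, e.g. with ENOSPC
    fi
done
```

A final unfiltered `btrfs balance start /data` can follow once enough space has been reclaimed, as in the sequence discussed later in the thread.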
Platypus69 wrote:
So why the big decrease in size reported for Data?
Good question, and I'm not sure I have a fully satisfactory answer. But I believe the issue is that you had more unusable space than the system was reporting (that the used fraction in the reports is an estimate). Then once the balance was able to really get going, it found a lot more unusable space that it could shift to unallocated.
This could be related to the snapshots you deleted - the system perhaps wasn't able to reclaim the space at the time, but now that you have some unallocated space to work with, the system is getting that space back.
Platypus69 wrote:
- Before I went to sleep last night I kicked off a Defrag.
You got away with this, but it was a bad idea.
A defrag is basically rewriting a fragmented file, so it is unfragmented. Doing that requires unallocated space that you didn't have.
Even with older file systems like FAT32, defragging the files results in fragmented free space, and defragging the free space results in fragmented files. It's similar with BTRFS - defragging files will end up reducing the unallocated space.
Also, defragging a share with snapshots can sharply increase the amount of disk space used by the share. If you want to defrag regularly, you really do want to limit snapshot retention (as I suggested earlier).
- Platypus69 Mar 06, 2021 Luminary
Thanks for that.
Muchly appreciated.
Perhaps the Defrag was not a big deal as I have only ever copied files and never/hardly ever deleted or modified them. (But I do have Copy-on-Write enabled for all shares. Not sure what that has to do with Bit-Rot Protection, but I digress.) So that's why I got away with it???
In any case I will report what happens tomorrow as it is still at 64% as of midnight tonight. Another aside: I cannot believe that a Balance that fails with an error shows up as a green completed line in the UI logs!!!
But it sounds like you're saying that I should still perform the following after this completes:
btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data
Happy to do that.
Otherwise, is my current scenario due to the fact that it is predominantly storing photos? So my Photos share stores over 380,000 files that are JPG and HEVC files which are on average around 4MB in size? So that accounts for your "unusable space"?
Thanks again for the comprehensive reply, muchly appreciated.
- StephenB Mar 06, 2021 Guru - Experienced User
Platypus69 wrote:
Another aside: I cannot believe that a Balance that fails with an error shows up as a green completed line in the UI logs!!!
I'm not understanding this comment.
Platypus69 wrote:
But it sounds like you're saying that I should still perform the following after this completes:
Give us the btrfs info after the balance completes - that will make it easier to give you any next steps.
Platypus69 wrote:
Otherwise, is my current scenario due to the fact that it is predominantly storing photos? So my Photos share stores over 380,000 files that are JPG and HEVC files which are on average around 4MB in size? So that accounts for your "unusable space"?
I don't think it's about the number of files or their size.
BTRFS allocates fairly large blocks (1 GiB). The data from more than one file can be stored in the same block.
When you fill the share initially, the file system should be pretty efficient. But later on, as files are modified or deleted there will be "holes" - unused space in the blocks. There is background processing that can consolidate them - but I'm not sure what triggers that background processing (and there's no easy way to see it).
But doing a balance will run all the blocks that meet the -dusage criteria through the btrfs allocator again, and that will consolidate the data (removing the wasted space).
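As a toy illustration of what that repacking reclaims (the chunk count and fill percentage here are made up for the example; only the 1 GiB chunk size comes from the discussion above):

```shell
# Ten 1 GiB chunks that are each only 10% full hold 1 GiB of real data
# between them, so a balance can repack them into a single chunk and
# return the rest to the unallocated pool.
chunks=10
chunk_size_gib=1
pct_used=10
data_gib=$((chunks * chunk_size_gib * pct_used / 100))
# Chunks needed after repacking, rounding up to whole chunks.
needed_chunks=$(( (data_gib + chunk_size_gib - 1) / chunk_size_gib ))
reclaimed_gib=$(( (chunks - needed_chunks) * chunk_size_gib ))
echo "repacked ${data_gib} GiB into ${needed_chunks} chunk(s), reclaimed ${reclaimed_gib} GiB"
```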
- Platypus69 Mar 07, 2021 Luminary
Wow! The remaining 80% of the Balance took most of today.
data balance  2021-03-05 09:34:32  2021-03-05 10:29:27  completed  ERROR: error during balancing '/data': No space left on device
data balance  2021-03-05 19:39:44  2021-03-05 19:49:07  completed  ERROR: error during balancing '/data': No space left on device
data balance  2021-03-05 21:09:45  2021-03-05 21:27:23  completed  ERROR: error during balancing '/data': No space left on device
data balance  2021-03-05 21:28:15  2021-03-05 21:28:19  completed  Done, had to relocate 1 out of 23557 chunks
data balance  2021-03-05 21:45:20  2021-03-05 21:46:05  completed  Done, had to relocate 29 out of 23557 chunks
data balance  2021-03-05 21:57:26  2021-03-05 21:57:31  completed  Done, had to relocate 1 out of 23529 chunks
data balance  2021-03-05 21:59:22  2021-03-05 21:59:27  completed  Done, had to relocate 1 out of 23529 chunks
data balance  2021-03-05 21:59:48  2021-03-05 21:59:53  completed  Done, had to relocate 1 out of 23529 chunks
data balance  2021-03-05 22:25:13  2021-03-05 22:25:18  completed  Done, had to relocate 1 out of 23529 chunks
data balance  2021-03-05 23:19:38  2021-03-05 23:19:44  completed  Done, had to relocate 1 out of 23529 chunks
data balance  2021-03-06 00:54:22  2021-03-06 00:54:28  completed  Done, had to relocate 1 out of 23529 chunks
data defrag   2021-03-06 00:54:49  2021-03-06 03:02:04  completed
data balance  2021-03-06 11:23:49  2021-03-07 18:58:54  completed  Done, had to relocate 8286 out of 23482 chunks
Out of curiosity is there a correlation between the 8286 chunks relocated and the 10 odd TB that Data was reduced by? So something like 8286 x 1GB = 10TB approximately?
BTRFS:
Label: '*:root' uuid: *
Total devices 1 FS bytes used 1.48GiB
devid 1 size 4.00GiB used 3.61GiB path /dev/md0

Label: '*:data' uuid: *
Total devices 2 FS bytes used 13.24TiB
devid 1 size 18.17TiB used 10.02TiB path /dev/md127
devid 2 size 27.28TiB used 4.83TiB path /dev/md126

=== filesystem /data ===
Data, single: total=14.81TiB, used=13.23TiB
System, RAID1: total=32.00MiB, used=2.04MiB
Metadata, RAID1: total=6.85GiB, used=6.80GiB
Metadata, DUP: total=10.50GiB, used=8.44GiB
GlobalReserve, single: total=512.00MiB, used=24.00KiB
=== subvolume /data ===
Does that look all good now? I have not lost data?
So what should I do now?
Happy to do following as was suggested before.
btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data
Or anything else such as Scrub or a Defrag.
I don't want to again have the same issue of 10TB being free but not being able to copy any files to the RN316.
So if these operations take another week, that is perfectly fine. I'd rather have everything optimised now.
Thanks again!!!
- StephenB Mar 07, 2021 Guru - Experienced User
Platypus69 wrote:
Does that look all good now? I have not lost data?
So what should I do now?
All is good now, and there is no need to run anything immediately. You can of course restore the data you deleted.
But I do suggest setting up a maintenance schedule using the volume schedule control on the volume settings wheel (again, mine uses a four-month cycle, running one maintenance task each month). You could alternatively enable autodefrag on the share settings, and run the remaining tasks on a 3 month cycle.
You could also re-enable snapshots on the remaining shares if you like - ideally setting up custom snapshots with fixed retention. I use 3 months on my own systems for most shares.
If you don't have a backup plan in place for your NAS you should definitely set up one - for instance, purchasing a large USB backup disk.
Platypus69 wrote:
Out of curiosity is there a correlation between the 8286 chunks relocated and the 10 odd TB that Data was reduced by? So something like 8286 x 1GB = 10TB approximately?
Each chunk is 1 GiB. So while there is a correlation, you reclaimed somewhat more space than that. There could have been some chunks that were allocated but completely unused - not sure if the log would show them as relocated or not.
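A rough conversion of the relocation count (a sketch of the arithmetic, not an exact accounting):

```shell
# 8286 relocated chunks at 1 GiB each, expressed in TiB (1 TiB = 1024 GiB).
# This comes to roughly 8.1 TiB - less than the ~10 TiB drop in "Data",
# consistent with some space coming from deletions and empty chunks.
chunks=8286
tib=$(awk -v c="$chunks" 'BEGIN { printf "%.2f", c / 1024 }')
echo "${chunks} chunks ~= ${tib} TiB"
```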
- Platypus69 Mar 07, 2021 Luminary
Thanks for that. Will do.
Thanks A BILLION for everyone's help and all practical advice!!!!!!
And patience :)
I do have another NAS/HDD for the photos but am wondering if I just RAR up each month and then store them in AWS Glacier Deep Archive for $0.00099 per GB-month.
Notice that my new DS1819+ can sync with Backblaze, so maybe I'll take advantage of that???
- rn_enthusiast Mar 07, 2021 Virtuoso
Out of interest, how did your full balance distribute the data between md127 and md126, in the end?
:)
- StephenB Mar 07, 2021 Guru - Experienced User
rn_enthusiast wrote:
Out of interest, how did your full balance distribute the data between md127 and md126, in the end?
It's still running - at the moment the stats are
Label: '2fe72582:data' uuid: a665beff-2a06-4b88-b538-f9fa4fb2dfef
Total devices 2 FS bytes used 13.64TiB
devid 1 size 16.36TiB used 9.55TiB path /dev/md127
devid 2 size 10.91TiB used 4.11TiB path /dev/md126

It started at

Label: '2fe72582:data' uuid: a665beff-2a06-4b88-b538-f9fa4fb2dfef
Total devices 2 FS bytes used 13.54TiB
devid 1 size 16.36TiB used 12.72TiB path /dev/md127
devid 2 size 10.91TiB used 1.27TiB path /dev/md126
So it has shifted quite a few blocks to md126.