Forum Discussion
Platypus69
Feb 19, 2021 Luminary
Cannot copy files to RN316 although I have 22TB free...
Hi all. I have the following RN316: Firmware 6.10.4, running 6 x 10TB IronWolf HDDs, X-RAID, 21.9TB free / 23.4TB used. History: Last year I replaced all the 8TB IronWolf HDDs (from memory) one by...
Platypus69
Mar 05, 2021 Luminary
Firstly, thanks a million as always.
Of course I don't know, but I would be surprised if snapshots are the root cause....
I only have set up snapshots for my OneDrive and Dropbox shares. Which represent a fraction of the photos and movies that are stored on the RN316.
Any other snapshots, which likewise were not large, are now gone. I only use the free versions of these services, which are limited in size: Dropbox is 16GB, and OneDrive I can't remember, but probably around 16GB as well. So I thought I would use the snapshot feature of the RN316 for these shares, since the free tiers of OneDrive and Dropbox do not have this functionality.
Do you really think it will make a difference if I remove these underlying snapshots? They are small, no? Or perhaps they take up a lot of metadata? I don't know...
OneDrive share UI says:
- 7149 files, 98 folders, 13GB
- 20 snapshots (2year(s) protection)
DropBox share UI says:
- 15365 files, 571 folders, 13.9GB
- 19 snapshots (2year(s) protection)
I too have concluded that I will at some point, as soon as I can, buy 8 x 16TB HDDs for my new DS1819+, and do as you suggest: move all the data off the RN316, reformat it, and move the data back. But I cannot afford the 8 x 16TB HDDs right now, in one hit.
So the frustrating thing is I have run out of space on all my ReadyNASes. I have this 20TB free but I cannot use it!!!! ArggghHh.... :)
So would you suggest an action plan of trying to remove 1TB of old data from md127, then doing a balance, then a defrag, then another balance, and then trying to copy the data back?
Of course I am very curious as to what the problem is and how to avoid it in the future. It sounds to me that a strategy of going from 6 x 4TB HDDs to 6 x 10TB HDDs, and to 6 x 16TB HDDs in the future, is not viable for these BTRFS-based RAID NASes.
Unless of course I should have been running monthly balances/defrags, which I never did. Netgear never recommended it. I had assumed (incorrectly, it seems) that you never needed to run these operations, as I predominantly only add my family photos and videos.
So I want to learn the lesson here, but am struggling to learn what I did wrong and how to avoid this in the future, other than your "brute force" technique.
So I was planning to fill out my new DS1819+ like this:
- Buy 1 x 16TB HDD in the first month (yes, I know there is no RAID)
- Add 1 x 16TB HDD every month after that, so as to stagger the HDDs' lifetimes, reduce the chance of them all failing simultaneously, and also stagger the cost
But given all the dramas I am having with BTRFS, I am wondering whether this is a horrendous idea, and whether I would be better off buying 8 x 16TB HDDs and setting up one massive pool. So take the hit on the wallet! :(
Or can I get away with buying 4 x 16TB HDDs and setting up one pool this year, and then in 12-24 months buying another 4 x 16TB HDDs and setting up a second pool?
I am beginning to suspect that buying the 8 x 16TB HDDs in one hit is the best way to go... Ouch!
StephenB
Mar 05, 2021 Guru - Experienced User
Platypus69 wrote:
So would you suggest an action plan of trying to remove 1TB of old data from md127, then doing a balance, then a defrag, then another balance, and then trying to copy the data back?
Well, we can't see what is actually on md127 (as opposed to md126). But you could try copying off some older shares, and then delete them. After the space is reclaimed (hopefully from md127), you can try a balance (which should succeed if there's enough space on md127). A scrub might also reallocate some space. After that, you could recreate the shares and copy the data back.
A defrag won't help - and it can reduce free space in the shares that have snapshots enabled.
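If you do go down that path, one way to confirm whether the reclaimed space actually came back on md127 is to check the per-device allocation from SSH before and after (just a sketch, assuming the volume is mounted at /data as on a stock ReadyNAS):
# allocated vs. total size for each member RAID group (md127 / md126)
btrfs filesystem show /data
# more detailed per-device breakdown, including unallocated space
btrfs filesystem usage /data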
Platypus69 wrote:
Unless of course I should have been running monthly balances/defrags, which I never did. Netgear never recommended it. I had assumed (incorrectly, it seems) that you never needed to run these operations, as I predominantly only add my family photos and videos.
So I want to learn the lesson here, but am struggling to learn what I did wrong and how to avoid this in the future
Netgear doesn't offer any guidance on volume maintenance. My current practice is to schedule each of the four tasks (scrub, disk test, balance, and defrag). I cycle through them, one each month, so over a year each runs three times. Defrag probably isn't necessary - but I have enough free space to avoid the downside, so I just run it anyway.
Opinions here differ on balance - mdgm for instance only runs it rarely (if at all). But I have seen posts here where it has reclaimed unallocated space. In general, if a balance isn't needed then it runs very quickly and I've never had any problems running them. So I continue to run them on this schedule.
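For reference, if you ever want to run one of these by hand over SSH instead of waiting for the schedule, the rough equivalents are below (a sketch only - the GUI scheduler is the normal way on a ReadyNAS, /data is the stock mount point, and /dev/sda is just an example disk):
btrfs scrub start /data                  # scrub: re-reads data and verifies checksums
btrfs balance start /data                # balance: relocates/compacts block groups
btrfs filesystem defragment -r /data     # defrag: recursive; can unshare snapshotted extents
smartctl -t long /dev/sda                # disk test: SMART long self-test, one disk at a time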
I don't know how your system ended up this way. FWIW I also have multiple RAID groups on my NAS.
Label: '2fe72582:data'  uuid: a665beff-2a06-4b88-b538-f9fa4fb2dfef
    Total devices 2  FS bytes used 13.54TiB
    devid 1 size 16.36TiB used 12.72TiB path /dev/md127
    devid 2 size 10.91TiB used 1.27TiB path /dev/md126
Unallocated space isn't evenly split across the two RAID groups, but fortunately I do have reasonable space on the original md127 RAID group.
It seems to me that btrfs balance should handle this better - not sure if there are options that would spread the unallocated space more evenly. I'll try to research it if I can find the time.
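One thing that might be worth testing (I haven't tried it on a ReadyNAS, so treat it as an assumption): balance accepts a devid filter, which restricts it to block groups that have a chunk on a given device, so targeting the full RAID group should let the allocator rewrite those chunks onto the emptier one:
# devid 1 is md127 in the listing above; this only touches block groups on that device
btrfs balance start -ddevid=1 /data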
- Sandshark Mar 05, 2021 Sensei
In general Linux/BTRFS forums, it is recommended that you run a balance after adding a new element to a BTRFS volume. I have seen no evidence that Netgear expansion does that when it vertically expands, which is adding a new element (new md RAID). Without the balance, the data and metadata are not spread across the whole volume, which can result in problems. From the BTRFS balance man page: The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles.
I don't know what those "constraints" are, but I asked about the balance because you may be running into one of them.
FYI, I run a balance monthly. That's probably more often than necessary, but it's a fast process if you do it often and it happens on a schedule while I sleep, so why not?
- StephenB Mar 05, 2021 Guru - Experienced User
Sandshark wrote:
The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles.
Yes, and the fact that Platypus69 has never run one is part of the puzzle. But his balance is failing now, even though there is plenty of unallocated space in md126. So the "primary purpose" isn't being achieved.
FWIW, I decided to run a balance with no parameters ( btrfs balance start /data ) just to see if that moves any chunks from md127 (about 75% allocated) to md126 (about 13% allocated). It'll take a while, but I will report back when it completes.
- rn_enthusiast Mar 05, 2021 Virtuoso
When expanding, the NAS creates a second RAID (md126), as you guys know. BTRFS then puts md127 and md126 into a JBOD kind of arrangement, using the filesystem itself to do so. Basically, mdadm creates two RAIDs (devices, essentially) and BTRFS sticks those together in a JBOD using the RAID capability of the filesystem itself. That is fine and not an issue, BUT if you look at the metadata, it is set to raid1. BTRFS can do clever things like holding different RAID levels for data and metadata.
But take this situation... md127 is totally full and its metadata is totally full as well. In order to write new data, new metadata MUST be duplicated between md126 and md127, as specified in the BTRFS RAID profile. But since the NAS cannot write any more metadata to md127, it can't write at all, because it is supposed to write the metadata in a raid1 fashion between the two devices.
Label: 'data'  uuid: ...
    Total devices 2  FS bytes used 23.43TiB
    devid 1 size 18.17TiB used 18.17TiB path /dev/md127
    devid 2 size 27.28TiB used 5.29TiB path /dev/md126

=== filesystem /data ===
Data, single: total=23.43TiB, used=23.42TiB
System, RAID1: total=32.00MiB, used=2.99MiB
Metadata, RAID1: total=5.85GiB, used=5.84GiB
Metadata, DUP: total=10.50GiB, used=10.01GiB
GlobalReserve, single: total=512.00MiB, used=33.05MiB
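For anyone reading along: the per-profile lines above look like the output of btrfs filesystem df, where total is the space allocated to chunks of that profile and used is what's actually filled, so Metadata, RAID1: total=5.85GiB, used=5.84GiB means the RAID1 metadata chunks are essentially full. You can pull the same numbers yourself (assuming the stock /data mount point):
# chunk allocation vs. usage per profile (Data/Metadata/System)
btrfs filesystem df /data
# per-device view showing which RAID group has no unallocated space left
btrfs filesystem show /data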
We need to balance some of the data across to md126. Running a balance from the GUI isn't really a full balance: the GUI passes parameters to the balance so it only balances parts of the volume. My suggestion would be to:
1. Take some data off the NAS temporarily. A few TBs.
2. Run a full balance from the CLI, like StephenB mentioned.
3. Move the data back (but first, post the post-balance volume stats, like the above, to the thread).
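One practical note on step 2 (my own assumption, not something specific to the ReadyNAS): a full balance on a volume this size can run for many hours, so it's worth starting it in a way that survives the SSH session dropping, for example:
# run the full balance detached from the SSH session and capture its output (log path is just an example)
nohup btrfs balance start /data > /root/balance.log 2>&1 &
# check on it later
btrfs balance status /data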
Cheers
- Platypus69 Mar 06, 2021 Luminary
Thanks. What everyone here has said makes sense, but it's the action plan which is confusing.
So am I winning here?
You can see that things "have changed":
Label: 'blah:root'  uuid: *
    Total devices 1  FS bytes used 1.46GiB
    devid 1 size 4.00GiB used 3.61GiB path /dev/md0

Label: 'blah:data'  uuid: *
    Total devices 2  FS bytes used 13.24TiB
    devid 1 size 18.17TiB used 18.09TiB path /dev/md127
    devid 2 size 27.28TiB used 4.84TiB path /dev/md126

=== filesystem /data ===
Data, single: total=22.90TiB, used=13.23TiB
System, RAID1: total=32.00MiB, used=2.95MiB
Metadata, RAID1: total=6.85GiB, used=5.81GiB
Metadata, DUP: total=10.50GiB, used=9.54GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
Is that showing that we are starting to expand Metadata, RAID1, since it has grown from 5.85GiB to 6.85GiB?
Is that sufficient? Will it now be able to grow as required from here on?
Has the problem been solved, or should I continue to move more stuff off and re-balance?
Here is the history from volume.log (apologies for the length; I have removed all disk tests). I was incorrect in saying that I had never run a balance before, as you can see; apologies.
FYI: I replaced the 4TB HDDs with 10TB HDDs on this timeline:
- 04/12/2018
- 06/12/2018
- 29/07/2019
- 15/12/2019
- 21/03/2020
- 21/06/2020
So it would seem no balance was done between the time the last 4TB HDD was swapped out and when I encountered this problem in 2021.
=== maintenance history ===
device operation start_time end_time result details
---------- --------- ------------------- ------------------- --------- ----------------------------------------------------------------
data balance 2016-06-15 00:00:01 2016-06-15 00:00:33 completed
data scrub 2016-07-01 00:00:01
data balance 2016-07-15 00:00:01 2016-07-15 00:02:20 completed
data balance 2016-08-15 00:00:19 2016-08-15 00:00:33 fail
data scrub 2016-09-01 00:00:01
data balance 2016-09-15 00:00:02 2016-09-15 00:00:38 completed
data scrub 2016-10-01 00:00:02 2016-10-02 04:38:32 pass
data balance 2016-10-15 00:00:01 2016-10-15 01:51:49 completed
data scrub 2016-11-01 00:00:01 2016-11-02 17:35:19 pass
data balance 2016-11-15 00:00:01 2016-11-15 00:01:55 completed
data scrub 2016-12-01 00:00:01 2016-12-02 21:24:24 pass
data balance 2016-12-15 00:00:01 2016-12-15 00:17:38 completed
data scrub 2017-01-01 00:00:02 2017-01-03 02:19:12 pass
data balance 2017-01-15 00:00:01 2017-01-15 00:09:23 completed
data scrub 2017-03-01 00:00:03
data balance 2017-03-15 00:00:01 2017-03-15 06:26:22 completed
data scrub 2017-04-01 00:00:01
data balance 2017-04-15 00:00:01 2017-04-15 00:00:31 completed
data scrub 2017-05-01 00:00:01 2017-05-01 22:03:16 abort
data scrub 2017-05-01 22:31:25 2017-05-04 17:39:57 pass
data balance 2017-05-15 00:00:03 2017-05-15 00:03:54 completed Done, had to relocate 4 out of 13438 chunks
data scrub 2017-06-02 00:00:01
data balance 2017-06-15 00:00:02 2017-06-15 00:17:54 completed Done, had to relocate 40 out of 14672 chunks
data scrub 2017-07-02 00:00:01
data balance 2017-07-15 00:00:01 2017-07-15 00:00:19 completed Done, had to relocate 2 out of 14758 chunks
data scrub 2017-08-02 00:00:01
data balance 2017-08-15 00:00:01 2017-08-15 02:00:33 completed balance canceled by user
data scrub 2017-09-02 00:00:01
data balance 2017-09-15 00:00:01 2017-09-15 02:05:02 completed balance canceled by user
data scrub 2017-10-02 00:00:01
data balance 2017-10-15 00:00:01 2017-10-15 02:03:23 completed balance canceled by user
data scrub 2018-01-02 00:00:01
data balance 2018-01-15 00:00:01 2018-01-15 00:35:45 completed Done, had to relocate 53 out of 16330 chunks
data scrub 2018-02-02 00:00:01
data resilver 2018-02-17 21:47:57 2018-02-18 20:11:21 completed
data scrub 2018-03-02 00:00:01
data balance 2018-03-15 00:00:02 2018-03-15 00:13:32 completed Done, had to relocate 14 out of 16390 chunks
data scrub 2018-04-02 00:00:01
data balance 2018-04-15 00:00:01 2018-04-15 00:05:54 completed Done, had to relocate 15 out of 16391 chunks
data scrub 2018-05-02 00:00:01
data scrub 2018-06-02 00:00:01 2018-06-03 17:21:01 pass
data balance 2018-06-15 00:00:02 2018-06-15 01:43:40 completed Done, had to relocate 67 out of 18585 chunks
data balance 2018-07-15 00:00:01 2018-07-15 00:00:09 completed Done, had to relocate 0 out of 18607 chunks ERROR: error during
data scrub 2018-08-02 00:00:01
data balance 2018-08-15 00:00:01 2018-08-15 00:00:08 completed Done, had to relocate 0 out of 18607 chunks ERROR: error during
data resilver 2018-12-04 00:04:00
data resilver 2018-12-04 00:50:30 2018-12-04 17:53:15 completed
data scrub 2018-12-04 18:21:51 2018-12-06 09:40:35 pass
data resilver 2018-12-06 23:59:57
data resilver 2018-12-07 01:06:25 2018-12-07 16:58:15 completed
data resilver 2018-12-07 16:58:27 2018-12-08 10:49:21 completed
data scrub 2018-12-08 14:54:34 2018-12-11 02:49:34 pass
data balance 2018-12-15 00:00:01 2018-12-15 00:03:26 completed Done, had to relocate 0 out of 18625 chunks Done, had to relocat
data scrub 2019-01-07 03:08:33 2019-01-11 08:28:03 pass
data balance 2019-01-15 00:00:01 2019-01-15 00:12:40 completed Done, had to relocate 0 out of 19285 chunks ERROR: error during
data balance 2019-02-15 00:00:01 2019-02-15 00:00:30 completed Done, had to relocate 1 out of 20138 chunks
data scrub 2019-03-01 01:00:01
data resilver 2019-07-25 12:20:40 2019-07-26 05:53:47 completed
data resilver 2019-07-29 20:02:39
data resilver 2019-07-30 20:36:36 2019-07-31 15:17:52 completed
data scrub 2019-10-01 01:00:01 2019-10-06 05:37:11 pass
data scrub 2019-12-01 01:00:01 2019-12-06 10:58:28 pass
data resilver 2019-12-14 23:56:42 2019-12-14 23:57:32 degraded
data resilver 2019-12-15 00:09:16 2019-12-17 01:53:29 completed
data scrub 2019-12-17 11:26:26 2019-12-23 05:14:28 pass
data scrub 2020-02-01 01:00:01 2020-02-07 07:45:05 pass
data resilver 2020-03-21 23:56:06 2020-03-22 17:01:34 completed
data resilver 2020-03-22 17:02:31 2020-03-24 09:54:07 completed
data scrub 2020-04-01 01:00:01 2020-04-07 10:46:32 pass
data scrub 2020-06-01 01:00:02
data scrub 2020-06-10 19:44:01 2020-06-17 15:59:33 pass
data resilver 2020-06-21 23:16:29 2020-06-24 14:16:38 completed
data scrub 2020-08-01 01:00:01
data resilver 2020-08-08 15:10:50
data resilver 2020-08-08 15:45:05 2020-08-09 15:42:35 completed
data resilver 2020-09-13 15:54:14 2020-09-14 20:35:46 completed
data balance 2021-02-18 21:17:16 2021-02-18 21:18:42 completed ERROR: error during balancing '/data': No space left on device T
data scrub 2021-02-18 21:29:29
data balance 2021-03-01 21:03:16 2021-03-01 21:04:48 completed ERROR: error during balancing '/data': No space left on device T
data balance 2021-03-03 15:34:37 2021-03-04 03:44:36 completed ERROR: error during balancing '/data': No space left on device T
data balance 2021-03-05 09:34:32 2021-03-05 10:29:27 completed ERROR: error during balancing '/data': No space left on device T
data balance 2021-03-05 19:39:44 2021-03-05 19:49:07 completed ERROR: error during balancing '/data': No space left on device T
data balance 2021-03-05 21:09:45 2021-03-05 21:27:23 completed ERROR: error during balancing '/data': No space left on device T
data balance 2021-03-05 21:28:15 2021-03-05 21:28:19 completed Done, had to relocate 1 out of 23557 chunks
data balance 2021-03-05 21:45:20 2021-03-05 21:46:05 completed Done, had to relocate 29 out of 23557 chunks
data balance 2021-03-05 21:57:26 2021-03-05 21:57:31 completed Done, had to relocate 1 out of 23529 chunks
data balance 2021-03-05 21:59:22 2021-03-05 21:59:27 completed Done, had to relocate 1 out of 23529 chunks
data balance 2021-03-05 21:59:48 2021-03-05 21:59:53 completed Done, had to relocate 1 out of 23529 chunks
data balance 2021-03-05 22:25:13 2021-03-05 22:25:18 completed Done, had to relocate 1 out of 23529 chunks
data balance 2021-03-05 23:19:38 2021-03-05 23:19:44 completed Done, had to relocate 1 out of 23529 chunks
data balance 2021-03-06 00:54:22 2021-03-06 00:54:28 completed Done, had to relocate 1 out of 23529 chunks
data defrag 2021-03-06 00:54:49 2021-03-06 03:02:04 completed
- Platypus69 Mar 06, 2021 Luminary
Thanks.
I will run this today.
Out of curiosity, is there a --verbose switch or something like that which will report what's going on on the screen? Does the command return a summary of what was done over SSH?
- rn_enthusiast Mar 06, 2021 Virtuoso
The distribution between the two devices (md126 and md127) is still pretty bad.
md127 is still very, very full, and that can in some cases cause issues for the balance, hence my suggestion to maybe off-load some data.
You are gaining ground here, so that is good. You might get away with simply balancing it out without off-loading any data first. If you keep the data on the NAS - i.e. you don't off-load some before the balance - then I would do it in increments:
btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data
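If you would rather run that ladder unattended, a small loop along these lines should be equivalent (a sketch, assuming the stock /data mount point; it stops the ladder early if a step errors out):
# progressively more aggressive balances: only block groups below the usage threshold are relocated
for pct in 10 30 50; do
    btrfs balance start -dusage=$pct /data || break
done
# finish with an unfiltered, full balance
btrfs balance start /data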
You can check the progress of the current balance with
btrfs balance status -v /data
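While a balance is running, that status command prints something along these lines (format from memory, so it may vary between btrfs-progs versions), and balance start itself prints the "Done, had to relocate N out of M chunks" summary when it finishes:
Balance on '/data' is running
12 out of about 240 chunks balanced (14 considered), 95% left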
Some good reading with regards to BTRFS balancing:
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance