
Cannot copy files to RN316 although I have 22TB free...

Platypus69
Luminary

Cannot copy files to RN316 although I have 22TB free...

Hi all. I have the following RN316:

  • Firmware 6.10.4
  • Running 6 x 10TB IronWolf HDDs
  • X-RAID
  • 21.9TB free / 23.4TB used
  • History:
    • Last year I replaced all the 8TB IronWolf HDDs (from memory) one by one over a 6-8 month period.
    • The RN316 has predominantly been used for family photos and videos, so lots of small files
    • I have enabled Intelligent Snapshots

Last week I upgraded to 6.10.4 from the previous version (I like to keep things up to date.)

 

I then created a new SMB TEMP share with Bit Rot Protection, no snapshots, no quotas.

I have been copying files averaging 20GB to this share with no problem. Currently there is 22TB free.

 

And then I started having this problem:

  1. I try to copy a large file to the share. Windows 10's copy dialog box comes up with 0% complete and Time Remaining: Calculating
  2. Nothing happens for a long time. There is no progress on the green file copy graph.
  3. After a minute or so I get an error message: "The destination already has a file named ..."
  4. I click on the skip this file option.
  5. A zero-size file is created.

 

So the file is not copied and a zero-sized file is created instead!

 

I ran a balance, which took seconds.

 

I am currently running a scrub; it will take days to complete.

 

I can replicate this on other shares today, so it is not share-specific. Not a quota thing???

I also tried copying a 968KB PDF file today in an attempt to isolate the problem. It worked, but took ages, not the 1-2 seconds that I would expect.

 

So is this another BTRFS / metadata / copy-on-write issue? I have no idea...

 

How do I solve this?

 

So here is the snippet from BTRFS.LOG that I assume is of interest:

Label: 'root'  uuid: ...
	Total devices 1 FS bytes used 1.47GiB
	devid    1 size 4.00GiB used 3.24GiB path /dev/md0
Label: 'data'  uuid: ...
	Total devices 2 FS bytes used 23.43TiB
	devid    1 size 18.17TiB used 18.17TiB path /dev/md127
	devid    2 size 27.28TiB used 5.29TiB path /dev/md126
=== filesystem /data ===
Data, single: total=23.43TiB, used=23.42TiB
System, RAID1: total=32.00MiB, used=2.99MiB
Metadata, RAID1: total=5.85GiB, used=5.84GiB
Metadata, DUP: total=10.50GiB, used=10.01GiB
GlobalReserve, single: total=512.00MiB, used=33.05MiB
=== subvolume /data ===

 

This has confused me; the numbers don't look right.

Did the X-RAID not expand correctly???

 

Does this snippet from the VOLUME.LOG help?

...

Pool data
Device: /dev/md126
Node: 9:127
HostID: 43f5fa04 (native)
UUID: 02de77f1-efa5-4411-a1c8-b1ee66da855a
Mount point: /data
Size: 48806686264KB (46545 GB)
Available: 23630170408KB (22535 GB)
Snapshot: 0KB (0 MB)
Data allocation: 23990 GB
Data used: 24553465 MB (23977 GB)
Metadata allocation: 16776 MB
Metadata used: 16231 MB
Unallocated: 22523 GB
RAID Level: 5
State: redundant
Action: scrubbing for errors (auto-repair)
Sync progress: 17.7% (3455025152/19523436425)
Sync speed: 30945 KB/sec
Time to completion: 4327.1 minutes
Flags: 0x4C8
 Type: btrfs
Tier Flag: Disable
RAIDs:
md127
Size: 39021667840
Level: 5
Action: scrubbing for errors (auto-repair)
Members: 6
Type: 3
Flags: 0x408
Tier: 0
Data allocation: 18580 GB
Metadata allocation: 27528 MB
Unallocated: 0 GB
md126
Size: 58592892160
Level: 5
Action: scrubbing for errors (auto-repair)
Members: 6
Type: 3
Flags: 0x8
Tier: 0
Data allocation: 5410 GB
Metadata allocation: 6024 MB
Unallocated: 22523 GB
Disks:

...

=== df -h ===
Filesystem Size Used Avail Use% Mounted on
udev 10M 4.0K 10M 1% /dev
/dev/md0 4.0G 1.6G 2.2G 42% /
tmpfs 993M 0 993M 0% /dev/shm
tmpfs 993M 31M 963M 4% /run
tmpfs 497M 1.6M 495M 1% /run/lock
tmpfs 993M 0 993M 0% /sys/fs/cgroup
/dev/md127 46T 24T 23T 52% /data
/dev/md127 46T 24T 23T 52% /home
/dev/md127 46T 24T 23T 52% /apps
=== df -i ===
Filesystem Inodes IUsed IFree IUse% Mounted on
udev 253416 551 252865 1% /dev
/dev/md0 0 0 0 - /
tmpfs 254086 1 254085 1% /dev/shm
tmpfs 254086 849 253237 1% /run
tmpfs 254086 28 254058 1% /run/lock
tmpfs 254086 15 254071 1% /sys/fs/cgroup
/dev/md127 0 0 0 - /data
/dev/md127 0 0 0 - /home
/dev/md127 0 0 0 - /apps

Not sure what's going on with md127 versus md126, whatever they are. Mount points?

 

Happy to provide any other log data.

 

TIA

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 1 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...

Your posts were caught by the automatic spam filter - I released this one.  Not sure if the others had the same information, or if there was additional stuff.  If the latter, let me know, and I can release the others.

 


@Platypus69 wrote:

Not sure what's going on with md127 versus md126, whatever they are. Mount points?

 


Your volume has two RAID groups - the original RAID-5 group, and the second RAID-5 group that was created when you vertically expanded the array to 6x10TB.

 

The main RAID group is md127, the second group is md126.  These are concatenated together to give you a single volume.

 

Not sure what is confusing you about the numbers above; they look consistent with your overall used/free information.  One thing that is confusing is that you say you upgraded 8 TB IronWolf drives to 10 TB.  The sizes of md126 and md127 look like you might have upgraded 4 TB IronWolf drives to 10 TB instead.
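If you want a rough cross-check (back-of-envelope only: I'm assuming advertised decimal-terabyte drive sizes and ignoring partition overhead, so the estimates run a touch high), the two RAID group sizes line up with a 4TB-to-10TB upgrade:

```python
# Rough sanity check: do the md127/md126 sizes match a 4TB -> 10TB upgrade?
# Assumes drives are the advertised decimal size (1 TB = 10^12 bytes) and
# ignores partition/metadata overhead, so figures run slightly high.
TIB = 2**40

# md127: original RAID-5 over six 4TB partitions -> 5 data members, 1 parity
md127_tib = 5 * 4e12 / TIB
# md126: second RAID-5 group over the extra 6TB per disk -> also 5 data members
md126_tib = 5 * 6e12 / TIB

print(f"md127 ~ {md127_tib:.2f} TiB (log shows 18.17 TiB)")
print(f"md126 ~ {md126_tib:.2f} TiB (log shows 27.28 TiB)")
```

Five data members in each case because RAID-5 over 6 disks spends one disk's worth of space on parity.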

 

Have you looked for errors in system.log and kernel.log?

 

As an aside - though you do have plenty of free space, you might want to reconsider use of the "smart" snapshots.  The problem with them is that the oldest are retained forever.  Setting up custom snapshots with fixed retention is IMO a much better approach.  Something to consider when you resolve your problem here.

Message 2 of 27
Sandshark
Sensei

Re: Cannot copy files to RN316 although I have 22TB free...

Try turning off Strict Sync for the share.  It's in the share's Network Access/Advanced settings tab.

Message 3 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks, but that did not fix the problem.

Message 4 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks.

I am familiar with Microsoft stack and NTFS/ReFS, and VSAN.

I know nothing about BTRFS, so I assumed that it would expand the existing volume, not create a second one. Thus my confusion through ignorance.

Yes, you are correct, they were 4TB HDDs; it was probably over 5 years ago...

 

Thanks for the heads up on the snapshots. I expect it seemed a good idea 5 years ago when I set it up, as the NAS was designed to store family photos/videos.

 

So how do I turn off snapshots? Is it simply a matter of going into the snapshot GUI and turning off "Smart" management? I am nervous about doing something wrong and losing data 🙂

 

What would I be looking for in kernel.log or system.log? I tried to attach them as a ZIP file but it seems you can't...

 

I had a quick look and nothing made sense to me or stood out.

 

Although I am wondering if it is snapshot related. I am getting lots of errors like the following in system.log:

Feb 21 22:56:03 RN316 snapperd[545]: loading 12848 failed
Feb 21 22:56:03 RN316 snapperd[545]: loading 12850 failed
Feb 21 22:56:03 RN316 snapperd[545]: loading 1036 failed
Feb 21 22:56:03 RN316 snapperd[545]: loading 1037 failed

A quick Google search seemed to imply that snapperd is snapshot-related???

 

Any help appreciated.

 

Sorry for delay, RN316 has been unresponsive for 3 days due to Scrub operation 😞

 

Actually I just randomly found this in VOLUME.LOG:

data        disk test  2020-09-01 01:00:01  2020-09-01 15:20:27  pass                                                                       
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed                                                                  
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29            

Is that relevant???

Should I try moving some large files off and doing Balance again?

 

TIA

 

Message 5 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

 

So how do I turn off snapshots. Is it simply a matter of going into the snapshot GUI and say turn off "Smart" management? I am nervous about doing something wrong and losing data 🙂

You can either turn them off altogether in the GUI or you can change to Custom and explicitly set retention.  Just turning them off won't delete existing snapshots - you need to go into "recover", select them, and delete them manually.

snapshot-01.png

 

snapshot-02.png


@Platypus69 wrote:

I assume that it would expand existing volume not create a second.

It did expand the existing volume.  You are confusing "volumes" with "RAID groups".  They aren't the same thing.  A ReadyNAS volume consists of one or more RAID groups.

 


@Platypus69 wrote:

 

What would I be looking for in kernel.log or system.log? I tried to attach them as a ZIP file but it seems you can't...

 

Actually I just randomly found this in VOLUME.LOG:

data        disk test  2020-09-01 01:00:01  2020-09-01 15:20:27  pass                                                                       
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed                                                                  
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29            

Is that relevant???

Should I try moving some large files off and doing Balance again?

 


You can't attach zips, and you should be cautious about including links to the full logs.  There is some privacy leakage.

 

Generally you are looking for errors that include disks or "btrfs".

 

Your error is certainly relevant, and I would suggest looking for errors in system and kernel.log around the time of the error in volume.log.  Deleting some large files and trying to balance again is a reasonable next step.

Message 6 of 27
Sandshark
Sensei

Re: Cannot copy files to RN316 although I have 22TB free...


@StephenB wrote:

  Deleting some large files and trying to balance again is a reasonable next step.

And perhaps some of the oldest (and thus likely largest) snapshots as well before the balance.  The first balance will normally take a while, so that error explains why it only took seconds.  That the ReadyNAS didn't tell you it ended in error is, IMHO, an oversight in the ReadyNAS OS.

 

I've seen some posts on general Linux forums recommending a balance after a BTRFS expansion, and it does not appear Netgear does that automatically.  Apparently, that helps properly allocate data and metadata across the volume.  The man page says "The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles".  I've not found a good list of those constraints, but you may have run into one of them.  So having added to and then expanded one of the "devices" (which is a multi-partition MDADM RAID) in your file system, a balance should help.  Of course, it is sometimes hard to pick out what in the general BTRFS forums and wiki applies to ReadyNAS, because ReadyNAS uses BTRFS on top of MDADM RAID rather than BTRFS for both file system and RAID, and the general discussions typically assume you are using BTRFS for both.

Message 7 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Sandshark wrote:

I've seen some posts on general Linux forums recommending a balance after a BTRFS expansion, and it does not appear Netgear does that automatically.  Apparently, that helps properly allocate data and metadata across the volume.  The man page says "The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles".  I've not found a good list of those constraints, but you may have run into one of them.

I'm wondering that also.  Looking at the first post, I see this.

Label: 'data'  uuid: ...
	Total devices 2 FS bytes used 23.43TiB
	devid    1 size 18.17TiB used 18.17TiB path /dev/md127
	devid    2 size 27.28TiB used 5.29TiB path /dev/md126

Note all the unallocated space is on md126.  md127 is completely full.

Message 8 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Hi all....

 

So I have not been able to resolve the issue yet, I believe.


BTW is 6.10.4 (Hotfix 1) buggy??? Should I be downgrading to 6.10.3?

 

I've been offline as the RN316 was horrendously unresponsive during the scrub, and even after it failed or completed after 7 days it still feels sluggish to me...

 

Anyway... I have moved about 500GB off the RN316. But is that enough? I have moved most of the files I copied across this year, but also some older stuff from late last year, probably from AFTER I replaced all the 4TB HDDs with 10TB HDDs. So I am again wondering if everything is going to the old md127 and not the new md126???

 

So right now here are my stats / logs / telemetry:

 

BTRFS.LOG

Label: 'blah:root'  uuid: blah-blah
	Total devices 1 FS bytes used 1.43GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: 'blah:data'  uuid: blah-blah
	Total devices 2 FS bytes used 22.95TiB
	devid    1 size 18.17TiB used 18.14TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126

=== filesystem /data ===
Data, single: total=22.95TiB, used=22.93TiB
System, RAID1: total=32.00MiB, used=2.95MiB
Metadata, RAID1: total=5.85GiB, used=5.32GiB
Metadata, DUP: total=10.50GiB, used=10.03GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
=== subvolume /data ===

Why is Data, single: showing total=22.95TiB, which suspiciously seems to be the limit of the amount of data I can store? Recall the UI is showing data 22.96TB and Free space: 22.49TB. Is this the "smoking gun"???

 

VOLUME.LOG

data        disk test  2020-09-01 01:00:01  2020-09-01 15:20:27  pass                                                                       
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed                                                                  
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29                                                                                                  
data        disk test  2021-03-01 08:15:53                                                                                                  
data        balance    2021-03-01 21:03:16  2021-03-01 21:04:48  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-03 15:34:37  2021-03-04 03:44:36  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 09:34:32  2021-03-05 10:29:27  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 19:39:44  2021-03-05 19:49:07  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:09:45  2021-03-05 21:27:23  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:28:15  2021-03-05 21:28:19  completed  Done, had to relocate 1 out of 23557 chunks                     
data        balance    2021-03-05 21:45:20  2021-03-05 21:46:05  completed  Done, had to relocate 29 out of 23557 chunks                    
data        balance    2021-03-05 21:57:26  2021-03-05 21:57:31  completed  Done, had to relocate 1 out of 23529 chunks                     
data        balance    2021-03-05 21:59:22  2021-03-05 21:59:27  completed  Done, had to relocate 1 out of 23529 chunks                     
data        balance    2021-03-05 21:59:48  2021-03-05 21:59:53  completed  Done, had to relocate 1 out of 23529 chunks                     
data        balance    2021-03-05 22:25:13  2021-03-05 22:25:18  completed  Done, had to relocate 1 out of 23529 chunks

Why does it keep relocating only 1 out of 23529 chunks? The chunk count does not go down. I have no idea. Do I keep doing balances?

 

Should I Defrag now?

 

I also have SMB Plus installed and have enabled Preallocate (FYI: Preallocate disk space before writing data. This can slow down write speed slightly, but should result in the file being nicely laid out on the disk, with minimal fragmentation.)

 

I have removed a lot of snapshots but would like to keep the ones that I have set up for OneDrive and DropBox apps. Both report 19 snapshots with 2 years protection.

 

I am happy to turn off the snapshots, i.e. set them to manual. But I can drop all snapshots if people think that's good. Just being a bit nervous...

 

I'm pulling my hair out... What do I do?

 

Has the problem been solved? Can you tell? Or should I try to find much older data and remove another 500GB or 1TB of older data before I try balancing again?

 

Any help appreciated!

 

For what it's worth:

 

KERNEL.LOG

Mar 05 22:11:19 RN316 systemd[1]: Set hostname to <RN316>.
Mar 05 22:11:19 RN316 systemd[1]: systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket is masked.
Mar 05 22:11:19 RN316 systemd[1]: systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket is masked.
Mar 05 22:11:19 RN316 systemd[1]: Started Forward Password Requests to Wall Directory Watch.
Mar 05 22:11:19 RN316 systemd[1]: Listening on Journal Socket (/dev/log).
Mar 05 22:11:19 RN316 systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
Mar 05 22:11:19 RN316 systemd[1]: Created slice System Slice.
Mar 05 22:11:19 RN316 systemd[1]: Created slice system-serial\x2dgetty.slice.
Mar 05 22:11:19 RN316 systemd[1]: Created slice system-getty.slice.
Mar 05 22:11:19 RN316 systemd[1]: Listening on /dev/initctl Compatibility Named Pipe.
Mar 05 22:11:19 RN316 systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Encrypted Volumes.
Mar 05 22:11:19 RN316 systemd[1]: Listening on udev Control Socket.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Paths.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Remote File Systems (Pre).
Mar 05 22:11:19 RN316 systemd[1]: Reached target Remote File Systems.
Mar 05 22:11:19 RN316 systemd[1]: Listening on udev Kernel Socket.
Mar 05 22:11:19 RN316 systemd[1]: Listening on Journal Socket.
Mar 05 22:11:19 RN316 systemd[1]: Starting Remount Root and Kernel File Systems...
Mar 05 22:11:19 RN316 systemd[1]: Mounting POSIX Message Queue File System...
Mar 05 22:11:19 RN316 systemd[1]: Starting Create Static Device Nodes in /dev...
Mar 05 22:11:19 RN316 systemd[1]: Mounting Debug File System...
Mar 05 22:11:19 RN316 systemd[1]: Created slice User and Session Slice.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Slices.
Mar 05 22:11:19 RN316 systemd[1]: Listening on Syslog Socket.
Mar 05 22:11:19 RN316 systemd[1]: Starting Journal Service...
Mar 05 22:11:19 RN316 systemd[1]: Starting Load Kernel Modules...
Mar 05 22:11:19 RN316 systemd[1]: Started ReadyNAS LCD splasher.
Mar 05 22:11:19 RN316 systemd[1]: Starting ReadyNASOS system prep...
Mar 05 22:11:19 RN316 systemd[1]: Mounted POSIX Message Queue File System.
Mar 05 22:11:19 RN316 systemd[1]: Mounted Debug File System.
Mar 05 22:11:19 RN316 systemd[1]: Started Remount Root and Kernel File Systems.
Mar 05 22:11:19 RN316 systemd[1]: Started Create Static Device Nodes in /dev.
Mar 05 22:11:19 RN316 systemd[1]: Started Load Kernel Modules.
Mar 05 22:11:19 RN316 systemd[1]: Starting Apply Kernel Variables...
Mar 05 22:11:19 RN316 systemd[1]: Mounting FUSE Control File System...
Mar 05 22:11:19 RN316 systemd[1]: Mounting Configuration File System...
Mar 05 22:11:19 RN316 systemd[1]: Starting udev Kernel Device Manager...
Mar 05 22:11:19 RN316 systemd[1]: Starting Load/Save Random Seed...
Mar 05 22:11:19 RN316 systemd[1]: Starting Rebuild Hardware Database...
Mar 05 22:11:19 RN316 systemd[1]: Mounted Configuration File System.
Mar 05 22:11:19 RN316 systemd[1]: Mounted FUSE Control File System.
Mar 05 22:11:19 RN316 systemd[1]: Started Apply Kernel Variables.
Mar 05 22:11:19 RN316 systemd[1]: Started ReadyNASOS system prep.
Mar 05 22:11:19 RN316 systemd[1]: Started Load/Save Random Seed.
Mar 05 22:11:19 RN316 systemd[1]: Started udev Kernel Device Manager.
Mar 05 22:11:19 RN316 systemd[1]: Started Journal Service.
Mar 05 22:11:19 RN316 kernel: md: md127 stopped.
Mar 05 22:11:19 RN316 kernel: md: bind<sdb3>
Mar 05 22:11:19 RN316 kernel: md: bind<sdc3>
Mar 05 22:11:19 RN316 kernel: md: bind<sdd3>
Mar 05 22:11:19 RN316 kernel: md: bind<sde3>
Mar 05 22:11:19 RN316 kernel: md: bind<sdf3>
Mar 05 22:11:19 RN316 kernel: md: bind<sda3>
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sda3 operational as raid disk 0
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdf3 operational as raid disk 5
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sde3 operational as raid disk 4
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdd3 operational as raid disk 3
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdc3 operational as raid disk 2
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdb3 operational as raid disk 1
Mar 05 22:11:19 RN316 kernel: md/raid:md127: allocated 6474kB
Mar 05 22:11:19 RN316 kernel: md/raid:md127: raid level 5 active with 6 out of 6 devices, algorithm 2
Mar 05 22:11:19 RN316 kernel: RAID conf printout:
Mar 05 22:11:19 RN316 kernel:  --- level:5 rd:6 wd:6
Mar 05 22:11:19 RN316 kernel:  disk 0, o:1, dev:sda3
Mar 05 22:11:19 RN316 kernel:  disk 1, o:1, dev:sdb3
Mar 05 22:11:19 RN316 kernel:  disk 2, o:1, dev:sdc3
Mar 05 22:11:19 RN316 kernel:  disk 3, o:1, dev:sdd3
Mar 05 22:11:19 RN316 kernel:  disk 4, o:1, dev:sde3
Mar 05 22:11:19 RN316 kernel:  disk 5, o:1, dev:sdf3
Mar 05 22:11:19 RN316 kernel: created bitmap (30 pages) for device md127
Mar 05 22:11:19 RN316 kernel: md127: bitmap initialized from disk: read 2 pages, set 0 of 59543 bits
Mar 05 22:11:19 RN316 kernel: md127: detected capacity change from 0 to 19979093934080
Mar 05 22:11:19 RN316 kernel: Adding 1566716k swap on /dev/md1.  Priority:-1 extents:1 across:1566716k 
Mar 05 22:11:20 RN316 kernel: BTRFS: device label 43f5fa04:data devid 1 transid 1895561 /dev/md127
Mar 05 22:11:20 RN316 kernel: md: md126 stopped.
Mar 05 22:11:20 RN316 kernel: md: bind<sdb4>
Mar 05 22:11:20 RN316 kernel: md: bind<sdc4>
Mar 05 22:11:20 RN316 kernel: md: bind<sdd4>
Mar 05 22:11:20 RN316 kernel: md: bind<sde4>
Mar 05 22:11:20 RN316 kernel: md: bind<sdf4>
Mar 05 22:11:20 RN316 kernel: md: bind<sda4>
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sda4 operational as raid disk 0
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdf4 operational as raid disk 5
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sde4 operational as raid disk 4
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdd4 operational as raid disk 3
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdc4 operational as raid disk 2
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdb4 operational as raid disk 1
Mar 05 22:11:20 RN316 kernel: md/raid:md126: allocated 6474kB
Mar 05 22:11:20 RN316 kernel: md/raid:md126: raid level 5 active with 6 out of 6 devices, algorithm 2
Mar 05 22:11:20 RN316 kernel: RAID conf printout:
Mar 05 22:11:20 RN316 kernel:  --- level:5 rd:6 wd:6
Mar 05 22:11:20 RN316 kernel:  disk 0, o:1, dev:sda4
Mar 05 22:11:20 RN316 kernel:  disk 1, o:1, dev:sdb4
Mar 05 22:11:20 RN316 kernel:  disk 2, o:1, dev:sdc4
Mar 05 22:11:20 RN316 kernel:  disk 3, o:1, dev:sdd4
Mar 05 22:11:20 RN316 kernel:  disk 4, o:1, dev:sde4
Mar 05 22:11:20 RN316 kernel:  disk 5, o:1, dev:sdf4
Mar 05 22:11:20 RN316 kernel: md126: detected capacity change from 0 to 29999560785920
Mar 05 22:11:20 RN316 kernel: BTRFS: device label 43f5fa04:data devid 2 transid 1895561 /dev/md126
Mar 05 22:13:08 RN316 kernel: e1000e: eth1 NIC Link is Down
Mar 05 22:13:09 RN316 kernel: IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 05 22:13:09 RN316 kernel: 8021q: adding VLAN 0 to HW filter on device eth1
Mar 05 22:13:12 RN316 kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Mar 05 22:13:12 RN316 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Mar 05 22:13:15 RN316 kernel: Adjusting tsc more than 11% (6835455 vs 8751273)
Mar 05 22:16:07 RN316 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
Mar 05 22:25:11 RN316 kernel: BTRFS info (device md126): relocating block group 36377627721728 flags system|raid1

 

Not sure if these snapperd errors are relevant:

 

SYSTEM.LOG

Mar 05 22:21:54 RN316 dbus[2986]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
Mar 05 22:21:54 RN316 dbus[2986]: [system] Successfully activated service 'org.opensuse.Snapper'
Mar 05 22:21:54 RN316 snapperd[6838]: loading 13409 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19029 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19504 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19543 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19557 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19608 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19614 failed
...
Mar 05 22:25:25 RN316 clamd[4134]: SelfCheck: Database status OK.
Mar 05 22:25:32 RN316 snapperd[6838]: loading 13409 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 19029 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 19504 failed
...
Mar 05 22:25:32 RN316 snapperd[6838]: loading 12924 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 12925 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 1036 failed

 

 

 

 

 

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 9 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

 

BTRFS.LOG

Label: 'blah:root'  uuid: blah-blah
	Total devices 1 FS bytes used 1.43GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: 'blah:data'  uuid: blah-blah
	Total devices 2 FS bytes used 22.95TiB
	devid    1 size 18.17TiB used 18.14TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126

=== filesystem /data ===
Data, single: total=22.95TiB, used=22.93TiB
System, RAID1: total=32.00MiB, used=2.95MiB
Metadata, RAID1: total=5.85GiB, used=5.32GiB
Metadata, DUP: total=10.50GiB, used=10.03GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
=== subvolume /data ===

Why is Data, single: showing total=22.95TiB, which suspiciously seems to be the limit of the amount of data I can store? Recall the UI is showing data 22.96TB and Free space: 22.49TB. Is this the "smoking gun"???

 

 


The "total=22.95TiB" doesn't mean what you think it means.  It is the total of the allocated space (and is essentially the same as the 22.96 TB you are seeing in the UI).  The two sizes further up are the size of your storage: md127 has size 18.17TiB, md126 has size 27.28TiB.  The total size is therefore 45.45TiB, which is the correct size for 6x10TB single-redundancy RAID.
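Putting rough numbers on it (a hypothetical back-of-envelope: drives are sold in decimal terabytes while the log reports binary TiB, so expect small discrepancies from partition overhead):

```python
# The two devid sizes add up to the full volume, and that matches what
# six 10TB drives should give under single-redundancy RAID (5 data members,
# one disk's worth of parity). Assumes advertised decimal terabytes.
TIB = 2**40

volume_tib = 18.17 + 27.28         # md127 + md126 sizes from btrfs fi show
expected_tib = 5 * 10e12 / TIB     # 6x10TB RAID-5 usable capacity

print(f"volume   = {volume_tib:.2f} TiB")
print(f"expected ~ {expected_tib:.2f} TiB")
```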

 

The problem here is that for some reason your system has completely filled md127.  You can see that by subtracting the "18.14 used" from the "18.17 size" for md127.  

 

There is a brute-force solution - which is to do a factory default, set up the NAS again, and restore your data from backup.  That would give you a single RAID group, and you'd have plenty of free space.  In addition to being time-consuming, you would lose all your snapshots.  Though painful, if it were my own system I'd do the reset and start over.

 

The other option I see is to delete all your existing snapshots, and see if that frees up space on md127.  You'd wait for a while after deletion - then download the log zip again, and look at the "used" space for that RAID group.  Hopefully it will drop substantially.
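If you want to track that without eyeballing the log each time, a small throwaway parser works. This is just an illustrative sketch (the `device_usage` helper is my own invention, matching the devid lines quoted earlier in the thread):

```python
import re

# Pull per-device size/used figures out of a BTRFS.LOG-style snippet so you
# can watch md127's "used" figure drop after deleting snapshots. The regex
# matches the "devid N size X used Y path /dev/Z" lines in the log.
DEVID_RE = re.compile(
    r"devid\s+\d+\s+size\s+([\d.]+)TiB\s+used\s+([\d.]+)TiB\s+path\s+(\S+)"
)

def device_usage(log_text):
    """Return {device_path: (size_tib, used_tib, unallocated_tib)}."""
    usage = {}
    for size, used, path in DEVID_RE.findall(log_text):
        size, used = float(size), float(used)
        usage[path] = (size, used, round(size - used, 2))
    return usage

sample = """
	devid    1 size 18.17TiB used 18.14TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126
"""
print(device_usage(sample))
# /dev/md127 shows only 0.03 TiB unallocated in this sample
```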

Message 10 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Firstly, thanks a million as always.

 

Of course I don't know, but I would be surprised if snapshots are the root cause....

 

I have only set up snapshots for my OneDrive and Dropbox shares, which represent a fraction of the photos and movies stored on the RN316.

 

Any other snapshots, which likewise were not large, are now gone. I only use the free versions of these services, which are limited in size: Dropbox = 16GB, OneDrive I can't remember, but probably around 16GB as well. So I thought I would use the snapshot feature of the RN316 for these shares, as the free tiers of OneDrive and Dropbox do not have this functionality.

 

Do you really think it will make a difference if I remove these underlying snapshots? They are small, no? But perhaps they take up a lot of metadata??? I don't know...

 

OneDrive share UI says:

  • 7149 files, 98 folders, 13GB
  • 20 snapshots (2year(s) protection)

DropBox share UI says:

  • 15365 files, 571 folders, 13.9GB
  • 19 snapshots (2year(s) protection)

I too have concluded that I will at some point, as soon as I can, buy 8 x 16TB HDDs for my new DS1819+, and do as you suggest: move all the data off the RN316, reformat it, and move the data back. But I cannot afford the 8 x 16TB HDDs right now, in one hit.

 

So the frustrating thing is I have run out of space on all my ReadyNASes. I have this 20TB free but I cannot use it!!!! ArggghHh.... 🙂

 

So would you suggest an action plan of trying to remove 1TB of old data from md127, then doing a balance, then a defrag, then another balance, and then trying to copy the data back?

 

Of course I am very curious as to what the problem is and how to avoid it in the future. It sounds to me that a strategy of going from (6 x 4TB HDDs) to (6 x 10TB HDDs) to (6 x 16TB HDDs) in the future is not viable for these BTRFS-based RAID NASes.

 

Unless of course I should have been running monthly balances/defrags, which I never did. Netgear never recommended it. I had assumed (incorrectly, it seems) that you never needed to run these operations, as I predominantly only add family photos and videos.

 

So I want to learn the lesson here, but am struggling to learn what I did wrong and how to avoid this in the future, other than your "brute force" technique.

 

So I was planning to fill out my new DS1819+:

  • Buy 1 x 16TB HDD in the first month (yes, I know there is no RAID)
  • Add 1 x 16TB HDD every month after that, staggering the HDDs' lifetimes (reducing the chance of them all failing simultaneously) and also staggering the cost

But given all the dramas I am having with BTRFS, I am wondering whether this is a horrendous idea, and whether I would be better off buying 8 x 16TB HDDs and setting up one massive pool. So take the hit on the wallet! 😞

 

Or can I get away with buying 4 x 16TB HDDs and setting up one pool this year, and in 12-24 months buying another 4 x 16TB HDDs and setting up a second pool?

 

I am beginning to suspect that buying the 8 x 16TB HDDs in one hit is the best way to go... Ouch!

 

 

 

 

 

 

 

Message 11 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...

@Platypus69 wrote:

So would you suggest an action plan of trying to remove 1TB of old data from md127, then doing a balance, then a defrag, then another balance, and then trying to copy the data back?

 


Well, we can't see what is actually on md127 (as opposed to md126).  But you could try copying off some older shares and then deleting them.  After the space is reclaimed (hopefully from md127), you can try a balance (which should succeed if there's enough space on md127).  A scrub might also reallocate some space.  After that, you could recreate the shares and copy the data back.

 

A defrag won't help - and it can reduce free space in the shares that have snapshots enabled.

 


@Platypus69 wrote:

 

Unless of course I should have been running monthly balances/defrags, which I never did. Netgear never recommended it. I had assumed (incorrectly, it seems) that you never needed to run these operations, as I predominantly only add family photos and videos.

 

So I want to learn the lesson here, but am struggling to learn what I did wrong and how to avoid this in the future

Netgear doesn't offer any guidance on volume maintenance.  My current practice is to schedule each of the four tasks (scrub, disk test, balance, and defrag).  I cycle through them, one each month, so over a year each runs 3 times.  Defrag probably isn't necessary - but I have enough free space to avoid the downside, so I just run it anyway.

 

Opinions here differ on balance - @mdgm for instance only runs it rarely (if at all).  But I have seen posts here where it has reclaimed unallocated space.  In general, if a balance isn't needed then it runs very quickly and I've never had any problems running them.  So I continue to run them on this schedule.

 

I don't know how your system ended up this way. FWIW I also have multiple RAID groups on my NAS.

Label: '2fe72582:data'  uuid: a665beff-2a06-4b88-b538-f9fa4fb2dfef
	Total devices 2 FS bytes used 13.54TiB
	devid    1 size 16.36TiB used 12.72TiB path /dev/md127
	devid    2 size 10.91TiB used 1.27TiB path /dev/md126

Unallocated space isn't evenly split across the two RAID groups, but fortunately I do have reasonable space on the original md127 RAID group.

 

It seems to me that btrfs balance should handle this better - not sure if there are options that would spread the unallocated space more evenly.  I'll try to research it if I can find the time.

Message 12 of 27
Sandshark
Sensei

Re: Cannot copy files to RN316 although I have 22TB free...

In general Linux/BTRFS forums, it is recommended that you run a balance after adding a new element to a BTRFS volume.  I have seen no evidence that Netgear expansion does that when it vertically expands, which is adding a new element (new md RAID).  Without the balance, the data and metadata are not spread across the whole volume, which can result in problems.  From the BTRFS balance man page:  The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles.

 

I don't know what those "constraints" are, but I asked about the balance because you may be running into one of them.

 

FYI, I run a balance monthly.  That's probably more often than necessary, but it's a fast process if you do it often and it happens on a schedule while I sleep, so why not?

Message 13 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Sandshark wrote:

 The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles.

 


Yes, and the fact that @Platypus69 has never run one is part of the puzzle. But his balance is failing now, even though there is plenty of unallocated space in md126.  So the "primary purpose" isn't being achieved.

 

FWIW, I decided to run a balance with no parameters ( btrfs balance start /data ) just to see if that moves any chunks from md127 (about 75% allocated) to md126 (about 13% allocated).  It'll take a while, but I will report back when it completes.

Message 14 of 27
rn_enthusiast
Virtuoso

Re: Cannot copy files to RN316 although I have 22TB free...

When expanding, the NAS creates a second raid (md126), as you guys know. BTRFS then puts md127 and md126 into a JBOD kind of raid, using the filesystem itself to do so. Basically, mdadm creates two raids (devices, essentially) and BTRFS sticks them together in a JBOD using the raid capability of the filesystem itself. That is fine and not an issue, BUT if you look at the metadata, it is set to raid1. BTRFS can do clever things like hold different raid levels for data and metadata.

But take this situation... md127 is totally full, and its metadata is totally full as well. In order to write new data, new metadata MUST be duplicated between md126 and md127 as specified in the BTRFS raid profile. But since the NAS cannot write any more metadata to md127, it can't write at all, because it is supposed to write the metadata in a raid1 fashion between the two devices.

Label: 'data' uuid: ...
Total devices 2 FS bytes used 23.43TiB
devid 1 size 18.17TiB used 18.17TiB path /dev/md127
devid 2 size 27.28TiB used 5.29TiB path /dev/md126

=== filesystem /data ===
Data, single: total=23.43TiB, used=23.42TiB
System, RAID1: total=32.00MiB, used=2.99MiB
Metadata, RAID1: total=5.85GiB, used=5.84GiB
Metadata, DUP: total=10.50GiB, used=10.01GiB
GlobalReserve, single: total=512.00MiB, used=33.05MiB
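To make the deadlock concrete, here is a small back-of-the-envelope sketch in Python. The device sizes are the ones reported above; the "a raid1 chunk needs free space on two different devices" rule is the constraint being described, and the variable names are mine:

```python
# Sketch of the raid1 metadata constraint using the device sizes above.
# BTRFS raid1 must place the two copies of a new metadata chunk on two
# *different* devices, so a device with no unallocated space blocks writes.
devices = {
    "md127": {"size_tib": 18.17, "allocated_tib": 18.17},
    "md126": {"size_tib": 27.28, "allocated_tib": 5.29},
}

unallocated = {name: d["size_tib"] - d["allocated_tib"] for name, d in devices.items()}
for name, free in unallocated.items():
    print(f"{name}: {free:.2f} TiB unallocated")   # md127: 0.00, md126: 21.99

# raid1 needs at least 2 devices that can each hold one copy of the new chunk:
devices_with_space = [n for n, free in unallocated.items() if free > 0]
can_allocate_raid1_chunk = len(devices_with_space) >= 2
print(can_allocate_raid1_chunk)   # False: md127 is full, so writes stall
```

So despite ~22 TiB of unallocated space on md126 alone, no raid1 metadata chunk can be allocated, which matches the "no space left" behaviour.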


We need to balance some of the data across to md126. Running a balance from the GUI isn't really a full balance: the GUI passes parameters to the balance so that it only balances parts of the volume. My suggestion would be to:
1. Take some data off the NAS temporarily. A few TBs.
2. Run a full balance from the CLI, like @StephenB mentioned.
3. Move the data back (but first, post the post-balance volume stats, like the ones above, to the thread).

 

Cheers

Message 15 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks. What everyone here has said makes sense, but it's the action plan which is confusing.

 

So am I winning here?

 

You can see that things "have changed":

 

Label: 'blah:root'  uuid: *
	Total devices 1 FS bytes used 1.46GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: 'blah:data'  uuid: *
	Total devices 2 FS bytes used 13.24TiB
	devid    1 size 18.17TiB used 18.09TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126

=== filesystem /data ===
Data, single: total=22.90TiB, used=13.23TiB
System, RAID1: total=32.00MiB, used=2.95MiB
Metadata, RAID1: total=6.85GiB, used=5.81GiB
Metadata, DUP: total=10.50GiB, used=9.54GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Is that showing that we are starting to expand Metadata, RAID1, as it has grown from 5.85GiB to 6.85GiB?

 

Is that sufficient? Will it now be able to grow from now on as required?

 

Has the problem been solved, or should I continue to move more stuff off and re-balance?

 

Here is the history from volume.log (apologies for the length; I have removed all the disk tests). As you can see, I was incorrect in saying that I have never run a Balance before; apologies.

 

FYI: I replaced the 4TB HDDs with 10TB HDDs in this time line:

  1. 04/12/2018
  2. 06/12/2018
  3. 29/07/2019
  4. 15/12/2019
  5. 21/03/2020
  6. 21/06/2020

So it would seem no Balance was done from when the last 4TB HDD was swapped out until I encountered this problem in 2021.

 

=== maintenance history ===
device      operation  start_time           end_time             result     details                                                         
----------  ---------  -------------------  -------------------  ---------  ----------------------------------------------------------------
data        balance    2016-06-15 00:00:01  2016-06-15 00:00:33  completed                                                                                                                                 
data        scrub      2016-07-01 00:00:01                                                                                                  
data        balance    2016-07-15 00:00:01  2016-07-15 00:02:20  completed                                                                  
data        balance    2016-08-15 00:00:19  2016-08-15 00:00:33  fail                                                                       
data        scrub      2016-09-01 00:00:01
data        balance    2016-09-15 00:00:02  2016-09-15 00:00:38  completed
data        scrub      2016-10-01 00:00:02  2016-10-02 04:38:32  pass
data        balance    2016-10-15 00:00:01  2016-10-15 01:51:49  completed                                                                  
data        scrub      2016-11-01 00:00:01  2016-11-02 17:35:19  pass
data        balance    2016-11-15 00:00:01  2016-11-15 00:01:55  completed
data        scrub      2016-12-01 00:00:01  2016-12-02 21:24:24  pass
data        balance    2016-12-15 00:00:01  2016-12-15 00:17:38  completed
data        scrub      2017-01-01 00:00:02  2017-01-03 02:19:12  pass                                                                       
data        balance    2017-01-15 00:00:01  2017-01-15 00:09:23  completed                                                                  
data        scrub      2017-03-01 00:00:03
data        balance    2017-03-15 00:00:01  2017-03-15 06:26:22  completed
data        scrub      2017-04-01 00:00:01                                                                                                  
data        balance    2017-04-15 00:00:01  2017-04-15 00:00:31  completed                                                                  
data        scrub      2017-05-01 00:00:01  2017-05-01 22:03:16  abort                                                                      
data        scrub      2017-05-01 22:31:25  2017-05-04 17:39:57  pass                                                                       
data        balance    2017-05-15 00:00:03  2017-05-15 00:03:54  completed  Done, had to relocate 4 out of 13438 chunks                     
data        scrub      2017-06-02 00:00:01                                                                                                  
data        balance    2017-06-15 00:00:02  2017-06-15 00:17:54  completed  Done, had to relocate 40 out of 14672 chunks                    
data        scrub      2017-07-02 00:00:01
data        balance    2017-07-15 00:00:01  2017-07-15 00:00:19  completed  Done, had to relocate 2 out of 14758 chunks
data        scrub      2017-08-02 00:00:01
data        balance    2017-08-15 00:00:01  2017-08-15 02:00:33  completed  balance canceled by user
data        scrub      2017-09-02 00:00:01
data        balance    2017-09-15 00:00:01  2017-09-15 02:05:02  completed  balance canceled by user
data        scrub      2017-10-02 00:00:01
data        balance    2017-10-15 00:00:01  2017-10-15 02:03:23  completed  balance canceled by user
data        scrub      2018-01-02 00:00:01
data        balance    2018-01-15 00:00:01  2018-01-15 00:35:45  completed  Done, had to relocate 53 out of 16330 chunks
data        scrub      2018-02-02 00:00:01
data        resilver   2018-02-17 21:47:57  2018-02-18 20:11:21  completed
data        scrub      2018-03-02 00:00:01
data        balance    2018-03-15 00:00:02  2018-03-15 00:13:32  completed  Done, had to relocate 14 out of 16390 chunks
data        scrub      2018-04-02 00:00:01
data        balance    2018-04-15 00:00:01  2018-04-15 00:05:54  completed  Done, had to relocate 15 out of 16391 chunks
data        scrub      2018-05-02 00:00:01
data        scrub      2018-06-02 00:00:01  2018-06-03 17:21:01  pass
data        balance    2018-06-15 00:00:02  2018-06-15 01:43:40  completed  Done, had to relocate 67 out of 18585 chunks
data        balance    2018-07-15 00:00:01  2018-07-15 00:00:09  completed  Done, had to relocate 0 out of 18607 chunks
data        scrub      2018-08-02 00:00:01
data        balance    2018-08-15 00:00:01  2018-08-15 00:00:08  completed  Done, had to relocate 0 out of 18607 chunks ERROR: error during
data        resilver   2018-12-04 00:04:00
data        resilver   2018-12-04 00:50:30  2018-12-04 17:53:15  completed
data        scrub      2018-12-04 18:21:51  2018-12-06 09:40:35  pass
data        resilver   2018-12-06 23:59:57
data        resilver   2018-12-07 01:06:25  2018-12-07 16:58:15  completed
data        resilver   2018-12-07 16:58:27  2018-12-08 10:49:21  completed
data        scrub      2018-12-08 14:54:34  2018-12-11 02:49:34  pass
data        balance    2018-12-15 00:00:01  2018-12-15 00:03:26  completed  Done, had to relocate 0 out of 18625 chunks
data        scrub      2019-01-07 03:08:33  2019-01-11 08:28:03  pass
data        balance    2019-01-15 00:00:01  2019-01-15 00:12:40  completed  Done, had to relocate 0 out of 19285 chunks ERROR: error during
data        balance    2019-02-15 00:00:01  2019-02-15 00:00:30  completed  Done, had to relocate 1 out of 20138 chunks
data        scrub      2019-03-01 01:00:01
data        resilver   2019-07-25 12:20:40  2019-07-26 05:53:47  completed
data        resilver   2019-07-29 20:02:39
data        resilver   2019-07-30 20:36:36  2019-07-31 15:17:52  completed
data        scrub      2019-10-01 01:00:01  2019-10-06 05:37:11  pass
data        scrub      2019-12-01 01:00:01  2019-12-06 10:58:28  pass
data        resilver   2019-12-14 23:56:42  2019-12-14 23:57:32  degraded
data        resilver   2019-12-15 00:09:16  2019-12-17 01:53:29  completed
data        scrub      2019-12-17 11:26:26  2019-12-23 05:14:28  pass
data        scrub      2020-02-01 01:00:01  2020-02-07 07:45:05  pass
data        resilver   2020-03-21 23:56:06  2020-03-22 17:01:34  completed
data        resilver   2020-03-22 17:02:31  2020-03-24 09:54:07  completed
data        scrub      2020-04-01 01:00:01  2020-04-07 10:46:32  pass
data        scrub      2020-06-01 01:00:02
data        scrub      2020-06-10 19:44:01  2020-06-17 15:59:33  pass
data        resilver   2020-06-21 23:16:29  2020-06-24 14:16:38  completed
data        scrub      2020-08-01 01:00:01
data        resilver   2020-08-08 15:10:50
data        resilver   2020-08-08 15:45:05  2020-08-09 15:42:35  completed
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29
data        balance    2021-03-01 21:03:16  2021-03-01 21:04:48  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-03 15:34:37  2021-03-04 03:44:36  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 09:34:32  2021-03-05 10:29:27  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 19:39:44  2021-03-05 19:49:07  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:09:45  2021-03-05 21:27:23  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:28:15  2021-03-05 21:28:19  completed  Done, had to relocate 1 out of 23557 chunks
data        balance    2021-03-05 21:45:20  2021-03-05 21:46:05  completed  Done, had to relocate 29 out of 23557 chunks
data        balance    2021-03-05 21:57:26  2021-03-05 21:57:31  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:22  2021-03-05 21:59:27  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:48  2021-03-05 21:59:53  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 22:25:13  2021-03-05 22:25:18  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 23:19:38  2021-03-05 23:19:44  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-06 00:54:22  2021-03-06 00:54:28  completed  Done, had to relocate 1 out of 23529 chunks
data        defrag     2021-03-06 00:54:49  2021-03-06 03:02:04  completed

 

 

Message 16 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks.

 

I will run this today.

 

Out of curiosity, is there a --verbose switch or something like that which will report what it's doing on the screen? Does the command return a summary of what was done in the SSH session?

Message 17 of 27
rn_enthusiast
Virtuoso

Re: Cannot copy files to RN316 although I have 22TB free...

The distribution between the two devices (md126 and md127) is still pretty bad.

md127 is still very, very full, and that can in some cases cause issues for the balance, hence my suggestion to maybe off-load some data.

 

You are gaining ground here, so that is good. You might get away with simply balancing it out without off-loading any data first. If you keep the data on the NAS - i.e. don't off-load some before the balance - then I would do it in increments.

 

btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data

You can check the progress of the current balance with

btrfs balance status -v /data

Some good reading with regards to BTRFS balancing:

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance 

Message 18 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Ha! Ha! Ha!


Clearly you are 1,000,000% correct in saying The "total=22.95TB" doesn't mean what you think it means. 

 

Yes, I understand that md127 + md126 should total 45.45TiB, but everything else is confusing...  Does the "Data" label in the UI denote the amount of storage space consumed, or is it just referring to the Data pool, or something else?

 

Why do I ask? Well...

 

So this is what I did yesterday:

  • Moved some files off the new share I created in Feb 2021 and an older one (May 2020).
  • Turned off the remaining 2 snapshots that I had for OneDrive and DropBox.
  • Deleted all the recent 2021 daily snapshots that had been created for these two shares.
  • I decided to keep the monthly snapshots for these shares. As discussed before there are only about 2 x 19 of them going back 2 years. (Now I am not sure how the RN316 implements snapshots. If I have only ever added files and hardly ever modified or deleted them, does the snapshot make a copy of the 16GB worth of files, or does it just maintain metadata in the file system and only "move" a file into the snapshot when it is modified or deleted, if you know what I mean?)
  • Before I went to sleep last night I kicked off a Defrag.
  • Today I kicked off a Balance through the UI.

 

Operations performed yesterday and kicked off today

 

But right now 3.5 hours later here is what the UI is showing:

 

Why has Data dropped from 23.40TB to 13.26TB???

So obviously the confusion/concern is that Data dropped from 23.40TB to 13.26TB!

 

Is this being recalculated dynamically as the Balance performs its "dark magic"? So do I have to wait for it to finish to see where these values end up, or is it accurate right now?

 

Is this expected? Obviously something has happened, and is happening. How exciting 😞

 

My concern of course is that I have "lost data", as I am pretty sure I have NOT moved off 10TB - I don't have enough free spare storage. 🙂

 

Recall that I:

  • Turned off Smart Snapshots on about 5 shares which had very little data. I never had snapshots on my main shares (Videos, Photos, Software and Backups), which probably account for 90% of my files. I only had snapshots for the OneDrive and DropBox shares, but remember they are both limited to 16GB as I am only using the free tiers.
  • Moved 500GB of the files that I copied across to the RN316 in February 2021, plus some older files from May 2020 (so from before I swapped out the last 10TB HDD).

So this clearly is well under 10TB. Not even 1TB.

 

So why the big decrease in size reported for Data?

 

Apologies if the answer is really simple and I am being dumb...

Message 19 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

 

 

Yes, I understand that md127 + md126 should total 45.45TiB, but everything else is confusing...  Does the "Data" label in the UI denote the amount of storage space consumed, or is it just referring to the Data pool, or something else?


Going back to where you started,  I'm going to slightly revise the original report, which hopefully will help provide clarity.  

Label: 'data'  uuid: ...
	Total devices 2 FS bytes used 23.43TiB
	devid    1 size 18.17TiB allocated 18.17TiB path /dev/md127
	devid    2 size 27.28TiB allocated 5.29TiB path /dev/md126
=== filesystem /data ===
Data, single: allocated=23.43TiB, used=23.42TiB
System, RAID1: allocated=32.00MiB, used=2.99MiB
Metadata, RAID1: allocated=5.85GiB, used=5.84GiB
Metadata, DUP: allocated=10.50GiB, used=10.01GiB
GlobalReserve, single: allocated=512.00MiB, used=33.05MiB
=== subvolume /data ===

 Looking at line 4, md126 has a size of 27.28 TiB, but only 5.29 TiB was allocated.  Per line 3, md127 has a size of 18.17 TiB, but all of it was allocated.  This totals 23.46 allocated.

 

Looking at line 6, the volume had 23.43 TiB allocated for data.  Since we have 23.46 total allocation, that means .03 TiB (~30 GiB) was allocated to system and metadata.  Of course there are rounding errors, since the reports aren't exact - so if you add up the system and metadata stuff on lines 7-10, it is a bit less than that.

 

Now, looking again at line 6, you had 23.43 TiB of allocated space, but 23.42 TiB of used space.  That means you had .01 TiB of space that is allocated but not used.  Although it is natural to label this "allocated but not used" bucket as "unused", personally I think it is more appropriate to label it "unusable".  This unusable space is there because BTRFS allocates space in blocks.  There will be some lost (unusable) space in many blocks.

 

I've been careful not to use the word "free", because that concept is a bit slippery with btrfs.  The Web UI is labeling unallocated space as "free" - which is reasonable, but sometimes misleading.  What you really have is allocated (used and "unusable") and unallocated.
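The arithmetic in the last few paragraphs can be checked mechanically; a quick sketch using the figures from the revised report above (all in TiB, so small rounding differences are expected):

```python
# Checking the allocated/used/unallocated arithmetic from the report above.
md127_size, md127_alloc = 18.17, 18.17
md126_size, md126_alloc = 27.28, 5.29

total_alloc = md127_alloc + md126_alloc
print(round(total_alloc, 2))        # 23.46 TiB allocated across both RAID groups

data_alloc, data_used = 23.43, 23.42
sys_meta_alloc = total_alloc - data_alloc
print(round(sys_meta_alloc, 2))     # ~0.03 TiB (~30 GiB) for system + metadata

unusable = data_alloc - data_used   # allocated but not used ("unusable")
print(round(unusable, 2))           # ~0.01 TiB

unallocated = (md127_size - md127_alloc) + (md126_size - md126_alloc)
print(round(unallocated, 2))        # 21.99 TiB -- the space the UI labels "free"
```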

 

As @rn_enthusiast points out, the balance was failing because the file system

  • is set up to duplicate metadata (so there is a copy on both md126 and md127)
  • and there was no allocated space at all on md127

 

Your deletions apparently did reclaim some unallocated space, and it looks like the balance is now doing its job. But what exactly does "doing its job" mean?

 

As @Sandshark said (quoting the man page), The primary purpose of the balance feature is to spread block groups across all devices...  There is useful side effect though.  A balance will also consolidate the allocated space, so there is less "unusable" space.  So even if you only have one device (your original md127), it can be useful to run balances from time to time.

 

So when your balance completes, you should expect to see more unallocated space on md127, and more allocated space on md126.  You should look at the unallocated space you end up with on both volumes when it's done.  

 

But as @rn_enthusiast says Running balance from GUI isn't really a full balance. The GUI will use parameters during the balance so it only balances parts of the volume.  So what's that about?  Well, mostly it's about how long the balance takes.  A full balance (with no filter parameters) will take several days on your system.  Lots of users complained about the run time, so Netgear added in some filters - which speed it up, but at the cost of not balancing completely.  What these parameters do is focus the balance on chunks that have unusable space. 

 

For instance, @rn_enthusiast also suggested running the balance from the command line: 

btrfs balance start -dusage=10 /data

The -d is a filter for data blocks (not metadata or system).  The usage=10 tells the balance to only process blocks that have 10% (or less) used space - in other words, only process blocks that are 90% or more unusable.  That will run more quickly, and it will be easier for the system to consolidate the unusable space - converting it back to the unallocated space you needed.  The system needs some working space in order for the consolidation to happen, and setting the dusage low reduces that space.  FWIW, I'd have suggested starting with -dusage=0, as there often are some allocated blocks that end up completely empty, and the system can convert them back to unallocated without needing any working space.
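As a conceptual sketch only (illustrative values, not real btrfs internals), the effect of the usage filter looks like this:

```python
# Conceptual sketch of the -dusage filter (illustrative values only): a
# filtered balance rewrites just the data chunks whose used fraction is at or
# below the threshold, returning their block groups to unallocated space.
chunks = [
    {"id": 1, "used_pct": 0},    # completely empty chunk
    {"id": 2, "used_pct": 7},
    {"id": 3, "used_pct": 45},
    {"id": 4, "used_pct": 96},
]

def chunks_selected(chunks, dusage):
    """Chunks a 'btrfs balance start -dusage=N' pass would process."""
    return [c["id"] for c in chunks if c["used_pct"] <= dusage]

print(chunks_selected(chunks, 0))    # [1]  -dusage=0: only empty chunks, no working space needed
print(chunks_selected(chunks, 10))   # [1, 2]
print(chunks_selected(chunks, 50))   # [1, 2, 3]
```

This is why running the increments in ascending order makes sense: each pass frees a little more unallocated space, giving the next (larger) pass the working room it needs.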

 


@Platypus69 wrote:

So why the big decrease in size reported for Data?

 


Good question, and I'm not sure I have a fully satisfactory answer.  But I believe the issue is that you had more unusable space than the system was reporting (that the used fraction in the reports is an estimate).  Then once the balance was able to really get going, it found a lot more unusable space that it could shift to unallocated.

 

This could be related to the snapshots you deleted - the system perhaps wasn't able to reclaim the space at the time, but now that you have some unallocated space to work with, the system is getting that space back.

 


@Platypus69 wrote:
  • Before I went to sleep last night I kicked off a Defrag.

You got away with this, but it was a bad idea.

 

A defrag is basically rewriting a fragmented file, so it is unfragmented.  Doing that requires unallocated space that you didn't have.

 

Even with older file systems like FAT32, defragging the files results in fragmented free space, and defragging the free space results in fragmented files.  It's similar with BTRFS - defragging files will end up reducing the unallocated space.

 

Also, defragging a share with snapshots can sharply increase the amount of disk space used by the share.  If you want to defrag regularly, you really do want to limit snapshot retention (as I suggested earlier).

 

 

 

Message 20 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks for that.

 

Muchly appreciated.

 

Perhaps the Defrag was not a big deal as I have only ever copied files and never/hardly ever deleted or modified them. (But I do have Copy-on-Write enabled for all shares. Not sure what that has to do with Bit-Rot Protection, but I digress.) So that's why I got away with it???

 

In any case I will report what happens tomorrow, as it is still at 64% as of midnight tonight. Another aside: I cannot believe that a Balance that fails with an error shows up as a green completed line in the UI logs!!!

 

But it sounds like you're saying that I should still perform the following after this completes:

 

btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data

Happy to do that.

 

Otherwise, is my current scenario due to the fact that it is predominantly storing photos? My Photos share stores over 380,000 JPG and HEVC files which are on average around 4MB in size. Does that account for your "unusable space"?

 

Thanks again for comprehensive reply, muchly appreciated.

Message 21 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

Another aside: I cannot believe that a Balance that fails with an error shows up as a green completed line in the UI logs!!!

I'm not understanding this comment.

 


@Platypus69 wrote:

But it sounds like you're saying that I should still perform the following after this completes:

Give us the btrfs info after the balance completes - that will make it easier to give you any next steps.

 


@Platypus69 wrote:

Otherwise, is my current scenario due to the fact that it is predominantly storing photos? So my Photos share stores over 380,000 files that are JPG and HEVC files which are on average around 4MB in size?  So that accounts for your "unusable space"?

 


I don't think it's about the number of files or their size. 

 

BTRFS allocates fairly large blocks (1 GiB). The data from more than one file can be stored in the same block.

 

When you fill the share initially, the file system should be pretty efficient.  But later on, as files are modified or deleted there will be "holes" - unused space in the blocks.  There is background processing that can consolidate them - but I'm not sure what triggers that background processing (and there's no easy way to see it). 

 

But doing a balance will run all the blocks that meet the -dusage criteria through the btrfs allocator again, and that will consolidate the data (removing the wasted space). 
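A toy model of that consolidation may help (made-up numbers; in reality each chunk is a 1 GiB block group and the allocator is far more involved):

```python
import math

# Toy model of balance-as-consolidation (illustrative only). Each chunk is a
# 1 GiB block group holding some live data; deletions leave "holes". A balance
# rewrites the surviving data, packing it into as few chunks as possible.
CHUNK_GIB = 1.0
used_per_chunk = [0.25, 0.75, 0.5, 0.25, 0.25]   # live data per allocated chunk (GiB)

live_data = sum(used_per_chunk)                   # 2.0 GiB of real data
before = len(used_per_chunk)                      # 5 chunks allocated
after = math.ceil(live_data / CHUNK_GIB)          # 2 chunks after repacking

print(before, after)                              # 5 2
print(f"reclaimed {before - after} chunks ({(before - after) * CHUNK_GIB:.0f} GiB) as unallocated space")
```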

 

Message 22 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Wow! The remaining 80% of the Balance took most of today.

 

data        balance    2021-03-05 09:34:32  2021-03-05 10:29:27  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 19:39:44  2021-03-05 19:49:07  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:09:45  2021-03-05 21:27:23  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:28:15  2021-03-05 21:28:19  completed  Done, had to relocate 1 out of 23557 chunks
data        balance    2021-03-05 21:45:20  2021-03-05 21:46:05  completed  Done, had to relocate 29 out of 23557 chunks
data        balance    2021-03-05 21:57:26  2021-03-05 21:57:31  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:22  2021-03-05 21:59:27  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:48  2021-03-05 21:59:53  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 22:25:13  2021-03-05 22:25:18  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 23:19:38  2021-03-05 23:19:44  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-06 00:54:22  2021-03-06 00:54:28  completed  Done, had to relocate 1 out of 23529 chunks
data        defrag     2021-03-06 00:54:49  2021-03-06 03:02:04  completed
data        balance    2021-03-06 11:23:49  2021-03-07 18:58:54  completed  Done, had to relocate 8286 out of 23482 chunks

 

Out of curiosity, is there a correlation between the 8286 chunks relocated and the roughly 10 TB that Data was reduced by? Something like 8286 x 1 GB = 10 TB approximately?
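Checking that arithmetic (assuming 1 GiB data chunks, the size btrfs commonly uses for data block groups):

```python
# Rough check of the chunk arithmetic (assumes 1 GiB data chunks).
chunks_relocated = 8286
gib_per_chunk = 1

freed_tib = chunks_relocated * gib_per_chunk / 1024  # GiB -> TiB
print(round(freed_tib, 2))  # about 8.09 TiB, somewhat under the ~10 TB observed
```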

 

BTRFS:

Label: '*:root'  uuid: *
	Total devices 1 FS bytes used 1.48GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: '*:data'  uuid:*
	Total devices 2 FS bytes used 13.24TiB
	devid    1 size 18.17TiB used 10.02TiB path /dev/md127
	devid    2 size 27.28TiB used 4.83TiB path /dev/md126

=== filesystem /data ===
Data, single: total=14.81TiB, used=13.23TiB
System, RAID1: total=32.00MiB, used=2.04MiB
Metadata, RAID1: total=6.85GiB, used=6.80GiB
Metadata, DUP: total=10.50GiB, used=8.44GiB
GlobalReserve, single: total=512.00MiB, used=24.00KiB
=== subvolume /data ===
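One way to read the "Data, single" line above: the gap between total (space allocated to data chunks) and used (space actually holding file data) is slack that a balance can reclaim. A quick calculation from those numbers:

```python
# Slack in the data chunks, taken from the filesystem output above.
data_total_tib = 14.81  # allocated to data chunks
data_used_tib = 13.23   # actually holding file data

slack_tib = data_total_tib - data_used_tib
print(round(slack_tib, 2))  # ~1.58 TiB allocated but unused
```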

Does that all look good now? I haven't lost data?

 

So what should I do now?

 

Happy to do the following, as was suggested before.

btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data

Or anything else, such as a scrub or a defrag.

 

I don't want to again have the same issue of 10TB being free but not being able to copy any files to the RN316.

 

So if these operations take another week, that is perfectly fine. I'd rather have everything optimised now.

 

Thanks again!!!

 

 

 

 

Message 23 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

Does that look all good now? I have not lost data?

 

So what should I do now?

All is good now, and there is no need to run anything immediately.  You can of course restore the data you deleted.

 

But I do suggest setting up a maintenance schedule using the volume schedule control on the volume settings wheel (again, mine uses a four-month cycle, running one maintenance task each month).  You could alternatively enable autodefrag on the share settings, and run the remaining tasks on a 3 month cycle.

volume-1.png

 

volume-2.png

 

 

You could also re-enable snapshots on the remaining shares if you like - ideally setting up custom snapshots with fixed retention.  I use 3 months on my own systems for most shares.

snapshot1.png

 

snapshot2.png

 

If you don't have a backup plan in place for your NAS you should definitely set up one - for instance, purchasing a large USB backup disk.

 


@Platypus69 wrote:

 

Out of curiosity is there a correlation between the 8286 chunks relocated and the 10 odd TB that Data was reduced by? So something like 8286 x 1GB = 10TB approximately?

 


Each chunk is 1 GiB.  So while there is a correlation, you reclaimed somewhat more space than that.  There could have been some chunks that were allocated but completely unused - not sure if the log would show them as relocated or not.

Message 24 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks for that. Will do.

 

Thanks A BILLION for everyone's help and all the practical advice!!!

 

And patience 🙂

 

I do have another NAS/HDD for the photos, but I'm wondering if I should just RAR up each month's photos and store them in AWS Glacier Deep Archive at $0.00099 per GB-month.
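As a sanity check on that tier's cost, assuming the quoted $0.00099 per GB-month rate and a hypothetical 2 TB archive:

```python
# Monthly archive cost at the quoted rate (archive size is hypothetical).
rate_per_gb_month = 0.00099
archive_gb = 2000  # e.g. ~2 TB of photo archives

monthly_cost = archive_gb * rate_per_gb_month
print(round(monthly_cost, 2))  # $1.98/month for 2 TB at rest
```

Note this covers storage at rest only; retrieval from deep-archive tiers is billed separately and can take hours.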

 

I notice that my new DS1819+ can sync with Backblaze, so maybe I'll take advantage of that?

Message 25 of 27