
Cannot copy files to RN316 although I have 22TB free...

Platypus69
Luminary

Cannot copy files to RN316 although I have 22TB free...

Hi all. I have the following RN316:

  • Firmware 6.10.4
  • Running 6 x 10TB IronWolf HDDs
  • X-RAID
  • 21.9TB free / 23.4TB used
  • History:
    • Last year I replaced all the 8TB IronWolf HDDs (from memory) one by one over a 6-8 month period.
    • The RN316 has predominantly been used for family photos and videos, so lots of small files
    • I have enabled Intelligent Snapshots

Last week I upgraded to 6.10.4 from the previous version (I like to keep things up to date.)

 

I then created a new SMB TEMP share with Bit Rot Protection, no snapshots, no quotas.

I have been copying files averaging 20GB to this share with no problem. Currently there is 22TB free.

 

And then I started having this problem:

  1. I try to copy a large file to the share. Windows 10's copy dialog box comes up with 0% complete and Time Remaining: Calculating
  2. Nothing happens for a long time. There is no progress on the green file copy graph.
  3. After a minute or so I get an error message: "The destination already has a file named ..."
  4. I click on the skip this file option.
  5. A zero-size file is created.

 

So the file is not copied and a zero-sized file is created instead!

 

I ran a balance, which took seconds.

 

I am currently running a scrub; it will take days to complete.

 

I can replicate this on other shares today, so it is not share-specific. Not a quota thing???

I also tried copying a 968KB PDF file today in an attempt to isolate the problem. It worked, but took ages, not the 1-2 seconds that I would expect.

 

So is this another BTRFS / metadata / copy-on-write issue? I have no idea...

 

How do I solve this?

 

So here is the snippet from BTRFS.LOG that I assume is of interest:

Label: 'root'  uuid: ...
	Total devices 1 FS bytes used 1.47GiB
	devid    1 size 4.00GiB used 3.24GiB path /dev/md0
Label: 'data'  uuid: ...
	Total devices 2 FS bytes used 23.43TiB
	devid    1 size 18.17TiB used 18.17TiB path /dev/md127
	devid    2 size 27.28TiB used 5.29TiB path /dev/md126
=== filesystem /data ===
Data, single: total=23.43TiB, used=23.42TiB
System, RAID1: total=32.00MiB, used=2.99MiB
Metadata, RAID1: total=5.85GiB, used=5.84GiB
Metadata, DUP: total=10.50GiB, used=10.01GiB
GlobalReserve, single: total=512.00MiB, used=33.05MiB
=== subvolume /data ===

 

This has confused me; the numbers don't look right.

Did the X-RAID not expand correctly???

 

Does this snippet from the VOLUME.LOG help?

...

Pool data
Device: /dev/md126
Node: 9:127
HostID: 43f5fa04 (native)
UUID: 02de77f1-efa5-4411-a1c8-b1ee66da855a
Mount point: /data
Size: 48806686264KB (46545 GB)
Available: 23630170408KB (22535 GB)
Snapshot: 0KB (0 MB)
Data allocation: 23990 GB
Data used: 24553465 MB (23977 GB)
Metadata allocation: 16776 MB
Metadata used: 16231 MB
Unallocated: 22523 GB
RAID Level: 5
State: redundant
Action: scrubbing for errors (auto-repair)
Sync progress: 17.7% (3455025152/19523436425)
Sync speed: 30945 KB/sec
Time to completion: 4327.1 minutes
Flags: 0x4C8
 Type: btrfs
Tier Flag: Disable
RAIDs:
md127
Size: 39021667840
Level: 5
Action: scrubbing for errors (auto-repair)
Members: 6
Type: 3
Flags: 0x408
Tier: 0
Data allocation: 18580 GB
Metadata allocation: 27528 MB
Unallocated: 0 GB
md126
Size: 58592892160
Level: 5
Action: scrubbing for errors (auto-repair)
Members: 6
Type: 3
Flags: 0x8
Tier: 0
Data allocation: 5410 GB
Metadata allocation: 6024 MB
Unallocated: 22523 GB
Disks:

...

=== df -h ===
Filesystem Size Used Avail Use% Mounted on
udev 10M 4.0K 10M 1% /dev
/dev/md0 4.0G 1.6G 2.2G 42% /
tmpfs 993M 0 993M 0% /dev/shm
tmpfs 993M 31M 963M 4% /run
tmpfs 497M 1.6M 495M 1% /run/lock
tmpfs 993M 0 993M 0% /sys/fs/cgroup
/dev/md127 46T 24T 23T 52% /data
/dev/md127 46T 24T 23T 52% /home
/dev/md127 46T 24T 23T 52% /apps
=== df -i ===
Filesystem Inodes IUsed IFree IUse% Mounted on
udev 253416 551 252865 1% /dev
/dev/md0 0 0 0 - /
tmpfs 254086 1 254085 1% /dev/shm
tmpfs 254086 849 253237 1% /run
tmpfs 254086 28 254058 1% /run/lock
tmpfs 254086 15 254071 1% /sys/fs/cgroup
/dev/md127 0 0 0 - /data
/dev/md127 0 0 0 - /home
/dev/md127 0 0 0 - /apps

Not sure what's going on with md127 versus md126, whatever they are. Mount points?

 

Happy to provide any other log data.

 

TIA

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 1 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...

Your posts were caught by the automatic spam filter - I released this one.  Not sure if the others had the same information, or if there was additional stuff.  If the latter, let me know, and I can release the others.

 


@Platypus69 wrote:

Not sure what's going on with md127 versus md126, whatever they are. Mount points?

 


Your volume has two RAID groups - the original RAID-5 group, and the second RAID-5 group that was created when you vertically expanded the array to 6x10TB.

 

The main RAID group is md127, the second group is md126.  These are concatenated together to give you a single volume.

 

Not sure what is confusing you about the numbers above; they look consistent with your overall used/free information.  One thing that is confusing is that you say you upgraded 8 TB IronWolf drives to 10 TB.  The sizes of md126 and md127 look like you might have upgraded 4 TB IronWolf drives to 10 TB instead.
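If you want a rough cross-check (back-of-envelope only: I'm assuming advertised decimal-terabyte drive sizes and ignoring partition overhead, so the estimates run a touch high), the two RAID group sizes line up with a 4TB-to-10TB upgrade:

```python
# Rough sanity check: do the md127/md126 sizes match a 4TB -> 10TB upgrade?
# Assumes drives are the advertised decimal size (1 TB = 10^12 bytes) and
# ignores partition/metadata overhead, so figures run slightly high.
TIB = 2**40

# md127: original RAID-5 over six 4TB partitions -> 5 data members, 1 parity
md127_tib = 5 * 4e12 / TIB
# md126: second RAID-5 group over the extra 6TB per disk -> also 5 data members
md126_tib = 5 * 6e12 / TIB

print(f"md127 ~ {md127_tib:.2f} TiB (log shows 18.17 TiB)")
print(f"md126 ~ {md126_tib:.2f} TiB (log shows 27.28 TiB)")
```

Five data members in each case because RAID-5 over 6 disks spends one disk's worth of space on parity.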

 

Have you looked for errors in system.log and kernel.log?

 

As an aside - though you do have plenty of free space, you might want to reconsider use of the "smart" snapshots.  The problem with them is that the oldest are retained forever.  Setting up custom snapshots with fixed retention is IMO a much better approach.  Something to consider when you resolve your problem here.

Message 2 of 27
Sandshark
Sensei

Re: Cannot copy files to RN316 although I have 22TB free...

Try turning off Strict Sync for the share.  It's in the share's Network Access/Advanced settings tab.

Message 3 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks, but that did not fix the problem.

Message 4 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks.

I am familiar with Microsoft stack and NTFS/ReFS, and VSAN.

I know nothing about BTRFS, so I assumed that it would expand the existing volume, not create a second one. Thus my confusion through ignorance.

Yes, you are correct, they were 4TB HDDs; it was probably over 5 years ago...

 

Thanks for the heads up on the snapshots. I expect it seemed a good idea 5 years ago when I set it up, as the NAS was designed to store family photos/videos.

 

So how do I turn off snapshots? Is it simply a matter of going into the snapshot GUI and turning off "Smart" management? I am nervous about doing something wrong and losing data 🙂

 

What would I be looking for in kernel.log or system.log? I tried to attach them as a ZIP file but it seems you can't...

 

I had a quick look and nothing made sense to me or stood out.

 

Although I am wondering if it is snapshot related. I am getting lots of errors like the following in system.log:

Feb 21 22:56:03 RN316 snapperd[545]: loading 12848 failed
Feb 21 22:56:03 RN316 snapperd[545]: loading 12850 failed
Feb 21 22:56:03 RN316 snapperd[545]: loading 1036 failed
Feb 21 22:56:03 RN316 snapperd[545]: loading 1037 failed

A quick Google search seemed to imply that snapperd is snapshot-related???

 

Any help appreciated.

 

Sorry for delay, RN316 has been unresponsive for 3 days due to Scrub operation 😞

 

Actually I just randomly found this in VOLUME.LOG:

data        disk test  2020-09-01 01:00:01  2020-09-01 15:20:27  pass                                                                       
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed                                                                  
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29            

Is that relevant???

Should I try moving some large files off and doing Balance again?

 

TIA

 

Message 5 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

 

So how do I turn off snapshots. Is it simply a matter of going into the snapshot GUI and say turn off "Smart" management? I am nervous about doing something wrong and losing data 🙂

You can either turn them off altogether in the GUI or you can change to Custom and explicitly set retention.  Just turning them off won't delete existing snapshots - you need to go into "recover", select them, and delete them manually.

snapshot-01.png

 

snapshot-02.png


@Platypus69 wrote:

I assume that it would expand existing volume not create a second.

It did expand the existing volume.  You are confusing "volumes" with "RAID groups".  They aren't the same thing.  A ReadyNAS volume consists of one or more RAID groups.

 


@Platypus69 wrote:

 

What would I be looking for in kernel.log or system.log? I tried to attach them as a ZIP file but it seems you can't...

 

Actually I just randomly found this in VOLUME.LOG:

data        disk test  2020-09-01 01:00:01  2020-09-01 15:20:27  pass                                                                       
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed                                                                  
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29            

Is that relevant???

Should I try moving some large files off and doing Balance again?

 


You can't attach zips, and you should be cautious about including links to the full logs.  There is some privacy leakage.

 

Generally you are looking for errors that include disks or "btrfs".

 

Your error is certainly relevant, and I would suggest looking for errors in system and kernel.log around the time of the error in volume.log.  Deleting some large files and trying to balance again is a reasonable next step.

Message 6 of 27
Sandshark
Sensei

Re: Cannot copy files to RN316 although I have 22TB free...


@StephenB wrote:

  Deleting some large files and trying to balance again is a reasonable next step.

And perhaps some of the oldest (and thus likely largest) snapshots as well before the balance.  The first balance will normally take a while, so that error explains why it only took seconds.  That the ReadyNAS didn't tell you it ended in error is, IMHO, an oversight in the ReadyNAS OS.

 

I've seen some posts on general Linux forums recommending a balance after a BTRFS expansion, and it does not appear Netgear does that automatically.  Apparently, that helps properly allocate data and metadata across the volume.  The man page says "The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles".  I've not found a good list of those constraints, but you may have run into one of them.  So having added to and then expanded one of the "devices" (which is a multi-partition MDADM RAID) in your file system, a balance should help.  Of course, it is sometimes hard to pick out what in the general BTRFS forums and wiki applies to ReadyNAS, because ReadyNAS uses BTRFS on top of MDADM RAID rather than BTRFS for both file system and RAID, and the general discussions typically assume you are using BTRFS for both.

Message 7 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Sandshark wrote:

I've seen some posts on general Linux forums recommending a balance after a BTRFS expansion, and it does not appear Netgear does that automatically.  Apparently, that helps properly allocate data and metadata across the volume.  The man page says "The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles".  I've not found a good list of those constraints, but you may have run into one of them.

I'm wondering that also.  Looking at the first post, I see this.

Label: 'data'  uuid: ...
	Total devices 2 FS bytes used 23.43TiB
	devid    1 size 18.17TiB used 18.17TiB path /dev/md127
	devid    2 size 27.28TiB used 5.29TiB path /dev/md126

Note all the unallocated space is on md126.  md127 is completely full.

Message 8 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Hi all....

 

So I have not been able to resolve the issue yet, I believe.


BTW is 6.10.4 (Hotfix 1) buggy??? Should I be downgrading to 6.10.3?

 

I've been offline as the RN316 was horrendously unresponsive during the scrub, and even after it failed or completed after 7 days it still feels sluggish to me...

 

Anyway... I have moved about 500GB off the RN316. But is that enough? I have moved most of the files I copied across this year, but also some older stuff from late last year, probably from AFTER I replaced all the 4TB HDDs with 10TB HDDs. So I am again wondering if everything is going to the old md127 and not the new md126???

 

So right now here are my stats / logs / telemetry:

 

BTRFS.LOG

Label: 'blah:root'  uuid: blah-blah
	Total devices 1 FS bytes used 1.43GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: 'blah:data'  uuid: blah-blah
	Total devices 2 FS bytes used 22.95TiB
	devid    1 size 18.17TiB used 18.14TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126

=== filesystem /data ===
Data, single: total=22.95TiB, used=22.93TiB
System, RAID1: total=32.00MiB, used=2.95MiB
Metadata, RAID1: total=5.85GiB, used=5.32GiB
Metadata, DUP: total=10.50GiB, used=10.03GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
=== subvolume /data ===

Why is Data, single: showing total=22.95TiB, which suspiciously seems to be the limit of the amount of data I can store? Recall the UI is showing data 22.96TB and Free space: 22.49TB. Is this the "smoking gun"???

 

VOLUME.LOG

data        disk test  2020-09-01 01:00:01  2020-09-01 15:20:27  pass                                                                       
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed                                                                  
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29                                                                                                  
data        disk test  2021-03-01 08:15:53                                                                                                  
data        balance    2021-03-01 21:03:16  2021-03-01 21:04:48  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-03 15:34:37  2021-03-04 03:44:36  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 09:34:32  2021-03-05 10:29:27  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 19:39:44  2021-03-05 19:49:07  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:09:45  2021-03-05 21:27:23  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:28:15  2021-03-05 21:28:19  completed  Done, had to relocate 1 out of 23557 chunks                     
data        balance    2021-03-05 21:45:20  2021-03-05 21:46:05  completed  Done, had to relocate 29 out of 23557 chunks                    
data        balance    2021-03-05 21:57:26  2021-03-05 21:57:31  completed  Done, had to relocate 1 out of 23529 chunks                     
data        balance    2021-03-05 21:59:22  2021-03-05 21:59:27  completed  Done, had to relocate 1 out of 23529 chunks                     
data        balance    2021-03-05 21:59:48  2021-03-05 21:59:53  completed  Done, had to relocate 1 out of 23529 chunks                     
data        balance    2021-03-05 22:25:13  2021-03-05 22:25:18  completed  Done, had to relocate 1 out of 23529 chunks

Why does it keep relocating only 1 out of 23529 chunks? The chunk count does not go down. I have no idea. Do I keep doing balances?

 

Should I Defrag now?

 

I also have SMB Plus installed and have enabled Preallocate (FYI: Preallocate disk space before writing data. This can slow down write speed slightly, but should result in the file being nicely laid out on the disk, with minimal fragmentation.)

 

I have removed a lot of snapshots but would like to keep the ones that I have set up for OneDrive and DropBox apps. Both report 19 snapshots with 2 years protection.

 

I am happy to turn off the snapshots, i.e. set them to manual. But I can drop all snapshots if people think that's good. Just being a bit nervous...

 

I'm pulling my hair out... What do I do?

 

Has the problem been solved? Can you tell? Or should I try to find much older data and remove another 500GB or 1TB of older data before I try balancing again?

 

Any help appreciated!

 

For what it's worth:

 

KERNEL.LOG

Mar 05 22:11:19 RN316 systemd[1]: Set hostname to <RN316>.
Mar 05 22:11:19 RN316 systemd[1]: systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket is masked.
Mar 05 22:11:19 RN316 systemd[1]: systemd-journald-audit.socket: Cannot add dependency job, ignoring: Unit systemd-journald-audit.socket is masked.
Mar 05 22:11:19 RN316 systemd[1]: Started Forward Password Requests to Wall Directory Watch.
Mar 05 22:11:19 RN316 systemd[1]: Listening on Journal Socket (/dev/log).
Mar 05 22:11:19 RN316 systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
Mar 05 22:11:19 RN316 systemd[1]: Created slice System Slice.
Mar 05 22:11:19 RN316 systemd[1]: Created slice system-serial\x2dgetty.slice.
Mar 05 22:11:19 RN316 systemd[1]: Created slice system-getty.slice.
Mar 05 22:11:19 RN316 systemd[1]: Listening on /dev/initctl Compatibility Named Pipe.
Mar 05 22:11:19 RN316 systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Encrypted Volumes.
Mar 05 22:11:19 RN316 systemd[1]: Listening on udev Control Socket.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Paths.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Remote File Systems (Pre).
Mar 05 22:11:19 RN316 systemd[1]: Reached target Remote File Systems.
Mar 05 22:11:19 RN316 systemd[1]: Listening on udev Kernel Socket.
Mar 05 22:11:19 RN316 systemd[1]: Listening on Journal Socket.
Mar 05 22:11:19 RN316 systemd[1]: Starting Remount Root and Kernel File Systems...
Mar 05 22:11:19 RN316 systemd[1]: Mounting POSIX Message Queue File System...
Mar 05 22:11:19 RN316 systemd[1]: Starting Create Static Device Nodes in /dev...
Mar 05 22:11:19 RN316 systemd[1]: Mounting Debug File System...
Mar 05 22:11:19 RN316 systemd[1]: Created slice User and Session Slice.
Mar 05 22:11:19 RN316 systemd[1]: Reached target Slices.
Mar 05 22:11:19 RN316 systemd[1]: Listening on Syslog Socket.
Mar 05 22:11:19 RN316 systemd[1]: Starting Journal Service...
Mar 05 22:11:19 RN316 systemd[1]: Starting Load Kernel Modules...
Mar 05 22:11:19 RN316 systemd[1]: Started ReadyNAS LCD splasher.
Mar 05 22:11:19 RN316 systemd[1]: Starting ReadyNASOS system prep...
Mar 05 22:11:19 RN316 systemd[1]: Mounted POSIX Message Queue File System.
Mar 05 22:11:19 RN316 systemd[1]: Mounted Debug File System.
Mar 05 22:11:19 RN316 systemd[1]: Started Remount Root and Kernel File Systems.
Mar 05 22:11:19 RN316 systemd[1]: Started Create Static Device Nodes in /dev.
Mar 05 22:11:19 RN316 systemd[1]: Started Load Kernel Modules.
Mar 05 22:11:19 RN316 systemd[1]: Starting Apply Kernel Variables...
Mar 05 22:11:19 RN316 systemd[1]: Mounting FUSE Control File System...
Mar 05 22:11:19 RN316 systemd[1]: Mounting Configuration File System...
Mar 05 22:11:19 RN316 systemd[1]: Starting udev Kernel Device Manager...
Mar 05 22:11:19 RN316 systemd[1]: Starting Load/Save Random Seed...
Mar 05 22:11:19 RN316 systemd[1]: Starting Rebuild Hardware Database...
Mar 05 22:11:19 RN316 systemd[1]: Mounted Configuration File System.
Mar 05 22:11:19 RN316 systemd[1]: Mounted FUSE Control File System.
Mar 05 22:11:19 RN316 systemd[1]: Started Apply Kernel Variables.
Mar 05 22:11:19 RN316 systemd[1]: Started ReadyNASOS system prep.
Mar 05 22:11:19 RN316 systemd[1]: Started Load/Save Random Seed.
Mar 05 22:11:19 RN316 systemd[1]: Started udev Kernel Device Manager.
Mar 05 22:11:19 RN316 systemd[1]: Started Journal Service.
Mar 05 22:11:19 RN316 kernel: md: md127 stopped.
Mar 05 22:11:19 RN316 kernel: md: bind<sdb3>
Mar 05 22:11:19 RN316 kernel: md: bind<sdc3>
Mar 05 22:11:19 RN316 kernel: md: bind<sdd3>
Mar 05 22:11:19 RN316 kernel: md: bind<sde3>
Mar 05 22:11:19 RN316 kernel: md: bind<sdf3>
Mar 05 22:11:19 RN316 kernel: md: bind<sda3>
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sda3 operational as raid disk 0
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdf3 operational as raid disk 5
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sde3 operational as raid disk 4
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdd3 operational as raid disk 3
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdc3 operational as raid disk 2
Mar 05 22:11:19 RN316 kernel: md/raid:md127: device sdb3 operational as raid disk 1
Mar 05 22:11:19 RN316 kernel: md/raid:md127: allocated 6474kB
Mar 05 22:11:19 RN316 kernel: md/raid:md127: raid level 5 active with 6 out of 6 devices, algorithm 2
Mar 05 22:11:19 RN316 kernel: RAID conf printout:
Mar 05 22:11:19 RN316 kernel:  --- level:5 rd:6 wd:6
Mar 05 22:11:19 RN316 kernel:  disk 0, o:1, dev:sda3
Mar 05 22:11:19 RN316 kernel:  disk 1, o:1, dev:sdb3
Mar 05 22:11:19 RN316 kernel:  disk 2, o:1, dev:sdc3
Mar 05 22:11:19 RN316 kernel:  disk 3, o:1, dev:sdd3
Mar 05 22:11:19 RN316 kernel:  disk 4, o:1, dev:sde3
Mar 05 22:11:19 RN316 kernel:  disk 5, o:1, dev:sdf3
Mar 05 22:11:19 RN316 kernel: created bitmap (30 pages) for device md127
Mar 05 22:11:19 RN316 kernel: md127: bitmap initialized from disk: read 2 pages, set 0 of 59543 bits
Mar 05 22:11:19 RN316 kernel: md127: detected capacity change from 0 to 19979093934080
Mar 05 22:11:19 RN316 kernel: Adding 1566716k swap on /dev/md1.  Priority:-1 extents:1 across:1566716k 
Mar 05 22:11:20 RN316 kernel: BTRFS: device label 43f5fa04:data devid 1 transid 1895561 /dev/md127
Mar 05 22:11:20 RN316 kernel: md: md126 stopped.
Mar 05 22:11:20 RN316 kernel: md: bind<sdb4>
Mar 05 22:11:20 RN316 kernel: md: bind<sdc4>
Mar 05 22:11:20 RN316 kernel: md: bind<sdd4>
Mar 05 22:11:20 RN316 kernel: md: bind<sde4>
Mar 05 22:11:20 RN316 kernel: md: bind<sdf4>
Mar 05 22:11:20 RN316 kernel: md: bind<sda4>
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sda4 operational as raid disk 0
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdf4 operational as raid disk 5
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sde4 operational as raid disk 4
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdd4 operational as raid disk 3
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdc4 operational as raid disk 2
Mar 05 22:11:20 RN316 kernel: md/raid:md126: device sdb4 operational as raid disk 1
Mar 05 22:11:20 RN316 kernel: md/raid:md126: allocated 6474kB
Mar 05 22:11:20 RN316 kernel: md/raid:md126: raid level 5 active with 6 out of 6 devices, algorithm 2
Mar 05 22:11:20 RN316 kernel: RAID conf printout:
Mar 05 22:11:20 RN316 kernel:  --- level:5 rd:6 wd:6
Mar 05 22:11:20 RN316 kernel:  disk 0, o:1, dev:sda4
Mar 05 22:11:20 RN316 kernel:  disk 1, o:1, dev:sdb4
Mar 05 22:11:20 RN316 kernel:  disk 2, o:1, dev:sdc4
Mar 05 22:11:20 RN316 kernel:  disk 3, o:1, dev:sdd4
Mar 05 22:11:20 RN316 kernel:  disk 4, o:1, dev:sde4
Mar 05 22:11:20 RN316 kernel:  disk 5, o:1, dev:sdf4
Mar 05 22:11:20 RN316 kernel: md126: detected capacity change from 0 to 29999560785920
Mar 05 22:11:20 RN316 kernel: BTRFS: device label 43f5fa04:data devid 2 transid 1895561 /dev/md126
Mar 05 22:13:08 RN316 kernel: e1000e: eth1 NIC Link is Down
Mar 05 22:13:09 RN316 kernel: IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 05 22:13:09 RN316 kernel: 8021q: adding VLAN 0 to HW filter on device eth1
Mar 05 22:13:12 RN316 kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Mar 05 22:13:12 RN316 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
Mar 05 22:13:15 RN316 kernel: Adjusting tsc more than 11% (6835455 vs 8751273)
Mar 05 22:16:07 RN316 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
Mar 05 22:25:11 RN316 kernel: BTRFS info (device md126): relocating block group 36377627721728 flags system|raid1

 

Not sure if these snapperd errors are relevant:

 

SYSTEM.LOG

Mar 05 22:21:54 RN316 dbus[2986]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
Mar 05 22:21:54 RN316 dbus[2986]: [system] Successfully activated service 'org.opensuse.Snapper'
Mar 05 22:21:54 RN316 snapperd[6838]: loading 13409 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19029 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19504 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19543 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19557 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19608 failed
Mar 05 22:21:54 RN316 snapperd[6838]: loading 19614 failed
...
Mar 05 22:25:25 RN316 clamd[4134]: SelfCheck: Database status OK.
Mar 05 22:25:32 RN316 snapperd[6838]: loading 13409 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 19029 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 19504 failed
...
Mar 05 22:25:32 RN316 snapperd[6838]: loading 12924 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 12925 failed
Mar 05 22:25:32 RN316 snapperd[6838]: loading 1036 failed

 

 

 

 

 

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 9 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

 

BTRFS.LOG

Label: 'blah:root'  uuid: blah-blah
	Total devices 1 FS bytes used 1.43GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: 'blah:data'  uuid: blah-blah
	Total devices 2 FS bytes used 22.95TiB
	devid    1 size 18.17TiB used 18.14TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126

=== filesystem /data ===
Data, single: total=22.95TiB, used=22.93TiB
System, RAID1: total=32.00MiB, used=2.95MiB
Metadata, RAID1: total=5.85GiB, used=5.32GiB
Metadata, DUP: total=10.50GiB, used=10.03GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
=== subvolume /data ===

Why is Data, single: showing total=22.95TiB, which suspiciously seems to be the limit of the amount of data I can store? Recall the UI is showing data 22.96TB and Free space: 22.49TB. Is this the "smoking gun"???

 

 


The "total=22.95TiB" doesn't mean what you think it means.  It is the total of the allocated space (and is essentially the same as the 22.96 TB you are seeing in the UI).  The two sizes further up are the size of your storage: md127 has size 18.17TiB, md126 has size 27.28TiB.  The total size is therefore 45.45TiB, which is the correct size for 6x10TB single-redundancy RAID.
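Putting rough numbers on it (a hypothetical back-of-envelope: drives are sold in decimal terabytes while the log reports binary TiB, so expect small discrepancies from partition overhead):

```python
# The two devid sizes add up to the full volume, and that matches what
# six 10TB drives should give under single-redundancy RAID (5 data members,
# one disk's worth of parity). Assumes advertised decimal terabytes.
TIB = 2**40

volume_tib = 18.17 + 27.28         # md127 + md126 sizes from btrfs fi show
expected_tib = 5 * 10e12 / TIB     # 6x10TB RAID-5 usable capacity

print(f"volume   = {volume_tib:.2f} TiB")
print(f"expected ~ {expected_tib:.2f} TiB")
```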

 

The problem here is that for some reason your system has completely filled md127.  You can see that by subtracting the "18.14 used" from the "18.17 size" for md127.  

 

There is a brute-force solution - which is to do a factory default, set up the NAS again, and restore your data from backup.  That would give you a single RAID group, and you'd have plenty of free space.  In addition to being time-consuming, you would lose all your snapshots.  Though painful, if it were my own system I'd do the reset and start over.

 

The other option I see is to delete all your existing snapshots, and see if that frees up space on md127.  You'd wait for a while after deletion - then download the log zip again, and look at the "used" space for that RAID group.  Hopefully it will drop substantially.
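If you want to track that without eyeballing the log each time, a small throwaway parser works. This is just an illustrative sketch (the `device_usage` helper is my own invention, matching the devid lines quoted earlier in the thread):

```python
import re

# Pull per-device size/used figures out of a BTRFS.LOG-style snippet so you
# can watch md127's "used" figure drop after deleting snapshots. The regex
# matches the "devid N size X used Y path /dev/Z" lines in the log.
DEVID_RE = re.compile(
    r"devid\s+\d+\s+size\s+([\d.]+)TiB\s+used\s+([\d.]+)TiB\s+path\s+(\S+)"
)

def device_usage(log_text):
    """Return {device_path: (size_tib, used_tib, unallocated_tib)}."""
    usage = {}
    for size, used, path in DEVID_RE.findall(log_text):
        size, used = float(size), float(used)
        usage[path] = (size, used, round(size - used, 2))
    return usage

sample = """
	devid    1 size 18.17TiB used 18.14TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126
"""
print(device_usage(sample))
# /dev/md127 shows only 0.03 TiB unallocated in this sample
```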

Message 10 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Firstly, thanks a million as always.

 

Of course I don't know, but I would be surprised if snapshots are the root cause....

 

I have only set up snapshots for my OneDrive and Dropbox shares, which represent a fraction of the photos and movies stored on the RN316.

 

Any other snapshots, which likewise were not large, are now gone. I only use the free versions of these services, which are limited in size: Dropbox = 16GB, OneDrive I can't remember, but probably around 16GB as well. So I thought I would use the snapshot feature of the RN316 for these shares, as the free tiers of OneDrive and Dropbox do not have this functionality.

 

Do you really think it will make a difference if I remove these underlying snapshots? They are small, no? But perhaps they take up a lot of metadata??? I don't know...

 

OneDrive share UI says:

  • 7149 files, 98 folders, 13GB
  • 20 snapshots (2year(s) protection)

DropBox share UI says:

  • 15365 files, 571 folders, 13.9GB
  • 19 snapshots (2year(s) protection)

I too have concluded that I will at some point, as soon as I can, buy 8 x 16TB HDDs for my new DS1819+, and do as you suggest: move all the data off the RN316, reformat it, and move the data back. But I cannot afford the 8 x 16TB HDDs right now, in one hit.

 

So the frustrating thing is I have run out of space on all my ReadyNASes. I have this 20TB free but I cannot use it!!!! ArggghHh.... 🙂

 

So would you suggest an action plan of trying to remove 1TB of old data from md127, then doing a balance, then a defrag, then another balance, and then trying to copy the data back?

 

Of course I am very curious as to what the problem is and how to avoid it in the future. It sounds to me that a strategy of going from (6 x 4TB HDDs) to (6 x 10TB HDDs) to (6 x 16TB HDDs) in the future is not viable for these BTRFS-based RAID NASes.

 

Unless of course I should have been running monthly balances/defrags, which I never did. Netgear never recommended it. I had assumed (incorrectly, it seems) that you never needed to run these operations, as I predominantly only add family photos and videos.

 

So I want to learn the lesson here, but am struggling to learn what I did wrong and how to avoid this in the future, other than your "brute force" technique.

 

So I was planning to fill out my new DS1819+:

  • Buy 1 x 16TB HDD in the first month (yes, I know there is no RAID)
  • Add 1 x 16TB HDD every month after that, staggering the HDDs' lifetimes (reducing the chance of them all failing simultaneously) and also staggering the cost

But given all the dramas I am having with BTRFS, I am wondering whether this is a horrendous idea, and whether I would be better off buying 8 x 16TB HDDs and setting up one massive pool. So take the hit on the wallet! 😞

 

Or can I get away with buying 4 x 16TB HDDs and setting up one pool this year, and in 12-24 months buying another 4 x 16TB HDDs and setting up a second pool?

 

I am beginning to suspect that buying the 8 x 16TB HDDs in one hit is the best way to go... Ouch!

 

 

 

 

 

 

 

Message 11 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...

@Platypus69 wrote:

So would you suggest an action plan of trying to remove 1TB of old data from md127, then doing a balance, then a defrag, then another balance, and then trying to copy the data back?

 


Well, we can't see what is actually on md127 (as opposed to md126).  But you could try copying off some older shares and then deleting them.  After the space is reclaimed (hopefully from md127), you can try a balance (which should succeed if there's enough space on md127).  A scrub might also reallocate some space.  After that, you could recreate the shares and copy the data back.

 

A defrag won't help - and it can reduce free space in the shares that have snapshots enabled.

 


@Platypus69 wrote:

 

Unless of course I should have been running monthly balances/defrags, which I never did. Netgear never recommended it. I had assumed (incorrectly, it seems) that you never needed to run these operations, as I predominantly only add family photos and videos.

 

So I want to learn the lesson here, but am struggling to learn what I did wrong and how to avoid this in the future

Netgear doesn't offer any guidance on volume maintenance.  My current practice is to schedule each of the four tasks (scrub, disk test, balance, and defrag).  I cycle through them, one each month, so over a year each runs 3 times.  Defrag probably isn't necessary - but I have enough free space to avoid the downside, so I just run it anyway.

 

Opinions here differ on balance - @mdgm for instance only runs it rarely (if at all).  But I have seen posts here where it has reclaimed unallocated space.  In general, if a balance isn't needed then it runs very quickly and I've never had any problems running them.  So I continue to run them on this schedule.

 

I don't know how your system ended up this way. FWIW I also have multiple RAID groups on my NAS.

Label: '2fe72582:data'  uuid: a665beff-2a06-4b88-b538-f9fa4fb2dfef
	Total devices 2 FS bytes used 13.54TiB
	devid    1 size 16.36TiB used 12.72TiB path /dev/md127
	devid    2 size 10.91TiB used 1.27TiB path /dev/md126

Unallocated space isn't evenly split across the two RAID groups, but fortunately I do have reasonable space on the original md127 RAID group.

 

It seems to me that btrfs balance should handle this better - not sure if there are options that would spread the unallocated space more evenly.  I'll try to research it if I can find the time.

Message 12 of 27
Sandshark
Sensei

Re: Cannot copy files to RN316 although I have 22TB free...

In general Linux/BTRFS forums, it is recommended that you run a balance after adding a new element to a BTRFS volume.  I have seen no evidence that Netgear expansion does that when it vertically expands, which is adding a new element (new md RAID).  Without the balance, the data and metadata are not spread across the whole volume, which can result in problems.  From the BTRFS balance man page:  The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles.

 

I don't know what those "constraints" are, but I asked about the balance because you may be running into one of them.

 

FYI, I run a balance monthly.  That's probably more often than necessary, but it's a fast process if you do it often and it happens on a schedule while I sleep, so why not?

Message 13 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Sandshark wrote:

 The primary purpose of the balance feature is to spread block groups across all devices so they match constraints defined by the respective profiles.

 


Yes, and the fact that @Platypus69 has never run one is part of the puzzle. But his balance is failing now, even though there is plenty of unallocated space in md126.  So the "primary purpose" isn't being achieved.

 

FWIW, I decided to run a balance with no parameters ( btrfs balance start /data ) just to see if that moves any chunks from md127 (about 75% allocated) to md126 (about 13% allocated).  It'll take a while, but I will report back when it completes.

Message 14 of 27
rn_enthusiast
Virtuoso

Re: Cannot copy files to RN316 although I have 22TB free...

When expanding, the NAS creates a second raid (md126), as you guys know. BTRFS then puts md127 and md126 into a JBOD kind of raid, using the filesystem itself to do so. Basically, mdadm creates two raids (devices, essentially) and BTRFS sticks them together in a JBOD using the raid capability of the filesystem itself. That is fine and not an issue, BUT if you look at the metadata, it is set to raid1. BTRFS can do clever things like hold different raid levels for data and metadata.

But take this situation... md127 is totally full, and its metadata is totally full as well. In order to write new data, new metadata MUST be duplicated between md126 and md127 as specified in the BTRFS raid profile. But since the NAS cannot write any more metadata to md127, it can't write at all, because it is supposed to write the metadata in a raid1 fashion between the two devices.

Label: 'data' uuid: ...
Total devices 2 FS bytes used 23.43TiB
devid 1 size 18.17TiB used 18.17TiB path /dev/md127
devid 2 size 27.28TiB used 5.29TiB path /dev/md126

=== filesystem /data ===
Data, single: total=23.43TiB, used=23.42TiB
System, RAID1: total=32.00MiB, used=2.99MiB
Metadata, RAID1: total=5.85GiB, used=5.84GiB
Metadata, DUP: total=10.50GiB, used=10.01GiB
GlobalReserve, single: total=512.00MiB, used=33.05MiB
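To make the deadlock concrete, here is a small back-of-the-envelope sketch in Python. The device sizes are the ones reported above; the "a raid1 chunk needs free space on two different devices" rule is the constraint being described, and the variable names are mine:

```python
# Sketch of the raid1 metadata constraint using the device sizes above.
# BTRFS raid1 must place the two copies of a new metadata chunk on two
# *different* devices, so a device with no unallocated space blocks writes.
devices = {
    "md127": {"size_tib": 18.17, "allocated_tib": 18.17},
    "md126": {"size_tib": 27.28, "allocated_tib": 5.29},
}

unallocated = {name: d["size_tib"] - d["allocated_tib"] for name, d in devices.items()}
for name, free in unallocated.items():
    print(f"{name}: {free:.2f} TiB unallocated")   # md127: 0.00, md126: 21.99

# raid1 needs at least 2 devices that can each hold one copy of the new chunk:
devices_with_space = [n for n, free in unallocated.items() if free > 0]
can_allocate_raid1_chunk = len(devices_with_space) >= 2
print(can_allocate_raid1_chunk)   # False: md127 is full, so writes stall
```

So despite ~22 TiB of unallocated space on md126 alone, no raid1 metadata chunk can be allocated, which matches the "no space left" behaviour.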


We need to balance some of the data across to md126. Running a balance from the GUI isn't really a full balance: the GUI passes parameters to the balance so that it only balances parts of the volume. My suggestion would be to:
1. Take some data off the NAS temporarily. A few TBs.
2. Run a full balance from the CLI, like @StephenB mentioned.
3. Move the data back (but first, post the post-balance volume stats, like the ones above, to the thread).

 

Cheers

Message 15 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks. What everyone here has said makes sense, but it's the action plan which is confusing.

 

So am I winning here?

 

You can see that things "have changed":

 

Label: 'blah:root'  uuid: *
	Total devices 1 FS bytes used 1.46GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: 'blah:data'  uuid: *
	Total devices 2 FS bytes used 13.24TiB
	devid    1 size 18.17TiB used 18.09TiB path /dev/md127
	devid    2 size 27.28TiB used 4.84TiB path /dev/md126

=== filesystem /data ===
Data, single: total=22.90TiB, used=13.23TiB
System, RAID1: total=32.00MiB, used=2.95MiB
Metadata, RAID1: total=6.85GiB, used=5.81GiB
Metadata, DUP: total=10.50GiB, used=9.54GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Is that showing that we are starting to expand Metadata, RAID1, as it has grown from 5.85GiB to 6.85GiB?

 

Is that sufficient? Will it now be able to grow from now on as required?

 

Has the problem been solved, or should I continue to move more stuff off and re-balance?

 

Here is the history from volume.log (apologies for the length; I have removed all the disk tests). As you can see, I was incorrect in saying that I have never run a Balance before; apologies.

 

FYI: I replaced the 4TB HDDs with 10TB HDDs in this time line:

  1. 04/12/2018
  2. 06/12/2018
  3. 29/07/2019
  4. 15/12/2019
  5. 21/03/2020
  6. 21/06/2020

So it would seem no Balance was done from when the last 4TB HDD was swapped out until I encountered this problem in 2021.

 

=== maintenance history ===
device      operation  start_time           end_time             result     details                                                         
----------  ---------  -------------------  -------------------  ---------  ----------------------------------------------------------------
data        balance    2016-06-15 00:00:01  2016-06-15 00:00:33  completed                                                                                                                                 
data        scrub      2016-07-01 00:00:01                                                                                                  
data        balance    2016-07-15 00:00:01  2016-07-15 00:02:20  completed                                                                  
data        balance    2016-08-15 00:00:19  2016-08-15 00:00:33  fail                                                                       
data        scrub      2016-09-01 00:00:01
data        balance    2016-09-15 00:00:02  2016-09-15 00:00:38  completed
data        scrub      2016-10-01 00:00:02  2016-10-02 04:38:32  pass
data        balance    2016-10-15 00:00:01  2016-10-15 01:51:49  completed                                                                  
data        scrub      2016-11-01 00:00:01  2016-11-02 17:35:19  pass
data        balance    2016-11-15 00:00:01  2016-11-15 00:01:55  completed
data        scrub      2016-12-01 00:00:01  2016-12-02 21:24:24  pass
data        balance    2016-12-15 00:00:01  2016-12-15 00:17:38  completed
data        scrub      2017-01-01 00:00:02  2017-01-03 02:19:12  pass                                                                       
data        balance    2017-01-15 00:00:01  2017-01-15 00:09:23  completed                                                                  
data        scrub      2017-03-01 00:00:03
data        balance    2017-03-15 00:00:01  2017-03-15 06:26:22  completed
data        scrub      2017-04-01 00:00:01                                                                                                  
data        balance    2017-04-15 00:00:01  2017-04-15 00:00:31  completed                                                                  
data        scrub      2017-05-01 00:00:01  2017-05-01 22:03:16  abort                                                                      
data        scrub      2017-05-01 22:31:25  2017-05-04 17:39:57  pass                                                                       
data        balance    2017-05-15 00:00:03  2017-05-15 00:03:54  completed  Done, had to relocate 4 out of 13438 chunks                     
data        scrub      2017-06-02 00:00:01                                                                                                  
data        balance    2017-06-15 00:00:02  2017-06-15 00:17:54  completed  Done, had to relocate 40 out of 14672 chunks                    
data        scrub      2017-07-02 00:00:01
data        balance    2017-07-15 00:00:01  2017-07-15 00:00:19  completed  Done, had to relocate 2 out of 14758 chunks
data        scrub      2017-08-02 00:00:01
data        balance    2017-08-15 00:00:01  2017-08-15 02:00:33  completed  balance canceled by user
data        scrub      2017-09-02 00:00:01
data        balance    2017-09-15 00:00:01  2017-09-15 02:05:02  completed  balance canceled by user
data        scrub      2017-10-02 00:00:01
data        balance    2017-10-15 00:00:01  2017-10-15 02:03:23  completed  balance canceled by user
data        scrub      2018-01-02 00:00:01
data        balance    2018-01-15 00:00:01  2018-01-15 00:35:45  completed  Done, had to relocate 53 out of 16330 chunks
data        scrub      2018-02-02 00:00:01
data        resilver   2018-02-17 21:47:57  2018-02-18 20:11:21  completed
data        scrub      2018-03-02 00:00:01
data        balance    2018-03-15 00:00:02  2018-03-15 00:13:32  completed  Done, had to relocate 14 out of 16390 chunks
data        scrub      2018-04-02 00:00:01
data        balance    2018-04-15 00:00:01  2018-04-15 00:05:54  completed  Done, had to relocate 15 out of 16391 chunks
data        scrub      2018-05-02 00:00:01
data        scrub      2018-06-02 00:00:01  2018-06-03 17:21:01  pass
data        balance    2018-06-15 00:00:02  2018-06-15 01:43:40  completed  Done, had to relocate 67 out of 18585 chunks
data        balance    2018-07-15 00:00:01  2018-07-15 00:00:09  completed  Done, had to relocate 0 out of 18607 chunks
data        scrub      2018-08-02 00:00:01
data        balance    2018-08-15 00:00:01  2018-08-15 00:00:08  completed  Done, had to relocate 0 out of 18607 chunks ERROR: error during
data        resilver   2018-12-04 00:04:00
data        resilver   2018-12-04 00:50:30  2018-12-04 17:53:15  completed
data        scrub      2018-12-04 18:21:51  2018-12-06 09:40:35  pass
data        resilver   2018-12-06 23:59:57
data        resilver   2018-12-07 01:06:25  2018-12-07 16:58:15  completed
data        resilver   2018-12-07 16:58:27  2018-12-08 10:49:21  completed
data        scrub      2018-12-08 14:54:34  2018-12-11 02:49:34  pass
data        balance    2018-12-15 00:00:01  2018-12-15 00:03:26  completed  Done, had to relocate 0 out of 18625 chunks
data        scrub      2019-01-07 03:08:33  2019-01-11 08:28:03  pass
data        balance    2019-01-15 00:00:01  2019-01-15 00:12:40  completed  Done, had to relocate 0 out of 19285 chunks ERROR: error during
data        balance    2019-02-15 00:00:01  2019-02-15 00:00:30  completed  Done, had to relocate 1 out of 20138 chunks
data        scrub      2019-03-01 01:00:01
data        resilver   2019-07-25 12:20:40  2019-07-26 05:53:47  completed
data        resilver   2019-07-29 20:02:39
data        resilver   2019-07-30 20:36:36  2019-07-31 15:17:52  completed
data        scrub      2019-10-01 01:00:01  2019-10-06 05:37:11  pass
data        scrub      2019-12-01 01:00:01  2019-12-06 10:58:28  pass
data        resilver   2019-12-14 23:56:42  2019-12-14 23:57:32  degraded
data        resilver   2019-12-15 00:09:16  2019-12-17 01:53:29  completed
data        scrub      2019-12-17 11:26:26  2019-12-23 05:14:28  pass
data        scrub      2020-02-01 01:00:01  2020-02-07 07:45:05  pass
data        resilver   2020-03-21 23:56:06  2020-03-22 17:01:34  completed
data        resilver   2020-03-22 17:02:31  2020-03-24 09:54:07  completed
data        scrub      2020-04-01 01:00:01  2020-04-07 10:46:32  pass
data        scrub      2020-06-01 01:00:02
data        scrub      2020-06-10 19:44:01  2020-06-17 15:59:33  pass
data        resilver   2020-06-21 23:16:29  2020-06-24 14:16:38  completed
data        scrub      2020-08-01 01:00:01
data        resilver   2020-08-08 15:10:50
data        resilver   2020-08-08 15:45:05  2020-08-09 15:42:35  completed
data        resilver   2020-09-13 15:54:14  2020-09-14 20:35:46  completed
data        balance    2021-02-18 21:17:16  2021-02-18 21:18:42  completed  ERROR: error during balancing '/data': No space left on device
data        scrub      2021-02-18 21:29:29
data        balance    2021-03-01 21:03:16  2021-03-01 21:04:48  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-03 15:34:37  2021-03-04 03:44:36  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 09:34:32  2021-03-05 10:29:27  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 19:39:44  2021-03-05 19:49:07  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:09:45  2021-03-05 21:27:23  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:28:15  2021-03-05 21:28:19  completed  Done, had to relocate 1 out of 23557 chunks
data        balance    2021-03-05 21:45:20  2021-03-05 21:46:05  completed  Done, had to relocate 29 out of 23557 chunks
data        balance    2021-03-05 21:57:26  2021-03-05 21:57:31  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:22  2021-03-05 21:59:27  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:48  2021-03-05 21:59:53  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 22:25:13  2021-03-05 22:25:18  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 23:19:38  2021-03-05 23:19:44  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-06 00:54:22  2021-03-06 00:54:28  completed  Done, had to relocate 1 out of 23529 chunks
data        defrag     2021-03-06 00:54:49  2021-03-06 03:02:04  completed

 

 

Message 16 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks.

 

I will run this today.

 

Out of curiosity, is there a --verbose switch or something like that which will report what it's doing on the screen? Does the command return a summary of what was done in the SSH session?

Message 17 of 27
rn_enthusiast
Virtuoso

Re: Cannot copy files to RN316 although I have 22TB free...

The distribution between the two devices (md126 and md127) is still pretty bad.

md127 is still very, very full, and that can in some cases cause issues for the balance, hence my suggestion to maybe off-load some data.

 

You are gaining ground here, so that is good. You might get away with simply balancing it out without off-loading any data first. If you keep the data on the NAS - i.e. don't off-load some before the balance - then I would do it in increments.

 

btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data

You can check the progress of the current balance with

btrfs balance status -v /data

Some good reading with regards to BTRFS balancing:

https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-balance 

Message 18 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Ha! Ha! Ha!


Clearly you are 1,000,000% correct in saying The "total=22.95TB" doesn't mean what you think it means. 

 

Yes, I understand that md127 + md126 should total 45.45TiB, but everything else is confusing...  Does the "Data" label in the UI denote the amount of storage space consumed, or is it just referring to the Data pool, or something else?

 

Why do I ask? Well...

 

So this is what I did yesterday:

  • Moved some files off the new share I created in Feb 2021 and an older one (May 2020).
  • Turned off the remaining 2 snapshots that I had for OneDrive and DropBox.
  • Deleted all the recent 2021 daily snapshots that had been created for these two shares.
  • I decided to keep the monthly snapshots for these shares. As discussed before there are only about 2 x 19 of them going back 2 years. (Now I am not sure how the RN316 implements snapshots. If I have only ever added files and hardly ever modified or deleted them, does the snapshot make a copy of the 16GB worth of files, or does it just maintain metadata in the file system and only "move" a file into the snapshot when it is modified or deleted, if you know what I mean?)
  • Before I went to sleep last night I kicked off a Defrag.
  • Today I kicked off a Balance through the UI.

 

Operations performed yesterday and kicked off today

 

But right now 3.5 hours later here is what the UI is showing:

 

Why has Data dropped from 23.40TB to 13.26TB???

So obviously the confusion/concern is that Data dropped from 23.40TB to 13.26TB!

 

Is this being recalculated dynamically as the Balance performs its "dark magic"? So do I have to wait for it to finish to see where these values end up, or is it accurate right now?

 

Is this expected? Obviously something has happened, and is happening. How exciting 😞

 

My concern of course is that I have "lost data", as I am pretty sure I have NOT moved off 10TB - I don't have enough free spare storage. 🙂

 

Recall that I:

  • Turned off Smart Snapshots on about 5 shares which had very little data. I never had snapshots on my main shares (Videos, Photos, Software and Backups), which probably account for 90% of my files. I only had snapshots for the OneDrive and DropBox shares, but remember they are both limited to 16GB as I am only using the free tiers.
  • Moved 500GB of the files that I copied across to the RN316 in February 2021, plus some older files from May 2020 (so from before I swapped out the last 10TB HDD).

So this clearly is well under 10TB. Not even 1TB.

 

So why the big decrease in size reported for Data?

 

Apologies if the answer is really simple and I am being dumb...

Message 19 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

 

 

Yes, I understand that md127 + md126 should total 45.45TiB, but everything else is confusing...  Does the "Data" label in the UI denote the amount of storage space consumed, or is it just referring to the Data pool, or something else?


Going back to where you started,  I'm going to slightly revise the original report, which hopefully will help provide clarity.  

Label: 'data'  uuid: ...
	Total devices 2 FS bytes used 23.43TiB
	devid    1 size 18.17TiB allocated 18.17TiB path /dev/md127
	devid    2 size 27.28TiB allocated 5.29TiB path /dev/md126
=== filesystem /data ===
Data, single: allocated=23.43TiB, used=23.42TiB
System, RAID1: allocated=32.00MiB, used=2.99MiB
Metadata, RAID1: allocated=5.85GiB, used=5.84GiB
Metadata, DUP: allocated=10.50GiB, used=10.01GiB
GlobalReserve, single: allocated=512.00MiB, used=33.05MiB
=== subvolume /data ===

 Looking at line 4, md126 has a size of 27.28 TiB, but only 5.29 TiB was allocated.  Per line 3, md127 has a size of 18.17 TiB, but all of it was allocated.  This totals 23.46 allocated.

 

Looking at line 6, the volume had 23.43 TiB allocated for data.  Since we have 23.46 total allocation, that means .03 TiB (~30 GiB) was allocated to system and metadata.  Of course there are rounding errors, since the reports aren't exact - so if you add up the system and metadata stuff on lines 7-10, it is a bit less than that.

 

Now, looking again at line 6, you had 23.43 TiB of allocated space, but 23.42 TiB of used space.  That means you had .01 TiB of space that is allocated but not used.  Although it is natural to label this "allocated but not used" bucket as "unused", personally I think it is more appropriate to label it "unusable".  This unusable space is there because BTRFS allocates space in blocks.  There will be some lost (unusable) space in many blocks.

 

I've been careful not to use the word "free", because that concept is a bit slippery with btrfs.  The Web UI is labeling unallocated space as "free" - which is reasonable, but sometimes misleading.  What you really have is allocated (used and "unusable") and unallocated.
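The arithmetic in the last few paragraphs can be checked mechanically; a quick sketch using the figures from the revised report above (all in TiB, so small rounding differences are expected):

```python
# Checking the allocated/used/unallocated arithmetic from the report above.
md127_size, md127_alloc = 18.17, 18.17
md126_size, md126_alloc = 27.28, 5.29

total_alloc = md127_alloc + md126_alloc
print(round(total_alloc, 2))        # 23.46 TiB allocated across both RAID groups

data_alloc, data_used = 23.43, 23.42
sys_meta_alloc = total_alloc - data_alloc
print(round(sys_meta_alloc, 2))     # ~0.03 TiB (~30 GiB) for system + metadata

unusable = data_alloc - data_used   # allocated but not used ("unusable")
print(round(unusable, 2))           # ~0.01 TiB

unallocated = (md127_size - md127_alloc) + (md126_size - md126_alloc)
print(round(unallocated, 2))        # 21.99 TiB -- the space the UI labels "free"
```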

 

As @rn_enthusiast points out, the balance was failing because the file system

  • is set up to duplicate metadata (so there is a copy on both md126 and md127)
  • and there was no allocated space at all on md127

 

Your deletions apparently did reclaim some unallocated space, and it looks like the balance is now doing its job. But what exactly does "doing its job" mean?

 

As @Sandshark said (quoting the man page), The primary purpose of the balance feature is to spread block groups across all devices...  There is useful side effect though.  A balance will also consolidate the allocated space, so there is less "unusable" space.  So even if you only have one device (your original md127), it can be useful to run balances from time to time.

 

So when your balance completes, you should expect to see more unallocated space on md127, and more allocated space on md126.  You should look at the unallocated space you end up with on both volumes when it's done.  

 

But as @rn_enthusiast says Running balance from GUI isn't really a full balance. The GUI will use parameters during the balance so it only balances parts of the volume.  So what's that about?  Well, mostly it's about how long the balance takes.  A full balance (with no filter parameters) will take several days on your system.  Lots of users complained about the run time, so Netgear added in some filters - which speed it up, but at the cost of not balancing completely.  What these parameters do is focus the balance on chunks that have unusable space. 

 

For instance, @rn_enthusiast also suggested running the balance from the command line: 

btrfs balance start -dusage=10 /data

The -d is a filter for data blocks (not metadata or system).  The usage=10 tells the balance to only process blocks that have 10% (or less) used space - in other words, only process blocks that are 90% or more unusable.  That will run more quickly, and it will be easier for the system to consolidate the unusable space - converting it back to the unallocated space you needed.  The system needs some working space in order for the consolidation to happen, and setting the dusage low reduces that space.  FWIW, I'd have suggested starting with -dusage=0, as there often are some allocated blocks that end up completely empty, and the system can convert them back to unallocated without needing any working space.
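As a conceptual sketch only (illustrative values, not real btrfs internals), the effect of the usage filter looks like this:

```python
# Conceptual sketch of the -dusage filter (illustrative values only): a
# filtered balance rewrites just the data chunks whose used fraction is at or
# below the threshold, returning their block groups to unallocated space.
chunks = [
    {"id": 1, "used_pct": 0},    # completely empty chunk
    {"id": 2, "used_pct": 7},
    {"id": 3, "used_pct": 45},
    {"id": 4, "used_pct": 96},
]

def chunks_selected(chunks, dusage):
    """Chunks a 'btrfs balance start -dusage=N' pass would process."""
    return [c["id"] for c in chunks if c["used_pct"] <= dusage]

print(chunks_selected(chunks, 0))    # [1]  -dusage=0: only empty chunks, no working space needed
print(chunks_selected(chunks, 10))   # [1, 2]
print(chunks_selected(chunks, 50))   # [1, 2, 3]
```

This is why running the increments in ascending order makes sense: each pass frees a little more unallocated space, giving the next (larger) pass the working room it needs.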

 


@Platypus69 wrote:

So why the big decrease in size reported for Data?

 


Good question, and I'm not sure I have a fully satisfactory answer.  But I believe the issue is that you had more unusable space than the system was reporting (that the used fraction in the reports is an estimate).  Then once the balance was able to really get going, it found a lot more unusable space that it could shift to unallocated.

 

This could be related to the snapshots you deleted - the system perhaps wasn't able to reclaim the space at the time, but now that you have some unallocated space to work with, the system is getting that space back.

 


@Platypus69 wrote:
  • Before I went to sleep last night I kicked off a Defrag.

You got away with this, but it was a bad idea.

 

A defrag is basically rewriting a fragmented file, so it is unfragmented.  Doing that requires unallocated space that you didn't have.

 

Even with older file systems like FAT32, defragging the files results in fragmented free space, and defragging the free space results in fragmented files.  It's similar with BTRFS - defragging files will end up reducing the unallocated space.

 

Also, defragging a share with snapshots can sharply increase the amount of disk space used by the share.  If you want to defrag regularly, you really do want to limit snapshot retention (as I suggested earlier).

 

 

 

Message 20 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks for that.

 

Muchly appreciated.

 

Perhaps the Defrag was not a big deal as I have only ever copied files and never/hardly ever deleted or modified them. (But I do have Copy-on-Write enabled for all shares. Not sure what that has to do with Bit-Rot Protection, but I digress.) So that's why I got away with it???

 

In any case I will report what happens tomorrow, as it is still at 64% as of midnight tonight. Another aside: I cannot believe that a Balance that fails with an error shows up as a green completed line in the UI logs!!!

 

But it sounds like you're saying that I should still perform the following after this completes:

 

btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data

Happy to do that.

 

Otherwise, is my current scenario due to the fact that it is predominantly storing photos? My Photos share stores over 380,000 JPG and HEVC files which are on average around 4MB in size. Does that account for your "unusable space"?

 

Thanks again for comprehensive reply, muchly appreciated.

Message 21 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

Another aside: I cannot believe that a Balance that fails with an error shows up as a green completed line in the UI logs!!!

I'm not understanding this comment.

 


@Platypus69 wrote:

But it sounds like you're saying that I should still perform the following after this completes:

Give us the btrfs info after the balance completes - that will make it easier to give you any next steps.

 


@Platypus69 wrote:

Otherwise, is my current scenario due to the fact that it is predominantly storing photos? So my Photos share stores over 380,000 files that are JPG and HEVC files which are on average around 4MB in size?  So that accounts for your "unusable space"?

 


I don't think it's about the number of files or their size. 

 

BTRFS allocates fairly large blocks (1 GiB). The data from more than one file can be stored in the same block.

 

When you fill the share initially, the file system should be pretty efficient.  But later on, as files are modified or deleted there will be "holes" - unused space in the blocks.  There is background processing that can consolidate them - but I'm not sure what triggers that background processing (and there's no easy way to see it). 

 

But doing a balance will run all the blocks that meet the -dusage criteria through the btrfs allocator again, and that will consolidate the data (removing the wasted space). 
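A toy model of that consolidation may help (made-up numbers; in reality each chunk is a 1 GiB block group and the allocator is far more involved):

```python
import math

# Toy model of balance-as-consolidation (illustrative only). Each chunk is a
# 1 GiB block group holding some live data; deletions leave "holes". A balance
# rewrites the surviving data, packing it into as few chunks as possible.
CHUNK_GIB = 1.0
used_per_chunk = [0.25, 0.75, 0.5, 0.25, 0.25]   # live data per allocated chunk (GiB)

live_data = sum(used_per_chunk)                   # 2.0 GiB of real data
before = len(used_per_chunk)                      # 5 chunks allocated
after = math.ceil(live_data / CHUNK_GIB)          # 2 chunks after repacking

print(before, after)                              # 5 2
print(f"reclaimed {before - after} chunks ({(before - after) * CHUNK_GIB:.0f} GiB) as unallocated space")
```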

 

Message 22 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Wow! The remaining 80% of the Balance took most of today.

 

data        balance    2021-03-05 09:34:32  2021-03-05 10:29:27  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 19:39:44  2021-03-05 19:49:07  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:09:45  2021-03-05 21:27:23  completed  ERROR: error during balancing '/data': No space left on device
data        balance    2021-03-05 21:28:15  2021-03-05 21:28:19  completed  Done, had to relocate 1 out of 23557 chunks
data        balance    2021-03-05 21:45:20  2021-03-05 21:46:05  completed  Done, had to relocate 29 out of 23557 chunks
data        balance    2021-03-05 21:57:26  2021-03-05 21:57:31  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:22  2021-03-05 21:59:27  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 21:59:48  2021-03-05 21:59:53  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 22:25:13  2021-03-05 22:25:18  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-05 23:19:38  2021-03-05 23:19:44  completed  Done, had to relocate 1 out of 23529 chunks
data        balance    2021-03-06 00:54:22  2021-03-06 00:54:28  completed  Done, had to relocate 1 out of 23529 chunks
data        defrag     2021-03-06 00:54:49  2021-03-06 03:02:04  completed
data        balance    2021-03-06 11:23:49  2021-03-07 18:58:54  completed  Done, had to relocate 8286 out of 23482 chunks

 

Out of curiosity, is there a correlation between the 8286 chunks relocated and the roughly 10 TB that Data was reduced by? Something like 8286 x 1 GB = 10 TB approximately?
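Checking that arithmetic (assuming 1 GiB data chunks, the size btrfs commonly uses for data block groups):

```python
# Rough check of the chunk arithmetic (assumes 1 GiB data chunks).
chunks_relocated = 8286
gib_per_chunk = 1

freed_tib = chunks_relocated * gib_per_chunk / 1024  # GiB -> TiB
print(round(freed_tib, 2))  # about 8.09 TiB, somewhat under the ~10 TB observed
```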

 

BTRFS:

Label: '*:root'  uuid: *
	Total devices 1 FS bytes used 1.48GiB
	devid    1 size 4.00GiB used 3.61GiB path /dev/md0

Label: '*:data'  uuid:*
	Total devices 2 FS bytes used 13.24TiB
	devid    1 size 18.17TiB used 10.02TiB path /dev/md127
	devid    2 size 27.28TiB used 4.83TiB path /dev/md126

=== filesystem /data ===
Data, single: total=14.81TiB, used=13.23TiB
System, RAID1: total=32.00MiB, used=2.04MiB
Metadata, RAID1: total=6.85GiB, used=6.80GiB
Metadata, DUP: total=10.50GiB, used=8.44GiB
GlobalReserve, single: total=512.00MiB, used=24.00KiB
=== subvolume /data ===
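One way to read the "Data, single" line above: the gap between total (space allocated to data chunks) and used (space actually holding file data) is slack that a balance can reclaim. A quick calculation from those numbers:

```python
# Slack in the data chunks, taken from the filesystem output above.
data_total_tib = 14.81  # allocated to data chunks
data_used_tib = 13.23   # actually holding file data

slack_tib = data_total_tib - data_used_tib
print(round(slack_tib, 2))  # ~1.58 TiB allocated but unused
```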

Does that all look good now? I haven't lost data?

 

So what should I do now?

 

Happy to do the following, as was suggested before.

btrfs balance start -dusage=10 /data
btrfs balance start -dusage=30 /data
btrfs balance start -dusage=50 /data
btrfs balance start /data

Or anything else, such as a scrub or a defrag.

 

I don't want to again have the same issue of 10TB being free but not being able to copy any files to the RN316.

 

So if these operations take another week, that is perfectly fine. I'd rather have everything optimised now.

 

Thanks again!!!

 

 

 

 

Message 23 of 27
StephenB
Guru

Re: Cannot copy files to RN316 although I have 22TB free...


@Platypus69 wrote:

Does that look all good now? I have not lost data?

 

So what should I do now?

All is good now, and there is no need to run anything immediately.  You can of course restore the data you deleted.

 

But I do suggest setting up a maintenance schedule using the volume schedule control on the volume settings wheel (again, mine uses a four-month cycle, running one maintenance task each month).  You could alternatively enable autodefrag on the share settings, and run the remaining tasks on a 3 month cycle.

volume-1.png

 

volume-2.png

 

 

You could also re-enable snapshots on the remaining shares if you like - ideally setting up custom snapshots with fixed retention.  I use 3 months on my own systems for most shares.

snapshot1.png

 

snapshot2.png

 

If you don't have a backup plan in place for your NAS you should definitely set up one - for instance, purchasing a large USB backup disk.

 


@Platypus69 wrote:

 

Out of curiosity is there a correlation between the 8286 chunks relocated and the 10 odd TB that Data was reduced by? So something like 8286 x 1GB = 10TB approximately?

 


Each chunk is 1 GiB.  So while there is a correlation, you reclaimed somewhat more space than that.  There could have been some chunks that were allocated but completely unused - not sure if the log would show them as relocated or not.

Message 24 of 27
Platypus69
Luminary

Re: Cannot copy files to RN316 although I have 22TB free...

Thanks for that. Will do.

 

Thanks A BILLION for everyone's help and all the practical advice!!!

 

And patience 🙂

 

I do have another NAS/HDD for the photos, but I'm wondering if I should just RAR up each month's photos and store them in AWS Glacier Deep Archive at $0.00099 per GB-month.
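As a sanity check on that tier's cost, assuming the quoted $0.00099 per GB-month rate and a hypothetical 2 TB archive:

```python
# Monthly archive cost at the quoted rate (archive size is hypothetical).
rate_per_gb_month = 0.00099
archive_gb = 2000  # e.g. ~2 TB of photo archives

monthly_cost = archive_gb * rate_per_gb_month
print(round(monthly_cost, 2))  # $1.98/month for 2 TB at rest
```

Note this covers storage at rest only; retrieval from deep-archive tiers is billed separately and can take hours.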

 

I notice that my new DS1819+ can sync with Backblaze, so maybe I'll take advantage of that?

Message 25 of 27