q3d
Jun 20, 2021 · Aspirant
RN316 Corrupting files on write
I have a similar issue to RN316-Corrupt-Files (1855098#M39773) and want to avoid factory resetting the NAS. Random files written to the NAS seem to be getting corrupted when re-checking them after tr...
q3d
Jun 24, 2021 · Aspirant
No issues found on the Disk Test and Scrub (unless there is a detailed log somewhere else other than Frontview?)
Jun 24, 2021 01:55:40 AM Volume: Disk test completed for volume 'data'.
Jun 23, 2021 10:10:17 PM Volume: Disk test is in process for volume 'data'.
Jun 23, 2021 08:14:05 PM Volume: Scrub completed for volume 'data'.
Jun 20, 2021 01:43:55 AM Volume: Scrub started for volume 'data'.
I'll do a firmware update to 6.10.2 at least (I don't use rsync backups, so I can even go to 6.10.5).
I also read somewhere that scrubbing disk by disk (rather than at RAID level) was required for BTRFS, but I'm not sure how Netgear has implemented BTRFS on RAID 5 with the RN316/OS6?
Would it be safe, and of any benefit, to do a disk-by-disk scrub on the RN316? ...at least read-only to start with, something like:
btrfs scrub start -r /dev/sda
btrfs scrub start -r /dev/sdb
btrfs scrub start -r /dev/sdc
btrfs scrub start -r /dev/sdd
btrfs scrub start -r /dev/sde
btrfs scrub start -r /dev/sdf
Then, if any errors turn up, run again without the -r (read-only) flag?
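(As a fallback, if btrfs scrub won't accept the raw /dev/sdX members, a read-only scrub can instead be pointed at the mounted volume - the /data mount point here is an assumption to verify on the unit:)

btrfs scrub start -Bdr /data # -B: wait for completion, -d: per-device stats, -r: read-only
btrfs scrub status /data # progress/results, if run in the background without -B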
Drives are listed as:
# fdisk -l | grep "^Disk /dev/sd"
Partition 2 does not start on physical sector boundary.
Disk /dev/sda: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk /dev/sdb: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk /dev/sdd: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk /dev/sde: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk /dev/sdf: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
......
StephenB
Jun 24, 2021 · Guru - Experienced User
q3d wrote:
(unless there is a detailed log somewhere else other than Frontview?)
Download the log zip file from the web UI's log page - there is a lot more in there.
q3d wrote:
I also read somewhere that scrubbing disk by disk (rather than at RAID level) was required for BTRFS, but I'm not sure how Netgear has implemented BTRFS on RAID 5 with the RN316/OS6?
Not sure where you read that, but it's not the case with the ReadyNAS. MDADM is used to create a virtual disk, and the BTRFS file system is then created on top of that.
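For reference, a minimal way to see that layering from the NAS shell (paths typical of OS6; worth verifying on the unit):

cat /proc/mdstat # lists the md arrays and the sdX partitions backing them
btrfs filesystem show # lists each BTRFS filesystem and the md device(s) it sits on
mount | grep btrfs # maps each BTRFS filesystem to its mount point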
The scrub you already ran includes both a RAID scrub and a BTRFS scrub. Though it did complete, I suggest looking in btrfs.log, system.log, and kernel.log for file system related errors (btrfs and disk). Perhaps also look at the smart stats in disk_info.log.
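A quick sketch for surfacing the relevant entries once the log zip is extracted (log file names as listed above; paths assume the current directory holds the unzipped logs):

grep -iE 'btrfs|i/o error|ata[0-9]' kernel.log system.log btrfs.log # file system and disk errors
grep -iE 'reallocated|pending|uncorrect' disk_info.log # SMART counters worth a look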
- q3d · Jun 25, 2021 · Aspirant
Thanks, found it. The kernel logs have several different types of errors.
There are a few entries from a 'mount -a' command with errors (a network drive not mounting due to a network connection issue, the network device being off at the time, etc.) - however, could a 'mount -a' affect the already-existing RAID mount in any way?
It's also possible one of the network cables is (or was) faulty and some mid-writes aborted due to timeouts - could transfer aborts from mid-write network timeouts also cause BTRFS write issues?
I also ran btrfs scrub -r on the raid and found 123 read errors within the first few GB of the scan... but that's all.
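(For the record, the per-device numbers behind that count can be pulled after the run - a sketch, with the /data mount point an assumption:)

btrfs scrub status -d /dev/md127 # per-device totals from the last scrub
btrfs device stats /data # cumulative error counters; -z would reset them after repairs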
I'll check for other errors in the logs; however, before doing anything further, I'll do some physical/hardware cleanups, reinsert the drives, swap out cables, etc., and monitor it over a few days. The RN316 is several years old and remotely isolated, but it probably needs some physical checkups.
- q3d · Jun 25, 2021 · Aspirant
uncorrectable_errors: 123
The drives are under a year old and do not report any issues.
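(A cross-check against the drives themselves, assuming smartctl is available on the unit; attribute names vary by vendor:)

for d in /dev/sd[a-f]; do echo "== $d =="; smartctl -A "$d" | grep -iE 'reallocated|pending|uncorrect'; done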
- StephenB · Jun 25, 2021 · Guru - Experienced User
q3d wrote:
It's also possible one of the network cables is (or was) faulty and some mid-writes aborted due to timeouts - could transfer aborts from mid-write network timeouts also cause BTRFS write issues?
Connection timeouts couldn't cause BTRFS write issues, though of course they could result in lost writes.
You are using both network connections? If so, what form of network aggregation are you using (LACP, etc.)?
q3d wrote:
I also ran btrfs scrub -r on the raid and found 123 read errors within the first few GB of the scan... but that's all.
What volume were you scrubbing? The OS partition (md0) is also BTRFS.
Read errors obviously would account for it. Either there are disk errors involved, or the file system contains entries that somehow point to non-existent disk sectors.
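To pin down which md device backs which volume, and to scrub the OS partition read-only, something like this (a sketch; it assumes the OS volume is mounted at /):

mount | grep btrfs # e.g. md0 on /, md127 on /data
btrfs scrub start -Bdr / # read-only scrub of the OS volume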
- q3d · Jun 25, 2021 · Aspirant
StephenB wrote:
q3d wrote:
It's also possible one of the network cables is (or was) faulty and some mid-writes aborted due to timeouts - could transfer aborts from mid-write network timeouts also cause BTRFS write issues?
Connection timeouts couldn't cause BTRFS write issues, though of course they could result in lost writes.
You are using both network connections? If so, what form of network aggregation are you using (LACP, etc.)?
Lost writes? As in the file transfer would abort completely, e.g. clean aborts?
I'm using both NICs - I changed the binding a few times over the last several months (the BTRFS errors started around April '21); I'm currently using Adaptive Load Balancing. One NAS NIC goes to a router, and the other goes through a managed switch that is managed by the same router - i.e. the same /24 network range, etc. I'm not sure if going through different devices could cause issues with Adaptive Load Balancing.
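(To confirm what the bond is actually doing, the Linux bonding driver exposes its state under /proc - interface names bond0/eth0/eth1 here are assumptions, check with ip link:)

cat /proc/net/bonding/bond0 # bonding mode (balance-alb for Adaptive Load Balancing) and per-slave link status
ethtool eth0 # link speed/duplex per NIC - useful when chasing a flaky cable
ethtool eth1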
I also noticed a few SMB failures over IPv6 - not sure why IPv6 is enabled on DHCP (the LAN is IPv4-focused).
../source3/smbd/smb2_read.c:258(smb2_sendfile_send_data) smb2_sendfile_send_data: sendfile failed for file (Input/output error) for client ipv6:fe80::4425:12a8:14fd:b7fb:54365. Terminating
StephenB wrote:
q3d wrote:
I also ran btrfs scrub -r on the raid and found 123 read errors within the first few GB of the scan... but that's all.
What volume were you scrubbing? The OS partition (md0) is also BTRFS.
Read errors obviously would account for it. Either there are disk errors involved, or the file system contains entries that somehow point to non-existent disk sectors.
The scrub is on /dev/md127 - it appears in Frontview while running, but the console lets me capture live data as it scrubs. Once done, I can also check /dev/md0 just in case.
Any idea what md126 is? I get a stack of the following in kernel.log (it says the BTRFS warnings come from device md126, but the i/o errors are on /dev/md127):
Jun 24 14:06:09 NAS kernel: BTRFS warning (device md126): i/o error at logical 68479287296 on dev /dev/md127, sector 135862144, root 272, inode 1215727, offset 57511936, length 4096, links 1 (path: *****)
Jun 24 14:06:09 NAS kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 61280, rd 690, flush 0, corrupt 0, gen 0
...
Jun 24 14:06:09 NAS kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 61280, rd 694, flush 0, corrupt 0, gen 0
Jun 24 14:06:09 NAS kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 61280, rd 695, flush 0, corrupt 0, gen 0
...
Also, just prior to the above, I get this:
Jun 24 12:45:30 NAS kernel: nfsd: last server has exited, flushing export cache
Jun 24 12:45:30 NAS kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
Jun 24 12:45:30 NAS kernel: NFSD: starting 90-second grace period (net ffffffff88d74240)
Jun 24 13:58:10 NAS kernel: CIFS VFS: Send error in QFSUnixInfo = -13
Jun 24 14:01:11 NAS kernel: CIFS VFS: Send error in QFSAttributeInfo = -13
Jun 24 14:04:11 NAS kernel: CIFS VFS: cifs_mount failed w/return code = -13
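(One way to see how md126 and md127 relate - a sketch, since device names and the /data mount point vary per unit. A data volume that was expanded at some point can span more than one md array, in which case BTRFS lists both as devices of the same filesystem:)

cat /proc/mdstat # every md array and its member partitions
btrfs filesystem show /data # a multi-device volume lists each backing bdev (e.g. md126 and md127)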