
Re: ReadyNAS 214 data corruption issue

aks-2
Apprentice

ReadyNAS 214 data corruption issue

I am seeing very strange behaviour that results in corrupted files. My NAS (ReadyNAS 214) had been stable for years until recently, when the logs following a scrub revealed an error in a single file. I recovered that file from a backup.

 

Over the past two days, I have noticed various newly written files appear fine, but after 'a bit of time', they become corrupted. Yes, weird I know. This has happened with edited Excel (xlsx) and non-edited .jpg and PDF files. Not all files going to the NAS are affected.

 

To isolate the issue:

  • I scanned my PC in case of a virus. Nothing found.
  • Created a checksum file, then copied the files to the NAS. Reread the checksums and all appears good.
  • A variable time later, re-check checksums. Results bad.
  • The corruption is immediate if I copy a file within the same NAS folder (CTRL-C / CTRL-V) and then check the checksums: both the original and the copy become corrupted.
  • I initially noticed this issue with photos as previews no longer showed, but repeated with xls and pdf.
  • I shut down my PC in case it was the cause in some way, e.g. an undetected virus.
  • I then tried from my wife's PC and was able to recreate the corruption of files, almost at will.
  • I'm now convinced it's a NAS problem, rather than a PC problem, but maybe I am wrong!
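
For anyone wanting to repeat the checksum test above, here is a minimal sketch of one way to do it. It is not the tool used in this thread; the function names (`build_manifest`, `verify`) are made up for illustration, and SHA-256 is an arbitrary choice of hash. Build the manifest from the source folder on the PC, then verify the copy on the NAS share immediately and again some time later.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos or PDFs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```

Calling `verify(nas_folder, build_manifest(pc_folder))` returns an empty list while the copies are intact, and the list of affected files once they go bad.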

So, I dived into the logs and found quite a few lines like the following in kernel.log:

 

Aug 26 17:54:53 NETDISK kernel: BTRFS error (device md124): bdev /dev/md127 errs: wr 95, rd 333, flush 0, corrupt 0, gen 0

 

Which of course looks bad, and is probably why I am getting corrupted files.
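
To count how often these lines occur and whether the numbers are growing, a small parser can help. The sketch below assumes every line has exactly the layout of the one quoted above; the regular expression and the name `parse_btrfs_errs` are illustrative only. As far as I know, the fields correspond to the cumulative per-device counters btrfs keeps (write I/O errors, read I/O errors, flush errors, checksum corruption, generation mismatches), so a line with non-zero `wr`/`rd` but `corrupt 0` points at I/O errors from the underlying device, not checksum mismatches.

```python
import re

# Matches the "bdev ... errs:" summary line as it appears in kernel.log above.
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<fs_dev>\S+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line: str):
    """Return the device names and error counters from one log line, or None."""
    match = BTRFS_ERRS.search(line)
    if match is None:
        return None
    # Counters become ints; the device names stay as strings.
    return {
        key: int(value) if value.isdigit() else value
        for key, value in match.groupdict().items()
    }


if __name__ == "__main__":
    sample = (
        "Aug 26 17:54:53 NETDISK kernel: BTRFS error (device md124): "
        "bdev /dev/md127 errs: wr 95, rd 333, flush 0, corrupt 0, gen 0"
    )
    print(parse_btrfs_errs(sample))
```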

 

Any ideas on how I can identify the root cause and fix this? Right now I cannot trust anything written to the NAS.

 

Luckily, I think I caught this early and have other copies of the files, or can re-create any lost data easily.

 

Message 1 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue


@aks-2 wrote:

 

Aug 26 17:54:53 NETDISK kernel: BTRFS error (device md124): bdev /dev/md127 errs: wr 95, rd 333, flush 0, corrupt 0, gen 0

 

Which of course looks bad, and is probably why I am getting corrupted files.

 

 


Are you also seeing disk errors?  Or just btrfs errors on the RAID array?

Message 2 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

No disk errors in the logs, or reported in the dashboard logs page.

Message 3 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue


@aks-2 wrote:

No disk errors in the logs, or reported in the dashboard logs page.


Have you run the disk test from the volume settings wheel lately?

 

One option is to do a factory reset (or destroy/recreate the volume), and see if the sync succeeds. Are you using X-RAID?  Or FlexRAID with multiple volumes.

 

Message 4 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

Recent maintenance has been:

  • scrub (two weeks ago)
  • disk test (two weeks ago)
  • firmware upgrade to 6.10.9 (last week)
  • defragmentation (last week)

I am using X-RAID, with 4TB + 4TB + 8TB + 8TB, and been stable for a very long time.

 

I wondered if the firmware upgrade and/or defragmentation might be the cause, but I observe others (including yourself) at least using 6.10.9.

 

Message 5 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue


@aks-2 wrote:

I am using X-RAID, with 4TB + 4TB + 8TB + 8TB, and been stable for a very long time.

 


Errors on md124 suggest that you've vertically expanded the volume at least 3 times (md124, md125, md126, and md127 all exist).

 

Normally I'd expect disk errors to go along with the btrfs errors.

 

How much free space do you have on the volume?

 


@aks-2 wrote:

 

I wondered if the firmware upgrade and/or defragmentation might be the cause, but I observe others (including yourself) at least using 6.10.9.

 


I have several ReadyNAS running 6.10.9, and haven't seen any signs of BTRFS problems.

Message 6 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

Thanks @StephenB , yes this NAS has been expanded at least twice, although during the last upgrade I did upgrade 1 x 2TB and 1 x 3TB both to 8TB, so expansion probably happened on each upgrade (even though I saw it as one).

 

Right now, I have 7TB free of 14.54TB reported on the dashboard.
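
As a sanity check on the 14.54TB figure, here is a rough calculation. It relies on two assumptions that are not confirmed anywhere in this thread: the common rule of thumb that single-redundancy X-RAID capacity is the sum of the disks minus the largest disk, and that the dashboard reports binary units (TiB) while the drives are sold in decimal TB.

```python
def xraid_usable_tb(disks_tb):
    """Rule-of-thumb usable capacity (decimal TB): sum of disks minus the largest."""
    return sum(disks_tb) - max(disks_tb)


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable_tb = xraid_usable_tb([4, 4, 8, 8])   # the four drives listed earlier
usable_tib = tb_to_tib(usable_tb)
print(f"{usable_tb} TB usable is about {usable_tib:.2f} TiB")
```

That works out to 16 TB, or roughly 14.55 TiB, which is close to what the dashboard shows once some filesystem overhead is allowed for.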

Message 7 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue


@aks-2 wrote:

 

Right now, I have 7TB free of 14.54TB reported on the dashboard.


So lack of free space can be ruled out for sure.

Message 8 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

Absolutely:

root:/# df -h -T
Filesystem     Type      Size  Used Avail Use% Mounted on
udev           devtmpfs   10M  4.0K   10M   1% /dev
/dev/md0       ext4      3.7G  653M  2.9G  19% /
tmpfs          tmpfs    1009M     0 1009M   0% /dev/shm
tmpfs          tmpfs    1009M  488K 1009M   1% /run
tmpfs          tmpfs     505M  8.6M  496M   2% /run/lock
tmpfs          tmpfs    1009M     0 1009M   0% /sys/fs/cgroup
/dev/md127     btrfs      15T  7.6T  7.1T  52% /data
/dev/md127     btrfs      15T  7.6T  7.1T  52% /apps
/dev/md127     btrfs      15T  7.6T  7.1T  52% /home
Message 9 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue

Try running smartctl -x on each drive, and see if there are any disk errors that for some reason aren't in the logs.

Message 10 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

Appreciate the ongoing assistance/ideas, thank you.

 

I ran:

smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/sdc -d scsi # /dev/sdc, SCSI device
/dev/sdd -d scsi # /dev/sdd, SCSI device

 

Then I ran smartctl -x for each drive. All error entries report zero, and the SMART tests for each drive report passed.
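
For anyone repeating this on several drives, the two steps can be scripted. This is only a sketch: the helper names are hypothetical, it assumes the `smartctl --scan` output has the same shape as shown above, and `run_all` needs root and smartmontools on the box, so it is defined here but never called.

```python
import subprocess


def devices_from_scan(scan_output: str) -> list[str]:
    """Extract device paths from `smartctl --scan` output, ignoring '#' comments."""
    devices = []
    for line in scan_output.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            devices.append(line.split()[0])
    return devices


def smartctl_x_commands(devices: list[str]) -> list[list[str]]:
    """Build one `smartctl -x <device>` command per device."""
    return [["smartctl", "-x", dev] for dev in devices]


def run_all(devices: list[str]) -> None:
    """Run each command and print its report (requires root on the NAS)."""
    for cmd in smartctl_x_commands(devices):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(f"== {' '.join(cmd)} (exit status {result.returncode}) ==")
        print(result.stdout)
```

smartctl's exit status is a bitmask, so a non-zero value for any drive is worth a closer look even when the report itself looks clean at a glance.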

 

I am ready to rebuild/restore this NAS, but I worry that I have not yet identified the root cause, and it could therefore be some failure that I've not detected, but one that will repeat going forwards. Of course, it could 'just' be a software issue, but that's very strange too.

 

Of course, any more ideas to poke at would be most welcome!

Message 11 of 22
schumaku
Guru

Re: ReadyNAS 214 data corruption issue

...the moment I would wish having a NAS with ECC for memory and CPU 8-/

Message 12 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

For testing, I have removed all the disks, and inserted some old 3TB drives.

 

I did a factory default, copied some files, and so far no corruptions observed. It is early days, as sometimes the corruptions took "a while" to be observable, but so far it appears reliable.

 

To me, this tends to confirm either the system software had become unstable for some reason, or I have a dodgy drive that is not showing any errors through SMART. I will run the manufacturer tests on each drive in the coming days.

 

Copying and duplicating files at will currently, and no errors are appearing, no files appear corrupt (checksum checking is passing).

 

The volume is resyncing, it will take another 18h according to the dashboard.

Message 13 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue


@aks-2 wrote:

 

To me, this tends to confirm either the system software had become unstable for some reason, or I have a dodgy drive that is not showing any errors through SMART. I will run the manufacturer tests on each drive in the coming days.

 


You can't easily rule out memory, so definitely keep that possibility in mind.

Message 14 of 22
CR_MHC
Aspirant

Re: ReadyNAS 214 data corruption issue

I have been having the same issues: my RN204 is corrupting Excel and Word docs. You guys seem way more techie than me. Does anyone have a fix, or a recovery option? Nothing I have tried has repaired the files.

 

CR_MHC_0-1693329804799.png

 

 

this is what i get...

Message 15 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue


@CR_MHC wrote:

I have been having the same issues, RN204 corrupting excel and word docs,

 


If you copy the xlsx file to your PC and open it there, are you seeing the same problem?

Message 16 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

You should store your files on an additional device for now, i.e. your PC local drive.

 

I found existing files already on my NAS in general did not get corrupted, only files edited or newly added. I did find copying a file within the same directory on the NAS also corrupted both the source and copy, so avoid that action.

 

Once a file is corrupted, you will need to go back to a previous good copy from another device. The corrupted files appear unrecoverable. Luckily I did have a backup, and I noticed this problem within a day or two, so could recreate what I needed.

 

Which version of the OS are you running, i.e. have you recently upgraded?

Have you changed anything else recently?

 

I decided to rebuild my NAS from factory defaults, and now I'm restoring files - around 7.5TB, so it will take a few days. My NAS seems stable since clearing it, not ideal, and for sure there may still be an underlying issue that I've not seen yet on the rebuild.

Message 17 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue

@CR_MHC - are you using a Mac?  Or are you using a Windows PC?

Message 18 of 22
Sandshark
Sensei

Re: ReadyNAS 214 data corruption issue

While I have no solution, I have an idea of where to look for the problem.  Excel (and all Microsoft Office products) create a temporary file in the directory (and thus on the device) from which they were opened.  Further, when saving a modified file, they concatenate (not combine) changes with the original, saving an original plus changes instead of a modified original.  So if something happens to that temporary file (which is set as invisible and starts with a tilde (~), at least on a Windows machine, which may or may not be pertinent), the end result can be a corrupted file.  Or, if the last part of the saved file, which has the changes, is not properly saved, the entire document can also be corrupted.

 

This would not seem to explain the issue with "newly created" Excel files, but if you save as you go (manually or automatically), it could still have the original plus changes format.

 

This experiment might help get to the bottom of it:  Copy a file that was created on a local drive to the NAS.  Open and make a minor modification to the file on the NAS and save.  If you then find the modified file to be corrupt, do a binary compare of the original on the local device to the corrupt one on the NAS.
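
If no binary compare tool is at hand, something like the following sketch would do for that experiment. The name `first_difference` is made up for illustration; it simply reports the byte offset where the two files first diverge, which shows whether the damage starts at the beginning, at the appended changes, or at a length mismatch.

```python
from pathlib import Path


def first_difference(a: Path, b: Path, chunk_size: int = 1 << 16):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if chunk_a != chunk_b:
                for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                    if x != y:
                        return offset + i
                # One chunk is a prefix of the other: the files differ in length.
                return offset + min(len(chunk_a), len(chunk_b))
            if not chunk_a:
                return None  # both files ended with no difference found
            offset += len(chunk_a)
```

An offset of 0 would suggest the whole file was replaced, while a large offset near the end would fit the "changes not properly saved" theory.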

 

As a not-so-great work-around, I believe using "save as" always creates a new original, not an original plus changes.  (I know it used to, but I've not checked recently).

Message 19 of 22
StephenB
Guru

Re: ReadyNAS 214 data corruption issue


@Sandshark wrote:

While I have no solution, I have an idea of where to look for the problem.  Excel (and all Microsoft Office products) create a temporary file in the directory (and thus on the device) from which they were opened.  Further, when saving a modified file, they concatenate (not combine) changes with the original, saving an original plus changes instead of a modified original.  So if something happens to that temporary file (which is set as invisible and starts with a tilde (~), at least on a Windows machine, which may or may not be pertinent), the end result can be a corrupted file.  Or, if the last part of the saved file, which has the changes, is not properly saved, the entire document can also be corrupted.

 


If I remember correctly, there have been similar symptoms with some Macs related to the use of "apple fruit"

Message 20 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

OK, an update after several months without issues.

Following on from installing temp disks to test the stability of my RN214, it had zero apparent errors during that time.

I then re-inserted all the original disks, powered up, and ran some simple copy tests. Immediate file corruptions were again observed. Great, the problem is still reproducible 😉!

 

I had to wipe the system anyway due to several cloud accounts that couldn't be removed without a full reset/re-installation.

 

After this complete reset, I did a few tests and all appeared to be well. I obviously had to restore all my data from backups, then I ran rsync with file check to alternative backups, and all was still well.

 

I have had stability for 3 months, but I still have no idea what on earth happened to create the file corruptions in the first place.

 

The corrupted (xls/jpg) files I copied back to my PC for analysis were completely filled with zeroes. Fortunately it was only a handful of files, which I recovered all bar one picture.
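
For anyone wanting to check their own files for the same symptom, a sketch along these lines would flag zero-filled files. The function name `zero_fraction` is hypothetical; a result of 1.0 means every byte in the file is zero.

```python
def zero_fraction(path, chunk_size: int = 1 << 20) -> float:
    """Return the fraction of bytes in the file that are 0x00 (0.0 if empty)."""
    total = 0
    zeros = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            total += len(chunk)
            zeros += chunk.count(0)  # bytes.count accepts an int byte value
    return zeros / total if total else 0.0
```

Normal xlsx, jpg and PDF files are compressed or structured data, so a value at or very near 1.0 is a strong hint that the contents were lost and never written properly.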

Message 21 of 22
aks-2
Apprentice

Re: ReadyNAS 214 data corruption issue

As this issue was so weird/unexplained, and horrific, due to data corruption, I wanted to share a further update: no further issues experienced, so I think we can safely say this was an odd incident indeed.

The RN214 continues to be used every day, the exact same drives are all good, and no errors to report. Phew 😄!

 

Message 22 of 22