RN316 - Corrupt Files

Mook2
Aspirant

Not sure this got posted. I can't seem to find it.

 

Two 316 ReadyNAS boxes.
One Production, one Backup.
Same firmware: 6.10.2
Prod box has six WD Red Pro 6TB drives.
Backup box has WD Red 6TB drives (not Pro).

Both boxes used for media files.

Prod box is missing FLAC files. When I open a file with MP3tag the file (song) shows as a FLAC file with no tag (only filename). It will not play. Foobar gives this message:

 

Unable to open item for playback (Unsupported format or corrupted file):
"\\316-prod\db-prod\DB-BLs\Beach Boys\- Goodbye Surfing Hello God - CD 1.5\01 - Hi We're The Beach Boys - Beach Boys.flac"

 

There is no file there but the 316 Production box thinks there is. It shows as a file.

 

Same file plays fine on the Backup box.

 

I'm going to call these unplayable files "ghost" files.

 

My first step was to delete the ghost files from the 316 Production box and restore the same file from the Backup box. The copy works fine BUT the copied file now on the Production box doesn't necessarily play. It's another ghost file. Some play, some do not.

 

The MAIN share is Music. In Music there are 10 sub-shares by category. In the sub-share I'm working there are 40 sub-shares (Artist). In all there are 9500 files (songs).

 

Of those 9500 there are 154 ghost files.
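
A count like this can be reproduced without opening each song in a player. The following is a minimal Python sketch, assuming the share is reachable as an ordinary folder path and that a healthy file begins with the four-byte `fLaC` stream marker. The function name `find_suspect_flacs` is made up for illustration. A FLAC file with an ID3 tag prepended would also be flagged, so treat the output as a list of candidates rather than a verdict.

```python
from pathlib import Path

FLAC_MAGIC = b"fLaC"  # a native FLAC stream begins with these four bytes


def find_suspect_flacs(root):
    """Return .flac files under root whose first four bytes are not the FLAC marker."""
    suspects = []
    for path in sorted(Path(root).rglob("*.flac")):
        try:
            with open(path, "rb") as fh:
                header = fh.read(4)
        except OSError:
            # A file that is listed but cannot be read is also worth reporting.
            suspects.append(path)
            continue
        if header != FLAC_MAGIC:
            suspects.append(path)
    return suspects
```

Calling `find_suspect_flacs` on the artist folder and printing the returned paths gives a list that can be compared against the same scan run on the Backup box.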

 

There are no log file alert messages. The system indicates it's healthy. BUT when I select Diagnostics in RAIDar I get this message:

 

Successfully completed diagnostics
System
No errors found.
Logs
2020-01-23 18:46:01: raidard[2978]: segfault at 7ffedc880000 ip 0000000000401d20 sp 00007ffedc87dab8 error 4 in raidard[400000+6000]

 

I'm stumped. Is this a drive error or a memory problem? Searching online gives an answer which is above my pay grade. Why doesn't the system place a message in the log file area?

 

Questions:

How do I determine what the real issue is? (Bad disks? How do I know? Can I fix?)
What's the best way to attack this issue? (I'll assume a full restore from Backup but I MUST solve the issue first?)
Finally, what should I NOT do?

 

Dave-

 

 

 

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 1 of 14
StephenB
Guru

Re: RN316 - Corrupt Files

I suggest that you download the full log zip file from the production NAS.  Put the zip into cloud storage (google drive, dropbox, etc), and send a download link to the mods ( @JohnCM_S or @Marc_V ) and ask them to analyze the logs.

 

Don't post the link publicly.  Instead use the private message (PM) facility in the forum.  You send a PM by clicking on the envelope icon in the upper right of the forum page.

 

Note that if you purchased the NAS new between 1 June 2014 and 31 May 2016, you have lifetime chat support at my.netgear.com.  You could also use that.

Message 2 of 14
Mook2
Aspirant

Re: RN316 - Corrupt Files

I received an email from Netgear asking if my problem is solved. I haven't heard from anyone after following Stephen's reco so the answer is "no".

 

What I've done in the interim:

Verified the files on my Backup NAS. 600,000+ files. 2 were corrupt. I had gotten the segfault on that box as well. The corrupt files were from 2012 so I'm discounting those since they could have been corrupt for 8 years.

 

I reinstalled the OS. No improvement.

I ran a memory test. 7 passes (8 hours). No errors.

Last night I ran a disk test. I'm not sure where to find the results of that in the log but things seem fine. Production box booted up fine after the disk test. No obvious errors on the 6 disks.

 

Here's what I did between each of the above steps:

 

I found a folder (artist) with 21 sub-folders (albums) with 592 files (songs).

 

Before the test 63 of 592 files were corrupt. I deleted those and copied from backup after verifying the backup files were okay. After that copy 63/592 were bad with just a straight copy.

 

Then I did the same after the memory test:  51/592 were bad. Again just a straight copy.

 

After the Disk test I did the same and 45/592 were bad.

 

I was thinking the results would always be the same. Silly me.

 

What causes "good" files to be corrupt after a simple copy???

 

I would think my next step(s) would be:

 

Test the drives with WD Lifeguard -or-

Factory reset

 

I surely would like for Netgear to give me some direction here. The data is bad but perhaps there's something that needs to be studied/corrected before I spend hours doing anything else.

 

Any ideas?

 

Dave-

 

 

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 3 of 14
StephenB
Guru

Re: RN316 - Corrupt Files


@Mook2 wrote:

 

I surely would like for Netgear to give me some direction here. The data is bad but perhaps there's something that needs to be studied/corrected before I spend hours doing anything else.

 


This is a community forum, and not Netgear support.  One option is for you to send @JohnCM_S or @Marc_V (Netgear mods) a downloadable link to your full log zip file, and ask them to analyze it.  Do that with a private message (PM) by clicking on the envelope link in the upper right of the forum.  

 

You could also use paid support via my.netgear.com.

 


@Mook2 wrote:

 

I was thinking the results would always be the same. Silly me.

 

What causes "good" files to be corrupt after a simple copy???

You could test using a utility like TeraCopy, which has a verify option.  One question here is whether the issue lies with your network or with the NAS.
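
If you would rather script the check, the same copy-then-verify idea can be sketched in Python. This is only an illustration, not how TeraCopy itself works: the names `sha256_of` and `copy_and_verify` are hypothetical, and it assumes both the backup folder and the NAS share are reachable as ordinary paths from the PC running it.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large media files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir):
    """Copy every file under src_dir to dst_dir and return the relative paths
    whose destination hash does not match the source hash."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    mismatches = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(rel)
    return mismatches
```

One caveat: a read-back straight after the write may be answered from a cache on the PC rather than from the NAS disks, so a mismatch is strong evidence but a clean pass is weaker. Re-running only the hash comparison later, or from a second PC, is a more convincing check.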

 

I am also thinking that you might want to create a fresh test share, and see if you get the same results when you copy to the test share.

 

But to answer your question, there are lots of causes, including:

  • a corrupted file system
  • bad memory in the NAS
  • network issues
  • failing disks

 

 


@Mook2 wrote:

 

Test the drives with WD Lifeguard -or-

Factory reset

 


This could also be -and-

 

Powering down the NAS and doing the long non-destructive test on each disk in Lifeguard is reasonable (though time consuming). 

 

I'd look at pending sector counts and reallocated sector counts in disk_info.log, and for any disk or btrfs errors you see in kernel.log or system.log.

Message 4 of 14
Mook2
Aspirant

Re: RN316 - Corrupt Files

Hi Stephen, thanks for your reply.

 

I did as you suggested and sent the logs and a PM to one of the contacts you listed.

 

I do understand this is a user forum and appreciate all the support given here, I wanted to make that clear. I'm just a tad frustrated. I have limited tech knowledge and that can be difficult for any user. Plus the amount of time it takes to do all these tests is, well, time-consuming. 🙂

 

Right now I am backing up my backup in case my backup box has issues.

 

I also didn't know I can get paid support. That's news to me and is an option I'll explore if necessary.

 

As to your new suggestions. I didn't think of creating a new share and copying to that. I'll give that a shot.

 

I also looked in the logs you recommended after my disk test.

 

In the system log I found no errors.

 

In the kernel log I see this message:

 

Feb 05 19:14:29 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55441, rd 212, flush 0, corrupt 0, gen 0

(wr continues to 55450)

 

Plus other btrfs errors:

 

Feb 05 20:09:22 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55571, rd 212, flush 0, corrupt 0, gen 0

 

Feb 05 20:30:41 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55583, rd 212, flush 0, corrupt 0, gen 0

 

Feb 05 20:44:00 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55713, rd 212, flush 0, corrupt 0, gen 0

 

Feb 05 20:44:05 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55731, rd 212, flush 0, corrupt 0, gen 0

 

Feb 05 20:44:16 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55751, rd 212, flush 0, corrupt 0, gen 0

 

Feb 05 20:44:43 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55778, rd 212, flush 0, corrupt 0, gen 0

 

Feb 05 20:44:55 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55790, rd 212, flush 0, corrupt 0, gen 0


Feb 05 20:46:04 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55819, rd 212, flush 0, corrupt 0, gen 0

 

I'm not sure what this all means and now I'm unsure how to proceed.

 

Ideas, anyone?

 

I did read this thread about BTRFS:

 

https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/RN104-BTRFS-Read-Only-No-SMART-Erro...

 

and this one:

 

https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/BTRFS-is-killing-my-NAS/td-p/141497...

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 5 of 14
StephenB
Guru

Re: RN316 - Corrupt Files


@Mook2 wrote:

Hi Stephen, thanks for your reply.

 

I did as you suggested and sent the logs and a PM to one of the contacts you listed.

 

I do understand this is a user forum and appreciate all the support given here, I wanted to make that clear. I'm just a tad frustrated. I have limited tech knowledge and that can be difficult for any user. Plus the amount of time it takes to do all these tests is, well, time-consuming. 🙂

 

Right now I am backing up my backup in case my backup box has issues.

 

I also didn't know I can get paid support. That's news to me and is an option I'll explore if necessary.

 

As to your new suggestions. I didn't think of creating a new share and copying to that. I'll give that a shot.

 

I also looked in the logs you recommended after my disk test.

 

In the system log I found no errors.

 

In the kernel log I see this message:

Feb 05 20:09:22 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55571, rd 212, flush 0, corrupt 0, gen 0
Feb 05 20:30:41 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55583, rd 212, flush 0, corrupt 0, gen 0
Feb 05 20:44:00 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55713, rd 212, flush 0, corrupt 0, gen 0
Feb 05 20:44:05 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55731, rd 212, flush 0, corrupt 0, gen 0
Feb 05 20:44:16 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55751, rd 212, flush 0, corrupt 0, gen 0
Feb 05 20:44:43 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55778, rd 212, flush 0, corrupt 0, gen 0
Feb 05 20:44:55 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55790, rd 212, flush 0, corrupt 0, gen 0
Feb 05 20:46:04 316-PROD kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55819, rd 212, flush 0, corrupt 0, gen 0

 


I haven't seen this particular error here before, but it looks to me like the wr counts are btrfs write error counts (which are increasing). 

 

That is consistent with the corrupted files you are seeing when you write to the NAS.
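
To see how fast the counters are moving without reading the log by eye, here is a rough Python sketch. It assumes the kernel.log lines look exactly like the ones quoted in this thread; the regular expression and the names `parse_error_counters` and `counter_growth` are illustrative and have only been matched against that format.

```python
import re

# Matches lines such as:
# "... kernel: BTRFS error (device md126): bdev /dev/md127 errs: wr 55571, rd 212, flush 0, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<fs>\S+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_error_counters(lines):
    """Return one dict of integer counters per matching log line, in log order."""
    records = []
    for line in lines:
        match = ERRS_RE.search(line)
        if match:
            records.append({name: int(match[name]) for name in COUNTERS})
    return records


def counter_growth(records):
    """Return how much each counter rose between the first and last record."""
    if not records:
        return {}
    first, last = records[0], records[-1]
    return {name: last[name] - first[name] for name in COUNTERS}
```

Fed the lines quoted above from 20:09:22 to 20:46:04, the wr counter rises by 248 (55571 to 55819) while rd stays at 212 and the other counters stay at 0.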

 

You aren't seeing any disk errors?  Just these BTRFS ones?

 

 

Message 6 of 14
Mook2
Aspirant

Re: RN316 - Corrupt Files

I have a question.

 

If this corruption is caused by the way BTRFS "works" should I just do a factory reset and begin all over again? I have no issue with that. I'm concerned that this was a blip in the way the file system is handled and not a hardware problem at all.

 

Is that a valid assumption?

 

In addition, is this a foreseeable problem on that specific NAS or just a screw up that can happen on any NAS at any time?

 

Dave-

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 7 of 14
Mook2
Aspirant

Re: RN316 - Corrupt Files

Correct, No disk errors shown in the logs.

 

 

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 8 of 14
StephenB
Guru

Re: RN316 - Corrupt Files


@Mook2 wrote:

should I just do a factory reset and begin all over again? I have no issue with that. I'm concerned that this was a blip in the way the file system is handled and not a hardware problem at all.

I have seen BTRFS errors occur when the underlying disks have problems, and I've also seen file system corruption when the system is unexpectedly shut down (due to lost writes). I haven't seen this particular symptom before - either posted here or on my own systems.

 

So it would be good to have Netgear review the logs, and see if they can explain what happened.  

 

However, as far as recovery goes, I would test the disks and then do a factory reset. That would ensure that you have a clean file system.

Message 9 of 14
Mook2
Aspirant

Re: RN316 - Corrupt Files

Stephen, I'll send the log after I do the disk test just to make sure it's not the disks. If you haven't seen the issue in this manner before I'll follow your advice. Maybe I can learn something.

 

Disk test

Send log

Factory reset

Restore.

 

Ugh! Extended disk test. 8.5 hrs * 6 disks = 51 hours.

 

Results to follow.

 

To keep the thread alive I may post after each disk!

 

Thanks again,

Dave-

 

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 10 of 14
Mook2
Aspirant

Re: RN316 - Corrupt Files

I've tested all 6 disks with Lifeguard (Extended Test).

 

All passed.

 

Placed them back in the 316 and did a factory default. Copied a folder that had corrupt files on the Backup 316. All files copied okay. Kernel log doesn't show any BTRFS errors.

 

I would imagine things are fine. Now I'll wait 24 hours while the 316 resyncs zero data.

 

Dave-

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 11 of 14
StephenB
Guru

Re: RN316 - Corrupt Files


@Mook2 wrote:

I've tested all 6 disks with Lifeguard (Extended Test).

 

All passed.

 

Placed them back in the 316 and did a factory default. Copied a folder that had corrupt files on the Backup 316. All files copied okay. Kernel log doesn't show any BTRFS errors.

 

I would imagine things are fine. Now I'll wait 24 hours while the 316 resyncs zero data.

 


After you restore the data, I suggest setting up a maintenance schedule using the volume settings wheel.  Personally I run each option (disk test, scrub, balance, and defrag) once every three months - spacing them out over the quarter.  The timing is a bit arbitrary, though I don't recommend doing scrubs more frequently than that.

Message 12 of 14
Mook2
Aspirant

Re: RN316 - Corrupt Files

Hi Stephen, Good idea. I must admit I've been neglecting those things.

 

So far I'm at 50% resynced.

 

One thing I tried that did not work was I did the first factory default using a scratch disk. After that I replaced the 6 original disks after a power down. After that I copied some good files to the factory defaulted NAS. They copied as corrupt once again. So then I did a factory default  with all 6 disks, copied a few corrupt files and they copied okay. Seems it was still in the file management system even after the first factory default.

 

Thanks for your advice.

 

Dave-

Model: RN31600|ReadyNAS 300 Series 6- Bay (Diskless)
Message 13 of 14
StephenB
Guru

Re: RN316 - Corrupt Files


@Mook2 wrote:

 

One thing I tried that did not work was I did the first factory default using a scratch disk.


A factory default does a fresh factory install onto the disks - it doesn't do anything with the flash.  The factory install includes installing the OS onto the disks.

 

So doing a factory default on a scratch disk and then switching back to the original disks has no effect.  The only reason to do it is to test the chassis.

Message 14 of 14