
Re: RR3312 Copying Errors

tbspray7
Aspirant

RR3312 Copying Errors

Hello,

 

We have a ReadyNAS 3312 (RR3312 with 12 bays, firmware 6.10.3) that has been throwing a volume alert message reading: "The volume data encountered an error and was made read-only. It is recommended to backup your data." when large amounts of data are copied onto it. This has happened three times in the past few weeks and occurred twice in one day. Following the last alert, we left it in read-only mode until we could figure out the issue, which we have so far been unable to do.

 

The following error message was in the kernel.log:

 

Jul 01 10:07:00 XXXXXXXXXX kernel: ------------[ cut here ]------------
Jul 01 10:07:00 XXXXXXXXXX kernel: WARNING: CPU: 1 PID: 4187 at fs/btrfs/extent-tree.c:7004 __btrfs_free_extent+0xaf4/0xb31()
Jul 01 10:07:00 XXXXXXXXXX kernel: BTRFS: Transaction aborted (error -28)
Jul 01 10:07:00 XXXXXXXXXX kernel: Modules linked in: vpd(PO)
Jul 01 10:07:00 XXXXXXXXXX kernel: CPU: 1 PID: 4187 Comm: btrfs-transacti Tainted: P O 4.4.190.x86_64.1 #1
Jul 01 10:07:00 XXXXXXXXXX kernel: Hardware name: NETGEAR ReadyNAS 3312/ReadyNAS RR3312, BIOS RR3312v113 01/25/2017
Jul 01 10:07:00 XXXXXXXXXX kernel: 0000000000000000 ffff8801baf7fa90 ffffffff8836360d ffff8801baf7fad8
Jul 01 10:07:00 XXXXXXXXXX kernel: 0000000000000009 ffff8801baf7fac8 ffffffff880dd1a8 ffffffff88282de0
Jul 01 10:07:00 XXXXXXXXXX kernel: 00000000ffffffe4 ffff880157a6ba10 ffff88026675a1e0 0000000000000000
Jul 01 10:07:00 XXXXXXXXXX kernel: Call Trace:
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff8836360d>] dump_stack+0x4d/0x63
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff880dd1a8>] warn_slowpath_common+0x8f/0xa8
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff88282de0>] ? __btrfs_free_extent+0xaf4/0xb31
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff88063f94>] warn_slowpath_fmt+0x47/0x49
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff882823d2>] ? __btrfs_free_extent+0xe6/0xb31
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff88282de0>] __btrfs_free_extent+0xaf4/0xb31
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff882e0154>] ? btrfs_merge_delayed_refs+0x60/0x43c
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff8828690c>] __btrfs_run_delayed_refs+0xa78/0xcbf
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff882886da>] btrfs_run_delayed_refs+0x66/0x24f
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff882892e7>] btrfs_write_dirty_block_groups+0x124/0x316
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff882888a8>] ? btrfs_run_delayed_refs+0x234/0x24f
Jul 01 10:07:00 XXXXXXXXXX kernel: [<ffffffff8829906d>] commit_cowonly_roots+0x208/0x2ac

 

The other two times it went down, the kernel log read the same. The error seems to indicate there is no space left, but in actuality there is ~60 TB free. We haven't been able to find the right solution on the forums so far. Would anyone know how to handle this?
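
For anyone decoding the trace: the "error -28" in the "Transaction aborted" line is a negated errno value. A minimal Python sketch, assuming it is run on a Linux host (errno numbering is platform-specific) and using a hypothetical helper name, that looks up what the number means:

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Translate a kernel-style negative error code into its errno name and message.

    Hypothetical helper for illustration; the kernel reports errors as negated
    errno values, so the sign is dropped before the lookup.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


# -28 is the value from the "BTRFS: Transaction aborted (error -28)" line.
print(describe_kernel_error(-28))
```

On Linux this resolves to ENOSPC, which is why the message reads like an out-of-space condition even though the volume reports free capacity.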

 

Message 1 of 11
StephenB
Guru

Re: RR3312 Copying Errors

It looks more like a corrupted file system to me.  Is this happening on all shares, or just one in particular?

Message 2 of 11
Sandshark
Sensei

Re: RR3312 Copying Errors

When the OS puts the volume in read-only mode, it is trying to protect the volume from further damage.  Because it's read-only, the available space will show as zero, though if it really did fill up, that could be the root cause of the issue.

 

DO NOT reboot the NAS.  It may come back up with a dead volume.

DO NOT attempt to make the volume read/write.  It likely won't work, and you'll just be undoing the protection and risking further damage to the volume, including possible total loss.

DO do what it told you to do -- back up your files now.  And back up your NAS configuration if it'll be painful to re-create.  Hopefully, you have most of the files already backed up.  Because some files may also now be corrupt, don't destroy your old backup if at all possible.

 

There is something critically wrong with the volume that only destroying it and re-building it is likely going to solve.  Netgear paid support may be able to do something, but it's doubtful.  But you'll want that backup even before they try.  If there is only one volume, a factory default is the best solution.  But before you do that, you should also check to see if any of the drives has a problem that's the root cause of the issue you are encountering, and replace any that need it.

Message 3 of 11
tbspray7
Aspirant

Re: RR3312 Copying Errors

Thanks for the reply! Both shares are on the same volume, so it's happening on both as far as we know.

Message 4 of 11
tbspray7
Aspirant

Re: RR3312 Copying Errors

We don't believe the volume filled up; the data being moved over was on the order of gigabytes each time it happened. So far, none of the drives have indicated any errors.
 
How do we determine the root cause of the volume error? We prefer not to proceed with the rebuild until we know the root cause and can ensure it won't happen again in the future. We do have backups of all the data, but there is so much data involved that it takes a month to restore from the backups and validate the accuracy of the restoration.  This ReadyNAS is less than a year old, so hopefully we don't have to completely rebuild every year.
 
Thank you for the response! I greatly appreciate it. 
Message 5 of 11
Sandshark
Sensei

Re: RR3312 Copying Errors

Logs in the downloaded .zip file are a good place to look to see if there is a hard drive that's giving you problems.  Unfortunately, even new drives can fail.
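
One way to skim a kernel log from that .zip is to count the lines that look suspicious. This is only a rough sketch: the regular expressions are illustrative assumptions based on the messages quoted in this thread, not an official or exhaustive list, and `summarize_log` is a hypothetical helper name.

```python
import re
from collections import Counter

# Illustrative patterns only; extend them to match whatever your own logs contain.
PATTERNS = {
    "btrfs_abort": re.compile(r"BTRFS.*Transaction aborted \(error (-?\d+)\)"),
    "kernel_warning": re.compile(r"kernel: WARNING:"),
    "ata_error": re.compile(r"kernel: ata\d+.*(error|failed)", re.IGNORECASE),
}


def summarize_log(lines):
    """Count how many log lines match each pattern label."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# Two lines copied from the trace earlier in this thread.
sample = [
    "Jul 01 10:07:00 XXXXXXXXXX kernel: WARNING: CPU: 1 PID: 4187 at "
    "fs/btrfs/extent-tree.c:7004 __btrfs_free_extent+0xaf4/0xb31()",
    "Jul 01 10:07:00 XXXXXXXXXX kernel: BTRFS: Transaction aborted (error -28)",
]
print(summarize_log(sample))
```

To run it against a real file, pass an open file object instead of the sample list, for example `summarize_log(open("kernel.log"))`, where the path is whatever the extracted log is called on your machine.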

 

I'm assuming that in a use case like yours, you have a UPS that is monitored by the NAS, so a power interruption during a write operation isn't a possibility.

Message 6 of 11
tbspray7
Aspirant

Re: RR3312 Copying Errors

Hmm. It doesn't look like there are any disk errors recorded. The file repeats the following 12 times as expected (given the Channel and serial number change). 

 

Device: sdd
Controller: 0
Channel: 0
Model: ST16000VN001-2RV103
Serial: ZL20BQB5
Firmware: SC61
Class: SATA
RPM: 7200
Sectors: 31251759104
Pool: data
PoolType: RAID 6
PoolState: 1
PoolHostId: a450594
Health data
ATA Error Count: 0
Reallocated Sectors: 0
Reallocation Events: 0
Spin Retry Count: 0
Command Timeouts: 0
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 35
Start/Stop Count: 12
Power-On Hours: 6753
Power Cycle Count: 12
Load Cycle Count: 43945
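
Checking twelve of these blocks by eye is error-prone, so here is a small sketch that parses the "Key: value" layout shown above and reports any health counter that is not zero. The function names and the choice of which counters to check are assumptions made for illustration; the sample text is an abbreviated copy of the block above.

```python
# Counters that should normally stay at zero on a healthy disk (an assumed list).
HEALTH_COUNTERS = (
    "ATA Error Count",
    "Reallocated Sectors",
    "Reallocation Events",
    "Spin Retry Count",
    "Command Timeouts",
    "Current Pending Sector Count",
    "Uncorrectable Sector Count",
)


def parse_disks(text):
    """Split the log into one dict per disk; a new record starts at each 'Device:' line."""
    disks = []
    for raw in text.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue  # lines such as 'Health data' carry no value
        key, value = key.strip(), value.strip()
        if key == "Device":
            disks.append({})
        if disks:
            disks[-1][key] = value
    return disks


def flag_problems(disks):
    """Return (device, counter, value) for every health counter that is non-zero."""
    problems = []
    for disk in disks:
        for counter in HEALTH_COUNTERS:
            value = int(disk.get(counter, "0"))
            if value != 0:
                problems.append((disk.get("Device"), counter, value))
    return problems


SAMPLE = """\
Device: sdd
Model: ST16000VN001-2RV103
Serial: ZL20BQB5
Health data
ATA Error Count: 0
Reallocated Sectors: 0
Reallocation Events: 0
Spin Retry Count: 0
Command Timeouts: 0
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
"""

print(flag_problems(parse_disks(SAMPLE)))
```

An empty list means none of the listed counters were raised, which matches what the block above shows for sdd.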

 

Correct, the server is on a UPS that is monitored by the NAS. 

 

 

Message 7 of 11
Sandshark
Sensei

Re: RR3312 Copying Errors

Well, then I can't point to anything that I think could have caused it, though I suppose there could still be an issue with the NAS hardware.  I can say that it's atypical and I don't think you should be concerned you'll be encountering it periodically.  I've used a variety of ReadyNAS over the years, including my current use of legacy rack-mount systems converted to OS6, and I have only seen this once.  And that was mostly my fault -- the eSATA cable between an RN516 and EDA500 was loose, causing a lot of data issues.  (I say mostly my fault because the choice of connector for eSATA is just a really bad engineering decision -- not Netgear's, mind you, but by whoever designed the interface standard.)

 

Occurrences of similar issues reported in this forum typically have an identifiable cause (usually a failing drive with at least ATA errors or lack of proper UPS protection).

 

Since I knew what caused it and that it was corrected, I thought I could save the volume.  That's where I got the experience to say don't even try.  No matter what I did, the volume was read-only again within minutes.  Since my corrupt volume was on the EDA, I could just destroy and re-create it and not touch the primary volume.  I feel your pain in the data restoration -- the EDA500 is extremely slow due to the eSATA interface, and I had 5x6TB in it in RAID5.  But it was mostly archival data, so that probably made it a little less painful than yours will be if you are without current working data for any period of time.

 

The volume rebuild process will also be a good gauge of whether you do have a hardware issue, so check the logs after that completes.

Message 8 of 11
tbspray7
Aspirant

Re: RR3312 Copying Errors

We doubted it was hardware, but that is still a small possibility.  Thank you for the advice and all of the help. We will try the rebuild and post back here if we get something figured out. Oddly enough, I have a friend who has seen this error thrown on their other two ReadyNAS servers, but has never been able to figure out why it happened on their end, only that it has occurred periodically.

Message 9 of 11
tbspray7
Aspirant

Re: RR3312 Copying Errors

A quick update if you are curious: We have a friend well-versed in Linux that has been trying to help us come up with a solution. We found a problem that sounds very similar to the one we are having on the btrfs site.

 

The problem is discussed on this open-source group's wiki (search for 'When you haven't hit the "usual" problem'):

https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_...

Others showing a similar error. 

https://bugzilla.kernel.org/show_bug.cgi?id=74101

 

We are trying to confirm if this is the problem or not, but it is going to take a while. 
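
As a way to test that hypothesis, the FAQ entry is about Btrfs reporting out-of-space when one chunk type (often metadata) is nearly full even though raw capacity remains. A rough sketch, assuming text in the "Type, profile: total=..., used=..." layout that `btrfs filesystem df` prints; the sample figures are hypothetical and are not taken from this NAS, and the function names are made up for illustration.

```python
import re

# Binary unit multipliers as they appear in the command's human-readable output.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
         "TiB": 1024**4, "PiB": 1024**5}
LINE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)$")


def to_bytes(number, unit):
    """Convert a value such as ('49.75', 'GiB') into a byte count."""
    return float(number) * UNITS[unit]


def usage_by_type(df_output):
    """Return the used/total ratio for each chunk type found in the text."""
    usage = {}
    for line in df_output.splitlines():
        match = LINE.match(line.strip())
        if match:
            kind, _profile, total, total_unit, used, used_unit = match.groups()
            total_bytes = to_bytes(total, total_unit)
            used_bytes = to_bytes(used, used_unit)
            usage[kind] = used_bytes / total_bytes if total_bytes else 0.0
    return usage


# Hypothetical sample values, chosen only to show nearly full metadata chunks.
SAMPLE = """\
Data, single: total=40.00TiB, used=39.50TiB
System, DUP: total=32.00MiB, used=4.00MiB
Metadata, DUP: total=50.00GiB, used=49.75GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
"""

for kind, ratio in usage_by_type(SAMPLE).items():
    print(f"{kind}: {ratio:.1%} of allocated chunks in use")
```

A metadata ratio close to 100% would be consistent with the FAQ scenario; a low ratio would suggest looking elsewhere.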

Message 10 of 11
StephenB
Guru

Re: RR3312 Copying Errors


@tbspray7 wrote:

 

The problem is discussed on this open-source group's wiki (search for 'When you haven't hit the "usual" problem'):

https://btrfs.wiki.kernel.org/index.php/FAQ#Help.21_Btrfs_claims_I.27m_out_of_space.2C_but_it_looks_...

Others showing a similar error. 

https://bugzilla.kernel.org/show_bug.cgi?id=74101

 

We are trying to confirm if this is the problem or not, but it is going to take a while. 


That problem looks rather different from what you are seeing. 

 

These messages aren't being reported in those threads:

Jul 01 10:07:00 XXXXXXXXXX kernel: WARNING: CPU: 1 PID: 4187 at fs/btrfs/extent-tree.c:7004 __btrfs_free_extent+0xaf4/0xb31()
Jul 01 10:07:00 XXXXXXXXXX kernel: BTRFS: Transaction aborted (error -28)
Jul 01 10:07:00 XXXXXXXXXX kernel: Modules linked in: vpd(PO)
Jul 01 10:07:00 XXXXXXXXXX kernel: CPU: 1 PID: 4187 Comm: btrfs-transacti Tainted: P O 4.4.190.x86_64.1 #1

 

 

 

Message 11 of 11