ReadyNAS 214 BTRFS corruption - appears to be out of memory related

LimaAlphaHotel
Aspirant

My ReadyNAS RN214, which has been reliably running for years, suddenly went read-only late yesterday afternoon. The UI log just had this ominous message:

 

Jan 30, 2023 16:54:47	Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.

 

 

After enabling SSH to diagnose further, dmesg showed me these BTRFS errors:

 

BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
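
For anyone comparing their own dmesg output, the message format above can be picked apart with a short script. This is only a sketch: the name `parse_transid_errors` is made up for illustration, and it assumes the kernel message keeps exactly the wording shown here. In these messages, "wanted" is the transaction generation the parent node expects for the block and "found" is the generation actually stamped on the block that was read.

```python
import re

# Matches the "parent transid verify failed" wording quoted in this thread.
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): parent transid verify failed "
    r"on (?P<block>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)

def parse_transid_errors(text):
    """Return one dict per distinct error line, with the generation gap added."""
    seen = set()
    errors = []
    for match in TRANSID_RE.finditer(text):
        if match.group(0) in seen:  # dmesg often repeats the same message
            continue
        seen.add(match.group(0))
        wanted = int(match["wanted"])
        found = int(match["found"])
        errors.append({
            "device": match["dev"],
            "block": int(match["block"]),
            "wanted": wanted,
            "found": found,
            "gap": found - wanted,  # positive: block is newer than the parent expects
        })
    return errors

# Lines copied from the logs in this thread.
SAMPLE = """\
BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
BTRFS error (device md127): parent transid verify failed on 33593071370240 wanted 23869946 found 23869944
"""

if __name__ == "__main__":
    for err in parse_transid_errors(SAMPLE):
        print(err)
```

On the lines quoted in this thread, that gives a gap of 123725 generations for the first block and -2 for the second.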

 

 

I tried to recover BTRFS using various techniques I found online, all of which failed, so I decided the best course of action was to level the NAS and restore from backup (as has been recommended multiple times on these forums in similar situations). However, I wanted to understand why this had happened.

Further digging showed nothing useful in the system.log until I tried to delete a file sometime later (which is when I noticed there was a problem):

 

Jan 30 17:34:45 isolinear smbd[3655]: [2023/01/30 17:34:45.092926,  0] ../source3/modules/vfs_fruit.c:4160(fruit_unlink)
Jan 30 17:34:45 isolinear smbd[3655]:   fruit_unlink: Forced unlink of [2022-11-02 From Dell XPS (Windows 10) pre re-install/Projects/PGCHE/.git/index:AFP_Resource] failed [Read-only file system]

 

 

However, the kernel.log appears to show the smoking gun - the kernel's OOM (out of memory) killer kicked in and the next error in the log is BTRFS falling over a few minutes later:

 

Jan 30 16:44:26 isolinear kernel: kworker/u8:7 invoked oom-killer: gfp_mask=0x2400840, order=0, oom_score_adj=0
Jan 30 16:44:27 isolinear kernel: kworker/u8:7 cpuset=/ mems_allowed=0
Jan 30 16:44:27 isolinear kernel: CPU: 3 PID: 5904 Comm: kworker/u8:7 Tainted: P        W  O    4.4.218.alpine.1 #1
Jan 30 16:44:27 isolinear kernel: Hardware name: Annapurna Labs Alpine
Jan 30 16:44:27 isolinear kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[...]
Jan 30 16:44:28 isolinear kernel: Out of memory: Kill process 25419 (rsync) score 424 or sacrifice child
Jan 30 16:44:28 isolinear kernel: Killed process 25419 (rsync) total-vm:1445248kB, anon-rss:1346512kB, file-rss:756kB
Jan 30 16:53:53 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
Jan 30 16:53:53 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
Jan 30 16:53:53 isolinear kernel: BTRFS warning (device md127): Skipping commit of aborted transaction.
Jan 30 16:53:53 isolinear kernel: BTRFS: error (device md127) in cleanup_transaction:1864: errno=-5 IO failure
Jan 30 16:53:53 isolinear kernel: BTRFS info (device md127): forced readonly
Jan 30 16:53:53 isolinear kernel: BTRFS: error (device md127) in btrfs_drop_snapshot:9420: errno=-5 IO failure
Jan 30 16:53:53 isolinear kernel: BTRFS info (device md127): delayed_refs has NO entry
Jan 30 16:54:46 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33593071370240 wanted 23869946 found 23869944
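
To check a kernel.log for the same sequence, something like the following could measure the time between the OOM killer firing and the first BTRFS error, and list what was killed. It is a rough sketch under a few assumptions: the function names are invented for illustration, the log uses the syslog timestamp layout shown above, and the year (which syslog does not record) is passed in by hand.

```python
import re
from datetime import datetime

# Excerpt copied from the kernel.log lines quoted in this post.
SAMPLE_LOG = """\
Jan 30 16:44:26 isolinear kernel: kworker/u8:7 invoked oom-killer: gfp_mask=0x2400840, order=0, oom_score_adj=0
Jan 30 16:44:28 isolinear kernel: Out of memory: Kill process 25419 (rsync) score 424 or sacrifice child
Jan 30 16:44:28 isolinear kernel: Killed process 25419 (rsync) total-vm:1445248kB, anon-rss:1346512kB, file-rss:756kB
Jan 30 16:53:53 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
Jan 30 16:53:53 isolinear kernel: BTRFS info (device md127): forced readonly
"""

KILLED_RE = re.compile(
    r"Killed process (\d+) \(([^)]+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB"
)

def parse_stamp(line, year):
    # The first 15 characters are the syslog timestamp, e.g. "Jan 30 16:44:26".
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def oom_to_btrfs_gap(log_text, year=2023):
    """Time from the first oom-killer line to the next BTRFS error, or None."""
    oom_time = None
    for line in log_text.splitlines():
        if oom_time is None:
            if "invoked oom-killer" in line:
                oom_time = parse_stamp(line, year)
        elif "BTRFS error" in line:
            return parse_stamp(line, year) - oom_time
    return None

def killed_processes(log_text):
    """Return (process name, anon-rss in kB) for each 'Killed process' line."""
    return [(m.group(2), int(m.group(4))) for m in KILLED_RE.finditer(log_text)]

if __name__ == "__main__":
    print("gap:", oom_to_btrfs_gap(SAMPLE_LOG))
    print("killed:", killed_processes(SAMPLE_LOG))
```

For the excerpt in this post the gap works out to 9 minutes 27 seconds. A short gap is suggestive, but on its own it does not prove that one event caused the other.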

 

 

  1. Thought it would be useful to feedback that it looks like running out of memory led directly to irrecoverable BTRFS corruption
  2. Is it possible to upgrade the memory in the RN 214, to prevent this happening again?
Model: RN214|4 BAY Desktop ReadyNAS Storage
Message 1 of 15

Accepted Solutions
StephenB
Guru

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

Thanks for sharing this.  I am wondering what apps and services you have running, and also what firmware version.

 


@LimaAlphaHotel wrote:

 

  1. Is it possible to upgrade the memory in the RN 214, to prevent this happening again?

No - unfortunately it is not socketed, so it can not be upgraded.

Message 2 of 15

All Replies
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related


@StephenB wrote:

Thanks for sharing this.  I am wondering what apps and services you have running, and also what firmware version.

 


@LimaAlphaHotel wrote:

 

  1. Is it possible to upgrade the memory in the RN 214, to prevent this happening again?

No - unfortunately it is not socketed, so it can not be upgraded.



My original reply seems to have vanished, although the forums have still given me a badge for posting it?

 

Anyway, it's the latest 6.10.8 firmware. The smb, rsync and dlna services are enabled and Plex is the only app running (or installed). Plex was using about 10MB of memory in total - the rsync daemon (the one the OOM killer terminated) over 300MB!

 

I've never had a device corrupt an on-disk file system due to running out of memory before, but I know these modern file systems need more memory too, so I guess they do more in RAM. I still use ext on my own systems!

 

Shame it can't be upgraded to avoid this problem reoccurring - I will accept your response as the solution for answering the question.

Message 3 of 15
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

After 12 days copying data back from backup, it's just fallen over with `out_of_memory+1dc` on the LED display. It is completely unresponsive from the network and holding the power button also does nothing. I found a post https://community.netgear.com/t5/New-ReadyNAS-Users-General/Lost-access-to-Readynas-214-via-https/m-... with the same symptom that says yanking the power is the only way to recover at this point.

 

Since I reset it to factory defaults and rebuilt it, the only file-sharing protocols enabled are cifs and rsync. No apps at all installed on it (didn't get that far).

 

I've had this device since I bought it in November 2018 and it was absolutely rock solid until January, when these out of memory problems started. I've not changed any configuration (I haven't significantly changed what's stored on it, added or removed any shares, enabled or disabled any access methods, or added any new apps).

 

EDIT: After reading the linked thread, I checked my services - Antivirus and File Search (which is suggested could be the cause) were already turned off.  I turned off ReadyDLNA and uPnP as well but they were enabled for the 4 and a bit years before this started without any issues.

Message 4 of 15
Sandshark
Sensei

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

Was the volume still syncing?  In addition to the additional drive access, a sync uses a lot of memory.  Also, did you have multiple threads running for the file restore?  How large are the drives? Not that either of these should have resulted in an out of memory issue, just searching for what did tip the memory usage over the top so you and others can avoid it.

Message 5 of 15
StephenB
Guru

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related


@Sandshark wrote:

Was the volume still syncing?  In addition to the additional drive access, a sync uses a lot of memory.  Also, did you have multiple threads running for the file restore?  How large are the drives? 


@LimaAlphaHotel: I am also wondering if the drives are SMR or CMR (since SMR can have very low write speeds, which might result in excessive memory buffering).

 

Drive health might also factor in.  Did you test the drives?

Message 6 of 15
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

Sorry for not replying quickly, I had a very long day at work so did not have chance to yesterday.

 

Replying to each question:

 

Was the volume still syncing?

 

No, that finished within a couple of days of me doing the factory reset:

 

03 Feb 2023 05:00:32 Volume: Volume data is resynced.

 

Also, did you have multiple threads running for the file restore?

 

Not sure what you mean, I copied each share back one-by-one from USB drives using a single rsync or cp command (over rsync or cifs respectively). This out of memory happened the same day but many hours (>10) after the restores had been finished - nothing was actively using the NAS at the time, as far as I know - this time there's nothing in the UI log (see below) so I have not yet worked out exactly what time it happened. As with last time, I only noticed when I tried to use it and found it was unresponsive (last time, I discovered it had failed when I found it was read-only).

 

This time I didn't get an email either, it just seems to have locked up with the message on the LCD. The first time (on 31st January) I had an email that said "The volume data encountered an error and was made read-only. It is recommended to backup your data.".

 

How large are the drives?

 

10TB

 

just searching for what did tip the memory usage over the top so you and others can avoid it

 

I've not had chance to examine the logs this time - will be doing that very shortly. Fortunately this time BTRFS seems to have survived, probably because nothing was accessing the NAS at the time so there were no writes "in flight" when it ran out of memory.

 

I am also wondering if the drives are SMR or CMR (since SMR can have very low write speeds, which might result in excessive memory buffering).

 

Errr... I had to google that one - they are Seagate IronWolf NAS drives, 7200RPM (all the same model but I sourced them from different retailers in the hope that would reduce the risk of them being from the same batch) - according to scan's website they're CMR.

 

Drive health might also factor in.  Did you test the drives?

 

Yes, I did that the first time and again - health checks are reporting the drives are all fine. All are reporting zero ATA errors, which was the indication of a faulty disk last time I had a drive fault. Two of the drives have single-digit numbers of reallocated sectors but they have been stable like that for a long time and counts have not increased while I've been having these issues - my understanding is modern drives will reallocate sectors automatically and unless the numbers start getting large or start growing in a shortish time that is probably nothing to worry about?
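
The "stable versus growing" check described above can be automated if raw SMART attribute tables are available, for example two saved copies of `smartctl -A` output taken some weeks apart. That is an assumption on my part, since the figures in this post came from the ReadyNAS health check. The sketch below uses invented function names and made-up sample numbers purely to show the comparison.

```python
# Attributes usually watched for surface problems.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")

def read_raw_values(smart_text, wanted=WATCHED):
    """Pull RAW_VALUE (10th column) for the wanted attributes out of a
    smartctl -A style attribute table."""
    values = {}
    for line in smart_text.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in wanted:
            values[parts[1]] = int(parts[9])
    return values

def grown(before, after):
    """Return {attribute: increase} for every counter that went up."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }

# Hypothetical snapshots; these numbers are NOT from the drives in this thread.
SNAPSHOT_OLD = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
SNAPSHOT_NEW = SNAPSHOT_OLD.replace("-       8", "-       24")

if __name__ == "__main__":
    old = read_raw_values(SNAPSHOT_OLD)
    new = read_raw_values(SNAPSHOT_NEW)
    print("unchanged drive:", grown(old, old))
    print("worsening drive:", grown(old, new))
```

An empty result means none of the watched counters moved between the two snapshots, which matches the "stable for a long time" situation described here.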

 

Message 7 of 15
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

I've attached the kernel log (I'm afraid as a PDF as the forum won't let me attach a text file) - looks like the OOM killer went after apache2 and syslog-journald (which, in the latter case, kept respawning).

 

But the processes are using relatively little memory (although, what is oauth2-vault and how do I kill it? I assume it's related to ReadyNAS Vault but that is disabled and has never been enabled) - it looks to me like the majority of memory is used by buffers/cache (~1.2GB).  I wonder what is causing that, and is there a way through the UI to tune it?

Message 8 of 15
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

Here's a landscape version of the log, not much better but slightly more readable....

Message 9 of 15
StephenB
Guru

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related


@LimaAlphaHotel wrote:

(although, what is oauth2-vault and how do I kill it? I assume it's related to ReadyNAS Vault but that is disabled and has never been enabled) - it looks to me like the majority of memory is used by buffers/cache (~1.2GB).  I wonder what is causing that, and is there a way through the UI to tune it?


FWIW, it is not running on my RN526 (running 6.10.7), and I am not seeing it in the logs for my RN202 either.  oauth2 generally is an authentication protocol. 

 

Are you using any other cloud services on the cloud page?  Also, what email provider are you using for alerts?

 


@LimaAlphaHotel wrote:

I've attached the kernel log (I'm afraid as a PDF as the forum won't let me attach a text file)


Another option is to put the log into cloud storage (dropbox, etc) and include a download link.

 

There is some information leakage, so generally I advise people providing full logs to send them in a private message.

Message 10 of 15
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

FWIW, it is not running on my RN526 (running 6.10.7), and I am not seeing it in the logs for my RN202 either. oauth2 generally is an authentication protocol. Are you using any other cloud services on the cloud page? Also, what email provider are you using for alerts?

Yes, I know oauth2 is an authentication protocol - I meant more why is it running on my ReadyNAS and how do I turn it off. There are no cloud services enabled (nor have ever been - screenshot attached). Email is direct to my own email server (plain old SMTP), authenticated (username/password) with the server directly - no oauth there either.

 

Enabling CIFS, RSYNC and NFS, recreating the shares, configuring SMTP for email alerts and configuring the UPS client are literally the only things I've changed since the rebuild after the initial failure until it fell over again. ReadyDLNA and uPnP were on by default and I've since turned them off. Nothing else has been enabled and no apps at all installed this time.

 

Another option is to put the log into cloud storage (dropbox, etc) and include a download link. There is some information leakage, so generally I advise people providing full logs to send them in a private message.

I don't use dropbox, one drive or anything like that.

I did a cursory check for anything sensitive before I posted it.

Message 11 of 15
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

Sorry, "Enabling CIFS, RSYNC and NFS" should read "Enabling CIFS, RSYNC and iSCSI" - there's no NFS enabled.

 

Do you think iSCSI could be the cause, it's a bit different to traditional file sharing - although iSCSI support is one of the reasons I bought this particular NAS and (again) I've been using it since 2018 without incident until January this year.

Message 12 of 15
StephenB
Guru

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related


@LimaAlphaHotel wrote:

 

Do you think iSCSI could be the cause, it's a bit different to traditional file sharing - although iSCSI support is one of the reasons I bought this particular NAS and (again) I've been using it since 2018 without incident until January this year.


I don't use iSCSI, but AFAIK it doesn't use oauth2.  I believe it uses CHAP (or is configured to use no authentication).

 

I am thinking the place to start is email alerts. That is because of this thread in the forum:

Are your alerts set up to use gmail with app-specific passwords?

 

Either way, you could disable email alerts, reboot the system, and see if oauth2-vault is still in the systemctl service list.

Message 13 of 15
LimaAlphaHotel
Aspirant

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

Are your alerts set up to use gmail with app-specific passwords?

No gmail here:

Email is direct to my own email server (plain old SMTP)

Either way, you could disable email alerts, reboot the system, and see if oauth2-vault is still in the systemctl service list.

Can I do this (see the service list) from the UI? I don't want to cause support issues by enabling ssh; I only did it in a last-ditch effort to recover btrfs the first time it fell over and haven't enabled it again since the factory reset. I got the process name from the OOM killer's messages in the downloaded logs, not by direct command-line access.

 

It might be interesting to see if it's still running after I had to reboot it by removing the power when it locked up this time - that was probably its first reboot since the reset.

 

Edit: found the "processes" file in the logs I downloaded - doesn't look like there's anything called "oauth" running now.  Moral of the story might be that one should reboot after the initial setup wizard, after a factory reset, to clean up - even though the wizard doesn't prompt to do such a thing?

 

This is the current state according to mem_info, also in the logs (shame we can't see the memory status in the UI). How does this compare to other people's? Does it look reasonable?:

 

 

 

 

MemTotal:        2065988 kB
MemFree:          171236 kB
MemAvailable:    1441480 kB
Buffers:           46260 kB
Cached:          1490448 kB
SwapCached:            0 kB
Active:           318452 kB
Inactive:        1265908 kB
Active(anon):      31444 kB
Inactive(anon):    18368 kB
Active(file):     287008 kB
Inactive(file):  1247540 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       1310720 kB
HighFree:          59312 kB
LowTotal:         755268 kB
LowFree:          111924 kB
SwapTotal:       1047420 kB
SwapFree:        1047420 kB
Dirty:               156 kB
Writeback:             0 kB
AnonPages:         47716 kB
Mapped:            38880 kB
Shmem:              1952 kB
Slab:              83644 kB
SReclaimable:      23084 kB
SUnreclaim:        60560 kB
KernelStack:        1792 kB
PageTables:         2216 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2080412 kB
Committed_AS:     299572 kB
VmallocTotal:     245760 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB

 

 

 

Message 14 of 15
StephenB
Guru

Re: ReadyNAS 214 BTRFS corruption - appears to be out of memory related

Do you see anything when you use ssh (logging in as root) and run

systemctl --type=service | grep -i auth
Message 15 of 15