
Forum Discussion

LimaAlphaHotel
Jan 31, 2023
Solved

ReadyNAS 214 BTRFS corruption - appears to be out of memory related

My ReadyNAS RN214, which has been reliably running for years, suddenly went read-only late yesterday afternoon. The UI log just had this ominous message:

 

Jan 30, 2023 16:54:47	Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.

 

 

After enabling SSH to diagnose further, dmesg showed me these BTRFS errors:

 

BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946

 

 

I tried to recover BTRFS using various techniques I found online, all of which failed, so I decided the best course of action was to level the NAS and restore from backup (as has been recommended multiple times on these forums in similar situations). However, I wanted to understand why this had happened.

Further digging showed nothing useful in the system.log until I tried to delete a file sometime later (which is when I noticed there was a problem):

 

Jan 30 17:34:45 isolinear smbd[3655]: [2023/01/30 17:34:45.092926,  0] ../source3/modules/vfs_fruit.c:4160(fruit_unlink)
Jan 30 17:34:45 isolinear smbd[3655]:   fruit_unlink: Forced unlink of [2022-11-02 From Dell XPS (Windows 10) pre re-install/Projects/PGCHE/.git/index:AFP_Resource] failed [Read-only file system]

 

 

However, the kernel.log appears to show the smoking gun - the kernel's OOM (out of memory) killer kicked in, and the next error in the log is BTRFS falling over a few minutes later:

 

Jan 30 16:44:26 isolinear kernel: kworker/u8:7 invoked oom-killer: gfp_mask=0x2400840, order=0, oom_score_adj=0
Jan 30 16:44:27 isolinear kernel: kworker/u8:7 cpuset=/ mems_allowed=0
Jan 30 16:44:27 isolinear kernel: CPU: 3 PID: 5904 Comm: kworker/u8:7 Tainted: P        W  O    4.4.218.alpine.1 #1
Jan 30 16:44:27 isolinear kernel: Hardware name: Annapurna Labs Alpine
Jan 30 16:44:27 isolinear kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
[...]
Jan 30 16:44:28 isolinear kernel: Out of memory: Kill process 25419 (rsync) score 424 or sacrifice child
Jan 30 16:44:28 isolinear kernel: Killed process 25419 (rsync) total-vm:1445248kB, anon-rss:1346512kB, file-rss:756kB
Jan 30 16:53:53 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
Jan 30 16:53:53 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
Jan 30 16:53:53 isolinear kernel: BTRFS warning (device md127): Skipping commit of aborted transaction.
Jan 30 16:53:53 isolinear kernel: BTRFS: error (device md127) in cleanup_transaction:1864: errno=-5 IO failure
Jan 30 16:53:53 isolinear kernel: BTRFS info (device md127): forced readonly
Jan 30 16:53:53 isolinear kernel: BTRFS: error (device md127) in btrfs_drop_snapshot:9420: errno=-5 IO failure
Jan 30 16:53:53 isolinear kernel: BTRFS info (device md127): delayed_refs has NO entry
Jan 30 16:54:46 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33593071370240 wanted 23869946 found 23869944
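For anyone wanting to check the same timeline in their own kernel.log, here is a minimal Python sketch. It is not part of the ReadyNAS firmware; the names `parse_time` and `summarize` are made up for illustration, and the year is an assumption because syslog timestamps omit it. It pulls out the `Killed process` lines and the first `BTRFS error` line, then reports the gap between them, using a few lines from the log above as sample input.

```python
import re
from datetime import datetime

# A few lines copied from the kernel.log excerpt above.
SAMPLE_LOG = """\
Jan 30 16:44:26 isolinear kernel: kworker/u8:7 invoked oom-killer: gfp_mask=0x2400840, order=0, oom_score_adj=0
Jan 30 16:44:28 isolinear kernel: Out of memory: Kill process 25419 (rsync) score 424 or sacrifice child
Jan 30 16:44:28 isolinear kernel: Killed process 25419 (rsync) total-vm:1445248kB, anon-rss:1346512kB, file-rss:756kB
Jan 30 16:53:53 isolinear kernel: BTRFS error (device md127): parent transid verify failed on 33592936824832 wanted 23746221 found 23869946
Jan 30 16:53:53 isolinear kernel: BTRFS info (device md127): forced readonly
"""

KILLED = re.compile(
    r"Killed process (\d+) \((\S+)\) total-vm:(\d+)kB, anon-rss:(\d+)kB"
)


def parse_time(line, year=2023):
    """Parse the 15-character syslog timestamp; the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def summarize(log_text):
    """Return (list of OOM kills, time of the first BTRFS error or None)."""
    kills, first_btrfs = [], None
    for line in log_text.splitlines():
        match = KILLED.search(line)
        if match:
            kills.append({
                "time": parse_time(line),
                "pid": int(match.group(1)),
                "name": match.group(2),
                "anon_rss_kb": int(match.group(4)),
            })
        elif "BTRFS error" in line and first_btrfs is None:
            first_btrfs = parse_time(line)
    return kills, first_btrfs


if __name__ == "__main__":
    kills, first_btrfs = summarize(SAMPLE_LOG)
    for kill in kills:
        print(f"{kill['time']}: killed {kill['name']} (pid {kill['pid']}), "
              f"anon-rss {kill['anon_rss_kb']} kB")
    if kills and first_btrfs:
        gap = first_btrfs - kills[-1]["time"]
        print(f"First BTRFS error {gap.total_seconds():.0f} s after last kill")
```

To run it against a real log, replace `SAMPLE_LOG` with the contents of the file. A short gap on its own does not prove the OOM kill caused the corruption; it only makes the ordering of events easier to see.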

 

 

  1. Thought it would be useful to feedback that it looks like running out of memory led directly to irrecoverable BTRFS corruption
  2. Is it possible to upgrade the memory in the RN 214, to prevent this happening again?
  • Thanks for sharing this.  I am wondering what apps and services you have running, and also what firmware version.

     


    LimaAlphaHotel wrote:

     

    1. Is it possible to upgrade the memory in the RN 214, to prevent this happening again?

    No - unfortunately it is not socketed, so it can not be upgraded.

      LimaAlphaHotel
      Aspirant

      StephenB wrote:

      Thanks for sharing this.  I am wondering what apps and services you have running, and also what firmware version.

       


      LimaAlphaHotel wrote:

       

      1. Is it possible to upgrade the memory in the RN 214, to prevent this happening again?

      No - unfortunately it is not socketed, so it can not be upgraded.



      My original reply seems to have vanished, although the forum has still given me a badge for posting it?

       

      Anyway, it's the latest 6.10.8 firmware. The smb, rsync and dlna services are enabled and Plex is the only app running (or installed). Plex was using about 10MB of memory in total - the rsync daemon (that the OOM killer killed) over 300MB!

       

      I've never had a device corrupt an on-disk filesystem due to running out of memory before, but I know these modern ones need more memory too, so I guess they do more in RAM. I still use ext on my own systems!

       

      Shame it can't be upgraded to avoid this problem reoccurring - I will accept your response as the solution for answering the question.

        LimaAlphaHotel
        Aspirant

        After 12 days copying data back from backup, it's just fallen over with `out_of_memory+1dc` on the LED display. It is completely unresponsive from the network and holding the power button also does nothing. I found a post, https://community.netgear.com/t5/New-ReadyNAS-Users-General/Lost-access-to-Readynas-214-via-https/m-p/2111643, with the same symptom that says yanking the power is the only way to recover at this point.

         

        Since I reset it to factory defaults and rebuilt it, the only file-sharing protocols enabled are cifs and rsync. No apps at all are installed on it (I didn't get that far).

         

        I've had this device since I bought it in November 2018 and it was absolutely rock solid until January, when these out of memory problems started. I've not changed any configuration (not significantly changed what's stored on it, nor added/removed any shares, enabled/disabled any access methods, or added any new apps).

         

        EDIT: After reading the linked thread, I checked my services - Antivirus and File Search (which that thread suggests could be the cause) were already turned off.  I turned off ReadyDLNA and UPnP as well, but they were enabled for the 4 and a bit years before this started without any issues.

  • Sorry for not replying quickly, I had a very long day at work so did not have chance to yesterday.

     

    Replying to each question:

     

    Was the volume still syncing?

     

    No, that finished within a couple of days of me doing the factory reset:

     

    03 Feb 2023 05:00:32 Volume: Volume data is resynced.

     

    Also, did you have multiple threads running for the file restore?

     

    Not sure what you mean, I copied each share back one-by-one from USB drives using a single rsync or cp command (over rsync or cifs respectively). This out of memory happened the same day but many hours (>10) after the restores had been finished - nothing was actively using the NAS at the time, as far as I know - this time there's nothing in the UI log (see below) so I have not yet worked out exactly what time it happened. As with last time, I only noticed when I tried to use it and found it was unresponsive (last time, I discovered it had failed when I found it was read-only).

     

    This time I didn't get an email either, it just seems to have locked up with the message on the LCD. The first time (on 31st January) I had an email that said "The volume data encountered an error and was made read-only. It is recommended to backup your data.".

     

    How large are the drives?

     

    10TB

     

    just searching for what did tip the memory usage over the top so you and others can avoid it

     

    I've not had chance to examine the logs this time - will be doing that very shortly. Fortunately this time BTRFS seems to have survived, probably because nothing was accessing the NAS at the time so there were no writes "in flight" when it ran out of memory.

     

    I am also wondering if the drives are SMR or CMR (since SMR can have very low write speeds, which might result in excessive memory buffering).

     

    Errr... I had to google that one - they are Seagate IronWolf NAS drives, 7200RPM (all the same model, but I sourced them from different retailers in the hope that would reduce the risk of them being from the same batch) - according to Scan's website they're CMR.

     

    Drive health might also factor in.  Did you test the drives?

     

    Yes, I did that the first time and again - health checks are reporting the drives are all fine. All are reporting zero ATA errors, which was the indication of a faulty disk last time I had a drive fault. Two of the drives have single-digit numbers of reallocated sectors but they have been stable like that for a long time and counts have not increased while I've been having these issues - my understanding is modern drives will reallocate sectors automatically and unless the numbers start getting large or start growing in a shortish time that is probably nothing to worry about?

     

      LimaAlphaHotel
      Aspirant

      I've attached the kernel log (I'm afraid as a PDF as the forum won't let me attach a text file) - looks like the OOM killer went after apache2 and syslog-journald (which, in the latter case, kept respawning).

       

      But the processes are using relatively little memory (although, what is oath2-vault and how do I kill it? I assume it's related to ReadyNAS Vault but that is disabled and has never been enabled) - it looks to me like the majority of memory is used by buffers/cache (~1.2GB).  I wonder what is causing that, and is there a way through the UI to tune it?
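To put numbers on the buffers/cache question, one option is to read `/proc/meminfo` over SSH and split it into free, buffers/cache and everything else. The Python below is a minimal sketch of that, assuming the standard Linux `/proc/meminfo` layout. The helper names `parse_meminfo` and `breakdown` are made up for illustration, and the sample values are invented round numbers, not readings from this NAS.

```python
# Invented sample in /proc/meminfo format (values are NOT from the RN214).
SAMPLE_MEMINFO = """\
MemTotal:        2000000 kB
MemFree:          100000 kB
Buffers:          200000 kB
Cached:          1000000 kB
SwapTotal:             0 kB
"""


def parse_meminfo(text):
    """Turn 'Key:   value kB' lines into a dict of integers (kB)."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def breakdown(fields):
    """Split total memory into free, buffers+cache, and the remainder."""
    total = fields["MemTotal"]
    free = fields.get("MemFree", 0)
    cache = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return {
        "total_kb": total,
        "free_kb": free,
        "buffers_cache_kb": cache,
        "other_kb": total - free - cache,  # processes, kernel, slab, etc.
    }


if __name__ == "__main__":
    # On a live system, replace SAMPLE_MEMINFO with:
    #   open("/proc/meminfo").read()
    for name, value in breakdown(parse_meminfo(SAMPLE_MEMINFO)).items():
        print(f"{name:>18}: {value:>9} kB")
```

Page cache is normally reclaimable, so a large buffers/cache figure is not by itself a sign of a leak. Logging this breakdown periodically would show whether the "other" portion is what grows before the OOM killer fires.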
