
Forum Discussion

ziemowit
Aspirant
Jul 26, 2024

BTRFS read-only state, OS 6.8.10, ReadyNAS Ultra 4

Hello, everybody!

It seems like I have met the same problem as a few other people here, namely BTRFS becoming read-only.

Background

Machine is Ultra 4, with 2GB memory, converted to run OS6. Capacity 4x2TB, RAID5.

I found my machine "misbehaving" this evening. During the day I had been reading and writing files on the network share. When I resumed work in the evening I could still read my files, but could no longer write anything.

I went to the machine and pressed Backup button, nothing strange here. Backups started.

Tried to reboot machine, but unfortunately the "hard" way, i.e. Power button.

Machine came up with volumes mounted read-only.

Checked ATA errors on the disks, zero on all of them, supposedly healthy.

Tried to find a fix on the community pages, no luck.

Logs downloaded, backup done. (Backups run 2-7 times every week to another ReadyNAS or a USB HD.)

 

I have seen this problem before, but on another machine (RN104) that had a failing disk. Because of the failing disk I did not think the problem was worth worrying about, but this time it happened on a machine whose disks showed no problems.

 

Questions

  • I want to understand what is going on. Which boils down to:
    • what logs should I read?
    • what am I looking for in the logs?
  • I want to prevent future problems with similar issues. Which leads me to questions like:
    • are the Ultras sensitive to what disks I use? I have 4 x 2TB disks in my ReadyNAS and all but one are of the same type (3 x ST32000542AS + 1 x ST2000DL003-9VT166). All have same rotational speed.
    • I have had problems with my Ultra 4 / Pro 4 converted to OS 6.8.10; they hang sometimes with a message on the display, some of the messages have been re-occurring. Because of that I decided to reboot all of those every night or every couple of nights. I know it is not a "solution" at all, but otherwise I would have them hanging at random times 1-4 times every month.

Any suggestions welcome.

15 Replies

  • A hard reboot while the NAS is actively writing files can damage the BTRFS volume, so you likely did this to yourself.  There is nothing innately different with the Ultra that would cause this.  When a volume is damaged, the NAS often puts it in read-only mode to prevent you from making matters worse.

     

    Do not reboot again until your backup is up to date.  Doing so will not fix your problem and may result in the volume failing to mount.

     

    The only guaranteed fix is to destroy and re-create the volume or do a complete factory default.  Note that if you have apps loaded, that can be problematic since the apps are on the data volume.  So un-install them and re-install after you've re-created the volume or see here:  How-to-save-your-apps-when-destroying-your-main-volume-OS6 .

     

    As for drives, just avoid any that are SMR.

    • ziemowit
      Aspirant

      Hello, Sandshark! THANKS for the answer. 🙂

       

      I noticed that while I was writing Stephen B also answered with some more information. Thanks, Stephen, I will read and see what I can understand of it.

       

      OK, I get it. 🙂 I was probably a bit too impatient with the power off. Well, you never stop learning; I thought, rightly or wrongly, that a NAS would and should survive an unexpected power off. I was wrong and stand corrected. Guilty as charged.

      The curious and "funny" thing is that it seems like I cannot even do "ls" on my /data directory (I/O error), but I can access /data/Pictures without problems, while /data/Videos gives me I/O error. Nevertheless, I have full backups of stuff.

       

      As far as installed applications go, that is not a problem either, as I downloaded the binaries a long time ago and actually only use ssh and possibly MySQL with PHP.

       

       

      • tijgert
        Guide

        I am adding to this discussion because I am looking for an answer to the exact same question.

         

        My ReadyNAS 516 accidentally filled up to the max (due to an emergency backup of another drive) and threw an error because of No Space Left. The log shows this to be the only cause, and it has been confirmed that there is NO hardware issue. So NO backup of the system is needed, and I already have one anyway (mirrored NAS).

        I just need to be able to delete files again.

         

        The system however keeps switching to ReadOnly mode when I reboot due to lack of space, and due to ReadOnly mode I cannot create more space by deleting files...

         

        So, with all the hardware being just fine, how do I tell the system to let me erase files so I can create more space?

        I am SSH inept, but I can follow instructions if I have to.

         

        I can enter SSH via Putty and I find myself at the prompt:

        admin@NAS516:~$

         

        What can I do next?

  • I agree with Sandshark that you should back up the NAS before doing anything else.

     

    Also, I agree that rebooting a NAS with a read-only volume is dangerous.  You are lucky that it remounted.

     


    ziemowit wrote:

    I have 4 x 2TB disks in my ReadyNAS and all but one are of the same type (3 x ST32000542AS + 1 x ST2000DL003-9VT166). All have same rotational speed.



    When the time comes to replace disks, I suggest you go with Seagate Ironwolf or WD Red Plus.  Avoid the plain WD Reds (without "Plus"), as they are SMR.  Most desktop drives in the 2-6 TB range are also SMR.

     


    ziemowit wrote:

     

     

    Questions

    • I want to understand what is going on. Which boils down to:
      • what logs should I read?
      • what am I looking for in the logs?

    Generally I start with dmesg.  Look for disk and btrfs errors, also look at the mdadm commands. 

     

    Looking at the bottom of volume.log can help you sort out whether you have a full OS partition (md0).

    readynasd.log and status.log are often useful in sorting out the history of degraded volumes.

     

    systemd-journal.log, system.log, kernel.log and mdstat.log often have useful clues as well.
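
A minimal sketch of that first pass over an unpacked log zip, in Python. The regular expressions are only guesses at what "disk and btrfs errors" tend to look like, and the file name `dmesg.log` is an assumption (the post only says "dmesg"); adjust both to what your own log zip actually contains.

```python
import re
from pathlib import Path

# Illustrative patterns only: they are assumptions, not an official list
# of ReadyNAS error messages.
PATTERNS = {
    "btrfs": re.compile(r"btrfs.*(error|warning|corrupt|read-only|readonly)", re.IGNORECASE),
    "disk": re.compile(r"\b(ata\d+|sd[a-z])\b.*(error|fail|timeout|reset)", re.IGNORECASE),
    "md": re.compile(r"\bmd\d*\b.*(degraded|fail|not clean)", re.IGNORECASE),
}

# Log names taken from the post above; "dmesg.log" is an assumed file name.
LOG_NAMES = ["dmesg.log", "kernel.log", "system.log", "systemd-journal.log"]


def classify(line):
    """Return the names of all patterns that match one log line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


def scan_logs(log_dir):
    """Yield (file name, line number, categories, text) for suspicious lines."""
    for name in LOG_NAMES:
        path = Path(log_dir) / name
        if not path.is_file():
            continue  # not every log zip contains every file
        with path.open(errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                hits = classify(line)
                if hits:
                    yield name, lineno, hits, line.rstrip()
```

Calling `scan_logs("path/to/unpacked/logs")` and printing each tuple gives a short list of lines worth reading in context. It is a way to narrow things down, not a diagnosis.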

     

    Of course you also need to figure out what to do about what you are seeing.

     

    If you like, you can upload the log zip to cloud storage (dropbox, google drive, etc) and send me a PM (private message) with a link.  Make sure the link permissions are set so anyone can download.  You'd send a PM using the envelope icon in the upper right of the forum page.

     

    It might take a while (over a week) for me to get back to you, as I'm going on vacation soon and won't have much time to spend on this.

     


    ziemowit wrote:

    I want to prevent future problems with similar issues.


    If you have no backup plan in place for your NAS, then that is the place to begin.  RAID isn't enough to keep your data safe.  The best way to do that is to have at least one copy on another device (and ideally one copy off-site).

     

    One thing to do is set up a maintenance schedule on the volume settings page.  I run one of the four functions (scrub, balance, disk test, defrag) every month.  The scrub and the disk test will access every sector of every disk, so both can give early indications of disk issues.  I space those tests two months apart (filling the gaps with the balance and defrag).
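
The rotation described above can be sketched as a four-month cycle. This is only an illustration of the spacing; the actual schedule is set in the web UI, and the relative order of balance and defrag is an assumption.

```python
# Four-month cycle: scrub and disk test sit two slots apart,
# with balance and defrag filling the gaps (their order is assumed).
ROTATION = ["scrub", "balance", "disk test", "defrag"]


def task_for_month(month):
    """Return the maintenance task for a calendar month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return ROTATION[(month - 1) % len(ROTATION)]
```

With this mapping each function runs three times a year, and a full-surface pass (scrub or disk test) happens every second month.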

     

    Don't rely on the web ui to give accurate information on disk health.  Often there are clear issues in the log zip that don't show up in the web ui.  So it is wise to look at the log zip from time to time - particularly after each scrub and disk test completes.

     

    Make sure you maintain at least 15% free space on the volume. If you use snapshots, I also suggest turning off the smart snapshots and switching to custom settings.  Set an explicit retention period, and also configure them to only make snapshots when there are changes.  This can help manage the amount of space the snapshots take up.
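
A rough way to watch the 15% guideline from a script, sketched in Python with the standard library. It assumes the volume is mounted at a path such as `/data` and that Python is available wherever you run it. Btrfs space accounting is more subtle than a single number, so treat the result as an early warning only.

```python
import shutil

MIN_FREE = 0.15  # the 15% guideline from the post above


def free_ratio(free_bytes, total_bytes):
    """Fraction of the volume that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def check_volume(path, min_free=MIN_FREE):
    """Return (ok, ratio); ok is False when free space is below min_free."""
    usage = shutil.disk_usage(path)
    ratio = free_ratio(usage.free, usage.total)
    return ratio >= min_free, ratio
```

For example, `check_volume("/data")` returns a pair such as `(False, ratio)` once the free fraction drops under 0.15, which could then trigger an e-mail or a snapshot clean-up.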

     

    I also suggest getting a UPS for the NAS.  That will ensure it shuts down cleanly when there is a power outage.  A lot of volume failures involve unclean shutdowns.

     

    If you run the antivirus software, the file search feature, or have apps installed on your NAS, then you should also consider upgrading the RAM to 4 GB.  Generally removing apps and disabling services you don't need is a good idea - improving both performance and stability.  Antivirus and file search in particular use a lot of system resources. 

     

    • ziemowit
      Aspirant

      Thank you, StephenB !

      I hope you have had a nice trip... My reply is delayed for several reasons, one of them being that I was reading logs and trying to understand things on my own. I have 3 NASes that have misbehaved from time to time, sometimes displaying filp_close+9 on the LCD just before hanging, sometimes some other text. A long time ago I put those NASes on "nightly reboot", in the hope that the problems would diminish if the NAS did a fresh restart every night at 3 AM (via Power Management in the GUI). I also searched the logs for the error messages displayed, and actually I located one of them in the logs, and also saw that I possibly had a disk problem not reflected in the error count on the disk. One of the logs reported repeated problems re-allocating a number of blocks. Finding disks on eBay in reasonable condition and at a reasonable price, as well as reinstalling the firmware (reverting to 6.10.8, as I want my shell-in-a-box installed), restoring backups and resyncing took some time. But then it was fixed.

       

      So I thought.

       

      Boy, I was wrong! This very evening I could not connect to one of the NASes and decided to do a port scan. It showed that two of the machines were having issues: ports 80, 22 and 443, as well as 21, were gone, although something was still listening on ports 25, 110, 119, 143, 465 and others. I then went to the machines and briefly pressed the backup button to see what they would say. The one called NAS5 displayed incoherent characters on the display, i.e. gibberish. The one called NAS6 displayed understandable text about the backup button being pressed. Oh, well - time to use the PWR button twice to take them down in an orderly manner. NAS6 went down in a few minutes, NAS5 just did nothing, so I had to use a long press on PWR to cut power the brutal way. After the reboot NAS6 came up without problems, but NAS5 started resyncing.
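
A check like the port scan above can be repeated from any machine with Python. This is a minimal sketch using plain TCP connects; the host name in the usage note is hypothetical, and the port numbers are the ones mentioned in the post.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name not resolved
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to True (accepting connections) or False."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For instance, `check_ports("nas5.local", [21, 22, 80, 443])` shows at a glance whether FTP, SSH and the web UI are still answering. A closed port only says the service is not accepting connections. It does not say why.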

       

      After the reboot I took a dive into the logs. Nothing unusual in NAS5, although it had not been accessible and had displayed gibberish on the LCD. In the logs of NAS6 (kernel.log and system.log) I could see that it reported error codes from curl all the time after 7 PM. msmtp.log reported that NAS6 was not able to locate smtp-mail.outlook.com. So I wonder. Do I have faulty hardware? Do I have other problems that I have no clue about? StephenB, would you be able and kind enough to look at my logs? I will send you a PM with a link.

      • Sandshark
        Sensei

        You may have a hardware issue, but it's a bit early to say you do.  The Ethernet and power on/off circuit in your NAS are powered by a separate voltage from the PSU labeled "+5VSB", for +5 Volts Standby.  That voltage stays on even when the NAS is "off".  And because of that, it's often the first to go, and if the NAS is in a hot environment, that can be worse because the fan is off when the NAS is "off".  This also most often happens on units that are powered on and off routinely, which sounds like what your NAS sees.  Some of your issues do sound like they could be related to that.  The "error message" you see at power-down may not be that -- it may be something you are never supposed to see because the unit shuts down.  But it could also be a software error and the last thing executed before a crash prevented reaching power-off.

         

        Are you also sometimes seeing issues where it won't power up, by schedule, power button, or WoL?  If so, that points to a PSU issue even more.
