

duhden
Aspirant
Mar 06, 2021

System: ReadyNASOS service or process was restarted.

Morning all,
Model: ReadyNAS 3138
OS: Firmware 6.10.4 Hotfix 1

Problem: the dreaded "System: ReadyNASOS service or process was restarted." error (similar to the messages in
https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/ReadyNASOS-service-or-process-restarting/m-p/1387135#M137014 and other threads)

 

Services running: SMB, NFS, UPnP, HTTP,  HTTPS, SSH, Antivirus

Memory (from mem_info.log) 

  • MemTotal: 4005468 kB - MemFree: 954656 kB - MemAvailable: 1303052 kB
Apps:  SMB Plus 
----------------

 

I've had this ReadyNAS for four years (plus) and up until a couple of weeks back it has performed flawlessly.  I've been very happy with it.

 

A few weeks back it started sending out warnings that disk usage had hit the 80% mark, and then one Saturday morning it went offline.  After a few power cycles and restarts, it came back online and all appeared fine.

 

I ordered four brand new 12 TB drives (from the approved hardware list for this unit!) and started the slow process of upgrading the "data" volume.  The volume is using X-RAID.  I'd pull one of the original drives, replace it with a larger one, wait for the sync to finish, and repeat.  The whole process took a couple of weeks, and at the end of it everything was green. The auto expansion of X-RAID worked at each stage and I was happy.

Then the restarts started to happen.  I'm not sure of the exact time but probably within 24 hours of the final sync finishing.

 

So, what I've done so far:

- a bunch of research online into similar stories

- downloaded and reviewed the logs to the best of my knowledge (see notes lower on)

- moved all the data off the NAS and did a factory reset (I was surprised the problem persisted)

- deleted the volume and created a new one with only two drives, it resynced ok

- did a second factory reset.  All four drives were enabled.  Syncing completed in just under 24 hours and I thought I had it beat, but the errors started happening again about 12 hours later

- at this point, I've pulled all the new drives and just plunked in an old one so that I can monkey around

- my gut tells me that something is misconfigured with mdadm (X-RAID), as I know that it requires a restart to begin the expansion and maybe something wasn't clearing (which is why I destroyed the volume). Of course, my gut has been wrong before!

- volume.log shows all drives passing SMART (they are all brand new)

 

- interesting dmesg.log, long after the final sync completed

[Sat Mar  6 01:45:39 2021] systemd-journald[1490]: Received request to flush runtime journal from PID 1
[Sat Mar  6 01:45:40 2021] md: md127 stopped.
[Sat Mar  6 01:45:40 2021] md: bind<sdd3>
[Sat Mar  6 01:45:40 2021] md: bind<sdb3>
[Sat Mar  6 01:45:40 2021] md: bind<sdc3>
[Sat Mar  6 01:45:40 2021] md: bind<sda3>
[Sat Mar  6 01:45:40 2021] md/raid:md127: not clean -- starting background reconstruction
[Sat Mar  6 01:45:40 2021] md/raid:md127: device sda3 operational as raid disk 0
[Sat Mar  6 01:45:40 2021] md/raid:md127: device sdc3 operational as raid disk 3
[Sat Mar  6 01:45:40 2021] md/raid:md127: device sdb3 operational as raid disk 2
[Sat Mar  6 01:45:40 2021] md/raid:md127: device sdd3 operational as raid disk 1
[Sat Mar  6 01:45:40 2021] md/raid:md127: allocated 4362kB
[Sat Mar  6 01:45:40 2021] md/raid:md127: raid level 5 active with 4 out of 4 devices, algorithm 2
[Sat Mar  6 01:45:40 2021] RAID conf printout:
[Sat Mar  6 01:45:40 2021]  --- level:5 rd:4 wd:4
[Sat Mar  6 01:45:40 2021]  disk 0, o:1, dev:sda3
[Sat Mar  6 01:45:40 2021]  disk 1, o:1, dev:sdd3
[Sat Mar  6 01:45:40 2021]  disk 2, o:1, dev:sdb3
[Sat Mar  6 01:45:40 2021]  disk 3, o:1, dev:sdc3
[Sat Mar  6 01:45:40 2021] md127: detected capacity change from 0 to 35985517510656
[Sat Mar  6 01:45:40 2021] md: resync of RAID array md127
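As a sanity check on the dmesg excerpt above, the reported md127 capacity can be compared against what a 4-member RAID 5 of 12 TB drives should provide. This is only a rough sketch: the function names are made up for illustration, and the only inputs taken from the log are the capacity figure and the device count.

```python
# Values copied from the dmesg lines above.
MD127_CAPACITY_BYTES = 35_985_517_510_656  # "detected capacity change from 0 to ..."
RAID_DEVICES = 4                           # "raid level 5 active with 4 out of 4 devices"


def raid5_data_members(devices: int) -> int:
    """RAID 5 spends one member's worth of space on parity."""
    if devices < 3:
        raise ValueError("RAID 5 needs at least 3 devices")
    return devices - 1


def per_member_bytes(capacity: int, devices: int) -> int:
    """Usable bytes contributed by each member partition (illustrative helper)."""
    return capacity // raid5_data_members(devices)


def to_tb(n: int) -> float:
    """Decimal terabytes, the unit drive vendors use."""
    return n / 10**12


def to_tib(n: int) -> float:
    """Binary tebibytes, the unit many OS tools report."""
    return n / 2**40


if __name__ == "__main__":
    member = per_member_bytes(MD127_CAPACITY_BYTES, RAID_DEVICES)
    print(f"array:  {to_tb(MD127_CAPACITY_BYTES):.2f} TB / {to_tib(MD127_CAPACITY_BYTES):.2f} TiB")
    print(f"member: {to_tb(member):.2f} TB per data partition")
```

If the per-member figure comes out just under 12 TB, the array geometry matches four 12 TB drives with a small slice reserved for the OS and swap partitions, which suggests the volume itself was assembled as intended.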

 

interesting cron.log - 

-- Reboot --
Mar 06 00:36:01 Radon cron[2887]: (CRON) INFO (pidfile fd = 3)
Mar 06 00:36:01 Radon cron[2887]: (CRON) INFO (Running reboot jobs)
-- Reboot --
Mar 06 00:38:25 Radon cron[2908]: (CRON) INFO (pidfile fd = 3)
Mar 06 00:38:25 Radon cron[2908]: (CRON) INFO (Running reboot jobs)
Mar 06 01:17:01 Radon CRON[4478]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
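To see how often the box is restarting, the "-- Reboot --" markers in a log like the one above can be counted and the gaps between boots measured. The following is a minimal sketch, assuming the journal-style layout shown in the excerpt; the function names are hypothetical, and the year is an assumption taken from the dmesg timestamps because these lines do not carry one.

```python
from datetime import datetime

REBOOT_MARKER = "-- Reboot --"
ASSUMED_YEAR = 2021  # assumption: log lines omit the year; taken from the dmesg output

# Sample copied from the cron.log excerpt above.
SAMPLE = """\
-- Reboot --
Mar 06 00:36:01 Radon cron[2887]: (CRON) INFO (pidfile fd = 3)
Mar 06 00:36:01 Radon cron[2887]: (CRON) INFO (Running reboot jobs)
-- Reboot --
Mar 06 00:38:25 Radon cron[2908]: (CRON) INFO (pidfile fd = 3)
Mar 06 00:38:25 Radon cron[2908]: (CRON) INFO (Running reboot jobs)
Mar 06 01:17:01 Radon CRON[4478]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
"""


def first_entries_after_reboot(text: str) -> list:
    """Return the timestamp of the first log line following each reboot marker."""
    stamps = []
    expecting = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == REBOOT_MARKER:
            expecting = True
            continue
        if expecting and line:
            stamp = " ".join(line.split()[:3])  # e.g. "Mar 06 00:36:01"
            stamps.append(
                datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")
            )
            expecting = False
    return stamps


def gaps_between_boots(stamps: list) -> list:
    """Time elapsed between consecutive boots."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    boots = first_entries_after_reboot(SAMPLE)
    for boot in boots:
        print("boot at", boot)
    for gap in gaps_between_boots(boots):
        print("gap of", gap)
```

Running the same function over rn-expand.log and system.log would show whether the restarts line up with the expansion attempts or happen independently of them.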

 

In the rn-expand.log I'm seeing a whole bunch of these:

-- Reboot --
Mar 06 01:46:00 Radon rn-expand[3103]: Trying auto-expand (in-place)
Mar 06 01:46:01 Radon rn-expand[3103]: Trying auto-extend (grow onto additional disks)
Mar 06 01:46:01 Radon rn-expand[3103]: Trying xraid-expand (tiered expansion)
-- Reboot --

In the system.log this seemed strange to me; it would have been right around the time that the restarts began happening again (it might have been after the first restart):
Mar 05 22:11:00 Radon mdadm[2885]: RebuildStarted event detected on md device /dev/md0, component device resync
Mar 05 22:11:00 Radon mdadm[2885]: NewArray event detected on md device /dev/md1
Mar 05 22:11:00 Radon mdadm[2885]: NewArray event detected on md device /dev/md127

 

Anyway, HELP!   ;)  

I'm fairly comfortable in the Unix terminal. If there is anything else that I can add that might be useful, please reach out!

 

Doug

12 Replies


  • duhden wrote:

     

    - my gut tells me that something is misconfigured with mdadm (X-RAID), as I know that it requires a restart to begin the expansion

    Normally it doesn't require a restart to begin expansion.

     

    I'd look in the logs for disk-related errors.  Or just test the drives in a Windows PC with vendor tools (SeaTools for Seagate; Lifeguard for Western Digital).  Personally I do test my drives before I put them into the NAS - first running the long non-destructive test, and then the full erase/write zeros test.  

    • duhden
      Aspirant

      I'm not too sure where my rather lengthy reply from Sunday disappeared to. It was here, but now I don't see any additional replies to the thread...

       

      I'd look in the logs for disk-related errors.  Or just test the drives in a Windows PC with vendor tools (SeaTools for Seagate; Lifeguard for Western Digital).  Personally I do test my drives before I put them into the NAS - first running the long non-destructive test, and then the full erase/write zeros test.  


      So, over the weekend I took the time to run the "Long Generic" test on all four of the new 12 TB drives.  They all passed.  I think we can rule out the drives themselves.

       

      I've started rebuilding the NAS from scratch (again)

      1. Factory reset with just one old 300 GB drive (it complained about no redundancy) 
      2. Added first 12 TB drive - Flex Raid mirrored the 300 GB as expected, leaving the rest of the 11.5 TB as orphaned
      3. Once the mirror completed, pulled the original 300 GB drive and added the second tested 12 TB drive.  It complained about redundancy until the 300 GB was mirrored. It then expanded to a 12 TB mirror and synced that.  It just finished. Everything so far is as expected, as I now have a 12 TB mirrored (RAID 1) volume.

      I'm still feeling that there is something (some type of cron job?) related to expansion that is causing the OS to restart.

      I'm also still confused as to why the NAS was trying to resilver a mirror (a term I had to look up) when I completed the initial factory reset with four identical 12 TB drives.  There shouldn't have been any mirror activities - just raid 5.

      Any other suggestions of config files to review or tasks to run would be most welcome.


      Doug

      • Sandshark
        Sensei

        "Just RAID5" isn't "mirror" exactly, but it does have redundancy that requires a RAID sync (aka, re-silver), even if all the drives are empty.  In other words, the NAS was doing exactly what it was supposed to do after the factory default.  The NAS will give you access to the volume before it completes that process so you can do more with it, including adding files, though that will slow down the sync.  With 12TB drives, that sync will take a while.
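To illustrate why a fresh RAID 5 still needs a sync, here is a simplified sketch of XOR parity. It is not how md is implemented (md works on chunks and rotates parity across the members); the function names are invented for the example. The point is only that parity has to agree with whatever bytes happen to be on the members, empty or not, before a lost member can be rebuilt.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_is_consistent(data_blocks, parity):
    """True when the stored parity matches the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity


def rebuild_missing(surviving_blocks, parity):
    """Recover one lost data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    # Whatever bytes were already on three "empty" members of one stripe.
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    stale_parity = b"\x00\x00"  # the parity member has not been written yet

    print("before sync:", parity_is_consistent(data, stale_parity))

    synced_parity = xor_blocks(data)  # what the initial sync computes
    print("after sync: ", parity_is_consistent(data, synced_parity))

    # With consistent parity, any single lost block can be reconstructed.
    recovered = rebuild_missing([data[0], data[2]], synced_parity)
    print("rebuilt ok: ", recovered == data[1])
```

Until that first pass has covered the whole array, a failed drive could not be reliably reconstructed, which is why the NAS starts the sync straight after a factory default.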

         

        But it won't take nearly as long as the process you are now needlessly following: adding one drive at a time, with a re-sync after every addition that takes longer for each drive added.  At this point, you can probably still do another factory default and wait for it to sync just once in less time than it would take to finish adding drives individually.
