× NETGEAR will be terminating ReadyCLOUD service by July 1st, 2023. For more details click here.
Orbi WiFi 7 RBE973
Reply

Re: System: ReadyNASOS service or process was restarted.

Retired_Member
Not applicable

System: ReadyNASOS service or process was restarted.

I have a ReadyNAS Pro 6 running Firmware 6.10.3

Recently the system started resyncing every few hours. From the System -> Logs tab, using the web interface:

 

May 22, 2020 02:58:01 PM
System: ReadyNASOS background service started.
May 22, 2020 02:58:00 PM
Volume: Resyncing started for Volume data.
May 22, 2020 02:57:57 PM
System: ReadyNASOS service or process was restarted.
May 22, 2020 02:12:04 PM
System: Service protocol anti-virus-sche-scan is disabled.
May 22, 2020 02:12:04 PM
System: Service protocol Antivirus is disabled.
May 22, 2020 01:17:32 PM
Volume: Resyncing started for Volume data.
May 22, 2020 01:17:32 PM
System: ReadyNASOS background service started.
May 22, 2020 01:17:29 PM
System: ReadyNASOS service or process was restarted.
May 22, 2020 10:45:11 AM
System: ReadyNASOS background service started.
May 22, 2020 10:45:11 AM
Volume: Resyncing started for Volume data.
May 22, 2020 10:45:07 AM
System: ReadyNASOS service or process was restarted.
May 22, 2020 10:07:40 AM
System: ReadyNASOS background service started.
May 22, 2020 10:07:40 AM
Volume: Resyncing started for Volume data.
May 22, 2020 10:07:37 AM
System: ReadyNASOS service or process was restarted.
May 22, 2020 08:09:08 AM
System: Antivirus scanner definition file was updated to 59.25820.
May 22, 2020 07:48:01 AM
System: ReadyNASOS background service started.
May 22, 2020 07:48:01 AM
Volume: Resyncing started for Volume data.

All the disks appear to be okay (all green) at the Web UI: System -> Performance -> Status.

The CPU/System fans are working 1757/1430 RPM, and the CPU/System are at 38C/54C

 

Downloading the log .zip file, I found the following in "kernel.log". The system reboots, but I can't figure out why:

 

May 22 10:44:53 library kernel: eth0: network connection up using port A
May 22 10:44:53 library kernel: interrupt src: MSI
May 22 10:44:53 library kernel: speed: 1000
May 22 10:44:53 library kernel: autonegotiation: yes
May 22 10:44:53 library kernel: duplex mode: full
May 22 10:44:53 library kernel: flowctrl: symmetric
May 22 10:44:53 library kernel: role: slave
May 22 10:44:53 library kernel: tcp offload: enabled
May 22 10:44:53 library kernel: scatter-gather: enabled
May 22 10:44:53 library kernel: tx-checksum: enabled
May 22 10:44:53 library kernel: rx-checksum: enabled
May 22 10:44:53 library kernel: rx-polling: enabled
May 22 10:44:53 library kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
May 22 10:45:17 library kernel: nfsd: last server has exited, flushing export cache
May 22 10:45:17 library kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 22 10:45:17 library kernel: NFSD: starting 90-second grace period (net ffffffff88d70240)
-- Reboot --
May 22 13:16:58 library kernel: Initializing cgroup subsys cpuset
May 22 13:16:58 library kernel: Initializing cgroup subsys cpu
May 22 13:16:58 library kernel: Initializing cgroup subsys cpuacct
May 22 13:16:58 library kernel: Linux version 4.4.190.x86_64.1 (root@blocks) (gcc version 8.3.0 (Debian 8.3.0-6) ) #1 SMP Mon Oct 28 01:55:46 UTC 2019
May 22 13:16:58 library kernel: Command line: initrd=initrd.gz reason=normal BOOT_IMAGE=kernel
May 22 13:16:58 library kernel: KERNEL supported cpus:
May 22 13:16:58 library kernel: Intel GenuineIntel
May 22 13:16:58 library kernel: Disabled fast string operations
May 22 13:16:58 library kernel: x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
May 22 13:16:58 library kernel: x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
May 22 13:16:58 library kernel: x86/fpu: Enabled xstate features 0x3, context size is 576 bytes, using 'standard' format.
May 22 13:16:58 library kernel: e820: BIOS-provided physical RAM map:
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x0000000000100000-0x00000000afeaffff] usable
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x00000000afeb0000-0x00000000afebdfff] ACPI data
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x00000000afebe000-0x00000000afeeffff] ACPI NVS
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x00000000afef0000-0x00000000afefffff] reserved
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x00000000ffb00000-0x00000000ffffffff] reserved
May 22 13:16:58 library kernel: BIOS-e820: [mem 0x0000000100000000-0x000000024fffffff] usable
May 22 13:16:58 library kernel: NX (Execute Disable) protection: active

Any ideas what could be wrong? Hardware failure? I'd be happy to provide any additional information.

 

Thanks

 

Model: RNDP6000v2|ReadyNAS Pro 6 Chassis only
Message 1 of 14
StephenB
Guru

Re: System: ReadyNASOS service or process was restarted.

How much memory is installed in the NAS?

 

Perhaps try disabling the antivirus software, and see if that makes any difference.

Message 2 of 14
Retired_Member
Not applicable

Re: System: ReadyNASOS service or process was restarted.

I've got 8 GB or RAM installed. There must be a way to look this up in the logs (also couldn't see this in the web UI), but by ssh-ing into the ReadyNAS:

 

# free -mh

                   total         used         free        shared    buffers     cached

Mem:          7.8G       697M       7.1G       5.3M       4.3M       344M

-/+ buffers/cache:       348M       7.4G

Swap:         1.5G         0B       1.5G

 

I did disable the Antivirus earlier today (saw some posts on similar topics, where you suggested this), but the rebooting & resyncing continues.

 

Thanks

 

Message 3 of 14
StephenB
Guru

Re: System: ReadyNASOS service or process was restarted.


@Retired_Member wrote:

I've got 8 GB of RAM installed.

 


Thx.  I was asking because some folks with the stock 1 GB have started to run out.  Shipping OS-6 ReadyNAS all have at least 2 GB, and you of course are well over that.

 

The thought about AV was similarly motivated - wondering if you were running out of resources.  That isn't likely to be the cause of your issue.

 

What disks are you using?

 

Have you looked through the logs for disk events (insertions/removal)?  Maybe use journalctl since you have ssh enabled.  smartctl -x might also give some clues on the disks (especially if yours support the error logging).  FWIW, I had some WD60EFRX drives that started throwing UNC errors that showed up in smartctl -x, but not in the smart stats.

 

 

Message 4 of 14
Retired_Member
Not applicable

Re: System: ReadyNASOS service or process was restarted.

I'm not sure where to find the insertion/removal information. The logs file "diskinfo.log"

Device:             sda
Controller:         0
Channel:            0
Model:              WDC WD4000FYYZ-01UL1B3
Serial:             WD-WMC130F97PDF
Firmware:           01.01K04W
Class:              SATA
RPM:                7200
Sectors:            7814037168
Pool:               data
PoolType:           RAID 6
PoolState:          1
PoolHostId:         33eabaa3
Health data
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    42
  Start/Stop Count:               194
  Power-On Hours:                 28801
  Power Cycle Count:              194
  Load Cycle Count:               23

Looking the errors since the last boot with journalctl

# journalctl -p err -b
-- Logs begin at Sat 2020-05-23 16:00:17 PDT, end at Sat 2020-05-23 23:25:09 PDT. --
May 23 22:33:53 library kernel: gpio_it87: no device
May 23 22:33:53 library kernel: snd_hda_intel 0000:00:1b.0: no codecs found!
May 23 22:33:53 library kernel: sd 6:0:0:0: Wrong diagnostic page; asked for 1 got 0
May 23 22:33:53 library kernel: sd 6:0:0:0: Failed to get diagnostic page 0xffffffea
May 23 22:33:53 library kernel: sd 6:0:0:0: [sdg] No Caching mode page found
May 23 22:33:53 library kernel: sd 6:0:0:0: [sdg] Assuming drive cache: write through
May 23 22:33:53 library kernel: sd 6:0:0:0: Failed to bind enclosure -19
May 23 22:34:05 library sm-notify[2937]: Failed to open directory sm.bak: No such file or directory
May 23 22:34:06 library netatalk[2940]: iniparser: cannot open "/etc/netatalk/timemachine.conf"
May 23 22:34:06 library nmbd[2929]: [2020/05/23 22:34:06.297565,  0] ../lib/util/become_daemon.c:147(daemon_status)
May 23 22:34:06 library nmbd[2929]:   daemon_status: daemon 'nmbd' : No local IPv4 non-loopback interfaces available, waiting for interface ...
May 23 22:34:06 library nmbd[2929]: [2020/05/23 22:34:06.305674,  0] ../source3/nmbd/nmbd_subnetdb.c:254(create_subnets)
May 23 22:34:06 library nmbd[2929]:   NOTE: NetBIOS name resolution is not supported for Internet Protocol Version 6 (IPv6).
May 23 22:34:06 library wsdd2[2868]: error: wsdd-mcast-v4: wsd_send_soap_msg: send
May 23 22:34:06 library wsdd2[2868]: error: wsdd-mcast-v6: wsd_send_soap_msg: send
May 23 22:34:06 library wsdd2[2868]: error: llmnr-mcast-v4: open_ep: IP_ADD_MEMBERSHIP
May 23 22:34:11 library wsdd2[2868]: error: wsdd-mcast-v6: wsd_send_soap_msg: send
May 23 22:34:16 library nmbd[2929]: [2020/05/23 22:34:16.306732,  0] ../lib/util/become_daemon.c:136(daemon_ready)
May 23 22:34:16 library nmbd[2929]:   daemon_ready: daemon 'nmbd' finished starting up and ready to serve connections
May 23 22:34:16 library smbd[3108]: [2020/05/23 22:34:16.762052,  0] ../lib/util/become_daemon.c:136(daemon_ready)
May 23 22:34:16 library smbd[3108]:   daemon_ready: daemon 'smbd' finished starting up and ready to serve connections
May 23 22:34:17 library readynasd[3113]: DB (main) schema version: 24 ==> 24
May 23 22:34:17 library readynasd[3113]: DB (queue) schema version: new ==> 0
May 23 22:34:26 library readynasd[3113]: Last run started at 1590296674 was aborted
May 23 22:34:26 library readynasd[3113]: Detect abnormal shutdown of readynasd and child without core
May 23 22:34:30 library readynasd[3113]: NetworkStats eth0 failed: ERROR: mmaping file '/run/readynasd/stats/network_eth0_pkts.rrd': Invalid argument
May 23 22:34:31 library snapperd[3470]: loading 584 failed

I used smartctl to run a "short" test on all 6 disks, and then list all the information about the disk. I could not find any errors. The following is the output for /dev/sdf (similarly for /dev/sda ... /dev/sde)

# smartctl --test=short /dev/sdf
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.190.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Sat May 23 23:19:23 2020

Use smartctl -X to abort test.

# smartctl -a /dev/sdf
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.190.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Re
Device Model:     WDC WD4000FYYZ-01UL1B3
Serial Number:    WD-WMC130F4P72F
LU WWN Device Id: 5 0014ee 0597bf541
Firmware Version: 01.01K04
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat May 23 23:20:20 2020 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241)    Self-test routine in progress...
                    10% of test remaining.
Total time to complete Offline
data collection:         (48180) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 520) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x70bd)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   214   155   021    Pre-fail  Always       -       8300
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       193
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   061   061   000    Old_age   Always       -       28813
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       193
 16 Total_LBAs_Read         0x0022   003   197   000    Old_age   Always       -       201904944159
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       169
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       23
194 Temperature_Celsius     0x0022   109   103   000    Old_age   Always       -       43
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Self-test routine in progress 10%     28813         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay

I couldn't finy anything wrong above, but maybe I missed something. Could it be an intermittent hardware failure?

 

 

 

Message 5 of 14
StephenB
Guru

Re: System: ReadyNASOS service or process was restarted.


@Retired_Member wrote:

 

# smartctl --test=short /dev/sdf
...
Error logging capability:        (0x01)    Error logging supported.

I couldn't finy anything wrong above, but maybe I missed something. Could it be an intermittent hardware failure?

 


smartctl -x should let you see that error log, so try that option and see if there's anything reported..

 


@Retired_Member wrote:

I'm not sure where to find the insertion/removal information.

Look in system.log, kernel.log, and dmesg.log

 

The resync might well be a side-effect of the reboot, as the NAS will do a resync when there is an unclean shutdown.  So it might not be the disks - it could also be a power issue (or something else entirely).

Message 6 of 14
Retired_Member
Not applicable

Re: System: ReadyNASOS service or process was restarted.

I could not find any insertion/removal of the main disks in system.log, kernel.log, and dmesg.log. I did find the following in dmesg.log

[Sat May 23 23:37:14 2020] sd 6:0:0:0: [sdg] Attached SCSI removable disk

Not sure what sdg is. The main disks are sda through sdf.

My ReadyNAS Pro 6 is 5 years old - so it may be having hardware failures. Do you know if I can take the disks out of my unit and plug them into a ReadyNAS 626X - and keep my existing state (volumes, files, accounts, ...)?

 

 

 

Message 7 of 14
Sandshark
Sensei

Re: System: ReadyNASOS service or process was restarted.

If one of the drives is "removed" and "re-inserted", the "new" one will be the next sd# until you re-boot.  So, that's an indication that your drive is getting removed and re-inserted into the array (even if not physically).

 

Yes, one of the many advantages of moving a legacy NAS to OS6 is that when the old stallion is finally put out to pasture, you can simply move the volume to an OS6 native system, assumning the volume is still intact.   No jumping through hoops to backup and recover.  As long as you stick with an Intel based unit like the 626, you don't even have to do anything about the apps.  They just move, too.

Message 8 of 14
Retired_Member
Not applicable

Re: System: ReadyNASOS service or process was restarted.

I appear to have an intermittent hardware problem. Tough to diagnose as an end user.

 

I have two memory cards in my ReadyNAS Pro 6 (2 x Patriot PSD28G800K) - 8 GB total. I removed one of the memory cards, and the unit has been working without error (reboots) for about a week now.

 

I'm not sure if this is a memory failure, or some connector/part moved when I took out one memory stick. I wish I could use a tool like MemTest86 to test the RAM at boot with a USB drive.

 

Earlier I had replaced the motherboard CR2032 battery, but that did not fix the hourly reboots. The machine was basically in a death spiral, rebooting in the middle of re-syncing (which takes around 12 hours for this machine).

 

If this problem returns, I'll replace my ReadyNAS Pro 6 with a RN626X00-100NES. Hopefully I'll be able to move my disks over keep my data. I'm guessing that the file system will have to be resyncd on the new system, as a reboot in the current ReadyNAS Pro 6 will corrupt the file system.

 

Message 9 of 14
StephenB
Guru

Re: System: ReadyNASOS service or process was restarted.


@Retired_Member wrote:

Hopefully I'll be able to move my disks over keep my data. I'm guessing that the file system will have to be resyncd on the new system, as a reboot in the current ReadyNAS Pro 6 will corrupt the file system.

 


Normally you can simply migrate the disks with the NAS powered down.

 

Can you explain the comment that "a reboot in the current ReadyNAS Pro 6 will corrupt the file system"?

Message 10 of 14
Retired_Member
Not applicable

Re: System: ReadyNASOS service or process was restarted.

I'm guessing that if my ReadyNAS Pro 6 fails again, it will do a hard reboot at random. This will mean that the file system will need to be resyncd on the next boot. But if the machine reboots every hour (like before), it will not be able to finish the resync before crashing again.

 

So if/when the ReadyNAS Pro 6 fails again, I may have to shut it down with an unsyncd file system, move the disks to a new machine, and hope that the file system will be resyncd on the new machine.

 

Alternatively, I could shut down the system gracefully now (with syncd disks), and move the disks to a new machine (with the file system intact). So the dilemma is 1) buy the new machine now, or 2) wait for the old machine die and hope that the new machine will be able to sync the file system from the old machine.

 

 

 

Message 11 of 14
StephenB
Guru

Re: System: ReadyNASOS service or process was restarted.


@Retired_Member wrote:

 

So if/when the ReadyNAS Pro 6 fails again, I may have to shut it down with an unsyncd file system, move the disks to a new machine, and hope that the file system will be resyncd on the new machine.


Generally if the old NAS has completely failed, then there could be damage to the array - depending on exactly how it failed.  So it is a possibility.

 


@Retired_Member wrote:

So the dilemma is 1) buy the new machine now, or 2) wait for the old machine die and hope that the new machine will be able to sync the file system from the old machine.

 


If you have a proper backup plan in place, then this dilemma is a false choice - since there is an option 3) build a new array and restore data from backup.

 

If you don't have the data backed up, then I suggest option 4) purchase USB disks now and put a backup plan in place.

Message 12 of 14
Retired_Member
Not applicable

Re: System: ReadyNASOS service or process was restarted.

Thanks for the clarification. I actually do have backups of the data, but I would prefer not to rebuild the files, accounts, groups, permissions etc. from scratch on a new machine (if I can).

 

 

Message 13 of 14
StephenB
Guru

Re: System: ReadyNASOS service or process was restarted.


@Retired_Member wrote:

I actually do have backups of the data, but I would prefer not to rebuild the files, accounts, groups, permissions etc. from scratch on a new machine (if I can).

 


That makes perfect sense. 

 

FWIW, RN626 US pricing is reasonable at the moment (as is the RN628). 

Message 14 of 14
Top Contributors
Discussion stats
  • 13 replies
  • 1627 views
  • 0 kudos
  • 3 in conversation
Announcements