Forum Discussion
Sandshark
Dec 24, 2020 · Sensei - Experienced User
Script for making remote NAS diagnostics available locally
StephenB suggested starting this. Obviously, it's for advanced users with at least some Linux experience. I have one main NAS, two local backup NAS, and a remote backup NAS. Since the backup de...
StephenB
Jan 01, 2021 · Guru - Experienced User
I've been working on this over the holiday break, and I have something reasonable that so far is working ok.
Overall, the goal is to capture daily logs from both the main NAS and the various backup NAS, and consolidate them in a Logs share on the main NAS. The organization is to create a $(hostname) folder for each NAS in the share. Within that there is a folder for each year (e.g., 2021), and within each year a folder for each month (2021-01, 2021-02, etc.). When run on the main NAS, the script writes the logs directly to the consolidated log share. On a backup NAS, it writes the logs to a local share (LocalLogs), and then rsyncs that to the main NAS. The idea there is to let me back up the consolidated log share without any contention (due to the backup jobs running at the same time as the script).
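As a sketch, that hierarchy can be created in one call with mkdir -p (the paths here are placeholders - on the NAS, BASE would be the Logs share, e.g. /data/Logs; here it defaults to a temp directory so the sketch can run anywhere):

```shell
#!/bin/sh
# Sketch of the per-NAS folder layout: <share>/<hostname>/<year>/<year-month>
# BASE is a stand-in for the Logs share path (hypothetical default below).
BASE=${BASE:-$(mktemp -d)}
LogFolder=$BASE/$(hostname)/$(date +%Y)/$(date +%Y-%m)
# mkdir -p creates the whole hostname/year/year-month chain in one call
mkdir -p "$LogFolder"
ls -d "$LogFolder"
```

The actual script builds the path stepwise with plain mkdir for the benefit of the older shell on 4.1.x.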
The script applies a 7-day retention to LocalLogs, but does not apply retention to the consolidated log share. The idea there is that I don't want to have to manually log into each backup NAS regularly to clean LocalLogs (they are on a power schedule, so that is inconvenient), but it is fine for me to manually prune the consolidated logs. I might do something along the lines of the thinning used for Smart Snapshots later on - not sure.
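A safe way to preview what such a retention pass would delete is to run the same find with -print instead of rm. This standalone sketch uses a temp directory and an artificially aged file (touch -d with a date string is GNU-specific):

```shell
#!/bin/sh
# Dry-run sketch of the LocalLogs retention pass: list files older than
# $Retention days. LogShare here is a throwaway temp dir, not a real share.
Retention=7
LogShare=$(mktemp -d)
touch "$LogShare/recent.log"
touch -d "10 days ago" "$LogShare/old.log"   # GNU touch: backdate the mtime
# the real pass uses -exec rm {} \; instead of -print
find "$LogShare"/* -mtime +$Retention -print
```

Only old.log should be listed; swapping -print back to -exec rm {} \; gives the script's actual behavior.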
The system log naming convention is a bit different from the normal ReadyNAS names - I changed it to make the files sort better (putting the HDSentinel log, the SMART log, and the system log for a given date together).
I designed the script to run on both my legacy 4.1.x NAS and OS-6, and deliberately used old-school syntax to limit any compatibility issues with the older Linux on 4.1.x. I probably overdid that. OS-6 is detected by looking for rnutil. Main vs. backup is detected by looking at the IP address (which is 10.0.0.15 on my main NAS). All my OS-6 NAS have a data volume (even the one running JBOD FlexRAID), and of course all legacy NAS have a C volume, so I use those volume names.
The script itself is
#!/bin/sh
#
# set up some useful variables
#
MainNasIP=10.0.0.15
NasIP=`exec hostname -i | awk -F " " '{print $NF}'`
RemoteShareName=Logs
test "$MainNasIP" != "$NasIP" \
    && { ShareName=LocalLogs; Retention=7; } \
    || ShareName=Logs
test -e /usr/bin/rnutil && LogShare=/data/$ShareName || LogShare=/c/$ShareName
LogFolder=$LogShare/$(hostname)
HDSentinel=/apps/HDSentinel/HDSentinel
timestamp="$(date +%Y%m%d_%H%M%S)"
RsyncFilter="--include=$(date +%Y)/ --include=$(date +%Y-%m)/*** --exclude=*"
test -e /usr/bin/rnutil && RshParm=-rsh=rsh
#
# make output folder if not there
#
test -d $LogFolder || mkdir $LogFolder
#
# Save logs in /Logs/hostname/year/year-month
# build the longer folder name in two steps, so mkdir works
#
LogFolder=$LogFolder/$(date +%Y)
test -d $LogFolder || mkdir $LogFolder
LogFolder=$LogFolder/$(date +%Y-%m)
test -d $LogFolder || mkdir $LogFolder
#
# get system logs with rnutil on OS-6, otherwise zip /var/logs
# rnutil will create an empty file named "1" in its folder, which is
# harmless - but let's delete it anyway
# get smartctl data (somewhat different command for OS-6 than OS-4)
#
test -e /usr/bin/rnutil \
    && { rnutil create_system_log -o $LogFolder/$(hostname)-$timestamp-System.zip
         rm ./1
         for i in a b c d e f g h i j k l m n; do
             smartctl -a -x -l defects /dev/sd${i} | egrep -v "local build|No such device|smartmontools"
         done >>$LogFolder/$(hostname)-$timestamp-Smart.log; } \
    || { /apps/Scripts/diag >/tmp/diagnostics.log
         /apps/Scripts/90_CreateLogs
         zip -r -j $LogFolder/$(hostname)-$timestamp-System.zip /ramfs/log_zip/*
         test -d /ramfs/log_zip && rm -rf /ramfs/log_zip
         test -e /tmp/diagnostics.log && rm /tmp/diagnostics.log
         for i in a b c d e f g h i j k l m n; do
             smartctl -a -x /dev/hd${i} | egrep -v "local build|No such device|smartmontools"
         done >>$LogFolder/$(hostname)-$timestamp-Smart.log; }
#
# log HDSentinel info if present
#
test -e $HDSentinel && $HDSentinel -r $LogFolder/$(hostname)-$timestamp-HDSentinel
#
# apply retention limits if variable set
#
test "$Retention" != "" && find $LogShare/$(hostname)/* -mtime +$Retention -exec rm {} \;
test "$Retention" != "" && find $LogShare/$(hostname) -type d -empty -delete
#
# rsync logs to the main NAS if this is a backup NAS
# this requires that rsync be enabled as read-write on the destination share
# retention is not being applied to the destination share
#
test "$MainNasIP" != "$NasIP" && rsync $RshParm -amv $RsyncFilter $LogShare/$(hostname)/* $MainNasIP::$RemoteShareName/$(hostname)
exit 0
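The rsync filter is the trickiest part of the script: it should transfer only the current year / year-month tree and skip everything else. One way to sanity-check a filter like that is a local dry run against throwaway directories (a standalone sketch - SRC and DST are temp dirs, not the real NAS shares):

```shell
#!/bin/sh
# Sketch: verify the include/exclude rules used in RsyncFilter with a
# local dry run (-n). Only the current year/year-month tree should be listed.
command -v rsync >/dev/null 2>&1 || { echo "rsync not installed - SKIP"; exit 0; }
SRC=$(mktemp -d); DST=$(mktemp -d)
mkdir -p "$SRC/$(date +%Y)/$(date +%Y-%m)" "$SRC/2019/2019-06"
touch "$SRC/$(date +%Y)/$(date +%Y-%m)/host-Smart.log" "$SRC/2019/2019-06/stale.log"
# -m prunes empty dirs; same rules as the script's RsyncFilter variable
rsync -amvn --include="$(date +%Y)/" --include="$(date +%Y-%m)/***" --exclude='*' "$SRC"/ "$DST"/
```

The current-month log should appear in the listing and the 2019 tree should not, which is why only the newest folders get pushed to the main NAS each night.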
On OS-6 I chose to run this as a systemd service, and not in a cron job. To do this, you need to put a service and a timer unit file into /etc/systemd/system. The files I am using are below.
update_logs.service:
[Unit]
Description=Capture Logs Service
After=network-online.target multi-user.target

[Service]
Type=oneshot
RemainAfterExit=no
ExecStart=/apps/Scripts/update_logs

[Install]
WantedBy=multi-user.target
update_logs.timer:
[Unit]
Description=Capture Logs Service

[Timer]
OnCalendar=*-*-* 00:04:00
Persistent=true
Unit=update_logs.service

[Install]
WantedBy=multi-user.target
The services are set up by entering
systemctl enable update_logs
systemctl start update_logs
systemctl enable update_logs.timer
systemctl start update_logs.timer
The timer setting for Persistent is supposed to detect that the service wasn't run because the NAS was off, and run it at the next boot when that is detected. I haven't tested that.
Note that the exit 0 at the end of the script is intentional. If the final test is false, then the script returns an error status. Also, if the rsync fails because the main NAS is down, then the script would also return an error. There are apparently scenarios when systemd will stop running services that repeatedly fail. I don't know for sure if that can happen with a one-shot service, but it seemed best to avoid it.
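The effect is easy to demonstrate with plain POSIX shell (a standalone illustration, not part of the script):

```shell
#!/bin/sh
# Why the trailing "exit 0" matters: a script's exit status is the status
# of its last command, so a false "test" at the end reads as failure.
sh -c 'test "$0" != "$0"'          # final test is false ($0 equals itself)...
echo "without exit 0: status=$?"   # ...so the inner script reports failure (1)
sh -c 'test "$0" != "$0"; exit 0'  # same false test, then an explicit exit 0
echo "with exit 0: status=$?"      # now the inner script reports success (0)
```

With the trailing exit 0, systemd always sees the oneshot service succeed, even on nights when the final test or the rsync fails.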
I'll describe how I am building the system log for the legacy NAS in the next post.
StephenB
Jan 01, 2021 · Guru - Experienced User
Of course legacy NAS don't support systemctl, so you need to run the main script as a cron job. One wrinkle is that 4.1.16 uses a system-wide cron approach: it will let you create a user cron job with crontab <filename>, but it doesn't actually run that job.
Another wrinkle is that if you look at /var/cron.log you will find that the system always skips the @reboot jobs when it boots. That is an old Debian bug (/var/run/crond.reboot isn't deleted when the system reboots). I guess you could delete that file yourself in the update_logs script, but I decided not to try.
So you end up needing to add an entry in /etc/crontab that specifies the time of day that you want the script to run.
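A daily entry might look like this (a hypothetical line - the script path matches the one in the systemd unit, and entries in /etc/crontab include a user field; adjust path and time to suit):

```
# /etc/crontab - run the log capture script daily at 00:04
# m  h  dom mon dow  user   command
  4  0  *   *   *    root   /apps/Scripts/update_logs
```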
Building the legacy system log was a bit tricky. Originally I was thinking that I'd just zip up /var/log on the legacy 4.1.x NAS. But after looking at that more, I decided it would be better to try to match the actual log file created by the web ui. That took a bit of research. The result only applies to 4.1.x - I have no way to check it on 4.2 or 5.3 NAS.
It turns out that when you download all logs from Frontview, the legacy NAS creates a dynamic script to consolidate the logs, and then runs a second script to clean up after the log zip is downloaded. The consolidation script is around long enough that you can see it and copy it. It's called 90_CreateLogs, and it is created in /var/spool/frontview. The script depends somewhat on how the NAS is set up - in particular, my NV+ (using X-RAID) and my Duo (using FlexRAID) have somewhat different files for the RAID configuration. So I began by grabbing the dynamic scripts used on both of those systems.
FWIW, the script consolidates the files into the ram filesystem (I'd expected it to use something in /tmp).
When I ran that script, I discovered that there is one file in the log zip that couldn't be found - diagnostics.log. That apparently is created by the web UI before 90_CreateLogs is run, but I wasn't able to grab the command that creates it. The Netgear version doesn't have enough information in it to tell exactly what is being run.
Netgear's diagnostics.log:
Disks
-------------------------------
Passed diagnostics.

Memory
-------------------------------
Passed diagnostics.

Network
-------------------------------
Passed diagnostics.

Performance
-------------------------------
* Jumbo frames are disabled on interface 1. If both your switch and clients support jumbo frames, you can enhance your write performance by enabling jumbo frames on this interface.

Volume
-------------------------------
Passed diagnostics.
So I don't really know what Netgear is running (and testing the volume isn't really useful for me anyway, since I am writing the zip file to the volume). I could have just dropped that file from my log zip, but instead I thought it would be useful to have something roughly comparable.
After a bit more sleuthing, I discovered an old manufacturing test on the system, called quicktest. This was definitely old (it is hard-coded to fail if it doesn't find 4 disks, so it was never used for the Duo). But I used it as a starting point - stripping out some things, and adapting the disk test - the original uses badblocks, which won't work on an operational disk. I substituted a cached read test.
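If I'm reading the output right, the "Timing O_DIRECT cached reads" lines come from hdparm's timing test. A minimal sketch (this is my assumption about the underlying command, and the device path is a placeholder - on the legacy NAS the disks are /dev/hdc, /dev/hde, etc., and the test needs root):

```shell
#!/bin/sh
# Sketch of a cached read test - assuming hdparm -T --direct is what
# produces the "Timing O_DIRECT cached reads" output. The device path
# below is a placeholder; the guards let the sketch run (or skip) anywhere.
disk=${1:-/dev/sda}
command -v hdparm >/dev/null 2>&1 || { echo "hdparm not installed - SKIP"; exit 0; }
[ -r "$disk" ] || { echo "cannot read $disk (need root?) - SKIP"; exit 0; }
hdparm -T --direct "$disk"
```

Unlike badblocks, a read timing test like this is safe to run on a disk that is part of an active volume.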
What I ended up with outputs this:
================================================================
Infrant Manufacturing Test, Version 1.18
Running on kernel 2.6.17.14ReadyNAS #1 Wed Jun 20 20:08:20 PDT 2012
================================================================
================================================================
Testing onboard network interface...
================================================================
Testing DHCP and ping...done
ONBOARD NIC TEST.......... PASSED
================================================================
Testing Memory...
================================================================
Running quick memory check...done
MEMORY TEST .............. PASSED (Found 256 MB)
================================================================
Testing hard disks...
================================================================
Running quick check on hdc...
/dev/hdc:
 Timing O_DIRECT cached reads:   106 MB in 2.00 seconds = 53.03 MB/sec
Running quick check on hde...
/dev/hde:
 Timing O_DIRECT cached reads:   62 MB in 2.01 seconds = 30.78 MB/sec
Running quick check on hdg...
/dev/hdg:
 Timing O_DIRECT cached reads:   124 MB in 2.02 seconds = 61.27 MB/sec
Running quick check on hdi...
/dev/hdi:
 Timing O_DIRECT cached reads:   106 MB in 2.82 seconds = 37.65 MB/sec
DISK TEST ................ PASSED
================================================================
Testing hardware monitoring...
================================================================
temp 0:22.5
fan 0:2027
HARDWARE MONITOR TEST .... PASSED
================================================================
Checking RTC...
================================================================
RTC TEST ................. PASSED
================================================================
Final Test Summary:
================================================================
RTC TEST ................. PASSED
HARDWARE MONITOR TEST .... PASSED
DISK TEST ................ PASSED
MEMORY TEST .............. PASSED (Found 256 MB)
ONBOARD NIC TEST.......... PASSED
================================================================
TEST RESULT: PASS
================================================================
I don't know how good these tests are at detecting failures, but it still seemed to give some useful information. I can post my version of this script if there is interest.