Forum Discussion
dstsui
Apr 28, 2021 · Aspirant
RN102 broken after OS 6.10.4 update
My RN102 appears to be broken after updated to 6.10.4. Not sure which came first, I updated the OS to 6.10.4 and noticed very slow transfer speed when I copied hundreds of JPEG images from the SD car...
dstsui
Apr 29, 2021 · Aspirant
Based on the logs, do you think the data in disk 1 is recoverable using WD disk utility? Can you tell which Share will be affected by the loss of disk 1?
StephenB
Apr 30, 2021 · Guru - Experienced User
dstsui wrote:
Based on the logs, do you think the data in disk 1 is recoverable using WD disk utility?
I don't recommend trying that. In my experience such "recovery" is more of a bandaid than a cure. When I've tried it, the disk very quickly starts throwing errors again.
Another aspect is that you are experiencing read errors. The disk utility can't preserve the data if it can't read the disk. So the volume would still need to be resynced, otherwise you won't have a proper mirror.
A replacement drive would cost ~$80 (current USD pricing). IMO attempting a "repair" isn't worth the risk.
But you are jumping ahead. The next step is to test the disk. I suspect it will fail.
dstsui wrote:
Can you tell which Share will be affected by the loss of disk 1?
Your volume is RAID-1 (mirrored disks), so there should be no data loss. All shares will lose redundancy when you remove disk 1 (since there will be no mirror). The volume status will change to "degraded" to reflect that.
But you've actually already lost some redundancy, due to the read errors. The logs show that the system is relying on disk 2 to get your data (the log shows the read for disk 1 failing, and the RAID system failing over to read from disk 2). If disk 2 were to fail, there would be at least some data loss.
dstsui wrote:
The Volume log has entries for both RAID 0 and RAID 1.
FYI, you are misreading the log. Volume.log simply numbers the RAID arrays starting from 0. So that entry isn't telling you the RAID mode - it's just a label. A few lines after that label you should see
Level: 1
That tells you the RAID mode is RAID-1.
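If you have SSH access enabled and want to double-check this outside of volume.log, here is a minimal sketch (in Python, assuming the standard Linux /proc/mdstat layout) that reports the level of each md array:
# Sketch: print the RAID level of each md array from /proc/mdstat.
# Assumes SSH access to the NAS (off by default) and the usual mdstat
# layout, e.g. "md127 : active raid1 sdc3[2] sda3[1]".
import re

with open("/proc/mdstat") as f:
    for line in f:
        m = re.match(r"(md\d+)\s*:\s*(\w+)\s+(raid\d+|linear)", line)
        if m:
            name, state, level = m.groups()
            print(f"{name}: {state}, level {level}")
On a healthy RN102 mirror you would expect the data array to report raid1, matching the Level: 1 entry in volume.log.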
- dstsui · Apr 30, 2021 · Aspirant
I removed disk 1 after powering down the NAS, and it boots up fine with just disk 2.
I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25 MB/s; the remaining Shares with BitRot protection disabled averaged 36 MB/s. Is this speed normal at all? I think it is under par even with BitRot protection off. If so, what else could be wrong? There is no problem reading or writing multiple files, though. Is BitRot protection worth the sacrifice in speed? 25 MB/s is a problem when I access photos from my photo management software.
I will test disk 1 tomorrow.
- StephenB · Apr 30, 2021 · Guru - Experienced User
dstsui wrote:
I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25 MB/s; the remaining Shares with BitRot protection disabled averaged 36 MB/s. Is this speed normal at all?
Both speeds sound low to me. Are you testing over wifi or gigabit ethernet? On gigabit you can get ~70 MB/s read and ~50 MB/s write.
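As a rough sanity check of what the wire alone allows (this is only arithmetic; the ~10% protocol overhead figure is a rule of thumb, not a measurement):
# Back-of-the-envelope only: gigabit line rate vs. the RN102's ceiling.
raw_MBps = 1000 / 8              # 1 Gbit/s = 125 MB/s before any overhead
practical_MBps = raw_MBps * 0.9  # assume roughly 10% TCP/IP + SMB overhead
print(f"wire allows ~{practical_MBps:.0f} MB/s; ~70 MB/s is the NAS, not the network")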
There were some improvements made to the BTRFS setup some years ago. But to take advantage of them you'd need to do a factory reset, rebuild the NAS, and restore all the data from backup. I tested that some time ago, and it did make a significant difference on my RN102.
dstsui wrote:
Is BitRot protection worth the sacrifice in speed?
Up to you really. I do leave it enabled on my main NAS (an RN526x), but I've never had bitrot protection kick in and recover a file. That NAS isn't an RN100-series unit, though, and even with bitrot protection on it can saturate a fast gigabit connection.
- dstsui · Apr 30, 2021 · Aspirant
StephenB wrote:
dstsui wrote: I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25 MB/s; the remaining Shares with BitRot protection disabled averaged 36 MB/s. Is this speed normal at all?
Both speeds sound low to me. Are you testing over wifi or gigabit ethernet? On gigabit you can get ~70 MB/s read and ~50 MB/s write.
Testing was done over GigE. I confirmed the LAN hardware is sound by using a direct Cat 6e cable connection (no router or switch) between the NAS and PC, and I verified with another PC over 5 GHz WiFi; the speeds in both tests were about the same. It was somewhat lower over WiFi, but that was expected. I concluded that the common element is the NAS, which is the likely culprit. What issues could possibly drag the speed down by that much? Are there any clues in the logs?
A backup of the NAS was run overnight and completed without issues.
I ran the WD Dashboard Diagnostic Extended Test on disk 1 and it failed with error code 2. I don't know what that means.
I will install a new disk into the NAS today after checking it with WD Dashboard.
- dstsui · Apr 30, 2021 · Aspirant
StephenB wrote:
dstsui wrote: I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25 MB/s; the remaining Shares with BitRot protection disabled averaged 36 MB/s. Is this speed normal at all?
There were some improvements made to the BTRFS setup some years ago. But to take advantage of them you'd need to do a factory reset, rebuild the NAS, and restore all the data from backup. I tested that some time ago, and it did make a significant difference on my RN102.
When was BTRFS introduced? I have had the RN102 for 6 years. I should have checked the speed when I first installed it, and I definitely would not have accepted it if the speed had been this slow back then; it is not a lot better than my old ReadyNAS Duo. BTRFS is unlikely to be the real issue; we are talking about a 60% reduction in speed, so something fundamental must be wrong.
- dstsui · Apr 30, 2021 · Aspirant
I disabled BTRFS on all shares and with no background process running, I got the following test results on one of the Shares:
NAS performance tester 1.7 http://www.808.dk/?nastester
Running warmup...
Running a 400MB file write on R: 5 times...
Iteration 1: 52.86 MB/sec
Iteration 2: 58.10 MB/sec
Iteration 3: 56.61 MB/sec
Iteration 4: 52.46 MB/sec
Iteration 5: 58.37 MB/sec
-----------------------------
Average (W): 55.68 MB/sec
-----------------------------
Running a 400MB file read on R: 5 times...
Iteration 1: 33.95 MB/sec
Iteration 2: 34.29 MB/sec
Iteration 3: 30.59 MB/sec
Iteration 4: 29.39 MB/sec
Iteration 5: 33.50 MB/sec
-----------------------------
Average (R): 32.34 MB/sec
The write speed has improved significantly, but the read speed has not. Any clues?
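For a second opinion without NASTester, a crude single-pass equivalent can be scripted. A minimal sketch (assuming the share is mapped as R:, as in the run above, and that a test file named speedtest.bin may be created and deleted there):
# Sketch: time one 400 MB write and one read against a mapped NAS share.
# The path and file name are assumptions; adjust to your own mapping.
import os, time

path = r"R:\speedtest.bin"
size_mb, block = 400, 1024 * 1024
chunk = os.urandom(block)

start = time.time()
with open(path, "wb") as f:
    for _ in range(size_mb):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())
print(f"write ~{size_mb / (time.time() - start):.1f} MB/s")

start = time.time()
with open(path, "rb") as f:
    while f.read(block):
        pass
# Note: this read may be partly served from the client-side cache,
# so it can look faster than NASTester's averaged figure.
print(f"read ~{size_mb / (time.time() - start):.1f} MB/s")
os.remove(path)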
- dstsui · Apr 30, 2021 · Aspirant
dstsui wrote:
I disabled BTRFS on all shares and with no background process running, I got the following test results on one of the Shares:
NAS performance tester 1.7 http://www.808.dk/?nastester
Running warmup...
Running a 400MB file write on R: 5 times...
Iteration 1: 52.86 MB/sec
Iteration 2: 58.10 MB/sec
Iteration 3: 56.61 MB/sec
Iteration 4: 52.46 MB/sec
Iteration 5: 58.37 MB/sec
-----------------------------
Average (W): 55.68 MB/sec
-----------------------------
Running a 400MB file read on R: 5 times...
Iteration 1: 33.95 MB/sec
Iteration 2: 34.29 MB/sec
Iteration 3: 30.59 MB/sec
Iteration 4: 29.39 MB/sec
Iteration 5: 33.50 MB/sec
-----------------------------
Average (R): 32.34 MB/sec
The write speed has improved significantly, but the read speed has not. Any clues?
The results above were obtained with Flow Control on the PC NIC disabled. With Flow Control (Tx and Rx) enabled, I got the following results:
Running warmup...
Running a 400MB file write on R: 5 times...
Iteration 1: 30.75 MB/sec
Iteration 2: 30.22 MB/sec
Iteration 3: 30.86 MB/sec
Iteration 4: 31.57 MB/sec
Iteration 5: 30.84 MB/sec
-----------------------------
Average (W): 30.85 MB/sec
-----------------------------
Running a 400MB file read on R: 5 times...
Iteration 1: 33.76 MB/sec
Iteration 2: 34.52 MB/sec
Iteration 3: 34.29 MB/sec
Iteration 4: 33.39 MB/sec
Iteration 5: 33.89 MB/sec
-----------------------------
Average (R): 33.97 MB/sec
I also spotted the following entries in the kernel log around the time when I ran the first test with flow control off:
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=1024
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=512
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=128
May 01 12:23:58 nas-36-1C-14 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
- mdgm · May 01, 2021 · Virtuoso
What does your initrd.log look like in the logs zip? That will show the firmware update history since the last factory reset of your system.
BTRFS has always been used as the filesystem for the data volume for OS6 and the filesystem for the root volume on the x86 NAS units (the 102 uses EXT4 for the root volume).
Bit-rot protection was added back in 6.2.x. If you ever used bit-rot protection or snapshots, disabling bit-rot protection isn't going to undo all the CoW that has already happened; it will just limit what happens going forward.
There are things you could check, like CPU usage and whether the system is using a lot of swap. Also turn off services and apps that you don't need.
Over time, as updates are released, the software may consume more resources, and the RN100 series is the least powerful of the systems that run OS6, so any performance hit is going to be most noticeable on those systems. If you have a bad disk, that can impact performance. Running on only one disk can also impact performance, and if your NAS is doing a resync of the RAID whilst adding a replacement disk, performance will be reduced until that resync completes.
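For the memory/swap check, a minimal sketch (assuming SSH access to the NAS) that reads /proc/meminfo directly:
# Sketch: report RAM and swap usage on the NAS from /proc/meminfo.
# Assumes SSH access; falls back to MemFree on kernels without MemAvailable.
fields = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, value = line.split(":", 1)
        fields[key] = int(value.split()[0])  # values are in kB

available = fields.get("MemAvailable", fields.get("MemFree", 0))
ram_used = fields["MemTotal"] - available
swap_used = fields["SwapTotal"] - fields["SwapFree"]
print(f"RAM used: {ram_used // 1024} MiB of {fields['MemTotal'] // 1024} MiB")
print(f"Swap used: {swap_used // 1024} MiB of {fields['SwapTotal'] // 1024} MiB")
On an RN102's small RAM, heavy swap use would point at services and apps rather than the disks.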
- StephenB · May 01, 2021 · Guru - Experienced User
IMO you should be careful to tackle one issue at a time. So wait to address the speed issue until you have a full RAID-1 volume back.
I also suggest making a backup of your files before you add the replacement disk back. The rebuild of the RAID will stress the disk 2 (every sector is read and copied to the replacement to create the mirror). If the disk 2 fails during the process, you will lose your data.
dstsui wrote:
When was BTRFS introduced?
Your NAS has always used BTRFS.
My own RN102 goes back to 2013. By 2016 the read speed had dropped to about 35 MB/sec. This was using jbod volumes (so bitrot protection was off, since it needs RAID redundancy). That speed sounds very similar to yours.
A factory reset restored the read speed to around 70 MB/s. I posted the results here, but it was a long time ago, and search isn't finding them. I did another benchmark in 2019, checking all three versions of SMB. The second test was using different disks, and the NAS was set up for XRAID/RAID-1 - not jbod. SMB 2.1 speeds were still around 70 MB/s, though SMB 3 was somewhat slower (63 MB/sec). https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/Average-transfer-speed-read-and-write-plummeted-after-replacing/m-p/1718589#M178329
So based on my own experience, a factory reset will almost certainly increase your speeds substantially. Yes, it is painful, but the performance increase is likely worth it.
Other things you can try:
- Disable strict sync in the SMB settings
- Disable IPv6 in your NAS network settings. In some cases the PC will use IPv6 link-local addresses for data transfer, and that can be slower (especially if your router doesn't have IPv6 enabled).
- Disable all services you don't actually need (and uninstall any apps you no longer use). The RN102 doesn't have much memory, and over the years the memory footprint of the firmware has increased. Removing services and apps can help mitigate that.
- Run the maintenance functions on the volume settings wheel. Balance and Scrub can both help improve performance. Personally I schedule one maintenance function per month - completing the cycle every four months (balance, defrag, scrub, disk test)
- StephenB · May 01, 2021 · Guru - Experienced User
While I get your desire to solve your original speed problem, I still suggest taking things one step at a time. Data safety comes first, performance comes second.
If I were in your position now (waiting for the new disk to be inserted and resync), I'd be making sure I had an up-to-date backup. Once that was taken care of, I'd run the disk test from the volume settings wheel - in order to confirm that disk 2 didn't have any issues. The disks are about the same age, and it isn't unusual to have multiple disks in a RAID array fail in rapid succession.
dstsui wrote:
I also spotted the following entries in the kernel log around the time when I ran the first test with flow control off:
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=1024
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=512
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=128
May 01 12:23:58 nas-36-1C-14 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
RX overruns happen when the NAS isn't keeping up with the transfer requests. So it's not surprising you'd see them with flow control off, since your NAS isn't capable of transferring at full gigabit speed. The NAS uses flow control requests to pause the traffic when the buffering of the requests starts to get full. The router will then hold them (and at some point will pause the traffic from the PC).
Note that no traffic will be lost due to the overruns - the NAS will request retransmission. TCP (network) speeds will drop though, as the network protocols are designed to back off speeds when any loss occurs.
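If you want to see whether the overruns stop once flow control is re-enabled, a minimal sketch (assuming the kernel log from the downloaded logs zip is saved as kernel.log in the current directory) that counts them per day:
# Sketch: count mvneta "overrun error" lines in a saved kernel log.
# The file name/location is an assumption; point it at your own copy.
from collections import Counter

overruns = Counter()
with open("kernel.log", errors="replace") as f:
    for line in f:
        if "overrun error" in line:
            day = " ".join(line.split()[:2])  # e.g. "May 01"
            overruns[day] += 1

for day, count in sorted(overruns.items()):
    print(f"{day}: {count} rx overruns")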
dstsui wrote:
I disabled BTRFS on all shares and with no background process running, I got the following test results on one of the Shares:
You must mean bitrot, since it is impossible to disable btrfs (it is the file system).
- dstsui · May 01, 2021 · Aspirant
StephenB wrote:IMO you should be careful to tackle one issue at a time. So wait to address the speed issue until you have a full RAID-1 volume back.
I also suggest making a backup of your files before you add the replacement disk back. The rebuild of the RAID will stress the disk 2 (every sector is read and copied to the replacement to create the mirror). If the disk 2 fails during the process, you will lose your data.
I already made a backup and it is now going through the array rebuild process. Fingers crossed.
I will certainly try your other suggestions. I am not sure I have the appetite to start from scratch and reinstall the OS; too much work. If I can get to, say, 50 MB/s write and 40 MB/s read, I will be happy.
- StephenB · May 01, 2021 · Guru - Experienced User
dstsui wrote:
I am not sure I have the appetite to start from scratch and reinstall the OS; too much work. If I can get to, say, 50 MB/s write and 40 MB/s read, I will be happy.
FWIW, I did it fairly recently on my RN202 (not for performance reasons). I had about 12 TB of data to reload.
The factory reset itself doesn't take that much of your time, unless you have a lot of apps you need to track down and reinstall. Reloading the data takes more time, but you really don't have that much. The main issue for me is that the NAS isn't available until the process completes. Though the RN202 is exclusively for backup, so it's not my main NAS.
- dstsui · May 01, 2021 · Aspirant
I backed up the contents of disk 2 before inserting a new disk (quick-tested with WD Dashboard) into bay 1 at 1:26pm. The system completed the rebuild at 8:25pm. Should I be concerned about this message in the system log?
snapperd[12524]: loading 2227 failed
This appears regularly in the log, not just after the rebuild. Nothing of significance appears in the kernel log (second window) during or after the rebuild.
I initiated a Disk Test after the rebuild completed and am now awaiting results. Is it worth taking Disk 2 out and running an extended diagnostic test with WD Dashboard just to be safe? Do you think I should replace Disk 2 as well even though it may pass the test?
May 01 20:25:49 nas-36-1C-14 mdadm[1926]: RebuildFinished event detected on md device /dev/md127, component device recovery
May 01 20:25:50 nas-36-1C-14 mdadm[1926]: SpareActive event detected on md device /dev/md127, component device /dev/sdc3
May 01 20:28:59 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000579 sec
May 01 20:46:04 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000303 sec
May 01 21:03:09 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000041 sec
May 01 21:06:26 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 21:06:26 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 21:06:26 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:26 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:27 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:27 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:27 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:17:01 nas-36-1C-14 CRON[13015]: pam_unix(cron:session): session opened for user root by (uid=0)
May 01 21:17:01 nas-36-1C-14 CRON[13016]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 01 21:17:01 nas-36-1C-14 CRON[13015]: pam_unix(cron:session): session closed for user root
May 01 21:20:14 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000513 sec
May 01 21:22:53 nas-36-1C-14 apache2[10157]: [:error] [pid 10157] [client 192.168.1.20:55549] mod_csrf(020): request denied, expired id (12 sec), action=deny, id=-, referer: https://192.168.1.29/admin/index.html
May 01 21:22:53 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "POST /dbbroker HTTP/1.1" 403
May 01 21:22:53 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "GET /admin/index.html HTTP/1.1" 401
May 01 21:22:53 nas-36-1C-14 apache2[10156]: pam_unix(frontview:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=192.168.1.20 user=admin
May 01 21:22:57 nas-36-1C-14 apache2[10380]: [auth_basic:error] [pid 10380] [client 192.168.1.20:55552] AH01617: user admin: authentication failure for "/admin/index.html": Password Mismatch, referer: https://192.168.1.29/admin/index.html
May 01 21:37:19 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000147 sec
May 01 21:54:24 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000103 sec
May 01 22:06:33 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 22:06:33 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:11:29 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000351 sec
May 01 22:11:54 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000969 sec
May 01 22:17:02 nas-36-1C-14 CRON[14952]: pam_unix(cron:session): session opened for user root by (uid=0)
May 01 22:17:02 nas-36-1C-14 CRON[14953]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 01 22:17:02 nas-36-1C-14 CRON[14952]: pam_unix(cron:session): session closed for user root
May 01 22:22:35 nas-36-1C-14 apache_access[10155]: Admin session from 192.168.1.20 ended after 2:09:34
May 01 22:28:58 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000615 sec
May 01 22:46:03 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000185 sec
May 01 23:03:08 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000433 sec
May 01 23:06:40 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 23:06:40 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:17:01 nas-36-1C-14 CRON[16785]: pam_unix(cron:session): session opened for user root by (uid=0)
May 01 23:17:01 nas-36-1C-14 CRON[16786]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 01 23:17:01 nas-36-1C-14 CRON[16785]: pam_unix(cron:session): session closed for user root
May 01 23:20:13 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000199 sec
May 01 23:37:18 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000127 sec
May 01 23:44:33 nas-36-1C-14 apache_access[10155]: Admin session started from 192.168.1.20
May 01 23:44:37 nas-36-1C-14 apache_access[10155]: Suppressed 2 duplicate messages
May 01 23:44:37 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "POST /dbbroker HTTP/1.1" 401
May 01 23:45:55 nas-36-1C-14 apache_access[10155]: Suppressed 2 duplicate messages
May 01 23:45:55 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "GET /ispdb.xml?_dc=1619878554438 HTTP/1.1" 200
May 01 23:52:27 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 23:52:28 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:29 nas-36-1C-14 rnutil[18439]: Failed to get tier from /dev/md0 : 25:
May 01 13:25:57 nas-36-1C-14 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4250000 action 0xe frozen
May 01 13:25:57 nas-36-1C-14 kernel: ata1: irq_stat 0x80000040, connection status changed
May 01 13:25:57 nas-36-1C-14 kernel: ata1: SError: { PHYRdyChg CommWake BadCRC DevExch }
May 01 13:25:57 nas-36-1C-14 kernel: ata1: hard resetting link
May 01 13:26:05 nas-36-1C-14 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 01 13:26:06 nas-36-1C-14 kernel: ata1.00: ATA-9: WDC WD20EFRX-68EUZN0, 82.00A82, max UDMA/133
May 01 13:26:06 nas-36-1C-14 kernel: ata1.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
May 01 13:26:06 nas-36-1C-14 kernel: ata1.00: configured for UDMA/133
May 01 13:26:06 nas-36-1C-14 kernel: ata1: EH complete
May 01 13:26:06 nas-36-1C-14 kernel: scsi 0:0:0:0: Direct-Access ATA WDC WD20EFRX-68E 0A82 PQ: 0 ANSI: 5
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] 4096-byte physical blocks
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Write Protect is off
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Mode Sense: 00 3a 00 00
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: Attached scsi generic sg3 type 0
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Attached SCSI disk
May 01 13:26:08 nas-36-1C-14 kernel: sdc: sdc1
May 01 13:26:09 nas-36-1C-14 kernel: md: bind<sdc1>
May 01 13:26:09 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:26:09 nas-36-1C-14 kernel: --- wd:1 rd:2
May 01 13:26:09 nas-36-1C-14 kernel: disk 0, wo:1, o:1, dev:sdc1
May 01 13:26:09 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda1
May 01 13:26:09 nas-36-1C-14 kernel: md: recovery of RAID array md0
May 01 13:26:09 nas-36-1C-14 kernel: md: minimum _guaranteed_ speed: 30000 KB/sec/disk.
May 01 13:26:09 nas-36-1C-14 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
May 01 13:26:09 nas-36-1C-14 kernel: md: using 128k window, over a total of 4190208k.
May 01 13:26:14 nas-36-1C-14 kernel: md: bind<sdc3>
May 01 13:26:14 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:26:14 nas-36-1C-14 kernel: --- wd:1 rd:2
May 01 13:26:14 nas-36-1C-14 kernel: disk 0, wo:1, o:1, dev:sdc3
May 01 13:26:14 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda3
May 01 13:26:14 nas-36-1C-14 kernel: md: recovery of RAID array md127
May 01 13:26:14 nas-36-1C-14 kernel: md: minimum _guaranteed_ speed: 30000 KB/sec/disk.
May 01 13:26:14 nas-36-1C-14 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
May 01 13:26:14 nas-36-1C-14 kernel: md: using 128k window, over a total of 1948662784k.
May 01 13:26:16 nas-36-1C-14 kernel: md1: detected capacity change from 535822336 to 0
May 01 13:26:16 nas-36-1C-14 kernel: md: md1 stopped.
May 01 13:26:16 nas-36-1C-14 kernel: md: unbind<sda2>
May 01 13:26:16 nas-36-1C-14 kernel: md: export_rdev(sda2)
May 01 13:26:17 nas-36-1C-14 kernel: md: bind<sda2>
May 01 13:26:17 nas-36-1C-14 kernel: md: bind<sdc2>
May 01 13:26:17 nas-36-1C-14 kernel: md/raid1:md1: not clean -- starting background reconstruction
May 01 13:26:17 nas-36-1C-14 kernel: md/raid1:md1: active with 2 out of 2 mirrors
May 01 13:26:17 nas-36-1C-14 kernel: md1: detected capacity change from 0 to 535822336
May 01 13:26:17 nas-36-1C-14 kernel: md: resync of RAID array md1
May 01 13:26:17 nas-36-1C-14 kernel: md: minimum _guaranteed_ speed: 30000 KB/sec/disk.
May 01 13:26:17 nas-36-1C-14 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
May 01 13:26:17 nas-36-1C-14 kernel: md: using 128k window, over a total of 523264k.
May 01 13:26:20 nas-36-1C-14 kernel: Adding 523260k swap on /dev/md1. Priority:-1 extents:1 across:523260k
May 01 13:26:41 nas-36-1C-14 kernel: md: md1: resync done.
May 01 13:26:42 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:26:42 nas-36-1C-14 kernel: --- wd:2 rd:2
May 01 13:26:42 nas-36-1C-14 kernel: disk 0, wo:0, o:1, dev:sda2
May 01 13:26:42 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sdc2
May 01 13:28:20 nas-36-1C-14 kernel: md: md0: recovery done.
May 01 13:28:20 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:28:20 nas-36-1C-14 kernel: --- wd:2 rd:2
May 01 13:28:20 nas-36-1C-14 kernel: disk 0, wo:0, o:1, dev:sdc1
May 01 13:28:20 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda1
May 01 20:25:48 nas-36-1C-14 kernel: md: md127: recovery done.
May 01 20:25:48 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 20:25:48 nas-36-1C-14 kernel: --- wd:2 rd:2
May 01 20:25:48 nas-36-1C-14 kernel: disk 0, wo:0, o:1, dev:sdc3
May 01 20:25:48 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda3
- StephenB · May 01, 2021 · Guru - Experienced User
dstsui wrote:
Is it worth taking Disk 2 out and running an extended diagnostic test with WD Dashboard just to be safe? Do you think I should replace Disk 2 as well even though it may pass the test?
The disk test will run a full read test of both disks, so there is no need to do anything with the Dashboard program.
Personally I leave my disks in place until I see some evidence that they are misbehaving. So I'd leave disk 2 alone for now.
dstsui wrote:
Should I be concerned about this message in the system log?
snapperd[12524]: loading 2227 failed
That is a bit concerning; it suggests something isn't quite right with the file system (it indicates that a snapshot can't be loaded).
Are you using the "smart" snapshots? I don't like them myself, because some of them are retained forever (and the disk space they need gradually goes up). I use custom snapshots, and explicitly set retention. Most shares are set to 3 months.
Anyway, I suggest deleting all your current snapshots from the web ui. After that try doing a scrub from the volume settings wheel after the disk test completes. Then see what file system errors show up in the log as a result of the scrub.
Or do the factory default, which will obviously take some time, but will also give you a completely clean file system without needing to do more analysis and repair.
- dstsui · May 01, 2021 · Aspirant
The disk test finished without drama. Where can I see the results?
Some of the oldest snapshots date back to Jan 2015. I guess it is safe to delete Shares older than, say, one year. I will run a Scrub after I have deleted the old shares.
- dstsui · May 02, 2021 · Aspirant
I noticed something interesting in the snapshot history. Before March, snapshots were taken once a month, but since March they have been taken daily. The oldest snapshots date back to when I set up the system in Jan 2015; shouldn't they have been purged already?
Tonight I copied 489 images from an SD card on my PC to the RN102, and it clocked a whopping 10 MB/s. At least it finished without crashing, but something has to be done. I wonder whether all these legacy snapshots are weighing it down.
- mdgm · May 02, 2021 · Virtuoso
Does your system show how much space is consumed by snapshots? If so, you have quotas enabled on your data volume, and disabling that may help.
By default, monthly snapshots are kept indefinitely unless there is limited free space on the data volume.
A custom snapshot schedule only keeping a more limited number of snapshots would likely lead to better performance too as StephenB mentioned.
Note that if you delete a snapshot all newer snapshots are recursively updated. This can take a while to complete so performance would get worse before it improves.
- StephenB · May 03, 2021 · Guru - Experienced User
mdgm wrote:
By default, monthly snapshots are kept indefinitely unless there is limited free space on the data volume.
A custom snapshot schedule only keeping a more limited number of snapshots would likely lead to better performance too as StephenB mentioned.
Note that if you delete a snapshot all newer snapshots are recursively updated. This can take a while to complete so performance would get worse before it improves.
Also, your error is saying that one particular snapshot is somehow damaged and can't be read. We don't know how old that is - and you might not be able to delete it. So there are multiple issues here - the bad disk was one, but the error with the snapshot suggests some file system corruption.
It's possible that the poor performance will continue after you get rid of the ancient snapshots, but even if it does, you need to address the file system corruption.
Overall, the direct way is to do the factory default, set up the NAS again, and restore the files from backup. Though painful, that is guaranteed to clean up everything.
But if you really can't do that, then look for more file system errors after you delete the old snapshots, and look again after the scrub. If they are gone, follow up with a balance (also from the volume settings wheel). Then measure performance again.
- dstsui · May 03, 2021 · Aspirant
I could delete all Shares without issues. I started a disk scrub last night, and when I checked the status this morning it had only progressed 2.5%; if the rate is constant, the scrub won't finish for another 13 days. Is it meant to be that slow?
- StephenB · May 04, 2021 · Guru - Experienced User
dstsui wrote:
I started a disk scrub last night, and when I checked the status this morning it had only progressed 2.5%; if the rate is constant, the scrub won't finish for another 13 days. Is it meant to be that slow?
The rate might not be constant, but it is slower than I'd expect.
I did a scrub on my RN102 (which has a 2x1TB volume) about a month ago, and it took about 5 1/2 hours. So I'd expect about 11-12 hours for 2x2TB.
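That estimate is just the 2x1TB timing scaled, and the overnight 2.5% figure can be turned into an ETA the same way (the elapsed time used here is an assumption of roughly eight hours):
# Rough arithmetic only: scale the 2x1TB scrub time, and project an ETA
# from the observed progress. The 8-hour elapsed figure is assumed.
base_hours, base_tb = 5.5, 1.0
print(f"expected for a 2 TB mirror: ~{base_hours * 2.0 / base_tb:.0f} hours")

elapsed_hours, progress = 8.0, 0.025
print(f"at the observed rate: ~{elapsed_hours / progress / 24:.0f} days total")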
The scrub is doing two different operations in parallel - one is a RAID scrub (which is the same as a resync), and the second is a BTRFS scrub. The BTRFS scrub is what we're aiming for here - it might take longer because you've never run one before.
In any event, I'd wait for it to complete. You could download a fresh set of logs, and see if there are any new disk errors (or btrfs errors).
dstsui wrote:
I could delete all Shares without issues.
Do you mean Snapshots, or did you delete the Shares?
- dstsui · May 04, 2021 · Aspirant
The scrub finished at 1:11 this morning; it took 25 hrs to complete. Snapper failures still appear in the system log after the rebuild, repeating every 43 mins. Is it possible to nail down the exact cause of this problem? No anomalies are observed in the kernel log. I observed something odd in the reported consumed size of the Transmission share but did not report it previously. I wonder whether this is an OS bug or somehow related to the Snapper loading failure. All snapshots were deleted before the scrub began.
I ran NASTester again; the speed is now more reasonable: 54 MB/s write and 36 MB/s read.
May 05 01:10:57 nas-36-1C-14 mdadm[1926]: RebuildFinished event detected on md device /dev/md127, component device resync
May 05 01:17:01 nas-36-1C-14 CRON[24684]: pam_unix(cron:session): session opened for user root by (uid=0)
May 05 01:17:01 nas-36-1C-14 CRON[24685]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 05 01:17:01 nas-36-1C-14 CRON[24684]: pam_unix(cron:session): session closed for user root
May 05 01:17:18 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 05 01:17:18 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 05 01:17:19 nas-36-1C-14 snapperd[24713]: loading 2227 failed
May 05 01:17:19 nas-36-1C-14 snapperd[24713]: loading 2227 failed
May 05 01:17:19 nas-36-1C-14 snapperd[24713]: loading 2227 failed
May 05 01:17:19 nas-36-1C-14 snapperd[24713]: loading 2227 failed
May 05 01:17:19 nas-36-1C-14 snapperd[24713]: loading 2227 failed
May 05 01:20:14 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000153 sec
May 05 01:37:19 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000953 sec
May 05 01:54:24 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000544 sec
May 05 02:00:02 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 05 02:00:02 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 05 02:00:02 nas-36-1C-14 snapperd[26740]: loading 2227 failed
May 05 02:00:02 nas-36-1C-14 snapperd[26740]: loading 2227 failed
May 05 02:00:02 nas-36-1C-14 snapperd[26740]: loading 2227 failed
May 05 02:00:02 nas-36-1C-14 snapperd[26740]: loading 2227 failed
May 05 02:00:03 nas-36-1C-14 snapperd[26740]: loading 2227 failed
NAS performance tester 1.7 http://www.808.dk/?nastester
Running warmup...
Running a 400MB file write on R: 5 times...
Iteration 1: 56.72 MB/sec
Iteration 2: 57.52 MB/sec
Iteration 3: 54.34 MB/sec
Iteration 4: 56.85 MB/sec
Iteration 5: 46.13 MB/sec
-----------------------------
Average (W): 54.31 MB/sec
-----------------------------
Running a 400MB file read on R: 5 times...
Iteration 1: 36.15 MB/sec
Iteration 2: 35.40 MB/sec
Iteration 3: 36.30 MB/sec
Iteration 4: 36.99 MB/sec
Iteration 5: 36.05 MB/sec
-----------------------------
Average (R): 36.18 MB/sec
-----------------------------
- dstsui · May 04, 2021 · Aspirant
I was never able to get an uploaded image to appear in a post, regardless of the file format. What did I do wrong?
- StephenB · May 05, 2021 · Guru - Experienced User
dstsui wrote:
I observed something odd in the reported consumed size of the Transmission share but did not report it previously.
Disable volume quota (on the volume settings wheel), and then re-enable it. That should resolve it.
dstsui wrote:
Is it possible to nail down the exact cause of this problem?
The problem is that there is still at least one snapshot in the system, and it is corrupted and cannot be loaded. The simplest way to resolve it is to do a factory default and rebuild the file system (as I've already recommended). Another consideration is that there could be other file system issues lurking under the surface that might not be so visible.
The subvolumes could be listed via ssh, and you could try manually deleting them. But if they are corrupted, the deletion will probably fail.
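For reference, a minimal sketch of that listing step (assuming root SSH access and the usual ReadyNAS OS6 data-volume mount point of /data; the delete path in the comment is illustrative and should be taken from the list output):
# Sketch: list BTRFS subvolumes (shares and snapshots) on the data volume.
# Assumes root SSH access and that the volume is mounted at /data.
import subprocess

result = subprocess.run(
    ["btrfs", "subvolume", "list", "/data"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.splitlines():
    print(line)

# Deleting one (use with care, and only after checking the list output):
#   btrfs subvolume delete /data/<path-from-the-list-output>
As noted above, if the snapshot is corrupted the delete may still fail.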
dstsui wrote:
I was never able to get an uploaded image to appear in a post, regardless of the file format.
All embedded images are manually reviewed and approved before they are shown. That can take a while.
- dstsui · May 05, 2021 · Aspirant
I am wondering what the best practice is for setting up backups, as in how to organise the folders. I attached a 2TB 2.5" HDD to the bottom USB port at the back. The OS mapped it as a media share and created a folder called USB_HDD_1.
When I set up backup jobs for the first time, I created a separate job for each share and selected a share on the data volume as the source, and selected as the destination USB Back Bottom in the folder USB/eSATA. When the backup job was run, the result was that all the sub-folders in the share were dumped at the root of the backup drive; the top-level share folder was not copied across. The same happened to all the other shares, so there was no hierarchy to speak of in the backup. Not happy with the results, I then created separate folders on the backup drive mirroring the names of the shares on the data volume and modified the paths in each job accordingly, assuming the files would be copied to the specified path. When the backup was run again, the result was the same: the sub-folders were copied to the root of the backup drive and not to the specified folder. It was only when I specified the media share USB_HDD_1 as the destination, with the appropriate path, that the share folders were copied to the correct folder on the backup drive.
Is there a right way and a wrong way to set up backups? Is there any problem backing up to a USB HDD mapped as a share? If I take the USB HDD out and connect it to a Windows PC, can the PC read the backup files?
- StephenB · May 05, 2021 · Guru - Experienced User
One question for you is what behavior you want/expect when a file is renamed or deleted?
Your current settings will keep deleted files on the backup drive - and if you rename a file, both the original file and the renamed one will be on the backup drive. There is a trick you can use to get an exact mirror of the share if you want that.
The tradeoff here is that keeping deleted files lets you recover from user error. But it also makes it harder to restore the backup if you need to reload everything onto the NAS (as you'll need to manually remove the files you no longer want).
dstsui wrote:
If I take the USB HDD out and connect it to a Windows PC, can the PC read the backup files?
If the USB HDD is formatted as NTFS, then the PC can read it. If you aren't sure of the formatting, eject it (using the web ui interface), connect it to the PC and check.
dstsui wrote:
Is there any problem backing up to a USB HDD mapped as a share?
That should be fine
dstsui wrote:
It was only when I specified the media share USB_HDD_1 as the destination, with the appropriate path, that the share folders were copied to the correct folder on the backup drive.
Which is the right way to do it.
dstsui wrote:
When I set up backup jobs for the first time, I created a separate job for each share
Some people prefer to use one backup job for the entire data volume. Personally I prefer having one backup job per share.
One reason I like multiple backup jobs is that it is a bit easier to handle the case where the full backup won't fit on one USB drive. Also, you might have some shares you don't need to back up (for instance, a Transmission download share).