Forum Discussion
dstsui
Apr 28, 2021 Aspirant
RN102 broken after OS 6.10.4 update
My RN102 appears to be broken after updating to 6.10.4. Not sure which came first; I updated the OS to 6.10.4 and noticed very slow transfer speeds when I copied hundreds of JPEG images from the SD car...
StephenB
Apr 29, 2021 Guru - Experienced User
Lowest risk would be to add a step to back up the data on the NAS (for instance, to a USB drive). This could be done before step 1, but it will likely go faster between steps 2 and 3.
dstsui wrote:
Please confirm the following action plan is sound with low risk:
- Remove Disk 1 - should I remove the disk when the NAS is live or is it safer to remove it when it is powered down?
It doesn't really matter - but you probably should confirm that the NAS boots normally with only disk 2 in place. So maybe power down.
dstsui wrote:
3. Power down the NAS and insert Disk 1 with a new HDD into the bay.
No need to power down here. Hot-insertion is fine.
Generally I test new disks before I add them to the NAS. What I do is test them with vendor diags in a Windows PC - running the full non-destructive "long" test, followed by a full erase/write zeros test. I've received some new disks that pass one of those tests but fail the other.
dstsui wrote:
Some HDD docks have two disk slots and can do disk cloning. Is it okay to clone a disk from Disk 2 using the dock, or is it better to leave this to the NAS?
I wouldn't. If you run the NAS with only disk 2 in place, you still need to do the resync. Cloning is sometimes a necessary part of data recovery - but if there are sectors that can't be read, there will be some garbage on the clone. Since there are no read errors on the new good disk, the NAS has no way to tell which sectors are garbage.
dstsui wrote:
How come the attached disk_info log shows no disk errors but the kernel log logged disk errors? Does Channel 0 correspond to Disk 1?
That is an excellent question. You'd think it would show up in the SMART stats. But I've seen this before, so I know errors don't always end up in those stats.
SMART stats are kept by the disk itself - the NAS just asks the disk for them. So this is either a limitation in SMART itself, or something not quite right in the disk's statistics collection.
The disk is over 6 years old, so you have gotten your money's worth out of it.
dstsui
Apr 29, 2021 Aspirant
The Volume log has entries for both RAID 0 and RAID 1. What does that mean? My understanding is that RAID 0 offers no redundancy and both disks are required to operate. If disk 1 is removed, does that mean the system will fail to boot with only disk 2?
- mdgm Apr 29, 2021 Virtuoso
mdstat.log shows info on the RAID levels more concisely.
md0 is the 4GB root volume and uses RAID-1. The system will still boot if a disk is removed; however, any RAID-0 data volume using that disk won't be able to be mounted, as the RAID array won't be able to be started.
- dstsui Apr 29, 2021 Aspirant
Based on the logs, do you think the data in disk 1 is recoverable using WD disk utility? Can you tell which Share will be affected by the loss of disk 1?
- dstsui Apr 29, 2021 Aspirant
The following is extracted from mdstat:
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md127 : active raid1 sda3[0] sdb3[1]
      1948662784 blocks super 1.2 [2/2] [UU]
      bitmap: 0/15 pages [0KB], 65536KB chunk
md1 : active raid1 sda2[0] sdb2[1]
      523712 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      4190208 blocks super 1.2 [2/2] [UU]
If md127 is the data volume (size appears correct), it is RAID1. RAID0 does not appear in the log.
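The RAID level of each array can be read mechanically from output like this. A minimal sketch in Python, assuming mdstat-style text as input (the parser is my own illustration, not a ReadyNAS tool):

```python
import re

def raid_levels(mdstat_text):
    """Map each md device in /proc/mdstat-style output to its RAID personality."""
    levels = {}
    for line in mdstat_text.splitlines():
        # Array lines look like: "md127 : active raid1 sda3[0] sdb3[1]"
        m = re.match(r"(md\d+) : active (\w+)", line.strip())
        if m:
            levels[m.group(1)] = m.group(2)
    return levels

sample = """\
md127 : active raid1 sda3[0] sdb3[1]
md1 : active raid1 sda2[0] sdb2[1]
md0 : active raid1 sda1[0] sdb1[1]
"""
print(raid_levels(sample))  # {'md127': 'raid1', 'md1': 'raid1', 'md0': 'raid1'}
```

Run against the three array lines above, it reports raid1 for every array, matching the conclusion that no RAID-0 array is present.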
- dstsui Apr 29, 2021 Aspirant
/dev/md/data-0:
        Version : 1.2
  Creation Time : Sun Jan 11 06:16:09 2015
     Raid Level : raid1
     Array Size : 1948662784 (1858.39 GiB 1995.43 GB)
  Used Dev Size : 1948662784 (1858.39 GiB 1995.43 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent
  Intent Bitmap : Internal
    Update Time : Thu Apr 29 07:28:54 2021
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
Consistency Policy : unknown
           Name : 0e361c14:data-0 (local to host 0e361c14)
           UUID : 93079119:a9de47f1:0351ef69:148df03c
         Events : 5138

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
If md127 is indeed the data volume, then everything looks good to me.
- mdgm Apr 30, 2021 Virtuoso
Yes, md127 is the data volume and in your case it's using RAID-1.
If there were multiple volumes or multiple RAID layers for a volume then you'd see other RAID arrays like md126 etc.
md0 is the 4GB root volume
md1 is the swap
Note that with RAID-1 you can still have issues like filesystem corruption, accidental file deletion etc.
RAID doesn't replace the need for backups. No important data should be stored on just the one device.
- StephenB Apr 30, 2021 Guru - Experienced User
dstsui wrote:
Based on the logs, do you think the data in disk 1 is recoverable using WD disk utility?
I don't recommend trying that. In my experience such "recovery" is more of a bandaid than a cure. When I've tried it, the disk very quickly starts throwing errors again.
Another aspect is that you are experiencing read errors. The disk utility can't preserve the data if it can't read the disk. So the volume would still need to be resynced, otherwise you won't have a proper mirror.
A replacement drive would cost ~$80 (current USD pricing). IMO attempting a "repair" isn't worth the risk.
But you are jumping ahead. The next step is to test the disk. I suspect it will fail.
dstsui wrote:
Can you tell which Share will be affected by the loss of disk 1?
Your volume is RAID-1 (mirrored disks), so there should be no data loss. All shares will lose redundancy when you remove disk 1 (since there will be no mirror). The volume status will change to "degraded" to reflect that.
But you've actually already lost some redundancy, due to the read errors. The logs show that the system is relying on disk 2 to get your data (the log shows the read for disk 1 failing, and the RAID system failing over to read from disk 2). If disk 2 were to fail, there would be at least some data loss.
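The failover behavior described above can be illustrated with a toy model of a two-way mirror (purely illustrative; real md RAID-1 works at the block layer, not like this):

```python
def mirror_read(sector, disks):
    """Try each mirror in turn; return data from the first disk that can read the sector."""
    for name, disk in disks.items():
        data = disk.get(sector)  # None models an unreadable (bad) sector
        if data is not None:
            return name, data
    raise IOError(f"sector {sector} unreadable on all mirrors")

# disk1 has a bad sector 7; disk2 still holds a good copy, so the read succeeds
disk1 = {5: b"A", 7: None}
disk2 = {5: b"A", 7: b"B"}
print(mirror_read(7, {"disk1": disk1, "disk2": disk2}))  # ('disk2', b'B')
```

This is the situation in the logs: reads from disk 1 fail, the RAID layer falls back to disk 2, and data loss only occurs if disk 2 also fails.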
dstsui wrote:
The Volume log has entries for both RAID 0 and RAID 1.
FYI, you are misreading the log. Volume.log simply numbers the RAID arrays starting from 0. So that entry isn't telling you the RAID mode - it's just a label. A few lines after that label you should see
Level: 1
That tells you the RAID mode is RAID-1.
- dstsui Apr 30, 2021 Aspirant
Removed disk 1 after powering down the NAS, and it boots up fine with just disk 2.
I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25MB/s. The remaining Shares with BitRot protection disabled measured 36MB/s on average. Is this speed normal at all? I think it is under par even with BitRot protection off. If so, what else could be wrong? No problem reading or writing multiple files, though. Is BitRot protection worth the sacrifice in speed? 25MB/s is a problem when I access photos in my photo management software.
I will test disk 1 tomorrow.
- StephenB Apr 30, 2021 Guru - Experienced User
dstsui wrote:
I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25MB/s. The remaining Shares with BitRot protection disabled measured 36MB/s on average. Is this speed normal at all?
Both speeds sound low to me. Are you testing over wifi or gigabit ethernet? On gigabit you can get ~70 MB/s read and ~50 MB/s write.
There were some improvements made to the BTRFS setup some years ago. But to take advantage of them you'd need to do a factory reset, rebuild the NAS, and restore all the data from backup. I tested that some time ago, and it did make a significant difference on my RN102.
dstsui wrote:
Is BitRot protection worth the sacrifice in speed?
Up to you really. I do leave it on my main NAS (an RN526x), but I've never had bitrot protection kick in and recover a file. But it's not an RN100 series, and even with bitrot on, it can saturate a fast gigabit connection.
- dstsui Apr 30, 2021 Aspirant
StephenB wrote:
dstsui wrote:
I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25MB/s. The remaining Shares with BitRot protection disabled measured 36MB/s on average. Is this speed normal at all?
Both speeds sound low to me. Are you testing over wifi or gigabit ethernet? On gigabit you can get ~70 MB/s read and ~50 MB/s write.
Testing was done over GigE. I confirmed the LAN hardware is sound by using a direct Cat 6e cable connection (no router or switch) between the NAS and PC, and verified with another PC over 5GHz WiFi; the speeds in both tests were about the same. It was somewhat lower over WiFi, but this was expected. I concluded the common element is the NAS, which is the likely culprit. What issues could possibly drag the speed down by that much? Are there any clues in the logs?
A backup of the NAS was run overnight and completed without issues.
I ran the WD Dashboard Diagnostic Extended Test on disk 1 and it failed with error code 2. Don't know what that means.
I will install a new disk into the NAS today after checking it with WD Dashboard.
- dstsui Apr 30, 2021 Aspirant
StephenB wrote:
dstsui wrote:
I ran NASTester on all Shares. The Share with BitRot protection enabled measured 25MB/s. The remaining Shares with BitRot protection disabled measured 36MB/s on average. Is this speed normal at all?
There were some improvements made to the BTRFS setup some years ago. But to take advantage of them you'd need to do a factory reset, rebuild the NAS, and restore all the data from backup. I tested that some time ago, and it did make a significant difference on my RN102.
When was BTRFS introduced? I have had the RN102 for 6 years; I should have checked the speed when I first installed it, and I definitely would not have accepted it if the speed was this slow back then, since it is not a lot better than my old ReadyNAS Duo. BTRFS is unlikely to be the real issue. We are talking about a 60% reduction in speed, so something fundamental must be wrong.
- dstsui Apr 30, 2021 Aspirant
I disabled BTRFS on all shares and with no background process running, I got the following test results on one of the Shares:
NAS performance tester 1.7 http://www.808.dk/?nastester
Running warmup...
Running a 400MB file write on R: 5 times...
Iteration 1: 52.86 MB/sec
Iteration 2: 58.10 MB/sec
Iteration 3: 56.61 MB/sec
Iteration 4: 52.46 MB/sec
Iteration 5: 58.37 MB/sec
-----------------------------
Average (W): 55.68 MB/sec
-----------------------------
Running a 400MB file read on R: 5 times...
Iteration 1: 33.95 MB/sec
Iteration 2: 34.29 MB/sec
Iteration 3: 30.59 MB/sec
Iteration 4: 29.39 MB/sec
Iteration 5: 33.50 MB/sec
-----------------------------
Average (R): 32.34 MB/sec
The write speed has improved significantly but the read speed has not improved. Any clues?
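As a sanity check, the averages that NAS performance tester prints are just the arithmetic mean of the five iterations:

```python
# Iteration results from the test run above (MB/sec)
write = [52.86, 58.10, 56.61, 52.46, 58.37]
read = [33.95, 34.29, 30.59, 29.39, 33.50]

def avg(samples):
    """Mean of the samples, rounded to two decimals as the tool reports it."""
    return round(sum(samples) / len(samples), 2)

print(avg(write))  # 55.68
print(avg(read))   # 32.34
```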
- dstsui Apr 30, 2021 Aspirant
dstsui wrote:
I disabled BTRFS on all shares and with no background process running, I got the following test results on one of the Shares:
NAS performance tester 1.7 http://www.808.dk/?nastester
Running warmup...
Running a 400MB file write on R: 5 times...
Iteration 1: 52.86 MB/sec
Iteration 2: 58.10 MB/sec
Iteration 3: 56.61 MB/sec
Iteration 4: 52.46 MB/sec
Iteration 5: 58.37 MB/sec
-----------------------------
Average (W): 55.68 MB/sec
-----------------------------
Running a 400MB file read on R: 5 times...
Iteration 1: 33.95 MB/sec
Iteration 2: 34.29 MB/sec
Iteration 3: 30.59 MB/sec
Iteration 4: 29.39 MB/sec
Iteration 5: 33.50 MB/sec
-----------------------------
Average (R): 32.34 MB/sec
The write speed has improved significantly but the read speed has not improved. Any clues?
These results above were obtained with Flow Control on the PC NIC disabled. With Flow Control (Tx and Rx) enabled, I got the following results:
Running warmup...
Running a 400MB file write on R: 5 times...
Iteration 1: 30.75 MB/sec
Iteration 2: 30.22 MB/sec
Iteration 3: 30.86 MB/sec
Iteration 4: 31.57 MB/sec
Iteration 5: 30.84 MB/sec
-----------------------------
Average (W): 30.85 MB/sec
-----------------------------
Running a 400MB file read on R: 5 times...
Iteration 1: 33.76 MB/sec
Iteration 2: 34.52 MB/sec
Iteration 3: 34.29 MB/sec
Iteration 4: 33.39 MB/sec
Iteration 5: 33.89 MB/sec
-----------------------------
Average (R): 33.97 MB/sec
I also spotted the following entries in the kernel log around the time when I ran the first test with flow control off:
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=1024
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=512
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=128
May 01 12:23:58 nas-36-1C-14 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
- mdgm May 01, 2021 Virtuoso
What does your initrd.log look like in the logs zip? That will show the firmware update history since the last factory reset of your system.
BTRFS has always been used as the filesystem for the data volume for OS6 and the filesystem for the root volume on the x86 NAS units (the 102 uses EXT4 for the root volume).
Bit-rot protection was added back in 6.2.x.
If you ever used bit-rot protection or snapshots, disabling bit-rot protection isn't going to undo all the CoW that has already happened. It will just limit what happens going forward.
There are things you could check, like CPU usage and whether the system is using a lot of swap. Also turn off services and apps that you don't need.
Over time, as updates are released, software may consume more resources, and the RN100 series is the least powerful of the systems that run OS6, so any performance hit will be most noticeable on those units.
If you have a bad disk, that can impact performance. Running with only one disk can also impact performance, and if your NAS is resyncing the RAID after adding a replacement disk, performance will be reduced until the resync completes.
- StephenB May 01, 2021 Guru - Experienced User
IMO you should be careful to tackle one issue at a time. So wait to address the speed issue until you have a full RAID-1 volume back.
I also suggest making a backup of your files before you add the replacement disk back. The rebuild of the RAID will stress disk 2 (every sector is read and copied to the replacement to create the mirror). If disk 2 fails during the process, you will lose your data.
dstsui wrote:
When was BTRFS introduced?
Your NAS has always used BTRFS.
My own RN102 goes back to 2013. By 2016 the read speed had dropped to about 35 MB/sec. This was using jbod volumes (so bitrot protection was off, since it needs RAID redundancy). That speed sounds very similar to yours.
A factory reset restored the read speed to around 70 MB/s. I posted the results here, but it was a long time ago, and search isn't finding them. I did another benchmark in 2019, checking all three versions of SMB. The second test was using different disks, and the NAS was set up for XRAID/RAID-1 - not jbod. SMB 2.1 speeds were still around 70 MB/s, though SMB 3 was somewhat slower (63 MB/sec). https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/Average-transfer-speed-read-and-write-plummeted-after-replacing/m-p/1718589#M178329
So based on my own experience, a factory reset will almost certainly increase your speeds substantially. Yes, it is painful, but the performance increase is likely worth it.
Other things you can try:
- Disable strict sync in the SMB settings
- Disable IPv6 in your NAS network settings. In some cases the PC will use IPv6 link-local addresses for data transfer, and that can be slower (especially if your router doesn't have IPv6 enabled).
- Disable all services you don't actually need (and uninstall any apps you no longer use). The RN102 doesn't have much memory, and over the years the memory footprint of the firmware has increased. Removing services and apps can help mitigate that.
- Run the maintenance functions on the volume settings wheel. Balance and Scrub can both help improve performance. Personally I schedule one maintenance function per month - completing the cycle every four months (balance, defrag, scrub, disk test)
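The rotation in the last bullet can be sketched as a simple schedule generator (the task names are the volume maintenance functions listed above; the code itself is just an illustration):

```python
from itertools import cycle

# One maintenance function per month, completing the cycle every four months
TASKS = ["balance", "defrag", "scrub", "disk test"]

def schedule(months):
    """Assign one maintenance task per month, cycling through the task list."""
    tasks = cycle(TASKS)
    return [(m, next(tasks)) for m in months]

print(schedule(["May", "Jun", "Jul", "Aug", "Sep"]))
# [('May', 'balance'), ('Jun', 'defrag'), ('Jul', 'scrub'), ('Aug', 'disk test'), ('Sep', 'balance')]
```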
- StephenB May 01, 2021 Guru - Experienced User
While I get your desire to solve your original speed problem, I still suggest taking things one step at a time. Data safety comes first, performance comes second.
If I were in your position now (waiting for the new disk to be inserted and the resync to complete), I'd be making sure I had an up-to-date backup. Once that was taken care of, I'd run the disk test from the volume settings wheel to confirm that disk 2 doesn't have any issues. The disks are about the same age, and it isn't unusual for multiple disks in a RAID array to fail in rapid succession.
dstsui wrote:
I also spotted the following entries in the kernel log around the time when I ran the first test with flow control off:
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=1024
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=512
May 01 12:10:43 nas-36-1C-14 kernel: mvneta d0074000.ethernet eth0: bad rx status 0e830000 (overrun error), size=128
May 01 12:23:58 nas-36-1C-14 kernel: nr_pdflush_threads exported in /proc is scheduled for removal
RX overruns happen when the NAS isn't keeping up with the transfer requests. So it's not surprising you'd see them with flow control off, since your NAS isn't capable of transferring at full gigabit speed. The NAS uses flow control requests to pause the traffic when its receive buffers start to fill. The router will then hold the frames (and at some point will pause the traffic from the PC).
Note that no traffic will be lost due to the overruns - the NAS will request retransmission. TCP (network) speeds will drop though, as the network protocols are designed to back off speeds when any loss occurs.
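The back-off StephenB describes can be sketched with a toy additive-increase/multiplicative-decrease model, a drastic simplification of real TCP congestion control (the numbers here are arbitrary, not measurements):

```python
def aimd(events, rate=10.0, increase=1.0, decrease=0.5):
    """Track a sending rate: grow additively on success, halve on a loss
    (e.g. a packet dropped because of an RX overrun)."""
    history = []
    for ev in events:
        rate = rate * decrease if ev == "loss" else rate + increase
        history.append(rate)
    return history

# A single loss event cuts the rate in half; recovery is only gradual
print(aimd(["ok", "ok", "loss", "ok"]))  # [11.0, 12.0, 6.0, 7.0]
```

This is why the overruns themselves lose no data (retransmission recovers it) but still drag the measured throughput down.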
dstsui wrote:
I disabled BTRFS on all shares and with no background process running, I got the following test results on one of the Shares:
You must mean bitrot, since it is impossible to disable btrfs (it is the file system).
- dstsui May 01, 2021 Aspirant
StephenB wrote:
IMO you should be careful to tackle one issue at a time. So wait to address the speed issue until you have a full RAID-1 volume back.
I also suggest making a backup of your files before you add the replacement disk back. The rebuild of the RAID will stress disk 2 (every sector is read and copied to the replacement to create the mirror). If disk 2 fails during the process, you will lose your data.
I already made a backup and it is now going through the array rebuild process. Fingers crossed.
Will certainly try your other suggestions. Not sure I have the appetite to start from scratch and reinstall the OS; too much work. If I can get to, say, 50MB/s write and 40MB/s read, I will be happy.
- StephenB May 01, 2021 Guru - Experienced User
dstsui wrote:
Not sure I have the appetite to start from scratch and reinstall the OS; too much work. If I can get to, say, 50MB/s write and 40MB/s read, I will be happy.
FWIW, I did it fairly recently on my RN202 (not for performance reasons). I had about 12 TB of data to reload.
The factory reset itself doesn't take that much of your time unless you have a lot of apps you need to track down and reinstall. Reloading data takes more time, but you really don't have that much. The main issue for me is that the NAS isn't available until the process completes, though the RN202 is exclusively for backup, so it's not my main NAS.
- dstsui May 01, 2021 Aspirant
I backed up the contents of Disk 2 before inserting a new disk (quick-tested with WD Dashboard) into bay 1 at 1:26pm. The system completed the rebuild at 8:25pm. Should I be concerned about this message in the system log?
snapperd[12524]: loading 2227 failed
This appears regularly in the log, not just after the rebuild. Nothing of significance appears in the kernel log (second window) during or after the rebuild.
I initiated a Disk Test after the rebuild completed and am now awaiting results. Is it worth taking Disk 2 out and running an extended diagnostic test with WD Dashboard just to be safe? Do you think I should replace Disk 2 as well, even if it passes the test?
May 01 20:25:49 nas-36-1C-14 mdadm[1926]: RebuildFinished event detected on md device /dev/md127, component device recovery
May 01 20:25:50 nas-36-1C-14 mdadm[1926]: SpareActive event detected on md device /dev/md127, component device /dev/sdc3
May 01 20:28:59 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000579 sec
May 01 20:46:04 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000303 sec
May 01 21:03:09 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000041 sec
May 01 21:06:26 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 21:06:26 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 21:06:26 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:26 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:27 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:27 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:06:27 nas-36-1C-14 snapperd[12524]: loading 2227 failed
May 01 21:17:01 nas-36-1C-14 CRON[13015]: pam_unix(cron:session): session opened for user root by (uid=0)
May 01 21:17:01 nas-36-1C-14 CRON[13016]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 01 21:17:01 nas-36-1C-14 CRON[13015]: pam_unix(cron:session): session closed for user root
May 01 21:20:14 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000513 sec
May 01 21:22:53 nas-36-1C-14 apache2[10157]: [:error] [pid 10157] [client 192.168.1.20:55549] mod_csrf(020): request denied, expired id (12 sec), action=deny, id=-, referer: https://192.168.1.29/admin/index.html
May 01 21:22:53 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "POST /dbbroker HTTP/1.1" 403
May 01 21:22:53 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "GET /admin/index.html HTTP/1.1" 401
May 01 21:22:53 nas-36-1C-14 apache2[10156]: pam_unix(frontview:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=192.168.1.20 user=admin
May 01 21:22:57 nas-36-1C-14 apache2[10380]: [auth_basic:error] [pid 10380] [client 192.168.1.20:55552] AH01617: user admin: authentication failure for "/admin/index.html": Password Mismatch, referer: https://192.168.1.29/admin/index.html
May 01 21:37:19 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000147 sec
May 01 21:54:24 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000103 sec
May 01 22:06:33 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 22:06:33 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:06:34 nas-36-1C-14 snapperd[14319]: loading 2227 failed
May 01 22:11:29 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000351 sec
May 01 22:11:54 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000969 sec
May 01 22:17:02 nas-36-1C-14 CRON[14952]: pam_unix(cron:session): session opened for user root by (uid=0)
May 01 22:17:02 nas-36-1C-14 CRON[14953]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 01 22:17:02 nas-36-1C-14 CRON[14952]: pam_unix(cron:session): session closed for user root
May 01 22:22:35 nas-36-1C-14 apache_access[10155]: Admin session from 192.168.1.20 ended after 2:09:34
May 01 22:28:58 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000615 sec
May 01 22:46:03 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000185 sec
May 01 23:03:08 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000433 sec
May 01 23:06:40 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 23:06:40 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:06:41 nas-36-1C-14 snapperd[16257]: loading 2227 failed
May 01 23:17:01 nas-36-1C-14 CRON[16785]: pam_unix(cron:session): session opened for user root by (uid=0)
May 01 23:17:01 nas-36-1C-14 CRON[16786]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 01 23:17:01 nas-36-1C-14 CRON[16785]: pam_unix(cron:session): session closed for user root
May 01 23:20:13 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): -0.000199 sec
May 01 23:37:18 nas-36-1C-14 connmand[1922]: ntp: adjust (slew): +0.000127 sec
May 01 23:44:33 nas-36-1C-14 apache_access[10155]: Admin session started from 192.168.1.20
May 01 23:44:37 nas-36-1C-14 apache_access[10155]: Suppressed 2 duplicate messages
May 01 23:44:37 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "POST /dbbroker HTTP/1.1" 401
May 01 23:45:55 nas-36-1C-14 apache_access[10155]: Suppressed 2 duplicate messages
May 01 23:45:55 nas-36-1C-14 apache_access[10155]: 192.168.1.20 "GET /ispdb.xml?_dc=1619878554438 HTTP/1.1" 200
May 01 23:52:27 nas-36-1C-14 dbus[1921]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
May 01 23:52:28 nas-36-1C-14 dbus[1921]: [system] Successfully activated service 'org.opensuse.Snapper'
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:28 nas-36-1C-14 snapperd[18410]: loading 2227 failed
May 01 23:52:29 nas-36-1C-14 rnutil[18439]: Failed to get tier from /dev/md0 : 25:
May 01 13:25:57 nas-36-1C-14 kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4250000 action 0xe frozen
May 01 13:25:57 nas-36-1C-14 kernel: ata1: irq_stat 0x80000040, connection status changed
May 01 13:25:57 nas-36-1C-14 kernel: ata1: SError: { PHYRdyChg CommWake BadCRC DevExch }
May 01 13:25:57 nas-36-1C-14 kernel: ata1: hard resetting link
May 01 13:26:05 nas-36-1C-14 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 01 13:26:06 nas-36-1C-14 kernel: ata1.00: ATA-9: WDC WD20EFRX-68EUZN0, 82.00A82, max UDMA/133
May 01 13:26:06 nas-36-1C-14 kernel: ata1.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
May 01 13:26:06 nas-36-1C-14 kernel: ata1.00: configured for UDMA/133
May 01 13:26:06 nas-36-1C-14 kernel: ata1: EH complete
May 01 13:26:06 nas-36-1C-14 kernel: scsi 0:0:0:0: Direct-Access ATA WDC WD20EFRX-68E 0A82 PQ: 0 ANSI: 5
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] 4096-byte physical blocks
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Write Protect is off
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Mode Sense: 00 3a 00 00
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: Attached scsi generic sg3 type 0
May 01 13:26:06 nas-36-1C-14 kernel: sd 0:0:0:0: [sdc] Attached SCSI disk
May 01 13:26:08 nas-36-1C-14 kernel: sdc: sdc1
May 01 13:26:09 nas-36-1C-14 kernel: md: bind<sdc1>
May 01 13:26:09 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:26:09 nas-36-1C-14 kernel: --- wd:1 rd:2
May 01 13:26:09 nas-36-1C-14 kernel: disk 0, wo:1, o:1, dev:sdc1
May 01 13:26:09 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda1
May 01 13:26:09 nas-36-1C-14 kernel: md: recovery of RAID array md0
May 01 13:26:09 nas-36-1C-14 kernel: md: minimum _guaranteed_ speed: 30000 KB/sec/disk.
May 01 13:26:09 nas-36-1C-14 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
May 01 13:26:09 nas-36-1C-14 kernel: md: using 128k window, over a total of 4190208k.
May 01 13:26:14 nas-36-1C-14 kernel: md: bind<sdc3>
May 01 13:26:14 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:26:14 nas-36-1C-14 kernel: --- wd:1 rd:2
May 01 13:26:14 nas-36-1C-14 kernel: disk 0, wo:1, o:1, dev:sdc3
May 01 13:26:14 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda3
May 01 13:26:14 nas-36-1C-14 kernel: md: recovery of RAID array md127
May 01 13:26:14 nas-36-1C-14 kernel: md: minimum _guaranteed_ speed: 30000 KB/sec/disk.
May 01 13:26:14 nas-36-1C-14 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
May 01 13:26:14 nas-36-1C-14 kernel: md: using 128k window, over a total of 1948662784k.
May 01 13:26:16 nas-36-1C-14 kernel: md1: detected capacity change from 535822336 to 0
May 01 13:26:16 nas-36-1C-14 kernel: md: md1 stopped.
May 01 13:26:16 nas-36-1C-14 kernel: md: unbind<sda2>
May 01 13:26:16 nas-36-1C-14 kernel: md: export_rdev(sda2)
May 01 13:26:17 nas-36-1C-14 kernel: md: bind<sda2>
May 01 13:26:17 nas-36-1C-14 kernel: md: bind<sdc2>
May 01 13:26:17 nas-36-1C-14 kernel: md/raid1:md1: not clean -- starting background reconstruction
May 01 13:26:17 nas-36-1C-14 kernel: md/raid1:md1: active with 2 out of 2 mirrors
May 01 13:26:17 nas-36-1C-14 kernel: md1: detected capacity change from 0 to 535822336
May 01 13:26:17 nas-36-1C-14 kernel: md: resync of RAID array md1
May 01 13:26:17 nas-36-1C-14 kernel: md: minimum _guaranteed_ speed: 30000 KB/sec/disk.
May 01 13:26:17 nas-36-1C-14 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
May 01 13:26:17 nas-36-1C-14 kernel: md: using 128k window, over a total of 523264k.
May 01 13:26:20 nas-36-1C-14 kernel: Adding 523260k swap on /dev/md1. Priority:-1 extents:1 across:523260k
May 01 13:26:41 nas-36-1C-14 kernel: md: md1: resync done.
May 01 13:26:42 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:26:42 nas-36-1C-14 kernel: --- wd:2 rd:2
May 01 13:26:42 nas-36-1C-14 kernel: disk 0, wo:0, o:1, dev:sda2
May 01 13:26:42 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sdc2
May 01 13:28:20 nas-36-1C-14 kernel: md: md0: recovery done.
May 01 13:28:20 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 13:28:20 nas-36-1C-14 kernel: --- wd:2 rd:2
May 01 13:28:20 nas-36-1C-14 kernel: disk 0, wo:0, o:1, dev:sdc1
May 01 13:28:20 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda1
May 01 20:25:48 nas-36-1C-14 kernel: md: md127: recovery done.
May 01 20:25:48 nas-36-1C-14 kernel: RAID1 conf printout:
May 01 20:25:48 nas-36-1C-14 kernel: --- wd:2 rd:2
May 01 20:25:48 nas-36-1C-14 kernel: disk 0, wo:0, o:1, dev:sdc3
May 01 20:25:48 nas-36-1C-14 kernel: disk 1, wo:0, o:1, dev:sda3
- StephenB May 01, 2021 Guru - Experienced User
dstsui wrote:
Is it worth taking Disk 2 out and run an extended diagnostic test with WD Dashboard just to be safe? Do you think I should replace Disk 2 as well even though it may pass the test?
The disk test will run a full read test of both disks, so there is no need to do anything with the Dashboard program.
Personally I leave my disks in place until I see some evidence that they are misbehaving. So I'd leave disk 2 alone for now.
dstsui wrote:
Should I be concerned about this message in the system log?
snapperd[12524]: loading 2227 failed
That is a bit concerning; it suggests something isn't quite right with the file system (indicating that a snapshot can't be loaded).
Are you using the "smart" snapshots? I don't like them myself, because some of them are retained forever (and the disk space they need gradually goes up). I use custom snapshots and explicitly set retention. Most shares are set to 3 months.
Anyway, I suggest deleting all your current snapshots from the web ui. After that try doing a scrub from the volume settings wheel after the disk test completes. Then see what file system errors show up in the log as a result of the scrub.
Or do the factory default, which will obviously take some time, but will also give you a completely clean file system without needing to do more analysis and repair.
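If you have SSH access enabled, one quick way to look for file system errors after the scrub is to grep the kernel log for BTRFS messages. This is only a sketch: the sample log lines below are illustrative (not verbatim ReadyNAS output), and `kernel.log` is the assumed file name inside the downloaded log zip.

```shell
log=kernel.log   # assumed name of the kernel log in the log bundle

# Illustrative stand-in for a real kernel log (for the sketch only)
cat > "$log" <<'EOF'
May 03 01:12:09 nas kernel: BTRFS info (device md127): scrub started
May 03 09:44:51 nas kernel: BTRFS error (device md127): unable to fixup (regular) error at logical 123456
May 03 09:50:02 nas kernel: BTRFS info (device md127): scrub finished
EOF

# Any "error" or "warning" lines here point at file system damage
grep -E 'BTRFS (error|warning)' "$log"
```

If the grep comes back empty after the scrub, that's a good sign the corruption was limited to the snapshot metadata.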
- dstsui May 01, 2021 Aspirant
The disk test finished without drama. Where can I see the results?
Some of the oldest snapshots date back to Jan 2015. I guess it is safe to delete Shares older than, say, one year. I will run a Scrub after I delete the old shares.
- dstsui May 02, 2021 Aspirant
I noticed something interesting in the snapshot history. Before March, snapshots were taken once a month, but since March they have been taken daily. The oldest snapshots date back to when I set up the system in Jan 2015; shouldn't they have been purged already?
Tonight I copied 489 images from an SD card on my PC to the RN102, and it clocked a whopping 10MB/s. At least it finished without crashing, but something has to be done. I wonder whether all these legacy snapshots are weighing it down.
- mdgm May 02, 2021 Virtuoso
Does your system show how much space is consumed by snapshots? If so, you have quotas enabled on your data volume, and disabling them may help.
By default, monthly snapshots are kept indefinitely unless there is limited free space on the data volume.
A custom snapshot schedule only keeping a more limited number of snapshots would likely lead to better performance too as StephenB mentioned.
Note that if you delete a snapshot all newer snapshots are recursively updated. This can take a while to complete so performance would get worse before it improves.
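When quotas are enabled, `btrfs qgroup show` (a standard btrfs-progs command, available over SSH) reports per-subvolume usage, and the "exclusive" column is the space that would actually be freed by deleting that snapshot. A minimal sketch of totalling that column with awk; the sample output below is simplified and illustrative, not captured from a real NAS:

```shell
# Columns stand in for: qgroupid, referenced, exclusive
cat > qgroup.txt <<'EOF'
0/256   120.00GiB   1.50GiB
0/257   119.80GiB   0.90GiB
0/258   119.75GiB   2.10GiB
EOF

# Sum the exclusive column -- space held only by each snapshot
awk '{gsub(/GiB/,"",$3); sum += $3} END {printf "%.2f GiB exclusive\n", sum}' qgroup.txt
```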
- StephenB May 03, 2021 Guru - Experienced User
mdgm wrote:
By default monthly snapshots are kept indefinitely unless there is limited free space on the data volume
A custom snapshot schedule only keeping a more limited number of snapshots would likely lead to better performance too as StephenB mentioned.
Note that if you delete a snapshot all newer snapshots are recursively updated. This can take a while to complete so performance would get worse before it improves.
Also, your error is saying that one particular snapshot is somehow damaged and can't be read. We don't know how old it is - and you might not be able to delete it. So there are multiple issues here - the bad disk was one, but the snapshot error suggests some file system corruption.
It's possible that the poor performance will continue after you get rid of the ancient snapshots, but even if it does, you need to address the file system corruption.
Overall, the direct way is to do the factory default, set up the NAS again, and restore the files from backup. Though painful, that is guaranteed to clean up everything.
But if you really can't do that, then look for more file system errors after you delete the old snapshots, and look again after the scrub. If they are gone, follow up with a balance (also from the volume settings wheel). Then measure performance again.
- dstsui May 03, 2021 Aspirant
I could delete all Shares without issues. I started a disk scrub last night, and when I checked the status this morning it had only progressed 2.5%. If the rate is constant, the scrub won't finish for another 13 days. Is it meant to be that slow?
- StephenB May 04, 2021 Guru - Experienced User
dstsui wrote:
I started a disk scrub last night, and when I checked the status this morning it had only progressed 2.5%. If the rate is constant, the scrub won't finish for another 13 days. Is it meant to be that slow?
The rate might not be constant, but it is slower than I'd expect.
I did a scrub on my RN102 (which has a 2x1TB volume) about a month ago, and it took about 5 1/2 hours. So I'd expect about 11-12 hours for 2x2TB.
The scrub does two different operations in parallel - one is a RAID scrub (which is the same as a resync), the second is a BTRFS scrub. The BTRFS scrub is what we're aiming for here - it might take longer than usual because you've never run one before.
In any event, I'd wait for it to complete. You could download a fresh set of logs, and see if there are any new disk errors (or btrfs errors).
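For what it's worth, the 13-day figure is just linear extrapolation from the progress so far, and that arithmetic is easy to redo as the rate changes. A back-of-the-envelope sketch (not a ReadyNAS tool; the 2.5% and 8-hour figures are assumptions based on "overnight"):

```shell
percent=2.5     # progress reported so far
elapsed_h=8     # assumed hours since the scrub started

# Extrapolate total time at a constant rate
awk -v p="$percent" -v e="$elapsed_h" 'BEGIN {
  total = e * 100 / p            # hours for the full scrub
  printf "est. total: %.0f h (%.1f days)\n", total, total/24
}'
```

If the BTRFS scrub speeds up once it gets past the heavily-snapshotted regions, the real finish time should come in well under that estimate.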
dstsui wrote:
I could delete all Shares without issues.
Do you mean Snapshots, or did you delete the Shares?