tony359
Mar 30, 2023, Apprentice
ReadyNAS Pro 6 crashed again
Hello all, my ReadyNAS Pro6 periodically stops responding to the network. When that happens I can push the button to shut it down, but it will sit on "shutting down" forever and then I'll have to p...
tony359
Jun 07, 2023, Apprentice
Just for the hardware geeks: I decided to replace the capacitors in the PSU anyway. I know, a waste of time. And a massive one, as those large traces and wires made the process very difficult.
HOWEVER, 90% of the caps were bulged and were reading either nothing or a fraction of their original capacitance!! Yet the NAS was working; I wonder how.
And I wonder how it is even possible that the (temporary) replacement PSU hasn't fixed the issue!
I think I'll replace the Seasonic anyway, but a small PSU is always handy, so I'll fix this one too.
The NAS didn't disappear over the past couple of days.
KDS
Jun 09, 2023, Guide
Just on the off chance you haven't tried it yet:
CPU & RAM swap out?
- tony359, Jun 09, 2023, Apprentice
Considering it’s happening very rarely, it’s not such a bad idea.
I’ve run overnight RAM tests, but maybe they didn’t catch it because it happens so rarely.
I still have the original CPU so I could try that too.
That said, the fact that only the network went down last time is suspicious. A RAM or CPU issue would have a much bigger impact, I reckon. I might put a switch in between the NAS and the main switch. It’s always been that switch, and maybe it’s faulty. After all, the NAS stopped crashing when I took it off the main network, which takes the main switch out of the equation.
And it worked for a while when connected to my main desktop; again, no main switch involved.
Uhm… I like this idea 🙂
- KDS, Jun 09, 2023, Guide
I have my router dishing out DHCP addresses >>> unmanaged 2.5G switch >>> both NICs into the switch.
Static IPs on both NICs in the Netgear NIC settings (IPv4), plus the router address.
Router set to static IP addresses for both NICs.
Since doing that both NICs are very stable.
RAM is 2 x 2GB PC800.
The CPU is now an E7600 (I just upgraded from an E5300), and I find it much faster than the Q6600, though my NAS is mainly used for backup and as a file server, not really serving any apps. The E7600 runs faster and much cooler than the Q6600.
- Sandshark, Jun 09, 2023, Sensei - Experienced User
Is it a "green" switch? I've had a couple of issues with ReadyNAS and green switches, though I believe the problem units already had partly damaged LAN ports. My main switch has a "green" on/off setting. Try turning off power-saving mode if yours has one. Otherwise, a non-green switch in between might be the answer.
- tony359, Jun 09, 2023, Apprentice
It's a Netgear! 🙂
GS108Ev2, "partly" managed, firmware V1.00.12 (latest). DHCP is disabled on it; DHCP is handled by the router (Fritzbox), which issues the same IP to the NAS MAC address. All settings are default, to be honest.
I have tried a static IP in the past with no change, though I'm confident those swollen capacitors might have contributed to SOME of the issues I was having.
Today's new issue: the NAS is online, I can see the files and I can SSH into it, but the web interface shows a "500 - Internal Server Error". This is on both ports. Sigh 🙂
Before I just reboot the box, how would I restart the web interface from SSH?
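A minimal sketch of restarting the web interface from SSH, assuming (as the service names tried later in this thread suggest) that the OS6 admin page is served by apache2 with readynasd behind it. The function name `restart_webui` and the `SYSTEMCTL` override are hypothetical, added only so the logic can be dry-run. If systemctl itself answers with "Activation of org.freedesktop.systemd1 timed out", systemd is not responding and this is unlikely to help; a reboot may then be the only clean option.

```shell
#!/bin/bash
# Hypothetical helper: restart the services behind the ReadyNAS OS6 admin page.
# Assumption: apache2 serves the page and talks to readynasd (both names come
# from this thread). SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) for a dry run.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

restart_webui() {
    local svc rc=0
    # daemon first, then the web server in front of it (assumed order)
    for svc in readynasd apache2; do
        # timeout keeps the shell usable if systemd itself is not answering
        if timeout 60 $SYSTEMCTL restart "$svc"; then
            echo "restarted $svc"
        else
            echo "failed to restart $svc" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: run `restart_webui` as root; a non-zero return means at least one restart failed or timed out.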
I'll install a dumb switch between the NAS and the main switch - with new cables.
The 7600 seems to be a good option. It's only 2 cores, but they're faster than the cores in the 6600. I wonder how much a NAS used purely for file storage actually uses a multi-core CPU. And the 7600, as you say, runs cooler.
I think I'll fix this issue first then I might try the 7600 as well, thanks for the hint!
- tony359, Jun 09, 2023, Apprentice
I feel that the output below is relevant to my issue. Again, the NAS is accessible: I can write a file in the data folder via nano. I've just lost the web interface.
These weird failures are incredibly annoying. I'd like to test what itachi2 recommended; can someone point me in the right direction? See https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/ReadyNAS-Pro-6-crashed-again/m-p/2316638/highlight/true#M199640
Thanks 🙂
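A note on the console session below: `ps aux | grep readynasd` matched only the grep command itself, which suggests readynasd was not running at that moment. A sketch of a check that cannot match itself, assuming a Linux /proc with per-process `comm` files (`pgrep -x NAME` does the same where procps is installed); `is_running` and `report_services` are hypothetical names.

```shell
#!/bin/bash
# is_running NAME: succeed if some process has exactly this name.
# /proc/PID/comm holds the process name (truncated to 15 chars), the same name
# pgrep -x compares against, so the search never matches its own command line.
is_running() {
    grep -qsxF -- "$1" /proc/[0-9]*/comm
}

# report_services NAME...: print one status line per process name
report_services() {
    local name
    for name in "$@"; do
        if is_running "$name"; then
            echo "$name: running"
        else
            echo "$name: NOT running"
        fi
    done
}
```

Usage: `report_services readynasd apache2`.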
root@Enterprise-NAS:/# systemctl status apache2
Failed to get properties: Activation of org.freedesktop.systemd1 timed out
root@Enterprise-NAS:/#
root@Enterprise-NAS:/#
root@Enterprise-NAS:/#
root@Enterprise-NAS:/#
root@Enterprise-NAS:/# systemctl restart apache2
Failed to restart apache2.service: Activation of org.freedesktop.systemd1 timed out
See system logs and 'systemctl status apache2.service' for details.
root@Enterprise-NAS:/# sudo systemctl status apache2
-bash: sudo: command not found
root@Enterprise-NAS:/# su
root@Enterprise-NAS:/# systemctl status apache2
Failed to get properties: Activation of org.freedesktop.systemd1 timed out
root@Enterprise-NAS:/# systemctl status readynasd
Failed to get properties: Activation of org.freedesktop.systemd1 timed out
root@Enterprise-NAS:/# ps aux | grep readynasd
root 26625 0.0 0.0 17836 1008 pts/2 S+ 19:58 0:00 grep readynasd
root@Enterprise-NAS:/# service ctscand stop
Failed to stop ctscand.service: Connection timed out
See system logs and 'systemctl status ctscand.service' for details.
Failed to get load state of ctscand.service: Connection timed out
root@Enterprise-NAS:/# systemctl restart readynasd
Failed to restart readynasd.service: Activation of org.freedesktop.systemd1 timed out
See system logs and 'systemctl status readynasd.service' for details.
root@Enterprise-NAS:/# systemctl status readynasd.service
Failed to get properties: Activation of org.freedesktop.systemd1 timed out
- KDS, Jun 09, 2023, Guide
Just another hardware thing that has probably already happened.
1. After good PSU installed was CMOS cleared?
2. Has CMOS battery been checked?
3. Are you keeping it simple with just 1 HDD, or possibly 2 (RAID 1), with the HDDs (especially ones from RAID arrays) cleaned and cleared on another PC before installing? Granted, you may have data on your system, but could you remove those HDDs and start fresh with known clean, good drives? I tested with some old 320GB junk drives I had kicking about. I also encountered NIC, web access, and HDD problems before replacing the PSU. My original 7200 rpm WD HDDs were only detected as 5900 rpm; when I then added a newer 7200 rpm WD HDD, it was detected as 7200 rpm, and the NAS did not like the speed mismatch it saw.
In the end I did use clean HDDs. I think the web interface problem may be associated with what is already on the HDDs.
My HDD and hardware issues were resolved when I replaced the PSU. Both types of drives (3 x 7200 seen as 5900, and 3 x 7200 seen as 7200) are now running together fine.
4. BTW, are you using RAIDar 6.5.0?
- tony359, Jun 09, 2023, Apprentice
Just another hardware thing that has probably already happened.
1. After good PSU installed was CMOS cleared?
--
No, I did not update the BIOS, so I didn't think of clearing the CMOS. I can try.
2. Has CMOS battery been checked?
--
No. Good point.
3. Are you keeping it simple with just 1 HDD, possibly 2 (raid 1), with HDDs especially raid arrays cleaned and cleared
--
No. The reason is that the last time the system behaved, it lasted for 2 months. I cannot be without my data for 2 months.
The only two options here are:
a. Fix it with the current setup
b. Try a factory default and migrate a backup
Testing with 2 random HDDs is unlikely to give me any evidence, I'm afraid.
I also encountered NIC, web access, and HDD problems prior to replacing the PSU. My original 7200 WD HDDs were only seen as 5900, then when I added a newer 7200 WD HDD it was seen as 7200, it did not like the mismatch in HDD speed that it saw.
--
Unfortunately the replacement PSU did not solve all the problems. I'm confident some of the issues I experienced were caused by the bad PSU, but the NAS is still misbehaving, I'm afraid.
All my HDDs are WD Red, 5400 rpm-ish (the 4TB ones are a bit slower than the 6TB ones).
4. BTW are you using RAIDar 6.5.0.
--
No. I am on OS6.
I appreciate a factory reset would be a good idea, but I have 13TB on that NAS and I don't know where to store it for a backup. Yes, the NAS is more or less fully backed up (locally and online), but it would take me forever to restore those backups, so I'd consider that an emergency option only.
I could see if I could hire another NAS, transfer the data, reset and restore. But somehow I am not confident my problems would go away 🙂
Thanks for your input!
- tony359, Jun 10, 2023, Apprentice
Little update.
I checked the battery: it's OK, 3.1V. I replaced it some time ago when I serviced the box.
I reset the BIOS again (the only thing I change is the default fan speed!).
I swapped the positions of HDD0 and HDD4. I sprayed dry contact cleaner on the backplane and on the HDDs and cleaned them with a small Q-tip.
Once the NAS was powered up again, HDD0 failed to show up on the BIOS splash page straight away. So it's not the HDD and, to be honest, I feel that this might be a red herring. I never had issues with HDD0, so maybe it's a BIOS bug that doesn't affect the software. No idea. But I now know it's not the drive.
I've added a TP-Link switch between the main switch and the NAS.
Next: throwing the NAS out of the window.
- tony359, Jun 11, 2023, Apprentice
And no, the NAS disappeared again.
Workaround: SSH in through the other port, then use ifconfig to take the failed port DOWN and bring it UP again.
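The workaround above can be wrapped in a small function. A sketch assuming iproute2's `ip` is available (`ifconfig eth0 down; ifconfig eth0 up` is the older equivalent); `bounce_iface`, its delay argument and the `IP_CMD` override are hypothetical. Run it from an SSH session on the other, working interface, never on the one you are logged in through.

```shell
#!/bin/bash
# bounce_iface IFACE [DELAY]: take an interface down and bring it back up,
# the iproute2 equivalent of "ifconfig IFACE down; ifconfig IFACE up".
# IP_CMD is a hypothetical override (e.g. IP_CMD=echo) for a dry run.
IP_CMD=${IP_CMD:-ip}

bounce_iface() {
    local iface=${1:-} delay=${2:-2}
    if [ -z "$iface" ]; then
        echo "usage: bounce_iface IFACE [DELAY]" >&2
        return 2
    fi
    $IP_CMD link set dev "$iface" down || return 1
    sleep "$delay"    # short pause before re-enabling the link
    $IP_CMD link set dev "$iface" up
}
```

Usage: from a root session on eth1, `bounce_iface eth0`.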
I could try swapping the config but I think I tried that in the past already.
If someone could give me some pointers on checking the HDDs offline, as mentioned above, that would be great! 🙂
Thanks
- tony359, Jun 11, 2023, Apprentice
That's what I meant by "swapping the config", sorry: swapping the IP addresses between the ports.
I'll try but I think I tried that in the past already. 100% worth a try.
- tony359, Jun 12, 2023, Apprentice
The NICs are on two different IP ranges: one for the main network, one for the PC only.
What used to be on main network is now directly connected to the PC and what used to be connected to the PC is now connected to the main network and I've swapped the IP addresses accordingly.
I did that yesterday and I've just checked: NAS has disappeared. Sigh!
I SSH'd in through the other NIC, restarted the failed one, and it worked as usual.
So
- It's not the specific NIC
- It's not the switch
It's curious that it's always the NIC on the main network failing and not the other.
Help 🙂
- schumaku, Jun 12, 2023, Guru - Experienced User
As you are in the lucky situation of having an alternate LAN interface (and IP subnet) available: what does the kernel output show when the device has "disappeared"?
# dmesg
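A quick way to pull only link state changes out of that output. This is a sketch: the exact wording of link messages depends on the NIC driver (Intel's e1000e, for example, logs a line containing "NIC Link is Down"), so the pattern is an assumption to adjust after looking at the real dmesg; `link_events` is a hypothetical name.

```shell
#!/bin/bash
# link_events [IFACE]: read kernel log text on stdin and keep only link
# up/down messages, optionally only those mentioning IFACE as a whole word.
link_events() {
    local iface=${1:-}
    if [ -n "$iface" ]; then
        # -E: extended regex, -i: drivers differ in capitalisation
        grep -Ei 'link is (up|down)' | grep -Fw -- "$iface"
    else
        grep -Ei 'link is (up|down)'
    fi
}
```

Usage: `dmesg | link_events eth0`.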
The risk of a network adapter becoming flaky is very small. More typically, the adapter (or rather its data connectivity) disappears completely, and the OS no longer detects the adapter.
Most problems on such NASes are caused by the RAID becoming inoperable due to aged or failing storage blocks.
Do you have a known working, reliable SATA storage block at hand to set up the NAS with a single-device volume, or two in a RAID 1 volume? Remove the potentially unhealthy storage blocks and restart the test from scratch.
- tony359, Jun 12, 2023, Apprentice
Thanks, I'll test next time.
Many have (rightly) recommended a test with a couple of random HDDs. I have plenty so that wouldn't be an issue.
My concern is that sometimes the NAS stays online for weeks without issues and I really cannot keep my data offline for so long.
Is there a way to do an offline test of my drives? Someone recommended booting from a Debian Live-USB but I would need some minor guidance on that. I know how to make the USB, I'm just making sure (as much as possible) I don't do anything that can destroy my data.
Thanks! 🙂
- schumaku, Jun 12, 2023, Guru - Experienced User
Start by retrieving the SMART data from the storage block (aka the disk). Next, trigger a SMART self-test of the storage block (a short one first, then a full one). Then retrieve the SMART data again.
You can do this on any platform without erasing, re-partitioning or re-formatting the storage block, if done carefully of course.
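The sequence described above (report, short test, long test, report again) as a sketch. It assumes smartmontools is installed and that `smartctl -c` prints "Self-test routine in progress" while a test runs; `smart_check`, `OUTDIR` and `POLL_SECS` are hypothetical names. SMART self-tests do not write user data, but a long test on a multi-terabyte drive can take many hours, and a reboot interrupts it.

```shell
#!/bin/bash
# smart_check DEV: SMART report, short self-test, long self-test, report again.
# Hypothetical wrapper; SMARTCTL, OUTDIR and POLL_SECS are overridable for testing.
SMARTCTL=${SMARTCTL:-smartctl}
OUTDIR=${OUTDIR:-.}

wait_for_selftest() {
    # smartctl -c shows "Self-test routine in progress..." while a test is running
    while $SMARTCTL -c "$1" | grep -q 'Self-test routine in progress'; do
        sleep "${POLL_SECS:-60}"
    done
}

smart_check() {
    local dev=${1:-} name kind
    if [ -z "$dev" ]; then
        echo "usage: smart_check /dev/sdX" >&2
        return 2
    fi
    name=${dev##*/}                                  # /dev/sda -> sda
    $SMARTCTL -x "$dev" > "$OUTDIR/smart-before-$name.txt"
    for kind in short long; do
        $SMARTCTL -t "$kind" "$dev"                  # only starts the test
        wait_for_selftest "$dev"
    done
    $SMARTCTL -x "$dev" > "$OUTDIR/smart-after-$name.txt"
    # the self-test log should now list the two new tests at the top
    $SMARTCTL -l selftest "$dev"
}
```

Usage: `smart_check /dev/sda`, then compare the two saved reports (for example with `diff`).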
- tony359, Jun 12, 2023, Apprentice
Thanks.
I’ll Google how to do that. 🙂
Just to double check: do you mean doing those checks on the NAS itself while it’s online?
- StephenB, Jun 13, 2023, Guru - Experienced User
tony359 wrote:
Is there a way to do an offline test of my drives? 🙂
There is an online test in the maintenance menu you can use. It runs the full built-in SMART test on all the drives in the volume.
You can also use smartctl -x /dev/sda from ssh to see more errors (UNCs in particular) on sda (or whichever disk you wish).
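To see at a glance which sectors the UNC (uncorrectable) errors point at, the `smartctl -x` error log can be summarised per LBA. A sketch that assumes the "Error: UNC at LBA = 0x... = ..." line format posted later in this thread; `unc_lbas` is a hypothetical name. Repeated entries for the same LBA within a few seconds, as in that log, look more like retries of one sector than like many bad sectors.

```shell
#!/bin/bash
# unc_lbas: read smartctl -x output on stdin and print each LBA that appears
# in an "Error: UNC at LBA = 0x... = ..." line, with how many times it appears.
unc_lbas() {
    # the hex LBA is the third field from the end of such a line
    awk '/Error: UNC at LBA/ { count[$(NF-2)]++ }
         END { for (lba in count) print lba, count[lba] }' | LC_ALL=C sort
}
```

Usage: `smartctl -x /dev/sda | unc_lbas`.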
As far as offline goes, the simplest way is to connect the drive to a Windows PC and run the vendor diagnostics: Dashboard for WDC, and SeaTools for Seagate. Unfortunately they don't run on macOS.
But it seems to me that your symptoms are pointing either to the switch or perhaps the cable going from the NAS to the switch. It's always the NIC port connected to that switch that fails, and the other NIC always continues to work fine.
- tony359, Jun 13, 2023, Apprentice
Hi Stephen,
No, the ports were swapped last time, and so were the switch and the cable. So it's not a NIC or network issue. Well... it ALWAYS fails on that NETWORK, so it could be something on my main network. But on this occasion the NAS was wired to a different port on the main switch, and through an additional switch. So if it's something with that network, it's not a HW issue.
The online maintenance test runs periodically. The logs show an "offline" test, though; how should I read that? The drive is now at 51888 hours.
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%        50227            -
# 2  Extended offline    Completed without error       00%        48081            -
# 3  Extended offline    Completed without error       00%        45875            -
# 4  Extended offline    Completed without error       00%        43691            -
# 5  Extended offline    Completed without error       00%        41536            -
# 6  Extended offline    Completed without error       00%        39834            -
# 7  Extended offline    Completed without error       00%        37636            -
# 8  Extended offline    Completed without error       00%        35455            -
# 9  Extended offline    Completed without error       00%        33273            -
#10  Extended offline    Completed without error       00%        31118            -
#11  Extended offline    Completed without error       00%        28912            -
#12  Extended offline    Completed without error       00%        26707            -
#13  Extended offline    Completed without error       00%        24525            -
#14  Extended offline    Completed without error       00%        22554            -
#15  Extended offline    Completed without error       00%        20712            -
#16  Extended offline    Completed without error       00%        19182            -
#17  Short offline       Completed without error       00%        82               -
#18  Short offline       Completed without error       00%        63               -

I ran smartctl -x in the past and posted the output earlier in this thread. I didn't spot anything, but I am not an expert. There are UNC errors on SDA (which I have now moved to SDE), but at 7872 hours, a few years ago! 🙂
Error 159 [14] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b cc 40 40 00  Error: WP at LBA = 0x4b2bcc40 = 1261161536
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 04 00 00 08 00 00 4b 2b c8 40 40 08     14:59:14.849  WRITE FPDMA QUEUED
  60 04 00 00 00 00 00 4b 2b cc 40 40 08     14:59:14.849  READ FPDMA QUEUED
  ef 00 10 00 02 00 00 00 00 00 00 a0 08     14:59:14.849  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00 00 00 00 e0 08     14:59:14.849  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     14:59:14.849  IDENTIFY DEVICE

Error 158 [13] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b cc 40 40 00  Error: UNC at LBA = 0x4b2bcc40 = 1261161536
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 08 00 00 4b 2b cc 40 40 08     14:59:11.031  READ FPDMA QUEUED
  61 04 00 00 00 00 00 4b 2b c8 40 40 08     14:59:11.031  WRITE FPDMA QUEUED
  ef 00 10 00 02 00 00 00 00 00 00 a0 08     14:59:11.031  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00 00 00 00 e0 08     14:59:11.031  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     14:59:11.030  IDENTIFY DEVICE

Error 157 [12] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b cc 40 40 00  Error: WP at LBA = 0x4b2bcc40 = 1261161536
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 04 00 00 08 00 00 4b 2b c8 40 40 08     14:59:07.223  WRITE FPDMA QUEUED
  60 04 00 00 00 00 00 4b 2b cc 40 40 08     14:59:07.223  READ FPDMA QUEUED
  ef 00 10 00 02 00 00 00 00 00 00 a0 08     14:59:07.223  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00 00 00 00 e0 08     14:59:07.223  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     14:59:07.223  IDENTIFY DEVICE

Error 156 [11] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b cc 40 40 00  Error: UNC at LBA = 0x4b2bcc40 = 1261161536
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 08 00 00 4b 2b cc 40 40 08     14:59:03.405  READ FPDMA QUEUED
  61 04 00 00 00 00 00 4b 2b c8 40 40 08     14:59:03.405  WRITE FPDMA QUEUED
  ef 00 10 00 02 00 00 00 00 00 00 a0 08     14:59:03.405  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00 00 00 00 e0 08     14:59:03.405  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     14:59:03.405  IDENTIFY DEVICE

Error 155 [10] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b cc 40 40 00  Error: WP at LBA = 0x4b2bcc40 = 1261161536
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 04 00 00 08 00 00 4b 2b c8 40 40 08     14:58:59.720  WRITE FPDMA QUEUED
  60 04 00 00 00 00 00 4b 2b cc 40 40 08     14:58:59.720  READ FPDMA QUEUED
  ef 00 10 00 02 00 00 00 00 00 00 a0 08     14:58:59.720  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00 00 00 00 e0 08     14:58:59.720  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     14:58:59.719  IDENTIFY DEVICE

Error 154 [9] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b cc 40 40 00  Error: UNC at LBA = 0x4b2bcc40 = 1261161536
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 08 00 00 4b 2b cc 40 40 08     14:58:55.900  READ FPDMA QUEUED
  61 04 00 00 00 00 00 4b 2b c8 40 40 08     14:58:55.900  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 e0 08     14:58:55.873  FLUSH CACHE EXT
  60 00 08 00 08 00 00 00 7f 22 18 40 08     14:58:55.838  READ FPDMA QUEUED
  61 00 02 00 00 00 00 00 00 00 48 40 08     14:58:55.838  WRITE FPDMA QUEUED

Error 153 [8] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b c8 40 40 00  Error: UNC at LBA = 0x4b2bc840 = 1261160512
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 00 00 00 4b 2b c8 40 40 08     14:58:52.283  READ FPDMA QUEUED
  ef 00 10 00 02 00 00 00 00 00 00 a0 08     14:58:52.283  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00 00 00 00 e0 08     14:58:52.283  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     14:58:52.282  IDENTIFY DEVICE
  ef 00 03 00 46 00 00 00 00 00 00 a0 08     14:58:52.282  SET FEATURES [Set transfer mode]

Error 152 [7] occurred at disk power-on lifetime: 7872 hours (328 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 4b 2b c8 40 40 00  Error: UNC at LBA = 0x4b2bc840 = 1261160512
  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 04 00 00 00 00 00 4b 2b c8 40 40 08     14:58:48.786  READ FPDMA QUEUED
  ef 00 10 00 02 00 00 00 00 00 00 a0 08     14:58:48.786  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 00 00 00 00 00 e0 08     14:58:48.786  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     14:58:48.786  IDENTIFY DEVICE
  ef 00 03 00 46 00 00 00 00 00 00 a0 08     14:58:48.786  SET FEATURES [Set transfer mode]

I'm on Windows, so that's fine, but wouldn't it be better to run the tests on a Linux system so the file system can be checked as well? Also, I think I'd prefer the disks to stay unmounted, so there's less chance of damaging the RAID.
Can I start the NAS from a Debian live USB? I could run the checks from there, assuming VGA output works. And what do you think of the suggestion of running btrfs check on the drives? I don't dislike the idea of checking the file system.
The NAS disappeared again, so I've run dmesg and it's attached (this forum lacks the ability to attach text files!).
Am I right in seeing lots of "network going down" messages after what seems to be a gap? On both ETH0 and ETH1.
Disabling and re-enabling ETH0 worked as usual.
And yes, I've now disabled IPv6 (I think it got re-enabled when I swapped the IPs).
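To check whether the link-down messages really follow a gap, the kernel timestamps (seconds since boot, in brackets at the start of each plain `dmesg` line) can be scanned for long silences. A sketch; `dmesg_gaps` and the 600-second default threshold are arbitrary choices, not anything ReadyNAS-specific, and it does not understand the human-readable `dmesg -T` format.

```shell
#!/bin/bash
# dmesg_gaps [SECONDS]: read dmesg text on stdin and report every line that
# follows a silence of at least SECONDS (default 600) in the kernel timestamps.
dmesg_gaps() {
    awk -v min="${1:-600}" '
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            t = substr($0, RSTART + 1, RLENGTH - 2) + 0   # "[ 123.456]" -> 123.456
            if (seen && t - prev >= min + 0)
                printf "gap of %.0f s before: %s\n", t - prev, $0
            prev = t
            seen = 1
        }'
}
```

Usage: `dmesg | dmesg_gaps 300` lists every line preceded by at least five minutes of kernel silence.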
- Sandshark, Jun 13, 2023, Sensei - Experienced User
Yes, you can start a legacy NAS from a Debian Live USB (or even DOS or Windows). Native OS6 models are more picky about what they will start from.
- StephenB, Jun 13, 2023, Guru - Experienced User
tony359 wrote:
I am Windows so that's fine but wouldn't be better to run the tests on a Linux system so the file system can be checked as well? Also I think I think I'd prefer the disks to stay unmounted so I know I have less chances of damaging the RAID.
I don't think so. If you needed that, I'd do it in the NAS.
I really don't see how this can be the disks or the file system. If it were, the second NIC wouldn't be responsive when the problem occurs. Plus normal operation wouldn't resume when you set the interface down and then up again.
tony359 wrote:
No, the ports were swapped last time - also the switch and the cable. So it's not a NIC or Network issue. Well. It ALWAYS fails on that NETWORK so it could be something on my main network.
I think it's definitely a network issue, though perhaps not at the physical layer. The puzzle is what.
Are you using the NAS differently on the main network than you are on the PC connection?
The history here is of course extensive, and I'm having trouble keeping everything straight. Did the NAS ever lock up when it was only connected to the main network (with the PC NIC disconnected)?
tony359 wrote:
The online maintenance runs periodically. The logs show an "offline" test though. How should I read that? The drive is now 51888hrs.
The "extended offline" record is actually the test you run from the maintenance settings. No idea why smartctl describes it as "offline".
You should also see it at the end of volume.log. It looks like the NAS crashed (or was shut down) before the most recent test finished.
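The self-test log can be read mechanically too: entries are listed newest first, so "# 1 ... Interrupted (host reset)" at 50227 hours is the test that never finished, and the newest clean extended test is "# 2" at 48081 hours. A sketch of two hypothetical helpers that pull those facts out of `smartctl -l selftest` output, assuming the column layout posted above.

```shell
#!/bin/bash
# Both helpers read "smartctl -l selftest" output on stdin. Entries look like
# "# 2  Extended offline  Completed without error  00%  48081  -", newest first.

# selftest_problems: print every entry that did not complete cleanly
selftest_problems() {
    awk '/^#[ 0-9]/ && !/Completed without error/'
}

# last_clean_extended: power-on hours of the newest clean extended test
last_clean_extended() {
    # LifeTime(hours) is the second-to-last column
    awk '/^#[ 0-9]/ && /Extended/ && /Completed without error/ { print $(NF-1); exit }'
}
```

With the numbers posted above, 51888 - 48081 = 3807 power-on hours had passed since the last complete extended test.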
- tony359, Jun 13, 2023, Apprentice
>I don't think so. If you needed that, I'd do it in the NAS.
>I really don't see how this can be the disks or the file system. If it were, the second NIC wouldn't be responsive when the problem occurs. Plus normal operation wouldn't resume when you set the interface down and then up again.
I appreciate your view and I don't disagree with it.
But this has been going on for months and I've tried many things short of a new set of HDDs.
Before I start messing with my data, I'd like to exhaust all the options.
One of them is to do an offline check via Live-CD. As I am not super-skilled with Linux and I care about my data, can someone roughly guide me so I don't obliterate my data 🙂
I guess I'll boot from a live USB; the 5 RAID HDDs are not going to be mounted by default.
I can then run
btrfs check --readonly /dev/sd(x)
This should check the file system?
Then smartctl -t long /dev/sd(x)
Can anybody think of anything else I should do while the HDDs are offline?
>I think definitely a network issue, though perhaps not the physical layer. The puzzle is what.
>Are you using the NAS differently on the main network than you are on the PC connection?
>The history here is of course extensive, and I'm having trouble keeping everything straight. Did the NAS ever lock up when it was only connected to the main network (with the PC NIC disconnected)?
The PC and the NAS are plugged into the same switch. There is nothing running on the NAS; I only use it for file storage.
I appreciate the history is long and I thank you for bearing with me for so long and not suggesting I should go buy a Qnap 🙂
The second NIC connected to the PC is a recent addition as I discovered that when the NAS disappears I can still access it via the other NIC. The behaviour hasn't changed since I also plugged the PC directly into the NAS.
Months ago, the NAS stopped misbehaving when I completely disconnected it from ANY networks.
A week later I plugged it into the PC only (no main network, no internet)
Some weeks of good behaviour later, I put the NAS back on the main network, removing some port forwarding I had in the main router.
It worked PERFECTLY for 2 months.
Then it started disappearing twice a day. Out of the blue.
This is why I am pursuing unlikely routes: the above events point to NOTHING! 🙂
- tony359, Jun 13, 2023, Apprentice
quick addendum:
I've made a live USB of Debian and played with it and a random HDD, which I formatted as btrfs.
If anybody has any suggestions on what to test while offline, please do let me know!
Also, if someone has any suggestions on what NOT to do while playing with those HDD, please also do let me know!
- StephenB, Jun 13, 2023, Guru - Experienced User
You'd need to assemble the RAID array in order to run btrfs check (the check runs against the assembled array device, which does not need to be mounted).
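If the offline route via a live USB is still wanted, the array has to be assembled before btrfs can see the volume. A sketch, assuming the live system has mdadm and btrfs-progs installed and that the data array comes up as /dev/md127 as on the NAS (with mixed drive sizes, X-RAID may build more than one md array, which is why `btrfs device scan` runs first). `offline_check` and the `RUN=echo` dry-run switch are hypothetical; never add --repair here.

```shell
#!/bin/bash
# offline_check [MD_DEVICE]: read-only btrfs check from a live USB system.
# Hypothetical sketch; RUN=echo prints the commands instead of running them.
RUN=${RUN:-}

offline_check() {
    local md=${1:-/dev/md127}    # data array name as seen on the NAS in this thread
    # --readonly: start the arrays read-only, so no resync or writes happen
    $RUN mdadm --assemble --scan --readonly
    # register every btrfs member device, in case the volume spans several arrays
    $RUN btrfs device scan
    # check without repairing; --readonly is the default, stated explicitly here
    $RUN btrfs check --readonly "$md"
}
```

Usage: `RUN=echo offline_check` to review the commands first, then `offline_check` as root.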
Since your system boots, you can just run ssh (logging in as root), and run btrfs check from there. The device would be /dev/md127 (the raid array virtual disk).
Use --force because the file system is mounted. It won't try to repair anything, so no need to worry about read-only. Don't write anything to the data volume while it is running.
root@RN102:~# btrfs check --force /dev/md127
WARNING: filesystem mounted, continuing because of --force
Checking filesystem on /dev/md127 ...
You can also run smartctl from ssh (/dev/sda, etc.), so there's no need to use the live CD for that either.
root@RN102:~# smartctl --test=long /dev/sda
smartctl 6.6 2017-11-05 r4594 [armv7l-linux-4.4.218.armada.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 127 minutes for test to complete.
Test will complete after Tue Jun 13 20:35:03 2023

Use smartctl -X to abort test.
root@RN102:~#
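The choice between plain `btrfs check` and `btrfs check --force` in the example above depends only on whether the device is mounted. A sketch that looks the device up in /proc/mounts and prints the matching read-only command line; `btrfs_check_cmd` and `MOUNTS_FILE` are hypothetical, and the lookup assumes the device you pass is the one /proc/mounts lists.

```shell
#!/bin/bash
# btrfs_check_cmd [DEVICE]: print a read-only btrfs check command for DEVICE,
# adding --force only when DEVICE is currently mounted.
# MOUNTS_FILE is a hypothetical override so the lookup can be tested.
btrfs_check_cmd() {
    local dev=${1:-/dev/md127}
    local mounts=${MOUNTS_FILE:-/proc/mounts}
    # the first field of each /proc/mounts line is the mounted device
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit (found ? 0 : 1) }' "$mounts"; then
        echo "btrfs check --readonly --force $dev"
    else
        echo "btrfs check --readonly $dev"
    fi
}
```

Usage: print the command with `btrfs_check_cmd /dev/md127`, then run it yourself while nothing is writing to the data volume.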