Forum Discussion
HolgerGT86
Apr 08, 2020 Guide
RN104 shutdown because disks exceed safe temperature
Hello, I'm using a RN104 at firmware level 6.10.3. The ReadyNAS is equipped with 4 times Hitachi HUA721010KLA330 1TB SATA drives, all 4 drives building one RAID5 X-RAID volume. Today the RN104 sh...
HolgerGT86
Apr 10, 2020 Guide
it's me again ...
Can someone help me to understand how fan control works?
I'm wondering why the new (re-used) fan is not rotating as fast as the original one.
I know that the original fan was running with a max speed of 3233 RPM, which matches its specification of a max speed of 3200 RPM.
With the same fan setting "cool", the new fan is rotating up to 1920 RPM only for now.
I'm monitoring this because my new fan is a re-used fan from an old NetFinity 3000 server and I did not find its specifications. I only found specifications of a replacement model, and that one runs at up to 4800 RPM - I don't know if this applies to the fan I'm using, too.
As already documented, the replacement fan is currently running with a maximum speed of 1920 RPM. With that, the temperature of the CPU rises up to 66°C (151°F) and the temperature of the disks is as follows:
disk #1 = 49°C (120°F)
disk #2 = 54°C (129°F)
disk #3 = 53°C (127°F)
disk #4 = 47°C (116°F)
Is this normal or is the replacement fan already running at its maximum speed?
Is there a way to manually set the fan to maximum speed?
Many thanks and I wish happy Easter holidays to you and your families,
Holger
HolgerGT86
May 24, 2020 Guide
Hello,
I have now received my new fan for my RN104. It looks like the RN104 is giving less than 7 volts to the fan at startup, when the RN104 and its disks and CPU are cold. This results in a 0 RPM warning message. Later on, when the RN104 is heating up, the fan is working fine and the RN104 disk temperature is about 10°C lower than before.
Why 7 volts? From the specs of the fan I can read that the fan allows 7-13.8 volts.
Is there a way to set the minimum voltage value or minimum RPM of the fan?
Thanks,
Holger
- HolgerGT86 Jun 08, 2020 Guide
Good morning, good evening, good afternoon,
today, my RN104 shut down again because the disks in the middle bays exceeded their safe temperature of 60°C. This means it's not the fan which is failing, because I replaced it with a new, much "stronger" one. Something else must be wrong, and I still believe it's related to running defragmentation maintenance. Why do I believe so? Because the RN104 only experiences this issue when defragmentation has just completed. It never happened when running another process, when loading the NAS with backup workload, when manually creating I/O workload, ... whatever I do, it's never failing - except when defragmentation has completed.
And it never ever failed while defragmentation was in progress. It always fails directly after defragmentation has completed.
Here's the RN104 log from today:
Jun 08, 2020 12:44:48 PM System: ReadyNASOS background service started.
Jun 08, 2020 12:28:25 PM System: The system is shutting down.
Jun 08, 2020 12:28:21 PM Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (60 C).
Jun 08, 2020 12:28:19 PM Disk: System is shutting down because disk in channel 2 (Internal) exceeded safe temperature threshold (60 C).
Jun 08, 2020 12:28:18 PM Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (60 C).
Jun 08, 2020 12:27:04 PM Volume: Defragmentation complete for volume data.
Jun 08, 2020 11:30:01 AM Volume: Defragmentation started for volume data.
Jun 08, 2020 11:30:01 AM Volume: Defragmentation started for volume data.
Jun 08, 2020 10:02:14 AM System: ReadyNASOS background service started.
I again raise the question why defragmentation is started twice (at least the startup message is issued twice). To me it looks like the terminating defragmentation process is "killing" the fan/temperature controlling process - maybe because the "double" defragmentation startup is causing structural inconsistencies with regard to PIDs.
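The timing in this log can be checked with a few lines of Python. This is only a sketch: it copies three entries from the log in this post, and the helper names (`parse_entry`, `first_time`) are made up for illustration.

```python
from datetime import datetime

# Three entries copied from the RN104 log in this post.
LOG = """\
Jun 08, 2020 12:28:18 PM Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (60 C).
Jun 08, 2020 12:27:04 PM Volume: Defragmentation complete for volume data.
Jun 08, 2020 11:30:01 AM Volume: Defragmentation started for volume data.
"""

STAMP_FORMAT = "%b %d, %Y %I:%M:%S %p"
STAMP_LENGTH = len("Jun 08, 2020 12:28:18 PM")  # fixed-width timestamp prefix


def parse_entry(line):
    """Split one log line into (timestamp, message)."""
    stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
    return stamp, line[STAMP_LENGTH:].strip()


def first_time(entries, needle):
    """Earliest timestamp whose message contains `needle`."""
    return min(stamp for stamp, message in entries if needle in message)


entries = [parse_entry(line) for line in LOG.splitlines() if line.strip()]
started = first_time(entries, "Defragmentation started")
completed = first_time(entries, "Defragmentation complete")
overheated = first_time(entries, "exceeded safe temperature")

defrag_duration = completed - started
gap_to_overheat = overheated - completed

print("defrag ran for", defrag_duration)           # 0:57:03
print("over-temperature after", gap_to_overheat)   # 0:01:14
```

So the first over-temperature entry was logged 74 seconds after defragmentation reported completion, following a run of about 57 minutes.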
I cannot prove this by myself, because I have no idea which process is doing what and I cannot "trace" it. There's no chance to manually monitor this behavior or to check the fan speed at the time the temperature exceeds the safe level, because the RN104 shuts down too fast.
Ideas welcome...
Thanks and have a great day,
Holger
- StephenB Jun 08, 2020 Guru - Experienced User
Try running a disk test from the volume settings wheel, and see if that also triggers the problem. If it does, then remove the disks (obviously with the NAS powered down), and see if they are hot. Don't mix them up - make sure they remain in the proper slots.
What is the disk model?
HolgerGT86 wrote:
today, my RN104 shut down again because the disks in the middle bays exceeded their safe temperature of 60°C. This means it's not the fan which is failing, because I replaced it with a new, much "stronger" one.
It could of course be that one of the disks is failing.
Another thing you can try is connecting them to a Windows PC (usb adapter/dock or sata) and test them with vendor tools - seatools for seagate, lifeguard for western digital. Run the long generic test.
While the test is running, touch the disk every now and then to get a sense of the temperature. Also install a SMART monitor (for instance CrystalDiskInfo), and look at the temperature being reported by the drive when the test is running.
- Sandshark Jun 08, 2020 Sensei
If your "stronger" (I assume you mean higher CFM) fan needs a higher RPM to attain that higher flow, then the NAS is not aware of that capability.
Do make sure the rear of the NAS is at least 6" (more is better) from the wall and that there is nothing that can cause the exhaust to re-circulate to the intake (like being hemmed in on a shelf).
- HolgerGT86 Jun 08, 2020 Guide
Hello Sandshark, hello StephenB,
many thanks for your responses. I'll try to answer your questions although most of those, if not all, are already documented in this thread.
I have no option to test the disk drives outside the RN104.
The disk temperature is really about 60°C. When I restart the RN104 and I can access the admin web interface, the disk temperature is still close to 60°C and there's very warm air coming out of the rear of the RN104.
There's nothing blocking the air circulation. The RN104 is placed on a shelf in a cellar room. The air temperature in that room is about 15°C all year. There must be a picture already in this thread, but I'll upload it again.
You are right when saying that I mean higher RPM when saying stronger. The fan is rotating at up to 3800 RPM and has higher air pressure/air flow than the original fan. I'm monitoring it on a regular basis and it's usually running at >3000 RPM when Time Machine backups or RN104 maintenance activities like defragmentation, data scrubbing, etc. are running. At the times when I monitor the disk performance and fan speed, the disk temperature is between 30°C and 45°C for the inside disk drives.
In fact, including the original fan shipped with the RN104, it is the 3rd fan I'm currently trying ... all showing same behaviour - so I doubt it's the fan itself.
The disk drives are 1TB Hitachi storage server drives of type HUA721010KLA330.
As I already documented, I so far failed to recreate the issue. I already placed the RN104 at my desk, having it aside all day. I loaded the RN104 with as much I/O as I could but everything worked fine. Even when data scrubbing is running for about 48 hours + regular time machine backups in parallel, everything is working fine.
As already said, it only happens exactly after defragmentation ended. But, I was not able to recreate it by running defragmentation manually.
I don't know what else to document here... using a SMART monitor will not help because I already know the disk drives' SMART values correctly report the 60°C. The only thing I was not able to "see" so far is the fan stopping to rotate when defragmentation ends.
I need to place the RN104 at my desk again and try to catch it with my eyes ... or I need to buy another NAS maybe as I'm tired to debug this behaviour ... don't know.
- HolgerGT86 Jun 08, 2020 Guide
Photo of the RN104 when placed on the shelf ...
- StephenB Jun 08, 2020 Guru - Experienced User
HolgerGT86 wrote:
I have no option to test the disk drives outside the RN104.
The disk temperature is really about 60°C.
I suggest purchasing a USB/SATA adapter or dock, so you can test the disk outside of the NAS. Adapters are quite inexpensive ($20 for one that includes power - which you need). WD's lifeguard diagnostic should be able to test the drives.
HolgerGT86 wrote:
In fact, including the original fan shipped with the RN104, it is the 3rd fan I'm currently trying ... all showing same behaviour - so I doubt it's the fan itself.
The disk drives are 1TB Hitachi storage server drives of type HUA721010KLA330.
I suspect it is one of the disks. Have you been using these with all three fans?
One option is to try replacing the one that feels the hottest - perhaps a WD Gold (which should run a bit cooler anyway). Though if you move up to a larger size, you'd have other options (Ironwolf Pro, or WD Red Pro). If that doesn't solve it, then you could perhaps then hot-swap the other Hitachi with the one you removed.
HolgerGT86 wrote:
As already said, it only happens exactly after defragmentation ended. But, I was not able to recreate it by running defragmentation manually.
That is a puzzle. Have you tried running the disk test on the volume wheel? Or only scrubs and defrags?
- HolgerGT86 Jun 09, 2020 Guide
I have been using the same 4 disks all the time with all fans.
I have been running all available maintenance processes, including defrag, balance, disk test and data scrubbing, once a month for years.
I'm using the RN104 as backup target only. No app is installed in addition to the base firmware.
From the S.M.A.R.T. data, only the outer disk in slot 1 shows 2 relocations. All other error counters are 0 for all 4 disk drives.
I have some of these disk drives still available, so I'll maybe start replacing one by one over time.
Maybe I'll rotate the disks so that the inner disk drives reporting the 60°C will become the outer disk drives, just to see if the issue is really related to the current inner drives. But this will take some time as the Marvell processor is not the fastest one...
I can do hardware actions, but what I cannot do is debug the firmware. Because of this I'd be happy if someone with software debugging capabilities could have a look at why on my RN104 the defrag startup message is always issued twice. Is this a messaging problem only, or is the process really started up 2 times in parallel?
Have a great day,
Holger
- StephenB Jun 09, 2020 Guru - Experienced User
HolgerGT86 wrote:
on my RN104 the defrag startup message is always issued twice. Is this a messaging problem only, or is the process really started up 2 times in parallel?
I've seen reports of this, but haven't seen any information on the root cause.
But even if two defrag processes were running simultaneously, that wouldn't cause the disks to overheat. All that the process does is search for fragmented files, and then attempt to defrag them. There has to be something else going on. You've looked for other errors in the log zip file?
HolgerGT86 wrote:
Maybe I'll rotate the disks so that the inner disk drives reporting the 60°C will become the outer disk drives, just to see if the issue is really related to the current inner drives.
Power down the NAS, then remove/label the disks by their original slots. Then shuffle them as you wish. As long as you shuffle with the NAS powered down, it should work ok with no resyncs needed.
- HolgerGT86 Jul 10, 2020 Guide
Hello, good morning, good evening and good afternoon,
I'm updating this thread with a current status to keep it alive. The RN104 has been running without any issue since I disabled the defragmentation maintenance process. Data scrubbing runs for more than a day on the RN104 and has never experienced an issue.
I'm planning to move the RN104 to my office desk in the next days and run defragmentation again. I'll closely monitor the RN104 behaviour then.
I'll keep you updated.
Have a great day and stay healthy!
Holger
- HolgerGT86 Jul 18, 2020 Guide
Hello again,
I spent some hours testing this further. I placed the RN104 at my desk and was monitoring it through the web GUI and by sitting next to it.
I scheduled defrag and, as usual, defrag started up and 2 log entries were written that defrag was started.
The disk temperature went up to a maximum of 44°C for the inner 2 drives as long as defrag was running.
Defrag completed successfully after running for about 50-55 minutes.
When defrag completed, the interesting part started. Within 1-2 minutes after defrag completed, the temperature of the inner 2 disk drives increased up to 51/52°C! The fan speed was increased from 2730rpm to 3150rpm.
I logged in to the NAS using ssh and ran smartctl. The values returned by smartctl matched the temperature of the disk drives shown in the performance window of the GUI. It seems the disk drives are reporting their temperature accurately.
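For anyone who wants to repeat this check automatically, here is a rough Python sketch that polls `smartctl -A` and appends the temperatures to a file, so the last readings before a shutdown are still on record. It rests on several assumptions that are not from this thread: a Python 3.7+ interpreter and `smartctl` are available where it runs, the disks appear as `/dev/sda` to `/dev/sdd`, and the temperature is reported as SMART attribute 194 in the usual ten-column attribute layout. The function names and the log path are made up.

```python
import subprocess
import time
from datetime import datetime

# Assumption: the four disks appear under these device names.
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"]


def parse_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194 in deg C, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


def read_temperature(device):
    """Run `smartctl -A` for one device and extract its temperature."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_temperature(result.stdout)


def log_temperatures(path, interval_s=10):
    """Append one timestamped line of temperatures per interval until interrupted."""
    with open(path, "a") as log:
        while True:
            temps = [read_temperature(dev) for dev in DEVICES]
            log.write(f"{datetime.now().isoformat(timespec='seconds')} {temps}\n")
            log.flush()  # hand each line to the OS right away
            time.sleep(interval_s)


# Example row in the usual `smartctl -A` layout (values made up for illustration).
SAMPLE = "194 Temperature_Celsius 0x0002 117 117 000 Old_age Always - 51 (Min/Max 20/60)"
print(parse_temperature(SAMPLE))  # 51
```

To use it, something like `log_temperatures("/tmp/disk_temps.log")` (hypothetical path) would be started in an ssh session shortly before defrag is due to finish.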
I attached a text file including the complete smartctl output ... just in case someone can see more than I do.
The test confirms that this is not a fan problem and that the temperature of the disk drives really goes up immediately after defrag completes - for whatever reason.
Would replacing the disk drives be the only option, or does someone have another idea what's going on here and how to fix/mitigate it?
Thanks and regards,
Holger
- StephenB Jul 18, 2020 Guru - Experienced User
HolgerGT86 wrote:
Would replacing the disk drives the only option or does someone have another idea what's going on here and how to fix/ mitigate it?
Maybe first power down the NAS, and swap drive 1,2 and 3,4 (moving the inner drives to the outer slots). Then do a defrag and see if the heating problem stays with the slots, or moves with the drives. After that test, I'd power down the NAS and restore the drives to their original positions.
I would certainly consider replacing the drives. Enterprise-class drives (in my opinion) are overkill for the RN100 - it is limited by its processor speed, not the disks. I suggest starting over - two WD30EFRX or two ST3000VN007 would cost a total of about $200 (current US prices) and would give you the same capacity you are using now. Then you'd have two slots for expansion (and if you like, you can put the drives in bay 1 and bay 4 to create more distance between them).
Avoid WD EFAX models between 2 TB and 6 TB - they are SMR technology, which IMO isn't a great choice for RAID. The older EFRX line is fine. Ironwolf drives are also a good option. BTW, many desktop models are also now SMR, so I won't use them either.
- Sandshark Jul 18, 2020 Sensei
HolgerGT86 wrote:
I logged in to the NAS using ssh and ran smartctl. The values returned by smartctl matched the temperature of the disk drives shown in the performance window of the GUI. It seems the disk drives are reporting their temperature accurately.
I attached a text file including the complete smartctl output ... just in case someone can see more than I do.
The test confirms that this is not a fan problem and that the temperature of the disk drives really goes up immediately after defrag completes - for whatever reason.
Since both the ReadyNASOS and smartctl rely on the drive reporting its own temperature (and the OS probably calls smartctl for that), this only indicates the reported temperature is consistent, not correct. But with more than one drive reporting a similar temperature, it likely is correct.
It sounds like an airflow problem, but with the NAS out on your desk, there shouldn't be anything external contributing to that. The ambient air temperature is within normal comfort range, correct?
Does the fan speed initially drop at the conclusion of the defrag and then have to jump up when the temperature starts to increase? I've seen the hystereses and step sizes of the fan control to be less than optimal on legacy NASes, but I don't have a lot of newer ones (and no 104) to evaluate them for those cases. It may be that the greater amount of heat produced by the enterprise drives is outside the expectations Netgear had when establishing the fan control parameters for a 104. But I've run 2TB and 3TB drives from the same family in both 2 and 4 bay legacy NASes running OS6.x and had no issues with high-intensity processes like initial sync or scrub (I don't use defrag, I think it's mostly a waste of time on RAID and it can quickly increase the space used by snapshots).
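To illustrate what hysteresis and step size mean here, below is a toy stepped fan controller in Python. Every number in it is hypothetical and picked only for the example; it is not Netgear's fan table or algorithm, which I don't know. The names `STEP_UP_AT_C`, `HYSTERESIS_C`, `next_level` and `simulate` are made up.

```python
# Hypothetical numbers for illustration only; not Netgear's real fan table.
STEP_UP_AT_C = [40, 45, 50, 55]  # temperature that raises the fan to level 1..4
HYSTERESIS_C = 3                 # must cool this far below a threshold to step down


def next_level(level, temp_c):
    """Move the fan level by at most one step per poll."""
    if level < len(STEP_UP_AT_C) and temp_c >= STEP_UP_AT_C[level]:
        return level + 1
    if level > 0 and temp_c < STEP_UP_AT_C[level - 1] - HYSTERESIS_C:
        return level - 1
    return level


def simulate(temps_c, level=0):
    """Return the fan level chosen after each temperature reading."""
    levels = []
    for temp_c in temps_c:
        level = next_level(level, temp_c)
        levels.append(level)
    return levels


# A quick rise from 44 C to 52 C between two polls.
print(simulate([38, 41, 44, 52, 52, 52]))  # [0, 1, 1, 2, 3, 3]
```

In this toy model the fan needs two polls to reach the level that matches 52 C, because it only moves one step at a time. If the real controller behaves anything like this, a fast temperature rise would be followed with a lag.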
- StephenB Jul 18, 2020 Guru - Experienced User
Sandshark wrote:
But I've run 2TB and 3TB drives from the same family in both 2 and 4 bay legacy NASes running OS6.x and had no issues with high-intensity processes like initial sync or scrub (I don't use defrag, I think it's mostly a waste of time on RAID and it can quickly increase the space used by snapshots).
Still, it is weird that this happens repeatedly after a defrag. It does make me wonder if there is something off with the file system.
Another thing that HolgerGT86 could try is doing a factory reset, rebuild the NAS, and then reload the data from backup. The risk here is that the disks might overheat during the RAID sync. But the fact that they don't overheat during a scrub suggests that they won't.
- HolgerGT86 Jul 19, 2020 Guide
StephenB wrote:
HolgerGT86 wrote:
Would replacing the disk drives be the only option, or does someone have another idea what's going on here and how to fix/mitigate it?
Maybe first power down the NAS, and swap drive 1,2 and 3,4 (moving the inner drives to the outer slots). Then do a defrag and see if the heating problem stays with the slots, or moves with the drives. After that test, I'd power down the NAS and restore the drives to their original positions.
I would certainly consider replacing the drives. Enterprise-class drives (in my opinion) are overkill for the RN100 - it is limited by its processor speed, not the disks. I suggest starting over - two WD30EFRX or two ST3000VN007 would cost a total of about $200 (current US prices) and would give you the same capacity you are using now. Then you'd have two slots for expansion (and if you like, you can put the drives in bay 1 and bay 4 to create more distance between them).
Avoid WD EFAX models between 2 TB and 6 TB - they are SMR technology, which IMO isn't a great choice for RAID. The older EFRX line is fine. Ironwolf drives are also a good option. BTW, many desktop models are also now SMR, so I won't use them either.
Hello StephenB,
many thanks for your response. I already moved the inner drives to the outer slots and vice versa. The RN104 came up fine in this configuration.
I'm using this type of enterprise drives because I received a bunch of them for free. Nevertheless, I'm thinking about replacing them. Thanks for providing recommendations. Besides this, I now left the front door open although I do not believe it will help. The outside air temperature is always about 15°C all the year and except the front door, nothing is blocking the air flow.
I believe it is an internal disk drive cooling issue that occurs when the drives slow down after defrag has completed. I don't know if they slow down their platter rotation speed and this is then reducing their cooling ... it doesn't matter anymore. I turned off defrag already.
- HolgerGT86 Jul 19, 2020 Guide
Sandshark wrote:
HolgerGT86 wrote:
I logged in to the NAS using ssh and ran smartctl. The values returned by smartctl matched the temperature of the disk drives shown in the performance window of the GUI. It seems the disk drives are reporting their temperature accurately.
I attached a text file including the complete smartctl output ... just in case someone can see more than I do.
The test confirms that this is not a fan problem and that the temperature of the disk drives really goes up immediately after defrag completes - for whatever reason.
Since both the ReadyNASOS and smartctl rely on the drive reporting its own temperature (and the OS probably calls smartctl for that), this only indicates the reported temperature is consistent, not correct. But with more than one drive reporting a similar temperature, it likely is correct.
It sounds like an airflow problem, but with the NAS out on your desk, there shouldn't be anything external contributing to that. The ambient air temperature is within normal comfort range, correct?
Does the fan speed initially drop at the conclusion of the defrag and then have to jump up when the temperature starts to increase? I've seen the hystereses and step sizes of the fan control to be less than optimal on legacy NASes, but I don't have a lot of newer ones (and no 104) to evaluate them for those cases. It may be that the greater amount of heat produced by the enterprise drives is outside the expectations Netgear had when establishing the fan control parameters for a 104. But I've run 2TB and 3TB drives from the same family in both 2 and 4 bay legacy NASes running OS6.x and had no issues with high-intensity processes like initial sync or scrub (I don't use defrag, I think it's mostly a waste of time on RAID and it can quickly increase the space used by snapshots).
Hello Sandshark,
many thanks for your response, too. I assume the temperature is correct as the drives are really warm/ hot.
The temperature at my desk is about 24°C, in the cellar room, where the RN104 is usually located, the temperature is about 15°C all the year.
When defrag completed, the fan speed did not go down. It started to increase immediately, first slowly, then in larger steps. It's for sure not the fan ... it is a new one providing more air pressure than the original one. See my response to StephenB, too, please.
The issue only occurs when defrag has been running and has completed. No issues at all running data scrubbing or disk test.
We can stop here ... I'll think about how to "resolve" this issue.
Thanks again!
- HolgerGT86 Jul 19, 2020 Guide
StephenB wrote:
Sandshark wrote:
But I've run 2TB and 3TB drives from the same family in both 2 and 4 bay legacy NASes running OS6.x and had no issues with high-intensity processes like initial sync or scrub (I don't use defrag, I think it's mostly a waste of time on RAID and it can quickly increase the space used by snapshots).
Still, it is weird that this happens repeatedly after a defrag. It does make me wonder if there is something off with the file system.
Another thing that HolgerGT86 could try is doing a factory reset, rebuild the NAS, and then reload the data from backup. The risk here is that the disks might overheat during the RAID sync. But the fact that they don't overheat during a scrub suggests that they won't.
I'll think about starting over from scratch and restore a backup. I'm not sure though how to reload all data from backup including the user access rights and share settings as I'm backing up the RN104 to 2 USB drives attached to it. Besides this the RN104 is used to store Mac time machine backups and Windows 10 backups. I'm afraid to lose/ break things, so I'll delay this activity to the autumn/ winter time.
Many thanks for all your support/help. You may close this thread.
Stay healthy!