RN104 shutdown because disks exceed safe temperature

HolgerGT86
Guide

RN104 shutdown because disks exceed safe temperature

Hello,
I'm using a RN104 at firmware level 6.10.3. 

The ReadyNAS is equipped with four Hitachi HUA721010KLA330 1TB SATA drives, which together form one RAID5 X-RAID volume.

Today the RN104 shut down because the disks in channel 2 and 3 exceeded the safe temperature level of 62°C / 63°C.

This time I was at home and immediately restarted the RN104. It came back up fine, and the disks in channel 2 and 3 still reported a temperature of 60°C in the ReadyNAS web user interface (System - Performance view). The disks started to cool down immediately and the fan was running at 3233 RPM (max RPM?). The air coming out of the RN104 was really warm at that time.

The fan mode is set to cool which is confirmed by the systemd-journal:

"Apr 08 11:01:49 PrivateNAS readynasd[2366]: Using fan mode: cool"

The outside room temperature is about 20°C - it is a room in the cellar.

I'm currently running a disk test and I'm monitoring the temperature and the fan speed - all is fine so far.

I reviewed several logs and remembered that I already ran into the same issue on March 8.

Interestingly, a scheduled defrag runs on the 8th of each month:

Excerpt cron.log:

"Apr 08 11:15:01 PrivateNAS CRON[3456]: (root) CMD (/usr/bin/volume_schedule 2 &> /dev/null)"

Every time the defrag is being started, the process is started twice. I'm getting an eMail saying that defrag was started 2 times.

The systemd-journal.log seems to confirm this:

"Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
Apr 08 11:15:02 PrivateNAS msmtpq[3476]: mail for [ -C /etc/msmtprc holger@martens-lonsheim.de --timeout=60 ] : send was successful
Apr 08 11:15:04 PrivateNAS msmtpq[3490]: mail for [ -C /etc/msmtprc holger@martens-lonsheim.de --timeout=60 ] : send was successful"

In the cron.log I can only see it being started once.
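
One way to look at the duplicate start message is to group journal lines by timestamp, process, PID and message. The following is only a sketch: `find_duplicates` is an illustrative helper name, not a ReadyNAS tool, and the sample text is copied from the journal excerpt above. It can show that both start messages come from the same readynasd PID, but it cannot tell whether two defrag processes actually ran.

```python
import re
from collections import Counter

# Two lines copied from the systemd-journal excerpt in this post.
JOURNAL = """\
Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
"""

# timestamp, host, process name, PID, message
LINE_RE = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) (\S+) (\S+)\[(\d+)\]: (.*)$")


def find_duplicates(text):
    """Return {(timestamp, process, pid, message): count} for entries logged more than once."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp, _host, process, pid, message = match.groups()
            counts[(stamp, process, int(pid), message)] += 1
    return {key: n for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    for (stamp, process, pid, message), n in find_duplicates(JOURNAL).items():
        print(f"{n}x {stamp} {process}[{pid}]: {message}")
```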

In both temperature-triggered shutdown events, the defrag process had just completed when the RN104 shut down.

I'm now wondering if the temperature triggered shutdown is somehow related to the scheduled defrag process?

I don't know if it's possible to log the fan speed somehow. Maybe the fan is not working correctly anymore?

I already cleaned the disk drives and the inside of the RN104 after the March 8 shutdown, so no dust is hindering the airflow.

---

Log excerpt March 8:

Feb 08, 2020 12:31:35 PM  System: The system is shutting down.
Feb 08, 2020 12:31:31 PM  Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (66 C).
Feb 08, 2020 12:31:30 PM  Disk: System is shutting down because disk in channel 2 (Internal) exceeded safe temperature threshold (67 C).
Feb 08, 2020 12:31:28 PM  Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (67 C).
Feb 08, 2020 12:30:14 PM  Volume: Defragmentation complete for volume data.
Feb 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Feb 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Feb 08, 2020 11:02:16 AM  System: ReadyNASOS background service started.

 

Log excerpt April 8:

Apr 08, 2020 12:11:55 PM  System: ReadyNASOS background service started.
Apr 08, 2020 12:11:29 PM  Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (60 C).
Apr 08, 2020 12:11:26 PM  Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (60 C).
Apr 08, 2020 12:07:13 PM  System: The system is shutting down.
Apr 08, 2020 12:07:08 PM  Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (63 C).
Apr 08, 2020 12:07:07 PM  Disk: System is shutting down because disk in channel 2 (Internal) exceeded safe temperature threshold (64 C).
Apr 08, 2020 12:07:05 PM  Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (64 C).
Apr 08, 2020 12:05:51 PM  Volume: Defragmentation complete for volume data.
Apr 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Apr 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Apr 08, 2020 11:02:17 AM  System: ReadyNASOS background service started.

 

On March 8 the system was running at level 6.10.2

On April 8 the system was/is at level 6.10.3 (the update was done on March 18).

I would be happy to receive some advice how to proceed.

Many thanks and regards,

Holger

Model: RN104|ReadyNAS 100 Series 4- Bay
Message 1 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature


@HolgerGT86 wrote:

 

I would be happy to receive some advice how to proceed.

 


It's possible that one disk is overheating (and raising the temp of the adjacent disk).  Have you looked at the temperatures of disk 1 and disk 4?

 

I'd start by powering down the NAS, and then confirm that disks 2 and 3 are in fact hot, and also see how disks 1 and 4 compare.

 

Maybe also connect the disks to a PC, and run the vendor diagnostics on each one. You can monitor the temps using any tool that supports smart stat gathering (Acronis Disk Monitor is one of several such tools).  This would confirm that the disks are still working properly, as well as giving you some idea of the temperature rise during the test.  Ideally use a USB/SATA adapter.  A dock is ok too, but of course it would reduce the air flow around the disk.  If one disk is spiking more than the others, then you might just need to replace that drive.

 

WD's Lifeguard software should recognize the disk, and let you run the non-destructive long test.

 

Another test you could try is to swap disks 1 and 2, and also swap 3 and 4.  Do this with the NAS powered down, and label the disks by their original slot number.  The system should be able to mount the volume normally (even in the different order).  Then look at the temperature pattern, and see if the hottest disks are still in the center slots, or if they are shifting to the outside.

 

 

 

Message 2 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

Hello Stephen,

many thanks for your fast response!

I confirm that the disks were really warm, as the air coming out of the RN104 was really warm. When I powered the RN104 on again ... some minutes after the shutdown ... the inner disks were at about 60°C and the outer disks at about 50-54°C.

Swapping the disks would be an option I can try. Connecting a PC may be an issue as I use Apple at home.

What about the coexistence with the defrag processes? I'm using the RN104 as backup device only (about 99%), maybe I'm using it to share single files. Never had an issue with that.

I'm running defrag, scrub, balance and disk test maintenance on a regular basis, but so far the issue has only occurred directly after defrag.

Is it possible that, when the parallel running defrag ends, the fan controlling process is halted, too? By chance I mean.

Any way to find out why the defrag is started twice?

Any way to find out if the fan controlling process is in trouble?

Message 3 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature


@HolgerGT86 wrote:

 

What about the coexistence with the defrag processes? I'm using the RN104 as backup device only (about 99%), maybe I'm using it to share single files. Never had an issue with that.

 


Normally the disks will get hotter under heavy use.  So if the defrag (or any other task) has a lot of work to do, I'd expect a temperature rise.  But the NAS fan control should be enough to prevent the actual overheating.

 

It is conceivable that the fan got damaged when you cleaned it.  You should definitely be hearing the fan kick into high gear as the temperature goes up.  Maybe move the NAS to a new spot where that would be easier to hear???  You should also be able to feel the air flow, esp. when it ramps up.

 

X86 NAS (RN300 series and up) do have performance graphs that include SYS and CPU temps.  But not the RN100 and RN200 families.

 

If you enable ssh, then smartctl might give you some temperature history

root@NAS:~# smartctl -x /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.190.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
...

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/65 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (17)

Index    Estimated Time   Temperature Celsius
  18    2020-04-08 09:18    35  ****************
 ...    ..(126 skipped).    ..  ****************
  17    2020-04-08 11:25    35  ****************

...

This might not be available with your disks though.

 

You can also get the current temp

root@NAS:~# smartctl -a /dev/sda | grep Temperature_Celsius
194 Temperature_Celsius     0x0002   203   203   000    Old_age   Always       -       32 (Min/Max 13/45)
root@NAS:~# smartctl -a /dev/sdb | grep Temperature_Celsius
194 Temperature_Celsius     0x0002   185   185   000    Old_age   Always       -       35 (Min/Max 14/48)
root@NAS:~# smartctl -a /dev/sdc | grep Temperature_Celsius
194 Temperature_Celsius     0x0022   119   105   000    Old_age   Always       -       33
root@NAS:~# smartctl -a /dev/sdd | grep Temperature_Celsius
194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       31
root@NAS:~#

It's easy to redo these commands w/o retyping them - use the up arrow on your keyboard to step back through the previously entered commands, and then press enter to re-execute the one you want.
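
If you would rather collect the numbers than read them off the screen, the attribute line can be parsed. This is a minimal sketch, assuming Python 3 is available wherever you run it (not confirmed for the RN104); `parse_temperature_line` is an illustrative name, and the sample lines are copied from the output above. It takes the first token of the raw value column, so it handles both the "32 (Min/Max 13/45)" and the plain "33" form.

```python
def parse_temperature_line(line):
    """Return the raw temperature in Celsius from a smartctl attribute 194 line, or None."""
    fields = line.split()
    # Attribute rows have at least 10 columns:
    # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
    if len(fields) < 10 or fields[1] != "Temperature_Celsius":
        return None
    try:
        return int(fields[9])  # first token of the raw value
    except ValueError:
        return None


# Sample lines copied from the smartctl output shown in this post.
SAMPLES = {
    "/dev/sda": "194 Temperature_Celsius     0x0002   203   203   000    Old_age   Always       -       32 (Min/Max 13/45)",
    "/dev/sdc": "194 Temperature_Celsius     0x0022   119   105   000    Old_age   Always       -       33",
}

if __name__ == "__main__":
    for device, line in SAMPLES.items():
        print(device, parse_temperature_line(line), "C")
```

To use it on live data you would feed it each line of the `smartctl -a /dev/sdX` output, for example captured with Python's `subprocess.run`.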


Message 4 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature

BTW, you might find that the disks will run a bit cooler if you open the front door.  That's not a fix of course.

Message 5 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

I checked the fan and it is running fine. The speed control is working as expected - now. The disk test just completed successfully. Temperature was about 40°C +/- 6°C all the time. The fan running at 27xx/ 3233 RPM.

The issue is that so far I have had no chance to check the fan speed at the time the problem occurs.

This is because I only notice there's an issue when I receive the eMail that the disks are getting warm. Warning level and shutdown all happen within 60 seconds, so there's no chance for me to run down into the cellar to check what's going on. By the time I'm down in the cellar, the RN104 is already powered off.

I scheduled another defrag for tomorrow, trying to recreate the issue.

I'll try the commands you provided. Maybe I can get a script running to monitor temperature and fan speed during the defrag.
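
Such a monitoring script could look roughly like the sketch below. It only shows the logging loop: `log_samples`, `read_temps` and `read_fan_rpm` are hypothetical names, and the two stand-in readers return placeholder values so the sketch runs anywhere. How to read the real fan speed on the RN104 is not established in this thread; on many Linux systems it is exposed through hwmon files such as `fan1_input`, but that is an assumption here.

```python
import csv
import io
import time
from datetime import datetime


def log_samples(read_temps, read_fan_rpm, out, samples, interval_s=10.0, sleep=time.sleep):
    """Write one CSV row per sample: ISO timestamp, fan RPM, then one temperature per disk."""
    writer = csv.writer(out)
    for i in range(samples):
        stamp = datetime.now().isoformat(timespec="seconds")
        writer.writerow([stamp, read_fan_rpm(), *read_temps()])
        out.flush()  # push each row out right away so it is there after an automatic shutdown
        if i + 1 < samples:
            sleep(interval_s)


if __name__ == "__main__":
    # Stand-in readers with placeholder values; replace them with real readers.
    fake_temps = lambda: [52, 60, 60, 52]
    fake_fan = lambda: 3233
    buffer = io.StringIO()
    log_samples(fake_temps, fake_fan, buffer, samples=3, interval_s=0)
    print(buffer.getvalue(), end="")
```

Writing to a file on the data volume instead of a `StringIO` buffer would leave a record to read after the next shutdown.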

Message 6 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature


@StephenB wrote:

BTW, you might find that the disks will run a bit cooler if you open the front door.  That's not a fix of course.


I did that already after bringing the RN104 back up. I closed it again because now the temperature is stable again. But thanks for letting me know!

Message 7 of 28
Sandshark
Sensei

Re: RN104 shutdown because disks exceed safe temperature

No warnings about low fan speed?  That's what it sounds like -- that the fan is failing.  The drive temperatures will rise dramatically during heavy use like a defrag if the fan's not running.  Is the back of the NAS close to the wall, or is the NAS close to anything (bookcase wall, another NAS, etc.) on the top or side?  Those can make the heat rise, too.  Space behind the NAS can be more important than some realize, as many fans do not work well with the high back pressure that can cause.  Restrictions around it can insulate it.  But more importantly, either of these can cause the air to re-circulate from exhaust to intake and not mix with surrounding cooler air much.

Message 8 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

Many thanks for the good hints and tips - I'll work through them as well as possible. Furthermore, I'll try to find a way to recreate this behaviour.

I'm attaching a picture of my RN104 and its location. There's plenty of space in front of it and in the back. Air flow is guaranteed and the incoming air is cool as it's located in the cellar.

 

Message 9 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

I replaced the fan today. Let's see ... monitoring now.

Nevertheless, would be nice to understand why starting defrag is always logged twice and results in 2 eMails being sent.

 

I'll keep you updated.

Message 10 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

it's me again ...

 

Can someone help me to understand how fan control works?

 

I'm wondering why the new (re-used) fan is not rotating as fast as the original one.

I know that the original fan was running with max speed of 3233 RPM which matches its specification of having a max speed of 3200 RPM. 

With same fan setting "cool", the new fan is rotating up to 1920 RPM only for now.

I'm monitoring this because my new fan is a re-used fan from an old NetFinity 3000 server and I did not find its specifications. I only found specifications of a replacement model, which runs at up to 4800 RPM - I don't know if this applies to the fan I'm using, too.

As already documented, the replacement fan is currently running at a maximum speed of 1920 RPM. With that, the temperature of the CPU rises to 66°C (151°F) and the temperature of the disks is as follows:

disk #1 = 49°C (120°F)

disk #2 = 54°C (129°F)

disk #3 = 53°C (127°F)

disk #4 = 47°C (116°F)

Is this normal or is the replacement fan already running at its maximum speed?

Is there a way to manually set the fan to maximum speed?

 

Many thanks and I wish happy Easter holidays to you and your families,

Holger

Message 11 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

Hello,

I now received my new fan for my RN104. It looks like the RN104 is giving less than 7 volts to the fan at startup, when the RN104 and its disks and CPU are cold. This results in a 0 RPM warning message. Later on, when the RN104 is heating up, the fan is working fine and the RN104 disk temperature is about 10°C lower than before.

Why 7 volts? From the specs of the fan I can read that the fan allows 7-13.8 volts.

Is there a way to set the minimum voltage value or minimum RPM of the fan?

Thanks,

Holger

Message 12 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

 

Good morning, good evening, good afternoon,

today, my RN104 shut down again because the disks in the middle bays exceeded their safe temperature of 60°C. This means it's not the fan that is failing, because I replaced it with a new, much "stronger" one. Something else must be wrong, and I still believe it's related to running defragmentation maintenance. Why do I believe so? Because the RN104 only experiences this issue when defragmentation has just completed. It never happened when running another process, when loading the NAS with backup workload, when manually creating I/O workload, ... whatever I do, it never fails - except when defragmentation has completed.

And it has never failed while defragmentation is in progress. It always fails directly after defragmentation completed.

-

Here's the RN104 log from today:

-

Jun 08, 2020 12:44:48 PM  System: ReadyNASOS background service started.
Jun 08, 2020 12:28:25 PM  System: The system is shutting down.
Jun 08, 2020 12:28:21 PM  Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (60 C).
Jun 08, 2020 12:28:19 PM  Disk: System is shutting down because disk in channel 2 (Internal) exceeded safe temperature threshold (60 C).
Jun 08, 2020 12:28:18 PM  Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (60 C).
Jun 08, 2020 12:27:04 PM  Volume: Defragmentation complete for volume data.
Jun 08, 2020 11:30:01 AM  Volume: Defragmentation started for volume data.
Jun 08, 2020 11:30:01 AM  Volume: Defragmentation started for volume data.
Jun 08, 2020 10:02:14 AM  System: ReadyNASOS background service started.
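
To check how tightly the overheating follows the end of the defrag, the delay can be computed from the three log excerpts in this thread. This is a minimal sketch: the timestamps are copied from the logs as printed (the first excerpt is stamped Feb 08), and `seconds_of_day` and `delays` are illustrative helper names.

```python
import re

# Per day: (defragmentation complete, first threshold warning, shutdown message),
# copied from the three log excerpts in this thread.
EVENTS = {
    "Feb 08, 2020": ("12:30:14 PM", "12:31:28 PM", "12:31:35 PM"),
    "Apr 08, 2020": ("12:05:51 PM", "12:07:05 PM", "12:07:13 PM"),
    "Jun 08, 2020": ("12:27:04 PM", "12:28:18 PM", "12:28:25 PM"),
}


def seconds_of_day(stamp):
    """Convert a 12-hour clock time such as '12:30:14 PM' to seconds since midnight."""
    match = re.fullmatch(r"(\d{1,2}):(\d{2}):(\d{2}) (AM|PM)", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    hour = int(match.group(1)) % 12 + (12 if match.group(4) == "PM" else 0)
    return hour * 3600 + int(match.group(2)) * 60 + int(match.group(3))


def delays(events):
    """Return {day: (seconds until first warning, seconds until shutdown)} after defrag end."""
    result = {}
    for day, (done, warning, shutdown) in events.items():
        start = seconds_of_day(done)
        result[day] = (seconds_of_day(warning) - start, seconds_of_day(shutdown) - start)
    return result


if __name__ == "__main__":
    for day, (to_warning, to_shutdown) in delays(EVENTS).items():
        print(f"{day}: warning after {to_warning} s, shutdown after {to_shutdown} s")
```

With these timestamps, the first warning comes 74 seconds after "Defragmentation complete" on all three days, and the shutdown message follows 81 to 82 seconds after it.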

 

I again raise the question why defragmentation is started twice (at least the startup message is issued twice)? To me it looks like the terminating defragmentation process is "killing" the fan/temperature controlling process - maybe because the "double" defragmentation startup is causing structural inconsistencies with regard to PIDs.

I cannot prove this myself, because I have no idea which process is doing what and I cannot "trace" it. There's no chance to manually monitor this behavior or to check the fan speed at the time the temperature exceeds the safe level, because the RN104 shuts down too fast.

Ideas welcome...

 

Thanks and have a great day,

Holger

Model: RN10400|ReadyNAS 100 Series 4- Bay (Diskless)
Message 13 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature

Try running a disk test from the volume settings wheel, and see if that also triggers the problem.  If it does, then remove the disks (obviously with the NAS powered down), and see if they are hot.  Don't mix them up - make sure they remain in the proper slots.

 

What is the disk model?

 


@HolgerGT86 wrote:

 

today, my RN104 shut down again because the disks in the middle bays exceeded their safe temperature of 60°C. This means it's not the fan that is failing, because I replaced it with a new, much "stronger" one.


It could of course be that one of the disks is failing.

 

Another thing you can try is connecting them to a Windows PC (usb adapter/dock or sata) and test them with vendor tools - seatools for seagate, lifeguard for western digital.  Run the long generic test. 

 

While the test is running, touch the disk every now and then to get a sense of the temperature.  Also install a SMART monitor (for instance CrystalDiskInfo), and look at the temperature being reported by the drive when the test is running.

Message 14 of 28
Sandshark
Sensei

Re: RN104 shutdown because disks exceed safe temperature

If your "stronger" (I assume you mean higher CFM)  fan needs a higher RPM to attain that higher flow, then the NAS is not aware of that capability.

 

Do make sure the rear of the NAS is at least 6" (more is better) from the wall and that there is nothing that can cause the exhaust to re-circulate to the intake (like being hemmed in on a shelf).

Message 15 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

Hello Sandshark, hello StephenB,

many thanks for your responses. I'll try to answer your questions although most of those, if not all, are already documented in this thread.

I have no option to test the disk drives outside the RN104. 

The disk temperature is really about 60°C. When I restart the RN104 and I can access the admin web interface, the disk temperature is still close to 60°C and there's very warm air coming out of the rear of the RN104.

There's nothing blocking the air circulation. The RN104 is placed on a shelf in a cellar room. The air temperature in that room is about 15°C all year. There must be a picture already in this thread, but I'll upload it again.

You are right when saying that I mean higher RPM when saying stronger. The fan rotates at up to 3800 RPM and has higher air pressure/air flow than the original fan. I'm monitoring it on a regular basis and it's usually running at >3000 RPM when Time Machine backups or RN104 maintenance activities like defragmentation, data scrubbing, etc. are running. Whenever I monitor the disk performance and fan speed, the disk temperature is between 30°C and 45°C for the inner disk drives.

In fact, including the original fan shipped with the RN104, it is the 3rd fan I'm currently trying ... all showing same behaviour - so I doubt it's the fan itself.

The disk drives are 1TB Hitachi storage server drives of type HUA721010KLA330.

As I already documented, I so far failed to recreate the issue. I already placed the RN104 at my desk, having it aside all day. I loaded the RN104 with as much I/O as I could but everything worked fine. Even when data scrubbing is running for about 48 hours + regular time machine backups in parallel, everything is working fine. 

As already said, it only happens exactly after defragmentation ended. But, I was not able to recreate it by running defragmentation manually.

I don't know what else to document here... using a SMART monitor will not help because I already know the disk drives' SMART values correctly report the 60°C. The only thing I was not able to "see" so far is whether the fan stops rotating when defragmentation ends.

I need to place the RN104 on my desk again and try to catch it with my eyes ... or maybe I need to buy another NAS, as I'm tired of debugging this behaviour ... don't know.

Message 16 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

Photo of the RN104 when placed on the shelf ...

Message 17 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature


@HolgerGT86 wrote:

 

I have no option to test the disk drives outside the RN104. 

The disk temperature is really about 60°C. 


I suggest purchasing a USB/SATA adapter or dock, so you can test the disk outside of the NAS.  Adapters are quite inexpensive ($20 for one that includes power - which you need).  WD's lifeguard diagnostic should be able to test the drives.

 


@HolgerGT86 wrote:

 

In fact, including the original fan shipped with the RN104, it is the 3rd fan I'm currently trying ... all showing same behaviour - so I doubt it's the fan itself.

The disk drives are 1TB Hitachi storage server drives of type HUA721010KLA330.

I suspect it is one of the disks.  Have you been using these with all three fans?  

One option is to try replacing the one that feels the hottest - perhaps a WD Gold (which should run a bit cooler anyway).  Though if you move up to a larger size, you'd have other options (Ironwolf Pro, or WD Red Pro). If that doesn't solve it, then you could perhaps then hot-swap the other Hitachi with the one you removed.  

 


@HolgerGT86 wrote:

 

As already said, it only happens exactly after defragmentation ended. But, I was not able to recreate it by running defragmentation manually.

That is a puzzle.  Have you tried running the disk test on the volume wheel?  Or only scrubs and defrags?


Message 18 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

I have been using the same 4 disks all the time with all fans.

I have been running all available maintenance processes, including defrag, balance, disk test and data scrubbing, once a month for years.

I'm using the RN104 as backup target only. No app is installed in addition to the base firmware.

From the S.M.A.R.T. data, only the outer disk in slot 1 shows 2 relocations. All other error counters are 0 for all 4 disk drives.

I have some of these disk drives still available, so I'll maybe start replacing one by one over time.

Maybe I'll rotate the disks so that the inner disk drives reporting the 60°C will become the outer disk drives, just to see if the issue is really related to the current inner drives. But this will take some time as the Marvell processor is not the fastest one...

I can do hardware actions, but what I cannot do is debug the firmware. Because of this I'd be happy if someone with software debugging capabilities could have a look at why on my RN104 the defrag startup message is always issued twice. Is this a messaging problem only, or is the process really started up 2 times in parallel?

Have a great day, 
Holger

Message 19 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature


@HolgerGT86 wrote:

on my RN104 the defrag startup message is always issued twice. Is this a messaging problem only, or is the process really started up 2 times in parallel?

 


I've seen reports of this, but haven't seen any information on the root cause.

 

But even if two defrag processes were running simultaneously, that wouldn't cause the disks to overheat.  All that the process does is search for fragmented files, and then attempt to defrag them.  There has to be something else going on.  You've looked for other errors in the log zip file?

 


@HolgerGT86 wrote:

 

Maybe I'll rotate the disks so that the inner disk drives reporting the 60°C will become the outer disk drives, just to see if the issue is really related to the current inner drives.


Power down the NAS, then remove/label the disks by their original slots.  Then shuffle them as you wish.  As long as you shuffle with the NAS powered down, it should work ok with no resyncs needed.

Message 20 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

Hello, good morning, good evening and good afternoon,

I'm updating this thread with a current status to keep it alive. The RN104 has been running without any issue since I disabled the defragmentation maintenance process. Data scrubbing runs for more than a day on the RN104 and has never caused an issue.

I'm planning to move the RN104 to my office desk in the next days and run defragmentation again. I'll closely monitor the RN104 behaviour then.

I'll keep you updated.

Have a great day and stay healthy!

Holger

Message 21 of 28
HolgerGT86
Guide

Re: RN104 shutdown because disks exceed safe temperature

Hello again,

I spent some hours testing this further. I placed the RN104 on my desk and monitored it through the web GUI and by sitting next to it.

I scheduled defrag and, as usual, defrag started up and 2 log entries were written that defrag was started.

The disk temperature went up to a maximum of 44°C for the inner 2 drives as long as defrag was running.

Defrag completed successfully after running for about 50-55 minutes.

When defrag completed, the interesting part started. Within 1-2 minutes after defrag completed, the temperature of the inner 2 disk drives increased up to 51/52°C! The fan speed was increased from 2730rpm to 3150rpm. 

I logged in to the NAS using ssh and ran smartctl. The values returned by smartctl matched the temperature of the disk drives shown in the performance window of the GUI. It seems the disk drives are reporting their temperature accurately.

I attached a text file including the complete smartctl output ... just in case someone can see more than I do.

The test confirms that this is not a fan problem but that the temperature of the disk drives really goes up immediately after defrag completes - for whatever reason.

Would replacing the disk drives be the only option, or does someone have another idea what's going on here and how to fix/mitigate it?

Thanks and regards, 

Holger

Message 22 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature


@HolgerGT86 wrote:

 

Would replacing the disk drives be the only option, or does someone have another idea what's going on here and how to fix/mitigate it?

Maybe first power down the NAS, and swap drive 1,2 and 3,4 (moving the inner drives to the outer slots).  Then do a defrag and see if the heating problem stays with the slots, or moves with the drives.  After that test, I'd power down the NAS and restore the drives to their original positions.

 

I would certainly consider replacing the drives.  Enterprise-class (in my opinion) are overkill for the RN100 - it is limited by its processor speed, not the disks.  I suggest starting over - two WD30EFRX or two ST3000VN007 would cost a total of about $200 (current US prices) and would give you the same capacity you are using now.  Then you'd have two slots for expansion (and if you like, you can put the drives in bay 1 and bay 4 to create more distance between them).
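
The "same capacity" claim can be checked with a quick calculation. This sketch uses nominal sizes only, ignoring file system overhead and the TB/TiB difference, and it assumes X-RAID mirrors a two-disk volume (RAID1) and uses RAID5 for the current four disks; `usable_tb` is an illustrative name.

```python
def usable_tb(disk_sizes_tb, level):
    """Nominal usable capacity in TB for a simple RAID1 or RAID5 set."""
    count = len(disk_sizes_tb)
    smallest = min(disk_sizes_tb)
    if level == "raid5":
        if count < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (count - 1) * smallest  # one disk's worth of space holds parity
    if level == "raid1":
        if count < 2:
            raise ValueError("RAID1 needs at least 2 disks")
        return smallest  # every disk holds the same data
    raise ValueError(f"unsupported level: {level}")


if __name__ == "__main__":
    current = usable_tb([1, 1, 1, 1], "raid5")  # four 1 TB Hitachi drives
    proposed = usable_tb([3, 3], "raid1")       # two 3 TB drives, mirrored
    print(f"current: {current} TB, proposed: {proposed} TB")
```

Both layouts come out at a nominal 3 TB.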

 

Avoid WD EFAX models between 2 TB and 6 TB - they are SMR technology, which IMO aren't great choices for RAID.  The older EFRX line is fine.  Ironwolf drives are also a good option.  BTW, many desktop models are also now SMR, so I won't use them either.


Message 23 of 28
Sandshark
Sensei

Re: RN104 shutdown because disks exceed safe temperature


@HolgerGT86 wrote:

 

I logged in to the NAS using ssh and ran smartctl. The values returned by smartctl matched the temperature of the disk drives shown in the performance window of the GUI. It seems the disk drives are reporting their temperature accurately.

I attached a text file including the complete smartctl output ... just in case someone can see more than I do.

The test confirms that this is not a fan problem but that the temperature of the disk drives really goes up immediately after defrag completes - for whatever reason.

 

Since both the ReadyNASOS and smartctl rely on the drive reporting its own temperature (and the OS probably calls smartctl for that), this only indicates the reported temperature is consistent, not correct.  But with more than one drive reporting a similar temperature, it likely is correct.

 

It sounds like an airflow problem, but with the NAS out on your desk, there shouldn't be anything external contributing to that.  The ambient air temperature is within normal comfort range, correct?

 

Does the fan speed initially drop at the conclusion of the defrag and then have to jump up when the temperature starts to increase?  I've seen the hysteresis and step sizes of the fan control to be less than optimal on legacy NASes, but I don't have a lot of newer ones (and no 104) to evaluate them for those cases.  It may be that the greater amount of heat produced by the enterprise drives is outside the expectations Netgear had when establishing the fan control parameters for a 104.  But I've run 2TB and 3TB drives from the same family in both 2 and 4 bay legacy NASes running OS6.x and had no issues with high-intensity processes like initial sync or scrub (I don't use defrag, I think it's mostly a waste of time on RAID and it can quickly increase the space used by snapshots).
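
To illustrate what hysteresis and step size mean in this context, here is a toy two-step fan controller. It is not Netgear's algorithm: the thresholds and RPM values are made up for the illustration, and `fan_steps` is a hypothetical name. The fan switches to the high step only once the temperature reaches `on_at`, and falls back only once it has dropped to `off_at`.

```python
def fan_steps(temps_c, on_at=50, off_at=42, low_rpm=2700, high_rpm=3200):
    """Return the fan speed chosen for each temperature sample by a two-step controller."""
    rpm = low_rpm
    speeds = []
    for temp in temps_c:
        if rpm == low_rpm and temp >= on_at:
            rpm = high_rpm  # step up only when the upper threshold is reached
        elif rpm == high_rpm and temp <= off_at:
            rpm = low_rpm   # step down only after cooling to the lower threshold
        speeds.append(rpm)
    return speeds


if __name__ == "__main__":
    samples = [40, 44, 48, 51, 47, 43, 41]  # made-up temperature readings in Celsius
    for temp, rpm in zip(samples, fan_steps(samples)):
        print(f"{temp} C -> {rpm} RPM")
```

With coarse steps like these, the fan stays on the low step all the way up to 49°C, so a fast temperature rise gets a late response.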

Message 24 of 28
StephenB
Guru

Re: RN104 shutdown because disks exceed safe temperature


@Sandshark wrote:
But I've run 2TB and 3TB drives from the same family in both 2 and 4 bay legacy NASes running OS6.x and had no issues with high-intensity processes like initial sync or scrub (I don't use defrag, I think it's mostly a waste of time on RAID and it can quickly increase the space used by snapshots).

Still, it is weird that this happens repeatedly after a defrag.  It does make me wonder if there is something off with the file system.

 

Another thing that @HolgerGT86 could try is doing a factory reset, rebuild the NAS, and then reload the data from backup.  The risk here is that the disks might overheat during the RAID sync.  But the fact that they don't overheat during a scrub suggests that they won't.

Message 25 of 28