HolgerGT86
Apr 08, 2020

RN104 shutdown because disks exceed safe temperature

Hello,
I'm using an RN104 running firmware 6.10.3.

The ReadyNAS is equipped with four Hitachi HUA721010KLA330 1 TB SATA drives, which together form one RAID 5 X-RAID volume.

Today the RN104 shut down because the disks in channels 2 and 3 exceeded the safe temperature level of 62°C / 63°C.

This time I was at home and immediately restarted the RN104. It came back up fine, and the disks in channels 2 and 3 still reported a temperature of 60°C in the ReadyNAS web user interface (System - Performance view). The disks started to cool down immediately and the fan was running at 3233 RPM (max RPM?). The air coming out of the RN104 was really warm at that time.

The fan mode is set to cool, which is confirmed by the systemd journal:

"Apr 08 11:01:49 PrivateNAS readynasd[2366]: Using fan mode: cool"

The ambient room temperature is about 20°C - the NAS is in a room in the cellar.

I'm currently running a disk test and I'm monitoring the temperature and the fan speed - all is fine so far.

I reviewed several logs and remembered that I already ran into the same issue on March 8.

Interestingly, a scheduled defrag runs on the 8th of every month:

Excerpt cron.log:

"Apr 08 11:15:01 PrivateNAS CRON[3456]: (root) CMD (/usr/bin/volume_schedule 2 &> /dev/null)"

Every time the defrag is started, the process is started twice. I get email saying that the defrag was started two times.

The systemd-journal.log seems to confirm this:

"Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
Apr 08 11:15:02 PrivateNAS msmtpq[3476]: mail for [ -C /etc/msmtprc holger@martens-lonsheim.de --timeout=60 ] : send was successful
Apr 08 11:15:04 PrivateNAS msmtpq[3490]: mail for [ -C /etc/msmtprc holger@martens-lonsheim.de --timeout=60 ] : send was successful"

In cron.log I can see it being started only once.
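
One way to check whether the duplicate start is recorded in the journal itself, and not only in the emails, is to count identical log lines. The following is a minimal Python sketch, assuming the journal has been exported as plain text in the format quoted above; `find_duplicates` is a name made up for this example and is not part of ReadyNAS OS.

```python
from collections import Counter


def find_duplicates(journal_text):
    """Return {line: count} for journal lines that occur more than once."""
    # Strip whitespace and the surrounding quotes used when pasting excerpts.
    lines = [ln.strip().strip('"') for ln in journal_text.splitlines()]
    counts = Counter(ln for ln in lines if ln)
    return {ln: n for ln, n in counts.items() if n > 1}


# Lines copied from the cron.log and systemd-journal excerpts above.
sample = """\
Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
Apr 08 11:15:01 PrivateNAS readynasd[2366]: Defragmentation started for volume data.
Apr 08 11:15:01 PrivateNAS CRON[3456]: (root) CMD (/usr/bin/volume_schedule 2 &> /dev/null)
"""

for line, n in find_duplicates(sample).items():
    print("{}x {}".format(n, line))
```

In the excerpt, both "Defragmentation started" lines come from the same readynasd process (PID 2366), while cron logs a single volume_schedule call. That may mean the duplication happens after cron hands off, but this is only a guess.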

In both temperature-triggered shutdown events, the defrag process had just completed when the RN104 shut down.

I'm now wondering if the temperature-triggered shutdown is somehow related to the scheduled defrag process?

I don't know if it's possible to log the fan speed somehow. Maybe the fan is not working correctly anymore?

I already cleaned the disk drives and the inside of the RN104 after the March 8 shutdown, so no dust is obstructing the airflow.

---

Log excerpt March 8:

Feb 08, 2020 12:31:35 PM  System: The system is shutting down.
Feb 08, 2020 12:31:31 PM  Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (66 C).
Feb 08, 2020 12:31:30 PM  Disk: System is shutting down because disk in channel 2 (Internal) exceeded safe temperature threshold (67 C).
Feb 08, 2020 12:31:28 PM  Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (67 C).
Feb 08, 2020 12:30:14 PM  Volume: Defragmentation complete for volume data.
Feb 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Feb 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Feb 08, 2020 11:02:16 AM  System: ReadyNASOS background service started.

 

Log excerpt April 8:

Apr 08, 2020 12:11:55 PM  System: ReadyNASOS background service started.
Apr 08, 2020 12:11:29 PM  Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (60 C).
Apr 08, 2020 12:11:26 PM  Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (60 C).
Apr 08, 2020 12:07:13 PM  System: The system is shutting down.
Apr 08, 2020 12:07:08 PM  Disk: Disk in channel 3 (Internal) exceeded safe temperature threshold (63 C).
Apr 08, 2020 12:07:07 PM  Disk: System is shutting down because disk in channel 2 (Internal) exceeded safe temperature threshold (64 C).
Apr 08, 2020 12:07:05 PM  Disk: Disk in channel 2 (Internal) exceeded safe temperature threshold (64 C).
Apr 08, 2020 12:05:51 PM  Volume: Defragmentation complete for volume data.
Apr 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Apr 08, 2020 11:15:01 AM  Volume: Defragmentation started for volume data.
Apr 08, 2020 11:02:17 AM  System: ReadyNASOS background service started.
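
To put numbers on how closely the temperature alerts follow the defrag, the timestamps from the two excerpts can be subtracted. This is a small Python sketch using only the times shown above; the function name `seconds_between` and the dictionary labels are made up for illustration.

```python
from datetime import datetime

# Timestamp format used in the ReadyNAS log excerpts above.
FMT = "%b %d, %Y %I:%M:%S %p"


def seconds_between(start, end):
    """Elapsed seconds between two timestamps in the log's own format."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# (defrag started, defrag complete, first temperature alert), copied from the excerpts.
events = {
    "first excerpt": ("Feb 08, 2020 11:15:01 AM",
                      "Feb 08, 2020 12:30:14 PM",
                      "Feb 08, 2020 12:31:28 PM"),
    "April 8": ("Apr 08, 2020 11:15:01 AM",
                "Apr 08, 2020 12:05:51 PM",
                "Apr 08, 2020 12:07:05 PM"),
}

for name, (started, complete, alert) in events.items():
    print(name,
          "- defrag ran", seconds_between(started, complete), "s,",
          "first alert", seconds_between(complete, alert), "s after completion")
```

With these timestamps, the first alert comes 74 seconds after "Defragmentation complete" in both excerpts, even though the defrag itself ran for different lengths of time. Whether that identical gap is meaningful or a coincidence cannot be told from the logs alone.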

 

On March 8 the system was running firmware 6.10.2.

On April 8 the system was/is at firmware 6.10.3 (the update was done on March 18).

I would be happy to receive some advice on how to proceed.

Many thanks and regards,

Holger

27 Replies

  • StephenB
    Guru - Experienced User

    HolgerGT86 wrote:

     

    I would be happy to receive some advice how to proceed.

     


    It's possible that one disk is overheating (and raising the temp of the adjacent disk).  Have you looked at the temperatures of disk 1 and disk 4?

     

    I'd start by powering down the NAS, and then confirm that disks 2 and 3 are in fact hot, and also see how disks 1 and 4 compare.

     

    Maybe also connect the disks to a PC, and run the vendor diagnostics on each one. You can monitor the temps using any tool that supports gathering SMART statistics (Acronis Disk Monitor is one of several such tools). This would confirm that the disks are still working properly, as well as giving you some idea of the temperature rise during the test. Ideally use a USB/SATA adapter. A dock is OK too, but of course it would reduce the airflow around the disk. If one disk is spiking more than the others, then you might just need to replace that drive.

     

    WD's Lifeguard software should recognize the disk, and let you run the non-destructive long test.

     

    Another test you could try is to swap disks 1 and 2, and also swap 3 and 4.  Do this with the NAS powered down, and label the disks by their original slot number.  The system should be able to mount the volume normally (even in the different order).  Then look at the temperature pattern, and see if the hottest disks are still in the center slots, or if they are shifting to the outside.

     

     

     

    • HolgerGT86
      Guide

      Hello Stephen,

      many thanks for your fast response!

      I confirm that the disks were really warm, as the air coming out of the RN104 was really warm. When I powered the RN104 on again ... some minutes after the shutdown ... the inner disks were at about 60°C and the outer disks at about 50-54°C.

      Swapping the disks would be an option I can try. Connecting a PC may be an issue as I use Apple at home.

      What about the coexistence with the defrag processes? I'm using the RN104 as backup device only (about 99%), maybe I'm using it to share single files. Never had an issue with that.

      I'm running defrag, scrub, balance and disk test maintenance on a regular basis, but the issue has only occurred directly after defrag so far.

      Is it possible that, when the parallel defrag ends, the fan control process is halted too? By accident, I mean.

      Any way to find out why the defrag is started twice?

      Any way to find out if the fan control process is in trouble?

      • StephenB
        Guru - Experienced User

        HolgerGT86 wrote:

         

        What about the coexistence with the defrag processes? I'm using the RN104 as backup device only (about 99%), maybe I'm using it to share single files. Never had an issue with that.

         


        Normally the disks will get hotter under heavy use.  So if the defrag (or any other task) has a lot of work to do, I'd expect a temperature rise.  But the NAS fan control should be enough to prevent the actual overheating.

         

        It is conceivable that the fan got damaged when you cleaned it.  You should definitely be hearing the fan kick into high gear as the temperature goes up.  Maybe move the NAS to a new spot where that would be easier to hear?  You should also be able to feel the airflow, esp. when it ramps up.

         

        x86 NAS models (RN300 series and up) do have performance graphs that include SYS and CPU temps, but the RN100 and RN200 families do not.

         

        If you enable SSH, then smartctl might give you some temperature history:

        root@NAS:~# smartctl -x /dev/sda
        smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.190.x86_64.1] (local build)
        Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
        
        === START OF INFORMATION SECTION ===
        ...
        
        SCT Temperature History Version:     2
        Temperature Sampling Period:         1 minute
        Temperature Logging Interval:        1 minute
        Min/Max recommended Temperature:      0/65 Celsius
        Min/Max Temperature Limit:           -40/70 Celsius
        Temperature History Size (Index):    128 (17)
        
        Index    Estimated Time   Temperature Celsius
          18    2020-04-08 09:18    35  ****************
         ...    ..(126 skipped).    ..  ****************
          17    2020-04-08 11:25    35  ****************
        
        ...

        This might not be available with your disks though.

         

        You can also get the current temp:

        root@NAS:~# smartctl -a /dev/sda | grep Temperature_Celsius
        194 Temperature_Celsius     0x0002   203   203   000    Old_age   Always       -       32 (Min/Max 13/45)
        root@NAS:~# smartctl -a /dev/sdb | grep Temperature_Celsius
        194 Temperature_Celsius     0x0002   185   185   000    Old_age   Always       -       35 (Min/Max 14/48)
        root@NAS:~# smartctl -a /dev/sdc | grep Temperature_Celsius
        194 Temperature_Celsius     0x0022   119   105   000    Old_age   Always       -       33
        root@NAS:~# smartctl -a /dev/sdd | grep Temperature_Celsius
        194 Temperature_Celsius     0x0022   121   108   000    Old_age   Always       -       31
        root@NAS:~#

        It's easy to redo these commands w/o retyping them - use the up arrow on your keyboard to step back through the previously entered commands, and then press Enter to run the selected one again.
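
To keep a temperature record across the next scheduled defrag instead of re-running the commands by hand, the `smartctl -a` output can be parsed in a loop. This is a minimal Python sketch, not a tested ReadyNAS tool: it assumes Python 3.5 or later is available over SSH, that the script runs as root, and that the disks appear as /dev/sda to /dev/sdd as in the transcript above (the device names on the RN104 may differ). The function names are made up for this example.

```python
import subprocess
import time


def temperature_from_smartctl(output):
    """Return the raw value of SMART attribute 194 (Temperature_Celsius), or None."""
    for line in output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[0] == "194" and fields[1] == "Temperature_Celsius":
            return int(fields[9])
    return None


def read_temperature(device):
    """Run `smartctl -a` on one device (needs root) and return its temperature."""
    result = subprocess.run(["smartctl", "-a", device],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return temperature_from_smartctl(result.stdout)


def log_temperatures(devices, interval=60, samples=10):
    """Print one timestamped line of temperatures every `interval` seconds."""
    for _ in range(samples):
        temps = [read_temperature(d) for d in devices]
        print(time.strftime("%Y-%m-%d %H:%M:%S"), temps, flush=True)
        time.sleep(interval)


# Parsing check against a line from the transcript above.
sample = ("194 Temperature_Celsius     0x0002   203   203   000    "
          "Old_age   Always       -       32 (Min/Max 13/45)")
print(temperature_from_smartctl(sample))
```

On the NAS, a call such as `log_temperatures(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"], interval=60, samples=120)` with its output redirected to a file would cover roughly two hours. Note that this records disk temperatures only, not the fan speed.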

         

         

         

         

         

         
