
ATA error count increasing- Bad disk or bad chassis?

als-chups
Aspirant


Hello assembled community,

I have a ReadyNAS 214, purchased diskless in May 2020. I initially populated it with 3 x Toshiba N300 4TB discs and all was well. In November I purchased a 4th Toshiba N300 4TB to increase the size of the volume. All went well initially until, after 30 days, I started to get the 'increasing ATA error count on disc 4' email alert. I asked the vendor for a replacement disc and was sent another Toshiba 4TB. I put it in bay 4 and the volume re-synced, but I immediately started getting the increasing ATA errors on the new disc.

The error appears to be the same each time. On boot-up there are either 3 or 4 new ATA errors, but the count does not increase while the unit is on. It seems to happen on boot-up only. The data is all accessible, and the system works well.

I've opened a case with Netgear support, but I have to say they have been less than helpful. They suggested a factory reset as the first thing to do.

Is there a way to work out whether I have another bad disc, or whether there is a problem with the unit? I still have the first disc, which was put in in November. If I were to pull one of the other discs (1, 2 or 3), replace it with the 'faulty' 4th disc and let the volume re-sync, and there were no new ATA errors on this disc, would that confirm that the problem is with the chassis? Can I run a disc test on either of the 'faulty' discs, attached via an HDD docking station to a laptop, to see if there is a problem with either of them?

Any help appreciated before I have to return these two discs for replacement.

Message 1 of 9

Accepted solution: the reply from Sandshark (Message 9 of 9, below).

All Replies
rn_enthusiast
Virtuoso

Re: ATA error count increasing- Bad disk or bad chassis?

Hi @als-chups 

 

I agree that a factory reset does not seem likely to have any effect on this problem. ATA errors are one of those error types that can also be caused by a faulty bay/chassis. It could be the SATA connector in the chassis, or it could be the HDD itself, which seems unlikely at this point.

 

I think you should test the disk with a disk test tool. I am sure Toshiba has one available on their website. That will require that you can connect the HDD to your PC via a USB-to-SATA connector or similar.

 

If the disk is clean in that disk test, I would then lean toward a chassis issue. My money would be on a problem with the SATA connector in bay 4.

 

Download the logs from the web admin page. "System" > "Logs" > "Download Logs". Locate the dmesg.log file. Search for "ata4" in that log file. What lines do you see with regards to "ata4"?
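
If the log is long, a small script can do the search. This is only a sketch: the function name `port_lines` is made up for illustration, and the sample text below contains one placeholder line (`ata40`) that is not from any real log. It is there only to show that the pattern does not match other port numbers by accident.

```python
import re

def port_lines(log_text, port="ata4"):
    """Return the lines that mention the given ATA port (ata4: or ata4.00:)."""
    pattern = re.compile(rf"\b{re.escape(port)}(\.\d+)?:")
    return [line for line in log_text.splitlines() if pattern.search(line)]

# Sample input. The ata40 line is a hypothetical placeholder, not real log output.
sample = """\
[Mon Jan 11 20:33:35 2021] ata4.00: irq_stat 0x08000000, interface fatal error
[Mon Jan 11 20:33:35 2021] ata40: placeholder line for a different port
[Mon Jan 11 20:33:35 2021] ata4: hard resetting link
"""

matches = port_lines(sample)
for line in matches:
    print(line)
```

To use it on the real file, read `dmesg.log` from the extracted log bundle into a string and pass that string to `port_lines`.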

 

 

Cheers

Message 2 of 9
als-chups
Aspirant

Re: ATA error count increasing- Bad disk or bad chassis?

Many thanks for your helpful reply. I've had a look for a Toshiba HDD test utility and they don't seem to supply one, but I'm sure that there are others available which will do the same thing. I'll keep looking.

The dmesg.log file mentions ata4 in these lines:

 

[Mon Jan 11 20:33:35 2021] ata4.00: exception Emask 0x10 SAct 0x800 SErr 0x400101 action 0x6 frozen
[Mon Jan 11 20:33:35 2021] ata4.00: irq_stat 0x08000000, interface fatal error
[Mon Jan 11 20:33:35 2021] ata4: SError: { RecovData UnrecovData Handshk }
[Mon Jan 11 20:33:35 2021] ata4.00: failed command: WRITE FPDMA QUEUED
[Mon Jan 11 20:33:35 2021] ata4.00: cmd 61/08:58:40:20:00/00:00:00:00:00/40 tag 11 ncq 4096 out
res 40/00:58:40:20:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Mon Jan 11 20:33:35 2021] ata4.00: status: { DRDY }
[Mon Jan 11 20:33:35 2021] ata4: hard resetting link
[Mon Jan 11 20:33:35 2021] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Mon Jan 11 20:33:36 2021] ata4.00: configured for UDMA/100
[Mon Jan 11 20:33:36 2021] ata4: EH complete
[Mon Jan 11 20:33:36 2021] ata4.00: exception Emask 0x10 SAct 0x40000000 SErr 0x400101 action 0x6 frozen
[Mon Jan 11 20:33:36 2021] ata4.00: irq_stat 0x08000000, interface fatal error
[Mon Jan 11 20:33:36 2021] ata4: SError: { RecovData UnrecovData Handshk }
[Mon Jan 11 20:33:36 2021] ata4.00: failed command: WRITE FPDMA QUEUED
[Mon Jan 11 20:33:36 2021] ata4.00: cmd 61/08:f0:40:20:00/00:00:00:00:00/40 tag 30 ncq 4096 out
res 40/00:f0:40:20:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Mon Jan 11 20:33:36 2021] ata4.00: status: { DRDY }
[Mon Jan 11 20:33:36 2021] ata4: hard resetting link
[Mon Jan 11 20:33:36 2021] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Mon Jan 11 20:33:36 2021] ata4.00: configured for UDMA/100
[Mon Jan 11 20:33:36 2021] ata4: EH complete
[Mon Jan 11 20:33:36 2021] ata4.00: exception Emask 0x10 SAct 0x2 SErr 0x400101 action 0x6 frozen
[Mon Jan 11 20:33:36 2021] ata4.00: irq_stat 0x08000000, interface fatal error
[Mon Jan 11 20:33:36 2021] ata4: SError: { RecovData UnrecovData Handshk }
[Mon Jan 11 20:33:36 2021] ata4.00: failed command: WRITE FPDMA QUEUED
[Mon Jan 11 20:33:36 2021] ata4.00: cmd 61/08:08:40:20:00/00:00:00:00:00/40 tag 1 ncq 4096 out
res 40/00:08:40:20:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Mon Jan 11 20:33:36 2021] ata4.00: status: { DRDY }
[Mon Jan 11 20:33:36 2021] ata4: hard resetting link
[Mon Jan 11 20:33:37 2021] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[Mon Jan 11 20:33:37 2021] ata4.00: configured for UDMA/100
[Mon Jan 11 20:33:37 2021] ata4: EH complete
[Mon Jan 11 20:33:37 2021] ata4: limiting SATA link speed to 3.0 Gbps
[Mon Jan 11 20:33:37 2021] ata4.00: exception Emask 0x10 SAct 0x1000 SErr 0x400101 action 0x6 frozen
[Mon Jan 11 20:33:37 2021] ata4.00: irq_stat 0x08000000, interface fatal error
[Mon Jan 11 20:33:37 2021] ata4: SError: { RecovData UnrecovData Handshk }
[Mon Jan 11 20:33:37 2021] ata4.00: failed command: WRITE FPDMA QUEUED
[Mon Jan 11 20:33:37 2021] ata4.00: cmd 61/08:60:40:20:00/00:00:00:00:00/40 tag 12 ncq 4096 out
res 40/00:60:40:20:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
[Mon Jan 11 20:33:37 2021] ata4.00: status: { DRDY }
[Mon Jan 11 20:33:37 2021] ata4: hard resetting link
[Mon Jan 11 20:33:37 2021] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[Mon Jan 11 20:33:38 2021] ata4.00: configured for UDMA/100
[Mon Jan 11 20:33:38 2021] ata4: EH complete

 

Does this help in deciding if it's a problem with the SATA connector?
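
For anyone reading the excerpt above, here is a rough sketch of how the `SErr` value can be decoded and the events counted. The bit-to-name table is an assumption based on my recollection of the Linux libata driver and covers only some of the bits; the function names are made up. The one check available from this thread is that `0x400101` decodes to the same three names the kernel printed in the `SError:` line.

```python
import re

# Partial SError bit table (assumption: positions as used by Linux libata).
SERR_FLAGS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    16: "PHYRdyChg",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
}

def decode_serr(value):
    """Return the names of the known SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_FLAGS.items()) if value & (1 << bit)]

def summarize(log_text):
    """Count exceptions and link resets, and collect the SError flags seen."""
    serr_values = re.findall(
        r"exception Emask \S+ SAct \S+ SErr (0x[0-9a-fA-F]+)", log_text
    )
    flags = {f for v in serr_values for f in decode_serr(int(v, 16))}
    return {
        "exceptions": len(serr_values),
        "flags": sorted(flags),
        "hard_resets": log_text.count("hard resetting link"),
        "speed_limited": "limiting SATA link speed" in log_text,
    }

# Four lines copied from the excerpt above.
sample = """\
[Mon Jan 11 20:33:35 2021] ata4.00: exception Emask 0x10 SAct 0x800 SErr 0x400101 action 0x6 frozen
[Mon Jan 11 20:33:35 2021] ata4: SError: { RecovData UnrecovData Handshk }
[Mon Jan 11 20:33:35 2021] ata4: hard resetting link
[Mon Jan 11 20:33:37 2021] ata4: limiting SATA link speed to 3.0 Gbps
"""

summary = summarize(sample)
print(decode_serr(0x400101))
print(summary)
```

Run against the whole excerpt, this should report four exceptions and four hard resets, with the link speed limited to 3.0 Gbps after the third. These flags describe trouble on the SATA link itself, which fits a connector problem but does not by itself rule out the drive's own interface.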

Message 3 of 9
StephenB
Guru

Re: ATA error count increasing- Bad disk or bad chassis?

Unfortunately, all the software logs can tell you is that there was an ATA error.  There's no way to tell if the problem is in the chassis, or the drive. 

 

One thing you could try is to power down and remove all the drives (labeling them by slot). Then do a fresh factory install on the original disk (putting it in slot 1). After that completes, see if you get any ATA errors when you copy data. If you do, then the drive is likely the problem. If you don't, then move the drive to slot 4, and try again.

 

If you do see ATA errors only in slot 4, then you have a good case that it is the chassis.  But if you don't see them, then the test is inconclusive.

Message 4 of 9
rn_enthusiast
Virtuoso

Re: ATA error count increasing- Bad disk or bad chassis?

There are definitely issues communicating with this disk, as seen in the dmesg logs (the failed commands are writes that end in an ATA bus error). It could be the chassis or it could be the disk, but two new disks with the same issue... that just seems unlikely to me. Trying what StephenB suggested would help narrow it down, yes.

 

Message 5 of 9
StephenB
Guru

Re: ATA error count increasing- Bad disk or bad chassis?


@rn_enthusiast wrote:

but two new disks with the same issue... Just seems unlikely to me. 

 


True.  One complication here - when I've had one failing disk, I've sometimes seen ATA errors show up on other disks in the array as a side effect.

 

I'm also wondering about power - if the power adapter is failing (or the power circuitry in the NAS is misbehaving), then it's conceivable that the fourth drive isn't getting enough power.  

Message 6 of 9
rn_enthusiast
Virtuoso

Re: ATA error count increasing- Bad disk or bad chassis?

@StephenB wrote:

I'm also wondering about power - if the power adapter is failing (or the power circuitry in the NAS is misbehaving), then it's conceivable that the fourth drive isn't getting enough power.  


Yes, this is a good point.

Message 7 of 9
als-chups
Aspirant

Re: ATA error count increasing- Bad disk or bad chassis?

Thanks both for your help. All sensible advice.

So, on your recommendation I:

1. Performed a full back-up to an external USB device. Current ATA count 32 on drive 4.

2. Downloaded the configuration files so that I can get it back to its original state.

3. Brought it into the house (it lives in an IT wall cabinet in an outhouse/office built 15 years ago, with power professionally installed, but I brought it in just in case the power to the building is a bit suspect).

4. Re-booted in the house. ATA count went to 36 on re-boot.

5. Powered off and changed the power supply (I have another RND214 with 4 x WD drives in it, so used its power supply instead). On re-boot the ATA error count went to 40.

6. Powered off and pulled the 4 HDDs. All numbered as they came out.

7. Put the 'old' HDD into slot 1 and did a factory reset. No increase in ATA errors (it still reads 49, which is what it was at when I pulled it and got the replacement).

8. Transferred various files ranging in size from 50 MB to 14 GB. No change in ATA error count.

9. Did a restart between each file transfer. No increase in ATA count on any restart.

10. Powered off, pulled the drive from slot 1 and put it in slot 4. Re-booted. ATA error count went to 52. Recently moved files still present on the drive and shown as 'healthy'.

11. Two further re-boots. On boot the ATA error count increased to 55 and then 58. Between re-boots, transferred large amounts of data with no increase in ATA errors.

 

So, my conclusion is that it is a problem with slot 4 on the chassis, unless you guys can think of an alternative reason. The power supply seems OK.
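
The reasoning behind that conclusion can be laid out as a small tally of the counts from the steps above, grouped by slot. This is just a sketch. The tuple layout and the function name `increases_by_slot` are made up for illustration, and the several restarts in slot 1 (steps 7 to 9) are collapsed into one row because the count stayed at 49 throughout.

```python
from collections import defaultdict

# (disk, slot, count before re-boot, count after re-boot), taken from the steps above.
observations = [
    ("replacement", 4, 32, 36),  # step 4: re-boot in the house
    ("replacement", 4, 36, 40),  # step 5: re-boot with the other power supply
    ("original", 1, 49, 49),     # steps 7-9: factory reset, transfers, restarts
    ("original", 4, 49, 52),     # step 10: moved to slot 4
    ("original", 4, 52, 55),     # step 11: first further re-boot
    ("original", 4, 55, 58),     # step 11: second further re-boot
]

def increases_by_slot(obs):
    """Group the per-boot increase in ATA error count by slot number."""
    result = defaultdict(list)
    for _disk, slot, before, after in obs:
        result[slot].append(after - before)
    return dict(result)

by_slot = increases_by_slot(observations)
for slot in sorted(by_slot):
    print(f"slot {slot}: increases per boot = {by_slot[slot]}")
```

Both disks gain 3 or 4 errors per boot in slot 4, and the one disk tested in slot 1 gains none, with either power supply.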

I have a couple of questions:

Does it matter? It looks like the ATA errors increase only on start-up, are resolved within about 5 seconds, and then do not increase further while the unit is up. Performance is fine, and it does not look as if the disk involved is failing. The increase is either 3 or 4 errors on each boot. If I monitor this for a sudden change, would that give me warning that the disk may be the problem?
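
By "monitor for a sudden change" I mean something like the check below. It is only a sketch with made-up names (`classify_boot`, `usual_max`), and the threshold of 4 is simply the largest per-boot increase seen so far. The counts would still have to be read by hand from the admin page or the email alerts.

```python
def classify_boot(previous_count, current_count, usual_max=4):
    """Classify the change in ATA error count across one re-boot.

    usual_max is an assumption: the largest per-boot increase observed so far.
    """
    delta = current_count - previous_count
    if delta < 0:
        raise ValueError("error count should never decrease")
    if delta == 0:
        return "no change"
    if delta <= usual_max:
        return "usual"
    return "unusual"

# 55 -> 58 is from the tests above; 58 -> 70 is a hypothetical bad boot.
print(classify_boot(55, 58))
print(classify_boot(58, 70))
```

A check like this only flags a change in the pattern. It cannot tell a failing disk from a connector that is getting worse.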

The unit is still under hardware warranty, but any interaction with Netgear Support makes me lose the will to live. I could simply return it to the original vendor for a replacement.

How do I re-establish my old volume and data with the other 4 drives I've pulled?

My plan is:

1. Power down. Remove the drive from slot 4.

2. Boot up with no drives. Factory reset the device.

3. Re-install the configuration files downloaded before the first factory reset. Power down.

4. Re-install the 4 drives in their correct order.

5. Re-boot.

Will this work, or is there a chance it will see the 4 drives as new and re-format them?

 

I'm grateful to both for your time in helping me with this.

 

Message 8 of 9
Sandshark
Sensei

Re: ATA error count increasing- Bad disk or bad chassis?


@als-chups wrote:

 

 

So, my conclusion is that it is a problem with slot 4 on the chassis, unless you guys can think of an alternative reason. The power supply seems OK.

I have a couple of questions:

Does it matter? It looks like the ATA errors increase only on start-up, are resolved within about 5 seconds, and then do not increase further while the unit is up. Performance is fine, and it does not look as if the disk involved is failing. The increase is either 3 or 4 errors on each boot. If I monitor this for a sudden change, would that give me warning that the disk may be the problem?

The unit is still under hardware warranty, but any interaction with Netgear Support makes me lose the will to live. I could simply return it to the original vendor for a replacement.

How do I re-establish my old volume and data with the other 4 drives I've pulled?

My plan is:

1. Power down. Remove the drive from slot 4.

2. Boot up with no drives. Factory reset the device.

3. Re-install the configuration files downloaded before the first factory reset. Power down.

4. Re-install the 4 drives in their correct order.

5. Re-boot.

Will this work, or is there a chance it will see the 4 drives as new and re-format them?

I agree with your conclusion that it's the chassis.  You should check and make sure there isn't just something in the SATA connector that can be blown or picked out before you give up on it completely.

 

Does it matter? Well, it will likely get worse and will eventually matter. I would do a warranty replacement now, before it gets worse and possibly corrupts the data volume, or your warranty expires.

 

A factory default without drives does nothing.  A factory default re-initializes the drives from the flash; so no drives, no reset.  And your problem is not with the firmware, anyway, so no need to do a reset at all.

Message 9 of 9