
ATA errors on one disk, failed other disk

dmahon1
Aspirant

ATA errors on one disk, failed other disk

On 6/1/15 I got the following email:

Detected increasing ATA errors on disk 4[ST32000542AS, 5XW1M471] 40 times in the past 30 days. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.


On 10/1/15 I got:

Detected increasing ATA errors on disk 4[ST32000542AS, 5XW1M471] 171 times in the past 30 days. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.


On 1/2/15 I got a differently formatted message:

ATA error count has increased in the last day.

Disk 4:
Previous count: 5
Current count: 170

Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.


On 27/6/15 I got another:


Detected increasing ATA errors on disk 4[ST32000542AS, 5XW1M471] 182 times in the past 30 days. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.


This is all the same disk.

The SMART data for the disk suggests:


Model: ST32000542AS
Serial: 5XW1M471
Firmware: CC34


SMART Attribute

Spin Up Time 0
Start Stop Count 91
Reallocated Sector Count 0
Power On Hours 34760
Spin Retry Count 0
Power Cycle Count 91
Runtime Bad Block 0
End-to-End Error 0
Reported Uncorrect 198
Command Timeout 0
High Fly Writes 37
Airflow Temperature Cel 41
Temperature Celsius 41
Current Pending Sector 0
Offline Uncorrectable 0
UDMA CRC Error Count 0
Head Flying Hours 118626996750541
Total LBAs Written 4246113807
Total LBAs Read 3075062916

ATA Error Count 181


1) I don't understand the numbers (the ATA errors don't add up)
2) Why the differing message formats?
3) Does this level and type of error matter?

I also saw in the log from 21/6/15 (but received no email):


Reallocated sector count has increased in the last day.

Disk 2:
Previous count: 0
Current count: 233

ATA error count has increased in the last day.

Disk 2:
Previous count: 0
Current count: 4

Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.
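
For anyone who wants to track these daily alerts over time rather than read them one by one, here is a minimal Python sketch that pulls the disk number and the previous/current counts out of the message text. The pattern is based only on the wording quoted in this thread; other firmware versions may word the alert differently, and the names `ALERT` and `parse_alerts` are made up for this example.

```python
import re

# Pattern for the "count has increased" alerts as quoted in this thread.
# Assumption: the wording is stable; this is not an official log format.
ALERT = re.compile(
    r"(?P<metric>[A-Za-z ]+?) count has increased in the last day\.\s*"
    r"Disk (?P<disk>\d+):\s*"
    r"Previous count: (?P<previous>\d+)\s*"
    r"Current count: (?P<current>\d+)"
)


def parse_alerts(text):
    """Return one dict per alert found in `text`, including the day's increase."""
    alerts = []
    for m in ALERT.finditer(text):
        previous, current = int(m["previous"]), int(m["current"])
        alerts.append({
            "metric": m["metric"].strip(),
            "disk": int(m["disk"]),
            "previous": previous,
            "current": current,
            "increase": current - previous,
        })
    return alerts


# The log entry from 21/6/15, as it appears in the log (all on one line).
log_entry = (
    "Reallocated sector count has increased in the last day. "
    "Disk 2: Previous count: 0 Current count: 233 "
    "ATA error count has increased in the last day. "
    "Disk 2: Previous count: 0 Current count: 4 "
    "Growing SMART errors indicate a disk that may fail soon."
)

for alert in parse_alerts(log_entry):
    print(alert)
```

Because the whitespace between fields is matched with `\s*`, the same function should handle both the one-line log form and the multi-line email form.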


And today received:

Disk failure detected.


For Disk 2.

A replacement for Disk 2 (2TB WD Red) will arrive in the morning to replace it.

I have 2 USB disks connected that back up on alternate days from a daily snapshot. A successful backup was carried out on Wednesday and one is currently in progress for today (they take an age, even for incremental backups; I'm glad that there is only about 1.5TB of data). I've turned off the backup job for the disk that completed successfully on Wednesday and will only re-enable it once the array has been rebuilt.

What is the chance of Disk 4 failing during the rebuild? Should it be replaced once the array has been rebuilt?
Message 1 of 13
dmahon1
Aspirant

Re: ATA errors on one disk, failed other disk

PS:

The backup jobs start at 00:05

They finish at around 11:30 or 15:00 (depending on which device). Both are EXT3 formatted. There is a full backup (with remove) every 4 weeks; these times are just for the regular incremental backup (which on many days is 0 bytes). A full backup seems to run until 17:00 or 21:00.

Is this normal? I'm wondering what will happen when I move to bigger disks and use more storage - will backup be a problem? Will it take more than a day to back up the array?

ReadyNAS NVX with latest firmware.
Message 2 of 13
vandermerwe
Master

Re: ATA errors on one disk, failed other disk

Regarding disk 4, you should have replaced this in January.
Regarding disk 2, this also needs to be replaced, as you are planning.

Make sure your backup is right up to date and verified before you put in the first replacement disk; the resync may well finish off the other bad disk.

Regarding your backups, what protocol are you using, and what are you backing up to, another nas or a USB drive?

It is definitely taking too long for incremental, unless you have a lot of data added each day.
Message 3 of 13
dmahon1
Aspirant

Re: ATA errors on one disk, failed other disk

I appear to be in the wrong forum - can a mod move this to general questions please?

What about these ATA errors? They appear not to be SMART errors at all. How many are significant? Why does the total number not tally up correctly?

Why didn't I get an email about the other errors on disk 2 which were SMART errors and did indeed turn out to be significant?
Message 4 of 13
dmahon1
Aspirant

Re: ATA errors on one disk, failed other disk

vandermerwe wrote:
Regarding disk 4, you should have replaced this in January.
Regarding disk 2, this also needs to be replaced, as you are planning.

Make sure your backup is right up to date and verified before you put in the first replacement disk; the resync may well finish off the other bad disk.

Regarding your backups, what protocol are you using, and what are you backing up to, another nas or a USB drive?

It is definitely taking too long for incremental, unless you have a lot of data added each day.


USB drive (by port). Formatted EXT3.

The NAS in question belongs to a computer illiterate friend, who lives 300 miles away but happens to be staying here (going home tomorrow). I help him with it.

My NVX isn't much faster (6h for 700 GB) to its USB drive, which is also formatted EXT3.
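
To put the "6h for 700 GB" figure in context, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1 GB = 1000 MB) and a steady transfer rate, neither of which is stated in the thread, and the function names are made up for this example. The 1500 GB figure is the "about 1.5TB of data" mentioned for the other NAS in the first post.

```python
def throughput_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average transfer rate in MB/s, assuming 1 GB = 1000 MB."""
    return gigabytes * 1000 / (hours * 3600)


def hours_for(gigabytes: float, rate_mb_per_s: float) -> float:
    """Hours needed to copy `gigabytes` at a steady `rate_mb_per_s`."""
    return gigabytes * 1000 / rate_mb_per_s / 3600


rate = throughput_mb_per_s(700, 6)  # 700 GB in 6 h, as quoted above
print(f"average rate: {rate:.1f} MB/s")
print(f"full copy of 1500 GB at that rate: {hours_for(1500, rate):.1f} h")
print(f"data copied in 24 h at that rate: {rate * 86400 / 1000:.0f} GB")
```

At that rate a full copy only exceeds a day once the data set passes roughly 2.8 TB, so the earlier question about bigger disks depends mostly on how much of the data the job actually copies on each run.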
Message 5 of 13
StephenB
Guru

Re: ATA errors on one disk, failed other disk

dmahon wrote:
I appear to be in the wrong forum - can a mod move this to general questions please?
I moved it (though I thought boot, installation, upgrade, expansion was the right place).
Message 6 of 13
vandermerwe
Master

Re: ATA errors on one disk, failed other disk

I don't know why the ATA error counts don't seem to match the alerts, but ATA errors almost always indicate a failing disk. If ATA errors show up on more than one drive used in the same slot, the problem could be the drive bay rather than the disk. That's not the case here though.
A single ATA error is significant; AFAIK, drives will be replaced if they have 1 ATA error. The rate of increase is important: some may consider leaving a drive with ATA errors in place if the count is not increasing, or is increasing very slowly. It depends on your attitude to risk, how important the data is, and how robust and convenient your backups are.
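
As a rough illustration of "rate of increase", the Python sketch below computes errors per day between consecutive dated readings. It uses the three figures from the emails quoted in the first post, reading the dates as DD/MM/YY. Treating those figures as comparable running totals is an assumption: the emails describe them as counts "in the past 30 days", and the thread never establishes how the NAS derives them. The function name `daily_rates` is made up for this example.

```python
from datetime import date


def daily_rates(samples):
    """Return (start, end, errors_per_day) for consecutive (date, count) samples."""
    rates = []
    for (d0, c0), (d1, c1) in zip(samples, samples[1:]):
        days = (d1 - d0).days
        if days <= 0:
            raise ValueError("samples must be in strictly increasing date order")
        rates.append((d0, d1, (c1 - c0) / days))
    return rates


# Figures from the alert emails for disk 4 (assumed to be running totals).
samples = [
    (date(2015, 1, 6), 40),
    (date(2015, 1, 10), 171),
    (date(2015, 6, 27), 182),
]

for start, end, rate in daily_rates(samples):
    print(f"{start} to {end}: {rate:.2f} errors/day")
```

Read this way, the count jumped by about 33 per day over four days in January and then grew by well under one per day until late June, which is the kind of pattern the risk judgment above turns on.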

Also not sure about the email alerts. You could look at your logs to see if there were alerts sent which were either not received or there was a problem sending.

The backup issue could have many causes. I would start by looking at the backup configuration to make sure that's right, testing the USB disk, and looking at what's being backed up. You say there is no data on some days and the job still takes that long. Look at the backup job logs. The difference in time between your full and incremental jobs is not large, suggesting that the incremental jobs are backing up a large amount of data. What backup protocol are you using?
Message 7 of 13
dmahon1
Aspirant

Re: ATA errors on one disk, failed other disk

New disk in. Nothing. Rebooted - array rebuilding.

Another email too:

ATA error count has increased in the last day.

Disk 4:
Previous count: 170
Current count: 181

Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.


I don't believe the numbers at all. Up and down, they don't add up. Should I believe there are actually any ATA errors? It was 181 several days ago, so how can it have gone from 170 to 181 in the last day?

Obviously, I will change the disk. But should I change the NAS (to a different brand)?
Message 8 of 13
vandermerwe
Master

Re: ATA errors on one disk, failed other disk

You should run a thorough disk test using vendor tools.
Is the disk under warranty?

If it is not under warranty and it tests OK, then really it's up to you what you do about it. I would have replaced it, as I said earlier.
Message 9 of 13
dmahon1
Aspirant

Re: ATA errors on one disk, failed other disk

The disk is being replaced once the array has rebuilt.

But should I have faith in the NAS with these error messages that don't make sense? I'm also upset there was no email about the sector reallocations, which are something I would have taken notice of as these have led to failures before.
Message 10 of 13
vandermerwe
Master

Re: ATA errors on one disk, failed other disk

Have you checked the logs to see if there were any messages sent that failed? You will need to ssh in to see these logs.

I think the email log is at var/log/frontview/msmtp.log
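
Once the log has been copied off the NAS, a small Python sketch like the one below can pick out failed sends. It assumes the file uses the standard msmtp log layout, where each line ends with an `exitcode=` field and successful sends are marked `exitcode=EX_OK`; whether the ReadyNAS log follows that layout has not been verified here. The sample lines are hypothetical, made up for illustration, and `failed_sends` is a name invented for this example.

```python
import re

# Standard msmtp log lines carry an exitcode=... field (assumed to apply here).
EXITCODE = re.compile(r"\bexitcode=(\S+)")


def failed_sends(lines):
    """Return log lines whose exitcode field is present and not EX_OK."""
    failures = []
    for line in lines:
        match = EXITCODE.search(line)
        if match and match.group(1) != "EX_OK":
            failures.append(line.rstrip("\n"))
    return failures


# Hypothetical sample lines, not taken from a real NAS.
sample = [
    "Jun 21 00:10:01 host=smtp.example.com from=nas@example.com "
    "recipients=me@example.com exitcode=EX_OK",
    "Jun 21 00:10:05 host=smtp.example.com from=nas@example.com "
    "recipients=me@example.com errormsg='connection timed out' "
    "exitcode=EX_TEMPFAIL",
]

for line in failed_sends(sample):
    print(line)
```

To run it against a real file, pass an open file object in place of `sample`; the function only iterates over lines.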
Message 11 of 13
dmahon1
Aspirant

Re: ATA errors on one disk, failed other disk

vandermerwe wrote:
The backup issue could have many causes. I would start by looking at the backup configuration to make sure that's right, testing the USB disk, and looking at what's being backed up. You say there is no data on some days and the job still takes that long. Look at the backup job logs. The difference in time between your full and incremental jobs is not large, suggesting that the incremental jobs are backing up a large amount of data. What backup protocol are you using?


An update, in case anyone searches in future and finds this:

I have just installed a new hard drive in my PC. I now still do a backup job from the NAS to the USB drive but also do another one to the new PC drive over the network.

The backup to PC is finished in under 10 minutes. It would appear to be a piss poor USB implementation on the NAS that is the trouble (this is replicated on two NAS and on four USB drives [EXT3 formatted] from different manufacturers).

The RAID array in question successfully completed the rebuild twice, with the two replacement 2TB disks installed one at a time. Each rebuild took about 12h.
Message 12 of 13
mdgm-ntgr
NETGEAR Employee Retired

Re: ATA errors on one disk, failed other disk

USB is resource intensive which is a significant factor in the time USB backups take to run.
Message 13 of 13