ATA errors on one disk, failed other disk
2015-07-02
08:27 AM
On 6/1/15 I got the following email:
Detected increasing ATA errors on disk 4[ST32000542AS, 5XW1M471] 40 times in the past 30 days. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.
On 10/1/15 I got:
Detected increasing ATA errors on disk 4[ST32000542AS, 5XW1M471] 171 times in the past 30 days. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.
On 1/2/15 I got a differently formatted message:
ATA error count has increased in the last day.
Disk 4:
Previous count: 5
Current count: 170
Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.
On 27/6/15 I got another:
Detected increasing ATA errors on disk 4[ST32000542AS, 5XW1M471] 182 times in the past 30 days. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.
This is all the same disk.
The SMART data for the disk suggests:
Model: ST32000542AS
Serial: 5XW1M471
Firmware: CC34
SMART Attribute            Value
Spin Up Time               0
Start Stop Count           91
Reallocated Sector Count   0
Power On Hours             34760
Spin Retry Count           0
Power Cycle Count          91
Runtime Bad Block          0
End-to-End Error           0
Reported Uncorrect         198
Command Timeout            0
High Fly Writes            37
Airflow Temperature Cel    41
Temperature Celsius        41
Current Pending Sector     0
Offline Uncorrectable      0
UDMA CRC Error Count       0
Head Flying Hours          118626996750541
Total LBAs Written         4246113807
Total LBAs Read            3075062916
ATA Error Count            181
1) I don't understand the numbers (the ATA errors don't add up)
2) Why the differing message formats?
3) Does this level and type of error matter?
I also saw in the log from 21/6/15 (but received no email):
Reallocated sector count has increased in the last day.
Disk 2:
Previous count: 0
Current count: 233
ATA error count has increased in the last day.
Disk 2:
Previous count: 0
Current count: 4
Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.
And today received:
Disk failure detected.
For Disk 2.
A replacement for Disk 2 (2TB WD Red) will arrive in the morning to replace it.
I have 2 USB disks connected which backup on alternate days from a daily snapshot. A successful backup was carried out on Wednesday and one is currently in progress for today (they take an age, even for incremental backups, I'm glad that there is only about 1.5TB of data). I've turned the backup job off for the disk that completed successfully on Wednesday and will only enable it when the array has been rebuilt.
What is the chance of Disk 4 failing during the rebuild? Should it be replaced once the array has been rebuilt?
Message 1 of 13
2015-07-02
08:34 AM
Re: ATA errors on one disk, failed other disk
PS:
The backup jobs start at 00:05
They finish at around 11:30 or 15:00 (depending on which device). Both are EXT3 formatted. There is a full backup (with remove) every 4 weeks; these times are just for the regular incremental backup (which on many days is 0 bytes). A full backup seems to run until 17:00 or 21:00.
Is this normal? I'm wondering what will happen when I move to bigger disks and use more storage - will backup be a problem? Will it take more than a day to back up the array?
ReadyNAS NVX with latest firmware.
Message 2 of 13
2015-07-02
08:57 AM
Re: ATA errors on one disk, failed other disk
Regarding disk 4, you should have replaced this in January.
Regarding disk 2, this also needs to be replaced, as you are planning.
Make sure your backup is right up to date and verified before you put in the first replacement disk; the resync may well finish off the other bad disk.
Regarding your backups, what protocol are you using, and what are you backing up to: another NAS or a USB drive?
It is definitely taking too long for an incremental backup, unless you have a lot of data added each day.
Message 3 of 13
2015-07-02
10:15 AM
Re: ATA errors on one disk, failed other disk
I appear to be in the wrong forum - can a mod move this to general questions please?
What about these ATA errors? They appear not to be SMART errors at all. How many are significant? Why does the total number not tally up correctly?
Why didn't I get an email about the other errors on disk 2 which were SMART errors and did indeed turn out to be significant?
Message 4 of 13
2015-07-02
10:20 AM
Re: ATA errors on one disk, failed other disk
vandermerwe wrote: Regarding disk 4, you should have replaced this in January.
Regarding disk 2, this also needs to be replaced, as you are planning.
Make sure your backup is right up to date and verified before you put in the first replacement disk, the resync may well finish off the other bad disk.
Regarding your backups, what protocol are you using, and what are you backing up to, another nas or a USB drive?
It is definitely taking too long for incremental, unless you have a lot of data added each day.
USB drive (by port). Formatted EXT3.
The NAS in question belongs to a computer-illiterate friend, who lives 300 miles away but happens to be staying here (going home tomorrow). I help him with it.
My own NVX isn't much faster (6 h for 700 GB) when backing up to its USB drive, also formatted EXT3.
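For a rough sense of scale, the 6 h for 700 GB figure can be turned into a throughput and projected onto other sizes. This is only a back-of-envelope sketch: it assumes decimal units (1 GB = 1000 MB), a full copy at a constant rate, and the function names are made up for illustration.

```python
def throughput_mb_per_s(gigabytes: float, hours: float) -> float:
    """Average transfer rate in MB/s (decimal units) for a completed job."""
    return gigabytes * 1000 / (hours * 3600)


def hours_to_copy(gigabytes: float, mb_per_s: float) -> float:
    """Hours needed to copy `gigabytes` at a constant `mb_per_s`."""
    return gigabytes * 1000 / mb_per_s / 3600


rate = throughput_mb_per_s(700, 6)  # the 6 h / 700 GB figure from this post
print(f"observed rate: {rate:.1f} MB/s")

# 1500 GB is roughly the data on the friend's NAS; 4000 GB is a hypothetical future size.
for size_gb in (1500, 4000):
    print(f"{size_gb} GB full copy: {hours_to_copy(size_gb, rate):.1f} h")
```

At that rate a full copy of anything much over about 2.8 TB would no longer fit inside 24 hours, which is the case the question about bigger disks is worried about.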
Message 5 of 13
2015-07-02
11:05 AM
Re: ATA errors on one disk, failed other disk
I moved it (though I thought boot, installation, upgrade, expansion was the right place).
dmahon wrote: I appear to be in the wrong forum - can a mod move this to general questions please?
Message 6 of 13
2015-07-02
01:42 PM
Re: ATA errors on one disk, failed other disk
I don't know why the ATA errors don't seem to match the alerts, but ATA errors almost always indicate a failing disk. If ATA errors appear on more than one drive in the same slot, there could be a problem with the drive bay. That's not the case here though.
Even 1 ATA error is significant. AFAIK, drives will be replaced if they have 1 ATA error. The rate of increase is important: some may consider leaving a drive with ATA errors in place if the errors are not increasing or are increasing very slowly. It depends on your attitude to risk, how important the data is, and how robust and convenient your backups are.
I'm also not sure about the email alerts. You could look at your logs to see whether any alerts were sent but not received, or whether there was a problem sending.
The backup issue could have many causes. I would start by looking at the backup configuration to make sure that's right, testing the USB disk, and looking at what's being backed up. You say there is no data on some days and the job still takes that long. Look at the backup job logs. The difference in time between your full and incremental jobs is not large, suggesting that the incremental jobs are backing up a large amount of data. What backup protocol are you using?
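To put a number on "rate of increase", the figures from the alerts in the first post can be lined up by date. This is only a sketch: the dates are read as day/month/year, and it assumes, purely for illustration, that every alert reports a reading of one and the same counter, which the alerts themselves do not confirm. The `rates` helper is a made-up name.

```python
from datetime import date

# (alert date, reported number) taken from the first post in this thread.
# Assumption: all four numbers are readings of a single counter.
readings = [
    (date(2015, 1, 6), 40),
    (date(2015, 1, 10), 171),
    (date(2015, 2, 1), 170),
    (date(2015, 6, 27), 182),
]


def rates(samples):
    """Return (start, end, change, change per day) for consecutive readings."""
    out = []
    for (d0, c0), (d1, c1) in zip(samples, samples[1:]):
        days = (d1 - d0).days
        out.append((d0, d1, c1 - c0, (c1 - c0) / days))
    return out


for d0, d1, delta, per_day in rates(readings):
    # A drop is inconsistent with a purely cumulative counter.
    note = "  <- decrease" if delta < 0 else ""
    print(f"{d0} -> {d1}: {delta:+d} ({per_day:+.2f}/day){note}")
```

Read this way, nearly all of the growth falls in early January and the count has crept up only slowly since, while the one decrease suggests the two alert formats may not be reporting the same quantity.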
Message 7 of 13
2015-07-03
01:55 PM
Re: ATA errors on one disk, failed other disk
New disk in. Nothing. Rebooted - array rebuilding.
Another email too:
ATA error count has increased in the last day.
Disk 4:
Previous count: 170
Current count: 181
Growing SMART errors indicate a disk that may fail soon. If the errors continue to increase, you should be prepared to replace the disk.
I don't believe the numbers at all. They go up and down and don't add up. Should I believe there are actually any ATA errors? It was 181 several days ago, so how can it have gone from 170 to 181 in the last day?
Obviously, I will change the disk. But should I change the NAS (to a different brand)?
Message 8 of 13
2015-07-03
02:14 PM
Re: ATA errors on one disk, failed other disk
You should run a thorough disk test using vendor tools.
Is the disk under warranty?
If it is not under warranty and it tests OK, then really it's up to you what you do about it. I would have replaced it, as I said earlier.
Message 9 of 13
2015-07-03
02:21 PM
Re: ATA errors on one disk, failed other disk
The disk is being replaced once the array has rebuilt.
But should I have faith in the NAS with these error messages that don't make sense? I'm also upset there was no email about the sector reallocations, which are something I would have taken notice of as these have led to failures before.
Message 10 of 13
2015-07-03
03:59 PM
Re: ATA errors on one disk, failed other disk
Have you checked the logs to see if there were any messages sent that failed? You will need to SSH in to see these logs.
I think the email log is at var/log/frontview/msmtp.log
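If that log is there, something like the following could pick out the sends that failed. It is a sketch under assumptions: it assumes each msmtp log line carries an `exitcode=` field with `EX_OK` on success, the sample lines are invented for illustration, and `failed_sends` is a made-up helper name.

```python
import re
from typing import Iterable, List

# Assumed msmtp log convention: each entry ends with exitcode=EX_...
EXITCODE = re.compile(r"exitcode=(\w+)")


def failed_sends(lines: Iterable[str]) -> List[str]:
    """Return log lines whose exitcode field is present and not EX_OK."""
    failures = []
    for line in lines:
        match = EXITCODE.search(line)
        if match and match.group(1) != "EX_OK":
            failures.append(line.rstrip("\n"))
    return failures


# Invented sample lines in the assumed format, not taken from a real NAS.
sample = [
    "Jun 21 00:10:01 host=smtp.example.com recipients=a@example.com exitcode=EX_OK",
    "Jun 21 00:10:07 host=smtp.example.com errormsg='timeout' exitcode=EX_TEMPFAIL",
]
for line in failed_sends(sample):
    print(line)
```

For a real check, copy the log off the NAS and pass the open file to `failed_sends` in place of `sample`. Any entry dated around 21/6 would show whether the reallocated-sector alert was generated but never delivered.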
Message 11 of 13
2015-07-12
06:48 PM
Re: ATA errors on one disk, failed other disk
vandermerwe wrote: The backup issue could have many causes. I would start by looking at the backup configuration to make sure that's right, testing the USB disk, and looking at what's being backed up. You say no data on some days and the job still takes so long. Look at the backup job logs. The difference in time between your full and incremental jobs is not large, suggesting that the incremental jobs are backing up a large amount of data. What backup protocol are you using?
An update, in case anyone searches in future and finds this:
I have just installed a new hard drive in my PC. I now still do a backup job from the NAS to the USB drive but also do another one to the new PC drive over the network.
The backup to PC is finished in under 10 minutes. It would appear to be a piss poor USB implementation on the NAS that is the trouble (this is replicated on two NAS and on four USB drives [EXT3 formatted] from different manufacturers).
The RAID array in question successfully completed the rebuild twice, with the two replacement 2TB disks installed separately, one after the other. Each rebuild took about 12h.
Message 12 of 13
2015-07-12
07:44 PM
Re: ATA errors on one disk, failed other disk
USB is resource-intensive, which is a significant factor in the time USB backups take to run.
Message 13 of 13