
SMART errors - What do you do?

sdchew
Aspirant

SMART errors - What do you do?

I got a message from my ReadyNAS informing me that one of the disks is experiencing a SMART event, which may point to an impending drive failure. I looked at the SMART table, and it's basically the 'Current Pending Sector', 'Offline Uncorrectable' and 'Multi Zone Error Rate' counts that are rising.

I run a 5 disk dual redundancy array. I would like to seek some consensus on what can be done to push out replacing the drive if possible.

Should I:

a) don't mess around and replace the drive
b) pull the drive, low level format it and pop it back.
c) increase the disk scrubbing and volume consistency check frequency to safeguard against data corruption and ignore it until it fails

Any advice would be appreciated. Thank you
Message 1 of 11
sdchew
Aspirant

Re: SMART errors - What do you do?

SMART table, just in case someone is interested:

SMART Attribute             Value
Raw Read Error Rate         0
Spin Up Time                6233
Start Stop Count            210
Reallocated Sector Count    0
Seek Error Rate             0
Power On Hours              11376
Spin Retry Count            0
Calibration Retry Count     0
Power Cycle Count           208
Power-Off Retract Count     15
Load Cycle Count            507035
Temperature Celsius         40
Reallocated Event Count     0
Current Pending Sector      17
Offline Uncorrectable       17
UDMA CRC Error Count        0
Multi Zone Error Rate       27
ATA Error Count             0

By the way, if I choose option B and the array rebuild fails, the data should still be intact thanks to the dual redundancy, right?
Message 2 of 11
ihartley
Tutor

Re: SMART errors - What do you do?

Well, the SMART data could be wrong, but the "Load Cycle Count" seems very, very high. I can't say more without knowing what drive it is, but I'd check the manufacturer's specs. My guess is, if the number is correct, that the disk is parking itself after a short time-out and then un-parking. But check: SMART data can vary between manufacturers.
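For a rough sense of how often the heads would have to park for that guess to hold, the two posted counters can be divided out. This is only a sketch using the values from the SMART table above; the variable names are made up for illustration, and the result is an average over the drive's whole life, so it says nothing about the current rate.

```python
# Values copied from the SMART table posted in this thread.
load_cycles = 507035
power_on_hours = 11376

# Average load/unload cycles per powered-on hour over the drive's life.
cycles_per_hour = load_cycles / power_on_hours

# Average time between cycles, in seconds.
seconds_per_cycle = 3600 / cycles_per_hour

print(f"{cycles_per_hour:.1f} cycles/hour, one every ~{seconds_per_cycle:.0f} s")
```

That works out to roughly 45 cycles per hour, which is at least consistent with a short idle time-out.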

RMA the disk - I assume it's still in warranty - but replace ASAP.
Message 3 of 11
StephenB
Guru

Re: SMART errors - What do you do?

The key errors are the "Current Pending Sector" and "Offline Uncorrectable" counts. "Current Pending Sector" is incremented when the disk can't read a sector. Had the failure happened on a write request, the sector would have been reallocated instead.

17 sectors is probably not high enough for the vendor tools to say the drive is bad. I would plan to replace it anyway - especially if the count suddenly jumped from near-0 up to 17.
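As a rough illustration of that check, here is a minimal Python sketch that picks the nonzero sector-health counters out of the posted table. The dictionary is typed in by hand from the values above, and the `WATCHED` list and `flagged` helper are hypothetical names chosen for this example; treating any nonzero value as worth a look is an assumption, not a vendor threshold.

```python
# Values copied by hand from the SMART table posted earlier in this thread.
smart = {
    "Reallocated Sector Count": 0,
    "Reallocated Event Count": 0,
    "Current Pending Sector": 17,
    "Offline Uncorrectable": 17,
    "UDMA CRC Error Count": 0,
}

# Assumption: these are the counters to watch, and any nonzero value is flagged.
WATCHED = (
    "Reallocated Sector Count",
    "Current Pending Sector",
    "Offline Uncorrectable",
)

def flagged(table, watched=WATCHED):
    """Return the watched attributes whose value is above zero."""
    return {name: table[name] for name in watched if table.get(name, 0) > 0}

for name, value in flagged(smart).items():
    print(f"{name}: {value}")
```

Keeping a copy of the table from each check makes it easy to see whether the count jumped suddenly or has been creeping up.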

If it is under warranty, you would get a refurbished drive. Generally I won't put refurbished drives in my RAID array, I put them somewhere else.

You should probably check the warranty status, as you may only have a 12 month warranty (which would be expired) anyway. If it is under warranty, then do a full diagnostic (read and write tests). The bad sector count will likely go up. Note that the write test would be destructive, so you should probably replace the drive first.

On the load cycle count, I have one WD30EZRX with >785000 load cycles, and it is still working fine. My own experience with green drives [so far] is that the load cycle specs are extremely conservative. However, I no longer put them in RAID arrays, I have replaced them in my primary NAS with WDC Red drives.

sdchew wrote:
...a) don't mess around and replace the drive
b) pull the drive, low level format it and pop it back.
c) increase the disk scrubbing and volume consistency check frequency to safeguard against data corruption and ignore it until it fails

(a) is what I would do if it were my drive. Since I have full backups, I might watch it a bit longer. But usually the failure counts rapidly accelerate once they begin climbing into double digits.

A variant of (b) is to run full read/write diagnostics with vendor tools, and then reexamine the SMART stats. Though I wouldn't put it back into the array in any case.

(c) I wouldn't increase the frequencies. Scrubbing in particular will increase the stress on all the drives.
Message 4 of 11
sdchew
Aspirant

Re: SMART errors - What do you do?

Yeah, they are WD Green drives, and I've been putting in Red drives since they appeared on the market. I did modify the Green drives to ensure they don't park so often.

Warranty is probably expired on the drive so I'll chuck it then...
Message 5 of 11
StephenB
Guru

Re: SMART errors - What do you do?

No harm in checking on-line at https://westerndigital.secure.force.com ... ck?lang=en

Your power-on hours suggest you purchased very close to the time when WDC reduced the warranty to 1 year. You might have gotten the tail-end of the 3 year warranty.
Message 6 of 11
sdchew
Aspirant

Re: SMART errors - What do you do?

StephenB wrote:
No harm in checking on-line at https://westerndigital.secure.force.com ... ck?lang=en

Your power-on hours suggest you purchased very close to the time when WDC reduced the warranty to 1 year. You might have gotten the tail-end of the 3 year warranty.


Your guess is spot on... I did a quick check using the link and it does indeed have a 3 year warranty on it. Ends 2014.

I'll probably RMA the drive and think of what to do with it later...
Message 7 of 11
StephenB
Guru

Re: SMART errors - What do you do?

sdchew wrote:
I'll probably RMA the drive and think of what to do with it later...
That's my usual approach. Usually it ends up in a desktop or something less critical than my NAS. Sometimes it becomes an emergency spare.
Message 8 of 11
ihartley
Tutor

Re: SMART errors - What do you do?

I had a similar issue with the load/unload count on a Green drive. My take is that they are either not tested beyond the spec (unlikely) or that beyond the spec there are reliability issues. I know the WD drives are rated @ 300k cycles, hence I would think that at 170% of that it is a flag I would pay attention to. But of course it could go for another 10 years or fail in 10 seconds! 🙂
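The 170% figure can be reproduced from the posted count. A quick sketch, taking the 300k rating as quoted in this post (it has not been checked against a datasheet here), with illustrative variable names:

```python
# Load cycle count from the SMART table posted in this thread.
load_cycles = 507035

# Assumption: the 300k rating is the figure quoted in this post.
rated_cycles = 300_000

percent_of_rating = 100 * load_cycles / rated_cycles
print(f"{percent_of_rating:.0f}% of the rated load cycles")
```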

Just to note that replacement drives are "re-certified", not refurbished. One (not I) could argue that since they have undergone even more testing than a new drive they should be at least as reliable.... 🙂 Most likely they are just "no fault" returns, replaced mobo or firmware upgraded. The only "unknown" is how much action they have seen, since I guess they wipe the SMART stats. That's the only reason I won't put my data on them!
Message 9 of 11
StephenB
Guru

Re: SMART errors - What do you do?

Recertified is the right word of course. It wouldn't be cost effective to try and repair them.

They certainly do wipe the SMART stats. I got one a couple weeks ago from Seagate with 1 reallocated sector, so I guess they leave some of them alone. But not power on hours, spin ups, load cycle, etc.
Message 10 of 11
sdchew
Aspirant

SMART errors - What do you do?

Well I bought a new WD Red to replace it. Rebuilding the array went smoothly.

I ran WD's Data Lifeguard tool on the bad drive and it won't even complete. It reports multiple unrecoverable sectors. I wonder how it was even functional in the NAS in the first place.
Message 11 of 11