RN424, 4x 8TB WD Red Plus: disk failing after 120 days?

This problem is with an RN424 with 4x 8TB WD80EFAX, all identical disks purchased at the same time as the 424. The system runs OS 6.10.4 Hotfix1 and has been up for 120 days; I had not even transferred all my files onto it yet.

There are also two other RNs, RN214a and RN214b, both with 4x 4TB (OS 6.10.3). I was running out of space, hence the 424 earlier this year: I bit the bullet and got bigger disks in what is (supposedly) a better chassis.

This morning I thought I could do some maintenance and started transferring files across. I did some folders first (~65 GB), then selected another batch (~435 GB).

Some time later I saw that I had received two emails from the RN424: volume degraded.

“Disk Model:WDC WD80EFAX-68KNBN0 Serial:VGJL3SDG was removed from Channel 2 of the head unit.”

I cannot say for sure, but my guess is that the failure happened after the first file operation finished.

I interrupted the copy and downloaded the logs, and this is where I am now.

There is NO BACKUP: some data is duplicated here and there, but not the RN424 data.

Opening the front of the 424, I see the red light on what I call disk 2 (disks 1, 2, 3, 4 counting from the left). I am not familiar with the 424 front panel at all.

The RN424 has ~14.03TB used and 7.74TB free.

The RN214a has ~7.71TB used and 3.19TB free.

The RN214b has ~4.22TB used, 4.97TB of snapshots and 1.77TB free.

(I must say that I really understand NOTHING about snapshots. My most precious data is on RN214a; snapshots were something I wanted to read about and sort out but haven't had the time to do yet. In fact, I had forgotten about them.)

Is it normal for a disk to fail after 120 days of use? The last message I received from the RN424 was about the balancing operation last Monday, with no warnings.

Status.log simply says

[21/07/30 12:20:38 GMT] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD80EFAX-68KNBN0 Serial:VGJL3SDG was removed from Channel 2 of the head unit.

[21/07/30 12:20:41 GMT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
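Reading the downloaded logs by eye is slow. A minimal Python sketch like the one below could filter status.log down to just the disk and volume events. It assumes every entry follows the shape of the two lines quoted above (`[yy/mm/dd hh:mm:ss GMT] level:category:LOGMSG_CODE message`); the regex, the `Event` type and the `disk_and_volume_events` name are hypothetical helpers, not part of ReadyNAS OS.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed entry shape, taken from the two status.log lines quoted above:
# [21/07/30 12:20:38 GMT] warning:disk:LOGMSG_DELETE_DISK Disk Model:... removed ...
LINE_RE = re.compile(
    r"^\[(?P<stamp>[^\]]+)\]\s+"          # timestamp inside square brackets
    r"(?P<level>\w+):(?P<category>\w+):"  # e.g. warning:disk:
    r"(?P<code>LOGMSG_\w+)\s+"            # e.g. LOGMSG_DELETE_DISK
    r"(?P<message>.*)$"                   # free-text remainder
)


class Event(NamedTuple):
    stamp: str
    level: str
    category: str
    code: str
    message: str


def disk_and_volume_events(lines: Iterable[str]) -> List[Event]:
    """Keep only entries whose category is 'disk' or 'volume', in file order."""
    events: List[Event] = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("category") in ("disk", "volume"):
            events.append(Event(**match.groupdict()))
    return events


# The two entries quoted above, plus a blank line that should be skipped.
SAMPLE = [
    "[21/07/30 12:20:38 GMT] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD80EFAX-68KNBN0 Serial:VGJL3SDG was removed from Channel 2 of the head unit.",
    "",
    "[21/07/30 12:20:41 GMT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.",
]

if __name__ == "__main__":
    for ev in disk_and_volume_events(SAMPLE):
        print(ev.stamp, ev.code, ev.message)
```

To run it on the real file, passing an open handle such as `open("status.log", encoding="utf-8", errors="replace")` to `disk_and_volume_events` should work, assuming the file is plain text with one entry per line.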

Diskinfo.log reports on channels 0, 2 and 3 but omits the one that was “removed”.

Which log has the past reports from the disks (to see whether they were warning of the failure)?

How can I be POSITIVE from the logs that the disk is faulty? (I remember reading about errors on other occasions, but not this time.)

(Yes, I can PHYSICALLY remove the disk and test it.)

System-journal.log reports the event (~50 lines), but I cannot interpret it.

Is it possible for the RN424 to “misdiagnose” an otherwise healthy disk, or to suffer a hardware fault?

But what should I do now?

a) Should I switch the RN424 off and take it offline? I can copy the few folders I would be working on to RN214b and continue there for some time.
b) Should I MOVE data back to the other NASes to free space? That would not change anything: the volume is degraded because one disk is missing, and it will remain degraded. The only advantage would be that, if another disk fails, I would lose less data. A moot point, maybe.
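A quick check of option b) with the figures above shows how much of the RN424 data the other two units could actually absorb. This is a minimal sketch that takes the TB numbers exactly as each NAS reports them and leaves the RN214b snapshots untouched; the `offload_capacity` name is purely illustrative.

```python
from typing import Dict, Tuple


def offload_capacity(used_tb: float, free_tb: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (total free space elsewhere, shortfall, fraction of used data that fits)."""
    total_free = sum(free_tb.values())
    shortfall = max(used_tb - total_free, 0.0)   # data that would have nowhere to go
    fraction = min(total_free / used_tb, 1.0)    # share of the used data that fits
    return total_free, shortfall, fraction


if __name__ == "__main__":
    # Figures as quoted above: RN424 used, RN214a and RN214b free.
    total, short, frac = offload_capacity(14.03, {"RN214a": 3.19, "RN214b": 1.77})
    print(f"free elsewhere: {total:.2f} TB, shortfall: {short:.2f} TB, fits: {frac:.0%}")
```

With these numbers only about 4.96 TB, roughly a third of the 14.03 TB, would fit, leaving about 9.07 TB with nowhere to go.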

Many thanks in advance,

Berillio


StephenB
Guru - Experienced User
Jul 30, 2021
berillio wrote:

This problem is with an RN424 with 4x 8TB WD80EFAX, all identical disks purchased at the same time as the 424. The system runs OS 6.10.4 Hotfix1 and has been up for 120 days; I had not even transferred all my files onto it yet.

Status.log simply says

[21/07/30 12:20:38 GMT] warning:disk:LOGMSG_DELETE_DISK Disk Model:WDC WD80EFAX-68KNBN0 Serial:VGJL3SDG was removed from Channel 2 of the head unit.

[21/07/30 12:20:41 GMT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.

(Yes, I can PHYSICALLY remove the disk and test it.)

Is it possible for the RN424 to “misdiagnose” an otherwise healthy disk, or to suffer a hardware fault?

Disks can fail at any time, so it would be useful to test it in the PC.
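When testing it in the PC, the SMART attributes are one place to look. A minimal sketch, assuming smartmontools is installed on the PC and the output of `smartctl -A` for that disk has been saved as text: it pulls out the raw values of attributes 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). Treating any non-zero raw value as suspect is a common rule of thumb, not a WD or NETGEAR pass/fail criterion, and the helper names and the sample numbers below are made up for illustration, not taken from this disk.

```python
from typing import Dict

# Attributes commonly watched as early signs of media trouble (rule of thumb).
WATCHED = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}


def watched_raw_values(smartctl_a_output: str) -> Dict[int, int]:
    """Parse the attribute table printed by `smartctl -A` and return raw values by ID."""
    values: Dict[int, int] = {}
    for line in smartctl_a_output.splitlines():
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # header, blank or non-table line
        attr_id = int(parts[0])
        raw_first = parts[9].split()[0]  # RAW_VALUE may carry extra text, e.g. "38 (Min/Max ...)"
        if attr_id in WATCHED and raw_first.isdigit():
            values[attr_id] = int(raw_first)
    return values


def looks_suspect(values: Dict[int, int]) -> bool:
    """True if any watched attribute has a non-zero raw count."""
    return any(v > 0 for v in values.values())


# Illustrative text in smartctl's layout; the numbers are invented.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   171   171   000    Old_age   Always       -       38 (Min/Max 20/45)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    vals = watched_raw_values(SAMPLE)
    print(vals, "suspect" if looks_suspect(vals) else "no watched errors")
```

A clean SMART table does not prove the disk is healthy, so a full surface test in the PC is still worth running alongside it.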

The bay in the NAS might also have failed - if you have a spare disk, you might try doing a factory install (with only the spare disk inserted, in bay 1). Then power down and move the disk to bay 2. Power up, and make sure it works. If you try this test, label the disks by slot as you remove them.

berillio wrote:

But what should I do now?

a) Should I switch the RN424 off and take it offline? I can copy the few folders I would be working on to RN214B and continue there for some time.

b) Should I MOVE data back to the other NASes to free space? That would not change anything: the volume is degraded because one disk is missing, and it will remain degraded. The only advantage would be that, if another disk fails, I would lose less data. A moot point, maybe.

I suggest running the disk test on the RN424 before you copy new data onto it (look on the volume settings wheel for the disk test). That will take a while, and you can test disk 2 in a Windows PC in parallel. Once you know the disk health, you can sort out the path forward.

Since the volume is degraded, the data is at more risk than on your other NAS. If a second disk fails in the RN424, you will lose the data on it. So finding a way to back up the critical data on the RN424 would be prudent.

Do you have a backup plan in mind for the RN424? Ideally you'd move everything onto it and use the RN214s as backups.
- berillio
  Aspirant
  Jul 30, 2021
  I meant "Disks can fail at any time, so it would be useful to test it in the PC."
  OK, so "hot removal" from the 424?
  - berillio
    Aspirant
    Jul 30, 2021
    Sorry, ignore the "I meant" before the quote from StephenB.