Forum Discussion

Aspirant

Apr 15, 2025

Solved

RN314 and high ATA error count on 2 disks

I have an RN314 that is reporting high ATA error counts. It has two RAID 1 volumes, each made up of 2x3TB disks. One disk in each volume has been reporting increasing ATA error counts.

Disk: Detected increasing ATA error count: [60562] on disk 4 (Internal) [WDC WD30EFRX-68EUZN0, WD-WCC4N4KT2UX4] 20397 times in the past 30 days. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.

This also occurred on disk 1 in the past, but has now stopped showing increasing errors. NAS reports the volumes as healthy.

I've looked at dmesg.log in the downloaded logs and can see LOTS of this group of messages, not just for ata4 but also for ata1:

[Wed Apr 16 13:46:51 2025] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Wed Apr 16 13:46:51 2025] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[Wed Apr 16 13:46:51 2025] ata4.00: irq_stat 0x40000001
[Wed Apr 16 13:46:51 2025] ata4.00: failed command: READ DMA
[Wed Apr 16 13:46:51 2025] ata4.00: cmd c8/00:08:48:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
         res 51/40:00:49:00:00/00:00:00:00:00/e0 Emask 0x9 (media error)
[Wed Apr 16 13:46:51 2025] ata4.00: status: { DRDY ERR }
[Wed Apr 16 13:46:51 2025] ata4.00: error: { UNC }
[Wed Apr 16 13:46:51 2025] ata4.00: configured for UDMA/133
[Wed Apr 16 13:46:51 2025] ata4: EH complete

Some additional messages like

[Wed Apr 16 13:19:12 2025] sd 0:0:0:0: [sda] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Apr 16 13:19:12 2025] sd 0:0:0:0: [sda] tag#11 Sense Key : Medium Error [current] [descriptor] 
[Wed Apr 16 13:19:12 2025] sd 0:0:0:0: [sda] tag#11 Add. Sense: Unrecovered read error - auto reallocate failed
[Wed Apr 16 13:19:12 2025] sd 0:0:0:0: [sda] tag#11 CDB: Read(16) 88 00 00 00 00 00 00 00 00 48 00 00 00 08 00 00
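
For anyone who wants to see which sectors that CDB refers to, here is a minimal Python sketch that unpacks a READ(16) command. The function name `decode_read16` is made up for illustration, and the 512-byte logical sector size is an assumption (it matches the 4096-byte transfer in the ata4 message, but check your own drive).

```python
def decode_read16(cdb_hex: str, sector_size: int = 512) -> dict:
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes.

    sector_size is an assumption: 512-byte logical sectors.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: starting logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


info = decode_read16("88 00 00 00 00 00 00 00 00 48 00 00 00 08 00 00")
print(info)  # {'lba': 72, 'blocks': 8, 'bytes': 4096}
```

So the failed read on sda was for 8 sectors starting at LBA 72 (0x48), the same start address and length as in the ata4 READ DMA command above.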

So, I looked in the smart_history.log file and noticed that between July and Sep 2024 the ATA error count (last column) climbed steadily from 1 to 65535:

2024-07-14 21:52:21  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  1         
2024-07-14 21:53:32  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  3         
2024-07-14 21:55:33  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  5         
2024-07-14 21:57:33  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  7         
2024-07-14 21:59:34  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  9         
...
2024-09-27 09:47:31  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  65532     
2024-09-27 09:49:37  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  65533     
2024-09-27 09:51:42  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  65534     
2024-09-27 09:54:03  WDC WD30EFRX-68AX9N0  WD-WMC1T1684364       0             0             0               -1          -1            0             0                  65535

After that there is no more data for that drive model, but in Feb this year the same pattern starts for disk 4, whose ATA error count has gone from 1 to 60564 so far and is still climbing.
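
A side note on why the first drive's history may stop at exactly 65535: that is the largest value an unsigned 16-bit field can hold. The sketch below only illustrates that arithmetic. That the logged counter is a 16-bit field which sticks at its maximum is an assumption here, not something the NAS logs confirm, and `saturating_add` is a hypothetical helper.

```python
UINT16_MAX = 2**16 - 1  # largest value of an unsigned 16-bit field


def saturating_add(count: int, new_errors: int, limit: int = UINT16_MAX) -> int:
    """Add to a counter that sticks at its maximum instead of wrapping (assumed behaviour)."""
    return min(count + new_errors, limit)


last_logged = 65535  # final value in smart_history.log for WD-WMC1T1684364
print(last_logged == UINT16_MAX)       # True
print(saturating_add(last_logged, 2))  # 65535, further errors would no longer change the value
```

If that assumption holds, a count that stops rising at 65535 does not mean the errors stopped, only that the counter can no longer show them.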

The disk_info.log shows this for disk 1 - seems like a lot of hours - not sure I've had the drives that long...

Health data 
  ATA Error Count:                52956
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    34
  Start/Stop Count:               22485
  Power-On Hours:                 93510
  Power Cycle Count:              721
  Load Cycle Count:               22431

but disk 4 has only 14k power on hours
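
To put those hour counts in perspective, a quick conversion to years of continuous running. This is simple arithmetic; the 14000 figure is only the rounded "14k" from above, and `hours_to_years` is just an illustrative helper.

```python
HOURS_PER_YEAR = 24 * 365.25  # average year length, including leap days


def hours_to_years(power_on_hours: int) -> float:
    """Convert SMART power-on hours to years of continuous operation."""
    return power_on_hours / HOURS_PER_YEAR


disk1_years = hours_to_years(93510)  # value from disk_info.log
disk4_years = hours_to_years(14000)  # approximate, from "14k"
print(round(disk1_years, 1), round(disk4_years, 1))  # 10.7 1.6
```

So disk 1 reports roughly 10.7 years of spin time, against about 1.6 years for disk 4.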

From all this, I can figure out the disks are probably stuffed. I'm consolidating data from vol2 to vol1 and will then make a single RAID array with the two good drives. Is there any expert (or otherwise) opinion out there that can help me figure out whether this is definitely the disks or the chassis?

8 Replies

Antsa
Aspirant
Apr 15, 2025
I did run one of the blocks of errors through ChatGPT and it quite helpfully gave me the following, which seems plausible. I was interested to read that it deems the problem to be media rather than cable/controller...

[Wed Apr 16 13:46:51 2025] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
This references a PCI device with ID 8086:3a22. Vendor ID 8086 is Intel, so this is not a Marvell 9170 chip; it is most likely the NAS's Intel SATA controller.
The Marvell 9170 recovery routine is skipping it, which is consistent with the device not being a Marvell part, and the message by itself does not appear to indicate a fault.
🧨 [Wed Apr 16 13:46:51 2025] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
This line indicates the drive experienced an ATA exception, but the error mask (Emask) and other flags are all zeros — suggesting a lower-level issue, possibly at the drive/media level rather than the controller or cable.
⚠️ [Wed Apr 16 13:46:51 2025] ata4.00: irq_stat 0x40000001
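A hardware-level interrupt status from the SATA controller. Assuming the standard AHCI bit layout, 0x40000001 sets bit 30 (task file error) and bit 0 (device-to-host register FIS received), meaning the drive itself reported an error status for the command. The UNC line further down identifies it as an uncorrectable read.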
❌ [Wed Apr 16 13:46:51 2025] ata4.00: failed command: READ DMA
The drive failed to read data using the DMA command, which is used for high-speed data transfer without CPU involvement.
📦 cmd c8/00:08:48:00:00/00:00:00:00:00/e0 tag 0 dma 4096 in
This is a low-level command format showing the drive attempted to read 4 KB (4096) from a specific sector (starting at LBA 0x48).
It failed with the result:
res 51/40:00:49:00:00/00:00:00:00:00/e0 Emask 0x9 (media error)
Emask 0x9 = media error, which typically means:
The drive tried to read data from a sector, but the data was uncorrectable, likely due to physical degradation of the sector.
🟥 [Wed Apr 16 13:46:51 2025] ata4.00: status: { DRDY ERR }
DRDY = Drive is ready
ERR = An error occurred
→ The drive responded, but the command couldn't be completed successfully.
🧨 [Wed Apr 16 13:46:51 2025] ata4.00: error: { UNC }
UNC = Uncorrectable error — usually a sign the drive couldn't recover the data from that sector even with ECC (error correction code).
⚙️ [Wed Apr 16 13:46:51 2025] ata4.00: configured for UDMA/133
Just confirms the drive is running in UDMA/133 mode — normal for SATA.
✅ [Wed Apr 16 13:46:51 2025] ata4: EH complete
EH = Error Handling
The kernel’s ATA error handler has finished trying to deal with the issue.
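
The cmd and res lines can also be decoded by hand. The sketch below assumes the usual libata layout for these lines (command or status / feature or error : sector count : LBA low : LBA mid : LBA high / five high-order bytes / device) and 28-bit LBA addressing, which READ DMA (0xc8) uses. The names `parse_taskfile` and `flags` are made up for illustration, and 512-byte sectors are assumed.

```python
# Bit names as the kernel prints them in the "status:" and "error:" lines.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT", 0x01: "AMNF"}


def flags(value: int, table: dict) -> list:
    """Return the names of the bits set in value."""
    return [name for mask, name in table.items() if value & mask]


def parse_taskfile(text: str) -> dict:
    """Parse a libata 'cmd' or 'res' taskfile string (assumed layout, LBA28 only)."""
    first, low, _high, device = text.split("/")
    second, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    dev = int(device, 16)
    # LBA28: low nibble of the device register holds address bits 24-27.
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {"first": int(first, 16), "second": second, "nsect": nsect, "lba": lba}


cmd = parse_taskfile("c8/00:08:48:00:00/00:00:00:00:00/e0")
res = parse_taskfile("51/40:00:49:00:00/00:00:00:00:00/e0")

print(cmd["lba"], cmd["nsect"] * 512)      # 72 4096
print(res["lba"])                          # 73
print(flags(res["first"], STATUS_BITS))    # ['DRDY', 'ERR']
print(flags(res["second"], ERROR_BITS))    # ['UNC']
```

Read this way, the command asked for 8 sectors (4096 bytes) starting at LBA 72, and the result registers report LBA 73 with the UNC bit set, matching the status and error lines in the log.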
- StephenB
  Guru - Experienced User
  Apr 16, 2025
  UNCs and Reallocated Sectors definitely point to the disk. ATA errors relate to the SATA bus, so can either be an issue with the NAS or the disk.
  - Sandshark
    Sensei
    Apr 16, 2025
    One bad drive can also sometimes cause ATA errors on another channel.
    
    Your data is definitely at risk, so you need to complete a backup ASAP. If you do really have two failing drives (and I think that's most likely), the second could fail during re-sync of the first replacement, which often results in data loss.
Agive1997
Aspirant
Apr 16, 2025
Thank you for discussing this