Guide

Jul 31, 2021

Solved

All Disks Dead, but the volume is still readable

Hi, The PSU in my old ReadyNAS had become slow to power up, so I ordered what was advertised as a compatible replacement from eBay. After replacing the PSU and powering up, the ReadyNAS ...

StephenB

Guru - Experienced User

Jul 31, 2021

Maybe enable SSH and do a bit of poking around there. You will be able to see the RAID rebuild status, and after that completes you can test the disks with smartctl.
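
For the rebuild status, a minimal sketch, assuming the NAS uses Linux software RAID (md) and exposes its state in `/proc/mdstat`. Once you are logged in over SSH, `cat /proc/mdstat` shows the raw text and `smartctl -a /dev/sdX` prints a disk's SMART data. The Python below only parses mdstat-style text. The sample content, device names, and numbers are made up for illustration, and `parse_mdstat` is a hypothetical helper, not part of any ReadyNAS tooling.

```python
import re

# Illustrative /proc/mdstat content; device names and numbers are made up.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md2 : active raid5 sda3[0] sdd3[3] sdc3[2] sdb3[1]
      2916201216 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [==>..................]  resync = 12.6% (123456789/972067072) finish=300.5min speed=47000K/sec

md0 : active raid1 sda1[0] sdb1[1]
      4193268 blocks super 1.2 [4/2] [UU__]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+) : (\w+)")
MEMBERS_RE = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
PROGRESS_RE = re.compile(r"(resync|recovery|reshape|check)\s*=\s*([\d.]+)%")


def parse_mdstat(text):
    """Return {array_name: info} for each md array found in mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = {
                "state": header.group(2),
                "expected": None,   # number of member slots in the array
                "active": None,     # number of members currently up
                "degraded": False,  # True if any slot shows "_" instead of "U"
                "operation": None,  # resync, recovery, reshape, or check
                "percent": None,    # progress of that operation
            }
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue  # skip lines before the first array header
        members = MEMBERS_RE.search(line)
        if members:
            current["expected"] = int(members.group(1))
            current["active"] = int(members.group(2))
            current["degraded"] = "_" in members.group(3)
        progress = PROGRESS_RE.search(line)
        if progress:
            current["operation"] = progress.group(1)
            current["percent"] = float(progress.group(2))
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, info)
```

On a real box you would pass in the contents of `/proc/mdstat` instead of the sample string. An array with `degraded` set and no operation in progress is the "not redundant" case the front panel would be reporting.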

Crash_HI

Guide

Jul 31, 2021

Hi,

Where should I look via SSH? Syslog?

I decided to reboot the NAS since RAIDar is now reporting all the drives active. RAIDiator still says all drives are dead. Both RAIDar and RAIDiator are only reporting temperature on one of the drives (1).

The good news is that the front panel now reports Resync status percentage.

Syslog contained a number of entries like this, but I do not know what they mean.

---

Jul 31 04:36:18 Server kernel: ata2.00: exception Emask 0x10 SAct 0x7fffffff SErr 0x400100 action 0x6 frozen
Jul 31 04:36:18 Server kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Jul 31 04:36:18 Server kernel: ata2: SError: { UnrecovData Handshk }
Jul 31 04:36:18 Server kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jul 31 04:36:18 Server kernel: ata2.00: cmd 61/a0:00:60:34:fc/01:00:e8:00:00/40 tag 0 ncq 212992 out
Jul 31 04:36:18 Server kernel: res 50/00:00:00:a2:fc/00:04:e8:00:00/40 Emask 0x10 (ATA bus error)
Jul 31 04:36:18 Server kernel: ata2.00: status: { DRDY }

---

Thanks very much!
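
For anyone reading entries like these: as far as I know, libata messages of this kind ("interface fatal error", "ATA bus error", SError flags such as `UnrecovData` and `Handshk`) usually point at the link between the controller and the drive (cabling, backplane, connectors, or power) rather than at bad sectors on the disk, though that is a general rule of thumb and not a diagnosis of this unit. A minimal sketch for tallying such entries per ATA port is below. The sample lines are copied from the excerpt above; `summarize_ata_errors` is a hypothetical helper written for illustration.

```python
import re
from collections import defaultdict

# Lines copied from the syslog excerpt above.
SAMPLE = """\
Jul 31 04:36:18 Server kernel: ata2.00: exception Emask 0x10 SAct 0x7fffffff SErr 0x400100 action 0x6 frozen
Jul 31 04:36:18 Server kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Jul 31 04:36:18 Server kernel: ata2: SError: { UnrecovData Handshk }
Jul 31 04:36:18 Server kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jul 31 04:36:18 Server kernel: res 50/00:00:00:a2:fc/00:04:e8:00:00/40 Emask 0x10 (ATA bus error)
"""

PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")
SERROR_RE = re.compile(r"SError: \{ (.*?) \}")
FAILED_RE = re.compile(r"failed command: (.+)")
EMASK_RE = re.compile(r"Emask 0x[0-9a-fA-F]+ \((.+?)\)")


def summarize_ata_errors(text):
    """Group libata error details by port name (e.g. 'ata2')."""
    summary = defaultdict(
        lambda: {"exceptions": 0, "serror": set(), "commands": set(), "causes": set()}
    )
    last_port = None
    for line in text.splitlines():
        m = PORT_RE.search(line)
        if m:
            last_port, msg = m.group(1), m.group(2)
        else:
            # Continuation lines such as "res ..." carry no port prefix,
            # so attribute them to the most recently seen port.
            msg = line
        if last_port is None:
            continue
        entry = summary[last_port]
        if msg.startswith("exception "):
            entry["exceptions"] += 1
        serror = SERROR_RE.search(msg)
        if serror:
            entry["serror"].update(serror.group(1).split())
        failed = FAILED_RE.search(msg)
        if failed:
            entry["commands"].add(failed.group(1))
        cause = EMASK_RE.search(msg)
        if cause:
            entry["causes"].add(cause.group(1))
    return dict(summary)


if __name__ == "__main__":
    for port, info in summarize_ata_errors(SAMPLE).items():
        print(port, info)
```

If the same port keeps showing up across many entries, that bay's connector or the drive in it is a reasonable first suspect. If every port shows errors, something shared such as the PSU or backplane seems more likely.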

Sandshark
Sensei
Jul 31, 2021
Many years ago I had a similar problem on a ProBE running OS 4.2.x. It usually wasn't all drives, but it was more than one, and my volume was actually still fine. My problem was that four of the drives (which were older and drew more power, having been moved from an NVX) came out of sleep mode more slowly than the other two, and disabling sleep mode kept it from happening again. An underpowered PSU could cause a similar problem, I suspect. Even though that was back when the "Jedi" were active here, so there was a lot of developer support, the only way I could clean up the drives was to remove each drive and let it re-sync, or to do a factory default. Anything less that they suggested didn't seem to "stick" for long.
- Crash_HI
  Guide
  Jul 31, 2021
  Thanks,
  I had read your post and had already disabled the sleep function in troubleshooting.
  
  The latest status is that the resync finished overnight, but then, without any log entry or alert about a drive failure, the panel and RAIDar started reporting that the RAID is not redundant/unprotected.
  
  I disassembled and cleaned everything, including repasting and using contact cleaner on all of the connectors.
  
  Upon bootup, RAIDar briefly reported all drives healthy along with all of their temperatures, but Frontview still showed all drives dead and only one drive temperature. After a few minutes, RAIDar lost the drive temperature data.
  
  Any ideas why RAIDar and Frontview would be reporting different information or how to reset this?
  
  Thanks
- Crash_HI
  Guide
  Aug 01, 2021
  It turns out that the solution was either to destroy and recreate the volume or to replace the OS. My best guess is that the faulty power supply, or the thousands of alerts and emails it triggered, somehow corrupted the RAID or the configuration.
  
  I decided to install OS6, which resolved the problem, and now all the hard drives are displaying as healthy and reporting their temperature and status.
  
  Unfortunately, I misunderstood the process of upgrading to OS6 and didn't realize it would destroy the volume, so now I have to restore everything from backup. At least I now know that the faulty power supply didn't cause hardware damage to the NAS.
  
  Thanks for your consideration,
  
  John