All Disks Dead, but the volume is still readable

Crash_HI · ‎2021-07-31

Hi,

The PSU in my old ReadyNAS had become slow to power up, so I ordered what was advertised as a compatible replacement from eBay. After replacing the PSU and powering up, the ReadyNAS began sending alerts that the +3.3 V rail was out of range.

VCC power is out of normal range [expected: 3.30  current: 3.47].
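The alert's own numbers put the rail roughly 5% high. Here is a minimal sketch of that arithmetic. It assumes the ATX guideline of ±5% for the +3.3 V rail (3.135 V to 3.465 V) as the reference band, because the thread doesn't say which threshold RAIDiator actually applies. `rail_deviation` and `within_tolerance` are hypothetical helper names.

```python
# Compare a measured rail voltage against a nominal value and a tolerance band.
# ASSUMPTION: the +/-5% band is the ATX guideline for the +3.3 V rail; the
# threshold RAIDiator uses for its alert is not stated in the thread.


def rail_deviation(nominal: float, measured: float) -> float:
    """Return the deviation from nominal as a percentage."""
    return (measured - nominal) / nominal * 100.0


def within_tolerance(nominal: float, measured: float, tol_pct: float = 5.0) -> bool:
    """True if measured lies inside nominal +/- tol_pct percent."""
    low = nominal * (1 - tol_pct / 100.0)
    high = nominal * (1 + tol_pct / 100.0)
    return low <= measured <= high


if __name__ == "__main__":
    nominal, measured = 3.30, 3.47  # values taken from the ReadyNAS alert
    print(f"deviation: {rail_deviation(nominal, measured):+.2f}%")
    print(f"band: {nominal * 0.95:.3f} V to {nominal * 1.05:.3f} V")
    print("within +/-5%:", within_tolerance(nominal, measured))
```

With these values, 3.47 V is about 5.15% above nominal, which falls just outside a ±5% band. That fits the alert firing, but a meter reading is still needed to confirm what the rail really delivers.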

I immediately contacted the seller of the PSU (kdmpower), who referred me to the included documentation. It stated that these (false) messages were normal, that the PSU was not defective, and that the messages could be ignored. I made the mistake of believing this rather than dismantling the unit and testing the PSU myself.

The next morning I found that the device health tab in RAIDiator 4.2.31 showed the status of all drives as Dead, and their indicators were flashing yellow. All drives also showed a temperature of 0 C / 32 F, which is of course not accurate.

The confusing thing is that I can still read the volume, and it is entirely backed up. I ran a comparison of all 4.7 million files on the NAS, and every file was verified as consistent by size, timestamp, and CRC. Apparently I have a ZombieNAS now.
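The size/timestamp/CRC comparison described above can be sketched in Python as follows. This is not the tool the poster used, since the thread doesn't name one. `compare_trees` and `file_crc32` are hypothetical names. The 2-second mtime slack is an assumption meant to tolerate filesystems with coarse timestamps. The sketch only checks files that exist in the source tree and does not look for extra files in the destination.

```python
# Compare two directory trees by relative path, size, mtime and CRC32.
# ASSUMPTIONS: both trees are mounted locally; compare_trees and file_crc32
# are hypothetical helpers, not part of any ReadyNAS tool.
import os
import sys
import zlib
from typing import List


def file_crc32(path: str, chunk_size: int = 1 << 20) -> int:
    """CRC32 of a file, read in chunks so large files don't fill memory."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def compare_trees(src: str, dst: str, mtime_slack: float = 2.0) -> List[str]:
    """Return human-readable differences for every file under src."""
    problems = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            a = os.path.join(root, name)
            rel = os.path.relpath(a, src)
            b = os.path.join(dst, rel)
            if not os.path.isfile(b):
                problems.append(f"missing: {rel}")
                continue
            sa, sb = os.stat(a), os.stat(b)
            if sa.st_size != sb.st_size:
                problems.append(f"size differs: {rel}")
            elif abs(sa.st_mtime - sb.st_mtime) > mtime_slack:
                problems.append(f"timestamp differs: {rel}")
            elif file_crc32(a) != file_crc32(b):
                problems.append(f"CRC differs: {rel}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    diffs = compare_trees(sys.argv[1], sys.argv[2])
    for line in diffs:
        print(line)
    print(f"{len(diffs)} difference(s)")
```

The cheap size and timestamp checks run first, so a CRC is computed only for files that pass both. On millions of files, the CRC reads will still dominate the run time.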

I have already replaced the PSU with one that is known to be good, but I would appreciate any suggestions on how to get the system to read the drives properly again. The volume is currently degraded because the spring on one drive tray failed during handling, and it says it is rebuilding. I don't want to shut it down, but I can't get any status information about that process.

Any suggestions would be greatly appreciated.

Thanks for your consideration,

John


StephenB · ‎2021-07-31

Maybe enable SSH and do a bit of poking around there. You will be able to see the RAID rebuild status, and after that completes you can test the disks with smartctl.

Crash_HI · ‎2021-07-31

Hi,

Where should I look via SSH? Syslog?

I decided to reboot the NAS since RAIDar was now reporting all the drives as active. RAIDiator still says all drives are dead. Both RAIDar and RAIDiator report a temperature for only one of the drives (1).
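Because both tools show a temperature for only one drive, one way to check whether the disks themselves still respond is to query them directly with `smartctl` over SSH. This sketch assumes smartmontools is installed on the NAS, that the disks are ATA drives, and that temperature appears as SMART attribute 194 (Temperature_Celsius), which is common but not universal. `parse_health`, `parse_temperature` and `check_drive` are hypothetical names. Device paths such as `/dev/sda` are passed on the command line rather than assumed.

```python
# Read drive health and temperature directly with smartctl, bypassing
# Frontview/RAIDar. ASSUMPTIONS: smartmontools is installed, drives are ATA,
# and temperature is SMART attribute 194 (not every drive uses it).
import shutil
import subprocess
import sys
from typing import Optional, Tuple


def parse_health(output: str) -> Optional[str]:
    """Return e.g. 'PASSED' from 'smartctl -H' output, or None if absent."""
    for line in output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.rsplit(":", 1)[1].strip()
    return None


def parse_temperature(output: str) -> Optional[int]:
    """Return the RAW_VALUE of attribute 194 from 'smartctl -A' output."""
    for line in output.splitlines():
        fields = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "194" and fields[9].isdigit():
            return int(fields[9])
    return None


def check_drive(device: str) -> Tuple[Optional[str], Optional[int]]:
    """Run smartctl once for health (-H) and attributes (-A)."""
    result = subprocess.run(
        ["smartctl", "-H", "-A", device], capture_output=True, text=True
    )
    return parse_health(result.stdout), parse_temperature(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1 and shutil.which("smartctl"):
    for dev in sys.argv[1:]:
        print(dev, check_drive(dev))
```

If `smartctl` reports sensible health and temperatures for every drive while Frontview still shows them as Dead, that points toward the NAS's own status reporting rather than the disks.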

The good news is that the front panel now reports Resync status percentage.

Syslog contained a number of entries like this, but I don't know what they mean.

---

Jul 31 04:36:18 Server kernel: ata2.00: exception Emask 0x10 SAct 0x7fffffff SErr 0x400100 action 0x6 frozen
Jul 31 04:36:18 Server kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Jul 31 04:36:18 Server kernel: ata2: SError: { UnrecovData Handshk }
Jul 31 04:36:18 Server kernel: ata2.00: failed command: WRITE FPDMA QUEUED
Jul 31 04:36:18 Server kernel: ata2.00: cmd 61/a0:00:60:34:fc/01:00:e8:00:00/40 tag 0 ncq 212992 out
Jul 31 04:36:18 Server kernel: res 50/00:00:00:a2:fc/00:04:e8:00:00/40 Emask 0x10 (ATA bus error)
Jul 31 04:36:18 Server kernel: ata2.00: status: { DRDY }

---
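The `SErr 0x400100` value in that log is a bitmask, and the kernel's `SError: { UnrecovData Handshk }` line is its decoding. This sketch reproduces the decoding and counts exception lines per ATA port in a saved syslog, which shows whether errors cluster on one port or hit all of them. The bit-to-name table is our reading of libata's SError definitions, so check it against your kernel's `include/linux/ata.h`. `decode_serror` and `count_exceptions` are hypothetical names. Also note that a port name like `ata2` is the kernel's numbering and does not necessarily match bay 2.

```python
# Decode a libata SError value and count exception lines per ATA port.
# ASSUMPTION: the table mirrors the flag names libata prints in
# "SError: { ... }" lines; helper names are hypothetical.
import re
import sys
from typing import Dict, List

SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}

# Matches e.g. "ata2.00: exception Emask 0x10 SAct 0x7fffffff SErr 0x400100"
EXCEPTION_RE = re.compile(
    r"\b(ata\d+)(?:\.\d+)?: exception Emask \S+ SAct \S+ SErr (0x[0-9a-fA-F]+)"
)


def decode_serror(value: int) -> List[str]:
    """Return flag names for each set bit, in ascending bit order."""
    return [name for bit, name in sorted(SERROR_BITS.items()) if value & (1 << bit)]


def count_exceptions(log_text: str) -> Dict[str, int]:
    """Count 'exception Emask' lines per ATA port (e.g. 'ata2')."""
    counts: Dict[str, int] = {}
    for match in EXCEPTION_RE.finditer(log_text):
        port = match.group(1)
        counts[port] = counts.get(port, 0) + 1
    return counts


if __name__ == "__main__":
    print(decode_serror(0x400100))  # SErr value from the log excerpt above
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            print(count_exceptions(f.read()))
```

Decoding `0x400100` gives bits 8 and 22, which are `UnrecovData` and `Handshk`, matching what the kernel printed.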

Thanks very much!

Sandshark · ‎2021-07-31

Many years ago I had a similar problem on a ProBE running OS 4.2.x. It usually wasn't all drives, but it was more than one, and my volume was actually still fine. My problem was that four of the drives (older ones that drew more power, having been moved from an NVX) came out of sleep mode more slowly than the other two, and disabling sleep mode kept it from happening again. I suspect an underpowered PSU could cause a similar problem. That was back when the "Jedi" were active here, so there was a lot of developer support. Even so, the only way I could clean up the drives was to remove each drive and let it re-sync, or to do a factory default. Nothing less drastic that they suggested seemed to "stick" for long.

Crash_HI · ‎2021-07-31

Thanks,

I had read your post and had already disabled the sleep function while troubleshooting.

The latest status is that the resync finished overnight. Then, without any log entry or alert about a drive failure, the front panel and RAIDar started reporting that the RAID is not redundant/unprotected.

I disassembled and cleaned everything, including repasting and applying contact cleaner to all of the connectors.

On bootup, RAIDar briefly reported all drives as healthy, along with all of their temperatures, but Frontview still showed all drives as dead and only one drive temperature. After a few minutes, RAIDar lost the drive temperature data.

Any ideas why RAIDar and Frontview would be reporting different information or how to reset this?

Thanks

Crash_HI · ‎2021-08-01

It turns out that the solution was either to destroy and recreate the volume or to replace the OS. My best guess is that something corrupted the RAID or the configuration: either the faulty power supply itself, or the thousands of alerts and emails it caused to be sent.

I decided to install OS6, which resolved the problem. All the hard drives now display as healthy and report their temperature and status.

Unfortunately, I misunderstood the process of upgrading to OS6 and didn't realize it would destroy the volume, so now I have to restore everything from backup. At least I now know that the faulty power supply didn't damage the NAS hardware.

Thanks for your consideration,

John
