No Volume exists
Firmware: 6.10.3
Status: healthy
It has 4 disks. I got messages that a disk was degraded - going from redundant to eventually dead. I then got a message that a resync was performed. Subsequently I get the message 'No volume exists' and cannot see any of my data. This all happened over 3-4 days - I do not use the NAS often and it is relatively under-utilised, but the data is important. I have purchased a replacement disk but have made no changes to the NAS since the first error messages. Desperate to recover the data. I have attached some of the log file; I was not able to upload the zip.
Accepted Solutions
Hi @Paulaus
Thanks for the logs.
Yea, so it is pretty much as first expected. A disk dropped out of the raid and back in. This prompted a raid sync (resilver). During that sync another drive dropped out, and your raid is now broken. It concerns disks 2 and 3. On paper, the disks don't actually look that bad.
Disk 2: 12 Pending Sector Error
Disk 3: 1 Pending Sector Error
However, disk 3 is spewing errors in the kernel log. It looks like the NAS is having great difficulty communicating with that disk. An example is below, and this is repeated over and over.
[Mon Jan 11 12:01:51 2021] do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
[Mon Jan 11 12:01:51 2021] ata3.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
[Mon Jan 11 12:01:51 2021] ata3.00: irq_stat 0x40000008
[Mon Jan 11 12:01:51 2021] ata3.00: failed command: READ FPDMA QUEUED
[Mon Jan 11 12:01:51 2021] ata3.00: cmd 60/08:80:48:00:80/00:00:00:00:00/40 tag 16 ncq 4096 in res 41/40:00:49:00:80/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[Mon Jan 11 12:01:51 2021] ata3.00: status: { DRDY ERR }
[Mon Jan 11 12:01:51 2021] ata3.00: error: { UNC }
[Mon Jan 11 12:01:51 2021] ata3.00: configured for UDMA/133
[Mon Jan 11 12:01:51 2021] sd 2:0:0:0: [sdc] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Mon Jan 11 12:01:51 2021] sd 2:0:0:0: [sdc] tag#16 Sense Key : Medium Error [current] [descriptor]
[Mon Jan 11 12:01:51 2021] sd 2:0:0:0: [sdc] tag#16 Add. Sense: Unrecovered read error - auto reallocate failed
[Mon Jan 11 12:01:51 2021] sd 2:0:0:0: [sdc] tag#16 CDB: Read(16) 88 00 00 00 00 00 00 80 00 48 00 00 00 08 00 00
[Mon Jan 11 12:01:51 2021] blk_update_request: I/O error, dev sdc, sector 8388681
[Mon Jan 11 12:01:51 2021] Buffer I/O error on dev sdc2, logical block 1, async page read
[Mon Jan 11 12:01:51 2021] ata3: EH complete
It is a case of a dual disk failure in a RAID5, which leaves the raid in a broken state. I feel that you were unlucky here, to be honest. There were no prior signs that these disks were going to cause you trouble. It all happened rather suddenly. I do feel that this situation is salvageable. RAIDs can be saved from such a scenario, but it might include cloning the disks if the current ones are too bad to work with, and it will definitely involve manual reassembly of the raid.
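To give a rough idea of what "manual reassembly" means - and this is only a sketch of the general approach, not something to run blindly; it assumes SSH access is enabled, and device names like /dev/sdc and md127 are examples that will differ on your unit:

# See what the kernel currently thinks of the array
cat /proc/mdstat
# Inspect the raid metadata (event counters, member roles) on each data partition
# (on ReadyNAS OS 6 the data partition is usually the 3rd partition on each disk)
mdadm --examine /dev/sd[abcd]3
# Check SMART health of the suspect disks
smartctl -a /dev/sdc
# If a disk is too unreliable to read, clone it to a known-good disk first
# (run from a separate Linux machine; ddrescue retries and maps out bad sectors)
ddrescue -f /dev/sdc /dev/sdX sdc-clone.map
# Only then attempt a forced assembly from the most up-to-date members
mdadm --assemble --force /dev/md127 /dev/sd[abd]3

Getting the member order and event counts right is exactly the part that is easy to get wrong, which is why I would let someone experienced drive it.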
You should contact Netgear support and discuss a data recovery contract. Have their Level 3 team assess the situation and take it from there. It won't be a free service, I am sure, but if the data matters and you have no backups, it is the best (and likely cheapest) way to go.
Cheers
All Replies
Re: No Volume exists
Something escalated quickly...
[21/01/06 19:22:49 AEST] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
[21/01/07 18:01:12 AEST] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.
[21/01/08 01:44:30 AEST] notice:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Degraded to Dead.
[21/01/08 01:44:31 AEST] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 2 (Internal) changed state from ONLINE to FAILED.
Looks like a bad disk caused a resync of the volume, which then caused a second disk to drop out of the raid mid-sync. Do you mind sending me the entire log-set from the NAS? That will give me more info to work with. You can download the full log-set via the web admin page under: "System" > "Logs" > "Download logs"
Just PM me the log-set: upload it to Google Drive, Dropbox or similar and shoot me the link to download it in a PM.
Re: No Volume exists
@rn_enthusiast wrote:
There were no prior signs that these disks were going to cause you trouble. It all happened rather suddenly.
One factor here is that the system won't detect a bad sector until it tries to read or write it - so a problem can lurk undetected for a long time. That's why I schedule disk tests (and RAID scrubs) in the maintenance schedule.
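On the ReadyNAS that is just a matter of adding the disk test and scrub entries to the maintenance schedule in the web admin, but as a rough illustration of the same idea on a plain Linux box (the device names, the md127 array name and the timings are only examples):

# /etc/cron.d/disk-checks - monthly SMART long self-test plus an md raid scrub (needs smartmontools)
0 2 1 * *   root   /usr/sbin/smartctl -t long /dev/sda
0 2 2 * *   root   /usr/sbin/smartctl -l selftest /dev/sda
0 3 7 * *   root   echo check > /sys/block/md127/md/sync_action

The point is simply that every sector gets read on a regular basis, so pending sectors surface on your schedule rather than in the middle of a rebuild.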
FWIW, Backblaze reported some time ago (2016) that about 25% of their disk failures occur with no warning (and good SMART stats reported before the failure).
I've seen this myself - most recently last week. That particular disk had no apparent issues reading or writing to the volume, and had good SMART stats. But it repeatedly failed the disk test in the NAS, and it also failed the vendor diag in a PC. Even after the failed tests, the SMART stats still looked good - no idea why, since the vendor diag reported "too many bad sectors".
You really do need a backup on another device - RAID simply isn't enough.
@rn_enthusiast wrote:
I do feel that this situation is salvageable.
I hope that is the case.
@rn_enthusiast wrote:
You should contact Netgear support and discuss data recovery contract. Have their Level 3 team assess the situation and then take it from there. It won't be a free service I am sure but if the data matters and if you have no backups, it is the best (and likely cheapest) way to go.
If you can connect the disks to a Windows PC, you could also try using RAID recovery software. ReclaiMe is one option that folks here have used with some success. It is expensive (but should be cheaper than a data recovery service).
Re: No Volume exists
@StephenB wrote:
@rn_enthusiast wrote: There were no prior signs that these disks were going to cause you trouble. It all happened rather suddenly.
One factor here is that the system won't detect a bad sector until it tries to read or write it - so a problem can lurk undetected for a long time. That's why I schedule disk tests (and RAID scrubs) in the maintenance schedule.
Good point. Definitely something OP should consider doing in the future. I do a disk test task every 3 months, myself.
Re: No Volume exists
While the problem would likely still have happened if the first drive had been replaced (since it would still need to do a sync that included the other), what I don't understand is why the ReadyNAS suddenly decides on its own that a drive that was previously dead should be re-introduced to the RAID and a resync started. It should take a conscious action by the admin to do that, so he can ensure the backup is up to date before trying it, or he can choose not to and go straight for a new drive.