Volume lost on ReadyNas 104

Markuzz · ‎2023-06-25

Hello,

I lost my volume on a ReadyNas 104. No errors have been logged according to the admin page, only a bunch of warnings around April this year. See output in attachment.

I have not done any actions on the NAS so far. Help to reconnect to the volume is very much appreciated!

I downloaded all the logs, so please let me know if I need to share additional information.

Kind regards,

Markuzz.

StephenB · ‎2023-06-27

@Markuzz wrote:

See attachment for output 1 of 2.

This should assemble, so I am puzzled by the earlier issue.

The event counters (13922, 13917, 13926, 13917) do show some lost writes - they should all be 13926.

Maybe try this variation:

mdadm --assemble --really-force /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

schumaku · ‎2023-06-26

Hello @Markuzz

Ignoring all the warnings has led to the end of this volume (including your data). Two disks were removed back in April, or some hardware issue caused the channels with the ST3000DM001 drives to drop. Even before that, the Toshiba and the Hitachi drives reported high spin retry counts, indicating a possible impending failure. Read the complete messages that your ReadyNAS reported several times before.

I don't know whether this RN104 will still accept new HDDs in slot #1 and slot #3. The HDD in channel #4 was certainly lost before that, and with it went the only redundancy.

Not sure what you expect now.

Greetings from Switzerland

-Kurt

Markuzz · ‎2023-06-26

Hi @schumaku,

Thanks for your swift reply.

I do not understand how the volume got lost when there were only a few warnings. Also, the NAS is currently reporting that all disks are healthy (except there is no active volume).

Is there any way to try to force-connect the volume again?

Cheers, Mark.

StephenB · ‎2023-06-26

@Markuzz wrote:

I do not understand how the volume got lost when there were only a few warnings.

In February, the volume became degraded - meaning one disk dropped out of the array. The screenshot doesn't say which one.

Later on we see warnings that two additional disks were failing. Also that a third disk was removed.

All of these can (and often do) lead to data loss.

But we are only seeing the warnings in your screenshot, and there is a lot of info that is likely missing. For instance, it is possible that the volume resynced after the February problem. Also, we don't know what RAID mode you are using.

@Markuzz wrote:

Is there any way to try to force-connect the volume again?

Probably the best path is RAID recovery software (ReclaiMe is one people have used here). You'd need a way to connect the disks to a Windows PC (USB adapter docks will work, and some have multiple bays). And you'd need a place to offload the data.

It might be possible to forcibly assemble the array, but I'm not sure. I wouldn't put much stock in the disk health you are seeing now. Have you ever used the Linux command line? Also, is SSH already enabled on the NAS?

Markuzz · ‎2023-06-26

Hi @StephenB,

Many thanks for your reply.

I've copied the status.log with the warnings below. I have not removed any disks in recent years.

The NAS is in X-Raid mode. Also I 'suddenly' noticed that the volume seems to be there, but inactive. See attachment.

I am familiar with ssh and the command line. So any help as to try to reconnect the volume via a terminal is highly appreciated!

Kind regards,

Mark

-- STATUS.LOG

[23/02/12 14:05:43 CET] warning:disk:LOGMSG_DELETE_DISK Disk Model:Hitachi HDS5C3030ALA630 Serial:MJ1311YNG6YY5A was removed from Channel 4 of the head unit.
[23/02/12 14:05:45 CET] warning:volume:LOGMSG_HEALTH_VOLUME Volume MusicDrive health changed from Redundant to Degraded.
[23/02/12 14:13:21 CET] info:system:LOGMSG_READYNASD_ABORTED_NOINFO ReadyNASOS service or process was restarted.
[23/02/12 14:13:26 CET] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume MusicDrive is Degraded.
[23/02/12 14:13:26 CET] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.
[23/02/12 14:13:34 CET] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[23/02/12 14:13:38 CET] warning:system:LOGMSG_BOND_NETWORK_SLAVE_NIC_DOWN Bond interface bond0 has slave interface eth1 offline.
[23/02/12 14:13:53 CET] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume MusicDrive.
[23/02/12 14:14:36 CET] warning:volume:LOGMSG_READYTIER_REPLACE_MIXED_DISK It is not recommended to mix different disk types. Current volume is using SATA 7200 RPM drives. Please replace the disk in channel 4 (Internal) to match the rest of the disks on volume MusicDrive for best performance.
[23/02/12 15:32:39 CET] warning:volume:LOGMSG_READYTIER_REPLACE_MIXED_DISK It is not recommended to mix different disk types. Current volume is using SATA 7200 RPM drives. Please replace the disk in channel 4 (Internal) to match the rest of the disks on volume MusicDrive for best performance.
[23/02/13 01:00:54 CET] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume MusicDrive is Degraded.
[23/02/14 01:00:28 CET] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume MusicDrive is Degraded.
[23/02/15 01:00:28 CET] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume MusicDrive is Degraded.
[23/02/15 13:03:33 CET] notice:volume:LOGMSG_RESILVERCOMPLETE_VOLUME Volume MusicDrive is resynced.
[23/02/15 13:03:34 CET] notice:volume:LOGMSG_HEALTH_VOLUME Volume MusicDrive health changed from Degraded to Redundant.
[23/02/15 13:03:34 CET] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 4 (Internal) changed state from RESYNC to ONLINE.
[23/02/16 08:18:35 CET] notice:system:LOGMSG_SYSTEM_HALT The system is shutting down.
[23/03/16 08:20:45 CET] warning:disk:LOGMSG_SMART_SPIN_RETRY_WARN Detected high spin retry count: [131073] on disk 2 (Internal) [TOSHIBA DT01ACA300, Y3RDZRKGS]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[23/03/16 08:20:48 CET] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[23/03/16 08:20:45 CET] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.
[23/03/16 08:21:03 CET] warning:system:LOGMSG_BOND_NETWORK_SLAVE_NIC_DOWN Bond interface bond0 has slave interface eth1 offline.
[23/03/16 11:46:49 CET] notice:system:LOGMSG_SYSTEM_HALT The system is shutting down.
[23/04/16 12:48:51 CEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[23/04/16 12:49:00 CEST] warning:system:LOGMSG_BOND_NETWORK_SLAVE_NIC_DOWN Bond interface bond0 has slave interface eth1 offline.
[23/04/16 12:49:00 CEST] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.
[23/04/17 08:16:48 CEST] info:system:LOGMSG_READYNASD_ABORTED_NOINFO ReadyNASOS service or process was restarted.
[23/04/17 08:16:53 CEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[23/04/17 08:17:01 CEST] warning:system:LOGMSG_BOND_NETWORK_SLAVE_NIC_DOWN Bond interface bond0 has slave interface eth1 offline.
[23/04/17 08:17:01 CEST] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.
[23/04/20 00:33:07 CEST] info:system:LOGMSG_READYNASD_ABORTED_NOINFO ReadyNASOS service or process was restarted.
[23/04/20 00:33:13 CEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[23/04/20 00:33:21 CEST] warning:system:LOGMSG_BOND_NETWORK_SLAVE_NIC_DOWN Bond interface bond0 has slave interface eth1 offline.
[23/04/20 00:33:21 CEST] warning:system:LOGMSG_SENT_ALERT_MESG_FAILED Alert message failed to send.
[23/04/20 00:49:25 CEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:ST3000DM001-1CH166 Serial:Z1F47LTW was removed from Channel 1 of the head unit.
[23/04/20 00:49:25 CEST] warning:volume:LOGMSG_HEALTH_VOLUME Volume MusicDrive health changed from Redundant to Dead.
[23/04/20 00:49:27 CEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:ST3000DM001-1CH166 Serial:Z1F4C9PX was removed from Channel 3 of the head unit.
[23/04/20 00:49:29 CEST] notice:disk:LOGMSG_ADD_DISK Disk Model:ST3000DM001-1CH166 Serial:Z1F47LTW was added to Channel 1 of the head unit.
[23/04/20 00:49:36 CEST] notice:disk:LOGMSG_ADD_DISK Disk Model:ST3000DM001-1CH166 Serial:Z1F4C9PX was added to Channel 3 of the head unit.
[23/04/20 00:51:11 CEST] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 1 (Internal) changed state from RESYNC to ONLINE.
[23/04/20 00:52:55 CEST] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from RESYNC to ONLINE.
[23/04/20 01:00:28 CEST] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume MusicDrive is Dead.
[23/04/20 01:13:18 CEST] warning:disk:LOGMSG_SMART_SPIN_RETRY_WARN Detected high spin retry count: [327682] on disk 4 (Internal) [Hitachi HDS5C3030ALA630, MJ1311YNG6YY5A]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[23/04/20 01:15:09 CEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:Hitachi HDS5C3030ALA630 Serial:MJ1311YNG6YY5A was removed from Channel 4 of the head unit.
[23/04/20 01:15:21 CEST] warning:disk:LOGMSG_SMART_SPIN_RETRY_WARN Detected high spin retry count: [1245187] on disk 2 (Internal) [TOSHIBA DT01ACA300, Y3RDZRKGS]. This condition often indicates an impending failure. Be prepared to replace this disk to maintain data redundancy.
[23/04/20 05:34:22 CEST] notice:system:LOGMSG_SYSTEM_HALT The system is shutting down.
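
For anyone reading a log like this one, the disk and volume events are easier to follow once the service restarts and network warnings are filtered out. A minimal sketch, assuming the log is available as plain text; the function name `filter_disk_events` and the file name `status.log` in the comment are just illustrations:

```shell
# Keep only disk add/remove, volume health, and SMART warning lines.
# Reads a ReadyNAS-style status log on stdin.
filter_disk_events() {
  grep -E 'LOGMSG_(DELETE_DISK|ADD_DISK|HEALTH_VOLUME|SMART_)'
}

# Example with three lines copied from the log above. On a real system
# you would run something like: filter_disk_events < status.log
filter_disk_events <<'EOF'
[23/04/20 00:33:13 CEST] info:system:LOGMSG_START_READYNASD ReadyNASOS background service started.
[23/04/20 00:49:25 CEST] warning:disk:LOGMSG_DELETE_DISK Disk Model:ST3000DM001-1CH166 Serial:Z1F47LTW was removed from Channel 1 of the head unit.
[23/04/20 00:49:25 CEST] warning:volume:LOGMSG_HEALTH_VOLUME Volume MusicDrive health changed from Redundant to Dead.
EOF
```

Only the last two of the three sample lines are printed.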

StephenB · ‎2023-06-26

@Markuzz wrote:

The NAS is in X-Raid mode. Also I 'suddenly' noticed that the volume seems to be there, but inactive. See attachment.

[23/04/20 01:00:28 CEST] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume MusicDrive is Dead.

A little odd, since the volume name for X-RAID is always "data", and yours is "MusicDrive". So you must have started out in FlexRAID, and then switched before you did any expansion.

@Markuzz wrote:

I am familiar with ssh and the command line. So any help as to try to reconnect the volume via a terminal is highly appreciated!

Is ssh enabled on the NAS? If it isn't then you'll need to boot into tech support mode and connect with telnet. We can give more detailed instructions if needed.

Given the dicey situation with the disks, I'd begin with running full non-destructive tests with smartctl, and then attempt to assemble/remount the volume. If successful, I strongly recommend that you back up the data before doing anything else.

Sandshark · ‎2023-06-26

Your problem may be power related -- not enough to spin the drives up properly. It could be the power brick or something internal to the NAS.

The best next step is to test the two "bad" drives in a PC with the drive vendor's tools. If they actually pass (except for the existing SMART errors), that adds evidence that power is the issue.

BTW, you have your alert email set up wrong, which is why you didn't get any email warnings about this.

Markuzz · ‎2023-06-26

Hi @Sandshark ,

That might actually be the case... the power plug into the NAS is kinda dodgy and does not connect that well.

Maybe that is why some disks drop out sometimes?

Cheers, Mark.

Markuzz · ‎2023-06-26

Hi @StephenB ,

SSH is enabled on the NAS.

"Given the dicey situation with the disks, I'd begin with running full non-destructive tests with smartctl, and then attempt to assemble/remount the volume. If successful, I strongly recommend that you back up the data before doing anything else."

Could you please provide me with the correct commands? I am not familiar with these. I also may need the standard ssh login credentials 🙂

Cheers! Mark.

Markuzz · ‎2023-06-26

Ignore the credentials..... found the admin user to login via ssh 🙂

StephenB · ‎2023-06-26

@Markuzz wrote:

Ignore the credentials..... found the admin user to login via ssh 🙂

Use root as the username for this, using the NAS admin password.

For disk testing

smartctl -t long /dev/sdX

where sdX is the disk you want to test. (Normally sda, sdb, sdc, or sdd).

You need to look back later to see the status.

smartctl -c /dev/sdX | grep execution

will let you see if the test is still in progress

smartctl -a /dev/sdX | grep offline

will give you the test history (most recent test first).
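
If you want to start the test on all four disks in one go, something like the sketch below could work. It is only an illustration: the function name `start_long_tests` and the `RUN` variable are made up for this example, and by default it only prints the commands so you can check them first.

```shell
# Start a long (extended) self-test on every disk passed as an argument.
# RUN=echo only prints the commands (dry run); set RUN= to execute them.
RUN=${RUN-echo}

start_long_tests() {
  for disk in "$@"; do
    $RUN smartctl -t long "$disk"
  done
}

# The usual four disk names; adjust them to match your system.
start_long_tests /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

Running it with `RUN=` (empty) in the environment executes the smartctl commands instead of printing them.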

You can attempt to forcibly assemble the volume using

mdadm --assemble --scan --really-force /dev/md127

If this works, you can attempt to mount the volume

btrfs device scan
mount -o ro /dev/md127 /MusicDrive

The -o ro option mounts it read-only (which I think is the right thing to do initially). If you want to go directly to read/write, then leave that out.
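
To confirm afterwards that the volume really is mounted read-only, you can look at the mount options in /proc/mounts. A rough sketch, assuming the /MusicDrive mount point from above; the helper name `is_readonly` is made up for this example:

```shell
# Succeeds if MOUNTPOINT is mounted with the "ro" option.
# The optional second argument is the mounts table (default /proc/mounts).
is_readonly() {
  opts=$(awk -v mp="$1" '$2 == mp { o = $4 } END { print o }' "${2:-/proc/mounts}" 2>/dev/null)
  case ",$opts," in
    *,ro,*) return 0 ;;
    *)      return 1 ;;
  esac
}

if is_readonly /MusicDrive; then
  echo "/MusicDrive is mounted read-only"
else
  echo "/MusicDrive is not mounted read-only (or not mounted at all)"
fi
```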

Markuzz · ‎2023-06-26

Many thanks @StephenB !

Test in progress..... will let you know what the results are, and if reassembling the volume succeeded.

Cheers! Mark.

Markuzz · ‎2023-06-26

Mornin' @StephenB ,

I checked the first 2 disks, no errors. So I tried to assemble md127, but I get error "not mentioned in conf file".

So I checked what's configured: mdadm --detail --scan

ARRAY /dev/md/0 metadata=1.2 name=2fe4f7cc:0 UUID=1dc24327:0fd661e6:42ad855b:f7c8a391

ARRAY /dev/md/1 metadata=1.2 name=2fe4f7cc:1 UUID=653c416b:60f3b909:9ea71f0e:9115b1d9

and

cat /proc/mdstat

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]

md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]

1044480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]

md0 : active raid1 sdd1[5] sda1[4] sdb1[6] sdc1[1]

4190208 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>

So I tried to reassemble the existing volumes through:

mdadm --assemble --scan --really-force /dev/md0

mdadm --assemble --scan --really-force /dev/md1

Then mount: mount -o ro /dev/md1 /MusicDrive

mount: unknown filesystem type 'swap'

mount -o ro /dev/md0 /MusicDrive

mount: /dev/md0 is already mounted or /MusicDrive busy

/dev/md0 is already mounted on /

Not sure how to proceed from here, all suggestions are welcome 🙂

Cheers, Mark.

StephenB · ‎2023-06-27

@Markuzz wrote:

Then mount: mount -o ro /dev/md1 /MusicDrive

mount: unknown filesystem type 'swap'

mount -o ro /dev/md0 /MusicDrive

mount: /dev/md0 is already mounted or /MusicDrive busy

/dev/md0 is already mounted on /

Both of these are mistakes. md1 is a raw swap partition for the OS. md0 is the OS. So neither is MusicDrive.
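
A quick way to see which arrays are currently assembled is to summarize /proc/mdstat. This is only a sketch (the function name `md_summary` is made up), shown here with the mdstat contents from the earlier post:

```shell
# Rough summary of mdstat text on stdin: one line per array, showing the
# RAID level for active arrays or the state (e.g. inactive) otherwise.
md_summary() {
  awk '$1 ~ /^md[0-9]+$/ && $2 == ":" { print $1, ($3 == "active" ? $4 : $3) }'
}

# On the NAS itself you would run: md_summary < /proc/mdstat
md_summary <<'EOF'
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      1044480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
md0 : active raid1 sdd1[5] sda1[4] sdb1[6] sdc1[1]
      4190208 blocks super 1.2 [4/4] [UUUU]
unused devices: <none>
EOF
```

This prints `md1 raid10` and `md0 raid1`. There is no md127 line, and nothing is built from the third partitions (sda3 and so on), so the data array is simply not assembled yet.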

@Markuzz wrote:
I checked the first 2 disks, no errors. So I tried to assemble md127, but I get error "not mentioned in conf file".

Try

mdadm --assemble --really-force /dev/md127 /dev/sd[?]3

Markuzz · ‎2023-06-27

Hi @StephenB ,

Thanks again for your reply. As you noticed, I am not very familiar with these issues 🙂

I tried: mdadm --assemble --really-force /dev/md127 /dev/sd[?]3

I got this returned:

mdadm: cannot open device /dev/sd[?]3: No such file or directory

mdadm: /dev/sd[?]3 has no superblock - assembly aborted

Kind regards,

Mark.

StephenB · ‎2023-06-27

@Markuzz wrote:

mdadm: cannot open device /dev/sd[?]3: No such file or directory

Can you run lsblk and then copy/paste the output here?

Markuzz · ‎2023-06-27

Hi @StephenB ,

See attachment for output for lsblk.

thanks, Mark.

StephenB · ‎2023-06-27

@Markuzz wrote:

See attachment for output for lsblk.

Thanks. So sda3, sdb3, sdc3, and sdd3 all exist. They are the partitions that need to be assembled.

mdadm is complaining about the superblock(s). What do you see with this?

mdadm --examine /dev/sd[a-d]3
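
As for why the earlier /dev/sd[?]3 pattern failed: inside square brackets the shell treats ? as a literal question mark rather than a wildcard, so nothing matched and the pattern was passed to mdadm unchanged. The sketch below shows the difference in a scratch directory, using empty stand-in files rather than real devices (the names and variables are just for illustration):

```shell
# Scratch directory with two empty files standing in for partitions.
demo=$(mktemp -d)
touch "$demo/sda3" "$demo/sdb3"

literal=$(cd "$demo" && echo sd[?]3)   # [?] matches only a literal "?"
single=$(cd "$demo" && echo sd?3)      # a bare ? matches any one character
range=$(cd "$demo" && echo sd[a-d]3)   # [a-d] matches one of a, b, c, d

echo "$literal"   # prints: sd[?]3   (no match, pattern left as typed)
echo "$single"    # prints: sda3 sdb3
echo "$range"     # prints: sda3 sdb3

rm -rf "$demo"
```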

Markuzz · ‎2023-06-27

Hi @StephenB ,

See attachment for output 1 of 2.

Kind regards,

Mark.

Markuzz · ‎2023-06-27

And file 2.

StephenB · ‎2023-06-27

@Markuzz wrote:

See attachment for output 1 of 2.

This should assemble, so I am puzzled by the earlier issue.

The event counters (13922, 13917, 13926, 13917) do show some lost writes - they should all be 13926.

Maybe try this variation:

mdadm --assemble --really-force /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
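
To make the event counter comparison concrete, here is a small sketch that shows how far each member is behind the newest one. The function name `event_lag` is made up, and the pairing of counters to partitions below is an assumption (it simply follows the order listed above):

```shell
# Read "<device> <events>" lines on stdin and report how far each member
# lags behind the highest event count.
event_lag() {
  awk '{ dev[NR] = $1; ev[NR] = $2 + 0; if (ev[NR] > max) max = ev[NR] }
       END { for (i = 1; i <= NR; i++)
               printf "%s events=%d behind=%d\n", dev[i], ev[i], max - ev[i] }'
}

# Counters from this post; which partition has which counter is assumed.
event_lag <<'EOF'
sda3 13922
sdb3 13917
sdc3 13926
sdd3 13917
EOF
```

With these numbers the members are 4, 9, 0, and 9 events behind. The gaps are small, which fits the idea that only a few writes were lost.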

Markuzz · ‎2023-06-27

@StephenB YOU ROCK! 😀

Yes, I was able to reassemble and mount the volume! It is in read-only mode now, but that's fine for backing up all the data.

Many, many thanks for your help!

Kind regards,

Mark.

StephenB · ‎2023-06-28

@Markuzz wrote:

Many, many thanks for your help!

I'm glad I was able to help.
