RN31400: data DEAD, all drives showing NEW and healthy on reboot

Retired_Member · 2020-09-15

After noticing that the NAS wasn't working, I found the LCD showing 'data Dead'. I turned the box off and restarted it. After the reboot, the admin suite showed the notification 'Remove inactive volumes to use the disk. Disk #1,2,3,4.' I turned the box off overnight. When I turned it on this morning, the LCD gave messages such as 'new disk added', even though the disks hadn't been touched. This time I ended up booting into safe mode. I have run an OS Reinstall, with no change.

Under volumes:

My original 'data' volume shows Data: 0, Free: 0, Type: RAID unknown.
A new 'data-0' volume has appeared, showing Data: 8.17 TB, Free: 0, Type: RAID 5.
All four disks show no errors, with Volume State: NEW and Disk State: ONLINE on each.
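A rough sanity check on the 8.17 TB figure: RAID 5 usable space is (number of disks - 1) x the size of each disk's data partition. The sketch below is only an estimate. It takes the sector count from the disk_info.log posted later in the thread (5860533168 sectors per disk, assuming 512-byte sectors) and subtracts, per disk, the OS member (4190208 KiB, md0) and the swap member (522240 KiB, i.e. md1's 1044480 KiB stored twice across 4 disks), ignoring partition alignment and mdadm metadata. The function name is hypothetical.

```shell
#!/bin/sh
# Estimate RAID 5 usable capacity in TiB (binary terabytes).
# Assumptions: identical disks, 512-byte sectors, and a fixed amount of
# space per disk reserved for other partitions (OS and swap here).
# Partition alignment and mdadm metadata are ignored, so this is approximate.
raid5_usable_tib() {
  disks=$1          # number of member disks
  sectors=$2        # total 512-byte sectors per disk
  reserved_kib=$3   # KiB per disk used by non-data partitions
  awk -v n="$disks" -v s="$sectors" -v r="$reserved_kib" 'BEGIN {
    member_bytes = s * 512 - r * 1024   # bytes left for the data partition
    usable = (n - 1) * member_bytes     # one disk worth of space goes to parity
    printf "%.2f\n", usable / (1024 ^ 4)
  }'
}
```

For example, `raid5_usable_tib 4 5860533168 $((4190208 + 522240))` prints 8.17, and passing 0 for the reserved space prints 8.19. That the estimate lands on 8.17 suggests the web UI's "TB" is really TiB.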

Is there a way to rebuild the data volume?

Logs:


Sep 15, 2020 09:56:29 AM		System: ReadyNASOS background service started.
Sep 15, 2020 09:56:29 AM		System: Firmware was upgraded to 6.10.3.
Sep 15, 2020 09:56:26 AM		System: ReadyNASOS service or process was restarted.
Sep 15, 2020 09:43:43 AM		System: ReadyNASOS background service started.
Sep 14, 2020 08:58:54 PM		System: Alert message failed to send.
Sep 14, 2020 08:58:54 PM		System: The system is shutting down.
Sep 14, 2020 08:54:11 PM		System: ReadyNASOS background service started.
Sep 14, 2020 08:51:08 PM		System: Alert message failed to send.
Sep 14, 2020 08:51:08 PM		System: The system is rebooting.
Sep 14, 2020 08:43:49 PM		System: ReadyNASOS background service started.
Sep 14, 2020 08:43:25 PM		System: ReadyNASOS service or process was restarted.
Sep 12, 2020 09:39:49 PM		Disk: Disk in channel 3 (Internal) changed state from ONLINE to RESYNC.
Sep 12, 2020 09:39:39 PM		Disk: Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WCC4N6EXP6UL was added to Channel 3 of the head unit.
Sep 12, 2020 09:35:38 PM		Disk: Disk in channel 1 (Internal) changed state from RESYNC to ONLINE.
Sep 12, 2020 09:32:32 PM		Disk: Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WCC4N4VF5VFL was added to Channel 1 of the head unit.
Sep 12, 2020 09:28:24 PM		Disk: Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WCC4N6EXP6UL was removed from Channel 3 of the head unit.
Sep 12, 2020 09:24:23 PM		Volume: Volume data health changed from Redundant to Dead.
Sep 12, 2020 09:24:21 PM		Disk: Disk Model:WDC WD30EFRX-68EUZN0 Serial:WD-WCC4N4VF5VFL was removed from Channel 1 of the head unit.
Sep 12, 2020 12:23:49 AM		Snapshot: Snapshot prune worker successfully deleted snapshot 2020_07_18__00_00_20 from share or LUN Transmission.
Sep 12, 2020 12:00:25 AM		Snapshot: Snapshot c_1599868825 was successfully created for share or LUN Transmission.
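The ReadyNAS log view lists the newest event first. Reading only the disk and volume events oldest first makes the sequence clearer: the disk in channel 1 is logged as removed, the volume goes from Redundant to Dead two seconds later, and only then is channel 3 logged as removed before both disks are re-added. A minimal sketch of that filter, assuming the log above has been saved to a plain-text file and GNU `tac` is available (the function name is hypothetical):

```shell
#!/bin/sh
# Print only the Disk: and Volume: events from a saved ReadyNAS web-UI log,
# oldest first. The UI lists the newest entry at the top, so reverse it.
# Reads the file given as $1, or standard input if no file is given.
disk_volume_events() {
  tac "${1:--}" | grep -E '(Disk|Volume): '
}
```

Usage: `disk_volume_events ui-log.txt`, where ui-log.txt is a hypothetical file holding the pasted log.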



Sandshark · 2020-09-15

Since you have access to the data, I think you are far better off doing a backup, a factory default, and a restore. If you try to rebuild the configuration files manually, I would not trust that you had taken care of every possible detail, and if you missed one, future expansion could be problematic.

FYI, "data-0" is the name of the first mdadm RAID array that makes up the BTRFS volume "data". If you have two layers of RAID (from a past vertical expansion), you may have "lost" the second one (data-1).
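One way to check whether the btrfs "data" volume is missing a member array (such as a data-1 from a past vertical expansion) is to compare the device count btrfs expects with the devices it actually lists. A minimal sketch, assuming the usual btrfs-progs layout of `btrfs filesystem show` output ("Total devices N" followed by one "devid" line per member found) and that the input covers only the data volume; the function name is hypothetical:

```shell
#!/bin/sh
# Compare the number of devices btrfs expects for a filesystem with the
# number of "devid" lines it actually lists. Reads `btrfs filesystem show`
# output on standard input; assumes the usual btrfs-progs layout and that
# the input covers a single filesystem. Returns non-zero if members are missing.
btrfs_member_check() {
  awk '
    /Total devices/ {
      for (i = 1; i < NF; i++) if ($i == "devices") expected = $(i + 1)
    }
    /devid/ { found++ }
    END {
      printf "expected %d, found %d\n", expected, found
      exit (found < expected)
    }'
}
```

For example, as root: `btrfs filesystem show /dev/md127 | btrfs_member_check`, where /dev/md127 is the data array named in the kernel log later in the thread.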

Retired_Member · 2020-09-16

@Sandshark Thank you for your reply. Your help is utterly invaluable! I'm not sure I understand. I've only really looked at the web admin suite showing the following: https://i.imgur.com/MNs02tZ.jpg

The four identical drives have been in it from the start, as RAID 5, and I have no interest in expanding the device in the future. I chose this box to avoid exactly this kind of time sink: the idea was that if a drive failed, I'd slap in a new one and forget about it. 😞 As you can see above, when it was working I only had the 'data' volume, as RAID 5, with the usage tallied.

When it comes to backing up and resetting as you mention, how do I get access to the data? Is there a process or guide you could recommend? (Sorry, I'm having difficulty googling this issue!) I'm guessing the backup will require roughly 4 TB, so I'll need that space available elsewhere... Could it be time to get a new NAS and put this one on eBay?

I'd love to know why the software decided that any hard drives had disconnected, considering the device is locked in a ventilated cupboard and only I have the key. Hey ho. 😕

StephenB · 2020-09-16

@Retired_Member wrote:

When it comes to backing up and resetting as you mention, how do I get access to the data?

I think @Sandshark was for some reason assuming that you did still have access to the data.

@Retired_Member wrote:

I've only really looked at the web admin suite showing the following: https://i.imgur.com/MNs02tZ.jpg

Can you download the full log zip file (click on the logs page and you'll see a download control)?

The full disk SMART stats are in disk_info.log, so you might look at those.
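A minimal sketch for scanning disk_info.log for warning signs, assuming the "Name:   value" layout posted later in this thread. Which counters count as warning signs is a judgement call, not a ReadyNAS rule, and the function name is hypothetical. On the four disks posted here it prints nothing, because every counter is 0; the manufacturer's extended test that later failed disk 1 is a reminder that clean counters do not guarantee a healthy disk.

```shell
#!/bin/sh
# Report non-zero error counters from a ReadyNAS disk_info.log.
# Assumes the "Name:   value" layout posted in this thread; the list of
# counters treated as warning signs is a judgement call.
# Reads the file given as $1, or standard input.
smart_warnings() {
  awk -F': *' '
    $1 == "Device" { dev = $2 }
    { key = $1; sub(/^ +/, "", key) }   # drop the indentation before the name
    key ~ /^(ATA Error Count|Reallocated Sectors|Reallocation Events|Current Pending Sector Count|Uncorrectable Sector Count)$/ && $2 + 0 > 0 {
      print dev ": " key " = " $2
    }' "${1:--}"
}
```

Usage: `smart_warnings disk_info.log` after unpacking the log zip.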

It would also be useful if you copy/paste mdstat.log into a reply here. It's best if you use the </> tool in the toolbar.

Retired_Member · 2020-09-16

@StephenB No worries about Sandshark, I totally understand misunderstandings when helping at arm's length! 🙂 Probably my fault!

Here are the logs:

disk_info.log

Device:             sdc
Controller:         0
Channel:            0
Model:              WDC WD30EFRX-68EUZN0
Serial:             WD-WCC4N4VF5VFL
Firmware:           82.00A82W
Class:              SATA
RPM:                5400
Sectors:            5860533168
Pool:               data-0
PoolType:           RAID 5
PoolState:          5
PoolHostId:         2fe5cdf4
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    41
  Start/Stop Count:               3541
  Power-On Hours:                 31420
  Power Cycle Count:              35
  Load Cycle Count:               3525

Device:             sda
Controller:         0
Channel:            1
Model:              WDC WD30EFRX-68EUZN0
Serial:             WD-WCC4N4SFKV58
Firmware:           82.00A82W
Class:              SATA
RPM:                5400
Sectors:            5860533168
Pool:               data-0
PoolType:           RAID 5
PoolState:          5
PoolHostId:         2fe5cdf4
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    43
  Start/Stop Count:               3720
  Power-On Hours:                 31413
  Power Cycle Count:              37
  Load Cycle Count:               3705

Device:             sdd
Controller:         0
Channel:            2
Model:              WDC WD30EFRX-68EUZN0
Serial:             WD-WCC4N6EXP6UL
Firmware:           82.00A82W
Class:              SATA
RPM:                5400
Sectors:            5860533168
Pool:               data-0
PoolType:           RAID 5
PoolState:          5
PoolHostId:         2fe5cdf4
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    43
  Start/Stop Count:               3790
  Power-On Hours:                 30676
  Power Cycle Count:              34
  Load Cycle Count:               3770

Device:             sdb
Controller:         0
Channel:            3
Model:              WDC WD30EFRX-68EUZN0
Serial:             WD-WCC4N6EXPLT6
Firmware:           82.00A82W
Class:              SATA
RPM:                5400
Sectors:            5860533168
Pool:               data-0
PoolType:           RAID 5
PoolState:          5
PoolHostId:         2fe5cdf4
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    41
  Start/Stop Count:               3734
  Power-On Hours:                 30640
  Power Cycle Count:              37
  Load Cycle Count:               3716

mdstat.log

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] 
md1 : active raid10 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      1044480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      
md0 : active raid1 sdd1[6] sdc1[4] sda1[1] sdb1[5]
      4190208 blocks super 1.2 [4/4] [UUUU]
      
unused devices: <none>
/dev/md/0:
           Version : 1.2
     Creation Time : Sat Dec 24 09:59:59 2016
        Raid Level : raid1
        Array Size : 4190208 (4.00 GiB 4.29 GB)
     Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Wed Sep 16 11:27:42 2020
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : unknown

              Name : 2fe5cdf4:0  (local to host 2fe5cdf4)
              UUID : 0306a872:cd2ef930:e5f2473d:32708412
            Events : 649

    Number   Major   Minor   RaidDevice State
       4       8       33        0      active sync   /dev/sdc1
       1       8        1        1      active sync   /dev/sda1
       6       8       49        2      active sync   /dev/sdd1
       5       8       17        3      active sync   /dev/sdb1

As I see it, the box had a special moment and erroneously decided that two discs had been disconnected at the same time, hence failing the RAID array. The data is presumably all still there.

Thank you for your time and assistance!

StephenB · 2020-09-16

So the disks do have healthy SMART stats, but the array for the volume itself doesn't appear in mdstat.log. It only shows the OS partition (md0) and the swap partition (md1).
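A minimal sketch that lists the md arrays in an mdstat dump (either /proc/mdstat on the NAS or mdstat.log from the log zip) with their state and RAID level. On the log above it would print only md1 (raid10) and md0 (raid1); the array holding the data volume (md127, per the kernel log later in the thread) is absent. The function name is hypothetical.

```shell
#!/bin/sh
# List md arrays from an mdstat dump (/proc/mdstat or mdstat.log):
# name, state, and RAID level. Inactive arrays carry no level on their
# mdstat line, so "-" is printed instead. Reads $1, or standard input.
md_arrays() {
  awk '/^md[0-9]+ : / {
    level = "-"
    if ($3 == "active") {
      level = $4
      if (level ~ /^\(/) level = $5   # skip a flag such as "(auto-read-only)"
    }
    print $1, $3, level
  }' "${1:--}"
}
```

Usage: `md_arrays mdstat.log`, or `md_arrays /proc/mdstat` over SSH.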

What you might want to do next is send a private message (PM) to one of the mods (either @JohnCM_S or @Marc_V) asking them to analyze the logs. Upload your log zip file to cloud storage (Google Drive, Dropbox, etc.), and include a download link in the PM. Also send a link to this thread.

You send a PM using the envelope link in the upper right of the forum page. Note you shouldn't post a link to the log zip publicly.

Sandshark · 2020-09-16

Yes, I guess I confused your message with part of another one where data access was still available. Have you ever done a vertical expansion on this NAS? If so, you are also missing the data-1 RAID array, which should be part of the data BTRFS volume; that alone would prevent it from mounting. But rather than having you keep posting more log files, getting one of the mods to do an analysis will be a lot faster and probably more fruitful.

JohnCM_S · 2020-09-16

Hi dougal1983,

I checked the logs and saw that the volume cannot be mounted because there are not enough operational disks.

Sep 16 09:00:46 NAS kernel: md/raid:md127: device sda3 operational as raid disk 1
Sep 16 09:00:46 NAS kernel: md/raid:md127: device sdb3 operational as raid disk 3
Sep 16 09:00:46 NAS kernel: md/raid:md127: allocated 4362kB
Sep 16 09:00:46 NAS kernel: md/raid:md127: not enough operational devices (2/4 failed)
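The "2/4 failed" message follows from RAID 5 arithmetic: parity gives one disk's worth of redundancy, so a 4-disk array can start degraded with 3 members but not with 2. A minimal sketch that counts the "operational as raid disk" kernel lines for one array and applies that rule; the function name and argument order are hypothetical:

```shell
#!/bin/sh
# Count "operational as raid disk" kernel messages for one md array and
# apply the RAID 5 rule: the array can start (degraded) with n-1 of n members.
# Usage: raid5_startable ARRAY TOTAL_MEMBERS [LOGFILE]   (stdin if no file)
raid5_startable() {
  array=$1
  total=$2
  ops=$(grep -c "md/raid:$array: device .*operational as raid disk" "${3:--}")
  if [ "$ops" -ge $((total - 1)) ]; then
    echo "startable: $ops/$total members"
  else
    echo "not startable: $ops/$total members"
    return 1
  fi
}
```

For example, `raid5_startable md127 4 kernel.log` (file name hypothetical) would report `not startable: 2/4 members` for the lines above.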

I saw errors in bay 1 and bay 3.

Sep 16 09:00:47 NAS kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
Sep 16 09:00:47 NAS kernel: ata1: irq_stat 0x00000040, connection status changed
Sep 16 09:00:47 NAS kernel: ata1: SError: { CommWake DevExch }
Sep 16 09:00:47 NAS kernel: ata1: limiting SATA link speed to 1.5 Gbps
Sep 16 09:00:47 NAS kernel: ata1: hard resetting link

Sep 16 09:00:47 NAS kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen
Sep 16 09:00:47 NAS kernel: ata3: irq_stat 0x00000040, connection status changed
Sep 16 09:00:47 NAS kernel: ata3: SError: { CommWake DevExch }
Sep 16 09:00:47 NAS kernel: ata3: limiting SATA link speed to 1.5 Gbps
Sep 16 09:00:47 NAS kernel: ata3: hard resetting link
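A minimal sketch that pulls out the ATA ports reporting link resets or connection-status changes, as in the excerpt above. The mapping of ata1 and ata3 to bays 1 and 3 is taken from this thread and may differ on other models; the function name is hypothetical.

```shell
#!/bin/sh
# List ATA ports that logged a link reset or a "connection status changed"
# event in a kernel log. Reads $1, or standard input.
ata_link_problems() {
  grep -E 'ata[0-9]+: (hard resetting link|irq_stat .*connection status changed)' "${1:--}" \
    | sed -E 's/.*(ata[0-9]+):.*/\1/' \
    | sort -u
}
```

For the excerpt above it would print ata1 and ata3.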

It will be best to check the health of those disks using the disk manufacturer's tool.

Regards,

Retired_Member · 2020-09-18

Righto. All disks checked.

Disk 1: Failed SMART/extended tests. Bad sector repaired. (This one is going in the bin; I'll get a new drive.)
Disk 2: Passed SMART/extended tests.
Disk 3: Passed SMART/extended tests.
Disk 4: Passed SMART/extended tests.

Disk 3 did show an unknown status after I plugged it back in, but I gave the connectors a little puff of air and re-seated it, and it's OK for now (hopefully).

Is there a way to attempt to rebuild the volume from disks 2, 3 and 4? I'll add a new disc of the same capacity in bay 1. If that works, I'll perform a full backup in penance!

Thank you for any and all advice!

StephenB · 2020-09-18

You can try booting up the NAS again with slot 1 empty. Perhaps try read-only mode, for safety - see pages 74-75 here: https://www.downloads.netgear.com/files/GDC/READYNAS-100/ReadyNAS_%20OS6_Desktop_HM_EN.pdf

However, if writes were lost due to the errors, the volume still might not mount. You could try Netgear support (though if you need data recovery, it will be expensive) - https://kb.netgear.com/69/ReadyNAS-Data-Recovery-Diagnostics-Scope-of-Service

Retired_Member · 2020-09-19

@StephenB No, booting in safe mode with #1 removed didn't work. At one point the box decided that 2 discs had failed, and now it just sits on its hands. Discs 2, 3 and 4 are fine, although bay 3 might have an issue with the port. #1 had a bad sector, and the box picked a terrible time to get its knickers in a twist on port 3.

All I want to know is... can I ssh onto the box and tell it to proceed with a rebuild from 2, 3 and 4? At this point, I don't care if I lose the data in the attempt. All I want is one attempt, then I'll move on. If anyone can offer instructions (with an absolute disclaimer: I won't blame anyone but myself), I'd love to just attempt to recover the data.

Alternatively, if that is impossible on X-RAID, could you point me at DIY data recovery guides that may be relevant to X-RAID devices?

StephenB · 2020-09-19

I don't think the NAS retains any state on this, other than what btrfs itself retains.

What you need to do is forcibly mount the array, and perhaps also attempt to repair the file system. This isn't something I've ever needed to do, so I can't really offer much advice.

If you boot up in tech support mode, you could try:

# rnutil chroot
# btrfs device scan
# btrfs fi show
# mount -t btrfs -o ro,recovery /dev/md127 /data

If you just go in over SSH with the NAS running normally, then skip the rnutil command.

One user here successfully used that mount command to mount a missing volume. If it works, it will mount the volume read-only, so you'd need to offload the data.

Retired_Member · 2020-09-21

Removed disk 1, booted up in tech support mode, and did the following:

# rnutil chroot
# btrfs device scan
# btrfs fi show
# mount -t btrfs -o ro,recovery /dev/md127 /data

Restarted in normal mode (volume degraded, as there's no redundancy) and used RAIDar to browse and pull off the data. Data saved! Thank you very much!
