

bizmate
Tutor

Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting

As per many other posts, I am experiencing
"Remove inactive volumes to use the disk. Disk #1,2,3,4."

But I cannot find a post or a guide that gives better troubleshooting info. So far, apart from this error, I see:

 

(screenshot attached: ReadyNas_2024-12-13 at 19.54.30.png)

I have also downloaded the logs and made them available here: https://www.dropbox.com/scl/fi/l2alw1clyltwbxar08ru0/System_log-nas-bizmate-20241213-152433.zip?rlke...

 

In the logs I am not sure what I am looking for exactly, so it would be great if you could suggest what other steps I can take to find the root cause or how to resync this system.
It was all working a few nights ago, and now all the volumes are gone.

Please feel free to share a sequence of commands/steps; doing it over SSH is fine.

 

Message 1 of 8
StephenB
Guru

Re: Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting

There is some privacy loss when you post the log zip publicly, so I suggest that you delete the Dropbox upload.

 

BTW, your email alerts are misconfigured - something you should take care of. 

 

Disk 1 (sda, serial #PK1334PBH9DLVX) has failed.  The log is filled with errors like this:

Dec 13 14:42:52 nas-bizmate kernel: Buffer I/O error on dev sda, logical block 0, async page read
Dec 13 14:42:52 nas-bizmate kernel: sd 0:0:0:0: [sda] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Dec 13 14:42:52 nas-bizmate kernel: sd 0:0:0:0: [sda] tag#7 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Dec 13 14:42:52 nas-bizmate kernel: blk_update_request: I/O error, dev sda, sector 0

 

 

In addition, disk 2 is no longer in sync.  I can't tell from the logs how that happened, but the failure of disk 1 might have resulted in lost writes to the other disks (depending on the details of how the disk is behaving). 

[Fri Dec 13 15:14:27 2024] md: md127 stopped.
[Fri Dec 13 15:14:27 2024] md: bind<sda3>
[Fri Dec 13 15:14:27 2024] md: bind<sdb3>
[Fri Dec 13 15:14:27 2024] md: bind<sdd3>
[Fri Dec 13 15:14:27 2024] md: bind<sdc3>
[Fri Dec 13 15:14:27 2024] md: kicking non-fresh sdb3 from array!
[Fri Dec 13 15:14:27 2024] md: unbind<sdb3>
[Fri Dec 13 15:14:27 2024] md: export_rdev(sdb3)
[Fri Dec 13 15:14:27 2024] md: kicking non-fresh sda3 from array!
[Fri Dec 13 15:14:27 2024] md: unbind<sda3>
[Fri Dec 13 15:14:27 2024] md: export_rdev(sda3)
[Fri Dec 13 15:14:27 2024] md/raid:md127: device sdc3 operational as raid disk 2
[Fri Dec 13 15:14:27 2024] md/raid:md127: device sdd3 operational as raid disk 3
[Fri Dec 13 15:14:27 2024] md/raid:md127: allocated 4294kB
[Fri Dec 13 15:14:27 2024] md/raid:md127: not enough operational devices (2/4 failed)

 

If you have no backup of your data, then you have three options:

  1. You can power down, remove disk 1, and boot the NAS in tech support mode.  I can give you some Linux commands to force disk 2 into the RAID array.  There could be some data loss, due to the lost writes.  Making any mistakes with the commands could also result in data loss.
  2. You can also connect the disks to a Windows PC (using either SATA or USB docks/enclosures) and purchase a license to RAID recovery software.  ReclaiMe is a popular choice that many here have used with success.  If you look at other packages, make sure they support both software RAID and the BTRFS file system.
  3. You can contract with a data recovery service.

Option 1 is free, the others will cost.
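
If you want a quick read on the array state before deciding, you can inspect the RAID superblocks from any Linux system with the disks attached (or later from tech support mode).  The device names here are just examples and may differ on your system:

mdadm --examine /dev/sd[abcd]3 | grep -E "Update Time|Events|Array State"
cat /proc/mdstat

Disks whose event counts lag behind the others are the ones that were dropped from the array.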

 

Message 2 of 8
bizmate
Tutor

Re: Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting

@StephenB thank you for the reply. I removed the logs.
I mainly use Linux or Mac, and do not use Windows in general. If approaches 1 and 2 only risk losing recent writes, I think that is fine.

 

  • For approach 1 - would I SSH into the machine to run these commands whilst disk 1 is removed?
  • For approach 2 - the logs you are showing seem to come from md; does that mean that mdadm can be used, or is a BTRFS tool required for any kind of recovery on this system?

 

Even if the data is recovered, what is the full system recovery process?

- Assuming option 1 works, should I buy a new disk of the same brand/model, or what other aspects should I consider?

- If I remember correctly, with SMART I should be able to check the health of the disks to see how close they are to dying. I have a separate SATA dock where I can check each disk.

In the next few days I'll check the other disks I have as backup and see if they are enough to make a backup (I don't remember what I last backed up from the NAS; I need to write these things down), assuming I manage to recover some sort of access.

Message 3 of 8
StephenB
Guru

Re: Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting


@bizmate wrote:

 

  • For approach 1 - would I SSH into the machine to run these commands whilst disk 1 is removed?

 


You'd use the menu to boot the NAS in tech support mode.  Then you'd connect with telnet (not ssh), using a back-door password.  Macs don't have telnet built in, so you'd either need to get a third-party app for it, or use a Linux system.
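
For example, from a Linux box on the same network it is just (substitute your NAS's IP address - the one here is made up):

telnet 192.168.1.50

and then log in when prompted.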

 


@bizmate wrote:
  • For approach 2 - the logs you are showing seem to come from md, does that mean that mdadm can be used 

Yes.  The issue is that the RAID array won't assemble.  The "repair" is to have mdadm assemble it anyway using the  --really-force option.

 

There could also be BTRFS issues - no way to tell until the array is assembled and you try to mount it.
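
If the mount does succeed, one quick sanity check (just an example, adjust as needed) is to look for BTRFS complaints in the kernel log:

dmesg | grep -i btrfs | tail -20

Checksum or "parent transid verify failed" errors there would point to damage from the lost writes.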

 


@bizmate wrote:

If approaches 1 and 2 only risk losing recent writes, I think that is fine.


The writes could also corrupt BTRFS metadata/folder structures, which could result in quite a bit of data loss.

 

That is rare, but it can happen.

 


@bizmate wrote:

 

- Assuming option 1 works, should I buy a new disk of the same brand/model, or what other aspects should I consider?

 


 

The disk model goes back to 2014, so if you try to match it you'd end up with either very old inventory or a used disk.  I don't recommend that.

 

Ultrastars do have a good reputation, and you can replace it with a current HC310.

 

You can mix/match models.  You do need to avoid disks that use SMR (Shingled Magnetic Recording).  NAS-purposed drives include the Seagate IronWolf and WD Red Plus (but not the normal WD Reds, since they are SMR).  Any enterprise-class drive will work.

 

If you see a need for more space over the next couple of years, then get a pair of larger drives, and expand the volume while you are at it.

 


@bizmate wrote:

 

- If I remember correctly, with SMART I should be able to check the health of the disks to see how close they are to dying. I have a separate SATA dock where I can check each disk.


You can also run smartctl over ssh - running the long non-destructive self-test.
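
For example, for the first disk (repeat for each drive; the device name is just an example):

smartctl -t long /dev/sda     # start the long self-test; it can take several hours on a large drive
smartctl -a /dev/sda          # check the self-test log and attributes once it finishes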

 

After you get the volume remounted, you should look at the maintenance functions on the volume settings wheel.  They include disk test and scrub.  Scrub requires accessing every sector of the disk, so IMO it also doubles as a diagnostic.
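
You can also kick off a scrub from ssh with something like this (assuming the volume is mounted at /data):

btrfs scrub start /data
btrfs scrub status /data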

 

On my own NAS, I schedule one of these tasks every month (cycling through them all 3x a year).  I order them as 

  1. disk test
  2. balance
  3. scrub
  4. defrag

in order to fully exercise the disks every other month.

 

Although I do monitor SMART stats, I don't think they are a very good predictive indicator of how long a drive will last.  I look for signs that a disk needs immediate replacement.  Also keep in mind that drives can and do fail with no warning (and no SMART errors). 
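
For example, the reallocated and pending sector counts can be pulled out with something like:

smartctl -A /dev/sda | grep -E "Reallocated_Sector|Current_Pending|Offline_Uncorrectable"

Non-zero values that keep growing there are a good reason to swap the disk.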

 

The net here is that the only way to keep data safe is to have a solid backup plan.

 

 

Message 4 of 8
bizmate
Tutor

Re: Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting

I am waiting for some time to check a spare disk I used as an extra backup before trying the resync, also because I cannot find my docking station.

On my Mac, brew install telnet worked:
----

$ which telnet

/usr/local/bin/telnet

----

 

About the RAID not re-assembling, is the command required to fix it just the following:

 

 mdadm   --really-force

 

Is this the only command required? If there is anything else you can add, please let me know.
I.e.:
1 - Remove disk 1.
2 - Boot in tech support mode; is this something done during the boot sequence? How do I get this done?
3 - Any mdadm or other commands to execute here before the resync, e.g. "lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT"?
4 - What would the full resync command be?

 

About all the suggestions you gave, I'll try to learn more. About replacing the drives with new ones, this one https://fastclick.mu/toshiba-4tb-s300-surveillance-7200-rpm-128-mb-buffer/ would be an affordable option, as Amazon or similar are not available here ... 

Message 5 of 8
StephenB
Guru

Re: Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting


@bizmate wrote:


2 - Boot in tech support mode; is this something done during the boot sequence? How do I get this done?

 


See pages 28-29 in the hardware manual here:

 

@bizmate wrote:


3 - Any mdadm or other commands to execute here before the resync, e.g. "lsblk -o NAME,SIZE,FSTYPE,TYPE,MOUNTPOINT"?

 


To be clear, you aren't resyncing - that cannot be done with one disk removed.

 

When you connect to the NAS with telnet, log in as root.  The password is infr8ntdebug.

 

Enter these commands:

rnutil chroot
mdadm --assemble --really-force /dev/md127 /dev/sda3 /dev/sdb3 /dev/sdc3
btrfs device scan
mount /dev/md127 /data
ls /data

 

The ls command should include your shares if everything works.

 

If you see errors from any of these commands, stop at that point and let us know what they are.

 

 

Otherwise, power down the NAS, and reboot.  The volume should then mount (but will be degraded due to the removed disk).
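
You can also confirm the state over ssh with something like:

mdadm --detail /dev/md127
cat /proc/mdstat

mdadm --detail should report the array as "clean, degraded" with one slot listed as removed.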

 


@bizmate wrote:


4 - What would the full resync command be?

 


Again, you cannot resync with a missing disk.  When you get the replacement you can hot-insert it into the empty slot (with the NAS running).  The NAS will do a brief disk test, and should then add the disk to the array.
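
If you want to watch the rebuild from ssh, the progress (percentage and estimated finish time) shows up in:

cat /proc/mdstat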

 


@bizmate wrote:

 

About all the suggestions you gave, I'll try to learn more. About replacing the drives with new ones, this one https://fastclick.mu/toshiba-4tb-s300-surveillance-7200-rpm-128-mb-buffer/ would be an affordable option, as Amazon or similar are not available here ... 


The S300 is a surveillance drive - optimized for sustained write performance over read.  It will work in a NAS, but I suggest the N300 instead.

Message 6 of 8
Sandshark
Sensei

Re: Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting

Before you add a new drive to re-sync, you should ponder these points:

 

Are your other drives just as old as the one that failed?  It's always best to back up all your data before a re-sync, but it's more important the older the drives are.  Re-sync puts a lot of stress on the old and the new.  Are they so old that you should consider replacing them all, or do they have any SMART errors?

 

Is just a re-sync what you want to do?  You are forcing the mdadm RAID assembly, which likely means there are errors on the "good" drives.  Would a factory default and restore of your data from that backup be better?  (It usually is, but it does take time.)  If you do think all the drives should be replaced, then that's all the more reason to just start over with the new drives instead of swapping them in one at a time.

Message 7 of 8
StephenB
Guru

Re: Remove inactive volumes to use the disk. Disk #1,2,3,4. troubleshooting


@Sandshark wrote:

 

Are your other drives just as old as the one that failed? 


The drives are all HGST Ultrastars: 10 years old, with about 56,000 power-on hours (6.4 years).  Disk 2 (serial PK2334PBHED7KR) showed a high spin retry count last April, but nothing concerning after that.  

 

It is quite possible that other disks will need to be replaced in the near future.

 

@bizmate - Backing up the data before adding the new disk would be a good idea.  If another disk does fail during the resync process, then you would lose all your data.
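
If you have a USB disk big enough, one simple way to do that over ssh is an rsync copy of the mounted volume (the destination path below is just an example - use wherever the NAS mounts your USB share):

rsync -a /data/ /media/USB_backup/

The backup jobs in the admin UI can do the same thing on a schedule.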

 


@Sandshark wrote:

 

Is just a re-sync what you want to do?  You are forcing the mdadm RAID assembly, which likely means there are errors on the "good" drives.  Would a factory default and restore of your data from that backup be better?  (It usually is, but it does take time.)   


As I mentioned above, there will be some data corruption due to the lost writes, possibly of BTRFS metadata/structures.  Rebuilding the volume from scratch and restoring the files from backup would give you a completely clean file system.  But if you don't have that backup already, then there is no way to fix the corrupted files.

Message 8 of 8