

Westyfield2
Tutor

No Volume Exists - Remove inactive volumes in order to use the disk

Hi,

 

Running v6.9.3

After a reboot the NAS came up saying "No Volume Exists".

Have done another couple of shutdowns and startups and it's now saying "Remove inactive volumes to use the disk. Disk #1,2,3,4,5,6,2,3,6,2,3."

 

Can anyone advise on how best to approach this?

Had a quick look and none of the six disks are showing ATA Errors.

 

Happy to provide log files.

Thanks.

Message 1 of 19
JohnCM_S
NETGEAR Employee Retired

Re: No Volume Exists - Remove inactive volumes in order to use the disk

Hi Westyfield2,

 

You may upload the logs to Google Drive, then send me the download link via PM (private message). We will check what caused the error message.

 

Regards,

Message 2 of 19
Westyfield2
Tutor

Re: No Volume Exists - Remove inactive volumes in order to use the disk

PM sent

Message 3 of 19
Westyfield2
Tutor

Re: No Volume Exists - Remove inactive volumes in order to use the disk

So the disks look fine (all healthy WD Enterprise drives)

 

Device:             sda
Controller:         0
Channel:            0
Model:              WDC WD2003FYYS-02W0B0
Serial:             WD-WMAY01159942
Firmware:           01.01D01
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          5
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                1
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    35
  Start/Stop Count:               4740
  Power-On Hours:                 63000
  Power Cycle Count:              64
  Load Cycle Count:               4716

Device:             sdb
Controller:         0
Channel:            1
Model:              WDC WD6002FRYZ-01WD5B0
Serial:             NCHBG8GS
Firmware:           01.01M02
Class:              SATA
RPM:                7200
Sectors:            11721045168
Pool:               data
PoolType:           RAID 5
PoolState:          5
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    35
  Start/Stop Count:               29
  Power-On Hours:                 20494
  Power Cycle Count:              29
  Load Cycle Count:               866

Device:             sdc
Controller:         0
Channel:            2
Model:              WDC WD6002FRYZ-01WD5B0
Serial:             NCGWTDVV
Firmware:           01.01M02
Class:              SATA
RPM:                7200
Sectors:            11721045168
Pool:               data
PoolType:           RAID 5
PoolState:          5
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    35
  Start/Stop Count:               32
  Power-On Hours:                 22223
  Power Cycle Count:              32
  Load Cycle Count:               938

Device:             sdd
Controller:         0
Channel:            3
Model:              WDC WD2003FYYS-02W0B1
Serial:             WD-WMAY04428148
Firmware:           01.01D02
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          5
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    37
  Start/Stop Count:               4759
  Power-On Hours:                 60625
  Power Cycle Count:              55
  Load Cycle Count:               4736

Device:             sde
Controller:         0
Channel:            4
Model:              WDC WD2003FYYS-02W0B1
Serial:             WD-WMAY04905430
Firmware:           01.01D02
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          5
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    36
  Start/Stop Count:               4318
  Power-On Hours:                 56959
  Power Cycle Count:              53
  Load Cycle Count:               4290

Device:             sdf
Controller:         0
Channel:            5
Model:              WDC WD4000F9YZ-09N20L0
Serial:             WD-WCC131766520
Firmware:           01.01A01
Class:              SATA
RPM:                7200
Sectors:            7814037168
Pool:               data
PoolType:           RAID 5
PoolState:          5
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    32
  Start/Stop Count:               1775
  Power-On Hours:                 43200
  Power Cycle Count:              36
  Load Cycle Count:               1754

From memory I think it probably had about 1.5TB free space when it failed.

 

But it looks like the transactions are slightly out of sync.

[Wed Apr 24 19:31:56 2019] md125: detected capacity change from 0 to 4000511819776
[Wed Apr 24 19:31:56 2019] BTRFS: device label 33eadf27:data devid 2 transid 1187337 /dev/md125
[Wed Apr 24 19:31:56 2019] BTRFS info (device md125): has skinny extents
[Wed Apr 24 19:32:15 2019] BTRFS error (device md125): parent transid verify failed on 9209338265600 wanted 1187338 found 1187336
[Wed Apr 24 19:32:15 2019] BTRFS error (device md125): parent transid verify failed on 9209338265600 wanted 1187338 found 1187336
[Wed Apr 24 19:32:15 2019] BTRFS warning (device md125): failed to read log tree
[Wed Apr 24 19:32:15 2019] BTRFS error (device md125): open_ctree failed

 

 

Message 4 of 19
JohnCM_S
NETGEAR Employee Retired

Re: No Volume Exists - Remove inactive volumes in order to use the disk

Hi Westyfield2,

 

I have checked the logs. It is correct that the transactions are out of sync.

 

It says that the volume at this specific location expected to complete a specific transaction (the "wanted" value) but found that it was still on a previous transaction.

 

Normally you would be able to contact support so they could fix this, but they will not be able to assist you here because your NAS is a legacy x86 NAS converted to OS6. Hopefully, other Community members can chime in and help you with this.

 

Regards,

Message 5 of 19
Sandshark
Sensei

Re: No Volume Exists - Remove inactive volumes in order to use the disk

What are the results of lsblk?

Message 6 of 19
Westyfield2
Tutor

Re: No Volume Exists - Remove inactive volumes in order to use the disk

admin@NAS:/$ lsblk
NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda         8:0    0  1.8T  0 disk
├─sda1      8:1    0    4G  0 part
│ └─md0     9:0    0    4G  0 raid1 /
├─sda2      8:2    0  512M  0 part
│ └─md1     9:1    0    2G  0 raid6 [SWAP]
└─sda3      8:3    0  1.8T  0 part
  └─md126   9:126  0  9.1T  0 raid5
sdb         8:16   0  5.5T  0 disk
├─sdb1      8:17   0    4G  0 part
│ └─md0     9:0    0    4G  0 raid1 /
├─sdb2      8:18   0  512M  0 part
│ └─md1     9:1    0    2G  0 raid6 [SWAP]
├─sdb3      8:19   0  1.8T  0 part
│ └─md126   9:126  0  9.1T  0 raid5
├─sdb4      8:20   0  1.8T  0 part
│ └─md125   9:125  0  3.7T  0 raid5
└─sdb5      8:21   0  1.8T  0 part
  └─md127   9:127  0  1.8T  0 raid1
sdc         8:32   0  5.5T  0 disk
├─sdc1      8:33   0    4G  0 part
│ └─md0     9:0    0    4G  0 raid1 /
├─sdc2      8:34   0  512M  0 part
│ └─md1     9:1    0    2G  0 raid6 [SWAP]
├─sdc3      8:35   0  1.8T  0 part
│ └─md126   9:126  0  9.1T  0 raid5
├─sdc4      8:36   0  1.8T  0 part
│ └─md125   9:125  0  3.7T  0 raid5
└─sdc5      8:37   0  1.8T  0 part
  └─md127   9:127  0  1.8T  0 raid1
sdd         8:48   0  1.8T  0 disk
├─sdd1      8:49   0    4G  0 part
│ └─md0     9:0    0    4G  0 raid1 /
├─sdd2      8:50   0  512M  0 part
│ └─md1     9:1    0    2G  0 raid6 [SWAP]
└─sdd3      8:51   0  1.8T  0 part
  └─md126   9:126  0  9.1T  0 raid5
sde         8:64   0  1.8T  0 disk
├─sde1      8:65   0    4G  0 part
│ └─md0     9:0    0    4G  0 raid1 /
├─sde2      8:66   0  512M  0 part
│ └─md1     9:1    0    2G  0 raid6 [SWAP]
└─sde3      8:67   0  1.8T  0 part
  └─md126   9:126  0  9.1T  0 raid5
sdf         8:80   0  3.7T  0 disk
├─sdf1      8:81   0    4G  0 part
│ └─md0     9:0    0    4G  0 raid1 /
├─sdf2      8:82   0  512M  0 part
│ └─md1     9:1    0    2G  0 raid6 [SWAP]
├─sdf3      8:83   0  1.8T  0 part
│ └─md126   9:126  0  9.1T  0 raid5
└─sdf4      8:84   0  1.8T  0 part
  └─md125   9:125  0  3.7T  0 raid5
Message 7 of 19
Sandshark
Sensei

Re: No Volume Exists - Remove inactive volumes in order to use the disk

OK, I just recently went through this exercise.  See How-to-recover-from-Remove-inactive-volumes-error.

 

I see your drives, in order of bays 1-6, are 2TB, 6TB, 6TB, 2TB, 2TB, 3TB.  These form three RAID layers: md127, md126, and md125.  All the RAIDs seem intact, but the BTRFS file system didn't mount.  If this differs from what you believe you have, then the rest of this is not going to work.

 

Try using cat /proc/mdstat to verify all the RAID layers are healthy.  If any are re-syncing, let them finish.  Next, mdadm --detail /dev/md127 (and md126 and md125) will show you (among other things) the array names, which should be an 8-digit hex host ID followed by a colon and data-0, data-1, and data-2, assuming a single, standard XRAID volume named "data".  Also do a cat /etc/fstab to see if all your data volumes (which I assume is just the one) are listed.  It should look something like this:

LABEL=43f6464e:data /data btrfs defaults 0 0

If all looks right, then mount --all should be all you need.  

 

If the RAID layers don't have the right names, then you are going to have to re-assemble them with the right ones.  I've not gone that far.  If fstab doesn't include your data volume, you're going to have to add it.  That 8-digit hex code in my example is the host ID again.

 

Let us know what works, or post anything that looks out of kilter from the commands I listed.
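
 

To put those all in one place (run as root over SSH; md125, md126, and md127 are the device names on this unit per the lsblk output above, so substitute whatever yours shows):

# cat /proc/mdstat               # all arrays present, none re-syncing
# mdadm --detail /dev/md127      # Name field should be <hostid>:data-N
# mdadm --detail /dev/md126
# mdadm --detail /dev/md125
# cat /etc/fstab                 # your data volume should be listed
# mount --all                    # only once everything above looks right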

Message 8 of 19
StephenB
Guru

Re: No Volume Exists - Remove inactive volumes in order to use the disk


@Sandshark wrote:

I see your drives, in order of bays 1-6, are 2TB, 6TB, 6TB, 2TB, 2TB, 3TB.

FWIW, Drive 6 is 4 TB (3.7 TiB).

Message 9 of 19
Westyfield2
Tutor

Re: No Volume Exists - Remove inactive volumes in order to use the disk

Looks like md125 isn't happy.

 

cat  /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md125 : active raid5 sdc4[0] sdb4[2] sdf4[1]
      3906749824 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

md126 : active raid5 sda3[0] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[6]
      9743313920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]

md127 : active raid1 sdc5[0] sdb5[1]
      1953372928 blocks super 1.2 [2/2] [UU]

md1 : active raid6 sda2[0] sdf2[5] sde2[4] sdd2[3] sdc2[2] sdb2[1]
      2093056 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/6] [UUUUUU]

md0 : active raid1 sda1[0] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[6]
      4190208 blocks super 1.2 [6/6] [UUUUUU]

unused devices: <none>

 

 

mdadm  --detail  /dev/md127
/dev/md127:
        Version : 1.2
  Creation Time : Fri Dec 16 23:36:47 2016
     Raid Level : raid1
     Array Size : 1953372928 (1862.88 GiB 2000.25 GB)
  Used Dev Size : 1953372928 (1862.88 GiB 2000.25 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Apr 15 20:35:57 2019
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : 33eadf27:data-2  (local to host 33eadf27)
           UUID : a4343df5:9ab3de78:5fa43f83:8f0c56b5
         Events : 232

    Number   Major   Minor   RaidDevice State
       0       8       37        0      active sync   /dev/sdc5
       1       8       21        1      active sync   /dev/sdb5

 

 

mdadm  --detail  /dev/md126
/dev/md126:
        Version : 1.2
  Creation Time : Wed Oct  5 18:01:25 2016
     Raid Level : raid5
     Array Size : 9743313920 (9291.95 GiB 9977.15 GB)
  Used Dev Size : 1948662784 (1858.39 GiB 1995.43 GB)
   Raid Devices : 6
  Total Devices : 6
    Persistence : Superblock is persistent

    Update Time : Fri Apr 26 16:59:04 2019
          State : clean
 Active Devices : 6
Working Devices : 6
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : 33eadf27:data-0  (local to host 33eadf27)
           UUID : 64c2c473:9f377754:501cbcc3:bf5a752e
         Events : 1232

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       6       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3
       5       8       83        5      active sync   /dev/sdf3

 

 

mdadm  --detail  /dev/md125
/dev/md125:
        Version : 1.2
  Creation Time : Wed Oct  5 18:02:22 2016
     Raid Level : raid5
     Array Size : 3906749824 (3725.77 GiB 4000.51 GB)
  Used Dev Size : 1953374912 (1862.88 GiB 2000.26 GB)
   Raid Devices : 3
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Fri Apr 26 16:59:04 2019
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           Name : 33eadf27:data-1  (local to host 33eadf27)
           UUID : 699ba84d:86b18c86:edf75682:8aa9843a
         Events : 5574

    Number   Major   Minor   RaidDevice State
       0       8       36        0      active sync   /dev/sdc4
       1       8       84        1      active sync   /dev/sdf4
       2       8       20        2      active sync   /dev/sdb4

 

 

cat  /etc/fstab
LABEL=33eadf27:data /data btrfs defaults 0 0

 

 

mount --all
mount: /dev/md125: can't read superblock 

 

Message 10 of 19
Sandshark
Sensei

Re: No Volume Exists - Remove inactive volumes in order to use the disk

You could try stopping and re-assembling md125, but I fear it will do no good.  Maybe try booting (read-only would be best to try first) with each set of 5 disks, leaving out one of the md125 members (the 3TB and the 6TBs) in turn.
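
 

Purely as a sketch of what that stop/re-assemble would look like (member partitions taken from your mdstat output above; note it is --assemble, which only re-reads the existing array metadata, never --create):

# mdadm --stop /dev/md125
# mdadm --assemble /dev/md125 /dev/sdb4 /dev/sdc4 /dev/sdf4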

Message 11 of 19
Hopchen
Prodigy

Re: No Volume Exists - Remove inactive volumes in order to use the disk

@Westyfield2 

 

Would you mind sending me a PM with the link to the logs? I will take a look for you.

 

 

Cheers

Message 12 of 19
Hopchen
Prodigy

Re: No Volume Exists - Remove inactive volumes in order to use the disk

Hi again

 

The raids themselves are not the issue; all of them are running fine. As you are using different-sized disks, the NAS partitions them up and makes several raids accordingly, and the filesystem then sticks these raids together into a single volume. As you can see, volume "data" expects 3 devices (your 3 raids).

Label: '33eadf27:data'  uuid: 52342247-97af-41fa-b85c-a717370cd941
	Total devices 3 FS bytes used 13.11TiB
	devid    1 size 9.07TiB used 8.64TiB path /dev/md126
	devid    2 size 3.64TiB used 3.21TiB path /dev/md125
	devid    3 size 1.82TiB used 1.39TiB path /dev/md127
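
For reference, you can reproduce that listing yourself on the NAS (as root over SSH) with the filesystem show subcommand:

# btrfs filesystem show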

 

What is actually happening here is that BTRFS makes a JBOD-style volume out of your 3 raids. This also means that each raid carries part of the filesystem, and BTRFS requires all 3 of them to be active in the volume in order to mount the overall data volume.

 

As you discovered yourself, md125 has some issues with its filesystem. As a result, the entire volume "data" cannot mount and you see the "inactive volumes" error message.

[Wed Apr 24 19:32:15 2019] BTRFS error (device md125): parent transid verify failed on 9209338265600 wanted 1187338 found 1187336
[Wed Apr 24 19:32:15 2019] BTRFS error (device md125): parent transid verify failed on 9209338265600 wanted 1187338 found 1187336
[Wed Apr 24 19:32:15 2019] BTRFS warning (device md125): failed to read log tree
[Wed Apr 24 19:32:15 2019] BTRFS error (device md125): open_ctree failed

 

The problem with the filesystem on md125 is an inconsistency between the filesystem and the filesystem journal. These types of issues can usually be fixed by simply clearing the filesystem journal (the log tree) on the affected device.

 

Here are some links that might be of interest:

https://btrfs.wiki.kernel.org/index.php/Btrfs-zero-log

https://askubuntu.com/questions/157917/how-do-i-recover-a-btrfs-partition-that-will-not-mount

 

Do a btrfs-zero-log on /dev/md125 and (gracefully) reboot the NAS.
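
 

Depending on how old the btrfs-progs in your firmware are, the command is spelled one of two ways; both do the same thing and should be run with the volume unmounted (which it already is in your case):

# btrfs-zero-log /dev/md125
# btrfs rescue zero-log /dev/md125     (newer btrfs-progs)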

 

Furthermore, be careful not to fill the NAS too much. As with other filesystems, BTRFS does not like being nearly full, and it seems your volume was very full. I recommend leaving a good 10% free space at all times.

[19/04/13 08:45:15 WEST] info:snapshot:LOGMSG_SNAPSHOT_MONITOR_INFO Snapshot 2019_04_11__00_00_04 was deleted from share or LUN AV due to low free space on the volume.
[19/04/13 08:46:15 WEST] info:snapshot:LOGMSG_SNAPSHOT_MONITOR_INFO Snapshot 2019_04_11__00_00_04 was deleted from share or LUN Media due to low free space on the volume.
[19/04/13 08:47:15 WEST] info:snapshot:LOGMSG_SNAPSHOT_MONITOR_INFO Snapshot 2019_04_11__00_00_04 was deleted from share or LUN Files due to low free space on the volume.

 

 

Cheers

 

Message 13 of 19
Westyfield2
Tutor

Re: No Volume Exists - Remove inactive volumes in order to use the disk

That looks to have done it, thanks.

 

btrfs-zero-log /dev/md125
WARNING: this utility is deprecated, please use 'btrfs rescue zero-log'

parent transid verify failed on 9209338265600 wanted 1187338 found 1187336
parent transid verify failed on 9209338265600 wanted 1187338 found 1187336
parent transid verify failed on 9209338265600 wanted 1187338 found 1187336
Clearing log on /dev/md125, previous log_root 9209338265600, level 0
root@NAS:~#

Taking your point about capacity, this is what I'm currently on:

13.12TB used of 14.53TB (1.41TB free)

 

Is there anything I should do to check it's fine before I expand with a bigger drive?

Similarly I've realised that defrag/scrub/balance aren't on a schedule, but how can I check it's safe before I run them?

Message 14 of 19
Hopchen
Prodigy

Re: No Volume Exists - Remove inactive volumes in order to use the disk

Hi again

 

Glad to hear your volume is back online.

 

As for the free space: your volume is fairly large, so the amount of free space you have reported now is decent enough. I would be wary of going much below 1TiB free on a volume this size.

 

I agree that you should do a full check on the health of the NAS at this point. I would recommend the following actions, in order:

 

1. Take a full backup if you do not have one already. This is the most important step.

 

2. Get the current disk stats by running this command:

# get_disk_info

You can also get the disk stats from the logs if you like: disk_info.log

Take note of what the disk stats look like here.

 

3. Run a full disk health check from the GUI. It will take many hours to complete, but it checks the disks thoroughly. Afterwards, get the disk stats again and compare them (a minimal way to capture and compare the stats is sketched after this list). Any disk that shows an increase in errors, or new errors, should be replaced first.

 

4. Run a volume Scrub from the GUI to give the filesystem a full check. It will take a long time to complete as well.

 

5. Replace with larger disks if you like. Replace one at a time and wait for the sync to complete before moving to the next one.
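
 

In case it helps, a minimal way to do the before/after capture from steps 2 and 3 looks like this (the file names are just examples, not anything the NAS creates for you):

# get_disk_info > /root/disk_stats_before.txt
  (run the full disk test from the GUI and wait for it to finish, then)
# get_disk_info > /root/disk_stats_after.txt
# diff /root/disk_stats_before.txt /root/disk_stats_after.txt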

 

As for balance and defrag: I think a monthly schedule for both is fine. I would also recommend doing a disk health check every quarter, and a scrub every 6 months or so. If your disk health is good and the initial scrub completes fine, then you should be good to run scheduled tasks going forward.

 

 

Any questions, let me know - Cheers

Message 15 of 19
StephenB
Guru

Re: No Volume Exists - Remove inactive volumes in order to use the disk

@Hopchen wrote:

 

2. Get output of current disk stats by running command:

# get_disk_info

You can also get the disk stats from the logs if you like: disk_info.log

 


There's some additional information if you run 

# smartctl -x /dev/sdX

using the actual disk (/dev/sda, etc).
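
If you want to sweep all the disks in one go, a quick loop works (sda through sdf here, matching this thread; adjust for your own unit):

# for d in /dev/sd[a-f]; do echo "== $d =="; smartctl -x "$d"; done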

 

In particular there is an error log for disk commands.  I've been seeing some issues with one of my disks - volume reads sometimes time out (for instance, media playback freezes).  Nothing shows up in disk_info.log.  But when I use smartctl -x I am seeing stuff like this:

Error 324 [11] occurred at disk power-on lifetime: 21495 hours (895 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 57 f2 38 40 00  Error: UNC at LBA = 0x0057f238 = 5763640

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 48 00 70 00 00 00 57 f2 08 40 08 18d+03:42:08.241  READ FPDMA QUEUED
  60 00 b0 00 68 00 00 00 65 16 30 40 08 18d+03:42:08.236  READ FPDMA QUEUED
  60 00 20 00 60 00 00 00 35 13 a0 40 08 18d+03:42:08.232  READ FPDMA QUEUED
  60 02 30 00 58 00 00 00 64 f4 c8 40 08 18d+03:42:08.232  READ FPDMA QUEUED
  60 00 10 00 50 00 00 00 64 f1 e0 40 08 18d+03:42:08.210  READ FPDMA QUEUED

Though I'd have expected a UNC to increase either the reallocated or the pending sector count, that doesn't seem to be happening on this particular disk.  FWIW, kernel.log shows something like this when these errors occur:

 

Apr 20 05:27:28 NAS kernel: ata2.00: exception Emask 0x0 SAct 0xe00000 SErr 0x0 action 0x0
Apr 20 05:27:28 NAS kernel: ata2.00: irq_stat 0x40000008
Apr 20 05:27:28 NAS kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 20 05:27:28 NAS kernel: ata2.00: cmd 60/80:a8:c0:f3:3f/01:00:f5:01:00/40 tag 21 ncq 196608 in
                                     res 41/40:00:28:f4:3f/00:00:f5:01:00/00 Emask 0x409 (media error) <F>
Apr 20 05:27:28 NAS kernel: ata2.00: status: { DRDY ERR }
Apr 20 05:27:28 NAS kernel: ata2.00: error: { UNC }
Apr 20 05:27:28 NAS kernel: ata2.00: configured for UDMA/133
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 Sense Key : Medium Error [current] [descriptor] 
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 CDB: Read(16) 88 00 00 00 00 01 f5 3f f3 c0 00 00 01 80 00 00
Apr 20 05:27:28 NAS kernel: blk_update_request: I/O error, dev sdb, sector 8409576488
Apr 20 05:27:28 NAS kernel: ata2: EH complete

So the lesson here is that disk issues might not show up in the SMART stats.

 

FWIW, I will be replacing the disk shortly (as soon as I finish testing the replacement).  Then I'll test it more extensively with Lifeguard.

 


@Hopchen wrote:

 

As for balance and defrag. I think a schedule of monthly defrag and balance is fine. I would also recommend doing a disk health check every quarter. A scrub is enough to do every 6 months or so. If your disk health is good and the initial scrub completes fine then you should be good to run scheduled tasks going forward.

 


There's no one right answer on this.  Personally I do each test once a quarter.  Though if a lot of files are changing on your NAS, then I think it does make sense to increase the frequency of balance.  Defrag with btrfs is a mixed blessing - while reducing fragmentation will increase transfer speed on the main shares, it also increases the amount of space used for snapshots.  So there is a tradeoff there that you should be mindful of.

 

FWIW, the scrub also functions as a disk test, since everything stored on the data volume is read as part of the test.

 

Message 16 of 19
Hopchen
Prodigy

Re: No Volume Exists - Remove inactive volumes in order to use the disk

Yup, all good points from @StephenB as well 🙂

Message 17 of 19
Westyfield2
Tutor

Re: No Volume Exists - Remove inactive volumes in order to use the disk

Disk tests passed fine; they only incremented the power-on hours, and the disk stats now show a self-test "Extended offline - Completed without error".

 

sda WDC WD2003FYYS-02W0B0 WD-WMAY01159942 is the only drive to have a single ATA error:

 

get_disk_info
Device:             sda
Controller:         0
Channel:            0
Model:              WDC WD2003FYYS-02W0B0
Serial:             WD-WMAY01159942
Firmware:           01.01D01
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                1
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    44
  Start/Stop Count:               4743
  Power-On Hours:                 63044
  Power Cycle Count:              67
  Load Cycle Count:               4719

Device:             sdb
Controller:         0
Channel:            1
Model:              WDC WD6002FRYZ-01WD5B0
Serial:             NCHBG8GS
Firmware:           01.01M02
Class:              SATA
RPM:                7200
Sectors:            11721045168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    48
  Start/Stop Count:               32
  Power-On Hours:                 20538
  Power Cycle Count:              32
  Load Cycle Count:               870

Device:             sdc
Controller:         0
Channel:            2
Model:              WDC WD6002FRYZ-01WD5B0
Serial:             NCGWTDVV
Firmware:           01.01M02
Class:              SATA
RPM:                7200
Sectors:            11721045168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    47
  Start/Stop Count:               35
  Power-On Hours:                 22267
  Power Cycle Count:              35
  Load Cycle Count:               942

Device:             sdd
Controller:         0
Channel:            3
Model:              WDC WD2003FYYS-02W0B1
Serial:             WD-WMAY04428148
Firmware:           01.01D02
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    47
  Start/Stop Count:               4762
  Power-On Hours:                 60669
  Power Cycle Count:              58
  Load Cycle Count:               4739

Device:             sde
Controller:         0
Channel:            4
Model:              WDC WD2003FYYS-02W0B1
Serial:             WD-WMAY04905430
Firmware:           01.01D02
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    49
  Start/Stop Count:               4321
  Power-On Hours:                 57003
  Power Cycle Count:              56
  Load Cycle Count:               4293

Device:             sdf
Controller:         0
Channel:            5
Model:              WDC WD4000F9YZ-09N20L0
Serial:             WD-WCC131766520
Firmware:           01.01A01
Class:              SATA
RPM:                7200
Sectors:            7814037168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    44
  Start/Stop Count:               1778
  Power-On Hours:                 43245
  Power Cycle Count:              39
  Load Cycle Count:               1757
root@NAS:~# smartctl -x /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.116.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4
Device Model:     WDC WD2003FYYS-02W0B0
Serial Number:    WD-WMAY01159942
LU WWN Device Id: 5 0014ee 656594009
Firmware Version: 01.01D01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Apr 29 07:29:35 2019 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 128
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (29700) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 302) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   253   253   021    -    8691
  4 Start_Stop_Count        -O--CK   096   096   000    -    4743
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   014   014   000    -    63044
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    67
192 Power-Off_Retract_Count -O--CK   200   200   000    -    23
193 Load_Cycle_Count        -O--CK   199   199   000    -    4719
194 Temperature_Celsius     -O---K   108   097   000    -    44
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb5  GPL,SL  VS       1  Device vendor specific log
0xb6       GPL     VS       1  Device vendor specific log
0xb7       GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      24  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 40806 hours (1700 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 03 b8 00 00 33 d8 ea c0 40 00  Error: IDNF at LBA = 0x33d8eac0 = 869853888

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 03 b8 00 70 00 00 33 d9 dc c8 40 08 46d+17:24:57.544  READ FPDMA QUEUED
  60 02 98 00 68 00 00 33 d9 da 30 40 08 46d+17:24:57.544  READ FPDMA QUEUED
  60 00 c8 00 60 00 00 33 d9 d9 68 40 08 46d+17:24:57.543  READ FPDMA QUEUED
  61 00 80 00 58 00 00 33 d9 41 c0 40 08 46d+17:24:57.542  WRITE FPDMA QUEUED
  61 00 80 00 50 00 00 33 d9 3e c0 40 08 46d+17:24:57.542  WRITE FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     63031         -
# 2  Short offline       Completed without error       00%         5         -
# 3  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    44 Celsius
Power Cycle Min/Max Temperature:     26/46 Celsius
Lifetime    Min/Max Temperature:     26/55 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (228)

Index    Estimated Time   Temperature Celsius
 229    2019-04-28 23:32    42  ***********************
 ...    ..( 27 skipped).    ..  ***********************
 257    2019-04-29 00:00    42  ***********************
 258    2019-04-29 00:01    43  ************************
 ...    ..(142 skipped).    ..  ************************
 401    2019-04-29 02:24    43  ************************
 402    2019-04-29 02:25    44  *************************
 ...    ..(  7 skipped).    ..  *************************
 410    2019-04-29 02:33    44  *************************
 411    2019-04-29 02:34    45  **************************
 ...    ..( 37 skipped).    ..  **************************
 449    2019-04-29 03:12    45  **************************
 450    2019-04-29 03:13    44  *************************
 ...    ..( 22 skipped).    ..  *************************
 473    2019-04-29 03:36    44  *************************
 474    2019-04-29 03:37    43  ************************
 ...    ..( 57 skipped).    ..  ************************
  54    2019-04-29 04:35    43  ************************
  55    2019-04-29 04:36    44  *************************
 ...    ..( 71 skipped).    ..  *************************
 127    2019-04-29 05:48    44  *************************
 128    2019-04-29 05:49    43  ************************
 ...    ..( 43 skipped).    ..  ************************
 172    2019-04-29 06:33    43  ************************
 173    2019-04-29 06:34    44  *************************
 ...    ..( 36 skipped).    ..  *************************
 210    2019-04-29 07:11    44  *************************
 211    2019-04-29 07:12    42  ***********************
 ...    ..( 16 skipped).    ..  ***********************
 228    2019-04-29 07:29    42  ***********************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x000a  2           10  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x8000  4       128983  Vendor specific
Message 18 of 19
StephenB
Guru

Re: No Volume Exists - Remove inactive volumes in order to use the disk


@Westyfield2 wrote:

Disk tests passed fine; they only incremented the power-on hours, and the disk stats now show a self-test "Extended offline - Completed without error".


Well, there was this error on sda (perhaps related to the ATA error)

Error 1 [0] occurred at disk power-on lifetime: 40806 hours (1700 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 03 b8 00 00 33 d8 ea c0 40 00  Error: IDNF at LBA = 0x33d8eac0 = 869853888

But it occurred quite a while ago (at 40806 power-on hours, versus 63044 now), so I wouldn't be concerned about it.

 

@StephenB wrote:

 

FWIW, I will be replacing the disk shortly (as soon as I finish testing the replacement).  Then I'll test it more extensively with Lifeguard.


Just to follow up on my own smartctl -x issue...

 

That disk failed Lifeguard's extended test (too many bad sectors).  It had 35 days of warranty left, so I started the RMA today.

 

 

 

Message 19 of 19