Forum Discussion

Tutor

Apr 21, 2019

No Volume Exists - Remove inactive volumes in order to use the disk

Hi, running v6.9.3. After a reboot the NAS came up saying "No Volume Exists". I have done another couple of shutdowns and startups, and it's now saying "Remove inactive volumes to use the disk....

StephenB

Guru - Experienced User

Apr 28, 2019

Hopchen wrote:

2. Get output of current disk stats by running command:
# get_disk_info
You can also get the disk stats from the logs if you like: disk_info.log

There's some additional information if you run

# smartctl -x /dev/sdX

using the actual disk (/dev/sda, etc).

In particular there is an error log for disk commands. I've been seeing some issues with one of my disks - volume reads sometimes time out (for instance, media playback freezes). Nothing shows up in disk_info.log. But when I use smartctl -x I am seeing stuff like this:

Error 324 [11] occurred at disk power-on lifetime: 21495 hours (895 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 00 57 f2 38 40 00  Error: UNC at LBA = 0x0057f238 = 5763640

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 48 00 70 00 00 00 57 f2 08 40 08 18d+03:42:08.241  READ FPDMA QUEUED
  60 00 b0 00 68 00 00 00 65 16 30 40 08 18d+03:42:08.236  READ FPDMA QUEUED
  60 00 20 00 60 00 00 00 35 13 a0 40 08 18d+03:42:08.232  READ FPDMA QUEUED
  60 02 30 00 58 00 00 00 64 f4 c8 40 08 18d+03:42:08.232  READ FPDMA QUEUED
  60 00 10 00 50 00 00 00 64 f1 e0 40 08 18d+03:42:08.210  READ FPDMA QUEUED
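
As an aside, the LBA that smartctl prints at the end of the register row can be rebuilt from the register bytes themselves. Below is a minimal Python sketch, assuming the 13 columns follow the header order shown above (ER, an unused column, ST, two COUNT bytes, three LBA_48 bytes, then LH, LM, LL, DV, DC). The function name is made up for illustration; it is not part of smartmontools.

```python
def lba_from_registers(row: str) -> int:
    """Rebuild the 48-bit LBA from one 'registers were' row of smartctl -x.

    Assumes the column order ER -- ST COUNT(2) LBA_48(3) LH LM LL DV DC.
    """
    tokens = row.split()[:13]   # keep the 13 register columns, drop "Error: ..."
    lba_bytes = tokens[5:11]    # three upper LBA bytes, then LH, LM, LL
    return int("".join(lba_bytes), 16)


row = "40 -- 51 00 00 00 00 00 57 f2 38 40 00  Error: UNC at LBA = 0x0057f238 = 5763640"
print(lba_from_registers(row))  # 5763640
```

The result agrees with the value smartctl already prints (0x0057f238 = 5763640), so this is mainly useful for cross-checking against sector numbers in other logs.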

Though I'd expected that a UNC would either increase the reallocated or pending sector count, that doesn't seem to be happening on this particular disk. FWIW, kernel.log shows something like this when these errors occur:

Apr 20 05:27:28 NAS kernel: ata2.00: exception Emask 0x0 SAct 0xe00000 SErr 0x0 action 0x0
Apr 20 05:27:28 NAS kernel: ata2.00: irq_stat 0x40000008
Apr 20 05:27:28 NAS kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 20 05:27:28 NAS kernel: ata2.00: cmd 60/80:a8:c0:f3:3f/01:00:f5:01:00/40 tag 21 ncq 196608 in
                                     res 41/40:00:28:f4:3f/00:00:f5:01:00/00 Emask 0x409 (media error) <F>
Apr 20 05:27:28 NAS kernel: ata2.00: status: { DRDY ERR }
Apr 20 05:27:28 NAS kernel: ata2.00: error: { UNC }
Apr 20 05:27:28 NAS kernel: ata2.00: configured for UDMA/133
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 Sense Key : Medium Error [current] [descriptor] 
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 CDB: Read(16) 88 00 00 00 00 01 f5 3f f3 c0 00 00 01 80 00 00
Apr 20 05:27:28 NAS kernel: blk_update_request: I/O error, dev sdb, sector 8409576488
Apr 20 05:27:28 NAS kernel: ata2: EH complete

So the lesson here is that disk issues might not show up in the SMART stats.
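
Since these errors only appear in kernel.log, it can help to pull them out with a script. Here is a rough Python sketch, assuming the log lines look like the excerpt above. The function names are hypothetical; the only facts relied on are the line formats shown and the standard Read(16) CDB layout (bytes 2-9 hold the start LBA, bytes 10-13 the transfer length).

```python
import re

IO_ERROR = re.compile(r"blk_update_request: I/O error, dev (\w+), sector (\d+)")
READ16 = re.compile(r"CDB: Read\(16\) ((?:[0-9a-f]{2} ?){16})")


def failed_sectors(log_text):
    """Return (device, sector) pairs for every block-layer I/O error line."""
    return [(dev, int(sector)) for dev, sector in IO_ERROR.findall(log_text)]


def read16_range(log_text):
    """Decode the first Read(16) CDB found: returns (start LBA, sector count)."""
    match = READ16.search(log_text)
    if match is None:
        return None
    cdb = bytes.fromhex(match.group(1))
    start = int.from_bytes(cdb[2:10], "big")    # bytes 2-9: logical block address
    count = int.from_bytes(cdb[10:14], "big")   # bytes 10-13: transfer length
    return start, count


sample = """\
Apr 20 05:27:28 NAS kernel: sd 1:0:0:0: [sdb] tag#21 CDB: Read(16) 88 00 00 00 00 01 f5 3f f3 c0 00 00 01 80 00 00
Apr 20 05:27:28 NAS kernel: blk_update_request: I/O error, dev sdb, sector 8409576488
"""

start, count = read16_range(sample)
for dev, sector in failed_sectors(sample):
    inside = start <= sector < start + count
    print(dev, sector, "inside the failed read" if inside else "outside the failed read")
```

For the excerpt above, the request starts at sector 8409576384 and covers 384 sectors (196608 bytes, matching the "ncq 196608 in" line), and the reported bad sector 8409576488 falls inside that range.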

FWIW, I will be replacing the disk shortly (as soon as I finish testing the replacement). Then I'll test it more extensively with Lifeguard.

Hopchen wrote:

As for balance and defrag, I think a schedule of monthly defrag and balance is fine. I would also recommend doing a disk health check every quarter. A scrub is enough to do every 6 months or so. If your disk health is good and the initial scrub completes fine then you should be good to run scheduled tasks going forward.

There's no one right answer on this. Personally I do each test once a quarter. Though if a lot of files are changing on your NAS, then I think it does make sense to increase the frequency of balance. Defrag with btrfs is a mixed blessing: while reducing fragmentation will increase transfer speed on the main shares, it also increases the amount of space used for snapshots. So there is a tradeoff there that you should be mindful of.

FWIW, the scrub also functions as a disk test, since every sector in the data volume is read as part of the test.

Hopchen

Prodigy

Apr 28, 2019

Yup, all good points from StephenB as well :)

Westyfield2

Tutor

Apr 29, 2019

Disk tests passed fine. They only incremented the power-on hours, and the disk stats now show a self-test entry: Extended offline, Completed without error.

sda (WDC WD2003FYYS-02W0B0, WD-WMAY01159942) is the only drive with an ATA error, and it has just one:

# get_disk_info
Device:             sda
Controller:         0
Channel:            0
Model:              WDC WD2003FYYS-02W0B0
Serial:             WD-WMAY01159942
Firmware:           01.01D01
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                1
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    44
  Start/Stop Count:               4743
  Power-On Hours:                 63044
  Power Cycle Count:              67
  Load Cycle Count:               4719

Device:             sdb
Controller:         0
Channel:            1
Model:              WDC WD6002FRYZ-01WD5B0
Serial:             NCHBG8GS
Firmware:           01.01M02
Class:              SATA
RPM:                7200
Sectors:            11721045168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    48
  Start/Stop Count:               32
  Power-On Hours:                 20538
  Power Cycle Count:              32
  Load Cycle Count:               870

Device:             sdc
Controller:         0
Channel:            2
Model:              WDC WD6002FRYZ-01WD5B0
Serial:             NCGWTDVV
Firmware:           01.01M02
Class:              SATA
RPM:                7200
Sectors:            11721045168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    47
  Start/Stop Count:               35
  Power-On Hours:                 22267
  Power Cycle Count:              35
  Load Cycle Count:               942

Device:             sdd
Controller:         0
Channel:            3
Model:              WDC WD2003FYYS-02W0B1
Serial:             WD-WMAY04428148
Firmware:           01.01D02
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    47
  Start/Stop Count:               4762
  Power-On Hours:                 60669
  Power Cycle Count:              58
  Load Cycle Count:               4739

Device:             sde
Controller:         0
Channel:            4
Model:              WDC WD2003FYYS-02W0B1
Serial:             WD-WMAY04905430
Firmware:           01.01D02
Class:              SATA
RPM:                7200
Sectors:            3907029168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    49
  Start/Stop Count:               4321
  Power-On Hours:                 57003
  Power Cycle Count:              56
  Load Cycle Count:               4293

Device:             sdf
Controller:         0
Channel:            5
Model:              WDC WD4000F9YZ-09N20L0
Serial:             WD-WCC131766520
Firmware:           01.01A01
Class:              SATA
RPM:                7200
Sectors:            7814037168
Pool:               data
PoolType:           RAID 5
PoolState:          1
PoolHostId:         33eadf27
Health data 
  ATA Error Count:                0
  Reallocated Sectors:            0
  Reallocation Events:            0
  Spin Retry Count:               0
  Current Pending Sector Count:   0
  Uncorrectable Sector Count:     0
  Temperature:                    44
  Start/Stop Count:               1778
  Power-On Hours:                 43245
  Power Cycle Count:              39
  Load Cycle Count:               1757
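
With six drives it is easy to miss a non-zero counter in a listing like this. The following is a small Python sketch that parses this kind of output and reports only the drives with a non-zero health counter. It assumes the "Key: value" layout and blank-line separation shown above; the function names and the choice of watched counters are mine, not anything ReadyNAS OS provides.

```python
WATCHED = (
    "ATA Error Count",
    "Reallocated Sectors",
    "Reallocation Events",
    "Spin Retry Count",
    "Current Pending Sector Count",
    "Uncorrectable Sector Count",
)


def parse_disk_info(text):
    """Split get_disk_info-style output into one dict per disk."""
    disks, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:                      # a blank line ends the current disk block
            if current:
                disks.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():         # skips headers such as "Health data"
            current[key.strip()] = value.strip()
    if current:
        disks.append(current)
    return disks


def flagged(disks):
    """Map device name -> watched counters that are non-zero."""
    report = {}
    for disk in disks:
        bad = {k: int(disk[k]) for k in WATCHED if int(disk.get(k, 0))}
        if bad:
            report[disk["Device"]] = bad
    return report


sample = """\
Device:             sda
Model:              WDC WD2003FYYS-02W0B0
Health data
  ATA Error Count:                1
  Reallocated Sectors:            0
  Current Pending Sector Count:   0

Device:             sdb
Model:              WDC WD6002FRYZ-01WD5B0
Health data
  ATA Error Count:                0
  Reallocated Sectors:            0
  Current Pending Sector Count:   0
"""

print(flagged(parse_disk_info(sample)))  # {'sda': {'ATA Error Count': 1}}
```

The sample is an abbreviated copy of the sda and sdb entries above. Keep in mind that a clean report here does not rule out problems, since some errors only show up in smartctl -x or kernel.log.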

root@NAS:~# smartctl -x /dev/sda
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.116.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE4
Device Model:     WDC WD2003FYYS-02W0B0
Serial Number:    WD-WMAY01159942
LU WWN Device Id: 5 0014ee 656594009
Firmware Version: 01.01D01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Apr 29 07:29:35 2019 WEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 128
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (29700) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 302) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    0
  3 Spin_Up_Time            POS--K   253   253   021    -    8691
  4 Start_Stop_Count        -O--CK   096   096   000    -    4743
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   014   014   000    -    63044
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   253   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    67
192 Power-Off_Retract_Count -O--CK   200   200   000    -    23
193 Load_Cycle_Count        -O--CK   199   199   000    -    4719
194 Temperature_Celsius     -O---K   108   097   000    -    44
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   200   200   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    1
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb5  GPL,SL  VS       1  Device vendor specific log
0xb6       GPL     VS       1  Device vendor specific log
0xb7       GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      24  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 40806 hours (1700 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 03 b8 00 00 33 d8 ea c0 40 00  Error: IDNF at LBA = 0x33d8eac0 = 869853888

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 03 b8 00 70 00 00 33 d9 dc c8 40 08 46d+17:24:57.544  READ FPDMA QUEUED
  60 02 98 00 68 00 00 33 d9 da 30 40 08 46d+17:24:57.544  READ FPDMA QUEUED
  60 00 c8 00 60 00 00 33 d9 d9 68 40 08 46d+17:24:57.543  READ FPDMA QUEUED
  61 00 80 00 58 00 00 33 d9 41 c0 40 08 46d+17:24:57.542  WRITE FPDMA QUEUED
  61 00 80 00 50 00 00 33 d9 3e c0 40 08 46d+17:24:57.542  WRITE FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     63031         -
# 2  Short offline       Completed without error       00%         5         -
# 3  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       258 (0x0102)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    44 Celsius
Power Cycle Min/Max Temperature:     26/46 Celsius
Lifetime    Min/Max Temperature:     26/55 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -41/85 Celsius
Temperature History Size (Index):    478 (228)

Index    Estimated Time   Temperature Celsius
 229    2019-04-28 23:32    42  ***********************
 ...    ..( 27 skipped).    ..  ***********************
 257    2019-04-29 00:00    42  ***********************
 258    2019-04-29 00:01    43  ************************
 ...    ..(142 skipped).    ..  ************************
 401    2019-04-29 02:24    43  ************************
 402    2019-04-29 02:25    44  *************************
 ...    ..(  7 skipped).    ..  *************************
 410    2019-04-29 02:33    44  *************************
 411    2019-04-29 02:34    45  **************************
 ...    ..( 37 skipped).    ..  **************************
 449    2019-04-29 03:12    45  **************************
 450    2019-04-29 03:13    44  *************************
 ...    ..( 22 skipped).    ..  *************************
 473    2019-04-29 03:36    44  *************************
 474    2019-04-29 03:37    43  ************************
 ...    ..( 57 skipped).    ..  ************************
  54    2019-04-29 04:35    43  ************************
  55    2019-04-29 04:36    44  *************************
 ...    ..( 71 skipped).    ..  *************************
 127    2019-04-29 05:48    44  *************************
 128    2019-04-29 05:49    43  ************************
 ...    ..( 43 skipped).    ..  ************************
 172    2019-04-29 06:33    43  ************************
 173    2019-04-29 06:34    44  *************************
 ...    ..( 36 skipped).    ..  *************************
 210    2019-04-29 07:11    44  *************************
 211    2019-04-29 07:12    42  ***********************
 ...    ..( 16 skipped).    ..  ***********************
 228    2019-04-29 07:29    42  ***********************

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            0  Command failed due to ICRC error
0x0002  2            0  R_ERR response for data FIS
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0005  2            0  R_ERR response for non-data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
0x000a  2           10  Device-to-host register FISes sent due to a COMRESET
0x000b  2            0  CRC errors within host-to-device FIS
0x8000  4       128983  Vendor specific

StephenB

Guru - Experienced User

Apr 29, 2019

Westyfield2 wrote:

Disk tests passed fine. They only incremented the power-on hours, and the disk stats now show a self-test entry: Extended offline, Completed without error.

Well, there was this error on sda (perhaps related to the ATA error)

Error 1 [0] occurred at disk power-on lifetime: 40806 hours (1700 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  10 -- 51 03 b8 00 00 33 d8 ea c0 40 00  Error: IDNF at LBA = 0x33d8eac0 = 869853888

But it occurred quite a while ago, so I wouldn't be concerned about it.
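
To put a number on "quite a while ago", the error's power-on timestamp can be compared with the drive's current Power_On_Hours. A quick Python sketch using the two values from the smartctl -x output for sda (both figures come from that output; only the variable names are mine):

```python
error_hours = 40806     # "Error 1 [0] occurred at disk power-on lifetime: 40806 hours"
current_hours = 63044   # Power_On_Hours raw value in the same smartctl -x output

days, hours = divmod(current_hours - error_hours, 24)
print(f"{days} days + {hours} hours of powered-on time since the error")
# 926 days + 14 hours of powered-on time since the error
```

That is roughly two and a half years of runtime with no further entries in the error log.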

StephenB wrote:

FWIW, I will be replacing the disk shortly (as soon as I finish testing the replacement). Then I'll test it more extensively with Lifeguard.

Just to follow up on my own smartctl -x issue...

That disk failed Lifeguard's extended test (too many bad sectors). It had 35 days of warranty left, so I started the RMA today.
