Disk test running 10 days already
Hello,
I'm running a ReadyNAS with firmware 6.10.1.
On August 1 I started a disk test, and it is still running now, on the 11th.
I guess this isn't normal. What should I do? Is it safe to reboot?
Kind regards
Steve
Re: Disk test running 10 days already
Last night I had to reboot after the device became unresponsive when I activated file search.
Should I try the disk test again or do something else first?
Re: Disk test running 10 days already
@stevevandegaer2 wrote:
Last night I had to reboot after the device became unresponsive when I activated file search.
Should I try the disk test again or do something else first?
One possible reason for the very long test time is a failing disk. I suggest downloading the log zip file and looking in disk_info.log. Also look for disk-related errors in kernel.log.
You might also want to log in over ssh and run smartctl -x
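For the kernel.log check, a rough filter could look like the sketch below. The patterns are an assumption based on common Linux kernel disk and SATA messages, not a list taken from ReadyNAS documentation, and both the function name and the file path in the example are hypothetical.

```shell
# Print lines from a kernel log that look like disk or SATA link trouble.
# The patterns are assumptions based on common Linux kernel messages,
# not an exhaustive list.
disk_log_errors() {
    grep -iE 'I/O error|ata[0-9]+(\.[0-9]+)?: .*(error|failed|exception)|medium error' "$1"
}

# Example (path is hypothetical; use wherever you unpacked the log zip):
#   disk_log_errors ./logs/kernel.log
```

No output only means none of these patterns matched; it does not prove the disks are healthy.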
Re: Disk test running 10 days already
Not seeing any errors in either log file.
I started the disk check again, hoping to get a result this time.
Re: Disk test running 10 days already
smartctl -x
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.178.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
ERROR: smartctl requires a device name as the final command-line argument.
Use smartctl -h to get a usage summary
Re: Disk test running 10 days already
@stevevandegaer2 wrote:
ERROR: smartctl requires a device name as the final command-line argument.
Sorry, I assumed more knowledge than you have.
You need to include the device name for each disk. So for the first disk you use
# smartctl -x /dev/sda
Repeat using sdb for disk 2, and so forth.
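To avoid typing the command once per disk, a small loop can do the repetition. This is only a sketch: the function name smart_all and the SMARTCTL override are made up for illustration, and the device list in the example assumes a four-bay unit whose disks show up as sda through sdd.

```shell
# Run "smartctl -x" on each device given as an argument, with a header per disk.
# SMARTCTL can be overridden (e.g. for a dry run); it defaults to smartctl.
smart_all() {
    for dev in "$@"; do
        printf '===== %s =====\n' "$dev"
        "${SMARTCTL:-smartctl}" -x "$dev"
    done
}

# Example, assuming a four-bay NAS whose disks appear as sda through sdd:
#   smart_all /dev/sda /dev/sdb /dev/sdc /dev/sdd
```

The header line makes it clear which disk each block of output belongs to when you paste the result somewhere.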
One thing to look for is the "SMART Extended Comprehensive Error Log" section. Here's a snippet from my own system. "UNC" means "uncorrectable error". Not all disks will have this section, but it is helpful when they do.
root@NAS:~# smartctl -x /dev/sdc
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.178.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
...
=== START OF READ SMART DATA SECTION ===
...
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 12
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 12 [11] occurred at disk power-on lifetime: 36166 hours (1506 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 0c 27 df 40 40 00 Error: UNC at LBA = 0x0c27df40 = 203939648
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 80 00 c8 00 00 0c 27 df 40 40 08 1d+08:15:46.204 READ FPDMA QUEUED
60 00 08 00 c0 00 01 06 34 36 98 40 08 1d+08:15:46.163 READ FPDMA QUEUED
60 00 80 00 b8 00 00 0c 27 e4 40 40 08 1d+08:15:46.146 READ FPDMA QUEUED
60 00 80 00 b0 00 00 0c 27 e9 40 40 08 1d+08:15:46.123 READ FPDMA QUEUED
60 00 80 00 a8 00 00 0c 27 ee 40 40 08 1d+08:15:46.094 READ FPDMA QUEUED
Errors in this section might not mean the disk needs to be replaced. But if you are seeing errors that happened during your disk test, then it might help you isolate your problem. If you do see some, the next step is to power down the NAS and test the disk(s) in a Windows PC using vendor tools: Lifeguard for Western Digital, SeaTools for Seagate. If they pass, put them back into the NAS (in the same slot) before you power it up.
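Since the full smartctl -x output is long, it can help to keep only the lines that summarize the error log. The filter below is a minimal sketch, with a made-up function name, that matches on the strings visible in the snippet above; other disks or smartctl versions may word things differently.

```shell
# Read "smartctl -x" output on stdin and keep only the lines that summarize
# the Extended Comprehensive Error Log: the error count, the "No Errors Logged"
# marker, when each error occurred, and the decoded error type (UNC, ICRC, ...).
smart_error_summary() {
    grep -E 'Device Error Count:|No Errors Logged|occurred at disk power-on lifetime|Error: '
}

# Example:
#   smartctl -x /dev/sdc | smart_error_summary
```

For the snippet above this would leave three lines: the error count, the power-on lifetime of error 12, and the register line ending in "Error: UNC at LBA = ...".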
Re: Disk test running 10 days already
I guess I should replace this first one?
SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
Device Error Count: 1
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1 [0] occurred at disk power-on lifetime: 9633 hours (401 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
84 -- 51 00 08 00 00 79 cc 70 00 40 00 Error: ICRC, ABRT at LBA = 0x79cc7000 = 2043441152
Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
60 00 d0 00 10 00 00 79 cc 70 30 40 08 03:48:00.465 READ FPDMA QUEUED
60 00 d0 00 10 00 00 79 cc 70 30 40 08 03:48:00.465 READ FPDMA QUEUED
60 00 d0 00 08 00 00 79 cc 70 28 40 08 03:48:00.465 READ FPDMA QUEUED
60 00 d0 00 10 00 00 79 cc 70 18 40 08 03:48:00.465 READ FPDMA QUEUED
60 00 d0 00 20 00 00 79 cc 6f f8 40 08 03:48:00.465 READ FPDMA QUEUED
SMART Extended Self-test Log Version: 1 (2 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 90% 59582 -
# 2 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Self_test_in_progress [60% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 20109 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: DST executing in background (3)
Current Temperature: 36 Celsius
Power Cycle Min/Max Temperature: 30/36 Celsius
Lifetime Min/Max Temperature: 2/41 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (2 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Interrupted (host reset) 90% 59581 -
# 2 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Self_test_in_progress [70% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 2
SCT Version (vendor specific): 256 (0x0100)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 36 Celsius
Power Cycle Min/Max Temperature: 30/36 Celsius
Lifetime Min/Max Temperature: 16/66 Celsius
Under/Over Temperature Limit Count: 0/0
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 50994 -
# 2 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: DST executing in background (3)
Current Temperature: 37 Celsius
Power Cycle Min/Max Temperature: 32/37 Celsius
Lifetime Min/Max Temperature: 2/44 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 50994 -
# 2 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: DST executing in background (3)
Current Temperature: 37 Celsius
Power Cycle Min/Max Temperature: 32/37 Celsius
Lifetime Min/Max Temperature: 2/44 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
No Errors Logged
SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 17086 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: DST executing in background (3)
Current Temperature: 36 Celsius
Power Cycle Min/Max Temperature: 31/36 Celsius
Lifetime Min/Max Temperature: 2/41 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Re: Disk test running 10 days already
@stevevandegaer2 wrote:
I guess I should replace this first one?
...
84 -- 51 00 08 00 00 79 cc 70 00 40 00 Error: ICRC, ABRT at LBA = 0x79cc7000 = 2043441152
This error means that the request was aborted (ABRT) after an interface CRC error (ICRC), which typically relates to the data transfer between the disk and the controller. That might not be a problem with the disk. In general I wouldn't be concerned about a single abort error. I have a couple I created accidentally when I was testing a drive with Lifeguard. Also, a single aborted error doesn't really explain the long-running test.
Still, you might look at how long ago that error happened. You can compute that by subtracting the power-on lifetime in the error message (e.g. 9633 hours) from the disk's current power-on hours in the SMART stats. The difference only translates into calendar time if the disk is powered 24x7, though.
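As a worked example of that subtraction, here is a small sketch. The function name is made up, and the current value of 59582 hours is an assumption: it is the LifeTime figure from the self-test log line that follows this error in the pasted output, on the guess that both belong to the same disk.

```shell
# Print how many power-on hours have elapsed since a logged error.
# $1 = current power-on hours, $2 = power-on lifetime from the error entry.
hours_since_error() {
    elapsed=$(( $1 - $2 ))
    printf '%d hours (%d days + %d hours)\n' "$elapsed" $(( elapsed / 24 )) $(( elapsed % 24 ))
}

hours_since_error 59582 9633
```

With those inputs it prints "49949 hours (2081 days + 5 hours)", so under that assumption the error would be several years old and unrelated to the current test.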
Re: Disk test running 10 days already
So at first glance I don't have any faulty disks. My new disk test started about 6 hours ago. I'll check it again tomorrow and see if it is still running.
Re: Disk test running 10 days already
@stevevandegaer2 wrote:
So at first glance I don't have any faulty disks. My new disk test started about 6 hours ago. I'll check it again tomorrow and see if it is still running.
Note that the smartctl -x command was giving you the completion status on each disk.
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Self_test_in_progress [60% left] (0-65535)
You can reduce the clutter with
# smartctl -x /dev/sda | grep -i self_test
That would let you monitor progress, and give you some idea if it is locking up or just running very slowly.
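If you want just the number, the percentage can be pulled out of that line. This is a sketch that assumes the status line looks like the one quoted above; the function name and the 10-minute polling interval in the example are arbitrary choices.

```shell
# Read "smartctl -x" output on stdin and print the percentage of the
# self-test still remaining, if a test is in progress (prints nothing otherwise).
selftest_remaining() {
    sed -n 's/.*Self_test_in_progress \[\([0-9][0-9]*\)% left\].*/\1/p'
}

# Example polling loop (the 10-minute interval is an arbitrary choice):
#   while pct=$(smartctl -x /dev/sda | selftest_remaining) && [ -n "$pct" ]; do
#       echo "$(date): /dev/sda self-test ${pct}% left"
#       sleep 600
#   done
```

A percentage that stays the same over many hours would suggest the test is stuck rather than just slow.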
Re: Disk test running 10 days already
StephenB, thank you very much for your help. This morning the disk test was complete and there were no errors in the logs.