Forum Discussion

Tutor

Jun 27, 2019

Solved

Kernel bug on ReadyNASOS 6.10.1, screen message __extent_writepage_io+1d3

Hi, my ReadyNAS hit a kernel bug. It is running the latest release, 6.10.1. Here is the dmesg output after it happened; I needed to power cycle it to get it rebooted. [1108355.142034] ------------[ cut here ]------------ [1...

StephenB

Jun 28, 2019

Shrinking this one down ...

jukkaforss wrote:

sdb part 1

root@readynas:~# smartctl -x /dev/sdb

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    159
  9 Power_On_Hours          -O--CK   028   028   000    -    52621

Error 9 [8] occurred at disk power-on lifetime: 50321 hours (2096 days + 17 hours)
Error: UNC at LBA = 0x12fdf7710 = 5098141456

Error 8 [7] occurred at disk power-on lifetime: 50321 hours (2096 days + 17 hours)
Error: WP at LBA = 0x12fdf7710 = 5098141456

Error 7 [6] occurred at disk power-on lifetime: 41988 hours (1749 days + 12 hours)
Error: UNC at LBA = 0xcf083840 = 3473422400

Error 6 [5] occurred at disk power-on lifetime: 41988 hours (1749 days + 12 hours)
Error: WP at LBA = 0xcf083840 = 3473422400

Error 5 [4] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
Error: UNC at LBA = 0xe824a0c0 = 3894714560

Error 4 [3] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
Error: WP at LBA = 0xe824a0b8 = 3894714552

Error 3 [2] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
Error: UNC at LBA = 0xe824a0b8 = 3894714552

Error 2 [1] occurred at disk power-on lifetime: 38615 hours (1608 days + 23 hours)
Error: WP at LBA = 0x11616d4b8 = 4665562296

I saw a similar pattern on one of my WD60EFRX drives a while ago, and when I tested it with Lifeguard, it failed, though a second disk with the same pattern passed Lifeguard. So I recommend testing this disk (and perhaps replacing it even if it does pass).

The most recent logged error was about 2300 hours ago (~3 months), so that particular error didn't cause the most recent crash. But I think this disk likely triggered it anyway.
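A quick way to check the "how long ago" arithmetic is to subtract the newest error's power-on lifetime from the drive's current Power_On_Hours (attribute 9, raw value 52621). A minimal Python sketch, assuming the power-on-lifetime lines from the quoted log are pasted in as text; the helper names (`error_lifetimes`, `days_hours`) are made up for illustration:

```python
from __future__ import annotations

import re

# Power-on-lifetime lines copied from the quoted `smartctl -x /dev/sdb` error log.
LOG = """\
Error 9 [8] occurred at disk power-on lifetime: 50321 hours (2096 days + 17 hours)
Error 8 [7] occurred at disk power-on lifetime: 50321 hours (2096 days + 17 hours)
Error 7 [6] occurred at disk power-on lifetime: 41988 hours (1749 days + 12 hours)
Error 6 [5] occurred at disk power-on lifetime: 41988 hours (1749 days + 12 hours)
Error 5 [4] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
Error 4 [3] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
Error 3 [2] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
Error 2 [1] occurred at disk power-on lifetime: 38615 hours (1608 days + 23 hours)
"""

POWER_ON_HOURS = 52621  # raw value of attribute 9 (Power_On_Hours)

# Captures the error number and the power-on hour at which it was logged.
ENTRY_RE = re.compile(
    r"^Error (\d+) \[\d+\] occurred at disk power-on lifetime: (\d+) hours", re.M
)


def error_lifetimes(log: str) -> dict[int, int]:
    """Map each logged error number to its power-on lifetime in hours."""
    return {int(num): int(hours) for num, hours in ENTRY_RE.findall(log)}


def days_hours(hours: int) -> str:
    """Format hours like smartctl's '(N days + M hours)' annotation."""
    days, rest = divmod(hours, 24)
    return f"{days} days + {rest} hours"


lifetimes = error_lifetimes(LOG)
latest = max(lifetimes.values())
since = POWER_ON_HOURS - latest

print(f"newest error logged at {latest} h ({days_hours(latest)})")
# 30.44 days is an average month length; good enough for a rough estimate.
print(f"{since} h since then ({days_hours(since)}, about {since / (24 * 30.44):.1f} months)")
```

With these numbers the gap is 2300 hours (95 days + 20 hours), roughly three months. `days_hours` splits hours the same way smartctl does in its "(2096 days + 17 hours)" annotation.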

FWIW, I haven't seen any explanation of how to decode the raw read error rate, but it is quite a bit higher on this drive than on your other ones.
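How the raw value is encoded is up to the drive vendor, but the FLAGS column next to it can be read off the legend smartctl prints under its attribute table (P prefailure warning, O updated online, S speed/performance, R error rate, C event count, K auto-keep, with "-" meaning the flag is not set). A small Python sketch of that position-by-position decoding; `decode_flags` is a hypothetical helper, and the legend is copied from the smartctl output in this thread:

```python
from __future__ import annotations

# Legend printed by smartctl under the attribute table, in column order:
# the first FLAGS character is P, the second O, and so on; "-" means unset.
FLAG_LEGEND = [
    ("P", "prefailure warning"),
    ("O", "updated online"),
    ("S", "speed/performance"),
    ("R", "error rate"),
    ("C", "event count"),
    ("K", "auto-keep"),
]


def decode_flags(flags: str) -> list[str]:
    """Return the meanings of the flags set in a smartctl FLAGS string."""
    if len(flags) != len(FLAG_LEGEND):
        raise ValueError(f"expected {len(FLAG_LEGEND)} characters, got {flags!r}")
    meanings = []
    for char, (letter, meaning) in zip(flags, FLAG_LEGEND):
        if char == letter:
            meanings.append(meaning)
        elif char != "-":
            raise ValueError(f"unexpected {char!r} where {letter!r} or '-' was expected")
    return meanings


# The two attribute rows quoted above.
for name, flags in [("Raw_Read_Error_Rate", "POSR-K"), ("Power_On_Hours", "-O--CK")]:
    print(f"{name:20} {flags}: {', '.join(decode_flags(flags))}")
```

Read this way, Raw_Read_Error_Rate is flagged as a prefailure, error-rate attribute, which is why its VALUE/THRESH pair is worth watching even though the raw number itself is opaque.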

jukkaforss

Tutor

Jun 28, 2019

sdb part 1

root@readynas:~# smartctl -x /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.4.178.x86_64.1] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WMC1T3056108
LU WWN Device Id: 5 0014ee 658a0a0ff
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jun 28 11:11:34 2019 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(39120) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 393) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x70bd)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    159
  3 Spin_Up_Time            POS--K   210   178   021    -    4475
  4 Start_Stop_Count        -O--CK   100   100   000    -    134
  5 Reallocated_Sector_Ct   PO--CK   200   200   140    -    0
  7 Seek_Error_Rate         -OSR-K   200   200   000    -    0
  9 Power_On_Hours          -O--CK   028   028   000    -    52621
 10 Spin_Retry_Count        -O--CK   100   100   000    -    0
 11 Calibration_Retry_Count -O--CK   100   100   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    134
192 Power-Off_Retract_Count -O--CK   200   200   000    -    98
193 Load_Cycle_Count        -O--CK   200   200   000    -    35
194 Temperature_Celsius     -O---K   101   099   000    -    49
196 Reallocated_Event_Count -O--CK   200   200   000    -    0
197 Current_Pending_Sector  -O--CK   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   253   000    -    0
199 UDMA_CRC_Error_Count    -O--CK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      6  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0-0xa7  GPL,SL  VS      16  Device vendor specific log
0xa8-0xb7  GPL,SL  VS       1  Device vendor specific log
0xbd       GPL,SL  VS       1  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL     VS      93  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 9
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 9 [8] occurred at disk power-on lifetime: 50321 hours (2096 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 2f df 77 10 40 00  Error: UNC at LBA = 0x12fdf7710 = 5098141456

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 10 00 60 00 00 67 4d 52 b0 40 08     01:04:39.235  READ FPDMA QUEUED
  61 00 40 00 58 00 01 24 f2 4a 00 40 08     01:04:39.235  WRITE FPDMA QUEUED
  61 00 40 00 50 00 01 24 f2 49 80 40 08     01:04:39.235  WRITE FPDMA QUEUED
  61 00 10 00 48 00 00 01 73 7a 18 40 08     01:04:39.235  WRITE FPDMA QUEUED
  60 00 80 00 40 00 01 2f df 76 c0 40 08     01:04:39.235  READ FPDMA QUEUED

Error 8 [7] occurred at disk power-on lifetime: 50321 hours (2096 days + 17 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 2f df 77 10 40 00  Error: WP at LBA = 0x12fdf7710 = 5098141456

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 50 00 01 24 f2 80 00 40 08     01:04:36.141  WRITE FPDMA QUEUED
  61 00 40 00 48 00 01 24 f2 7f c0 40 08     01:04:36.140  WRITE FPDMA QUEUED
  61 00 40 00 40 00 01 24 f2 59 c0 40 08     01:04:36.140  WRITE FPDMA QUEUED
  61 00 40 00 38 00 01 24 f2 56 00 40 08     01:04:36.140  WRITE FPDMA QUEUED
  61 00 40 00 30 00 01 24 f7 ad c0 40 08     01:04:36.140  WRITE FPDMA QUEUED

Error 7 [6] occurred at disk power-on lifetime: 41988 hours (1749 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 cf 08 38 40 40 00  Error: UNC at LBA = 0xcf083840 = 3473422400

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 00 d0 00 00 00 00 04 bd 4a 40 40 08     02:27:16.202  READ FPDMA QUEUED
  60 01 80 00 f0 00 00 04 bd 48 40 40 08     02:27:16.202  READ FPDMA QUEUED
  60 01 80 00 e8 00 00 04 bd 46 40 40 08     02:27:16.202  READ FPDMA QUEUED
  60 01 80 00 e0 00 00 04 bd 44 40 40 08     02:27:16.202  READ FPDMA QUEUED
  60 01 80 00 d0 00 00 04 bd 40 40 40 08     02:27:16.202  READ FPDMA QUEUED

Error 6 [5] occurred at disk power-on lifetime: 41988 hours (1749 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 cf 08 38 40 40 00  Error: WP at LBA = 0xcf083840 = 3473422400

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 a8 00 00 cf 02 e2 c0 40 08     02:27:13.298  WRITE FPDMA QUEUED
  60 00 d0 00 a0 00 00 04 bd 3a 40 40 08     02:27:13.289  READ FPDMA QUEUED
  60 01 30 00 98 00 00 04 bd 38 90 40 08     02:27:13.289  READ FPDMA QUEUED
  60 00 50 00 90 00 00 04 bd 38 40 40 08     02:27:13.280  READ FPDMA QUEUED
  60 00 b0 00 88 00 00 04 bd 37 10 40 08     02:27:13.280  READ FPDMA QUEUED

Error 5 [4] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 e8 24 a0 c0 40 00  Error: UNC at LBA = 0xe824a0c0 = 3894714560

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 80 00 88 00 00 5a 64 0e 40 40 08     06:24:21.982  READ FPDMA QUEUED
  60 00 50 00 80 00 00 5a 64 10 40 40 08     06:24:21.982  READ FPDMA QUEUED
  60 00 08 00 78 00 00 e8 24 a0 c0 40 08     06:24:21.982  READ FPDMA QUEUED
  60 00 78 00 70 00 00 e8 24 a0 c8 40 08     06:24:21.982  READ FPDMA QUEUED
  60 00 30 00 68 00 00 5a 64 05 90 40 08     06:24:21.982  READ FPDMA QUEUED

Error 4 [3] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 e8 24 a0 b8 40 00  Error: WP at LBA = 0xe824a0b8 = 3894714552

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 28 00 00 88 14 21 c0 40 08     06:24:19.084  WRITE FPDMA QUEUED
  60 00 80 00 20 00 00 e8 24 a0 40 40 08     06:24:19.084  READ FPDMA QUEUED
  60 00 78 00 18 00 00 e8 24 a0 c8 40 08     06:24:19.083  READ FPDMA QUEUED
  60 00 08 00 10 00 00 e8 24 a0 c0 40 08     06:24:19.083  READ FPDMA QUEUED
  60 00 50 00 08 00 00 5a 64 10 40 40 08     06:24:19.083  READ FPDMA QUEUED

Error 3 [2] occurred at disk power-on lifetime: 38618 hours (1609 days + 2 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 00 e8 24 a0 b8 40 00  Error: UNC at LBA = 0xe824a0b8 = 3894714552

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 80 00 50 00 00 e8 24 a0 40 40 08     06:24:16.156  READ FPDMA QUEUED
  60 00 80 00 48 00 00 e8 24 9f 40 40 08     06:24:16.156  READ FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 e0 08     06:24:16.131  FLUSH CACHE EXT
  61 00 01 00 38 00 00 00 90 00 48 40 08     06:24:16.131  WRITE FPDMA QUEUED
  ea 00 00 00 00 00 00 00 00 00 00 e0 08     06:24:16.131  FLUSH CACHE EXT

Error 2 [1] occurred at disk power-on lifetime: 38615 hours (1608 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 01 16 16 d4 b8 40 00  Error: WP at LBA = 0x11616d4b8 = 4665562296

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 08 00 90 00 00 00 93 fe c0 40 08     02:53:56.605  WRITE FPDMA QUEUED
  60 00 80 00 88 00 01 16 16 d4 40 40 08     02:53:56.605  READ FPDMA QUEUED
  60 00 80 00 80 00 01 16 16 d4 c0 40 08     02:53:56.605  READ FPDMA QUEUED
  60 00 80 00 78 00 01 16 16 d5 40 40 08     02:53:56.605  READ FPDMA QUEUED
  60 00 08 00 70 00 00 00 71 98 98 40 08     02:53:56.605  READ FPDMA QUEUED
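For anyone wondering where values like "Error: UNC at LBA = 0x12fdf7710 = 5098141456" come from: the LBA is the six LBA bytes in the "After command completion" register row (the three LBA_48 bytes, then LH, LM and LL) read as one number, most significant byte first. Because this drive reports 512-byte logical and 4096-byte physical sectors, dividing an LBA by 8 gives the physical sector it falls in. A minimal Python sketch under those assumptions; `decode_lba` and `physical_sector` are illustrative names, not part of smartctl:

```python
# Column layout of the register rows, per the smartctl legend above:
# ER -- ST COUNT(2 bytes) LBA_48(3 bytes) LH LM LL DV DC  -> 13 tokens.
LOGICAL_SECTOR = 512    # "Sector Sizes: 512 bytes logical"
PHYSICAL_SECTOR = 4096  # "4096 bytes physical"


def decode_lba(row: str) -> int:
    """Rebuild the 48-bit LBA from one 'After command completion' register row."""
    tokens = row.split()[:13]
    if len(tokens) != 13:
        raise ValueError(f"expected 13 register columns, got {len(tokens)}")
    # Tokens 5..7 are the LBA_48 bytes, tokens 8..10 are LH, LM, LL.
    return int("".join(tokens[5:11]), 16)


def physical_sector(lba: int) -> int:
    """Index of the 4 KiB physical sector that holds a 512-byte logical LBA."""
    return lba * LOGICAL_SECTOR // PHYSICAL_SECTOR


rows = {
    "Error 9": "40 -- 51 00 00 00 01 2f df 77 10 40 00",
    "Error 7": "40 -- 51 00 00 00 00 cf 08 38 40 40 00",
    "Error 5": "40 -- 51 00 00 00 00 e8 24 a0 c0 40 00",
    "Error 3": "40 -- 51 00 00 00 00 e8 24 a0 b8 40 00",
    "Error 2": "40 -- 51 00 00 00 01 16 16 d4 b8 40 00",
}
for name, row in rows.items():
    lba = decode_lba(row)
    print(f"{name}: LBA {lba:#x} = {lba}, physical sector {physical_sector(lba)}")
```

Run against this log, Error 3 (0xe824a0b8) and Error 5 (0xe824a0c0) fall in physical sectors 486839319 and 486839320, two adjacent 4 KiB sectors.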


Related Content

NFSv4 HOWTO ReadyNASOS 6.5.1

uTorrent 3.3.30470 on ReadyNASOS 6.9.2?

Kernel modules for ReadyNASOS 6.9.X

Kernel archive for ReadyNASOS 6.5.0 seems broken

ReadyNASOS 6.10.0-RC1
