kheno (Aspirant), Jun 18, 2015
6.2.4: high CPU => lockup => reboot => boot failed
Hi,
I've had the same problem as mentioned here:
http://www.readynas.com/forum/viewtopic.php?f=160&t=81294
ReadyNAS Ultra 4
OS 6.2.4 (as I recall)
After a while, SSH was no longer accessible.
As in that thread, my only option was an unsafe power-off of the device.
The difference is that now it keeps showing "booting",
followed by "boot failed, retry boot".
I've already run the memory check from the boot menu, and it found no errors.
RAIDar only shows "system starting up..."
SSH, the web admin, etc. are not accessible.
I'm being a little careful about what to do next.
Should I try the OS reinstall option from the boot menu?
14 Replies
Replies have been turned off for this discussion
- kheno (Aspirant): Update:
After the memory check, it rebooted again and I waited.
RAIDar reports: "management service is offline".
So it boots again, but the CPU load is probably so high again that the admin page and SSH are not accessible (or were only accessible for a short time).
Downloading the logs etc. failed.
- kheno (Aspirant): I could download the logs after all. I can PM them on request.
First look:
systemd-journal.log
Jun 19 01:02:16 bigdisk kernel: btrfs: corrupt leaf, slot offset bad: block=2306986344448,root=1, slot=238
(this line appears 17 times in a row in the excerpt, all with the same timestamp)
- mdgm-ntgr (NETGEAR Employee, Retired): Can you run the memory test boot menu option? http://kb.netgear.com/app/answers/detail/a_id/21104
Run at least a few passes of this test.
If the memory passes, then run the disk test boot menu option.
Please send me the logs you have downloaded (there is an email address on the page linked to by the "Sending Logs" link in my signature).
- kheno (Aspirant): Hi!
Logs sent.
The memory test was already run, but I'm running it again now.
I'll run the disk test after that
and keep you posted.
Thanks!
- mdgm-ntgr (NETGEAR Employee, Retired): Thanks for the update.
Looking at your logs, one of your disks has 6 ATA errors, but those errors date back to April. So I don't think that is the problem here, though that disk might be failing.
I'll be interested to hear the result of the memory test.
Do you have a backup?
- kheno (Aspirant): The memory test came back clean, no errors.
The disk test went from 0% to 100%.
But now the LCD shows "Testing Disks", and the power button and hard-disk numbers have been blinking for more than an hour already.
I'm not 100% sure, but I think only hard-disk number 3 is blinking.
Does it take that long? I've read somewhere that it should not take more than 10 minutes.
Oh, I have a partial offline backup: all important folders and files are backed up every night to an offsite location.
So I'd prefer getting my data back over restoring :) but it's not the end of the world if something goes wrong.
Thanks
- kheno (Aspirant): Hi,
After a long time it rebooted, only to end up in the "failed to boot" state.
I shut down the NAS and figured it would also be good to take out the disks, carefully clean the contacts and put them back.
Just in case.
I booted it again
and downloaded the logs.
Where would I look for the result of the disk scan?
Accessing the NAS through the SMB shares fails: either the shares cannot be opened or their contents are incomplete... :shock:
- mdgm-ntgr (NETGEAR Employee, Retired): If you SSH in and look at the smartctl output for the disks, you will see the result of the extended tests.
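Per the reply above, the extended test results end up in the "SMART Self-test log" section of `smartctl -a /dev/sdX`. The Python sketch below is a minimal, hypothetical helper (`parse_self_tests` and `extended_test_minutes` are illustrative names, not part of smartmontools) that pulls those rows, plus the drive's own estimate for an extended test, out of saved `smartctl -a` text. It assumes the column layout of the outputs posted later in this thread; other smartctl versions or drive types may format the section differently.

```python
import re
from typing import List, NamedTuple, Optional


class SelfTest(NamedTuple):
    num: int
    description: str      # e.g. "Extended offline"
    status: str           # e.g. "Completed without error"
    remaining: str        # e.g. "00%"
    lifetime_hours: int   # power-on hours when the test ran
    first_error_lba: str  # "-" when no error was found


# One row of the self-test log, e.g.
# "# 1  Extended offline  Completed without error  00%  22161  -"
# (layout assumed from the outputs in this thread)
ROW = re.compile(
    r"^#\s*(\d+)\s+(\w+ (?:offline|captive))\s+(.+?)\s+(\d+%)\s+(\d+)\s+(\S+)\s*$"
)

# "Extended self-test routine" / "recommended polling time: ( 488) minutes."
EXT_TIME = re.compile(
    r"Extended self-test routine\s*\n\s*recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
)


def parse_self_tests(smartctl_text: str) -> List[SelfTest]:
    """Return the SMART self-test log rows found in `smartctl -a` output."""
    rows = []
    for line in smartctl_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(SelfTest(int(m.group(1)), m.group(2), m.group(3),
                                 m.group(4), int(m.group(5)), m.group(6)))
    return rows


def extended_test_minutes(smartctl_text: str) -> Optional[int]:
    """The drive's own estimate for an extended self-test, in minutes."""
    m = EXT_TIME.search(smartctl_text)
    return int(m.group(1)) if m else None
```

Fed the `/dev/sda` output posted below, this should return three rows, with the extended test at 22161 hours marked "Completed without error", and an estimate of 488 minutes. At 488 / 60, roughly 8.1 hours per drive, a disk test running well over an hour would be consistent with the drives' own estimates rather than necessarily a hang.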
- kheno (Aspirant): The thing is that after the testing it rebooted and is now in "boot failed" mode,
where the admin page and SSH are not available...
I was looking into buying a new RN314 by the end of the summer.
I don't suppose that buying one now and moving the drives over would make my data accessible?
Is the data recoverable? Should I look for an expert and pay them to have a look?
If there is a way to just recover the data and copy it to external disks...
I've used the Linux command line for a while, but I'm not experienced enough to play with the btrfs tools and the ReadyNAS setup.
As far as I can see in the logs downloaded through RAIDar, there are no real disk errors except those from April.
btrfs does report problems. I presume I would need to log in over SSH and probably fix /c with the btrfs tools (restore to another external drive, or mount -o recovery,ro).
I could log in to tech support mode, but I don't know where to start from there.
- kheno (Aspirant): OK,
so I figured out tech support mode, how to mount, and so on.
So now I am able to SSH in during a normal boot.
smartctl output:
smartctl -a /dev/sda
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.0.101.RNx86_64.3] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD30EZRX-00MMMB0
Serial Number: WD-WCAWZ1418318
LU WWN Device Id: 5 0014ee 25b9acf8e
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Jun 20 20:53:02 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (50760) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 488) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 158 148 021 Pre-fail Always - 9091
4 Start_Stop_Count 0x0032 091 091 000 Old_age Always - 9149
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 22174
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1281
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 30
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 784723
194 Temperature_Celsius 0x0022 123 111 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 22161 -
# 2 Short offline Completed without error 00% 1 -
# 3 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
----------------------------
smartctl -a /dev/sdb
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.0.101.RNx86_64.3] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD30EZRX-00MMMB0
Serial Number: WD-WCAWZ1375860
LU WWN Device Id: 5 0014ee 20645a879
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Jun 20 20:53:35 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (51660) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 496) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 161 146 021 Pre-fail Always - 8933
4 Start_Stop_Count 0x0032 091 091 000 Old_age Always - 9086
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 070 070 000 Old_age Always - 22174
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1281
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 31
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 787547
194 Temperature_Celsius 0x0022 122 109 000 Old_age Always - 30
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 22161 -
# 2 Short offline Completed without error 00% 1 -
# 3 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
---------------------------------------------------------------------------------------
smartctl -a /dev/sdc
smartctl 6.3 2014-07-26 r3976 [x86_64-linux-3.0.101.RNx86_64.3] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD30EZRX-00DC0B0
Serial Number: WD-WCC1T1741134
LU WWN Device Id: 5 0014ee 2b3f2ee8f
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Jun 20 20:54:16 2015 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (38400) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 385) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 180 173 021 Pre-fail Always - 5983
4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 3444
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 089 089 000 Old_age Always - 8736
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 459
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 18
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 801457
194 Temperature_Celsius 0x0022 122 109 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 8734 -
# 2 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
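The attribute tables above are long. When comparing several drives, it can help to pull out just a few attributes that are commonly watched for media or cabling problems: 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable and 199 UDMA_CRC_Error_Count. The sketch below is a hypothetical helper, not a smartmontools feature; the attribute choice is a common rule of thumb rather than an official failure criterion, and it only looks at raw values, not the normalized VALUE/THRESH columns.

```python
import re
from typing import Dict, List

# Attributes often watched as early signs of media (5, 197, 198) or
# cable/connection (199) trouble. A rule of thumb, not a vendor criterion.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}

# Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ATTR = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(smartctl_text: str) -> Dict[int, int]:
    """Map attribute ID -> leading integer of RAW_VALUE."""
    values = {}
    for line in smartctl_text.splitlines():
        m = ATTR.match(line)
        if m:
            values[int(m.group(1))] = int(m.group(6))
    return values


def warnings(smartctl_text: str) -> List[str]:
    """List the watched attributes whose raw value is non-zero."""
    raw = raw_values(smartctl_text)
    return [f"{attr_id} {name} = {raw[attr_id]}"
            for attr_id, name in WATCHED.items()
            if raw.get(attr_id, 0) > 0]
```

For each of the three outputs posted above, all four watched raw values are 0, so `warnings()` returns an empty list for `/dev/sda`, `/dev/sdb` and `/dev/sdc`; the output for `/dev/sdd` was not posted.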
So I kept searching:
btrfs scrub start /dev/md0
btrfs scrub status /dev/md0
dmesg
No remarks, except that it only scrubbed 1.45 GB?!
Plus the btrfs errors reported earlier.
Then I checked the RAID setup:
mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Fri May 23 00:24:48 2014
Raid Level : raid1
Array Size : 4190208 (4.00 GiB 4.29 GB)
Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
Raid Devices : 4
Total Devices : 3
Persistence : Superblock is persistent
Update Time : Sat Jun 20 21:24:08 2015
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Name : 37c0a64a:0 (local to host 37c0a64a)
UUID : 2b80a109:d01d1b7e:ed3cb495:04521fe4
Events : 562
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
4 0 0 4 removed
3 8 49 3 active sync /dev/sdd1
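In the table above, RaidDevice slots 0, 1 and 3 are "active sync" while the array has 4 raid devices, so slot 2 is the one without a member, and no partition of /dev/sdc appears in the table. The Python sketch below is a hypothetical helper (`parse_detail` and `missing_slots` are illustrative names) that derives this from saved `mdadm --detail` text. It assumes the "Number Major Minor RaidDevice State" table is the last part of the output, as it is here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Member(NamedTuple):
    raid_device: Optional[int]  # slot number, None if shown as "-"
    state: str                  # e.g. "active sync" or "removed"
    path: Optional[str]         # e.g. "/dev/sda1", None for removed slots


def parse_detail(text: str) -> Tuple[Optional[str], int, List[Member]]:
    """Return (array state, number of raid devices, member table rows)."""
    state, raid_devices, members = None, 0, []
    in_table = False
    for line in text.splitlines():
        s = line.strip()
        if s.startswith("State :"):
            state = s.split(":", 1)[1].strip()
        elif s.startswith("Raid Devices :"):
            raid_devices = int(s.split(":", 1)[1])
        elif s.startswith("Number") and "RaidDevice" in s:
            in_table = True
        elif in_table:
            parts = s.split()
            if len(parts) < 5:
                continue
            # columns: Number Major Minor RaidDevice State... [device path]
            path = parts[-1] if parts[-1].startswith("/dev/") else None
            words = parts[4:-1] if path else parts[4:]
            slot = int(parts[3]) if parts[3].isdigit() else None
            members.append(Member(slot, " ".join(words), path))
    return state, raid_devices, members


def missing_slots(text: str) -> List[int]:
    """Slots in 0..raid_devices-1 that have no active member."""
    _, raid_devices, members = parse_detail(text)
    present = {m.raid_device for m in members if m.path and "active" in m.state}
    return sorted(set(range(raid_devices)) - present)
```

For the output above this gives the state "clean, degraded" and a missing slot of 2. The "removed" row itself shows 4 in its RaidDevice column in the pasted text, which is why the sketch derives the missing slot from the active rows instead of trusting that column.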
It seems to be degraded?
Even when degraded, shouldn't it still hold all the data?
I'm not an expert, but the strange thing is that the disk seems connected and fine, while the RAID setup sees it as removed.
Oh, and even with a failing disk, the data should not be corrupted.
So I looked into how to check with btrfs. I think I need to do something like this:
umount /dev/md0
btrfs check --repair /dev/sda   (destructive)
btrfs restore /dev/sda /mnt/restore   (non-destructive: restores to an external disk)
Only I can't unmount it: "device is busy" (used by a lot of processes).
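"Device is busy" means some process still has a file, a directory or its working directory on that filesystem. `fuser -vm <mountpoint>` or `lsof <mountpoint>` normally lists them. If those tools are missing, the same check can be sketched in Python by comparing the device ID (`st_dev`) of each process's cwd, root, executable and open file descriptors under /proc with that of the mount point. This is a minimal, Linux-only sketch (`pids_using` is a hypothetical name); it needs root to inspect other users' processes and assumes Python is available where it runs. Also note that if /dev/md0 is the filesystem the running system itself boots from, it cannot be unmounted while the NAS is up in normal mode, whichever processes are stopped.

```python
import os
from typing import Dict, List


def pids_using(mount_point: str) -> Dict[int, List[str]]:
    """Map PID -> what ties it to the filesystem containing `mount_point`
    ("cwd", "root", "exe" or "fd N"). Linux-only: reads /proc."""
    target_dev = os.stat(mount_point).st_dev
    users: Dict[int, List[str]] = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/mdstat
        base = f"/proc/{entry}"
        links = [(f"{base}/{name}", name) for name in ("cwd", "root", "exe")]
        try:
            links += [(f"{base}/fd/{fd}", f"fd {fd}")
                      for fd in os.listdir(f"{base}/fd")]
        except OSError:
            pass  # process exited, or we lack permission
        for path, reason in links:
            try:
                # os.stat follows the /proc link to the real file or directory
                if os.stat(path).st_dev == target_dev:
                    users.setdefault(int(entry), []).append(reason)
            except OSError:
                continue  # gone or inaccessible
    return users
```

Calling `pids_using()` with the directory where /dev/md0 is mounted (see `mount` or /proc/mounts) returns the PIDs that would have to let go of the filesystem before `umount` can succeed.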
Since I'm not really used to doing these things, I hope you can give me some advice/directions.
Thanks