
Re: readynas 4220 memory segmentation fault

scottonaharley
Aspirant

readynas 4220 memory segmentation fault

This unit had been humming along for several years when this error started to pop up following the 6.10.7 upgrade.

I am able to access the unit via SSH, but the management service is not running. The web server is up and I can connect to it; it is just that Frontview is not running.

 

Network shares are available.

I did physically re-seat the memory to no effect.

I am able to download the log package via RAIDar.

The results of the diagnostics via RAIDar are cryptic. They list these two system errors:

  • Disk 1 has 8 Current Pending Sectors
  • Volume root is degraded

And nothing in the logs section.

 

This leads me to believe that perhaps something in Frontview is broken.
Is there a way to restart the management service via the command line?

As a last resort I will offload the data and factory-default the unit. I would just rather not lose all that time, as it has twelve 10TB drives and moving that much data is a job.

 

Thank you in advance for any help you can give.  It is much appreciated.

 

 

Model: RN422X124|ReadyNAS 4220 10Gbase-T (12x4TB Enterprise)
Message 1 of 14


All Replies
StephenB
Guru

Re: readynas 4220 memory segmentation fault


@scottonaharley wrote:

 

The results of diagnostics via RAIDar are cryptic.  It lists these two system errors: 

  • Disk 1 has 8 Current Pending Sectors
  • Volume root is degraded

 


The second error suggests something is wrong with the RAID array for the OS (and the first says that you are seeing disk errors on disk 1).

 

I'd try powering down, removing disk 1, and then rebooting the NAS as read-only. If that results in normal access to the NAS admin UI, then I'd test the disk with vendor tools. Personally, I run both the full read test and a full write-zeros/erase test.
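If you want a quick look at the disk's SMART counters over SSH before pulling it, something like this should do it (a sketch assuming smartctl is installed on the NAS; adjust the device name to whichever disk is reporting the pending sectors):

# smartctl -a /dev/sda
# smartctl -A /dev/sda | grep -i pending

The first command dumps the full SMART report; the second just filters for the Current_Pending_Sector attribute that RAIDar is flagging.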

 


@scottonaharley wrote:

 

This leads me to believe that perhaps something in Frontview is broken.
Is there a way to restart the management service via the command line?

 


You can restart apache2 with

# systemctl restart apache2

 

readynasd is also a service, so you could try

# systemctl status readynasd
# systemctl restart readynasd
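
If readynasd keeps dying, the systemd journal should show why it was killed (standard systemd tooling, so this should work on OS 6):

# journalctl -u readynasd -n 50 --no-pager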

  

Message 2 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

Which disk is "Disk 1". Is it the disk labeled "sda" or is it  "channel 1" when viewing the disk status drop down in the management interface?

 

I'm going to shut down, remove the disk tagged as "channel 1", and reboot.

Executing the restart on the readynasd service brought the interface back up.

The error on the last line does not concern me. The "/data/video" share was old and has been removed, so that error should no longer occur.

<<<<<<<<<<<<<<<<Command execution and output>>>>>>>>>>>>>>>>>>

root@poseidon:~# systemctl status readynasd

readynasd.service - ReadyNAS System Daemon

   Loaded: loaded (/lib/systemd/system/readynasd.service; enabled; vendor preset: enabled)

   Active: failed (Result: start-limit-hit) since Sat 2022-04-16 15:45:24 EDT; 1 day 18h ago

 Main PID: 5144 (code=killed, signal=SEGV)

 

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

root@poseidon:~# systemctl restart readynasd

root@poseidon:~# systemctl status readynasd

readynasd.service - ReadyNAS System Daemon

   Loaded: loaded (/lib/systemd/system/readynasd.service; enabled; vendor preset: enabled)

   Active: active (running) since Mon 2022-04-18 10:31:45 EDT; 12s ago

 Main PID: 21377 (readynasd)

   Status: "Start Main process"

   CGroup: /system.slice/readynasd.service

           └─21377 /usr/sbin/readynasd -v 3 -t

 

Apr 18 10:31:45 poseidon rn-expand[21377]: Checking if RAID disk sdb is expandable...

Apr 18 10:31:45 poseidon rn-expand[21377]: Checking if RAID disk sdg is expandable...

Apr 18 10:31:45 poseidon rn-expand[21377]: Checking if RAID disk sdf is expandable...

Apr 18 10:31:45 poseidon rn-expand[21377]: Checking if RAID disk sdi is expandable...

Apr 18 10:31:45 poseidon rn-expand[21377]: No enough disks for data-0 to expand [need 4, have 0]

Apr 18 10:31:45 poseidon rn-expand[21377]: 0 disks expandable in data

Apr 18 10:31:45 poseidon systemd[1]: Started ReadyNAS System Daemon.

Apr 18 10:31:45 poseidon readynasd[21377]: ReadyNASOS background service started.

Apr 18 10:31:45 poseidon readynasd[21377]: Snapper SetConfig successfully.

Apr 18 10:31:45 poseidon readynasd[21377]: Failed to chmod snap_path /data/video/.snapshots, errno = 2

root@poseidon:~# 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

Message 3 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

I replaced the disk located in channel 1, and the "8 pending sectors" message is now gone. While the messages "Volume root is degraded" (existing) and "Volume data is degraded" (new) do appear, I am fairly confident that they will no longer appear once the array completes resyncing. That process will take more than 24 hours, so I will post the results here upon completion.
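
In the meantime I can watch the rebuild from SSH; /proc/mdstat shows a progress line while the array is resyncing (standard Linux md, nothing ReadyNAS-specific):

# cat /proc/mdstat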

Message 4 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

The problem is still here. It appears to affect only the management interface portion of the system; all other services seem unaffected.

Here is the diagnostic output after the readynasd failure:

 

<<<<<<<<<<<<<<<<<<<<diagnostic output>>>>>>>>>>>>>>>>>>>>>>

 

Successfully completed diagnostics 

System 

  • Volume root is degraded 
  • Volume data is degraded 

Logs 

  • 2022-04-18 13:57:42: enclosure_monit[5053]: segfault at 30 ip 00007f594f448950 sp 00007f5941586ae8 error 4 in libapr-1.so.0.5.1[7f594f42b000+32000] 
  • 2022-04-18 12:13:11: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2 

System Management 

  • 2022-04-18 13:59:13: readynasd.service: Failed to fork: Cannot allocate memory 
  • 2022-04-18 13:59:13: Failed to start ReadyNAS System Daemon. 
  • 2022-04-18 13:59:13: Failed to start ReadyNAS System Daemon. 
  • 2022-04-18 13:59:12: readynasd.service: Failed to fork: Cannot allocate memory 
  • 2022-04-18 13:59:12: Failed to start ReadyNAS System Daemon. 
  • 2022-04-18 13:59:12: readynasd.service: Failed to fork: Cannot allocate memory 
  • 2022-04-18 13:59:12: Failed to start ReadyNAS System Daemon. 
  • 2022-04-18 13:59:12: readynasd.service: Failed to fork: Cannot allocate memory 
  • 2022-04-18 13:59:12: Failed to start ReadyNAS System Daemon. 
  • 2022-04-18 13:59:12: readynasd.service: Failed to fork: Cannot allocate memory 
  • 2022-04-18 13:59:12: Failed to start ReadyNAS System Daemon. 
  • 2022-04-18 12:13:49: NetworkStats eth0 failed: ERROR: mmaping file '/run/readynasd/stats/network_eth0_pkts.rrd': Invalid argument 
  • 2022-04-18 12:13:43: DB (main) schema version: 24 ==> 24 
  • 2022-04-18 12:13:43: DB (queue) schema version: new ==> 0 
Message 5 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

I have downgraded to v6.10.5 and the management interface appears to be more stable. It has been up continuously for 24 hours (it usually failed after a few hours); however, I still have these messages in the diagnostic output. Any thoughts would be appreciated.

 

<<<<<<<<<<<<<<<<<<<<<<begin diagnostic output>>>>>>>>>>>>>>>>>>>>>

 

System

  • Volume root is degraded

Logs

  • 2022-04-18 17:10:32: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2
  • 2022-04-18 13:57:42: enclosure_monit[5053]: segfault at 30 ip 00007f594f448950 sp 00007f5941586ae8 error 4 in libapr-1.so.0.5.1[7f594f42b000+32000]
  • 2022-04-18 12:13:11: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2
Message 6 of 14
StephenB
Guru

Re: readynas 4220 memory segmentation fault


@scottonaharley wrote:

I have downgraded to v6.10.5 and the management interface appears to be more stable. It has been up continuously for 24 hours (it usually failed after a few hours); however, I still have these messages in the diagnostic output. Any thoughts would be appreciated.

 

<<<<<<<<<<<<<<<<<<<<<<begin diagnostic output>>>>>>>>>>>>>>>>>>>>>

 

System

  • Volume root is degraded

Logs

  • 2022-04-18 17:10:32: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2
  • 2022-04-18 13:57:42: enclosure_monit[5053]: segfault at 30 ip 00007f594f448950 sp 00007f5941586ae8 error 4 in libapr-1.so.0.5.1[7f594f42b000+32000]
  • 2022-04-18 12:13:11: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2

Can you post mdstat.log from the log zip file?  It's simplest to just copy/paste it into a reply.
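
Alternatively, since you have SSH access, the same information can be pulled directly (standard md tools; this is essentially what mdstat.log contains):

# cat /proc/mdstat
# mdadm --detail /dev/md0
# mdadm --detail /dev/md127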

Message 7 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

This is the diagnostic output and the corresponding mdstat.log file:

 

<<<<<<<<<<<<<<<<<<<<<Diagnostic output>>>>>>>>>>>>>>>>>>>>>>

 

System

  • Volume root is degraded

Logs

  • 2022-04-18 17:10:32: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2
  • 2022-04-18 13:57:42: enclosure_monit[5053]: segfault at 30 ip 00007f594f448950 sp 00007f5941586ae8 error 4 in libapr-1.so.0.5.1[7f594f42b000+32000]

<<<<<<<<<<<<<<<<<<<Begin mdstat.log>>>>>>>>>>>>>>>>>>>>>>>>

 

Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md127 : active raid6 sdd3[13] sdi3[11] sdf3[10] sdg3[9] sdb3[8] sdj3[12] sde3[6] sdh3[5] sdc3[4] sdl3[3] sdk3[2] sda3[1]
97615871360 blocks super 1.2 level 6, 64k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]

md1 : active raid10 sda2[0] sdd2[11] sdl2[10] sdk2[9] sdj2[8] sdi2[7] sdh2[6] sdg2[5] sdf2[4] sde2[3] sdc2[2] sdb2[1]
3133440 blocks super 1.2 512K chunks 2 near-copies [12/12] [UUUUUUUUUUUU]

md0 : active raid1 sdk1[15] sda1[7] sdc1[8] sdh1[9] sdb1[10] sdg1[17] sdf1[18] sdd1[16] sde1[11] sdi1[12] sdj1[13] sdl1[14]
4190208 blocks super 1.2 [13/12] [UUUUUUUUUUUU_]

unused devices: <none>
/dev/md/0:
Version : 1.2
Creation Time : Thu Feb 4 10:52:54 2016
Raid Level : raid1
Array Size : 4190208 (4.00 GiB 4.29 GB)
Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
Raid Devices : 13
Total Devices : 12
Persistence : Superblock is persistent

Update Time : Tue Apr 19 12:53:30 2022
State : active, degraded
Active Devices : 12
Working Devices : 12
Failed Devices : 0
Spare Devices : 0

Consistency Policy : unknown

Name : 0a435fe8:0 (local to host 0a435fe8)
UUID : 9dfe8599:c667e799:31d20bae:6c578b2b
Events : 14183728

Number Major Minor RaidDevice State
15 8 161 0 active sync /dev/sdk1
14 8 177 1 active sync /dev/sdl1
13 8 145 2 active sync /dev/sdj1
12 8 129 3 active sync /dev/sdi1
11 8 65 4 active sync /dev/sde1
16 8 49 5 active sync /dev/sdd1
18 8 81 6 active sync /dev/sdf1
17 8 97 7 active sync /dev/sdg1
10 8 17 8 active sync /dev/sdb1
9 8 113 9 active sync /dev/sdh1
8 8 33 10 active sync /dev/sdc1
7 8 1 11 active sync /dev/sda1
- 0 0 12 removed
/dev/md/1:
Version : 1.2
Creation Time : Mon Apr 18 12:19:11 2022
Raid Level : raid10
Array Size : 3133440 (2.99 GiB 3.21 GB)
Used Dev Size : 522240 (510.00 MiB 534.77 MB)
Raid Devices : 12
Total Devices : 12
Persistence : Superblock is persistent

Update Time : Mon Apr 18 14:51:30 2022
State : clean
Active Devices : 12
Working Devices : 12
Failed Devices : 0
Spare Devices : 0

Layout : near=2
Chunk Size : 512K

Consistency Policy : unknown

Name : 0a435fe8:1 (local to host 0a435fe8)
UUID : be669e54:da4ece09:4eb90e9b:0c6bd728
Events : 19

Number Major Minor RaidDevice State
0 8 2 0 active sync set-A /dev/sda2
1 8 18 1 active sync set-B /dev/sdb2
2 8 34 2 active sync set-A /dev/sdc2
3 8 66 3 active sync set-B /dev/sde2
4 8 82 4 active sync set-A /dev/sdf2
5 8 98 5 active sync set-B /dev/sdg2
6 8 114 6 active sync set-A /dev/sdh2
7 8 130 7 active sync set-B /dev/sdi2
8 8 146 8 active sync set-A /dev/sdj2
9 8 162 9 active sync set-B /dev/sdk2
10 8 178 10 active sync set-A /dev/sdl2
11 8 50 11 active sync set-B /dev/sdd2
/dev/md/data-0:
Version : 1.2
Creation Time : Sun Mar 5 02:59:59 2017
Raid Level : raid6
Array Size : 97615871360 (93093.75 GiB 99958.65 GB)
Used Dev Size : 9761587136 (9309.37 GiB 9995.87 GB)
Raid Devices : 12
Total Devices : 12
Persistence : Superblock is persistent

Update Time : Tue Apr 19 12:51:07 2022
State : clean
Active Devices : 12
Working Devices : 12
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 64K

Consistency Policy : unknown

Name : 0a435fe8:data-0 (local to host 0a435fe8)
UUID : bef95a67:0caba9c5:bdf39d3f:3ec79306
Events : 76459

Number Major Minor RaidDevice State
13 8 51 0 active sync /dev/sdd3
1 8 3 1 active sync /dev/sda3
2 8 163 2 active sync /dev/sdk3
3 8 179 3 active sync /dev/sdl3
4 8 35 4 active sync /dev/sdc3
5 8 115 5 active sync /dev/sdh3
6 8 67 6 active sync /dev/sde3
12 8 147 7 active sync /dev/sdj3
8 8 19 8 active sync /dev/sdb3
9 8 99 9 active sync /dev/sdg3
10 8 83 10 active sync /dev/sdf3
11 8 131 11 active sync /dev/sdi3

Message 8 of 14
StephenB
Guru

Re: readynas 4220 memory segmentation fault


@scottonaharley wrote:
4190208 blocks super 1.2 [13/12] [UUUUUUUUUUUU_]

This bit from md0 is weird: the system thinks there are 13 disks in the array for the root partition (with the last one missing).

 

The root is RAID-1: the OS is replicated on every disk, so the system can boot from any disk.

 

So while this is confusing, I don't think it's critical.

 

I think these commands would resolve it:

mdadm --manage /dev/md0 --remove failed
mdadm --manage /dev/md0 --remove detached

But these aren't options I've ever used...
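
Either way, you can check whether the phantom slot is still listed before and after trying them (it shows up as the "removed" entry at the bottom of the md0 detail):

# mdadm --detail /dev/md0 | grep -E 'Raid Devices|Total Devices|removed'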

 


@scottonaharley wrote:

This is the Diagnostic output and corresponding mdstat.log file

 

<<<<<<<<<<<<<<<<<<<<<Diagnostic output>>>>>>>>>>>>>>>>>>>>>>

 

System

  • Volume root is degraded

Logs

  • 2022-04-18 17:10:32: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2


md127 : active raid6 sdd3[13] sdi3[11] sdf3[10] sdg3[9] sdb3[8] sdj3[12] sde3[6] sdh3[5] sdc3[4] sdl3[3] sdk3[2] sda3[1]
97615871360 blocks super 1.2 level 6, 64k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
/dev/md/data-0:
Version : 1.2
Creation Time : Sun Mar 5 02:59:59 2017
Raid Level : raid6
Array Size : 97615871360 (93093.75 GiB 99958.65 GB)
Used Dev Size : 9761587136 (9309.37 GiB 9995.87 GB)
Raid Devices : 12
Total Devices : 12
Persistence : Superblock is persistent

Update Time : Tue Apr 19 12:51:07 2022
State : clean
Active Devices : 12
Working Devices : 12
Failed Devices : 0
Spare Devices : 0

md127 is the data volume, and whatever was going on with the 11 out of 12 devices warning appears to be resolved now.

 


@scottonaharley wrote:
  • 2022-04-18 13:57:42: enclosure_monit[5053]: segfault at 30 ip 00007f594f448950 sp 00007f5941586ae8 error 4 in libapr-1.so.0.5.1[7f594f42b000+32000]

 


This is an Apache library, so it is likely linked to the failure of the web UI. It might be useful to run the diagnostics again and see if they give the same result.

Message 9 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

 

The entries in the diagnostic output are unchanged after executing the commands and rebooting. However, there are no new entries regarding the library file since downgrading to 6.10.5.

I would expect that the first log message, indicating 11 out of 12 devices, was the result of replacing drive 1.

 

I do not understand the "Volume root is degraded" message, and I hesitate to use the available tools because of my unfamiliarity with Frontview and how it is integrated into the system.

ClamAV is not updating (a known issue with 6.10.5). I am wondering if it is something in the configuration, the actual program, or its interface to Frontview. Can that be updated from the command line? Will that solve the virus definition file issue?

 

Thanks in advance.

 

<<<<<<<<<<<<<<<<<Diagnostic output>>>>>>>>>>>>>>>>>>>>>>>>>

System

  • Volume root is degraded

Logs

  • 2022-04-18 17:10:32: md/raid:md127: raid level 6 active with 11 out of 12 devices, algorithm 2
  • 2022-04-18 13:57:42: enclosure_monit[5053]: segfault at 30 ip 00007f594f448950 sp 00007f5941586ae8 error 4 in libapr-1.so.0.5.1[7f594f42b000+32000]

<<<<<<<<<<<<<<<<<<mdstat.log>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
There is no mdstat.log file in the current log download.

 

Message 10 of 14
StephenB
Guru

Re: readynas 4220 memory segmentation fault


@scottonaharley wrote:

 

The entries in the diagnostic output are unchanged after executing the commands and rebooting. However, there are no new entries regarding the library file since downgrading to 6.10.5.

 


Right.  Those two log errors are from 4/18, so they aren't a result of running the diags.

 


@scottonaharley wrote:

 

I do not understand the "Volume root is degraded" message, and I hesitate to use the available tools because of my unfamiliarity with Frontview and how it is integrated into the system.

Understood. As I said, I don't think it is critical. But somehow that volume still includes a disk that you replaced. There is no way to fix that from the admin web UI.

 


@scottonaharley wrote:

 

ClamAV is not updating (a known issue with 6.10.5). I am wondering if it is something in the configuration, the actual program, or its interface to Frontview. Can that be updated from the command line? Will that solve the virus definition file issue?

I don't think you can fix that from the command line.  You could try 6.10.6, which has the fix for it.

Message 11 of 14
Sandshark
Sensei

Re: readynas 4220 memory segmentation fault

I believe this will fix the md0 issue:

 

mdadm --grow /dev/md0 --raid-devices=12

 

Yes, that's "grow" to actually shrink it, but that's the right keyword.  It would definitely work if you were really reducing it from 13 to 12 drives, so I think it should also work to eliminate your "phantom" drive.

 

 

Message 12 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

That command cleared the "volume degraded" message. I will try moving to 6.10.6 tomorrow to clear the ClamAV issue and will report the results.

Message 13 of 14
scottonaharley
Aspirant

Re: readynas 4220 memory segmentation fault

Upgrading to 6.10.6 brought back the problem in the same library, along with the failure of the management interface after a short period of time.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<diag output>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

 

  • 2022-04-20 06:02:31: snapshot_monito[6361]: segfault at 30 ip 00007f1ffa430950 sp 00007f1fc7ffeae8 error 4 in libapr-1.so.0.5.1[7f1ffa413000+32000] 

 

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

 

Downgrading to 6.10.5 eliminated the problem. I wonder if ClamAV can be updated independently of the OS. This is from the 6.10.6 release notes:

  • Antivirus ClamAV is upgraded to version 0.103.2+dfsg.

It looks like stability is restored at the 6.10.5 level.

 

 

Message 14 of 14