Forum Discussion
luite
Mar 30, 2013Aspirant
Duo V2 out of memory prevention?
How familiar is this to you: waking up to find your ReadyNAS responding to ping only - no HTTPS or SSH at all? You pull the plug :( for an unclean restart, and after the reboot you are good to go again. Then in /var/log/syslog you find that during the night your ReadyNAS was so kind as to kill your apache-ssl and opensshd processes in order to solve an 'out of memory' issue.. :cry:
Apart from the fact that the unit ships without sufficient memory ;-), the culprit seems to be minidlna; but the darn thing is that the kernel is not killing the culprit alone. Now for the real question: does anyone have experience with limiting the memory use of processes to prevent this from going haywire?
(1) For example using /etc/security/limits.conf
(2) Is there any way to influence which processes the kernel prioritizes for killing?
If only I could prevent ssh from being killed, I'd feel one step closer to salvation :D
Using RAIDiator 5.3.7 without major changes (I've stopped trying to apt-get upgrade after the xth factory reset needed to get it to boot again ;-)):
root@readynas:~# uname -a
Linux readynas 2.6.31.8.duov2 #1 Tue Aug 28 11:21:02 HKT 2012 armv5tel GNU/Linux
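For option (1), note that /etc/security/limits.conf is enforced by PAM at login, so it typically does not apply to daemons started from init scripts; a wrapper that sets a ulimit before exec'ing the daemon is a more direct route. A minimal sketch - the minidlna path and the 128 MB cap are assumptions, not values tested on this firmware:

```shell
#!/bin/bash
# Hypothetical wrapper: cap minidlna's virtual address space before starting
# it, so allocation failures hit minidlna itself instead of triggering the
# system-wide OOM killer. 131072 kB = 128 MB (a guess; tune to taste).
MINIDLNA=/usr/sbin/minidlna   # path is an assumption
ulimit -v 131072
if [ -x "$MINIDLNA" ]; then
    exec "$MINIDLNA" -f /etc/minidlna.conf
fi
```

The limit is inherited across exec, so everything minidlna allocates counts against the cap.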
3 Replies
Replies have been turned off for this discussion
- luite (Aspirant): Update - I've found 2 possible approaches.
1) Using cgroups - but this is not supported by the kernel
2) using /proc/.../oom_adj and /proc/.../oom_score
For the latter, the strange thing is that the oom_adj value is already set to -17, which is the hardcoded value for 'never kill'.. huh?

root@readynas:/proc/1412# ps ax|grep sshd
1412 ? Ss 0:00 /usr/sbin/sshd
2288 ? Ss 0:00 sshd: root@pts/0
3017 ? Ss 0:00 sshd: root@pts/1
6995 pts/1 S+ 0:00 grep sshd
root@readynas:/proc/1412# cat oom_score
0
root@readynas:/proc/1412# cat oom_adj
-17
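Writing to /proc/<pid>/oom_adj can be scripted for the processes worth protecting or sacrificing. A sketch, assuming the 2.6.x interface (oom_adj, valid range -17..15; newer kernels expose oom_score_adj instead) - set_oom_adj is a hypothetical helper, and PROC_ROOT is parameterised only so the logic can be exercised outside /proc:

```shell
#!/bin/bash
# Adjust the OOM killer's preference per process name.
# -17 exempts a process entirely; 15 makes it the first candidate.
PROC_ROOT=${PROC_ROOT:-/proc}

set_oom_adj() {   # usage: set_oom_adj <value> <process name>
    local val=$1 name=$2 pid
    for pid in $(pidof "$name"); do
        if ! { echo "$val" > "$PROC_ROOT/$pid/oom_adj"; } 2>/dev/null; then
            echo "could not adjust $name (pid $pid)" >&2
        fi
    done
}

set_oom_adj -17 sshd      # -17 = never kill
set_oom_adj  15 minidlna  #  15 = first in line for the OOM killer
```

Unlike renicing, this only changes the OOM killer's scoring, not scheduling priority.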
Scrutinizing /var/log/syslog more closely, I see only the apache-ssl process being killed, which might mean the openssh server was merely unresponsive. Would that suggest I should make the OOM killer (out-of-memory killer) even more aggressive? It looks like after killing the apache-ssl processes some RAM is available, but the swap is fully used. Is that something to investigate?

Mar 31 04:03:44 readynas kernel: Out of memory: kill process 27666 (apache-ssl) score 26300 or a child
Mar 31 04:03:44 readynas kernel: Killed process 27666 (apache-ssl)
Mar 31 04:03:44 readynas kernel: mysql invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
Mar 31 04:03:44 readynas kernel: [<c002ecf0>] (unwind_backtrace+0x0/0xdc) from [<c009d268>] (oom_kill_process+0x58/0x1b4)
Mar 31 04:03:44 readynas kernel: [<c009d268>] (oom_kill_process+0x58/0x1b4) from [<c009d7bc>] (__out_of_memory+0x160/0x180)
Mar 31 04:03:44 readynas kernel: [<c009d7bc>] (__out_of_memory+0x160/0x180) from [<c009d840>] (out_of_memory+0x64/0x98)
Mar 31 04:03:44 readynas kernel: [<c009d840>] (out_of_memory+0x64/0x98) from [<c009ffdc>] (__alloc_pages_nodemask+0x3f0/0x4dc)
Mar 31 04:03:44 readynas kernel: [<c009ffdc>] (__alloc_pages_nodemask+0x3f0/0x4dc) from [<c00a1e28>] (__do_page_cache_readahead+0x8c/0x1d8)
Mar 31 04:03:44 readynas kernel: [<c00a1e28>] (__do_page_cache_readahead+0x8c/0x1d8) from [<c00a1f98>] (ra_submit+0x24/0x28)
Mar 31 04:03:44 readynas kernel: [<c00a1f98>] (ra_submit+0x24/0x28) from [<c009b1c0>] (filemap_fault+0x1b0/0x378)
Mar 31 04:03:44 readynas kernel: [<c009b1c0>] (filemap_fault+0x1b0/0x378) from [<c00abb38>] (__do_fault+0x50/0x3bc)
Mar 31 04:03:44 readynas kernel: [<c00abb38>] (__do_fault+0x50/0x3bc) from [<c00acd88>] (handle_mm_fault+0x248/0x588)
Mar 31 04:03:44 readynas kernel: [<c00acd88>] (handle_mm_fault+0x248/0x588) from [<c002fcf0>] (do_page_fault+0xe0/0x22c)
Mar 31 04:03:44 readynas kernel: [<c002fcf0>] (do_page_fault+0xe0/0x22c) from [<c0028230>] (do_DataAbort+0x30/0x90)
Mar 31 04:03:44 readynas kernel: [<c0028230>] (do_DataAbort+0x30/0x90) from [<c0028f1c>] (ret_from_exception+0x0/0x10)
Mar 31 04:03:44 readynas kernel: Exception stack(0xc517dfb0 to 0xc517dff8)
Mar 31 04:03:44 readynas kernel: dfa0: 007b9c5b 0000000b 0000001b 4009a1a8
Mar 31 04:03:44 readynas kernel: dfc0: 00000003 403fbc20 00000000 0f738b7d 0000000e 00000000 40025000 40023000
Mar 31 04:03:44 readynas kernel: dfe0: 0000006e bee34688 40009840 400094c8 20000010 ffffffff
Mar 31 04:03:44 readynas kernel: Mem-info:
Mar 31 04:03:44 readynas kernel: Normal per-cpu:
Mar 31 04:03:44 readynas kernel: CPU 0: hi: 90, btch: 15 usd: 65
Mar 31 04:03:44 readynas kernel: Active_anon:25457 active_file:131 inactive_anon:25501
Mar 31 04:03:44 readynas kernel: inactive_file:172 unevictable:8 dirty:0 writeback:339 unstable:0
Mar 31 04:03:44 readynas kernel: free:4688 slab:3355 mapped:19 pagetables:630 bounce:0
Mar 31 04:03:44 readynas kernel: Normal free:18752kB min:16384kB low:20480kB high:24576kB active_anon:101828kB inactive_anon:102004kB active_file:524kB inactive_file:688kB unevictable:32kB present:260096kB pages_scanned:32 all_unreclaimable? no
Mar 31 04:03:44 readynas kernel: lowmem_reserve[]: 0 0
Mar 31 04:03:44 readynas kernel: Normal: 374*4kB 89*8kB 36*16kB 1*32kB 15*64kB 7*128kB 3*256kB 4*512kB 1*1024kB 1*2048kB 0*4096kB 1*8192kB 0*16384kB 0*32768kB 0*65536kB 0*131072kB 0*262144kB 0*524288kB 0*1048576kB = 18752kB
Mar 31 04:03:44 readynas kernel: 2062 total pagecache pages
Mar 31 04:03:44 readynas kernel: 1743 pages in swap cache
Mar 31 04:03:44 readynas kernel: Swap cache stats: add 186888, delete 185145, find 128736/138061
Mar 31 04:03:44 readynas kernel: Free swap = 0kB
Mar 31 04:03:44 readynas kernel: Total swap = 524268kB
Mar 31 04:03:44 readynas kernel: 65536 pages of RAM
Mar 31 04:03:44 readynas kernel: 4988 free pages
Mar 31 04:03:44 readynas kernel: 3841 reserved pages
Mar 31 04:03:44 readynas kernel: 2076 slab pages
Mar 31 04:03:44 readynas kernel: 299 pages shared
Mar 31 04:03:44 readynas kernel: 1743 pages swap cached
Mar 31 04:03:44 readynas kernel: Out of memory: kill process 27676 (apache-ssl) score 26234 or a child

- luite (Aspirant): Update 2 - It looks like processes that are 'niced' are more likely to be killed, due to the way the out-of-memory killer calculates its score. It also appears that the ReadyNAS is configured to run apache-ssl with a nice value. That is generally a good idea, but it also means the kernel is more likely to kill apache-ssl than minidlna.
So far my best bet is (1) not to use minidlna at all, or (2) to make sure those processes are 'niced' more strongly than apache-ssl. Moreover, if the real memory hog can be killed, that should also mean the system becomes/remains responsive... I hope?
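One way to sanity-check that theory is to read the kernel's current badness scores directly; on this 2.6.x kernel each process exposes one in /proc/<pid>/oom_score, and the highest-scoring process is killed first. A quick comparison, using the process names from this thread as examples:

```shell
#!/bin/bash
# Print the OOM killer's current badness score for each candidate process;
# higher score = killed first. Processes that are not running are skipped.
for name in apache-ssl minidlna sshd; do
    for pid in $(pidof "$name"); do
        echo "$name pid=$pid oom_score=$(cat /proc/$pid/oom_score)"
    done
done
```

Running this before and after renicing shows directly how much the nice value shifts the score.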
Here's the rather inelegant script I've put in /etc/rc2.d/S99renice to try to accomplish this:
#!/bin/bash
# Renice minidlna -> high value = low priority and more likely to be killed
# by the OOM (out-of-memory) killer
if [ ! -z "`pidof minidlna`" ]
then
    renice -n 10 `pidof minidlna`
    echo "Reniced minidlna to 10"
else
    echo "minidlna not running while trying to renice"
fi
# Renice sshd -> low value = high priority and less likely to be killed by OOM
if [ ! -z "`pidof sshd`" ]
then
    renice -n -1 `pidof sshd`
    echo "Reniced sshd to -1"
else
    echo "sshd not running while trying to renice"
fi

- chirpa (Luminary): Your script is a good first step.
readynasd looks to have a lot of memory leaks if I watch it over time.
If you are running backup jobs in the GUI that use 'cp' to copy files somewhere, like a CIFS backup job, there is a known memory-leak issue there as well. The remote install as of 5.3.7 is using an old coreutils whose cp will eventually eat all memory and cause OOMs; viewtopic.php?p=371803#p371803
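A slow leak like that is easy to confirm with a tiny logger run from cron: if the resident set size of a process only ever grows between snapshots, it is leaking. A sketch, assuming a standard procfs - the log path and the process list are placeholders:

```shell
#!/bin/bash
# Append a timestamped resident-memory (VmRSS) snapshot of the suspect
# processes to a log; a leak shows up as a steadily growing rss value.
LOG=/tmp/memwatch.log   # placeholder; use a persistent path in practice
{
    date
    for name in readynasd minidlna apache-ssl; do
        for pid in $(pidof "$name"); do
            # VmRSS is the process's resident memory in kB
            rss=$(awk '/^VmRSS/ {print $2}' "/proc/$pid/status")
            echo "  $name pid=$pid rss=${rss:-?}kB"
        done
    done
} >> "$LOG"
```

A crontab entry like `*/15 * * * * /root/memwatch.sh` would give a data point every 15 minutes.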