2006-11-15 15:08:27

by Fabio Coatti

Subject: strange behaviour with x86_64

Hi all,
I'm seeing strange behaviour on a dual Opteron machine (or at least it seems
so to me) and I need some advice.
The hardware is: dual opteron 2.4, dual core, 2GB ram
This machine runs the Apache web server with mod_perl. Due to some bug in the
Perl code, the machine often runs out of memory, and the OOM killer triggers.
There is nothing wrong with that, but while this happens the machine freezes:
no activity is possible, and it responds only to ping packets and to SysRq
commands. Even the console login freezes.
The only possible recovery is to reboot the machine. To collect more
information, I tried reducing the swap space from 2 GB to 200 MB, and the
lockup does in fact happen more often.
It seems to me that when the kernel runs out of memory (swap + physical), it
is unable to recover in any way, even if the OOM killer removes a good number
of memory-hogging processes.
Maybe this is the usual behaviour, but I'm a little perplexed.
I've used netconsole to collect more data, and I'm attaching SysRq-T output
obtained during the "lockup" phase. Thanks to Andrea for some useful hints on
how to use the netconsole facility :)
Let me know if you need more information, or if this is a "no issue"
situation :)

The SysRq log file can be found at this URL:

http://members.ferrara.linux.it/cova/sysreq.log


Speaking of netconsole, I've been able to collect information from SysRq, but
no boot messages are sent over the network; I can see messages on the remote
machines only once userland comes up. Does anyone have any hints about this?

Thanks.


--
Fabio "Cova" Coatti http://members.ferrara.linux.it/cova
Ferrara Linux Users Group http://ferrara.linux.it
GnuPG fp:9765 A5B6 6843 17BC A646 BE8C FA56 373A 5374 C703
Old SysOps never die... they simply forget their password.


2006-11-16 12:38:54

by Oleg Verych

Subject: Re: strange behaviour with x86_64

On 2006-11-15, Fabio Coatti wrote:
> Hi all,

I'm just trying to move things forward; I'm not a developer.

> I'm seeing strange behaviour on a dual Opteron machine (or at least it seems
> so to me) and I need some advice.
> The hardware is: dual opteron 2.4, dual core, 2GB ram
> This machine runs the Apache web server with mod_perl. Due to some bug in the
> Perl code, the machine often runs out of memory, and the OOM killer triggers.

You seem to run a very interesting distribution. When reporting
*kernel* bugs (or similar things), try to follow REPORTING-BUGS (in the top
directory of the kernel sources): include version information (the output of
ver_linux), and add the memory management list from the MAINTAINERS file to
the CC list.

As for me, I think a kernel in an OOM condition is a very bad thing. I don't
think the "OOM killer" idea works in general (although it does help to link
an allyesconfig kernel on 1G without swap ;).

Thus, I suggest you experiment with the settings described in
"Documentation/vm/overcommit-accounting".
____
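
For anyone who wants to look at the overcommit settings before changing them,
here is a minimal sketch in Python. It assumes a Linux /proc that exposes
/proc/sys/vm/overcommit_memory, /proc/sys/vm/overcommit_ratio and
/proc/meminfo. The function names (parse_meminfo, strict_commit_limit_kb,
report) are hypothetical, invented for this illustration. The limit formula is
the simplified "swap + a percentage of physical RAM" rule for strict mode
(overcommit_memory = 2) and ignores details such as huge pages.

```python
from pathlib import Path

# Meaning of the values in /proc/sys/vm/overcommit_memory.
MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <number> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def strict_commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Simplified strict-mode limit: swap + ratio% of physical RAM."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def report(proc=Path("/proc")):
    """Print the current overcommit mode and commit figures."""
    mode = int((proc / "sys/vm/overcommit_memory").read_text())
    ratio = int((proc / "sys/vm/overcommit_ratio").read_text())
    mem = parse_meminfo((proc / "meminfo").read_text())
    limit = strict_commit_limit_kb(
        mem.get("MemTotal", 0), mem.get("SwapTotal", 0), ratio
    )
    print("overcommit_memory:", mode, "-", MODES.get(mode, "unknown"))
    print("overcommit_ratio: ", ratio, "%")
    print("estimated strict limit:", limit, "kB")
    # Committed_AS is what the kernel currently counts as committed.
    print("Committed_AS:          ", mem.get("Committed_AS"), "kB")


if __name__ == "__main__":
    # Only meaningful on Linux; skip quietly elsewhere.
    if Path("/proc/sys/vm/overcommit_memory").exists():
        report()
```

With strict accounting and the default ratio of 50, a machine with 2 GB of RAM
and 200 MB of swap would get a commit limit of roughly 1.2 GB, so allocations
would start failing long before the OOM killer is needed. Whether that trade
is acceptable for an Apache/mod_perl box is something to test, not assume.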