2001-11-22 15:23:21

by Denis Vlasenko

Subject: OOM killer in 2.4.15pre1 still not 100% ok

Today I saw the OOM killer in action for the very first time.
I just want to report that it is still not 100% ok (IMHO):

I reconfigured my test box for NFS root fs operation
and turned off swap. The box has 128M RAM.

At the time of the OOM I was in Midnight Commander on a normal vc (not an xterm).
X+KDE (Kmail and a couple of Konquerors) was loaded, but I wasn't working
in X at that moment.

The OOM killer killed top. I have top permanently running on vc10.
I presume it was neither a big nor a newly started process, so I don't think it was
the right candidate to kill.

Last top screen is below.
(How nice: OOM killer taking snapshots of processes at kill time,
i.e. aids in its own debugging! :-)
(Oh, nice idea: dump top-like info in syslog after each OOM kill?)
--
vda

5:01pm up 1:08, 3 users, load average: 0.18, 0.10, 0.06
61 processes: 58 sleeping, 2 running, 1 zombie, 0 stopped
CPU states: 1.9% user, 15.6% system, 0.0% nice, 82.3% idle
Mem: 126272K av, 123428K used, 2844K free, 0K shrd, 16K buff
Swap: 0K av, 0K used, 0K free 47748K cached

  PID USER     PRI  NI  SIZE   RSS SHARE STAT LIB %CPU %MEM   TIME COMMAND
    4 root      15   0     0     0     0 SW     0 13.8  0.0   0:02 kswapd
  974 root       9   0  4852  4852  4724 S      0  2.3  3.8   0:17 mpg123
  693 user0      9   0  1440  1440  1232 R      0  0.3  1.1   0:19 top c s
  790 root       9   0  5052  5052  4756 S      0  0.1  4.0   0:08 kdeinit: kded
  816 root       9   0  9428  9428  8080 S      0  0.1  7.4   0:01 kdeinit: kdesktop
  975 root       9   0  1708  1708  1592 S      0  0.1  1.3   0:00 mpg123
    1 root       8   0   188   188   160 S      0  0.0  0.1   0:05 init
    2 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 keventd
    3 root      19  19     0     0     0 SWN    0  0.0  0.0   0:00 ksoftirqd_CPU0
    5 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 bdflush
    6 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 kupdated
    7 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 eth0
    8 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 rpciod
   14 root       9   0   580   580   496 S      0  0.0  0.4   0:00 devfsd /dev
  687 root       9   0   624   624   528 S      0  0.0  0.4   0:00 syslogd
  690 root       9   0  1120  1120   444 S      0  0.0  0.8   0:00 klogd -c 3
  702 root       9   0   124   124    96 S      0  0.0  0.0   0:00 dhcpcd -t 20 -R -d eth1
  734 rpc        9   0   644   644   540 S      0  0.0  0.5   0:00 rpc.portmap
  736 root       9   0   556   556   484 S      0  0.0  0.4   0:00 inetd
  738 root       9   0   664   664   564 S      0  0.0  0.5   0:00 automount --timeout 5 /mnt/auto
  740 root       9   0   488   488   420 S      0  0.0  0.3   0:00 gpm -2 -m /dev/psaux -t ps2
  741 root       9   0   484   484   424 S      0  0.0  0.3   0:00 /sbin/agetty 38400 tty1 linux
  743 root       9   0  1168  1168   892 S      0  0.0  0.9   0:00 -bash
  745 root       9   0  1168  1168   892 S      0  0.0  0.9   0:00 -bash
  748 root       9   0   484   484   424 S      0  0.0  0.3   0:00 /sbin/agetty 38400 tty4 linux
  749 root       9   0   484   484   424 S      0  0.0  0.3   0:00 /sbin/agetty 38400 tty5 linux
  752 root       9   0   484   484   424 S      0  0.0  0.3   0:00 /sbin/agetty 38400 tty6 linux
  753 root       9   0   484   484   424 S      0  0.0  0.3   0:00 /sbin/agetty 38400 tty7 linux
  754 root       9   0  1116  1116   820 S      0  0.0  0.8   0:00 nmbd -l/var/log/samba/nmbd.log -
  755 root       9   0  1168  1168  1060 S      0  0.0  0.9   0:00 -bash
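
As a rough check on the "neither big" claim, a few rows of the snapshot above can be ranked by RSS. This is only a sketch: the rows are a hand-copied subset of the listing (which itself shows only part of the 61 processes), and ranking by RSS is an assumption made for illustration, not the kernel's actual selection rule.

```python
# (pid, command, rss_kb) for a few rows copied from the top snapshot above.
# This is a subset chosen for illustration, not the complete process list.
rows = [
    (974, "mpg123", 4852),
    (693, "top", 1440),
    (790, "kdeinit: kded", 5052),
    (816, "kdeinit: kdesktop", 9428),
    (975, "mpg123", 1708),
    (1, "init", 188),
    (690, "klogd", 1120),
    (743, "-bash", 1168),
    (754, "nmbd", 1116),
]


def rank_by_rss(rows):
    """Return the rows sorted from largest to smallest resident set size."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_by_rss(rows)
# 1-based position of top (pid 693) in the size ranking.
top_rank = [pid for pid, _, _ in ranked].index(693) + 1

for pid, command, rss_kb in ranked:
    print(f"{pid:>5} {rss_kb:>6}K {command}")
print(f"top is number {top_rank} of {len(ranked)} by RSS")
```

Even within this small sample, several processes are a few times larger than top, which is consistent with the observation that top was not an obviously "big" candidate.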


2001-11-22 16:16:39

by Ryan Cumming

Subject: Re: OOM killer in 2.4.15pre1 still not 100% ok

On November 22, 2001 11:22, vda wrote:
> Today I saw the OOM killer in action for the very first time.
> I just want to report that it is still not 100% ok (IMHO):
>
> I reconfigured my test box for NFS root fs operation
> and turned off swap. The box has 128M RAM.
...

> 5:01pm up 1:08, 3 users, load average: 0.18, 0.10, 0.06
> 61 processes: 58 sleeping, 2 running, 1 zombie, 0 stopped
> CPU states: 1.9% user, 15.6% system, 0.0% nice, 82.3% idle
> Mem: 126272K av, 123428K used, 2844K free, 0K shrd, 16K buff
> Swap: 0K av, 0K used, 0K free 47748K cached

Er, with almost 3megs of free memory and -47megs- of cache, the problem isn't
with the OOM killer's selection, but the fact it was triggered with nearly
half the RAM still usable. Do you actually know the OOM killer was triggered?
Or did top just mysteriously exit?

Personally, I've found the new 2.4.14+ OOM killer to be highly accurate, I've
run ext3's shared mappings torture test on my 384meg RAM (256meg swap) box
repeatedly, and it pushes my computer to OOM every 60 seconds or so. It
always kills the correct process, with my box remaining absolutely stable,
even with less swap space than RAM.

-Ryan
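
The arithmetic behind the "nearly half the RAM still usable" remark can be checked directly from the two summary lines quoted above. The sketch below assumes that every cached page could be reclaimed, which is an optimistic upper bound rather than something the thread establishes; the helper name `field_kb` is made up for this example.

```python
import re

# Summary lines copied from the quoted top output.
MEM_LINE = "Mem: 126272K av, 123428K used, 2844K free, 0K shrd, 16K buff"
SWAP_LINE = "Swap: 0K av, 0K used, 0K free 47748K cached"


def field_kb(line, label):
    """Extract the '<number>K <label>' value (in KB) from a top summary line."""
    match = re.search(r"(\d+)K " + re.escape(label), line)
    if match is None:
        raise ValueError(f"no {label!r} field in {line!r}")
    return int(match.group(1))


total = field_kb(MEM_LINE, "av")
free = field_kb(MEM_LINE, "free")
cached = field_kb(SWAP_LINE, "cached")

# Assumption: all cached memory is reclaimable (optimistic upper bound).
usable = free + cached
fraction = usable / total

print(f"{usable}K of {total}K ({fraction:.1%}) free or cached")
# prints: 50592K of 126272K (40.1%) free or cached
```

Under that assumption, roughly 40% of RAM was either free or in the cache at the time of the snapshot.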

2001-11-22 17:43:57

by Denis Vlasenko

Subject: Re: OOM killer in 2.4.15pre1 still not 100% ok

On Thursday 22 November 2001 14:15, Ryan Cumming wrote:
> On November 22, 2001 11:22, vda wrote:
> > Today I saw the OOM killer in action for the very first time.
> > I just want to report that it is still not 100% ok (IMHO):
> >
> > I reconfigured my test box for NFS root fs operation
> > and turned off swap. The box has 128M RAM.
>
> ...
>
> > 5:01pm up 1:08, 3 users, load average: 0.18, 0.10, 0.06
> > 61 processes: 58 sleeping, 2 running, 1 zombie, 0 stopped
> > CPU states: 1.9% user, 15.6% system, 0.0% nice, 82.3% idle
> > Mem: 126272K av, 123428K used, 2844K free, 0K shrd, 16K buff
> > Swap: 0K av, 0K used, 0K free 47748K cached
>
> Er, with almost 3megs of free memory and -47megs- of cache, the problem
> isn't with the OOM killer's selection, but the fact it was triggered with
> nearly half the RAM still usable. Do you actually know the OOM killer was
> triggered? Or did top just mysteriously exit?

I have no swap at all.
There was a message "Out of Memory: killing process top" or something like that
on screen.
--
vda

2001-11-22 20:10:54

by Mike Galbraith

Subject: Re: OOM killer in 2.4.15pre1 still not 100% ok

On Thu, 22 Nov 2001, vda wrote:

> Today I saw the OOM killer in action for the very first time.
> I just want to report that it is still not 100% ok (IMHO):

With no swap, it seems pretty frisky in pre9. I can't start X/KDE
with a 64MB RAM boot. Touching the end of RAM is very deadly at the moment
with no swap enabled. I can't say it's wrong behavior though, as
I am touching the end. (I think I'd _rather_ it die than thrash)

-Mike

2001-11-23 11:03:09

by Denis Vlasenko

Subject: Re: OOM killer in 2.4.15pre1 still not 100% ok

On Thursday 22 November 2001 14:15, Ryan Cumming wrote:
> Personally, I've found the new 2.4.14+ OOM killer to be highly accurate,
> I've run ext3's shared mappings torture test on my 384meg RAM (256meg swap)
> box repeatedly, and it pushes my computer to OOM every 60 seconds or so. It
> always kills the correct process, with my box remaining absolutely stable,
> even with less swap space than RAM.

The OOM killer did it again. This time I ran the glib 1.2.10 configure script, and
while it was checking whether I have a working -static, the OOM killer again killed
my top (looks like it really likes my top :-). In the last screen of poor top you
can see the newly started ld, which is about to trigger OOM within 5 secs.

Maybe I misunderstand something, but why did the OOM killer choose top? Is this
how it is intended to work?

PS. I thought 128 Megs of RAM is enough for X+KDE+gcc...
Do we all have to play Gig'o'Rama these days?
[Yes I know that swap will help, let's not start a thread...]
--
vda

Out of Memory: Killed process 726 (top)

12:20pm up 1:12, 4 users, load average: 0.39, 0.20, 0.11
67 processes: 62 sleeping, 5 running, 0 zombie, 0 stopped
CPU states: 12.4% user, 46.8% system, 0.0% nice, 40.7% idle
Mem: 126272K av, 124856K used, 1416K free, 0K shrd, 0K buff
Swap: 0K av, 0K used, 0K free 45664K cached

  PID USER     PRI  NI  SIZE   RSS SHARE STAT LIB %CPU %MEM   TIME COMMAND
    4 root      17   0     0     0     0 SW     0 14.5  0.0   0:05 kswapd
 2071 root      14   0  2340  2340   728 R      0 11.0  1.8   0:00 /usr/bin/ld -m elf_i386 -static
  900 root      13   0  1916  1916  1788 R      0  3.5  1.5   0:35 mpg123
 1964 root       9   0   968   968   856 S      0  2.1  0.7   0:00 /bin/sh ../ltconfig --no-reexec
  868 root       9   0  7008  7008  6212 S      0  1.9  5.5   0:00 kdeinit: klipper -icon klipper -
 1779 root       9   0   992   992   740 S      0  1.7  0.7   0:00 /bin/sh ../configure --prefix=/u
 2067 root       9   0   496   496   400 S      0  1.1  0.3   0:00 gcc -o conftest -g -O2 -static c
    8 root      10   0     0     0     0 SW     0  0.7  0.0   0:00 rpciod
  847 root      13   0  2272  2268  1580 R      0  0.7  1.7   0:00 artsd -F 10 -S 4096
  726 user0      9   0  1340  1340  1128 R      0  0.5  1.0   0:21 top c s
 2070 root       9   0   456   456   388 S      0  0.5  0.3   0:00 /usr/lib/gcc301/lib/gcc-lib/i686
  820 root       9   0 47012   29M  1156 S      0  0.3 24.1   1:56 X :0 -layout mga
  901 root      10   0  1684  1684  1568 S      0  0.3  1.3   0:01 mpg123
  844 root       9   0  5016  5016  4720 S      0  0.1  3.9   0:08 kdeinit: kded
  860 root       9   0  9804  9804  8368 S      0  0.1  7.7   0:01 kdeinit: kdesktop
    1 root       8   0    52    52    24 S      0  0.0  0.0   0:05 init
    2 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 keventd
    3 root      19  19     0     0     0 SWN    0  0.0  0.0   0:00 ksoftirqd_CPU0
    5 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 bdflush
    6 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 kupdated
    7 root       9   0     0     0     0 SW     0  0.0  0.0   0:00 eth0
   40 root       9   0   560   560   476 S      0  0.0  0.4   0:00 devfsd /dev
  720 root       9   0   624   624   528 S      0  0.0  0.4   0:00 syslogd
  723 root       9   0  1100  1100   424 S      0  0.0  0.8   0:00 klogd -c 3
  735 root       9   0   124   124    96 S      0  0.0  0.0   0:00 dhcpcd -t 20 -R -d eth1
  767 rpc        9   0   644   644   540 S      0  0.0  0.5   0:00 rpc.portmap
  769 root       9   0   544   544   472 S      0  0.0  0.4   0:00 inetd
  773 root       9   0   472   472   404 S      0  0.0  0.3   0:00 gpm -2 -m /dev/psaux -t ps2
  790 root       9   0   484   484   424 S      0  0.0  0.3   0:00 /sbin/agetty 38400 tty1 linux
  791 root       9   0  1168  1168   892 S      0  0.0  0.9   0:00 -bash

2001-11-26 10:25:31

by Helge Hafting

Subject: Re: OOM killer in 2.4.15pre1 still not 100% ok

vda wrote:

> Maybe I misunderstand something, but why did the OOM killer choose top? Is this
> how it is intended to work?

It is intended to do the least possible damage when killing
something. I'd say it does nicely when killing "top"; you
surely don't lose much work that way. :-)

Helge Hafting