2005-10-31 00:09:08

by Rob Landley

Subject: echo 0 > /proc/sys/vm/swappiness triggers OOM killer under 2.6.14.

Under 2.6.14 (UML), I have a workload that runs with 64 megs ram and 256 megs
swap space. It completes (albeit swapping like mad) with swappiness at the
default 60, but if I set it to 0 the OOM killer kicks in and the script
aborts.

The workload is basically compiling gcc 4.0.2 with gcc 3.3.2. Now gcc is a pig
(hence the reason for feeding it 256 megs of swap space), but twiddling
swappiness shouldn't make the difference between success and failure.

Why does the OOM killer ever trigger when there are _any_ dirty pages queued
up for DMA to an existing local block device? (Or when there is SWAP SPACE
LEFT?) This is memory that will be freed in time, thus the system isn't
guaranteed to hang yet. Don't we only need to trigger the OOM killer if the
alternative is the system hanging?

Rob

P.S. Not only is this repeatable, but I have a script that I run that
downloads the UML and gcc sources, builds UML, and tries to build GCC under
it. I can put this up somewhere if anybody would like to try to reproduce
this themselves...
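
For anyone reproducing this, the only knob being changed is the value in /proc/sys/vm/swappiness. Below is a minimal Python sketch of reading and setting it, equivalent to the "echo 0 >" line in the subject. The function names are hypothetical, the path argument exists only so the helpers can be tried against a scratch file, and the 0..100 range check is an assumption based on 2.6-era kernels.

```python
from pathlib import Path

# Standard procfs location of the tunable discussed in this thread.
SWAPPINESS = Path("/proc/sys/vm/swappiness")


def read_swappiness(path=SWAPPINESS):
    """Return the current swappiness value as an int."""
    return int(Path(path).read_text().strip())


def write_swappiness(value, path=SWAPPINESS):
    """Write a new swappiness value; needs root on a real system."""
    # Assumption: 2.6-era kernels accept 0..100 for this sysctl.
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be between 0 and 100")
    Path(path).write_text(f"{value}\n")
```

Calling write_swappiness(0) before the build corresponds to the failing case described above, and leaving the default of 60 corresponds to the passing one.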


2005-11-01 04:06:47

by Robert Hancock

Subject: Re: echo 0 > /proc/sys/vm/swappiness triggers OOM killer under 2.6.14.

Rob Landley wrote:
> Under 2.6.14 (UML), I have a workload that runs with 64 megs ram and 256 megs
> swap space. It completes (albeit swapping like mad) with swappiness at the
> default 60, but if I set it to 0 the OOM killer kicks in and the script
> aborts.

You should get some debugging output in dmesg when the OOM killer kicks
in, can you post this?

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2005-11-01 08:41:06

by Rob Landley

Subject: Re: echo 0 > /proc/sys/vm/swappiness triggers OOM killer under 2.6.14.

On Monday 31 October 2005 22:05, Robert Hancock wrote:
> Rob Landley wrote:
> > Under 2.6.14 (UML), I have a workload that runs with 64 megs ram and 256
> > megs swap space. It completes (albeit swapping like mad) with swappiness
> > at the default 60, but if I set it to 0 the OOM killer kicks in and the
> > script aborts.
>
> You should get some debugging output in dmesg when the OOM killer kicks
> in, can you post this?

Sure:

VFS: Mounted root (hostfs filesystem).
Adding 262136k swap on tmp/ubda. Priority:-1 extents:1 across:262136k
oom-killer: gfp_mask=0x400d2, order=0
Mem-info:
DMA per-cpu:
cpu 0 hot: low 14, high 42, batch 7 used:20
cpu 0 cold: low 0, high 14, batch 7 used:8
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 1416kB (0kB HighMem)
Active:14014 inactive:718 dirty:1 writeback:0 unstable:0 free:354 slab:468
mapped:14722 pagetables:58
DMA free:1416kB min:1024kB low:1280kB high:1536kB active:56056kB
inactive:2872kB present:65536kB pages_scanned:26577 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 98*4kB 0*8kB 0*16kB 0*32kB 4*64kB 2*128kB 0*256kB 1*512kB 0*1024kB
0*2048kB 0*4096kB = 1416kB
Normal: empty
HighMem: empty
Swap cache: add 5689, delete 5485, find 583/725, race 0+0
Free swap = 255420kB
Total swap = 262136kB
Free swap: 255420kB
16384 pages of RAM
0 pages of HIGHMEM
671 reserved pages
1034 pages shared
204 pages swap cached
Out of Memory: Killed process 30055 (cc1).
Badness in handle_page_fault
at /home/landley/newbuild/firmware-build/tmpdir/linux-2.6.14/arch/um/kernel/trap_kern.c:98
08bafb10: [<0805d064>] handle_page_fault+0x1c4/0x260
08bafb50: [<0805d1a7>] segv+0xa7/0x2e0
08bafb70: [<0805e713>] setjmp_wrapper+0x83/0x90
08bafba8: [<0805e6c7>] setjmp_wrapper+0x37/0x90
08bafbd0: [<0805b1a5>] change_signals+0x65/0x90
08bafbe0: [<08060d3a>] do_ops+0x14a/0x150
08bafc10: [<0805f29f>] wait_stub_done+0x5f/0x110
08bafc40: [<0805d67c>] segv_handler+0x7c/0x90
08bafc60: [<08060fbe>] user_signal+0x3e/0x70
08bafc80: [<0805faa0>] userspace+0x180/0x220
08bafcf0: [<08060711>] fork_handler+0xc1/0xe0

System halted.

2005-11-02 00:07:29

by Robert Hancock

Subject: Re: echo 0 > /proc/sys/vm/swappiness triggers OOM killer under 2.6.14.

Rob Landley wrote:
>>>Under 2.6.14 (UML), I have a workload that runs with 64 megs ram and 256
>>>megs swap space. It completes (albeit swapping like mad) with swappiness
>>>at the default 60, but if I set it to 0 the OOM killer kicks in and the
>>>script aborts.
>>
>>You should get some debugging output in dmesg when the OOM killer kicks
>>in, can you post this?

> oom-killer: gfp_mask=0x400d2, order=0

OK, nothing really special about this..

> Free pages: 1416kB (0kB HighMem)
> Active:14014 inactive:718 dirty:1 writeback:0 unstable:0 free:354 slab:468
> mapped:14722 pagetables:58
> DMA free:1416kB min:1024kB low:1280kB high:1536kB active:56056kB
> inactive:2872kB present:65536kB pages_scanned:26577 all_unreclaimable? no

It looks like some memory is available here, but likely some UML person
would have to say for sure..

> Out of Memory: Killed process 30055 (cc1).
> Badness in handle_page_fault
> at /home/landley/newbuild/firmware-build/tmpdir/linux-2.6.14/arch/um/kernel/trap_kern.c:98

You likely need a UML person for this part too :-)

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
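
The "some memory is available" reading can be checked with plain arithmetic on the figures in the OOM report. The sketch below only re-adds numbers copied from the dump. The 4 kB page size is taken from the "98*4kB" buddy-list entry, and the variable names are made up for illustration. It does not explain why the allocator gave up; it only shows what the report says.

```python
PAGE_KB = 4  # page size, inferred from the "98*4kB" entry in the dump

# Figures copied from the DMA zone lines of the OOM report (all in kB).
free_pages = 354
zone = {"free": 1416, "min": 1024, "low": 1280, "high": 1536}

# Buddy allocator free list: block size in kB -> number of free blocks.
buddy = {4: 98, 8: 0, 16: 0, 32: 0, 64: 4, 128: 2,
         256: 0, 512: 1, 1024: 0, 2048: 0, 4096: 0}

free_kb_from_pages = free_pages * PAGE_KB
free_kb_from_buddy = sum(size * count for size, count in buddy.items())

# Swap totals from the "Total swap" and "Free swap" lines.
swap_total_kb, swap_free_kb = 262136, 255420
swap_used_kb = swap_total_kb - swap_free_kb

# Both ways of counting free memory should agree with the zone line.
print(free_kb_from_pages, free_kb_from_buddy, zone["free"])
# Where free memory sits relative to the three watermarks.
print(zone["free"] > zone["min"], zone["free"] > zone["low"],
      zone["free"] > zone["high"])
# Swap actually in use when the OOM killer fired.
print(swap_used_kb, "kB of", swap_total_kb, "kB swap used")
```

By these numbers the zone had 1416 kB free, above its min and low watermarks but below high, and only 6716 kB of the 262136 kB swap device was in use.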

2005-11-02 07:33:05

by Dave Jones

Subject: Re: echo 0 > /proc/sys/vm/swappiness triggers OOM killer under 2.6.14.

On Tue, Nov 01, 2005 at 02:37:01AM -0600, Rob Landley wrote:


> oom-killer: gfp_mask=0x400d2, order=0

something explicitly asked for a highmem page.

> 0 pages of HIGHMEM

You don't have any.

Calling the oom-killer in this situation seems drastic though.

Dave
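
For reference, the gfp_mask in the report can be split into its flag bits with a few lines of Python. The flag values below are as recalled from include/linux/gfp.h of roughly this kernel version, so treat the table as an assumption and check it against the header in the tree being tested. The function name decode_gfp is hypothetical.

```python
# Assumed flag values (include/linux/gfp.h, around 2.6.14); verify
# against the actual header before relying on them.
GFP_FLAGS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
    0x2000: "__GFP_NO_GROW",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
    0x10000: "__GFP_NOMEMALLOC",
    0x20000: "__GFP_NORECLAIM",
    0x40000: "__GFP_HARDWALL",
}


def decode_gfp(mask):
    """Return (flag names set in mask, bits not covered by the table)."""
    names = []
    leftover = mask
    for bit in sorted(GFP_FLAGS):
        if mask & bit:
            names.append(GFP_FLAGS[bit])
            leftover &= ~bit
    return names, leftover


names, leftover = decode_gfp(0x400D2)
print(" | ".join(names), "leftover:", hex(leftover))
```

Under that table, 0x400d2 comes out as __GFP_HIGHMEM, __GFP_WAIT, __GFP_IO, __GFP_FS and __GFP_HARDWALL, which looks like an ordinary user-page allocation. As far as I understand the allocator, __GFP_HIGHMEM means highmem is allowed for the request, with fallback to the lower zones when there is none.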

2005-11-02 19:13:43

by Rob Landley

Subject: Re: echo 0 > /proc/sys/vm/swappiness triggers OOM killer under 2.6.14.

On Wednesday 02 November 2005 01:32, Dave Jones wrote:
> On Tue, Nov 01, 2005 at 02:37:01AM -0600, Rob Landley wrote:
> > oom-killer: gfp_mask=0x400d2, order=0
>
> something explicitly asked for a highmem page.
>
> > 0 pages of HIGHMEM
>
> You don't have any.
>
> Calling the oom-killer in this situation seems drastic though.
>
> Dave

Except that the only difference between this test and the one that succeeds is
the value of "/proc/sys/vm/swappiness". With 60 it finishes, with 0 it
fails. The same binaries are being run by the same script, and in neither
case is there highmem in the kernel.

The test system is a User Mode Linux instance, running a shell script in place
of init. As a result, there are very few processes running in this system,
and only one is really active at a time.

At the failure point, the shell script calls the "make" of gcc 4.0.2, and far
and away the high point of memory usage is gcc's "genattrtab", which creates
and then compiles a .c file that causes the system to swap for about 5
minutes before it completes. (This is an extreme memory hog: Before I
started feeding UML a swap file, it couldn't complete with only 128 megs of
ram, but finished with 256. Now I'm telling UML mem=64M and attaching a 256
megabyte file to the Usermode Block Device driver, to act as a swap
partition.)

So at the point of failure, bash is blocked waiting on a child, make is
blocked waiting on a child, gcc is building its attrtab pig, and nothing else
(no daemons, not even init) is running on the system. It's a pretty
straightforward "the VM goes nuts in a low memory situation" case.

If you'd like to reproduce this, I can send you my build script. It's
self-contained, downloads all the source code it needs automatically, and
either succeeds or reproduces the problem quite deterministically depending
on whether the "echo 0 > /proc/sys/vm/swappiness" line is present.

Rob

2005-11-03 01:36:33

by Rob Landley

Subject: Re: echo 0 > /proc/sys/vm/swappiness triggers OOM killer under 2.6.14.

On Tuesday 01 November 2005 18:07, Robert Hancock wrote:
> > Free pages: 1416kB (0kB HighMem)
> > Active:14014 inactive:718 dirty:1 writeback:0 unstable:0 free:354
> > slab:468 mapped:14722 pagetables:58
> > DMA free:1416kB min:1024kB low:1280kB high:1536kB active:56056kB
> > inactive:2872kB present:65536kB pages_scanned:26577 all_unreclaimable? no
>
> It looks like some memory is available here, but likely some UML person
> would have to say for sure..

UML = User Mode Linux, i.e. the normal Linux kernel configured and built
with "ARCH=um", which makes for much less rebooting when testing this sort of
thing. I can try building an equivalent bootable kernel and re-running the
test there to confirm I get the same failure, if you'd like, but this part
of the kernel (the virtual memory subsystem) really shouldn't be affected by
running under UML.

> > Out of Memory: Killed process 30055 (cc1).
> > Badness in handle_page_fault
> > at
> > /home/landley/newbuild/firmware-build/tmpdir/linux-2.6.14/arch/um/kernel/
> >trap_kern.c:98
>
> You likely need a UML person for this part too :-)

The fact that the User Mode Linux people wrote their own trap handler? Do you
think that changes the trap being dumped, or is likely to alter the behavior
of the VM subsystem?

Rob