2006-02-26 14:40:24

by Chuck Ebbert

Subject: OOM-killer too aggressive?

Chris Largret is getting repeated OOM kills because of DMA memory
exhaustion:

oom-killer: gfp_mask=0xd1, order=3

Call Trace: <ffffffff8104ed46>{out_of_memory+58} <ffffffff8104ff30>{__alloc_pages+534}
<ffffffff8104ffee>{__get_free_pages+48} <ffffffff8117d8e9>{dma_mem_alloc+31}
<ffffffff81183e70>{floppy_open+348} <ffffffff81072125>{do_open+172}
<ffffffff810724b4>{blkdev_open+0} <ffffffff810724dc>{blkdev_open+40}
<ffffffff81069fea>{__dentry_open+230} <ffffffff8106a10e>{nameidata_to_filp+40}
<ffffffff8106a153>{do_filp_open+51} <ffffffff8106a2cb>{get_unused_fd+116}
<ffffffff8106a477>{do_sys_open+73} <ffffffff8106a4d3>{sys_open+27}
<ffffffff8100aa3a>{system_call+126}
Mem-info:
DMA per-cpu:
cpu 0 hot: high 0, batch 1 used:0
cpu 0 cold: high 0, batch 1 used:0
cpu 1 hot: high 0, batch 1 used:0
cpu 1 cold: high 0, batch 1 used:0
DMA32 per-cpu:
cpu 0 hot: high 186, batch 31 used:184
cpu 0 cold: high 62, batch 15 used:4
cpu 1 hot: high 186, batch 31 used:160
cpu 1 cold: high 62, batch 15 used:4
Normal per-cpu: empty
HighMem per-cpu: empty
Free pages: 2843384kB (0kB HighMem)
Active:10367 inactive:38871 dirty:42 writeback:0 unstable:0 free:710846
slab:4726 mapped:2155 pagetables:147
DMA free:44kB min:32kB low:40kB high:48kB active:0kB inactive:0kB
present:15728kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3014 3014 3014
DMA32 free:2843340kB min:7008kB low:8760kB high:10512kB active:41468kB
inactive:155484kB present:3086500kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
HighMem free:0kB min:128kB low:128kB high:128kB active:0kB inactive:0kB
present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 1*4kB 1*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 44kB
DMA32: 933*4kB 573*8kB 229*16kB 74*32kB 21*64kB 5*128kB 1*256kB 1*512kB
0*1024kB 0*2048kB 690*4096kB = 2843340kB
Normal: empty
HighMem: empty
Swap cache: add 0, delete 0, find 0/0, race 0+0
Free swap = 1477960kB
Total swap = 1477960kB
Free swap: 1477960kB
786416 pages of RAM
17590 reserved pages
10033 pages shared
0 pages swap cached
Out of Memory: Killed process 4886 (dbus-daemon).
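
For reference, the two numbers on the "oom-killer:" line can be decoded by hand. The sketch below is a user-space illustration, not kernel code. The flag values are the ones I remember from the 2.6.16-era include/linux/gfp.h and should be treated as an assumption, and gfp_decode() and order_to_bytes() are made-up helper names. Under those assumptions, 0xd1 is GFP_KERNEL (__GFP_WAIT|__GFP_IO|__GFP_FS) plus __GFP_DMA, and order=3 is a 32kB contiguous request with 4kB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag values as remembered from 2.6.16-era gfp.h (assumption, not verified). */
struct gfp_bit {
    unsigned int bit;
    const char *name;
};

static const struct gfp_bit gfp_bits[] = {
    { 0x01u, "__GFP_DMA" },
    { 0x02u, "__GFP_HIGHMEM" },
    { 0x04u, "__GFP_DMA32" },
    { 0x10u, "__GFP_WAIT" },
    { 0x20u, "__GFP_HIGH" },
    { 0x40u, "__GFP_IO" },
    { 0x80u, "__GFP_FS" },
};

/* Hypothetical helper: write the names of the set bits into buf, joined by '|'. */
static char *gfp_decode(unsigned int mask, char *buf, size_t len)
{
    size_t i;

    assert(len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(gfp_bits) / sizeof(gfp_bits[0]); i++) {
        if (!(mask & gfp_bits[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, gfp_bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}

/* Hypothetical helper: bytes covered by an order-N request, assuming 4kB pages. */
static unsigned long order_to_bytes(int order)
{
    assert(order >= 0 && order < 20);
    return 4096UL << order;
}
```

Calling gfp_decode(0xd1, buf, sizeof buf) fills buf with the four flag names, and order_to_bytes(3) gives 32768, which matches the 1024 * 32 request made for a non-ED drive.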


Looking at floppy_open, we have:

    if (!floppy_track_buffer) {
        /* if opening an ED drive, reserve a big buffer,
         * else reserve a small one */
        if ((UDP->cmos == 6) || (UDP->cmos == 5))
            try = 64; /* Only 48 actually useful */
        else
            try = 32; /* Only 24 actually useful */

        tmp = (char *)fd_dma_mem_alloc(1024 * try);
        if (!tmp && !floppy_track_buffer) {
            try >>= 1; /* buffer only one side */
            INFBOUND(try, 16);
            tmp = (char *)fd_dma_mem_alloc(1024 * try);
        }
        if (!tmp && !floppy_track_buffer) {
            fallback_on_nodma_alloc(&tmp, 2048 * try);
        }

So it will try to allocate half its first request if that fails, then
fall back to non-DMA memory as a last resort, but doesn't get a chance
because the OOM killer gets invoked. Maybe we need a new flag that says
"fail me immediately if no memory available"?

Or should floppy.c be fixed so it doesn't ask for so much?
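
To put numbers on "so much", here is a small user-space sketch of the sizes involved in the two DMA attempts. It assumes 4kB pages and re-implements get_order() by hand. first_request() and second_request() are names invented for this illustration, and the lower bound of 16 reflects my reading of INFBOUND(try, 16) as "raise try to at least 16".

```c
#include <assert.h>

#define SK_PAGE_SHIFT 12 /* 4kB pages, as on x86_64 (assumption for this sketch) */

/* Smallest order whose block (4kB << order) can hold `size` bytes. */
static int get_order(unsigned long size)
{
    int order = 0;

    assert(size > 0);
    size = (size - 1) >> SK_PAGE_SHIFT;
    while (size) {
        order++;
        size >>= 1;
    }
    return order;
}

/* Bytes asked for by the first fd_dma_mem_alloc(1024 * try) call. */
static unsigned long first_request(int try_kb)
{
    return 1024UL * (unsigned long)try_kb;
}

/* Bytes asked for by the second call: try is halved, but not below 16. */
static unsigned long second_request(int try_kb)
{
    try_kb >>= 1;
    if (try_kb < 16)
        try_kb = 16;
    return 1024UL * (unsigned long)try_kb;
}
```

With try = 32 the first request is 32kB (order 3, matching the report) and the retry is 16kB (order 2). With try = 64 the first request is 64kB (order 4) and the retry is 32kB.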


I found a diagnostic patch but only this part applies to 2.6.16-rc4:

> From: Jens Axboe <[email protected]>

--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -637,6 +637,8 @@ void blk_queue_bounce_limit(request_queu
{
unsigned long bounce_pfn = dma_addr >> PAGE_SHIFT;

+ printk("q=%p, dma_addr=%llx, bounce pfn %lu\n", q, dma_addr, bounce_pfn);
+
/*
* set appropriate bounce gfp mask -- unfortunately we don't have a
* full 4GB zone, so we have to resort to low memory for any bounces.

--
Chuck
"Equations are the Devil's sentences." --Stephen Colbert


2006-02-26 18:24:21

by Andrew Morton

Subject: Re: OOM-killer too aggressive?

Chuck Ebbert <[email protected]> wrote:
>
> Chris Largret is getting repeated OOM kills because of DMA memory
> exhaustion:
>
> oom-killer: gfp_mask=0xd1, order=3
>

This could be related to the known GFP_DMA oom on some x86_64 machines.

> Looking at floppy_open, we have:
>
>     if (!floppy_track_buffer) {
>         /* if opening an ED drive, reserve a big buffer,
>          * else reserve a small one */
>         if ((UDP->cmos == 6) || (UDP->cmos == 5))
>             try = 64; /* Only 48 actually useful */
>         else
>             try = 32; /* Only 24 actually useful */
>
>         tmp = (char *)fd_dma_mem_alloc(1024 * try);
>         if (!tmp && !floppy_track_buffer) {
>             try >>= 1; /* buffer only one side */
>             INFBOUND(try, 16);
>             tmp = (char *)fd_dma_mem_alloc(1024 * try);
>         }
>         if (!tmp && !floppy_track_buffer) {
>             fallback_on_nodma_alloc(&tmp, 2048 * try);
>         }
>
> So it will try to allocate half its first request if that fails, then
> fall back to non-DMA memory as a last resort, but doesn't get a chance
> because the OOM killer gets invoked. Maybe we need a new flag that says
> "fail me immediately if no memory available"?

That's __GFP_NORETRY.

> Or should floppy.c be fixed so it doesn't ask for so much?
>

The page allocator uses 32k as the threshold for when-to-try-like-crazy.

x86_64 should probably be defining its own fd_dma_mem_alloc() which doesn't
use GFP_DMA.

--- devel/drivers/block/floppy.c~floppy-false-oom-fix 2006-02-26 10:14:38.000000000 -0800
+++ devel-akpm/drivers/block/floppy.c 2006-02-26 10:15:04.000000000 -0800
@@ -278,7 +278,8 @@ static void do_fd_request(request_queue_
#endif

#ifndef fd_dma_mem_alloc
-#define fd_dma_mem_alloc(size) __get_dma_pages(GFP_KERNEL,get_order(size))
+#define fd_dma_mem_alloc(size) \
+ __get_dma_pages(GFP_KERNEL|__GFP_NORETRY,get_order(size))
#endif

static inline void fallback_on_nodma_alloc(char **addr, size_t l)
_

2006-02-26 18:41:26

by Robert Hancock

Subject: Re: OOM-killer too aggressive?

Chuck Ebbert wrote:
> DMA free:44kB min:32kB low:40kB high:48kB active:0kB inactive:0kB
> present:15728kB pages_scanned:0 all_unreclaimable? yes

I think the big question is who used up all the DMA zone.. Surely not
the floppy driver..
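
One way to read those numbers: 44kB free against a 32kB minimum looks like spare room, but an order-3 request needs 32kB by itself. The following is a simplified user-space model of the allocator's watermark check as I remember it from 2.6.16's mm/page_alloc.c. It ignores lowmem_reserve and the ALLOC_HIGH/ALLOC_HARDER adjustments, and the function name is made up for this sketch. All quantities are in 4kB pages, so 44kB free is 11 pages and min:32kB is 8 pages.

```c
#include <assert.h>

/*
 * Simplified model (assumption, not the kernel function itself):
 * returns 1 if a request of the given order would leave the zone
 * above `mark`, 0 otherwise.  nr_free[o] is the number of free
 * blocks of order o, taken from the per-order free list dump.
 */
static int sketch_watermark_ok(long free_pages, int order, long mark,
                               const long *nr_free)
{
    long min = mark;
    int o;

    assert(order >= 0);

    /* Pretend the request has already been taken out of the zone. */
    free_pages -= (1L << order) - 1;
    if (free_pages <= min)
        return 0;

    /* Blocks smaller than the request cannot satisfy it. */
    for (o = 0; o < order; o++) {
        free_pages -= nr_free[o] << o;
        min >>= 1;
        if (free_pages <= min)
            return 0;
    }
    return 1;
}
```

Fed with the DMA zone from the report (free lists 1*4kB, 1*8kB, 0*16kB, 1*32kB), the model rejects both the order-3 and the order-2 request while still allowing single pages. If the model is right, the zone is too close to its watermark for the floppy buffer regardless of who consumed the rest of it.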

> So it will try to allocate half its first request if that fails, then
> fall back to non-DMA memory as a last resort, but doesn't get a chance
> because the OOM killer gets invoked. Maybe we need a new flag that says
> "fail me immediately if no memory available"?

I think __GFP_NORETRY already does this.. There is also __GFP_NOWARN
which suppresses the allocation failure warning, not sure if we want
that or not..

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-02-26 20:04:21

by Marcelo Tosatti

Subject: Re: OOM-killer too aggressive?

On Sun, Feb 26, 2006 at 12:39:31PM -0600, Robert Hancock wrote:
> Chuck Ebbert wrote:
> >DMA free:44kB min:32kB low:40kB high:48kB active:0kB inactive:0kB
> >present:15728kB pages_scanned:0 all_unreclaimable? yes
>
> I think the big question is who used up all the DMA zone.. Surely not
> the floppy driver..

The kernel text and data? "readelf -S vmlinux" output would be useful.

2006-02-26 20:39:26

by Andi Kleen

Subject: Re: OOM-killer too aggressive?

On Sun, Feb 26, 2006 at 10:21:52AM -0800, Andrew Morton wrote:
> Chuck Ebbert <[email protected]> wrote:
> >
> > Chris Largret is getting repeated OOM kills because of DMA memory
> > exhaustion:
> >
> > oom-killer: gfp_mask=0xd1, order=3
> >
>
> This could be related to the known GFP_DMA oom on some x86_64 machines.

What known GFP_DMA oom? GFP_DMA allocation should work.

-Andi

2006-02-26 20:56:01

by Chris Largret

Subject: Re: OOM-killer too aggressive?

On Sun, 2006-02-26 at 15:56 -0600, Marcelo Tosatti wrote:
> On Sun, Feb 26, 2006 at 12:39:31PM -0600, Robert Hancock wrote:
> > I think the big question is who used up all the DMA zone.. Surely not
> > the floppy driver..
>
> The kernel text and data? "readelf -S vmlinux" output would be useful.

$ readelf -S vmlinux
There are 52 section headers, starting at offset 0x2548488:

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS ffffffff81000000 00100000
000000000026102f 0000000000000000 AX 0 0 16
[ 2] __ex_table PROGBITS ffffffff81261030 00361030
0000000000004420 0000000000000000 A 0 0 8
[ 3] .rodata PROGBITS ffffffff81266000 00366000
000000000004ba6f 0000000000000000 A 0 0 32
[ 4] .pci_fixup PROGBITS ffffffff812b1a70 003b1a70
00000000000008a0 0000000000000000 A 0 0 16
[ 5] .rio_route PROGBITS ffffffff812b2310 0066997c
0000000000000000 0000000000000000 W 0 0 1
[ 6] __ksymtab PROGBITS ffffffff812b2310 003b2310
0000000000009ac0 0000000000000000 A 0 0 16
[ 7] __ksymtab_gpl PROGBITS ffffffff812bbdd0 003bbdd0
0000000000001ea0 0000000000000000 A 0 0 16
[ 8] __kcrctab PROGBITS ffffffff812bdc70 003bdc70
0000000000004d60 0000000000000000 A 0 0 8
[ 9] __kcrctab_gpl PROGBITS ffffffff812c29d0 003c29d0
0000000000000f50 0000000000000000 A 0 0 8
[10] __ksymtab_strings PROGBITS ffffffff812c3920 003c3920
0000000000010622 0000000000000000 A 0 0 32
[11] __param PROGBITS ffffffff812d4000 003d4000
0000000000000d20 0000000000000000 A 0 0 8
[12] .data PROGBITS ffffffff812d5000 003d5000
00000000000cc5d0 0000000000000000 WA 0 0 4096
[13] .bss NOBITS ffffffff813a1600 004a15d0
000000000008210c 0000000000000000 WA 0 0 64
[14] .data.cacheline_a PROGBITS ffffffff81424000 00524000
0000000000004c00 0000000000000000 WA 0 0 64
[15] .data.read_mostly PROGBITS ffffffff81428c00 00528c00
00000000000009b0 0000000000000000 WA 0 0 64
[16] .vsyscall_0 PROGBITS ffffffffff600000 00600000
0000000000000108 0000000000000000 AX 0 0 1
[17] .xtime_lock PROGBITS ffffffffff600140 00600140
0000000000000008 0000000000000000 WA 0 0 16
[18] .vxtime PROGBITS ffffffffff600150 00600150
0000000000000030 0000000000000000 WA 0 0 16
[19] .wall_jiffies PROGBITS ffffffffff600180 00600180
0000000000000008 0000000000000000 WA 0 0 16
[20] .sys_tz PROGBITS ffffffffff600190 00600190
0000000000000008 0000000000000000 WA 0 0 16
[21] .sysctl_vsyscall PROGBITS ffffffffff6001a0 006001a0
0000000000000004 0000000000000000 WA 0 0 16
[22] .xtime PROGBITS ffffffffff6001b0 006001b0
0000000000000010 0000000000000000 WA 0 0 16
[23] .jiffies PROGBITS ffffffffff6001c0 006001c0
0000000000000008 0000000000000000 WA 0 0 16
[24] .vsyscall_1 PROGBITS ffffffffff600400 00600400
000000000000002e 0000000000000000 AX 0 0 1
[25] .vsyscall_2 PROGBITS ffffffffff600800 00600800
000000000000000d 0000000000000000 AX 0 0 1
[26] .vsyscall_3 PROGBITS ffffffffff600c00 00600c00
000000000000000d 0000000000000000 AX 0 0 1
[27] .data.init_task PROGBITS ffffffff8142c000 0062c000
0000000000002000 0000000000000000 WA 0 0 32
[28] .init.text PROGBITS ffffffff8142e000 0062e000
00000000000238de 0000000000000000 AX 0 0 1
[29] .init.data PROGBITS ffffffff81452000 00652000
000000000000c560 0000000000000000 WA 0 0 4096
[30] .init.setup PROGBITS ffffffff8145e560 0065e560
0000000000000af8 0000000000000000 WA 0 0 8
[31] .initcall.init PROGBITS ffffffff8145f058 0065f058
0000000000000730 0000000000000000 WA 0 0 8
[32] .con_initcall.ini PROGBITS ffffffff8145f788 0065f788
0000000000000018 0000000000000000 WA 0 0 8
[33] .security_initcal PROGBITS ffffffff8145f7a0 0066997c
0000000000000000 0000000000000000 W 0 0 1
[34] .altinstructions PROGBITS ffffffff8145f7a0 0065f7a0
0000000000000283 0000000000000000 A 0 0 8
[35] .altinstr_replace PROGBITS ffffffff8145fa23 0065fa23
0000000000000095 0000000000000000 AX 0 0 1
[36] .exit.text PROGBITS ffffffff8145fab8 0065fab8
0000000000000d5d 0000000000000000 AX 0 0 1
[37] .init.ramfs PROGBITS ffffffff81461000 00661000
0000000000000086 0000000000000000 A 0 0 1
[38] .data.percpu PROGBITS ffffffff81462000 00662000
000000000000797c 0000000000000000 WA 0 0 64
[39] .comment PROGBITS 0000000000000000 0066997c
0000000000003d74 0000000000000000 0 0 1
[40] .debug_aranges PROGBITS 0000000000000000 0066d6f0
000000000000d4f0 0000000000000000 0 0 1
[41] .debug_pubnames PROGBITS 0000000000000000 0067abe0
0000000000026a6e 0000000000000000 0 0 1
[42] .debug_info PROGBITS 0000000000000000 006a164e
0000000001ab55e4 0000000000000000 0 0 1
[43] .debug_abbrev PROGBITS 0000000000000000 02156c32
00000000000ca03b 0000000000000000 0 0 1
[44] .debug_line PROGBITS 0000000000000000 02220c6d
0000000000190ccd 0000000000000000 0 0 1
[45] .debug_frame PROGBITS 0000000000000000 023b1940
000000000009ad88 0000000000000000 0 0 8
[46] .debug_str PROGBITS 0000000000000000 0244c6c8
00000000000be96a 0000000000000001 MS 0 0 1
[47] .debug_ranges PROGBITS 0000000000000000 0250b032
000000000003d1e0 0000000000000000 0 0 1
[48] .note.GNU-stack PROGBITS 0000000000000000 02548212
0000000000000000 0000000000000000 X 0 0 1
[49] .shstrtab STRTAB 0000000000000000 02548212
0000000000000273 0000000000000000 0 0 1
[50] .symtab SYMTAB 0000000000000000 02549188
00000000000b3898 0000000000000018 51 20791 8
[51] .strtab STRTAB 0000000000000000 025fca20
0000000000096692 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)

--
Chris Largret <http://daga.dyndns.org>

2006-02-26 21:06:46

by Chris Largret

Subject: Re: OOM-killer too aggressive?

On Sun, 2006-02-26 at 10:21 -0800, Andrew Morton wrote:
> Chuck Ebbert <[email protected]> wrote:
> >
> > Chris Largret is getting repeated OOM kills because of DMA memory
> > exhaustion:
> >
> > oom-killer: gfp_mask=0xd1, order=3
> >
>
> This could be related to the known GFP_DMA oom on some x86_64 machines.

I'm not sure if this has any bearing on it, but the OOM Killer only does
this when I compile the kernel with SMP support.

> > Or should floppy.c be fixed so it doesn't ask for so much?
>
> The page allocator uses 32k as the threshold for when-to-try-like-crazy.
>
> x86_64 should probably be defining its own fd_dma_mem_alloc() which doesn't
> use GFP_DMA.
>
> --- devel/drivers/block/floppy.c~floppy-false-oom-fix 2006-02-26 10:14:38.000000000 -0800
> +++ devel-akpm/drivers/block/floppy.c 2006-02-26 10:15:04.000000000 -0800
> @@ -278,7 +278,8 @@ static void do_fd_request(request_queue_

Sorry, this didn't help on my machine. I am running the latest kernel
pre-patch (2.6.16-rc4) for testing right now and had to modify the
offsets a little. If there's any output that would help, please let me
know.

--
Chris Largret <http://daga.dyndns.org>

2006-02-26 21:11:23

by Andrew Morton

Subject: Re: OOM-killer too aggressive?

Andi Kleen <[email protected]> wrote:
>
> On Sun, Feb 26, 2006 at 10:21:52AM -0800, Andrew Morton wrote:
> > Chuck Ebbert <[email protected]> wrote:
> > >
> > > Chris Largret is getting repeated OOM kills because of DMA memory
> > > exhaustion:
> > >
> > > oom-killer: gfp_mask=0xd1, order=3
> > >
> >
> > This could be related to the known GFP_DMA oom on some x86_64 machines.
>
> What known GFP_DMA oom? GFP_DMA allocation should work.
>

There's a problem on some x86_64 machines which confuses the BIO layer.
BIO makes simple decisions about bounce pfns and some x86_64 memory layouts
cause them to go wrong. Net effect: lots of GFP_DMA allocations in the BIO
layer.

http://readlist.com/lists/vger.kernel.org/linux-kernel/36/182357.html
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175173

2006-02-26 21:22:48

by Marcelo Tosatti

Subject: Re: OOM-killer too aggressive?

On Sun, Feb 26, 2006 at 12:56:10PM -0800, Chris Largret wrote:
> On Sun, 2006-02-26 at 15:56 -0600, Marcelo Tosatti wrote:
> > On Sun, Feb 26, 2006 at 12:39:31PM -0600, Robert Hancock wrote:
> > > I think the big question is who used up all the DMA zone.. Surely not
> > > the floppy driver..
> >
> > The kernel text and data? "readelf -S vmlinux" output would be useful.
>
> $ readelf -S vmlinux
> There are 52 section headers, starting at offset 0x2548488:

<snip>

> [49] .shstrtab STRTAB 0000000000000000 02548212
> 0000000000000273 0000000000000000 0 0 1
> [50] .symtab SYMTAB 0000000000000000 02549188
> 00000000000b3898 0000000000000018 51 20791 8
> [51] .strtab STRTAB 0000000000000000 025fca20
> 0000000000096692 0000000000000000 0 0 1

More than 40MB, that should partially explain it...

2006-02-26 21:39:02

by Andrew Morton

Subject: Re: OOM-killer too aggressive?

Chris Largret <[email protected]> wrote:
>
> On Sun, 2006-02-26 at 10:21 -0800, Andrew Morton wrote:
> > Chuck Ebbert <[email protected]> wrote:
> > >
> > > Chris Largret is getting repeated OOM kills because of DMA memory
> > > exhaustion:
> > >
> > > oom-killer: gfp_mask=0xd1, order=3
> > >
> >
> > This could be related to the known GFP_DMA oom on some x86_64 machines.
>
> I'm not sure if this has any bearing on it, but the OOM Killer only does
> this when I compile the kernel with SMP support.

I doubt if that's related.

> > > Or should floppy.c be fixed so it doesn't ask for so much?
> >
> > The page allocator uses 32k as the threshold for when-to-try-like-crazy.
> >
> > x86_64 should probably be defining its own fd_dma_mem_alloc() which doesn't
> > use GFP_DMA.
> >
> > --- devel/drivers/block/floppy.c~floppy-false-oom-fix 2006-02-26 10:14:38.000000000 -0800
> > +++ devel-akpm/drivers/block/floppy.c 2006-02-26 10:15:04.000000000 -0800
> > @@ -278,7 +278,8 @@ static void do_fd_request(request_queue_
>
> Sorry, this didn't help on my machine. I am running the latest kernel
> pre-patch (2.6.16-rc4) for testing right now and had to modify the
> offsets a little. If there's any output that would help, please let me
> know.
>

hm, OK. I suppose we can hit it with the big hammer, but I'd be reluctant
to merge this patch because it has the potential to hide problems, such as
the as-yet-unfixed bio-uses-ZONE_DMA one.

--- devel/mm/page_alloc.c~a 2006-02-26 13:26:56.000000000 -0800
+++ devel-akpm/mm/page_alloc.c 2006-02-26 13:28:58.000000000 -0800
@@ -1003,7 +1003,8 @@ rebalance:
zonelist, alloc_flags);
if (page)
goto got_pg;
- } else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) {
+ } else if ((gfp_mask & __GFP_FS) &&
+ !(gfp_mask & (__GFP_NORETRY|__GFP_DMA))) {
/*
* Go through the zonelist yet one more time, keep
* very high watermark here, this is only to catch
@@ -1027,7 +1028,7 @@ rebalance:
* <= 3, but that may not be true in other implementations.
*/
do_retry = 0;
- if (!(gfp_mask & __GFP_NORETRY)) {
+ if (!(gfp_mask & (__GFP_NORETRY|__GFP_DMA))) {
if ((order <= 3) || (gfp_mask & __GFP_REPEAT))
do_retry = 1;
if (gfp_mask & __GFP_NOFAIL)
_
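
The effect of the second hunk can be sketched as a pure function in user space. This is an illustration only: the flag values are the ones I remember from the 2.6.16-era gfp.h (an assumption), the SK_ names are local to the sketch, and should_retry() is a hypothetical stand-in for the do_retry computation in __alloc_pages(). The `patched` argument selects between the current behaviour and the behaviour with the patch above applied.

```c
#include <assert.h>

/* Flag values as remembered from 2.6.16-era gfp.h (assumption). */
#define SK_GFP_DMA     0x01u
#define SK_GFP_REPEAT  0x400u
#define SK_GFP_NOFAIL  0x800u
#define SK_GFP_NORETRY 0x1000u

/*
 * Hypothetical model of the do_retry decision.
 * patched == 0: only __GFP_NORETRY suppresses retries.
 * patched != 0: __GFP_DMA suppresses them too, as in the patch.
 */
static int should_retry(unsigned int gfp_mask, int order, int patched)
{
    unsigned int no_retry = SK_GFP_NORETRY;
    int do_retry = 0;

    assert(order >= 0);
    if (patched)
        no_retry |= SK_GFP_DMA;

    if (!(gfp_mask & no_retry)) {
        /* Small requests (up to 32kB with 4kB pages) keep trying. */
        if (order <= 3 || (gfp_mask & SK_GFP_REPEAT))
            do_retry = 1;
        if (gfp_mask & SK_GFP_NOFAIL)
            do_retry = 1;
    }
    return do_retry;
}
```

For the reported allocation (gfp_mask 0xd1, order 3) the unpatched logic keeps retrying, while the patched logic gives up and lets the caller fall back. Note that the patched variant also makes __GFP_DMA override __GFP_NOFAIL in this model, which is part of why it is a big hammer.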

2006-02-26 23:00:13

by Chris Largret

Subject: Re: OOM-killer too aggressive?

On Sun, 2006-02-26 at 13:31 -0800, Andrew Morton wrote:
> > Sorry, this didn't help on my machine. I am running the latest kernel
> > pre-patch (2.6.16-rc4) for testing right now and had to modify the
> > offsets a little. If there's any output that would help, please let me
> > know.
>
> hm, OK. I suppose we can hit it with the big hammer, but I'd be reluctant
> to merge this patch because it has the potential to hide problems, such as
> the as-yet-unfixed bio-uses-ZONE_DMA one.
>
> --- devel/mm/page_alloc.c~a 2006-02-26 13:26:56.000000000 -0800
> +++ devel-akpm/mm/page_alloc.c 2006-02-26 13:28:58.000000000 -0800
> @@ -1003,7 +1003,8 @@ rebalance:

I reversed the previous patch before applying this one. If they were
supposed to be used together, let me know.

From the initial results it looks like the OOM Killer is not being used
now. Unfortunately I can't check with dmesg because right after login is
initiated (but before I get a chance to type anything) there is a
"Kernel BUG" message. This is all that is printed when a serial
console is in use. If you need the rest of the information, let me know
and I'll see about typing it up.

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/vmalloc.c:352
invalid opcode: 0000 [1] SMP
CPU 1
Modules linked in: snd_pcm_oss snd_mixer_oss md5 ipv6 ipt_recent
ipt_REJECT xt_state xt_tcpudp iptable_filter ip_tables x_tables nfs
lockd nfs_acl sunrpc uhci_hcd r8169 ohci1394 ieee1394 emu10k1_gp
gameport snd_emu10k1 snd_rawmidi snd_ac97_codec snd_ac97_bus snd_pcm
snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd
tda9887 tuner cx8800 cx88xx video_buf ir_common tveeprom compat_ioctl32
v4l1_compat v4l2_common btcx_risc videodev forcedeth usblp ohci_hcd
i2c_nforce2 ehci_hc

--
Chris Largret <http://daga.dyndns.org>

2006-02-26 23:34:53

by Chuck Ebbert

Subject: Re: OOM-killer too aggressive?

In-Reply-To: <[email protected]>

On Sun, 26 Feb 2006 at 21:39:17 +0100, Andi Kleen wrote:
> On Sun, Feb 26, 2006 at 10:21:52AM -0800, Andrew Morton wrote:
> > Chuck Ebbert <[email protected]> wrote:
> > >
> > > Chris Largret is getting repeated OOM kills because of DMA memory
> > > exhaustion:
> > >
> > > oom-killer: gfp_mask=0xd1, order=3
> > >
> >
> > This could be related to the known GFP_DMA oom on some x86_64 machines.
>
> What known GFP_DMA oom? GFP_DMA allocation should work.

http://marc.theaimsgroup.com/?t=113895864600001&r=1&w=2
http://marc.theaimsgroup.com/?t=113766047000002&r=1&w=2

--
Chuck
"Equations are the Devil's sentences." --Stephen Colbert

2006-02-26 23:47:43

by Andi Kleen

Subject: Re: OOM-killer too aggressive?

> hm, OK. I suppose we can hit it with the big hammer, but I'd be reluctant
> to merge this patch because it has the potential to hide problems, such as
> the as-yet-unfixed bio-uses-ZONE_DMA one.

Better would be to fix the block layer. I think something like the patch
below would work (only lightly tested - it booted on a 6GB x86-64 box).

It is overly pessimistic on systems with a real IOMMU that can even remap
to DMA addresses < 4GB - in the future those might want to define
some ARCH_HAS macro so it can be checked here.

Does that patch fix the problem?

That said, adding __GFP_NORETRY to the floppy allocation is probably still a good
idea. I will do that change here.

-Andi


Disable block layer bouncing for most memory on 64bit systems

The low level PCI DMA mapping functions should handle it in most cases.

This should fix problems with depleting the DMA zone early. The old
code used precious GFP_DMA memory in many cases where it was not needed.

Signed-off-by: Andi Kleen <[email protected]>

Index: linux/block/ll_rw_blk.c
===================================================================
--- linux.orig/block/ll_rw_blk.c
+++ linux/block/ll_rw_blk.c
@@ -625,26 +625,32 @@ static inline int ordered_bio_endio(stru
* Different hardware can have different requirements as to what pages
* it can do I/O directly to. A low level driver can call
* blk_queue_bounce_limit to have lower memory pages allocated as bounce
- * buffers for doing I/O to pages residing above @page. By default
- * the block layer sets this to the highest numbered "low" memory page.
+ * buffers for doing I/O to pages residing above @page.
**/
void blk_queue_bounce_limit(request_queue_t *q, u64 dma_addr)
{
unsigned long bounce_pfn = dma_addr >> PAGE_SHIFT;
+ int dma = 0;

- /*
- * set appropriate bounce gfp mask -- unfortunately we don't have a
- * full 4GB zone, so we have to resort to low memory for any bounces.
- * ISA has its own < 16MB zone.
- */
- if (bounce_pfn < blk_max_low_pfn) {
+ q->bounce_gfp = GFP_NOIO;
+#if BITS_PER_LONG == 64
+ /* Assume anything >= 4GB can be handled by IOMMU.
+ Actually some IOMMUs can handle everything, but I don't
+ know of a way to test this here. */
+ if (bounce_pfn < (0xffffffff>>PAGE_SHIFT))
+ dma = 1;
+ q->bounce_pfn = max_low_pfn;
+#else
+ if (bounce_pfn < blk_max_low_pfn)
+ dma = 1;
+ q->bounce_pfn = bounce_pfn;
+#endif
+ if (dma) {
BUG_ON(dma_addr < BLK_BOUNCE_ISA);
init_emergency_isa_pool();
q->bounce_gfp = GFP_NOIO | GFP_DMA;
- } else
- q->bounce_gfp = GFP_NOIO;
-
- q->bounce_pfn = bounce_pfn;
+ q->bounce_pfn = bounce_pfn;
+ }
}

EXPORT_SYMBOL(blk_queue_bounce_limit);
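
The change in the 64-bit branch can be compared with the old test in a few lines of user-space C. This is only a sketch of the two comparisons in the patch above, assuming 4kB pages. The function names and the SK_ prefix are invented for the illustration, and max_low_pfn is passed in as a plain argument instead of being read from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12 /* 4kB pages (assumption for this sketch) */

/* Old test: bounce through GFP_DMA whenever the limit is below max_low_pfn. */
static int old_uses_gfp_dma(uint64_t dma_addr, uint64_t max_low_pfn)
{
    assert(max_low_pfn > 0);
    return (dma_addr >> SK_PAGE_SHIFT) < max_low_pfn;
}

/* New 64-bit test: only limits below the 4GB boundary use GFP_DMA. */
static int new_uses_gfp_dma_64bit(uint64_t dma_addr)
{
    return (dma_addr >> SK_PAGE_SHIFT) < (0xffffffffULL >> SK_PAGE_SHIFT);
}
```

On a hypothetical box with 6GB of RAM (max_low_pfn of 0x180000), a device with a full 32-bit DMA limit of 0xffffffff takes the GFP_DMA path under the old test but not under the new one. A 24-bit limit such as 0x00ffffff still takes it in both cases.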

2006-02-26 23:51:44

by Andi Kleen

Subject: Re: OOM-killer too aggressive?


Thinking about this more, I think we need a __GFP_NOOOM for other
purposes too. E.g. the x86-64 IOMMU code tries to do similar
fallbacks and I suspect it will be hit by the OOM killer too.

-Andi

2006-02-27 00:22:37

by Andrew Morton

Subject: Re: OOM-killer too aggressive?

Chris Largret <[email protected]> wrote:
>
> On Sun, 2006-02-26 at 13:31 -0800, Andrew Morton wrote:
> > > Sorry, this didn't help on my machine. I am running the latest kernel
> > > pre-patch (2.6.16-rc4) for testing right now and had to modify the
> > > offsets a little. If there's any output that would help, please let me
> > > know.
> >
> > hm, OK. I suppose we can hit it with the big hammer, but I'd be reluctant
> > to merge this patch because it has the potential to hide problems, such as
> > the as-yet-unfixed bio-uses-ZONE_DMA one.
> >
> > --- devel/mm/page_alloc.c~a 2006-02-26 13:26:56.000000000 -0800
> > +++ devel-akpm/mm/page_alloc.c 2006-02-26 13:28:58.000000000 -0800
> > @@ -1003,7 +1003,8 @@ rebalance:
>
> I reversed the previous patch before applying this one. If they were
> supposed to be used together, let me know.

No, that's right.

> From the initial results it looks like the OOM Killer is not being used
> now. Unfortunately I can't check with dmesg because right after login is
> initiated (but before I get a chance to type anything) there is a
> "Kernel BUG" message. This is all that is printed when a serial
> console is in use. If you need the rest of the information, let me know
> and I'll see about typing it up.
>
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at mm/vmalloc.c:352
> invalid opcode: 0000 [1] SMP
> CPU 1
> Modules linked in: snd_pcm_oss snd_mixer_oss md5 ipv6 ipt_recent
> ipt_REJECT xt_state xt_tcpudp iptable_filter ip_tables x_tables nfs
> lockd nfs_acl sunrpc uhci_hcd r8169 ohci1394 ieee1394 emu10k1_gp
> gameport snd_emu10k1 snd_rawmidi snd_ac97_codec snd_ac97_bus snd_pcm
> snd_seq_device snd_timer snd_page_alloc snd_util_mem snd_hwdep snd
> tda9887 tuner cx8800 cx88xx video_buf ir_common tveeprom compat_ioctl32
> v4l1_compat v4l2_common btcx_risc videodev forcedeth usblp ohci_hcd
> i2c_nforce2 ehci_hc

Sigh. The floppy driver's just a joke. Looks like the failed allocation
fell back to vmalloc then screwed it up.

I rather doubt whether x86_64 needs to be constraining itself to the ISA
DMA region anyway - something for Andi to look at please?

You could try this one instead, although I guess I'll need to fire up the
test box for this bug.


--- devel/include/asm-x86_64/floppy.h~b 2006-02-26 16:15:44.000000000 -0800
+++ devel-akpm/include/asm-x86_64/floppy.h 2006-02-26 16:16:21.000000000 -0800
@@ -40,7 +40,7 @@
#define fd_disable_irq() disable_irq(FLOPPY_IRQ)
#define fd_free_irq() free_irq(FLOPPY_IRQ, NULL)
#define fd_get_dma_residue() SW._get_dma_residue(FLOPPY_DMA)
-#define fd_dma_mem_alloc(size) SW._dma_mem_alloc(size)
+#define fd_dma_mem_alloc(size) __alloc_pages(GFP_KERNEL|__GFP_DMA32, get_order(size))
#define fd_dma_setup(addr, size, mode, io) SW._dma_setup(addr, size, mode, io)

#define FLOPPY_CAN_FALLBACK_ON_NODMA
_

2006-02-27 01:01:57

by Chris Largret

Subject: Re: OOM-killer too aggressive?

On Sun, 2006-02-26 at 16:20 -0800, Andrew Morton wrote:

> Sigh. The floppy driver's just a joke. Looks like the failed allocation
> fell back to vmalloc then screwed it up.

> You could try this one instead, although I guess I'll need to fire up the
> test box for this bug.

> --- devel/include/asm-x86_64/floppy.h~b 2006-02-26 16:15:44.000000000 -0800
> +++ devel-akpm/include/asm-x86_64/floppy.h 2006-02-26 16:16:21.000000000 -0800
> @@ -40,7 +40,7 @@
> #define fd_disable_irq() disable_irq(FLOPPY_IRQ)
> #define fd_free_irq() free_irq(FLOPPY_IRQ, NULL)
> #define fd_get_dma_residue() SW._get_dma_residue(FLOPPY_DMA)
> -#define fd_dma_mem_alloc(size) SW._dma_mem_alloc(size)
> +#define fd_dma_mem_alloc(size) __alloc_pages(GFP_KERNEL|__GFP_DMA32, get_order(size))
> #define fd_dma_setup(addr, size, mode, io) SW._dma_setup(addr, size, mode, io)
>
> #define FLOPPY_CAN_FALLBACK_ON_NODMA

CC drivers/block/floppy.o
drivers/block/floppy.c: In function `raw_cmd_copyin':
drivers/block/floppy.c:3245: error: too few arguments to function
`__alloc_pages'
drivers/block/floppy.c: In function `floppy_open':
drivers/block/floppy.c:3738: error: too few arguments to function
`__alloc_pages'
drivers/block/floppy.c:3742: error: too few arguments to function
`__alloc_pages'
make[2]: *** [drivers/block/floppy.o] Error 1
make[1]: *** [drivers/block] Error 2
make: *** [drivers] Error 2


I'm sorry, but I'm not sure where to start for looking up the definition
for __alloc_pages().

--
Chris Largret <http://daga.dyndns.org>

2006-02-27 01:48:19

by Chris Largret

Subject: Re: OOM-killer too aggressive?

On Sun, 2006-02-26 at 18:22 -0600, Marcelo Tosatti wrote:
> On Sun, Feb 26, 2006 at 12:56:10PM -0800, Chris Largret wrote:
> > $ readelf -S vmlinux
> > There are 52 section headers, starting at offset 0x2548488:
>
> <snip>
>
> > [49] .shstrtab STRTAB 0000000000000000 02548212
> > 0000000000000273 0000000000000000 0 0 1
> > [50] .symtab SYMTAB 0000000000000000 02549188
> > 00000000000b3898 0000000000000018 51 20791 8
> > [51] .strtab STRTAB 0000000000000000 025fca20
> > 0000000000096692 0000000000000000 0 0 1
>
> More than 40MB, that should partially explain it...

Ouch. I hadn't noticed that and will have to see about bringing that
down a little. It's the same size when compiling without SMP, and the
OOM Killer doesn't cause problems then. There is something else that is
causing these problems.

From using ls on the *.o files, it appears (as expected) that most of
this is the built-in drivers. The pruning should be fun. :)

--
Chris Largret <http://daga.dyndns.org>

2006-02-27 01:58:53

by Andrew Morton

Subject: Re: OOM-killer too aggressive?

Chris Largret <[email protected]> wrote:
>
> drivers/block/floppy.c:3245: error: too few arguments to function
> `__alloc_pages'

doh.

--- devel/include/asm-x86_64/floppy.h~b 2006-02-26 16:15:44.000000000 -0800
+++ devel-akpm/include/asm-x86_64/floppy.h 2006-02-26 17:57:02.000000000 -0800
@@ -40,7 +40,7 @@
#define fd_disable_irq() disable_irq(FLOPPY_IRQ)
#define fd_free_irq() free_irq(FLOPPY_IRQ, NULL)
#define fd_get_dma_residue() SW._get_dma_residue(FLOPPY_DMA)
-#define fd_dma_mem_alloc(size) SW._dma_mem_alloc(size)
+#define fd_dma_mem_alloc(size) alloc_pages(GFP_KERNEL|__GFP_DMA32, get_order(size))
#define fd_dma_setup(addr, size, mode, io) SW._dma_setup(addr, size, mode, io)

#define FLOPPY_CAN_FALLBACK_ON_NODMA
_

2006-02-27 06:34:53

by Chris Largret

Subject: Re: OOM-killer too aggressive?

On Sun, 2006-02-26 at 17:57 -0800, Andrew Morton wrote:
> Chris Largret <[email protected]> wrote:
> >
> > drivers/block/floppy.c:3245: error: too few arguments to function
> > `__alloc_pages'
>
> doh.
>
> --- devel/include/asm-x86_64/floppy.h~b 2006-02-26 16:15:44.000000000 -0800
> +++ devel-akpm/include/asm-x86_64/floppy.h 2006-02-26 17:57:02.000000000 -0800

Earlier I said that there was a "Kernel BUG" and all processing stopped
right after the login prompt was displayed (but before I could type
anything). Now the kernel continues to work, but the messages are a
little disconcerting. Here is the version with a backtrace (from dmesg):


Bad page state in process 'swapper'
page:ffff810001539168 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015391a0 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015391d8 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff810001539210 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff810001539248 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff810001539280 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015392b8 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}
Bad page state in process 'swapper'
page:ffff8100015392f0 flags:0x4000000000000400 mapping:0000000000000000
mapcount:0 count:1
Trying to fix it up, but a reboot is needed
Backtrace:

Call Trace: <IRQ> <ffffffff8104f0cd>{bad_page+85}
<ffffffff8104f419>{__free_pages_ok+179}
<ffffffff810500da>{__free_pages+49} <ffffffff8105015f>{free_pages+131}
<ffffffff8117db99>{_fd_dma_mem_free+43}
<ffffffff81182ca0>{floppy_release_irq_and_dma+243}
<ffffffff811809fc>{set_dor+323}
<ffffffff81183d50>{motor_off_callback+0}
<ffffffff81183d74>{motor_off_callback+36}
<ffffffff81034e97>{run_timer_softirq+366}
<ffffffff810314e5>{__do_softirq+85}
<ffffffff8100bec6>{call_softirq+30}
<ffffffff8100d92e>{do_softirq+51} <ffffffff8103162a>{irq_exit+72}
<ffffffff810160b8>{smp_apic_timer_interrupt+75}
<ffffffff81008d0f>{default_idle+0}
<ffffffff8100b820>{apic_timer_interrupt+132} <EOI>
<ffffffff81008d0f>{default_idle+0}
<ffffffff81008d3e>{default_idle+47} <ffffffff81008f44>{cpu_idle+103}
<ffffffff814390a1>{start_secondary+1189}


--
Chris Largret <http://daga.dyndns.org>
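
The eight reports above differ only in their "page:" value. A small
userspace check of how those values are spaced may help when reading
them; the addresses are copied from the log, while common_stride and
the array name are hypothetical helpers written for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The eight "page:" values copied from the reports above. */
static const uint64_t bad_pages[] = {
    0xffff810001539168ULL, 0xffff8100015391a0ULL,
    0xffff8100015391d8ULL, 0xffff810001539210ULL,
    0xffff810001539248ULL, 0xffff810001539280ULL,
    0xffff8100015392b8ULL, 0xffff8100015392f0ULL,
};

#define BAD_PAGE_COUNT (sizeof(bad_pages) / sizeof(bad_pages[0]))

/* Returns the common spacing in bytes, or 0 if the spacing varies. */
static uint64_t common_stride(const uint64_t *addr, size_t n)
{
    uint64_t stride;
    size_t i;

    assert(n >= 2);
    stride = addr[1] - addr[0];
    for (i = 2; i < n; i++)
        if (addr[i] - addr[i - 1] != stride)
            return 0;
    return stride;
}
```

The values are evenly spaced 0x38 (56) bytes apart and there are eight
of them, which looks like eight consecutive page descriptors, i.e. one
order-3 block being freed in a single free_pages() call. That reading
is an inference from the log, not something stated in the thread.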

2006-02-27 12:47:45

by Marcelo Tosatti

Subject: Re: OOM-killer too aggressive?

On Sun, Feb 26, 2006 at 05:48:15PM -0800, Chris Largret wrote:
> On Sun, 2006-02-26 at 18:22 -0600, Marcelo Tosatti wrote:
> > On Sun, Feb 26, 2006 at 12:56:10PM -0800, Chris Largret wrote:
> > > $ readelf -S vmlinux
> > > There are 52 section headers, starting at offset 0x2548488:
> >
> > <snip>
> >
> > > [49] .shstrtab STRTAB 0000000000000000 02548212
> > > 0000000000000273 0000000000000000 0 0 1
> > > [50] .symtab SYMTAB 0000000000000000 02549188
> > > 00000000000b3898 0000000000000018 51 20791 8
> > > [51] .strtab STRTAB 0000000000000000 025fca20
> > > 0000000000096692 0000000000000000 0 0 1
> >
> > More than 40MB, that should partially explain it...
>
> Ouch. I hadn't noticed that and will have to see about bringing that
> down a little. It's the same size when compiling without SMP, and the
> OOM Killer doesn't cause problems then. There is something else that is
> causing these problems.

Indeed, this only explains why the DMA zone is full.

The floppy driver is asking for a large contiguous chunk of memory
in the DMA zone, which the allocator tries to satisfy by killing
applications.

Andrew's patch makes the allocator give up more easily, which allows the
driver to fall back to non-contiguous memory (that is the real problem).
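
The fallback described above can be sketched in userspace roughly as
follows. Everything here is a hypothetical model: malloc() stands in
for both allocators, and the zone_has_room flag stands in for whether
the DMA zone can satisfy the request without further effort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_buf {
    void *mem;
    bool  contiguous;   /* false: the fallback path was used */
};

/* Stand-in for a DMA-zone allocation that fails quickly when full. */
static void *model_alloc_contiguous(size_t size, bool zone_has_room)
{
    return zone_has_room ? malloc(size) : NULL;
}

/* Try the contiguous buffer first, then fall back instead of waiting. */
static struct model_buf model_fd_alloc(size_t size, bool zone_has_room)
{
    struct model_buf b;

    b.mem = model_alloc_contiguous(size, zone_has_room);
    b.contiguous = (b.mem != NULL);
    if (!b.mem)
        b.mem = malloc(size);   /* stands in for non-contiguous memory */
    return b;
}
```

The point of the model is only that the fallback branch is reachable
when the first allocation returns NULL promptly; if the allocator
instead keeps trying (and kills processes while doing so), the driver
never gets the chance to take it.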

> From using ls on the *.o files, it appears (as expected) that most of
> this is the built-in drivers. The pruning should be fun. :)

There should be no need to prune it to fix the OOM issue; it does explain
why the DMA memory is full, though.

2006-02-27 22:30:22

by Christoph Lameter

Subject: Re: OOM-killer too aggressive?

On Sun, 27 Feb 2006, Andi Kleen wrote:

> Thinking about this more I think we need a __GFP_NOOOM for other
> purposes too. e.g. the x86-64 IOMMU code tries to do similar
> fallbacks and I suspect it will be hit by the OOM killer too.

Isn't this also a constrained allocation? We could expand the check to also
catch these types of restrictions and fail.

2006-02-28 00:41:24

by Andi Kleen

Subject: Re: OOM-killer too aggressive?

On Mon, Feb 27, 2006 at 02:30:02PM -0800, Christoph Lameter wrote:
> On Sun, 27 Feb 2006, Andi Kleen wrote:
>
> > Thinking about this more I think we need a __GFP_NOOOM for other
> > purposes too. e.g. the x86-64 IOMMU code tries to do similar
> > fallbacks and I suspect it will be hit by the OOM killer too.
>
> Isn't this also a constrained allocation? We could expand the check to also
> catch these types of restrictions and fail.

No, it uses the full fallback zone list of the target node, not a custom
one. Would be hard to detect without a flag.

Maybe __GFP_NORETRY is actually good enough for this purpose. Opinions?

-Andi

2006-02-28 01:00:36

by Andrew Morton

Subject: Re: OOM-killer too aggressive?

Andi Kleen wrote:
>
> On Mon, Feb 27, 2006 at 02:30:02PM -0800, Christoph Lameter wrote:
> > On Sun, 27 Feb 2006, Andi Kleen wrote:
> >
> > > Thinking about this more I think we need a __GFP_NOOOM for other
> > > purposes too. e.g. the x86-64 IOMMU code tries to do similar
> > > fallbacks and I suspect it will be hit by the OOM killer too.
> >
> > Isn't this also a constrained allocation? We could expand the check to also
> > catch these types of restrictions and fail.
>
> No, it uses the full fallback zone list of the target node, not a custom
> one. Would be hard to detect without a flag.
>
> Maybe __GFP_NORETRY is actually good enough for this purpose. Opinions?
>

I was thinking that your __GFP_NOOOM was a thinko. How would it differ
from __GFP_NORETRY?

2006-02-28 01:04:14

by Christoph Lameter

Subject: Re: OOM-killer too aggressive?

On Mon, 27 Feb 2006, Andrew Morton wrote:

> > On Mon, Feb 27, 2006 at 02:30:02PM -0800, Christoph Lameter wrote:
> > > Isn't this also a constrained allocation? We could expand the check to also
> > > catch these types of restrictions and fail.
> >
> > No, it uses the full fallback zone list of the target node, not a custom
> > one. Would be hard to detect without a flag.

Right but it specifies in its flags that not all system memory can satisfy
this particular memory request. That fact may be detected by the
out_of_memory() function. We could do something special there instead of
OOMing.
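
A minimal sketch of that idea, assuming the zone restriction can be
read straight off the gfp mask. The MODEL_ flag values and both
function names are assumptions made for illustration (the values were
picked so that the 0xd1 mask from the first report decodes as a kernel
allocation with the DMA bit set); check the real headers before
relying on them.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed flag values, for illustration only. */
#define MODEL_GFP_DMA    0x01u
#define MODEL_GFP_DMA32  0x04u
#define MODEL_GFP_KERNEL 0xd0u

enum model_oom_action { MODEL_OOM_KILL, MODEL_OOM_FAIL };

/* True if the mask says only part of system memory can be used. */
static bool model_zone_restricted(unsigned int gfp_mask)
{
    return (gfp_mask & (MODEL_GFP_DMA | MODEL_GFP_DMA32)) != 0;
}

/* Fail zone-restricted requests instead of picking a victim. */
static enum model_oom_action model_out_of_memory(unsigned int gfp_mask)
{
    return model_zone_restricted(gfp_mask) ? MODEL_OOM_FAIL
                                           : MODEL_OOM_KILL;
}
```

With these assumed values the floppy request (0xd1) would fail rather
than trigger a kill, while an unrestricted kernel allocation would
still go down the normal OOM path.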



2006-02-28 01:25:59

by Andi Kleen

Subject: Re: OOM-killer too aggressive?

> I was thinking that your __GFP_NOOOM was a thinko. How would it differ
> from __GFP_NORETRY?

__GFP_NORETRY seems to skip at least one retry pass as far as I can see.
__GFP_NOOOM wouldn't. But perhaps the additional pass only makes sense
with oom killing? I'm not sure - that is why I was asking.

-Andi

2006-02-28 01:39:47

by Andrew Morton

Subject: Re: OOM-killer too aggressive?

Andi Kleen wrote:
>
> > I was thinking that your __GFP_NOOOM was a thinko. How would it differ
> > from __GFP_NORETRY?
>
> __GFP_NORETRY seems to skip at least one retry pass as far as I can see.
> __GFP_NOOOM wouldn't. But perhaps the additional pass only makes sense
> with oom killing? I'm not sure - that is why I was asking.
>

Oh, OK. That final get_page_from_freelist() is allegedly to see if a
parallel oom-killing freed some pages - we already know that
try_to_free_pages() didn't work.

I rather doubt that it'll make any difference.
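
To make the difference under discussion concrete, here is a heavily
simplified userspace model of the tail of the allocation slow path. It
collapses everything before reclaim into a single freelist pass and
assumes reclaim made no progress. MODEL_NOOOM mirrors the proposed
flag, which does not exist; all names and values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NORETRY 0x1u  /* give up right after reclaim fails */
#define MODEL_NOOOM   0x2u  /* proposed: final pass, but never OOM-kill */

struct model_result {
    bool allocated;
    int  freelist_passes;
    bool oom_killed;
};

/*
 * free_initially: pages are available on the first pass.
 * free_late: pages show up only after reclaim failed, e.g. because a
 * parallel OOM kill freed them.
 */
static struct model_result model_slowpath(unsigned int flags,
                                          bool free_initially,
                                          bool free_late)
{
    struct model_result r = { false, 0, false };

    r.freelist_passes++;            /* first freelist pass */
    if (free_initially) {
        r.allocated = true;
        return r;
    }

    /* Reclaim is assumed to have freed nothing at this point. */
    if (flags & MODEL_NORETRY)
        return r;                   /* no final pass, no OOM kill */

    r.freelist_passes++;            /* final pass before the OOM killer */
    if (free_late) {
        r.allocated = true;
        return r;
    }

    if (!(flags & MODEL_NOOOM))
        r.oom_killed = true;
    return r;
}
```

In this model the two flags differ only when pages appear between the
failed reclaim and the final pass, which is the narrow case the
paragraph above doubts will matter in practice.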

2006-02-28 12:09:55

by Andi Kleen

Subject: Re: OOM-killer too aggressive?

On Mon, Feb 27, 2006 at 05:38:30PM -0800, Andrew Morton wrote:
> Andi Kleen wrote:
> >
> > > I was thinking that your __GFP_NOOOM was a thinko. How would it differ
> > > from __GFP_NORETRY?
> >
> > __GFP_NORETRY seems to skip at least one retry pass as far as I can see.
> > __GFP_NOOOM wouldn't. But perhaps the additional pass only makes sense
> > with oom killing? I'm not sure - that is why I was asking.
> >
>
> Oh, OK. That final get_page_from_freelist() is allegedly to see if a
> parallel oom-killing freed some pages - we already know that
> try_to_free_pages() didn't work.
>
> I rather doubt that it'll make any difference.

I have now switched the x86-64 IOMMU code and the floppy code over to
__GFP_NORETRY.

But perhaps it would be better to rename it to __GFP_NOOOM
because I think that would express its meaning better.

-Andi