I've been following quite closely the development of 2.6.9, testing
every -rc release and a lot of -bk's.
Upon changing from 2.6.9-rc2 to 2.6.9-rc3 I began experiencing random
oom kills whenever a high memory i/o load took place.
This happened with plenty of free memory, and with whatever values I
used for vm.overcommit_ratio and vm.overcommit_memory.
Doubling the physical RAM didn't change the situation either.
Having traced the problem to 2.6.9-rc3, I took a look at the differences
in memory handling between 2.6.9-rc2 and 2.6.9-rc3 and with the attached
patch I have no more oom kills. Not a single one.
I'm not saying everything within the patch is needed, nor even that it's
the right thing to change. Nonetheless, 2.6.9 vanilla was unusable,
while with this patch those OOM kills are gone.
Please, review and see what's wrong there :)
--
A mother takes twenty years to make a man of her boy, and another woman
makes a fool of him in twenty minutes.
-- Robert Frost
Javier Marcet <[email protected]>
Andrew Morton wrote:
> Javier Marcet <[email protected]> wrote:
>
>>I've been following quite closely the development of 2.6.9, testing
>> every -rc release and a lot of -bk's.
>>
>> Upon changing from 2.6.9-rc2 to 2.6.9-rc3 I began experiencing random
>> oom kills whenever a high memory i/o load took place.
>
>
> Do you have swap online?
When he first reported it, he said no swap.
> What sort of machine is it, and how much memory has it?
Ditto - 1GB RAM.
Con
Javier Marcet <[email protected]> wrote:
>
> I've been following quite closely the development of 2.6.9, testing
> every -rc release and a lot of -bk's.
>
> Upon changing from 2.6.9-rc2 to 2.6.9-rc3 I began experiencing random
> oom kills whenever a high memory i/o load took place.
Do you have swap online?
What sort of machine is it, and how much memory has it?
> This happened with plenty of free memory, and with whatever values I
> used for vm.overcommit_ratio and vm.overcommit_memory
> Doubling the physical RAM didn't change the situation either.
>
> Having traced the problem to 2.6.9-rc3, I took a look at the differences
> in memory handling between 2.6.9-rc2 and 2.6.9-rc3 and with the attached
> patch I have no more oom kills. Not a single one.
>
> I'm not saying everything within the patch is needed, nor even that it's
> the right thing to change. Nonetheless, 2.6.9 vanilla was unusable,
> while with this patch those OOM kills are gone.
That patch only affects NUMA machines?
On Sat, 23 Oct 2004, Javier Marcet wrote:
> I'm not saying everything within the patch is needed, nor even that it's
> the right thing to change.
I suspect the following (still untested) patch might be
needed, too. Basically, even when the VM gets tight, it
will still skip swappable pages with the referenced bit
set: both genuinely referenced pages and pages from the
process that currently holds the swap token.
Forcefully deactivating a few pages when we run at
priority 0 might get rid of the false OOM kills.
I'm about to test this on a very small system here, and
will let you know how things go.
===== mm/vmscan.c 1.231 vs edited =====
--- 1.231/mm/vmscan.c	Sun Oct 17 01:07:24 2004
+++ edited/mm/vmscan.c	Mon Oct 25 17:38:56 2004
@@ -379,7 +379,7 @@
 		referenced = page_referenced(page, 1);
 		/* In active use or really unfreeable? Activate it. */
-		if (referenced && page_mapping_inuse(page))
+		if (referenced && sc->priority && page_mapping_inuse(page))
 			goto activate_locked;
 
 #ifdef CONFIG_SWAP
@@ -715,7 +715,7 @@
 		if (page_mapped(page)) {
 			if (!reclaim_mapped ||
 			    (total_swap_pages == 0 && PageAnon(page)) ||
-			    page_referenced(page, 0)) {
+			    (page_referenced(page, 0) && sc->priority)) {
 				list_add(&page->lru, &l_active);
 				continue;
 			}
Rik van Riel <[email protected]> wrote:
>
> - if (referenced && page_mapping_inuse(page))
> + if (referenced && sc->priority && page_mapping_inuse(page))
Makes heaps of sense, but I'd like to exactly understand why people are
getting oomings before doing something like this. I think we're still
waiting for a testcase?
On Mon, 25 Oct 2004, Rik van Riel wrote:
> On Mon, 25 Oct 2004, Andrew Morton wrote:
> > Rik van Riel <[email protected]> wrote:
> > >
> > > - if (referenced && page_mapping_inuse(page))
> > > + if (referenced && sc->priority && page_mapping_inuse(page))
> >
> > Makes heaps of sense, but I'd like to exactly understand why people are
> > getting oomings before doing something like this. I think we're still
> > waiting for a testcase?
>
> I'm now running Yum on a (virtual) system with 96MB RAM and
> 100MB swap. This used to get an OOM kill very quickly, but
> still seems to be running now, after 20 minutes.
It completed, without being OOM killed like before.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
Andrew Morton wrote:
> Rik van Riel <[email protected]> wrote:
>
>>- if (referenced && page_mapping_inuse(page))
>>+ if (referenced && sc->priority && page_mapping_inuse(page))
>
>
> Makes heaps of sense, but I'd like to exactly understand why people are
> getting oomings before doing something like this. I think we're still
> waiting for a testcase?
I have found that quite often it is because all_unreclaimable gets set,
scanning slows down, and the OOM killer goes off.
Rik, I wonder if you can put some printk's where all_unreclaimable
is being set to 1, and see if there is any correlation to OOMs?
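Something like this is what I have in mind - a sketch only, to drop in
wherever balance_pgdat() sets the flag (untested, field names as in
2.6.9's struct zone):

	/* sketch: log the transition so it can be matched against OOMs */
	if (!zone->all_unreclaimable) {
		zone->all_unreclaimable = 1;
		printk(KERN_DEBUG "%s marked all_unreclaimable: "
			"pages_scanned=%lu free=%lu\n",
			zone->name, zone->pages_scanned,
			zone->free_pages);
	}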
Aside from that, the patch does make sense, but it might be too
aggressive. In heavy swapping loads, the "zero priority" scan might
make up a significant proportion of the scanning done, so you'll want
to be careful about regressions there.
On Mon, Oct 25, 2004 at 06:33:35PM -0400, Rik van Riel wrote:
> On Mon, 25 Oct 2004, Rik van Riel wrote:
> > On Mon, 25 Oct 2004, Andrew Morton wrote:
> > > Rik van Riel <[email protected]> wrote:
> > > >
> > > > - if (referenced && page_mapping_inuse(page))
> > > > + if (referenced && sc->priority && page_mapping_inuse(page))
> > >
> > > Makes heaps of sense, but I'd like to exactly understand why people are
> > > getting oomings before doing something like this. I think we're still
> > > waiting for a testcase?
> >
> > I'm now running Yum on a (virtual) system with 96MB RAM and
> > 100MB swap. This used to get an OOM kill very quickly, but
> > still seems to be running now, after 20 minutes.
>
> It completed, without being OOM killed like before.
Barry,
Can you please test Rik's patch with your spurious OOM kill testcase?
===== mm/vmscan.c 1.231 vs edited =====
--- 1.231/mm/vmscan.c	Sun Oct 17 01:07:24 2004
+++ edited/mm/vmscan.c	Mon Oct 25 17:38:56 2004
@@ -379,7 +379,7 @@
 		referenced = page_referenced(page, 1);
 		/* In active use or really unfreeable? Activate it. */
-		if (referenced && page_mapping_inuse(page))
+		if (referenced && sc->priority && page_mapping_inuse(page))
 			goto activate_locked;
 
 #ifdef CONFIG_SWAP
@@ -715,7 +715,7 @@
 		if (page_mapped(page)) {
 			if (!reclaim_mapped ||
 			    (total_swap_pages == 0 && PageAnon(page)) ||
-			    page_referenced(page, 0)) {
+			    (page_referenced(page, 0) && sc->priority)) {
 				list_add(&page->lru, &l_active);
 				continue;
 			}
Marcelo Tosatti wrote:
> Can you please test Rik's patch with your spurious OOM kill testcase?
Do you have a particular test case in mind? Is it accessible to the rest
of us? If you send it to me I will run it on my 64MB P2 machine, which
makes a very good test rig for the oom_killer because it is normally
plagued by it.
I have already run Rik's patch to great success using my test case of
compiling umlsim. Without the patch this fails every time at the stage
of linking the UML kernel.
Regards,
Chris R.
On Thu, Oct 28, 2004 at 05:27:18PM +0200, Chris Ross wrote:
>
>
> Marcelo Tosatti wrote:
> >Can you please test Rik's patch with your spurious OOM kill testcase?
>
> Do you have a particular test case in mind?
Anonymous-memory-intensive loads. It was easy to trigger the
problem with "fillmem" from Quintela's memtest suite.
> Is it accessible to the rest
> of us? If you send it to me I will run it on my 64MB P2 machine, which
> makes a very good test rig for the oom_killer because it is normally
> plagued by it.
>
> I have already run Rik's patch to great success using my test case of
> compiling umlsim. Without the patch this fails every time at the stage
> of linking the UML kernel.
Cool!
Marcelo & co.,
Testing again: on plain 2.6.10-rc1-mm2 (i.e. without Rik's patch)
building umlsim fails on my 64MB P2 350MHz Gentoo box exactly as before.
Regards,
Chris R.
Oct 29 15:25:19 sleepy oom-killer: gfp_mask=0xd0
Oct 29 15:25:19 sleepy DMA per-cpu:
Oct 29 15:25:19 sleepy cpu 0 hot: low 2, high 6, batch 1
Oct 29 15:25:19 sleepy cpu 0 cold: low 0, high 2, batch 1
Oct 29 15:25:19 sleepy Normal per-cpu:
Oct 29 15:25:19 sleepy cpu 0 hot: low 4, high 12, batch 2
Oct 29 15:25:19 sleepy cpu 0 cold: low 0, high 4, batch 2
Oct 29 15:25:19 sleepy HighMem per-cpu: empty
Oct 29 15:25:19 sleepy
Oct 29 15:25:19 sleepy Free pages: 244kB (0kB HighMem)
Oct 29 15:25:19 sleepy Active:12269 inactive:596 dirty:0 writeback:0
unstable:0 free:61 slab:1117 mapped:12368 pagetables:140
Oct 29 15:25:19 sleepy DMA free:60kB min:60kB low:120kB high:180kB
active:12304kB inactive:0kB present:16384kB pages_scanned:15252
all_unreclaimable? yes
Oct 29 15:25:19 sleepy protections[]: 0 0 0
Oct 29 15:25:19 sleepy Normal free:184kB min:188kB low:376kB high:564kB
active:36772kB inactive:2384kB present:49144kB pages_scanned:41571
all_unreclaimable? yes
Oct 29 15:25:19 sleepy protections[]: 0 0 0
Oct 29 15:25:19 sleepy HighMem free:0kB min:128kB low:256kB high:384kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Oct 29 15:25:19 sleepy protections[]: 0 0 0
Oct 29 15:25:19 sleepy DMA: 4294567735*4kB 4294792863*8kB
4294895642*16kB 4294943555*32kB 4294962724*64kB 4294966891*128kB
4294967255*256kB 4294967283*512kB
4294967290*1024kB 4294967293*2048kB 4294967294*4096kB = 4289685332kB
Oct 29 15:25:19 sleepy Normal: 4293893066*4kB 4294583823*8kB
4294849819*16kB 4294950038*32kB 4294966291*64kB 4294966753*128kB
4294967182*256kB 4294967238*512kB 4294967265*1024kB 4294967278*2048kB
4294967281*4096kB = 4284847952kB
Oct 29 15:25:19 sleepy HighMem: empty
Oct 29 15:25:19 sleepy Swap cache: add 9372, delete 7530, find
1491/1835, race 0+0
Oct 29 15:25:19 sleepy Out of Memory: Killed process 12157 (ld).
Marcelo,
On Thu, 2004-10-28 at 10:06 -0200, Marcelo Tosatti wrote:
> Can you please test Rik's patch with your spurious OOM kill testcase?
I have similar problems with hackbench on a PIII/Celeron 300MHz, 128MB
RAM.
Rik's patch does not help either. I have the impression that it makes
almost no difference whether swap is on or not. With swap on, the
oom-killer even kills rpc.mountd while reporting that 56MB of memory is
available.
It happens on 2.6.8.1 too. On 2.6.7 I can run hackbench 40 without any
problem.
Full logs are attached.
tglx
Running hackbench 40, with swap on, I get
oom-killer: gfp_mask=0xd0
Free pages: 356kB (0kB HighMem)
Out of Memory: Killed process 1030 (portmap).
oom-killer: gfp_mask=0xd0
Free pages: 6684kB (0kB HighMem)
Out of Memory: Killed process 1182 (atd).
oom-killer: gfp_mask=0xd0
Free pages: 17076kB (0kB HighMem)
Out of Memory: Killed process 1173 (sshd).
oom-killer: gfp_mask=0xd0
Free pages: 20160kB (0kB HighMem)
Out of Memory: Killed process 1191 (bash).
- That's the shell on which hackbench was started
oom-killer: gfp_mask=0xd0
Free pages: 56544kB (0kB HighMem)
Out of Memory: Killed process 1149 (rpc.mountd).
Switching swap off, I get
oom-killer: gfp_mask=0xd0
Free pages: 404kB (0kB HighMem)
Out of Memory: Killed process 1031 (portmap).
oom-killer: gfp_mask=0xd0
Free pages: 356kB (0kB HighMem)
Out of Memory: Killed process 1169 (atd).
oom-killer: gfp_mask=0xd0
Free pages: 792kB (0kB HighMem)
Out of Memory: Killed process 1160 (sshd).
oom-killer: gfp_mask=0xd0
Free pages: 2340kB (0kB HighMem)
Out of Memory: Killed process 1178 (bash).
- That's the shell on which hackbench was started
Chris Ross wrote:
> Testing again: on plain 2.6.10-rc1-mm2 (i.e. without Rik's patch)
> building umlsim fails on my 64MB P2 350MHz Gentoo box exactly as before.
To confirm, 2.6.10-rc1-mm2 with Rik's patch compiles umlsim-65
(http://umlsim.sourceforge.net/umlsim-65.tar.gz) just fine.
Regards,
Chris R.
On Sat, 2004-10-23 at 14:59 +0200, Javier Marcet wrote:
> I've been following quite closely the development of 2.6.9, testing
> every -rc release and a lot of -bk's.
>
> Upon changing from 2.6.9-rc2 to 2.6.9-rc3 I began experiencing random
> oom kills whenever a high memory i/o load took place.
> This happened with plenty of free memory, and with whatever values I
> used for vm.overcommit_ratio and vm.overcommit_memory
> Doubling the physical RAM didn't change the situation either.
>
> Having traced the problem to 2.6.9-rc3, I took a look at the differences
> in memory handling between 2.6.9-rc2 and 2.6.9-rc3 and with the attached
> patch I have no more oom kills. Not a single one.
>
> I'm not saying everything within the patch is needed, nor even that it's
> the right thing to change. Nonetheless, 2.6.9 vanilla was unusable,
> while with this patch those OOM kills are gone.
>
> Please, review and see what's wrong there :)
The changes in mempolicy.c are unrelated unless you have a NUMA-enabled
machine.
The flush_dcache_page() changes are only relevant for non-x86
architectures, as they result in a NOP on x86.
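On i386 the definition is literally empty - quoting
include/asm-i386/cacheflush.h from memory:

#define flush_dcache_page(page)			do { } while (0)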
tglx
Hi,
> Oct 29 15:25:19 sleepy protections[]: 0 0 0
> Oct 29 15:25:19 sleepy DMA: 4294567735*4kB 4294792863*8kB
> 4294895642*16kB 4294943555*32kB 4294962724*64kB 4294966891*128kB
> 4294967255*256kB 4294967283*512kB
> 4294967290*1024kB 4294967293*2048kB 4294967294*4096kB = 4289685332kB
> Oct 29 15:25:19 sleepy Normal: 4293893066*4kB 4294583823*8kB
> 4294849819*16kB 4294950038*32kB 4294966291*64kB 4294966753*128kB
> 4294967182*256kB 4294967238*512kB 4294967265*1024kB 4294967278*2048kB
> 4294967281*4096kB = 4284847952kB
> Oct 29 15:25:19 sleepy HighMem: empty
> Oct 29 15:25:19 sleepy Swap cache: add 9372, delete 7530, find
This looks odd.
How about this fix?
I don't know why this is missing...
Kame <[email protected]>
--
---
linux-2.6.10-rc1-mm2-kamezawa/mm/page_alloc.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/page_alloc.c~clean-up mm/page_alloc.c
--- linux-2.6.10-rc1-mm2/mm/page_alloc.c~clean-up 2004-10-30 17:07:01.918419104 +0900
+++ linux-2.6.10-rc1-mm2-kamezawa/mm/page_alloc.c 2004-10-30 17:08:25.904651256 +0900
@@ -261,7 +261,9 @@ static inline void __free_pages_bulk (st
 	}
 	coalesced = base + page_idx;
 	set_page_order(coalesced, order);
-	list_add(&coalesced->lru, &zone->free_area[order].free_list);
+	area = zone->free_area + order;
+	list_add(&coalesced->lru, &area->free_list);
+	area->nr_free++;
 }
 
 static inline void free_pages_check(const char *function, struct page *page)
_
Hiroyuki KAMEZAWA wrote:
> How about this fix?
> I don't know why this is missing...
Instead of, or as well as Rik's fix?
Regards,
Chris R.
Chris Ross wrote:
>
>
> Hiroyuki KAMEZAWA wrote:
>
>> How about this fix?
>> I don't know why this is missing...
>
>
> Instead of, or as well as Rik's fix?
>
Both Rik's fix and this one will be needed, I think.
> Regards,
> Chris R.
>
zone->free_area[order].nr_free is corrupted; this patch fixes it.
It looks like there is no nr_free++ on the page-freeing path now.
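nr_free is an unsigned long and (I assume) the allocation path keeps
decrementing it, so the counter wraps below zero. A userspace miniature
of the effect:

#include <stdio.h>

int main(void)
{
	unsigned long nr_free = 0;

	nr_free--;	/* alloc decrements, free never increments */
	printf("%lu\n", nr_free);	/* 4294967295 on 32-bit, like the log */
	return 0;
}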
Kame <[email protected]>
Hiroyuki KAMEZAWA wrote:
> zone->free_area[order].nr_free is corrupted; this patch fixes it.
>
> It looks like there is no nr_free++ on the page-freeing path now.
It's corrupt because area is out of scope at that point - it's declared
within the for loop above.
Should I move your fix into the loop or move the declaration of area to
function scope?
Regards,
Chris R.
Chris Ross wrote:
>
>
> Hiroyuki KAMEZAWA wrote:
>
>> zone->free_area[order].nr_free is corrupted; this patch fixes it.
>>
>> It looks like there is no nr_free++ on the page-freeing path now.
>
>
> It's corrupt because area is out of scope at that point - it's declared
> within the for loop above.
>
> Should I move your fix into the loop or move the declaration of area to
> function scope?
>
Oh, okay, my patch was wrong ;(
Very sorry for the bad hack.
This one should be okay.
Sorry,
Kame <[email protected]>
-
linux-2.6.10-rc1-mm2-kamezawa/mm/page_alloc.c | 1 +
1 files changed, 1 insertion(+)
diff -puN mm/page_alloc.c~cleanup2 mm/page_alloc.c
--- linux-2.6.10-rc1-mm2/mm/page_alloc.c~cleanup2 2004-10-30 18:40:19.024529640 +0900
+++ linux-2.6.10-rc1-mm2-kamezawa/mm/page_alloc.c 2004-10-30 18:40:40.225306632 +0900
@@ -262,6 +262,7 @@ static inline void __free_pages_bulk (st
 	coalesced = base + page_idx;
 	set_page_order(coalesced, order);
 	list_add(&coalesced->lru, &zone->free_area[order].free_list);
+	zone->free_area[order].nr_free++;
 }
 
 static inline void free_pages_check(const char *function, struct page *page)
_
On Sat, 2004-10-30 at 18:53 +0900, Hiroyuki KAMEZAWA wrote:
> > Should I move your fix into the loop or move the declaration of area to
> > function scope?
> >
> Oh, okay, my patch was wrong ;(
> Very sorry for the bad hack.
> This one should be okay.
It fixes at least the corrupted output of show_free_areas().
DMA: 4294966389*4kB 4294966983*8kB 4294967156*16kB .....
Normal: 4294954991*4kB 4294962949*8kB 4294965607*16kB ....
now it's
DMA: 248*4kB 63*8kB 7*16kB 1*32kB 0*64kB 0*128kB ...
Normal: 204*4kB 416*8kB 157*16kB 20*32kB 3*64kB ...
Good catch.
But it still does not fix the random madness of the oom-killer. Once it
is triggered, it keeps going even if there is 50MB of free memory
available.
tglx
Hiroyuki KAMEZAWA wrote:
> Oh, okay, my patch was wrong ;(
> Very sorry for the bad hack.
> This one should be okay.
That works, now my oom report looks like this...
Oct 30 17:32:22 sleepy oom-killer: gfp_mask=0x1d2
Oct 30 17:32:22 sleepy DMA per-cpu:
Oct 30 17:32:22 sleepy cpu 0 hot: low 2, high 6, batch 1
Oct 30 17:32:22 sleepy cpu 0 cold: low 0, high 2, batch 1
Oct 30 17:32:22 sleepy Normal per-cpu:
Oct 30 17:32:22 sleepy cpu 0 hot: low 4, high 12, batch 2
Oct 30 17:32:22 sleepy cpu 0 cold: low 0, high 4, batch 2
Oct 30 17:32:22 sleepy HighMem per-cpu: empty
Oct 30 17:32:22 sleepy
Oct 30 17:32:22 sleepy Free pages: 332kB (0kB HighMem)
Oct 30 17:32:22 sleepy Active:11887 inactive:517 dirty:0 writeback:0
unstable:0 free:83 slab:1347 mapped:11930 pagetables:247
Oct 30 17:32:22 sleepy DMA free:60kB min:60kB low:120kB high:180kB
active:11256kB inactive:436kB present:16384kB pages_scanned:11686
all_unreclaimable? yes
Oct 30 17:32:22 sleepy protections[]: 0 0 0
Oct 30 17:32:22 sleepy Normal free:272kB min:188kB low:376kB high:564kB
active:36292kB inactive:1632kB present:49144kB pages_scanned:6922
all_unreclaimable? no
Oct 30 17:32:22 sleepy protections[]: 0 0 0
Oct 30 17:32:22 sleepy HighMem free:0kB min:128kB low:256kB high:384kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Oct 30 17:32:22 sleepy protections[]: 0 0 0
Oct 30 17:32:22 sleepy DMA: 1*4kB 1*8kB 1*16kB 1*32kB 0*64kB 0*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 60kB
Oct 30 17:32:22 sleepy Normal: 0*4kB 12*8kB 1*16kB 1*32kB 0*64kB 1*128kB
0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 272kB
Oct 30 17:32:22 sleepy HighMem: empty
Oct 30 17:32:22 sleepy Swap cache: add 136776, delete 129314, find
37853/51620, race 0+0
Oct 30 17:32:22 sleepy Out of Memory: Killed process 12395 (ld).
> But it still does not fix the random madness of the oom-killer. Once it
> is triggered, it keeps going even if there is 50MB of free memory
> available.
>
In addition to Rik's and Kame's fixes, I changed the criteria in
oom_kill a bit to
- take processes which fork a lot of children into account, instead of
killing stuff like portmap and sshd just because they haven't used much
CPU time since they started (a sketch of what I mean follows below).
- prevent the oom-killer from continuing to kill processes even if
memory is available.
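For the first point, roughly something like this in badness() in
mm/oom_kill.c (a sketch against 2.6.9, untested as posted, locking
omitted):

	/*
	 * Sketch: fold each child's VM size into the parent's score,
	 * so a forking server outranks long-lived daemons like
	 * portmap or sshd.
	 */
	struct list_head *tsk;

	list_for_each(tsk, &p->children) {
		struct task_struct *chld;

		chld = list_entry(tsk, struct task_struct, sibling);
		if (chld->mm && chld->mm != p->mm)
			points += chld->mm->total_vm;
	}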
I was facing the above problems on a small embedded system, where a
forking server started to flood the machine with child processes.
oom-killer killed portmap and sshd instead of the real culprit and took
away the opportunity to log into the machine remotely.
The problem can be simulated with hackbench on a small UP system.
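If you don't have hackbench around, a toy like this mimics the forking
server I described (illustrative only - it will make a small box very
unhappy, so only run it on a test machine):

/* toy reproducer: fork children faster than they exit, each
 * pinning a few MB of anonymous memory */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		if (fork() == 0) {	/* child */
			unsigned long sz = 4 * 1024 * 1024;
			char *p = malloc(sz);

			if (p)
				memset(p, 1, sz);	/* fault it in */
			sleep(60);	/* linger with the memory held */
			_exit(0);
		}
		usleep(10000);	/* parent keeps forking, ~100/s */
	}
}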
I don't know whether these changes have any negative impact on the
test cases which were used in the original design. Rik??
tglx