2001-10-05 19:24:51

by Hugh Dickins

Subject: pre4 oom too soon

2.4.11-pre4 gives me oom_kill I never got before.
All numbers decimal in 4kB pages:

num_physpages 65520
free or freeable 56000 (from MemFree after swapoff afterwards)
total_swap_pages 132526
prog tries to hog 153600

At oom_kill time:

all_zones_low yes (DMA & Normal well above min, no Highmem)
nr_swap_pages 0
page_cache_size 59013
swapper_space.nrpages 58202

I'm not sure exactly what to blame in out_of_memory(), but it does
look wrong to depend so much on whether nr_swap_pages happens to be
0 at that instant or not, and a lot of that full swap is duplicated
in the swap cache. Probably that should be taken into consideration?

(I wonder whether, before my 2.4.10 fix to lowest_bit and highest_bit
in scan_swap_map, it was rarer to find nr_swap_pages 0 - swap pages
could be free, but invisible to scan_swap_map.)

Hugh


2001-10-05 19:49:26

by Marcelo Tosatti

Subject: Re: pre4 oom too soon


On Fri, 5 Oct 2001, Hugh Dickins wrote:

> 2.4.11-pre4 gives me oom_kill I never got before.
> All numbers decimal in 4kB pages:
>
> num_physpages 65520
> free or freeable 56000 (from MemFree after swapoff afterwards)
> total_swap_pages 132526
> prog tries to hog 153600
>
> At oom_kill time:
>
> all_zones_low yes (DMA & Normal well above min, no Highmem)
> nr_swap_pages 0
> page_cache_size 59013
> swapper_space.nrpages 58202
>
> I'm not sure exactly what to blame in out_of_memory(), but it does
> look wrong to depend so much on whether nr_swap_pages happens to be
> 0 at that instant or not, and a lot of that full swap is duplicated
> in the swap cache. Probably that should be taken into consideration?

The issue is that right now we _check_ for OOM every time
kswapd_balance_pgdat() fails to give all zones enough free pages.
That is far too fragile (I sent the patch to Linus saying it was just
a preview, and he included it anyway :))



	do {
		need_more_balance = 0;
		pgdat = pgdat_list;
		/* one balancing pass over every node */
		do
			need_more_balance |= kswapd_balance_pgdat(pgdat);
		while ((pgdat = pgdat->node_next));
		/* any zone still short after a single pass triggers the OOM check */
		if (need_more_balance && out_of_memory()) {
			oom_kill();
		}
	} while (need_more_balance);

Note that even a complete kswapd_balance_pgdat() call scans only a small
portion of the lists. I'm pretty sure we have to guarantee that kswapd
has scanned all the lists at least once (maybe twice) before checking
for OOM.

I don't think I'll be able to write a patch for that behaviour
_today_, but I'll do so on Monday if nobody else does.

2001-10-05 20:00:07

by Linus Torvalds

Subject: Re: pre4 oom too soon


On Fri, 5 Oct 2001, Hugh Dickins wrote:
>
> I'm not sure exactly what to blame in out_of_memory(), but it does
> look wrong to depend so much on whether nr_swap_pages happens to be 0
> at that instant or not, and a lot of that full swap is duplicated in
> the swap cache. Probably that should be taken into consideration?

That sounds sane. If we have swap cache pages, we're not out of memory
yet.

Linus

2001-10-05 20:03:07

by Linus Torvalds

Subject: Re: pre4 oom too soon


On Fri, 5 Oct 2001, Marcelo Tosatti wrote:
>
> Note that a full kswapd_balance_pgdat() is going to scan only a small
> portion of the lists. I'm pretty sure we have to guarantee kswapd
> scanned at least all lists (maybe scanned all lists twice), before
> checking for OOM.

Why not just say "if we have swap cache pages, we aren't oom".

If we've scanned all lists twice, we should have unmapped all users of
swap-cache pages, and we should have dropped them.

And make the test less black-and-white: under heavy memory load we're
almost always going to have a _few_ swap-cache pages around, if only
because read-ahead etc. pins them. But if the swap cache is a
noticeable fraction of memory, we're obviously not oom _regardless_ of
what the VM balancers say.

Linus

2001-10-05 20:13:07

by Linus Torvalds

Subject: Re: pre4 oom too soon


Actually, it looks like the easiest solution is to just remove the line

	cache_mem -= swapper_space.nrpages;

which should automatically do the right thing.

Linus