LinuxLists.cc - [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

2001-03-22 22:06:59

Subject: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

2.4.1 has a memory leak (temporary) where anonymous memory pages that have
been moved into the swap cache will stick around after their vma has been
unmapped by the owning process. These pages are not free'd in free_pte()
because they are still referenced by the page cache. In addition, if the
pages are dirty, they will be written out to the swap device before they
are reclaimed even though the owning process no longer will be using the
pages.

free_pte in mm/memory.c has been modified to check whether the page is
only being referenced by the swap cache (and possibly buffers). If so,
the buffers (if existent) are free'd and the page and swap cache
entry are removed immediately.

Essentially, this is the same patch as before, but there was one condition
in which we would leak an extra reference to the targeted page if
the counts would not allow us to remove the swap cache entry. The leak in
2.4.1 also applies to 2.4.2 and 2.4.3-pre5.

Rich Jerrell

Attachments:

2.4.1-paging-fix-22.03.01.patch (1.46 kB)

2001-03-22 23:20:01

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Thu, 22 Mar 2001, Richard Jerrell wrote:

> 2.4.1 has a memory leak (temporary) where anonymous memory pages
> that have been moved into the swap cache will stick around after
> their vma has been unmapped by the owning process.

> free_pte in mm/memory.c has been modified to check to see if the
> page is only being referenced by the swap cache

Your idea is nice, but the patch lacks a few things:

- SMP locking, what if some other process faults in this page
between the atomic_read of the page count and the test later?
- testing if our process is the _only_ user of this swap page,
for eg. apache you'll have lots of COW-shared pages .. it would
be good to keep the page in memory for our siblings

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com/

2001-03-23 16:24:09

by Richard Jerrell

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

> Your idea is nice, but the patch lacks a few things:
>
> - SMP locking, what if some other process faults in this page
> between the atomic_read of the page count and the test later?

It can't happen. free_pte is called with the page_table_lock held in
addition to having the mmap_sem downed.

> - testing if our process is the _only_ user of this swap page,
> for eg. apache you'll have lots of COW-shared pages .. it would
> be good to keep the page in memory for our siblings

This is already done in free_page_and_swap_cache.

There is a problem with a possible kernel panic in that
try_to_free_buffers is called with a wait of 1 (thanks to Andrew Morton
for pointing that out) and we might reschedule while we wait on io. So,
to fix it, here is an even newer (and simpler) patch. Everything is
handled in free_page_and_swap_cache, so we just call that if we can
successfully look up the entry.

Rich

Attachments:

2.4.1-paging-fix-23.03.01.patch (711.00 B)

2001-03-23 23:59:37

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Fri, 23 Mar 2001, Richard Jerrell wrote:
> > Your idea is nice, but the patch lacks a few things:
> >
> > - SMP locking, what if some other process faults in this page
> > between the atomic_read of the page count and the test later?
>
> It can't happen. free_pte is called with the page_table_lock held in
> addition to having the mmap_sem downed.

The page_table_lock and the mmap_sem only protect the *current*
task. Think about something like an apache with 500 children who
COW share the same page...
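
The concern about a count being read and then acted on later can be illustrated with a small userland model. This is only a sketch under stated assumptions: `struct page_model`, `only_user_racy` and `try_freeze_refs` are hypothetical names, not kernel code, and the compare-and-swap approach shown is a generic technique rather than what the patch does.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page; only the reference count matters here. */
struct page_model {
    atomic_int count;
};

/*
 * Racy check: the count can change between this load and whatever the
 * caller does next, unless every path that can take a reference is
 * excluded by a lock that all of those paths share.
 */
static bool only_user_racy(struct page_model *p, int expected)
{
    return atomic_load(&p->count) == expected;
}

/*
 * Atomic alternative: swing the count from `expected` to 0 in one step.
 * If anyone else took a reference first, the exchange fails and the
 * count is left untouched.
 */
static bool try_freeze_refs(struct page_model *p, int expected)
{
    return atomic_compare_exchange_strong(&p->count, &expected, 0);
}
```

With the racy form, a second task that bumps the count after the load goes unnoticed; with the atomic form, the same bump makes the freeze fail and the page is left alone.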

> > - testing if our process is the _only_ user of this swap page,
> > for eg. apache you'll have lots of COW-shared pages .. it would
> > be good to keep the page in memory for our siblings
>
> This is already done in free_page_and_swap_cache.

Ok ...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-27 16:23:33

by Stephen C. Tweedie

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

Hi,

On Thu, Mar 22, 2001 at 05:21:46PM -0500, Richard Jerrell wrote:
> 2.4.1 has a memory leak (temporary) where anonymous memory pages that have
> been moved into the swap cache will stick around after their vma has been
> unmapped by the owning process. These pages are not free'd in free_pte()
> because they are still referenced by the page cache. In addition, if the
> pages are dirty, they will be written out to the swap device before they
> are reclaimed even though the owning process no longer will be using the
> pages.
>
> free_pte in mm/memory.c has been modified to check to see if the page is
> only being referenced by the swap cache (and possibly buffers).

But is it worth it?

fork and exit are very hot paths in the kernel, and this patch can force
a page cache lookup on a large number of pte which wouldn't be looked
up before.

The classic case is sendmail or apache, where you can have a parent
process rapidly forking a large number of children. If part of the
parent gets swapped out due to lack of use, then the children all
inherit swapped ptes and each such page will result in an extra page
cache lookup in zap_page_range on exit with this change.

Given that the leak is, as you say, temporary, and that the leak will
be recovered as soon as we start swapping again, do we really want to
pollute the fast path for the sake of a bit more speed during
swapping?

Cheers,
Stephen

2001-03-27 21:19:37

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Tue, 27 Mar 2001, Richard Jerrell wrote:

> Instead of removing the swap cache page at process exit and possibly
> expending time doing disk IO as you have pointed out, we check during
> refill_inactive_scan and page_launder for a page that is

Three comments:

1. we take an extra reference on the page, how does that
affect the test for if the page is shared or not ?
2. we call delete_from_swap_cache with the pagemap_lru_lock
held, since this tries to grab the pagecache_lock we can
easily deadlock with the rest of the kernel (where the
locking order is opposite)
3. there are no comments in the code explaining what this
suspicious-looking piece of code does ;)
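
Point 2 is a lock-ordering problem. A minimal userland sketch of the rule follows, assuming (from the description above) that the rest of the kernel takes pagecache_lock before pagemap_lru_lock. The rank values, `struct lock_tracker`, `acquire` and `release` are hypothetical helpers for illustration only; no real locking is performed.

```c
#include <stdbool.h>

/* Assumed global order for this sketch: lower rank must be taken first. */
enum lock_rank {
    RANK_PAGECACHE   = 1,   /* models pagecache_lock */
    RANK_PAGEMAP_LRU = 2    /* models pagemap_lru_lock */
};

/* Bit r of `held` is set while the lock of rank r is held. */
struct lock_tracker {
    unsigned held;
};

/*
 * Record an acquisition, refusing it if a lock of equal or higher rank
 * is already held. A refusal marks an ordering inversion, which is the
 * precondition for an ABBA deadlock between two CPUs.
 */
static bool acquire(struct lock_tracker *t, enum lock_rank rank)
{
    if (t->held >> rank)
        return false;
    t->held |= 1u << rank;
    return true;
}

static void release(struct lock_tracker *t, enum lock_rank rank)
{
    t->held &= ~(1u << rank);
}
```

In this model, taking the LRU lock and then asking for the page cache lock is rejected, which mirrors calling delete_from_swap_cache with pagemap_lru_lock held.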

regards,

Rik

2001-03-27 21:11:24

by Richard Jerrell

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

> fork and exit are very hot paths in the kernel, and this patch can force
> a page cache lookup on a large number of pte which wouldn't be looked
> up before.

True, but I don't know how large of a performance hit the system takes.

> Given that the leak is, as you say, temporary, and that the leak will
> be recovered as soon as we start swapping again, do we really want to
> pollute the fast path for the sake of a bit more speed during
> swapping?

It isn't speed of swapping that is the biggest problem. The problem is
that if you run a memory intensive task, exit after its pages have been
placed on an lru, and run it again, there won't be enough memory to execute
because all the memory you used previously is now sitting in the swap cache.
That isn't to say that without being patched the speed isn't poor. After
all, we'd be paging out a dead process's pages.

But you are right, this fix is slow and that can be improved. So,
hopefully this patch is satisfactory with respect to speed and fixing the
leak. It will also remove the panic which is possible with the other
patches (can't do a lookup_swap_cache with a spinlock held).

Instead of removing the swap cache page at process exit and possibly
expending time doing disk IO as you have pointed out, we check during
refill_inactive_scan and page_launder for a page that is

1) in the swap cache
2) is not locked
3) is only being referenced by the swap cache, us, and possibly by
buffers
4) has no one else referencing the swap cell

If that is true, we can safely remove that page without writing it to
disk. In addition, the number of swap cache pages is included in the
amount returned from vm_enough_memory to get rid of the temporary leak.

So, the exit path remains unchanged, reclaiming a page is faster when
the page is no longer mapped, and the lazy reclaiming for multiply
referenced pages remains intact.
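
The four conditions above can be written down as a single predicate. The following is a userland sketch only: `struct swap_page_model` and its fields are hypothetical, and the exact reference arithmetic (one reference for the swap cache, one for the scanner, one more if buffers are attached) is an assumption drawn from the description, not the code in the attached patch.

```c
#include <stdbool.h>

/* Hypothetical model of the page state the scan would inspect. */
struct swap_page_model {
    bool in_swap_cache;  /* condition 1 */
    bool locked;         /* condition 2 */
    bool has_buffers;    /* buffers hold one extra reference */
    int  count;          /* total references to the page */
    int  swap_users;     /* users of the swap cell, including the swap cache */
};

/* True if the page can be dropped without writing it to the swap device. */
static bool can_drop_without_writeout(const struct swap_page_model *p)
{
    int expected;

    if (!p->in_swap_cache || p->locked)
        return false;

    /* Condition 3: swap cache + the scanner ("us") + optional buffers. */
    expected = 2 + (p->has_buffers ? 1 : 0);
    if (p->count != expected)
        return false;

    /* Condition 4: nobody but the swap cache references the swap cell. */
    return p->swap_users == 1;
}
```

Any extra reference, whether to the page or to the swap cell, makes the predicate false, so shared pages keep their lazy reclaim behaviour.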

Rich

Attachments:

2.4.1-paging-fix-27.03.01.patch (2.50 kB)

2001-03-27 21:53:07

by Linus Torvalds

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Tue, 27 Mar 2001, Richard Jerrell wrote:
>
> Instead of removing the swap cache page at process exit and possibly
> expending time doing disk IO as you have pointed out, we check during
> refill_inactive_scan and page_launder for a page that is

I think this patch looks pretty good. However, I don't think you can
safely do a "is_shared()" query without holding the page lock.

I'd be happy to be shown otherwise, of course. I'm just generally very
wary of "is_shared()", and that function makes me nervous. I'd almost
prefer to get rid of it, and test for the stuff it tests for directly
(most places that test this are likely to not need all the tests
anyway).

I also have this suspicion that most of the advantage of this patch
could easily be gotten by just testing for the exclusive "no longer
used" case in the swap-cache "writepage()" function. That would have
the advantage of localizing the test more, and minimizing special-case
swap-cache tests in the general VM codepaths.

Comments?

Linus

2001-03-27 22:52:57

by Richard Jerrell

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

> 1. we take an extra reference on the page, how does that
> affect the test for if the page is shared or not ?

is_page_shared expects us to have our own reference to the page.

> 2. we call delete_from_swap_cache with the pagemap_lru_lock
> held, since this tries to grab the pagecache_lock we can
> easily deadlock with the rest of the kernel (where the
> locking order is opposite)

You're right. Oversight on my part. Here is another version of the
patch.

> 3. there are no comments in the code explaining what this
> suspicious-looking piece of code does ;)

Oops... I sent out the wrong version of the patch the first time. This
one has comments, promise. And it has one less bug. :)

Rich

Attachments:

2.4.1-paging-fix-27.03.01.patch (3.40 kB)

2001-03-27 23:05:07

by Rik van Riel

Subject: Re: [PATCH] mm/memory.c, 2.4.1 : memory leak with swap cache (updated)

On Tue, 27 Mar 2001, Richard Jerrell wrote:

> Oops... I sent out the wrong version of the patch the first time.
> This one has comments, promise. And it has one less bug. :)

Looks good to me (at first glance). Any volunteer to
stress-test this on an SMP machine ?

Rik

2001-03-28 23:49:47

by Tim Haynes

Subject: Re: Ideas for the oom problem

On Wed, Mar 28, 2001 at 06:33:04PM -0500, Hacksaw wrote:

> > Anyone working as root is (sorry) an idiot! root's processes are normally
> > quite system-relevant and so they should never be killed, if we can avoid
> > it.
>
> The real world intrudes. Root sometimes needs to look at documentation,
> which, these days is often available as html. Sometimes it's only as html.
> And people in a panic who aren't trained sys-admins aren't going to remember
> to log in as someone else.

Why are they logged in as root in the first place? Is there something they
can't do over sudo?
I definitely remember seeing a document saying `if you find yourself needing to
`man foo', do it in another terminal as your non-root self'; it might or might
not've been the SAG.

In any case, what happened to `if you use this rope you will hang yourself'?
There has to be a point where you abandon catering for all kinds of fool and
get on with writing something useful, I think.

> I completely agree that doing general work as root is a bad idea. I do most
> root things via sudo. It sure would be nice if all the big dists supplied it
> (Hey, RedHat! You listening?) as part of their normal set.

RH have been listening since v7.0.

~Tim

2001-03-29 00:13:37

by Hacksaw

Subject: Re: Ideas for the oom problem

> On Wed, Mar 28, 2001 at 06:33:04PM -0500, Hacksaw wrote:

> Why are they logged in as root in the first place? Is there something they
> can't do over sudo?

I have the "Gnome workstation" version of rawhide (7.0.xxx) on my new laptop.
I don't see sudo. Of course, it's rawhide, but you'd think, if it were in 7.0,
it'd make it. Or maybe they decided that the gnome workstation didn't need
it... Hmmm.

> I definitely remember seeing a document saying `if you find yourself needing to
> `man foo', do it in another terminal as your non-root self'; it might or might
> not've been the SAG.

Sucks if you are trying to figure out a VT problem.

> In any case, what happened to `if you use this rope you will hang yourself'?
> There has to be a point where you abandon catering for all kinds of fool and
> get on with writing something useful, I think.

Let's accept one thing: Root should, in fact, be allowed to do anything a
regular user can. The fact that hanging is a possibility ought to be
pointed out. I have my shell set up to tell me I'm root. But the fact is, the
typical sys-admin is essentially always logged in as root somewhere, and
changing terminals to look at man pages is sometimes not an option.

For that matter, I have often figured out that something had funny permission
problems by discovering that the problem goes away if I run a program as root.

Assuming everything root is doing must be sacrosanct is a pipe dream.
Assuming everything a regular user is doing is expendable is BOFH think.

I do agree that you have to draw a line. I'm just saying that's the wrong one.

> > I completely agree that doing general work as root is a bad idea. I do most
> > root things via sudo. It sure would be nice if all the big dists supplied it
> > (Hey, RedHat! You listening?) as part of their normal set.
>
> RH have been listening since v7.0.

Good. I hope it comes out well in 7.1, considering my experience with rawhide.

2001-03-28 23:34:27

by Hacksaw

Subject: Re: Ideas for the oom problem

> --On Wednesday, March 28, 2001 09:38:04 -0500 Hacksaw wrote:
> >
> > Deciding what not to kill based on who started it seems like a bad idea.
> > Root can start netscape just as easily as any user, but if the choice of
> > processes to kill is root's netscape or a user's experimental database,
> > I'd want the netscape to go away.
>
> root does not use netscape -FULLSTOP-

Making assumptions about what users will do is foolish.

> Anyone working as root is (sorry) an idiot! root's processes are normally
> quite system-relevant and so they should never be killed, if we can avoid
> it.

The real world intrudes. Root sometimes needs to look at documentation, which,
these days is often available as html. Sometimes it's only as html. And people
in a panic who aren't trained sys-admins aren't going to remember to log in as
someone else.

I completely agree that doing general work as root is a bad idea. I do most
root things via sudo. It sure would be nice if all the big dists supplied it
(Hey, RedHat! You listening?) as part of their normal set.

> There can however be processes owned by other users which shouldn't be
> killed in OOM-Situation, but generally root's processes are more important
> than a normal user's processes.

I'd suggest that this is going to change. Not to regular users, though, so
it's still a good point. But we should be figuring out how to compartmentalize
all our servers. Rarely do most servers need to run as root. Just login ones,
and those should be limited.

So which should die, the user's experiment, or identd?

> What about doing something really critical to avoid the upcoming OOM-situ
> and get your shell killed because you were too slow?

Right. I agree that root's shell should be exempt. It may be that all shells
should be exempt, or maybe all recent shells.

Better, though, would be to establish the idea of "linchpins".

A linchpin is a process marked with a "don't kill for OOM" flag (a capability?).
Only those in root group should be able to start one. And darn few things
should be marked as such. Some very small shell, vi, ed, maybe a small emacs.
Just enough so that our heroic admin can gracefully ease the OOM situ by
changing a few bits of /etc or killing off a few well chosen processes.

On the other hand, a flag that says "kill me first" might be even better.

In any case, I'd certainly expect the OOM killer to sort by memory usage, and
kill off the hogs first. I assume it does that.
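
The selection policy floated above (exempt linchpins, prefer "kill me first" processes, otherwise take the biggest memory hog) can be sketched in userland C. Everything here is hypothetical: `struct proc_model`, its fields and `pick_victim` are illustration names, and this is not how the kernel's OOM killer is actually implemented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process record for the sketch. */
struct proc_model {
    int    pid;
    size_t rss_pages;      /* resident memory, in pages */
    bool   linchpin;       /* "don't kill for OOM" flag */
    bool   kill_me_first;  /* volunteer flag */
};

/*
 * Pick a victim: never a linchpin; a "kill me first" process beats any
 * process without the flag; ties are broken by larger memory usage.
 * Returns NULL when every process is exempt.
 */
static const struct proc_model *
pick_victim(const struct proc_model *procs, size_t n)
{
    const struct proc_model *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct proc_model *p = &procs[i];

        if (p->linchpin)
            continue;
        if (best == NULL) {
            best = p;
            continue;
        }
        if (p->kill_me_first != best->kill_me_first) {
            if (p->kill_me_first)
                best = p;
            continue;
        }
        if (p->rss_pages > best->rss_pages)
            best = p;
    }
    return best;
}
```

Note that ownership by root plays no part in this model; exemption is an explicit per-process flag, which is the distinction being argued for.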

2001-03-29 08:04:19

by Helge Hafting
