LinuxLists.cc - 2.4.14 + Bug in swap

2001-11-21 06:20:22

Subject: 2.4.14 + Bug in swap_out.

In swap_out we have the following code:

spin_lock(&mmlist_lock);
mm = swap_mm;
while (mm->swap_address == TASK_SIZE || mm == &init_mm) {
mm->swap_address = 0;
mm = list_entry(mm->mmlist.next, struct mm_struct, mmlist);
if (mm == swap_mm)
goto empty;
swap_mm = mm;
}

/* Make sure the mm doesn't disappear when we drop the lock.. */
atomic_inc(&mm->mm_users);
spin_unlock(&mmlist_lock);

nr_pages = swap_out_mm(mm, nr_pages, &counter, classzone);

mmput(mm);

And looking in fork.c mmput under with right circumstances becomes.
kmem_cache_free(mm_cachep, (mm)))

So it appears that there is nothing that keeps the mm_struct that
swap_mm points to as being valid.

I guess the easy fix would be to increment the count on swap_mm,
and then do an mmput we assign something else to the value of swap_mm. But
I don't know if that is what we want.

Thoughts?

Eric

2001-11-21 06:29:53

by David Miller

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

From: [email protected] (Eric W. Biederman)
Date: 20 Nov 2001 23:01:06 -0700

And looking in fork.c mmput under with right circumstances becomes.
kmem_cache_free(mm_cachep, (mm)))

So it appears that there is nothing that keeps the mm_struct that
swap_mm points to as being valid.

I do not agree with your analysis.

If we hold the mmlist lock and we find the mm on the swap mm list, by
definition it must have a non-zero user count already. (put an assert
there if you don't believe me :-)

Only when the user count drops to zero will mmput() free up the mm.
It simultaneously grabs the mmlist lock when it drops the user count
to zero, this is how it synchronizes with the rest of the world.
Perhaps you aren't noticing that it is using "atomic_dec_and_lock()"
or you don't understand how that primitive works?

We increment the mm user count before dropping the mmlist lock in the
swapper, so even if the user does a mmput() we still hold a reference.
ie. mmput won't put the user count to zero.

2001-11-21 06:56:58

by Eric W. Biederman

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

"David S. Miller" <[email protected]> writes:

> I do not agree with your analysis.

Neither do I now but not for your reasons :)

I looked again we are o.k. but just barely. mmput explicitly checks
to see if it is freeing the swap_mm, and fixes if we are. It is a
nasty interplay with the swap_mm global, but the code is correct.

My apologies for freaking out I but I couldn't imagine mmput doing
something like that.

Eric

2001-11-21 12:14:37

by Rik van Riel

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

On 20 Nov 2001, Eric W. Biederman wrote:

> /* Make sure the mm doesn't disappear when we drop the lock.. */
> atomic_inc(&mm->mm_users);
> spin_unlock(&mmlist_lock);
>
> nr_pages = swap_out_mm(mm, nr_pages, &counter, classzone);
>
> mmput(mm);
>
>
> And looking in fork.c mmput under with right circumstances becomes.
> kmem_cache_free(mm_cachep, (mm)))
>
> So it appears that there is nothing that keeps the mm_struct that
> swap_mm points to as being valid.

The atomic_inc(&mm->mm_users) above should make sure this
mm_struct stays valid.

regards,

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-21 12:17:48

by Rik van Riel

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

On 20 Nov 2001, Eric W. Biederman wrote:
> "David S. Miller" <[email protected]> writes:
>
> > I do not agree with your analysis.
>
> Neither do I now but not for your reasons :)
>
> I looked again we are o.k. but just barely. mmput explicitly checks
> to see if it is freeing the swap_mm, and fixes if we are. It is a
> nasty interplay with the swap_mm global, but the code is correct.

To be honest I don't see the reason for this subtle
playing with swap_mm in mmput(), since the refcounting
should mean we're safe.

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-21 13:51:12

by Eric W. Biederman

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

Rik van Riel <[email protected]> writes:

> On 20 Nov 2001, Eric W. Biederman wrote:
> > "David S. Miller" <[email protected]> writes:
> >
> > > I do not agree with your analysis.
> >
> > Neither do I now but not for your reasons :)
> >
> > I looked again we are o.k. but just barely. mmput explicitly checks
> > to see if it is freeing the swap_mm, and fixes if we are. It is a
> > nasty interplay with the swap_mm global, but the code is correct.
>
> To be honest I don't see the reason for this subtle
> playing with swap_mm in mmput(), since the refcounting
> should mean we're safe.

We only hold a ref count for the duration of swap_out_mm.
Not for the duration of the value in swap_mm.

Eric

2001-11-21 14:20:56

by Rik van Riel

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

On 21 Nov 2001, Eric W. Biederman wrote:

> We only hold a ref count for the duration of swap_out_mm.
> Not for the duration of the value in swap_mm.

In that case, why can't we just take the next mm from
init_mm and just "roll over" our mm to the back of the
list once we're done with it ?

Removing magic is good ;)

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-21 14:41:07

by Eric W. Biederman

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

Rik van Riel <[email protected]> writes:

> On 21 Nov 2001, Eric W. Biederman wrote:
>
> > We only hold a ref count for the duration of swap_out_mm.
> > Not for the duration of the value in swap_mm.
>
> In that case, why can't we just take the next mm from
> init_mm and just "roll over" our mm to the back of the
> list once we're done with it ?

Sounds good to me. Unless we have another user for that list.

> Removing magic is good ;)

Definitely. Things that are locally correct are much easier
to verify and trust. I'm satisfied for the moment that it isn't
actually broken. But more obvious code is definitely a plus
if we can get it.

Eric

2001-11-21 15:33:17

by Hugh Dickins

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

On Wed, 21 Nov 2001, Rik van Riel wrote:
> On 21 Nov 2001, Eric W. Biederman wrote:
>
> > We only hold a ref count for the duration of swap_out_mm.
> > Not for the duration of the value in swap_mm.

Exactly.

> In that case, why can't we just take the next mm from
> init_mm and just "roll over" our mm to the back of the
> list once we're done with it ?

No. That's how it used to be, that's what I changed it from.

fork and exec are well ordered in how they add to the mmlist,
and that ordering (children after parent) suited swapoff nicely,
to minimize duplication of a swapent while it's being unused;
except swap_out randomized the order by cycling init_mm around it.

I agree that mmput would look nicer without the reference to swap_mm.
If you want to make a change here, I'd suggest replacing swap_mm
pointer by full dummy marker swap_mm, then mmput wouldn't need to
worry about it at all.

I didn't do it that way because I couldn't see where to initialize
the swap_mm structure without touching each architecture separately;
and I also wondered if there might be some stats gathering utility
out there which would get confused by finding a non-mm in the mmlist.

Hugh

2001-11-21 15:39:57

by Rik van Riel

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

On Wed, 21 Nov 2001, Hugh Dickins wrote:

> > In that case, why can't we just take the next mm from
> > init_mm and just "roll over" our mm to the back of the
> > list once we're done with it ?
>
> No. That's how it used to be, that's what I changed it from.
>
> fork and exec are well ordered in how they add to the mmlist,
> and that ordering (children after parent) suited swapoff nicely,
> to minimize duplication of a swapent while it's being unused;
> except swap_out randomized the order by cycling init_mm around it.

Urmmm, so the code was obfuscated in order to optimise
swapoff() ?

Exactly how bad was the "mmlist randomising" for swapoff() ?

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-11-21 16:17:36

by Hugh Dickins

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

On Wed, 21 Nov 2001, Rik van Riel wrote:
> On Wed, 21 Nov 2001, Hugh Dickins wrote:
> >
> > fork and exec are well ordered in how they add to the mmlist,
> > and that ordering (children after parent) suited swapoff nicely,
> > to minimize duplication of a swapent while it's being unused;
> > except swap_out randomized the order by cycling init_mm around it.
>
> Urmmm, so the code was obfuscated in order to optimise
> swapoff() ?

To speed swapoff, I changed the code back to how fork (see comment
on "Add it to the mmlist" in fork.c old and new) and exec seemed to
intend. I don't see see that I _obfuscated_ the code:
what's so difficult about swap_mm?

> Exactly how bad was the "mmlist randomising" for swapoff() ?

It was unnecessary and counter-productive, I changed it.
Exact number? No, but small.

Hugh

2001-11-21 16:46:50

by Eric W. Biederman

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

Hugh Dickins <[email protected]> writes:

> On Wed, 21 Nov 2001, Rik van Riel wrote:
> > On Wed, 21 Nov 2001, Hugh Dickins wrote:
> > >
> > > fork and exec are well ordered in how they add to the mmlist,
> > > and that ordering (children after parent) suited swapoff nicely,
> > > to minimize duplication of a swapent while it's being unused;
> > > except swap_out randomized the order by cycling init_mm around it.
> >
> > Urmmm, so the code was obfuscated in order to optimise
> > swapoff() ?
>
> To speed swapoff, I changed the code back to how fork (see comment
> on "Add it to the mmlist" in fork.c old and new) and exec seemed to
> intend. I don't see see that I _obfuscated_ the code:
> what's so difficult about swap_mm?

Practical test when I pointed out that something needed to be done
(and I didn't see the code in mmput) both David & Rik didn't even
see the problem much less where it was worked around. And neither
of the saw the code in mmput. If people can look at the code
and see what is going on that is hard to follow, by definition.

The primary problem with swap_mm is that swap_mm is used totally
unexpectedly in a different file. Instead of it's usage being small
local and contained.

> > Exactly how bad was the "mmlist randomising" for swapoff() ?
>
> It was unnecessary and counter-productive, I changed it.
> Exact number? No, but small.

There is some sense in allowing swapoff not to check new processes
but... The only optimization that really makes sense with swapoff is
to turn it inside out and traverse each process only once... With
possibly a little of the current logic to handle the shared swap case.

Eric

2001-11-21 17:43:46

by Hugh Dickins

[permalink] [raw]

Subject: Re: 2.4.14 + Bug in swap_out.

On 21 Nov 2001, Eric W. Biederman wrote:
>
> The primary problem with swap_mm is that swap_mm is used totally
> unexpectedly in a different file. Instead of it's usage being small
> local and contained.

swap_mm is not the kernel's only global variable, and it is commented:

/* Placeholder for swap_out(): may be updated by fork.c:mmput() */
struct mm_struct *swap_mm = &init_mm;

I think the problem is rather that people thought they knew this code
without reading it, then found it different from what they imagined.

> There is some sense in allowing swapoff not to check new processes

No, it used to miss new processes, it needs to catch them, now it does.
But that's not the reason for the mmlist ordering: the ordering is
desirable so that you can eliminate all swapents from parent, then
eliminate all from child - other way round may leave some in child.

> but... The only optimization that really makes sense with swapoff is
> to turn it inside out and traverse each process only once... With
> possibly a little of the current logic to handle the shared swap case.

I wouldn't say "only"; and isn't it true that at least the current
(hideously cpu intensive) way does minimize swap disk head movement?
which may often (especially at shutdown time) be more important than
than minimizing cpu use.

But certainly, cpu-wise, it's much more attractive at least to try
to traverse each mm once only. Unfortunately, as you observe, it
does then still need the current logic to pick up the leftovers.
That current logic could certainly be simplified (once it's a rare
path, throw out complicating speedups); but overall it would be
more not less complicated - a series of two different methods -
so I haven't yet gathered up the energy to go that way.

Hugh