2004-09-13 19:06:52

by Andries E. Brouwer

[permalink] [raw]
Subject: [no patch] broken use of mm_release / deactivate_mm

Recent kernels have a bug in fork(). Things can be improved a bit
by commenting out the lines

/* Get rid of any cached register state */
deactivate_mm(tsk, mm);

in fork.c:mm_release().

What happens at a fork, is that a long sequence of things is done,
and if a failure occurs all previous things are undone. Thus
(in copy_process()):

if ((retval = copy_mm(clone_flags, p)))
goto bad_fork_cleanup_signal;
if ((retval = copy_namespace(clone_flags, p)))
goto bad_fork_cleanup_mm;
retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
if (retval)
goto bad_fork_cleanup_namespace;

...

bad_fork_cleanup_namespace:
exit_namespace(p);
bad_fork_cleanup_mm:
exit_mm(p);
if (p->active_mm)
mmdrop(p->active_mm);
bad_fork_cleanup_signal:
...

Thus, we may do exit_mm() when an attempted fork fails.
The argument of exit_mm() is this new, not completeley initialized
task_struct.

Now exit.c:exit_mm(p) does mm_release(p,p->mm), and this
mm_release() does deactivate_mm(), a macro that clears %fs and %gs.
Ach. A segfault is the result.

More is wrong with mm_release(). It examines p->clear_child_tid,
and possibly does put_user(0, tidptr);.
Oops.

In our case p->clear_child_tid had not yet been initialized for
the child, that happens in copy_thread() that we never reached.
So this is the value the parent had.

Also the call
enter_lazy_tlb(mm, current);
seems strange in this context.

It seems to me that the proper action is some reshuffling
of this code. Maybe it is cleanest to separate the cleanup
code for a failed fork entirely from the code for an exiting
process.

Andries


2004-09-13 19:30:40

by Linus Torvalds

[permalink] [raw]
Subject: Re: [no patch] broken use of mm_release / deactivate_mm



On Mon, 13 Sep 2004, Andries Brouwer wrote:
>
> What happens at a fork, is that a long sequence of things is done,
> and if a failure occurs all previous things are undone. Thus
> (in copy_process()):
>
> if ((retval = copy_mm(clone_flags, p)))
> goto bad_fork_cleanup_signal;
> if ((retval = copy_namespace(clone_flags, p)))
> goto bad_fork_cleanup_mm;
> retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
> if (retval)
> goto bad_fork_cleanup_namespace;
>
> ...
>
> bad_fork_cleanup_namespace:
> exit_namespace(p);
> bad_fork_cleanup_mm:
> exit_mm(p);
> if (p->active_mm)
> mmdrop(p->active_mm);

I agree. Looks like the "exit_mm()" should really be a "mmput()".

Can we have a few more eyes on this thing? Ingo, Nick?

Linus

2004-09-14 12:44:33

by Nick Piggin

[permalink] [raw]
Subject: Re: [no patch] broken use of mm_release / deactivate_mm

Linus Torvalds wrote:
>
> On Mon, 13 Sep 2004, Andries Brouwer wrote:
>
>>What happens at a fork, is that a long sequence of things is done,
>>and if a failure occurs all previous things are undone. Thus
>>(in copy_process()):
>>
>> if ((retval = copy_mm(clone_flags, p)))
>> goto bad_fork_cleanup_signal;
>> if ((retval = copy_namespace(clone_flags, p)))
>> goto bad_fork_cleanup_mm;
>> retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
>> if (retval)
>> goto bad_fork_cleanup_namespace;
>>
>>...
>>
>>bad_fork_cleanup_namespace:
>> exit_namespace(p);
>>bad_fork_cleanup_mm:
>> exit_mm(p);
>> if (p->active_mm)
>> mmdrop(p->active_mm);
>
>
> I agree. Looks like the "exit_mm()" should really be a "mmput()".
>
> Can we have a few more eyes on this thing? Ingo, Nick?
>

AFAIKS yes. exit_mm doesn't look legal unless its dropping the current
mm context. And mmput looks like it should clean up everything - it is
used almost exactly the same way to cleanup a failure case in copy_mm.

2004-09-14 15:14:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [no patch] broken use of mm_release / deactivate_mm



On Tue, 14 Sep 2004, Nick Piggin wrote:
> >
> > I agree. Looks like the "exit_mm()" should really be a "mmput()".
> >
> > Can we have a few more eyes on this thing? Ingo, Nick?
>
> AFAIKS yes. exit_mm doesn't look legal unless its dropping the current
> mm context. And mmput looks like it should clean up everything - it is
> used almost exactly the same way to cleanup a failure case in copy_mm.

Does everybody also agree that the

if (p->active_mm)
mmdrop(p->active_mm);

should also be dropped, and that mmput() does all of that correctly too?

(Again, looking at all the counts etc, I think the answer is a resounding
yes, but dammit, this code has obviously never gotten any testing at all,
since it effectively never happens).

Linus

2004-09-14 17:23:17

by Herbert Poetzl

[permalink] [raw]
Subject: Re: [no patch] broken use of mm_release / deactivate_mm

On Tue, Sep 14, 2004 at 08:06:14AM -0700, Linus Torvalds wrote:
>
>
> On Tue, 14 Sep 2004, Nick Piggin wrote:
> > >
> > > I agree. Looks like the "exit_mm()" should really be a "mmput()".
> > >
> > > Can we have a few more eyes on this thing? Ingo, Nick?
> >
> > AFAIKS yes. exit_mm doesn't look legal unless its dropping the current
> > mm context. And mmput looks like it should clean up everything - it is
> > used almost exactly the same way to cleanup a failure case in copy_mm.
>
> Does everybody also agree that the
>
> if (p->active_mm)
> mmdrop(p->active_mm);
>
> should also be dropped, and that mmput() does all of that correctly too?
>
> (Again, looking at all the counts etc, I think the answer is a resounding
> yes, but dammit, this code has obviously never gotten any testing at all,
> since it effectively never happens).

IIRC, linux-vserver did hit it once or twice
but I wasn't sure at that time that it isn't my
fault ...

this 'error path' was used by the memory limiting
code, so it's probably easy to test with minor
adjustments ...

best,
Herbert

> Linus
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2004-09-14 23:24:19

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: [no patch] broken use of mm_release / deactivate_mm

On Tue, Sep 14, 2004 at 08:06:14AM -0700, Linus Torvalds wrote:

> Does everybody also agree that ... mmput() does all of that correctly too?

I think so, but do not have time to check all details.


Now the first parameter of mm_release() always is current,
so that this parameter is superfluous. Similarly, the only
parameter of exit_mm() always is current, and hence is superfluous.

Maybe it is a good idea to remove the pretense that these routines
might do something useful for general parameters, now that
deactivate_mm() does sneaky global things.

Andries