2007-11-16 17:24:57

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH 2/3] wait_task_stopped: tidy up the noreap case

wait_task_stopped(WNOWAIT) unlocks tasklist_lock and re-checks ->exit_code and
->exit_state. This is not needed: both were valid before we dropped the lock,
and without tasklist_lock both are not stable anyway.

Read the exit_code under tasklist and report the cached value without re-check.
In fact this fixes the race with the dying child, we can report a completely
false exit_code if ->exit_state is not visible yet.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 24/kernel/exit.c~2_NOREAP 2007-11-16 18:13:54.000000000 +0300
+++ 24/kernel/exit.c 2007-11-16 18:18:24.000000000 +0300
@@ -1356,10 +1356,10 @@ static int wait_task_stopped(struct task
int noreap, struct siginfo __user *infop,
int __user *stat_addr, struct rusage __user *ru)
{
- int retval, exit_code;
+ int retval, exit_code = p->exit_code;
pid_t pid;

- if (!p->exit_code)
+ if (!exit_code)
return 0;
if (delayed_group_leader && !(p->ptrace & PT_PTRACED) &&
p->signal->group_stop_count > 0)
@@ -1384,9 +1384,6 @@ static int wait_task_stopped(struct task
uid_t uid = p->uid;
int why = (p->ptrace & PT_PTRACED) ? CLD_TRAPPED : CLD_STOPPED;

- exit_code = p->exit_code;
- if (unlikely(!exit_code) || unlikely(p->exit_state))
- goto bail_ref;
return wait_noreap_copyout(p, pid, uid,
why, (exit_code << 8) | 0x7f,
infop, ru);
@@ -1417,7 +1414,6 @@ static int wait_task_stopped(struct task
* resumed, or it resumed and then died.
*/
write_unlock_irq(&tasklist_lock);
-bail_ref:
put_task_struct(p);
/*
* We are returning to the wait loop without having successfully


2007-11-16 20:24:19

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH 2/3] wait_task_stopped: tidy up the noreap case

This is good, but not quite enough. The original intent behind having the
test was never to return mismatched stale/fresh data. (Not that it ever
really worked as intended.) That is, it's fine if the task has woken up
and done other things while WNOWAIT reports it as stopped--that's stale
data, but it just means the waitid call happened "before" the resumption.
However, it should not report anything that could not possibly have been
true before the resumption. i.e. a changed exit_code, which now means an
normal termination status or a death signal, not the stop signal. This
also applies to the uid, in case the thread called setuid upon resuming
(and even to ptracedness, not that that one really matters). (It doesn't
matter for rusage, since that's not really an exact change of state with
reliable ordering anyway.)

So the setting of uid and why should also move before read_unlock.
While you're at it, you could fix the status argument to wait_noreap_copyout.
It should be just exit_code, not the WIFSTOPPED bit format it does now.


Thanks,
Roland

2007-11-17 16:38:58

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/3] wait_task_stopped: tidy up the noreap case

On 11/16, Roland McGrath wrote:
>
> This is good, but not quite enough. The original intent behind having the
> test was never to return mismatched stale/fresh data. (Not that it ever
> really worked as intended.) That is, it's fine if the task has woken up
> and done other things while WNOWAIT reports it as stopped--that's stale
> data, but it just means the waitid call happened "before" the resumption.
> However, it should not report anything that could not possibly have been
> true before the resumption. i.e. a changed exit_code, which now means an
> normal termination status or a death signal, not the stop signal. This
> also applies to the uid, in case the thread called setuid upon resuming
> (and even to ptracedness, not that that one really matters). (It doesn't
> matter for rusage, since that's not really an exact change of state with
> reliable ordering anyway.)
>
> So the setting of uid and why should also move before read_unlock.

Yes I agree, and I also realized this. In fact, I already tried to do this
a long ago: http://marc.info/?l=linux-kernel&m=112809846204068, please note
that !noreap branch should be changed as well.

This time I'am trying to cleanup (remove) the games with ->exit_state first.
I am mostly concerned about 3/3 patch, what do you think about it?

And. Please note that 3/3 removes the "It must also be done with the write
lock held to prevent a race with the EXIT_ZOMBIE case" comment. Afaics, we
don't need write_lock(tasklist) any longer, we can simplify things further
and remove the EGAIN case completely.

However, wait_task_stopped does:

/* move to end of parent's list to avoid starvation */
remove_parent(p);
add_parent(p);

That is why we need write_lock(). Is this really so important? Yes, the next
do_wait() can find another "interesting" task a bit faster, but only a little
bit. wait_task_continued() could be optimized in a same manner...

Also. I think the locking is not complete. {read,write}_lock(tasklist) can't
really pin the task in TRACED/STOPPED state. We need ->siglock to ensure that
the child can't escape from get_signal_to_deliver() at least, so it can't do
exit/setuid/etc. I was going to try to do this later, because this needs nasty
changes...

Oh well. OK, we can ignore patches 2-3 for now. I'd like to know your opinion
before going further, perhaps I missed something else.

> While you're at it, you could fix the status argument to wait_noreap_copyout.
> It should be just exit_code, not the WIFSTOPPED bit format it does now.

OK, unless Scott is going to do this.

Oleg.

2007-11-18 09:46:13

by Scott James Remnant

[permalink] [raw]
Subject: Re: [PATCH 2/3] wait_task_stopped: tidy up the noreap case

On Sat, 2007-11-17 at 19:38 +0300, Oleg Nesterov wrote:

> > While you're at it, you could fix the status argument to
> wait_noreap_copyout.
> > It should be just exit_code, not the WIFSTOPPED bit format it does now.
>
> OK, unless Scott is going to do this.
>
Have sent this patch separately to the list -- since it's actually my
first ever, I hope I got the format/sign-off/etc. right.

Scott
--
Scott James Remnant
[email protected]


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part