2008-11-23 20:39:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + do_wait-wakeup-optimization.patch added to -mm tree

> From: Roland McGrath <[email protected]>
>
> +static int needs_wakeup(struct task_struct *task, struct do_wait_queue_entry *w)
> +{
> + if ((w->options & __WNOTHREAD) && task->parent != w->wq.private)
> + return 0;
> +
> + if (eligible_child(w->type, w->pid, w->options,
> + task, task->exit_signal))
> + return 1;
> +
> + if (thread_group_leader(task)) {
> + /*
> + * In a group leader, do_notify_parent() may have
> + * just reset task->exit_signal because SIGCHLD was
> + * ignored, but that doesn't prevent the wakeup.
> + */
> + if (!task_detached(task) ||
> + !eligible_child(w->type, w->pid, w->options,
> + task, SIGCHLD))
> + return 0;
> + } else {
> + /*
> + * In a non-leader, this might be the release_task()
> + * case, where it's the leader rather than task
> + * whose parent is being woken.
> + */
> + if (!eligible_child(w->type, w->pid, w->options,
> + task->group_leader,
> + task_detached(task->group_leader) ?
> + SIGCHLD : task->group_leader->exit_signal))
> + return 0;
> + }
> +
> + return 1;
> +}

Unless I missed something, this is not right.

This "task" is current, iow it is the caller of do_notify_parent(). Sometime
it is OK (release_task, exit_notify), but in general not, afaics.

Let's suppose the ptracer finds the EXIT_ZOMBIE tracee and notifies its
->real_parent which sleeps in do_wait(). In that case the usage of
eligible_child(task == ptracer) above is bogus, and checking for
group_leader is not rifgt too.

> +static int do_wait_wake_function(wait_queue_t *curr, unsigned mode, int sync,
> + void *key)
> +{
> + struct task_struct *task = current;

I think we can fix (and simplify) this code if we change __wake_up_parent(),
it should call __wake_up(key => p), so we can do

struct task_struct *task = key;

> + if (!needs_wakeup(task, w))
> + return 0;
> +
> + return default_wake_function(curr, mode, sync, key);

perhaps autoremove_wake_function() makes more sense.

Oleg.


2008-11-23 20:55:22

by Oleg Nesterov

[permalink] [raw]
Subject: do_wait() vs do_notify_parent_cldstop() theoretical race?

Looking at do_wait(), suddenly I am starting to suspect we have the
highly theoretical race with do_notify_parent_cldstop().

do_wait:

add_wait_queue(...);

current->state = TASK_INTERRUPTIBLE;

read_lock(tasklist_lock);

... try to find the "interesting" task ...

read_unlock(tasklist_lock);

if (!retval) // not found
schedule();

We don't race with do_notify_parent() because it takes tasklist
for writing. But do_notify_parent_cldstop() can run in parallel
under read_lock(tasklist).

Now suppose that "->state = TASK_INTERRUPTIBLE" leaks deeply into
the critical section. In theory, it is possible that wait_consider_task()
checks task_is_stopped_or_traced() or SIGNAL_STOP_CONTINUED first, then
CPU sets state = TASK_INTERRUPTIBLE. And we can miss the event if
do_notify_parent_cldstop() happens in between.

No?

Oleg.

2008-12-04 00:55:04

by Roland McGrath

[permalink] [raw]
Subject: Re: do_wait() vs do_notify_parent_cldstop() theoretical race?

I guess. There might well be something in there implicitly that's actually
a memory barrier already. But sounds like that should use set_current_state().


Thanks,
Roland

2008-12-04 00:55:51

by Roland McGrath

[permalink] [raw]
Subject: Re: + do_wait-wakeup-optimization.patch added to -mm tree

> Let's suppose the ptracer finds the EXIT_ZOMBIE tracee and notifies its
> ->real_parent which sleeps in do_wait(). In that case the usage of
> eligible_child(task == ptracer) above is bogus, and checking for
> group_leader is not rifgt too.

I had overlooked that do_notify_parent() call.

> > +static int do_wait_wake_function(wait_queue_t *curr, unsigned mode, int sync,
> > + void *key)
> > +{
> > + struct task_struct *task = current;
>
> I think we can fix (and simplify) this code if we change __wake_up_parent(),
> it should call __wake_up(key => p), so we can do
>
> struct task_struct *task = key;

I had not looked into the bowels of various __wake_up variants, just
assumed it would stay as it is and use wake_up_interruptible_sync.

That would certainly be cleaner. Then do_wait_wake_function would not need
the second of its special cases, only the one double-check for the
thread_group_leader && task_detached case.

I don't see an exposed __wake_up* variant that both passes a "key" pointer
through and does "sync". For __wake_up_parent, "sync" is quite desireable.

> > + if (!needs_wakeup(task, w))
> > + return 0;
> > +
> > + return default_wake_function(curr, mode, sync, key);
>
> perhaps autoremove_wake_function() makes more sense.

Why? The do_wait loop will have to go through again and still might just
sleep again. The explicit remove at the end of do_wait seems fine to me.


Thanks,
Roland

2008-12-04 01:06:38

by Roland McGrath

[permalink] [raw]
Subject: Re: do_wait() vs do_notify_parent_cldstop() theoretical race?

I guess. There might well be something in there implicitly that's actually
a memory barrier already. But sounds like that should use set_current_state().


Thanks,
Roland

2008-12-04 01:07:48

by Roland McGrath

[permalink] [raw]
Subject: Re: + do_wait-wakeup-optimization.patch added to -mm tree

> Let's suppose the ptracer finds the EXIT_ZOMBIE tracee and notifies its
> ->real_parent which sleeps in do_wait(). In that case the usage of
> eligible_child(task == ptracer) above is bogus, and checking for
> group_leader is not rifgt too.

I had overlooked that do_notify_parent() call.

> > +static int do_wait_wake_function(wait_queue_t *curr, unsigned mode, int sync,
> > + void *key)
> > +{
> > + struct task_struct *task = current;
>
> I think we can fix (and simplify) this code if we change __wake_up_parent(),
> it should call __wake_up(key => p), so we can do
>
> struct task_struct *task = key;

I had not looked into the bowels of various __wake_up variants, just
assumed it would stay as it is and use wake_up_interruptible_sync.

That would certainly be cleaner. Then do_wait_wake_function would not need
the second of its special cases, only the one double-check for the
thread_group_leader && task_detached case.

I don't see an exposed __wake_up* variant that both passes a "key" pointer
through and does "sync". For __wake_up_parent, "sync" is quite desireable.

> > + if (!needs_wakeup(task, w))
> > + return 0;
> > +
> > + return default_wake_function(curr, mode, sync, key);
>
> perhaps autoremove_wake_function() makes more sense.

Why? The do_wait loop will have to go through again and still might just
sleep again. The explicit remove at the end of do_wait seems fine to me.


Thanks,
Roland

2008-12-04 15:27:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + do_wait-wakeup-optimization.patch added to -mm tree

On 11/23, Roland McGrath wrote:
>
> > > +static int do_wait_wake_function(wait_queue_t *curr, unsigned mode, int sync,
> > > + void *key)
> > > +{
> > > + struct task_struct *task = current;
> >
> > I think we can fix (and simplify) this code if we change __wake_up_parent(),
> > it should call __wake_up(key => p), so we can do
> >
> > struct task_struct *task = key;
>
> I don't see an exposed __wake_up* variant that both passes a "key" pointer
> through and does "sync". For __wake_up_parent, "sync" is quite desireable.

Well, yes... and __wake_up_common() is static. Perhaps we can make a new
helper. I must admit, I don't understand what "sync" actually means nowadays.

> > > + if (!needs_wakeup(task, w))
> > > + return 0;
> > > +
> > > + return default_wake_function(curr, mode, sync, key);
> >
> > perhaps autoremove_wake_function() makes more sense.
>
> Why? The do_wait loop will have to go through again and still might just
> sleep again. The explicit remove at the end of do_wait seems fine to me.

Yes, yes, I was wrong. I forgot about "repeat:" in do_wait().

Oleg.

2008-12-04 21:00:45

by Roland McGrath

[permalink] [raw]
Subject: Re: + do_wait-wakeup-optimization.patch added to -mm tree

> > I don't see an exposed __wake_up* variant that both passes a "key" pointer
> > through and does "sync". For __wake_up_parent, "sync" is quite desireable.
>
> Well, yes... and __wake_up_common() is static. Perhaps we can make a new
> helper.

Right, that's what I was suggesting (and not volunteering to do ;-).

> I must admit, I don't understand what "sync" actually means nowadays.

I don't claim to know any actual scheduler innards. But the meaning as I
understand it is to "make it runnable, but don't try to reschedule right
now because current will block quite soon anyway. If this does indeed
reduce work done to immediately reschedule, then it seems quite desireable
to avoid that flutter since the dying/stopping thread is very few cycles
away from yielding, and in the death case it will be for the last time and
rescheduling earlier just means a later unnecessary switch back and delayed
put_task_struct processing after the reap.


Thanks,
Roland