2005-09-29 15:46:38

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

do_signal_stop:

for_each_thread(t) {
if (t->state < TASK_STOPPED)
++sig->group_stop_count;
}

However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.

See also wait_task_stopped(), which checks ->state > TASK_STOPPED.

Signed-off-by: Oleg Nesterov <[email protected]>

--- 2.6.14-rc2/include/linux/sched.h~6_NONINT 2005-09-24 18:33:51.000000000 +0400
+++ 2.6.14-rc2/include/linux/sched.h 2005-09-29 23:42:25.000000000 +0400
@@ -110,11 +110,11 @@ extern unsigned long nr_iowait(void);
#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
-#define TASK_STOPPED 4
-#define TASK_TRACED 8
-#define EXIT_ZOMBIE 16
-#define EXIT_DEAD 32
-#define TASK_NONINTERACTIVE 64
+#define TASK_NONINTERACTIVE 4
+#define TASK_STOPPED 8
+#define TASK_TRACED 16
+#define EXIT_ZOMBIE 32
+#define EXIT_DEAD 64

#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)


2005-09-29 16:03:08

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction



On Thu, 29 Sep 2005, Oleg Nesterov wrote:
>
> do_signal_stop:
>
> for_each_thread(t) {
> if (t->state < TASK_STOPPED)
> ++sig->group_stop_count;
> }
>
> However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
> count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.

Ok, I think your patch is correct (we really do try to keep an order to
those task flags), but the _real_ issue is that the comparisons are bogus.

Using ">" for task states is wrong. It's a bitmask, and if you want to
check multiple states, then we should just do so with

if (t->state & (TASK_xxx | TASK_yyy | ...))

Oh, well. The inequality comparisons are shorter, and historical, so I
guess it's debatable whether we should remove them all.

Linus

2005-09-29 16:26:33

by Paulo Marques

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

Linus Torvalds wrote:
>
> On Thu, 29 Sep 2005, Oleg Nesterov wrote:
>
>>[...]
>>However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
>>count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.
>
> [...]
> Using ">" for task states is wrong. It's a bitmask, and if you want to
> check multiple states, then we should just do so with
>
> if (t->state & (TASK_xxx | TASK_yyy | ...))
>
> Oh, well. The inequality comparisons are shorter, and historical, so I
> guess it's debatable whether we should remove them all.

I did a quick grep through 2.6.14-rc2 to see how many "them all" were,
and the only two places I could find, where a inequality operator was
being used on a task state, were this one in kernel/signal.c and another
in kernel/exit.c:

./kernel/exit.c:1194: unlikely(p->state > TASK_STOPPED)

So maybe it is not so bad to just change these to a bit mask and
disallow inequality comparisons in the future, if you guys feel that is
the way to go...

--
Paulo Marques - http://www.grupopie.com

The rule is perfect: in all matters of opinion our
adversaries are insane.
Mark Twain

2005-09-29 21:54:50

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

I am dubious about this change. I don't see a corresponding change to
fs/proc/array.c where it knows what all the bit values are. I am not at
all sure there aren't other places that know these values and need a fixup
if you change them.

Any tests using < TASK_STOPPED or the like are left over from the time when
the TASK_ZOMBIE and TASK_DEAD bits were in the same word, and it served to
check for "stopped or dead". I think this one in do_signal_stop is the
only such case. It has been buggy ever since exit_state was separated.
Changing the bit values doesn't fix the bug that it isn't checking the
exit_state value. This patch is probably the right fix for that, but I
have not tested it.

Signed-off-by: Roland McGrath <[email protected]>

--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1775,7 +1775,8 @@ do_signal_stop(int signr)
* stop is always done with the siglock held,
* so this check has no races.
*/
- if (t->state < TASK_STOPPED) {
+ if (!t->exit_state &&
+ !(t->state & (TASK_STOPPED|TASK_TRACED))) {
stop_count++;
signal_wake_up(t, 0);
}

2005-09-29 22:15:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction



On Thu, 29 Sep 2005, Roland McGrath wrote:
>
> I am dubious about this change. I don't see a corresponding change to
> fs/proc/array.c where it knows what all the bit values are.

You're right. Not only that, but "TASK_NONINTERACTIVE" is special in that
it's an _additional_ flag to the task state, not an independent flag at
all.

Ie it's _really_ only valid as a bitmask.

So I think we're better off reverting that ordering change, and testing
the bitmap properly.

> Any tests using < TASK_STOPPED or the like are left over from the time when
> the TASK_ZOMBIE and TASK_DEAD bits were in the same word, and it served to
> check for "stopped or dead".

Correct again.

Btw, that brings up another thing: those EXIT_ZOMBIE/EXIT_DEAD flags are
really really confusing.

It's two different words, but the way we use them in get_task_state(),
they are or'ed together, which is why they need to have non-overlapping
bit definitions. But there's no comment about that anywhere.

I'll add a comment to <linux/sched.h> about it.

Thanks,

Linus

2005-09-30 16:39:00

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

Roland, could you please explain this code in wait_task_stopped()

if (!exit_code || p->state > TASK_STOPPED)
goto bail_ref;

It looks like "WSTOPPED | WNOWAIT is illegal for TASK_TRACED child"
to me. Is this correct? I think no.

Actually, I don't understand why we are checking p->state at all, we
already dropped tasklist_lock, the state can change at any monent.

Also, wait_task_stopped() calculates ->si_code outside tasklist, is
it correct?

In other words, could you explain why this (untested) patch incorrect ?

Thanks in advance,
Oleg.

--- 2.6.14-rc2/kernel/exit.c~ 2005-09-24 18:33:35.000000000 +0400
+++ 2.6.14-rc2/kernel/exit.c 2005-10-01 00:45:06.000000000 +0400
@@ -1174,10 +1174,13 @@ static int wait_task_stopped(task_t *p,
struct siginfo __user *infop,
int __user *stat_addr, struct rusage __user *ru)
{
- int retval, exit_code;
+ int retval, why, exit_code = p->exit_code;

- if (!p->exit_code)
+ if (!exit_code)
return 0;
+
+ why = (p->ptrace & PT_PTRACED) ? CLD_TRAPPED : CLD_STOPPED;
+
if (delayed_group_leader && !(p->ptrace & PT_PTRACED) &&
p->signal && p->signal->group_stop_count > 0)
/*
@@ -1196,19 +1199,10 @@ static int wait_task_stopped(task_t *p,
get_task_struct(p);
read_unlock(&tasklist_lock);

- if (unlikely(noreap)) {
- pid_t pid = p->pid;
- uid_t uid = p->uid;
- int why = (p->ptrace & PT_PTRACED) ? CLD_TRAPPED : CLD_STOPPED;
-
- exit_code = p->exit_code;
- if (unlikely(!exit_code) ||
- unlikely(p->state > TASK_STOPPED))
- goto bail_ref;
- return wait_noreap_copyout(p, pid, uid,
+ if (unlikely(noreap))
+ return wait_noreap_copyout(p, p->pid, p->uid,
why, (exit_code << 8) | 0x7f,
infop, ru);
- }

write_lock_irq(&tasklist_lock);

@@ -1235,7 +1229,6 @@ static int wait_task_stopped(task_t *p,
* resumed, or it resumed and then died.
*/
write_unlock_irq(&tasklist_lock);
-bail_ref:
put_task_struct(p);
/*
* We are returning to the wait loop without having successfully
@@ -1252,6 +1245,7 @@ bail_ref:
remove_parent(p);
add_parent(p, p->parent);

+ why = (p->ptrace & PT_PTRACED) ? CLD_TRAPPED : CLD_STOPPED;
write_unlock_irq(&tasklist_lock);

retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
@@ -1262,9 +1256,7 @@ bail_ref:
if (!retval && infop)
retval = put_user(0, &infop->si_errno);
if (!retval && infop)
- retval = put_user((short)((p->ptrace & PT_PTRACED)
- ? CLD_TRAPPED : CLD_STOPPED),
- &infop->si_code);
+ retval = put_user((short)why, &infop->si_code);
if (!retval && infop)
retval = put_user(exit_code, &infop->si_status);
if (!retval && infop)

2005-09-30 17:18:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction



On Fri, 30 Sep 2005, Oleg Nesterov wrote:
>
> Roland, could you please explain this code in wait_task_stopped()
>
> if (!exit_code || p->state > TASK_STOPPED)
> goto bail_ref;

Regardless of any other explanations, it turns out that "p->state" can be
something like "TASK_RUNNING | TASK_NONINTERACTIVE", and then this would
trigger totally incorrectly.

> It looks like "WSTOPPED | WNOWAIT is illegal for TASK_TRACED child"
> to me. Is this correct? I think no.

No, I think it's correct. If you have a traced child, you can't just wait
for it. You need to use ptrace to release it first.

> Actually, I don't understand why we are checking p->state at all, we
> already dropped tasklist_lock, the state can change at any monent.

If it's TASK_TRACED, and it's our child, then it shouldn't be changing.

Besides, even if it does, we had a perfectly fine race, and we'll have
been woken up again and we'll just go through the do_wait() loop once
more.

So I think the code is mostly correct. But that ">" is definitely
incorrect.

Maybe it should just be

if (!exit_code || (p->state & TASK_TRACED))

instead?

Roland?

Linus

2005-10-01 11:19:46

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

Linus Torvalds wrote:
>
> On Fri, 30 Sep 2005, Oleg Nesterov wrote:
> >
> > Roland, could you please explain this code in wait_task_stopped()
> >
> > if (!exit_code || p->state > TASK_STOPPED)
> > goto bail_ref;
>
> Regardless of any other explanations, it turns out that "p->state" can be
> something like "TASK_RUNNING | TASK_NONINTERACTIVE", and then this would
> trigger totally incorrectly.
>
> > It looks like "WSTOPPED | WNOWAIT is illegal for TASK_TRACED child"
> > to me. Is this correct? I think no.
>
> No, I think it's correct. If you have a traced child, you can't just wait
> for it. You need to use ptrace to release it first.

But it is ok to do do_wait(WSTOPPED /* without WNOWAIT */) for TASK_TRACED
child. wait_task_stopped() does not actually wait, it just eats ->exit_code.

> > Actually, I don't understand why we are checking p->state at all, we
> > already dropped tasklist_lock, the state can change at any monent.
>
> If it's TASK_TRACED, and it's our child, then it shouldn't be changing.

It can be child of other thread in our thread group (do_wait() iterates
over all threads ->children lists) and that thread can do PTRACE_DETACH.
But this does not matter.

> Besides, even if it does, we had a perfectly fine race, and we'll have
> been woken up again and we'll just go through the do_wait() loop once
> more.

Yes,

> So I think the code is mostly correct. But that ">" is definitely
> incorrect.
>
> Maybe it should just be
>
> if (!exit_code || (p->state & TASK_TRACED))
>
> instead?

I still think that checking p->state here just pointless and confusing.
The task was TASK_STOPPED or TASK_TRACED on entering, we have WNOWAIT
flag, we should just call wait_noreap_copyout(). And 'p' can change it's
->state just after the check.

And I think that wait_task_stopped() should get ->exit_code and ->si_code
atomically under tasklist_lock, p->ptrace can be changed after we dropped
that lock.

Btw,
do_wait():
case TASK_STOPPED:
if (!(options & WUNTRACED) && !my_ptrace_child(p))
continue;

Looks like it should be '||' here?

Oleg.

2005-10-02 01:12:22

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

> > It looks like "WSTOPPED | WNOWAIT is illegal for TASK_TRACED child"
> > to me. Is this correct? I think no.
>
> No, I think it's correct. If you have a traced child, you can't just wait
> for it. You need to use ptrace to release it first.

That's correct. If your child is in TASK_TRACED because someone else is
ptrace'ing it, then you don't notice its state.

> > Actually, I don't understand why we are checking p->state at all, we
> > already dropped tasklist_lock, the state can change at any monent.
>
> If it's TASK_TRACED, and it's our child, then it shouldn't be changing.

Actually, it can in two cases. First is SIGKILL, which can always wake it
up suddenly. The second is when we are not the actual ptrace parent, but
another thread in the same thread group. (do_wait loops on all the threads
in the group and checks each one's children list. my_ptrace_child does not
distinguish the ptracer from other threads in its group.)

> Besides, even if it does, we had a perfectly fine race, and we'll have
> been woken up again and we'll just go through the do_wait() loop once
> more.
>
> So I think the code is mostly correct. But that ">" is definitely
> incorrect.

Indeed. I failed to notice this > test when the other just came up recently,
because I had a change to it kicking around in my tree for a long time.
I can't honestly recall and would have to reconstruct why I wrote it this way.
The simple change Linus suggested seems fine too, probably better.


Thanks,
Roland


--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -1199,14 +1188,15 @@ static int wait_task_stopped(task_t *p,
if (unlikely(noreap)) {
pid_t pid = p->pid;
uid_t uid = p->uid;
- int why = (p->ptrace & PT_PTRACED) ? CLD_TRAPPED : CLD_STOPPED;
+ int state = p->state;

exit_code = p->exit_code;
- if (unlikely(!exit_code) ||
- unlikely(p->state > TASK_STOPPED))
+ if (unlikely(!exit_code) || unlikely(p->state != state))
goto bail_ref;
+
+ state = (state == TASK_TRACED) ? CLD_TRAPPED : CLD_STOPPED;
return wait_noreap_copyout(p, pid, uid,
- why, (exit_code << 8) | 0x7f,
+ state, (exit_code << 8) | 0x7f,
infop, ru);
}

2005-10-03 02:54:30

by Coywolf Qi Hunt

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

On 9/30/05, Linus Torvalds <[email protected]> wrote:
>
>
> On Thu, 29 Sep 2005, Oleg Nesterov wrote:
> >
> > do_signal_stop:
> >
> > for_each_thread(t) {
> > if (t->state < TASK_STOPPED)
> > ++sig->group_stop_count;
> > }
> >
> > However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
> > count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.
>
> Ok, I think your patch is correct (we really do try to keep an order to
> those task flags), but the _real_ issue is that the comparisons are bogus.
>
> Using ">" for task states is wrong. It's a bitmask, and if you want to
> check multiple states, then we should just do so with
>
> if (t->state & (TASK_xxx | TASK_yyy | ...))
>
> Oh, well. The inequality comparisons are shorter, and historical, so I

The inequality comparisons are faster or faster on some CPU.
We should use it if we can keep the order IMHO.

> guess it's debatable whether we should remove them all.
>
> Linus

--
Coywolf Qi Hunt
http://sosdg.org/~coywolf/

2005-10-03 03:24:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction



On Mon, 3 Oct 2005, Coywolf Qi Hunt wrote:
>
> The inequality comparisons are faster or faster on some CPU.
> We should use it if we can keep the order IMHO.

Which CPU are you talking about? Inequality is almost never faster than a
logical op.

Any CPU I can think of has a "and + compare against zero". x86 calls it
"test", although a regular "and" will do it too. ppc has "andi.". alpha
comparisons are against registers being zero or not, so again it's an
"and".

And there are very few ALU's that don't do logicals (they're much simpler
than an adder), although I seem to remember that the P4 was odd in that
respect (with the second ALU being limited to some strange subcases).

So I don't think that ends up being a concern in real life.

Linus

2005-10-03 23:49:09

by George Spelvin

[permalink] [raw]
Subject: Re: [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction

> Any CPU I can think of has a "and + compare against zero". x86 calls it
> "test", although a regular "and" will do it too. ppc has "andi.". alpha
> comparisons are against registers being zero or not, so again it's an
> "and".

About the only one for which "cmp" is faster than "test" is the 680x0.
"cmp" is a subtract that doesn't write the result (except to the condition
codes). "tst" (called "bit" on the VAX and 6502) is a similar AND.

The 68k doesn't have that. You have to obliterate one of the operands,
and it can't be the immediate operand, so it's often two instructions.

The 68k does have an instruction called "test", but it just sets the
condition codes based on a single operand (since it doesn't set the
condition codes on a simple move).

It *does* have a single-bit test instruction, which evaluates
"if (x & 1<<y)" in one cycle, but that doesn't help for multi-bit masks
like "if (x % 4096 == 0)"