2005-05-07 06:31:31

by Yuly Finkelberg

Subject: Scheduler: Spinning until tasks are STOPPED

Hi,

I sent a message regarding this issue earlier, but after re-reading
it, I realized that it wasn't very clear. Hopefully, this will
clarify things a little bit:

I have a strange scheduling issue: a bunch of worker tasks are all waiting
on a wait queue. Each task is woken up by the preceding one, does some work,
wakes up the next one, and then sends a SIGSTOP to itself. The last task, however,
does not stop itself, but instead yield()s until all tasks have reached the
TASK_STOPPED state.

The code looks like this (irrelevant parts cut out):
...
ret = wait_event_interruptible(waitq, next_in_line == myself);
...
(some work)
...
next_in_line = next;
ret = wakeup_next_one();                /* wake the next worker in line */
if (!last_one)
        send_sig(SIGSTOP, current, 1);  /* stop ourselves */
else
        spin_until_all_stopped();       /* last task: spin on yield() */

When run with 50 tasks, this normally works well. Sometimes, however, one of the
tasks (never the last one) gets stuck between calling wakeup_next_one() and
sending the signal. It accumulates system time, and its stack looks
like this (no pending signals, ti_flags is clear):

c55e7ad0 00000086 c55e6000 c55e7a94 00000046 c55e6000 c55e7ad0 c0109c2d
00000000 c0497800 00000001 d38da344 0013bc9c c5632840 00071931 d3d93161
0013bc9c c55d546c c05d3960 0000270f c05d3960 c55e6000 c0106f25 c05d3960

Call Trace:
[<c0106f25>] need_resched+0x27/0x32

(yes, this is not a mistake: this is ALL the stack reported by show_stack())

Normally the spinning task will magically get released after "a while", where
a few seconds < "a while" < 10 minutes, and sometimes even longer.
So the mystery is:
1. Why does the task spin for so long?
2. Where does it spin? (the kernel stack doesn't hint at anything...)
3. How can I find out #2?
4. How can I fix it?
5. Is there a better way to make sure a specific task is STOPPED?

Currently running 2.6.8.1 and 2.6.9 (UP, PREEMPT). I'd appreciate any
help here...


2005-05-07 07:12:37

by Nick Piggin

Subject: Re: Scheduler: Spinning until tasks are STOPPED

Yuly Finkelberg wrote:
>
> Hi,
>
> I sent a message regarding this issue earlier, but after re-reading
> it, I realized that it wasn't very clear. Hopefully, this will
> clarify things a little bit:
>
> I have a strange scheduling issue: a bunch of worker tasks are all waiting
> on a wait queue. Each task is woken up by the preceding one, does some work,
> wakes up the next one, and then sends a SIGSTOP to itself. The last task, however,
> does not stop itself, but instead yield()s until all tasks have reached the
> TASK_STOPPED state.
>
> The code looks like this (irrelevant parts cut out):
> ...
> ret = wait_event_interruptible(waitq, next_in_line == myself);
> ...
> (some work)
> ...
> next_in_line = next;
> ret = wakeup_next_one();                /* wake the next worker in line */
> if (!last_one)
>         send_sig(SIGSTOP, current, 1);  /* stop ourselves */
> else
>         spin_until_all_stopped();       /* last task: spin on yield() */
>
> When run with 50 tasks, this normally works well. Sometimes, however, one of the
> tasks (never the last one) gets stuck between calling wakeup_next_one() and
> sending the signal. It accumulates system time, and its stack looks
> like this (no pending signals, ti_flags is clear):
>
> c55e7ad0 00000086 c55e6000 c55e7a94 00000046 c55e6000 c55e7ad0 c0109c2d
> 00000000 c0497800 00000001 d38da344 0013bc9c c5632840 00071931 d3d93161
> 0013bc9c c55d546c c05d3960 0000270f c05d3960 c55e6000 c0106f25 c05d3960
>
> Call Trace:
> [<c0106f25>] need_resched+0x27/0x32
>
> (yes, this is not a mistake: this is ALL the stack reported by show_stack())
>
> Normally the spinning task will magically get released after "a while", where
> a few seconds < "a while" < 10 minutes, and sometimes even longer.
> So the mystery is:
> 1. Why does the task spin for so long?
> 2. Where does it spin? (the kernel stack doesn't hint at anything...)
> 3. How can I find out #2?
> 4. How can I fix it?
> 5. Is there a better way to make sure a specific task is STOPPED?
>
> Currently running 2.6.8.1 and 2.6.9 (UP, PREEMPT). I'd appreciate any
> help here...

You're doing this in the *kernel*? It sounds like it should be done
in userspace or done a different way (ie. not with 50 tasks).

And using signals and spinning on yield for synchronisation and
process control in the kernel like this is fairly crazy.

Can't you use a semaphore or something?

2005-05-07 17:36:25

by Yuly Finkelberg

Subject: Re: Scheduler: Spinning until tasks are STOPPED

Nick,

> You're doing this in the *kernel*? It sounds like it should be done
> in userspace or done a different way (ie. not with 50 tasks).

These are tasks that are running in the kernel on behalf of a new system call.

> And using signals and spinning on yield for synchronisation and
> process control in the kernel like this is fairly crazy.

The problem appears to lie not with the process that is
spinning/yielding, but rather with the one process that gets stuck. It
is charged almost all of the system time. I agree that it's not pretty,
though...

> Can't you use a semaphore or something?

There is no one to call up() when a process is actually stopped.

If you have any ideas as to what can be happening or a better way to
accomplish this (in the kernel), I'd appreciate hearing it.

2005-05-07 22:39:06

by Bodo Eggert

Subject: Re: Scheduler: Spinning until tasks are STOPPED

Yuly Finkelberg <[email protected]> wrote:

> If you have any ideas as to what can be happening or a better way to
> accomplish this (in the kernel), I'd appreciate hearing it.

Accomplish what? You haven't told us much.
--
Top 100 things you don't want the sysadmin to say:
88. Management says...

2005-05-08 04:00:23

by Nick Piggin

Subject: Re: Scheduler: Spinning until tasks are STOPPED

Yuly Finkelberg wrote:
> Nick,
>
>
>>You're doing this in the *kernel*? It sounds like it should be done
>>in userspace or done a different way (ie. not with 50 tasks).
>
>
> These are tasks that are running in the kernel on behalf of a new system call.
>

Well, it still sounds like the kernel is doing too much. For example,
why don't you have a syscall (or char device) just to send out
the events, and do everything else (all the queueing and
synchronisation and signalling) in userspace?
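
For instance (a sketch only; the event-fetch char device and its
one-byte event format are hypothetical, not an existing interface):

#include <signal.h>
#include <unistd.h>

/* Userspace worker: the kernel only hands out events; the queueing,
 * signalling and stopping all happen out here. */
static void worker(int event_fd)
{
        char ev;

        if (read(event_fd, &ev, sizeof(ev)) == sizeof(ev)) {
                /* ... do the work ... */
                raise(SIGSTOP);         /* stop ourselves from usermode */
        }
}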

>
>>And using signals and spinning on yield for synchronisation and
>>process control in the kernel like this is fairly crazy.
>
>
> The problem appears to lie not with the process that is
> spinning/yielding, but rather with the one process that gets stuck. It
> is charged almost all of the system time. I agree that it's not pretty,
> though...
>
>
>>Can't you use a semaphore or something?
>
>
> There is no one to call up() when a process is actually stopped.
>
> If you have any ideas as to what can be happening or a better way to
> accomplish this (in the kernel), I'd appreciate hearing it.
>

OK, for a simple example, instead of spinning on yield(), do a
down() on a locked mutex.

Then maybe have an `atomic_t nr_running` which is incremented for
each worker task running. When they are ready to stop, they can
do an atomic_dec_and_test of nr_running, and the last one can up()
the mutex. If you absolutely need to know when the process is
actually stopped, why?
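
A minimal sketch of that scheme against the 2.6-era semaphore and
atomic APIs (the names nr_running and all_done are illustrative,
not from your code):

#include <asm/atomic.h>
#include <asm/semaphore.h>

static atomic_t nr_running = ATOMIC_INIT(0);    /* bumped as each worker starts */
static DECLARE_MUTEX_LOCKED(all_done);          /* semaphore, initially held */

/* Each worker calls this when it is ready to stop. */
static void worker_done(void)
{
        if (atomic_dec_and_test(&nr_running))
                up(&all_done);          /* the last worker releases the waiter */
}

/* The last task calls this instead of spinning on yield(). */
static void wait_for_workers(void)
{
        down(&all_done);                /* sleeps until the final up() */
}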

Also, sending oneself a SIGSTOP to stop is not so good.
Generally you shouldn't use signals at all in the kernel if
possible. So why not have those tasks just return to usermode,
where you can raise a signal or whatever else you need?

--
SUSE Labs, Novell Inc.

2005-05-09 00:10:24

by Yuly Finkelberg

Subject: Re: Scheduler: Spinning until tasks are STOPPED

> Well, it still sounds like the kernel is doing too much. For example,
> why don't you have a syscall (or char device) just to send out
> the events, and do everything else (all the queueing and
> synchronisation and signalling) in userspace?

True, it does :)
However, the operation requires that a consistent in-kernel state hold
for the tasks. They all (except for the last one) do some work,
send a SIGSTOP to themselves, and return to usermode (handling the
STOP signal). The last task does not stop itself, but instead
verifies that all of the others are stopped before it returns.

> OK, for a simple example, instead of spinning on yield(), do a
> down() on a locked mutex.
> Then maybe have an `atomic_t nr_running` which is incremented for
> each worker task running. When they are ready to stop, they can
> do an atomic_dec_and_test of nr_running, and the last one can up()
> the mutex. If you absolutely need to know when the process is
> actually stopped, why?

I need to ensure that the internal state of the processes will not
change (unless of course some other signal gets delivered, which will
not be the case).

It doesn't look like the problem is with the task that is spinning
until the others have stopped. Instead, it looks like one of the
other tasks is spinning somewhere between the time that it wakes up
its successor and the time that it sends itself the STOP signal. It
is clearly getting preempted but then makes no progress (sometimes for
a VERY long time).

2005-05-09 06:06:04

by Nick Piggin

Subject: Re: Scheduler: Spinning until tasks are STOPPED

Yuly Finkelberg wrote:
>>Well, it still sounds like the kernel is doing too much. For example,
>>why don't you have a syscall (or char device) just to send out
>>the events, and do everything else (all the queueing and
>>synchronisation and signalling) in userspace?
>
>
> True, it does :)
> However, the operation requires that a consistent in-kernel state hold
> for the tasks. They all (except for the last one) do some work,
> send a SIGSTOP to themselves, and return to usermode (handling the
> STOP signal). The last task does not stop itself, but instead
> verifies that all of the others are stopped before it returns.
>

Well, you can do that all from userspace, basically.
At least you should be able to do it as well as you can from the
kernel (provided you have a syscall/device to retrieve events).

>
>>OK, for a simple example, instead of spinning on yield(), do a
>>down() on a locked mutex.
>>Then maybe have an `atomic_t nr_running` which is incremented for
>>each worker task running. When they are ready to stop, they can
>>do an atomic_dec_and_test of nr_running, and the last one can up()
>>the mutex. If you absolutely need to know when the process is
>>actually stopped, why?
>
>
> I need to ensure that the internal state of the processes will not
> change (unless of course some other signal gets delivered, which will
> not be the case).
>

They won't change *much*. Nothing that is detectable from userspace
(except perhaps for /proc entries).

> It doesn't look like the problem is with the task that is spinning
> until the others have stopped. Instead, it looks like one of the
> other tasks is spinning somewhere between the time that it wakes up
> its successor and the time that it sends itself the STOP signal. It
> is clearly getting preempted but then makes no progress (sometimes for
> a VERY long time).
>

Well it is a bit difficult to help further because you haven't posted
the working code or said what you are trying to do.

Stick a few printks around the place or try a kernel debugger to see
if you can't work out what is going wrong. Compiling the kernel with
debug info can allow you to find out what line of code EIP is pointing
to, which can also be helpful.
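
For example, with CONFIG_DEBUG_INFO enabled, something like this
(using the EIP from the trace you posted) should map the address
back to a function and file:line:

        addr2line -e vmlinux -f c0106f25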

--
SUSE Labs, Novell Inc.