I have been playing around with the scheduler in the test9
kernel and noticed that it sometimes chooses tasks to run
that are not on the run-queue. This may seem strange, but
here is how it happens.
Task A, on processor 0, calls __lock_sock(), which does the
following:
void __lock_sock(struct sock *sk)
{
	DECLARE_WAITQUEUE(wait, current);

	add_wait_queue_exclusive(&sk->lock.wq, &wait);
	for(;;) {
		current->state = TASK_EXCLUSIVE | TASK_UNINTERRUPTIBLE;
		spin_unlock_bh(&sk->lock.slock);
		schedule();
		spin_lock_bh(&sk->lock.slock);
		if(!sk->lock.users)
			break;
	}
	current->state = TASK_RUNNING;
	remove_wait_queue(&sk->lock.wq, &wait);
}
Now when __lock_sock calls schedule, the task's state is set
as above and the following scheduler code removes the task from
the run-queue.
	switch (prev->state & ~TASK_EXCLUSIVE) {
		case TASK_INTERRUPTIBLE:
			if (signal_pending(prev)) {
				prev->state = TASK_RUNNING;
				break;
			}
			/* no signal pending: fall through and dequeue */
		default:
			del_from_runqueue(prev);
			/* fall through */
		case TASK_RUNNING:;
	}
After the task is removed from the run-queue, an interrupt is
serviced on another CPU which ultimately calls __wake_up_common().
__wake_up_common() chooses task A to wake up, and best_exclusive
is set to A. The following code in __wake_up_common() is then
executed:
	if (best_exclusive)
		best_exclusive->state = TASK_RUNNING;
	wq_write_unlock_irqrestore(&q->lock, flags);

	if (best_exclusive) {
		if (sync)
			wake_up_process_synchronous(best_exclusive);
		else
			wake_up_process(best_exclusive);
	}
Note that task A's state is now TASK_RUNNING, although the task
has not yet been put back on the run-queue.
Now back on CPU 0 (where we are in the scheduler routine) we
perform the following test:
	if (prev->state == TASK_RUNNING)
		goto still_running;
Since the state of prev has been changed to TASK_RUNNING by the
__wake_up_common code, we set next = prev. This means that we
potentially choose to continue running the current task, even
though the task has been deleted from the run-queue.
Now, what usually happens is that wake_up_process_synchronous or
wake_up_process will add the task back to the run-queue as soon
as the scheduler drops the run-queue lock. Therefore, this does
not seem to cause any problems.
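
The interleaving described above can be replayed in user space. What
follows is only a minimal sketch, not kernel code: struct task, the
on_runqueue flag, the numeric state values and all function names are
made up for illustration, and TASK_EXCLUSIVE is left out. Each step is
run in the order given above, and the function reports whether the
scheduler ended up keeping a task that was off the run-queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; illustrative values, not the kernel's definitions. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

struct task {
	int  state;
	bool on_runqueue;
};

/* Scheduling CPU: the switch statement dequeues a prev that is not running. */
static void sched_dequeue_if_sleeping(struct task *prev)
{
	if (prev->state != TASK_RUNNING)
		prev->on_runqueue = false;
}

/* Waking CPU: the state is set before the task is re-queued. */
static void waker_set_state(struct task *t)
{
	t->state = TASK_RUNNING;
}

/* Scheduling CPU: the still_running test looks only at the state. */
static bool sched_keeps_prev(const struct task *prev)
{
	return prev->state == TASK_RUNNING;
}

/* Waking CPU: the later wake_up_process() step puts the task back. */
static void waker_enqueue(struct task *t)
{
	t->on_runqueue = true;
}

/* Replays the interleaving; true means prev was kept while off the queue. */
bool race_selects_dequeued_task(void)
{
	struct task a = { TASK_UNINTERRUPTIBLE, true };
	bool picked_off_queue;

	sched_dequeue_if_sleeping(&a);
	waker_set_state(&a);
	picked_off_queue = sched_keeps_prev(&a) && !a.on_runqueue;
	waker_enqueue(&a);

	/* By the end the task is queued again, which hides the window. */
	assert(a.on_runqueue);
	return picked_off_queue;
}
```

In this model the window is only visible between waker_set_state() and
waker_enqueue(), which matches the observation that nothing seems to
break in practice.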
I'm curious, is this behavior by design OR are we just getting
lucky?
Thanks,
--
Mike Kravetz [email protected]
IBM Linux Technology Center
15450 SW Koll Parkway
Beaverton, OR 97006-6063 (503)578-3494
   Date: Wed, 8 Nov 2000 15:11:49 -0800
   From: Mike Kravetz <[email protected]>

   The following code in __wake_up_common() is then
   executed:

	if (best_exclusive)
		best_exclusive->state = TASK_RUNNING;
	wq_write_unlock_irqrestore(&q->lock, flags);

test10 fixes this error, now it sets TASK_RUNNING and
adds the task back to the runqueue all under the runqueue
lock.
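
As a rough illustration of that idea (based only on the description
above, not on the actual test10 source), the sketch below makes both
updates inside one critical section. The runqueue struct, its locked
flag, the numeric state values and every function name are
hypothetical, and the "lock" is a single-threaded stand-in that only
checks pairing. The property it demonstrates: whenever the lock is
held, a task in TASK_RUNNING is also on the run-queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; illustrative values, not the kernel's definitions. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

struct task     { int state; bool on_runqueue; };
struct runqueue { bool locked; };

/* Stand-in lock: only verifies that lock/unlock calls are paired. */
static void rq_lock(struct runqueue *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void rq_unlock(struct runqueue *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Wakeup: state change and enqueue happen in the same critical section. */
static void wake_task(struct runqueue *rq, struct task *t)
{
	rq_lock(rq);
	t->state = TASK_RUNNING;
	t->on_runqueue = true;
	rq_unlock(rq);
}

/* With the lock held: TASK_RUNNING implies the task is on the run-queue. */
static bool invariant_holds(const struct runqueue *rq, const struct task *t)
{
	assert(rq->locked);
	return t->state != TASK_RUNNING || t->on_runqueue;
}

/* Dequeue a sleeping task, wake it, and check the invariant both times. */
bool demo_fixed_wakeup(void)
{
	struct runqueue rq = { false };
	struct task a = { TASK_UNINTERRUPTIBLE, true };
	bool before, after;

	rq_lock(&rq);
	a.on_runqueue = false;		/* scheduler dequeues the sleeper */
	before = invariant_holds(&rq, &a);
	rq_unlock(&rq);

	wake_task(&rq, &a);

	rq_lock(&rq);
	after = invariant_holds(&rq, &a);
	rq_unlock(&rq);

	return before && after;
}
```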
Later,
David S. Miller
[email protected]
"David S. Miller" wrote:
>
> Date: Wed, 8 Nov 2000 15:11:49 -0800
> From: Mike Kravetz <[email protected]>
>
> The following code in __wake_up_common() is then
> executed:
>
> if (best_exclusive)
> best_exclusive->state = TASK_RUNNING;
> wq_write_unlock_irqrestore(&q->lock, flags);
>
> test10 fixes this error, now it sets TASK_RUNNING and
> adds the task back to the runqueue all under the runqueue
> lock.
In our preemptable kernel work we often put (or leave) tasks on the run
queue that are not in state TASK_RUNNING and want to treat them as if
they are in state TASK_RUNNING. We thus changed the test in schedule()
to "task_on_runqueue(prev)"....
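
To make the difference between the two tests concrete, here is a small
user-space comparison. It is a sketch under assumptions: struct task,
the on_runqueue flag, the numeric state values and the keep_prev_*
names are invented for illustration and are not taken from the
preemptable kernel patch. It checks both tests against a task left on
the run-queue in a non-running state, and against a task marked
TASK_RUNNING that is not queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; illustrative values, not the kernel's definitions. */
enum { TASK_RUNNING = 0, TASK_UNINTERRUPTIBLE = 2 };

struct task { int state; bool on_runqueue; };

/* Test keyed on the task state, as in the schedule() excerpt quoted earlier. */
static bool keep_prev_by_state(const struct task *prev)
{
	return prev->state == TASK_RUNNING;
}

/* Test keyed on run-queue membership instead of the state field. */
static bool keep_prev_by_queue(const struct task *prev)
{
	return prev->on_runqueue;
}

/* Compares the two tests on the two cases where they disagree. */
void compare_tests(void)
{
	/* Left on the run-queue although its state is not TASK_RUNNING. */
	struct task preempted = { TASK_UNINTERRUPTIBLE, true };
	/* Marked TASK_RUNNING by a waker but not yet back on the queue. */
	struct task racing = { TASK_RUNNING, false };

	assert(!keep_prev_by_state(&preempted));
	assert(keep_prev_by_queue(&preempted));

	assert(keep_prev_by_state(&racing));
	assert(!keep_prev_by_queue(&racing));
}
```

The membership test follows the run-queue itself, so in this model it
cannot pick a task that has been dequeued, whatever its state says.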
George