2007-08-29 21:53:04

by Aleksandar Dezelin

[permalink] [raw]
Subject: Possible kernel lock in semaphore's __down()

Hi!

I'm a newbie here on the list and also as a "kernel hacker". There's a
bug reported in bugzilla (Bug 7927), cite:


> In the function __down
>
> fastcall void __sched __down(struct semaphore * sem)
> {
> struct task_struct *tsk = current;
> DECLARE_WAITQUEUE(wait, tsk);
> unsigned long flags;
>
> tsk->state = TASK_UNINTERRUPTIBLE;
> spin_lock_irqsave(&sem->wait.lock, flags);
> add_wait_queue_exclusive_locked(&sem->wait, &wait);
> ...
> }
>
>
> From this code fragment, it sets the tsk->state to TASK_UNINTERRUPTIBLE before
> gets the spinlock. Assume at that moment, a interrupt ocuur and and after the
> interrupt handle ends, an other process is scheduled to run (assume the kernel
> is preemptalbe). In this case, the previous process ( its state has set to
> TASK_UNINTERRUPTIBLE) has been picked off the run queue, and it has not yet add
> to the wait queue( sem->wait ), so it may be never waited up forever.
>

I have marked it as rejected as as I can see at the time this function is called,
it is guaranteed that ret_from_intr() will not call schedule() on return from an
interrupt handler to either kernel space or user space because of the call
to macro might_sleep() in semaphore's down(). Am I wrong?

Thanks and best regards,
Aleksandar Dezelin




2007-08-30 07:16:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Possible kernel lock in semaphore's __down()

On Wed, 2007-08-29 at 23:52 +0200, Aleksandar Dezelin wrote:
> Hi!
>
> I'm a newbie here on the list and also as a "kernel hacker". There's a
> bug reported in bugzilla (Bug 7927), cite:
>
>
> > In the function __down
> >
> > fastcall void __sched __down(struct semaphore * sem)
> > {
> > struct task_struct *tsk = current;
> > DECLARE_WAITQUEUE(wait, tsk);
> > unsigned long flags;
> >
> > tsk->state = TASK_UNINTERRUPTIBLE;
> > spin_lock_irqsave(&sem->wait.lock, flags);
> > add_wait_queue_exclusive_locked(&sem->wait, &wait);
> > ...
> > }
> >
> >
> > From this code fragment, it sets the tsk->state to TASK_UNINTERRUPTIBLE before
> > gets the spinlock. Assume at that moment, a interrupt ocuur and and after the
> > interrupt handle ends, an other process is scheduled to run (assume the kernel
> > is preemptalbe). In this case, the previous process ( its state has set to
> > TASK_UNINTERRUPTIBLE) has been picked off the run queue, and it has not yet add
> > to the wait queue( sem->wait ), so it may be never waited up forever.
> >
>
> I have marked it as rejected as as I can see at the time this function is called,
> it is guaranteed that ret_from_intr() will not call schedule() on return from an
> interrupt handler to either kernel space or user space because of the call
> to macro might_sleep() in semaphore's down(). Am I wrong?

I think the reported meant interrupt driven involuntary preemption. So
ret_from_intr() is not the right place to look. But afaict you're still
right, see how preempt_schedule*() adds PREEMPT_ACTIVE to the
preempt_count, and how that makes the scheduler ignore the task state.

2007-08-30 09:12:39

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Possible kernel lock in semaphore's __down()

(just in case Peter's explanation was too concise)

On 08/30, Peter Zijlstra wrote:
>
> On Wed, 2007-08-29 at 23:52 +0200, Aleksandar Dezelin wrote:
> > Hi!
> >
> > I'm a newbie here on the list and also as a "kernel hacker". There's a
> > bug reported in bugzilla (Bug 7927), cite:
> >
> >
> > > In the function __down
> > >
> > > fastcall void __sched __down(struct semaphore * sem)
> > > {
> > > struct task_struct *tsk = current;
> > > DECLARE_WAITQUEUE(wait, tsk);
> > > unsigned long flags;
> > >
> > > tsk->state = TASK_UNINTERRUPTIBLE;
> > > spin_lock_irqsave(&sem->wait.lock, flags);
> > > add_wait_queue_exclusive_locked(&sem->wait, &wait);
> > > ...
> > > }
> > >
> > >
> > > From this code fragment, it sets the tsk->state to TASK_UNINTERRUPTIBLE before
> > > gets the spinlock. Assume at that moment, a interrupt ocuur and and after the
> > > interrupt handle ends, an other process is scheduled to run (assume the kernel
> > > is preemptalbe). In this case, the previous process ( its state has set to
> > > TASK_UNINTERRUPTIBLE) has been picked off the run queue,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
please see below,

> and it has not yet add
> > > to the wait queue( sem->wait ), so it may be never waited up forever.
> > >
> >
> > I have marked it as rejected as as I can see at the time this function is called,
> > it is guaranteed that ret_from_intr() will not call schedule() on return from an
> > interrupt handler to either kernel space or user space because of the call
> > to macro might_sleep() in semaphore's down(). Am I wrong?

No, ret_from_intr() still can schedule(). Actually it does preempt_schedule_irq()
which sets PREEMPT_ACTIVE.

And, as Peter explained,

> I think the reported meant interrupt driven involuntary preemption. So
> ret_from_intr() is not the right place to look. But afaict you're still
> right, see how preempt_schedule*() adds PREEMPT_ACTIVE to the
> preempt_count, and how that makes the scheduler ignore the task state.

this is OK, because in this case schedule() doesn't remove the task from
run queue even if its state is TASK_UNINTERRUPTIBLE,

if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
...
deactivate_task();
}

note the "!(preempt_count() & PREEMPT_ACTIVE))" check. So the task is still
runnable, we don't need to wake it, it will get CPU eventually.

Oleg.