Subject: Re: Possible kernel lock in semaphore's __down()
From: Peter Zijlstra <peterz@infradead.org>
To: Aleksandar Dezelin <dezelin@gmail.com>
Cc: linux-kernel@vger.kernel.org, mingo <mingo@redhat.com>,
       Oleg Nesterov <oleg@tv-sign.ru>
In-Reply-To: <1188424371.8853.9.camel@synaptical>
References: <1188424371.8853.9.camel@synaptical>
Content-Type: text/plain
Date: Thu, 30 Aug 2007 09:16:05 +0200
Message-Id: <1188458165.6112.19.camel@twins>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1895
Lines: 45

On Wed, 2007-08-29 at 23:52 +0200, Aleksandar Dezelin wrote:
> Hi!
> 
> I'm a newbie here on the list and also as a "kernel hacker". There's a
> bug reported in bugzilla (Bug 7927), cite:
> 
> 
> > In the function __down
> >  
> > fastcall void __sched __down(struct semaphore * sem)
> > {
> >  struct task_struct *tsk = current;
> >  DECLARE_WAITQUEUE(wait, tsk);
> >  unsigned long flags;
> >  
> >  tsk->state = TASK_UNINTERRUPTIBLE;
> >  spin_lock_irqsave(&sem->wait.lock, flags);
> >  add_wait_queue_exclusive_locked(&sem->wait, &wait);
> >  ...
> > }
> >  
> > 
> > From this code fragment, it sets tsk->state to TASK_UNINTERRUPTIBLE before
> > it gets the spinlock. Assume that at that moment an interrupt occurs, and after
> > the interrupt handler ends, another process is scheduled to run (assume the
> > kernel is preemptible). In this case, the previous process (its state has been
> > set to TASK_UNINTERRUPTIBLE) has been picked off the run queue, and it has not
> > yet been added to the wait queue (sem->wait), so it may never be woken up.
> > 
> 
> I have marked it as rejected because, as far as I can see, at the time this
> function is called it is guaranteed that ret_from_intr() will not call
> schedule() on return from an interrupt handler to either kernel space or user
> space, because of the call to the might_sleep() macro in the semaphore's
> down(). Am I wrong?

I think the reporter meant interrupt-driven involuntary preemption, so
ret_from_intr() is not the right place to look. But afaict you're still
right: see how preempt_schedule*() adds PREEMPT_ACTIVE to the
preempt_count, and how that makes the scheduler ignore the task state,
so a preempted task is not taken off the run queue.
