Subject: Re: deadlocks if use htb
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Jarek Poplawski <jarkao2@gmail.com>
Cc: Denys Fedoryschenko <denys@visp.net.lb>, Chris Caputo <ccaputo@alt.net>,
       netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
       Badalian Vyacheslav <slavon@bigtelecom.ru>,
       Thomas Gleixner <tglx@linutronix.de>
In-Reply-To: <20090115090120.GE4190@ff.dom.local>
References: <20081010090426.GA6054@ff.dom.local>
	 <200901141417.58667.denys@visp.net.lb> <1231937404.14825.4.camel@laptop>
	 <200901141505.46929.denys@visp.net.lb> <20090114131257.GC6117@ff.dom.local>
	 <1231938929.14825.6.camel@laptop> <20090114132603.GD6117@ff.dom.local>
	 <1231939946.14825.9.camel@laptop> <20090114141311.GA6643@ff.dom.local>
	 <1231943283.14825.14.camel@laptop>  <20090115090120.GE4190@ff.dom.local>
Content-Type: text/plain
Date: Thu, 15 Jan 2009 11:46:48 +0100
Message-Id: <1232016408.8870.43.camel@laptop>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2018
Lines: 70

On Thu, 2009-01-15 at 09:01 +0000, Jarek Poplawski wrote:
> On Wed, Jan 14, 2009 at 03:28:03PM +0100, Peter Zijlstra wrote:
> ....
> > Right, found all that...
> > 
> > Can't spot anything obviously wrong though.. hrtimer_start*() does
> > remove_hrtimer() which checks STATE_ENQUEUED, STATE_PENDING and pulls it
> > off the relevant list before it continues the enqueue.
> > 
> > However a loop in enqueue_hrtimer() would suggest a corrupted RB-tree,
> > but I cannot find an RB-op that doesn't hold base-lock.
> > 
> 
> I've revisited it yesterday, and if I don't miss something, there is
> possible a scenario similar to this:
> 
> cpu1:				cpu2:
> 
> run_hrtimer_pending
> spin_unlock
> restart = fn(timer)
> 
> 				hrtimer_start
> 				enqueue_hrtimer
> 
> 				hrtimer_start
> 				remove_hrtimer
> 				(the HRTIMER_STATE_CALLBACK is removed)
> 
> 				switch_hrtimer_base
> spin_lock
> (not this hrtimer's anymore)
> __remove_hrtimer
> list_add_tail			enqueue_hrtimer
> 

(looking at .28 code)

run_hrtimer_pending() reads like:

while (pending timers) {
  __remove_hrtimer(timer, HRTIMER_STATE_CALLBACK);
  spin_unlock(&cpu_base->lock);

  restart = fn(timer);

  spin_lock(&cpu_base->lock);
  timer->state &= ~HRTIMER_STATE_CALLBACK; // _should_ result in HRTIMER_STATE_INACTIVE
  if (restart == HRTIMER_RESTART)
    re-queue
  else if (timer->state != INACTIVE) {
    // so another cpu re-queued this timer _while_ we were executing it.
    if (timer is first && !reprogram) {
      __remove_hrtimer(timer, HRTIMER_STATE_PENDING);
      list_add_tail(timer, &cb_pending);
    }
  }
}

So in the window where we drop the lock, one can, as you said, have
another cpu requeue the timer, but the rb_entry and list_entry are free,
so it should not cause the data corruption we're seeing.
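
To make that argument easier to poke at, here is a rough single-threaded
userspace model of the window, as a sketch only. Every name in it
(model_timer, the ST_* bits, the model_*() helpers) is made up for
illustration and is not kernel API. It replays the steps sequentially, so it
models neither the locking nor switch_hrtimer_base(); it only tracks whether
the timer's tree node and list entry are linked at each step.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state bits; names and values belong to this sketch only. */
enum {
    ST_INACTIVE = 0x00,
    ST_ENQUEUED = 0x01, /* linked in the (modelled) rb-tree */
    ST_CALLBACK = 0x02, /* callback function is running */
    ST_PENDING  = 0x04, /* linked on the (modelled) cb_pending list */
};

struct model_timer {
    unsigned state;
    bool in_tree; /* stands in for the rb node being linked */
    bool on_list; /* stands in for the list entry being linked */
};

/* Unlink the timer from wherever it is linked, then overwrite the state. */
static void model_remove(struct model_timer *t, unsigned newstate)
{
    t->in_tree = false;
    t->on_list = false;
    t->state = newstate;
}

/* Link into the tree; both entries must be free, or we would corrupt it. */
static void model_enqueue(struct model_timer *t)
{
    assert(!t->in_tree && !t->on_list);
    t->in_tree = true;
    t->state |= ST_ENQUEUED;
}

/* What "another cpu" does when it (re)starts the timer. */
static void model_start(struct model_timer *t)
{
    if (t->state & (ST_ENQUEUED | ST_PENDING))
        model_remove(t, ST_INACTIVE); /* this also drops ST_CALLBACK */
    model_enqueue(t);
}

/*
 * Replay one iteration of the pending loop for a callback that does not
 * ask for a restart.
 * rearms: how often another cpu starts the timer while fn() runs.
 * first_and_expired: stands in for the "timer is first && !reprogram" test.
 */
struct model_timer model_run_pending(int rearms, bool first_and_expired)
{
    struct model_timer t = { .state = ST_PENDING, .on_list = true };

    model_remove(&t, ST_CALLBACK); /* taken off the pending list */
    /* lock dropped here; fn(timer) runs */

    for (int i = 0; i < rearms; i++)
        model_start(&t);

    /* lock retaken here */
    t.state &= ~ST_CALLBACK;
    if (t.state != ST_INACTIVE && first_and_expired) {
        model_remove(&t, ST_PENDING);
        assert(!t.in_tree && !t.on_list);
        t.on_list = true; /* list_add_tail() onto the pending list */
    }
    return t;
}
```

Calling model_run_pending(2, true) replays the double hrtimer_start case from
the quoted scenario. None of the asserts fire for any combination of
arguments, which matches the reading that both entries are free at every
link step. That says nothing about the base-switch part of the scenario,
which this model leaves out.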

