Return-path: Received: from mail.screens.ru ([213.234.233.54]:55477 "EHLO mail.screens.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752950AbXF3LqH (ORCPT ); Sat, 30 Jun 2007 07:46:07 -0400 Date: Sat, 30 Jun 2007 15:46:58 +0400 From: Oleg Nesterov To: Ingo Molnar Cc: Johannes Berg , Arjan van de Ven , Linux Kernel list , linux-wireless , Peter Zijlstra , mingo@elte.hu, Thomas Sattler Subject: Re: [RFC/PATCH] debug workqueue deadlocks with lockdep Message-ID: <20070630114658.GA344@tv-sign.ru> References: <1182969638.4769.56.camel@johannes.berg> <1183048429.4089.1.camel@johannes.berg> <1183052001.4089.2.camel@johannes.berg> <1183190728.7932.43.camel@earth4> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <1183190728.7932.43.camel@earth4> Sender: linux-wireless-owner@vger.kernel.org List-ID: On 06/30, Ingo Molnar wrote: > > On Thu, 2007-06-28 at 19:33 +0200, Johannes Berg wrote: > > No, that's not right either, but Arjan just helped me a bit with how > > lockdep works and I think I have the right idea now. Ignore this for > > now, I'll send a new patch in a few days. > > ok. But in general, this is a very nice idea! > > i've Cc:-ed Oleg. Oleg, what do you think? I think we should keep all > the workqueue APIs specified in a form that makes them lockdep coverable > like Johannes did. This debug mechanism could have helped with the > recent DVB lockup that Thomas Sattler reported. I think this idea is great! Johannes, could you change wait_on_work() as well? Most users of flush_workqueue() should be converted to use it. > @@ -342,6 +351,9 @@ static int flush_cpu_workqueue(struct cp > } else { > struct wq_barrier barr; > > + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_); > + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_); > + > active = 0; > spin_lock_irq(&cwq->lock); I am not sure why you skip "if (cwq->thread == current)" case, it can deadlock in the same way. But, perhaps we should not change flush_cpu_workqueue(). If we detect the deadlock, we will have num_online_cpus() reports, yes? And, > if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) { > @@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor > int cpu; > > might_sleep(); > + lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_); > + lock_release(&wq->lockdep_map, 0, _THIS_IP_); one of the 2 callers was already modified. Perhaps it is better to add lock_acquire() into the second caller, cleanup_workqueue_thread(), but skip flush_cpu_workqueue() ? Oleg.