Date: Wed, 7 Jan 2009 23:39:37 +0530
From: Vaidyanathan Srinivasan
To: Peter Zijlstra
Cc: Ingo Molnar, Linux Kernel, Balbir Singh, Andrew Morton, Mike Galbraith
Subject: Re: [BUG] 2.6.28-git LOCKDEP: Possible recursive rq->lock
Message-ID: <20090107180937.GP4574@dirshya.in.ibm.com>
Reply-To: svaidy@linux.vnet.ibm.com
In-Reply-To: <20090107163100.GO4574@dirshya.in.ibm.com>

* Vaidyanathan Srinivasan [2009-01-07 22:01:00]:

> * Peter Zijlstra [2009-01-07 15:28:57]:
>
> > On Wed, 2009-01-07 at 19:50 +0530, Vaidyanathan Srinivasan wrote:
> > > * Peter Zijlstra [2009-01-07 14:12:43]:
> > >
> > > > On Wed, 2009-01-07 at 17:59 +0530, Vaidyanathan Srinivasan wrote:
> > > >
> > > > > =============================================
> > > > > [ INFO: possible recursive locking detected ]
> > > > > 2.6.28-autotest-tip-sv #1
> > > > > ---------------------------------------------
> > > > > klogd/5062 is trying to acquire lock:
> > > > >  (&rq->lock){++..}, at: [] task_rq_lock+0x45/0x7e
> > > > >
> > > > > but task is already holding lock:
> > > > >  (&rq->lock){++..}, at: [] schedule+0x158/0xa31
> > > > >
> > > > > other info that might help us debug this:
> > > > > 1 lock held by klogd/5062:
> > > > >  #0: (&rq->lock){++..}, at: [] schedule+0x158/0xa31
> > > > >
> > > > > stack backtrace:
> > > > > Pid: 5062, comm: klogd Not tainted 2.6.28-autotest-tip-sv #1
> > > > > Call Trace:
> > > > >  [] __lock_acquire+0xeb9/0x16a4
> > > > >  [] ? __lock_acquire+0x1688/0x16a4
> > > > >  [] lock_acquire+0x85/0xa9
> > > > >  [] ? task_rq_lock+0x45/0x7e
> > > > >  [] _spin_lock+0x31/0x66
> > > > >  [] ? task_rq_lock+0x45/0x7e
> > > > >  [] task_rq_lock+0x45/0x7e
> > > > >  [] try_to_wake_up+0x88/0x27a
> > > > >  [] wake_up_process+0x10/0x12
> > > > >  [] schedule+0x560/0xa31
> > > >
> > > > I'd be most curious to know where in schedule we are.
> > >
> > > ok, we are in sched.c:3777
> > >
> > > 		double_unlock_balance(this_rq, busiest);
> > > 		if (active_balance)
> > > >>>>>>>>>>>		wake_up_process(busiest->migration_thread);
> > >
> > > 	} else
> > >
> > > In active balance in newidle. This implies sched_mc was 2 at that time.
> > > let me trace this and debug further.
> >
> > How about something like this? Strictly speaking we'll not deadlock,
> > because ttwu will not be able to place the migration task on our rq, but
> > since the code can deal with both rqs getting unlocked, this seems the
> > easiest way out.
>
> Hi Peter,
>
> I agree. Unlocking this_rq is an easy way out. Thanks for the
> suggestion. I have moved the unlock and lock within the if
> condition.
>
> --Vaidy
>
> sched: bug fix -- do not call ttwu while holding rq->lock
>
> When sched_mc=2, wake_up_process() is called on busiest_rq
> while holding this_rq lock in load_balance_newidle().
> Though this will not deadlock, this is a lockdep warning
> and the situation is easily solved by releasing the this_rq
> lock at this point in code.
>
> Signed-off-by: Vaidyanathan Srinivasan
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 71a054f..703a669 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3773,8 +3773,12 @@ redo:
>  		}
>
>  		double_unlock_balance(this_rq, busiest);
> -		if (active_balance)
> +		if (active_balance) {
> +			/* Should not call ttwu while holding a rq->lock */
> +			spin_unlock(&this_rq->lock);
>  			wake_up_process(busiest->migration_thread);
> +			spin_lock(&this_rq->lock);
> +		}
>
>  	} else
>  		sd->nr_balance_failed = 0;

Hi Peter and Ingo,

The above fix seems to have fixed the lockdep warning. Please include
it in sched-tip for further testing and later push to mainline.

Thanks,
Vaidy
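
For anyone following the thread, here is a rough userspace sketch of the
pattern the patch applies. It is not kernel code: struct rq, wake_on(),
the kick_*() helpers and the two counters are hypothetical stand-ins, and
pthread mutexes stand in for the rq spinlocks. The counter only mimics the
idea that lockdep flags taking a second lock of the same class while one is
already held, even when the two locks are different instances; it ignores the
nesting annotations the kernel uses for legitimate double-rq locking.

```c
#include <pthread.h>

/* Hypothetical stand-in for the kernel's struct rq. */
struct rq {
	pthread_mutex_t lock;
	int nr_woken;
};

/* Single-threaded demo state, so plain statics are enough here. */
static int held_rq_locks;	/* rq-class locks currently held */
static int recursion_reports;	/* same-class nestings observed */

static void rq_lock(struct rq *rq)
{
	if (held_rq_locks > 0)
		recursion_reports++;	/* roughly what lockdep would flag */
	pthread_mutex_lock(&rq->lock);
	held_rq_locks++;
}

static void rq_unlock(struct rq *rq)
{
	held_rq_locks--;
	pthread_mutex_unlock(&rq->lock);
}

/* Stand-in for wake_up_process(): it takes the target's rq lock. */
static void wake_on(struct rq *target)
{
	rq_lock(target);
	target->nr_woken++;
	rq_unlock(target);
}

/* Both helpers are entered and left with this_rq->lock held. */
static void kick_unpatched(struct rq *this_rq, struct rq *busiest)
{
	(void)this_rq;
	wake_on(busiest);	/* nests a second rq-class lock */
}

static void kick_patched(struct rq *this_rq, struct rq *busiest)
{
	rq_unlock(this_rq);	/* drop our lock around the wakeup */
	wake_on(busiest);
	rq_lock(this_rq);
}

/* Runs one variant and returns how many same-class nestings were seen. */
static int count_reports(int patched)
{
	struct rq this_rq = { PTHREAD_MUTEX_INITIALIZER, 0 };
	struct rq busiest = { PTHREAD_MUTEX_INITIALIZER, 0 };

	held_rq_locks = 0;
	recursion_reports = 0;

	rq_lock(&this_rq);
	if (patched)
		kick_patched(&this_rq, &busiest);
	else
		kick_unpatched(&this_rq, &busiest);
	rq_unlock(&this_rq);

	return recursion_reports;
}
```

In this sketch count_reports(0) returns 1 and count_reports(1) returns 0.
Neither variant deadlocks, since the two mutexes are distinct, which matches
the observation above that the original code was a lockdep warning rather
than a real deadlock.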