Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965297Ab2JYGOO (ORCPT ); Thu, 25 Oct 2012 02:14:14 -0400 Received: from e23smtp01.au.ibm.com ([202.81.31.143]:49647 "EHLO e23smtp01.au.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932112Ab2JYGON (ORCPT ); Thu, 25 Oct 2012 02:14:13 -0400 Message-ID: <5088D877.8010707@linux.vnet.ibm.com> Date: Thu, 25 Oct 2012 11:43:11 +0530 From: "Srivatsa S. Bhat" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120828 Thunderbird/15.0 MIME-Version: 1.0 To: Michael Wang CC: Peter Zijlstra , Huacai Chen , Ingo Molnar , linux-kernel@vger.kernel.org, Fuxin Zhang , Thomas Gleixner , Tejun Heo Subject: Re: [RFC][PATCH] sched: Fix a deadlock of cpu-hotplug References: <1351070714-17676-1-git-send-email-chenhc@lemote.com> <1351071502.13456.54.camel@twins> <5088B2DF.9050705@linux.vnet.ibm.com> In-Reply-To: <5088B2DF.9050705@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit x-cbid: 12102506-1618-0000-0000-000002B5B8F1 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6612 Lines: 148 On 10/25/2012 09:02 AM, Michael Wang wrote: > On 10/24/2012 05:38 PM, Peter Zijlstra wrote: >> On Wed, 2012-10-24 at 17:25 +0800, Huacai Chen wrote: >>> We found poweroff sometimes fails on our computers, so we have the >>> lock debug options configured. Then, when we do poweroff or take a >>> cpu down via cpu-hotplug, kernel complain as below. To resove this, >>> we modify sched_ttwu_pending(), disable the local irq when acquire >>> rq->lock. >>> >>> [ 83.066406] ================================= >>> [ 83.066406] [ INFO: inconsistent lock state ] >>> [ 83.066406] 3.5.0-3.lemote #428 Not tainted >>> [ 83.066406] --------------------------------- >>> [ 83.066406] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. >>> [ 83.066406] migration/1/7 [HC0[0]:SC0[0]:HE1:SE1] takes: >>> [ 83.066406] (&rq->lock){?.-.-.}, at: [] sched_ttwu_pending+0x64/0x98 >>> [ 83.066406] {IN-HARDIRQ-W} state was registered at: >>> [ 83.066406] [] __lock_acquire+0x80c/0x1cc0 >>> [ 83.066406] [] lock_acquire+0x60/0x9c >>> [ 83.066406] [] _raw_spin_lock+0x3c/0x50 >>> [ 83.066406] [] scheduler_tick+0x48/0x178 >>> [ 83.066406] [] update_process_times+0x54/0x70 >>> [ 83.066406] [] tick_handle_periodic+0x2c/0x9c >>> [ 83.066406] [] c0_compare_interrupt+0x8c/0x94 >>> [ 83.066406] [] handle_irq_event_percpu+0x7c/0x248 >>> [ 83.066406] [] handle_percpu_irq+0x8c/0xc0 >>> [ 83.066406] [] generic_handle_irq+0x48/0x58 >>> [ 83.066406] [] do_IRQ+0x18/0x24 >>> [ 83.066406] [] mach_irq_dispatch+0xe4/0x124 >>> [ 83.066406] [] ret_from_irq+0x0/0x4 >>> [ 83.066406] [] console_unlock+0x3e8/0x4c0 >>> [ 83.066406] [] con_init+0x370/0x398 >>> [ 83.066406] [] console_init+0x34/0x50 >>> [ 83.066406] [] start_kernel+0x2f8/0x4e0 >>> [ 83.066406] irq event stamp: 971 >>> [ 83.066406] hardirqs last enabled at (971): [] local_flush_tlb_all+0x134/0x17c >>> [ 83.066406] hardirqs last disabled at (970): [] local_flush_tlb_all+0x48/0x17c >>> [ 83.066406] softirqs last enabled at (0): [] copy_process+0x510/0x117c >>> [ 83.066406] softirqs last disabled at (0): [< (null)>] (null) >>> [ 83.066406] >>> [ 83.066406] other info that might help us debug this: >>> [ 83.066406] Possible unsafe locking scenario: >>> [ 83.066406] >>> [ 83.066406] CPU0 >>> [ 83.066406] ---- >>> [ 83.066406] lock(&rq->lock); >>> [ 83.066406] >>> [ 83.066406] lock(&rq->lock); >>> [ 83.066406] >>> [ 83.066406] *** DEADLOCK *** >>> [ 83.066406] >>> [ 83.066406] no locks held by migration/1/7. >>> [ 83.066406] >>> [ 83.066406] stack backtrace: >>> [ 83.066406] Call Trace: >>> [ 83.066406] [] dump_stack+0x8/0x34 >>> [ 83.066406] [] print_usage_bug+0x2ec/0x314 >>> [ 83.066406] [] mark_lock+0x3fc/0x774 >>> [ 83.066406] [] __lock_acquire+0x8a8/0x1cc0 >>> [ 83.066406] [] lock_acquire+0x60/0x9c >>> [ 83.066406] [] _raw_spin_lock+0x3c/0x50 >>> [ 83.066406] [] sched_ttwu_pending+0x64/0x98 >>> [ 83.066406] [] migration_call+0x10c/0x2e0 >>> [ 83.066406] [] notifier_call_chain+0x44/0x94 >>> [ 83.066406] [] __cpu_notify+0x30/0x5c >>> [ 83.066406] [] take_cpu_down+0x5c/0x70 >>> [ 83.066406] [] stop_machine_cpu_stop+0x104/0x1e8 >>> [ 83.066406] [] cpu_stopper_thread+0x110/0x1ac >>> [ 83.066406] [] kthread+0x88/0x90 >>> [ 83.066406] [] kernel_thread_helper+0x10/0x18 >> >> Weird, that's from a CPU_DYING call, I thought those were with IRQs >> disabled. >> >> Look at how __stop_machine() calls the function with IRQs disabled for ! >> stop_machine_initialized or !SMP. Also stop_machine_cpu_stop() seems to >> disabled interrupts, so how do we end up calling take_cpu_down() with >> IRQs enabled? > > The patch is no doubt wrong... > > The discuss in: > > https://lkml.org/lkml/2012/7/19/164 > > Which also faced the issue that the timer interrupt come in after apic > was shut down, I'm not sure whether this could do help to Huacai, just > as a clue... > One interesting thing that I noted in that case was that we noticed that (stale) interrupt exactly at the call to local_irq_restore() in stop_machine_cpu_stop(). However, as Peter pointed out, migration_call's CPU_DYING notifier runs right in the middle of the stop machine dance, much much before the call to local_irq_restore().. so it doesn't look like a case of a stale interrupt being recognized.. it looks as if the sequence of local_irq_disable(), hard_irq_disable() and __cpu_disable() somehow managed to wrongly keep the interrupts still enabled... Regards, Srivatsa S. Bhat > >> >> That simply doesn't make any sense. >> >>> Signed-off-by: Huacai Chen >>> --- >>> kernel/sched/core.c | 5 +++-- >>> 1 files changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >>> index 36e2666..703754a 100644 >>> --- a/kernel/sched/core.c >>> +++ b/kernel/sched/core.c >>> @@ -1468,9 +1468,10 @@ static void sched_ttwu_pending(void) >>> { >>> struct rq *rq = this_rq(); >>> struct llist_node *llist = llist_del_all(&rq->wake_list); >>> + unsigned long flags; >>> struct task_struct *p; >>> >>> - raw_spin_lock(&rq->lock); >>> + raw_spin_lock_irqsave(&rq->lock, flags); >>> >>> while (llist) { >>> p = llist_entry(llist, struct task_struct, wake_entry); >>> @@ -1478,7 +1479,7 @@ static void sched_ttwu_pending(void) >>> ttwu_do_activate(rq, p, 0); >>> } >>> >>> - raw_spin_unlock(&rq->lock); >>> + raw_spin_unlock_irqrestore(&rq->lock, flags); >>> } >>> >>> void scheduler_ipi(void) >> >> >> That's wrong though, you add the cost to the common case instead of the >> hardly ever ran hotplug case. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/