Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755071Ab0KJIkG (ORCPT ); Wed, 10 Nov 2010 03:40:06 -0500 Received: from mx2.mail.elte.hu ([157.181.151.9]:57627 "EHLO mx2.mail.elte.hu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753015Ab0KJIkD (ORCPT ); Wed, 10 Nov 2010 03:40:03 -0500 Date: Wed, 10 Nov 2010 09:39:50 +0100 From: Ingo Molnar To: DDD Cc: Don Zickus , Peter Zijlstra , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH] x86: avoid calling arch_trigger_all_cpu_backtrace() at the same time on SMP Message-ID: <20101110083950.GA8370@elte.hu> References: <1288721818-10851-1-git-send-email-dzickus@redhat.com> <20101110074307.GC29493@elte.hu> <4CDA5962.4030508@windriver.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CDA5962.4030508@windriver.com> User-Agent: Mutt/1.5.20 (2009-08-17) X-ELTE-SpamScore: -2.0 X-ELTE-SpamLevel: X-ELTE-SpamCheck: no X-ELTE-SpamVersion: ELTE 2.0 X-ELTE-SpamCheck-Details: score=-2.0 required=5.9 tests=BAYES_00 autolearn=no SpamAssassin version=3.2.5 -2.0 BAYES_00 BODY: Bayesian spam probability is 0 to 1% [score: 0.0000] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4827 Lines: 135 * DDD wrote: > Ingo Molnar wrote: > >* Don Zickus wrote: > > > >>From: Dongdong Deng > >> > >>The spin_lock_debug/rcu_cpu_stall detector uses > >>trigger_all_cpu_backtrace() to dump cpu backtrace. > >>Therefore it is possible that trigger_all_cpu_backtrace() > >>could be called at the same time on different CPUs, which > >>triggers and 'unknown reason NMI' warning. The following case > >>illustrates the problem: > >> > >> CPU1 CPU2 ... CPU N > >> trigger_all_cpu_backtrace() > >> set "backtrace_mask" to cpu mask > >> | > >>generate NMI interrupts generate NMI interrupts ... > >> \ | / > >> \ | / > >> The "backtrace_mask" will be cleaned by the first NMI interrupt > >> at nmi_watchdog_tick(), then the following NMI interrupts generated > >>by other cpus's arch_trigger_all_cpu_backtrace() will be took as > >>unknown reason NMI interrupts. > >> > >>This patch uses a lock to avoid the problem, and stop the > >>arch_trigger_all_cpu_backtrace() calling to avoid dumping double cpu > >>backtrace info when there is already a trigger_all_cpu_backtrace() > >>in progress. > >> > >>Signed-off-by: Dongdong Deng > >>Reviewed-by: Bruce Ashfield > >>CC: Thomas Gleixner > >>CC: Ingo Molnar > >>CC: "H. Peter Anvin" > >>CC: x86@kernel.org > >>CC: linux-kernel@vger.kernel.org > >>Signed-off-by: Don Zickus > >>--- > >> arch/x86/kernel/apic/hw_nmi.c | 14 ++++++++++++++ > >> arch/x86/kernel/apic/nmi.c | 14 ++++++++++++++ > >> 2 files changed, 28 insertions(+), 0 deletions(-) > >> > >>diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c > >>index cefd694..3aea0a5 100644 > >>--- a/arch/x86/kernel/apic/hw_nmi.c > >>+++ b/arch/x86/kernel/apic/hw_nmi.c > >>@@ -29,6 +29,16 @@ u64 hw_nmi_get_sample_period(void) > >> void arch_trigger_all_cpu_backtrace(void) > >> { > >> int i; > >>+ static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED; > > > >Please dont put statics into the middle of local variables - put > >them into file scope in a visible way. > > Got it. will change it. > > > > >>+ unsigned long flags; > >>+ > >>+ local_irq_save(flags); > >>+ if (!arch_spin_trylock(&lock)) > >>+ /* > >>+ * If there is already a trigger_all_cpu_backtrace() > >>+ * in progress, don't output double cpu dump infos. > >>+ */ > >>+ goto out_restore_irq; > >> cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask); > >>@@ -41,6 +51,10 @@ void arch_trigger_all_cpu_backtrace(void) > >> break; > >> mdelay(1); > >> } > >>+ > >>+ arch_spin_unlock(&lock); > >>+out_restore_irq: > >>+ local_irq_restore(flags); > >> } > >> static int __kprobes > >>diff --git a/arch/x86/kernel/apic/nmi.c b/arch/x86/kernel/apic/nmi.c > >>index a43f71c..5fa8a13 100644 > >>--- a/arch/x86/kernel/apic/nmi.c > >>+++ b/arch/x86/kernel/apic/nmi.c > >>@@ -552,6 +552,16 @@ int do_nmi_callback(struct pt_regs *regs, int cpu) > >> void arch_trigger_all_cpu_backtrace(void) > >> { > >> int i; > >>+ static arch_spinlock_t lock = __ARCH_SPIN_LOCK_UNLOCKED; > >>+ unsigned long flags; > >>+ > >>+ local_irq_save(flags); > >>+ if (!arch_spin_trylock(&lock)) > >>+ /* > >>+ * If there is already a trigger_all_cpu_backtrace() > >>+ * in progress, don't output double cpu dump infos. > >>+ */ > >>+ goto out_restore_irq; > >> cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask); > >>@@ -564,4 +574,8 @@ void arch_trigger_all_cpu_backtrace(void) > >> break; > >> mdelay(1); > >> } > >>+ > >>+ arch_spin_unlock(&lock); > >>+out_restore_irq: > >>+ local_irq_restore(flags); > > > >This spinlock is never actually used as a spinlock - it's a "in > >progress" flag. Why not use a flag and test_and_set_bit()? > > Yep, it's a "in progress" flag, I will change to use a flag to > replace the spinlock. Thanks. > >Also, the irq disabling really needed, will this code ever be called with irqs on? > > This code could be called with irqs on. > > If the arch_trigger_all_cpu_backtrace() was triggered by > "spin_lock()"'s spin_lock_debug detector, it is possible that > the irq is enabled, thus we have to save and disable it here. Ok - please document that fact in the code. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/