From: Yanmin Zhang
Reply-To: yanmin_zhang@linux.intel.com
Organization: Intel.
To: Jiang Liu
Cc: "Chen, LinX Z", linux-kernel@vger.kernel.org, mingo@redhat.com, tglx@linutronix.de, hpa@zytor.com
Date: Wed, 08 Aug 2012 07:20:08 +0800
Subject: Re: [PATCH] x86/smp: Fix cpuN startup panic
Message-ID: <1344381608.10460.3.camel@ymzhang.sh.intel.com>
In-Reply-To: <5021436D.4040205@gmail.com>

On Wed, 2012-08-08 at 00:33 +0800, Jiang Liu wrote:
> On 08/07/2012 05:50 PM, Chen, LinX Z wrote:
> > From: Lin Chen
> >
> > We hit a panic while doing a CPU hotplug test.
> > <0>[ 627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but did not get a callout!
> > <0>[ 627.982864]
> > <4>[ 627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
> > <4>[ 627.982883] Call Trace:
> > <4>[ 627.982903]  [] panic+0x66/0x16c
> > <4>[ 627.982918]  [] ? default_get_apic_id+0x1c/0x40
> > <4>[ 627.982931]  [] start_secondary+0xda/0x252
> >
> > While the BSP is booting an AP, the BSP may be preempted before it
> > finishes the AP's STARTUP sequence (setting cpu_callout_mask), which
> > leaves the AP busy-waiting for it. At present, the AP waits 2 seconds
> > and then panics.
> >
> > This patch lets the AP wait until the BSP finishes the startup sequence,
> > and gives a WARNING each time the BSP is preempted for more than 2 seconds.
> >
> > Signed-off-by: Yanmin Zhang
> > Signed-off-by: Lin Chen
> > ---
> >  arch/x86/kernel/smpboot.c |   11 ++++++-----
> >  1 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 7c5a8c3..a9e3379 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
> >  	 * Waiting 2s total for startup (udelay is not yet working)
> >  	 */
> >  	timeout = jiffies + 2*HZ;
> > -	while (time_before(jiffies, timeout)) {
> > +	while (1) {
> Hi Yanmin,
>
> This seems a little risky: what if a slave CPU can't be booted due to
> hardware errors?

The slave CPU runs this loop. Basically, there is a handshake between the
BSP and the AP, and the patch doesn't change the BSP code. So if a slave
CPU fails, the BSP still goes ahead and the kernel keeps working.

> Regards!
> Gerry
>
> >  		/*
> >  		 * Has the boot CPU finished it's STARTUP sequence?
> >  		 */
> >  		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> >  			break;
> >  		cpu_relax();
> > +		if (!time_before(jiffies, timeout)) {
> > +			WARN(1, "%s: CPU%d started up but did not get a callout!\n",
> > +				__func__, cpuid);
> > +			timeout = jiffies + 2*HZ;
> > +		}
> >  	}
> >  
> > -	if (!time_before(jiffies, timeout)) {
> > -		panic("%s: CPU%d started up but did not get a callout!\n",
> > -		      __func__, cpuid);
> > -	}
> >  
> >  	/*
> >  	 * the boot CPU has finished the init stage and is spinning
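The behaviour change under discussion can be illustrated outside the kernel. What follows is a minimal user-space sketch in C11 with POSIX threads, not kernel code. It assumes a single atomic flag as a stand-in for the AP's bit in cpu_callout_mask, clock_gettime(CLOCK_MONOTONIC) in place of jiffies/time_before(), sched_yield() in place of cpu_relax(), and a sleeping thread in place of a preempted BSP. The names callout_flag, ap_wait_for_callout(), bsp_thread() and simulate_callin() are hypothetical. The sketch shows the patched AP loop printing a warning and re-arming its timeout each time it expires, instead of panicking on the first expiry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for this AP's bit in cpu_callout_mask. */
static atomic_bool callout_flag = false;

/* Milliseconds from a monotonic clock (user-space stand-in for jiffies). */
static long long monotonic_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * AP side, modelled on the patched smp_callin() loop: spin until the
 * callout flag is set; each time timeout_ms elapses, print a warning and
 * re-arm the deadline instead of panicking. Returns the warning count.
 */
static int ap_wait_for_callout(long long timeout_ms)
{
	int warnings = 0;
	long long deadline = monotonic_ms() + timeout_ms;

	while (1) {
		if (atomic_load(&callout_flag))
			break;
		sched_yield();	/* rough user-space stand-in for cpu_relax() */
		if (monotonic_ms() >= deadline) {
			fprintf(stderr, "WARNING: AP started up but did not get a callout!\n");
			warnings++;
			deadline = monotonic_ms() + timeout_ms;
		}
	}
	return warnings;
}

/* BSP side: sleep to simulate being preempted, then publish the callout. */
static void *bsp_thread(void *arg)
{
	long long delay_ms = *(const long long *)arg;
	struct timespec ts;

	ts.tv_sec = delay_ms / 1000;
	ts.tv_nsec = (delay_ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
	atomic_store(&callout_flag, true);
	return NULL;
}

/*
 * Run one simulated BSP/AP handshake. Returns the number of warnings
 * the AP printed, or -1 if the BSP thread could not be created.
 */
int simulate_callin(long long preempt_ms, long long timeout_ms)
{
	pthread_t bsp;
	int warnings;

	assert(preempt_ms >= 0 && timeout_ms > 0);
	atomic_store(&callout_flag, false);
	if (pthread_create(&bsp, NULL, bsp_thread, &preempt_ms) != 0)
		return -1;
	warnings = ap_wait_for_callout(timeout_ms);
	pthread_join(bsp, NULL);
	return warnings;
}
```

For example, simulate_callin(500, 100) models a BSP "preempted" for longer than the AP's timeout: the AP should print at least one warning and then proceed once the flag is set, where the pre-patch loop would have panicked. simulate_callin(0, 1000) normally returns 0. Only the AP side is modelled; as noted in the reply above, the BSP side of the handshake is unchanged by the patch.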