2012-08-07 10:43:01

by Chen LinX

Subject: [PATCH] x86/smp: Fix cpuN startup panic

From: Lin Chen <[email protected]>

We hit a panic while running a CPU hotplug test:
<0>[ 627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but did not get a callout!
<0>[ 627.982864]
<4>[ 627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
<4>[ 627.982883] Call Trace:
<4>[ 627.982903] [<c18f2977>] panic+0x66/0x16c
<4>[ 627.982918] [<c12234cc>] ? default_get_apic_id+0x1c/0x40
<4>[ 627.982931] [<c18ef96d>] start_secondary+0xda/0x252

While the BSP is booting an AP, the BSP may be preempted before it finishes
the AP's STARTUP sequence (setting cpu_callout_mask), which leaves the AP
busy-waiting for it. At present, the AP waits for 2 seconds and then panics.

This patch lets the AP wait until the BSP finishes the startup sequence, and
gives a WARNING whenever the AP has waited another 2 seconds without a
callout (i.e. the BSP has been preempted for more than 2 seconds).

Signed-off-by: Yanmin Zhang <[email protected]>
Signed-off-by: Lin Chen <[email protected]>
---
arch/x86/kernel/smpboot.c | 11 ++++++-----
1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7c5a8c3..a9e3379 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
* Waiting 2s total for startup (udelay is not yet working)
*/
timeout = jiffies + 2*HZ;
- while (time_before(jiffies, timeout)) {
+ while (1) {
/*
* Has the boot CPU finished it's STARTUP sequence?
*/
if (cpumask_test_cpu(cpuid, cpu_callout_mask))
break;
cpu_relax();
+ if (!time_before(jiffies, timeout)) {
+ WARN(1, "%s: CPU%d started up but did not get a callout!\n",
+ __func__, cpuid);
+ timeout = jiffies + 2*HZ;
+ }
}

- if (!time_before(jiffies, timeout)) {
- panic("%s: CPU%d started up but did not get a callout!\n",
- __func__, cpuid);
- }

/*
* the boot CPU has finished the init stage and is spinning
--
1.7.1


2012-08-07 16:34:01

by Jiang Liu

Subject: Re: [PATCH] x86/smp: Fix cpuN startup panic

On 08/07/2012 05:50 PM, Chen, LinX Z wrote:
> From: Lin Chen <[email protected]>
>
> We hit a panic while running a CPU hotplug test:
> <0>[ 627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but did not get a callout!
> <0>[ 627.982864]
> <4>[ 627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
> <4>[ 627.982883] Call Trace:
> <4>[ 627.982903] [<c18f2977>] panic+0x66/0x16c
> <4>[ 627.982918] [<c12234cc>] ? default_get_apic_id+0x1c/0x40
> <4>[ 627.982931] [<c18ef96d>] start_secondary+0xda/0x252
>
> While the BSP is booting an AP, the BSP may be preempted before it finishes
> the AP's STARTUP sequence (setting cpu_callout_mask), which leaves the AP
> busy-waiting for it. At present, the AP waits for 2 seconds and then panics.
>
> This patch lets the AP wait until the BSP finishes the startup sequence, and
> gives a WARNING whenever the AP has waited another 2 seconds without a
> callout (i.e. the BSP has been preempted for more than 2 seconds).
>
> Signed-off-by: Yanmin Zhang <[email protected]>
> Signed-off-by: Lin Chen <[email protected]>
> ---
> arch/x86/kernel/smpboot.c | 11 ++++++-----
> 1 files changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 7c5a8c3..a9e3379 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
> * Waiting 2s total for startup (udelay is not yet working)
> */
> timeout = jiffies + 2*HZ;
> - while (time_before(jiffies, timeout)) {
> + while (1) {
Hi Yanmin,

This seems a little risky. What if a slave CPU can't be booted due to hardware errors?
Regards!
Gerry

> /*
> * Has the boot CPU finished it's STARTUP sequence?
> */
> if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> break;
> cpu_relax();
> + if (!time_before(jiffies, timeout)) {
> + WARN(1, "%s: CPU%d started up but did not get a callout!\n",
> + __func__, cpuid);
> + timeout = jiffies + 2*HZ;
> + }
> }
>
> - if (!time_before(jiffies, timeout)) {
> - panic("%s: CPU%d started up but did not get a callout!\n",
> - __func__, cpuid);
> - }
>
> /*
> * the boot CPU has finished the init stage and is spinning

2012-08-07 23:18:43

by Yanmin Zhang

Subject: Re: [PATCH] x86/smp: Fix cpuN startup panic

On Wed, 2012-08-08 at 00:33 +0800, Jiang Liu wrote:
> On 08/07/2012 05:50 PM, Chen, LinX Z wrote:
> > From: Lin Chen <[email protected]>
> >
> > We hit a panic while running a CPU hotplug test:
> > <0>[ 627.982857] Kernel panic - not syncing: smp_callin: CPU1 started up but did not get a callout!
> > <0>[ 627.982864]
> > <4>[ 627.982876] Pid: 0, comm: kworker/0:1 Tainted: G ...
> > <4>[ 627.982883] Call Trace:
> > <4>[ 627.982903] [<c18f2977>] panic+0x66/0x16c
> > <4>[ 627.982918] [<c12234cc>] ? default_get_apic_id+0x1c/0x40
> > <4>[ 627.982931] [<c18ef96d>] start_secondary+0xda/0x252
> >
> > While the BSP is booting an AP, the BSP may be preempted before it finishes
> > the AP's STARTUP sequence (setting cpu_callout_mask), which leaves the AP
> > busy-waiting for it. At present, the AP waits for 2 seconds and then panics.
> >
> > This patch lets the AP wait until the BSP finishes the startup sequence, and
> > gives a WARNING whenever the AP has waited another 2 seconds without a
> > callout (i.e. the BSP has been preempted for more than 2 seconds).
> >
> > Signed-off-by: Yanmin Zhang <[email protected]>
> > Signed-off-by: Lin Chen <[email protected]>
> > ---
> > arch/x86/kernel/smpboot.c | 11 ++++++-----
> > 1 files changed, 6 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 7c5a8c3..a9e3379 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -165,19 +165,20 @@ static void __cpuinit smp_callin(void)
> > * Waiting 2s total for startup (udelay is not yet working)
> > */
> > timeout = jiffies + 2*HZ;
> > - while (time_before(jiffies, timeout)) {
> > + while (1) {
> Hi Yanmin,
>
> This seems a little risky. What if a slave CPU can't be booted due to hardware errors?
The slave CPU (the AP) is the one that runs this loop. Basically, there is a
handshake between the BSP and the AP. The patch doesn't change the BSP code,
so when a slave CPU fails, the BSP still goes ahead and the kernel still works.
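
To make the AP side of that handshake concrete, here is a rough user-space
model of the loop as it looks after the patch. It is only a sketch, not kernel
code: struct sim, callout_set() and wait_for_callout() are hypothetical names,
a plain counter stands in for jiffies (wraparound is ignored), and a threshold
on that counter stands in for the BSP setting the bit in cpu_callout_mask.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state, for illustration only. */
struct sim {
	unsigned long now;        /* fake jiffies */
	unsigned long callout_at; /* tick at which the "BSP" sets the callout bit */
};

/* Models cpumask_test_cpu(cpuid, cpu_callout_mask). */
static bool callout_set(const struct sim *s)
{
	return s->now >= s->callout_at;
}

/*
 * Models the patched wait loop: never give up, but warn each time
 * another 'period' ticks pass without a callout, then re-arm the timeout.
 * Returns the number of warnings emitted.
 */
static unsigned int wait_for_callout(struct sim *s, unsigned long period)
{
	unsigned long timeout = s->now + period;
	unsigned int warnings = 0;

	for (;;) {
		if (callout_set(s))
			break;
		s->now++; /* time passes while the AP spins (cpu_relax()) */
		if (s->now >= timeout) { /* i.e. !time_before(jiffies, timeout) */
			warnings++;
			fprintf(stderr, "warning %u: no callout yet at tick %lu\n",
				warnings, s->now);
			timeout = s->now + period;
		}
	}
	return warnings;
}
```

With a period of 2 ticks and a callout that arrives at tick 5, the model warns
at ticks 2 and 4 and then leaves the loop normally, where the old code would
have panicked at tick 2. Only the waiting side is modelled here; the BSP side
is untouched by the patch.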

> Regards!
> Gerry
>
> > /*
> > * Has the boot CPU finished it's STARTUP sequence?
> > */
> > if (cpumask_test_cpu(cpuid, cpu_callout_mask))
> > break;
> > cpu_relax();
> > + if (!time_before(jiffies, timeout)) {
> > + WARN(1, "%s: CPU%d started up but did not get a callout!\n",
> > + __func__, cpuid);
> > + timeout = jiffies + 2*HZ;
> > + }
> > }
> >
> > - if (!time_before(jiffies, timeout)) {
> > - panic("%s: CPU%d started up but did not get a callout!\n",
> > - __func__, cpuid);
> > - }
> >
> > /*
> > * the boot CPU has finished the init stage and is spinning
>