2011-06-20 09:24:46

by Santosh Shilimkar

Subject: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

The current ARM CPU hotplug code suffers from a couple of race conditions
between the CPU online path and the scheduler.
The ARM CPU hotplug code doesn't wait for the hot-plugged CPU to be marked
active (as part of cpu_notify(), run by the CPU which brought it up) before
enabling interrupts.

So we end up with a couple of race conditions:
1) Interrupts are enabled even before the CPU is marked as active.
2) The newly plugged CPU is marked as active but is not marked online yet.

When an interrupt happens before the cpu_active bit is set, the scheduler
can't schedule the woken thread which is bound to that newly onlined cpu,
and selects a fallback runqueue.

Secondly, marking a CPU active before it is online is also not desirable
behaviour.

Fix these race conditions.

Signed-off-by: Santosh Shilimkar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Russell King <[email protected]>
---
On the v3.0 kernel I started seeing lock-ups and random crashes when
CPU online/offline was attempted aggressively. With git bisect I could
reach Peter's commit e4a52bcb9a18142d79e231b6733cabdbf2e67c1f [sched:
Remove rq->lock from the first half of ttwu()], which was also reported
by Marc Zyngier. But even after using the follow-up fix from Peter,
d6aa8f85f163 [sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW],
I was still seeing issues in the hotplug path. So as an experiment I
pushed the interrupt enabling on the newly plugged CPU down to after it
is marked as online. This made things better and much more stable, but
occasionally I was still seeing lock-ups.

With the above as background I looked at the arch/x86/ code and convinced
myself that the experimental hack could be the right fix. While doing
this I came across a commit from Thomas, fd8a7de177b [x86: cpu-hotplug:
Prevent softirq wakeup on wrong CPU], which fixed race 2) on the x86
architecture.

In this patch I have folded possible fixes for both race conditions in the
ARM hotplug code, as mentioned in the changelog. Hopefully I am not
introducing any new race with this patch; hence the RFC.

arch/arm/kernel/smp.c | 18 +++++++++++-------
1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 344e52b..84373a9 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -302,13 +302,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
platform_secondary_init(cpu);

/*
- * Enable local interrupts.
- */
- notify_cpu_starting(cpu);
- local_irq_enable();
- local_fiq_enable();
-
- /*
* Setup the percpu timer for this CPU.
*/
percpu_timer_setup();
@@ -322,6 +315,17 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
*/
set_cpu_online(cpu, true);

+
+ while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
+ cpu_relax();
+
+ /*
+ * Enable local interrupts.
+ */
+ notify_cpu_starting(cpu);
+ local_irq_enable();
+ local_fiq_enable();
+
/*
* OK, it's off to the idle thread for us
*/
--
1.6.0.4


2011-06-20 09:51:10

by Russell King - ARM Linux

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
> The current ARM CPU hotplug code suffers from couple of race conditions
> in CPU online path with scheduler.
> The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked
> active as part of cpu_notify() by the CPU which brought it up before
> enabling interrupts.

Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
and add the wait after the set_cpu_online() ?

2011-06-20 10:15:00

by Russell King - ARM Linux

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
> > The current ARM CPU hotplug code suffers from couple of race conditions
> > in CPU online path with scheduler.
> > The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked
> > active as part of cpu_notify() by the CPU which brought it up before
> > enabling interrupts.
>
> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
> and add the wait after the set_cpu_online() ?

Actually, the race is caused by the CPU being marked online (and therefore
available for the scheduler) but not yet active (the CPU asking this one
to boot hasn't run the online notifiers yet.)

This, I feel, is a fault of generic code. If the CPU is not ready to have
processes scheduled on it (because migration is not initialized) then we
shouldn't be scheduling processes on the new CPU yet.

In any case, this should close the window by ensuring that we don't receive
an interrupt in the online-but-not-active case. Can you please test?

arch/arm/kernel/smp.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 344e52b..e34d750 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -318,9 +318,15 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
smp_store_cpu_info(cpu);

/*
- * OK, now it's safe to let the boot CPU continue
+ * OK, now it's safe to let the boot CPU continue. Wait for
+ * the CPU migration code to notice that the CPU is online
+ * before we continue.
*/
+ local_irq_disable();
set_cpu_online(cpu, true);
+ while (!cpumask_test_cpu(cpu, cpu_active_mask))
+ cpu_relax();
+ local_irq_enable();

/*
* OK, it's off to the idle thread for us

2011-06-20 10:20:07

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 3:20 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>> The current ARM CPU hotplug code suffers from couple of race conditions
>> in CPU online path with scheduler.
>> The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked
>> active as part of cpu_notify() by the CPU which brought it up before
>> enabling interrupts.
>
> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
> and add the wait after the set_cpu_online() ?
I think that's what the patch is doing. Do you mean calling the hotplug
notifier chain immediately after the CPU is marked as online? Something
like below:

	set_cpu_online(cpu, true);
	notify_cpu_starting(cpu);
	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
		cpu_relax();

	/*
	 * Enable local interrupts.
	 */
	local_irq_enable();
	local_fiq_enable();


Regards
Santosh

2011-06-20 10:28:13

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>> The current ARM CPU hotplug code suffers from couple of race conditions
>>> in CPU online path with scheduler.
>>> The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked
>>> active as part of cpu_notify() by the CPU which brought it up before
>>> enabling interrupts.
>>
>> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
>> and add the wait after the set_cpu_online() ?
>
> Actually, the race is caused by the CPU being marked online (and therefore
> available for the scheduler) but not yet active (the CPU asking this one
> to boot hasn't run the online notifiers yet.)
>
The scheduler uses the active mask, not the online mask. For the scheduler,
a CPU is ready for migration as soon as it is marked as active, and that's
the reason interrupts should never be enabled before the CPU is marked
as active in the online path.

> This, I feel, is a fault of generic code. If the CPU is not ready to have
> processes scheduled on it (because migration is not initialized) then we
> shouldn't be scheduling processes on the new CPU yet.
>
> In any case, this should close the window by ensuring that we don't receive
> an interrupt in the online-but-not-active case. Can you please test?
>
No, it doesn't work. I still get the crash. The important point
here is not to enable interrupts before the CPU is marked
as online and active.

Regards
Santosh

2011-06-20 10:35:43

by Russell King - ARM Linux

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
>>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>>> The current ARM CPU hotplug code suffers from couple of race conditions
>>>> in CPU online path with scheduler.
>>>> The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked
>>>> active as part of cpu_notify() by the CPU which brought it up before
>>>> enabling interrupts.
>>>
>>> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
>>> and add the wait after the set_cpu_online() ?
>>
>> Actually, the race is caused by the CPU being marked online (and therefore
>> available for the scheduler) but not yet active (the CPU asking this one
>> to boot hasn't run the online notifiers yet.)
>>
> Scheduler uses the active mask and not online mask. For schedules CPU
> is ready for migration as soon as it is marked as active and that's
> the reason, interrupts should never be enabled before CPU is marked
> as active in online path.
>
>> This, I feel, is a fault of generic code. If the CPU is not ready to have
>> processes scheduled on it (because migration is not initialized) then we
>> shouldn't be scheduling processes on the new CPU yet.
>>
>> In any case, this should close the window by ensuring that we don't receive
>> an interrupt in the online-but-not-active case. Can you please test?
>>
> No it doesn't work. I still get the crash. The important point
> here is not to enable interrupts before CPU is marked
> as online and active.

But we can't do that.

2011-06-20 10:44:34

by Russell King - ARM Linux

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
> No it doesn't work. I still get the crash. The important point
> here is not to enable interrupts before CPU is marked
> as online and active.

What is the crash (in full please)?

Do we know what interrupt is causing it?

2011-06-20 10:45:45

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 4:05 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
>>> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux wrote:
>>>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>>>> The current ARM CPU hotplug code suffers from couple of race conditions
>>>>> in CPU online path with scheduler.
>>>>> The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be marked
>>>>> active as part of cpu_notify() by the CPU which brought it up before
>>>>> enabling interrupts.
>>>>
>>>> Hmm, why not just move the set_cpu_online() call before notify_cpu_starting()
>>>> and add the wait after the set_cpu_online() ?
>>>
>>> Actually, the race is caused by the CPU being marked online (and therefore
>>> available for the scheduler) but not yet active (the CPU asking this one
>>> to boot hasn't run the online notifiers yet.)
>>>
>> Scheduler uses the active mask and not online mask. For schedules CPU
>> is ready for migration as soon as it is marked as active and that's
>> the reason, interrupts should never be enabled before CPU is marked
>> as active in online path.
>>
>>> This, I feel, is a fault of generic code. If the CPU is not ready to have
>>> processes scheduled on it (because migration is not initialized) then we
>>> shouldn't be scheduling processes on the new CPU yet.
>>>
>>> In any case, this should close the window by ensuring that we don't receive
>>> an interrupt in the online-but-not-active case. Can you please test?
>>>
>> No it doesn't work. I still get the crash. The important point
>> here is not to enable interrupts before CPU is marked
>> as online and active.
>
> But we can't do that.
Why is that?
Is it because of calibration, or because the hotplug starting notifiers
need to be called with interrupts enabled?

Regards
Santosh

2011-06-20 10:48:08

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 4:14 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
>> No it doesn't work. I still get the crash. The important point
>> here is not to enable interrupts before CPU is marked
>> as online and active.
>
> What is the crash (in full please)?
>
> Do we know what interrupt is causing it?
Yes. It's because of an interrupt and the CPU active-online
race.

Here is the crash log:
[ 21.025451] CPU1: Booted secondary processor
[ 21.025451] CPU1: Unknown IPI message 0x1
[ 21.029113] Switched to NOHz mode on CPU #1
[ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
[ 21.029235] [<c0064704>] (unwind_backtrace+0x0/0xf4) from [<c028edc8>] (do_raw_spin_lock+0xd0/0x164)
[ 21.029266] [<c028edc8>] (do_raw_spin_lock+0xd0/0x164) from [<c00cc3c4>] (tick_do_update_jiffies64+0x3c/0x118)
[ 21.029296] [<c00cc3c4>] (tick_do_update_jiffies64+0x3c/0x118) from [<c00ccb04>] (tick_check_idle+0xb0/0x110)
[ 21.029327] [<c00ccb04>] (tick_check_idle+0xb0/0x110) from [<c00a29cc>] (irq_enter+0x68/0x70)
[ 21.029327] [<c00a29cc>] (irq_enter+0x68/0x70) from [<c00623c4>] (ipi_timer+0x24/0x40)
[ 21.029357] [<c00623c4>] (ipi_timer+0x24/0x40) from [<c0051368>] (do_local_timer+0x54/0x70)
[ 21.029388] [<c0051368>] (do_local_timer+0x54/0x70) from [<c048a09c>] (__irq_svc+0x3c/0x120)
[ 21.029388] Exception stack(0xef87bf78 to 0xef87bfc0)
[ 21.029388] bf60:                                     00000000 00026ec0
[ 21.029418] bf80: c0622080 ffff7483 c0622080 ffff7483 ef87a000 00000000 c0622080 411fc092
[ 21.029418] bfa0: c063a4f0 00000000 00000001 ef87bfc0 c0482e08 c0482b0c 60000113 ffffffff
[ 21.029449] [<c048a09c>] (__irq_svc+0x3c/0x120) from [<c0482b0c>] (calibrate_delay+0x8c/0x1d4)
[ 21.029479] [<c0482b0c>] (calibrate_delay+0x8c/0x1d4) from [<c0482e08>] (secondary_start_kernel+0x110/0x1ac)
[ 21.029510] [<c0482e08>] (secondary_start_kernel+0x110/0x1ac) from [<c0070ee4>] (platform_cpu_die+0x34/0x54)
[ 22.021362] CPU1: failed to come online
[ 23.997955] CPU1: failed to come online
[ 25.000122] BUG: spinlock lockup on CPU#0, kthreadd/663, efa27e64
[ 25.006408] [<c0064704>] (unwind_backtrace+0x0/0xf4) from [<c028edc8>] (do_raw_spin_lock+0xd0/0x164)
[ 25.015808] [<c028edc8>] (do_raw_spin_lock+0xd0/0x164) from [<c048985c>] (_raw_spin_lock_irqsave+0x4c/0x58)
[ 25.025848] [<c048985c>] (_raw_spin_lock_irqsave+0x4c/0x58) from [<c008ba24>] (complete+0x1c/0x5c)
[ 25.035095] [<c008ba24>] (complete+0x1c/0x5c) from [<c00baf78>] (kthread+0x68/0x90)
[ 25.042968] [<c00baf78>] (kthread+0x68/0x90) from [<c005dfdc>] (kernel_thread_exit+0x0/0x8)

2011-06-20 11:14:20

by Russell King - ARM Linux

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
> Yes. It's because of interrupt and the CPU active-online
> race.

I don't see that as a conclusion from this dump.

> Here is the chash log..
> [ 21.025451] CPU1: Booted secondary processor
> [ 21.025451] CPU1: Unknown IPI message 0x1
> [ 21.029113] Switched to NOHz mode on CPU #1
> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4

That's the xtime seqlock. We're trying to update the xtime from CPU1,
which is not yet online and not yet active. That's fine, we're just
spinning on the spinlock here, waiting for the other CPUs to release
it.

But what this is saying is that the other CPUs aren't releasing it.
The cpu hotplug code doesn't hold the seqlock either. So who else is
holding this lock, causing CPU1 to time out on it?

The other thing is that this is only supposed to trigger after about
one second:

	u64 loops = loops_per_jiffy * HZ;
	for (i = 0; i < loops; i++) {
		if (arch_spin_trylock(&lock->raw_lock))
			return;
		__delay(1);
	}

which from the timings you have at the beginning of your printk lines
is clearly not the case - it's more like 61us.

Are you running with those h/w timer delay patches?

2011-06-20 11:25:54

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>> Yes. It's because of interrupt and the CPU active-online
>> race.
>
> I don't see that as a conclusion from this dump.
>
>> Here is the chash log..
>> [ 21.025451] CPU1: Booted secondary processor
>> [ 21.025451] CPU1: Unknown IPI message 0x1
>> [ 21.029113] Switched to NOHz mode on CPU #1
>> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>
> That's the xtime seqlock. We're trying to update the xtime from CPU1,
> which is not yet online and not yet active. That's fine, we're just
> spinning on the spinlock here, waiting for the other CPUs to release
> it.
>
> But what this is saying is that the other CPUs aren't releasing it.
> The cpu hotplug code doesn't hold the seqlock either. So who else is
> holding this lock, causing CPU1 to time out on it.
>
> The other thing is that this is only supposed to trigger after about
> one second:
>
>	u64 loops = loops_per_jiffy * HZ;
>	for (i = 0; i < loops; i++) {
>		if (arch_spin_trylock(&lock->raw_lock))
>			return;
>		__delay(1);
>	}
>
> which from the timings you have at the beginning of your printk lines
> is clearly not the case - it's more like 61us.
>
> Are you running with those h/w timer delay patches?
Nope.

Regards
Santosh

2011-06-20 11:40:43

by Russell King - ARM Linux

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>> Yes. It's because of interrupt and the CPU active-online
>>> race.
>>
>> I don't see that as a conclusion from this dump.
>>
>>> Here is the chash log..
>>> [ 21.025451] CPU1: Booted secondary processor
>>> [ 21.025451] CPU1: Unknown IPI message 0x1
>>> [ 21.029113] Switched to NOHz mode on CPU #1
>>> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>
>> That's the xtime seqlock. We're trying to update the xtime from CPU1,
>> which is not yet online and not yet active. That's fine, we're just
>> spinning on the spinlock here, waiting for the other CPUs to release
>> it.
>>
>> But what this is saying is that the other CPUs aren't releasing it.
>> The cpu hotplug code doesn't hold the seqlock either. So who else is
>> holding this lock, causing CPU1 to time out on it.
>>
>> The other thing is that this is only supposed to trigger after about
>> one second:
>>
>>	u64 loops = loops_per_jiffy * HZ;
>>	for (i = 0; i < loops; i++) {
>>		if (arch_spin_trylock(&lock->raw_lock))
>>			return;
>>		__delay(1);
>>	}
>>
>> which from the timings you have at the beginning of your printk lines
>> is clearly not the case - it's more like 61us.
>>
>> Are you running with those h/w timer delay patches?
> Nope.

Ok. So loops_per_jiffy must be too small. My guess is you're using an
older kernel without 71c696b1 (calibrate: extract fall-back calculation
into own helper).

The delay calibration code used to start out by setting:

loops_per_jiffy = (1<<12);

This will shorten the delay right down, and that's probably causing these
false spinlock lockup bug dumps.

Arranging for IRQs to be disabled across the delay calibration just avoids
the issue by preventing any spinlock being taken.

The reason that CPU#0 also complains about spinlock lockup is that for
some reason CPU#1 never finishes its calibration, and so the loop also
times out early on CPU#0.

Of course, fiddling with this global variable in this way is _not_ a good
idea while other CPUs are running and using that variable.

We could also do with implementing trigger_all_cpu_backtrace() to get
backtraces from the other CPUs when spinlock lockup happens...

2011-06-20 11:42:15

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 4:15 PM, Santosh Shilimkar wrote:
> On 6/20/2011 4:05 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 03:58:03PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 3:44 PM, Russell King - ARM Linux wrote:
>>>> On Mon, Jun 20, 2011 at 10:50:53AM +0100, Russell King - ARM Linux
>>>> wrote:
>>>>> On Mon, Jun 20, 2011 at 02:53:59PM +0530, Santosh Shilimkar wrote:
>>>>>> The current ARM CPU hotplug code suffers from couple of race
>>>>>> conditions
>>>>>> in CPU online path with scheduler.
>>>>>> The ARM CPU hotplug code doesn't wait for hot-plugged CPU to be
>>>>>> marked
>>>>>> active as part of cpu_notify() by the CPU which brought it up before
>>>>>> enabling interrupts.
>>>>>
>>>>> Hmm, why not just move the set_cpu_online() call before
>>>>> notify_cpu_starting()
>>>>> and add the wait after the set_cpu_online() ?
>>>>
>>>> Actually, the race is caused by the CPU being marked online (and
>>>> therefore
>>>> available for the scheduler) but not yet active (the CPU asking this
>>>> one
>>>> to boot hasn't run the online notifiers yet.)
>>>>
>>> Scheduler uses the active mask and not online mask. For schedules CPU
>>> is ready for migration as soon as it is marked as active and that's
>>> the reason, interrupts should never be enabled before CPU is marked
>>> as active in online path.
>>>
>>>> This, I feel, is a fault of generic code. If the CPU is not ready to
>>>> have
>>>> processes scheduled on it (because migration is not initialized)
>>>> then we
>>>> shouldn't be scheduling processes on the new CPU yet.
>>>>
>>>> In any case, this should close the window by ensuring that we don't
>>>> receive
>>>> an interrupt in the online-but-not-active case. Can you please test?
>>>>
>>> No it doesn't work. I still get the crash. The important point
>>> here is not to enable interrupts before CPU is marked
>>> as online and active.
>>
>> But we can't do that.
> Why is that ?
> Is it because of calibration or the hotplug start notifies needs to
> be called with interrupts enabled ?
>
BTW, how is ARM different from x86 here? I mean, the x86 code seems to
do something similar to what my patch is trying to fix for ARM. Some
pointers would help me understand why we can't delay the interrupt-enable
part in the ARM hotplug code.

Regards
Santosh

2011-06-20 11:51:58

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>>> Yes. It's because of interrupt and the CPU active-online
>>>> race.
>>>
>>> I don't see that as a conclusion from this dump.
>>>
>>>> Here is the chash log..
>>>> [ 21.025451] CPU1: Booted secondary processor
>>>> [ 21.025451] CPU1: Unknown IPI message 0x1
>>>> [ 21.029113] Switched to NOHz mode on CPU #1
>>>> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>>
>>> That's the xtime seqlock. We're trying to update the xtime from CPU1,
>>> which is not yet online and not yet active. That's fine, we're just
>>> spinning on the spinlock here, waiting for the other CPUs to release
>>> it.
>>>
>>> But what this is saying is that the other CPUs aren't releasing it.
>>> The cpu hotplug code doesn't hold the seqlock either. So who else is
>>> holding this lock, causing CPU1 to time out on it.
>>>
>>> The other thing is that this is only supposed to trigger after about
>>> one second:
>>>
>>>	u64 loops = loops_per_jiffy * HZ;
>>>	for (i = 0; i < loops; i++) {
>>>		if (arch_spin_trylock(&lock->raw_lock))
>>>			return;
>>>		__delay(1);
>>>	}
>>>
>>> which from the timings you have at the beginning of your printk lines
>>> is clearly not the case - it's more like 61us.
>>>
>>> Are you running with those h/w timer delay patches?
>> Nope.
>
> Ok. So loops_per_jiffy must be too small. My guess is you're using an
> older kernel without 71c696b1 (calibrate: extract fall-back calculation
> into own helper).
>
I am on v3.0-rc3+ (latest mainline), and the above commit is already
part of it.

> The delay calibration code used to start out by setting:
>
> loops_per_jiffy = (1<<12);
>
> This will shorten the delay right down, and that's probably causing these
> false spinlock lockup bug dumps.
>
> Arranging for IRQs to be disabled across the delay calibration just avoids
> the issue by preventing any spinlock being taken.
>
> The reason that CPU#0 also complains about spinlock lockup is that for
> some reason CPU#1 never finishes its calibration, and so the loop also
> times out early on CPU#0.
>
I am not sure, but what I think is happening is that as soon as interrupts
start firing, as part of IRQ handling the scheduler tries to
enqueue the softirq thread for the newly booted CPU, since it sees that
it's active and ready. But that's failing, and both CPUs
eventually lock up. I may be wrong here.

> Of course, fiddling with this global variable in this way is _not_ a good
> idea while other CPUs are running and using that variable.
>
> We could also do with implementing trigger_all_cpu_backtrace() to get
> backtraces from the other CPUs when spinlock lockup happens...

Any pointers on the other question about "why we need to enable
interrupts before the CPU is ready?"

Regards
Santosh

2011-06-20 12:20:02

by Russell King - ARM Linux

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 04:55:43PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 4:43 PM, Russell King - ARM Linux wrote:
>>>> On Mon, Jun 20, 2011 at 04:17:58PM +0530, Santosh Shilimkar wrote:
>>>>> Yes. It's because of interrupt and the CPU active-online
>>>>> race.
>>>>
>>>> I don't see that as a conclusion from this dump.
>>>>
>>>>> Here is the chash log..
>>>>> [ 21.025451] CPU1: Booted secondary processor
>>>>> [ 21.025451] CPU1: Unknown IPI message 0x1
>>>>> [ 21.029113] Switched to NOHz mode on CPU #1
>>>>> [ 21.029174] BUG: spinlock lockup on CPU#1, swapper/0, c06220c4
>>>>
>>>> That's the xtime seqlock. We're trying to update the xtime from CPU1,
>>>> which is not yet online and not yet active. That's fine, we're just
>>>> spinning on the spinlock here, waiting for the other CPUs to release
>>>> it.
>>>>
>>>> But what this is saying is that the other CPUs aren't releasing it.
>>>> The cpu hotplug code doesn't hold the seqlock either. So who else is
>>>> holding this lock, causing CPU1 to time out on it.
>>>>
>>>> The other thing is that this is only supposed to trigger after about
>>>> one second:
>>>>
>>>>	u64 loops = loops_per_jiffy * HZ;
>>>>	for (i = 0; i < loops; i++) {
>>>>		if (arch_spin_trylock(&lock->raw_lock))
>>>>			return;
>>>>		__delay(1);
>>>>	}
>>>>
>>>> which from the timings you have at the beginning of your printk lines
>>>> is clearly not the case - it's more like 61us.
>>>>
>>>> Are you running with those h/w timer delay patches?
>>> Nope.
>>
>> Ok. So loops_per_jiffy must be too small. My guess is you're using an
>> older kernel without 71c696b1 (calibrate: extract fall-back calculation
>> into own helper).
>>
> I am on V3.0-rc3+(latest mainline) and the above commit is already
> part of it.
>
>> The delay calibration code used to start out by setting:
>>
>> loops_per_jiffy = (1<<12);
>>
>> This will shorten the delay right down, and that's probably causing these
>> false spinlock lockup bug dumps.
>>
>> Arranging for IRQs to be disabled across the delay calibration just avoids
>> the issue by preventing any spinlock being taken.
>>
>> The reason that CPU#0 also complains about spinlock lockup is that for
>> some reason CPU#1 never finishes its calibration, and so the loop also
>> times out early on CPU#0.
>>
> I am not sure but what I think is happening is as soon as interrupts
> start firing, as part of IRQ handling, scheduler will try to
> enqueue softIRQ thread for newly booted CPU since it sees that
> it's active and ready. But that's failing and both CPU's
> eventually lock-up. But I may be wrong here.

Even if that happens, there is NO WAY that the spinlock lockup detector
should report lockup in anything under 1s.

>> Of course, fiddling with this global variable in this way is _not_ a good
>> idea while other CPUs are running and using that variable.
>>
>> We could also do with implementing trigger_all_cpu_backtrace() to get
>> backtraces from the other CPUs when spinlock lockup happens...
>
> Any pointers on the other question about "why we need to enable
> interrupts before the CPU is ready?"

To ensure that things like the delay loop calibration and twd calibration
can run, though that looks like it'll run happily enough with the boot
CPU updating jiffies.

However, I'm still not taking your patch because I believe it's just
papering over the real issue, which is not as you describe.

You first need to work out why the spinlock lockup detection is firing
after just 61us rather than the full 1s and fix that.

You then need to work out whether you really do have spinlock lockup,
and if so, why. Implementing trigger_all_cpu_backtrace() may help to
find out what CPU#0 is doing, though we can only do that with IRQs on,
and so would be fragile.

We can test whether CPU#0 is going off to do something else while CPU#1
is being brought up, by adding a preempt_disable() / preempt_enable()
in __cpu_up() to prevent the wait-for-cpu#1-online being preempted by
other threads - I suspect you'll still see spinlock lockup on the
xtime seqlock on CPU#1 though. That would suggest a coherency issue.

Finally, how are you provoking this - and what kernel configuration are
you using?

2011-06-20 12:27:12

by Santosh Shilimkar

Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 5:49 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
>> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:

[...]

>>
>> Any pointers on the other question about "why we need to enable
>> interrupts before the CPU is ready?"
>
> To ensure that things like the delay loop calibration and twd calibration
> can run, though that looks like it'll run happily enough with the boot
> CPU updating jiffies.
>
I guessed it, and had the same point as above. Calibration will still
work.

> However, I'm still not taking your patch because I believe its just
> papering over the real issue, which is not as you describe.
>
> You first need to work out why the spinlock lockup detection is firing
> after just 61us rather than the full 1s and fix that.
>
This is possibly because of my script, which doesn't wait for 1
second.

> You then need to work out whether you really do have spinlock lockup,
> and if so, why. Implementing trigger_all_cpu_backtrace() may help to
> find out what CPU#0 is doing, though we can only do that with IRQs on,
> and so would be fragile.
>
> We can test whether CPU#0 is going off to do something else while CPU#1
> is being brought up, by adding a preempt_disable() / preempt_enable()
> in __cpu_up() to prevent the wait-for-cpu#1-online being preempted by
> other threads - I suspect you'll still see spinlock lockup on the
> xtime seqlock on CPU#1 though. That would suggest a coherency issue.
>
> Finally, how are you provoking this - and what kernel configuration are
> you using?
Latest mainline kernel with omap2plus_defconfig, and the simple script
below to trigger the failure.

-------------
while true
do
	echo 0 > /sys/devices/system/cpu/cpu1/online
	echo 1 > /sys/devices/system/cpu/cpu1/online
done


Regards
Santosh

2011-06-20 12:57:37

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 05:57:01PM +0530, Santosh Shilimkar wrote:
> On 6/20/2011 5:49 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 05:21:48PM +0530, Santosh Shilimkar wrote:
>>> On 6/20/2011 5:10 PM, Russell King - ARM Linux wrote:
>
> [...]
>
>>>
>>> Any pointers on the other question about "why we need to enable
>>> interrupts before the CPU is ready?"
>>
>> To ensure that things like the delay loop calibration and twd calibration
>> can run, though that looks like it'll run happily enough with the boot
>> CPU updating jiffies.
>>
> I guessed it and had same point as above. Calibration will still
> work.
>
>> However, I'm still not taking your patch because I believe its just
>> papering over the real issue, which is not as you describe.
>>
>> You first need to work out why the spinlock lockup detection is firing
>> after just 61us rather than the full 1s and fix that.
>>
> This is possibly because of my script which doesn't wait for 1
> second.

How could a userspace script affect the internal behaviour of
spin_lock() and the spinlock lockup detector?

> Latest mainline kernel with omap2plus_defconfig and below simple script
> to trigger the failure.
>
> -------------
> while true
> do
> echo 0 > /sys/devices/system/cpu/cpu1/online
> echo 1 > /sys/devices/system/cpu/cpu1/online
> done

Thanks, I'll give it a go here and see if I can debug it further.

2011-06-20 14:24:04

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
> Ok. So loops_per_jiffy must be too small. My guess is you're using an
> older kernel without 71c696b1 (calibrate: extract fall-back calculation
> into own helper).

Right, this commit above helps show the problem - and it's fairly subtle.

It's a race condition. Let's first look at the spinlock debugging code.
It does this:

static void __spin_lock_debug(raw_spinlock_t *lock)
{
	u64 i;
	u64 loops = loops_per_jiffy * HZ;

	for (;;) {
		for (i = 0; i < loops; i++) {
			if (arch_spin_trylock(&lock->raw_lock))
				return;
			__delay(1);
		}
		/* print warning */
	}
}

If loops_per_jiffy is zero, we never try to grab the spinlock, because
we never enter the inner for loop. We immediately print a warning,
and re-execute the outer loop forever, resulting in the CPU locking up
in this condition.

In theory, we should never see a zero loops_per_jiffy value, because it
represents the number of loops __delay() needs to delay by one jiffy and
clearly zero makes no sense.

However, calibrate_delay() does this (which x86 and ARM call on secondary
CPU startup):

calibrate_delay()
{
	...
	if (preset_lpj) {
	} else if ((!printed) && lpj_fine) {
	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
	} else {
		/* approximation/convergence stuff */
	}
}

Now, before 71c696b, this used to be:

	} else {
		loops_per_jiffy = (1<<12);

So the window between calibrate_delay_direct() returning zero (and
thereby setting loops_per_jiffy to zero) and the re-initialization of
loops_per_jiffy was relatively short (maybe the compiler even optimized
away the zero write.)

However, after 71c696b, this now does:

	} else {
		if (!printed)
			pr_info("Calibrating delay loop... ");
+		loops_per_jiffy = calibrate_delay_converge();

So, as loops_per_jiffy is not local to this function, the compiler has
to write out that zero value, before calling calibrate_delay_converge(),
and loops_per_jiffy only becomes non-zero _after_ calibrate_delay_converge()
has returned. This opens the window and allows the spinlock debugging
code to explode.

This patch closes the window completely, by writing to loops_per_jiffy
only when we have a real value for it.

This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
without this it fails with spinlock lockup and rcu problems.

init/calibrate.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/init/calibrate.c b/init/calibrate.c
index 2568d22..aae2f40 100644
--- a/init/calibrate.c
+++ b/init/calibrate.c
@@ -245,30 +245,32 @@ static unsigned long __cpuinit calibrate_delay_converge(void)
 
 void __cpuinit calibrate_delay(void)
 {
+	unsigned long lpj;
 	static bool printed;
 
 	if (preset_lpj) {
-		loops_per_jiffy = preset_lpj;
+		lpj = preset_lpj;
 		if (!printed)
 			pr_info("Calibrating delay loop (skipped) "
 				"preset value.. ");
 	} else if ((!printed) && lpj_fine) {
-		loops_per_jiffy = lpj_fine;
+		lpj = lpj_fine;
 		pr_info("Calibrating delay loop (skipped), "
 			"value calculated using timer frequency.. ");
-	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
+	} else if ((lpj = calibrate_delay_direct()) != 0) {
 		if (!printed)
 			pr_info("Calibrating delay using timer "
 				"specific routine.. ");
 	} else {
 		if (!printed)
 			pr_info("Calibrating delay loop... ");
-		loops_per_jiffy = calibrate_delay_converge();
+		lpj = calibrate_delay_converge();
 	}
 	if (!printed)
 		pr_cont("%lu.%02lu BogoMIPS (lpj=%lu)\n",
-			loops_per_jiffy/(500000/HZ),
-			(loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy);
+			lpj/(500000/HZ),
+			(lpj/(5000/HZ)) % 100, lpj);
 
+	loops_per_jiffy = lpj;
 	printed = true;
 }

2011-06-20 14:54:42

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
>> Ok. So loops_per_jiffy must be too small. My guess is you're using an
>> older kernel without 71c696b1 (calibrate: extract fall-back calculation
>> into own helper).
>
> Right, this commit above helps show the problem - and it's fairly subtle.
>
> It's a race condition. Let's first look at the spinlock debugging code.
> It does this:
>
> static void __spin_lock_debug(raw_spinlock_t *lock)
> {
> 	u64 i;
> 	u64 loops = loops_per_jiffy * HZ;
>
> 	for (;;) {
> 		for (i = 0; i < loops; i++) {
> 			if (arch_spin_trylock(&lock->raw_lock))
> 				return;
> 			__delay(1);
> 		}
> 		/* print warning */
> 	}
> }
>
> If loops_per_jiffy is zero, we never try to grab the spinlock, because
> we never enter the inner for loop. We immediately print a warning,
> and re-execute the outer loop for ever, resulting in the CPU locking up
> in this condition.
>
> In theory, we should never see a zero loops_per_jiffy value, because it
> represents the number of loops __delay() needs to delay by one jiffy and
> clearly zero makes no sense.
>
> However, calibrate_delay() does this (which x86 and ARM call on secondary
> CPU startup):
>
> calibrate_delay()
> {
> 	...
> 	if (preset_lpj) {
> 	} else if ((!printed) && lpj_fine) {
> 	} else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) {
> 	} else {
> 		/* approximation/convergence stuff */
> 	}
> }
>
> Now, before 71c696b, this used to be:
>
> 	} else {
> 		loops_per_jiffy = (1<<12);
>
> So the window between calibrate_delay_direct() returning and setting
> loops_per_jiffy to zero, and the re-initialization of loops_per_jiffy
> was relatively short (maybe even the compiler optimized away the zero
> write.)
>
> However, after 71c696b, this now does:
>
> 	} else {
> 		if (!printed)
> 			pr_info("Calibrating delay loop... ");
> +		loops_per_jiffy = calibrate_delay_converge();
>
> So, as loops_per_jiffy is not local to this function, the compiler has
> to write out that zero value, before calling calibrate_delay_converge(),
> and loops_per_jiffy only becomes non-zero _after_ calibrate_delay_converge()
> has returned. This opens the window and allows the spinlock debugging
> code to explode.
>
> This patch closes the window completely, by only writing to loops_per_jiffy
> only when we have a real value for it.
>
> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
> without this it fails with spinlock lockup and rcu problems.
>
> init/calibrate.c | 14 ++++++++------
> 1 files changed, 8 insertions(+), 6 deletions(-)
>
I am away from my board now. Will test this change.
By the way, the online-to-active race is still open even with this
patch, and should be fixed.

Regards
Santosh

2011-06-20 15:02:01

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Mon, Jun 20, 2011 at 08:24:33PM +0530, Santosh Shilimkar wrote:
> I am away from my board now. Will test this change.
> btw, the online-active race is still open even with this patch close
> and should be fixed.

I have yet to see any evidence of that race - I've been running your
test loop for about an hour so far on Versatile Express and nothing
yet.

That's not to say that we shouldn't wait for the active mask to become
true before calling schedule(), but I don't think its as big a deal as
you're suggesting it is.

2011-06-20 15:10:56

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/20/2011 8:31 PM, Russell King - ARM Linux wrote:
> On Mon, Jun 20, 2011 at 08:24:33PM +0530, Santosh Shilimkar wrote:
>> I am away from my board now. Will test this change.
>> btw, the online-active race is still open even with this patch close
>> and should be fixed.
>
> I have yet to see any evidence of that race - I've been running your
> test loop for about an hour so far on Versatile Express and nothing
> yet.
>
In that case my script was just exposing the calibrate() code race
condition.

> That's not to say that we shouldn't wait for the active mask to become
> true before calling schedule(), but I don't think its as big a deal as
> you're suggesting it is.
I am not expert enough to deliberately trigger that exact
online-to-active race, but I am sure it needs to be fixed.

Maybe Thomas can suggest a test case to expose that race. From
his fix for x86, it appears the race was indeed hit in some
sequence.

Regards,
Santosh

2011-06-21 09:08:45

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

Russell,

On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>> On Mon, Jun 20, 2011 at 12:40:19PM +0100, Russell King - ARM Linux wrote:
[....]

>> So, as loops_per_jiffy is not local to this function, the compiler has
>> to write out that zero value, before calling calibrate_delay_converge(),
>> and loops_per_jiffy only becomes non-zero _after_
>> calibrate_delay_converge()
>> has returned. This opens the window and allows the spinlock debugging
>> code to explode.
>>
>> This patch closes the window completely, by only writing to
>> loops_per_jiffy
>> only when we have a real value for it.
>>
>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>> without this it fails with spinlock lockup and rcu problems.
>>
>> init/calibrate.c | 14 ++++++++------
>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>
> I am away from my board now. Will test this change.
Have tested your change and it seems to fix the crash I
was observing. Are you planning to send this fix for rc5?

> btw, the online-active race is still open even with this patch close
> and should be fixed.
>
The only remaining problem is waiting for the active mask before
marking the CPU online. Shall I refresh my patch with only
this change, then?

Regards
Santosh

2011-06-21 10:01:24

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
> Russell,
>
> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>> to write out that zero value, before calling calibrate_delay_converge(),
>>> and loops_per_jiffy only becomes non-zero _after_
>>> calibrate_delay_converge()
>>> has returned. This opens the window and allows the spinlock debugging
>>> code to explode.
>>>
>>> This patch closes the window completely, by only writing to
>>> loops_per_jiffy
>>> only when we have a real value for it.
>>>
>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>> without this it fails with spinlock lockup and rcu problems.
>>>
>>> init/calibrate.c | 14 ++++++++------
>>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>>
>> I am away from my board now. Will test this change.
> Have tested your change and it seems to fix the crash I
> was observing. Are you planning to send this fix for rc5?

Yes. I think sending CPUs into infinite loops in the spinlock code is
definitely sufficiently serious that it needs to go to Linus ASAP.
It'd be nice to have a tested-by line though.

>> btw, the online-active race is still open even with this patch close
>> and should be fixed.
>>
> The only problem remains is waiting for active mask before
> marking CPU online. Shall I refresh my patch with only
> this change then ?

I already have that as a separate change.

2011-06-21 10:17:12

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/21/2011 3:30 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
>> Russell,
>>
>> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>>> to write out that zero value, before calling calibrate_delay_converge(),
>>>> and loops_per_jiffy only becomes non-zero _after_
>>>> calibrate_delay_converge()
>>>> has returned. This opens the window and allows the spinlock debugging
>>>> code to explode.
>>>>
>>>> This patch closes the window completely, by only writing to
>>>> loops_per_jiffy
>>>> only when we have a real value for it.
>>>>
>>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>>> without this it fails with spinlock lockup and rcu problems.
>>>>
>>>> init/calibrate.c | 14 ++++++++------
>>>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>>>
>>> I am away from my board now. Will test this change.
>> Have tested your change and it seems to fix the crash I
>> was observing. Are you planning to send this fix for rc5?
>
> Yes. I think sending CPUs into infinite loops in the spinlock code is
> definitely sufficiently serious that it needs to go to Linus ASAP.
> It'd be nice to have a tested-by line though.
>
Sure.

>>> btw, the online-active race is still open even with this patch close
>>> and should be fixed.
>>>
>> The only problem remains is waiting for active mask before
>> marking CPU online. Shall I refresh my patch with only
>> this change then ?
>
> I already have that as a separate change.
Can you point me to both of these commits so that I can have
them in my tree for testing?

Thanks for help.

Regards
Santosh

2011-06-21 10:19:23

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:
> On 6/21/2011 3:30 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 21, 2011 at 02:38:34PM +0530, Santosh Shilimkar wrote:
>>> Russell,
>>>
>>> On 6/20/2011 8:24 PM, Santosh Shilimkar wrote:
>>>> On 6/20/2011 7:53 PM, Russell King - ARM Linux wrote:
>>>>> So, as loops_per_jiffy is not local to this function, the compiler has
>>>>> to write out that zero value, before calling calibrate_delay_converge(),
>>>>> and loops_per_jiffy only becomes non-zero _after_
>>>>> calibrate_delay_converge()
>>>>> has returned. This opens the window and allows the spinlock debugging
>>>>> code to explode.
>>>>>
>>>>> This patch closes the window completely, by only writing to
>>>>> loops_per_jiffy
>>>>> only when we have a real value for it.
>>>>>
>>>>> This allows me to boot 3.0.0-rc3 on Versatile Express (4 CPU) whereas
>>>>> without this it fails with spinlock lockup and rcu problems.
>>>>>
>>>>> init/calibrate.c | 14 ++++++++------
>>>>> 1 files changed, 8 insertions(+), 6 deletions(-)
>>>>>
>>>> I am away from my board now. Will test this change.
>>> Have tested your change and it seems to fix the crash I
>>> was observing. Are you planning to send this fix for rc5?
>>
>> Yes. I think sending CPUs into infinite loops in the spinlock code is
>> definitely sufficiently serious that it needs to go to Linus ASAP.
>> It'd be nice to have a tested-by line though.
>>
> Sure.
>
>>>> btw, the online-active race is still open even with this patch close
>>>> and should be fixed.
>>>>
>>> The only problem remains is waiting for active mask before
>>> marking CPU online. Shall I refresh my patch with only
>>> this change then ?
>>
>> I already have that as a separate change.
> Can you point me to both of these commits so that I have
> them in my tree for testing.

I won't be committing the init/calibrate.c change to a git tree - it
isn't ARM stuff so it goes in patch form.

2011-06-21 10:21:11

by Santosh Shilimkar

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:

[...]

>>> I already have that as a separate change.
>> Can you point me to both of these commits so that I have
>> them in my tree for testing.
>
> I won't be committing the init/calibrate.c change to a git tree - it
> isn't ARM stuff so it goes in patch form.

Patches with change log would be fine as well.

Regards
Santosh

2011-06-21 10:26:19

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>> On Tue, Jun 21, 2011 at 03:47:04PM +0530, Santosh Shilimkar wrote:
>
> [...]
>
>>>> I already have that as a separate change.
>>> Can you point me to both of these commits so that I have
>>> them in my tree for testing.
>>
>> I won't be committing the init/calibrate.c change to a git tree - it
>> isn't ARM stuff so it goes in patch form.
>
> Patches with change log would be fine as well.

The answer is not at the moment, but maybe soon.

2011-06-21 20:16:52

by Stephen Boyd

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>>> I won't be committing the init/calibrate.c change to a git tree - it
>>> isn't ARM stuff so it goes in patch form.
>> Patches with change log would be fine as well.
> The answer is not at the moment, but maybe soon.

Should we send those two patches to the stable trees as well? They seem
to fix issues with cpu onlining that have existed for a long time.

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2011-06-21 23:11:05

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> > On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> >> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> >>> I won't be committing the init/calibrate.c change to a git tree - it
> >>> isn't ARM stuff so it goes in patch form.
> >> Patches with change log would be fine as well.
> > The answer is not at the moment, but maybe soon.
>
> Should we send those two patches to the stable trees as well? They seem
> to fix issues with cpu onlining that have existed for a long time.

Looks to me like the problem was introduced for 2.6.39-rc1, so we
should probably get the fix into the 2.6.39-stable tree too.

2011-06-22 00:07:01

by Stephen Boyd

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On 06/21/2011 04:10 PM, Russell King - ARM Linux wrote:
> On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
>> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
>>> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
>>>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
>>>>> I won't be committing the init/calibrate.c change to a git tree - it
>>>>> isn't ARM stuff so it goes in patch form.
>>>> Patches with change log would be fine as well.
>>> The answer is not at the moment, but maybe soon.
>> Should we send those two patches to the stable trees as well? They seem
>> to fix issues with cpu onlining that have existed for a long time.
> Looks to me like the problem was introduced for 2.6.39-rc1, so we
> should probably get the fix into the 2.6.39-stable tree too.

Are we talking about the loops_per_jiffy problem or the cpu_active
problem? I would think the cpu_active problem has been there since SMP
support was added to ARM and the loops_per_jiffy problem has been there
(depending on the compiler) since 8a9e1b0 ([PATCH] Platform SMIs and
their interferance with tsc based delay calibration, 2005-06-23).

So pretty much every stable tree would want both of these patches.

--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2011-06-22 10:07:11

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [RFC PATCH] ARM: smp: Fix the CPU hotplug race with scheduler.

On Tue, Jun 21, 2011 at 05:06:58PM -0700, Stephen Boyd wrote:
> On 06/21/2011 04:10 PM, Russell King - ARM Linux wrote:
> > On Tue, Jun 21, 2011 at 01:16:47PM -0700, Stephen Boyd wrote:
> >> On 06/21/2011 03:26 AM, Russell King - ARM Linux wrote:
> >>> On Tue, Jun 21, 2011 at 03:51:00PM +0530, Santosh Shilimkar wrote:
> >>>> On 6/21/2011 3:49 PM, Russell King - ARM Linux wrote:
> >>>>> I won't be committing the init/calibrate.c change to a git tree - it
> >>>>> isn't ARM stuff so it goes in patch form.
> >>>> Patches with change log would be fine as well.
> >>> The answer is not at the moment, but maybe soon.
> >> Should we send those two patches to the stable trees as well? They seem
> >> to fix issues with cpu onlining that have existed for a long time.
> > Looks to me like the problem was introduced for 2.6.39-rc1, so we
> > should probably get the fix into the 2.6.39-stable tree too.
>
> Are we talking about the loops_per_jiffy problem or the cpu_active
> problem? I would think the cpu_active problem has been there since SMP
> support was added to ARM and the loops_per_jiffy problem has been there
> (depending on the compiler) since 8a9e1b0 ([PATCH] Platform SMIs and
> their interferance with tsc based delay calibration, 2005-06-23).

The cpu_active problem hasn't actually caused any symptoms on ARM, so
it's low priority. It's only a problem which should be sorted in
-stable _if_ someone reports that it has caused a problem. Up until
Santosh's patch, no one has done so, and I've not seen any problems
on any of my ARM SMP platforms coming from it.

As for the loops_per_jiffy, it isn't a problem before the commit ID
I pointed out - I've checked the assembly, and the compiler optimizes
away the initialization of loops_per_jiffy to zero - the first write
is when it's set to (1<<12). Take a moment to think about this:

	if ((loops_per_jiffy = 0) == 0) {
	} else {
		loops_per_jiffy = 1<<12;
		...
	}

Any compiler worth talking about is going to optimize away the initial
constant write to loops_per_jiffy there provided loops_per_jiffy is not
volatile.

So, although it's not desirable for older kernels to have their lpj
overwritten in this way, it doesn't cause the spinlock debugging code
to explode.

This can be shown to be correct because there hasn't been any problem
with ARM secondary CPU bringup until recently.

Plus, the previous version of the code requires significant changes to
sort the problem out.

So, the lpj patch will only sensibly apply to 2.6.39-rc1 and later,
and so it's only going to be submitted for 2.6.39-stable. For previous
kernels, the risk of changing it outweighs, by several orders of
magnitude, any benefit coming from the change.