2010-08-05 04:28:55

by Darren Hart

Subject: [PATCH 0/4] powerpc: cpu offline/online fixes for CONFIG_PREEMPT

The following patch series addresses several issues detected during intense CPU
offline/online testing on the mainline kernel with CONFIG_PREEMPT=y. These
patches require the following patch from Brian King:

http://patchwork.ozlabs.org/patch/59645/

Tested against linux-2.6.git master with and without CONFIG_DEBUG_PREEMPT.


2010-08-05 04:29:09

by Darren Hart

Subject: [PATCH 1/3] powerpc-enable-preemption-before-cpu_die

From: Darren Hart <[email protected]>

start_secondary() is called shortly after _start and also via

cpu_idle()->cpu_die()->pseries_mach_cpu_die()

start_secondary() expects a preempt_count() of 0. However,
pseries_mach_cpu_die() is called from cpu_idle() with preemption disabled,
which results in the following message repeating during rapid CPU
offline/online tests with CONFIG_PREEMPT=y:

BUG: scheduling while atomic: swapper/0/0x00000002
Modules linked in: autofs4 binfmt_misc dm_mirror dm_region_hash dm_log [last unloaded: scsi_wait_scan]
Call Trace:
[c00000010e7079c0] [c0000000000133ec] .show_stack+0xd8/0x218 (unreliable)
[c00000010e707aa0] [c0000000006a47f0] .dump_stack+0x28/0x3c
[c00000010e707b20] [c00000000006e7a4] .__schedule_bug+0x7c/0x9c
[c00000010e707bb0] [c000000000699d9c] .schedule+0x104/0x800
[c00000010e707cd0] [c000000000015b24] .cpu_idle+0x1c4/0x1d8
[c00000010e707d70] [c0000000006aa1b4] .start_secondary+0x398/0x3d4
[c00000010e707e30] [c000000000008278] .start_secondary_resume+0x10/0x14

Move the cpu_die() call inside the existing preemption-enabled block of
cpu_idle(). This is safe because the idle task is affined to a single CPU,
so the debug_smp_processor_id() tests (from cpu_should_die()) won't trigger;
we are in a "migration disabled" region.
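
The ordering problem can be illustrated with a small userspace model. This
is a sketch only: preempt_count, preempt_disable(), preempt_enable_no_resched()
and cpu_die() below are hypothetical stand-ins for the kernel primitives of
the same names, and idle_loop_tail() is an invented helper that replays the
tail of one cpu_idle() iteration. It records the count that the dying CPU
would carry into start_secondary():

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model, not kernel code: hypothetical stand-ins only. */
static int preempt_count;
static int count_at_cpu_die = -1; /* -1: cpu_die() was not reached */

static void preempt_disable(void)
{
	preempt_count++;
}

static void preempt_enable_no_resched(void)
{
	assert(preempt_count > 0);
	preempt_count--;
}

/*
 * Stand-in for cpu_die()->pseries_mach_cpu_die(). The real path never
 * returns here; the CPU later restarts in start_secondary(), which
 * expects preempt_count() == 0. The model just records the count.
 */
static void cpu_die(void)
{
	count_at_cpu_die = preempt_count;
}

/* Tail of one cpu_idle() iteration; 'patched' selects the new ordering. */
int idle_loop_tail(bool patched, bool should_die)
{
	preempt_count = 0;
	count_at_cpu_die = -1;
	preempt_disable(); /* cpu_idle() runs with preemption disabled */

	if (patched)
		preempt_enable_no_resched();
	if (should_die)
		cpu_die();
	if (!patched)
		preempt_enable_no_resched();
	/* schedule(); preempt_disable(); are omitted from the model */
	return count_at_cpu_die;
}
```

With the old ordering the model reaches cpu_die() with a count of 1; with
the patched ordering it reaches it with 0, which is what start_secondary()
expects.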

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Will Schmidt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Robert Jennings <[email protected]>
Cc: Brian King <[email protected]>
---
arch/powerpc/kernel/idle.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 049dda6..39a2baa 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -94,9 +94,9 @@ void cpu_idle(void)
HMT_medium();
ppc64_runlatch_on();
tick_nohz_restart_sched_tick();
+ preempt_enable_no_resched();
if (cpu_should_die())
cpu_die();
- preempt_enable_no_resched();
schedule();
preempt_disable();
}
--
1.7.0.4

2010-08-05 04:29:25

by Darren Hart

Subject: [PATCH 2/3] powerpc-silence-__cpu_up-under-normal-operation

From: Darren Hart <[email protected]>

During CPU offline/online tests __cpu_up would flood the logs with
the following message:

Processor 0 found.

This message gives the user no useful information: it carries no context,
and since the operation has succeeded up to this point, the CPU is expected
to come back online, which provides all the feedback necessary.

Change the "Processor found" message to use DBG(), like the other such
messages in the same function. Also, add an appropriate log level
(KERN_ERR) to the "Processor is stuck" message.
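
A userspace sketch of the two logging changes, assuming DBG() follows the
usual powerpc pattern of expanding to nothing unless DEBUG is defined. The
KERN_ERR value "<3>" matches 2.6-era kernels; printk() here is a
hypothetical macro that writes into a buffer, and report_cpu() is an
invented helper mirroring the two messages in __cpu_up():

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* KERN_ERR as in 2.6-era <linux/kernel.h>: a "<3>" prefix that
 * string-literal concatenation glues onto the format string. */
#define KERN_ERR "<3>"

/* Userspace stand-ins: both "log" into a buffer instead of the console. */
static char logbuf[128];
#define printk(...) snprintf(logbuf, sizeof(logbuf), __VA_ARGS__)

/* Assumed DBG() pattern: compiled out entirely unless DEBUG is defined. */
#ifdef DEBUG
#define DBG(...) snprintf(logbuf, sizeof(logbuf), __VA_ARGS__)
#else
#define DBG(...) do { } while (0)
#endif

/* 'callin' plays the role of cpu_callin_map[cpu]. */
int report_cpu(unsigned int cpu, int callin)
{
	logbuf[0] = '\0';
	if (!callin) {
		printk(KERN_ERR "Processor %u is stuck.\n", cpu);
		return -ENOENT;
	}
	DBG("Processor %u found.\n", cpu);
	return 0;
}
```

Built without DEBUG, a successful bring-up logs nothing, while a stuck
processor still produces an error-level message.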

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Will Schmidt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Robert Jennings <[email protected]>
Cc: Brian King <[email protected]>
---
arch/powerpc/kernel/smp.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5c196d1..cc05792 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -429,11 +429,11 @@ int __cpuinit __cpu_up(unsigned int cpu)
#endif

if (!cpu_callin_map[cpu]) {
- printk("Processor %u is stuck.\n", cpu);
+ printk(KERN_ERR "Processor %u is stuck.\n", cpu);
return -ENOENT;
}

- printk("Processor %u found.\n", cpu);
+ DBG("Processor %u found.\n", cpu);

if (smp_ops->give_timebase)
smp_ops->give_timebase();
--
1.7.0.4

2010-08-05 04:29:37

by Darren Hart

Subject: [PATCH 3/3] powerpc-silence-xics_migrate_irqs_away-during-cpu-offline

From: Darren Hart <[email protected]>

All IRQs are migrated away from a CPU that is being taken offline, so the
following messages suggest a problem when the system is in fact behaving
as designed:

IRQ 262 affinity broken off cpu 1
IRQ 17 affinity broken off cpu 0
IRQ 18 affinity broken off cpu 0
IRQ 19 affinity broken off cpu 0
IRQ 256 affinity broken off cpu 0
IRQ 261 affinity broken off cpu 0
IRQ 262 affinity broken off cpu 0

Don't print these messages when the CPU is not online.
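
A userspace sketch of the guard, assuming (as the patch relies on) that
the outgoing CPU has already been cleared from the online mask by the time
its IRQs are migrated. The bitmask online_mask, the counter
warnings_printed and the helper report_affinity_broken() are hypothetical
stand-ins for cpu_online_mask, printk(KERN_WARNING ...) and the check in
xics_migrate_irqs_away():

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model: a bitmask stands in for cpu_online_mask and a
 * counter records how many warnings would have been printed. */
static unsigned long online_mask;
static int warnings_printed;

static bool cpu_online(unsigned int cpu)
{
	assert(cpu < CHAR_BIT * sizeof(online_mask)); /* keep the shift defined */
	return (online_mask >> cpu) & 1UL;
}

/* Mirrors the patched check in xics_migrate_irqs_away(). */
void report_affinity_broken(unsigned int virq, unsigned int cpu)
{
	/* This is expected during cpu offline. */
	if (cpu_online(cpu)) {
		printf("IRQ %u affinity broken off cpu %u\n", virq, cpu);
		warnings_printed++;
	}
}
```

With only CPU 0 marked online, migrating IRQ 262 off CPU 1 stays silent,
while a broken affinity on the still-online CPU 0 is still reported.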

Signed-off-by: Darren Hart <[email protected]>
Acked-by: Will Schmidt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Nathan Fontenot <[email protected]>
Cc: Robert Jennings <[email protected]>
Cc: Brian King <[email protected]>
---
arch/powerpc/platforms/pseries/xics.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/xics.c b/arch/powerpc/platforms/pseries/xics.c
index f19d194..8d0b0b1 100644
--- a/arch/powerpc/platforms/pseries/xics.c
+++ b/arch/powerpc/platforms/pseries/xics.c
@@ -930,8 +930,10 @@ void xics_migrate_irqs_away(void)
if (xics_status[0] != hw_cpu)
goto unlock;

- printk(KERN_WARNING "IRQ %u affinity broken off cpu %u\n",
- virq, cpu);
+ /* This is expected during cpu offline. */
+ if (cpu_online(cpu))
+ printk(KERN_WARNING "IRQ %u affinity broken off cpu %u\n",
+ virq, cpu);

/* Reset affinity to all cpus */
cpumask_setall(irq_to_desc(virq)->affinity);
--
1.7.0.4