2009-12-16 17:06:39

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 02/12] sched: Fix set_cpu_active() in cpu_down()

Sachin found cpu hotplug test failures on powerpc, which made the
kernel hang on his POWER box.

The problem is that we fail to re-activate a cpu when a hot-unplug
fails. Fix this by moving the de-activation into _cpu_down after doing
the initial checks.

Remove the synchronize_sched() calls and rely on those implied by
rebuilding the sched domains using the new mask.

Reported-by: Sachin Sant <[email protected]>
Signed-off-by: Xiaotian Feng <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/cpu.c | 24 +++---------------------
1 file changed, 3 insertions(+), 21 deletions(-)

Index: linux-2.6/kernel/cpu.c
===================================================================
--- linux-2.6.orig/kernel/cpu.c
+++ linux-2.6/kernel/cpu.c
@@ -209,6 +209,7 @@ static int __ref _cpu_down(unsigned int
return -ENOMEM;

cpu_hotplug_begin();
+ set_cpu_active(cpu, false);
err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
hcpu, -1, &nr_calls);
if (err == NOTIFY_BAD) {
@@ -280,18 +281,6 @@ int __ref cpu_down(unsigned int cpu)
goto out;
}

- set_cpu_active(cpu, false);
-
- /*
- * Make sure the all cpus did the reschedule and are not
- * using stale version of the cpu_active_mask.
- * This is not strictly necessary becuase stop_machine()
- * that we run down the line already provides the required
- * synchronization. But it's really a side effect and we do not
- * want to depend on the innards of the stop_machine here.
- */
- synchronize_sched();
-
err = _cpu_down(cpu, 0);

out:
@@ -382,19 +371,12 @@ int disable_nonboot_cpus(void)
return error;
cpu_maps_update_begin();
first_cpu = cpumask_first(cpu_online_mask);
- /* We take down all of the non-boot CPUs in one shot to avoid races
+ /*
+ * We take down all of the non-boot CPUs in one shot to avoid races
* with the userspace trying to use the CPU hotplug at the same time
*/
cpumask_clear(frozen_cpus);

- for_each_online_cpu(cpu) {
- if (cpu == first_cpu)
- continue;
- set_cpu_active(cpu, false);
- }
-
- synchronize_sched();
-
printk("Disabling non-boot CPUs ...\n");
for_each_online_cpu(cpu) {
if (cpu == first_cpu)

--


2009-12-16 17:56:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 02/12] sched: Fix set_cpu_active() in cpu_down()

On Wed, 2009-12-16 at 18:04 +0100, Peter Zijlstra wrote:
> plain text document attachment (foo13.patch)

still need to fix quilt...

From: Xiaotian Feng <[email protected]>

> Sachin found cpu hotplug test failures on powerpc, which made the
> kernel hang on his POWER box.
>
> The problem is that we fail to re-activate a cpu when a hot-unplug
> fails. Fix this by moving the de-activation into _cpu_down after doing
> the initial checks.
>
> Remove the synchronize_sched() calls and rely on those implied by
> rebuilding the sched domains using the new mask.
>
> Reported-by: Sachin Sant <[email protected]>
> Signed-off-by: Xiaotian Feng <[email protected]>
> Tested-by: Sachin Sant <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>

2009-12-16 18:37:33

by Xiaotian Feng

[permalink] [raw]
Subject: [tip:sched/urgent] sched: Fix set_cpu_active() in cpu_down()

Commit-ID: 9ee349ad6d326df3633d43f54202427295999c47
Gitweb: http://git.kernel.org/tip/9ee349ad6d326df3633d43f54202427295999c47
Author: Xiaotian Feng <[email protected]>
AuthorDate: Wed, 16 Dec 2009 18:04:32 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 16 Dec 2009 19:01:53 +0100

sched: Fix set_cpu_active() in cpu_down()

Sachin found cpu hotplug test failures on powerpc, which made
the kernel hang on his POWER box.

The problem is that we fail to re-activate a cpu when a
hot-unplug fails. Fix this by moving the de-activation into
_cpu_down after doing the initial checks.

Remove the synchronize_sched() calls and rely on those implied
by rebuilding the sched domains using the new mask.

Reported-by: Sachin Sant <[email protected]>
Signed-off-by: Xiaotian Feng <[email protected]>
Tested-by: Sachin Sant <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/cpu.c | 24 +++---------------------
1 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 291ac58..1c8ddd6 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -209,6 +209,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
return -ENOMEM;

cpu_hotplug_begin();
+ set_cpu_active(cpu, false);
err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
hcpu, -1, &nr_calls);
if (err == NOTIFY_BAD) {
@@ -280,18 +281,6 @@ int __ref cpu_down(unsigned int cpu)
goto out;
}

- set_cpu_active(cpu, false);
-
- /*
- * Make sure the all cpus did the reschedule and are not
- * using stale version of the cpu_active_mask.
- * This is not strictly necessary becuase stop_machine()
- * that we run down the line already provides the required
- * synchronization. But it's really a side effect and we do not
- * want to depend on the innards of the stop_machine here.
- */
- synchronize_sched();
-
err = _cpu_down(cpu, 0);

out:
@@ -382,19 +371,12 @@ int disable_nonboot_cpus(void)
return error;
cpu_maps_update_begin();
first_cpu = cpumask_first(cpu_online_mask);
- /* We take down all of the non-boot CPUs in one shot to avoid races
+ /*
+ * We take down all of the non-boot CPUs in one shot to avoid races
* with the userspace trying to use the CPU hotplug at the same time
*/
cpumask_clear(frozen_cpus);

- for_each_online_cpu(cpu) {
- if (cpu == first_cpu)
- continue;
- set_cpu_active(cpu, false);
- }
-
- synchronize_sched();
-
printk("Disabling non-boot CPUs ...\n");
for_each_online_cpu(cpu) {
if (cpu == first_cpu)