2011-06-14 17:07:07

by Tejun Heo

Subject: [PATCHSET] stop_machine: implement stop_machine_from_offline_cpu()

Hello,

This three-patch series implements stop_machine_from_offline_cpu(),
a variant of stop_machine() which can be used from a CPU which isn't
yet online.

This is for mtrr, which needs stop_machine while bringing up a CPU.
It currently implements its own stop_machine directly using
stop_one_cpu(). Duplicating such core functionality is wasteful and
fragile; e.g. mtrr's implementation can currently deadlock against
the generic stop_machine().

This patchset is born from Suresh's attempt at implementing a similar
feature in [__]stop_machine()[1], which IMHO is a bit too subtle.
Given the peculiarity of the feature, I think it's better to have a
dedicated function with an ugly name. Also, this way, the
implementation is very well isolated from the rest of the stop_cpus
machinery and much simpler.
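
For illustration, usage from the CPU bring-up path would look roughly
like this (a minimal sketch; the handler name and body are made up
here, not mtrr's actual code):

static int mtrr_rendezvous_handler(void *data)
{
	/* runs with interrupts disabled on every online CPU and on
	 * the incoming CPU */
	return 0;
}

/* on the CPU being brought up, before it is marked online */
stop_machine_from_offline_cpu(mtrr_rendezvous_handler, NULL, NULL);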

This patchset contains the following three patches.

0001-stop_machine-kill-__stop_machine.patch
0002-stop_machine-reorganize-stop_cpus-implementation.patch
0003-stop_machine-implement-stop_machine_from_offline_cpu.patch

0001 is a cleanup patch which can go in regardless of the rest (with
ack from x86 people of course). 0002 is pure code reorganization.
0003 implements the feature.

Suresh, how does this look to you? Can you please put the mtrr
changes on top of this and see how it works?

arch/x86/kernel/alternative.c | 5 -
include/linux/stop_machine.h | 42 +++------------
kernel/cpu.c | 2
kernel/stop_machine.c | 117 +++++++++++++++++++++++++++++++++++-------
4 files changed, 114 insertions(+), 52 deletions(-)

Thanks.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1154383/focus=1154384


2011-06-14 17:10:05

by Tejun Heo

Subject: [PATCH 1/3] stop_machine: kill __stop_machine()

stop_machine() is different from __stop_machine() in that it
automatically calls get/put_online_cpus() to disable CPU hotplug. For
__stop_machine(), the caller is responsible for achieving exclusion
against CPU hotplug using either get/put_online_cpus() or
cpu_hotplug_begin/done().

However, get_online_cpus() can nest safely inside either another
get_online_cpus() or cpu_hotplug_begin(); thus, it's safe to use
stop_machine() instead of __stop_machine(), making the distinction
pointless - the overhead of the extra get/put_online_cpus() is
negligible compared to stop_machine itself, and they basically become
a noop if hotplug is in progress.
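
For reference, get_online_cpus() currently looks roughly like this
(condensed from kernel/cpu.c), which is what makes the nesting safe:

void get_online_cpus(void)
{
	might_sleep();
	if (cpu_hotplug.active_writer == current)
		return;	/* nested inside cpu_hotplug_begin() */
	mutex_lock(&cpu_hotplug.lock);
	cpu_hotplug.refcount++;
	mutex_unlock(&cpu_hotplug.lock);
}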

This patch converts all current __stop_machine() users to
stop_machine() and kills __stop_machine(). While at it, move function
comment for stop_machine() from function declaration to definition and
update it slightly.

Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/kernel/alternative.c | 5 ++---
include/linux/stop_machine.h | 34 ++--------------------------------
kernel/cpu.c | 2 +-
kernel/stop_machine.c | 38 +++++++++++++++++++++++++++-----------
4 files changed, 32 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index a81f2d5..f7a021e 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -719,8 +719,7 @@ void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
tpp.nparams = 1;
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
- /* Use __stop_machine() because the caller already got online_cpus. */
- __stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
+ stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
return addr;
}

@@ -741,5 +740,5 @@ void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)

atomic_set(&stop_machine_first, 1);
wrote_text = 0;
- __stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
+ stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
}
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 092dc9b..4f1a73c 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -98,36 +98,12 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
*/
#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)

-/**
- * stop_machine: freeze the machine on all CPUs and run this function
- * @fn: the function to run
- * @data: the data ptr for the @fn()
- * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
- *
- * Description: This causes a thread to be scheduled on every cpu,
- * each of which disables interrupts. The result is that no one is
- * holding a spinlock or inside any other preempt-disabled region when
- * @fn() runs.
- *
- * This can be thought of as a very heavy write lock, equivalent to
- * grabbing every spinlock in the kernel. */
int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);

-/**
- * __stop_machine: freeze the machine on all CPUs and run this function
- * @fn: the function to run
- * @data: the data ptr for the @fn
- * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
- *
- * Description: This is a special version of the above, which assumes cpus
- * won't come or go while it's being called. Used by hotplug cpu.
- */
-int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
-
#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */

-static inline int __stop_machine(int (*fn)(void *), void *data,
- const struct cpumask *cpus)
+static inline int stop_machine(int (*fn)(void *), void *data,
+ const struct cpumask *cpus)
{
int ret;
local_irq_disable();
@@ -136,11 +112,5 @@ static inline int __stop_machine(int (*fn)(void *), void *data,
return ret;
}

-static inline int stop_machine(int (*fn)(void *), void *data,
- const struct cpumask *cpus)
-{
- return __stop_machine(fn, data, cpus);
-}
-
#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
#endif /* _LINUX_STOP_MACHINE */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 12b7458..bb23d5b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -235,7 +235,7 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
goto out_release;
}

- err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
+ err = stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
if (err) {
/* CPU didn't die: tell everyone. Can't complain. */
cpu_notify_nofail(CPU_DOWN_FAILED | mod, hcpu);
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index e3516b2..3d3f47d 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -464,24 +464,40 @@ static int stop_machine_cpu_stop(void *data)
return err;
}

-int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
+/**
+ * stop_machine - freeze the machine on all online CPUs and run this function
+ * @fn: the function to run
+ * @data: the data ptr for the @fn()
+ * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
+ *
+ * This causes a thread to be scheduled on every cpu, each of which
+ * disables interrupts. The result is that no one is holding a spinlock or
+ * inside any other preempt-disabled region when @fn() runs.
+ *
+ * This can be thought of as a very heavy write lock, equivalent to
+ * grabbing every spinlock in the kernel.
+ *
+ * CONTEXT:
+ * Might sleep. Temporarily stops all online CPUs.
+ *
+ * RETURNS:
+ * 0 if all executions of @fn returned 0, any non zero return value if any
+ * returned non zero.
+ */
+int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
{
struct stop_machine_data smdata = { .fn = fn, .data = data,
- .num_threads = num_online_cpus(),
.active_cpus = cpus };
-
- /* Set the initial state and stop all online cpus. */
- set_state(&smdata, STOPMACHINE_PREPARE);
- return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
-}
-
-int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
-{
int ret;

/* No CPUs can come up or down during this. */
get_online_cpus();
- ret = __stop_machine(fn, data, cpus);
+ smdata.num_threads = num_online_cpus();
+
+ /* Set the initial state and stop all online cpus. */
+ set_state(&smdata, STOPMACHINE_PREPARE);
+ ret = stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
+
put_online_cpus();
return ret;
}
--
1.7.5.2

2011-06-14 17:07:12

by Tejun Heo

Subject: [PATCH 2/3] stop_machine: reorganize stop_cpus() implementation

Split __stop_cpus() into stop_cpus_queue() and stop_cpus_locked().
The former handles only the queueing part. The latter uses the queue
function and is functionally equivalent to __stop_cpus().

The reorganization is to help future improvements to stop_machine()
and doesn't introduce any behavior difference.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
kernel/stop_machine.c | 20 ++++++++++++++------
1 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 3d3f47d..198973f 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -136,10 +136,11 @@ void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
static DEFINE_MUTEX(stop_cpus_mutex);
static DEFINE_PER_CPU(struct cpu_stop_work, stop_cpus_work);

-int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
+static void stop_cpus_queue(const struct cpumask *cpumask,
+ cpu_stop_fn_t fn, void *arg,
+ struct cpu_stop_done *done)
{
struct cpu_stop_work *work;
- struct cpu_stop_done done;
unsigned int cpu;

/* initialize works and done */
@@ -147,9 +148,8 @@ int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
work = &per_cpu(stop_cpus_work, cpu);
work->fn = fn;
work->arg = arg;
- work->done = &done;
+ work->done = done;
}
- cpu_stop_init_done(&done, cpumask_weight(cpumask));

/*
* Disable preemption while queueing to avoid getting
@@ -161,7 +161,15 @@ int __stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
cpu_stop_queue_work(&per_cpu(cpu_stopper, cpu),
&per_cpu(stop_cpus_work, cpu));
preempt_enable();
+}

+static int stop_cpus_locked(const struct cpumask *cpumask,
+ cpu_stop_fn_t fn, void *arg)
+{
+ struct cpu_stop_done done;
+
+ cpu_stop_init_done(&done, cpumask_weight(cpumask));
+ stop_cpus_queue(cpumask, fn, arg, &done);
wait_for_completion(&done.completion);
return done.executed ? done.ret : -ENOENT;
}
@@ -200,7 +208,7 @@ int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)

/* static works are used, process one request at a time */
mutex_lock(&stop_cpus_mutex);
- ret = __stop_cpus(cpumask, fn, arg);
+ ret = stop_cpus_locked(cpumask, fn, arg);
mutex_unlock(&stop_cpus_mutex);
return ret;
}
@@ -230,7 +238,7 @@ int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
/* static works are used, process one request at a time */
if (!mutex_trylock(&stop_cpus_mutex))
return -EAGAIN;
- ret = __stop_cpus(cpumask, fn, arg);
+ ret = stop_cpus_locked(cpumask, fn, arg);
mutex_unlock(&stop_cpus_mutex);
return ret;
}
--
1.7.5.2

2011-06-14 17:07:25

by Tejun Heo

Subject: [PATCH 3/3] stop_machine: implement stop_machine_from_offline_cpu()

Currently, mtrr wants stop_machine functionality while a CPU is being
brought up. As stop_machine() requires the calling CPU to be online,
mtrr implements its own stop_machine using stop_one_cpu() on each
online CPU. This not only unnecessarily duplicates complex logic
but also introduces the possibility of a deadlock when it races
against the generic stop_machine().

This patch implements stop_machine_from_offline_cpu() to serve such
use cases. Its functionality is basically the same as stop_machine();
however, it is to be called from a CPU which isn't online and doesn't
depend on working scheduling on the calling CPU.

This is achieved by using busy loops for synchronization and by
open-coding the stop_cpus queueing and waiting, with direct invocation
of fn() for the local CPU in between.
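
In outline, the new function does the following (condensed from the
patch below):

/* no usable task context - busy-wait for the lock */
while (!mutex_trylock(&stop_cpus_mutex))
	cpu_relax();

set_state(&smdata, STOPMACHINE_PREPARE);
cpu_stop_init_done(&done, num_online_cpus());
stop_cpus_queue(cpu_online_mask, stop_machine_cpu_stop, &smdata, &done);
ret = stop_machine_cpu_stop(&smdata);	/* participate directly */

/* busy-wait for the other CPUs to finish */
while (!completion_done(&done.completion))
	cpu_relax();
mutex_unlock(&stop_cpus_mutex);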

This is inspired by Suresh's patch implementing similar
functionality in stop_machine().

http://article.gmane.org/gmane.linux.kernel/1154384

Signed-off-by: Tejun Heo <[email protected]>
Cc: Suresh Siddha <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
include/linux/stop_machine.h | 14 ++++++++-
kernel/stop_machine.c | 61 +++++++++++++++++++++++++++++++++++++++++-
2 files changed, 72 insertions(+), 3 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 4f1a73c..c78ef6a 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -99,18 +99,28 @@ static inline int try_stop_cpus(const struct cpumask *cpumask,
#if defined(CONFIG_STOP_MACHINE) && defined(CONFIG_SMP)

int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus);
+int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
+ const struct cpumask *cpus);

#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */

static inline int stop_machine(int (*fn)(void *), void *data,
const struct cpumask *cpus)
{
+ unsigned long flags;
int ret;
- local_irq_disable();
+
+ local_irq_save(flags);
ret = fn(data);
- local_irq_enable();
+ local_irq_restore(flags);
return ret;
}

+static inline int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
+ const struct cpumask *cpus)
+{
+ return stop_machine(fn, data, cpus);
+}
+
#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
#endif /* _LINUX_STOP_MACHINE */
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 198973f..ff648d9 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -439,8 +439,15 @@ static int stop_machine_cpu_stop(void *data)
struct stop_machine_data *smdata = data;
enum stopmachine_state curstate = STOPMACHINE_NONE;
int cpu = smp_processor_id(), err = 0;
+ unsigned long flags;
bool is_active;

+ /*
+ * When called from stop_machine_from_offline_cpu(), irq might
+ * already be disabled. Save the state and restore it on exit.
+ */
+ local_save_flags(flags);
+
if (!smdata->active_cpus)
is_active = cpu == cpumask_first(cpu_online_mask);
else
@@ -468,7 +475,7 @@ static int stop_machine_cpu_stop(void *data)
}
} while (curstate != STOPMACHINE_EXIT);

- local_irq_enable();
+ local_irq_restore(flags);
return err;
}

@@ -511,4 +518,56 @@ int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
}
EXPORT_SYMBOL_GPL(stop_machine);

+/**
+ * stop_machine_from_offline_cpu - stop_machine() from offline CPU
+ * @fn: the function to run
+ * @data: the data ptr for the @fn()
+ * @cpus: the cpus to run the @fn() on (NULL = any online cpu)
+ *
+ * This is identical to stop_machine() but can be called from a CPU which
+ * isn't online. The local CPU is in the process of hotplug (so no other
+ * CPU hotplug can start) and not marked online and doesn't have enough
+ * context to sleep.
+ *
+ * This function provides stop_machine() functionality for such state by
+ * using busy-wait for synchronization and executing @fn directly for local
+ * CPU.
+ *
+ * CONTEXT:
+ * Local CPU is offline. Temporarily stops all online CPUs.
+ *
+ * RETURNS:
+ * 0 if all executions of @fn returned 0, any non zero return value if any
+ * returned non zero.
+ */
+int stop_machine_from_offline_cpu(int (*fn)(void *), void *data,
+ const struct cpumask *cpus)
+{
+ struct stop_machine_data smdata = { .fn = fn, .data = data,
+ .active_cpus = cpus };
+ struct cpu_stop_done done;
+ int ret;
+
+ /* Local CPU must be offline and CPU hotplug in progress. */
+ BUG_ON(cpu_online(raw_smp_processor_id()));
+ smdata.num_threads = num_online_cpus() + 1; /* +1 for local */
+
+ /* No proper task established and can't sleep - busy wait for lock. */
+ while (!mutex_trylock(&stop_cpus_mutex))
+ cpu_relax();
+
+ /* Schedule work on other CPUs and execute directly for local CPU */
+ set_state(&smdata, STOPMACHINE_PREPARE);
+ cpu_stop_init_done(&done, num_online_cpus());
+ stop_cpus_queue(cpu_online_mask, stop_machine_cpu_stop, &smdata, &done);
+ ret = stop_machine_cpu_stop(&smdata);
+
+ /* Busy wait for completion. */
+ while (!completion_done(&done.completion))
+ cpu_relax();
+
+ mutex_unlock(&stop_cpus_mutex);
+ return ret ?: done.ret;
+}
+
#endif /* CONFIG_STOP_MACHINE */
--
1.7.5.2

2011-06-16 12:10:55

by Peter Zijlstra

Subject: Re: [PATCHSET] stop_machine: implement stop_machine_from_offline_cpu()

On Tue, 2011-06-14 at 19:06 +0200, Tejun Heo wrote:
> This three-patch series implements stop_machine_from_offline_cpu(),
> a variant of stop_machine() which can be used from a CPU which isn't
> yet online.
>
> This is for mtrr, which needs stop_machine while bringing up a CPU.
> It currently implements its own stop_machine directly using
> stop_one_cpu(). Duplicating such core functionality is wasteful and
> fragile; e.g. mtrr's implementation can currently deadlock against
> the generic stop_machine().
>
> This patchset is born from Suresh's attempt at implementing a similar
> feature in [__]stop_machine()[1], which IMHO is a bit too subtle.
> Given the peculiarity of the feature, I think it's better to have a
> dedicated function with an ugly name. Also, this way, the
> implementation is very well isolated from the rest of the stop_cpus
> machinery and much simpler.

Maybe a silly question, but why does mtrr need all this? Surely mtrr can
serialize state by other means than stopping all cpus. A simple mutex
around the shared state blocking other cpus from updating the mtrr state
while we're copying the state to our newly born cpu should cure things.
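
Concretely, I mean something like this (a rough sketch; all names are
made up):

static DEFINE_MUTEX(mtrr_state_mutex);

/* updaters take the mutex around any change to the shared state */
void mtrr_update_state(void)
{
	mutex_lock(&mtrr_state_mutex);
	/* ... modify the shared mtrr state ... */
	mutex_unlock(&mtrr_state_mutex);
}

/* cpu bring-up copies the state while updates are blocked */
void mtrr_init_new_cpu(void)
{
	mutex_lock(&mtrr_state_mutex);
	/* ... copy the state to the newly born cpu ... */
	mutex_unlock(&mtrr_state_mutex);
}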

I'm sure I'm missing something trivial here, but shouldn't we at least
explain the details and explore alternatives before merging ugly code?

2011-06-16 12:13:35

by Peter Zijlstra

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Tue, 2011-06-14 at 19:06 +0200, Tejun Heo wrote:
> +++ b/arch/x86/kernel/alternative.c
> @@ -719,8 +719,7 @@ void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
> tpp.nparams = 1;
> atomic_set(&stop_machine_first, 1);
> wrote_text = 0;
> - /* Use __stop_machine() because the caller already got online_cpus. */
> - __stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
> + stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
> return addr;
> }

Please have a look at:

---
commit d91309f69b7bdb64aeb30106fde8d18c5dd354b5
Author: Peter Zijlstra <[email protected]>
Date: Fri Feb 11 22:07:46 2011 +0100

x86: Fix text_poke_smp_batch() deadlock

Fix this deadlock - we are already holding the mutex:

=======================================================
[ INFO: possible circular locking dependency detected ] 2.6.38-rc4-test+ #1
-------------------------------------------------------
bash/1850 is trying to acquire lock:
(text_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f

but task is already holding lock:
(smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (smp_alt){+.+...}:
[<ffffffff81082d02>] lock_acquire+0xcd/0xf8
[<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
[<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
[<ffffffff8101050f>] alternatives_smp_switch+0x77/0x1d8
[<ffffffff81926a6f>] do_boot_cpu+0xd7/0x762
[<ffffffff819277dd>] native_cpu_up+0xe6/0x16a
[<ffffffff81928e28>] _cpu_up+0x9d/0xee
[<ffffffff81928f4c>] cpu_up+0xd3/0xe7
[<ffffffff82268d4b>] kernel_init+0xe8/0x20a
[<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10

-> #1 (cpu_hotplug.lock){+.+.+.}:
[<ffffffff81082d02>] lock_acquire+0xcd/0xf8
[<ffffffff8192e119>] __mutex_lock_common+0x4c/0x339
[<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
[<ffffffff810568cc>] get_online_cpus+0x41/0x55
[<ffffffff810a1348>] stop_machine+0x1e/0x3e
[<ffffffff819314c1>] text_poke_smp_batch+0x3a/0x3c
[<ffffffff81932b6c>] arch_optimize_kprobes+0x10d/0x11c
[<ffffffff81933a51>] kprobe_optimizer+0x152/0x222
[<ffffffff8106bb71>] process_one_work+0x1d3/0x335
[<ffffffff8106cfae>] worker_thread+0x104/0x1a4
[<ffffffff810707c4>] kthread+0x9d/0xa5
[<ffffffff8100ba24>] kernel_thread_helper+0x4/0x10

-> #0 (text_mutex){+.+.+.}:

other info that might help us debug this:

6 locks held by bash/1850:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
#1: (s_active#75){.+.+.+}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
#2: (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
#3: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
#4: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f
#5: (smp_alt){+.+...}, at: [<ffffffff8100a9c1>] return_to_handler+0x0/0x2f

stack backtrace:
Pid: 1850, comm: bash Not tainted 2.6.38-rc4-test+ #1
Call Trace:

[<ffffffff81080eb2>] print_circular_bug+0xa8/0xb7
[<ffffffff8192e4ca>] mutex_lock_nested+0x3e/0x43
[<ffffffff81010302>] alternatives_smp_unlock+0x3d/0x93
[<ffffffff81010630>] alternatives_smp_switch+0x198/0x1d8
[<ffffffff8102568a>] native_cpu_die+0x65/0x95
[<ffffffff818cc4ec>] _cpu_down+0x13e/0x202
[<ffffffff8117a619>] sysfs_write_file+0x108/0x144
[<ffffffff8111f5a2>] vfs_write+0xac/0xff
[<ffffffff8111f7a9>] sys_write+0x4a/0x6e

Reported-by: Steven Rostedt <[email protected]>
Tested-by: Steven Rostedt <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <1297458466.5226.93.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 1236085..7038b95 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -671,7 +671,7 @@ void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)

atomic_set(&stop_machine_first, 1);
wrote_text = 0;
- stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
+ __stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
}

#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)

2011-06-16 12:15:16

by Tejun Heo

Subject: Re: [PATCHSET] stop_machine: implement stop_machine_from_offline_cpu()

Hello, Peter.

On Thu, Jun 16, 2011 at 02:10:19PM +0200, Peter Zijlstra wrote:
> Maybe a silly question, but why does mtrr need all this? Surely mtrr can
> serialize state by other means than stopping all cpus. A simple mutex
> around the shared state blocking other cpus from updating the mtrr state
> while we're copying the state to our newly born cpu should cure things.

Hmmm... good question. I don't know mtrr too well either, but the
stop-machine requirement seems to come directly from Intel's
specification. I suppose Suresh can fill us in better.

Thanks.

--
tejun

2011-06-16 12:45:00

by Tejun Heo

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Thu, Jun 16, 2011 at 02:12:52PM +0200, Peter Zijlstra wrote:
> On Tue, 2011-06-14 at 19:06 +0200, Tejun Heo wrote:
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -719,8 +719,7 @@ void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
> > tpp.nparams = 1;
> > atomic_set(&stop_machine_first, 1);
> > wrote_text = 0;
> > - /* Use __stop_machine() because the caller already got online_cpus. */
> > - __stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
> > + stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
> > return addr;
> > }
>
> Please have a look at:
>
> ---
> commit d91309f69b7bdb64aeb30106fde8d18c5dd354b5
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Feb 11 22:07:46 2011 +0100
>
> x86: Fix text_poke_smp_batch() deadlock
>
> Fix this deadlock - we are already holding the mutex:

Ah, right, I thought cpu_hotplug_begin() sets active_writer and then
returns with cpu_hotplug.lock released. Weird locking there.
Anyways, this one should be dropped then and I don't think it affects
the rest of the series much.

Thanks.

--
tejun

2011-06-16 17:21:46

by Suresh Siddha

Subject: Re: [PATCHSET] stop_machine: implement stop_machine_from_offline_cpu()

On Thu, 2011-06-16 at 05:15 -0700, Tejun Heo wrote:
> Hello, Peter.
>
> On Thu, Jun 16, 2011 at 02:10:19PM +0200, Peter Zijlstra wrote:
> > Maybe a silly question, but why does mtrr need all this? Surely mtrr can
> > serialize state by other means than stopping all cpus. A simple mutex
> > around the shared state blocking other cpus from updating the mtrr state
> > while we're copying the state to our newly born cpu should cure things.
>
> Hmmm... good question. I don't know mtrr too well either, but the
> stop-machine requirement seems to come directly from Intel's
> specification. I suppose Suresh can fill us in better.

Yes, it comes from the SDM guidelines for updating MTRRs. MTRR/PAT
change memory attributes, and to ensure that no one else is
simultaneously accessing memory while a cpu is changing the attributes
of that memory, we need a system-wide rendezvous. MTRRs are not only
updated during cpu online; we allow MTRRs to be changed (with all the
cpus kept in sync) at runtime too.

For example, the change log of commit
d0af9eed5aa91b6b7b5049cae69e5ea956fd85c3 explains a recent issue we
encountered.

thanks,
suresh

2011-06-16 17:38:07

by Suresh Siddha

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Thu, 2011-06-16 at 05:12 -0700, Peter Zijlstra wrote:
> On Tue, 2011-06-14 at 19:06 +0200, Tejun Heo wrote:
> Please have a look at:
>
> ---
> commit d91309f69b7bdb64aeb30106fde8d18c5dd354b5
> Author: Peter Zijlstra <[email protected]>
> Date: Fri Feb 11 22:07:46 2011 +0100
>
> x86: Fix text_poke_smp_batch() deadlock
>
> Fix this deadlock - we are already holding the mutex:
>
...
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 1236085..7038b95 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -671,7 +671,7 @@ void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)
>
> atomic_set(&stop_machine_first, 1);
> wrote_text = 0;
> - stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
> + __stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
> }
>
> #if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
>

Peter, so it looks like we are allowing a new cpu to come online in
parallel while we poke the text? Isn't that a problem? What am I missing?

thanks,
suresh

2011-06-16 19:00:29

by Peter Zijlstra

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Thu, 2011-06-16 at 10:37 -0700, Suresh Siddha wrote:
> On Thu, 2011-06-16 at 05:12 -0700, Peter Zijlstra wrote:
> > On Tue, 2011-06-14 at 19:06 +0200, Tejun Heo wrote:
> > Please have a look at:
> >
> > ---
> > commit d91309f69b7bdb64aeb30106fde8d18c5dd354b5
> > Author: Peter Zijlstra <[email protected]>
> > Date: Fri Feb 11 22:07:46 2011 +0100
> >
> > x86: Fix text_poke_smp_batch() deadlock
> >
> > Fix this deadlock - we are already holding the mutex:
> >
> ...
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 1236085..7038b95 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -671,7 +671,7 @@ void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)
> >
> > atomic_set(&stop_machine_first, 1);
> > wrote_text = 0;
> > - stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
> > + __stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
> > }
> >
> > #if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
> >
>
> Peter, so it looks like we are allowing a new cpu to come online in
> parallel while we poke the text? Isn't that a problem? What am I missing?

the caller already did get_online_cpus(),

do_optimize_kprobes()
get_online_cpus()
arch_optimize_kprobes()
text_poke_smp_batch()
put_online_cpus()

2011-06-16 18:17:36

by Suresh Siddha

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Thu, 2011-06-16 at 10:55 -0700, Peter Zijlstra wrote:
> On Thu, 2011-06-16 at 10:37 -0700, Suresh Siddha wrote:
> > On Thu, 2011-06-16 at 05:12 -0700, Peter Zijlstra wrote:
> > > On Tue, 2011-06-14 at 19:06 +0200, Tejun Heo wrote:
> > > Please have a look at:
> > >
> > > ---
> > > commit d91309f69b7bdb64aeb30106fde8d18c5dd354b5
> > > Author: Peter Zijlstra <[email protected]>
> > > Date: Fri Feb 11 22:07:46 2011 +0100
> > >
> > > x86: Fix text_poke_smp_batch() deadlock
> > >
> > > Fix this deadlock - we are already holding the mutex:
> > >
> > ...
> > > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > > index 1236085..7038b95 100644
> > > --- a/arch/x86/kernel/alternative.c
> > > +++ b/arch/x86/kernel/alternative.c
> > > @@ -671,7 +671,7 @@ void __kprobes text_poke_smp_batch(struct text_poke_param *params, int n)
> > >
> > > atomic_set(&stop_machine_first, 1);
> > > wrote_text = 0;
> > > - stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
> > > + __stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
> > > }
> > >
> > > #if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
> > >
> >
> > Peter, so it looks like we are allowing a new cpu to come online in
> > parallel while we poke the text? Isn't that a problem? What am I missing?
>
> the caller already did get_online_cpus(),
>
> do_optimize_kprobes()
> get_online_cpus()
> arch_optimize_kprobes()
> text_poke_smp_batch()
> put_online_cpus()

So the circular dependency reported is not possible in practice, right?

The above patch is working around a false positive.

thanks,
suresh

2011-06-16 18:28:12

by Tejun Heo

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

Hello,

On Thu, Jun 16, 2011 at 11:17:40AM -0700, Suresh Siddha wrote:
> On Thu, 2011-06-16 at 10:55 -0700, Peter Zijlstra wrote:
> > the caller already did get_online_cpus(),
> >
> > do_optimize_kprobes()
> > get_online_cpus()
> > arch_optimize_kprobes()
> > text_poke_smp_batch()
> > put_online_cpus()

Peter, I don't think it's that simple. get_online_cpus() can't create
a circular dependency by itself; it allows recursing. The chain
involves cpu_hotplug_begin(), which returns with the hotplug mutex held.

> So the circular dependency reported is not possible in practice, right?
>
> The above patch is working around a false positive.

But I don't think the lockdep warning is spurious. The text_mutex ->
hotplug.lock dependency is created during kprobe patching,
hotplug.lock -> smp_alt during CPU hotplug, and then the alternative
code calling get_online_cpus() creates the reverse dependency. So,
the reasoning and comment are wrong, but grabbing the hotplug mutex
there does create a circular dependency.
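
Condensed, the cycle from the lockdep report quoted earlier is:

  text_mutex       -> cpu_hotplug.lock  (kprobe patching: stop_machine() under text_mutex)
  cpu_hotplug.lock -> smp_alt           (hotplug: alternatives_smp_switch() under cpu_hotplug_begin())
  smp_alt          -> text_mutex        (alternatives_smp_unlock() taking text_mutex)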

This probably can be resolved in a prettier way, but let's leave it
alone for now.

Thanks.

--
tejun

2011-06-16 18:28:49

by Peter Zijlstra

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Thu, 2011-06-16 at 11:17 -0700, Suresh Siddha wrote:

> > > Peter, so it looks like we are allowing a new cpu to come online in
> > > parallel while we poke the text? Isn't that a problem? What am I missing?
> >
> > the caller already did get_online_cpus(),
> >
> > do_optimize_kprobes()
> > get_online_cpus()
> > arch_optimize_kprobes()
> > text_poke_smp_batch()
> > put_online_cpus()
>
> So the circular dependency reported is not possible in practice, right?

Why not? get_online_cpus() takes a mutex; that connects smp_alt and
text_mutex and can cause a deadlock.

2011-06-16 18:36:44

by Peter Zijlstra

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Thu, 2011-06-16 at 20:28 +0200, Tejun Heo wrote:
> Peter, I don't think it's that simple. get_online_cpus() can't create
> a circular dependency by itself; it allows recursing. The chain
> involves cpu_hotplug_begin(), which returns with the hotplug mutex held.


Right, it's like:

mutex_lock(&a);
get_online_cpus();

vs

cpu_hotplug_begin()
mutex_lock(&a);


that will really deadlock.

2011-06-16 18:44:33

by Suresh Siddha

Subject: Re: [PATCH 1/3] stop_machine: kill __stop_machine()

On Thu, 2011-06-16 at 11:36 -0700, Peter Zijlstra wrote:
> On Thu, 2011-06-16 at 20:28 +0200, Tejun Heo wrote:
> > Peter, I don't think it's that simple. get_online_cpus() can't create
> > a circular dependency by itself; it allows recursing. The chain
> > involves cpu_hotplug_begin(), which returns with the hotplug mutex held.
>
>
> Right, its like:
>
> mutex_lock(&a);
> get_online_cpus();
>
> vs
>
> cpu_hotplug_begin()
> mutex_lock(&a);
>
>
> that will really deadlock

It's actually like:

get_online_cpus();
mutex_lock(&a);
get_online_cpus(); <--- recursive get_online_cpus()

vs

cpu_hotplug_begin()
mutex_lock(&a);

So it can't cause a deadlock in practice.

thanks,
suresh