2009-06-04 06:57:06

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

Gautham R Shenoy wrote:
> On Fri, May 29, 2009 at 06:53:42PM -0700, Paul E. McKenney wrote:
>> On Fri, May 29, 2009 at 04:29:30PM +0800, Lai Jiangshan wrote:
>>> The current get_online_cpus()/put_online_cpus() re-implement
>>> a rw_semaphore, so this fix converts them to use a real rw_semaphore.
>>> It simplifies the code and makes it easier to read.
>>>
>>> And misc fixes:
>>> 1) Add comments for cpu_hotplug.active_writer.
>>> 2) The theoretical disadvantage described in cpu_hotplug_begin()'s
>>> comments no longer exists once we use a rw_semaphore,
>>> so that part of the comments was removed.
>>>
>>> [Impact: improve get_online_cpus()/put_online_cpus() ]
>> Actually, it turns out that for my purposes it is only necessary to check:
>>
>> cpu_hotplug.active_writer != NULL
>>
>> The only time that it is unsafe to invoke get_online_cpus() is when
>> in a notifier, and in that case the value of cpu_hotplug.active_writer
>> is stable. There could be false positives, but these are harmless, as
>> the fallback is simply synchronize_sched().
>>
>> Even this is only needed should the deadlock scenario you pointed out
>> arise in practice.
>>
>> As Oleg noted, there are some "interesting" constraints on
>> get_online_cpus(). Adding Gautham Shenoy to CC for his views.
>
> So, to put it in a sentence, get_online_cpus()/put_online_cpus() is a
> read-write semaphore with read-preference while allowing writer to
> downgrade to a reader when required.
>
> Read-preference was one of the ways of allowing unsuspecting functions
> that need protection against cpu-hotplug to end up calling other
> functions that also need protection against cpu-hotplug. IOW, it allows a
> single context to call get_online_cpus() recursively without running into
> a circular deadlock. A fair reader-writer lock wouldn't allow that, since
> in the presence of a writer the recursive reads would block, thereby
> causing a deadlock.
>
> Also, around the time when this design was chosen, we had a whole bunch
> of functions which did try to take the original "cpu_hotplug_mutex"
> recursively. We could do well to use Lai's implementation if such
> functions have mended their ways since this would make it a lot simpler
> :-) . But I suspect it is easier said than done!
>
> BTW, I second the idea of try_get_online_cpus(). I had myself proposed
> this idea a year back. http://lkml.org/lkml/2008/4/29/222.
>
>
>

Add CC to Peter Zijlstra <[email protected]>

(This patch is against mainline but is not for inclusion; it will be
adapted against -mm when requested.)

Comments and review are eagerly requested.

- A lockless fast path for get_online_cpus()
- Introduce try_get_online_cpus()

---
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 2643d84..63b216c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -104,6 +104,7 @@ extern struct sysdev_class cpu_sysdev_class;

extern void get_online_cpus(void);
extern void put_online_cpus(void);
+extern int try_get_online_cpus(void);
#define hotcpu_notifier(fn, pri) { \
static struct notifier_block fn##_nb __cpuinitdata = \
{ .notifier_call = fn, .priority = pri }; \
@@ -117,6 +118,7 @@ int cpu_down(unsigned int cpu);

#define get_online_cpus() do { } while (0)
#define put_online_cpus() do { } while (0)
+static inline int try_get_online_cpus(void) { return 1; }
#define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
/* These aren't inline functions due to a GCC bug. */
#define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 395b697..54d6e0d 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -14,6 +14,8 @@
#include <linux/kthread.h>
#include <linux/stop_machine.h>
#include <linux/mutex.h>
+#include <asm/atomic.h>
+#include <linux/wait.h>

#ifdef CONFIG_SMP
/* Serializes the updates to cpu_online_mask, cpu_present_mask */
@@ -26,21 +28,26 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
*/
static int cpu_hotplug_disabled;

+/*
+ * @cpu_hotplug is a special read-write semaphore with these semantics:
+ * 1) It is read-preference and allows reader-in-reader recursion.
+ * 2) It allows the writer to downgrade to a reader when required.
+ * (allows reader-in-writer recursion.)
+ * 3) At most one thread may contend for the write-side lock.
+ * (cpu_add_remove_lock ensures this.)
+ */
static struct {
struct task_struct *active_writer;
- struct mutex lock; /* Synchronizes accesses to refcount, */
- /*
- * Also blocks the new readers during
- * an ongoing cpu hotplug operation.
- */
- int refcount;
+ wait_queue_head_t sleeping_readers;
+ /* refcount = 0 means the writer owns the lock. */
+ atomic_t refcount;
} cpu_hotplug;

void __init cpu_hotplug_init(void)
{
cpu_hotplug.active_writer = NULL;
- mutex_init(&cpu_hotplug.lock);
- cpu_hotplug.refcount = 0;
+ init_waitqueue_head(&cpu_hotplug.sleeping_readers);
+ atomic_set(&cpu_hotplug.refcount, 1);
}

#ifdef CONFIG_HOTPLUG_CPU
@@ -50,10 +57,20 @@ void get_online_cpus(void)
might_sleep();
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
- cpu_hotplug.refcount++;
- mutex_unlock(&cpu_hotplug.lock);

+ if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) {
+ DEFINE_WAIT(wait);
+
+ for (;;) {
+ prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait,
+ TASK_UNINTERRUPTIBLE);
+ if (atomic_inc_not_zero(&cpu_hotplug.refcount))
+ break;
+ schedule();
+ }
+
+ finish_wait(&cpu_hotplug.sleeping_readers, &wait);
+ }
}
EXPORT_SYMBOL_GPL(get_online_cpus);

@@ -61,14 +78,27 @@ void put_online_cpus(void)
{
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
- if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
- wake_up_process(cpu_hotplug.active_writer);
- mutex_unlock(&cpu_hotplug.lock);

+ if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) {
+ smp_mb__after_atomic_dec();
+ BUG_ON(!cpu_hotplug.active_writer);
+ wake_up_process(cpu_hotplug.active_writer);
+ }
}
EXPORT_SYMBOL_GPL(put_online_cpus);

+int try_get_online_cpus(void)
+{
+ if (cpu_hotplug.active_writer == current)
+ return 1;
+
+ if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount)))
+ return 1;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(try_get_online_cpus);
+
#endif /* CONFIG_HOTPLUG_CPU */

/*
@@ -86,46 +116,41 @@ void cpu_maps_update_done(void)
}

/*
- * This ensures that the hotplug operation can begin only when the
- * refcount goes to zero.
+ * This ensures that the hotplug operation can begin only when
+ * there is no ongoing reader.
*
* Note that during a cpu-hotplug operation, the new readers, if any,
- * will be blocked by the cpu_hotplug.lock
+ * will be blocked and queued at cpu_hotplug.sleeping_readers.
*
* Since cpu_hotplug_begin() is always called after invoking
* cpu_maps_update_begin(), we can be sure that only one writer is active.
*
- * Note that theoretically, there is a possibility of a livelock:
- * - Refcount goes to zero, last reader wakes up the sleeping
- * writer.
- * - Last reader unlocks the cpu_hotplug.lock.
- * - A new reader arrives at this moment, bumps up the refcount.
- * - The writer acquires the cpu_hotplug.lock finds the refcount
- * non zero and goes to sleep again.
- *
- * However, this is very difficult to achieve in practice since
- * get_online_cpus() not an api which is called all that often.
- *
*/
static void cpu_hotplug_begin(void)
{
cpu_hotplug.active_writer = current;
+ smp_mb__before_atomic_dec();
+ atomic_dec(&cpu_hotplug.refcount);

for (;;) {
- mutex_lock(&cpu_hotplug.lock);
- if (likely(!cpu_hotplug.refcount))
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!atomic_read(&cpu_hotplug.refcount))
break;
- __set_current_state(TASK_UNINTERRUPTIBLE);
- mutex_unlock(&cpu_hotplug.lock);
schedule();
}
+
+ __set_current_state(TASK_RUNNING);
}

static void cpu_hotplug_done(void)
{
cpu_hotplug.active_writer = NULL;
- mutex_unlock(&cpu_hotplug.lock);
+ atomic_inc(&cpu_hotplug.refcount);
+
+ if (waitqueue_active(&cpu_hotplug.sleeping_readers))
+ wake_up(&cpu_hotplug.sleeping_readers);
}
+
/* Need to know about CPUs going up/down? */
int __ref register_cpu_notifier(struct notifier_block *nb)
{


2009-06-04 20:54:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

On 06/04, Lai Jiangshan wrote:
>
> - Lockless for get_online_cpus()'s fast path
> - Introduce try_get_online_cpus()

I think this can work...

> @@ -50,10 +57,20 @@ void get_online_cpus(void)
> might_sleep();
> if (cpu_hotplug.active_writer == current)
> return;
> - mutex_lock(&cpu_hotplug.lock);
> - cpu_hotplug.refcount++;
> - mutex_unlock(&cpu_hotplug.lock);
>
> + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) {
> + DEFINE_WAIT(wait);
> +
> + for (;;) {
> + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait,
> + TASK_UNINTERRUPTIBLE);
> + if (atomic_inc_not_zero(&cpu_hotplug.refcount))
> + break;
> + schedule();
> + }
> +
> + finish_wait(&cpu_hotplug.sleeping_readers, &wait);
> + }
> }

Looks like the code above can be replaced with

wait_event(cpu_hotplug.sleeping_readers, atomic_inc_not_zero(&cpu_hotplug.refcount));

> static void cpu_hotplug_done(void)
> {
> cpu_hotplug.active_writer = NULL;
> - mutex_unlock(&cpu_hotplug.lock);
> + atomic_inc(&cpu_hotplug.refcount);
> +
> + if (waitqueue_active(&cpu_hotplug.sleeping_readers))
> + wake_up(&cpu_hotplug.sleeping_readers);
> }

This looks racy.

Suppose that the new reader comes right before the atomic_inc(). The first
inc_not_zero() fails, the reader does prepare_to_wait(), and the 2nd
inc_not_zero() fails too.

cpu_hotplug_done() does atomic_inc().

What guarantees we must see waitqueue_active() == T?

I think cpu_hotplug_done() should do unconditional wake_up(). This path
is slow anyway, "if (waitqueue_active())" does not buy too much. In this
case .sleeping_readers->lock closes the race.

Unless I missed something, of course.


Minor, but I'd suggest to use wake_up_all(). This does not make any
difference because we do not have WQ_FLAG_EXCLUSIVE waiters, but imho
looks a bit cleaner.


Hmm. It seems to me that cpu_hotplug_done() needs smp_mb__before_atomic_inc()
before the atomic_inc(). Otherwise, "active_writer = NULL" can be re-ordered
with the atomic_inc(). If the new reader does get_online_cpus() +
put_online_cpus() quickly, it can see active_writer != NULL.

Oleg.

2009-06-05 01:30:51

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

Oleg Nesterov wrote:
> On 06/04, Lai Jiangshan wrote:
>> - Lockless for get_online_cpus()'s fast path
>> - Introduce try_get_online_cpus()
>
> I think this can work...
>
>> @@ -50,10 +57,20 @@ void get_online_cpus(void)
>> might_sleep();
>> if (cpu_hotplug.active_writer == current)
>> return;
>> - mutex_lock(&cpu_hotplug.lock);
>> - cpu_hotplug.refcount++;
>> - mutex_unlock(&cpu_hotplug.lock);
>>
>> + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) {
>> + DEFINE_WAIT(wait);
>> +
>> + for (;;) {
>> + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait,
>> + TASK_UNINTERRUPTIBLE);
>> + if (atomic_inc_not_zero(&cpu_hotplug.refcount))
>> + break;
>> + schedule();
>> + }
>> +
>> + finish_wait(&cpu_hotplug.sleeping_readers, &wait);
>> + }
>> }
>
> Looks like the code above can be replaced with
>
> wait_event(cpu_hotplug.sleeping_readers, atomic_inc_not_zero(&cpu_hotplug.refcount));

You are right, but since atomic_inc_not_zero() has a side effect,
I'm afraid that wait_event() might be changed in the future in a way
that evaluates the condition again after it succeeds, incrementing
cpu_hotplug.refcount twice.

#define wait_event(wq, condition) ......

I think @condition should not have side effects; it should be
something like:

some_number == 2, !some_condition, some_thing_has_done,
......

>
>> static void cpu_hotplug_done(void)
>> {
>> cpu_hotplug.active_writer = NULL;
>> - mutex_unlock(&cpu_hotplug.lock);
>> + atomic_inc(&cpu_hotplug.refcount);
>> +
>> + if (waitqueue_active(&cpu_hotplug.sleeping_readers))
>> + wake_up(&cpu_hotplug.sleeping_readers);
>> }
>
> This looks racy.
>
> Suppose that the new reader comes right before atomic_inc(). The first
> inc_not_zero() fails, the readear does prepare_to_wait(), the 2nd
> inc_not_zero() fails too.
>
> cpu_hotplug_done() does atomic_inc().
>
> What guarantees we must see waitqueue_active() == T?
>
> I think cpu_hotplug_done() should do unconditional wake_up(). This path
> is slow anyway, "if (waitqueue_active())" does not buy too much. In this
> case .sleeping_readers->lock closes the race.
>
> Unless I missed something, of course.

You are definitely right: cpu_hotplug_done() should do an unconditional
wake_up(); waitqueue_active() contains no synchronization code.

>
>
> Minor, but I'd suggest to use wake_up_all(). This does not make any
> difference because we do not have WQ_FLAG_EXCLUSIVE waiters, but imho
> looks a bit cleaner.
>
>
> Hmm. It seems to me that cpu_hotplug_done() needs mb__before_atomic_inc()
> before atomic_inc. Otherwise, "active_writer = NULL" can be re-ordered with
> atomic_inc(). If the new reader does get_online_cpus() + put_online_cpus()
> quicky, it can see active_writer != NULL.
>
>

The lines "active_writer = NULL" and "atomic_inc()" may be reordered;
no code needs to synchronize against them.
get_online_cpus()/put_online_cpus() will see "active_writer != current",
which is exactly what get_online_cpus()/put_online_cpus() needs.

Lai

2009-06-05 02:19:24

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

On 06/05, Lai Jiangshan wrote:
>
> Oleg Nesterov wrote:
> > On 06/04, Lai Jiangshan wrote:
> >>
> >> + if (unlikely(!atomic_inc_not_zero(&cpu_hotplug.refcount))) {
> >> + DEFINE_WAIT(wait);
> >> +
> >> + for (;;) {
> >> + prepare_to_wait(&cpu_hotplug.sleeping_readers, &wait,
> >> + TASK_UNINTERRUPTIBLE);
> >> + if (atomic_inc_not_zero(&cpu_hotplug.refcount))
> >> + break;
> >> + schedule();
> >> + }
> >> +
> >> + finish_wait(&cpu_hotplug.sleeping_readers, &wait);
> >> + }
> >> }
> >
> > Looks like the code above can be replaced with
> >
> > wait_event(cpu_hotplug.sleeping_readers, atomic_inc_not_zero(&cpu_hotplug.refcount));
>
> You are right, but since atomic_inc_not_zero() has a side effect,
> I'm afraid that wait_event() might be changed in the future, and it
> may increment cpu_hotplug.refcount twice.

We already have side-effects in wait_event(), see use_module().
And wait_event(atomic_inc_not_zero()) is much easier to understand.

Personally, I think wait_event() must work correctly if "condition"
has side-effects.

But this is subjective. So, of course, please do what you like more.

> > Hmm. It seems to me that cpu_hotplug_done() needs smp_mb__before_atomic_inc()
> > before the atomic_inc(). Otherwise, "active_writer = NULL" can be re-ordered with
> > the atomic_inc(). If the new reader does get_online_cpus() + put_online_cpus()
> > quickly, it can see active_writer != NULL.
> >
>
> The lines "active_writer = NULL" and "atomic_inc()" may be reordered;
> no code needs to synchronize against them.
> get_online_cpus()/put_online_cpus() will see "active_writer != current",
> which is exactly what get_online_cpus()/put_online_cpus() needs.

I meant another problem, but I misread the patch. When I actually applied
it, I didn't see any problems with re-ordering.

This means I should find something else ;) put_online_cpus() does not
need smp_mb__after_atomic_dec(): atomic_dec_and_test() implies mb() on
both sides, and this is even documented.

Oleg.

2009-06-05 15:37:27

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

On Thu, Jun 04, 2009 at 02:58:20PM +0800, Lai Jiangshan wrote:
>
> Add CC to Peter Zijlstra <[email protected]>
>
> (This patch is against mainline but is not for inclusion; it will be
> adapted against -mm when requested.)
>
> Comments and review are eagerly requested.
>
> - Lockless for get_online_cpus()'s fast path
> - Introduce try_get_online_cpus()

One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers
always invoked from the task that did the cpu_hotplug_begin()?

If so, well and good. If not, then it would not be possible to
expedite RCU grace periods from within CPU-hotplug notifiers.

Thanx, Paul

2009-06-08 02:35:06

by Lai Jiangshan

[permalink] [raw]
Subject: Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

Paul E. McKenney wrote:
>> Add CC to Peter Zijlstra <[email protected]>
>>
>> (This patch is against mainline but is not for inclusion; it will be
>> adapted against -mm when requested.)
>>
>> Comments and review are eagerly requested.
>>
>> - Lockless for get_online_cpus()'s fast path
>> - Introduce try_get_online_cpus()
>
> One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers
> always invoked from the task that did the cpu_hotplug_begin()?
>
> If so, well and good. If not, then it would not be possible to
> expedite RCU grace periods from within CPU-hotplug notifiers.
>
> Thanx, Paul
>


I will have no suitable machine to test the next version of this patch
for several days, so it is delayed for a while.

Lai.


by Gautham R Shenoy

[permalink] [raw]
Subject: Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

Hi Paul,

On Fri, Jun 05, 2009 at 08:37:14AM -0700, Paul E. McKenney wrote:
>
> One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers
> always invoked from the task that did the cpu_hotplug_begin()?

Except for the notifiers handling two events, the rest of the notifiers
are always invoked from the task that did the cpu_hotplug_begin().

The two events are CPU_DYING which is called from the context of the
stop_machine_thread and CPU_STARTING which is called from the context of
the idle thread on the CPU that has just come up. The notifiers handling
these two events are expected to be atomic.

> If so, well and good. If not, then it would not be possible to
> expedite RCU grace periods from within CPU-hotplug notifiers.

I hope this would be good enough :-)
>
> Thanx, Paul
>

--
Thanks and Regards
gautham

2009-06-08 14:25:46

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] cpuhotplug: introduce try_get_online_cpus() take 2

On Mon, Jun 08, 2009 at 09:49:34AM +0530, Gautham R Shenoy wrote:
> Hi Paul,
>
> On Fri, Jun 05, 2009 at 08:37:14AM -0700, Paul E. McKenney wrote:
> >
> > One question for Gautham Shenoy -- are non-atomic CPU-hotplug notifiers
> > always invoked from the task that did the cpu_hotplug_begin()?
>
> Except for the notifiers handling two events, the rest of the notifiers
> are always invoked from the task that did the cpu_hotplug_begin().
>
> The two events are CPU_DYING which is called from the context of the
> stop_machine_thread and CPU_STARTING which is called from the context of
> the idle thread on the CPU that has just come up. The notifiers handling
> these two events are expected to be atomic.
>
> > If so, well and good. If not, then it would not be possible to
> > expedite RCU grace periods from within CPU-hotplug notifiers.
>
> I hope this would be good enough :-)

Works for me!! ;-)

Thanx, Paul

2009-06-09 12:06:00

by Lai Jiangshan

[permalink] [raw]
Subject: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3

This is for the -mm tree.

It also works for mainline if you first apply:
http://lkml.org/lkml/2009/2/17/58

Subject: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3

get_online_cpus() is a typical coarse-grained lock, and it is a
source of ABBA deadlocks.

Because of the CPU notifiers, some subsystems' global locks are
acquired after cpu_hotplug.lock. Subsystem global locks are
coarse-grained too, so a lot of locks in the kernel would have to be
acquired after cpu_hotplug.lock (if we need cpu_hotplug.lock held
as well).

Otherwise we may run into an ABBA deadlock like this:

thread 1                                      | thread 2
_cpu_down()                                   | lock a-kernel-lock
  cpu_hotplug_begin()                         |
    down_write(&cpu_hotplug.lock)             |
  __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
----------------------------------------------------------------------
  lock a-kernel-lock (waits for thread 2)     | down_read(&cpu_hotplug.lock)
                                              | (waits for thread 1)

But CPU online/offline happens very rarely, so in all probability
get_online_cpus() returns quickly with success. It is therefore
unreasonable that get_online_cpus() may not be called after we have
taken "a-kernel-lock".

To dispel the ABBA deadlock, this patch introduces
try_get_online_cpus(). It fails only very rarely, and it gives the
caller a chance to choose an alternative way to finish its work
instead of sleeping or deadlocking.

Changed from V1:
Lockless fast path for get_online_cpus().

Changed from V2:
Fixed the patch according to Oleg Nesterov's valuable suggestions.

Suggested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4d668e0..eeb9ca5 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -99,6 +99,7 @@ extern struct sysdev_class cpu_sysdev_class;

extern void get_online_cpus(void);
extern void put_online_cpus(void);
+extern int try_get_online_cpus(void);
#define hotcpu_notifier(fn, pri) { \
static struct notifier_block fn##_nb __cpuinitdata = \
{ .notifier_call = fn, .priority = pri }; \
@@ -112,6 +113,7 @@ int cpu_down(unsigned int cpu);

#define get_online_cpus() do { } while (0)
#define put_online_cpus() do { } while (0)
+static inline int try_get_online_cpus(void) { return 1; }
#define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
/* These aren't inline functions due to a GCC bug. */
#define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 609fae1..afecc95 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -14,6 +14,8 @@
#include <linux/kthread.h>
#include <linux/stop_machine.h>
#include <linux/mutex.h>
+#include <asm/atomic.h>
+#include <linux/wait.h>

#ifdef CONFIG_SMP
/* Serializes the updates to cpu_online_mask, cpu_present_mask */
@@ -26,16 +28,23 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
*/
static int cpu_hotplug_disabled;

+/*
+ * @cpu_hotplug is a special read-write semaphore with these semantics:
+ * 1) It is read-preference and allows reader-in-reader recursion.
+ * 2) It allows the writer to downgrade to a reader when required.
+ * (allows reader-in-writer recursion.)
+ * 3) At most one thread may contend for the write-side lock.
+ * (cpu_add_remove_lock ensures this.)
+ */
static struct {
struct task_struct *active_writer;
- struct mutex lock; /* Synchronizes accesses to refcount, */
- /*
- * Also blocks the new readers during
- * an ongoing cpu hotplug operation.
- */
- int refcount;
+ wait_queue_head_t sleeping_readers;
+ /* refcount = 0 means the writer owns the lock. */
+ atomic_t refcount;
} cpu_hotplug = {
- .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
+ NULL,
+ __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.sleeping_readers),
+ ATOMIC_INIT(1),
};

#ifdef CONFIG_HOTPLUG_CPU
@@ -45,10 +54,9 @@ void get_online_cpus(void)
might_sleep();
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
- cpu_hotplug.refcount++;
- mutex_unlock(&cpu_hotplug.lock);

+ wait_event(cpu_hotplug.sleeping_readers,
+ likely(atomic_inc_not_zero(&cpu_hotplug.refcount)));
}
EXPORT_SYMBOL_GPL(get_online_cpus);

@@ -56,14 +64,27 @@ void put_online_cpus(void)
{
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
- if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
- wake_up_process(cpu_hotplug.active_writer);
- mutex_unlock(&cpu_hotplug.lock);

+ if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) {
+ /* atomic_dec_and_test() implies smp_mb__after_atomic_dec() */
+ BUG_ON(!cpu_hotplug.active_writer);
+ wake_up_process(cpu_hotplug.active_writer);
+ }
}
EXPORT_SYMBOL_GPL(put_online_cpus);

+int try_get_online_cpus(void)
+{
+ if (cpu_hotplug.active_writer == current)
+ return 1;
+
+ if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount)))
+ return 1;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(try_get_online_cpus);
+
#endif /* CONFIG_HOTPLUG_CPU */

/*
@@ -81,46 +102,42 @@ void cpu_maps_update_done(void)
}

/*
- * This ensures that the hotplug operation can begin only when the
- * refcount goes to zero.
+ * This ensures that the hotplug operation can begin only when
+ * there is no ongoing reader.
*
* Note that during a cpu-hotplug operation, the new readers, if any,
- * will be blocked by the cpu_hotplug.lock
+ * will be blocked and queued at cpu_hotplug.sleeping_readers.
*
* Since cpu_hotplug_begin() is always called after invoking
* cpu_maps_update_begin(), we can be sure that only one writer is active.
*
- * Note that theoretically, there is a possibility of a livelock:
- * - Refcount goes to zero, last reader wakes up the sleeping
- * writer.
- * - Last reader unlocks the cpu_hotplug.lock.
- * - A new reader arrives at this moment, bumps up the refcount.
- * - The writer acquires the cpu_hotplug.lock finds the refcount
- * non zero and goes to sleep again.
- *
- * However, this is very difficult to achieve in practice since
- * get_online_cpus() not an api which is called all that often.
- *
*/
static void cpu_hotplug_begin(void)
{
cpu_hotplug.active_writer = current;

+ /* atomic_dec_and_test() implies smp_mb__before_atomic_dec() */
+ if (atomic_dec_and_test(&cpu_hotplug.refcount))
+ return;
+
for (;;) {
- mutex_lock(&cpu_hotplug.lock);
- if (likely(!cpu_hotplug.refcount))
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!atomic_read(&cpu_hotplug.refcount))
break;
- __set_current_state(TASK_UNINTERRUPTIBLE);
- mutex_unlock(&cpu_hotplug.lock);
schedule();
}
+
+ __set_current_state(TASK_RUNNING);
}

static void cpu_hotplug_done(void)
{
+ atomic_inc(&cpu_hotplug.refcount);
+ wake_up_all(&cpu_hotplug.sleeping_readers);
+
cpu_hotplug.active_writer = NULL;
- mutex_unlock(&cpu_hotplug.lock);
}
+
/* Need to know about CPUs going up/down? */
int __ref register_cpu_notifier(struct notifier_block *nb)
{

2009-06-09 19:35:46

by Andrew Morton

Subject: Re: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3

On Tue, 09 Jun 2009 20:07:09 +0800
Lai Jiangshan <[email protected]> wrote:

> get_online_cpus() is a typically coarsely granular lock.
> It's a source of ABBA deadlock.
>
> Thanks to the CPU notifiers, Some subsystem's global lock will
> be required after cpu_hotplug.lock. Subsystem's global lock
> is coarsely granular lock too, thus a lot's of lock in kernel
> should be required after cpu_hotplug.lock(if we need
> cpu_hotplug.lock held too)
>
> Otherwise it may come to a ABBA deadlock like this:
>
> thread 1 | thread 2
> _cpu_down() | Lock a-kernel-lock.
> cpu_hotplug_begin() |
> down_write(&cpu_hotplug.lock) |
> __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
> ------------------------------------------------------------------------
> Lock a-kernel-lock.(wait thread2) | down_read(&cpu_hotplug.lock)
> (wait thread 1)

Confused. cpu_hotplug_begin() doesn't do
down_write(&cpu_hotplug.lock). If it _were_ to do that then yes, we'd
be vulnerable to the above deadlock.

2009-06-09 23:48:08

by Paul E. McKenney

Subject: Re: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3

On Tue, Jun 09, 2009 at 12:34:38PM -0700, Andrew Morton wrote:
> On Tue, 09 Jun 2009 20:07:09 +0800
> Lai Jiangshan <[email protected]> wrote:
>
> > get_online_cpus() is a typically coarsely granular lock.
> > It's a source of ABBA deadlock.
> >
> > Thanks to the CPU notifiers, Some subsystem's global lock will
> > be required after cpu_hotplug.lock. Subsystem's global lock
> > is coarsely granular lock too, thus a lot's of lock in kernel
> > should be required after cpu_hotplug.lock(if we need
> > cpu_hotplug.lock held too)
> >
> > Otherwise it may come to a ABBA deadlock like this:
> >
> > thread 1 | thread 2
> > _cpu_down() | Lock a-kernel-lock.
> > cpu_hotplug_begin() |
> > down_write(&cpu_hotplug.lock) |
> > __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
> > ------------------------------------------------------------------------
> > Lock a-kernel-lock.(wait thread2) | down_read(&cpu_hotplug.lock)
> > (wait thread 1)
>
> Confused. cpu_hotplug_begin() doesn't do
> down_write(&cpu_hotplug.lock). If it _were_ to do that then yes, we'd
> be vulnerable to the above deadlock.

The current implementation is a bit more complex. If you hold a kernel
mutex across get_online_cpus() and also acquire that same kernel mutex
in a hotplug notifier that permits sleeping, I believe that you really
can get a deadlock as follows:

Task 1 | Task 2
| mutex_lock(&mylock);
cpu_hotplug_begin() |
mutex_lock(&cpu_hotplug.lock); |
[assume cpu_hotplug.refcount == 0] | get_online_cpus()
---------------------------------------------------------------------------
mutex_lock(&mylock); | mutex_lock(&cpu_hotplug.lock);


That said, when I look at the raw_notifier_call_chain() and
unregister_cpu_notifier() code paths, it is not obvious to me that they
exclude each other or otherwise protect the cpu_chain list...

Thanx, Paul

2009-06-10 00:56:49

by Lai Jiangshan

Subject: Re: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3

Andrew Morton wrote:
> On Tue, 09 Jun 2009 20:07:09 +0800
> Lai Jiangshan <[email protected]> wrote:
>
>> get_online_cpus() is a typically coarsely granular lock.
>> It's a source of ABBA deadlock.
>>
>> Thanks to the CPU notifiers, Some subsystem's global lock will
>> be required after cpu_hotplug.lock. Subsystem's global lock
>> is coarsely granular lock too, thus a lot's of lock in kernel
>> should be required after cpu_hotplug.lock(if we need
>> cpu_hotplug.lock held too)
>>
>> Otherwise it may come to a ABBA deadlock like this:
>>
>> thread 1 | thread 2
>> _cpu_down() | Lock a-kernel-lock.
>> cpu_hotplug_begin() |
>> down_write(&cpu_hotplug.lock) |
>> __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
>> ------------------------------------------------------------------------
>> Lock a-kernel-lock.(wait thread2) | down_read(&cpu_hotplug.lock)
>> (wait thread 1)
>
> Confused. cpu_hotplug_begin() doesn't do
> down_write(&cpu_hotplug.lock). If it _were_ to do that then yes, we'd
> be vulnerable to the above deadlock.
>

Ouch, this changelog was carried over from V1, but it was not
updated correctly. I apologize.

Lai.

2009-06-10 01:12:35

by Lai Jiangshan

Subject: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3

It's for -mm tree.

It also works for mainline if you apply this at first:
http://lkml.org/lkml/2009/2/17/58

Subject: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3

get_online_cpus() is a typical coarse-grained lock.
It's a source of ABBA (or longer ABBCCA...) deadlocks.

Because of the CPU notifiers, some subsystem's global lock may have
to be acquired after cpu_hotplug.lock. A subsystem's global lock is
a coarse-grained lock too, so many locks in the kernel would have to
be acquired after cpu_hotplug.lock (if we need cpu_hotplug.lock held
as well).

Otherwise we may run into an ABBA deadlock like this:

thread 1 | thread 2
_cpu_down() | Lock a-kernel-lock.
cpu_hotplug_begin() |
mutex_lock(&cpu_hotplug.lock) |
__raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
------------------------------------------------------------------------
Lock a-kernel-lock.(wait thread2) | mutex_lock(&cpu_hotplug.lock)
(wait thread 1)

But CPU online/offline happens very rarely, so get_online_cpus()
almost always returns quickly. It is therefore asinine that
get_online_cpus() may not be called after we have taken
"a-kernel-lock".

To dispel the ABBA deadlock, this patch introduces
try_get_online_cpus(). It fails only very rarely, and gives the
caller a chance to choose an alternative way to finish its work,
instead of sleeping or deadlocking.

Changed from V1:
Lockless fast path for get_online_cpus().

Changed from V2:
Fixed the patch per Oleg Nesterov's valuable suggestions.

Suggested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
---
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 4d668e0..eeb9ca5 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -99,6 +99,7 @@ extern struct sysdev_class cpu_sysdev_class;

extern void get_online_cpus(void);
extern void put_online_cpus(void);
+extern int try_get_online_cpus(void);
#define hotcpu_notifier(fn, pri) { \
static struct notifier_block fn##_nb __cpuinitdata = \
{ .notifier_call = fn, .priority = pri }; \
@@ -112,6 +113,7 @@ int cpu_down(unsigned int cpu);

#define get_online_cpus() do { } while (0)
#define put_online_cpus() do { } while (0)
+static inline int try_get_online_cpus(void) { return 1; }
#define hotcpu_notifier(fn, pri) do { (void)(fn); } while (0)
/* These aren't inline functions due to a GCC bug. */
#define register_hotcpu_notifier(nb) ({ (void)(nb); 0; })
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 609fae1..afecc95 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -14,6 +14,8 @@
#include <linux/kthread.h>
#include <linux/stop_machine.h>
#include <linux/mutex.h>
+#include <asm/atomic.h>
+#include <linux/wait.h>

#ifdef CONFIG_SMP
/* Serializes the updates to cpu_online_mask, cpu_present_mask */
@@ -26,16 +28,23 @@ static __cpuinitdata RAW_NOTIFIER_HEAD(cpu_chain);
*/
static int cpu_hotplug_disabled;

+/*
+ * @cpu_hotplug is a special read-write semaphore with these semantics:
+ * 1) It is read-preference and allows reader-in-reader recursion.
+ * 2) It allows the writer to downgrade to a reader when required
+ * (allowing reader-in-writer recursion).
+ * 3) At most one thread may contend for the write-side lock
+ * (cpu_add_remove_lock ensures this).
+ */
static struct {
struct task_struct *active_writer;
- struct mutex lock; /* Synchronizes accesses to refcount, */
- /*
- * Also blocks the new readers during
- * an ongoing cpu hotplug operation.
- */
- int refcount;
+ wait_queue_head_t sleeping_readers;
+ /* refcount = 0 means the writer owns the lock. */
+ atomic_t refcount;
} cpu_hotplug = {
- .lock = __MUTEX_INITIALIZER(cpu_hotplug.lock),
+ NULL,
+ __WAIT_QUEUE_HEAD_INITIALIZER(cpu_hotplug.sleeping_readers),
+ ATOMIC_INIT(1),
};

#ifdef CONFIG_HOTPLUG_CPU
@@ -45,10 +54,9 @@ void get_online_cpus(void)
might_sleep();
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
- cpu_hotplug.refcount++;
- mutex_unlock(&cpu_hotplug.lock);

+ wait_event(cpu_hotplug.sleeping_readers,
+ likely(atomic_inc_not_zero(&cpu_hotplug.refcount)));
}
EXPORT_SYMBOL_GPL(get_online_cpus);

@@ -56,14 +64,27 @@ void put_online_cpus(void)
{
if (cpu_hotplug.active_writer == current)
return;
- mutex_lock(&cpu_hotplug.lock);
- if (!--cpu_hotplug.refcount && unlikely(cpu_hotplug.active_writer))
- wake_up_process(cpu_hotplug.active_writer);
- mutex_unlock(&cpu_hotplug.lock);

+ if (unlikely(atomic_dec_and_test(&cpu_hotplug.refcount))) {
+ /* atomic_dec_and_test() implies smp_mb__after_atomic_dec() */
+ BUG_ON(!cpu_hotplug.active_writer);
+ wake_up_process(cpu_hotplug.active_writer);
+ }
}
EXPORT_SYMBOL_GPL(put_online_cpus);

+int try_get_online_cpus(void)
+{
+ if (cpu_hotplug.active_writer == current)
+ return 1;
+
+ if (likely(atomic_inc_not_zero(&cpu_hotplug.refcount)))
+ return 1;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(try_get_online_cpus);
+
#endif /* CONFIG_HOTPLUG_CPU */

/*
@@ -81,46 +102,42 @@ void cpu_maps_update_done(void)
}

/*
- * This ensures that the hotplug operation can begin only when the
- * refcount goes to zero.
+ * This ensures that the hotplug operation can begin only when
+ * there is no ongoing reader.
*
* Note that during a cpu-hotplug operation, the new readers, if any,
- * will be blocked by the cpu_hotplug.lock
+ * will be blocked and queued at cpu_hotplug.sleeping_readers.
*
* Since cpu_hotplug_begin() is always called after invoking
* cpu_maps_update_begin(), we can be sure that only one writer is active.
*
- * Note that theoretically, there is a possibility of a livelock:
- * - Refcount goes to zero, last reader wakes up the sleeping
- * writer.
- * - Last reader unlocks the cpu_hotplug.lock.
- * - A new reader arrives at this moment, bumps up the refcount.
- * - The writer acquires the cpu_hotplug.lock finds the refcount
- * non zero and goes to sleep again.
- *
- * However, this is very difficult to achieve in practice since
- * get_online_cpus() not an api which is called all that often.
- *
*/
static void cpu_hotplug_begin(void)
{
cpu_hotplug.active_writer = current;

+ /* atomic_dec_and_test() implies smp_mb__before_atomic_dec() */
+ if (atomic_dec_and_test(&cpu_hotplug.refcount))
+ return;
+
for (;;) {
- mutex_lock(&cpu_hotplug.lock);
- if (likely(!cpu_hotplug.refcount))
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ if (!atomic_read(&cpu_hotplug.refcount))
break;
- __set_current_state(TASK_UNINTERRUPTIBLE);
- mutex_unlock(&cpu_hotplug.lock);
schedule();
}
+
+ __set_current_state(TASK_RUNNING);
}

static void cpu_hotplug_done(void)
{
+ atomic_inc(&cpu_hotplug.refcount);
+ wake_up_all(&cpu_hotplug.sleeping_readers);
+
cpu_hotplug.active_writer = NULL;
- mutex_unlock(&cpu_hotplug.lock);
}
+
/* Need to know about CPUs going up/down? */
int __ref register_cpu_notifier(struct notifier_block *nb)
{

2009-06-10 01:43:36

by Andrew Morton

Subject: Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3

On Wed, 10 Jun 2009 09:13:58 +0800 Lai Jiangshan <[email protected]> wrote:

> It's for -mm tree.
>
> It also works for mainline if you apply this at first:
> http://lkml.org/lkml/2009/2/17/58
>
> Subject: [PATCH -mm] cpuhotplug: introduce try_get_online_cpus() take 3
>
> get_online_cpus() is a typically coarsely granular lock.
> It's a source of ABBA or ABBCCA... deadlock.
>
> Thanks to the CPU notifiers, Some subsystem's global lock will
> be required after cpu_hotplug.lock. Subsystem's global lock
> is coarsely granular lock too, thus a lot's of lock in kernel
> should be required after cpu_hotplug.lock(if we need
> cpu_hotplug.lock held too)
>
> Otherwise it may come to a ABBA deadlock like this:
>
> thread 1 | thread 2
> _cpu_down() | Lock a-kernel-lock.
> cpu_hotplug_begin() |
> mutex_lock(&cpu_hotplug.lock) |
> __raw_notifier_call_chain(CPU_DOWN_PREPARE) | get_online_cpus()
> ------------------------------------------------------------------------
> Lock a-kernel-lock.(wait thread2) | mutex_lock(&cpu_hotplug.lock)
> (wait thread 1)

uh, OK.

> But CPU online/offline are happened very rarely, get_online_cpus()
> returns success quickly in all probability.
> So it's an asinine behavior that get_online_cpus() is not allowed
> to be required after we had held "a-kernel-lock".
>
> To dispel the ABBA deadlock, this patch introduces
> try_get_online_cpus(). It returns fail very rarely. It gives the
> caller a chance to select an alternative way to finish works,
> instead of sleeping or deadlock.

I still think we should really avoid having to do this. trylocks are
nasty things.

Looking at the above, one would think that a correct fix would be to fix
the bug in "thread 2": take the locks in the correct order? As
try_get_online_cpus() doesn't actually have any callers, it's hard to
take that thought any further.

2009-06-11 08:40:31

by Lai Jiangshan

Subject: Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3

Andrew Morton wrote:
>
> I still think we should really avoid having to do this. trylocks are
> nasty things.
>
> Looking at the above, one would think that a correct fix would be to fix
> the bug in "thread 2": take the locks in the correct order? As
> try_get_online_cpus() doesn't actually have any callers, it's hard to
> take that thought any further.
>
>

Sometimes we cannot change the order in which the locks are taken.
try_get_online_cpus() is indeed needless as long as no one uses it.

Paul's expedited RCU V7 may need it:
http://lkml.org/lkml/2009/5/22/332

So this patch can be dropped if Paul does not use it.
That's totally OK with me.

Lai

2009-06-11 18:50:31

by Paul E. McKenney

Subject: Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3

On Thu, Jun 11, 2009 at 04:41:42PM +0800, Lai Jiangshan wrote:
> Andrew Morton wrote:
> >
> > I still think we should really avoid having to do this. trylocks are
> > nasty things.
> >
> > Looking at the above, one would think that a correct fix would be to fix
> > the bug in "thread 2": take the locks in the correct order? As
> > try_get_online_cpus() doesn't actually have any callers, it's hard to
> > take that thought any further.
>
> Sometimes, we can not reorder the locks' order.
> try_get_online_cpus() is really needless when no one uses it.
>
> Paul's expedited RCU V7 may need it:
> http://lkml.org/lkml/2009/5/22/332
>
> So this patch can be omitted when Paul does not use it.
> It's totally OK for me.

Although my patch does not need it in and of itself, if someone were
to hold a kernel mutex across synchronize_sched_expedited(), and also
acquire that same kernel mutex in a hotplug notifier, the deadlock that
Lai calls out would occur.

Even if no one uses synchronize_sched_expedited() in this manner, I feel
that it is good to explore the possibility of dealing with it. As
Andrew Morton pointed out, CPU-hotplug locking is touchy, so on-the-fly
fixes are to be avoided if possible.

Thanx, Paul

Subject: Re: [PATCH -mm resend] cpuhotplug: introduce try_get_online_cpus() take 3

On Thu, Jun 11, 2009 at 11:50:15AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 11, 2009 at 04:41:42PM +0800, Lai Jiangshan wrote:
> > Andrew Morton wrote:
> > >
> > > I still think we should really avoid having to do this. trylocks are
> > > nasty things.
> > >
> > > Looking at the above, one would think that a correct fix would be to fix
> > > the bug in "thread 2": take the locks in the correct order? As
> > > try_get_online_cpus() doesn't actually have any callers, it's hard to
> > > take that thought any further.
> >
> > Sometimes, we can not reorder the locks' order.
> > try_get_online_cpus() is really needless when no one uses it.
> >
> > Paul's expedited RCU V7 may need it:
> > http://lkml.org/lkml/2009/5/22/332
> >
> > So this patch can be omitted when Paul does not use it.
> > It's totally OK for me.
>
> Although my patch does not need it in and of itself, if someone were
> to hold a kernel mutex across synchronize_sched_expedited(), and also
> acquire that same kernel mutex in a hotplug notifier, the deadlock that
> Lai calls out would occur.
>
> Even if no one uses synchronize_sched_expedited() in this manner, I feel
> that it is good to explore the possibility of dealing with it. As
> Andrew Morton pointed out, CPU-hotplug locking is touchy, so on-the-fly
> fixes are to be avoided if possible.

Agreed. Though I like the atomic refcount version of
get_online_cpus()/put_online_cpus() that Lai has proposed.

Anyway, to recall the need for try_get_online_cpus() when it was
proposed last year: it was to be used in worker-thread context.

Back then we could not call get_online_cpus() from worker-thread
context, for fear of the following deadlock during a cpu-hotplug
operation:

Thread 1:(cpu_offline) | Thread 2 ( worker_thread)
-----------------------------------------------------------------------
cpu_hotplug_begin(); |
. |
. | get_online_cpus(); /*Blocks */
. |
. |
CPU_DEAD: |
workqueue_cpu_callback(); |
cleanup_workqueue_thread() |
/* Waits for worker thread
* to finish.
* Hence a deadlock.
*/

This was fixed by introducing the CPU_POST_DEAD event, whose
notification is sent after cpu_hotplug_done(), i.e. outside the
hotplug lock.

>
> Thanx, Paul

--
Thanks and Regards
gautham