2008-10-23 00:59:45

by Rusty Russell

Subject: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.


Several places in the kernel do the following:

saved_mask = current->cpus_allowed;
set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
somefunc();
/* restore the previous state */
set_cpus_allowed_ptr(current, &saved_mask);

This is bad, because a process's cpumask is observable and
manipulable by userspace and should not be toyed with.

We have the infrastructure, this just creates a nice wrapper to
encourage its use.

Signed-off-by: Rusty Russell <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
include/linux/workqueue.h | 8 +++++++
kernel/workqueue.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)

diff -r b72e0cbdd249 include/linux/workqueue.h
--- a/include/linux/workqueue.h Thu Oct 23 00:39:36 2008 +1100
+++ b/include/linux/workqueue.h Thu Oct 23 10:53:51 2008 +1100
@@ -240,4 +240,12 @@ void cancel_rearming_delayed_work(struct
cancel_delayed_work_sync(work);
}

+#ifndef CONFIG_SMP
+static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+{
+ return fn(arg);
+}
+#else
+long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
+#endif /* CONFIG_SMP */
#endif
diff -r b72e0cbdd249 kernel/workqueue.c
--- a/kernel/workqueue.c Thu Oct 23 00:39:36 2008 +1100
+++ b/kernel/workqueue.c Thu Oct 23 10:53:51 2008 +1100
@@ -970,6 +970,54 @@ undo:
return ret;
}

+#ifdef CONFIG_SMP
+struct work_for_cpu {
+ struct work_struct work;
+ long (*fn)(void *);
+ void *arg;
+ long ret;
+ struct completion done;
+};
+
+static void do_work_for_cpu(struct work_struct *w)
+{
+ struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
+
+ wfc->ret = wfc->fn(wfc->arg);
+ complete(&wfc->done);
+}
+
+/**
+ * work_on_cpu - run a function in user context on a particular cpu
+ * @cpu: the cpu to run on
+ * @fn: the function to run
+ * @arg: the function arg
+ *
+ * This will return -EINVAL if the cpu is not online, or the return value
+ * of @fn otherwise.
+ */
+long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+{
+ struct work_for_cpu wfc;
+
+ INIT_WORK(&wfc.work, do_work_for_cpu);
+ init_completion(&wfc.done);
+ wfc.fn = fn;
+ wfc.arg = arg;
+ get_online_cpus();
+ if (unlikely(!cpu_online(cpu))) {
+ wfc.ret = -EINVAL;
+ complete(&wfc.done);
+ } else
+ schedule_work_on(cpu, &wfc.work);
+ put_online_cpus();
+ wait_for_completion(&wfc.done);
+
+ return wfc.ret;
+}
+EXPORT_SYMBOL_GPL(work_on_cpu);
+#endif /* CONFIG_SMP */
+
void __init init_workqueues(void)
{
cpu_populated_map = cpu_online_map;
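
For illustration, a minimal before/after sketch of converting one of those callers (somefunc_on_cpu is a hypothetical wrapper added for this example; the pr->id context is from the commit message above):

	/* work_on_cpu() wants a long (*)(void *), so wrap somefunc(). */
	static long somefunc_on_cpu(void *arg)
	{
		somefunc();	/* now guaranteed to run on the chosen cpu */
		return 0;
	}

	/* Before: rewrites current's cpumask, which userspace can see. */
	saved_mask = current->cpus_allowed;
	set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
	somefunc();
	set_cpus_allowed_ptr(current, &saved_mask);

	/* After: runs in keventd context on pr->id; current's cpumask
	 * is never touched. */
	ret = work_on_cpu(pr->id, somefunc_on_cpu, NULL);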


Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Thu, Oct 23, 2008 at 01:01:28AM +0000, Rusty Russell wrote:
>
> Several places in the kernel do the following:
>
> saved_mask = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
> somefunc();
> /* restore the previous state */
> set_cpus_allowed_ptr(current, &saved_mask);
>
> This is bad, because a process's cpumask is observable and
> manipulable by userspace and should not be toyed with.
>
> We have the infrastructure, this just creates a nice wrapper to
> encourage its use.
>
> Signed-off-by: Rusty Russell <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> ---
> include/linux/workqueue.h | 8 +++++++
> kernel/workqueue.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 56 insertions(+)
>
> diff -r b72e0cbdd249 include/linux/workqueue.h
> --- a/include/linux/workqueue.h Thu Oct 23 00:39:36 2008 +1100
> +++ b/include/linux/workqueue.h Thu Oct 23 10:53:51 2008 +1100
> @@ -240,4 +240,12 @@ void cancel_rearming_delayed_work(struct
> cancel_delayed_work_sync(work);
>

<snip>
> + */
> +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> + struct work_for_cpu wfc;
> +
> + INIT_WORK(&wfc.work, do_work_for_cpu);
> + init_completion(&wfc.done);
> + wfc.fn = fn;
> + wfc.arg = arg;
> + get_online_cpus();
> + if (unlikely(!cpu_online(cpu))) {
> + wfc.ret = -EINVAL;
> + complete(&wfc.done);
> + } else
> + schedule_work_on(cpu, &wfc.work);

Won't this cause priority inversion in case of real-time tasks ?

> + put_online_cpus();
> + wait_for_completion(&wfc.done);
> +
> + return wfc.ret;
> +}
> +EXPORT_SYMBOL_GPL(work_on_cpu);
> +#endif /* CONFIG_SMP */
> +
> void __init init_workqueues(void)
> {
> cpu_populated_map = cpu_online_map;
>

--
Thanks and Regards
gautham

2008-10-23 09:40:15

by Oleg Nesterov

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On 10/23, Rusty Russell wrote:
>
> +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> +{
> + struct work_for_cpu wfc;
> +
> + INIT_WORK(&wfc.work, do_work_for_cpu);
> + init_completion(&wfc.done);
> + wfc.fn = fn;
> + wfc.arg = arg;
> + get_online_cpus();
> + if (unlikely(!cpu_online(cpu))) {
> + wfc.ret = -EINVAL;
> + complete(&wfc.done);
> + } else
> + schedule_work_on(cpu, &wfc.work);

I do not claim this is wrong, but imho the code is a bit misleading and
needs a comment (or the "fix", please see below).

Once we drop cpu_hotplug lock, CPU can go away and this work can migrate
to another cpu.

> + put_online_cpus();
> + wait_for_completion(&wfc.done);

Actually you don't need work_for_cpu->done, you can use flush_work().

IOW, I'd suggest

long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
{
struct work_for_cpu wfc;

INIT_WORK(&wfc.work, do_work_for_cpu);
wfc.fn = fn;
wfc.arg = arg;
wfc.ret = -EINVAL;

get_online_cpus();
if (likely(cpu_online(cpu))) {
schedule_work_on(cpu, &wfc.work);
flush_work(&wfc.work);
}
put_online_cpus();

return wfc.ret;
}

Oleg.

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Thu, Oct 23, 2008 at 11:40:36AM +0200, Oleg Nesterov wrote:
> On 10/23, Rusty Russell wrote:
> >
> > +long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> > +{
> > + struct work_for_cpu wfc;
> > +
> > + INIT_WORK(&wfc.work, do_work_for_cpu);
> > + init_completion(&wfc.done);
> > + wfc.fn = fn;
> > + wfc.arg = arg;
> > + get_online_cpus();
> > + if (unlikely(!cpu_online(cpu))) {
> > + wfc.ret = -EINVAL;
> > + complete(&wfc.done);
> > + } else
> > + schedule_work_on(cpu, &wfc.work);
>
> I do not claim this is wrong, but imho the code is a bit misleading and
> needs a comment (or the "fix", please see below).
>
> Once we drop cpu_hotplug lock, CPU can go away and this work can migrate
> to another cpu.

True.

>
> > + put_online_cpus();
> > + wait_for_completion(&wfc.done);
>
> Actually you don't need work_for_cpu->done, you can use flush_work().
>
> IOW, I'd suggest
>
> long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> {
> struct work_for_cpu wfc;
>
> INIT_WORK(&wfc.work, do_work_for_cpu);
> wfc.fn = fn;
> wfc.arg = arg;
> wfc.ret = -EINVAL;
>
> get_online_cpus();
> if (likely(cpu_online(cpu))) {
> schedule_work_on(cpu, &wfc.work);
> flush_work(&wfc.work);
> }

OK, how about doing the following? That will solve the problem
of deadlock you pointed out in patch 6.

get_online_cpus();
if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
schedule_work_on(cpu, &wfc.work);
flush_work(&wfc.work);
} else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
/*
* We're the CPU-Hotplug thread. Call the
* function synchronously so that we don't
* deadlock with any pending work-item blocked
* on get_online_cpus()
*/
cpumask_t original_mask = current->cpus_allowed;
set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
wfc.ret = fn(arg);
set_cpus_allowed_ptr(current, &original_mask);

}
> put_online_cpus();
>
> return wfc.ret;
> }
>
> Oleg.
>

--
Thanks and Regards
gautham

2008-10-23 15:10:47

by Rusty Russell

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Thursday 23 October 2008 20:40:36 Oleg Nesterov wrote:
> Once we drop cpu_hotplug lock, CPU can go away and this work can migrate
> to another cpu.

Thanks Oleg, you're right: it's wrong. I'll use your suggestion; it's simpler.

Cheers,
Rusty.

work_on_cpu: helper for doing task on a CPU.

Several places in the kernel do the following:

saved_mask = current->cpus_allowed;
set_cpus_allowed_ptr(current, &cpumask_of_cpu(pr->id));
somefunc();
/* restore the previous state */
set_cpus_allowed_ptr(current, &saved_mask);

This is bad, because a process's cpumask is observable and
manipulable by userspace and should not be toyed with.

We have the infrastructure, this just creates a nice wrapper to
encourage its use.

Signed-off-by: Rusty Russell <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
include/linux/workqueue.h | 8 ++++++++
kernel/workqueue.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 53 insertions(+)

diff -r 494880dd124b include/linux/workqueue.h
--- a/include/linux/workqueue.h Fri Oct 24 00:29:43 2008 +1100
+++ b/include/linux/workqueue.h Fri Oct 24 01:16:52 2008 +1100
@@ -240,4 +240,12 @@ void cancel_rearming_delayed_work(struct
cancel_delayed_work_sync(work);
}

+#ifndef CONFIG_SMP
+static inline long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+{
+ return fn(arg);
+}
+#else
+long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg);
+#endif /* CONFIG_SMP */
#endif
diff -r 494880dd124b kernel/workqueue.c
--- a/kernel/workqueue.c Fri Oct 24 00:29:43 2008 +1100
+++ b/kernel/workqueue.c Fri Oct 24 01:16:52 2008 +1100
@@ -970,6 +970,51 @@ undo:
return ret;
}

+#ifdef CONFIG_SMP
+struct work_for_cpu {
+ struct work_struct work;
+ long (*fn)(void *);
+ void *arg;
+ long ret;
+};
+
+static void do_work_for_cpu(struct work_struct *w)
+{
+ struct work_for_cpu *wfc = container_of(w, struct work_for_cpu, work);
+
+ wfc->ret = wfc->fn(wfc->arg);
+}
+
+/**
+ * work_on_cpu - run a function in user context on a particular cpu
+ * @cpu: the cpu to run on
+ * @fn: the function to run
+ * @arg: the function arg
+ *
+ * This will return -EINVAL if the cpu is not online, or the return value
+ * of @fn otherwise.
+ */
+long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
+{
+ struct work_for_cpu wfc;
+
+ INIT_WORK(&wfc.work, do_work_for_cpu);
+ wfc.fn = fn;
+ wfc.arg = arg;
+ get_online_cpus();
+ if (unlikely(!cpu_online(cpu)))
+ wfc.ret = -EINVAL;
+ else {
+ schedule_work_on(cpu, &wfc.work);
+ flush_work(&wfc.work);
+ }
+ put_online_cpus();
+
+ return wfc.ret;
+}
+EXPORT_SYMBOL_GPL(work_on_cpu);
+#endif /* CONFIG_SMP */
+
void __init init_workqueues(void)
{
cpu_populated_map = cpu_online_map;

2008-10-23 16:34:52

by Oleg Nesterov

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On 10/23, Gautham R Shenoy wrote:
>
> On Thu, Oct 23, 2008 at 11:40:36AM +0200, Oleg Nesterov wrote:
> >
> > IOW, I'd suggest
> >
> > long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
> > {
> > struct work_for_cpu wfc;
> >
> > INIT_WORK(&wfc.work, do_work_for_cpu);
> > wfc.fn = fn;
> > wfc.arg = arg;
> > wfc.ret = -EINVAL;
> >
> > get_online_cpus();
> > if (likely(cpu_online(cpu))) {
> > schedule_work_on(cpu, &wfc.work);
> > flush_work(&wfc.work);
> > }
>
> OK, how about doing the following? That will solve the problem
> of deadlock you pointed out in patch 6.
>
> get_online_cpus();
> if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
> schedule_work_on(cpu, &wfc.work);
> flush_work(&wfc.work);
> } else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
> /*
> * We're the CPU-Hotplug thread. Call the
> * function synchronously so that we don't
> * deadlock with any pending work-item blocked
> * on get_online_cpus()
> */
> cpumask_t original_mask = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> wfc.ret = fn(arg);
> set_cpus_allowed_ptr(current, &original_mask);

Not sure I understand...

_cpu_up() does raw_notifier_call_chain(CPU_ONLINE) after __cpu_up(),
at this point per_cpu(cpu_state) == CPU_ONLINE.

(OK, this is not exactly true, start_secondary() updates cpu_online_map
and only then cpu_state = CPU_ONLINE, but __cpu_up() waits for
cpu_online(cpu) == T).


Anyway, personally I dislike this special case. We must not use work_on_cpu()
if we hold a lock which can be used by some work_struct; cpu_hotplug is not
special at all.

Oleg.

2008-10-23 17:01:40

by Oleg Nesterov

Subject: do_boot_cpu can deadlock?

Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?

It is called from _cpu_up() under cpu_hotplug_begin(), and it
waits for c_idle.work. Again, if we have the pending work which
needs get_online_cpus() we seem to have problems.

Oleg.
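
To make the interleaving concrete, a sketch of the deadlock Oleg describes, assuming a work item A that calls get_online_cpus() is already queued on the booting CPU's keventd:

	_cpu_up() caller                     events/N (keventd)
	----------------                     ------------------
	cpu_hotplug_begin()                  starts pending work A
	do_boot_cpu():                       A calls get_online_cpus()
	  schedule_work(&c_idle.work)          -> blocks on the hotplug lock
	  wait_for_completion(&c_idle.done)
	    -> never returns: c_idle.work is
	       queued behind the blocked A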

Subject: Re: do_boot_cpu can deadlock?

On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
> Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
>
> It is called from _cpu_up() under cpu_hotplug_begin(), and it
> waits for c_idle.work. Again, if we have the pending work which
> needs get_online_cpus() we seem to have problems.

Good point. Though this code gets triggered mostly during boot time when
the CPUs are being brought online for the first time. If we have some
work-item pending at that time, which needs get_online_cpus(), we could
possibly see this deadlock.

>
> Oleg.

--
Thanks and Regards
gautham

2008-10-23 18:49:24

by Cyrill Gorcunov

Subject: Re: do_boot_cpu can deadlock?

[Gautham R Shenoy - Thu, Oct 23, 2008 at 11:51:19PM +0530]
| On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
| > Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
| >
| > It is called from _cpu_up() under cpu_hotplug_begin(), and it
| > waits for c_idle.work. Again, if we have the pending work which
| > needs get_online_cpus() we seem to have problems.
|
| Good point. Though this code gets triggered mostly during boot time when
| the CPUs are being brought online for the first time. If we have some
| work-item pending at that time, which needs get_online_cpus(), we could
| possibly see this deadlock.
|
| >
| > Oleg.
|
| --
| Thanks and Regards
| gautham
|

May I ask? If I understand right we do use this part of do_boot_cpu

if (!keventd_up() || current_is_keventd())
c_idle.work.func(&c_idle.work);
else {
schedule_work(&c_idle.work);
wait_for_completion(&c_idle.done);
}

only if we've been called for the first time after power on. And all
subsequent calls of this do_boot_cpu would lead to

if (c_idle.idle) {
c_idle.idle->thread.sp = (unsigned long) (((struct pt_regs *)
(THREAD_SIZE + task_stack_page(c_idle.idle))) - 1);
init_idle(c_idle.idle, cpu);
goto do_rest;
}

ie go to do_rest and no wait_for_completion/schedule_work at all.
Did I miss something? *Sorry* in advance if the question is not
quite related. Is this work-pending situation a 'possible' scenario
only (ie we don't have such callers for now... yet)?

- Cyrill -

2008-10-24 03:04:44

by Rusty Russell

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Friday 24 October 2008 01:36:05 Gautham R Shenoy wrote:
> OK, how about doing the following? That will solve the problem
> of deadlock you pointed out in patch 6.
>
> get_online_cpus();
> if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
> schedule_work_on(cpu, &wfc.work);
> flush_work(&wfc.work);
> } else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
> /*
> * We're the CPU-Hotplug thread. Call the
> * function synchronously so that we don't
> * deadlock with any pending work-item blocked
> * on get_online_cpus()
> */
> cpumask_t original_mask = current->cpus_allowed;
> set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> wfc.ret = fn(arg);
> set_cpus_allowed_ptr(current, &original_mask);
> }

Hi Gautham, Oleg,

Unfortunately that's exactly what I'm trying to get away from: another cpumask
on the stack :(

The cpu hotplug thread is just whoever wrote 0 to "online" in sysfs. And in
fact it already plays with its cpumask, which should be fixed too.

I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
never use work_on_cpu in the hotplug cpu path. Then we use
smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
cpu hotplug path to use schedule_work_on() itself rather than playing games
with cpumask.

If you agree, I'll spin the patches...

Thanks for the brainpower,
Rusty.
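
A minimal sketch of where the BUG_ON Rusty proposes might sit (an assumption based on this discussion, not a posted patch; cpu_state is the x86 per-cpu hotplug state from Gautham's snippet):

	get_online_cpus();
	if (unlikely(!cpu_online(cpu))) {
		/* Offline but not fully CPU_DEAD means we were called
		 * from the middle of a hotplug operation -- the one
		 * caller work_on_cpu() must never have. */
		BUG_ON(per_cpu(cpu_state, cpu) != CPU_DEAD);
		wfc.ret = -EINVAL;
	} else {
		schedule_work_on(cpu, &wfc.work);
		flush_work(&wfc.work);
	}
	put_online_cpus();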

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
> On Friday 24 October 2008 01:36:05 Gautham R Shenoy wrote:
> > OK, how about doing the following? That will solve the problem
> > of deadlock you pointed out in patch 6.
> >
> > get_online_cpus();
> > if (likely(per_cpu(cpu_state, cpuid) == CPU_ONLINE)) {
> > schedule_work_on(cpu, &wfc.work);
> > flush_work(&wfc.work);
> > } else if (per_cpu(cpu_state, cpuid) != CPU_DEAD) {
> > /*
> > * We're the CPU-Hotplug thread. Call the
> > * function synchronously so that we don't
> > * deadlock with any pending work-item blocked
> > * on get_online_cpus()
> > */
> > cpumask_t original_mask = current->cpus_allowed;
> > set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
> > wfc.ret = fn(arg);
> > set_cpus_allowed_ptr(current, &original_mask);
> > }
>
> Hi Gautham, Oleg,
>
> Unfortunately that's exactly what I'm trying to get away from: another cpumask
> on the stack :(

Oh, okay, understood.
>
> The cpu hotplug thread is just whoever wrote 0 to "online" in sys. And in
> fact it already plays with its cpumask, which should be fixed too.

Right, we do that during cpu_offline to ensure that we don't run on the
cpu which is to be offlined.

>
> I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
> never use work_on_cpu in the hotplug cpu path. Then we use
> smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
> cpu hotplug path to use schedule_work_on() itself rather than playing games
> with cpumask.
>
> If you agree, I'll spin the patches...

How about the following?

We go with this method, but instead of piggybacking on
the generic kevents workqueue, we create our own on_each_cpu_wq, for this
purpose.

Since the worker threads of this workqueue run only the work-items
queued by us, and since we take care of the cpu-hotplug locking _before_
queueing the work item, we can be sure that we won't deadlock:
no running work-item can block on get_online_cpus().

This would hold good even for those work-items queued from
the CPU-Hotplug notifier call path.

Thus we can have the same semantics everywhere, rather than having
multiple cases.

Does this make sense?
>
> Thanks for the brainpower,
> Rusty.

--
Thanks and Regards
gautham
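
A sketch of the dedicated-workqueue variant Gautham outlines (the on_each_cpu_wq name is his; its creation in init_workqueues() and the elided error handling are assumptions):

	static struct workqueue_struct *on_each_cpu_wq;
	/* in init_workqueues():
	 * on_each_cpu_wq = create_workqueue("on_each_cpu"); */

	long work_on_cpu(unsigned int cpu, long (*fn)(void *), void *arg)
	{
		struct work_for_cpu wfc;

		INIT_WORK(&wfc.work, do_work_for_cpu);
		wfc.fn = fn;
		wfc.arg = arg;
		wfc.ret = -EINVAL;

		get_online_cpus();
		if (likely(cpu_online(cpu))) {
			/* We hold the hotplug read-lock across queue+flush,
			 * so no hotplug operation can begin; fn() itself may
			 * therefore call get_online_cpus() without deadlock. */
			queue_work_on(cpu, on_each_cpu_wq, &wfc.work);
			flush_work(&wfc.work);
		}
		put_online_cpus();

		return wfc.ret;
	}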

2008-10-24 09:33:42

by Oleg Nesterov

Subject: Re: do_boot_cpu can deadlock?

On 10/23, Cyrill Gorcunov wrote:
>
> [Gautham R Shenoy - Thu, Oct 23, 2008 at 11:51:19PM +0530]
> | On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
> | > Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
> | >
> | > It is called from _cpu_up() under cpu_hotplug_begin(), and it
> | > waits for c_idle.work. Again, if we have the pending work which
> | > needs get_online_cpus() we seem to have problems.
> |
> | Good point. Though this code gets triggered mostly during boot time when
> | the CPUs are being brought online for the first time. If we have some
> | work-item pending at that time, which needs get_online_cpus(), we could
> | possibly see this deadlock.
> |
> | >
> | > Oleg.
> |
> | --
> | Thanks and Regards
> | gautham
> |
>
> May I ask? If I understand right we do use this part of do_boot_cpu
>
> if (!keventd_up() || current_is_keventd())
> c_idle.work.func(&c_idle.work);
> else {
> schedule_work(&c_idle.work);
> wait_for_completion(&c_idle.done);
> }
>
> only if we've been called for the first time after power on. And all
> subsequent calls of this do_boot_cpu would lead to
>
> if (c_idle.idle) {
> c_idle.idle->thread.sp = (unsigned long) (((struct pt_regs *)
> (THREAD_SIZE + task_stack_page(c_idle.idle))) - 1);
> init_idle(c_idle.idle, cpu);
> goto do_rest;
> }
>
> ie go to do_rest and no wait_for_completion/schedule_work at all.
> Did I miss something? *Sorry* in advance if the question is not
> quite related. Is this work-pending situation a 'possible' scenario
> only (ie we don't have such callers for now... yet)?

There are no problems during boot time, afaics.

kernel_init() calls smp_init() before do_basic_setup()->init_workqueues().
This means that do_boot_cpu() won't use workqueues due to !keventd_up().

But let's suppose we boot with maxcpus=1, and then bring up another CPU.
Or we really add the new physical CPU (I don't really know if this is
possible on x86).

Oleg.

Subject: Re: do_boot_cpu can deadlock?

On Fri, Oct 24, 2008 at 11:33:42AM +0200, Oleg Nesterov wrote:
> On 10/23, Cyrill Gorcunov wrote:
> >
> > [Gautham R Shenoy - Thu, Oct 23, 2008 at 11:51:19PM +0530]
> > | On Thu, Oct 23, 2008 at 07:02:12PM +0200, Oleg Nesterov wrote:
> > | > Hmm. arch/x86/kernel/smpboot.c:do_boot_cpu() can deadlock ?
> > | >
> > | > It is called from _cpu_up() under cpu_hotplug_begin(), and it
> > | > waits for c_idle.work. Again, if we have the pending work which
> > | > needs get_online_cpus() we seem to have problems.
> > |
> > | Good point. Though this code gets triggered mostly during boot time when
> > | the CPUs are being brought online for the first time. If we have some
> > | work-item pending at that time, which needs get_online_cpus(), we could
> > | possibly see this deadlock.
> > |
> > | >
> > | > Oleg.
> > |
> > | --
> > | Thanks and Regards
> > | gautham
> > |
> >
> > May I ask? If I understand right we do use this part of do_boot_cpu
> >
> > if (!keventd_up() || current_is_keventd())
> > c_idle.work.func(&c_idle.work);
> > else {
> > schedule_work(&c_idle.work);
> > wait_for_completion(&c_idle.done);
> > }
> >
> > only if we've been called for the first time after power on. And all
> > subsequent calls of this do_boot_cpu would lead to
> >
> > if (c_idle.idle) {
> > c_idle.idle->thread.sp = (unsigned long) (((struct pt_regs *)
> > (THREAD_SIZE + task_stack_page(c_idle.idle))) - 1);
> > init_idle(c_idle.idle, cpu);
> > goto do_rest;
> > }
> >
> > ie go to do_rest and no wait_for_completion/schedule_work at all.
> > Did I miss something? *Sorry* in advance if the question is not
> > quite related. Is this work-pending situation a 'possible' scenario
> > only (ie we don't have such callers for now... yet)?
>
> There are no problems during boot time, afaics.
>
> kernel_init() calls smp_init() before do_basic_setup()->init_workqueues().
> This means that do_boot_cpu() won't use workqueues due to !keventd_up().
>
> But let's suppose we boot with maxcpus=1, and then bring up another CPU.
> Or we really add the new physical CPU (I don't really know if this is
> possible on x86).

Even I am not sure if physical hotplug is possible.

But think about the virtualization case, where we want to add an
additional CPU to a KVM guest at runtime. This
is not such a rare use-case. It could deadlock then, no?

>
> Oleg.

--
Thanks and Regards
gautham

2008-10-24 10:29:47

by Oleg Nesterov

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On 10/24, Gautham R Shenoy wrote:
>
> On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
> >
> > I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
> > never use work_on_cpu in the hotplug cpu path. Then we use
> > smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
> > cpu hotplug path to use schedule_work_on() itself rather than playing games
> > with cpumask.
> >
> > If you agree, I'll spin the patches...
>
> How about the following?
>
> We go with this method, but instead of piggybacking on
> the generic kevents workqueue, we create our own on_each_cpu_wq, for this
> purpose.

Gautham, Rusty, I am a bit lost on this discussion...

Why should we care about this deadlock? Just do not use work_on_cpu() from
the hotplug cpu path, that is all.

Once again, the "cpu_hotplug_begin()" lock is not special. You can't use
work_on_cpu() under (say) rtnl_lock() for the same reason, this lock is
used by work->func() too.

Perhaps I missed something, and work_on_cpu() is really important for
cpu-hotplug path?

Oleg.

2008-10-24 10:51:20

by Oleg Nesterov

Subject: Re: do_boot_cpu can deadlock?

On 10/24, Gautham R Shenoy wrote:
>
> On Fri, Oct 24, 2008 at 11:33:42AM +0200, Oleg Nesterov wrote:
>
> > But let's suppose we boot with maxcpus=1, and then bring up another CPU.
> > Or we really add the new physical CPU (I don't really know if this is
> > possible on x86).
>
> Even I am not sure if physical hotplug is possible.
>
> But think about the virtualization case, where we want to add an
> additional CPU to a KVM guest at runtime. This
> is not such a rare use-case. It could deadlock then, no?

virtualization/KVM is a black magic to me ;) I don't know how/if it is
possible to add CPU at runtime.

Anyway, booting with maxcpus=1 allows us to bring up another CPU later,
and idle_thread_array[cpu] == NULL in that case, yes?

Perhaps smp_prepare_cpus() can do fork_idle() for_each_possible_cpu() ?
Or we can change do_boot_cpu() to use kthread_run() to create the idle
thread.

Oleg.
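
A rough sketch of the first suggestion (hypothetical: the real smp_prepare_cpus() is architecture code, and fork_idle() failure must be handled):

	void __init smp_prepare_cpus(unsigned int max_cpus)
	{
		int cpu;

		/* Fork every possible CPU's idle task during early boot,
		 * before keventd matters, so a later cpu-up never has to
		 * schedule_work() from under cpu_hotplug_begin(). */
		for_each_possible_cpu(cpu)
			if (!idle_thread_array[cpu])
				idle_thread_array[cpu] = fork_idle(cpu);

		/* ... existing architecture setup ... */
	}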

2008-10-24 11:18:56

by Rusty Russell

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Friday 24 October 2008 21:29:57 Oleg Nesterov wrote:
> On 10/24, Gautham R Shenoy wrote:
> > On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
> > > I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to
> > > ensure we never use work_on_cpu in the hotplug cpu path. Then we use
> > > smp_call_function() for that hard intel_cacheinfo case. Finally, we
> > > fix the cpu hotplug path to use schedule_work_on() itself rather than
> > > playing games with cpumask.
> > >
> > > If you agree, I'll spin the patches...
> >
> > How about the following?
> >
> > We go with this method, but instead of piggybacking on
> > the generic kevents workqueue, we create our own on_each_cpu_wq, for this
> > purpose.
>
> Gautham, Rusty, I am a bit lost on this discussion...
>
> Why should we care about this deadlock? Just do not use work_on_cpu() from
> the hotplug cpu path, that is all.

No, I agree with you (Oleg). Gautham's proposal would work, but at the cost
of yet another thread per CPU :(

Since we know how to handle the one problematic case Oleg spotted, *and* we
know how to BUG_ON to make sure noone introduces new ones, I think this is
clearest.

Thanks,
Rusty.

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Fri, Oct 24, 2008 at 12:29:57PM +0200, Oleg Nesterov wrote:
> On 10/24, Gautham R Shenoy wrote:
> >
> > On Fri, Oct 24, 2008 at 02:04:35PM +1100, Rusty Russell wrote:
> > >
> > > I think we should BUG_ON(per_cpu(cpu_state, cpuid) != CPU_DEAD) to ensure we
> > > never use work_on_cpu in the hotplug cpu path. Then we use
> > > smp_call_function() for that hard intel_cacheinfo case. Finally, we fix the
> > > cpu hotplug path to use schedule_work_on() itself rather than playing games
> > > with cpumask.
> > >
> > > If you agree, I'll spin the patches...
> >
> > How about the following?
> >
> > We go with this method, but instead of piggybacking on
> > the generic kevents workqueue, we create our own on_each_cpu_wq, for this
> > purpose.
>
> Gautham, Rusty, I am a bit lost on this discussion...
>
> Why should we care about this deadlock? Just do not use work_on_cpu() from
> the hotplug cpu path, that is all.
>
> Once again, the "cpu_hotplug_begin()" lock is not special. You can't use
> work_on_cpu() under (say) rtnl_lock() for the same reason, this lock is
> used by work->func() too.
>
> Perhaps I missed something, and work_on_cpu() is really important for
> cpu-hotplug path?

Rusty, Oleg,

Having a rule that we shouldn't use work_on_cpu() in cpu-hotplug path
is a good thing. But maintaining it can be difficult.

We've seen that in the past with the cpucontrol mutex.
We had clear rules that functions which get called in
cpu-hotplug callback paths shouldn't take this mutex. But with
functions that were called in the cpu-hotplug notifier
path as well as normal paths, it created a whole locking mess,
and took quite some time to fix.

Similarly, right now, we can have a BUG_ON() which notifies us whenever
someone ends up calling a function that invokes work_on_cpu() from the
CPU-Hotplug callpath. But we will fix it only when the BUG_ON() is hit.

On the other hand, if we have a mechanism that's guaranteed to work
irrespective of the callpaths, why not use that ?

I am not opposed to the proposed design.
Just voicing out an alternative thought. I could be completely wrong :-)

>
> Oleg.
>

--
Thanks and Regards
gautham

2008-10-24 13:24:36

by Oleg Nesterov

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On 10/24, Gautham R Shenoy wrote:
>
> Having a rule that we shouldn't use work_on_cpu() in cpu-hotplug path
> is a good thing. But maintaining it can be difficult.
>
> We've seen that in the past with the cpucontrol mutex.
> We had clear rules that functions which get called in
> cpu-hotplug callback paths shouldn't take this mutex. But with
> functions that were called in the cpu-hotplug notifier
> path as well as normal paths, it created a whole locking mess,
> and took quite some time to fix.
>
> Similarly, right now, we can have a BUG_ON() which notifies us whenever
> someone ends up calling a function that invokes work_on_cpu() from the
> CPU-Hotplug callpath. But we will fix it only when the BUG_ON() is hit.
>
> On the other hand, if we have a mechanism that's guaranteed to work
> irrespective of the callpaths, why not use that ?

If we add another wq for work_on_cpu(), then we add another hard-to-maintain
rule: get_online_cpus() must not be used by any work which can be queued
on that wq. And, yet another per-cpu thread...

Personally I don't even think we need a BUG_ON() in work_on_cpu(), because
I don't think cpu-hotplug path is so special.

Not that I have a strong opinion though.

Oleg.

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On Fri, Oct 24, 2008 at 03:25:09PM +0200, Oleg Nesterov wrote:
> On 10/24, Gautham R Shenoy wrote:
> >
> > Having a rule that we shouldn't use work_on_cpu() in cpu-hotplug path
> > is a good thing. But maintaining it can be difficult.
> >
> > We've seen that in the past with the cpucontrol mutex.
> > We had clear rules that functions which get called in
> > cpu-hotplug callback paths shouldn't take this mutex. But with
> > functions that were called in the cpu-hotplug notifier
> > path as well as normal paths, it created a whole locking mess,
> > and took quite some time to fix.
> >
> > Similarly, right now, we can have a BUG_ON() which notifies us whenever
> > someone ends up calling a function that invokes work_on_cpu() from the
> > CPU-Hotplug callpath. But we will fix it only when the BUG_ON() is hit.
> >
> > On the other hand, if we have a mechanism that's guaranteed to work
> > irrespective of the callpaths, why not use that ?
>
> If we add another wq for work_on_cpu(), then we add another hard-to-maintain
> rule: get_online_cpus() must not be used by any work which can be queued
> on that wq. And, yet another per-cpu thread...

No, we don't have that rule!

Because using Rusty's function with a separate workqueue,
we queue the work item as follows:

get_online_cpus();
queue_work_on(cpu, &on_each_cpu_wq, &wfc.work);
flush_work(&wfc.work);
put_online_cpus();

The very fact that we've successfully queued the work-item means that
no cpu-hotplug operation can occur till our work item finishes
execution.

Hence the work can use get_online_cpus()!

Yes, we end up using additional resources in the form of another set of
per-cpu threads. But is that so much of an issue?

>
> Personally I don't even think we need a BUG_ON() in work_on_cpu(), because
> I don't think cpu-hotplug path is so special.
>
> Not that I have a strong opinion though.
>
> Oleg.

--
Thanks and Regards
gautham

2008-10-24 14:23:42

by Oleg Nesterov

Subject: Re: [PATCH 1/7] work_on_cpu: helper for doing task on a CPU.

On 10/24, Gautham R Shenoy wrote:
>
> On Fri, Oct 24, 2008 at 03:25:09PM +0200, Oleg Nesterov wrote:
> >
> > If we add another wq for work_on_cpu(), then we add another hard-to-maintain
> > rule: get_online_cpus() must not be used by any work which can be queued
> > on that wq. And, yet another per-cpu thread...
>
> No, we don't have that rule!
>
> Because using Rusty's function with a seperate workqueue,
> we queue the work item as follows:
>
> get_online_cpus();
> queue_work_on(cpu, &on_each_cpu_wq, &wfc.work);
> flush_work(&wfc.work);
> put_online_cpus();
>
> The very fact that we've successfully queued the work-item means that
> no cpu-hotplug operation can occur till our work item finishes
> execution.

Ah yes, thanks for correcting me.

> Yes, we end up using additional resources in the form of another set of
> per-cpu threads. But is that so much of an issue?

Well, don't ask me... but the only reason why we need these threads
is that we want to make work_on_cpu() usable from the cpu-hotplug path,
a bit strange ;)

Oleg.