2008-07-12 16:18:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + pm-introduce-new-interfaces-schedule_work_on-and-queue_work_on.patch added to -mm tree

(Gautham cc'ed)

On 07/11, Andrew Morton wrote:
>
> Subject: pm: introduce new interfaces schedule_work_on() and queue_work_on()
> From: Zhang Rui <[email protected]>
>
> This interface allows adding a job on a specific cpu.
>
> Although a work struct on a cpu will be scheduled to other cpu if the cpu
> dies, there is a recursion if a work task tries to offline the cpu it's
> running on. we need to schedule the task to a specific cpu in this case.
> http://bugzilla.kernel.org/show_bug.cgi?id=10897

So, this is used in http://bugzilla.kernel.org/attachment.cgi?id=16707

--- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30 16:01:35.000000000 +0800
+++ linux-2.6/kernel/power/poweroff.c 2008-07-03 10:50:05.000000000 +0800
@@ -25,7 +25,8 @@

static void handle_poweroff(int key, struct tty_struct *tty)
{
- schedule_work(&poweroff_work);
+ /* run sysrq poweroff on boot cpu */
+ schedule_work_on(first_cpu(cpu_online_map), &poweroff_work);
}

static struct sysrq_key_op sysrq_poweroff_op = {

A couple of silly questions, I don't understand the low-level details.

This patch (and kernel_power_off() afaics) assumes that the boot cpu
can't be cpu_down()'ed. Is it true in general? For example, grep shows
that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
sets ->hotpluggable = 1 for all present CPUs?

Another question. I can't understand why first_cpu(cpu_online_map) is
always the boot CPU on every arch. IOW, shouldn't boot_cpu_init() set
some "boot_cpu = smp_processor_id()" which should be use instead of
first_cpu(cpu_online_map) ?

Thanks,

Oleg.


Subject: Re: + pm-introduce-new-interfaces-schedule_work_on-and-queue_work_on.patch added to -mm tree

On Sat, Jul 12, 2008 at 08:21:49PM +0400, Oleg Nesterov wrote:
> (Gautham cc'ed)
>

Sorry for the delay... I'm a bit tied down to other things until aug
20th :(

> On 07/11, Andrew Morton wrote:
> >
> > Subject: pm: introduce new interfaces schedule_work_on() and queue_work_on()
> > From: Zhang Rui <[email protected]>
> >
> > This interface allows adding a job on a specific cpu.
> >
> > Although a work struct on a cpu will be scheduled to other cpu if the cpu
> > dies, there is a recursion if a work task tries to offline the cpu it's
> > running on. we need to schedule the task to a specific cpu in this case.
> > http://bugzilla.kernel.org/show_bug.cgi?id=10897
>
> So, this is used in http://bugzilla.kernel.org/attachment.cgi?id=16707
>
> --- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30 16:01:35.000000000 +0800
> +++ linux-2.6/kernel/power/poweroff.c 2008-07-03 10:50:05.000000000 +0800
> @@ -25,7 +25,8 @@
>
> static void handle_poweroff(int key, struct tty_struct *tty)
> {
> - schedule_work(&poweroff_work);
> + /* run sysrq poweroff on boot cpu */
> + schedule_work_on(first_cpu(cpu_online_map), &poweroff_work);
> }
>
> static struct sysrq_key_op sysrq_poweroff_op = {
>
> A couple of silly questions, I don't understand the low-level details.
>
> This patch (and kernel_power_off() afaics) assumes that the boot cpu
> can't be cpu_down()'ed. Is it true in general? For example, grep shows
> that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
> sets ->hotpluggable = 1 for all present CPUs?

I tried this on a Power system sometime back and I was able to
offline CPU0. What I am not sure however, is
if that was the boot-cpu.

On x86, I do remember reading somewhere why we cannot offline
CPU0.

/me searches.

Yes, in arch/x86/kernel/topology.c

int __ref arch_register_cpu(int num)
{
/*
* CPU0 cannot be offlined due to several
* restrictions and assumptions in kernel. This basically
* doesnt add a control file, one cannot attempt to offline
* BSP.
*
* Also certain PCI quirks require not to enable hotplug control
* for all CPU's.
*/
if (num)
per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
return register_cpu(&per_cpu(cpu_devices, num).cpu, num);
}

>
> Another question. I can't understand why first_cpu(cpu_online_map) is
> always the boot CPU on every arch. IOW, shouldn't boot_cpu_init() set
> some "boot_cpu = smp_processor_id()" which should be use instead of
> first_cpu(cpu_online_map) ?
>

Not very sure about this one.
> Thanks,
>
> Oleg.
>

--
Thanks and Regards
gautham

2008-07-24 12:40:34

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + pm-introduce-new-interfaces-schedule_work_on-and-queue_work_on.patch added to -mm tree

On 07/22, Gautham R Shenoy wrote:
>
> On Sat, Jul 12, 2008 at 08:21:49PM +0400, Oleg Nesterov wrote:
> >
> > So, this is used in http://bugzilla.kernel.org/attachment.cgi?id=16707
> >
> > --- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30 16:01:35.000000000 +0800
> > +++ linux-2.6/kernel/power/poweroff.c 2008-07-03 10:50:05.000000000 +0800
> > @@ -25,7 +25,8 @@
> >
> > static void handle_poweroff(int key, struct tty_struct *tty)
> > {
> > - schedule_work(&poweroff_work);
> > + /* run sysrq poweroff on boot cpu */
> > + schedule_work_on(first_cpu(cpu_online_map), &poweroff_work);
> > }
> >
> > static struct sysrq_key_op sysrq_poweroff_op = {
> >
> > A couple of silly questions, I don't understand the low-level details.
> >
> > This patch (and kernel_power_off() afaics) assumes that the boot cpu
> > can't be cpu_down()'ed. Is it true in general? For example, grep shows
> > that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
> > sets ->hotpluggable = 1 for all present CPUs?
>
> I tried this on a Power system sometime back and I was able to
> offline CPU0.

This means that

pm-schedule-sysrq-poweroff-on-boot-cpu.patch

is not 100% right. It is still possible to hang/deadlock if we race
with cpu_down(first_cpu(cpu_online_map)).

The bug is mostly theoretical, but perhaps should be fixed anyway,
handle_poweroff() can use kthread_run().

Oleg.

2008-07-25 01:22:52

by Zhang, Rui

[permalink] [raw]
Subject: Re: + pm-introduce-new-interfaces-schedule_work_on-and-queue_work_on.patch added to -mm tree

On Thu, 2008-07-24 at 20:43 +0800, Oleg Nesterov wrote:
> On 07/22, Gautham R Shenoy wrote:
> >
> > On Sat, Jul 12, 2008 at 08:21:49PM +0400, Oleg Nesterov wrote:
> > >
> > > So, this is used in
> http://bugzilla.kernel.org/attachment.cgi?id=16707
> > >
> > > --- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30
> 16:01:35.000000000 +0800
> > > +++ linux-2.6/kernel/power/poweroff.c 2008-07-03
> 10:50:05.000000000 +0800
> > > @@ -25,7 +25,8 @@
> > >
> > > static void handle_poweroff(int key, struct tty_struct *tty)
> > > {
> > > - schedule_work(&poweroff_work);
> > > + /* run sysrq poweroff on boot cpu */
> > > + schedule_work_on(first_cpu(cpu_online_map),
> &poweroff_work);
> > > }
> > >
> > > static struct sysrq_key_op sysrq_poweroff_op = {
> > >
> > > A couple of silly questions, I don't understand the low-level
> details.
> > >
> > > This patch (and kernel_power_off() afaics) assumes that the boot
> cpu
> > > can't be cpu_down()'ed. Is it true in general? For example, grep
> shows
> > > that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
> > > sets ->hotpluggable = 1 for all present CPUs?
> >
> > I tried this on a Power system sometime back and I was able to
> > offline CPU0.
>
> This means that
>
> pm-schedule-sysrq-poweroff-on-boot-cpu.patch
>
> is not 100% right. It is still possible to hang/deadlock if we race
> with cpu_down(first_cpu(cpu_online_map)).

Yes, you're right.
But then should we fix disable_nonboot_cpus as well?

int disable_nonboot_cpus(void)
{
first_cpu = first_cpu(cpu_online_map);
...

for_each_online_cpu(cpu) {
if (cpu == first_cpu)
continue;
error = _cpu_down(cpu, 1);
...
}
...
}

thanks,
rui

> The bug is mostly theoretical, but perhaps should be fixed anyway,
> handle_poweroff() can use kthread_run().
>

2008-07-25 09:39:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + pm-introduce-new-interfaces-schedule_work_on-and-queue_work_on.patch added to -mm tree

On 07/25, Zhang Rui wrote:
>
> On Thu, 2008-07-24 at 20:43 +0800, Oleg Nesterov wrote:
> >
> > This means that
> >
> > pm-schedule-sysrq-poweroff-on-boot-cpu.patch
> >
> > is not 100% right. It is still possible to hang/deadlock if we race
> > with cpu_down(first_cpu(cpu_online_map)).
>
> Yes, you're right.
> But then should we fix disable_nonboot_cpus as well?
>
> int disable_nonboot_cpus(void)
> {
> first_cpu = first_cpu(cpu_online_map);
> ...
>
> for_each_online_cpu(cpu) {
> if (cpu == first_cpu)
> continue;
> error = _cpu_down(cpu, 1);
> ...
> }
> ...
> }

Note that disable_nonboot_cpus() does first_cpu = first_cpu() under
cpu_maps_update_begin(), so we can't race with cpu-hotplug.

However, this afaics means that its name is wrong, and
printk("Disabling non-boot CPUs ...\n") is not right too.
What it does is disable_all_but_one_cpus().

And, it is not clear why disable_nonboot_cpus() assumes that
all but first_cpu(cpu_online_map) must have .hotpluggable == 1.

And, if one of the callers really need to preserve the boot CPU,
I don't understand how it is guaranteed it must be first_cpu().

Oleg.

2008-08-05 20:04:31

by Pavel Machek

[permalink] [raw]
Subject: Re: + pm-introduce-new-interfaces-schedule_work_on-and-queue_work_on.patch added to -mm tree

Hi!

> > > This means that
> > >
> > > pm-schedule-sysrq-poweroff-on-boot-cpu.patch
> > >
> > > is not 100% right. It is still possible to hang/deadlock if we race
> > > with cpu_down(first_cpu(cpu_online_map)).
> >
> > Yes, you're right.
> > But then should we fix disable_nonboot_cpus as well?
> >
> > int disable_nonboot_cpus(void)
> > {
> > first_cpu = first_cpu(cpu_online_map);
> > ...
> >
> > for_each_online_cpu(cpu) {
> > if (cpu == first_cpu)
> > continue;
> > error = _cpu_down(cpu, 1);
> > ...
> > }
> > ...
> > }
>
> Note that disable_nonboot_cpus() does first_cpu = first_cpu() under
> cpu_maps_update_begin(), so we can't race with cpu-hotplug.
>
> However, this afaics means that its name is wrong, and
> printk("Disabling non-boot CPUs ...\n") is not right too.
> What it does is disable_all_but_one_cpus().

I thought that first cpu is defined to be boot cpu?

> And, it is not clear why disable_nonboot_cpus() assumes that
> all but first_cpu(cpu_online_map) must have .hotpluggable == 1.

Where does it assume that?

It will fail if some CPUs can't be unplugged, and I'm afraid suspend
can't work in such case...

> And, if one of the callers really need to preserve the boot CPU,
> I don't understand how it is guaranteed it must be first_cpu().

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-08-06 12:42:19

by Oleg Nesterov

[permalink] [raw]
Subject: Re: + pm-introduce-new-interfaces-schedule_work_on-and-queue_work_on.patch added to -mm tree

On 08/05, Pavel Machek wrote:
>
> > > > This means that
> > > >
> > > > pm-schedule-sysrq-poweroff-on-boot-cpu.patch
> > > >
> > > > is not 100% right. It is still possible to hang/deadlock if we race
> > > > with cpu_down(first_cpu(cpu_online_map)).
> > >
> > > Yes, you're right.
> > > But then should we fix disable_nonboot_cpus as well?
> > >
> > > int disable_nonboot_cpus(void)
> > > {
> > > first_cpu = first_cpu(cpu_online_map);
> > > ...
> > >
> > > for_each_online_cpu(cpu) {
> > > if (cpu == first_cpu)
> > > continue;
> > > error = _cpu_down(cpu, 1);
> > > ...
> > > }
> > > ...
> > > }
> >
> > Note that disable_nonboot_cpus() does first_cpu = first_cpu() under
> > cpu_maps_update_begin(), so we can't race with cpu-hotplug.
> >
> > However, this afaics means that its name is wrong, and
> > printk("Disabling non-boot CPUs ...\n") is not right too.
> > What it does is disable_all_but_one_cpus().
>
> I thought that first cpu is defined to be boot cpu?

I don't know, but I don't really understand this low-level code.

Is it documented? This is certainly true on x86, but I don't
understand why this must be true on every arch.

Let's see. start_kernel() does smp_setup_processor_id(). Is it
guaranteed that it chooses the lowest number from cpu_possible_map?
This helper is only defined for voyager, but anyway it is not clear
why start_kernel() must be always called on CPU 0. Otherwise,
the next cpu_up() (from smp_init() or later) can add another CPU
which becomes first_cpu(cpu_online_map).

But, from disable_nonboot_cpus's pov this doesn't matter. Even if
the first cpu must be boot cpu, it can be (in general) cpu_down()'ed.
In that case, when disable_nonboot_cpus() is called, first_cpu()
returns another value.

Once again, I don't claim this all is wrong.

> > And, it is not clear why disable_nonboot_cpus() assumes that
> > all but first_cpu(cpu_online_map) must have .hotpluggable == 1.
>
> Where does it assume that?
>
> It will fail if some CPUs can't be unplugged, and I'm afraid suspend
> can't work in such case...

Yes I see. But disable_nonboot_cpus() doesn't check .hotpluggable,
it just takes CPU down regardless of .hotpluggable, is it always OK?

Oleg.