(Gautham cc'ed)
On 07/11, Andrew Morton wrote:
>
> Subject: pm: introduce new interfaces schedule_work_on() and queue_work_on()
> From: Zhang Rui <[email protected]>
>
> This interface allows adding a job on a specific cpu.
>
> Although a work struct on a cpu will be scheduled to other cpu if the cpu
> dies, there is a recursion if a work task tries to offline the cpu it's
> running on. we need to schedule the task to a specific cpu in this case.
> http://bugzilla.kernel.org/show_bug.cgi?id=10897
So, this is used in http://bugzilla.kernel.org/attachment.cgi?id=16707
--- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30 16:01:35.000000000 +0800
+++ linux-2.6/kernel/power/poweroff.c 2008-07-03 10:50:05.000000000 +0800
@@ -25,7 +25,8 @@
static void handle_poweroff(int key, struct tty_struct *tty)
{
- schedule_work(&poweroff_work);
+ /* run sysrq poweroff on boot cpu */
+ schedule_work_on(first_cpu(cpu_online_map), &poweroff_work);
}
static struct sysrq_key_op sysrq_poweroff_op = {
A couple of silly questions, I don't understand the low-level details.
This patch (and kernel_power_off() afaics) assumes that the boot cpu
can't be cpu_down()'ed. Is it true in general? For example, grep shows
that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
sets ->hotpluggable = 1 for all present CPUs?
Another question. I can't understand why first_cpu(cpu_online_map) is
always the boot CPU on every arch. IOW, shouldn't boot_cpu_init() set
some "boot_cpu = smp_processor_id()" which should be use instead of
first_cpu(cpu_online_map) ?
Thanks,
Oleg.
On Sat, Jul 12, 2008 at 08:21:49PM +0400, Oleg Nesterov wrote:
> (Gautham cc'ed)
>
Sorry for the delay... I'm a bit tied down to other things until aug
20th :(
> On 07/11, Andrew Morton wrote:
> >
> > Subject: pm: introduce new interfaces schedule_work_on() and queue_work_on()
> > From: Zhang Rui <[email protected]>
> >
> > This interface allows adding a job on a specific cpu.
> >
> > Although a work struct on a cpu will be scheduled to other cpu if the cpu
> > dies, there is a recursion if a work task tries to offline the cpu it's
> > running on. we need to schedule the task to a specific cpu in this case.
> > http://bugzilla.kernel.org/show_bug.cgi?id=10897
>
> So, this is used in http://bugzilla.kernel.org/attachment.cgi?id=16707
>
> --- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30 16:01:35.000000000 +0800
> +++ linux-2.6/kernel/power/poweroff.c 2008-07-03 10:50:05.000000000 +0800
> @@ -25,7 +25,8 @@
>
> static void handle_poweroff(int key, struct tty_struct *tty)
> {
> - schedule_work(&poweroff_work);
> + /* run sysrq poweroff on boot cpu */
> + schedule_work_on(first_cpu(cpu_online_map), &poweroff_work);
> }
>
> static struct sysrq_key_op sysrq_poweroff_op = {
>
> A couple of silly questions, I don't understand the low-level details.
>
> This patch (and kernel_power_off() afaics) assumes that the boot cpu
> can't be cpu_down()'ed. Is it true in general? For example, grep shows
> that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
> sets ->hotpluggable = 1 for all present CPUs?
I tried this on a Power system sometime back and I was able to
offline CPU0. What I am not sure however, is
if that was the boot-cpu.
On x86, I do remember reading somewhere why we cannot offline
CPU0.
/me searches.
Yes, in arch/x86/kernel/topology.c
int __ref arch_register_cpu(int num)
{
/*
* CPU0 cannot be offlined due to several
* restrictions and assumptions in kernel. This basically
* doesnt add a control file, one cannot attempt to offline
* BSP.
*
* Also certain PCI quirks require not to enable hotplug control
* for all CPU's.
*/
if (num)
per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
return register_cpu(&per_cpu(cpu_devices, num).cpu, num);
}
>
> Another question. I can't understand why first_cpu(cpu_online_map) is
> always the boot CPU on every arch. IOW, shouldn't boot_cpu_init() set
> some "boot_cpu = smp_processor_id()" which should be use instead of
> first_cpu(cpu_online_map) ?
>
Not very sure about this one.
> Thanks,
>
> Oleg.
>
--
Thanks and Regards
gautham
On 07/22, Gautham R Shenoy wrote:
>
> On Sat, Jul 12, 2008 at 08:21:49PM +0400, Oleg Nesterov wrote:
> >
> > So, this is used in http://bugzilla.kernel.org/attachment.cgi?id=16707
> >
> > --- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30 16:01:35.000000000 +0800
> > +++ linux-2.6/kernel/power/poweroff.c 2008-07-03 10:50:05.000000000 +0800
> > @@ -25,7 +25,8 @@
> >
> > static void handle_poweroff(int key, struct tty_struct *tty)
> > {
> > - schedule_work(&poweroff_work);
> > + /* run sysrq poweroff on boot cpu */
> > + schedule_work_on(first_cpu(cpu_online_map), &poweroff_work);
> > }
> >
> > static struct sysrq_key_op sysrq_poweroff_op = {
> >
> > A couple of silly questions, I don't understand the low-level details.
> >
> > This patch (and kernel_power_off() afaics) assumes that the boot cpu
> > can't be cpu_down()'ed. Is it true in general? For example, grep shows
> > that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
> > sets ->hotpluggable = 1 for all present CPUs?
>
> I tried this on a Power system sometime back and I was able to
> offline CPU0.
This means that
pm-schedule-sysrq-poweroff-on-boot-cpu.patch
is not 100% right. It is still possible to hang/deadlock if we race
with cpu_down(first_cpu(cpu_online_map)).
The bug is mostly theoretical, but perhaps should be fixed anyway,
handle_poweroff() can use kthread_run().
Oleg.
On Thu, 2008-07-24 at 20:43 +0800, Oleg Nesterov wrote:
> On 07/22, Gautham R Shenoy wrote:
> >
> > On Sat, Jul 12, 2008 at 08:21:49PM +0400, Oleg Nesterov wrote:
> > >
> > > So, this is used in
> http://bugzilla.kernel.org/attachment.cgi?id=16707
> > >
> > > --- linux-2.6.orig/kernel/power/poweroff.c 2008-06-30
> 16:01:35.000000000 +0800
> > > +++ linux-2.6/kernel/power/poweroff.c 2008-07-03
> 10:50:05.000000000 +0800
> > > @@ -25,7 +25,8 @@
> > >
> > > static void handle_poweroff(int key, struct tty_struct *tty)
> > > {
> > > - schedule_work(&poweroff_work);
> > > + /* run sysrq poweroff on boot cpu */
> > > + schedule_work_on(first_cpu(cpu_online_map),
> &poweroff_work);
> > > }
> > >
> > > static struct sysrq_key_op sysrq_poweroff_op = {
> > >
> > > A couple of silly questions, I don't understand the low-level
> details.
> > >
> > > This patch (and kernel_power_off() afaics) assumes that the boot
> cpu
> > > can't be cpu_down()'ed. Is it true in general? For example, grep
> shows
> > > that arch/s390/kernel/smp.c:topology_init()->smp_add_present_cpu()
> > > sets ->hotpluggable = 1 for all present CPUs?
> >
> > I tried this on a Power system sometime back and I was able to
> > offline CPU0.
>
> This means that
>
> pm-schedule-sysrq-poweroff-on-boot-cpu.patch
>
> is not 100% right. It is still possible to hang/deadlock if we race
> with cpu_down(first_cpu(cpu_online_map)).
Yes, you're right.
But then should we fix disable_nonboot_cpus as well?
int disable_nonboot_cpus(void)
{
first_cpu = first_cpu(cpu_online_map);
...
for_each_online_cpu(cpu) {
if (cpu == first_cpu)
continue;
error = _cpu_down(cpu, 1);
...
}
...
}
thanks,
rui
> The bug is mostly theoretical, but perhaps should be fixed anyway,
> handle_poweroff() can use kthread_run().
>
On 07/25, Zhang Rui wrote:
>
> On Thu, 2008-07-24 at 20:43 +0800, Oleg Nesterov wrote:
> >
> > This means that
> >
> > pm-schedule-sysrq-poweroff-on-boot-cpu.patch
> >
> > is not 100% right. It is still possible to hang/deadlock if we race
> > with cpu_down(first_cpu(cpu_online_map)).
>
> Yes, you're right.
> But then should we fix disable_nonboot_cpus as well?
>
> int disable_nonboot_cpus(void)
> {
> first_cpu = first_cpu(cpu_online_map);
> ...
>
> for_each_online_cpu(cpu) {
> if (cpu == first_cpu)
> continue;
> error = _cpu_down(cpu, 1);
> ...
> }
> ...
> }
Note that disable_nonboot_cpus() does first_cpu = first_cpu() under
cpu_maps_update_begin(), so we can't race with cpu-hotplug.
However, this afaics means that its name is wrong, and
printk("Disabling non-boot CPUs ...\n") is not right too.
What it does is disable_all_but_one_cpus().
And, it is not clear why disable_nonboot_cpus() assumes that
all but first_cpu(cpu_online_map) must have .hotpluggable == 1.
And, if one of the callers really need to preserve the boot CPU,
I don't understand how it is guaranteed it must be first_cpu().
Oleg.
Hi!
> > > This means that
> > >
> > > pm-schedule-sysrq-poweroff-on-boot-cpu.patch
> > >
> > > is not 100% right. It is still possible to hang/deadlock if we race
> > > with cpu_down(first_cpu(cpu_online_map)).
> >
> > Yes, you're right.
> > But then should we fix disable_nonboot_cpus as well?
> >
> > int disable_nonboot_cpus(void)
> > {
> > first_cpu = first_cpu(cpu_online_map);
> > ...
> >
> > for_each_online_cpu(cpu) {
> > if (cpu == first_cpu)
> > continue;
> > error = _cpu_down(cpu, 1);
> > ...
> > }
> > ...
> > }
>
> Note that disable_nonboot_cpus() does first_cpu = first_cpu() under
> cpu_maps_update_begin(), so we can't race with cpu-hotplug.
>
> However, this afaics means that its name is wrong, and
> printk("Disabling non-boot CPUs ...\n") is not right too.
> What it does is disable_all_but_one_cpus().
I thought that first cpu is defined to be boot cpu?
> And, it is not clear why disable_nonboot_cpus() assumes that
> all but first_cpu(cpu_online_map) must have .hotpluggable == 1.
Where does it assume that?
It will fail if some CPUs can't be unplugged, and I'm afraid suspend
can't work in such case...
> And, if one of the callers really need to preserve the boot CPU,
> I don't understand how it is guaranteed it must be first_cpu().
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On 08/05, Pavel Machek wrote:
>
> > > > This means that
> > > >
> > > > pm-schedule-sysrq-poweroff-on-boot-cpu.patch
> > > >
> > > > is not 100% right. It is still possible to hang/deadlock if we race
> > > > with cpu_down(first_cpu(cpu_online_map)).
> > >
> > > Yes, you're right.
> > > But then should we fix disable_nonboot_cpus as well?
> > >
> > > int disable_nonboot_cpus(void)
> > > {
> > > first_cpu = first_cpu(cpu_online_map);
> > > ...
> > >
> > > for_each_online_cpu(cpu) {
> > > if (cpu == first_cpu)
> > > continue;
> > > error = _cpu_down(cpu, 1);
> > > ...
> > > }
> > > ...
> > > }
> >
> > Note that disable_nonboot_cpus() does first_cpu = first_cpu() under
> > cpu_maps_update_begin(), so we can't race with cpu-hotplug.
> >
> > However, this afaics means that its name is wrong, and
> > printk("Disabling non-boot CPUs ...\n") is not right too.
> > What it does is disable_all_but_one_cpus().
>
> I thought that first cpu is defined to be boot cpu?
I don't know, but I don't really understand this low-level code.
Is it documented? This is certainly true on x86, but I don't
understand why this must be true on every arch.
Let's see. start_kernel() does smp_setup_processor_id(). Is it
guaranteed that it chooses the lowest number from cpu_possible_map?
This helper is only defined for voyager, but anyway it is not clear
why start_kernel() must be always called on CPU 0. Otherwise,
the next cpu_up() (from smp_init() or later) can add another CPU
which becomes first_cpu(cpu_online_map).
But, from disable_nonboot_cpus's pov this doesn't matter. Even if
the first cpu must be boot cpu, it can be (in general) cpu_down()'ed.
In that case, when disable_nonboot_cpus() is called, first_cpu()
returns another value.
Once again, I don't claim this all is wrong.
> > And, it is not clear why disable_nonboot_cpus() assumes that
> > all but first_cpu(cpu_online_map) must have .hotpluggable == 1.
>
> Where does it assume that?
>
> It will fail if some CPUs can't be unplugged, and I'm afraid suspend
> can't work in such case...
Yes I see. But disable_nonboot_cpus() doesn't check .hotpluggable,
it just takes CPU down regardless of .hotpluggable, is it always OK?
Oleg.