2013-04-16 09:58:30

by Robin Holt

[permalink] [raw]
Subject: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

We recently noticed that reboot of a 1024 cpu machine takes approx 16
minutes of just stopping the cpus. The slowdown was tracked to commit
f96972f.

The current implementation does all the work of hot removing the cpus
before halting the system. We are switching to just migrating to the
boot cpu and then continuing with shutdown/reboot.

This also has the effect of not breaking x86's command line parameter for
specifying the reboot cpu. Note, this code was shamelessly copied from
arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
command line parameter.

Signed-off-by: Robin Holt <[email protected]>
Tested-by: Shawn Guo <[email protected]>
To: Ingo Molnar <[email protected]>
To: Russ Anderson <[email protected]>
To: Oleg Nesterov <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Linux Kernel Mailing List <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: the arch/x86 maintainers <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>

---

Changes since -v1.
- Set PF_THREAD_BOUND before migrating to eliminate potential race.
- Modified kernel_power_off to also migrate instead of using
disable_nonboot_cpus().
---
kernel/sys.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 0da73cf..5ef7aa2 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL(unregister_reboot_notifier);

+void migrate_to_reboot_cpu(void)
+{
+ /* The boot cpu is always logical cpu 0 */
+ int reboot_cpu_id = 0;
+
+ /* Make certain the cpu I'm about to reboot on is online */
+ if (!cpu_online(reboot_cpu_id))
+ reboot_cpu_id = smp_processor_id();
+
+ /* Prevent races with other tasks migrating this task. */
+ current->flags |= PF_THREAD_BOUND;
+
+ /* Make certain I only run on the appropriate processor */
+ set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
+}
+
/**
* kernel_restart - reboot the system
* @cmd: pointer to buffer containing command to execute for restart
@@ -368,7 +384,7 @@ EXPORT_SYMBOL(unregister_reboot_notifier);
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
printk(KERN_EMERG "Restarting system.\n");
@@ -395,7 +411,7 @@ static void kernel_shutdown_prepare(enum system_states state)
void kernel_halt(void)
{
kernel_shutdown_prepare(SYSTEM_HALT);
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "System halted.\n");
kmsg_dump(KMSG_DUMP_HALT);
@@ -414,7 +430,7 @@ void kernel_power_off(void)
kernel_shutdown_prepare(SYSTEM_POWER_OFF);
if (pm_power_off_prepare)
pm_power_off_prepare();
- disable_nonboot_cpus();
+ migrate_to_reboot_cpu();
syscore_shutdown();
printk(KERN_EMERG "Power down.\n");
kmsg_dump(KMSG_DUMP_POWEROFF);
--
1.8.1.2


2013-04-16 11:33:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.


* Robin Holt <[email protected]> wrote:

> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> minutes of just stopping the cpus. The slowdown was tracked to commit
> f96972f.
>
> The current implementation does all the work of hot removing the cpus
> before halting the system. We are switching to just migrating to the
> boot cpu and then continuing with shutdown/reboot.
>
> This also has the effect of not breaking x86's command line parameter for
> specifying the reboot cpu. Note, this code was shamelessly copied from
> arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
> command line parameter.
>
> Signed-off-by: Robin Holt <[email protected]>
> Tested-by: Shawn Guo <[email protected]>
> To: Ingo Molnar <[email protected]>
> To: Russ Anderson <[email protected]>
> To: Oleg Nesterov <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: Lai Jiangshan <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Linux Kernel Mailing List <[email protected]>
> Cc: Michel Lespinasse <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Robin Holt <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Cc: Tejun Heo <[email protected]>
> Cc: the arch/x86 maintainers <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: <[email protected]>
>
> ---
>
> Changes since -v1.
> - Set PF_THREAD_BOUND before migrating to eliminate potential race.
> - Modified kernel_power_off to also migrate instead of using
> disable_nonboot_cpus().
> ---
> kernel/sys.c | 22 +++++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 0da73cf..5ef7aa2 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> }
> EXPORT_SYMBOL(unregister_reboot_notifier);
>
> +void migrate_to_reboot_cpu(void)

It appears to be file-scope, so should be static I guess?

> +{
> + /* The boot cpu is always logical cpu 0 */
> + int reboot_cpu_id = 0;
> +
> + /* Make certain the cpu I'm about to reboot on is online */
> + if (!cpu_online(reboot_cpu_id))
> + reboot_cpu_id = smp_processor_id();

Shouldn't we pick the first online CPU instead, to make it deterministic?

Also, does this codepath prevent hotplug from going on in parallel?

( Plus, the smp_processor_id() is in a preemptible section AFAICS, so it will
throw a warning with preempt debug on. )

> +
> + /* Prevent races with other tasks migrating this task. */

(I guess the colon can be dropped here, like in the other comments.)

> + current->flags |= PF_THREAD_BOUND;
> +
> + /* Make certain I only run on the appropriate processor */
> + set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> +}

Thanks,

Ingo

2013-04-16 12:06:41

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

On Tue, Apr 16, 2013 at 01:32:56PM +0200, Ingo Molnar wrote:
>
> * Robin Holt <[email protected]> wrote:
>
> > We recently noticed that reboot of a 1024 cpu machine takes approx 16
> > minutes of just stopping the cpus. The slowdown was tracked to commit
> > f96972f.
> >
> > The current implementation does all the work of hot removing the cpus
> > before halting the system. We are switching to just migrating to the
> > boot cpu and then continuing with shutdown/reboot.
> >
> > This also has the effect of not breaking x86's command line parameter for
> > specifying the reboot cpu. Note, this code was shamelessly copied from
> > arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
> > command line parameter.
> >
> > Signed-off-by: Robin Holt <[email protected]>
> > Tested-by: Shawn Guo <[email protected]>
> > To: Ingo Molnar <[email protected]>
> > To: Russ Anderson <[email protected]>
> > To: Oleg Nesterov <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: "H. Peter Anvin" <[email protected]>
> > Cc: Lai Jiangshan <[email protected]>
> > Cc: Linus Torvalds <[email protected]>
> > Cc: Linux Kernel Mailing List <[email protected]>
> > Cc: Michel Lespinasse <[email protected]>
> > Cc: Oleg Nesterov <[email protected]>
> > Cc: "Paul E. McKenney" <[email protected]>
> > Cc: Paul Mackerras <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Robin Holt <[email protected]>
> > Cc: "[email protected]" <[email protected]>
> > Cc: Tejun Heo <[email protected]>
> > Cc: the arch/x86 maintainers <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: <[email protected]>
> >
> > ---
> >
> > Changes since -v1.
> > - Set PF_THREAD_BOUND before migrating to eliminate potential race.
> > - Modified kernel_power_off to also migrate instead of using
> > disable_nonboot_cpus().
> > ---
> > kernel/sys.c | 22 +++++++++++++++++++---
> > 1 file changed, 19 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 0da73cf..5ef7aa2 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> > }
> > EXPORT_SYMBOL(unregister_reboot_notifier);
> >
> > +void migrate_to_reboot_cpu(void)
>
> It appears to be file-scope, so should be static I guess?

Done.

> > +{
> > + /* The boot cpu is always logical cpu 0 */
> > + int reboot_cpu_id = 0;
> > +
> > + /* Make certain the cpu I'm about to reboot on is online */
> > + if (!cpu_online(reboot_cpu_id))
> > + reboot_cpu_id = smp_processor_id();
>
> Shouldn't we pick the first online CPU instead, to make it deterministic?

Done.

reboot_cpu_id = cpumask_first(cpu_online_mask);

> Also, does this codepath prevent hotplug from going on in parallel?

Not sure. I have not considered hotplug. I will look that over when I
am in the office.

> ( Plus, the smp_processor_id() is in a preemptible section AFAICS, so it will
> throw a warning with preempt debug on. )
>
> > +
> > + /* Prevent races with other tasks migrating this task. */
>
> (I guess the colon can be dropped here, like in the other comments.)

Done.

>
> > + current->flags |= PF_THREAD_BOUND;
> > +
> > + /* Make certain I only run on the appropriate processor */
> > + set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
> > +}

I will resubmit when I have the hotplug stuff understood and after giving
the set some more time for comments.

Robin

2013-04-16 14:01:07

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

> > > +{
> > > + /* The boot cpu is always logical cpu 0 */
> > > + int reboot_cpu_id = 0;
> > > +
> > > + /* Make certain the cpu I'm about to reboot on is online */
> > > + if (!cpu_online(reboot_cpu_id))
> > > + reboot_cpu_id = smp_processor_id();
> >
> > Shouldn't we pick the first online CPU instead, to make it deterministic?
>
> Done.
>
> reboot_cpu_id = cpumask_first(cpu_online_mask);
>
> > Also, does this codepath prevent hotplug from going on in parallel?
>
> Not sure. I have not considered hotplug. I will look that over when I
> am in the office.

OK. I have been mulling this over for a bit and I don't think I
understand what you are asking.

I would expect that if an architecture depends upon a certain cpu for
shutdown/reboot/halt/suspend/hibernate and that support has been compiled
in, then the arch should be preventing that cpu from being removed.
I do not know how that would work and think that is far beyond the scope
of the initial problem I have been trying to solve. If that is your
question, I certainly do not know how to address it. I get the feeling
this is off your mark due to the "parallel" wording above.

The other question I think you might be asking is something about the
shutdown/reboot/halt task trying to migrate to a cpu which is in the
process of being off-lined. I believe the code will "work" in that
we will have selected a cpu to migrate to, that migration may fail,
in which case our task remains on the current cpu, may succeed and the
cpu is immediately offlined, in which case the hotplug code should move
us to another cpu, or we complete the shutdown before the hotplug code
gets a chance to run, in which case it is irrelevant.

Have I addressed your concern? Are you asking me to look into a method
for preventing the arch from hot removing the shutdown/reboot cpu?

Thanks,
Robin

2013-04-16 15:50:58

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

On 04/16/2013 05:36 PM, Robin Holt wrote:
> On Tue, Apr 16, 2013 at 01:32:56PM +0200, Ingo Molnar wrote:
>>
>> * Robin Holt <[email protected]> wrote:
>>
>>> We recently noticed that reboot of a 1024 cpu machine takes approx 16
>>> minutes of just stopping the cpus. The slowdown was tracked to commit
>>> f96972f.
>>>
>>> The current implementation does all the work of hot removing the cpus
>>> before halting the system. We are switching to just migrating to the
>>> boot cpu and then continuing with shutdown/reboot.
>>>
>>> This also has the effect of not breaking x86's command line parameter for
>>> specifying the reboot cpu. Note, this code was shamelessly copied from
>>> arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
>>> command line parameter.
>>>
>>> Signed-off-by: Robin Holt <[email protected]>
>>> Tested-by: Shawn Guo <[email protected]>
>>> To: Ingo Molnar <[email protected]>
>>> To: Russ Anderson <[email protected]>
>>> To: Oleg Nesterov <[email protected]>
>>> Cc: Andrew Morton <[email protected]>
>>> Cc: "H. Peter Anvin" <[email protected]>
>>> Cc: Lai Jiangshan <[email protected]>
>>> Cc: Linus Torvalds <[email protected]>
>>> Cc: Linux Kernel Mailing List <[email protected]>
>>> Cc: Michel Lespinasse <[email protected]>
>>> Cc: Oleg Nesterov <[email protected]>
>>> Cc: "Paul E. McKenney" <[email protected]>
>>> Cc: Paul Mackerras <[email protected]>
>>> Cc: Peter Zijlstra <[email protected]>
>>> Cc: Robin Holt <[email protected]>
>>> Cc: "[email protected]" <[email protected]>
>>> Cc: Tejun Heo <[email protected]>
>>> Cc: the arch/x86 maintainers <[email protected]>
>>> Cc: Thomas Gleixner <[email protected]>
>>> Cc: <[email protected]>
>>>
>>> ---
>>>
>>> Changes since -v1.
>>> - Set PF_THREAD_BOUND before migrating to eliminate potential race.
>>> - Modified kernel_power_off to also migrate instead of using
>>> disable_nonboot_cpus().
>>> ---
>>> kernel/sys.c | 22 +++++++++++++++++++---
>>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/kernel/sys.c b/kernel/sys.c
>>> index 0da73cf..5ef7aa2 100644
>>> --- a/kernel/sys.c
>>> +++ b/kernel/sys.c
>>> @@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
>>> }
>>> EXPORT_SYMBOL(unregister_reboot_notifier);
>>>
>>> +void migrate_to_reboot_cpu(void)
>>
>> It appears to be file-scope, so should be static I guess?
>
> Done.
>
>>> +{
>>> + /* The boot cpu is always logical cpu 0 */
>>> + int reboot_cpu_id = 0;
>>> +
>>> + /* Make certain the cpu I'm about to reboot on is online */
>>> + if (!cpu_online(reboot_cpu_id))
>>> + reboot_cpu_id = smp_processor_id();
>>
>> Shouldn't we pick the first online CPU instead, to make it deterministic?
>
> Done.
>
> reboot_cpu_id = cpumask_first(cpu_online_mask);
>

Let me ask again: if CPU 0 (or whatever the preferred reboot cpu is)
is offline, then why should we even bother pinning the task to (another)
CPU? Why not just proceed with the reboot?

Regards,
Srivatsa S. Bhat

2013-04-16 16:22:42

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

On Tue, Apr 16, 2013 at 09:18:07PM +0530, Srivatsa S. Bhat wrote:
> On 04/16/2013 05:36 PM, Robin Holt wrote:
> > On Tue, Apr 16, 2013 at 01:32:56PM +0200, Ingo Molnar wrote:
> >>
> >> * Robin Holt <[email protected]> wrote:
> >>
> >>> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> >>> minutes of just stopping the cpus. The slowdown was tracked to commit
> >>> f96972f.
> >>>
> >>> The current implementation does all the work of hot removing the cpus
> >>> before halting the system. We are switching to just migrating to the
> >>> boot cpu and then continuing with shutdown/reboot.
> >>>
> >>> This also has the effect of not breaking x86's command line parameter for
> >>> specifying the reboot cpu. Note, this code was shamelessly copied from
> >>> arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
> >>> command line parameter.
> >>>
> >>> Signed-off-by: Robin Holt <[email protected]>
> >>> Tested-by: Shawn Guo <[email protected]>
> >>> To: Ingo Molnar <[email protected]>
> >>> To: Russ Anderson <[email protected]>
> >>> To: Oleg Nesterov <[email protected]>
> >>> Cc: Andrew Morton <[email protected]>
> >>> Cc: "H. Peter Anvin" <[email protected]>
> >>> Cc: Lai Jiangshan <[email protected]>
> >>> Cc: Linus Torvalds <[email protected]>
> >>> Cc: Linux Kernel Mailing List <[email protected]>
> >>> Cc: Michel Lespinasse <[email protected]>
> >>> Cc: Oleg Nesterov <[email protected]>
> >>> Cc: "Paul E. McKenney" <[email protected]>
> >>> Cc: Paul Mackerras <[email protected]>
> >>> Cc: Peter Zijlstra <[email protected]>
> >>> Cc: Robin Holt <[email protected]>
> >>> Cc: "[email protected]" <[email protected]>
> >>> Cc: Tejun Heo <[email protected]>
> >>> Cc: the arch/x86 maintainers <[email protected]>
> >>> Cc: Thomas Gleixner <[email protected]>
> >>> Cc: <[email protected]>
> >>>
> >>> ---
> >>>
> >>> Changes since -v1.
> >>> - Set PF_THREAD_BOUND before migrating to eliminate potential race.
> >>> - Modified kernel_power_off to also migrate instead of using
> >>> disable_nonboot_cpus().
> >>> ---
> >>> kernel/sys.c | 22 +++++++++++++++++++---
> >>> 1 file changed, 19 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/kernel/sys.c b/kernel/sys.c
> >>> index 0da73cf..5ef7aa2 100644
> >>> --- a/kernel/sys.c
> >>> +++ b/kernel/sys.c
> >>> @@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> >>> }
> >>> EXPORT_SYMBOL(unregister_reboot_notifier);
> >>>
> >>> +void migrate_to_reboot_cpu(void)
> >>
> >> It appears to be file-scope, so should be static I guess?
> >
> > Done.
> >
> >>> +{
> >>> + /* The boot cpu is always logical cpu 0 */
> >>> + int reboot_cpu_id = 0;
> >>> +
> >>> + /* Make certain the cpu I'm about to reboot on is online */
> >>> + if (!cpu_online(reboot_cpu_id))
> >>> + reboot_cpu_id = smp_processor_id();
> >>
> >> Shouldn't we pick the first online CPU instead, to make it deterministic?
> >
> > Done.
> >
> > reboot_cpu_id = cpumask_first(cpu_online_mask);
> >
>
> Let me ask again: if CPU 0 (or whatever the preferred reboot cpu is)
> is offline, then why should we even bother pinning the task to (another)
> CPU? Why not just proceed with the reboot?

No idea. I copied it from the arch/x86 code. I can not defend it.

Robin

2013-04-17 07:47:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.


* Robin Holt <[email protected]> wrote:

> > reboot_cpu_id = cpumask_first(cpu_online_mask);
> >
> > > Also, does this codepath prevent hotplug from going on in parallel?
> >
> > Not sure. I have not considered hotplug. I will look that over when I
> > am in the office.
>
> OK. I have been mulling this over for a bit and I don't think I
> understand what you are asking.

Well, I just saw the apparently naked use of cpu_online_mask, and asked myself
whether that's safe against hotplug.

Upstream we had two methods:

- historical: just reboot on any random CPU we happen to run on
- current: offline all nonboot CPUs then reboot on the boot CPU

Both methods were implicitly "CPU hotplug safe", no locking needed, because either
they didn't need any, or because it used disable_nonboot_cpus() which is a hotplug
safe method.

Now your patches change this to:

- migrate to CPU#0 [if possible] and reboot there

Given that on a system CPU-hotplugging might be executing on any given CPU, if
reboot is running on another you have to consider the interactions. The previous
historic and current upstream method was reasonably hotplug safe - yours I'm not
sure about, there's just no hotplug locking in it, etc?

> I would expect that if an architecture depends upon a certain cpu for
> shutdown/reboot/halt/suspend/hibernate and that support has been compiled in,
> then the arch should be preventing that cpu from being removed. I do not know
> how that would work and think that is far beyond the scope of the initial
> problem I have been trying to solve. If that is your question, I certainly do
> not know how to address it. I get the feeling this is off your mark due to the
> "parallel" wording above.

What you mention here should indeed already be handled by the architecture hotplug
code (for example on x86 the boot CPU cannot be hot-removed).

But beyond that, your use of cpu_online_map is AFAICS not hotplug-safe. For
example a possible race would be that another CPU might be not-unplugging a CPU
and you try to reboot-migrate to it.

Thanks,

Ingo

2013-04-17 07:48:40

by Ingo Molnar

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.


* Robin Holt <[email protected]> wrote:

> On Tue, Apr 16, 2013 at 09:18:07PM +0530, Srivatsa S. Bhat wrote:
> > On 04/16/2013 05:36 PM, Robin Holt wrote:
> > > On Tue, Apr 16, 2013 at 01:32:56PM +0200, Ingo Molnar wrote:
> > >>
> > >> * Robin Holt <[email protected]> wrote:
> > >>
> > >>> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> > >>> minutes of just stopping the cpus. The slowdown was tracked to commit
> > >>> f96972f.
> > >>>
> > >>> The current implementation does all the work of hot removing the cpus
> > >>> before halting the system. We are switching to just migrating to the
> > >>> boot cpu and then continuing with shutdown/reboot.
> > >>>
> > >>> This also has the effect of not breaking x86's command line parameter for
> > >>> specifying the reboot cpu. Note, this code was shamelessly copied from
> > >>> arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
> > >>> command line parameter.
> > >>>
> > >>> Signed-off-by: Robin Holt <[email protected]>
> > >>> Tested-by: Shawn Guo <[email protected]>
> > >>> To: Ingo Molnar <[email protected]>
> > >>> To: Russ Anderson <[email protected]>
> > >>> To: Oleg Nesterov <[email protected]>
> > >>> Cc: Andrew Morton <[email protected]>
> > >>> Cc: "H. Peter Anvin" <[email protected]>
> > >>> Cc: Lai Jiangshan <[email protected]>
> > >>> Cc: Linus Torvalds <[email protected]>
> > >>> Cc: Linux Kernel Mailing List <[email protected]>
> > >>> Cc: Michel Lespinasse <[email protected]>
> > >>> Cc: Oleg Nesterov <[email protected]>
> > >>> Cc: "Paul E. McKenney" <[email protected]>
> > >>> Cc: Paul Mackerras <[email protected]>
> > >>> Cc: Peter Zijlstra <[email protected]>
> > >>> Cc: Robin Holt <[email protected]>
> > >>> Cc: "[email protected]" <[email protected]>
> > >>> Cc: Tejun Heo <[email protected]>
> > >>> Cc: the arch/x86 maintainers <[email protected]>
> > >>> Cc: Thomas Gleixner <[email protected]>
> > >>> Cc: <[email protected]>
> > >>>
> > >>> ---
> > >>>
> > >>> Changes since -v1.
> > >>> - Set PF_THREAD_BOUND before migrating to eliminate potential race.
> > >>> - Modified kernel_power_off to also migrate instead of using
> > >>> disable_nonboot_cpus().
> > >>> ---
> > >>> kernel/sys.c | 22 +++++++++++++++++++---
> > >>> 1 file changed, 19 insertions(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/kernel/sys.c b/kernel/sys.c
> > >>> index 0da73cf..5ef7aa2 100644
> > >>> --- a/kernel/sys.c
> > >>> +++ b/kernel/sys.c
> > >>> @@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> > >>> }
> > >>> EXPORT_SYMBOL(unregister_reboot_notifier);
> > >>>
> > >>> +void migrate_to_reboot_cpu(void)
> > >>
> > >> It appears to be file-scope, so should be static I guess?
> > >
> > > Done.
> > >
> > >>> +{
> > >>> + /* The boot cpu is always logical cpu 0 */
> > >>> + int reboot_cpu_id = 0;
> > >>> +
> > >>> + /* Make certain the cpu I'm about to reboot on is online */
> > >>> + if (!cpu_online(reboot_cpu_id))
> > >>> + reboot_cpu_id = smp_processor_id();
> > >>
> > >> Shouldn't we pick the first online CPU instead, to make it deterministic?
> > >
> > > Done.
> > >
> > > reboot_cpu_id = cpumask_first(cpu_online_mask);
> > >
> >
> > Let me ask again: if CPU 0 (or whatever the preferred reboot cpu is)
> > is offline, then why should we even bother pinning the task to (another)
> > CPU? Why not just proceed with the reboot?
>
> No idea. I copied it from the arch/x86 code. I can not defend it.

I'd say it's a quality of implementation improvement if the choice of the CPU is
deterministic, as long as the current configuration of CPUs is deterministic.

I.e. instead of 'reboot on the first CPU, or a random CPU', make the rule 'reboot
on the first online CPU'. That's a simple rule to think about.

( On most architectures CPU#0 cannot be unplugged, so the rule will effectively be
'reboot on CPU#0'. Like the current upstream behavior. )

Thanks,

Ingo

2013-04-17 09:27:45

by Borislav Petkov

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

On Wed, Apr 17, 2013 at 09:46:53AM +0200, Ingo Molnar wrote:
> What you mention here should indeed already be handled by the
> architecture hotplug code (for example on x86 the boot CPU cannot be
> hot-removed).

Supposedly, some new Intels (I think Ivybridge or so) can actually be
hot-removed.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-04-17 10:03:21

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

On Wed, Apr 17, 2013 at 09:48:35AM +0200, Ingo Molnar wrote:
>
> * Robin Holt <[email protected]> wrote:
>
> > On Tue, Apr 16, 2013 at 09:18:07PM +0530, Srivatsa S. Bhat wrote:
> > > On 04/16/2013 05:36 PM, Robin Holt wrote:
> > > > On Tue, Apr 16, 2013 at 01:32:56PM +0200, Ingo Molnar wrote:
> > > >>
> > > >> * Robin Holt <[email protected]> wrote:
> > > >>
> > > >>> We recently noticed that reboot of a 1024 cpu machine takes approx 16
> > > >>> minutes of just stopping the cpus. The slowdown was tracked to commit
> > > >>> f96972f.
> > > >>>
> > > >>> The current implementation does all the work of hot removing the cpus
> > > >>> before halting the system. We are switching to just migrating to the
> > > >>> boot cpu and then continuing with shutdown/reboot.
> > > >>>
> > > >>> This also has the effect of not breaking x86's command line parameter for
> > > >>> specifying the reboot cpu. Note, this code was shamelessly copied from
> > > >>> arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
> > > >>> command line parameter.
> > > >>>
> > > >>> Signed-off-by: Robin Holt <[email protected]>
> > > >>> Tested-by: Shawn Guo <[email protected]>
> > > >>> To: Ingo Molnar <[email protected]>
> > > >>> To: Russ Anderson <[email protected]>
> > > >>> To: Oleg Nesterov <[email protected]>
> > > >>> Cc: Andrew Morton <[email protected]>
> > > >>> Cc: "H. Peter Anvin" <[email protected]>
> > > >>> Cc: Lai Jiangshan <[email protected]>
> > > >>> Cc: Linus Torvalds <[email protected]>
> > > >>> Cc: Linux Kernel Mailing List <[email protected]>
> > > >>> Cc: Michel Lespinasse <[email protected]>
> > > >>> Cc: Oleg Nesterov <[email protected]>
> > > >>> Cc: "Paul E. McKenney" <[email protected]>
> > > >>> Cc: Paul Mackerras <[email protected]>
> > > >>> Cc: Peter Zijlstra <[email protected]>
> > > >>> Cc: Robin Holt <[email protected]>
> > > >>> Cc: "[email protected]" <[email protected]>
> > > >>> Cc: Tejun Heo <[email protected]>
> > > >>> Cc: the arch/x86 maintainers <[email protected]>
> > > >>> Cc: Thomas Gleixner <[email protected]>
> > > >>> Cc: <[email protected]>
> > > >>>
> > > >>> ---
> > > >>>
> > > >>> Changes since -v1.
> > > >>> - Set PF_THREAD_BOUND before migrating to eliminate potential race.
> > > >>> - Modified kernel_power_off to also migrate instead of using
> > > >>> disable_nonboot_cpus().
> > > >>> ---
> > > >>> kernel/sys.c | 22 +++++++++++++++++++---
> > > >>> 1 file changed, 19 insertions(+), 3 deletions(-)
> > > >>>
> > > >>> diff --git a/kernel/sys.c b/kernel/sys.c
> > > >>> index 0da73cf..5ef7aa2 100644
> > > >>> --- a/kernel/sys.c
> > > >>> +++ b/kernel/sys.c
> > > >>> @@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
> > > >>> }
> > > >>> EXPORT_SYMBOL(unregister_reboot_notifier);
> > > >>>
> > > >>> +void migrate_to_reboot_cpu(void)
> > > >>
> > > >> It appears to be file-scope, so should be static I guess?
> > > >
> > > > Done.
> > > >
> > > >>> +{
> > > >>> + /* The boot cpu is always logical cpu 0 */
> > > >>> + int reboot_cpu_id = 0;
> > > >>> +
> > > >>> + /* Make certain the cpu I'm about to reboot on is online */
> > > >>> + if (!cpu_online(reboot_cpu_id))
> > > >>> + reboot_cpu_id = smp_processor_id();
> > > >>
> > > >> Shouldn't we pick the first online CPU instead, to make it deterministic?
> > > >
> > > > Done.
> > > >
> > > > reboot_cpu_id = cpumask_first(cpu_online_mask);
> > > >
> > >
> > > Let me ask again: if CPU 0 (or whatever the preferred reboot cpu is)
> > > is offline, then why should we even bother pinning the task to (another)
> > > CPU? Why not just proceed with the reboot?
> >
> > No idea. I copied it from the arch/x86 code. I can not defend it.
>
> I'd say it's a quality of implementation improvement if the choice of the CPU is
> deterministic, as long as the current configuration of CPUs is deterministic.
>
> I.e. instead of 'reboot on the first CPU, or a random CPU', make the rule 'reboot
> on the first online CPU'. That's a simple rule to think about.
>
> ( On most architectures CPU#0 cannot be unplugged, so the rule will effectively be
> 'reboot on CPU#0'. Like the current upstream behavior. )

Would you be more comfortable with:

static void migrate_to_reboot_cpu(void)
{
/* The boot cpu is always logical cpu 0 */
int reboot_cpu_id = reboot_cpuid;

get_online_cpus();

/* Make certain the cpu I'm about to reboot on is online */
if (!cpu_online(reboot_cpu_id))
reboot_cpu_id = 0;
if (!cpu_online(reboot_cpu_id))
reboot_cpu_id = cpumask_first(cpu_online_mask);

/* Prevent races with other tasks migrating this task */
current->flags |= PF_THREAD_BOUND;

/* Make certain I only run on the appropriate processor */
set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));

put_online_cpus();
}

Robin

2013-04-17 11:34:14

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

On 04/17/2013 03:33 PM, Robin Holt wrote:
> On Wed, Apr 17, 2013 at 09:48:35AM +0200, Ingo Molnar wrote:
>>
>> * Robin Holt <[email protected]> wrote:
>>
>>> On Tue, Apr 16, 2013 at 09:18:07PM +0530, Srivatsa S. Bhat wrote:
>>>> On 04/16/2013 05:36 PM, Robin Holt wrote:
>>>>> On Tue, Apr 16, 2013 at 01:32:56PM +0200, Ingo Molnar wrote:
>>>>>>
>>>>>> * Robin Holt <[email protected]> wrote:
>>>>>>
>>>>>>> We recently noticed that reboot of a 1024 cpu machine takes approx 16
>>>>>>> minutes of just stopping the cpus. The slowdown was tracked to commit
>>>>>>> f96972f.
>>>>>>>
>>>>>>> The current implementation does all the work of hot removing the cpus
>>>>>>> before halting the system. We are switching to just migrating to the
>>>>>>> boot cpu and then continuing with shutdown/reboot.
>>>>>>>
>>>>>>> This also has the effect of not breaking x86's command line parameter for
>>>>>>> specifying the reboot cpu. Note, this code was shamelessly copied from
>>>>>>> arch/x86/kernel/reboot.c with bits removed pertaining to the reboot_cpu
>>>>>>> command line parameter.
>>>>>>>
[...]
>>>>>>> ---
>>>>>>> kernel/sys.c | 22 +++++++++++++++++++---
>>>>>>> 1 file changed, 19 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/kernel/sys.c b/kernel/sys.c
>>>>>>> index 0da73cf..5ef7aa2 100644
>>>>>>> --- a/kernel/sys.c
>>>>>>> +++ b/kernel/sys.c
>>>>>>> @@ -357,6 +357,22 @@ int unregister_reboot_notifier(struct notifier_block *nb)
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(unregister_reboot_notifier);
>>>>>>>
>>>>>>> +void migrate_to_reboot_cpu(void)
>>>>>>
>>>>>> It appears to be file-scope, so should be static I guess?
>>>>>
>>>>> Done.
>>>>>
>>>>>>> +{
>>>>>>> + /* The boot cpu is always logical cpu 0 */
>>>>>>> + int reboot_cpu_id = 0;
>>>>>>> +
>>>>>>> + /* Make certain the cpu I'm about to reboot on is online */
>>>>>>> + if (!cpu_online(reboot_cpu_id))
>>>>>>> + reboot_cpu_id = smp_processor_id();
>>>>>>
>>>>>> Shouldn't we pick the first online CPU instead, to make it deterministic?
>>>>>
>>>>> Done.
>>>>>
>>>>> reboot_cpu_id = cpumask_first(cpu_online_mask);
>>>>>
>>>>
>>>> Let me ask again: if CPU 0 (or whatever the preferred reboot cpu is)
>>>> is offline, then why should we even bother pinning the task to (another)
>>>> CPU? Why not just proceed with the reboot?
>>>
>>> No idea. I copied it from the arch/x86 code. I can not defend it.
>>
>> I'd say it's a quality of implementation improvement if the choice of the CPU is
>> deterministic, as long as the current configuration of CPUs is deterministic.
>>
>> I.e. instead of 'reboot on the first CPU, or a random CPU', make the rule 'reboot
>> on the first online CPU'. That's a simple rule to think about.
>>
>> ( On most architectures CPU#0 cannot be unplugged, so the rule will effectively be
>> 'reboot on CPU#0'. Like the current upstream behavior. )
>
> Would you be more comfortable with:
>
> static void migrate_to_reboot_cpu(void)
> {
> /* The boot cpu is always logical cpu 0 */
> int reboot_cpu_id = reboot_cpuid;
>
> get_online_cpus();
>
> /* Make certain the cpu I'm about to reboot on is online */
> if (!cpu_online(reboot_cpu_id))
> reboot_cpu_id = 0;

This assignment is not required. cpumask_first() gives you 0 if CPU 0
is online.

> if (!cpu_online(reboot_cpu_id))
> reboot_cpu_id = cpumask_first(cpu_online_mask);
>
> /* Prevent races with other tasks migrating this task */
> current->flags |= PF_THREAD_BOUND;
>
> /* Make certain I only run on the appropriate processor */
> set_cpus_allowed_ptr(current, cpumask_of(reboot_cpu_id));
>
> put_online_cpus();
> }
>

The get/put_online_cpus() doesn't help in this case, because if a
hotplug operation is started _after_ this function returns, then
your task will get force migrated - which makes the get/put_online_cpus()
pointless. What we need to do is *disable* CPU hotplug altogether.
We need not even enable it back, since we are rebooting/powering off
anyway.

So how about using the following patch in your series and using
cpu_hotplug_disable() to achieve the goal?


------------------------------------------------------------->
From: Srivatsa S. Bhat <[email protected]>
Subject: [PATCH] CPU hotplug: Provide a generic helper to disable/enable CPU hotplug

There are instances in the kernel where we would like to disable
CPU hotplug (from sysfs) during some important operation. Today
the freezer code depends on this and the code to do it was kinda
tailor-made for that.

Restructure the code and make it generic enough to be useful for
other usecases too.

Signed-off-by: Srivatsa S. Bhat <[email protected]>
---

kernel/cpu.c | 27 +++++++++------------------
1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index b5e4ab2..28769f5 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -541,29 +541,20 @@ static int __init alloc_frozen_cpus(void)
core_initcall(alloc_frozen_cpus);

/*
- * Prevent regular CPU hotplug from racing with the freezer, by disabling CPU
- * hotplug when tasks are about to be frozen. Also, don't allow the freezer
- * to continue until any currently running CPU hotplug operation gets
- * completed.
- * To modify the 'cpu_hotplug_disabled' flag, we need to acquire the
- * 'cpu_add_remove_lock'. And this same lock is also taken by the regular
- * CPU hotplug path and released only after it is complete. Thus, we
- * (and hence the freezer) will block here until any currently running CPU
- * hotplug operation gets completed.
+ * Wait for currently running CPU hotplug operations to complete (if any) and
+ * disable future CPU hotplug (from sysfs). The 'cpu_add_remove_lock' protects
+ * the 'cpu_hotplug_disabled' flag. The same lock is also acquired by the
+ * hotplug path before performing hotplug operations. So acquiring that lock
+ * guarantees mutual exclusion from any currently running hotplug operations.
*/
-void cpu_hotplug_disable_before_freeze(void)
+void cpu_hotplug_disable(void)
{
cpu_maps_update_begin();
cpu_hotplug_disabled = 1;
cpu_maps_update_done();
}

-
-/*
- * When tasks have been thawed, re-enable regular CPU hotplug (which had been
- * disabled while beginning to freeze tasks).
- */
-void cpu_hotplug_enable_after_thaw(void)
+void cpu_hotplug_enable(void)
{
cpu_maps_update_begin();
cpu_hotplug_disabled = 0;
@@ -589,12 +580,12 @@ cpu_hotplug_pm_callback(struct notifier_block *nb,

case PM_SUSPEND_PREPARE:
case PM_HIBERNATION_PREPARE:
- cpu_hotplug_disable_before_freeze();
+ cpu_hotplug_disable();
break;

case PM_POST_SUSPEND:
case PM_POST_HIBERNATION:
- cpu_hotplug_enable_after_thaw();
+ cpu_hotplug_enable();
break;

default:

2013-04-17 11:58:23

by Robin Holt

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

> The get/put_online_cpus() doesn't help in this case, because if a
> hotplug operation is started _after_ this function returns, then
> your task will get force migrated - which makes the get/put_online_cpus()
> pointless. What we need to do is *disable* CPU hotplug altogether.
> We need not even enable it back, since we are rebooting/powering off
> anyway.
>
> So how about using the following patch in your series and using
> cpu_hotplug_disable() to achieve the goal?

I will incorporate it before my patch and utilize. Thank you very
much as I was at a loss as to what I should be doing.

Robin

2013-04-19 07:56:39

by Ingo Molnar

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.


* Borislav Petkov <[email protected]> wrote:

> On Wed, Apr 17, 2013 at 09:46:53AM +0200, Ingo Molnar wrote:
>
> > What you mention here should indeed already be handled by the architecture
> > hotplug code (for example on x86 the boot CPU cannot be hot-removed).
>
> Supposedly, some new Intels (I think Ivybridge or so) can actually be
> hot-removed.

There are WIP patches in existence that remove the limitations on the kernel side,
but they are not upstream yet, so currently the constraint exists upstream.

Thanks,

Ingo

2013-04-19 08:32:45

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.

On 04/19/2013 01:26 PM, Ingo Molnar wrote:
>
> * Borislav Petkov <[email protected]> wrote:
>
>> On Wed, Apr 17, 2013 at 09:46:53AM +0200, Ingo Molnar wrote:
>>
>>> What you mention here should indeed already be handled by the architecture
>>> hotplug code (for example on x86 the boot CPU cannot be hot-removed).
>>
>> Supposedly, some new Intels (I think Ivybridge or so) can actually be
>> hot-removed.
>
> There are WIP patches in existence that remove the limitations on the kernel side,
> but they are not upstream yet, so currently the constraint exists upstream.
>

I thought Fenghua Yu's upstream commits were supposed to handle that. Don't they?

a71c8bc x86, topology: Debug CPU0 hotplug
6f5298c x86/i387.c: Initialize thread xstate only on CPU0 only once
8d966a0 x86, hotplug: Handle retrigger irq by the first available CPU
30242aa x86, hotplug: The first online processor saves the MTRR state
27fd185 x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
e1c467e x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
3e2a0cc x86-32, hotplug: Add start_cpu0() entry point to head_32.S
42e78e9 x86-64, hotplug: Add start_cpu0() entry point to head_64.S
6e32d47 kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
209efae x86, hotplug, suspend: Online CPU0 for suspend or hibernate
30106c1 x86, hotplug: Support functions for CPU0 online/offline
4d25031 x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
80aa1df x86, Kconfig: Add config switch for CPU0 hotplug
f78cff4 doc: Add x86 CPU0 online/offline feature

Regards,
Srivatsa S. Bhat

2013-04-19 08:44:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [Patch -v4 1/4] Migrate shutdown/reboot to boot cpu.


* Srivatsa S. Bhat <[email protected]> wrote:

> On 04/19/2013 01:26 PM, Ingo Molnar wrote:
> >
> > * Borislav Petkov <[email protected]> wrote:
> >
> >> On Wed, Apr 17, 2013 at 09:46:53AM +0200, Ingo Molnar wrote:
> >>
> >>> What you mention here should indeed already be handled by the architecture
> >>> hotplug code (for example on x86 the boot CPU cannot be hot-removed).
> >>
> >> Supposedly, some new Intels (I think Ivybridge or so) can actually be
> >> hot-removed.
> >
> > There are WIP patches in existence that remove the limitations on the kernel side,
> > but they are not upstream yet, so currently the constraint exists upstream.
> >
>
> I thought Fenghua Yu's upstream commits were supposed to handle that. Don't they?

Hm, I thought there were more patches needed to support actual hardware - the
feature also has various limitations related suspend/resume. Are these
CPU0-hotplug commits all that was needed to support the new hot-pluggable
hardware?

In any case, you are right - it's indeed possible upstream as well.

Thanks,

Ingo