2013-06-11 21:51:35

by Dave Hansen

[permalink] [raw]
Subject: cpu hotplug: possible_cpus broken (again?) next-20130607

possible_cpus looks broken again. I'm booting with:

maxcpus=10 possible_cpus=160

But I only get 0-9 in sysfs:

> # ls /sys/devices/system/cpu/
> cpu0 cpu2 cpu4 cpu6 cpu8 cpufreq kernel_max offline possible probe uevent
> cpu1 cpu3 cpu5 cpu7 cpu9 cpuidle modalias online present release



2013-06-11 21:56:11

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On Tuesday, June 11, 2013 02:51:33 PM Dave Hansen wrote:
> possible_cpus looks broken again. I'm booting with:
>
> maxcpus=10 possible_cpus=160
>
> But I only get 0-9 in sysfs:
>
> > # ls /sys/devices/system/cpu/
> > cpu0 cpu2 cpu4 cpu6 cpu8 cpufreq kernel_max offline possible probe uevent
> > cpu1 cpu3 cpu5 cpu7 cpu9 cpuidle modalias online present release

Can you please test the acpi-hotplug branch of the linux-pm.git tree?

Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-11 22:17:30

by Dave Hansen

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On 06/11/2013 03:05 PM, Rafael J. Wysocki wrote:
> On Tuesday, June 11, 2013 02:51:33 PM Dave Hansen wrote:
>> possible_cpus looks broken again. I'm booting with:
>>
>> maxcpus=10 possible_cpus=160
>>
>> But I only get 0-9 in sysfs:
>>
>>> # ls /sys/devices/system/cpu/
>>> cpu0 cpu2 cpu4 cpu6 cpu8 cpufreq kernel_max offline possible probe uevent
>>> cpu1 cpu3 cpu5 cpu7 cpu9 cpuidle modalias online present release
>
> Can you please test the acpi-hotplug branch of the linux-pm.git tree?

That branch seems to work happily.

2013-06-11 22:25:43

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On Tuesday, June 11, 2013 03:17:28 PM Dave Hansen wrote:
> On 06/11/2013 03:05 PM, Rafael J. Wysocki wrote:
> > On Tuesday, June 11, 2013 02:51:33 PM Dave Hansen wrote:
> >> possible_cpus looks broken again. I'm booting with:
> >>
> >> maxcpus=10 possible_cpus=160
> >>
> >> But I only get 0-9 in sysfs:
> >>
> >>> # ls /sys/devices/system/cpu/
> >>> cpu0 cpu2 cpu4 cpu6 cpu8 cpufreq kernel_max offline possible probe uevent
> >>> cpu1 cpu3 cpu5 cpu7 cpu9 cpuidle modalias online present release
> >
> > Can you please test the acpi-hotplug branch of the linux-pm.git tree?
>
> That branch seems to work happily.

In that case the problem may have been reintroduced by a merge conflict fix in
linux-next.

Thanks,
Rafael


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-11 22:32:50

by Toshi Kani

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On Wed, 2013-06-12 at 00:34 +0200, Rafael J. Wysocki wrote:
> On Tuesday, June 11, 2013 03:17:28 PM Dave Hansen wrote:
> > On 06/11/2013 03:05 PM, Rafael J. Wysocki wrote:
> > > On Tuesday, June 11, 2013 02:51:33 PM Dave Hansen wrote:
> > >> possible_cpus looks broken again. I'm booting with:
> > >>
> > >> maxcpus=10 possible_cpus=160
> > >>
> > >> But I only get 0-9 in sysfs:
> > >>
> > >>> # ls /sys/devices/system/cpu/
> > >>> cpu0 cpu2 cpu4 cpu6 cpu8 cpufreq kernel_max offline possible probe uevent
> > >>> cpu1 cpu3 cpu5 cpu7 cpu9 cpuidle modalias online present release
> > >
> > > Can you please test the acpi-hotplug branch of the linux-pm.git tree?
> >
> > That branch seems to work happily.
>
> In that case the problem may have been reintroduced by a merge conflict fix in
> linux-next.

I believe the problem was introduced by the following change. From the
description, though, this is exactly what this patch was trying to
change... Adding Youguan to the list.

commit 3e275a5ba367ab74b3a4e49114307baed989fcac
Author: Youquan Song <[email protected]>
Date: Fri Jun 7 10:07:08 2013 +1000

drivers/base/cpu.c: fix maxcpus boot option


Thanks,
-Toshi



2013-06-12 00:17:50

by Youquan Song

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On Tue, Jun 11, 2013 at 04:32:34PM -0600, Toshi Kani wrote:
> On Wed, 2013-06-12 at 00:34 +0200, Rafael J. Wysocki wrote:
> > On Tuesday, June 11, 2013 03:17:28 PM Dave Hansen wrote:
> > > On 06/11/2013 03:05 PM, Rafael J. Wysocki wrote:
> > > > On Tuesday, June 11, 2013 02:51:33 PM Dave Hansen wrote:
> > > >> possible_cpus looks broken again. I'm booting with:
> > > >>
> > > >> maxcpus=10 possible_cpus=160
> > > >>
> > > >> But I only get 0-9 in sysfs:
> > > >>
> > > >>> # ls /sys/devices/system/cpu/
> > > >>> cpu0 cpu2 cpu4 cpu6 cpu8 cpufreq kernel_max offline possible probe uevent
> > > >>> cpu1 cpu3 cpu5 cpu7 cpu9 cpuidle modalias online present release
> > > >
> > > > Can you please test the acpi-hotplug branch of the linux-pm.git tree?
> > >
> > > That branch seems to work happily.
> >
> > In that case the problem may have been reintroduced by a merge conflict fix in
> > linux-next.
>
> I believe the problem was introduced by the following change. From the
> description, though, this is exactly what this patch was trying to
> change... Adding Youguan to the list.
>
> commit 3e275a5ba367ab74b3a4e49114307baed989fcac
> Author: Youquan Song <[email protected]>
> Date: Fri Jun 7 10:07:08 2013 +1000
>
> drivers/base/cpu.c: fix maxcpus boot option
>
Hi Toshi,

Thanks Thoshi for the information.
please try the below patch to fix the issue by moving the code to
store_online.

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..2378f42 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -60,6 +60,13 @@ static ssize_t __ref store_online(struct device *dev,
kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
break;
case '1':
+#ifdef CONFIG_SMP
+ /* return when cpu number greater than maximum number of
CPUs */
+ if (setup_max_cpus <= num_online_cpus() + 1) {
+ cpu_hotplug_driver_unlock();
+ return -EINVAL;
+ }
+#endif
from_nid = cpu_to_node(cpuid);
ret = cpu_up(cpuid);

Thanks
-Youquan

2013-06-12 00:24:17

by Dave Hansen

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On 06/12/2013 05:03 AM, Youquan Song wrote:
> +#ifdef CONFIG_SMP
> + /* return when cpu number greater than maximum number of
> CPUs */
> + if (setup_max_cpus <= num_online_cpus() + 1) {
> + cpu_hotplug_driver_unlock();
> + return -EINVAL;
> + }
> +#endif
> from_nid = cpu_to_node(cpuid);
> ret = cpu_up(cpuid);

Your patch is line-wrapped.

Also, the #ifdef is unnecessary. If CONFIG_SMP is off:

static const unsigned int setup_max_cpus = NR_CPUS;
#define num_online_cpus() 1U

The compiler will take care of optimizing out the the if() without the
explicit #ifdef.

Also, the +1 looks goofy to me. Doesn't this do the same thing (and
isn't it much easier to read)?

if (num_online_cpus() >= setup_max_cpus)

2013-06-12 00:46:54

by Youquan Song

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

> On 06/12/2013 05:03 AM, Youquan Song wrote:
> > +#ifdef CONFIG_SMP
> > + /* return when cpu number greater than maximum number of
> > CPUs */
> > + if (setup_max_cpus <= num_online_cpus() + 1) {
> > + cpu_hotplug_driver_unlock();
> > + return -EINVAL;
> > + }
> > +#endif
> > from_nid = cpu_to_node(cpuid);
> > ret = cpu_up(cpuid);
>
> Your patch is line-wrapped.
>
> Also, the #ifdef is unnecessary. If CONFIG_SMP is off:
>
> static const unsigned int setup_max_cpus = NR_CPUS;
> #define num_online_cpus() 1U
>
> The compiler will take care of optimizing out the the if() without the
> explicit #ifdef.
>
> Also, the +1 looks goofy to me. Doesn't this do the same thing (and
> isn't it much easier to read)?
>
> if (num_online_cpus() >= setup_max_cpus)
>

Thanks. Here is a formal patch for it. please review and try.

Subject: [PATCH] core: Fix maxcpus boot option broken

maxcpus boot option to limit maximum number of CPUs on system, but this option
is broken at recent kernel. Though we use maxcpus to limit CPUs number, but
current kernel will register all of present CPUs in sysfs.
udev will enumerate all registered cpu at sysfs, and it will bring up the CPU
if the CPU is offline. So the maxcpus option is broken.

This patch will limit the online cpus number not over limitation of maxcpus
option. So it will keep the maxcpus limitation when udev enumeration
or other intention of bring up CPUs over the limitation by method like
echo 1 > /sys/devices/system/cpu/online

Signed-off-by: Youquan Song <[email protected]>
---
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 3d48fc8..e32fffa 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -60,6 +60,13 @@ static ssize_t __ref store_online(struct device *dev,
kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
break;
case '1':
+ /* Return when online cpu number equal or greater than
+ * maximum number of CPUs */
+ if (num_online_cpus() >= setup_max_cpus) {
+ cpu_hotplug_driver_unlock();
+ return -EINVAL;
+ }
+
from_nid = cpu_to_node(cpuid);
ret = cpu_up(cpuid);

2013-06-12 04:07:39

by Yinghai Lu

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On Wed, Jun 12, 2013 at 5:32 AM, Youquan Song
<[email protected]> wrote:
>> On 06/12/2013 05:03 AM, Youquan Song wrote:
>> > +#ifdef CONFIG_SMP
>> > + /* return when cpu number greater than maximum number of
>> > CPUs */
>> > + if (setup_max_cpus <= num_online_cpus() + 1) {
>> > + cpu_hotplug_driver_unlock();
>> > + return -EINVAL;
>> > + }
>> > +#endif
>> > from_nid = cpu_to_node(cpuid);
>> > ret = cpu_up(cpuid);
>>
>> Your patch is line-wrapped.
>>
>> Also, the #ifdef is unnecessary. If CONFIG_SMP is off:
>>
>> static const unsigned int setup_max_cpus = NR_CPUS;
>> #define num_online_cpus() 1U
>>
>> The compiler will take care of optimizing out the the if() without the
>> explicit #ifdef.
>>
>> Also, the +1 looks goofy to me. Doesn't this do the same thing (and
>> isn't it much easier to read)?
>>
>> if (num_online_cpus() >= setup_max_cpus)
>>
>
> Thanks. Here is a formal patch for it. please review and try.
>
> Subject: [PATCH] core: Fix maxcpus boot option broken
>
> maxcpus boot option to limit maximum number of CPUs on system, but this option
> is broken at recent kernel. Though we use maxcpus to limit CPUs number, but
> current kernel will register all of present CPUs in sysfs.
> udev will enumerate all registered cpu at sysfs, and it will bring up the CPU
> if the CPU is offline. So the maxcpus option is broken.
>
> This patch will limit the online cpus number not over limitation of maxcpus
> option. So it will keep the maxcpus limitation when udev enumeration
> or other intention of bring up CPUs over the limitation by method like
> echo 1 > /sys/devices/system/cpu/online

Interesting, you are changing long standing meaning of maxcpus=

We always use maxcpus=1 to have one cpu up, and later in user space
to online other cpus like
echo 1 > /sys/devices/system/cpuX/online.

aka maxcpus= is a soft limit or initial online nr.

we already have nr_cpus= for hard limit.

So need to drop
commit 3e275a5ba367ab74b3a4e49114307baed989fcac
Author: Youquan Song <[email protected]>
Date: Fri Jun 7 10:07:08 2013 +1000

drivers/base/cpu.c: fix maxcpus boot option

Greg,
Can you drop that 3e275a5ba36 from your drivers/core tree ?

Thanks

Yinghai

2013-06-12 10:53:28

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On Tuesday, June 11, 2013 09:07:37 PM Yinghai Lu wrote:
> On Wed, Jun 12, 2013 at 5:32 AM, Youquan Song
> <[email protected]> wrote:
> >> On 06/12/2013 05:03 AM, Youquan Song wrote:
> >> > +#ifdef CONFIG_SMP
> >> > + /* return when cpu number greater than maximum number of
> >> > CPUs */
> >> > + if (setup_max_cpus <= num_online_cpus() + 1) {
> >> > + cpu_hotplug_driver_unlock();
> >> > + return -EINVAL;
> >> > + }
> >> > +#endif
> >> > from_nid = cpu_to_node(cpuid);
> >> > ret = cpu_up(cpuid);
> >>
> >> Your patch is line-wrapped.
> >>
> >> Also, the #ifdef is unnecessary. If CONFIG_SMP is off:
> >>
> >> static const unsigned int setup_max_cpus = NR_CPUS;
> >> #define num_online_cpus() 1U
> >>
> >> The compiler will take care of optimizing out the the if() without the
> >> explicit #ifdef.
> >>
> >> Also, the +1 looks goofy to me. Doesn't this do the same thing (and
> >> isn't it much easier to read)?
> >>
> >> if (num_online_cpus() >= setup_max_cpus)
> >>
> >
> > Thanks. Here is a formal patch for it. please review and try.
> >
> > Subject: [PATCH] core: Fix maxcpus boot option broken
> >
> > maxcpus boot option to limit maximum number of CPUs on system, but this option
> > is broken at recent kernel. Though we use maxcpus to limit CPUs number, but
> > current kernel will register all of present CPUs in sysfs.
> > udev will enumerate all registered cpu at sysfs, and it will bring up the CPU
> > if the CPU is offline. So the maxcpus option is broken.
> >
> > This patch will limit the online cpus number not over limitation of maxcpus
> > option. So it will keep the maxcpus limitation when udev enumeration
> > or other intention of bring up CPUs over the limitation by method like
> > echo 1 > /sys/devices/system/cpu/online
>
> Interesting, you are changing long standing meaning of maxcpus=
>
> We always use maxcpus=1 to have one cpu up, and later in user space
> to online other cpus like
> echo 1 > /sys/devices/system/cpuX/online.
>
> aka maxcpus= is a soft limit or initial online nr.
>
> we already have nr_cpus= for hard limit.
>
> So need to drop
> commit 3e275a5ba367ab74b3a4e49114307baed989fcac
> Author: Youquan Song <[email protected]>
> Date: Fri Jun 7 10:07:08 2013 +1000
>
> drivers/base/cpu.c: fix maxcpus boot option

Agreed.

Thanks,
Rafael


> Greg,
> Can you drop that 3e275a5ba36 from your drivers/core tree ?
>
> Thanks
>
> Yinghai
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-06-12 13:50:39

by Youquan Song

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

> > Interesting, you are changing long standing meaning of maxcpus=
> >
> > We always use maxcpus=1 to have one cpu up, and later in user space
> > to online other cpus like
> > echo 1 > /sys/devices/system/cpuX/online.
> >
> > aka maxcpus= is a soft limit or initial online nr.
> >
> > we already have nr_cpus= for hard limit.
> >
> > So need to drop
> > commit 3e275a5ba367ab74b3a4e49114307baed989fcac
> > Author: Youquan Song <[email protected]>
> > Date: Fri Jun 7 10:07:08 2013 +1000
> >
> > drivers/base/cpu.c: fix maxcpus boot option
>
> Agreed.

Yes. I also agree to drop it and the fix need more consideration.
I try use maxcpus to limit cpu number to debug a well known applition
because it fail to run when cpu number is larger to > 69.
When I use maxcpus at to limit the boot CPUs number, but udev will
enable all of the CPUs at 3.10 kernel automatically.
I also try maxcpus at 3.0 kernel, it does not show the maxcpus issue.
I have digged out recently, it is the commit at 3.2 kernel
8a25a2fd126c621f44f3aeaef80d51f00fc11639 "cpu: convert 'cpu' and
'machinecheck' sysdev_class to a regular subsystem" result in udev
automatically enable all of CPUs though maxcpus has been provided.

So the next, I need look at udev try to enable all of CPUs though
maxcpus provided. Possibly, it can also fix it in udev daemon.

Secondly, I think that the maxcpus= option description is too confused in
Documentation/kernel-parameters.txt. The maxcpus and nr_cpus option need
switch their name.
Currently:

maxcpus= [SMP] Maximum number of processors that an SMP kernel
should make use of. maxcpus=n : n >= 0 limits the
kernel to using 'n' processors. n=0 is a special case,
it is equivalent to "nosmp", which also disables
the IO APIC.

How about change to

maxcpus= [SMP] Maximum number of processors that an SMP kernel
bring up during booting. maxcpus=n : n >= 0 limits the
kernel to using 'n' processors. n=0 is a special case,
it is equivalent to "nosmp", which also disables
the IO APIC.


Thanks
-Youquan

2013-06-13 15:36:36

by Toshi Kani

[permalink] [raw]
Subject: Re: cpu hotplug: possible_cpus broken (again?) next-20130607

On Wed, 2013-06-12 at 21:36 -0400, Youquan Song wrote:
> > > Interesting, you are changing long standing meaning of maxcpus=
> > >
> > > We always use maxcpus=1 to have one cpu up, and later in user space
> > > to online other cpus like
> > > echo 1 > /sys/devices/system/cpuX/online.
> > >
> > > aka maxcpus= is a soft limit or initial online nr.
> > >
> > > we already have nr_cpus= for hard limit.
> > >
> > > So need to drop
> > > commit 3e275a5ba367ab74b3a4e49114307baed989fcac
> > > Author: Youquan Song <[email protected]>
> > > Date: Fri Jun 7 10:07:08 2013 +1000
> > >
> > > drivers/base/cpu.c: fix maxcpus boot option
> >
> > Agreed.
>
> Yes. I also agree to drop it and the fix need more consideration.

Agreed.

> I try use maxcpus to limit cpu number to debug a well known applition
> because it fail to run when cpu number is larger to > 69.
> When I use maxcpus at to limit the boot CPUs number, but udev will
> enable all of the CPUs at 3.10 kernel automatically.
> I also try maxcpus at 3.0 kernel, it does not show the maxcpus issue.

I tested Linus's 3.10-rc5 kernel on Fedora 18, but I did not see such
behavior. Could this be related with your environment?

> I have digged out recently, it is the commit at 3.2 kernel
> 8a25a2fd126c621f44f3aeaef80d51f00fc11639 "cpu: convert 'cpu' and
> 'machinecheck' sysdev_class to a regular subsystem" result in udev
> automatically enable all of CPUs though maxcpus has been provided.
>
> So the next, I need look at udev try to enable all of CPUs though
> maxcpus provided. Possibly, it can also fix it in udev daemon.
>
> Secondly, I think that the maxcpus= option description is too confused in
> Documentation/kernel-parameters.txt. The maxcpus and nr_cpus option need
> switch their name.

I am fine with the clarification.

> Currently:
>
> maxcpus= [SMP] Maximum number of processors that an SMP kernel
> should make use of. maxcpus=n : n >= 0 limits the
> kernel to using 'n' processors. n=0 is a special case,
> it is equivalent to "nosmp", which also disables
> the IO APIC.
>
> How about change to
>
> maxcpus= [SMP] Maximum number of processors that an SMP kernel
> bring up during booting. maxcpus=n : n >= 0 limits the

bring -> brings

> kernel to using 'n' processors. n=0 is a special case,
> it is equivalent to "nosmp", which also disables
> the IO APIC.


Thanks,
-Toshi