2015-11-06 05:23:59

by Zhenzhong Duan

Subject: Question with maxcpus= parameter.

Hi Maintainers,

Recently we hit a CPU online issue with the maxcpus= parameter.

We want only 4 CPUs online at bootup. Testing 3.8.13-stable on a 72-CPU
machine with maxcpus=4, I found that more than 4 CPUs are online.
The udev script below is what brings them online, but it has existed for
a long time:
ACTION=="add", KERNEL=="cpu[0-9]*", RUN+="/bin/bash -c 'echo 1 > /sys/devices/system/cpu/%k/online'"

The maxcpus= parameter didn't take effect, so is this a kernel bug? Or
should that script be removed?

Btw: 2.6.39 works fine. I checked the udev log, and it seems the CPU ADD
event is only sent for 4 CPUs there.
Why the difference between 2.6.39 and 3.8.13?

thanks
zduan
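
To check whether a system ships a rule like the one quoted above, one option is to scan the udev rules text for entries that match CPU devices and write to the sysfs `online` file. This is a minimal sketch and not part of the original thread; the helper name `find_cpu_online_rules` and the regular expression are assumptions chosen for illustration, and the match is a heuristic rather than a real udev rule parser.

```python
import re

# Heuristic: a rule that matches cpuN devices and writes to their sysfs
# "online" file, like the rule quoted in the message above.
CPU_ONLINE_RULE = re.compile(
    r'KERNEL=="cpu[^"]*".*/sys/devices/system/cpu/[^\s\'"]*/online'
)


def find_cpu_online_rules(rules_text):
    """Return the rule lines in rules_text that appear to online CPUs.

    Hypothetical helper: expects each rule on a single line.
    """
    hits = []
    for line in rules_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if CPU_ONLINE_RULE.search(stripped):
            hits.append(stripped)
    return hits
```

The function takes the contents of a rules file as a string, so it can be run over whichever rules directories the distribution uses (for example files under /etc/udev/rules.d, which is an assumption about the layout).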


2015-11-06 06:19:56

by Zhenzhong Duan

Subject: Re: Question with maxcpus= parameter.

Thanks for your quick response, Raj.

It's OL6 (Oracle Linux 6), which is compatible with RHEL 6.

zduan
On 2015/11/6 14:57, Raj, Ashok wrote:
> Hi Zduan,
>
> Do you know which distribution it is? This isn't a kernel bug.
>
> If you look at dmesg, you should see how many CPUs were booted. But the sysfs
> files are created, and the user-mode script is bringing every CPU online.
>
> Tony mentioned this to me a couple of weeks ago when I was fixing another bug
> in maxcpus. I think you can safely remove the script and it should be fine.
>
> Cheers,
> Ashok

2015-11-06 05:59:48

by Ashok Raj

Subject: Re: Question with maxcpus= parameter.

Hi Zduan,

Do you know which distribution it is? This isn't a kernel bug.

If you look at dmesg, you should see how many CPUs were booted. But the sysfs
files are created, and the user-mode script is bringing every CPU online.

Tony mentioned this to me a couple of weeks ago when I was fixing another bug
in maxcpus. I think you can safely remove the script and it should be fine.

Cheers,
Ashok
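
The dmesg check suggested above can be scripted. The sketch below is not from the thread: it assumes the kernel logs a line of the form "Brought up N CPUs" (older kernels) or "Brought up N node(s), M CPUs" (newer kernels) once SMP boot finishes, and the helper name `booted_cpu_count` is hypothetical. If the kernel in use words this message differently, the pattern needs adjusting.

```python
import re

# Assumed message formats (verify against the kernel in use):
#   "Brought up 4 CPUs"               (older kernels)
#   "smp: Brought up 1 node, 4 CPUs"  (newer kernels)
BROUGHT_UP = re.compile(r"Brought up (?:\d+ nodes?, )?(\d+) CPUs?")


def booted_cpu_count(dmesg_text):
    """Return the CPU count from the first 'Brought up' line, or None."""
    match = BROUGHT_UP.search(dmesg_text)
    return int(match.group(1)) if match else None
```

With maxcpus=4, this count reflects what the kernel brought up at boot, before any user-mode script writes to the `online` files.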



2015-11-09 05:48:02

by Zhenzhong Duan

Subject: Re: Question with maxcpus= parameter.

I tried nr_cpus=4, and it works.

[root@rwssq01 ~]# cat /sys/devices/system/cpu/possible
0-3
[root@rwssq01 ~]# cat /sys/devices/system/cpu/present
0-3
[root@rwssq01 ~]# uname -a
Linux rwssq01.us.oracle.com 3.8.13-44.1.1.el6uek.x86_64 #2 SMP Wed Sep 10 06:10:25 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux

zduan
On 2015/11/7 0:14, Konrad Rzeszutek Wilk wrote:
> On Fri, Nov 06, 2015 at 01:24:16PM +0800, Zhenzhong Duan wrote:
>> Hi Maintainers,
>>
>> Recently we faced an cpu online issue with maxcpus= parameter.
> Did you try 'nr_cpus' ?

2015-11-09 06:12:10

by Yinghai Lu

Subject: Re: Question with maxcpus= parameter.

On Sun, Nov 8, 2015 at 9:47 PM, Zhenzhong Duan wrote:
> Tried nr_cpus=4, works.
>

nr_cpus and maxcpus are different.

maxcpus=4 means the kernel will bring up only 4 CPUs at boot, but the
other CPUs can still be brought online later, if the ACPI MADT reports
that more CPUs are present.

nr_cpus=4 means 4 is a hard limit, just as if you had compiled the kernel
with CONFIG_NR_CPUS=4.
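
The difference shows up in the sysfs CPU masks. With nr_cpus=4, `possible` and `present` both read 0-3, as in the output posted earlier in the thread. With maxcpus=4, one would expect `possible` and `present` to still cover the whole machine while `online` starts out smaller. The sketch below is not from the thread: it parses the sysfs CPU list format and reads the three files so they can be compared, and the helper names `parse_cpu_list` and `read_cpu_masks` are hypothetical.

```python
def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-3" or "0,2-5" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_masks(base="/sys/devices/system/cpu"):
    """Read the possible, present and online CPU lists from sysfs."""
    masks = {}
    for name in ("possible", "present", "online"):
        with open(f"{base}/{name}") as fh:
            masks[name] = parse_cpu_list(fh.read())
    return masks
```

Comparing `len(masks["online"])` with `len(masks["present"])` right after boot shows whether something in user space has already onlined CPUs beyond the maxcpus= value.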

Yinghai

2015-11-09 09:09:21

by Zhenzhong Duan

Subject: Re: Question with maxcpus= parameter.

On 2015/11/9 14:12, Yinghai Lu wrote:
> On Sun, Nov 8, 2015 at 9:47 PM, Zhenzhong Duan wrote:
>> Tried nr_cpus=4, works.
>>
> nr_cpus and maxcpus are different.
>
> maxcpus=4 means the kernel will bring up only 4 CPUs at boot, but the
> other CPUs can still be brought online later, if the ACPI MADT reports
> that more CPUs are present.
>
> nr_cpus=4 means 4 is a hard limit, just as if you had compiled the kernel
> with CONFIG_NR_CPUS=4.
I know that. What confused me is that uek2 (2.6.39-400.249.4.el6uek.x86_64)
works with maxcpus=, but uek3 (3.8.13-44.1.1.el6uek.x86_64) does not unless
I comment out the script.
I had suspected that uek2 sends the CPU ADD event for only 4 CPUs.
I debugged with dyndbg="file kobject_uevent.c +p" and compared the two
dmesg outputs with vimdiff:

uek2                                                     | uek3
......
PCI: Using configuration type 1 for base access          | PCI: Using configuration type 1 for base access
kobject: 'node0' (ffffffff81d6c9d0): kobject_uevent_env  | kobject: 'node0' (ffff883f25984410): kobject_uevent_env
kobject: 'cpu0' (ffff88407ec0b2f8): kobject_uevent_env   | kobject: 'cpu0' (ffff883f7ec0c3d8): kobject_uevent_env
kobject: 'cpu1' (ffff88407ec2b2f8): kobject_uevent_env   | kobject: 'cpu1' (ffff883f7ec2c3d8): kobject_uevent_env
......
kobject: 'cpu70' (ffff88407f4cb2f8): kobject_uevent_env  | kobject: 'cpu70' (ffff883f7f4cc3d8): kobject_uevent_env
kobject: 'cpu71' (ffff88407f4eb2f8): kobject_uevent_env  | kobject: 'cpu71' (ffff883f7f4ec3d8): kobject_uevent_env
......
dracut: dracut-004-356.0.1.el6                           | dracut: dracut-004-356.0.1.el6
dracut: rd_NO_LUKS: removing cryptoluks activation       | dracut: rd_NO_LUKS: removing cryptoluks activation
kobject: 'dm_mod' (ffffffffa000e290): kobject_uevent_env | kobject: 'dm_mod' (ffffffffa000f3b0): kobject_uevent_env
device-mapper: uevent: version 1.0.3                     | device-mapper: uevent: version 1.0.3
udev: starting version 147                               | udev: starting version 147
......
kobject: 'cpu0' (ffff883f0076dc10): kobject_uevent_env   | kobject: 'cpu0' (ffff883f7ec0c3d8): kobject_uevent_env
kobject: 'cpu1' (ffff883f0076d810): kobject_uevent_env   | kobject: 'cpu1' (ffff883f7ec2c3d8): kobject_uevent_env
kobject: 'cpu2' (ffff883f0076d410): kobject_uevent_env   | kobject: 'cpu10' (ffff883f7ed4c3d8): kobject_uevent_env
kobject: 'cpu3' (ffff883f0076d010): kobject_uevent_env   | kobject: 'cpu11' (ffff883f7ed6c3d8): kobject_uevent_env
kobject: 'id' (ffff883f05716c10): kobject_uevent_env     | kobject: 'cpu12' (ffff883f7ed8c3d8): kobject_uevent_env
kobject: 'fbcon' (ffff883f05779c10): kobject_uevent_env  | kobject: 'cpu13' (ffff883f7f42c3d8): kobject_uevent_env
......                                                   | ...... (total 72 cpus)
dracut:                                                  | dracut:
dracut: Switching root                                   | dracut: Switching root
udev: starting version 147                               | udev: starting version 147
kobject: 'cpu0' (ffff883f0076dc10): kobject_uevent_env   | kobject: 'cpu0' (ffff883f7ec0c3d8): kobject_uevent_env
kobject: 'cpu1' (ffff883f0076d810): kobject_uevent_env   | kobject: 'cpu1' (ffff883f7ec2c3d8): kobject_uevent_env
kobject: 'cpu2' (ffff883f0076d410): kobject_uevent_env   | kobject: 'cpu10' (ffff883f7ed4c3d8): kobject_uevent_env
kobject: 'cpu3' (ffff883f0076d010): kobject_uevent_env   | kobject: 'cpu11' (ffff883f7ed6c3d8): kobject_uevent_env
kobject: 'id' (ffff883f05716c10): kobject_uevent_env     | kobject: 'cpu12' (ffff883f7ed8c3d8): kobject_uevent_env
kobject: 'fbcon' (ffff883f05779c10): kobject_uevent_env  | kobject: 'cpu13' (ffff883f7f42c3d8): kobject_uevent_env
......                                                   | ...... (total 72 cpus)

I looked at the code path that sends the event at bootup. It is almost
the same on both kernels, and the dmesg output confirms this.
uek2:
topology_init->arch_register_cpu->register_cpu->sysdev_register->kobject_uevent(&dev->kobj, KOBJ_ADD);
uek3:
topology_init->arch_register_cpu->register_cpu->device_register->kobject_uevent(&dev->kobj, KOBJ_ADD);

But I can't find what sends the events again, with a different CPU
count, after udev starts.

thanks
zduan
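
The comparison above can be automated by counting which CPUs get a uevent in each phase of the boot log. The sketch below is not from the thread: it assumes each dmesg entry is on one line in the format shown in the vimdiff output, treats every "udev: starting version" line as the start of a new phase, and uses the hypothetical helper name `cpu_uevents_per_phase`.

```python
import re

# Line format taken from the dyndbg output quoted in the message above.
KOBJ_CPU = re.compile(r"kobject: '(cpu\d+)' \([0-9a-f]+\): kobject_uevent_env")
UDEV_START = "udev: starting version"


def cpu_uevents_per_phase(dmesg_text):
    """Return one set of CPU names per phase of the log.

    Phase 0 is everything before the first udev start; each later
    "udev: starting version" line opens a new phase.
    """
    phases = [set()]
    for line in dmesg_text.splitlines():
        if UDEV_START in line:
            phases.append(set())
            continue
        match = KOBJ_CPU.search(line)
        if match:
            phases[-1].add(match.group(1))
    return phases
```

Running this over the uek2 and uek3 logs and printing `[len(p) for p in phases]` for each would show at a glance where the per-phase CPU counts diverge between the two kernels.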

2015-11-09 17:04:46

by Yinghai Lu

Subject: Re: Question with maxcpus= parameter.

On Mon, Nov 9, 2015 at 1:09 AM, Zhenzhong Duan wrote:

> I know that. What confused me is that uek2 (2.6.39-400.249.4.el6uek.x86_64)
> works with maxcpus=, but uek3 (3.8.13-44.1.1.el6uek.x86_64) does not unless
> I comment out the script.
> I had suspected that uek2 sends the CPU ADD event for only 4 CPUs.
> I debugged with dyndbg="file kobject_uevent.c +p" and compared the two
> dmesg outputs with vimdiff:

This should be a regression. Can you bisect it?

Thanks

Yinghai