2010-01-27 23:35:59

by zhiyi

Subject: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380

Hi,
I have just found the value of nr_cpu_ids is 32, instead of 16 on my
new Dell PowerEdge R905 which has 16 cpus (4 quad-core Opteron 8380).
However, /proc/cpuinfo displays the right number (16).
I have searched the archive but found no related subject. Is there
any patch already available to fix the problem?
I am using 2.6.31-4 and verified this incorrect value (32) using a
module.
Please CC responses to my personal email.
Cheers,
Zhiyi


2010-01-30 17:18:14

by Borislav Petkov

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380

On Thu, Jan 28, 2010 at 12:16:44PM +1300, zhiyi wrote:
> Hi,
> I have just found the value of nr_cpu_ids is 32, instead of 16 on my
> new Dell PowerEdge R905 which has 16 cpus (4 quad-core Opteron
> 8380). However, /proc/cpuinfo displays the right number (16).
> I have searched the archive but found no related subject. Is there
> any patch already available to fix the problem?
> I am using 2.6.31-4 and verified this incorrect value (32) using a
> module.

nr_cpu_ids is set to CONFIG_NR_CPUS on SMP kernels and denotes the max
CPUs supported by your kernel. This is not a bug but a configurable
option in the kernel for saving memory.

You can still set CONFIG_NR_CPUS in "-> Processor type and features" to
the max number of cores N you have on your machine if you want to save
approx 8*N KB. But judging from your quad-socket configuration, memory
shouldn't be scarce enough on that machine to go to the trouble :).

--
Regards/Gruss,
Boris.

2010-01-31 23:43:04

by zhiyi

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380

Hi Boris,
Thanks for your answer.
However, your answer doesn't fully resolve my puzzle.
My module worked for a kernel 2.6.29 with CONFIG_NR_CPUS=64 on my dual
quad-core AMD Opteron. The value of nr_cpu_ids I got from the same
module was correct (8). It seems nr_cpu_ids has nothing to do with
CONFIG_NR_CPUS.
Cheers,
Zhiyi

On 31/01/2010, at 6:18 AM, Borislav Petkov wrote:

> On Thu, Jan 28, 2010 at 12:16:44PM +1300, zhiyi wrote:
>> Hi,
>> I have just found the value of nr_cpu_ids is 32, instead of 16 on my
>> new Dell PowerEdge R905 which has 16 cpus (4 quad-core Opteron
>> 8380). However, /proc/cpuinfo displays the right number (16).
>> I have searched the archive but found no related subject. Is there
>> any patch already available to fix the problem?
>> I am using 2.6.31-4 and verified this incorrect value (32) using a
>> module.
>
> nr_cpu_ids is set to CONFIG_NR_CPUS on SMP kernels and denotes the max
> CPUs supported by your kernel. This is not a bug but a configurable
> option in the kernel for saving memory.
>
> You can still set CONFIG_NR_CPUS in "-> Processor type and features" to
> the max number of cores N you have on your machine if you want to save
> approx 8*N KB. But judging from your quad-socket configuration, memory
> shouldn't be scarce enough on that machine to go to the trouble :).
>
> --
> Regards/Gruss,
> Boris.
>

=======================
Zhiyi Huang
Dept of Computer Science
University of Otago
Email: [email protected]
Phone: 0064-3-4795680
Fax: 0064-3-4798529

2010-02-01 14:33:17

by Borislav Petkov

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380

On Sun, Jan 31, 2010 at 11:59 PM, zhiyi <[email protected]> wrote:
> My module worked for a kernel 2.6.29 with CONFIG_NR_CPUS=64 on my dual
> quad-core AMD Opteron. The value of nr_cpu_ids I got from the same module
> was correct (8).

This could mean that your include/generated/autoconf.h, which is used by
external modules, is not updated and contains CONFIG_NR_CPUS=8. Do

make mrproper

in your kernel directory by moving your .config out of the way first and
then rebuild your kernel and external module(s). nr_cpu_ids should pick
up the updated CONFIG_NR_CPUS value from your .config.

--
Regards/Gruss,
Boris

2010-02-01 15:04:48

by Borislav Petkov

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380

On Mon, Feb 1, 2010 at 2:57 PM, Borislav Petkov <[email protected]> wrote:
> On Sun, Jan 31, 2010 at 11:59 PM, zhiyi <[email protected]> wrote:
>> My module worked for a kernel 2.6.29 with CONFIG_NR_CPUS=64 on my dual
>> quad-core AMD Opteron. The value of nr_cpu_ids I got from the same module
>> was correct (8).
>
> This could mean that your include/generated/autoconf.h, which is used by
> external modules, is not updated and contains CONFIG_NR_CPUS=8. Do
>
> make mrproper
>
> in your kernel directory by moving your .config out of the way first and
> then rebuild your kernel and external module(s). nr_cpu_ids should pick
> up the updated CONFIG_NR_CPUS value from your .config.

That's actually not necessary - nr_cpu_ids is set at compile time to
CONFIG_NR_CPUS but is then capped at boot to a possibly lower value,
depending on the CPU entries in the ACPI MADT (or the MP tables). Can you
please send the .config and the full dmesg of the quad-socket machine?

Thanks.

--
Regards/Gruss,
Boris

2010-02-02 00:35:18

by zhiyi

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380


On 2/02/2010, at 2:57 AM, Borislav Petkov wrote:

> On Sun, Jan 31, 2010 at 11:59 PM, zhiyi <[email protected]> wrote:
>> My module worked for a kernel 2.6.29 with CONFIG_NR_CPUS=64 on my dual
>> quad-core AMD Opteron. The value of nr_cpu_ids I got from the same module
>> was correct (8).
>
> This could mean that your include/generated/autoconf.h, which is used by
> external modules, is not updated and contains CONFIG_NR_CPUS=8.

I have checked include/linux/autoconf.h in my linux src and found
#define CONFIG_NR_CPUS 64

but I don't see any dir like "generated" under include/

By the way, I compiled my module independent of the compilation of
Linux tree, i.e. after the kernel is compiled and booted.

Cheers,
Zhiyi


> Do
>
> make mrproper
>
> in your kernel directory by moving your .config out of the way first and
> then rebuild your kernel and external module(s). nr_cpu_ids should pick
> up the updated CONFIG_NR_CPUS value from your .config.
>
> --
> Regards/Gruss,
> Boris
>

2010-02-02 00:49:11

by zhiyi

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380


On 2/02/2010, at 4:04 AM, Borislav Petkov wrote:

> On Mon, Feb 1, 2010 at 2:57 PM, Borislav Petkov <[email protected]> wrote:
>> On Sun, Jan 31, 2010 at 11:59 PM, zhiyi <[email protected]> wrote:
>>> My module worked for a kernel 2.6.29 with CONFIG_NR_CPUS=64 on my dual
>>> quad-core AMD Opteron. The value of nr_cpu_ids I got from the same module
>>> was correct (8).
>>
>> This could mean that your include/generated/autoconf.h, which is used by
>> external modules, is not updated and contains CONFIG_NR_CPUS=8. Do
>>
>> make mrproper
>>
>> in your kernel directory by moving your .config out of the way first and
>> then rebuild your kernel and external module(s). nr_cpu_ids should pick
>> up the updated CONFIG_NR_CPUS value from your .config.
>
> That's actually not necessary - the nr_cpu_ids thing is set at compile
> time to CONFIG_NR_CPUS but then capped to a possibly lower value upon
> boot depending on the info in the ACPI mptables. Can you please send the
> .config and the full dmesg of the quadsocket machine?

The dmesg relevant to nr_cpu_ids of the quad socket is:

[ 0.000000] NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:32 nr_node_ids:4

For comparison purposes, the dmesg for the dual socket is:

[ 0.000000] NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:8 nr_node_ids:1

It seems nr_node_ids:1 for the dual socket is not quite right though.

Cheers,
Zhiyi




>
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris
>


2010-02-02 07:08:01

by Borislav Petkov

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380

On Tue, Feb 02, 2010 at 01:49:06PM +1300, zhiyi wrote:
> The dmesg relevant to nr_cpu_ids of the quad socket is:
>
> [ 0.000000] NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:32 nr_node_ids:4
>

No, this is not the relevant info. I was actually looking for the MADT ACPI
table and whether there are disabled entries like so:

[ 0.000000] ACPI: LAPIC (acpi-id[0x03] lapic-id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi-id[0x04] lapic-id[0x03] enabled)
[ 0.000000] ACPI: LAPIC (acpi-id[0x05] lapic-id[0x84] disabled)
[ 0.000000] ACPI: LAPIC (acpi-id[0x06] lapic-id[0x85] disabled)

because if there are, nr_cpu_ids will include those when you don't boot
with "possible_cpus=N".

Anyhow, you can read this for more info: http://www.pubbs.net/kernel/200912/64310/

--
Regards/Gruss,
Boris.

2010-02-02 21:33:53

by zhiyi

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380


On 2/02/2010, at 8:07 PM, Borislav Petkov wrote:

> On Tue, Feb 02, 2010 at 01:49:06PM +1300, zhiyi wrote:
>> The dmesg relevant to nr_cpu_ids of the quad socket is:
>>
>> [ 0.000000] NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:32 nr_node_ids:4
>>
>
> No, this is not the relevant info. I was actually looking for the MADT ACPI
> table and whether there are disabled entries like so:
>
> [ 0.000000] ACPI: LAPIC (acpi-id[0x03] lapic-id[0x02] enabled)
> [ 0.000000] ACPI: LAPIC (acpi-id[0x04] lapic-id[0x03] enabled)
> [ 0.000000] ACPI: LAPIC (acpi-id[0x05] lapic-id[0x84] disabled)
> [ 0.000000] ACPI: LAPIC (acpi-id[0x06] lapic-id[0x85] disabled)
>
> because if there are, nr_cpu_ids will include those when you don't boot
> with "possible_cpus=N".

Thanks. I understand the problem now. Below are the related messages from
my quad-socket machine. Not surprisingly, nr_cpu_ids is 32: the MADT
lists 16 enabled and 16 disabled LAPIC entries, so the machine can
potentially accommodate 32 cores.

The good thing is that I realized I should use num_present_cpus()
instead of nr_cpu_ids for my purpose (getting the number of real cores on
the machine).

Many thanks,
Zhiyi


[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x0c] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x08] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x01] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x0d] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x09] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x05] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x02] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x0e] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x06] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x03] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0f] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0b] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x07] enabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x11] lapic_id[0x20] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x12] lapic_id[0x21] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x13] lapic_id[0x22] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x14] lapic_id[0x23] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x15] lapic_id[0x24] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x16] lapic_id[0x25] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x17] lapic_id[0x26] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x18] lapic_id[0x27] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x19] lapic_id[0x28] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x29] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x2a] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x2b] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x2c] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x2d] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x2e] disabled)
[ 0.000000] ACPI: LAPIC (acpi_id[0x20] lapic_id[0x2f] disabled)




>
>
> Anyhow, you can read this for more info: http://www.pubbs.net/kernel/200912/64310/
>
> --
> Regards/Gruss,
> Boris.
>


2010-02-03 07:27:17

by Borislav Petkov

Subject: Re: nr_cpu_ids incorrect on AMD Quad-Core Opteron 8380

On Wed, Feb 03, 2010 at 10:33:46AM +1300, zhiyi wrote:
> The good thing is that I realized I should use num_present_cpus()
> instead of nr_cpu_ids for my purpose (get the number of real cores
> on the machine)

Just a minor thing: there could be a subtlety with hotplug when
using num_present_cpus() since this includes all CPUs, even the
hotplug-offlined ones. Depending on your case, you might want to use
num_online_cpus() instead. Look at the comment at the beginning of
<include/linux/cpumask.h> which explains all the different masks to
figure out which one fits your needs best.

--
Regards/Gruss,
Boris.