2018-02-07 18:45:14

by Simon Gaiser

[permalink] [raw]
Subject: Re: [v6,3/3] x86/smpboot: Fix __max_logical_packages estimate

Prarit Bhargava:
> A system booted with a small number of cores enabled per package
> panics because the estimate of __max_logical_packages is too low.
> This occurs when the total number of active cores across all packages
> is less than the maximum core count for a single package.
>
> ie) On a 4 package system with 20 cores/package where only 4 cores
> are enabled on each package, the value of __max_logical_packages is
> calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
>
> Calculate __max_logical_packages after the cpu enumeration has completed.
> Use the boot cpu's data to extrapolate the number of packages.
>
> Signed-off-by: Prarit Bhargava <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: "H. Peter Anvin" <[email protected]>
> Cc: [email protected]
> Cc: Peter Zijlstra <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Piotr Luc <[email protected]>
> Cc: Kan Liang <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Stephane Eranian <[email protected]>
> Cc: Prarit Bhargava <[email protected]>
> Cc: Arvind Yadav <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Christian Borntraeger <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Tom Lendacky <[email protected]>
> Cc: He Chen <[email protected]>
> Cc: Mathias Krause <[email protected]>
> Cc: Tim Chen <[email protected]>
> Cc: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kernel/smpboot.c | 55 +++++++++--------------------------------------
> 1 file changed, 10 insertions(+), 45 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 838d36ff7ba6..2e3c5a394e79 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -308,12 +308,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
> if (new >= 0)
> goto found;
>
> - if (logical_packages >= __max_logical_packages) {
> - pr_warn("Package %u of CPU %u exceeds BIOS package data %u.\n",
> - logical_packages, cpu, __max_logical_packages);
> - return -ENOSPC;
> - }
> -
> new = logical_packages++;
> if (new != pkg)
> pr_info("CPU %u Converting physical %u to logical package %u\n",
> @@ -323,44 +317,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
> return 0;
> }
>
> -static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
> -{
> - unsigned int ncpus;
> -
> - /*
> - * Today neither Intel nor AMD support heterogenous systems. That
> - * might change in the future....
> - *
> - * While ideally we'd want '* smp_num_siblings' in the below @ncpus
> - * computation, this won't actually work since some Intel BIOSes
> - * report inconsistent HT data when they disable HT.
> - *
> - * In particular, they reduce the APIC-IDs to only include the cores,
> - * but leave the CPUID topology to say there are (2) siblings.
> - * This means we don't know how many threads there will be until
> - * after the APIC enumeration.
> - *
> - * By not including this we'll sometimes over-estimate the number of
> - * logical packages by the amount of !present siblings, but this is
> - * still better than MAX_LOCAL_APIC.
> - *
> - * We use total_cpus not nr_cpu_ids because nr_cpu_ids can be limited
> - * on the command line leading to a similar issue as the HT disable
> - * problem because the hyperthreads are usually enumerated after the
> - * primary cores.
> - */
> - ncpus = boot_cpu_data.x86_max_cores;
> - if (!ncpus) {
> - pr_warn("x86_max_cores == zero !?!?");
> - ncpus = 1;
> - }
> -
> - __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
> - pr_info("Max logical packages: %u\n", __max_logical_packages);
> -
> - topology_update_package_map(c->phys_proc_id, cpu);
> -}
> -
> void __init smp_store_boot_cpu_info(void)
> {
> int id = 0; /* CPU 0 */
> @@ -368,7 +324,7 @@ void __init smp_store_boot_cpu_info(void)
>
> *c = boot_cpu_data;
> c->cpu_index = id;
> - smp_init_package_map(c, id);
> + topology_update_package_map(c->phys_proc_id, id);
> cpu_data(id).set = 1;
> }
>
> @@ -1371,7 +1327,16 @@ void __init native_smp_prepare_boot_cpu(void)
>
> void __init native_smp_cpus_done(unsigned int max_cpus)
> {
> + int ncpus;
> +
> pr_debug("Boot done\n");
> + /*
> + * Today neither Intel nor AMD support heterogenous systems so
> + * extrapolate the boot cpu's data to all packages.
> + */
> + ncpus = cpu_data(0).booted_cores * smp_num_siblings;
> + __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
> + pr_info("Max logical packages: %u\n", __max_logical_packages);
>
> if (x86_has_numa_in_package)
> set_sched_topology(x86_numa_in_package_topology);

This breaks booting as Xen PV domain for me. The problem seems to be
that native_smp_cpus_done() is never called on a PV domain. So
__max_logical_packages is uninitialized and this leads to a NULL
pointer dereference in coretemp.


Attachments:
signature.asc (849.00 B)
OpenPGP digital signature

2018-02-07 19:07:16

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [v6,3/3] x86/smpboot: Fix __max_logical_packages estimate



On 02/07/2018 01:44 PM, Simon Gaiser wrote:
> Prarit Bhargava:
>> A system booted with a small number of cores enabled per package
>> panics because the estimate of __max_logical_packages is too low.
>> This occurs when the total number of active cores across all packages
>> is less than the maximum core count for a single package.
>>
>> ie) On a 4 package system with 20 cores/package where only 4 cores
>> are enabled on each package, the value of __max_logical_packages is
>> calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
>>
>> Calculate __max_logical_packages after the cpu enumeration has completed.
>> Use the boot cpu's data to extrapolate the number of packages.
>>
>> Signed-off-by: Prarit Bhargava <[email protected]>
>> Cc: Thomas Gleixner <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> Cc: "H. Peter Anvin" <[email protected]>
>> Cc: [email protected]
>> Cc: Peter Zijlstra <[email protected]>
>> Cc: Andi Kleen <[email protected]>
>> Cc: Dave Hansen <[email protected]>
>> Cc: Piotr Luc <[email protected]>
>> Cc: Kan Liang <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: Stephane Eranian <[email protected]>
>> Cc: Prarit Bhargava <[email protected]>
>> Cc: Arvind Yadav <[email protected]>
>> Cc: Andy Lutomirski <[email protected]>
>> Cc: Christian Borntraeger <[email protected]>
>> Cc: "Kirill A. Shutemov" <[email protected]>
>> Cc: Tom Lendacky <[email protected]>
>> Cc: He Chen <[email protected]>
>> Cc: Mathias Krause <[email protected]>
>> Cc: Tim Chen <[email protected]>
>> Cc: Vitaly Kuznetsov <[email protected]>
>> ---
>> arch/x86/kernel/smpboot.c | 55 +++++++++--------------------------------------
>> 1 file changed, 10 insertions(+), 45 deletions(-)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 838d36ff7ba6..2e3c5a394e79 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -308,12 +308,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
>> if (new >= 0)
>> goto found;
>>
>> - if (logical_packages >= __max_logical_packages) {
>> - pr_warn("Package %u of CPU %u exceeds BIOS package data %u.\n",
>> - logical_packages, cpu, __max_logical_packages);
>> - return -ENOSPC;
>> - }
>> -
>> new = logical_packages++;
>> if (new != pkg)
>> pr_info("CPU %u Converting physical %u to logical package %u\n",
>> @@ -323,44 +317,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
>> return 0;
>> }
>>
>> -static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
>> -{
>> - unsigned int ncpus;
>> -
>> - /*
>> - * Today neither Intel nor AMD support heterogenous systems. That
>> - * might change in the future....
>> - *
>> - * While ideally we'd want '* smp_num_siblings' in the below @ncpus
>> - * computation, this won't actually work since some Intel BIOSes
>> - * report inconsistent HT data when they disable HT.
>> - *
>> - * In particular, they reduce the APIC-IDs to only include the cores,
>> - * but leave the CPUID topology to say there are (2) siblings.
>> - * This means we don't know how many threads there will be until
>> - * after the APIC enumeration.
>> - *
>> - * By not including this we'll sometimes over-estimate the number of
>> - * logical packages by the amount of !present siblings, but this is
>> - * still better than MAX_LOCAL_APIC.
>> - *
>> - * We use total_cpus not nr_cpu_ids because nr_cpu_ids can be limited
>> - * on the command line leading to a similar issue as the HT disable
>> - * problem because the hyperthreads are usually enumerated after the
>> - * primary cores.
>> - */
>> - ncpus = boot_cpu_data.x86_max_cores;
>> - if (!ncpus) {
>> - pr_warn("x86_max_cores == zero !?!?");
>> - ncpus = 1;
>> - }
>> -
>> - __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>> - pr_info("Max logical packages: %u\n", __max_logical_packages);
>> -
>> - topology_update_package_map(c->phys_proc_id, cpu);
>> -}
>> -
>> void __init smp_store_boot_cpu_info(void)
>> {
>> int id = 0; /* CPU 0 */
>> @@ -368,7 +324,7 @@ void __init smp_store_boot_cpu_info(void)
>>
>> *c = boot_cpu_data;
>> c->cpu_index = id;
>> - smp_init_package_map(c, id);
>> + topology_update_package_map(c->phys_proc_id, id);
>> cpu_data(id).set = 1;
>> }
>>
>> @@ -1371,7 +1327,16 @@ void __init native_smp_prepare_boot_cpu(void)
>>
>> void __init native_smp_cpus_done(unsigned int max_cpus)
>> {
>> + int ncpus;
>> +
>> pr_debug("Boot done\n");
>> + /*
>> + * Today neither Intel nor AMD support heterogenous systems so
>> + * extrapolate the boot cpu's data to all packages.
>> + */
>> + ncpus = cpu_data(0).booted_cores * smp_num_siblings;
>> + __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
>> + pr_info("Max logical packages: %u\n", __max_logical_packages);
>>
>> if (x86_has_numa_in_package)
>> set_sched_topology(x86_numa_in_package_topology);
>
> This breaks booting as Xen PV domain for me. The problem seems to be
> that native_smp_cpus_done() is never called on a PV domain. So
> __max_logical_packages is uninitialized and this leads to a NULL
> pointer dereference in coretemp.
>

I'll see if I can figure out a way to test that. Does 947134d9b00f
("x86/smpboot: Do not use smp_num_siblings in __max_logical_packages
calculation") help?

P.

2018-02-07 19:27:53

by Simon Gaiser

[permalink] [raw]
Subject: Re: [v6,3/3] x86/smpboot: Fix __max_logical_packages estimate

Prarit Bhargava:
> On 02/07/2018 01:44 PM, Simon Gaiser wrote:
>> Prarit Bhargava:
>>> A system booted with a small number of cores enabled per package
>>> panics because the estimate of __max_logical_packages is too low.
>>> This occurs when the total number of active cores across all packages
>>> is less than the maximum core count for a single package.
>>>
>>> ie) On a 4 package system with 20 cores/package where only 4 cores
>>> are enabled on each package, the value of __max_logical_packages is
>>> calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
>>>
>>> Calculate __max_logical_packages after the cpu enumeration has completed.
>>> Use the boot cpu's data to extrapolate the number of packages.
>>>
>>> Signed-off-by: Prarit Bhargava <[email protected]>
>>> Cc: Thomas Gleixner <[email protected]>
>>> Cc: Ingo Molnar <[email protected]>
>>> Cc: "H. Peter Anvin" <[email protected]>
>>> Cc: [email protected]
>>> Cc: Peter Zijlstra <[email protected]>
>>> Cc: Andi Kleen <[email protected]>
>>> Cc: Dave Hansen <[email protected]>
>>> Cc: Piotr Luc <[email protected]>
>>> Cc: Kan Liang <[email protected]>
>>> Cc: Borislav Petkov <[email protected]>
>>> Cc: Stephane Eranian <[email protected]>
>>> Cc: Prarit Bhargava <[email protected]>
>>> Cc: Arvind Yadav <[email protected]>
>>> Cc: Andy Lutomirski <[email protected]>
>>> Cc: Christian Borntraeger <[email protected]>
>>> Cc: "Kirill A. Shutemov" <[email protected]>
>>> Cc: Tom Lendacky <[email protected]>
>>> Cc: He Chen <[email protected]>
>>> Cc: Mathias Krause <[email protected]>
>>> Cc: Tim Chen <[email protected]>
>>> Cc: Vitaly Kuznetsov <[email protected]>
>>> ---
>>> arch/x86/kernel/smpboot.c | 55 +++++++++--------------------------------------
>>> 1 file changed, 10 insertions(+), 45 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>>> index 838d36ff7ba6..2e3c5a394e79 100644
>>> --- a/arch/x86/kernel/smpboot.c
>>> +++ b/arch/x86/kernel/smpboot.c
>>> @@ -308,12 +308,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
>>> if (new >= 0)
>>> goto found;
>>>
>>> - if (logical_packages >= __max_logical_packages) {
>>> - pr_warn("Package %u of CPU %u exceeds BIOS package data %u.\n",
>>> - logical_packages, cpu, __max_logical_packages);
>>> - return -ENOSPC;
>>> - }
>>> -
>>> new = logical_packages++;
>>> if (new != pkg)
>>> pr_info("CPU %u Converting physical %u to logical package %u\n",
>>> @@ -323,44 +317,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
>>> return 0;
>>> }
>>>
>>> -static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
>>> -{
>>> - unsigned int ncpus;
>>> -
>>> - /*
>>> - * Today neither Intel nor AMD support heterogenous systems. That
>>> - * might change in the future....
>>> - *
>>> - * While ideally we'd want '* smp_num_siblings' in the below @ncpus
>>> - * computation, this won't actually work since some Intel BIOSes
>>> - * report inconsistent HT data when they disable HT.
>>> - *
>>> - * In particular, they reduce the APIC-IDs to only include the cores,
>>> - * but leave the CPUID topology to say there are (2) siblings.
>>> - * This means we don't know how many threads there will be until
>>> - * after the APIC enumeration.
>>> - *
>>> - * By not including this we'll sometimes over-estimate the number of
>>> - * logical packages by the amount of !present siblings, but this is
>>> - * still better than MAX_LOCAL_APIC.
>>> - *
>>> - * We use total_cpus not nr_cpu_ids because nr_cpu_ids can be limited
>>> - * on the command line leading to a similar issue as the HT disable
>>> - * problem because the hyperthreads are usually enumerated after the
>>> - * primary cores.
>>> - */
>>> - ncpus = boot_cpu_data.x86_max_cores;
>>> - if (!ncpus) {
>>> - pr_warn("x86_max_cores == zero !?!?");
>>> - ncpus = 1;
>>> - }
>>> -
>>> - __max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>>> - pr_info("Max logical packages: %u\n", __max_logical_packages);
>>> -
>>> - topology_update_package_map(c->phys_proc_id, cpu);
>>> -}
>>> -
>>> void __init smp_store_boot_cpu_info(void)
>>> {
>>> int id = 0; /* CPU 0 */
>>> @@ -368,7 +324,7 @@ void __init smp_store_boot_cpu_info(void)
>>>
>>> *c = boot_cpu_data;
>>> c->cpu_index = id;
>>> - smp_init_package_map(c, id);
>>> + topology_update_package_map(c->phys_proc_id, id);
>>> cpu_data(id).set = 1;
>>> }
>>>
>>> @@ -1371,7 +1327,16 @@ void __init native_smp_prepare_boot_cpu(void)
>>>
>>> void __init native_smp_cpus_done(unsigned int max_cpus)
>>> {
>>> + int ncpus;
>>> +
>>> pr_debug("Boot done\n");
>>> + /*
>>> + * Today neither Intel nor AMD support heterogenous systems so
>>> + * extrapolate the boot cpu's data to all packages.
>>> + */
>>> + ncpus = cpu_data(0).booted_cores * smp_num_siblings;
>>> + __max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
>>> + pr_info("Max logical packages: %u\n", __max_logical_packages);
>>>
>>> if (x86_has_numa_in_package)
>>> set_sched_topology(x86_numa_in_package_topology);
>>
>> This breaks booting as Xen PV domain for me. The problem seems to be
>> that native_smp_cpus_done() is never called on a PV domain. So
>> __max_logical_packages is uninitialized and this leads to a NULL
>> pointer dereference in coretemp.
>>
>
> I'll see if I can figure out a way to test that.

Ok, thanks. If you need some logs, or if I should test something just
ask.

> Does 947134d9b00f
> ("x86/smpboot: Do not use smp_num_siblings in __max_logical_packages
> calculation") help?

No.


Attachments:
signature.asc (849.00 B)
OpenPGP digital signature

2018-02-07 19:32:07

by Prarit Bhargava

[permalink] [raw]
Subject: Re: [v6,3/3] x86/smpboot: Fix __max_logical_packages estimate



On 02/07/2018 02:26 PM, Simon Gaiser wrote:
> Prarit Bhargava:
>> On 02/07/2018 01:44 PM, Simon Gaiser wrote:

<snip>

>>> This breaks booting as Xen PV domain for me. The problem seems to be
>>> that native_smp_cpus_done() is never called on a PV domain. So
>>> __max_logical_packages is uninitialized and this leads to a NULL
>>> pointer dereference in coretemp.
>>>
>>
>> I'll see if I can figure out a way to test that.
>
> Ok, thanks. If you need some logs, or if I should test something just
> ask.
>

Thanks, I may send you a patch off-list to test.

>> Does 947134d9b00f
>> ("x86/smpboot: Do not use smp_num_siblings in __max_logical_packages
>> calculation") help?
>
> No.
>

Yeah, your description was absolutely correct. native_smp_cpus_done() is only
called for HVM guests and not PV. I'm going to think about that and send you a
patch in a bit.

P.