Subject: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

After the hotplug rework Charles Williams reported that his vmware
virtualized system no longer boots and crashes in rapl_cpu_online().
As it turns out topology_max_packages() reports four while
topology_logical_package_id() for CPU two and three returns 65535. That
means cpu_to_rapl_pmu() for those CPUs is accessing not allocated memory
of rapl_pmus->pmus[].
"M. Vefa Bicakci" reported the same problem on XEN.
This patch ensures we error out in such an invalid situation.

Reported-by: "Charles (Chas) Williams" <[email protected]>
Tested-by: "M. Vefa Bicakci" <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
I am not sure if this a race with the new hotplug code or something that was
always there. Both (M. Vefa Bicakc and Charles) say that the box boots
sometimes fine (without the patch). smp_store_boot_cpu_info() should have run
before the notofoert and thus should have set the info properly. However I got
the following bootlog from Charles with this patch:

[ 0.017110] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.017111] smpboot: APIC(1) Converting physical 1 to logical package 1
[ 0.017113] smpboot: Max logical packages: 2

[ 1.995494] RAPL PMU: rapl pmu error: max package: 2 but CPU1 belongs to 65535
[ 1.995647] rapl pmu error: max package: 2 but CPU1 belongs to 65535

So it seems that the information got overwritten. I am not sure how to proceed
here. That memory corruption should be found and fixed and a boot crash might
motivate one to do so… I can't reproduce this on barematal.

Thread starts at
[email protected]

arch/x86/events/intel/rapl.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/events/intel/rapl.c b/arch/x86/events/intel/rapl.c
index 0a535cea8ff3..f5d85f2853d7 100644
--- a/arch/x86/events/intel/rapl.c
+++ b/arch/x86/events/intel/rapl.c
@@ -682,6 +682,15 @@ static int __init init_rapl_pmus(void)
{
int maxpkg = topology_max_packages();
size_t size;
+ unsigned int cpu;
+
+ for_each_possible_cpu(cpu) {
+ if (topology_logical_package_id(cpu) >= maxpkg) {
+ pr_err("rapl pmu error: max package: %u but CPU%d belongs to %u\n",
+ maxpkg, cpu, topology_logical_package_id(cpu));
+ return -EINVAL;
+ }
+ }

size = sizeof(*rapl_pmus) + maxpkg * sizeof(struct rapl_pmu *);
rapl_pmus = kzalloc(size, GFP_KERNEL);
--
2.10.2


2016-11-02 22:48:16

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote:
> I am not sure if this a race with the new hotplug code or something that was
> always there. Both (M. Vefa Bicakc and Charles) say that the box boots
> sometimes fine (without the patch). smp_store_boot_cpu_info() should have run
> before the notofoert and thus should have set the info properly. However I got
> the following bootlog from Charles with this patch:

I don't this this is a race. Here is some debugging from the two CPU VM
(2 sockets, 1 core per socket). In identify_cpu() we have:

/* The boot/hotplug time assigment got cleared, restore it */
c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

The values just after this:

[ 0.228306] identify_cpu: c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2

So what's interesting here, is the phys_proc_id of 2 for CPU1:

int topology_phys_to_logical_pkg(unsigned int phys_pkg)
{
if (phys_pkg >= max_physical_pkg_id)
return -1;
return physical_to_logical_pkg[phys_pkg];
}

And we happen to know the max_physical_pkg_id is 2 in this case.
So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
assigned to the logical_proc_id.

I don't know why the CPU's phys_proc_id is 2.

Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On 2016-11-02 18:47:49 [-0400], Charles (Chas) Williams wrote:
> I don't this this is a race. Here is some debugging from the two CPU VM
> (2 sockets, 1 core per socket). In identify_cpu() we have:
>
> /* The boot/hotplug time assigment got cleared, restore it */
> c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
>
> The values just after this:
>
> [ 0.228306] identify_cpu: c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
>
> So what's interesting here, is the phys_proc_id of 2 for CPU1:
>
> int topology_phys_to_logical_pkg(unsigned int phys_pkg)
> {
> if (phys_pkg >= max_physical_pkg_id)
> return -1;
> return physical_to_logical_pkg[phys_pkg];
> }
>
> And we happen to know the max_physical_pkg_id is 2 in this case.
> So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
> assigned to the logical_proc_id.
>
> I don't know why the CPU's phys_proc_id is 2.

This is the physical ID. You have two logical IDs (on your two sockets
machine). What is max_physical_pkg_id? In order to get that -1 you would
have to max_physical_pkg_id of 1 but code does
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

and I would be a little surprised if this is 1.

Sebastian

2016-11-04 12:20:58

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory



On 11/03/2016 01:47 PM, Sebastian Andrzej Siewior wrote:
> On 2016-11-02 18:47:49 [-0400], Charles (Chas) Williams wrote:
>> I don't this this is a race. Here is some debugging from the two CPU VM
>> (2 sockets, 1 core per socket). In identify_cpu() we have:
>>
>> /* The boot/hotplug time assigment got cleared, restore it */
>> c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
>>
>> The values just after this:
>>
>> [ 0.228306] identify_cpu: c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
>>
>> So what's interesting here, is the phys_proc_id of 2 for CPU1:
>>
>> int topology_phys_to_logical_pkg(unsigned int phys_pkg)
>> {
>> if (phys_pkg >= max_physical_pkg_id)
>> return -1;
>> return physical_to_logical_pkg[phys_pkg];
>> }
>>
>> And we happen to know the max_physical_pkg_id is 2 in this case.
>> So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
>> assigned to the logical_proc_id.
>>
>> I don't know why the CPU's phys_proc_id is 2.
>
> This is the physical ID. You have two logical IDs (on your two sockets
> machine). What is max_physical_pkg_id? In order to get that -1 you would
> have to max_physical_pkg_id of 1 but code does
> max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
>
> and I would be a little surprised if this is 1.
>
> Sebastian

The initial CPU boots and is identified:

[ 0.009018] identify_boot_cpu
[ 0.009174] generic_identify: phys_proc_id is now 0
...
[ 0.009427] identify_cpu: before c ffffffff81ae2680 logical_proc_id 0 c->phys_proc_id 0
[ 0.009506] identify_cpu: after c ffffffff81ae2680 logical_proc_id 65535 c->phys_proc_id 0

So, this is fine because the APIC hasn't been scanned yet. APIC
now gets scanned:

[ 0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
[ 0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040)
[ 0.015797] smpboot: Max logical packages: 2

So, at this point, I think everything is correct. But now the secondary
CPU's "boot":

[ 0.236569] identify_secondary_cpu
[ 0.236620] generic_identify: phys_proc_id is now 2

[ 0.236745] identify_cpu: before c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
[ 0.236747] identify_cpu: after c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2

So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called
my second CPU, 2. This is >= max_physical_pkg_id, so it is going to get
set to -1.

The comment at the end of identfy_cpu() says:

/* The boot/hotplug time assigment got cleared, restore it */

So, logical_proc_id being wrong here before restoration doesn't bother
me since I assume something in booting the secondary CPU's clears any
existing cpu data.

I know detect_extended_topology() is likely being called for both CPU's
and getting the right values (checking this now). I don't know why
generic_identify() is resetting this value.

Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote:
> The initial CPU boots and is identified:
>
> [ 0.009018] identify_boot_cpu
> [ 0.009174] generic_identify: phys_proc_id is now 0
> ...
> [ 0.009427] identify_cpu: before c ffffffff81ae2680 logical_proc_id 0 c->phys_proc_id 0
> [ 0.009506] identify_cpu: after c ffffffff81ae2680 logical_proc_id 65535 c->phys_proc_id 0
>
> So, this is fine because the APIC hasn't been scanned yet. APIC
> now gets scanned:
>
> [ 0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
> [ 0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040)
> [ 0.015797] smpboot: Max logical packages: 2

where is the APICID here is comming from?

> So, at this point, I think everything is correct. But now the secondary
> CPU's "boot":
>
> [ 0.236569] identify_secondary_cpu
> [ 0.236620] generic_identify: phys_proc_id is now 2

so here is where fun starts. Xen has also
arch/x86/xen/smp.c::cpu_bringup() where the phys_proc_id is changed. But
isn't done for vmware but it might a place where they duct tape things.

How is this APIC id different from the earlier? I guess based on your
output that generic_identify() changes the content of phys_proc_id.

> [ 0.236745] identify_cpu: before c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
> [ 0.236747] identify_cpu: after c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
>
> So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called
> my second CPU, 2. This is >= max_physical_pkg_id, so it is going to get
> set to -1.

Now. max_physical_pkg_id is huge. The physical_to_logical_pkg array is
set to -1 on init so slot two has the value -1. That is what you see -
not the -1 because of ">= max_physical_pkg_id".

> The comment at the end of identfy_cpu() says:
>
> /* The boot/hotplug time assigment got cleared, restore it */
>
> So, logical_proc_id being wrong here before restoration doesn't bother
> me since I assume something in booting the secondary CPU's clears any
> existing cpu data.
>
> I know detect_extended_topology() is likely being called for both CPU's
> and getting the right values (checking this now). I don't know why
> generic_identify() is resetting this value.

I don't know either. But it is clearly reading the apic id twice and
second approach is different from the first which leads to different
results. So if you figure out how the first APICID for the second CPU is
retrieved and then you see how it happens for the second time. There
must be a difference.

Sebastian

2016-11-04 20:43:10

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On 11/04/2016 02:03 PM, Sebastian Andrzej Siewior wrote:
> On 2016-11-04 08:20:37 [-0400], Charles (Chas) Williams wrote:
>> The initial CPU boots and is identified:
>>
>> [ 0.009018] identify_boot_cpu
>> [ 0.009174] generic_identify: phys_proc_id is now 0
>> ...
>> [ 0.009427] identify_cpu: before c ffffffff81ae2680 logical_proc_id 0 c->phys_proc_id 0
>> [ 0.009506] identify_cpu: after c ffffffff81ae2680 logical_proc_id 65535 c->phys_proc_id 0
>>
>> So, this is fine because the APIC hasn't been scanned yet. APIC
>> now gets scanned:
>>
>> [ 0.015789] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
>> [ 0.015794] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040)
>> [ 0.015797] smpboot: Max logical packages: 2
>
> where is the APICID here is comming from?

This comes from here:

unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
continue;
if (!topology_update_package_map(apicid, cpu))

And I think this is the part that is "wrong". The apicid appears to
be a logical CPU id. I believe that in most cases this mapping comes
from x86_bios_cpu_apicid (or x86_cpu_to_apicid) which is generated in
generic_processor_info() which maps apicid's to logical cpu indexes.

Note that apic->cpu_present_to_apicid() is using just the cpu_index.

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
continue;
if (!topology_update_package_map(apicid, cpu))
continue;
pr_warn("CPU %u APICId %x disabled\n", cpu, apicid);
per_cpu(x86_bios_cpu_apicid, cpu) = BAD_APICID;
set_cpu_possible(cpu, false);
set_cpu_present(cpu, false);
}

>> So, at this point, I think everything is correct. But now the secondary
>> CPU's "boot":
>>
>> [ 0.236569] identify_secondary_cpu
>> [ 0.236620] generic_identify: phys_proc_id is now 2
>
> so here is where fun starts. Xen has also
> arch/x86/xen/smp.c::cpu_bringup() where the phys_proc_id is changed. But
> isn't done for vmware but it might a place where they duct tape things.
>
> How is this APIC id different from the earlier? I guess based on your
> output that generic_identify() changes the content of phys_proc_id.
>
>> [ 0.236745] identify_cpu: before c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
>> [ 0.236747] identify_cpu: after c ffff88023fd0a040 logical_proc_id 65535 c->phys_proc_id 2
>>
>> So, APIC discovered I have a cpu 0 and 1 but generic_identify() is called
>> my second CPU, 2. This is >= max_physical_pkg_id, so it is going to get
>> set to -1.
>
> Now. max_physical_pkg_id is huge. The physical_to_logical_pkg array is
> set to -1 on init so slot two has the value -1. That is what you see -
> not the -1 because of ">= max_physical_pkg_id".
>
>> The comment at the end of identfy_cpu() says:
>>
>> /* The boot/hotplug time assigment got cleared, restore it */
>>
>> So, logical_proc_id being wrong here before restoration doesn't bother
>> me since I assume something in booting the secondary CPU's clears any
>> existing cpu data.
>>
>> I know detect_extended_topology() is likely being called for both CPU's
>> and getting the right values (checking this now). I don't know why
>> generic_identify() is resetting this value.
>
> I don't know either. But it is clearly reading the apic id twice and
> second approach is different from the first which leads to different
> results. So if you figure out how the first APICID for the second CPU is
> retrieved and then you see how it happens for the second time. There
> must be a difference.

The phys core id from generic_identify() comes from the CPU's EBX register
so we _know_ this is right.

if (c->cpuid_level >= 0x00000001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
#ifdef CONFIG_X86_32
# ifdef CONFIG_SMP
c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
# else
c->apicid = c->initial_apicid;
# endif
#endif
c->phys_proc_id = c->initial_apicid;
}

The intel docs http://x86.renejeschke.de/html/file_module_x86_id_45.html
claims this is the Local APIC ID. So it seems likely this is correct
value. It's not clear it matter if this is the right value or not
though. Even if this is the correct apicid, nothing knows about it.

An argument could be made that instead of checking the cpuid level, we
could just use the apicid based on the cpu index just like the other code.
It would be consistent at least then.

Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On 2016-11-04 16:42:33 [-0400], Charles (Chas) Williams wrote:
> This comes from here:
>
> unsigned int apicid = apic->cpu_present_to_apicid(cpu);
>
> if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
> continue;
> if (!topology_update_package_map(apicid, cpu))

It is late here. Can you check what function apic->cpu_present_to_apicid
is? It should do the right thing for your APIC.

> The phys core id from generic_identify() comes from the CPU's EBX register
> so we _know_ this is right.

So you are sure cpuid_level is > 1 here.

> if (c->cpuid_level >= 0x00000001) {
> c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
> #ifdef CONFIG_X86_32
> # ifdef CONFIG_SMP
> c->apicid = apic->phys_pkg_id(c->initial_apicid, 0);
> # else
> c->apicid = c->initial_apicid;
> # endif
> #endif
> c->phys_proc_id = c->initial_apicid;
> }
>
> The intel docs http://x86.renejeschke.de/html/file_module_x86_id_45.html
> claims this is the Local APIC ID. So it seems likely this is correct
> value. It's not clear it matter if this is the right value or not
> though. Even if this is the correct apicid, nothing knows about it.

Need to check that later (and your whole mail) against an official
manual. But as you see in the code, it checks the apicid against
->phys_pkg_id on 32bit so it seems sometimes there are changes required
to what the CPUID returns.

Sebastian

2016-11-07 16:21:42

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:

> On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote:
> > I am not sure if this a race with the new hotplug code or something that was
> > always there. Both (M. Vefa Bicakc and Charles) say that the box boots
> > sometimes fine (without the patch). smp_store_boot_cpu_info() should have
> > run
> > before the notofoert and thus should have set the info properly. However I
> > got
> > the following bootlog from Charles with this patch:
>
> I don't this this is a race. Here is some debugging from the two CPU VM
> (2 sockets, 1 core per socket). In identify_cpu() we have:
>
> /* The boot/hotplug time assigment got cleared, restore it */
> c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
>
> The values just after this:
>
> [ 0.228306] identify_cpu: c ffff88023fd0a040 logical_proc_id 65535
> c->phys_proc_id 2
>
> So what's interesting here, is the phys_proc_id of 2 for CPU1:
>
> int topology_phys_to_logical_pkg(unsigned int phys_pkg)
> {
> if (phys_pkg >= max_physical_pkg_id)
> return -1;
> return physical_to_logical_pkg[phys_pkg];
> }
>
> And we happen to know the max_physical_pkg_id is 2 in this case.
> So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
> assigned to the logical_proc_id.
>
> I don't know why the CPU's phys_proc_id is 2.

max_physical_pkg_id gets initialized via:

cpus = boot_cpu_data.x86_max_cores;
max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);

What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?

Thanks,

tglx

2016-11-07 17:00:13

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On 11/07/2016 11:19 AM, Thomas Gleixner wrote:
> On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:
>
>> On 11/02/2016 08:25 AM, Sebastian Andrzej Siewior wrote:
>>> I am not sure if this a race with the new hotplug code or something that was
>>> always there. Both (M. Vefa Bicakc and Charles) say that the box boots
>>> sometimes fine (without the patch). smp_store_boot_cpu_info() should have
>>> run
>>> before the notofoert and thus should have set the info properly. However I
>>> got
>>> the following bootlog from Charles with this patch:
>>
>> I don't this this is a race. Here is some debugging from the two CPU VM
>> (2 sockets, 1 core per socket). In identify_cpu() we have:
>>
>> /* The boot/hotplug time assigment got cleared, restore it */
>> c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
>>
>> The values just after this:
>>
>> [ 0.228306] identify_cpu: c ffff88023fd0a040 logical_proc_id 65535
>> c->phys_proc_id 2
>>
>> So what's interesting here, is the phys_proc_id of 2 for CPU1:
>>
>> int topology_phys_to_logical_pkg(unsigned int phys_pkg)
>> {
>> if (phys_pkg >= max_physical_pkg_id)
>> return -1;
>> return physical_to_logical_pkg[phys_pkg];
>> }
>>
>> And we happen to know the max_physical_pkg_id is 2 in this case.
>> So apparently, topology_phys_to_logical_pkg() returns -1 and it gets
>> assigned to the logical_proc_id.
>>
>> I don't know why the CPU's phys_proc_id is 2.
>
> max_physical_pkg_id gets initialized via:
>
> cpus = boot_cpu_data.x86_max_cores;
> max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
>
> What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?

I have discovered that that is not the problem. smp_init_package_map()
is calculating the physical core id using the following:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

...
if (!topology_update_package_map(apicid, cpu))
continue;

...
int topology_update_package_map(unsigned int apicid, unsigned int cpu)
{
unsigned int new, pkg = apicid >> boot_cpu_data.x86_coreid_bits;

But later when the secondary CPU's are identified they use a different
calculation using the local APIC ID from the CPU's registers:

static void generic_identify(struct cpuinfo_x86 *c)
...
if (c->cpuid_level >= 0x00000001) {
c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
...
c->phys_proc_id = c->initial_apicid;

So at the end of identify_cpu() when the boot/hotplug assignment is
put back:

c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);

topology_phys_to_logical_pkg() is returning an invalid logical processor
since one isn't configured.

It's not clear to me what the right thing to do is or which is right.

2016-11-07 20:23:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On Mon, 7 Nov 2016, Charles (Chas) Williams wrote:
> On 11/07/2016 11:19 AM, Thomas Gleixner wrote:
> > On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:
> > > I don't know why the CPU's phys_proc_id is 2.
> >
> > max_physical_pkg_id gets initialized via:
> >
> > cpus = boot_cpu_data.x86_max_cores;
> > max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
> >
> > What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?
>
> I have discovered that that is not the problem. smp_init_package_map()
> is calculating the physical core id using the following:
>
> for_each_present_cpu(cpu) {
> unsigned int apicid = apic->cpu_present_to_apicid(cpu);
>
> ...
> if (!topology_update_package_map(apicid, cpu))
> continue;
>
> ...
> int topology_update_package_map(unsigned int apicid, unsigned int cpu)
> {
> unsigned int new, pkg = apicid >>
> boot_cpu_data.x86_coreid_bits;
>
> But later when the secondary CPU's are identified they use a different
> calculation using the local APIC ID from the CPU's registers:
>
> static void generic_identify(struct cpuinfo_x86 *c)
> ...
> if (c->cpuid_level >= 0x00000001) {
> c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
> ...
> c->phys_proc_id = c->initial_apicid;
>
> So at the end of identify_cpu() when the boot/hotplug assignment is
> put back:
>
> c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
>
> topology_phys_to_logical_pkg() is returning an invalid logical processor
> since one isn't configured.
>
> It's not clear to me what the right thing to do is or which is right.

Nice detective work! So the issue is that the package mapping code honours
boot_cpu_data.x86_coreid_bit, while generic_identify does
not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative
fix below. I still need to gow through that maze and figure out what could
go wrong with that :(

Thanks,

tglx

8<------------------------
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str

static void generic_identify(struct cpuinfo_x86 *c)
{
+ unsigned int pkg;
+
c->extended_cpuid_level = 0;

if (!have_cpuid_p())
@@ -929,7 +931,8 @@ static void generic_identify(struct cpui
c->apicid = c->initial_apicid;
# endif
#endif
- c->phys_proc_id = c->initial_apicid;
+ pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits;
+ c->phys_proc_id = pkg;
}

get_model_name(c); /* Default name */



2016-11-08 14:21:10

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On 11/07/2016 03:20 PM, Thomas Gleixner wrote:
> On Mon, 7 Nov 2016, Charles (Chas) Williams wrote:
>> On 11/07/2016 11:19 AM, Thomas Gleixner wrote:
>>> On Wed, 2 Nov 2016, Charles (Chas) Williams wrote:
>>>> I don't know why the CPU's phys_proc_id is 2.
>>>
>>> max_physical_pkg_id gets initialized via:
>>>
>>> cpus = boot_cpu_data.x86_max_cores;
>>> max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
>>>
>>> What's the value of boot_cpu_data.x86_max_cores and MAX_LOCAL_APIC?
>>
>> I have discovered that that is not the problem. smp_init_package_map()
>> is calculating the physical core id using the following:
>>
>> for_each_present_cpu(cpu) {
>> unsigned int apicid = apic->cpu_present_to_apicid(cpu);
>>
>> ...
>> if (!topology_update_package_map(apicid, cpu))
>> continue;
>>
>> ...
>> int topology_update_package_map(unsigned int apicid, unsigned int cpu)
>> {
>> unsigned int new, pkg = apicid >>
>> boot_cpu_data.x86_coreid_bits;
>>
>> But later when the secondary CPU's are identified they use a different
>> calculation using the local APIC ID from the CPU's registers:
>>
>> static void generic_identify(struct cpuinfo_x86 *c)
>> ...
>> if (c->cpuid_level >= 0x00000001) {
>> c->initial_apicid = (cpuid_ebx(1) >> 24) & 0xFF;
>> ...
>> c->phys_proc_id = c->initial_apicid;
>>
>> So at the end of identify_cpu() when the boot/hotplug assignment is
>> put back:
>>
>> c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
>>
>> topology_phys_to_logical_pkg() is returning an invalid logical processor
>> since one isn't configured.
>>
>> It's not clear to me what the right thing to do is or which is right.
>
> Nice detective work! So the issue is that the package mapping code honours
> boot_cpu_data.x86_coreid_bit, while generic_identify does
> not. boot_cpu_data.x86_coreid_bit is obviously 1 in your case. Tentative
> fix below. I still need to gow through that maze and figure out what could
> go wrong with that :(

I don't think that will fix the issue. My deubugging leads me to believe
that boot_cpu_data.x86_coreid_bits is probably 0:

[ 0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0
[ 0.016398] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
[ 0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1
[ 0.016462] smpboot: APIC(1) Converting physical 1 to logical package 1, cpu 1 (ffff88023fd0a040)

So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
apicid but it certainly doesn't seem to the match the apicid in the
CPU's registers. For whatever reason, my VMware system is reporting
that the second CPU has a local APIC ID of 2:

[ 0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0, apicid 0, initial_apicid 0
...
[ 0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2, apicid 2, initial_apicid 2

I was thinking it might be better to call topology_update_package_map()
at the bottom of identify_cpu() to setup the secondary CPU's. The boot
cpu could be setup during smp_init_package_map().

topology_update_package_map() is also poorly named it can create the
assignment, but can't update it (since it doesn't know the previous
mapping).


> 8<------------------------
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -905,6 +905,8 @@ static void detect_null_seg_behavior(str
>
> static void generic_identify(struct cpuinfo_x86 *c)
> {
> + unsigned int pkg;
> +
> c->extended_cpuid_level = 0;
>
> if (!have_cpuid_p())
> @@ -929,7 +931,8 @@ static void generic_identify(struct cpui
> c->apicid = c->initial_apicid;
> # endif
> #endif
> - c->phys_proc_id = c->initial_apicid;
> + pkg = c->initial_apicid >> boot_cpu_data.x86_coreid_bits;
> + c->phys_proc_id = pkg;
> }
>
> get_model_name(c); /* Default name */
>
>
>

2016-11-08 14:34:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On Tue, 8 Nov 2016, Charles (Chas) Williams wrote:
> [ 0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0
> [ 0.016398] smpboot: APIC(0) Converting physical 0 to logical
> package 0, cpu 0 (ffff88023fc0a040)
> [ 0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1
> [ 0.016462] smpboot: APIC(1) Converting physical 1 to logical
> package 1, cpu 1 (ffff88023fd0a040)
>
> So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
> apicid but it certainly doesn't seem to the match the apicid in the
> CPU's registers. For whatever reason, my VMware system is reporting
> that the second CPU has a local APIC ID of 2:

The initial information comes from MP tables or ACPI.

> [ 0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0,
> apicid 0, initial_apicid 0
> ...
> [ 0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2,
> apicid 2, initial_apicid 2

And the CPUID emulation tells something different. Sigh!

> I was thinking it might be better to call topology_update_package_map()
> at the bottom of identify_cpu() to setup the secondary CPU's. The boot
> cpu could be setup during smp_init_package_map().

Perhaps, but that does not make the inconsistencies go away....

Thanks,

tglx

2016-11-08 14:57:34

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory



On 11/08/2016 09:31 AM, Thomas Gleixner wrote:
> On Tue, 8 Nov 2016, Charles (Chas) Williams wrote:
>> [ 0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0
>> [ 0.016398] smpboot: APIC(0) Converting physical 0 to logical
>> package 0, cpu 0 (ffff88023fc0a040)
>> [ 0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1
>> [ 0.016462] smpboot: APIC(1) Converting physical 1 to logical
>> package 1, cpu 1 (ffff88023fd0a040)
>>
>> So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
>> apicid but it certainly doesn't seem to the match the apicid in the
>> CPU's registers. For whatever reason, my VMware system is reporting
>> that the second CPU has a local APIC ID of 2:
>
> The initial information comes from MP tables or ACPI.
>
>> [ 0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0,
>> apicid 0, initial_apicid 0
>> ...
>> [ 0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2,
>> apicid 2, initial_apicid 2
>
> And the CPUID emulation tells something different. Sigh!
>
>> I was thinking it might be better to call topology_update_package_map()
>> at the bottom of identify_cpu() to setup the secondary CPU's. The boot
>> cpu could be setup during smp_init_package_map().
>
> Perhaps, but that does not make the inconsistencies go away....

By the time I know it's not consistent, there isn't anything I can do
about it. I can't update the table to remove the bad information.

The other alternative, is to trust the ACPI and just update the
cpu_data's apicid in identify_cpu() to the value from the table.
The earlier kernels didn't seem to rely as much on this information.
But it does appear to be "wrong" in the APIC table. From acpidump:

[02Ch 0044 1] Subtable Type : 00 [Processor Local APIC]
[02Dh 0045 1] Length : 08
[02Eh 0046 1] Processor ID : 00
[02Fh 0047 1] Local Apic ID : 00
[030h 0048 4] Flags (decoded below) : 00000001
Processor Enabled : 1

[034h 0052 1] Subtable Type : 00 [Processor Local APIC]
[035h 0053 1] Length : 08
[036h 0054 1] Processor ID : 01
[037h 0055 1] Local Apic ID : 01
[038h 0056 4] Flags (decoded below) : 00000001
Processor Enabled : 1



2016-11-08 16:24:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] perf/x86/intel/rapl: avoid access unallocate memory

On Tue, 8 Nov 2016, Charles (Chas) Williams wrote:
> On 11/08/2016 09:31 AM, Thomas Gleixner wrote:
> > On Tue, 8 Nov 2016, Charles (Chas) Williams wrote:
> > > [ 0.016335] topology_update_package_map: apicid 0 pkg 0 cpu 0
> > > [ 0.016398] smpboot: APIC(0) Converting physical 0 to logical
> > > package 0, cpu 0 (ffff88023fc0a040)
> > > [ 0.016399] topology_update_package_map: apicid 1 pkg 1 cpu 1
> > > [ 0.016462] smpboot: APIC(1) Converting physical 1 to logical
> > > package 1, cpu 1 (ffff88023fd0a040)
> > >
> > > So, I don't know where apic->cpu_present_to_apicid(cpu) is getting its
> > > apicid but it certainly doesn't seem to the match the apicid in the
> > > CPU's registers. For whatever reason, my VMware system is reporting
> > > that the second CPU has a local APIC ID of 2:
> >
> > The initial information comes from MP tables or ACPI.
> >
> > > [ 0.009115] identify_cpu: cpu_index 0 phys_proc_id is now 0,
> > > apicid 0, initial_apicid 0
> > > ...
> > > [ 0.237401] identify_cpu: cpu_index 1 phys_proc_id is now 2,
> > > apicid 2, initial_apicid 2
> >
> > And the CPUID emulation tells something different. Sigh!
> >
> > > I was thinking it might be better to call topology_update_package_map()
> > > at the bottom of identify_cpu() to setup the secondary CPU's. The boot
> > > cpu could be setup during smp_init_package_map().
> >
> > Perhaps, but that does not make the inconsistencies go away....
>
> By the time I know it's not consistent, there isn't anything I can do
> about it. I can't update the table to remove the bad information.
>
> The other alternative, is to trust the ACPI and just update the
> cpu_data's apicid in identify_cpu() to the value from the table.
> The earlier kernels didn't seem to rely as much on this information.
> But it does appear to be "wrong" in the APIC table. From acpidump:
>
> [02Ch 0044 1] Subtable Type : 00 [Processor Local APIC]
> [02Dh 0045 1] Length : 08
> [02Eh 0046 1] Processor ID : 00
> [02Fh 0047 1] Local Apic ID : 00
> [030h 0048 4] Flags (decoded below) : 00000001
> Processor Enabled : 1
>
> [034h 0052 1] Subtable Type : 00 [Processor Local APIC]
> [035h 0053 1] Length : 08
> [036h 0054 1] Processor ID : 01
> [037h 0055 1] Local Apic ID : 01
> [038h 0056 4] Flags (decoded below) : 00000001
> Processor Enabled : 1

Well, which one is wrong is hard to tell :)

I'll have a look how to sort that out.

Thanks,

tglx

2016-11-09 15:38:30

by Thomas Gleixner

[permalink] [raw]
Subject: [PATCH] x86/cpuid: Deal with broken firmware once more

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map.

There exist Virtualbox and Xen implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. If the physical
package IDs differ use the package information from the ACPI/MP tables so
the existing logical package map just works.

Reported-by: "Charles (Chas) Williams" <[email protected]>,
Reported-by: M. Vefa Bicakci <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/cpu/common.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,6 +979,34 @@ static void x86_init_cache_qos(struct cp
}

/*
+ * The physical to logical package id mapping is initialized from the
+ * acpi/mptables information. Make sure that CPUID actually agrees with
+ * that.
+ */
+static void sanitize_package_id(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+ unsigned int pkg, apicid, cpu = smp_processor_id();
+
+ apicid = apic->cpu_present_to_apicid(cpu);
+ pkg = apicid >> boot_cpu_data.x86_coreid_bits;
+
+ if (apicid != c->initial_apicid) {
+ pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
+ cpu, apicid, c->initial_apicid);
+ }
+ if (pkg != c->phys_proc_id) {
+ pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
+ cpu, pkg, c->phys_proc_id);
+ c->phys_proc_id = pkg;
+ }
+ c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
+#else
+ c->locical_proc_id = 0;
+#endif
+}
+
+/*
* This does the hard work of actually picking apart the CPU stuff...
*/
static void identify_cpu(struct cpuinfo_x86 *c)
@@ -1103,8 +1131,7 @@ static void identify_cpu(struct cpuinfo_
#ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
#endif
- /* The boot/hotplug time assigment got cleared, restore it */
- c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
+ sanitize_package_id(c);
}

/*

2016-11-09 15:40:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Wed, 9 Nov 2016, Thomas Gleixner wrote:
> + if (pkg != c->phys_proc_id) {
> + pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
> + cpu, pkg, c->phys_proc_id);
> + c->phys_proc_id = pkg;
> + }
> + c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
> +#else
> + c->locical_proc_id = 0;

That want's to be logical_proc_id of course ...

Stupid me.

2016-11-09 16:03:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote:
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
>
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map.
>
> There exist Virtualbox and Xen implementations which violate the spec. As a

ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy
virt stuff.

> /*
> + * The physical to logical package id mapping is initialized from the
> + * acpi/mptables information. Make sure that CPUID actually agrees with
> + * that.
> + */
> +static void sanitize_package_id(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> + unsigned int pkg, apicid, cpu = smp_processor_id();
> +
> + apicid = apic->cpu_present_to_apicid(cpu);
> + pkg = apicid >> boot_cpu_data.x86_coreid_bits;
> +
> + if (apicid != c->initial_apicid) {
> + pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
> + cpu, apicid, c->initial_apicid);

Should we not also 'fix' c->initial_apicid ?

> + }
> + if (pkg != c->phys_proc_id) {
> + pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
> + cpu, pkg, c->phys_proc_id);
> + c->phys_proc_id = pkg;
> + }
> + c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
> +#else
> + c->locical_proc_id = 0;

UP FTW ;-)

> +#endif
> +}

2016-11-09 16:36:12

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more



On 11/09/2016 11:03 AM, Peter Zijlstra wrote:
> On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote:
>> Both ACPI and MP specifications require that the APIC id in the respective
>> tables must be the same as the APIC id in CPUID.
>>
>> The kernel retrieves the physical package id from the APIC id during the
>> ACPI/MP table scan and builds the physical to logical package map.
>>
>> There exist Virtualbox and Xen implementations which violate the spec. As a
>
> ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy
> virt stuff.

Yes, this was VMware in particular. It would be good to get this comment
right so as not to mislead anyone.

>> /*
>> + * The physical to logical package id mapping is initialized from the
>> + * acpi/mptables information. Make sure that CPUID actually agrees with
>> + * that.
>> + */
>> +static void sanitize_package_id(struct cpuinfo_x86 *c)
>> +{
>> +#ifdef CONFIG_SMP
>> + unsigned int pkg, apicid, cpu = smp_processor_id();
>> +
>> + apicid = apic->cpu_present_to_apicid(cpu);
>> + pkg = apicid >> boot_cpu_data.x86_coreid_bits;
>> +
>> + if (apicid != c->initial_apicid) {
>> + pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
>> + cpu, apicid, c->initial_apicid);
>
> Should we not also 'fix' c->initial_apicid ?

Since we have c->apicid and c->initial_apicid it seems reasonable to keep one
set to the "correct" value. I don't think c->initial_apicid is used past
this.

I should have some tests on this patch later today.

2016-11-09 18:16:38

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/09/2016 10:35 AM, Thomas Gleixner wrote:
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
>
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map.
>
> There exist Virtualbox and Xen implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.
>
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. If the physical
> package IDs differ use the package information from the ACPI/MP tables so
> the existing logical package map just works.
>
> Reported-by: "Charles (Chas) Williams" <[email protected]>,
> Reported-by: M. Vefa Bicakci <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>


For 4 virtual sockets, 1 core per socket VM:

[ 0.235459] .... node #0, CPUs: #1
[ 0.238579] Disabled fast string operations
[ 0.238620] mce: CPU supports 0 MCE banks
[ 0.238864] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[ 0.238878] [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
[ 0.239502] #2
[ 0.241298] Disabled fast string operations
[ 0.241356] mce: CPU supports 0 MCE banks
[ 0.241429] [Firmware Bug]: CPU2: APIC id mismatch. Firmware: 2 CPUID: 4
[ 0.241431] [Firmware Bug]: CPU2: Using firmware package id 2 instead of 4
[ 0.241631] #3
[ 0.244075] Disabled fast string operations
[ 0.244112] mce: CPU supports 0 MCE banks
[ 0.244284] [Firmware Bug]: CPU3: APIC id mismatch. Firmware: 3 CPUID: 6
[ 0.244293] [Firmware Bug]: CPU3: Using firmware package id 3 instead of 6

For a 2 virtual sockets, 2 cores per socket, VMware seems to get its
APIC table correct as this fixup code isn't triggered. The mapping looks like:

[ 0.028911] topology_update_package_map: apicid 0 pkg 0 cpu 0
[ 0.029068] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0
[ 0.029072] topology_update_package_map: apicid 1 pkg 0 cpu 1
[ 0.029220] topology_update_package_map: apicid 2 pkg 1 cpu 2
[ 0.029376] smpboot: APIC(2) Converting physical 1 to logical package 1, cpu 2
[ 0.029381] topology_update_package_map: apicid 3 pkg 1 cpu 3
[ 0.029525] smpboot: Max logical packages: 2

For a VM with 1 virtual socket and 4 cores, the behavior is again correct.

[ 0.016198] topology_update_package_map: apicid 0 pkg 0 cpu 0
[ 0.016271] smpboot: APIC(0) Converting physical 0 to logical package 0, cpu 0 (ffff88023fc0a040)
[ 0.016273] topology_update_package_map: apicid 1 pkg 0 cpu 1
[ 0.016336] topology_update_package_map: apicid 2 pkg 0 cpu 2
[ 0.016397] topology_update_package_map: apicid 3 pkg 0 cpu 3

It looks like VMware might have some assumption about the minimum number
of cores on a virtual socket. Regardless, the fix solves my problem!

Thanks!

2016-11-09 18:41:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Wed, 9 Nov 2016, Charles (Chas) Williams wrote:
> On 11/09/2016 11:03 AM, Peter Zijlstra wrote:
> > On Wed, Nov 09, 2016 at 04:35:51PM +0100, Thomas Gleixner wrote:
> > > Both ACPI and MP specifications require that the APIC id in the respective
> > > tables must be the same as the APIC id in CPUID.
> > >
> > > The kernel retrieves the physical package id from the APIC id during the
> > > ACPI/MP table scan and builds the physical to logical package map.
> > >
> > > There exist Virtualbox and Xen implementations which violate the spec. As
> > > a
> >
> > ISTR it was VMware, not VirtualBox, but whatever.. they're both crazy
> > virt stuff.
>
> Yes, this was VMware in particular. It would be good to get this comment
> right so as not to mislead anyone.

Sure, will fix.

>
> > > /*
> > > + * The physical to logical package id mapping is initialized from the
> > > + * acpi/mptables information. Make sure that CPUID actually agrees with
> > > + * that.
> > > + */
> > > +static void sanitize_package_id(struct cpuinfo_x86 *c)
> > > +{
> > > +#ifdef CONFIG_SMP
> > > + unsigned int pkg, apicid, cpu = smp_processor_id();
> > > +
> > > + apicid = apic->cpu_present_to_apicid(cpu);
> > > + pkg = apicid >> boot_cpu_data.x86_coreid_bits;
> > > +
> > > + if (apicid != c->initial_apicid) {
> > > + pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID:
> > > %x\n",
> > > + cpu, apicid, c->initial_apicid);
> >
> > Should we not also 'fix' c->initial_apicid ?
>
> Since we have c->apicid and c->initial_apicid it seems reasonable to keep one
> set to the "correct" value. I don't think c->initial_apicid is used past
> this.

It is, but just for a printk in the MCE code, so it should not matter at all.

> I should have some tests on this patch later today.
>

Subject: [tip:x86/urgent] x86/cpu: Deal with broken firmware (VMWare/XEN)

Commit-ID: d49597fd3bc7d9534de55e9256767f073be1b33a
Gitweb: http://git.kernel.org/tip/d49597fd3bc7d9534de55e9256767f073be1b33a
Author: Thomas Gleixner <[email protected]>
AuthorDate: Wed, 9 Nov 2016 16:35:51 +0100
Committer: Thomas Gleixner <[email protected]>
CommitDate: Wed, 9 Nov 2016 21:05:01 +0100

x86/cpu: Deal with broken firmware (VMWare/XEN)

Both ACPI and MP specifications require that the APIC id in the respective
tables must be the same as the APIC id in CPUID.

The kernel retrieves the physical package id from the APIC id during the
ACPI/MP table scan and builds the physical to logical package map. The
physical package id which is used after a CPU comes up is retrieved from
CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.

There exist VMware and XEN implementations which violate the spec. As a
result the physical to logical package map, which relies on the ACPI/MP
tables does not work on those systems, because the CPUID initialized
physical package id does not match the firmware id. This causes system
crashes and malfunction due to invalid package mappings.

The only way to cure this is to sanitize the physical package id after the
CPUID enumeration and yell when the APIC ids are different. Fix up the
initial APIC id, which is fine as it is only used printout purposes.

If the physical package IDs differ yell and use the package information
from the ACPI/MP tables so the existing logical package map just works.

Chas provided the resulting dmesg output for his affected 4 virtual
sockets, 1 core per socket VM:

[Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
[Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
....

Reported-and-tested-by: "Charles (Chas) Williams" <[email protected]>,
Reported-by: M. Vefa Bicakci <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Alok Kataria <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: #4.6+ <stable@vger,kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1611091613540.3501@nanos
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/cpu/common.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9bd910a..cc9e980 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -979,6 +979,35 @@ static void x86_init_cache_qos(struct cpuinfo_x86 *c)
}

/*
+ * The physical to logical package id mapping is initialized from the
+ * acpi/mptables information. Make sure that CPUID actually agrees with
+ * that.
+ */
+static void sanitize_package_id(struct cpuinfo_x86 *c)
+{
+#ifdef CONFIG_SMP
+ unsigned int pkg, apicid, cpu = smp_processor_id();
+
+ apicid = apic->cpu_present_to_apicid(cpu);
+ pkg = apicid >> boot_cpu_data.x86_coreid_bits;
+
+ if (apicid != c->initial_apicid) {
+ pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
+ cpu, apicid, c->initial_apicid);
+ c->initial_apicid = apicid;
+ }
+ if (pkg != c->phys_proc_id) {
+ pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
+ cpu, pkg, c->phys_proc_id);
+ c->phys_proc_id = pkg;
+ }
+ c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
+#else
+ c->logical_proc_id = 0;
+#endif
+}
+
+/*
* This does the hard work of actually picking apart the CPU stuff...
*/
static void identify_cpu(struct cpuinfo_x86 *c)
@@ -1103,8 +1132,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
#ifdef CONFIG_NUMA
numa_add_cpu(smp_processor_id());
#endif
- /* The boot/hotplug time assigment got cleared, restore it */
- c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
+ sanitize_package_id(c);
}

/*

2016-11-10 03:57:45

by M. Vefa Bicakci

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/09/2016 06:35 PM, Thomas Gleixner wrote:
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
>
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map.
>
> There exist Virtualbox and Xen implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.
>
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. If the physical
> package IDs differ use the package information from the ACPI/MP tables so
> the existing logical package map just works.
>
> Reported-by: "Charles (Chas) Williams" <[email protected]>,
> Reported-by: M. Vefa Bicakci <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>

Hello Thomas and Sebastian,

Sorry for the delay in reporting what I have found out and for the delay in testing this patch, which has been due to my health.

I have found that your patch unfortunately does not improve the situation for me. Here is an excerpt obtained from the dmesg of a kernel compiled with this patch *as well as* Sebastian's patch:

=== 8< ===
[ 0.002561] CPU: Physical Processor ID: 0

[ 0.002566] CPU: Processor Core ID: 0

[ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 2

[ 0.002577] [Firmware Bug]: CPU0: Using firmware package id 4095 instead of 0

[ 0.002586] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8

[ 0.002591] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0

[ 0.033319] ftrace: allocating 28753 entries in 113 pages

[ 0.040121] cpu 0 spinlock event irq 1

[ 0.040145] smpboot: Max logical packages: 1

[ 0.040155] VPMU disabled by hypervisor.

[ 0.040181] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.

[ 0.047050] NMI watchdog: disabled (cpu0): hardware events not enabled

[ 0.047065] NMI watchdog: Shutting down hard lockup detector on all cpus

[ 0.052015] installing Xen timer for CPU 1

[ 0.052074] SMP alternatives: switching to SMP code

[ 0.002000] Disabled fast string operations

[ 0.002000] [Firmware Bug]: CPU1: APIC id mismatch. Firmware: ffff CPUID: 2

[ 0.002000] [Firmware Bug]: CPU1: Using firmware package id 4095 instead of 0

[ 0.002000] smpboot: APIC(ffff) Converting physical 4095 to logical package 0

[ 0.078061] cpu 1 spinlock event irq 13

...
[ 0.216404] Freeing initrd memory: 4340K (ffff880001fa7000 - ffff8800023e4000)

[ 0.216487] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535

[ 0.217572] futex hash table entries: 512 (order: 3, 32768 bytes)

[ 0.218293] Initialise system trusted keyrings

...
[ 0.216404] Freeing initrd memory: 4340K (ffff880001fa7000 - ffff8800023e4000)

[ 0.216487] RAPL PMU: rapl pmu error: max package: 1 but CPU0 belongs to 65535

[ 0.217572] futex hash table entries: 512 (order: 3, 32768 bytes)

...
[ 2.974474] intel_rapl: Found RAPL domain package

[ 2.974489] intel_rapl: Found RAPL domain core

[ 2.974498] intel_rapl: Found RAPL domain uncore

[ 2.974518] intel_rapl: RAPL package 4095 domain package locked by BIOS

=== >8 ===

As you can see, your patch unfortunately does not correct the issue with the virtual CPU package identifiers in Xen-based virtual machines using para-virtualization.

In summary, the root cause of the issue for me appears to be that the boot-up code in the init_apic_mappings function switches the APIC 'ops' structure from Xen's 'ops' structure to the no-op ops structure. Due to this, the smp_init_package_map uses the no-op APIC ops structure's cpu_present_to_to_apicid function, even though it should use the corresponding method from Xen's APIC ops structure (i.e., xen_cpu_present_to_apicid).

Here is a dmesg excerpt with a kernel patched with Sebastian's patch (and some debugging code), exhibiting the issue I have just explained: (Note the 'switched to apic NOOP' line.)

=== 8< ===
(early) [ 0.000000] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
(early) [ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
(early) [ 0.000000] No local APIC present
(early) [ 0.000000] APIC: disable apic facility
(early) [ 0.000000] APIC: switched to apic NOOP
(early) [ 0.000000] e820: [mem 0xfa000000-0xffffffff] available for PCI devices
(early) [ 0.000000] Booting paravirtualized kernel on Xen
...
[ 0.034082] mvb: kernel_init_freeable:1007: About to call smp_prepare_cpus...
[ 0.034123] cpu 0 spinlock event irq 1
[ 0.034138] smpboot: mvb: smp_init_package_map:372: max_physical_pkg_id, after set: 4096
[ 0.034146] mvb: __default_cpu_present_to_apicid:612: Returning 65535! mps_cpu: 0, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
[ 0.034155] smpboot: mvb: smp_init_package_map:379: apicid: 65535, apic_id_valid(apicid):0
[ 0.034162] smpboot: Max logical packages: 1
[ 0.034169] VPMU disabled by hypervisor.
[ 0.034187] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
[ 0.041131] NMI watchdog: disabled (cpu0): hardware events not enabled
[ 0.041142] NMI watchdog: Shutting down hard lockup detector on all cpus
[ 0.046024] installing Xen timer for CPU 1
[ 0.046121] SMP alternatives: switching to SMP code
[ 0.002000] Disabled fast string operations
[ 0.002000] mvb: detect_ht: CPU has hyper-threading capability
[ 0.002000] mvb: CPU: Physical Processor ID: 0
[ 0.002000] mvb: CPU: Processor Core ID: 0
[ 0.002000] mvb: identify_cpu:1112: c: ffff880013b0a040, c->logical_proc_id: 65535
[ 0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535! mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
[ 0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 4095
[ 0.002000] smpboot: APIC(ffff) Converting physical 4095 to logical package 0
[ 0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, cpu_data(cpu).logical_proc_id: 0
...
[ 0.266540] RAPL PMU: mvb: init_rapl_pmus:686: rapl pmu: max package: 1
[ 0.266547] RAPL PMU: mvb: rapl pmu: CPU0 (ffff880013a0a040) belongs to package ID 65535.
[ 0.266559] RAPL PMU: mvb: rapl pmu: Package ID >= max package for CPU0 (ffff880013a0a040). This is an error.
[ 0.266569] RAPL PMU: mvb: rapl pmu: CPU1 (ffff880013b0a040) belongs to package ID 0.
=== >8 ===

Through some debugging, last week I came up with the patch at the end of this e-mail, which does correct the issue for me. Once again, I am sorry for reporting this too late.

With a kernel built with the patch below, Sebastian's patch and some debugging code, I obtain the following dmesg output, which appears to be correct and does not emit RAPL-related errors during boot-up:

=== 8< ===
[ 0.032083] mvb: kernel_init_freeable:1007: About to call smp_prepare_cpus...

[ 0.032106] cpu 0 spinlock event irq 1

[ 0.032119] smpboot: mvb: smp_init_package_map:372: max_physical_pkg_id, after set: 4096

[ 0.032127] mvb: xen_cpu_present_to_apicid:151: cpu: 0, ret: 0

[ 0.032132] smpboot: mvb: topology_update_package_map:270: cpu: 0, pkg: 0

[ 0.032137] smpboot: APIC(0) Converting physical 0 to logical package 0

[ 0.032142] smpboot: mvb: topology_update_package_map:305: cpu: 0, cpu_data(cpu).logical_proc_id: 0

[ 0.032149] smpboot: mvb: smp_init_package_map:385: topology_update_package_map successful.

[ 0.032155] smpboot: Max logical packages: 1

[ 0.032161] VPMU disabled by hypervisor.

[ 0.032177] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.

[ 0.039061] NMI watchdog: disabled (cpu0): hardware events not enabled

[ 0.039102] NMI watchdog: Shutting down hard lockup detector on all cpus

[ 0.045048] installing Xen timer for CPU 1

[ 0.045145] SMP alternatives: switching to SMP code

[ 0.002000] Disabled fast string operations

[ 0.002000] mvb: detect_ht: CPU has hyper-threading capability

[ 0.002000] mvb: CPU: Physical Processor ID: 0

[ 0.002000] mvb: CPU: Processor Core ID: 1

[ 0.002000] mvb: identify_cpu:1112: c: ffff88001790a040, c->logical_proc_id: 0

[ 0.002000] mvb: xen_cpu_present_to_apicid:151: cpu: 1, ret: 0

[ 0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 0

[ 0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, cpu_data(cpu).logical_proc_id: 0

[ 0.067149] cpu 1 spinlock event irq 13

...
[ 0.208395] RAPL PMU: mvb: init_rapl_pmus:686: rapl pmu: max package: 1

[ 0.208400] RAPL PMU: mvb: rapl pmu: CPU0 (ffff88001780a040) belongs to package ID 0.

[ 0.208409] RAPL PMU: mvb: rapl pmu: CPU1 (ffff88001790a040) belongs to package ID 0.

[ 0.208491] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 163840 ms ovfl timer

[ 0.208507] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules

[ 0.208512] RAPL PMU: hw unit of domain package 2^-16 Joules

[ 0.208517] RAPL PMU: hw unit of domain pp1-gpu 2^-16 Joules

[ 0.209083] futex hash table entries: 512 (order: 3, 32768 bytes)

=== >8 ===

My patch may not be a very elegant correction for the issue at hand, unfortunately (and it probably does not comply with the kernel patch submission guidelines).

In addition, I think Sebastian's patch should be included in the mainline kernel, so that potential bugs do not bring down the kernel with a RAPL-related oops at boot-up time with virtualization.

Please note that I will need to be offline for most of the day today (my current timezone is UTC+03:00), and as a result my responses to your replies will most likely be a bit late.

Thank you,

Vefa


>From b7097820285ab6a8588879969d74c56d890d4fd4 Mon Sep 17 00:00:00 2001
From: "M. Vefa Bicakci" <[email protected]>
Date: Fri, 4 Nov 2016 10:01:19 +0300
Subject: [PATCH] Do not reset apic to apic_noop with Xen

---
arch/x86/include/asm/apic.h | 3 +++
arch/x86/kernel/apic/apic.c | 7 +++++--
arch/x86/xen/apic.c | 4 +++-
3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index b3d4c042e610..8c37580b7eb7 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -359,6 +359,9 @@ struct apic {
*/
extern struct apic *apic;

+/* Indicates whether a hypervisor has already set 'apic'. */
+extern int apic_set_by_hypervisor;
+
/*
* APIC drivers are probed based on how they are listed in the .apicdrivers
* section. So the order is important and enforced by the ordering
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 076c315cdf18..a3a1d4570acf 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -172,6 +172,8 @@ unsigned long mp_lapic_addr;
int disable_apic;
/* Disable local APIC timer from the kernel commandline or via dmi quirk */
static int disable_apic_timer __initdata;
+/* Indicates whether a hypervisor has already set 'apic'. */
+int apic_set_by_hypervisor;
/* Local APIC timer works in C2 */
int local_apic_timer_c2_ok;
EXPORT_SYMBOL_GPL(local_apic_timer_c2_ok);
@@ -1788,8 +1790,9 @@ void __init init_apic_mappings(void)
return;
}

- /* If no local APIC can be found return early */
- if (!smp_found_config && detect_init_APIC()) {
+ if (apic_set_by_hypervisor) {
+ /* Hypervisor has already taken care of the APIC. */
+ } else if (!smp_found_config && detect_init_APIC()) {
/* lets NOP'ify apic operations */
pr_info("APIC: disable apic facility\n");
apic_disable();
diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index 5ac792a1d419..b79a003c13a5 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -229,8 +229,10 @@ void __init xen_init_apic(void)
x86_io_apic_ops.read = xen_io_apic_read;
/* On PV guests the APIC CPUID bit is disabled so none of the
* routines end up executing. */
- if (!xen_initial_domain())
+ if (!xen_initial_domain()) {
apic = &xen_pv_apic;
+ apic_set_by_hypervisor = 1;
+ }

x86_platform.apic_post_init = xen_apic_check;
}
--
2.5.5

2016-11-10 10:51:26

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more



On 11/09/2016 10:57 PM, M. Vefa Bicakci wrote:
> [ 0.002000] mvb: CPU: Physical Processor ID: 0
> [ 0.002000] mvb: CPU: Processor Core ID: 0
> [ 0.002000] mvb: identify_cpu:1112: c: ffff880013b0a040, c->logical_proc_id: 65535
> [ 0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535! mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
> [ 0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg: 4095
> [ 0.002000] smpboot: APIC(ffff) Converting physical 4095 to logical package 0
> [ 0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1, cpu_data(cpu).logical_proc_id: 0

This seems strange. 0xffff is BAD_APICID. Why didn't this fail here:

for_each_present_cpu(cpu) {
unsigned int apicid = apic->cpu_present_to_apicid(cpu);

if (apicid == BAD_APICID || !apic->apic_id_valid(apicid)) <<<<<<<<<<
continue;
if (!topology_update_package_map(apicid, cpu))
continue;

topology_update_package_map() should never have been called?

2016-11-10 11:15:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:

> I have found that your patch unfortunately does not improve the situation
> for me. Here is an excerpt obtained from the dmesg of a kernel compiled
> with this patch *as well as* Sebastian's patch:

> [ 0.002561] CPU: Physical Processor ID: 0
> [ 0.002566] CPU: Processor Core ID: 0
> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 2

So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which
translates to a bogus package id. And looking at the XEN code:

xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,

and xen_cpu_present_to_apicid does:

static int xen_cpu_present_to_apicid(int cpu)
{
if (cpu_present(cpu))
return xen_get_apic_id(xen_apic_read(APIC_ID));
else
return BAD_APICID;
}

So independent of which present CPU we query we get just some random
information, in the above case we get BAD_APICID from xen_apic_read() not
from the else path as this CPU _IS_ present.

What's so wrong with storing the fricking firmware supplied APICid as
everybody else does and report it back when queried?

This damned attitude of we just hack the code into submission and let
everybody else deal with the outcoming is utterly annoying.

Thanks,

tglx

2016-11-10 11:16:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Thu, 10 Nov 2016, Charles (Chas) Williams wrote:
> On 11/09/2016 10:57 PM, M. Vefa Bicakci wrote:
> > [ 0.002000] mvb: CPU: Physical Processor ID: 0
> > [ 0.002000] mvb: CPU: Processor Core ID: 0
> > [ 0.002000] mvb: identify_cpu:1112: c: ffff880013b0a040,
> > c->logical_proc_id: 65535
> > [ 0.002000] mvb: __default_cpu_present_to_apicid:612: Returning 65535!
> > mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
> > [ 0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1, pkg:
> > 4095
> > [ 0.002000] smpboot: APIC(ffff) Converting physical 4095 to logical
> > package 0
> > [ 0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1,
> > cpu_data(cpu).logical_proc_id: 0
>
> This seems strange. 0xffff is BAD_APICID. Why didn't this fail here:
>
> for_each_present_cpu(cpu) {
> unsigned int apicid = apic->cpu_present_to_apicid(cpu);
>
> if (apicid == BAD_APICID || !apic->apic_id_valid(apicid))
> <<<<<<<<<<
> continue;
> if (!topology_update_package_map(apicid, cpu))
> continue;
>
> topology_update_package_map() should never have been called?

That's right. Looking into it ....


2016-11-10 11:39:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Thu, Nov 10, 2016 at 12:13:04PM +0100, Thomas Gleixner wrote:
> What's so wrong with storing the fricking firmware supplied APICid as
> everybody else does and report it back when queried?
>
> This damned attitude of we just hack the code into submission and let
> everybody else deal with the outcoming is utterly annoying.

At some point I'd recommend just marking things BROKEN and getting on
with life. There's only so much crap you can deal with.

What Xen does here is completely insane.

2016-11-10 14:04:44

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
> On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
>
>> I have found that your patch unfortunately does not improve the situation
>> for me. Here is an excerpt obtained from the dmesg of a kernel compiled
>> with this patch *as well as* Sebastian's patch:
>> [ 0.002561] CPU: Physical Processor ID: 0
>> [ 0.002566] CPU: Processor Core ID: 0
>> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 2
> So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which
> translates to a bogus package id. And looking at the XEN code:
>
> xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
>
> and xen_cpu_present_to_apicid does:
>
> static int xen_cpu_present_to_apicid(int cpu)
> {
> if (cpu_present(cpu))
> return xen_get_apic_id(xen_apic_read(APIC_ID));
> else
> return BAD_APICID;
> }
>
> So independent of which present CPU we query we get just some random
> information, in the above case we get BAD_APICID from xen_apic_read() not
> from the else path as this CPU _IS_ present.
>
> What's so wrong with storing the fricking firmware supplied APICid as
> everybody else does and report it back when queried?

By firmware you mean ACPI? It is most likely not available to PV guests.
How about returning cpu_data(cpu).initial_apicid?

And what was the original problem?

-boris

>
> This damned attitude of we just hack the code into submission and let
> everybody else deal with the outcoming is utterly annoying.
>
> Thanks,
>
> tglx


2016-11-10 15:07:32

by Charles (Chas) Williams

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more



On 11/10/2016 09:02 AM, Boris Ostrovsky wrote:
> On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
>> On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
>>
>>> I have found that your patch unfortunately does not improve the situation
>>> for me. Here is an excerpt obtained from the dmesg of a kernel compiled
>>> with this patch *as well as* Sebastian's patch:
>>> [ 0.002561] CPU: Physical Processor ID: 0
>>> [ 0.002566] CPU: Processor Core ID: 0
>>> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 2
>> So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which
>> translates to a bogus package id. And looking at the XEN code:
>>
>> xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
>>
>> and xen_cpu_present_to_apicid does:
>>
>> static int xen_cpu_present_to_apicid(int cpu)
>> {
>> if (cpu_present(cpu))
>> return xen_get_apic_id(xen_apic_read(APIC_ID));
>> else
>> return BAD_APICID;
>> }
>>
>> So independent of which present CPU we query we get just some random
>> information, in the above case we get BAD_APICID from xen_apic_read() not
>> from the else path as this CPU _IS_ present.
>>
>> What's so wrong with storing the fricking firmware supplied APICid as
>> everybody else does and report it back when queried?
>
> By firmware you mean ACPI? It is most likely not available to PV guests.
> How about returning cpu_data(cpu).initial_apicid?
>
> And what was the original problem?

The original issue I found was that VMware was returning a different set
of APIC id's in the ACPI tables than what it advertised on the CPU's.

http://www.mail-archive.com/[email protected]/msg1266716.html

>>
>> This damned attitude of we just hack the code into submission and let
>> everybody else deal with the outcoming is utterly annoying.
>>
>> Thanks,
>>
>> tglx
>
>

2016-11-10 15:15:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
> On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
> > On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
> >
> >> I have found that your patch unfortunately does not improve the situation
> >> for me. Here is an excerpt obtained from the dmesg of a kernel compiled
> >> with this patch *as well as* Sebastian's patch:
> >> [ 0.002561] CPU: Physical Processor ID: 0
> >> [ 0.002566] CPU: Processor Core ID: 0
> >> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 2
> > So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which
> > translates to a bogus package id. And looking at the XEN code:
> >
> > xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
> >
> > and xen_cpu_present_to_apicid does:
> >
> > static int xen_cpu_present_to_apicid(int cpu)
> > {
> > if (cpu_present(cpu))
> > return xen_get_apic_id(xen_apic_read(APIC_ID));
> > else
> > return BAD_APICID;
> > }
> >
> > So independent of which present CPU we query we get just some random
> > information, in the above case we get BAD_APICID from xen_apic_read() not
> > from the else path as this CPU _IS_ present.
> >
> > What's so wrong with storing the fricking firmware supplied APICid as
> > everybody else does and report it back when queried?
>
> By firmware you mean ACPI? It is most likely not available to PV guests.

You either have to provide ACPI or MP tables. And either of those has to
provide the intial APIC ids for the CPUs. They are supposed to match the
IDs which are in the CPUID leafs.

> How about returning cpu_data(cpu).initial_apicid?

For a not yet brought up CPU. That's what the initial ACPI/MP table
enumeration is for.

Thanks,

tglx

2016-11-10 15:29:56

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/10/2016 10:05 AM, Charles (Chas) Williams wrote:
>
>
> On 11/10/2016 09:02 AM, Boris Ostrovsky wrote:
>> On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
>>> On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
>>>
>>>> I have found that your patch unfortunately does not improve the
>>>> situation
>>>> for me. Here is an excerpt obtained from the dmesg of a kernel
>>>> compiled
>>>> with this patch *as well as* Sebastian's patch:
>>>> [ 0.002561] CPU: Physical Processor ID: 0
>>>> [ 0.002566] CPU: Processor Core ID: 0
>>>> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware:
>>>> ffff CPUID: 2
>>> So apic->cpu_present_to_apicid() gives us a completely bogus APIC id
>>> which
>>> translates to a bogus package id. And looking at the XEN code:
>>>
>>> xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
>>>
>>> and xen_cpu_present_to_apicid does:
>>>
>>> static int xen_cpu_present_to_apicid(int cpu)
>>> {
>>> if (cpu_present(cpu))
>>> return xen_get_apic_id(xen_apic_read(APIC_ID));
>>> else
>>> return BAD_APICID;
>>> }
>>>
>>> So independent of which present CPU we query we get just some random
>>> information, in the above case we get BAD_APICID from
>>> xen_apic_read() not
>>> from the else path as this CPU _IS_ present.
>>>
>>> What's so wrong with storing the fricking firmware supplied APICid as
>>> everybody else does and report it back when queried?
>>
>> By firmware you mean ACPI? It is most likely not available to PV guests.
>> How about returning cpu_data(cpu).initial_apicid?
>>
>> And what was the original problem?
>
> The original issue I found was that VMware was returning a different set
> of APIC id's in the ACPI tables than what it advertised on the CPU's.
>
> http://www.mail-archive.com/[email protected]/msg1266716.html

For Xen, we recently added a6a198bc60e6 ("xen/x86: Update topology map
for PV VCPUs") to at least temporarily work around some topology map
problems that PV guests have with RAPL (which I think is what Vefa's
problem was).

-boris


2016-11-10 15:37:40

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/10/2016 10:12 AM, Thomas Gleixner wrote:
> On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
>> On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
>>> On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
>>>
>>>> I have found that your patch unfortunately does not improve the situation
>>>> for me. Here is an excerpt obtained from the dmesg of a kernel compiled
>>>> with this patch *as well as* Sebastian's patch:
>>>> [ 0.002561] CPU: Physical Processor ID: 0
>>>> [ 0.002566] CPU: Processor Core ID: 0
>>>> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: ffff CPUID: 2
>>> So apic->cpu_present_to_apicid() gives us a completely bogus APIC id which
>>> translates to a bogus package id. And looking at the XEN code:
>>>
>>> xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
>>>
>>> and xen_cpu_present_to_apicid does:
>>>
>>> static int xen_cpu_present_to_apicid(int cpu)
>>> {
>>> if (cpu_present(cpu))
>>> return xen_get_apic_id(xen_apic_read(APIC_ID));
>>> else
>>> return BAD_APICID;
>>> }
>>>
>>> So independent of which present CPU we query we get just some random
>>> information, in the above case we get BAD_APICID from xen_apic_read() not
>>> from the else path as this CPU _IS_ present.
>>>
>>> What's so wrong with storing the fricking firmware supplied APICid as
>>> everybody else does and report it back when queried?
>> By firmware you mean ACPI? It is most likely not available to PV guests.
> You either have to provide ACPI or MP tables. And either of those has to
> provide the intial APIC ids for the CPUs. They are supposed to match the
> IDs which are in the CPUID leafs.
>
>> How about returning cpu_data(cpu).initial_apicid?
> For a not yet brought up CPU. That's what the initial ACPI/MP table
> enumeration is for.

Unfortunately PV guests have neither. So we may need to emulate
something in xen_cpu_present_to_apicid().

-boris

(+xen-devel)

Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 2016-11-10 10:31:20 [-0500], Boris Ostrovsky wrote:
> For Xen, we recently added a6a198bc60e6 ("xen/x86: Update topology map
> for PV VCPUs") to at least temporarily work around some topology map
> problems that PV guests have with RAPL (which I think is what Vefa's
> problem was).

I wouldn't blame it on RAPL. It is RAPL that exposes the wrong APICID
<-> CPU <-> Package mapping.

> -boris

Sebastian

2016-11-10 17:16:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
> On 11/10/2016 10:12 AM, Thomas Gleixner wrote:
> > On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
> >> By firmware you mean ACPI? It is most likely not available to PV guests.
> > You either have to provide ACPI or MP tables. And either of those has to
> > provide the intial APIC ids for the CPUs. They are supposed to match the
> > IDs which are in the CPUID leafs.
> >
> >> How about returning cpu_data(cpu).initial_apicid?
> > For a not yet brought up CPU. That's what the initial ACPI/MP table
> > enumeration is for.
>
> Unfortunately PV guests have neither. So we may need to emulate
> something in xen_cpu_present_to_apicid().

SFI does the same thing and according to the dmesg which was posted, this
is using SFI. We also have devicetree based boot concept which provides the
APICids in the CPU enumeration at boot time in a way which the whole x86
machinery is expecting.

So what kind of APICid is XEN handing in via SFI? None, or just an invalid
one?

Thanks,

tglx

2016-11-10 17:18:06

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
> On 11/10/2016 10:05 AM, Charles (Chas) Williams wrote:
> For Xen, we recently added a6a198bc60e6 ("xen/x86: Update topology map
> for PV VCPUs") to at least temporarily work around some topology map
> problems that PV guests have with RAPL (which I think is what Vefa's
> problem was).

RAPL is just the messenger and it could be any other facility relying on
the package map information. The underlying root cause is something
else. See the other mail.

Thanks,

tglx

2016-11-10 18:01:40

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/10/2016 12:13 PM, Thomas Gleixner wrote:
> On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
>> On 11/10/2016 10:12 AM, Thomas Gleixner wrote:
>>> On Thu, 10 Nov 2016, Boris Ostrovsky wrote:
>>>> By firmware you mean ACPI? It is most likely not available to PV guests.
>>> You either have to provide ACPI or MP tables. And either of those has to
>>> provide the intial APIC ids for the CPUs. They are supposed to match the
>>> IDs which are in the CPUID leafs.
>>>
>>>> How about returning cpu_data(cpu).initial_apicid?
>>> For a not yet brought up CPU. That's what the initial ACPI/MP table
>>> enumeration is for.
>> Unfortunately PV guests have neither. So we may need to emulate
>> something in xen_cpu_present_to_apicid().
> SFI does the same thing and according to the dmesg which was posted, this
> is using SFI. We also have devicetree based boot concept which provides the
> APICids in the CPU enumeration at boot time in a way which the whole x86
> machinery is expecting.
>
> So what kind of APICid is XEN handing in via SFI? None, or just an invalid
> one?

None, SFI is not used at all. Or device tree for that matter.

The (PV) guest currently assumes APIC ID (i.e. APIC[0x20]) to always be
zero (we just fixed a bug where CPUID may return an
inconsistent/non-zero initial APIC ID value, but that's a different
issue I think). There is no topology, really, everything is flat.

So this obviously is rather, ahem, fragile and we've been fixing bugs
there, especially with introduction of package map (a6a198bc60e6 that I
mentioned in the other message is one such example). There is work
scheduled on the hypervisor side to properly report initial APIC IDs in
CPUID and that may help with correctly dealing with this on Linux side.

(Longer term we would like to move to PVH where we will have "true" APIC
emulation, with ACPI, just like in HVM guests. You can do your part by
helping: http://marc.info/?l=linux-kernel&m=147791731414152&w=3 ;-))


-boris




2016-11-11 05:50:08

by Alok Kataria

[permalink] [raw]
Subject: Re: [tip:x86/urgent] x86/cpu: Deal with broken firmware (VMWare/XEN)

Hi Thomas,

On Wed, 2016-11-09 at 12:27 -0800, tip-bot for Thomas Gleixner wrote:
> Commit-ID: d49597fd3bc7d9534de55e9256767f073be1b33a
> Gitweb: https://urldefense.proofpoint.com/v2/url?u=http-3A__git.kernel.org_tip_d49597fd3bc7d9534de55e9256767f073be1b33a&d=CwIDaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=2AkLWShm6V8Nuu8ZZ-80Flo6y0XxCGmO1xrsAeRArAE&m=WBsB4JFr-Dct0um4Kf8QAxC7w6p-Mlk3H-LwItQJ7Fw&s=qI64vSH3y6q8wJhcqpI4dXYma-i1RTtlxgKwKwhFWWo&e=
> Author: Thomas Gleixner <[email protected]>
> AuthorDate: Wed, 9 Nov 2016 16:35:51 +0100
> Committer: Thomas Gleixner <[email protected]>
> CommitDate: Wed, 9 Nov 2016 21:05:01 +0100
>
> x86/cpu: Deal with broken firmware (VMWare/XEN)
>
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
>
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map. The
> physical package id which is used after a CPU comes up is retrieved from
> CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.
>
> There exist VMware and XEN implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.

For documentation purpose let me note that, VMware VMs running at
virtual hardware version 9 and above don't have this ACPI/MP and CPUID
divergence on the package id. So not everyone will see this issue on
their VMs, this bug is limited to folks running at virtual hardware
version 8 and prior.

It's good that we can workaround the platform bug for those VMs, thanks
for adding these checks.

Alok

>
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. Fix up the
> initial APIC id, which is fine as it is only used printout purposes.
>
> If the physical package IDs differ yell and use the package information
> from the ACPI/MP tables so the existing logical package map just works.
>
> Chas provided the resulting dmesg output for his affected 4 virtual
> sockets, 1 core per socket VM:
>
> [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
> [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
> ....
>
> Reported-and-tested-by: "Charles (Chas) Williams" <[email protected]>,
> Reported-by: M. Vefa Bicakci <[email protected]>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Alok Kataria <[email protected]>
> Cc: Boris Ostrovsky <[email protected]>
> Cc: #4.6+ <stable@vger,kernel.org>
> Link: https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_alpine.DEB.2.20.1611091613540.3501-40nanos&d=CwIDaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=2AkLWShm6V8Nuu8ZZ-80Flo6y0XxCGmO1xrsAeRArAE&m=WBsB4JFr-Dct0um4Kf8QAxC7w6p-Mlk3H-LwItQJ7Fw&s=HNQMGUrw_s6Mc_oyREBnD4TrUjERbLcH1viAZr-aFPY&e=
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> arch/x86/kernel/cpu/common.c | 32 ++++++++++++++++++++++++++++++--
> 1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9bd910a..cc9e980 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -979,6 +979,35 @@ static void x86_init_cache_qos(struct cpuinfo_x86 *c)
> }
>
> /*
> + * The physical to logical package id mapping is initialized from the
> + * acpi/mptables information. Make sure that CPUID actually agrees with
> + * that.
> + */
> +static void sanitize_package_id(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> + unsigned int pkg, apicid, cpu = smp_processor_id();
> +
> + apicid = apic->cpu_present_to_apicid(cpu);
> + pkg = apicid >> boot_cpu_data.x86_coreid_bits;
> +
> + if (apicid != c->initial_apicid) {
> + pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
> + cpu, apicid, c->initial_apicid);
> + c->initial_apicid = apicid;
> + }
> + if (pkg != c->phys_proc_id) {
> + pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
> + cpu, pkg, c->phys_proc_id);
> + c->phys_proc_id = pkg;
> + }
> + c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
> +#else
> + c->logical_proc_id = 0;
> +#endif
> +}
> +
> +/*
> * This does the hard work of actually picking apart the CPU stuff...
> */
> static void identify_cpu(struct cpuinfo_x86 *c)
> @@ -1103,8 +1132,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
> #ifdef CONFIG_NUMA
> numa_add_cpu(smp_processor_id());
> #endif
> - /* The boot/hotplug time assigment got cleared, restore it */
> - c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
> + sanitize_package_id(c);
> }
>
> /*


2016-11-12 22:06:06

by M. Vefa Bicakci

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/10/2016 01:50 PM, Charles (Chas) Williams wrote:
>
>
> On 11/09/2016 10:57 PM, M. Vefa Bicakci wrote:
>> [ 0.002000] mvb: CPU: Physical Processor ID: 0
>> [ 0.002000] mvb: CPU: Processor Core ID: 0
>> [ 0.002000] mvb: identify_cpu:1112: c: ffff880013b0a040,
>> c->logical_proc_id: 65535
>> [ 0.002000] mvb: __default_cpu_present_to_apicid:612: Returning
>> 65535! mps_cpu: 1, nr_cpu_ids: 2, cpu_present(mps_cpu): 1
>> [ 0.002000] smpboot: mvb: topology_update_package_map:270: cpu: 1,
>> pkg: 4095
>> [ 0.002000] smpboot: APIC(ffff) Converting physical 4095 to logical
>> package 0
>> [ 0.002000] smpboot: mvb: topology_update_package_map:305: cpu: 1,
>> cpu_data(cpu).logical_proc_id: 0
>
> This seems strange. 0xffff is BAD_APICID. Why didn't this fail here:
>
> for_each_present_cpu(cpu) {
> unsigned int apicid = apic->cpu_present_to_apicid(cpu);
>
> if (apicid == BAD_APICID ||
> !apic->apic_id_valid(apicid)) <<<<<<<<<<
> continue;
> if (!topology_update_package_map(apicid, cpu))
> continue;
>
> topology_update_package_map() should never have been called?

(Sorry for the delay!)

It appears that this is a different call path than the one in
smp_init_package_map function. I *think* the following call path is the
one shown in the dmesg, but I am not sure:

cpu_bringup_and_idle
cpu_bringup (arch/x86/xen/smp.c)
smp_store_cpu_info (this call path branch is included for context)
identify_secondary_cpu
identify_cpu
detect_ht
topology_update_package_map

Sorry about the potentially misleading dmesg excerpt I posted.

Vefa

2016-11-12 22:06:19

by M. Vefa Bicakci

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/10/2016 06:31 PM, Boris Ostrovsky wrote:
> On 11/10/2016 10:05 AM, Charles (Chas) Williams wrote:
>>
>>
>> On 11/10/2016 09:02 AM, Boris Ostrovsky wrote:
>>> On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
>>>> On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
>>>>
>>>>> I have found that your patch unfortunately does not improve the
>>>>> situation
>>>>> for me. Here is an excerpt obtained from the dmesg of a kernel
>>>>> compiled
>>>>> with this patch *as well as* Sebastian's patch:
>>>>> [ 0.002561] CPU: Physical Processor ID: 0
>>>>> [ 0.002566] CPU: Processor Core ID: 0
>>>>> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware:
>>>>> ffff CPUID: 2
>>>> So apic->cpu_present_to_apicid() gives us a completely bogus APIC id
>>>> which
>>>> translates to a bogus package id. And looking at the XEN code:
>>>>
>>>> xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
>>>>
>>>> and xen_cpu_present_to_apicid does:
>>>>
>>>> static int xen_cpu_present_to_apicid(int cpu)
>>>> {
>>>> if (cpu_present(cpu))
>>>> return xen_get_apic_id(xen_apic_read(APIC_ID));
>>>> else
>>>> return BAD_APICID;
>>>> }
>>>>
>>>> So independent of which present CPU we query we get just some random
>>>> information, in the above case we get BAD_APICID from
>>>> xen_apic_read() not
>>>> from the else path as this CPU _IS_ present.
>>>>
>>>> What's so wrong with storing the fricking firmware supplied APICid as
>>>> everybody else does and report it back when queried?
>>>
>>> By firmware you mean ACPI? It is most likely not available to PV guests.
>>> How about returning cpu_data(cpu).initial_apicid?
>>>
>>> And what was the original problem?
>>
>> The original issue I found was that VMware was returning a different set
>> of APIC id's in the ACPI tables than what it advertised on the CPU's.
>>
>> http://www.mail-archive.com/[email protected]/msg1266716.html
>
> For Xen, we recently added a6a198bc60e6 ("xen/x86: Update topology map
> for PV VCPUs") to at least temporarily work around some topology map
> problems that PV guests have with RAPL (which I think is what Vefa's
> problem was).

Hello Boris,

(Sorry for the delay!)

It appears that the problem is a bit different compared to the one
corrected by a6a198bc60e6, because my kernel tree -- based on 4.8.6 --
already includes the -stable backport of that commit, i.e.
88540ad0820ddfb05626e0136c0e5a79cea85fd1

The patch I included in my previous e-mail (dated 2016-11-10) corrects
root cause of the issue I am having with 4.8.6. Sebastian's original
patch adding error checking to the RAPL module prevents the RAPL module
from causing a kernel oops without my patch.

The issue I am experiencing is caused by the boot-up code in the
'init_apic_mappings' function switching the APIC ops structure from
Xen's structure to a no-op structure by calling the 'apic_disable'
function. Please let me know if I can clarify or elaborate.

For the record, using 4.8.7 without my correction patch patch does not
rectify the issue at hand. 4.8.7 changes the call site of the
'init_apic_mapping' function, so I had thought that it could be helpful.

Thank you,

Vefa

2016-11-13 18:03:30

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/12/2016 05:05 PM, M. Vefa Bicakci wrote:
> On 11/10/2016 06:31 PM, Boris Ostrovsky wrote:
>> On 11/10/2016 10:05 AM, Charles (Chas) Williams wrote:
>>>
>>> On 11/10/2016 09:02 AM, Boris Ostrovsky wrote:
>>>> On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
>>>>> On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
>>>>>
>>>>>> I have found that your patch unfortunately does not improve the
>>>>>> situation
>>>>>> for me. Here is an excerpt obtained from the dmesg of a kernel
>>>>>> compiled
>>>>>> with this patch *as well as* Sebastian's patch:
>>>>>> [ 0.002561] CPU: Physical Processor ID: 0
>>>>>> [ 0.002566] CPU: Processor Core ID: 0
>>>>>> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware:
>>>>>> ffff CPUID: 2
>>>>> So apic->cpu_present_to_apicid() gives us a completely bogus APIC id
>>>>> which
>>>>> translates to a bogus package id. And looking at the XEN code:
>>>>>
>>>>> xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
>>>>>
>>>>> and xen_cpu_present_to_apicid does:
>>>>>
>>>>> static int xen_cpu_present_to_apicid(int cpu)
>>>>> {
>>>>> if (cpu_present(cpu))
>>>>> return xen_get_apic_id(xen_apic_read(APIC_ID));
>>>>> else
>>>>> return BAD_APICID;
>>>>> }
>>>>>
>>>>> So independent of which present CPU we query we get just some random
>>>>> information, in the above case we get BAD_APICID from
>>>>> xen_apic_read() not
>>>>> from the else path as this CPU _IS_ present.
>>>>>
>>>>> What's so wrong with storing the fricking firmware supplied APICid as
>>>>> everybody else does and report it back when queried?
>>>> By firmware you mean ACPI? It is most likely not available to PV guests.
>>>> How about returning cpu_data(cpu).initial_apicid?
>>>>
>>>> And what was the original problem?
>>> The original issue I found was that VMware was returning a different set
>>> of APIC id's in the ACPI tables than what it advertised on the CPU's.
>>>
>>> http://www.mail-archive.com/[email protected]/msg1266716.html
>> For Xen, we recently added a6a198bc60e6 ("xen/x86: Update topology map
>> for PV VCPUs") to at least temporarily work around some topology map
>> problems that PV guests have with RAPL (which I think is what Vefa's
>> problem was).
> Hello Boris,
>
> (Sorry for the delay!)
>
> It appears that the problem is a bit different compared to the one
> corrected by a6a198bc60e6, because my kernel tree -- based on 4.8.6 --
> already includes the -stable backport of that commit, i.e.
> 88540ad0820ddfb05626e0136c0e5a79cea85fd1
>
> The patch I included in my previous e-mail (dated 2016-11-10) corrects
> root cause of the issue I am having with 4.8.6. Sebastian's original
> patch adding error checking to the RAPL module prevents the RAPL module
> from causing a kernel oops without my patch.

I don't see any messages from you on that date. Can you provide a link
to it (and to Sebastian's patch)?

(BTW, generally it's a good idea to copy xen-devel list on any
Xen-related issues).

>
> The issue I am experiencing is caused by the boot-up code in the
> 'init_apic_mappings' function switching the APIC ops structure from
> Xen's structure to a no-op structure by calling the 'apic_disable'
> function. Please let me know if I can clarify or elaborate.

apic_disable() is only invoked if there is no APIC present (i.e.
detect_init_APIC() returns a non-zero value) and I don't think this can
happen. Is your CPUID[1].edx[9] not set?

-boris

>
> For the record, using 4.8.7 without my correction patch patch does not
> rectify the issue at hand. 4.8.7 changes the call site of the
> 'init_apic_mapping' function, so I had thought that it could be helpful.
>
> Thank you,
>
> Vefa


2016-11-13 23:43:51

by M. Vefa Bicakci

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/13/2016 09:04 PM, Boris Ostrovsky wrote:
> On 11/12/2016 05:05 PM, M. Vefa Bicakci wrote:
>> On 11/10/2016 06:31 PM, Boris Ostrovsky wrote:
>>> On 11/10/2016 10:05 AM, Charles (Chas) Williams wrote:
>>>>
>>>> On 11/10/2016 09:02 AM, Boris Ostrovsky wrote:
>>>>> On 11/10/2016 06:13 AM, Thomas Gleixner wrote:
>>>>>> On Thu, 10 Nov 2016, M. Vefa Bicakci wrote:
>>>>>>
>>>>>>> I have found that your patch unfortunately does not improve the
>>>>>>> situation
>>>>>>> for me. Here is an excerpt obtained from the dmesg of a kernel
>>>>>>> compiled
>>>>>>> with this patch *as well as* Sebastian's patch:
>>>>>>> [ 0.002561] CPU: Physical Processor ID: 0
>>>>>>> [ 0.002566] CPU: Processor Core ID: 0
>>>>>>> [ 0.002572] [Firmware Bug]: CPU0: APIC id mismatch. Firmware:
>>>>>>> ffff CPUID: 2
>>>>>> So apic->cpu_present_to_apicid() gives us a completely bogus APIC id
>>>>>> which
>>>>>> translates to a bogus package id. And looking at the XEN code:
>>>>>>
>>>>>> xen_pv_apic.cpu_present_to_apicid = xen_cpu_present_to_apicid,
>>>>>>
>>>>>> and xen_cpu_present_to_apicid does:
>>>>>>
>>>>>> static int xen_cpu_present_to_apicid(int cpu)
>>>>>> {
>>>>>> if (cpu_present(cpu))
>>>>>> return xen_get_apic_id(xen_apic_read(APIC_ID));
>>>>>> else
>>>>>> return BAD_APICID;
>>>>>> }
>>>>>>
>>>>>> So independent of which present CPU we query we get just some random
>>>>>> information, in the above case we get BAD_APICID from
>>>>>> xen_apic_read() not
>>>>>> from the else path as this CPU _IS_ present.
>>>>>>
>>>>>> What's so wrong with storing the fricking firmware supplied APICid as
>>>>>> everybody else does and report it back when queried?
>>>>> By firmware you mean ACPI? It is most likely not available to PV guests.
>>>>> How about returning cpu_data(cpu).initial_apicid?
>>>>>
>>>>> And what was the original problem?
>>>> The original issue I found was that VMware was returning a different set
>>>> of APIC id's in the ACPI tables than what it advertised on the CPU's.
>>>>
>>>> http://www.mail-archive.com/[email protected]/msg1266716.html
>>> For Xen, we recently added a6a198bc60e6 ("xen/x86: Update topology map
>>> for PV VCPUs") to at least temporarily work around some topology map
>>> problems that PV guests have with RAPL (which I think is what Vefa's
>>> problem was).
>> Hello Boris,
>>
>> (Sorry for the delay!)
>>
>> It appears that the problem is a bit different compared to the one
>> corrected by a6a198bc60e6, because my kernel tree -- based on 4.8.6 --
>> already includes the -stable backport of that commit, i.e.
>> 88540ad0820ddfb05626e0136c0e5a79cea85fd1
>>
>> The patch I included in my previous e-mail (dated 2016-11-10) corrects
>> root cause of the issue I am having with 4.8.6. Sebastian's original
>> patch adding error checking to the RAPL module prevents the RAPL module
>> from causing a kernel oops without my patch.
>
> I don't see any messages from you on that date. Can you provide a link
> to it (and to Sebastian's patch)?
>
> (BTW, generally it's a good idea to copy xen-devel list on any
> Xen-related issues).

As I explain below, it turns out that my issue was 'only' a kernel
configuration issue.

For reference, I had unknowingly solved my kernel-configuration-induced
issue via the patch at:
https://marc.info/?l=linux-kernel&m=147875027314638&w=2

Sebastian's patch (which adds error handling to the RAPL module) is at:
https://marc.info/?l=linux-kernel&m=147739814217598&w=2

>> The issue I am experiencing is caused by the boot-up code in the
>> 'init_apic_mappings' function switching the APIC ops structure from
>> Xen's structure to a no-op structure by calling the 'apic_disable'
>> function. Please let me know if I can clarify or elaborate.
>
> apic_disable() is only invoked if there is no APIC present (i.e.
> detect_init_APIC() returns a non-zero value) and I don't think this can
> happen. Is your CPUID[1].edx[9] not set?

I found out that my domU kernels invoke the 'apic_disable' function
because CONFIG_X86_MPPARSE was not enabled in my kernel configuration,
which would cause the 'smp_found_config' bit to be unset at boot-up.

This would cause 'init_apic_mappings' to call 'apic_disable', which
would cause Xen's 'apic' ops structure pointer to be replaced with the
no-op APIC ops structure's pointer.

The use of the no-op APIC ops structure would in turn cause invalid
virtual CPU package identifiers to be generated. Invalid CPU package
identifiers would in turn cause the RAPL module to produce a kernel oops
due to potentially missing error handling.

It looks like I have been ignoring the following kernel warning which I
should have noticed a long time ago:

MPS support code is not built-in.
Using acpi=off or acpi=noirq or pci=noacpi may have problem

To all on this e-mail thread, I learned a bit through this exercise, but
I have also taken a lot of everyone's time and created quite a bit of
e-mail traffic because of a kernel configuration issue on my end.

My apologies.

Vefa

2016-11-15 01:23:00

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more



On 11/13/2016 06:42 PM, M. Vefa Bicakci wrote:

> I found out that my domU kernels invoke the 'apic_disable' function
> because CONFIG_X86_MPPARSE was not enabled in my kernel configuration,
> which would cause the 'smp_found_config' bit to be unset at boot-up.


smp_found_config is not the problem, it is usually zero for Xen PV guests.

What is the problem is that because of your particular config selection
acpi_mps_check() fails (with the error message that you mention below)
and that leads to X86_FEATURE_APIC being cleared. And then we indeed
switch to APIC noop and things go south after that.

-boris

>
> This would cause 'init_apic_mappings' to call 'apic_disable', which
> would cause Xen's 'apic' ops structure pointer to be replaced with the
> no-op APIC ops structure's pointer.
>
> The use of the no-op APIC ops structure would in turn cause invalid
> virtual CPU package identifiers to be generated. Invalid CPU package
> identifiers would in turn cause the RAPL module to produce a kernel oops
> due to potentially missing error handling.
>
> It looks like I have been ignoring the following kernel warning which I
> should have noticed a long time ago:
>
> MPS support code is not built-in.
> Using acpi=off or acpi=noirq or pci=noacpi may have problem
>
> To all on this e-mail thread, I learned a bit through this exercise, but
> I have also taken a lot of everyone's time and created quite a bit of
> e-mail traffic because of a kernel configuration issue on my end.
>
> My apologies.
>
> Vefa
>

2016-11-18 11:19:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On Mon, 14 Nov 2016, Boris Ostrovsky wrote:
> On 11/13/2016 06:42 PM, M. Vefa Bicakci wrote:
>
> > I found out that my domU kernels invoke the 'apic_disable' function
> > because CONFIG_X86_MPPARSE was not enabled in my kernel configuration,
> > which would cause the 'smp_found_config' bit to be unset at boot-up.
>
> smp_found_config is not the problem, it is usually zero for Xen PV guests.
>
> What is the problem is that because of your particular config selection
> acpi_mps_check() fails (with the error message that you mention below) and
> that leads to X86_FEATURE_APIC being cleared. And then we indeed switch to
> APIC noop and things go south after that.

Indeed. And what really puzzles me is that Xen manages to bring up a
secondary CPU despite APIC being disabled.

There are quite some assumptions about no APIC == no SMP in all of x86. Can
we please make Xen behave like anything else?

Thanks,

tglx

2016-11-18 14:21:17

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [PATCH] x86/cpuid: Deal with broken firmware once more

On 11/18/2016 06:16 AM, Thomas Gleixner wrote:
> On Mon, 14 Nov 2016, Boris Ostrovsky wrote:
>> On 11/13/2016 06:42 PM, M. Vefa Bicakci wrote:
>>
>>> I found out that my domU kernels invoke the 'apic_disable' function
>>> because CONFIG_X86_MPPARSE was not enabled in my kernel configuration,
>>> which would cause the 'smp_found_config' bit to be unset at boot-up.
>> smp_found_config is not the problem, it is usually zero for Xen PV guests.
>>
>> What is the problem is that because of your particular config selection
>> acpi_mps_check() fails (with the error message that you mention below) and
>> that leads to X86_FEATURE_APIC being cleared. And then we indeed switch to
>> APIC noop and things go south after that.
> Indeed. And what really puzzles me is that Xen manages to bring up a
> secondary CPU despite APIC being disabled.

PV guests bring secondary VCPUs up using hypercalls (see xen_cpu_up()).

> There are quite some assumptions about no APIC == no SMP in all of x86. Can
> we please make Xen behave like anything else?
>

I will try to see if we can improve APIC emulation for these guests.
Unfortunately it will have to be done on kernel side since we still need
to support older Xen versions.

But as I said earlier, the right answer to this is PVH.

-boris