From: Alok Kataria
To: bigeasy@linutronix.de, mingo@kernel.org, peterz@infradead.org,
    boris.ostrovsky@oracle.com, linux-kernel@vger.kernel.org,
    tglx@linutronix.de, hpa@zytor.com, bp@alien8.de, m.v.b@runbox.com
CC: linux-tip-commits@vger.kernel.org
Subject: Re: [tip:x86/urgent] x86/cpu: Deal with broken firmware (VMWare/XEN)
Date: Fri, 11 Nov 2016 05:49:18 +0000
Message-ID: <1478843692.2694.235.camel@vmware.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Thomas,

On Wed, 2016-11-09 at 12:27 -0800, tip-bot for Thomas Gleixner wrote:
> Commit-ID: d49597fd3bc7d9534de55e9256767f073be1b33a
> Gitweb: https://urldefense.proofpoint.com/v2/url?u=http-3A__git.kernel.org_tip_d49597fd3bc7d9534de55e9256767f073be1b33a&d=CwIDaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=2AkLWShm6V8Nuu8ZZ-80Flo6y0XxCGmO1xrsAeRArAE&m=WBsB4JFr-Dct0um4Kf8QAxC7w6p-Mlk3H-LwItQJ7Fw&s=qI64vSH3y6q8wJhcqpI4dXYma-i1RTtlxgKwKwhFWWo&e=
> Author: Thomas Gleixner
> AuthorDate: Wed, 9 Nov 2016 16:35:51 +0100
> Committer: Thomas Gleixner
> CommitDate: Wed, 9 Nov 2016 21:05:01 +0100
>
> x86/cpu: Deal with broken firmware (VMWare/XEN)
>
> Both ACPI and MP specifications require that the APIC id in the respective
> tables must be the same as the APIC id in CPUID.
>
> The kernel retrieves the physical package id from the APIC id during the
> ACPI/MP table scan and builds the physical to logical package map. The
> physical package id which is used after a CPU comes up is retrieved from
> CPUID. So we rely on ACPI/MP tables and CPUID agreeing in that respect.
>
> There exist VMware and XEN implementations which violate the spec. As a
> result the physical to logical package map, which relies on the ACPI/MP
> tables does not work on those systems, because the CPUID initialized
> physical package id does not match the firmware id. This causes system
> crashes and malfunction due to invalid package mappings.

For documentation purposes, let me note that VMware VMs running at
virtual hardware version 9 and above don't have this ACPI/MP and CPUID
divergence on the package id. So not everyone will see this issue on
their VMs; the bug is limited to VMs running at virtual hardware
version 8 and earlier.

It's good that we can work around the platform bug for those VMs.
Thanks for adding these checks.

Alok

>
> The only way to cure this is to sanitize the physical package id after the
> CPUID enumeration and yell when the APIC ids are different. Fix up the
> initial APIC id, which is fine as it is only used printout purposes.
>
> If the physical package IDs differ yell and use the package information
> from the ACPI/MP tables so the existing logical package map just works.
>
> Chas provided the resulting dmesg output for his affected 4 virtual
> sockets, 1 core per socket VM:
>
> [Firmware Bug]: CPU1: APIC id mismatch. Firmware: 1 CPUID: 2
> [Firmware Bug]: CPU1: Using firmware package id 1 instead of 2
> ....
>
> Reported-and-tested-by: "Charles (Chas) Williams" ,
> Reported-by: M. Vefa Bicakci
> Signed-off-by: Thomas Gleixner
> Cc: Peter Zijlstra
> Cc: Sebastian Andrzej Siewior
> Cc: Borislav Petkov
> Cc: Alok Kataria
> Cc: Boris Ostrovsky
> Cc: #4.6+
> Link: https://urldefense.proofpoint.com/v2/url?u=http-3A__lkml.kernel.org_r_alpine.DEB.2.20.1611091613540.3501-40nanos&d=CwIDaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=2AkLWShm6V8Nuu8ZZ-80Flo6y0XxCGmO1xrsAeRArAE&m=WBsB4JFr-Dct0um4Kf8QAxC7w6p-Mlk3H-LwItQJ7Fw&s=HNQMGUrw_s6Mc_oyREBnD4TrUjERbLcH1viAZr-aFPY&e=
> Signed-off-by: Thomas Gleixner
> ---
>  arch/x86/kernel/cpu/common.c | 32 ++++++++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 9bd910a..cc9e980 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -979,6 +979,35 @@ static void x86_init_cache_qos(struct cpuinfo_x86 *c)
>  }
>
>  /*
> + * The physical to logical package id mapping is initialized from the
> + * acpi/mptables information. Make sure that CPUID actually agrees with
> + * that.
> + */
> +static void sanitize_package_id(struct cpuinfo_x86 *c)
> +{
> +#ifdef CONFIG_SMP
> +	unsigned int pkg, apicid, cpu = smp_processor_id();
> +
> +	apicid = apic->cpu_present_to_apicid(cpu);
> +	pkg = apicid >> boot_cpu_data.x86_coreid_bits;
> +
> +	if (apicid != c->initial_apicid) {
> +		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
> +		       cpu, apicid, c->initial_apicid);
> +		c->initial_apicid = apicid;
> +	}
> +	if (pkg != c->phys_proc_id) {
> +		pr_err(FW_BUG "CPU%u: Using firmware package id %u instead of %u\n",
> +		       cpu, pkg, c->phys_proc_id);
> +		c->phys_proc_id = pkg;
> +	}
> +	c->logical_proc_id = topology_phys_to_logical_pkg(pkg);
> +#else
> +	c->logical_proc_id = 0;
> +#endif
> +}
> +
> +/*
>   * This does the hard work of actually picking apart the CPU stuff...
>   */
>  static void identify_cpu(struct cpuinfo_x86 *c)
> @@ -1103,8 +1132,7 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>  #ifdef CONFIG_NUMA
>  	numa_add_cpu(smp_processor_id());
>  #endif
> -	/* The boot/hotplug time assigment got cleared, restore it */
> -	c->logical_proc_id = topology_phys_to_logical_pkg(c->phys_proc_id);
> +	sanitize_package_id(c);
>  }
>
>  /*
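
For readers who want to follow the check outside the kernel, the logic of sanitize_package_id() can be modelled in userspace. This is only a minimal sketch, not the kernel code: struct cpu_ids and sanitize_ids() are hypothetical names, printf() stands in for pr_err(FW_BUG ...), and the firmware APIC id and the core-id bit count are passed in as plain parameters instead of being read from apic->cpu_present_to_apicid() and boot_cpu_data.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch corrects. */
struct cpu_ids {
	unsigned int initial_apicid;	/* APIC id as enumerated via CPUID */
	unsigned int phys_proc_id;	/* package id derived from CPUID */
};

/*
 * Model of the patch's check: the firmware (ACPI/MP table) APIC id is
 * treated as authoritative. The package id is the firmware APIC id with
 * the core-id bits shifted out. Any CPUID-derived value that disagrees is
 * reported and overwritten. Returns the number of fields corrected.
 */
static int sanitize_ids(struct cpu_ids *c, unsigned int cpu,
			unsigned int fw_apicid, unsigned int coreid_bits)
{
	unsigned int pkg = fw_apicid >> coreid_bits;
	int fixed = 0;

	assert(c != NULL);

	if (fw_apicid != c->initial_apicid) {
		printf("[Firmware Bug]: CPU%u: APIC id mismatch. Firmware: %x CPUID: %x\n",
		       cpu, fw_apicid, c->initial_apicid);
		c->initial_apicid = fw_apicid;
		fixed++;
	}
	if (pkg != c->phys_proc_id) {
		printf("[Firmware Bug]: CPU%u: Using firmware package id %u instead of %u\n",
		       cpu, pkg, c->phys_proc_id);
		c->phys_proc_id = pkg;
		fixed++;
	}
	return fixed;
}
```

Calling sanitize_ids(&c, 1, 1, 0) with c.initial_apicid = 2 and c.phys_proc_id = 2 prints the two messages quoted from Chas's VM and leaves both fields at 1. The coreid_bits value of 0 is an assumption that is consistent with the quoted output for a one-core-per-socket VM; the thread does not state it.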