Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1031392AbdDTNY7 (ORCPT ); Thu, 20 Apr 2017 09:24:59 -0400
Received: from mx1.redhat.com ([209.132.183.28]:60104 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S945666AbdDTNY5 (ORCPT ); Thu, 20 Apr 2017 09:24:57 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 51A6A5A5D
Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com;
	dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com;
	spf=pass smtp.mailfrom=vkuznets@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 51A6A5A5D
From: Vitaly Kuznetsov
To: x86@kernel.org
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Borislav Petkov,
	Prarit Bhargava, linux-kernel@vger.kernel.org,
	xen-devel@lists.xenproject.org
Subject: [PATCH RFC] x86/smpboot: Set safer __max_logical_packages limit
Date: Thu, 20 Apr 2017 15:24:53 +0200
Message-Id: <20170420132453.19652-1-vkuznets@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.30]); Thu, 20 Apr 2017 13:24:56 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2745
Lines: 64

Recent changes in logical package management (Commit 9d85eb9119f4
("x86/smpboot: Make logical package management more robust") and its
predecessor) caused boot failures for some Xen guests. E.g. I'm trying to
boot 10 CPU guest on AMD Opteron 4284 system and I see the following crash:

[    0.116104] smpboot: Max logical packages: 1
...
[    0.590068]  #8
[    0.001000] smpboot: Package 1 of CPU 8 exceeds BIOS package data 1.
[    0.001000] ------------[ cut here ]------------
[    0.001000] kernel BUG at arch/x86/kernel/cpu/common.c:1020!

This is happening because total_cpus is 10 and x86_max_cores is 16(!).
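
To make the arithmetic concrete, here is a small standalone C sketch (not
kernel code). The helper names old_max_packages(), new_max_packages() and
package_of_cpu() are hypothetical and exist only for illustration, and
DIV_ROUND_UP is redefined locally. The "8 vCPUs per package" layout used in
the note after the code is an assumption that merely matches the log above
(CPU 8 landing in package 1); the real guest layout may differ.

```c
#include <assert.h>

/* Local copy of the usual round-up division; not the kernel header. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Limit before the patch: assumes every package holds exactly ncpus CPUs. */
unsigned int old_max_packages(unsigned int total_cpus, unsigned int ncpus)
{
	assert(ncpus > 0);
	return DIV_ROUND_UP(total_cpus, ncpus);
}

/* Limit proposed by the patch: min(max_physical_pkg_id, total_cpus). */
unsigned int new_max_packages(unsigned int max_physical_pkg_id,
			      unsigned int total_cpus)
{
	return max_physical_pkg_id < total_cpus ? max_physical_pkg_id
						: total_cpus;
}

/*
 * Hypothetical helper: package index of a CPU when the guest really has
 * cpus_per_pkg CPUs in each package (an assumption, see the lead-in).
 */
unsigned int package_of_cpu(unsigned int cpu, unsigned int cpus_per_pkg)
{
	assert(cpus_per_pkg > 0);
	return cpu / cpus_per_pkg;
}
```

With total_cpus = 10 and ncpus = 16, old_max_packages() yields 1, so only
package 0 is accepted, while package_of_cpu(8, 8) is already 1. Under the
proposed limit the bound can never drop below what total_cpus allows, since
the worst case is one CPU per package.
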
Turns out, the number of CPUs (vCPUs in our case) in each logical package
doesn't have to be exactly x86_max_cores, we can have any number of CPUs
<= x86_max_cores and they also don't have to match for all logical
packages. This breaks the current concept of __max_logical_packages.

In this patch I suggest we set __max_logical_packages based on the
max_physical_pkg_id and total_cpus, this should be safe and cover all
possible cases. Alternatively, we may think about eliminating the concept
of __max_logical_packages completely and relying on
max_physical_pkg_id/total_cpus where we currently use
topology_max_packages().

The issue could've been solved in Xen too I guess. CPUID returning
x86_max_cores can be tweaked to be the lowest possible number of all
logical packages of the guest.

Fixes: 9d85eb9119f4 ("x86/smpboot: Make logical package management more robust")
Signed-off-by: Vitaly Kuznetsov
---
 arch/x86/kernel/smpboot.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index bd1f1ad..85f41cd 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -359,7 +359,6 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 		ncpus = 1;
 	}
 
-	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
 	logical_packages = 0;
 
 	/*
@@ -367,6 +366,15 @@ static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
 	 * package can be smaller than the actual used apic ids.
 	 */
 	max_physical_pkg_id = DIV_ROUND_UP(MAX_LOCAL_APIC, ncpus);
+
+	/*
+	 * Each logical package has not more than x86_max_cores CPUs but
+	 * it can happen that it has less, e.g. we may have 1 CPU per logical
+	 * package regardless of what's in x86_max_cores. This is seen on some
+	 * Xen setups with AMD processors.
+	 */
+	__max_logical_packages = min(max_physical_pkg_id, total_cpus);
+
 	size = max_physical_pkg_id * sizeof(unsigned int);
 	physical_to_logical_pkg = kmalloc(size, GFP_KERNEL);
 	memset(physical_to_logical_pkg, 0xff, size);
-- 
2.9.3