Subject: Re: [v6,3/3] x86/smpboot: Fix __max_logical_packages estimate
To: Simon Gaiser, xen-devel
Cc: linux-kernel@vger.kernel.org, Thomas Gleixner, Ingo Molnar,
 "H. Peter Anvin", x86@kernel.org, Peter Zijlstra, Andi Kleen,
 Dave Hansen, Piotr Luc, Kan Liang, Borislav Petkov, Stephane Eranian,
 Arvind Yadav, Andy Lutomirski, Christian Borntraeger,
 "Kirill A. Shutemov", Tom Lendacky, He Chen, Mathias Krause, Tim Chen,
 Vitaly Kuznetsov
From: Prarit Bhargava
Date: Wed, 7 Feb 2018 14:04:30 -0500

On 02/07/2018 01:44 PM, Simon Gaiser wrote:
> Prarit Bhargava:
>> A system booted with a small number of cores enabled per package
>> panics because the estimate of __max_logical_packages is too low.
>> This occurs when the total number of active cores across all packages
>> is less than the maximum core count for a single package.
>>
>> ie) On a 4 package system with 20 cores/package where only 4 cores
>> are enabled on each package, the value of __max_logical_packages is
>> calculated as DIV_ROUND_UP(16 / 20) = 1 and not 4.
>>
>> Calculate __max_logical_packages after the cpu enumeration has completed.
>> Use the boot cpu's data to extrapolate the number of packages.
>>
>> Signed-off-by: Prarit Bhargava
>> Cc: Thomas Gleixner
>> Cc: Ingo Molnar
>> Cc: "H. Peter Anvin"
>> Cc: x86@kernel.org
>> Cc: Peter Zijlstra
>> Cc: Andi Kleen
>> Cc: Dave Hansen
>> Cc: Piotr Luc
>> Cc: Kan Liang
>> Cc: Borislav Petkov
>> Cc: Stephane Eranian
>> Cc: Prarit Bhargava
>> Cc: Arvind Yadav
>> Cc: Andy Lutomirski
>> Cc: Christian Borntraeger
>> Cc: "Kirill A. Shutemov"
>> Cc: Tom Lendacky
>> Cc: He Chen
>> Cc: Mathias Krause
>> Cc: Tim Chen
>> Cc: Vitaly Kuznetsov
>> ---
>>  arch/x86/kernel/smpboot.c | 55 +++++++++-------------------------------------
>>  1 file changed, 10 insertions(+), 45 deletions(-)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 838d36ff7ba6..2e3c5a394e79 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -308,12 +308,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
>>  	if (new >= 0)
>>  		goto found;
>>
>> -	if (logical_packages >= __max_logical_packages) {
>> -		pr_warn("Package %u of CPU %u exceeds BIOS package data %u.\n",
>> -			logical_packages, cpu, __max_logical_packages);
>> -		return -ENOSPC;
>> -	}
>> -
>>  	new = logical_packages++;
>>  	if (new != pkg)
>>  		pr_info("CPU %u Converting physical %u to logical package %u\n",
>> @@ -323,44 +317,6 @@ int topology_update_package_map(unsigned int pkg, unsigned int cpu)
>>  	return 0;
>>  }
>>
>> -static void __init smp_init_package_map(struct cpuinfo_x86 *c, unsigned int cpu)
>> -{
>> -	unsigned int ncpus;
>> -
>> -	/*
>> -	 * Today neither Intel nor AMD support heterogenous systems. That
>> -	 * might change in the future....
>> -	 *
>> -	 * While ideally we'd want '* smp_num_siblings' in the below @ncpus
>> -	 * computation, this won't actually work since some Intel BIOSes
>> -	 * report inconsistent HT data when they disable HT.
>> -	 *
>> -	 * In particular, they reduce the APIC-IDs to only include the cores,
>> -	 * but leave the CPUID topology to say there are (2) siblings.
>> -	 * This means we don't know how many threads there will be until
>> -	 * after the APIC enumeration.
>> -	 *
>> -	 * By not including this we'll sometimes over-estimate the number of
>> -	 * logical packages by the amount of !present siblings, but this is
>> -	 * still better than MAX_LOCAL_APIC.
>> -	 *
>> -	 * We use total_cpus not nr_cpu_ids because nr_cpu_ids can be limited
>> -	 * on the command line leading to a similar issue as the HT disable
>> -	 * problem because the hyperthreads are usually enumerated after the
>> -	 * primary cores.
>> -	 */
>> -	ncpus = boot_cpu_data.x86_max_cores;
>> -	if (!ncpus) {
>> -		pr_warn("x86_max_cores == zero !?!?");
>> -		ncpus = 1;
>> -	}
>> -
>> -	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
>> -	pr_info("Max logical packages: %u\n", __max_logical_packages);
>> -
>> -	topology_update_package_map(c->phys_proc_id, cpu);
>> -}
>> -
>>  void __init smp_store_boot_cpu_info(void)
>>  {
>>  	int id = 0; /* CPU 0 */
>> @@ -368,7 +324,7 @@ void __init smp_store_boot_cpu_info(void)
>>
>>  	*c = boot_cpu_data;
>>  	c->cpu_index = id;
>> -	smp_init_package_map(c, id);
>> +	topology_update_package_map(c->phys_proc_id, id);
>>  	cpu_data(id).set = 1;
>>  }
>>
>> @@ -1371,7 +1327,16 @@ void __init native_smp_prepare_boot_cpu(void)
>>
>>  void __init native_smp_cpus_done(unsigned int max_cpus)
>>  {
>> +	int ncpus;
>> +
>>  	pr_debug("Boot done\n");
>> +	/*
>> +	 * Today neither Intel nor AMD support heterogenous systems so
>> +	 * extrapolate the boot cpu's data to all packages.
>> +	 */
>> +	ncpus = cpu_data(0).booted_cores * smp_num_siblings;
>> +	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
>> +	pr_info("Max logical packages: %u\n", __max_logical_packages);
>>
>>  	if (x86_has_numa_in_package)
>>  		set_sched_topology(x86_numa_in_package_topology);
>
> This breaks booting as Xen PV domain for me. The problem seems to be
> that native_smp_cpus_done() is never called on a PV domain. So
> __max_logical_packages is uninitialized and this leads to a NULL
> pointer dereference in coretemp.
>

I'll see if I can figure out a way to test that.

Does 947134d9b00f ("x86/smpboot: Do not use smp_num_siblings in
__max_logical_packages calculation") help?

P.