Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752672AbdLDQp7 (ORCPT ); Mon, 4 Dec 2017 11:45:59 -0500
Received: from mx1.redhat.com ([209.132.183.28]:44846 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751754AbdLDQp5 (ORCPT ); Mon, 4 Dec 2017 11:45:57 -0500
From: Prarit Bhargava
To: linux-kernel@vger.kernel.org
Cc: Prarit Bhargava, Prarit@vger.kernel.org, Jakub Kicinski,
	"netdev@vger.kernel.org", Thomas Gleixner, Clark Williams
Subject: Re: [bisected] x86 boot still broken on -rc2
Date: Mon, 4 Dec 2017 11:45:21 -0500
Message-Id: <20171204164521.17870-1-prarit@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.28]); Mon, 04 Dec 2017 16:45:57 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2923
Lines: 76

On 12/04/2017 08:13 AM, Prarit Bhargava wrote:
>
>
> x86: Booting SMP configuration:
> .... node #0, CPUs: #1 #2 #3 #4
> .... node #1, CPUs: #5 #6 #7 #8 #9
> .... node #0, CPUs: #10 #11 #12 #13 #14
> .... node #1, CPUs: #15 #16 #17 #18 #19
> smp: Brought up 2 nodes, 20 CPUs
> smpboot: Max logical packages: 1
>
> which means that the calculation of logical packages is wrong because
>
> ncpus = cpu_data(0).booted_cores * smp_num_siblings;
> ncpus = 10 * 2;
> ncpus = 20;
>
> smp_num_siblings is defined as "The number of threads in a core" which
> should be 1 if HT/SMT is disabled.
>
> It looks like my patch has exposed a bug in the
> smp_num_siblings calculation. I'm still debugging ...

The bug is that smp_num_siblings has been incorrectly calculated as the
*maximum* number of threads in a core, and not the actual number of
threads in a core on systems which have a CPUID level greater than 0xb.
(see arch/x86/kernel/cpu/topology.c:59)

That will take some time to investigate and come up with a proper
solution and fix.

In the meantime, the patch below will fix the problem in the short-term.
I've tested the patch using SMT enabled, SMT disabled, maxcpus=1 and
nr_cpus=1.

tglx,

Please revert b4c0a7326f5d ("x86/smpboot: Fix __max_logical_packages
estimate") if you think that is a better option. The problem with
smp_num_siblings has been around for almost a decade.

P.

---8<---

Subject: [PATCH] arch/x86: Do not use smp_num_siblings in
 __max_logical_packages calculation

Documentation/x86/topology.txt defines smp_num_siblings as "The number
of threads in a core". Since commit bbb65d2d365e ("x86: use cpuid vector
0xb when available for detecting cpu topology") smp_num_siblings is the
maximum number of threads in a core. If Simultaneous MultiThreading
(SMT) is disabled on a system, smp_num_siblings is 2 and not 1 as
expected.

Use topology_max_smt_threads() in the __max_logical_packages
calculation.

Signed-off-by: Prarit Bhargava
Cc: "netdev@vger.kernel.org"
Cc: Thomas Gleixner
Cc: Clark Williams
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3d01df7d7cf6..eaee15fb7d8b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1304,7 +1304,7 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 	 * Today neither Intel nor AMD support heterogenous systems so
 	 * extrapolate the boot cpu's data to all packages.
 	 */
-	ncpus = cpu_data(0).booted_cores * smp_num_siblings;
+	ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
 	__max_logical_packages = DIV_ROUND_UP(nr_cpu_ids, ncpus);
 	pr_info("Max logical packages: %u\n", __max_logical_packages);
 
-- 
1.8.3.1
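
For illustration only, the arithmetic behind the one-line change can be
reproduced in userspace. This is a rough sketch, not kernel code:
estimate_packages() is a hypothetical helper, DIV_ROUND_UP() is redefined
locally to mirror the kernel macro, and the inputs (20 CPUs, 10 booted
cores) are taken from the boot log quoted earlier. It also assumes that
topology_max_smt_threads() reports 1 on that system when SMT is disabled.

```c
/* Local stand-in for the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical helper mirroring the two lines touched by the patch:
 * extrapolate the boot CPU's core count to every package, then divide
 * the number of possible CPUs by the CPUs per package, rounding up.
 */
static unsigned int estimate_packages(unsigned int nr_cpu_ids,
				      unsigned int booted_cores,
				      unsigned int threads_per_core)
{
	unsigned int ncpus = booted_cores * threads_per_core;

	return DIV_ROUND_UP(nr_cpu_ids, ncpus);
}
```

With the old multiplier, estimate_packages(20, 10, 2) evaluates to 1,
which matches the "Max logical packages: 1" line in the log. With a
per-core thread count of 1, estimate_packages(20, 10, 1) evaluates to 2,
which is the number of nodes the same log reports as brought up.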