From: Thomas Gleixner
To: Feng Tang, Peter Zijlstra
Cc: Ingo Molnar, Borislav Petkov, Dave Hansen, "H . Peter Anvin", David Woodhouse, "Paul E . McKenney", x86@kernel.org, linux-kernel@vger.kernel.org, rui.zhang@intel.com, tim.c.chen@intel.com
Subject: Re: [Patch v2 2/2] x86/tsc: use logical_packages as a better estimation of socket numbers
In-Reply-To: <87h6qz7et0.ffs@tglx>
References: <20230613052523.1106821-1-feng.tang@intel.com> <20230613052523.1106821-2-feng.tang@intel.com> <20230615092021.GE1683497@hirez.programming.kicks-ass.net> <87h6qz7et0.ffs@tglx>
Date: Fri, 23 Jun 2023 01:07:24 +0200
Message-ID: <87edm36qqb.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 22 2023 at 16:27, Thomas Gleixner wrote:
> On Fri, Jun 16 2023 at 15:18, Feng Tang wrote:
> So something like the below should just work.

Well it works in principle, but does not take any of the command line
parameters which limit nr_possible CPUs or the actual kernel
configuration into account. But the principle itself works correctly.

Below is an updated version, which takes them into account.

The data here is from a two socket system with 32 CPUs per socket.

No command line parameters (NR_CPUS=64):

 smpboot: Allowing 64 CPUs, 32 hotplug CPUs
 clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1e3306b9ada, max_idle_ns: 440795224413 ns
 smp: Brought up 1 node, 32 CPUs
 smpboot: Max logical packages ACPI enumeration: 2

"possible_cpus=32" (NR_CPUS=64) or
No command line parameter (NR_CPUS=32):

 smpboot: Allowing 32 CPUs, 0 hotplug CPUs
 clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1e3306b9ada, max_idle_ns: 440795224413 ns
 smp: Brought up 1 node, 32 CPUs
 smpboot: Max logical packages ACPI enumeration: 1

maxcpus=32

 smpboot: Allowing 64 CPUs, 0 hotplug CPUs
 clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1e3306b9ada, max_idle_ns: 440795224413 ns
 smp: Brought up 1 node, 32 CPUs
 smpboot: Max logical packages ACPI enumeration: 2

But that's really all we should do. If the ACPI table enumerates CPUs as
hotpluggable which can never arrive, then so be it.

We have enough parameters to override the BIOS nonsense. Trying to do
more magic MAD table parsing with heuristics is just wrong.

We already have way too many heuristics and workarounds for broken
firmware, but for the problem at hand, we really don't need more.

The only systems I observed so far which have a nonsensical amount of
"hotpluggable" CPUs are high-end server machines. It's a reasonable
expectation that machines with high-end price tags come with correct
firmware.

Trying to work around that (except with the existing command line
options) is just proliferating this mess. This has to stop.
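
For illustration only, and not part of the patch: a rough userspace sketch
of the two-pass counting which logical_packages_finish_setup() in the patch
below performs. count_packages(), MAX_PKGS and the two_sockets table are
made-up names, and the split into one fully enabled and one fully disabled
socket is an assumption about how such a machine might enumerate. Under
that assumption the sketch arrives at the same package counts as the logs
above.

```c
#include <assert.h>

#define MAX_PKGS 8	/* illustration only; the patch sizes its array by NR_CPUS */

struct logical_pkg {
	unsigned int enabled_cpus;
	unsigned int disabled_cpus;
};

/*
 * Count the packages needed to hold 'possible' CPUs: packages with
 * enabled CPUs are taken first, then packages which only contain
 * disabled (hotpluggable) CPUs as long as there is room left.
 */
static unsigned int count_packages(const struct logical_pkg *pkgs,
				   unsigned int npkgs, unsigned int possible)
{
	unsigned int pkg, maxpkg = 0, maxcpus = 0;

	assert(npkgs <= MAX_PKGS);

	/* Pass 1: packages which have at least one enabled CPU */
	for (pkg = 0; pkg < npkgs; pkg++) {
		if (!pkgs[pkg].enabled_cpus)
			continue;

		maxpkg++;
		maxcpus += pkgs[pkg].enabled_cpus;

		if (maxcpus >= possible)
			return maxpkg;
	}

	/* Pass 2: there is still room, take disabled-only packages */
	for (pkg = 0; pkg < npkgs; pkg++) {
		if (pkgs[pkg].enabled_cpus || !pkgs[pkg].disabled_cpus)
			continue;

		maxpkg++;
		maxcpus += pkgs[pkg].disabled_cpus;

		if (maxcpus >= possible)
			break;
	}

	return maxpkg;
}

/* Assumed enumeration: socket 0 fully enabled, socket 1 fully disabled */
static const struct logical_pkg two_sockets[2] = {
	{ .enabled_cpus = 32, .disabled_cpus = 0 },
	{ .enabled_cpus = 0,  .disabled_cpus = 32 },
};
```

With 64 possible CPUs the second pass picks up the disabled socket, so
count_packages(two_sockets, 2, 64) yields 2. With 32 possible CPUs the
first pass already covers everything and the result is 1.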

Thanks,

        tglx
---
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -509,9 +509,12 @@ extern int default_check_phys_apicid_pre
 #ifdef CONFIG_SMP
 bool apic_id_is_primary_thread(unsigned int id);
 void apic_smt_update(void);
+extern unsigned int apic_to_pkg_shift;
+void logical_packages_update(u32 apicid, bool enabled);
 #else
 static inline bool apic_id_is_primary_thread(unsigned int id) { return false; }
 static inline void apic_smt_update(void) { }
+static inline void logical_packages_update(u32 apicid, bool enabled) { }
 #endif
 
 struct msi_msg;
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -178,6 +178,7 @@ static int acpi_register_lapic(int id, u
 	}
 
 	if (!enabled) {
+		logical_packages_update(acpiid, false);
 		++disabled_cpus;
 		return -EINVAL;
 	}
@@ -189,6 +190,8 @@ static int acpi_register_lapic(int id, u
 	if (cpu >= 0)
 		early_per_cpu(x86_cpu_to_acpiid, cpu) = acpiid;
 
+	logical_packages_update(acpiid, cpu >= 0);
+
 	return cpu;
 }
 
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -692,6 +692,8 @@ static void early_init_amd(struct cpuinf
 		}
 	}
 
+	detect_extended_topology_early(c);
+
 	if (cpu_has(c, X86_FEATURE_TOPOEXT))
 		smp_num_siblings = ((cpuid_ebx(0x8000001e) >> 8) & 0xff) + 1;
 }
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -29,6 +29,8 @@ unsigned int __max_die_per_package __rea
 EXPORT_SYMBOL(__max_die_per_package);
 
 #ifdef CONFIG_SMP
+unsigned int apic_to_pkg_shift __ro_after_init;
+
 /*
  * Check if given CPUID extended topology "leaf" is implemented
  */
@@ -66,7 +68,7 @@ int detect_extended_topology_early(struc
 {
 #ifdef CONFIG_SMP
 	unsigned int eax, ebx, ecx, edx;
-	int leaf;
+	int leaf, subleaf;
 
 	leaf = detect_extended_topology_leaf(c);
 	if (leaf < 0)
@@ -80,6 +82,14 @@ int detect_extended_topology_early(struc
 	 */
 	c->initial_apicid = edx;
 	smp_num_siblings = max_t(int, smp_num_siblings, LEVEL_MAX_SIBLINGS(ebx));
+
+	for (subleaf = 1; subleaf < 8; subleaf++) {
+		cpuid_count(leaf, subleaf, &eax, &ebx, &ecx, &edx);
+
+		if (ebx == 0 || !LEAFB_SUBTYPE(ecx))
+			break;
+		apic_to_pkg_shift = BITS_SHIFT_NEXT_LEVEL(eax);
+	}
 #endif
 	return 0;
 }
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1501,17 +1501,91 @@ void __init native_smp_prepare_boot_cpu(
 	native_pv_lock_init();
 }
 
+struct logical_pkg {
+	unsigned int enabled_cpus;
+	unsigned int disabled_cpus;
+};
+
+/*
+ * Needs to be size of NR_CPUS because virt allows to create the weirdest
+ * topologies just because it can.
+ */
+static struct logical_pkg logical_pkgs[NR_CPUS] __refdata;
+
+void logical_packages_update(u32 apicid, bool enabled)
+{
+	struct logical_pkg *lp;
+	unsigned int pkg;
+
+	if (!apic_to_pkg_shift || system_state != SYSTEM_BOOTING)
+		return;
+
+	pkg = (apicid >> apic_to_pkg_shift);
+
+	lp = logical_pkgs + pkg;
+	if (enabled)
+		lp->enabled_cpus++;
+	else
+		lp->disabled_cpus++;
+
+	if (++pkg > __max_logical_packages)
+		__max_logical_packages = pkg;
+}
+
+static void __init logical_packages_finish_setup(unsigned int possible)
+{
+	unsigned int pkg, maxpkg = 0, maxcpus = 0;
+
+	if (!apic_to_pkg_shift)
+		return;
+
+	/* Scan the enabled CPUs first */
+	for (pkg = 0; pkg < __max_logical_packages; pkg++) {
+		if (!logical_pkgs[pkg].enabled_cpus)
+			continue;
+
+		maxpkg++;
+		maxcpus += logical_pkgs[pkg].enabled_cpus;
+
+		if (maxcpus >= possible) {
+			__max_logical_packages = maxpkg;
+			return;
+		}
+	}
+
+	/* There is still room, scan for disabled CPUs */
+	for (pkg = 0; pkg < __max_logical_packages; pkg++) {
+		if (logical_pkgs[pkg].enabled_cpus || !logical_pkgs[pkg].disabled_cpus)
+			continue;
+
+		maxpkg++;
+		maxcpus += logical_pkgs[pkg].disabled_cpus;
+
+		if (maxcpus >= possible)
+			break;
+	}
+
+	__max_logical_packages = maxpkg;
+}
+
 void __init calculate_max_logical_packages(void)
 {
 	int ncpus;
 
+	if (__max_logical_packages) {
+		pr_info("Max logical packages ACPI enumeration: %u\n",
+			__max_logical_packages);
+		return;
+	}
+
 	/*
 	 * Today neither Intel nor AMD support heterogeneous systems so
 	 * extrapolate the boot cpu's data to all packages.
 	 */
 	ncpus = cpu_data(0).booted_cores * topology_max_smt_threads();
 	__max_logical_packages = DIV_ROUND_UP(total_cpus, ncpus);
-	pr_info("Max logical packages: %u\n", __max_logical_packages);
+
+	pr_info("Max logical packages estimated: %u\n", __max_logical_packages);
 }
 
 void __init native_smp_cpus_done(unsigned int max_cpus)
@@ -1619,6 +1693,8 @@ early_param("possible_cpus", _setup_poss
 
 	for (i = 0; i < possible; i++)
 		set_cpu_possible(i, true);
+
+	logical_packages_finish_setup(possible);
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
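
As a reading aid for the topology.c hunk, and again not part of the patch:
a small userspace sketch of how the subleaf walk could arrive at
apic_to_pkg_shift and how the shift maps an APIC ID to a package number.
The two macros are believed to match the definitions in
arch/x86/kernel/cpu/topology.c. struct subleaf_regs,
pkg_shift_from_subleaves(), apicid_to_pkg() and the register values in
example_leaf are hypothetical stand-ins for real cpuid_count() results.

```c
#include <assert.h>
#include <stddef.h>

/* Bit layout assumed to match arch/x86/kernel/cpu/topology.c */
#define LEAFB_SUBTYPE(ecx)		(((ecx) >> 8) & 0xff)
#define BITS_SHIFT_NEXT_LEVEL(eax)	((eax) & 0x1f)

/* Hypothetical stand-in for the registers of one cpuid_count() call */
struct subleaf_regs {
	unsigned int eax, ebx, ecx;
};

/*
 * Walk the subleaves starting at 1, like the loop in the patch, and keep
 * the shift of the last valid level. That shift is the number of APIC ID
 * bits below the package level.
 */
static unsigned int pkg_shift_from_subleaves(const struct subleaf_regs *sl,
					     size_t n)
{
	unsigned int shift = 0;
	size_t i;

	assert(sl != NULL);

	for (i = 1; i < n && i < 8; i++) {
		if (sl[i].ebx == 0 || !LEAFB_SUBTYPE(sl[i].ecx))
			break;
		shift = BITS_SHIFT_NEXT_LEVEL(sl[i].eax);
	}
	return shift;
}

/* Same computation as 'pkg = apicid >> apic_to_pkg_shift' in the patch */
static unsigned int apicid_to_pkg(unsigned int apicid, unsigned int shift)
{
	return apicid >> shift;
}

/* Made-up values: 2 threads per core, 6 APIC ID bits below the package */
static const struct subleaf_regs example_leaf[] = {
	{ .eax = 1, .ebx = 2,  .ecx = 0x100 },	/* subleaf 0: SMT level */
	{ .eax = 6, .ebx = 64, .ecx = 0x201 },	/* subleaf 1: core level */
	{ .eax = 0, .ebx = 0,  .ecx = 0x002 },	/* subleaf 2: invalid, ends the walk */
};
```

With these example values the walk stops at subleaf 2 and leaves a shift
of 6, so APIC IDs 0-63 land in package 0 and APIC ID 64 is the first one
in package 1.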