Received: by 2002:a05:7412:b10a:b0:f3:1519:9f41 with SMTP id az10csp519029rdb; Thu, 30 Nov 2023 10:37:52 -0800 (PST) X-Google-Smtp-Source: AGHT+IGLIk7QvSEcIrmWYJ7ClZnPTQRlG39djhlMKHKxPRSPOtCwtx7sqe6FCUOfjO44NTGYuRwd X-Received: by 2002:a05:6a20:3d11:b0:18c:9855:e949 with SMTP id y17-20020a056a203d1100b0018c9855e949mr16361344pzi.15.1701369471822; Thu, 30 Nov 2023 10:37:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701369471; cv=none; d=google.com; s=arc-20160816; b=XXebidEAxkkIMDEw7tdR1uKkEg192M68bpXdkCwnWA94f4i/30sXrtMn4D7S2p6eRi B46ExxsU9CffvbxasJaE3jiUYMXweVa3wiccDxhb4pOJFpDpgMtnbR/HS/h1ZSZXd8sr th/3iU4IStdEjkV87kpRRPWxXfkvfUV6HHdY/PsGfZgQJFridaeuBVVQ/0QmJWsoRmX9 IwTMTYdhKM9werZWIxm6qkUKhsKetXHSFA9bUfHHETi7p47fNqck2JTyuwZq0yhaPPn3 51l5NBrXcjR5FySsqnCHofZQucDTKFZJ1a8gpV1uhYTUR0TTrXiu5x6B3RFcVEshbYQP rmsQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=DrjTsTYO2xlkgWoVz4wv+b3rfotA5xQPi9VNa2keNcM=; fh=Ed9PZynrqo9np6puup+DlOK7edsDgSPVHfl/e/7Y9g0=; b=Mi4N9ZeWBmkE6gZauxil/zqXB7MjqYgKs0cdcqmT9O7inj6MPbHClsrfC411U+fG+2 MRihAKmYoH/QXyaUY9CcnR1TkoZ6A1W5jCbhsfDUTQj+c5rVomGklvpYpZxXyuH0Aok0 EMS7uwMrqkM/HthxqxFf5rS66+uzepNy/wTxfBR/GHqg4GaWlKTdxBmcF+Pe7Dnncwoc JA4IxA5wrGGPw8y+WLFpTDeWwUmUZv75DBzZVC619NcsTBLXd7P2/VKP1YZXghL4HbDS mXOjnDaSEaNGRjp94Ra0f58J3KQ/+O0k4pUvhI5gVzekBtfgSw+75ZRWaQZl1zDpOkxL pRpQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from pete.vger.email (pete.vger.email. [2620:137:e000::3:6]) by mx.google.com with ESMTPS id v16-20020a63f850000000b005c1b58a321asi1948482pgj.411.2023.11.30.10.37.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Nov 2023 10:37:51 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) client-ip=2620:137:e000::3:6; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:6 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id 1414782CC47B; Thu, 30 Nov 2023 10:37:49 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230106AbjK3Sh1 (ORCPT + 99 others); Thu, 30 Nov 2023 13:37:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35792 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232231AbjK3Sh0 (ORCPT ); Thu, 30 Nov 2023 13:37:26 -0500 Received: from mail-ej1-f43.google.com (mail-ej1-f43.google.com [209.85.218.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 734C010E2; Thu, 30 Nov 2023 10:37:32 -0800 (PST) Received: by mail-ej1-f43.google.com with SMTP id a640c23a62f3a-9e1021dbd28so180955566b.3; Thu, 30 Nov 2023 10:37:32 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701369451; x=1701974251; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=DrjTsTYO2xlkgWoVz4wv+b3rfotA5xQPi9VNa2keNcM=; b=ZszbT5XWILnOA2CWjrt9midoU2oe6iPofHAaMm56f/kw4FWr+jcrkZa/pAK3v3KaDc OvQjmaS3aS3ntkDIHRgFIuEOhjJxRoV3HHhSn9ad1tnPPfWhDq80sY+pGO7Hj3+Fv10j f77VGhTikPmHVLaRv856SbUz+6MrhztnqacQbZhVF0EzOAxc0GFORLBQW/Ca11VvdFyN gJtBnWWuFuCarXUwliq/rcVZcE/gmmNmmbZsSRrvHR08loz3GS4B+c7/MtM/RvzLDTY3 glQrXyvp+xJBEW5r5ULrMFwhR0rdb+Z7KDY5GR5U6pF3ncrZCTltdiue9UkpDxGGXj4o jF7g== X-Gm-Message-State: AOJu0YyoFiFyKKV/mvypbXC+NapgrpyUKNWSvFezwDRRv1Raf5pR8xjZ b96PAOkPaa4RbnwrlN7MMWNJHzMTOd2bpA== X-Received: by 2002:a19:8c02:0:b0:50b:c4e3:b601 with SMTP id o2-20020a198c02000000b0050bc4e3b601mr11990lfd.66.1701367365369; Thu, 30 Nov 2023 10:02:45 -0800 (PST) Received: from localhost (ec2-13-42-112-133.eu-west-2.compute.amazonaws.com. [13.42.112.133]) by smtp.gmail.com with ESMTPSA id r11-20020ac25a4b000000b005033948f108sm214420lfn.272.2023.11.30.10.02.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Nov 2023 10:02:44 -0800 (PST) Date: Thu, 30 Nov 2023 18:02:42 +0000 From: Fam Zheng To: Tony Luck Cc: Fenghua Yu , Reinette Chatre , Peter Newman , Jonathan Corbet , Shuah Khan , x86@kernel.org, Shaopeng Tan , James Morse , Jamie Iles , Babu Moger , Randy Dunlap , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Shaopeng Tan Subject: Re: [PATCH v12 7/8] x86/resctrl: Sub NUMA Cluster detection and enable Message-ID: References: <20231109230915.73600-1-tony.luck@intel.com> <20231130003418.89964-1-tony.luck@intel.com> <20231130003418.89964-8-tony.luck@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231130003418.89964-8-tony.luck@intel.com> X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Thu, 30 Nov 2023 10:37:49 -0800 (PST) Hi Tony, On Wed, Nov 29, 2023 at 04:34:17PM -0800, Tony Luck wrote: > There isn't a simple hardware bit that indicates whether a CPU is > running in Sub NUMA Cluster (SNC) mode. Infer the state by comparing > the ratio of NUMA nodes to L3 cache instances. > > When SNC mode is detected, reconfigure the RMID counters by updating > the MSR_RMID_SNC_CONFIG MSR on each socket as CPUs are seen. > > Clearing bit zero of the MSR divides the RMIDs and renumbers the ones > on the second SNC node to start from zero. > > Signed-off-by: Tony Luck > Reviewed-by: Peter Newman > Reviewed-by: Reinette Chatre > Reviewed-by: Shaopeng Tan > Tested-by: Shaopeng Tan > --- > arch/x86/include/asm/msr-index.h | 1 + > arch/x86/kernel/cpu/resctrl/core.c | 96 ++++++++++++++++++++++++++++++ > 2 files changed, 97 insertions(+) > > diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h > index 1d51e1850ed0..94d29d81e6db 100644 > --- a/arch/x86/include/asm/msr-index.h > +++ b/arch/x86/include/asm/msr-index.h > @@ -1111,6 +1111,7 @@ > #define MSR_IA32_QM_CTR 0xc8e > #define MSR_IA32_PQR_ASSOC 0xc8f > #define MSR_IA32_L3_CBM_BASE 0xc90 > +#define MSR_RMID_SNC_CONFIG 0xca0 > #define MSR_IA32_L2_CBM_BASE 0xd10 > #define MSR_IA32_MBA_THRTL_BASE 0xd50 > > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c > index cf5aba8a74bf..3293ab4c58b0 100644 > --- a/arch/x86/kernel/cpu/resctrl/core.c > +++ b/arch/x86/kernel/cpu/resctrl/core.c > @@ -16,11 +16,14 @@ > > #define pr_fmt(fmt) "resctrl: " fmt > > +#include > #include > #include > #include > #include > +#include > > +#include > #include > #include > #include "internal.h" > @@ -740,11 +743,42 @@ static void clear_closid_rmid(int cpu) > wrmsr(MSR_IA32_PQR_ASSOC, 0, 0); > } > > +/* > + * The power-on reset value of MSR_RMID_SNC_CONFIG is 0x1 > + * which indicates that RMIDs are configured in legacy mode. > + * This mode is incompatible with Linux resctrl semantics > + * as RMIDs are partitioned between SNC nodes, which requires > + * a user to know which RMID is allocated to a task. > + * Clearing bit 0 reconfigures the RMID counters for use > + * in Sub NUMA Cluster mode. This mode is better for Linux. > + * The RMID space is divided between all SNC nodes with the > + * RMIDs renumbered to start from zero in each node when > + * couning operations from tasks. Code to read the counters > + * must adjust RMID counter numbers based on SNC node. See > + * __rmid_read() for code that does this. > + */ > +static void snc_remap_rmids(int cpu) > +{ > + u64 val; > + > + /* Only need to enable once per package. */ > + if (cpumask_first(topology_core_cpumask(cpu)) != cpu) > + return; > + > + rdmsrl(MSR_RMID_SNC_CONFIG, val); > + val &= ~BIT_ULL(0); > + wrmsrl(MSR_RMID_SNC_CONFIG, val); > +} > + > static int resctrl_online_cpu(unsigned int cpu) > { > struct rdt_resource *r; > > mutex_lock(&rdtgroup_mutex); > + > + if (snc_nodes_per_l3_cache > 1) > + snc_remap_rmids(cpu); > + > for_each_capable_rdt_resource(r) > domain_add_cpu(cpu, r); > /* The cpu is set in default rdtgroup after online. */ > @@ -999,11 +1033,73 @@ static __init bool get_rdt_resources(void) > return (rdt_mon_capable || rdt_alloc_capable); > } > > +/* CPU models that support MSR_RMID_SNC_CONFIG */ > +static const struct x86_cpu_id snc_cpu_ids[] __initconst = { > + X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0), > + X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0), > + X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0), > + X86_MATCH_INTEL_FAM6_MODEL(GRANITERAPIDS_X, 0), > + {} > +}; > + > +/* > + * There isn't a simple hardware bit that indicates whether a CPU is running > + * in Sub NUMA Cluster (SNC) mode. Infer the state by comparing the > + * ratio of NUMA nodes to L3 cache instances. > + * It is not possible to accurately determine SNC state if the system is > + * booted with a maxcpus=N parameter. That distorts the ratio of SNC nodes > + * to L3 caches. It will be OK if system is booted with hyperthreading > + * disabled (since this doesn't affect the ratio). > + */ > +static __init int snc_get_config(void) > +{ > + unsigned long *node_caches; > + int mem_only_nodes = 0; > + int cpu, node, ret; > + int num_l3_caches; > + > + if (!x86_match_cpu(snc_cpu_ids)) > + return 1; > + > + node_caches = bitmap_zalloc(nr_node_ids, GFP_KERNEL); > + if (!node_caches) > + return 1; > + > + cpus_read_lock(); > + > + if (num_online_cpus() != num_present_cpus()) > + pr_warn("Some CPUs offline, SNC detection may be incorrect\n"); > + > + for_each_node(node) { > + cpu = cpumask_first(cpumask_of_node(node)); > + if (cpu < nr_cpu_ids) > + set_bit(get_cpu_cacheinfo_id(cpu, 3), node_caches); Are we sure get_cpu_cacheinfo_id() is an valid index here? Looking at the function it could be -1 or larger than nr_node_ids. Fam > + else > + mem_only_nodes++; > + } > + cpus_read_unlock(); > + > + num_l3_caches = bitmap_weight(node_caches, nr_node_ids); > + kfree(node_caches); > + > + if (!num_l3_caches) > + return 1; > + > + ret = (nr_node_ids - mem_only_nodes) / num_l3_caches; > + > + if (ret > 1) > + rdt_resources_all[RDT_RESOURCE_L3].r_resctrl.mon_scope = RESCTRL_NODE; > + > + return ret; > +} > + > static __init void rdt_init_res_defs_intel(void) > { > struct rdt_hw_resource *hw_res; > struct rdt_resource *r; > > + snc_nodes_per_l3_cache = snc_get_config(); > + > for_each_rdt_resource(r) { > hw_res = resctrl_to_arch_res(r); > > -- > 2.41.