From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Maciej Wieczor-Retman, Peter Newman, James Morse, Babu Moger, Drew Fustini
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH 10/10] x86/resctrl: Determine Sub-NUMA configuration
Date: Wed, 27 Mar 2024 13:03:52 -0700
Message-ID: <20240327200352.236835-11-tony.luck@intel.com>
In-Reply-To: <20240327200352.236835-1-tony.luck@intel.com>
References: <20240327200352.236835-1-tony.luck@intel.com>

There is no explicit enumeration of Sub-NUMA Cluster (SNC) mode. Instead, infer it from the ratio of the number of CPUs that share an L3 cache instance with CPU 0 to the number of CPUs that share a NUMA node with CPU 0.

When SNC mode is enabled, divide the number of RMIDs and the occupancy scaling factor by the number of SNC nodes per L3 cache. Also set up the parent/child pointers and related fields in the L3 and SUBL3 rdt_resource structures.

As each SNC domain is brought online, update MSR_RMID_SNC_CONFIG to remap the RMID counters.
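The detection ratio and the resulting RMID/scale split can be modeled outside the kernel. The following is a hypothetical stand-alone sketch, not kernel code. It assumes at most 64 CPUs, so that a cpumask fits in a uint64_t. The names mask_weight(), snc_ways_from_masks(), snc_adjust() and struct mon_params are invented for illustration. The patch itself returns the raw quotient. This model differs in one respect: it falls back to 1 when the quotient would be 0.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits in a CPU mask; stands in for the kernel's bitmap_weight(). */
static int mask_weight(uint64_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Model of the ratio in snc_get_config(): CPUs sharing CPU 0's L3 cache
 * divided by CPUs in CPU 0's NUMA node. Unlike the patch, fall back to
 * 1 (no SNC) when the node mask is empty or the ratio rounds down to 0.
 */
static int snc_ways_from_masks(uint64_t l3_mask, uint64_t node_mask)
{
	int node_cpus = mask_weight(node_mask);
	int ways;

	if (node_cpus == 0)
		return 1;
	ways = mask_weight(l3_mask) / node_cpus;
	return ways > 0 ? ways : 1;
}

/* Monitoring parameters after splitting them across SNC nodes. */
struct mon_params {
	unsigned int num_rmid;
	unsigned int mon_scale;
};

/* Mirrors the num_rmid / mon_scale division in rdt_get_mon_l3_config(). */
static struct mon_params snc_adjust(unsigned int max_rmid,
				    unsigned int occ_scale, int snc_ways)
{
	struct mon_params p;

	assert(snc_ways > 0);	/* avoid dividing by zero */
	p.num_rmid = (max_rmid + 1) / (unsigned int)snc_ways;
	p.mon_scale = occ_scale / (unsigned int)snc_ways;
	return p;
}
```

Here is an example with illustrative values, not real hardware values. Suppose 16 CPUs share CPU 0's L3 cache and CPU 0's node holds 8 of them. Then snc_ways_from_masks(0xffff, 0x00ff) returns 2. Calling snc_adjust(255, 65536, 2) then gives 128 RMIDs per node and a scale of 32768.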
Signed-off-by: Tony Luck
---
 arch/x86/include/asm/msr-index.h       |  1 +
 arch/x86/kernel/cpu/resctrl/monitor.c  | 68 ++++++++++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 26 ++++++++++
 3 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 05956bd8bacf..b54c26016c93 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -1158,6 +1158,7 @@
 #define MSR_IA32_QM_CTR			0xc8e
 #define MSR_IA32_PQR_ASSOC		0xc8f
 #define MSR_IA32_L3_CBM_BASE		0xc90
+#define MSR_RMID_SNC_CONFIG		0xca0
 #define MSR_IA32_L2_CBM_BASE		0xd10
 #define MSR_IA32_MBA_THRTL_BASE		0xd50
 
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 1ba40d5f5d77..757d475158a3 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -17,6 +17,7 @@
 #include
 #include
+#include
 #include
@@ -1051,16 +1052,59 @@ static void l3_mon_evt_init(struct rdt_resource *r)
 		list_add_tail(&mbm_local_event.list, &r->evt_list);
 }
 
+/* CPU models that support MSR_RMID_SNC_CONFIG */
+static const struct x86_cpu_id snc_cpu_ids[] __initconst = {
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(EMERALDRAPIDS_X, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(GRANITERAPIDS_X, 0),
+	X86_MATCH_INTEL_FAM6_MODEL(ATOM_CRESTMONT_X, 0),
+	{}
+};
+
+static __init int snc_get_config(void)
+{
+	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(0);
+	cpumask_t *l3_cpumask = NULL;
+	const cpumask_t *node0_cpumask;
+	int i;
+
+	if (!x86_match_cpu(snc_cpu_ids))
+		return 1;
+
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == 3) {
+			if (ci->info_list[i].attributes & CACHE_ID) {
+				l3_cpumask = &ci->info_list[i].shared_cpu_map;
+				break;
+			}
+		}
+	}
+	if (!l3_cpumask) {
+		pr_info("can't get CPU0 L3 mask\n");
+		return 1;
+	}
+
+	node0_cpumask = cpumask_of_node(cpu_to_node(0));
+
+	return bitmap_weight(cpumask_bits(l3_cpumask), nr_cpu_ids) /
+	       bitmap_weight(cpumask_bits(node0_cpumask), nr_cpu_ids);
+}
+
 int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 {
 	unsigned int mbm_offset = boot_cpu_data.x86_cache_mbm_width_offset;
 	struct rdt_hw_resource *hw_res = resctrl_to_arch_res(r);
 	unsigned int threshold;
+	int snc_ways;
 	int ret;
 
+	snc_ways = snc_get_config();
+	if (snc_ways > 1)
+		pr_info("Sub-NUMA cluster detected with %d nodes per L3 cache\n", snc_ways);
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
-	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
-	r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_ways;
+	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_ways;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
@@ -1084,8 +1128,23 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	 */
 	resctrl_rmid_realloc_threshold = resctrl_arch_round_mon_val(threshold);
 
-	r->num_siblings = 1;
-	rdt_l3_mon_resource = r;
+	if (snc_ways > 1) {
+		struct rdt_hw_resource *shw_res;
+
+		rdt_l3_mon_resource = &rdt_resources_all[RDT_RESOURCE_SUBL3].r_resctrl;
+		rdt_l3_mon_resource->num_rmid = r->num_rmid;
+		rdt_l3_mon_resource->num_siblings = snc_ways;
+
+		shw_res = resctrl_to_arch_res(rdt_l3_mon_resource);
+		shw_res->mon_scale = hw_res->mon_scale;
+		shw_res->mbm_width = hw_res->mbm_width;
+
+		r->child = rdt_l3_mon_resource;
+		rdt_l3_mon_resource->parent = r;
+	} else {
+		r->num_siblings = 1;
+		rdt_l3_mon_resource = r;
+	}
 
 	ret = dom_data_init(rdt_l3_mon_resource);
 	if (ret)
@@ -1110,6 +1169,7 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 
 	l3_mon_evt_init(rdt_l3_mon_resource);
 
+	r->mon_capable = true;
 	rdt_l3_mon_resource->mon_capable = true;
 	rdt_l3_mon_resource->mon_enabled = true;
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 3e7f2e36b71e..b1f79fafa333 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -4067,6 +4067,29 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d)
 	return 0;
 }
 
+/*
+ * The power-on reset value of MSR_RMID_SNC_CONFIG is 0x1,
+ * which indicates that RMIDs are configured in legacy mode.
+ * This mode is incompatible with Linux resctrl semantics
+ * as RMIDs are partitioned between SNC nodes, which requires
+ * a user to know which RMID is allocated to a task.
+ * Clearing bit 0 reconfigures the RMID counters for use
+ * in Sub-NUMA Cluster mode. This mode is better for Linux.
+ * The RMID space is divided between all SNC nodes with the
+ * RMIDs renumbered to start from zero in each node when
+ * counting operations from tasks. Code to read the counters
+ * must adjust RMID counter numbers based on SNC node. See
+ * __rmid_read() for code that does this.
+ */
+static void snc_remap_rmids(void)
+{
+	u64 val;
+
+	rdmsrl(MSR_RMID_SNC_CONFIG, val);
+	val &= ~BIT_ULL(0);
+	wrmsrl(MSR_RMID_SNC_CONFIG, val);
+}
+
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 {
 	int err = 0;
@@ -4082,6 +4105,9 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 	if (!r->mon_enabled)
 		goto out_unlock;
 
+	if (r->num_siblings > 1)
+		snc_remap_rmids();
+
 	err = domain_setup_mon_state(r, d);
 	if (err)
 		goto out_unlock;
-- 
2.44.0
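The comment above snc_remap_rmids() describes two operations. The first is clearing bit 0 of MSR_RMID_SNC_CONFIG. The second is translating a per-node RMID back to a physical counter when reading it. The following is a hypothetical user-space sketch that models both as pure arithmetic. The translation formula, which adds (node % snc_ways) * num_rmid, is an assumption about what __rmid_read() does; this patch does not show that code. The helper names and the SNC_CONFIG_LEGACY_BIT macro are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 0 of MSR_RMID_SNC_CONFIG: 1 = legacy RMID mode (power-on reset value). */
#define SNC_CONFIG_LEGACY_BIT	(UINT64_C(1) << 0)

/* The value snc_remap_rmids() writes back: the old MSR value with bit 0 cleared. */
static uint64_t snc_config_remapped(uint64_t val)
{
	return val & ~SNC_CONFIG_LEGACY_BIT;
}

/*
 * Hypothetical translation from a per-node (logical) RMID to a physical
 * RMID counter. Assumes each SNC node owns a contiguous block of num_rmid
 * counters and that a node's index within its L3 is node_id % snc_ways.
 */
static unsigned int logical_to_physical_rmid(unsigned int rmid, int node_id,
					     int snc_ways, unsigned int num_rmid)
{
	assert(snc_ways > 0 && node_id >= 0 && rmid < num_rmid);
	return rmid + (unsigned int)(node_id % snc_ways) * num_rmid;
}
```

Take snc_ways = 2 with 128 RMIDs per node as an example. Logical RMID 5 read on node 1 (or node 3) maps to physical counter 133, while on node 0 it stays 5. Clearing only bit 0 preserves any other bits of the MSR, which mirrors the read-modify-write in snc_remap_rmids().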