From: Tony Luck
To: Fenghua Yu, Reinette Chatre, Peter Newman, Jonathan Corbet, Shuah Khan, x86@kernel.org
Cc: Shaopeng Tan, James Morse, Jamie Iles, Babu Moger, Randy Dunlap, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck
Subject: [PATCH v10 0/8] Add support for Sub-NUMA cluster (SNC) systems
Date: Tue, 31 Oct 2023 14:17:00 -0700
Message-ID: <20231031211708.37390-1-tony.luck@intel.com>
In-Reply-To: <20231020213100.123598-1-tony.luck@intel.com>
References: <20231020213100.123598-1-tony.luck@intel.com>

The Sub-NUMA cluster feature on some
Intel processors partitions the CPUs that share an L3 cache into two or
more sets. This plays havoc with the Resource Director Technology (RDT)
monitoring features. Prior to this patch, Intel has advised that SNC and
RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID counters in
the same way. This allows monitoring features to be used, with the caveat
that users must be aware that Linux may migrate tasks more frequently
between SNC nodes than between "regular" NUMA nodes, so reading counters
from all SNC nodes may be needed to get a complete picture of activity
for tasks.

Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.

Signed-off-by: Tony Luck
---

Dropped Peter's "Reviewed-by" from all but parts 5 & 8 since there have
been many changes since he provided those.

Other changes since v9 (all from Reinette's comments):

global	s/cpu/CPU/ in commit messages and code comments

#1	New test for invalid domain id before calling rdt_find_domain()
	means that error handling in that function and at all call-sites
	can be simplified.
	In pseudo_lock_region_init() use the new enum resctrl_scope for
	local variable.

#2	Include *all* common fields in the rdt_domain_hdr.
	Defer adding "type" until it is used later in part #3.

#3	Fix commit to be specific that only the RDT_RESOURCE_L3 resource
	is going to have different monitor and control scope.
	Rename get_domain_from_cpu() -> get_ctrl_domain_from_cpu()
	Rewrite comment for rdt_find_domains().
	Add "type" field to rdt_domain_hdr structure.
	Delete the /* RDT_RESOURCE_MBA is never mon_capable */ comment.

#4	Comment against patch 4, but now fixed in patch #2. cpu_mask is
	included in common header.

#5	No comments. No changes.

#6	Fixed missing word s/monitoring on Intel/monitoring on an Intel/
	Deleted "A later patch" paragraph.
	Expanded description of how values are "adjusted" for mon_scale
	and cache size.
	Changed type of "snc_nodes_per_l3_cache" to "unsigned int".

#7	Expand h/w to hardware (commit and code comments)
	Remove "earlier commit" reference
	s/counnter/counter/
	Check for offline CPUs and warn user SNC detection may be broken.

#8	No comments. No changes.

Tony Luck (8):
  x86/resctrl: Prepare for new domain scope
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for different scope for control/monitor operations
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Sub NUMA Cluster detection and enable
  x86/resctrl: Update documentation with Sub-NUMA cluster changes

 Documentation/arch/x86/resctrl.rst        |  23 +-
 include/linux/resctrl.h                   |  87 +++--
 arch/x86/include/asm/msr-index.h          |   1 +
 arch/x86/kernel/cpu/resctrl/internal.h    |  66 ++--
 arch/x86/kernel/cpu/resctrl/core.c        | 411 +++++++++++++++++-----
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  58 +--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  68 ++--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  26 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 149 ++++----
 9 files changed, 607 insertions(+), 282 deletions(-)

base-commit: 5a6a09e97199d6600d31383055f9d43fbbcbe86f
--
2.41.0
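
The caveat in the cover letter, that counters from all SNC nodes may need to be read to get a complete picture of a task's activity, can be illustrated with a small user-space sketch. This is not part of the patch series. The function name, the node labels, and the sample values are hypothetical, and the sketch assumes each SNC node exposes one counter as a line of text that is either a decimal number or a non-numeric status string.

```python
def total_across_snc_nodes(per_node_readings):
    """Sum one counter over every SNC node that shares an L3 cache.

    per_node_readings maps a node label (hypothetical naming) to the raw
    text read from that node's counter. Entries that are not plain decimal
    numbers are left out of the sum and returned so the caller can tell
    that the total is incomplete.
    """
    total = 0
    skipped = []
    for node, text in sorted(per_node_readings.items()):
        value = text.strip()
        if value.isdigit():
            total += int(value)
        else:
            skipped.append(node)
    return total, skipped


# Made-up sample values for a two-node SNC configuration.
readings = {"node0": "1048576\n", "node1": "524288\n"}
total, skipped = total_across_snc_nodes(readings)
print(total, skipped)
```

Returning the skipped nodes alongside the sum is a design choice for the sketch: a task that migrated between nodes during the sampling window would otherwise be silently under-counted if one node's value could not be read.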