Date: Fri, 21 Apr 2023 16:17:14 +0200
Message-ID: <20230421141723.2405942-1-peternewman@google.com>
Subject: [PATCH v1 0/9] x86/resctrl: Use soft RMIDs for reliable MBM on AMD
From: Peter Newman
To: Fenghua Yu, Reinette Chatre
Cc: Babu Moger, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, x86@kernel.org, "H.
 Peter Anvin", Stephane Eranian, James Morse,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 Peter Newman
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Reinette, Fenghua,

This series introduces a new mount option enabling an alternate mode for
MBM to work around an issue on present AMD implementations and any other
resctrl implementation where there are more RMIDs (or equivalent) than
hardware counters.

The L3 External Bandwidth Monitoring feature of the AMD PQoS
extension[1] only guarantees that RMIDs currently assigned to a
processor will be tracked by hardware. The counters of any other RMIDs
which are no longer being tracked will be reset to zero. The MBM event
counters return "Unavailable" to indicate when this has happened.

An interval for effectively measuring memory bandwidth typically needs
to be multiple seconds long. In Google's workloads, it is not feasible
to bound the number of jobs with different RMIDs which will run in a
cache domain over any period of time. Consequently, on a fully-committed
system where all RMIDs are allocated, few groups' counters return
non-zero values.

To demonstrate the underlying issue, the first patch provides a test
case in tools/testing/selftests/resctrl/test_rmids.sh.

On an AMD EPYC 7B12 64-Core Processor with the default behavior:

 # ./test_rmids.sh
 Created 255 monitoring groups.
 g1: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
 g2: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
 g3: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
 [..]
 g238: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
 g239: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
 g240: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
 g241: mbm_total_bytes: Unavailable -> 660497472
 g242: mbm_total_bytes: Unavailable -> 660793344
 g243: mbm_total_bytes: Unavailable -> 660477312
 g244: mbm_total_bytes: Unavailable -> 660495360
 g245: mbm_total_bytes: Unavailable -> 660775360
 g246: mbm_total_bytes: Unavailable -> 660645504
 g247: mbm_total_bytes: Unavailable -> 660696128
 g248: mbm_total_bytes: Unavailable -> 660605248
 g249: mbm_total_bytes: Unavailable -> 660681280
 g250: mbm_total_bytes: Unavailable -> 660834240
 g251: mbm_total_bytes: Unavailable -> 660440064
 g252: mbm_total_bytes: Unavailable -> 660501504
 g253: mbm_total_bytes: Unavailable -> 660590720
 g254: mbm_total_bytes: Unavailable -> 660548352
 g255: mbm_total_bytes: Unavailable -> 660607296
 255 groups, 0 returned counts in first pass, 15 in second
 successfully measured bandwidth from 15/255 groups

To compare, here is the output from an Intel(R) Xeon(R) Platinum 8173M
CPU:

 # ./test_rmids.sh
 Created 223 monitoring groups.
 g1: mbm_total_bytes: 0 -> 606126080
 g2: mbm_total_bytes: 0 -> 613236736
 g3: mbm_total_bytes: 0 -> 610254848
 [..]
 g221: mbm_total_bytes: 0 -> 584679424
 g222: mbm_total_bytes: 0 -> 588808192
 g223: mbm_total_bytes: 0 -> 587317248
 223 groups, 223 returned counts in first pass, 223 in second
 successfully measured bandwidth from 223/223 groups

To make better use of the hardware in such a use case, this patchset
introduces a "soft" RMID implementation, where each CPU is permanently
assigned a "hard" RMID. On context switches which change the current
soft RMID, the difference between each CPU's current event counts and
most recent counts is added to the totals for the current or outgoing
soft RMID.

This technique does not work for cache occupancy counters, so this patch
series disables cache occupancy events when soft RMIDs are enabled.
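
The context-switch accounting described above can be illustrated with a
small user-space simulation. This is only a sketch under stated
assumptions, not the code in this series: the array sizes, the simulated
hardware counter, and every name in it (hw_count, cpu_state, flush_cpu,
switch_soft_rmid, simulate_traffic, read_soft_rmid) are hypothetical and
chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS       4	/* illustrative sizes, not the kernel's */
#define NR_SOFT_RMIDS 8

/* Simulated hardware MBM counter for each CPU's fixed "hard" RMID. */
static uint64_t hw_count[NR_CPUS];

struct cpu_state {
	uint64_t prev_count;	/* hardware count seen at the last flush */
	unsigned int soft_rmid;	/* soft RMID currently running on this CPU */
};

static struct cpu_state cpus[NR_CPUS];
static uint64_t soft_total[NR_SOFT_RMIDS];

/* Credit the traffic seen since the last flush to the CPU's soft RMID. */
static void flush_cpu(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	uint64_t now = hw_count[cpu];

	soft_total[cpus[cpu].soft_rmid] += now - cpus[cpu].prev_count;
	cpus[cpu].prev_count = now;
}

/* Context switch: flush for the outgoing soft RMID, then install the next. */
static void switch_soft_rmid(unsigned int cpu, unsigned int next)
{
	assert(cpu < NR_CPUS && next < NR_SOFT_RMIDS);
	if (cpus[cpu].soft_rmid == next)
		return;		/* soft RMID unchanged, nothing to flush */
	flush_cpu(cpu);
	cpus[cpu].soft_rmid = next;
}

/* Stand-in for memory traffic advancing a CPU's hardware counter. */
static void simulate_traffic(unsigned int cpu, uint64_t bytes)
{
	assert(cpu < NR_CPUS);
	hw_count[cpu] += bytes;
}

/* Read a soft RMID's total, flushing all CPUs so pending deltas count. */
static uint64_t read_soft_rmid(unsigned int rmid)
{
	assert(rmid < NR_SOFT_RMIDS);
	for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
		flush_cpu(cpu);
	return soft_total[rmid];
}
```

In this model, every CPU starts out running soft RMID 0. Because each
CPU only ever reads the counter of its own hard RMID, the number of soft
RMIDs is bounded by the size of the totals array rather than by the
number of hardware counters.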

This series adds the "mbm_soft_rmid" mount option to allow users to opt
in to the functionality when they deem it helpful.

When the same system from the earlier AMD example enables the
mbm_soft_rmid mount option:

 # ./test_rmids.sh
 Created 255 monitoring groups.
 g1: mbm_total_bytes: 0 -> 686560576
 g2: mbm_total_bytes: 0 -> 668204416
 [..]
 g252: mbm_total_bytes: 0 -> 672651200
 g253: mbm_total_bytes: 0 -> 666956800
 g254: mbm_total_bytes: 0 -> 665917056
 g255: mbm_total_bytes: 0 -> 671049600
 255 groups, 255 returned counts in first pass, 255 in second
 successfully measured bandwidth from 255/255 groups

(patches are based on tip/master)

[1] https://www.amd.com/system/files/TechDocs/56375_1.03_PUB.pdf

Peter Newman (8):
  selftests/resctrl: Verify all RMIDs count together
  x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events
  x86/resctrl: Flush MBM event counts on soft RMID change
  x86/resctrl: Call mon_event_count() directly for soft RMIDs
  x86/resctrl: Create soft RMID version of __mon_event_count()
  x86/resctrl: Assign HW RMIDs to CPUs for soft RMID
  x86/resctrl: Use mbm_update() to push soft RMID counts
  x86/resctrl: Add mount option to enable soft RMID

Stephane Eranian (1):
  x86/resctrl: Hold a spinlock in __rmid_read() on AMD

 arch/x86/include/asm/resctrl.h                |  29 +++-
 arch/x86/kernel/cpu/resctrl/core.c            |  80 ++++++++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c     |   9 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  19 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c         | 158 +++++++++++++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        |  52 ++++++
 tools/testing/selftests/resctrl/test_rmids.sh |  93 +++++++++++
 7 files changed, 425 insertions(+), 15 deletions(-)
 create mode 100755 tools/testing/selftests/resctrl/test_rmids.sh

base-commit: dd806e2f030e57dd5bac973372aa252b6c175b73
-- 
2.40.0.634.g4ca3ef3211-goog