References: <20230421141723.2405942-1-peternewman@google.com>
 <20230421141723.2405942-4-peternewman@google.com>
 <38b9e6df-cccd-a745-da4a-1d1a0ec86ff3@intel.com>
 <31993ea8-97e5-b8d5-b344-48db212bc9cf@intel.com>
 <04c9eb5e-3395-05e6-f0cc-bc8f054a6031@intel.com>
 <101c0235-c354-43b1-afc2-1332bd8b453a@intel.com>
In-Reply-To: <101c0235-c354-43b1-afc2-1332bd8b453a@intel.com>
From: Peter Newman
Date: Wed, 6 Dec 2023 10:38:15 -0800
Subject: Re: [PATCH v1 3/9] x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events
To: Reinette Chatre
Cc: Fenghua Yu, Babu Moger, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Stephane Eranian, James Morse,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org

Hi Reinette,

On Tue, Dec 5, 2023 at 5:47 PM Reinette Chatre wrote:
>
> On 12/5/2023 4:33 PM, Peter Newman wrote:
> > On Tue, Dec 5, 2023 at 1:57 PM Reinette Chatre
> > wrote:
> >> On 12/1/2023 12:56 PM, Peter Newman wrote:
> > Ignoring any present-day resctrl interfaces, what we minimally need is...
> >
> > 1. global "start measurement", which enables a
> > read-counters-on-context switch flag, and broadcasts an IPI to all
> > CPUs to read their current count
> > 2. wait 5 seconds
> > 3. global "end measurement", to IPI all CPUs again for final counts
> > and clear the flag from step 1
> >
> > Then the user could read at their leisure all the (frozen) event
> > counts from memory until the next measurement begins.
> >
> > In our case, if we're measuring as often as 5 seconds for every
> > minute, that will already be a 12x aggregate reduction in overhead,
> > which would be worthwhile enough.
>
> The "con" here would be that during those 5 seconds (which I assume would be
> controlled via user space so potentially shorter or longer) all tasks in the
> system is expected to have significant (but yet to be measured) impact
> on context switch delay.

Yes, of course.
In the worst case I've measured, Zen2, it's roughly a 1700-cycle
context switch penalty (~20%) for tasks in different monitoring
groups. Bad, but the benefit we gain from the per-RMID MBM data makes
up for it several times over if we only pay the cost during a
measurement.

> I expect the overflow handler should only be run during the measurement
> timeframe, to not defeat the "at their leisure" reading of counters.

Yes, correct. We wouldn't be interested in overflows of the hardware
counter when not actively measuring bandwidth.

>
> >>> The second involves avoiding the situation where a hardware counter
> >>> could be deallocated: Determine the number of simultaneous RMIDs
> >>> supported, reduce the effective number of RMIDs available to that
> >>> number. Use the default RMID (0) for all "unassigned" monitoring
> >>
> >> hmmm ... so on the one side there is "only the RMID within the PQR
> >> register can be guaranteed to be tracked by hardware" and on the
> >> other side there is "A given implementation may have insufficient
> >> hardware to simultaneously track the bandwidth for all RMID values
> >> that the hardware supports."
> >>
> >> From the above there seems to be something in the middle where
> >> some subset of the RMID values supported by hardware can be used
> >> to simultaneously track bandwidth? How can it be determined
> >> what this number of RMID values is?
> >
> > In the context of AMD, we could use the smallest number of CPUs in any
> > L3 domain as a lower bound of the number of counters.
>
> Could you please elaborate on this? (With the numbers of CPUs nowadays this
> may be many RMIDs, perhaps even more than what ABMC supports.)

I think the "In the context of AMD" part is key. This feature would
only be applicable to the AMD implementations we have today which do
not implement ABMC. I believe the difficulties are unique to the
topologies of these systems: many small L3 domains per node with a
relatively small number of CPUs in each.
If the L3 domains were large and few, simply restricting the number of
RMIDs and allocating on group creation as we do today would probably
be fine.

> I am missing something here since it is not obvious to me how this lower
> bound is determined. Let's assume that there are as many monitor groups
> (and thus as many assigned RMIDs) as there are CPUs in a L3 domain.
> Each monitor group may have many tasks. It can be expected that at any
> moment in time only a subset of assigned RMIDs are assigned to CPUs
> via the CPUs' PQR registers. Of those RMIDs that are not assigned to
> CPUs, how can it be certain that they continue to be tracked by hardware?

Are you asking whether the counters will ever be reclaimed proactively?
The behavior I've observed is that writing a new RMID into a PQR_ASSOC
register when all hardware counters in the domain are allocated will
trigger the reallocation.

However, I admit the wording in the PQoS spec[1] is only written to
support the permanent-assignment workaround in the current patch
series:

"All RMIDs which are currently in use by one or more processors in the
QOS domain will be tracked. The hardware will always begin tracking a
new RMID value when it gets written to the PQR_ASSOC register of any of
the processors in the QOS domain and it is not already being tracked.
When the hardware begins tracking an RMID that it was not previously
tracking, it will clear the QM_CTR for all events in the new RMID."

I would need to confirm whether this is the case and request the
documentation be clarified if it is.

> >>>
> >>> While the second feature is a lot more disruptive at the filesystem
> >>> layer, it does eliminate the added context switch overhead. Also, it
> >>
> >> Which changes to filesystem layer are you anticipating?
> >
> > Roughly speaking...
> >
> > 1. The proposed "assign" interface would have to become more indirect
> > to avoid understanding how assign could be implemented on various
> > platforms.
>
> It is almost starting to sound like we could learn from the tracing
> interface where individual events can be enabled/disabled ... with several
> events potentially enabled with an "enable" done higher in hierarchy, perhaps
> even globally to support the first approach ...

Sorry, can you clarify the part about the tracing interface? Tracing to
support dynamic autoconfiguration of events?

Thanks!
-Peter

[1] AMD64 Technology Platform Quality of Service Extensions, Revision: 1.03:
https://bugzilla.kernel.org/attachment.cgi?id=301365