Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp10319192rwr; Fri, 12 May 2023 06:46:02 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ69WsuMklihcJXLkRg5BKCuho7pLIk8w6GK0yjTVJ7t7/+9czmAiBMadjh6+ZPhrZXOp1ey X-Received: by 2002:a17:90b:390a:b0:250:7322:3357 with SMTP id ob10-20020a17090b390a00b0025073223357mr22041107pjb.4.1683899162544; Fri, 12 May 2023 06:46:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683899162; cv=none; d=google.com; s=arc-20160816; b=PaEN5tn3maRyhjy3vHj35jRAWd9GwKIxEltSoZNYhPsv/JyOz3lEaoqxkCgNJURUgs VIeeilOFbZunPEv/gtSU9TooX1UUr0EHCz1JtgsBFTubEduyY2wZeQ7aNYadDaPw+OZ1 pjUTGJHc585K+pYtj3gRBtfiQKqTos4jWGuQ7DzKBaelLJXYwOu60KcgUyZF+k+X+51C UXvz+hrZKsjEojNGeQSSPEOZpTmnyjwfmLmZ75+h+DXfz4v5mM4rUE4Z695I9W3YllEa JA4uvV2teWMFXb85Q1U+NltXu4XsP8n98Eiuj+Bjjz4FIo77jf0oXDPFkgkqyGfS3f5E 0krQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=mbwZn6wbb6AWyMrRGtCopiudSvQ/zU2/r+76z+7YvMg=; b=LVpNnOBq7fbS+ZnNSkYVVPgWuhHzm7U/+LBUtHVkSvX8WD7IZrcrsg3GqSfs0XEWGN 7bXZpLueRA5RvEJNijnKLVRdr7JbO8yBmxEtHNXSxtIp2udqEF82tnV4nddqIbUOnQEv 25kvq5R358cinkMLs9V3pB09y6d1HV6pGoblBlC73PO1p0bcel+Fipvd7TOxxsGHLl3/ XIwT/I+oT0k0mz+aRJFstmUvvkoADSKmGSEv5Co5F2pNBEr4X6xnc/s+inQdma2QXeAT mHFKsdkFdG6gotCgwTW2c/YVJavT5ew85Cqhtekxfjc1+b2iuO9mv2vFP7LrOfwgXv45 ST5g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=eqxzVZCd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b11-20020a170902d50b00b001a1a0db7f5bsi10864631plg.335.2023.05.12.06.45.48; Fri, 12 May 2023 06:46:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=eqxzVZCd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240863AbjELNZv (ORCPT + 99 others); Fri, 12 May 2023 09:25:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46318 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240646AbjELNZu (ORCPT ); Fri, 12 May 2023 09:25:50 -0400 Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DA0EF199E for ; Fri, 12 May 2023 06:25:48 -0700 (PDT) Received: by mail-qt1-x82c.google.com with SMTP id d75a77b69052e-3ef34c49cb9so1010301cf.1 for ; Fri, 12 May 2023 06:25:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1683897948; x=1686489948; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=mbwZn6wbb6AWyMrRGtCopiudSvQ/zU2/r+76z+7YvMg=; b=eqxzVZCdywYF0MYeWxFkG9lVsVh+8ezrwXRSbPQqcbObWjSMtIbdxXAJaATeelqNO3 kmQsm63IkVz+TmGXAIweWz3t3YM+kJY/J6jA0pysT+mIGOgHZHOyIEdUBK39F0eTkrCL hFpPr3ZtfpZWqov9IUJgWSm9Bg2a2JrAspjbOtvSoKaWUE5z74jecRSviQ9DGqoBVjLh q+Scc1LFgpxP2+vhaQcIzd0m1TNNbILCqN/jb2kyfPPveTP4yAfd9S1V8jcS7zpY8qTL PAjM2A0Ypu2Oz2L+QOWXiIloSg/9Vum1GEaspiTa/S8rxrOxTRnDbbQmtHZKLBvxtMLS 8GAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683897948; x=1686489948; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mbwZn6wbb6AWyMrRGtCopiudSvQ/zU2/r+76z+7YvMg=; b=XUSi5TcLoaBTRoGuSmTSfrRJ5Pwga2WLkoEIeo+hKnDc/J4pB650bcXnt0nTpzRAu5 VPkdUhX5fk51Niqpi/IdT/y5+pcgoARWeWdCLwELgWCRwvVD6E1N+1jF+GMZxDMnl/AM Tszs7SOVJ4oN7m6tvBh7UurVrpscgVndh22/xjTfjdBWnpIHIBMfEbmrvzhrJ2jUjWju t85HVDYK3G+YzZ9tNmobhkGdSVbop2LDlPDqO+dK8hCdVEP6pZMPCfg417dmhyOgvrDA DWhKJ8I5TtzAUg123u1gtyLPIyWT5WwPPaUVFku1s9lRdYdiFKdowmn1rrRCuEumtlCY Gtog== X-Gm-Message-State: AC+VfDyIQgwSEZ3z+QnbWPIuNsK3Utcwti5Nf1SDWRkleiY36r5u7us8 JHBt70BkeGsvIjq1YnjOyEd+/S6gJqpWiBmuFmoa9w== X-Received: by 2002:a05:622a:c3:b0:3db:1c01:9d95 with SMTP id p3-20020a05622a00c300b003db1c019d95mr382182qtw.4.1683897947838; Fri, 12 May 2023 06:25:47 -0700 (PDT) MIME-Version: 1.0 References: <20230421141723.2405942-1-peternewman@google.com> <20230421141723.2405942-4-peternewman@google.com> <38b9e6df-cccd-a745-da4a-1d1a0ec86ff3@intel.com> In-Reply-To: <38b9e6df-cccd-a745-da4a-1d1a0ec86ff3@intel.com> From: Peter Newman Date: Fri, 12 May 2023 15:25:37 +0200 Message-ID: Subject: Re: [PATCH v1 3/9] x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events To: Reinette Chatre Cc: Fenghua Yu , Babu Moger , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Stephane Eranian , James Morse , linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Reinette, On Thu, May 11, 2023 at 11:37=E2=80=AFPM Reinette Chatre wrote: > On 4/21/2023 7:17 AM, Peter Newman wrote: > > Implement resctrl_mbm_flush_cpu(), which collects a domain's current MB= M > > event counts into its current software RMID. The delta for each CPU is > > determined by tracking the previous event counts in per-CPU data. The > > software byte counts reside in the arch-independent mbm_state > > structures. > > Could you elaborate why the arch-independent mbm_state was chosen? It largely had to do with how many soft RMIDs to implement. For our own needs, we were mainly concerned with getting back to the number of monitoring groups the hardware claimed to support, so there wasn't much internal motivation to support an unbounded number of soft RMIDs. However, breaking this artificial connection between supported HW and SW RMIDs to support arbitrarily-many monitoring groups could make the implementation conceptually cleaner. If you agree, I would be happy to give it a try in the next series. > > + /* cache occupancy events are disabled in this mode */ > > + WARN_ON(!is_mbm_event(evtid)); > > If this is hit it would trigger a lot, perhaps WARN_ON_ONCE? Ok > > > + > > + if (evtid =3D=3D QOS_L3_MBM_LOCAL_EVENT_ID) { > > + counter =3D &state->local; > > + } else { > > + WARN_ON(evtid !=3D QOS_L3_MBM_TOTAL_EVENT_ID); > > + counter =3D &state->total; > > + } > > + > > + /* > > + * Propagate the value read from the hw_rmid assigned to the curr= ent CPU > > + * into the "soft" rmid associated with the current task or CPU. > > + */ > > + m =3D get_mbm_state(d, soft_rmid, evtid); > > + if (!m) > > + return; > > + > > + if (resctrl_arch_rmid_read(r, d, hw_rmid, evtid, &val)) > > + return; > > + > > This all seems unsafe to run without protection. The code relies on > the rdt_domain but a CPU hotplug event could result in the domain > disappearing underneath this code. The accesses to the data structures > also appear unsafe to me. Note that resctrl_arch_rmid_read() updates > the architectural MBM state and this same state can be updated concurrent= ly > in other code paths without appropriate locking. The domain is supposed to always be the current one, but I see that even a get_domain_from_cpu(smp_processor_id(), ...) call needs to walk a resource's domain list to find a matching entry, which could be concurrently modified when other domains are added/removed. Similarly, when soft RMIDs are enabled, it should not be possible to call resctrl_arch_rmid_read() outside of on the current CPU's HW RMID. I'll need to confirm whether it's safe to access the current CPU's rdt_domain in an atomic context. If it isn't, I assume I would have to arrange all of the state used during flush to be per-CPU. I expect the constraints on what data can be safely accessed where is going to constrain how the state is ultimately arranged, so I will need to settle this before I can come back to the other questions about mbm_state. > > > + /* Count bandwidth after the first successful counter read. */ > > + if (counter->initialized) { > > + /* Assume that mbm_update() will prevent double-overflows= . */ > > + if (val !=3D counter->prev_bytes) > > + atomic64_add(val - counter->prev_bytes, > > + &m->soft_rmid_bytes); > > + } else { > > + counter->initialized =3D true; > > + } > > + > > + counter->prev_bytes =3D val; > > I notice a lot of similarities between the above and the software control= ler, > see mbm_bw_count(). Thanks for pointing this out, I'll take a look. > > > +} > > + > > +/* > > + * Called from context switch code __resctrl_sched_in() when the curre= nt soft > > + * RMID is changing or before reporting event counts to user space. > > + */ > > +void resctrl_mbm_flush_cpu(void) > > +{ > > + struct rdt_resource *r =3D &rdt_resources_all[RDT_RESOURCE_L3].r_= resctrl; > > + int cpu =3D smp_processor_id(); > > + struct rdt_domain *d; > > + > > + d =3D get_domain_from_cpu(cpu, r); > > + if (!d) > > + return; > > + > > + if (is_mbm_local_enabled()) > > + __mbm_flush(QOS_L3_MBM_LOCAL_EVENT_ID, r, d); > > + if (is_mbm_total_enabled()) > > + __mbm_flush(QOS_L3_MBM_TOTAL_EVENT_ID, r, d); > > +} > > This (potentially) adds two MSR writes and two MSR reads to what could po= ssibly > be quite slow MSRs if it was not designed to be used in context switch. D= o you > perhaps have data on how long these MSR reads/writes take on these system= s to get > an idea about the impact on context switch? I think this data should feat= ure > prominently in the changelog. I can probably use ftrace to determine the cost of an __rmid_read() call on a few implementations. To understand the overall impact to context switch, I can put together a scenario where I can control whether the context switches being measured result in change of soft RMID to prevent the data from being diluted by non-flushing switches. Thank you for looking over these changes! -Peter