From: Peter Newman
Date: Wed, 9 Nov 2022 10:50:38 +0100
Subject: Re: [RFD] resctrl: reassigning a running container's CTRL_MON group
To: Reinette Chatre
Cc: James Morse, Tony Luck, "Yu, Fenghua", "Eranian, Stephane",
    "linux-kernel@vger.kernel.org", Thomas Gleixner, Babu Moger,
    Gaurang Upasani
References: <81a7b4f6-fbb5-380e-532d-f2c1fc49b515@intel.com>
    <76bb4dc9-ab7c-4cb6-d1bf-26436c88c6e2@arm.com>
    <835d769b-3662-7be5-dcdd-804cb1f3999a@arm.com>
    <09029c7a-489a-7054-1ab5-01fa879fb42f@intel.com>

Hi Reinette,

On Tue, Nov 8, 2022 at 10:28 PM Reinette Chatre wrote:
> On 11/3/2022 10:06 AM, James Morse wrote:
> > That is true. MPAM has an additional headache here as it needs to allocate a monitor in
> > order to read the counters. If there are enough monitors for each CLOSID*RMID to have one,
> > then MPAM can export the counter files in the same way RDT does.
> >
> > While there are systems that have enough monitors, I don't think this is going to be the
> > norm. To allow systems that don't have a surfeit of monitors to use the counters, I plan
> > to export the values from resctrl_arch_rmid_read() via perf. (but only for bandwidth counters)
>
> This sounds related to the way monitoring was done in earlier kernels. This was
> long before I become involved with this work. Unfortunately I am not familiar with
> all the history involved that ended in it being removed from the kernel. Looks like
> this was around v4.6, here is a sample commit that may help point to what was done:

Sort of related: this is a problem we have to work around on AMD
implementations, and I will be sharing a patch for it soon. Note the
second paragraph at the top of page 13:

https://developer.amd.com/wp-content/resources/56375_1.00.pdf

AMD QoS often provides fewer counters than RMIDs, but the architecture
promises there will be at least as many counters in a QoS domain as
CPUs. Using this, we can permanently pin RMIDs to CPUs and read the
counters on every task switch to implement MBM RMIDs in software.

This has two caveats: evictions observed while one task is running
could have resulted from a previous task on the current CPU, but will
be counted against the new task's software-RMID; and CMT doesn't work.

I will propose making this available as a mount option for cloud
container use cases which need to monitor a large number of tasks on
B/W counter-poor systems, and of course don't need CMT.

> [...]
>
> > I think the solution to all this is:
> >  * Add rename support to move a monitor group between two control groups.
> >  ** On x86, this is guaranteed to preserve the RMID, so the destination counter continues
> >     unaffected.
> >  ** On arm64, the PARTID is also relevant to the monitors, so the old counters will
> >     continue to count.
>
> This looks like the solution to me also.
>
> The details of the arm64 support is not clear to me though. The destination
> group may not have enough PMG to host the new group so failures need to be
> handled. As you mention also, the old counters will continue to count.
> I assume that you mean the hardware will still have a record of the occupancy
> and that needs some time to dissipate? I assume this would fall under the
> limbo handling so in some scenarios (for example the just moved monitor
> group used the last PMG) it may take some time for the source control
> group to allow a new monitor group? The new counters will also not
> reflect the task's history.
>
> Moving an arm64 monitor group may thus have a few surprises for user
> space while sounding complex to support. Would adding all this additional
> support be worth it if the guidance to user space is to instead create many
> control groups in such a control-group-rich environment?
>
> > Whether this old counters keep counting needs exposing to user-space so that it is aware.
>
> Could you please elaborate? Do old counters not always keep counting?

Based on this, is it even worth it to allocate PMGs given that the
systems James has seen so far only have a single PMG bit? All this will
get us is the ability to create a single child mon_group in each control
group. This seems too limiting for the feature to be useful.

-Peter
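
For reference, the software-RMID idea described above (one counter
pinned to each CPU, read on every task switch) can be sketched as a toy
model. This is only an illustration in Python, not the patch: the names
SoftRmidMonitor, traffic() and context_switch() are made up here,
counter wraparound is ignored, and memory traffic is simulated rather
than read from hardware.

```python
from collections import defaultdict


class SoftRmidMonitor:
    """Toy model: one free-running bandwidth counter per CPU,
    with per-task totals accumulated in software."""

    def __init__(self, num_cpus):
        # Hypothetical stand-in for the hardware counter pinned to each CPU.
        self.hw_counter = [0] * num_cpus
        # Counter value sampled at the last task switch on each CPU.
        self.last_sample = [0] * num_cpus
        # Software RMID currently running on each CPU (None = idle).
        self.current = [None] * num_cpus
        # Bytes accumulated per software RMID.
        self.bytes_by_soft_rmid = defaultdict(int)

    def traffic(self, cpu, nbytes):
        """Simulate memory traffic seen by the counter pinned to `cpu`."""
        self.hw_counter[cpu] += nbytes

    def context_switch(self, cpu, next_soft_rmid):
        """Charge the delta since the last switch to the outgoing task."""
        now = self.hw_counter[cpu]
        prev = self.current[cpu]
        if prev is not None:
            self.bytes_by_soft_rmid[prev] += now - self.last_sample[cpu]
        self.last_sample[cpu] = now
        self.current[cpu] = next_soft_rmid

    def read(self, soft_rmid):
        """Total charged so far; excludes traffic since the last switch."""
        return self.bytes_by_soft_rmid[soft_rmid]


mon = SoftRmidMonitor(num_cpus=2)
mon.context_switch(0, "A")
mon.traffic(0, 100)
mon.context_switch(0, "B")
# Traffic here may include late evictions of A's lines; the model,
# like the scheme itself, still charges it to B.
mon.traffic(0, 40)
mon.context_switch(0, "A")
mon.traffic(0, 10)
mon.context_switch(0, None)
print(mon.read("A"), mon.read("B"))
```

The model makes the first caveat visible: whatever the counter records
between two switches is charged to the task that was running, no matter
which task's cache lines caused it.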