From: Peter Newman
Date: Wed, 12 Oct 2022 13:21:00 +0200
Subject: Re: [RFD] resctrl: reassigning a running container's CTRL_MON group
To: Reinette Chatre
Cc: Tony Luck, "Yu, Fenghua", "Eranian, Stephane", linux-kernel@vger.kernel.org, Thomas Gleixner, James Morse, Babu Moger, Gaurang Upasani
In-Reply-To: <81a7b4f6-fbb5-380e-532d-f2c1fc49b515@intel.com>
[Adding Gaurang to CC]

On Tue, Oct 11, 2022 at 1:35 AM Reinette Chatre wrote:
>
> On 10/7/2022 10:28 AM, Tony Luck wrote:
> > I don't know how complex it would be for the kernel to implement this. Or
> > whether it would meet Google's needs.
> >
> > How about moving monitor groups from one control group to another?
>
> Based on the initial description I got the impression that there is
> already a monitor group for every container. (Please correct me if I am
> wrong). If this is the case then it may be possible to create an interface
> that could move an entire monitor group to another control group. This would
> keep the benefit of usage counts remaining intact, tasks get a new closid, but
> keep their rmid. There would be no need for the user to specify process-ids.

Yes, Stephane also pointed out the importance of maintaining RMID
assignments, and I don't believe I put enough emphasis on it in my
original email. We need to maintain accurate memory bandwidth usage
counts for all containers, so it's important to be able to preserve an
RMID assignment and its event counts across a CoS downgrade.

The solutions Tony suggested do solve the races in moving the tasks, but
the container would need to temporarily join the default MON group in
the new CTRL_MON group before it could be moved to its replacement MON
group.

Being able to re-parent a MON group would allow us to change the CLOSID
independently of the RMID in a container, and would address the issue.
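To make the comparison concrete, here is a rough user-space sketch. The
group names are invented, the first half uses only the tasks-file
interface that exists today, and the final rename is purely hypothetical
shorthand for the kind of MON group re-parenting described above:

  # container1's tasks move from CTRL_MON group ctrl_a to ctrl_b (example names).

  # Today: create the destination MON group, then move each task by PID.
  # Every task passes through ctrl_b's default MON group first, the container
  # ends up counted against a different RMID (losing its accumulated event
  # counts), and any task forked after the 'cat' is missed entirely.
  mkdir /sys/fs/resctrl/ctrl_b/mon_groups/container1
  for pid in $(cat /sys/fs/resctrl/ctrl_a/mon_groups/container1/tasks); do
      echo "$pid" > /sys/fs/resctrl/ctrl_b/tasks
      echo "$pid" > /sys/fs/resctrl/ctrl_b/mon_groups/container1/tasks
  done

  # Hypothetical re-parenting operation (no such interface exists today):
  # move the whole MON group so its tasks keep their RMID and event counts,
  # and only the CLOSID changes.
  mv /sys/fs/resctrl/ctrl_a/mon_groups/container1 \
     /sys/fs/resctrl/ctrl_b/mon_groups/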
The only other point I can think of to differentiate it from the
automatic CLOSID management solution is whether the 1:1 CTRL_MON:CLOSID
approach will become too limiting going forward. For example, there may
be configurations where one resource has far fewer CLOSIDs than the
others, and we would want to start assigning CLOSIDs on demand, per
resource, to avoid wasting the other resources' available CLOSID space.
If we can foresee this becoming a concern, then automatic CLOSID
management would be inevitable.

-Peter

On Tue, Oct 11, 2022 at 1:35 AM Reinette Chatre wrote:
>
> On 10/7/2022 10:28 AM, Tony Luck wrote:
> > On Fri, Oct 07, 2022 at 08:44:53AM -0700, Yu, Fenghua wrote:
> >> Hi, Peter,
> >>
> >>> On 10/7/2022 3:39 AM, Peter Newman wrote:
> >
> >>>> The CLOSID management rules would roughly be:
> >>>>
> >>>> 1. If an update would cause a CTRL_MON group's config to match that of
> >>>>    an existing group, the CTRL_MON group's CLOSID should change to that
> >>>>    of the existing group, where the definition of "match" is: all
> >>>>    control values match in all domains for all resources, as well as
> >>>>    the cpu masks matching.
> >
> > So the micro steps are:
> >
> > # mkdir newgroup
> > # New groups are created with maximum resources. So this might
> > # match the root/default group (if the root schemata had not
> > # been edited) ... so you could re-use CLOSID=0 for this, or
> > # perhaps allocate a new CLOSID
> > # edit newgroup/schemata
> > # if this update makes this schemata match some other group,
> > # then update the CLOSID for this group to be same as the other
> > # group.
> >>>>
> >>>> 2. If an update to a CTRL_MON group sharing a CLOSID with another group
> >>>>    causes that group to no longer match any others, a new CLOSID must
> >>>>    be allocated.
> > # So you have reference counts for CLOSIDs for how many groups
> > # share it. In the above example the change to the schemata and
> > # allocation of a new CLOSID would decrement the reference count
> > # and free the old CLOSID if it goes to zero
> >>>>
> >>>> 3. An update to a CTRL_MON group using a non-shared CLOSID which
> >>>>    continues to not match any others follows the current resctrl
> >>>>    behavior.
> > # An update to a CTRL_MON group that has a CLOSID reference
> > # count > 1 would try to allocate a new CLOSID if the new
> > # schemata doesn't match any other group. If all CLOSIDs are
> > # already in use, the write(2) to the schemata file must fail
> > # ... maybe -ENOSPC is the right error code?
> >
> > Note that if the root/default CTRL_MON group had been edited you might not
> > be able to create a new group (even though you intend to make it match some
> > existing group and share a CLOSID). Perhaps we could change the existing
> > semantics so that new groups copy the root group schemata instead of
> > being maximally permissive with all resources?
> >>>>
> >>>> Before I prepare any patches for review, I'm interested in any
> >>>> comments or suggestions on the use case and solution.
> >>>>
> >>>> Are there simpler strategies for reassigning a running container's
> >>>> tasks to a different CTRL_MON group that we should be considering first?
> >
> > Do tasks in a container share a "process group"? If they do, then a
> > simpler option would be some syntax to assign a group to a resctrl group
> > (perhaps as a negative task-id? or with a "G" prefix??).
> >
> > Or is there some other simple way to enumerate all the tasks in a
> > container with some syntax that is convenient for both the user and the
> > kernel? If there is, then add code to allow something like:
> > # echo C{containername} > tasks
> > and have the resctrl code move all tasks en masse.
> >
> > Yet another option would be syntax to apply the move recursively to all
> > descendants of the given task id.
> >
> > # echo R{process-id} > tasks
> >
> > I don't know how complex it would be for the kernel to implement this. Or
> > whether it would meet Google's needs.
> >
> > How about moving monitor groups from one control group to another?
>
> Based on the initial description I got the impression that there is
> already a monitor group for every container. (Please correct me if I am
> wrong). If this is the case then it may be possible to create an interface
> that could move an entire monitor group to another control group. This would
> keep the benefit of usage counts remaining intact, tasks get a new closid, but
> keep their rmid. There would be no need for the user to specify process-ids.
>
> Reinette
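A minimal shell walkthrough of the CLOSID-sharing rules quoted above,
assuming an L3 CAT resource with a single cache domain. The semantics
are hypothetical: in current resctrl every CTRL_MON group keeps its own
CLOSID regardless of schemata contents, and -ENOSPC is only the error
code suggested in the discussion:

  cd /sys/fs/resctrl
  mkdir g1 g2                     # new groups start with maximum resources, so
                                  # they could initially share the default
                                  # group's CLOSID
  echo "L3:0=0f" > g1/schemata    # g1 no longer matches any group -> allocate
                                  # a fresh CLOSID for it
  echo "L3:0=0f" > g2/schemata    # g2 now matches g1 in every domain of every
                                  # resource (and the cpu masks match) -> g2
                                  # shares g1's CLOSID, refcount goes to 2
  echo "L3:0=03" > g2/schemata    # g2 diverges again -> drop the shared
                                  # CLOSID's refcount and allocate a new one;
                                  # if no CLOSID is free, the write fails
                                  # (perhaps with -ENOSPC)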