Date: Mon, 5 Jun 2023 10:27:33 -1000
From: Tejun Heo
To: Waiman Long
Cc: Michal Koutný, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli,
    Valentin Schneider, Frederic Weisbecker, Mrunal Patel, Ryan Phillips,
    Brent Rowsell, Peter Hunt, Phil Auld
Subject: Re: [RFC PATCH 0/5] cgroup/cpuset: A new "isolcpus" paritition
References: <759603dd-7538-54ad-e63d-bb827b618ae3@redhat.com>
 <405b2805-538c-790b-5bf8-e90d3660f116@redhat.com>
 <18793f4a-fd39-2e71-0b77-856afb01547b@redhat.com>

Hello,

On Mon, Jun 05, 2023 at 04:00:39PM -0400, Waiman Long wrote:
...
> > file seems hacky to me. e.g. How would it interact with namespacing? Are
> > there reasons why this can't be properly hierarchical other than the
> > amount of work needed? For example:
> >
> > cpuset.cpus.exclusive is a per-cgroup file and represents the mask of
> > CPUs that the cgroup holds exclusively. The mask is always a subset of
> > cpuset.cpus. The parent loses access to a CPU when the CPU is given to a
> > child by setting the CPU in the child's cpus.exclusive, and the CPU
> > can't be given to more than one child. IOW, exclusive CPUs are available
> > only to the leaf cgroups that have them set in their .exclusive file.
> >
> > When a cgroup is turned into a partition, its cpuset.cpus and
> > cpuset.cpus.exclusive should be the same. For backward compatibility, if
> > the cgroup's parent is already a partition, cpuset will automatically
> > attempt to add all cpus in cpuset.cpus into cpuset.cpus.exclusive.
> >
> > I could well be missing something important but I'd really like to see
> > something like the above where the reservation feature blends in with
> > the rest of cpuset.
>
> It can certainly be made hierarchical as you suggest. It does increase
> complexity from both the user and kernel points of view.
>
> From the user point of view, there is one more knob to manage
> hierarchically which is not used that often.

From the user pov, this only affects them when they want to create
partitions down the tree, right?

> From the kernel point of view, we may need to have one more cpumask per
> cpuset, as the current subparts_cpus is used to track automatic
> reservation. We need another cpumask to contain extra exclusive CPUs not
> allocated through automatic reservation. You mention this new control file
> as a list of exclusively owned CPUs for this cgroup. Creating a partition
> is in fact allocating exclusive CPUs to a cgroup, so it kind of overlaps
> with the cpuset.cpus.partition file. Can we fail a write to
> cpuset.cpus.exclusive if those exclusive CPUs cannot be granted, or is
> this exclusive list only valid if a valid partition can be formed? So we
> need to properly manage the dependency between these two control files.

Yes, it substitutes and expands on cpuset.cpus.partition behavior. So, I
think cpus.exclusive can become the sole mechanism to arbitrate exclusive
ownership of CPUs and .partition can depend on .exclusive.
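As an illustration of the semantics sketched above (purely hypothetical:
the cgroup names and CPU numbers are made up, and cpuset.cpus.exclusive is
the file being proposed in this thread, not an existing interface):

  # Assumes cgroup2 mounted at /sys/fs/cgroup with the cpuset controller
  # available.
  cd /sys/fs/cgroup
  mkdir -p workload/rt
  echo "+cpuset" > cgroup.subtree_control
  echo "+cpuset" > workload/cgroup.subtree_control

  echo 0-7 > workload/cpuset.cpus
  echo 6-7 > workload/rt/cpuset.cpus

  # Hand CPUs 6-7 exclusively to workload/rt. Per the proposal, the parent
  # (and thus any sibling) loses access to them, and no other child may
  # claim them in its own cpuset.cpus.exclusive.
  echo 6-7 > workload/rt/cpuset.cpus.exclusive

  # With cpuset.cpus equal to cpuset.cpus.exclusive, turning the cgroup
  # into a partition would then just be:
  echo root > workload/rt/cpuset.cpus.partition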
> Alternatively, I have no problem exposing cpuset.cpus.exclusive as a
> read-only file. It is a bit problematic if we need to make it writable.

I don't follow. How would remote partitions work then?

> As for namespacing, you do raise a good point. I was thinking mostly from
> a whole-system point of view, as the use case that I am aware of does not
> need that. To allow delegation of exclusive CPUs to a child cgroup, that
> cgroup has to be a partition root itself. One compromise that I can think
> of is to allow automatic reservation only in such a scenario. In that
> case, I need to support a remote load-balanced partition as well as
> hierarchical sub-partitions underneath it. That can be done with some
> extra code on top of the existing v2 patchset without introducing too much
> complexity.
>
> IOW, the use of a remote partition is only allowed at the whole-system
> level where one has access to the cgroup root. Exclusive CPU distribution
> within a container can only be done via adjacent partitions with automatic
> reservation. Will that be a good enough compromise from your point of
> view?

It seems too twisted to me. I'd much prefer it to be better integrated with
the rest of cpuset.

Thanks.

--
tejun