Message-ID: <46d26abf-a725-b924-47fa-4419b20bbc02@redhat.com>
Date: Fri, 14 Apr 2023 15:06:27 -0400
Subject: Re: [RFC PATCH 0/5] cgroup/cpuset: A new "isolcpus" paritition
From: Waiman Long
To: Tejun Heo
Cc: Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 Juri Lelli, Valentin Schneider, Frederic Weisbecker
References: <1b8d9128-d076-7d37-767d-11d6af314662@redhat.com>
 <9862da55-5f41-24c3-f3bb-4045ccf24b2e@redhat.com>
 <226cb2da-e800-6531-4e57-cbf991022477@redhat.com>
 <60ec12dc-943c-b8f0-8b6f-97c5d332144c@redhat.com>
In-Reply-To: <60ec12dc-943c-b8f0-8b6f-97c5d332144c@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 4/14/23 13:38, Waiman Long wrote:
>
> On 4/14/23 13:34, Tejun Heo wrote:
>> On Fri, Apr 14, 2023 at 01:29:25PM -0400, Waiman Long wrote:
>>> On 4/14/23 12:54, Tejun Heo wrote:
>>>> On Thu, Apr 13, 2023 at 09:22:19PM -0400, Waiman Long wrote:
>>>>> I now have a slightly different idea of how to do that. We already have an
>>>>> internal cpumask for partitioning - subparts_cpus. I am thinking about
>>>>> exposing it as cpuset.cpus.reserve. The current way of creating
>>>>> subpartitions will be called automatic reservation and require a direct
>>>>> parent/child partition relationship. But as soon as a user write anything to
>>>>> it, it will break automatic reservation and require manual reservation going
>>>>> forward.
>>>>>
>>>>> In that way, we can keep the old behavior, but also support new use cases. I
>>>>> am going to work on that.
>>>> I'm not sure I fully understand the proposed behavior but it does sound more
>>>> quirky.
>>> The idea is to use the existing subparts_cpus for cpu reservation instead of
>>> adding a new cpumask for that purpose. The current way of partition creation
>>> does cpus reservation (setting subparts_cpus) automatically with the
>>> constraint that the parent of a partition must be a partition root itself.
>>> One way to relax this constraint is to allow a new manual reservation mode
>>> where users can set reserve cpus manually and distribute them down the
>>> hierarchy before activating a partition to use those cpus.
>>>
>>> Now the question is how to enable this new manual reservation mode. One way
>>> to do it is to enable it whenever the new cpuset.cpus.reserve file is
>>> modified. Alternatively, we may enable it by a cgroupfs mount option or a
>>> boot command line option.
>> It'd probably be best if we can keep the behavior within cgroupfs if
>> possible. Would you mind writing up the documentation section describing the
>> behavior beforehand?
>> I think things would be clearer if we look at it from
>> the interface documentation side.
>
> Sure, will do that. I need some time and so it will be early next week.

Just kidding :-)

Below is a draft of the new cpuset.cpus.reserve cgroupfs file:

  cpuset.cpus.reserve
        A read-write multiple values file which exists on all
        cpuset-enabled cgroups.

        It lists the reserved CPUs to be used for the creation of
        child partitions.  See the section on "cpuset.cpus.partition"
        below for more information on cpuset partitions.  These reserved
        CPUs should be a subset of "cpuset.cpus" and will be mutually
        exclusive with "cpuset.cpus.effective" when used, since these
        reserved CPUs cannot be used by tasks in the current cgroup.

        There are two modes for partition CPU reservation -
        auto or manual.  The system starts up in auto mode where
        "cpuset.cpus.reserve" will be set automatically when valid
        child partitions are created and users don't need to touch the
        file at all.  This mode has the limitation that the parent of a
        partition must be a partition root itself.  So child partitions
        have to be created one by one from the cgroup root down.

        To enable the creation of a partition down in the hierarchy
        without requiring the intermediate cgroups to be partition
        roots, one has to turn on the manual reservation mode by
        writing directly to "cpuset.cpus.reserve" with a value
        different from its current value.  By distributing the reserved
        CPUs down the cgroup hierarchy to the parent of the target
        cgroup, this target cgroup can be switched to become a
        partition root if its "cpuset.cpus" is a subset of the set of
        valid reserved CPUs in its parent.
        The set of valid reserved CPUs is the set of CPUs that are
        present in all its ancestors' "cpuset.cpus.reserve" up to
        cgroup root and which have not been allocated to another
        valid partition yet.

        Once manual reservation mode is enabled, a cgroup administrator
        must always set up "cpuset.cpus.reserve" files properly before
        a valid partition can be created.  So this mode has more
        administrative overhead but offers greater flexibility.

Cheers,
Longman
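
To make the manual-mode rule in the draft easier to check, here is a minimal Python sketch of it. It is only a toy model of the wording above, not kernel code and not the proposed implementation. The names Cgroup, valid_reserve_cpus and can_become_partition_root are made up for this illustration, CPUs already handed to other valid partitions are tracked in a hypothetical "allocated" set, and rejecting an empty "cpuset.cpus" is an assumption the draft does not state.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Cgroup:
    """Toy model of a cpuset-enabled cgroup (hypothetical, not a kernel struct)."""
    name: str
    cpus: Set[int]                                   # models "cpuset.cpus"
    reserve: Set[int] = field(default_factory=set)   # models "cpuset.cpus.reserve"
    parent: Optional["Cgroup"] = None


def valid_reserve_cpus(parent: Cgroup, allocated: Set[int]) -> Set[int]:
    """CPUs present in the reserve set of `parent` and of every ancestor up
    to the root, minus CPUs already allocated to another valid partition."""
    valid = set(parent.reserve)
    node = parent.parent
    while node is not None:
        valid &= node.reserve
        node = node.parent
    return valid - allocated


def can_become_partition_root(target: Cgroup, allocated: Set[int]) -> bool:
    """Manual-mode check from the draft: the target's "cpuset.cpus" must be
    a subset of the valid reserved CPUs in its parent."""
    if target.parent is None:
        return False  # the cgroup root is outside the scope of this sketch
    if not target.cpus:
        return False  # assumption: an empty cpuset.cpus never qualifies
    return target.cpus <= valid_reserve_cpus(target.parent, allocated)


# Reserved CPUs are distributed from the root down to the target's parent.
root = Cgroup("root", cpus=set(range(8)), reserve={4, 5, 6, 7})
mid = Cgroup("mid", cpus=set(range(8)), reserve={4, 5, 6}, parent=root)
leaf_a = Cgroup("leaf_a", cpus={4, 5}, parent=mid)
leaf_b = Cgroup("leaf_b", cpus={6, 7}, parent=mid)

allocated: Set[int] = set()
print(can_become_partition_root(leaf_a, allocated))  # CPUs 4-5 are reserved all the way down
print(can_become_partition_root(leaf_b, allocated))  # CPU 7 is missing from mid's reserve
```

In this model "mid" is not a partition root itself, which is the case the auto mode cannot handle. Once leaf_a takes CPUs 4-5, adding them to "allocated" leaves only CPU 6 as a valid reserved CPU under "mid".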