Date: Wed, 3 May 2023 23:01:36 -0400
From: Waiman Long
To: Michal Koutný
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, Juri Lelli, Valentin Schneider, Frederic Weisbecker
Subject: Re: [RFC PATCH 0/5] cgroup/cpuset: A new "isolcpus" paritition

On 5/2/23 18:27, Michal Koutný wrote:
> On Tue, May 02, 2023 at 05:26:17PM -0400, Waiman Long wrote:
>> In the new scheme, the available cpus are still directly passed down to a
>> descendant cgroup. However, isolated CPUs (or more generally CPUs dedicated
>> to a partition) have to be exclusive. So what the cpuset.cpus.reserve does
>> is to identify those exclusive CPUs that can be excluded from the
>> effective_cpus of the parent cgroups before they are claimed by a child
>> partition. Currently this is done automatically when a child partition is
>> created off a parent partition root. The new scheme will break it into 2
>> separate steps without the requirement that the parent of a partition has to
>> be a partition root itself.
> new scheme
> 1st step:
> echo C >p/cpuset.cpus.reserve
> # p/cpuset.cpus.effective == A-C (1)
> 2nd step (claim):
> echo C' >p/c/cpuset.cpus # C'⊆C
> echo root >p/c/cpuset.cpus.partition

It is something like that. However, the current automatic reservation scheme
is still supported: as long as the cpuset.cpus.reserve file has not been
written to, cpuset.cpus.reserve is set automatically when the child cgroup
becomes a valid partition. This is for backward compatibility. Once the file
is written to, automatic mode ends and users have to set it manually from
then on.

>
> current scheme
> 1st step (configure):
> echo C >p/c/cpuset.cpus
> 2nd step (reserve & claim):
> echo root >p/c/cpuset.cpus.partition
> # p/cpuset.cpus.effective == A-C (2)
>
> As long as p/c is unpopulated, (1) and (2) are equal situations.
> Why is the (different) two step procedure needed?
>
> Also the relaxation of requirement of a parent being a partition
> confuses me -- if the parent is not a partition, i.e. it has no
> exclusive ownership of CPUs but it can still "give" it to children -- is
> child partition meant to be exclusive? (IOW can parent siblings reserve
> some same CPUs?)

A valid partition root has exclusive ownership of its CPUs. That rule won't
change. As a result, an incoming partition root cannot claim CPUs that have
already been allocated to another partition. To keep things simple, a cgroup
cannot become a valid partition root if any of the CPUs in its cpuset.cpus
are not in the cpuset.cpus.reserve of its ancestors or have already been
allocated to another partition; the partition root simply becomes invalid
instead. The reserved CPUs are virtually passed down the hierarchy from the
root, and a child can claim them once it becomes a partition root. In manual
mode, we need to check all the way up the hierarchy to the root to figure out
which CPUs in cpuset.cpus.reserve are valid. That has higher overhead, but
enabling a partition is not a fast operation anyway.

Cheers,
Longman
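The rules discussed in this thread (reserve first, then claim; automatic reservation until the reserve file is written; exclusive ownership; the walk up the hierarchy in manual mode) can be illustrated with a small toy model. This is a hedged Python sketch, not kernel code and not the RFC patch's implementation: the class `Cpuset` and the helpers `valid_reserve()` and `make_partition()` are hypothetical names. It assumes that in manual mode a reserved CPU is valid only if every ancestor up to the top-level cgroup also lists it in its reserve, and that automatic mode still requires the parent to be a partition root, as in the current scheme.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cpuset:
    """One cgroup in a toy cpuset hierarchy (hypothetical model)."""
    name: str
    cpus: Set[int]                      # stands in for cpuset.cpus
    parent: Optional["Cpuset"] = None
    reserve: Set[int] = field(default_factory=set)  # cpuset.cpus.reserve
    manual: bool = False                # set once the reserve file is written
    is_partition: bool = False          # currently a valid partition root?

    def write_reserve(self, cpus: Set[int]) -> None:
        # Writing cpuset.cpus.reserve ends automatic mode for this cgroup.
        self.reserve = set(cpus)
        self.manual = True

    def effective_cpus(self) -> Set[int]:
        # Reserved CPUs are excluded from this cgroup's effective CPUs.
        return self.cpus - self.reserve


def valid_reserve(cg: Cpuset) -> Set[int]:
    """Manual mode (assumed rule): a reserved CPU is usable only if every
    ancestor up to the top of the hierarchy reserves it too."""
    valid = set(cg.reserve)
    node = cg.parent
    while node is not None:
        valid &= node.reserve
        node = node.parent
    return valid


def make_partition(cg: Cpuset, partitions: List[Cpuset]) -> bool:
    """Model of "echo root > cg/cpuset.cpus.partition".

    `partitions` holds the currently valid non-top-level partitions. If the
    request breaks a rule, `cg` is simply left as an invalid partition.
    """
    parent = cg.parent
    if parent is None:
        raise ValueError("the top-level cgroup is always a partition root")
    # CPUs already owned exclusively by some other partition.
    taken: Set[int] = set().union(*(p.cpus for p in partitions if p is not cg))
    if parent.manual:
        ok = cg.cpus <= valid_reserve(parent) and not (cg.cpus & taken)
    else:
        # Automatic (backward-compatible) mode: the parent must itself be a
        # partition root, and the reservation happens as a side effect.
        ok = (parent.is_partition and cg.cpus <= parent.cpus
              and not (cg.cpus & taken))
        if ok:
            parent.reserve |= cg.cpus
    cg.is_partition = ok
    if ok:
        partitions.append(cg)
    return ok


if __name__ == "__main__":
    # New scheme: reserve first, even though p is not a partition root ...
    root = Cpuset("root", {0, 1, 2, 3}, is_partition=True)
    p = Cpuset("p", {0, 1, 2, 3}, parent=root)
    root.write_reserve({2, 3})
    p.write_reserve({2, 3})
    parts: List[Cpuset] = []
    # ... then claim: c gets CPU 2, d cannot take CPU 2 a second time.
    print(sorted(p.effective_cpus()))                            # [0, 1]
    print(make_partition(Cpuset("c", {2}, parent=p), parts))     # True
    print(make_partition(Cpuset("d", {2, 3}, parent=p), parts))  # False

    # Current scheme: automatic reservation off a partition-root parent.
    r2 = Cpuset("r2", {0, 1, 2, 3}, is_partition=True)
    print(make_partition(Cpuset("a", {3}, parent=r2), []))       # True
    print(sorted(r2.effective_cpus()))                           # [0, 1, 2]
```

In this model, the loop in `valid_reserve()` is linear in the depth of the hierarchy, which corresponds to the extra overhead mentioned for manual mode. It deliberately ignores CPU hotplug, propagation of effective CPUs to deeper descendants, and other details of the real cpuset code.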