Message-ID: <66910f2d-a8c0-6c04-cca0-62a00fbad6cf@redhat.com>
Date: Tue, 3 May 2022 14:39:23 -0400
Subject: Re: [PATCH v10 3/8] cgroup/cpuset: Allow no-task partition to have empty cpuset.cpus.effective
From: Waiman Long
To: Phil Auld
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Shuah Khan,
 cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 Andrew Morton, Roman Gushchin, Peter Zijlstra, Juri Lelli,
 Frederic Weisbecker, Marcelo Tosatti, Michal Koutný
References: <20220503162149.1764245-1-longman@redhat.com>
 <20220503162149.1764245-4-longman@redhat.com>
 <20220503175454.GA20433@pauld.bos.csb>
In-Reply-To: <20220503175454.GA20433@pauld.bos.csb>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/3/22 13:54, Phil Auld wrote:
> Hi Waiman
>
> On Tue, May 03, 2022 at 12:21:44PM -0400 Waiman Long wrote:
>> Currently, a partition root cannot have empty "cpuset.cpus.effective".
>> As a result, a parent partition root cannot distribute out all its CPUs
>> to child partitions with no CPUs left. However in most cases, there
>> shouldn't be any tasks associated with intermediate nodes of the default
>> hierarchy. So the current rule is too restrictive and can waste valuable
>> CPU resource.
>>
>> To address this issue, we are now allowing a partition to have empty
>> "cpuset.cpus.effective" as long as it has no task. Therefore, a parent
>> partition with no task can now have all its CPUs distributed out to its
>> child partitions. The top cpuset always have some house-keeping tasks
>> running and so its list of effective cpu can't never be empty.
> s/never/ever/

It is a double negative. I think I will just remove "never".

>> Once a partition with empty "cpuset.cpus.effective" is formed, no
>> new task can be moved into it until "cpuset.cpus.effective" becomes
>> non-empty.
>>
>> Signed-off-by: Waiman Long
>> ---
>>  kernel/cgroup/cpuset.c | 111 +++++++++++++++++++++++++++++++----------
>>  1 file changed, 84 insertions(+), 27 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index d156a39d7a08..7d9abd50a1b9 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -412,6 +412,41 @@ static inline bool is_in_v2_mode(void)
>>  		(cpuset_cgrp_subsys.root->flags & CGRP_ROOT_CPUSET_V2_MODE);
>>  }
>>
>> +/**
>> + * partition_is_populated - check if partition has tasks
>> + * @cs: partition root to be checked
>> + * @excluded_child: a child cpuset to be excluded in task checking
>> + * Return: true if there are tasks, false otherwise
>> + *
>> + * It is assumed that @cs is a valid partition root. @excluded_child should
>> + * be non-NULL when this cpuset is going to become a partition itself.
>> + */
>> +static inline bool partition_is_populated(struct cpuset *cs,
>> +					  struct cpuset *excluded_child)
>> +{
>> +	struct cgroup_subsys_state *css;
>> +	struct cpuset *child;
>> +
>> +	if (cs->css.cgroup->nr_populated_csets)
>> +		return true;
>> +	if (!excluded_child && !cs->nr_subparts_cpus)
>> +		return cgroup_is_populated(cs->css.cgroup);
>> +
>> +	rcu_read_lock();
>> +	cpuset_for_each_child(child, css, cs) {
>> +		if (child == excluded_child)
>> +			continue;
>> +		if (is_partition_valid(child))
>> +			continue;
>> +		if (cgroup_is_populated(child->css.cgroup)) {
>> +			rcu_read_unlock();
>> +			return true;
>> +		}
>> +	}
>> +	rcu_read_unlock();
>> +	return false;
>> +}
>> +
>>  /*
>>   * Return in pmask the portion of a task's cpusets's cpus_allowed that
>>   * are online and are capable of running the task. If none are found,
>> @@ -1252,22 +1287,25 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>>  	if ((cmd != partcmd_update) && css_has_online_children(&cs->css))
>>  		return -EBUSY;
>>
>> -	/*
>> -	 * Enabling partition root is not allowed if not all the CPUs
>> -	 * can be granted from parent's effective_cpus or at least one
>> -	 * CPU will be left after that.
>> -	 */
>> -	if ((cmd == partcmd_enable) &&
>> -	   (!cpumask_subset(cs->cpus_allowed, parent->effective_cpus) ||
>> -	     cpumask_equal(cs->cpus_allowed, parent->effective_cpus)))
>> -		return -EINVAL;
>> -
>> -	/*
>> -	 * A cpumask update cannot make parent's effective_cpus become empty.
>> -	 */
>>  	adding = deleting = false;
>>  	old_prs = new_prs = cs->partition_root_state;
>>  	if (cmd == partcmd_enable) {
>> +		/*
>> +		 * Enabling partition root is not allowed if not all the CPUs
>> +		 * can be granted from parent's effective_cpus.
>> +		 */
>> +		if (!cpumask_subset(cs->cpus_allowed, parent->effective_cpus))
>> +			return -EINVAL;
>> +
>> +		/*
>> +		 * A parent can be left with no CPU as long as there is no
>> +		 * task directly associated with the parent partition. For
>> +		 * such a parent, no new task can be moved into it.
>> +		 */
>> +		if (partition_is_populated(parent, cs) &&
>> +		    cpumask_equal(cs->cpus_allowed, parent->effective_cpus))
>> +			return -EINVAL;
>> +
> You might consider switching these around to check the cpumasks first.

Good point, partition_is_populated() is more expensive.

>
>
>>  		cpumask_copy(tmp->addmask, cs->cpus_allowed);
>>  		adding = true;
>>  	} else if (cmd == partcmd_disable) {
>> @@ -1289,9 +1327,10 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
>>  		adding = cpumask_andnot(tmp->addmask, tmp->addmask,
>>  					parent->subparts_cpus);
>>  		/*
>> -		 * Return error if the new effective_cpus could become empty.
>> +		 * Return error if the new effective_cpus could become empty
>> +		 * and there are tasks in the parent.
>>  		 */
>> -		if (adding &&
>> +		if (adding && partition_is_populated(parent, cs) &&
>>  		    cpumask_equal(parent->effective_cpus, tmp->addmask)) {
> Same.

Thanks,
Longman
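
For reference, the reordering discussed in this thread (compare the cpumasks first, and only then call partition_is_populated()) can be sketched in user space roughly as below. This is an illustration only, not the code that will go into the next version of the patch: mask_t, mask_subset(), parent_is_populated(), check_enable() and the populated_calls counter are all hypothetical stand-ins for the kernel's struct cpumask, cpumask_subset(), partition_is_populated() and the partcmd_enable branch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct cpumask: one bit per CPU. */
typedef unsigned long mask_t;

/* Counts how often the (assumed expensive) populated check actually runs. */
static int populated_calls;

/* Stand-in for partition_is_populated(); the real one may walk children. */
static bool parent_is_populated(bool has_tasks)
{
	populated_calls++;
	return has_tasks;
}

/* Stand-in for cpumask_subset(): true if every bit set in a is set in b. */
static bool mask_subset(mask_t a, mask_t b)
{
	return (a & ~b) == 0;
}

/*
 * Model of the partcmd_enable checks with the cheap mask comparison
 * placed first. Returns 0 if enabling would be allowed, -EINVAL otherwise.
 */
static int check_enable(mask_t child_cpus, mask_t parent_effective,
			bool parent_has_tasks)
{
	if (!mask_subset(child_cpus, parent_effective))
		return -EINVAL;

	/*
	 * && short-circuits, so the populated check is only reached when
	 * the child would take every CPU the parent has left.
	 */
	if (child_cpus == parent_effective &&
	    parent_is_populated(parent_has_tasks))
		return -EINVAL;

	return 0;
}
```

With this ordering, a child that takes only part of the parent's effective CPUs never triggers the populated check at all; the check runs only in the case where the parent would be left with no CPU.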