From: Waiman Long <longman@redhat.com>
To: Tejun Heo, Li Zefan, Johannes Weiner, Peter Zijlstra, Ingo Molnar
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
    luto@amacapital.net, Mike Galbraith, torvalds@linux-foundation.org,
    Roman Gushchin, Juri Lelli, Waiman Long
Subject: [PATCH v8 3/6] cpuset: Add cpuset.sched.load_balance flag to v2
Date: Thu, 17 May 2018 16:55:42 -0400
Message-Id: <1526590545-3350-4-git-send-email-longman@redhat.com>
In-Reply-To: <1526590545-3350-1-git-send-email-longman@redhat.com>
References: <1526590545-3350-1-git-send-email-longman@redhat.com>

The sched.load_balance flag is needed to enable CPU isolation similar to
what can be done with the "isolcpus" kernel boot parameter. Its value
can only be changed in a scheduling domain with no child cpusets. On a
non-scheduling domain cpuset, the value of sched.load_balance is
inherited from its parent. This flag is set by the parent and is not
delegatable.

Signed-off-by: Waiman Long <longman@redhat.com>
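Not part of the patch, but for illustration: a minimal userspace sketch of
how the new control file could be used together with cpuset.cpus to get
isolcpus-like isolation once this series is applied. The /sys/fs/cgroup
mount point, the "isolated" cgroup name and the CPU list are assumptions;
the cgroup is presumed to already be a cpuset-enabled scheduling domain
with no child cpusets, since that is the only case in which the flag may
be changed.

/* Illustrative only -- not part of the patch. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, val, strlen(val)) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}

int main(void)
{
	/*
	 * Give the hypothetical cpuset some CPUs, then turn off load
	 * balancing on it (only valid on a scheduling domain with no
	 * child cpusets, per the changelog above).
	 */
	if (write_str("/sys/fs/cgroup/isolated/cpuset.cpus", "2-3") ||
	    write_str("/sys/fs/cgroup/isolated/cpuset.sched.load_balance", "0")) {
		perror("cpuset setup");
		return 1;
	}
	puts("CPUs 2-3 are no longer load balanced by the scheduler");
	return 0;
}

The same writes could of course be done from a shell with echo; the C
form is used here only to keep the example self-contained.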
---
 Documentation/cgroup-v2.txt | 24 ++++++++++++++++++++
 kernel/cgroup/cpuset.c      | 53 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 73 insertions(+), 4 deletions(-)

diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
index 54d9e22..071b634d 100644
--- a/Documentation/cgroup-v2.txt
+++ b/Documentation/cgroup-v2.txt
@@ -1536,6 +1536,30 @@ Cpuset Interface Files
 	CPUs of the parent cgroup. Once it is set, this flag cannot be
 	cleared if there are any child cgroups with cpuset enabled.
 
+	A parent cgroup cannot distribute all its CPUs to child
+	scheduling domain cgroups unless its load balancing flag is
+	turned off.
+
+  cpuset.sched.load_balance
+	A read-write single value file which exists on non-root
+	cpuset-enabled cgroups. It is a binary value flag that accepts
+	either "0" (off) or a non-zero value (on). This flag is set
+	by the parent and is not delegatable.
+
+	When it is on, tasks within this cpuset will be load-balanced
+	by the kernel scheduler. Tasks will be moved from CPUs with
+	high load to other CPUs within the same cpuset with less load
+	periodically.
+
+	When it is off, there will be no load balancing among CPUs on
+	this cgroup. Tasks will stay in the CPUs they are running on
+	and will not be moved to other CPUs.
+
+	The initial value of this flag is "1". This flag is then
+	inherited by child cgroups with cpuset enabled. Its state
+	can only be changed on a scheduling domain cgroup with no
+	cpuset-enabled children.
+
 Device controller
 -----------------
 
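Reviewer note, not part of the patch: the first paragraph added above ("A
parent cgroup cannot distribute all its CPUs ...") can be probed from
userspace along these lines. The "parent"/"child" cgroup names and the
0-7 CPU list are assumptions; "child" is presumed to be a
scheduling-domain cpuset while the parent still has
cpuset.sched.load_balance set to "1", so the write is expected to be
rejected.

/* Illustrative only -- not part of the patch. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/*
	 * "parent" is assumed to own CPUs 0-7 with load balancing still
	 * on, and "child" to be one of its scheduling-domain cpusets.
	 */
	int fd = open("/sys/fs/cgroup/parent/child/cpuset.cpus", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, "0-7", 3) < 0)
		printf("claiming all parent CPUs rejected: %s\n",
		       strerror(errno));
	close(fd);
	return 0;
}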
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e1a1af0..368e1b7 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -510,7 +510,7 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 
 	par = parent_cs(cur);
 
-	/* On legacy hiearchy, we must be a subset of our parent cpuset. */
+	/* On legacy hierarchy, we must be a subset of our parent cpuset. */
 	ret = -EACCES;
 	if (!is_in_v2_mode() && !is_cpuset_subset(trial, par))
 		goto out;
@@ -1061,6 +1061,14 @@ static int update_isolated_cpumask(struct cpuset *cpuset,
 		goto out;
 
 	/*
+	 * A parent can't distribute all its CPUs to child scheduling
+	 * domain cpusets unless load balancing is off.
+	 */
+	if (adding && !deleting && is_sched_load_balance(parent) &&
+	    cpumask_equal(addmask, parent->effective_cpus))
+		goto out;
+
+	/*
 	 * Check if any CPUs in addmask or delmask are in a sibling cpuset.
 	 * An empty sibling cpus_allowed means it is the same as parent's
 	 * effective_cpus. This checking is skipped if the cpuset is dying.
@@ -1531,6 +1539,16 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	domain_flag_changed = (is_sched_domain(cs) != is_sched_domain(trialcs));
 
+	/*
+	 * On default hierarchy, a load balance flag change is only allowed
+	 * in a scheduling domain with no child cpuset.
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) && balance_flag_changed &&
+	    (!is_sched_domain(cs) || css_has_online_children(&cs->css))) {
+		err = -EINVAL;
+		goto out;
+	}
+
 	if (domain_flag_changed) {
 		err = turning_on ?
 		      update_isolated_cpumask(cs, NULL, cs->cpus_allowed)
@@ -2187,6 +2205,14 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
 
+	{
+		.name = "sched.load_balance",
+		.read_u64 = cpuset_read_u64,
+		.write_u64 = cpuset_write_u64,
+		.private = FILE_SCHED_LOAD_BALANCE,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
+
 	{ }	/* terminate */
 };
 
@@ -2200,19 +2226,38 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 {
 	struct cpuset *cs;
+	struct cgroup_subsys_state *errptr = ERR_PTR(-ENOMEM);
 
 	if (!parent_css)
 		return &top_cpuset.css;
 
 	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
 	if (!cs)
-		return ERR_PTR(-ENOMEM);
+		return errptr;
 	if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
 		goto free_cs;
 	if (!alloc_cpumask_var(&cs->effective_cpus, GFP_KERNEL))
 		goto free_cpus;
 
-	set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+	/*
+	 * On default hierarchy, inherit parent's CS_SCHED_LOAD_BALANCE flag.
+	 * Creating new cpuset is also not allowed if the effective_cpus of
+	 * its parent is empty.
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) {
+		struct cpuset *parent = css_cs(parent_css);
+
+		if (test_bit(CS_SCHED_LOAD_BALANCE, &parent->flags))
+			set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+
+		if (cpumask_empty(parent->effective_cpus)) {
+			errptr = ERR_PTR(-EINVAL);
+			goto free_cpus;
+		}
+	} else {
+		set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+	}
+
 	cpumask_clear(cs->cpus_allowed);
 	nodes_clear(cs->mems_allowed);
 	cpumask_clear(cs->effective_cpus);
@@ -2226,7 +2271,7 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
 	free_cpumask_var(cs->cpus_allowed);
 free_cs:
 	kfree(cs);
-	return ERR_PTR(-ENOMEM);
+	return errptr;
 }
 
 static int cpuset_css_online(struct cgroup_subsys_state *css)
-- 
1.8.3.1
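Reviewer note, not part of the patch: a minimal sketch of how the -EINVAL
path added to update_flag() above is expected to look from userspace. The
"demo" cgroup path is an assumption; it stands for a cpuset that is either
not a scheduling domain or still has cpuset-enabled children, which is the
case the new check rejects.

/* Illustrative only -- not part of the patch. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/sys/fs/cgroup/demo/cpuset.sched.load_balance",
		      O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Expected to fail with EINVAL per the update_flag() check above. */
	if (write(fd, "0", 1) < 0 && errno == EINVAL)
		puts("load_balance change rejected with EINVAL as expected");
	close(fd);
	return 0;
}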