From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, linux-api@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J . 
Wysocki", Vincent Guittot, Viresh Kumar, Paul Turner, Quentin Perret,
	Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
	Joel Fernandes, Steve Muckle, Suren Baghdasaryan
Subject: [PATCH v6 13/16] sched/core: uclamp: Propagate parent clamps
Date: Tue, 15 Jan 2019 10:15:10 +0000
Message-Id: <20190115101513.2822-14-patrick.bellasi@arm.com>
In-Reply-To: <20190115101513.2822-1-patrick.bellasi@arm.com>
References: <20190115101513.2822-1-patrick.bellasi@arm.com>

In order to properly support hierarchical resource control, the cgroup
delegation model requires that attribute writes from a child group never
fail but are still (potentially) constrained based on the parent's
assigned resources. This requires properly propagating and aggregating
parent attributes down to its descendants.

Let's implement this mechanism by adding a new "effective" clamp value
for each task group. The effective clamp value is defined as the smaller
value between the clamp value of a group and the effective clamp value
of its parent. This is the actual clamp value enforced on tasks in a
task group.

Since it can be interesting for userspace, e.g. system management
software, to know exactly what the currently propagated/enforced
configuration is, the effective clamp values are exposed to user-space
by means of a new pair of read-only attributes
cpu.util.{min,max}.effective.

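As an illustration of the definition above, here is a minimal userspace
sketch of the rule "effective = smaller of the group's clamp and the
parent's effective clamp". The helper name uclamp_effective() is
hypothetical and does not appear in the patch; the 1024 upper bound is
taken from the documented [0, 1024] range.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024u	/* clamp values live in [0, 1024] */

/*
 * Hypothetical helper, not part of the patch: effective clamp of a group
 * given its own requested clamp and its parent's effective clamp.
 */
static unsigned int uclamp_effective(unsigned int requested,
				     unsigned int parent_effective)
{
	assert(requested <= SCHED_CAPACITY_SCALE);
	assert(parent_effective <= SCHED_CAPACITY_SCALE);

	/* The more restrictive (smaller) of the two values wins */
	return requested < parent_effective ? requested : parent_effective;
}
```

For example, if a parent group ends up with an effective clamp of 512
and a child writes 800, the write is accepted and the child keeps 800 as
its requested value, while its effective clamp would be 512.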
Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Tejun Heo

---
Changes in v6:
 Others:
 - wholesale s/group/bucket/
 - wholesale s/_{get,put}/_{inc,dec}/ to match refcount APIs
---
 Documentation/admin-guide/cgroup-v2.rst | 25 ++++++-
 include/linux/sched.h                   | 10 ++-
 kernel/sched/core.c                     | 89 +++++++++++++++++++++++--
 3 files changed, 117 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index a059aaf7cce6..7aad2435e961 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -984,22 +984,43 @@ All time durations are in microseconds.
 	A read-write single value file which exists on non-root cgroups.
 	The default is "0", i.e. no utilization boosting.
 
-	The minimum utilization in the range [0, 1024].
+	The requested minimum utilization in the range [0, 1024].
 
 	This interface allows reading and setting minimum utilization clamp
 	values similar to the sched_setattr(2). This minimum utilization
 	value is used to clamp the task specific minimum utilization clamp.
 
+  cpu.util.min.effective
+	A read-only single value file which exists on non-root cgroups and
+	reports minimum utilization clamp value currently enforced on a task
+	group.
+
+	The actual minimum utilization in the range [0, 1024].
+
+	This value can be lower than cpu.util.min in case a parent cgroup
+	allows only smaller minimum utilization values.
+
   cpu.util.max
 	A read-write single value file which exists on non-root cgroups.
 	The default is "1024". i.e. no utilization capping
 
-	The maximum utilization in the range [0, 1024].
+	The requested maximum utilization in the range [0, 1024].
 
 	This interface allows reading and setting maximum utilization clamp
 	values similar to the sched_setattr(2). This maximum utilization
 	value is used to clamp the task specific maximum utilization clamp.
 
+  cpu.util.max.effective
+	A read-only single value file which exists on non-root cgroups and
+	reports maximum utilization clamp value currently enforced on a task
+	group.
+
+	The actual maximum utilization in the range [0, 1024].
+
+	This value can be lower than cpu.util.max in case a parent cgroup
+	is enforcing a more restrictive clamping on max utilization.
+
+
 Memory
 ------
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index c8f391d1cdc5..05d286524d70 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -625,7 +625,15 @@ struct uclamp_se {
 	unsigned int bucket_id : bits_per(UCLAMP_BUCKETS);
 	unsigned int mapped : 1;
 	unsigned int active : 1;
-	/* Clamp bucket and value actually used by a RUNNABLE task */
+	/*
+	 * Clamp bucket and value actually used by a scheduling entity,
+	 * i.e. a (RUNNABLE) task or a task group.
+	 * For task groups, this is the value (possibly) enforced by a
+	 * parent task group.
+	 * For a task, this is the value (possibly) enforced by the
+	 * task group the task is currently part of or by the system
+	 * default clamp values, whichever is the most restrictive.
+	 */
 	struct {
 		unsigned int value : bits_per(SCHED_CAPACITY_SCALE);
 		unsigned int bucket_id : bits_per(UCLAMP_BUCKETS);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 29ae83fb9786..ddbd591b305c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1300,6 +1300,7 @@ static void __init init_uclamp(void)
 		uc_se = &root_task_group.uclamp[clamp_id];
 		uc_se->value = uclamp_none(clamp_id);
 		uc_se->bucket_id = 0;
+		uc_se->effective.value = uclamp_none(clamp_id);
 #endif
 	}
 }
@@ -6890,6 +6891,8 @@ static inline int alloc_uclamp_sched_group(struct task_group *tg,
 			parent->uclamp[clamp_id].value;
 		tg->uclamp[clamp_id].bucket_id =
 			parent->uclamp[clamp_id].bucket_id;
+		tg->uclamp[clamp_id].effective.value =
+			parent->uclamp[clamp_id].effective.value;
 	}
 #endif
 
@@ -7143,6 +7146,45 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 }
 
 #ifdef CONFIG_UCLAMP_TASK_GROUP
+static void cpu_util_update_hier(struct cgroup_subsys_state *css,
+				 int clamp_id, unsigned int value)
+{
+	struct cgroup_subsys_state *top_css = css;
+	struct uclamp_se *uc_se, *uc_parent;
+
+	css_for_each_descendant_pre(css, top_css) {
+		/*
+		 * The first visited task group is top_css, whose clamp value
+		 * is the one passed as parameter. For descendant task
+		 * groups we consider their current value.
+		 */
+		uc_se = &css_tg(css)->uclamp[clamp_id];
+		if (css != top_css)
+			value = uc_se->value;
+
+		/*
+		 * Skip the whole subtree if the current effective clamp
+		 * already matches the TG's clamp value.
+		 * In this case, all the subtrees already have this value, or a
+		 * more restrictive value, as effective clamp.
+		 */
+		uc_parent = &css_tg(css)->parent->uclamp[clamp_id];
+		if (uc_se->effective.value == value &&
+		    uc_parent->effective.value >= value) {
+			css = css_rightmost_descendant(css);
+			continue;
+		}
+
+		/* Propagate the most restrictive effective value */
+		if (uc_parent->effective.value < value)
+			value = uc_parent->effective.value;
+		if (uc_se->effective.value == value)
+			continue;
+
+		uc_se->effective.value = value;
+	}
+}
+
 static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
 				  struct cftype *cftype, u64 min_value)
 {
@@ -7162,6 +7204,9 @@ static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
 		goto out;
 	}
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_hier(css, UCLAMP_MIN, min_value);
+
 out:
 	rcu_read_unlock();
 
@@ -7187,6 +7232,9 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
 		goto out;
 	}
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_hier(css, UCLAMP_MAX, max_value);
+
 out:
 	rcu_read_unlock();
 
@@ -7194,14 +7242,17 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
 }
 
 static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
-				  enum uclamp_id clamp_id)
+				  enum uclamp_id clamp_id,
+				  bool effective)
 {
 	struct task_group *tg;
 	u64 util_clamp;
 
 	rcu_read_lock();
 	tg = css_tg(css);
-	util_clamp = tg->uclamp[clamp_id].value;
+	util_clamp = effective
+		? tg->uclamp[clamp_id].effective.value
+		: tg->uclamp[clamp_id].value;
 	rcu_read_unlock();
 
 	return util_clamp;
@@ -7210,13 +7261,25 @@ static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
 static u64 cpu_util_min_read_u64(struct cgroup_subsys_state *css,
 				 struct cftype *cft)
 {
-	return cpu_uclamp_read(css, UCLAMP_MIN);
+	return cpu_uclamp_read(css, UCLAMP_MIN, false);
 }
 
 static u64 cpu_util_max_read_u64(struct cgroup_subsys_state *css,
 				 struct cftype *cft)
 {
-	return cpu_uclamp_read(css, UCLAMP_MAX);
+	return cpu_uclamp_read(css, UCLAMP_MAX, false);
+}
+
+static u64 cpu_util_min_effective_read_u64(struct cgroup_subsys_state *css,
+					   struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MIN, true);
+}
+
+static u64 cpu_util_max_effective_read_u64(struct cgroup_subsys_state *css,
+					   struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MAX, true);
 }
 #endif /* CONFIG_UCLAMP_TASK_GROUP */
 
@@ -7564,11 +7627,19 @@ static struct cftype cpu_legacy_files[] = {
 		.read_u64 = cpu_util_min_read_u64,
 		.write_u64 = cpu_util_min_write_u64,
 	},
+	{
+		.name = "util.min.effective",
+		.read_u64 = cpu_util_min_effective_read_u64,
+	},
 	{
 		.name = "util.max",
 		.read_u64 = cpu_util_max_read_u64,
 		.write_u64 = cpu_util_max_write_u64,
 	},
+	{
+		.name = "util.max.effective",
+		.read_u64 = cpu_util_max_effective_read_u64,
+	},
 #endif
 	{ }	/* Terminate */
 };
@@ -7744,12 +7815,22 @@ static struct cftype cpu_files[] = {
 		.read_u64 = cpu_util_min_read_u64,
 		.write_u64 = cpu_util_min_write_u64,
 	},
+	{
+		.name = "util.min.effective",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_util_min_effective_read_u64,
+	},
 	{
 		.name = "util.max",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = cpu_util_max_read_u64,
 		.write_u64 = cpu_util_max_write_u64,
 	},
+	{
+		.name = "util.max.effective",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_util_max_effective_read_u64,
+	},
 #endif
 	{ }	/* terminate */
 };
-- 
2.19.2
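
For readers who want to experiment with the propagation rule outside the
kernel, the following is a simplified userspace model of what
cpu_util_update_hier() does on a write: update the written group, then
walk its descendants parents-first, and stop descending wherever the
effective value does not change. Everything here (struct tg_model and
the tg_model_*() helpers) is hypothetical and only mirrors the logic of
the patch; it uses plain recursion instead of
css_for_each_descendant_pre() and ignores RCU and locking. As in the
patch, the same "smaller value wins" rule is applied whichever clamp
(min or max) is being modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task group; not the kernel's struct task_group */
struct tg_model {
	struct tg_model *parent;	/* NULL for the root group */
	struct tg_model *child;		/* first child */
	struct tg_model *sibling;	/* next sibling */
	unsigned int value;		/* requested clamp */
	unsigned int effective;		/* enforced clamp */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Link a new child under parent, inheriting the parent's clamps */
static void tg_model_add(struct tg_model *parent, struct tg_model *child)
{
	child->parent = parent;
	child->child = NULL;
	child->sibling = parent->child;
	parent->child = child;
	child->value = parent->value;
	child->effective = parent->effective;
}

/* Recompute effective clamps below tg, visiting parents before children */
static void tg_model_refresh_children(struct tg_model *tg)
{
	struct tg_model *c;

	for (c = tg->child; c; c = c->sibling) {
		unsigned int eff = min_u(c->value, tg->effective);

		/* Unchanged effective value: the subtree below is unaffected */
		if (eff == c->effective)
			continue;

		c->effective = eff;
		tg_model_refresh_children(c);
	}
}

/* Model of a write to a non-root group: the requested value is always stored */
static void tg_model_write(struct tg_model *tg, unsigned int value)
{
	assert(tg->parent != NULL);

	tg->value = value;
	tg->effective = min_u(value, tg->parent->effective);
	tg_model_refresh_children(tg);
}
```

With a root at 1024, a child A and a grandchild B, writing 800 to B and
then 512 to A leaves B's requested value at 800 while its effective
value drops to 512; raising A back to 900 restores B's effective value
to 800.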