From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki",
    Vincent Guittot, Viresh Kumar, Paul Turner, Quentin Perret,
    Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
    Joel Fernandes, Steve Muckle, Suren Baghdasaryan
Subject: [PATCH v5 07/15] sched/core: uclamp: add clamp group bucketing support
Date: Mon, 29 Oct 2018 18:33:02 +0000
Message-Id: <20181029183311.29175-9-patrick.bellasi@arm.com>
X-Mailer: git-send-email 2.18.0
In-Reply-To: <20181029183311.29175-1-patrick.bellasi@arm.com>
References: <20181029183311.29175-1-patrick.bellasi@arm.com>

The limited number of clamp groups is required for effective and
efficient run-time tracking of the clamp values required by RUNNABLE
tasks. However, we must ensure we can always know which clamp group to
use in the fast path (task enqueue/dequeue time) to refcount each task,
whatever its clamp value is.

To this purpose we can trade off CPU clamping precision for efficiency
by turning a CPU's clamp groups into buckets, each one representing a
range of possible clamp values.

The number of clamp groups configured at compile time defines the range
of utilization clamp values tracked by each CPU clamp group. For
example, with the default configuration:

   CONFIG_UCLAMP_GROUPS_COUNT 5

we will have 5 clamp groups tracking 20% utilization each. In this
case, a task with util_min=25% will have group_id=1.

This bucketing mechanism applies only to the fast path, where tasks are
refcounted into a per-CPU clamp group at enqueue/dequeue time, while
each task keeps tracking the task-specific clamp value requested from
user-space. This allows tracking, within each bucket, the maximum
task-specific clamp value for the tasks refcounted in that bucket.

In the example above, a 25% boosted task will be refcounted in the
[20..39]% bucket and will set the bucket's effective clamp value to
25%. If a second, 30% boosted task is co-scheduled on the same CPU,
that task will be refcounted in the same bucket as the first one and it
will raise the bucket's effective clamp value to 30%. The effective
clamp value of a bucket is reset to its nominal value (named the
"group_value", 20% in the example above) when there are no more tasks
refcounted in that bucket.

On a real system we expect a limited number of sufficiently different
clamp values; thus, this simple bucketing mechanism is still effective
at tracking the effective clamp values of tasks quite closely.

An additional boost/capping margin can be added to some tasks: in the
example above, the 25% task will be boosted to 30% until it exits the
CPU. If that is considered not acceptable on certain systems, it's
always possible to reduce the margin by increasing the bucketing
resolution. Indeed, by properly setting CONFIG_UCLAMP_GROUPS_COUNT, we
can trade off memory efficiency for resolution.

The already existing mechanism to map "clamp values" into "clamp
groups" ensures that only the minimal set of clamp groups actually
required is used. For example, if we have only 20% and 25% clamped
tasks, by setting:

   CONFIG_UCLAMP_GROUPS_COUNT 20

we will allocate 20 buckets of 5% resolution each; however, only two of
them will be used in the fast path, since a 5% resolution is enough to
always distinguish them.
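To make the bucketing arithmetic concrete, here is a minimal
stand-alone sketch (an illustration, not part of the patch): it mirrors
the grouping logic of uclamp_group_value() below, assuming
SCHED_CAPACITY_SCALE=1024 as in mainline and the default
CONFIG_UCLAMP_GROUPS_COUNT=5:

  #include <stdio.h>

  #define SCHED_CAPACITY_SCALE  1024  /* assumed, as in mainline */
  #define GROUPS_COUNT          5     /* default CONFIG_UCLAMP_GROUPS_COUNT */
  #define GROUP_DELTA           (SCHED_CAPACITY_SCALE / GROUPS_COUNT)
  #define GROUP_UPPER           (GROUP_DELTA * GROUPS_COUNT)

  /* Round a clamp value down to the nominal value of its bucket. */
  static unsigned int group_value(unsigned int clamp_value)
  {
          if (clamp_value >= GROUP_UPPER)
                  return SCHED_CAPACITY_SCALE;
          return GROUP_DELTA * (clamp_value / GROUP_DELTA);
  }

  int main(void)
  {
          /* util_min=25% of 1024 is 256 */
          unsigned int clamp = 256;

          printf("clamp=%u -> group_id=%u group_value=%u\n",
                 clamp, clamp / GROUP_DELTA, group_value(clamp));
          return 0;
  }

With 5 groups, GROUP_DELTA is 1024/5 = 204, so a 25% clamp (256) falls
into bucket 256/204 = 1, whose nominal value is 204 (~20%), matching
the group_id=1 example above.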
Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Paul Turner
Cc: Suren Baghdasaryan
Cc: Todd Kjos
Cc: Joel Fernandes
Cc: Steve Muckle
Cc: Juri Lelli
Cc: Quentin Perret
Cc: Dietmar Eggemann
Cc: Morten Rasmussen
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes in v5:
 Others:
 - renamed uclamp_round into uclamp_group_value to better represent
   what this function returns
 - rebased on v4.19

Changes in v4:
 Message-ID: <20180809152313.lewfhufidhxb2qrk@darkstar>
 - implements the idea discussed in this thread
 Others:
 - new patch added in this version
 - rebased on v4.19-rc1
---
 kernel/sched/core.c | 48 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b23f80c07be9..9b49062439f3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -783,6 +783,27 @@ union uclamp_map {
  */
 static union uclamp_map uclamp_maps[UCLAMP_CNT][UCLAMP_GROUPS];
 
+/*
+ * uclamp_group_value: get the "group value" for a given "clamp value"
+ * @clamp_value: the utilization "clamp value" to translate
+ *
+ * The number of clamp groups, which is defined at compile time, allows
+ * tracking only a finite number of different clamp values. Thus clamp
+ * values are grouped into bins, each one representing a different
+ * "group value". This method returns the "group value" corresponding
+ * to the specified "clamp value".
+ */
+static inline unsigned int uclamp_group_value(unsigned int clamp_value)
+{
+#define UCLAMP_GROUP_DELTA (SCHED_CAPACITY_SCALE / CONFIG_UCLAMP_GROUPS_COUNT)
+#define UCLAMP_GROUP_UPPER (UCLAMP_GROUP_DELTA * CONFIG_UCLAMP_GROUPS_COUNT)
+
+	if (clamp_value >= UCLAMP_GROUP_UPPER)
+		return SCHED_CAPACITY_SCALE;
+
+	return UCLAMP_GROUP_DELTA * (clamp_value / UCLAMP_GROUP_DELTA);
+}
+
 /**
  * uclamp_cpu_update: updates the utilization clamp of a CPU
  * @rq: the CPU's rq which utilization clamp has to be updated
@@ -848,6 +869,7 @@ static inline void uclamp_cpu_update(struct rq *rq, unsigned int clamp_id,
 static inline void uclamp_cpu_get_id(struct task_struct *p, struct rq *rq,
 				     unsigned int clamp_id)
 {
+	unsigned int clamp_value;
 	unsigned int group_id;
 
 	if (unlikely(!p->uclamp[clamp_id].mapped))
@@ -870,6 +892,11 @@ static inline void uclamp_cpu_get_id(struct task_struct *p, struct rq *rq,
 		rq->uclamp.value[clamp_id] = p->uclamp[clamp_id].value;
 	}
 
+	/* CPU's clamp groups track the max effective clamp value */
+	clamp_value = p->uclamp[clamp_id].value;
+	if (clamp_value > rq->uclamp.group[clamp_id][group_id].value)
+		rq->uclamp.group[clamp_id][group_id].value = clamp_value;
+
 	if (rq->uclamp.value[clamp_id] < p->uclamp[clamp_id].value)
 		rq->uclamp.value[clamp_id] = p->uclamp[clamp_id].value;
 }
@@ -917,8 +944,16 @@ static inline void uclamp_cpu_put_id(struct task_struct *p, struct rq *rq,
 			cpu_of(rq), clamp_id, group_id);
 	}
 #endif
-	if (clamp_value >= rq->uclamp.value[clamp_id])
+	if (clamp_value >= rq->uclamp.value[clamp_id]) {
+		/*
+		 * Each CPU's clamp group value is reset to its nominal group
+		 * value whenever there are no more RUNNABLE tasks refcounting
+		 * that clamp group.
+		 */
+		rq->uclamp.group[clamp_id][group_id].value =
+			uclamp_maps[clamp_id][group_id].value;
 		uclamp_cpu_update(rq, clamp_id, clamp_value);
+	}
 }
 
 /**
@@ -1065,10 +1100,13 @@ static void uclamp_group_get(struct task_struct *p, struct uclamp_se *uc_se,
 	unsigned int prev_group_id = uc_se->group_id;
 	union uclamp_map uc_map_old, uc_map_new;
 	unsigned int free_group_id;
+	unsigned int group_value;
 	unsigned int group_id;
 	unsigned long res;
 	int cpu;
 
+	group_value = uclamp_group_value(clamp_value);
+
 retry:
 	free_group_id = UCLAMP_GROUPS;
 
@@ -1076,7 +1114,7 @@ static void uclamp_group_get(struct task_struct *p, struct uclamp_se *uc_se,
 		uc_map_old.data = atomic_long_read(&uc_maps[group_id].adata);
 		if (free_group_id == UCLAMP_GROUPS && !uc_map_old.se_count)
 			free_group_id = group_id;
-		if (uc_map_old.value == clamp_value)
+		if (uc_map_old.value == group_value)
 			break;
 	}
 	if (group_id >= UCLAMP_GROUPS) {
@@ -1092,7 +1130,7 @@ static void uclamp_group_get(struct task_struct *p, struct uclamp_se *uc_se,
 	}
 
 	uc_map_new.se_count = uc_map_old.se_count + 1;
-	uc_map_new.value = clamp_value;
+	uc_map_new.value = group_value;
 	res = atomic_long_cmpxchg(&uc_maps[group_id].adata,
 				  uc_map_old.data, uc_map_new.data);
 	if (res != uc_map_old.data)
@@ -1113,9 +1151,9 @@ static void uclamp_group_get(struct task_struct *p, struct uclamp_se *uc_se,
 #endif
 		}
 
-		if (uc_cpu->group[clamp_id][group_id].value == clamp_value)
+		if (uc_cpu->group[clamp_id][group_id].value == group_value)
 			continue;
-		uc_cpu->group[clamp_id][group_id].value = clamp_value;
+		uc_cpu->group[clamp_id][group_id].value = group_value;
 	}
 
 done:

-- 
2.18.0
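For reference, a simplified stand-alone sketch of the per-bucket
max/reset behavior described in the changelog (not the kernel code: the
names and the reset-on-empty policy are assumptions modelled on the
description above, not on the exact uclamp_cpu_{get,put}_id() logic,
which also consults the CPU-wide clamp value):

  #include <stdio.h>

  #define SCHED_CAPACITY_SCALE 1024
  #define GROUPS_COUNT         5
  #define GROUP_DELTA          (SCHED_CAPACITY_SCALE / GROUPS_COUNT)

  /* Per-CPU bucket: refcount of RUNNABLE tasks and effective clamp value */
  struct bucket {
          unsigned int tasks;
          unsigned int value;
  };

  static unsigned int group_id(unsigned int clamp)
  {
          return clamp / GROUP_DELTA;
  }

  static void enqueue(struct bucket *b, unsigned int clamp)
  {
          unsigned int id = group_id(clamp);

          b[id].tasks++;
          if (clamp > b[id].value)  /* track the max task-specific value */
                  b[id].value = clamp;
  }

  static void dequeue(struct bucket *b, unsigned int clamp)
  {
          unsigned int id = group_id(clamp);

          if (--b[id].tasks == 0)   /* last task: reset to nominal value */
                  b[id].value = id * GROUP_DELTA;
  }

  int main(void)
  {
          struct bucket b[GROUPS_COUNT + 1] = { { 0, 0 } };

          enqueue(b, 256);                   /* 25% task              */
          enqueue(b, 307);                   /* 30% task, same bucket */
          printf("value=%u\n", b[1].value);  /* 307: max of the two   */
          dequeue(b, 307);
          dequeue(b, 256);
          printf("value=%u\n", b[1].value);  /* 204: nominal (~20%)   */
          return 0;
  }

This reproduces the changelog example: the 25% and 30% tasks share
bucket 1, the bucket reports the maximum (30%) while both are RUNNABLE,
and it returns to its nominal 20% group value once the bucket is empty.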