From: Suren Baghdasaryan
Date: Fri, 20 Jul 2018 17:25:31 -0700
Subject: Re: [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
To: Patrick Bellasi
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ingo Molnar,
    Peter Zijlstra, Tejun Heo, Rafael J. Wysocki, Viresh Kumar,
    Vincent Guittot, Paul Turner, Dietmar Eggemann, Morten Rasmussen,
    Juri Lelli, Todd Kjos, Joel Fernandes, Steve Muckle
In-Reply-To: <20180720151156.GA31421@e110439-lin>
References: <20180716082906.6061-1-patrick.bellasi@arm.com>
            <20180716082906.6061-3-patrick.bellasi@arm.com>
            <20180720151156.GA31421@e110439-lin>

Hi Patrick,

On Fri, Jul 20, 2018 at 8:11 AM, Patrick Bellasi wrote:
> Hi Suren,
> thanks for the review, all good points...
> some more comments follow inline.
>
> On 19-Jul 16:51, Suren Baghdasaryan wrote:
>> On Mon, Jul 16, 2018 at 1:28 AM, Patrick Bellasi wrote:
>
> [...]
>
>> > +/**
>> > + * uclamp_group_available: checks if a clamp group is available
>> > + * @clamp_id: the utilization clamp index (i.e. min or max clamp)
>> > + * @group_id: the group index in the given clamp_id
>> > + *
>> > + * A clamp group is not free if there is at least one SE which is sing a clamp
>>
>> Did you mean to say "single clamp"?
>
> No, it's "...at least one SE which is USING a clamp value..."
>
>> > + * value mapped on the specified clamp_id. These SEs are reference counted by
>> > + * the se_count of a uclamp_map entry.
>> > + *
>> > + * Return: true if there are no SE's mapped on the specified clamp
>> > + * index and group
>> > + */
>> > +static inline bool uclamp_group_available(int clamp_id, int group_id)
>> > +{
>> > +        struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > +
>> > +        return (uc_map[group_id].value == UCLAMP_NONE);
>>
>> The usage of UCLAMP_NONE is very confusing to me. It was not used at
>> all in the patch where it was introduced [1/12], here it's used as a
>> clamp value, and in uclamp_group_find() it's used as a group_id. Please
>> clarify the usage.
>
> Yes, it's meant to represent a "clamp not valid" condition, whether
> it's a "clamp group" or a "clamp value"... perhaps the name can be
> improved.
>
>> I also feel UCLAMP_NONE does not really belong to
>> the uclamp_id enum, because the other elements there are indexes into
>> uclamp_maps and this one is a special value.
>
> Right, it looks a bit misplaced, I agree. I think I tried to set it
> using a #define but there were some issues I don't remember now...
> Anyway, I'll give it another go...
>
>> IMHO if both *group_id*
>> and *value* need a special value (-1) to represent an
>> unused/uninitialized entry, it would be better to use different
>> constants. Maybe UCLAMP_VAL_NONE and UCLAMP_GROUP_NONE?
>
> Yes, maybe we can use a
>
>     #define UCLAMP_NOT_VALID -1
>
> and get rid of the confusing enum entry.
>
> Will update it in v3.

Sounds good to me.

>> > +}
>
> [...]
>
>> > +/**
>> > + * uclamp_group_find: finds the group index of a utilization clamp group
>> > + * @clamp_id: the utilization clamp index (i.e. min or max clamping)
>> > + * @clamp_value: the utilization clamping value lookup for
>> > + *
>> > + * Verify if a group has been assigned to a certain clamp value and return
>> > + * its index to be used for accounting.
>> > + *
>> > + * Since only a limited number of utilization clamp groups are allowed, if no
>> > + * groups have been assigned for the specified value, a new group is assigned
>> > + * if possible. Otherwise an error is returned, meaning that an additional clamp
>> > + * value is not (currently) supported.
>> > + */
>> > +static int
>> > +uclamp_group_find(int clamp_id, unsigned int clamp_value)
>> > +{
>> > +        struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > +        int free_group_id = UCLAMP_NONE;
>> > +        unsigned int group_id = 0;
>> > +
>> > +        for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) {
>> > +                /* Keep track of first free clamp group */
>> > +                if (uclamp_group_available(clamp_id, group_id)) {
>> > +                        if (free_group_id == UCLAMP_NONE)
>> > +                                free_group_id = group_id;
>> > +                        continue;
>> > +                }
>> > +                /* Return index of first group with same clamp value */
>> > +                if (uc_map[group_id].value == clamp_value)
>> > +                        return group_id;
>> > +        }
>> > +        /* Default to first free clamp group */
>> > +        if (group_id > CONFIG_UCLAMP_GROUPS_COUNT)
>>
>> Is the condition above needed? I think it's always true if you got here.
>> Also AFAICT after the for loop you can just do:
>>
>>     return (free_group_id != UCLAMP_NONE) ? free_group_id : -ENOSPC;
>
> Yes, you're right... the code above can be simplified!
>
>> > +                group_id = free_group_id;
>> > +        /* All clamp group already track different clamp values */
>> > +        if (group_id == UCLAMP_NONE)
>> > +                return -ENOSPC;
>> > +        return group_id;
>> > +}
>
> [...]
>
>> > +static inline void uclamp_group_put(int clamp_id, int group_id)
>> > +{
>> > +        struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > +        unsigned long flags;
>> > +
>> > +        /* Ignore SE's not yet attached */
>> > +        if (group_id == UCLAMP_NONE)
>> > +                return;
>> > +
>> > +        /* Remove SE from this clamp group */
>> > +        raw_spin_lock_irqsave(&uc_map[group_id].se_lock, flags);
>> > +        uc_map[group_id].se_count -= 1;
>>
>> If uc_map[group_id].se_count was 0 before the decrement you end up with
>> se_count == -1 and no reset for the element.
>
> Well... this should never happen, otherwise the refcounting is not
> working as expected.
>
> Maybe we can add (at least) a debug check and warning, something like:
>
>     #ifdef SCHED_DEBUG
>             if (unlikely(uc_map[group_id].se_count == 0)) {
>                     WARN(1, "invalid clamp group [%d:%d] refcount\n",
>                          clamp_id, group_id);
>                     uc_map[group_id].se_count = 1;
>             }
>     #endif
>
>> > +        if (uc_map[group_id].se_count == 0)
>> > +                uclamp_group_reset(clamp_id, group_id);
>> > +        raw_spin_unlock_irqrestore(&uc_map[group_id].se_lock, flags);
>> > +}
>> > +
>
> [...]
>
>> >  static inline int __setscheduler_uclamp(struct task_struct *p,
>> >                                          const struct sched_attr *attr)
>> >  {
>> > +        struct uclamp_se *uc_se;
>> > +        int retval = 0;
>> > +
>> >          if (attr->sched_util_min > attr->sched_util_max)
>> >                  return -EINVAL;
>> >          if (attr->sched_util_max > SCHED_CAPACITY_SCALE)
>> >                  return -EINVAL;
>> >
>> > -        p->uclamp[UCLAMP_MIN] = attr->sched_util_min;
>> > -        p->uclamp[UCLAMP_MAX] = attr->sched_util_max;
>> > +        mutex_lock(&uclamp_mutex);
>> > +
>> > +        /* Update min utilization clamp */
>> > +        uc_se = &p->uclamp[UCLAMP_MIN];
>> > +        retval |= uclamp_group_get(p, UCLAMP_MIN, uc_se,
>> > +                                   attr->sched_util_min);
>> > +
>> > +        /* Update max utilization clamp */
>> > +        uc_se = &p->uclamp[UCLAMP_MAX];
>> > +        retval |= uclamp_group_get(p, UCLAMP_MAX, uc_se,
>> > +                                   attr->sched_util_max);
>> > +
>> > +        mutex_unlock(&uclamp_mutex);
>> > +
>> > +        /*
>> > +         * If one of the two clamp values should fail,
>> > +         * let the userspace know.
>> > +         */
>> > +        if (retval)
>> > +                return -ENOSPC;
>>
>> Maybe a minor issue, but this failure is ambiguous. It might mean:
>> 1. no clamp value was updated
>> 2. UCLAMP_MIN was updated but UCLAMP_MAX was not
>> 3. UCLAMP_MAX was updated but UCLAMP_MIN was not
>
> That's right, I put a bit of thought into that too, but in the end
> I've been convinced that the possibility to use a single syscall to
> set both clamps at the same time has some benefits for user-space.
>
> Maybe the current solution can be improved by supporting an (optional)
> strict semantic with an in-kernel roll-back in case one of the two
> uclamp_group_get() calls should fail.
>
> The strict semantic with roll-back could be controlled via an
> additional flag, e.g. SCHED_FLAG_UTIL_CLAMP_STRICT.
>
> When the flag is set, either we are able to set both attributes or
> we roll back. Otherwise, when the flag is not set, we keep the current
> behavior, i.e. we set what we can and report an error to notify
> userspace that one constraint was not enforced.
>
> The following snippet should implement these semantics:
>
> ---8<---
>
> /* Uclamp flags */
> #define SCHED_FLAG_UTIL_CLAMP_STRICT    0x11 /* Roll-back on failure */
> #define SCHED_FLAG_UTIL_CLAMP_MIN       0x12 /* Update util_min */
> #define SCHED_FLAG_UTIL_CLAMP_MAX       0x14 /* Update util_max */
> #define SCHED_FLAG_UTIL_CLAMP ( \
>         SCHED_FLAG_UTIL_CLAMP_MIN | SCHED_FLAG_UTIL_CLAMP_MAX)
>

Having the ability to update only min or only max this way might indeed
be very useful.

Instead of rolling back on failure I would suggest checking both inputs
first, to make sure there won't be any error before updating. This would
remove the need for SCHED_FLAG_UTIL_CLAMP_STRICT (which I think any user
would want to set to 1 anyway).

Looks like uclamp_group_get() can fail only if uclamp_group_find() fails
to find a slot for uclamp_value or a free slot. So one way to do this
search before the update is to call uclamp_group_find() for both
UCLAMP_MIN and UCLAMP_MAX beforehand and, if both succeed, pass the
obtained next_group_ids into uclamp_group_get() to avoid doing the same
search twice. This requires some refactoring of uclamp_group_get(), but
I think the end result would be a cleaner and more predictable solution.
> static inline int __setscheduler_uclamp(struct task_struct *p,
>                                         const struct sched_attr *attr)
> {
>         unsigned int uclamp_value_old = 0;
>         unsigned int uclamp_value;
>         struct uclamp_se *uc_se;
>         int retval = 0;
>
>         if (attr->sched_util_min > attr->sched_util_max)
>                 return -EINVAL;
>         if (attr->sched_util_max > 100)
>                 return -EINVAL;
>
>         mutex_lock(&uclamp_mutex);
>
>         if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN))
>                 goto set_util_max;
>
>         uc_se = &p->uclamp[UCLAMP_MIN];
>         uclamp_value = scale_from_percent(attr->sched_util_min);
>         if (uc_se->value == uclamp_value)
>                 goto set_util_max;
>
>         /* Update min utilization clamp */
>         uclamp_value_old = uc_se->value;
>         retval |= uclamp_group_get(p, NULL, UCLAMP_MIN, uc_se, uclamp_value);
>         if (retval &&
>             attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT)
>                 goto exit_failure;
>
> set_util_max:
>
>         if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX))
>                 goto exit_done;
>
>         uc_se = &p->uclamp[UCLAMP_MAX];
>         uclamp_value = scale_from_percent(attr->sched_util_max);
>         if (uc_se->value == uclamp_value)
>                 goto exit_done;
>
>         /* Update max utilization clamp */
>         if (uclamp_group_get(p, NULL, UCLAMP_MAX,
>                              uc_se, uclamp_value))
>                 goto exit_rollback;
>
> exit_done:
>         mutex_unlock(&uclamp_mutex);
>         return retval;
>
> exit_rollback:
>         if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN &&
>             attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT) {
>                 uclamp_group_get(p, NULL, UCLAMP_MIN,
>                                  uc_se, uclamp_value_old);
>         }
> exit_failure:
>         mutex_unlock(&uclamp_mutex);
>
>         return -ENOSPC;
> }
>
> ---8<---
>
> Could that work better?
>
> The code is maybe a bit more convoluted... but perhaps it can be
> improved by encoding it in a loop.
>
> --
> #include
>
> Patrick Bellasi
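[Editor's note: the standalone sketch below illustrates the "validate
both clamp values first, then commit" flow Suren suggests above. It is
a minimal userspace model, not kernel code: the names (group_find(),
group_get(), set_clamps(), GROUPS_COUNT, GROUP_NONE) are illustrative
stand-ins for uclamp_group_find()/uclamp_group_get() and are not part
of the posted patch. The point it demonstrates is that once both
lookups have succeeded, neither commit can fail, so no rollback path
is needed, assuming the lookups and commits all happen under the same
uclamp_mutex critical section as in the patch.]

#include <stdio.h>

#define GROUPS_COUNT  4     /* stand-in for CONFIG_UCLAMP_GROUPS_COUNT */
#define GROUP_NONE   -1     /* stand-in for the proposed UCLAMP_NOT_VALID */

enum { CLAMP_MIN, CLAMP_MAX, CLAMP_CNT };

struct group {
        int value;          /* clamp value tracked by this group */
        int se_count;       /* SEs currently refcounting this group */
};

static struct group groups[CLAMP_CNT][GROUPS_COUNT];

/*
 * Side-effect-free lookup: return the index of a group already tracking
 * @value, else the first free group, else GROUP_NONE (no space left).
 */
static int group_find(int clamp_id, int value)
{
        int free_id = GROUP_NONE;
        int id;

        for (id = 0; id < GROUPS_COUNT; id++) {
                if (groups[clamp_id][id].se_count == 0) {
                        if (free_id == GROUP_NONE)
                                free_id = id;
                        continue;
                }
                if (groups[clamp_id][id].value == value)
                        return id;
        }
        return free_id;
}

/*
 * Commit: attach one SE to a group whose index was already validated.
 * Releasing the SE's previous group is omitted for brevity.
 */
static void group_get(int clamp_id, int group_id, int value)
{
        groups[clamp_id][group_id].value = value;
        groups[clamp_id][group_id].se_count++;
}

/* Validate both requests before touching any state: no rollback needed. */
static int set_clamps(int util_min, int util_max)
{
        int gid_min, gid_max;

        if (util_min > util_max)
                return -1;              /* would be -EINVAL in the kernel */

        gid_min = group_find(CLAMP_MIN, util_min);
        gid_max = group_find(CLAMP_MAX, util_max);
        if (gid_min == GROUP_NONE || gid_max == GROUP_NONE)
                return -2;              /* would be -ENOSPC in the kernel */

        /* Both lookups succeeded; reuse their results for the commits. */
        group_get(CLAMP_MIN, gid_min, util_min);
        group_get(CLAMP_MAX, gid_max, util_max);
        return 0;
}

int main(void)
{
        printf("set_clamps(20, 80) -> %d\n", set_clamps(20, 80));   /* 0  */
        printf("set_clamps(90, 10) -> %d\n", set_clamps(90, 10));   /* -1 */
        return 0;
}

Because the min and max clamps live in separate maps, committing the
MIN group cannot consume the slot found for MAX, so reusing the two
pre-computed group ids is safe as long as the whole sequence stays
under the same lock.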