Date: Thu, 24 Jan 2019 12:30:09 +0000
From: Patrick Bellasi
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-api@vger.kernel.org, Ingo Molnar, Tejun Heo,
    "Rafael J. Wysocki", Vincent Guittot, Viresh Kumar, Paul Turner,
    Quentin Perret, Dietmar Eggemann, Morten Rasmussen, Juri Lelli,
    Todd Kjos, Joel Fernandes, Steve Muckle, Suren Baghdasaryan
Subject: Re: [PATCH v6 09/16] sched/cpufreq: uclamp: Add utilization clamping for RT tasks
Message-ID: <20190124123009.2yulcf25ld66popd@e110439-lin>
References: <20190115101513.2822-1-patrick.bellasi@arm.com>
 <20190115101513.2822-10-patrick.bellasi@arm.com>
 <20190123104944.GX27931@hirez.programming.kicks-ass.net>
 <20190123144011.iid3avb63r5v4r2c@e110439-lin>
 <20190123201146.GH17749@hirez.programming.kicks-ass.net>
In-Reply-To: <20190123201146.GH17749@hirez.programming.kicks-ass.net>

On 23-Jan 21:11, Peter Zijlstra wrote:
> On Wed, Jan 23, 2019 at 02:40:11PM +0000, Patrick Bellasi wrote:
> > On 23-Jan 11:49, Peter Zijlstra wrote:
> > > On Tue, Jan 15, 2019 at 10:15:06AM +0000, Patrick Bellasi wrote:
> > > > @@ -858,16 +859,23 @@ static inline void
> > > >  uclamp_effective_get(struct task_struct *p, unsigned int clamp_id,
> > > >  		     unsigned int *clamp_value, unsigned int *bucket_id)
> > > >  {
> > > > +	struct uclamp_se *default_clamp;
> > > > +
> > > >  	/* Task specific clamp value */
> > > >  	*clamp_value = p->uclamp[clamp_id].value;
> > > >  	*bucket_id = p->uclamp[clamp_id].bucket_id;
> > > >
> > > > +	/* RT tasks have different default values */
> > > > +	default_clamp = task_has_rt_policy(p)
> > > > +		? uclamp_default_perf
> > > > +		: uclamp_default;
> > > > +
> > > >  	/* System default restriction */
> > > > -	if (unlikely(*clamp_value < uclamp_default[UCLAMP_MIN].value ||
> > > > -		     *clamp_value > uclamp_default[UCLAMP_MAX].value)) {
> > > > +	if (unlikely(*clamp_value < default_clamp[UCLAMP_MIN].value ||
> > > > +		     *clamp_value > default_clamp[UCLAMP_MAX].value)) {
> > > >  		/* Keep it simple: unconditionally enforce system defaults */
> > > > -		*clamp_value = uclamp_default[clamp_id].value;
> > > > -		*bucket_id = uclamp_default[clamp_id].bucket_id;
> > > > +		*clamp_value = default_clamp[clamp_id].value;
> > > > +		*bucket_id = default_clamp[clamp_id].bucket_id;
> > > >  	}
> > > >  }
> > >
> > > So I still don't much like the whole effective thing;
> >
> > :/
> >
> > I find back-annotation useful in many cases since we have different
> > sources for possible clamp values:
> >
> >  1. task specific
> >  2. cgroup defined
> >  3. system defaults
> >  4. system power default
>
> (I'm not sure I've seen 4 happen yet, what's that?)

Typo: that's the "system s/power/perf/ default", i.e. the
uclamp_default_perf introduced by this patch.

> Anyway, once you get range composition defined; that should be something
> like:
>
>   R_p \Compose_g R_g
>
> Where R_p is the range of task-p, and R_g is the range of the g'th
> cgroup of p (where you can make an identity between the root cgroup and
> the system default).
>
> Now; as per the other email; I think the straightforward composition:
>
>   struct range compose(struct range a, struct range b)
>   {
>           return (range){ .min = clamp(a.min, b.min, b.max),
>                           .max = clamp(a.max, b.min, b.max), };
>   }

This composition is done in uclamp_effective_get(), but it's slightly
different, since we want to support a "nice policy" where tasks can
always ask for less than what they have been assigned.
Thus, from an abstract standpoint, if a task is in a cgroup:

  task.min <= R_g.min
  task.max <= R_g.max

While, for tasks in the root cgroup, the system default applies and we
enforce:

  task.min >= R_0.min
  task.max <= R_0.max

... where the "nice policy" is currently no longer supported, but
perhaps we can/should use the same for system defaults too.

> (note that this is non-commutative, so we have to pay attention to
> get the order 'right')
>
> Works in this case; unlike the cpu/rq composition where we resort to a
> pure max function for non-interference.

Right.

> > I don't think we can avoid somehow back-annotating on which bucket a
> > task has been refcounted... it makes dequeue so much easier, it helps
> > in ensuring that the refcounting is consistent and enables lazy updates.
>
> So I'll have to go over the code again, but I'm wondering why you're
> changing uclamp_se::bucket_id on a runnable task.

We change only the "requested" value, not the "effective" one.

> Ideally you keep bucket_id invariant between enqueue and dequeue; then
> dequeue knows where we put it.

Right, that's what we do for the "effective" value.

> Now I suppose actually determining bucket_id is 'expensive' (it
> certainly is with the whole mapping scheme, but even that integer
> division is not nice), so we'd like to precompute the bucket_id.

Yes, although the complexity is mostly in the composition logic
described above, not in the mapping at all.

We have "mapping" overheads only when we change a "requested" value,
and that happens from slow-paths. The "effective" value is computed at
each enqueue time by composing the precomputed bucket_ids representing
the current "requested" task's, cgroup's and system default's values.

> This then leads to the problem of how to change uclamp_se::value while
> runnable (since per the other thread you don't want to always update all
> runnable tasks).
>
> So far so right?

Yes.

> I'm thinking that if we haz a single bit, say:
>
>   struct uclamp_se {
>           ...
>           unsigned int changed : 1;
>   };
>
> We can update uclamp_se::value and set uclamp_se::changed, and then the
> next enqueue will (unlikely) test-and-clear changed and recompute the
> bucket_id.

This means we would lazily update the "requested" bucket_id by deferring
its computation to enqueue time. That saves us a copy of the bucket_id,
i.e. we will have only the "effective" value updated at enqueue time.

But...

> Would that not be simpler?

... although simpler, it does not fully exploit the slow-path: a
syscall usually runs from a different process context (system
management software). It also fits better with lazy updates; but in the
cgroup case, where we want to enforce an update ASAP for RUNNABLE
tasks, we will still have to do the updates from the slow-path.

Will look better into this simplification while working on v7; perhaps
the linear mapping can really help with that too.

Cheers Patrick

--
#include

Patrick Bellasi