Date: Wed, 13 Mar 2019 15:59:54 +0000
From: Patrick Bellasi
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
    linux-api@vger.kernel.org, Ingo Molnar, Tejun Heo, Rafael J. Wysocki,
    Vincent Guittot, Viresh Kumar, Paul Turner, Quentin Perret,
    Dietmar Eggemann, Morten Rasmussen, Juri Lelli, Todd Kjos,
    Joel Fernandes, Steve Muckle, Suren Baghdasaryan
Subject: Re: [PATCH v7 01/15] sched/core: uclamp: Add CPU's clamp buckets refcounting
Message-ID: <20190313155954.jse2tyn5iqxm6wle@e110439-lin>
References: <20190208100554.32196-1-patrick.bellasi@arm.com>
 <20190208100554.32196-2-patrick.bellasi@arm.com>
 <20190313135238.GC5922@hirez.programming.kicks-ass.net>
In-Reply-To: <20190313135238.GC5922@hirez.programming.kicks-ass.net>

On 13-Mar 14:52, Peter Zijlstra wrote:
> On Fri, Feb 08, 2019 at 10:05:40AM +0000, Patrick Bellasi wrote:
> > +/*
> > + * When a task is enqueued on a rq, the clamp bucket currently defined by the
> > + * task's uclamp::bucket_id is reference counted on that rq. This also
> > + * immediately updates the rq's clamp value if required.
> > + *
> > + * Since tasks know their specific value requested from user-space, we track
> > + * within each bucket the maximum value for tasks refcounted in that bucket.
> > + * This provide a further aggregation (local clamping) which allows to track
> > + * within each bucket the exact "requested" clamp value whenever all tasks
> > + * RUNNABLE in that bucket require the same clamp.
> > + */
> > +static inline void uclamp_rq_inc_id(struct task_struct *p, struct rq *rq,
> > +				    unsigned int clamp_id)
> > +{
> > +	unsigned int bucket_id = p->uclamp[clamp_id].bucket_id;
> > +	unsigned int rq_clamp, bkt_clamp, tsk_clamp;
> > +
> > +	rq->uclamp[clamp_id].bucket[bucket_id].tasks++;
> > +
> > +	/*
> > +	 * Local clamping: rq's buckets always track the max "requested"
> > +	 * clamp value from all RUNNABLE tasks in that bucket.
> > +	 */
> > +	tsk_clamp = p->uclamp[clamp_id].value;
> > +	bkt_clamp = rq->uclamp[clamp_id].bucket[bucket_id].value;
> > +	rq->uclamp[clamp_id].bucket[bucket_id].value = max(bkt_clamp, tsk_clamp);
> 
> So, if I read this correct:
> 
>  - here we track a max value in a bucket,
> 
> > +	rq_clamp = READ_ONCE(rq->uclamp[clamp_id].value);
> > +	WRITE_ONCE(rq->uclamp[clamp_id].value, max(rq_clamp, tsk_clamp));
> > +}
> > +
> > +/*
> > + * When a task is dequeued from a rq, the clamp bucket reference counted by
> > + * the task is released. If this is the last task reference counting the rq's
> > + * max active clamp value, then the rq's clamp value is updated.
> > + * Both the tasks reference counter and the rq's cached clamp values are
> > + * expected to be always valid, if we detect they are not we skip the updates,
> > + * enforce a consistent state and warn.
> > + */
> > +static inline void uclamp_rq_dec_id(struct task_struct *p, struct rq *rq,
> > +				    unsigned int clamp_id)
> > +{
> > +	unsigned int bucket_id = p->uclamp[clamp_id].bucket_id;
> > +	unsigned int rq_clamp, bkt_clamp;
> > +
> > +	SCHED_WARN_ON(!rq->uclamp[clamp_id].bucket[bucket_id].tasks);
> > +	if (likely(rq->uclamp[clamp_id].bucket[bucket_id].tasks))
> > +		rq->uclamp[clamp_id].bucket[bucket_id].tasks--;
> > +
> > +	/*
> > +	 * Keep "local clamping" simple and accept to (possibly) overboost
> > +	 * still RUNNABLE tasks in the same bucket.
> > +	 */
> > +	if (likely(rq->uclamp[clamp_id].bucket[bucket_id].tasks))
> > +		return;
> 
> (Oh man, I hope that generates semi sane code; long live CSE passes I
> suppose)

What do you mean?

> But we never decrement that bkt_clamp value on dequeue.

We decrement the bkt_clamp value only when the bucket becomes empty,
and thus we pass the condition above. That's what the comment above is
there to call out.
> > +	bkt_clamp = rq->uclamp[clamp_id].bucket[bucket_id].value;
> > +
> > +	/* The rq's clamp value is expected to always track the max */
> > +	rq_clamp = READ_ONCE(rq->uclamp[clamp_id].value);
> > +	SCHED_WARN_ON(bkt_clamp > rq_clamp);
> > +	if (bkt_clamp >= rq_clamp) {
> 
> head hurts, this reads ==, how can this ever not be so?

Never, given the current code; that's just defensive programming.

If in the future the accounting should be accidentally broken by some
refactoring, we warn and fix the corrupted data structures at the first
chance. Is that so bad?

> > +		/*
> > +		 * Reset rq's clamp bucket value to its nominal value whenever
> > +		 * there are anymore RUNNABLE tasks refcounting it.
> 
> -ENOPARSE

That's related to the comment above, where you say we never decrement
the bkt_clamp.

Because of bucketization, we potentially end up tracking tasks with
different requested clamp values in the same bucket. For example, with
a 20% bucket size, we can have:

  Task1: util_min=25%
  Task2: util_min=35%

accounted in the same bucket. This ensures that while they are both
RUNNABLE we boost to 35%. If Task1 should run longer than Task2, Task1
will be "overboosted" until it is dequeued too. The bucket value will
be reset to 20% (its nominal value) once both tasks are idle.

> > +		 */
> > +		rq->uclamp[clamp_id].bucket[bucket_id].value =
> > +			uclamp_bucket_value(rq_clamp);
> 
> But basically you decrement the bucket value to the nominal value.

Yes, at this point we know there are no more tasks in this bucket and
we reset its value.

> > +		uclamp_rq_update(rq, clamp_id);
> > +	}
> > +}
> 
> Given all that, what is to stop the bucket value to climbing to
> uclamp_bucket_value(+1)-1 and staying there (provided there's someone
> runnable)?

Nothing... but that's an expected consequence of bucketization.

> Why are we doing this... ?
You can either decide to:

 a) always boost tasks to just the bucket nominal value,
    thus always penalizing both Task1 and Task2 of the example above

 b) always boost tasks to the bucket "max" value,
    thus always overboosting both Task1 and Task2 of the example above

The solution above instead has a very good property: in systems where
you have only a few well-defined clamp values, we always provide the
exact boost. For example, if your system requires only 23% and 47%
boost values (totally random numbers), then you can always get the
exact boost required using just 3 buckets of ~33% size each.

In systems where you don't know in advance which boost values you will
have, you can still define the maximum overboost granularity you accept
for each task by just tuning the number of clamp buckets. For example,
with 20 buckets you can have a 5% max overboost.

-- 
#include 

Patrick Bellasi