From: Vincent Guittot
Date: Fri, 18 Oct 2019 17:20:55 +0200
Subject: Re: [RFC PATCH v3 0/6] sched/cpufreq: Make schedutil energy aware
In-Reply-To: <32d07c51-847d-9d51-480c-c8836f1aedc7@arm.com>
References: <20191011134500.235736-1-douglas.raillard@arm.com>
    <20191014145315.GZ2311@hirez.programming.kicks-ass.net>
    <20191017095015.GI2311@hirez.programming.kicks-ass.net>
    <7edb1b73-54e7-5729-db5d-6b3b1b616064@arm.com>
    <20191017190708.GF22902@worktop.programming.kicks-ass.net>
    <0b807cb3-6a88-1138-dc66-9a32d9bba7ea@arm.com>
    <20191018120719.GH2328@hirez.programming.kicks-ass.net>
    <32d07c51-847d-9d51-480c-c8836f1aedc7@arm.com>
To: Douglas Raillard
Cc: Peter Zijlstra, linux-kernel, "open list:THERMAL", Ingo Molnar, "Rafael J.
    Wysocki", viresh kumar, Juri Lelli, Dietmar Eggemann, Quentin Perret, Patrick Bellasi, dh.han@samsung.com
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 18 Oct 2019 at 16:44, Douglas Raillard wrote:
>
> On 10/18/19 1:07 PM, Peter Zijlstra wrote:
> > On Fri, Oct 18, 2019 at 12:46:25PM +0100, Douglas Raillard wrote:
> >
> >>> What I don't see is how that difference makes sense as input to:
> >>>
> >>>    cost(x) : (1 + x) * cost_j
> >>
> >> The actual input is:
> >> x = (EM_COST_MARGIN_SCALE/SCHED_CAPACITY_SCALE) * (util - util_est)
> >>
> >> Since EM_COST_MARGIN_SCALE == SCHED_CAPACITY_SCALE == 1024, this factor of 1
> >> is not directly reflected in the code but is important for units
> >> consistency.
> >
> > But completely irrelevant for the actual math and conceptual
> > understanding.
>
> > how that difference makes sense as input to
> I was unsure if you referred to the units being inconsistent or the
> actual way of computing values being strange, so I provided some
> justification for both.
>
> > Just because computers suck at real numbers, and floats
> > are expensive, doesn't mean we have to burden ourselves with fixed point
> > when writing equations.
> >
> > Also, as a physicist I'm prone to normalizing everything to 1, because
> > that's lazy.
> >
> >>> I suppose that limits the additional OPP to twice the previously
> >>> selected cost / efficiency (see the confusion from that other email).
> >>> But given that efficiency drops (or costs rise) for higher OPPs that
> >>> still doesn't really make sense..
> >
> >> Yes, this current limit to +100% freq boosting is somehow arbitrary and
> >> could probably benefit from being tunable in some way (Kconfig option
> >> maybe). When (margin > 0), we end up selecting an OPP that has a higher cost
> >> than the one strictly required, which is expected.
> >> The goal is to speed things up at the expense of more power consumed to
> >> achieve the same work, hence at a lower efficiency (== higher cost).
> >
> > No, no Kconfig knobs.
> >
> >> That's the main reason why this boosting applies a margin on the cost of the
> >> selected OPP rather than just inflating the util. This allows controlling
> >> directly how much more power (battery life) we are going to spend to achieve
> >> some work that we know could be achieved with less power.
> >
> > But you're not; the margin is relative to the OPP, it is not absolute.
>
> Considering a CPU with 1024 max capacity (since we are not talking about
> migrations here, we can ignore CPU invariance):
>
> work = normalized number of iterations of a given busy loop
> # Thanks to freq invariance
> work = util (between 0 and 1)
> util = f/f_max
>
> # f(work) is the min freq that is admissible for "work", which we will
> # abbreviate as "f"
> f(work) = work * f_max
>
> # from struct em_cap_state doc in energy_model.h
> cost(f) = power(f) * f_max / f
> cost(f) = power(f) / util
> cost(f) = power(f) / work
> power(f) = cost(f) * work
>
> boosted_cost(f) = cost(f) + x
> boosted_power(f) = boosted_cost(f) * work
> boosted_power(f) = (cost(f) + x) * work
>
> # Let's normalize cost() so we can forget about f and deal only with work.
> cost'(work) = cost(f)/cost(f_max)
> x' = x/cost(f_max)
> boosted_power'(work) = (cost'(work) + x') * work
> boosted_power'(work) = cost'(work) * work + x' * work
> boosted_power'(work) = power'(work) + x' * work
> boosted_power'(work) = power'(work) + A(work)
>
> # Over a duration T, spend an extra B unit of energy
> B(work) = A(work) * T
> lost_battery_percent(work) = 100 * B(work)/total_battery_energy
> lost_battery_percent(work) = 100 * T * x' * work /total_battery_energy
> lost_battery_percent(work) =
>     (100 * T / cost(f_max) / total_battery_energy) * x * work
>
> This means that the effect of boosting on battery life is proportional
> to "x" unless I made a mistake somewhere.

Because the boost is relative to cost(f) and cost is not linear in the
frequency, I don't think that it is a linear relation.

>
> >
> > Or rather, the only actual limit is in relation to the max OPP. So you
> > have very little actual control over how much more energy you're
> > spending.
> >
> >>> So while I agree that 2) is a reasonable signal to work from, everything
> >>> that comes after is still much confusing me.
> >
> >> "When applying these boosting rules on the runqueue util signals ...":
> >> Assuming the set of enqueued tasks stays the same between 2 observations
> >> from schedutil, if we see the rq util_avg increase above its
> >> util_est.enqueued, that means that at least one task had its util_avg go
> >> above util_est.enqueued. We might miss some boosting opportunities if some
> >> (util - util_est) compensates:
> >> TASK_1(util - util_est) = - TASK_2(util - util_est)
> >> but working on the aggregated value is much easier in schedutil, to avoid
> >> crawling the list of entities.
> >
> > That still does not explain why 'util - util_est', when >0, makes for a
> > sensible input into an OPP relative function. I agree that
> > 'util - util_est', when >0, indicates utilization is
> > increasing (for the aperiodic blah blah blah). But after that I'm still
> > confused.
>
> For the same reason PELT makes a sensible input for OPP selection.
> Currently, OPP selection is based on max(util_avg, util_est.enqueued)
> (from cpu_util_cfs in sched.h), so as soon as we have
> (util - util_est > 0), the OPP will be selected according to util_avg.
> In a way, using util_avg there is already some kind of boosting.
>
> Since the boosting is essentially (util - constant), it grows the same
> way as util. If we think of (util - util_est) as being some estimation
> of how wrong we were in the estimation of the task "true" utilization of
> the CPU, then it makes sense to feed that to the boost. The wronger we
> were, the more we want to boost, because the more time passes, the more
> the scheduler realizes it actually does not know what the task needs. In
> doubt, provide a higher freq than usual until we get to know this task
> better. When that happens (at the next period), boosting is disabled and
> we revert to the usual behavior (aka margin=0).
>
> Hope we are converging to some wording that makes sense.
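
To illustrate the linearity question above, here is a minimal Python sketch. It compares the absolute margin used in the derivation (boosted_cost = cost + x) with a margin relative to the selected OPP ((1 + x) * cost_j). The OPP table and the helper names are hypothetical, made up for illustration only; they do not come from the patch series or from any real platform. The work values are chosen to land exactly on an OPP, so that power(f) = cost(f) * work holds.

```python
# Hypothetical OPP table: (frequency in MHz, power in mW). Values are made up.
OPPS = [(500, 100.0), (1000, 250.0), (1500, 500.0), (2000, 1000.0)]
F_MAX = OPPS[-1][0]


def cost(freq, power):
    """cost(f) = power(f) * f_max / f, the formula quoted in the thread."""
    return power * F_MAX / freq


def min_opp(work):
    """Lowest OPP with f >= work * f_max, for work in [0, 1]."""
    for freq, power in OPPS:
        if freq >= work * F_MAX:
            return freq, power
    return OPPS[-1]


def extra_power_absolute(work, x):
    """Extra power when boosted_cost = cost + x: (cost + x) * work - cost * work."""
    return x * work


def extra_power_relative(work, x):
    """Extra power when boosted_cost = (1 + x) * cost: x * cost * work."""
    freq, power = min_opp(work)
    return x * cost(freq, power) * work


if __name__ == "__main__":
    for work in (0.25, 0.5, 0.75, 1.0):
        abs_extra = extra_power_absolute(work, 100.0)
        rel_extra = extra_power_relative(work, 0.5)
        # Extra power per unit of work: constant for the absolute margin,
        # dependent on cost(f) for the relative margin.
        print(work, abs_extra / work, rel_extra / work)
```

With these made-up numbers, the absolute margin adds the same extra power per unit of work at every OPP, while the relative margin adds more per unit of work as cost(f) grows. Whether that matches the behaviour of the actual patch depends on how the margin is applied during OPP selection, which this sketch does not model.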