Subject: Re: [PATCH v5 00/10] track CPU utilization
To: Quentin Perret
Cc: Juri Lelli, Vincent Guittot, Peter Zijlstra, Ingo Molnar,
 linux-kernel, "Rafael J. Wysocki", Dietmar Eggemann, Morten Rasmussen,
 viresh kumar, Valentin Schneider, Luca Abeni
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
 <20180605105721.GA12193@e108498-lin.cambridge.arm.com>
 <20180605121153.GD16081@localhost.localdomain>
 <20180605130548.GB12193@e108498-lin.cambridge.arm.com>
 <20180605131518.GG16081@localhost.localdomain>
 <20180605140101.GE12193@e108498-lin.cambridge.arm.com>
 <20180605141317.GJ16081@localhost.localdomain>
From: Claudio Scordino
Message-ID: <6c2dc1aa-3e19-be14-0ed8-b29003c72e61@evidence.eu.com>
Date: Wed, 6 Jun 2018 15:05:58 +0200
In-Reply-To: <20180605141317.GJ16081@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Quentin,

On 05/06/2018 16:13, Juri Lelli wrote:
> On 05/06/18 15:01, Quentin Perret wrote:
>> On Tuesday 05 Jun 2018 at 15:15:18 (+0200), Juri Lelli wrote:
>>> On 05/06/18 14:05, Quentin Perret wrote:
>>>> On Tuesday 05 Jun 2018 at 14:11:53 (+0200), Juri Lelli wrote:
>>>>> Hi Quentin,
>>>>>
>>>>> On 05/06/18 11:57, Quentin Perret wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>> What about the diff below (just a quick hack to show the idea) applied
>>>>>> on tip/sched/core ?
>>>>>>
>>>>>> ---8<---
>>>>>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>>>>>> index a8ba6d1f262a..23a4fb1c2c25 100644
>>>>>> --- a/kernel/sched/cpufreq_schedutil.c
>>>>>> +++ b/kernel/sched/cpufreq_schedutil.c
>>>>>> @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
>>>>>>      sg_cpu->util_dl = cpu_util_dl(rq);
>>>>>>  }
>>>>>>
>>>>>> +unsigned long scale_rt_capacity(int cpu);
>>>>>>  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>>>>>>  {
>>>>>>      struct rq *rq = cpu_rq(sg_cpu->cpu);
>>>>>> +    int cpu = sg_cpu->cpu;
>>>>>> +    unsigned long util, dl_bw;
>>>>>>
>>>>>>      if (rq->rt.rt_nr_running)
>>>>>>          return sg_cpu->max;
>>>>>> @@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
>>>>>>       * util_cfs + util_dl as requested freq. However, cpufreq is not yet
>>>>>>       * ready for such an interface. So, we only do the latter for now.
>>>>>>       */
>>>>>> -    return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
>>>>>> +    util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);
>>>>>
>>>>> Sorry to be pedantic, but this (ATM) includes DL avg contribution, so,
>>>>> since we use max below, we will probably have the same problem that we
>>>>> discussed on Vincent's approach (overestimation of DL contribution while
>>>>> we could use running_bw).
>>>>
>>>> Ah no, you're right, this isn't great for long running deadline tasks.
>>>> We should definitely account for the running_bw here, not the dl avg...
>>>>
>>>> I was trying to address the issue of RT stealing time from CFS here, but
>>>> the DL integration isn't quite right with this patch as-is, I agree ...
>>>>
>>>>>
>>>>>> +    util >>= SCHED_CAPACITY_SHIFT;
>>>>>> +    util = arch_scale_cpu_capacity(NULL, cpu) - util;
>>>>>> +    util += sg_cpu->util_cfs;
>>>>>> +    dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
>>>>>
>>>>> Why this_bw instead of running_bw?
>>>>
>>>> So IIUC, this_bw should basically give you the absolute reservation (== the
>>>> sum of runtime/deadline ratios of all DL tasks on that rq).
>>>
>>> Yep.
>>>
>>>> The reason I added this max is because I'm still not sure to understand
>>>> how we can safely drop the freq below that point ? If we don't guarantee
>>>> to always stay at least at the freq required by DL, aren't we risking to
>>>> start a deadline task stuck at a low freq because of rate limiting ? In
>>>> this case, if that task uses all of its runtime then you might start
>>>> missing deadlines ...
>>>
>>> We decided to avoid (software) rate limiting for DL with e97a90f7069b
>>> ("sched/cpufreq: Rate limits for SCHED_DEADLINE").
>>
>> Right, I spotted that one, but yeah you could also be limited by HW ...
>>
>>>
>>>> My feeling is that the only safe thing to do is to guarantee to never go
>>>> below the freq required by DL, and to optimistically add CFS tasks
>>>> without raising the OPP if we have good reasons to think that DL is
>>>> using less than it required (which is what we should get by using
>>>> running_bw above I suppose). Does that make any sense ?
>>>
>>> Then we still can't avoid the hardware limits, so using running_bw is a
>>> trade off between safety (especially considering soft real-time
>>> scenarios) and energy consumption (which seems to be working in
>>> practice).
>>
>> Ok, I see ... Have you guys already tried something like my patch above
>> (keeping the freq >= this_bw) in real world use cases ? Is this costing
>> that much energy in practice ? If we fill the gaps left by DL (when it
>
> IIRC, Claudio (now Cc-ed) did experiment a bit with both approaches, so
> he might add some numbers to my words above. I didn't (yet). But, please
> consider that I might be reserving (for example) 50% of bandwidth for my
> heavy and time sensitive task and then have that task wake up only once
> in a while (but I'll be keeping clock speed up for the whole time).
> :/

As far as I can remember, we never compared the energy consumption of
running_bw and this_bw, as at OSPM'17 we had already decided to use
running_bw, implementing GRUB-PA.

The rationale is that, as Juri pointed out, the amount of spare (i.e.
reclaimable) bandwidth in this_bw is very user-dependent. For example, the
user can set this_bw much higher than the measured bandwidth, just to be
sure that deadlines are met even in corner cases. In practice, this means
that the task executes for quite a short time and then blocks (with its
bandwidth reclaimed, and hence the CPU frequency reduced, at the 0-lag
time). With this_bw rather than running_bw, the CPU frequency would stay
at the same fixed value even while the task is blocked.

I understand that in some cases this could even be better (i.e. no energy
wasted on frequency switches). However, IMHO, these are corner cases, and
in the average case it is better to rely on running_bw and reduce the CPU
frequency accordingly.

Best regards,

        Claudio
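
To make the this_bw vs running_bw difference concrete, here is a toy user-space sketch in C. It is not kernel code: struct toy_dl_task, dl_ratio(), bw_to_capacity(), request_this_bw() and request_running_bw() are hypothetical names made up for illustration. BW_SHIFT = 20 and SCHED_CAPACITY_SHIFT = 10 are the values I believe the kernel used at the time, so treat them as assumptions. The conversion to capacity units follows the dl_bw line in the quoted diff, and the "active" flag is a simplification of the 0-lag accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed-point constants (believed to match the kernel of that period). */
#define BW_SHIFT             20
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/* Hypothetical toy model of one SCHED_DEADLINE task. */
struct toy_dl_task {
    uint64_t bw;     /* reserved bandwidth, runtime/period in BW_SHIFT fixed point */
    int      active; /* 0 once the task has blocked and its 0-lag time has passed */
};

/* runtime/period expressed as a BW_SHIFT fixed-point ratio. */
static uint64_t dl_ratio(uint64_t runtime_ns, uint64_t period_ns)
{
    assert(period_ns > 0);
    return (runtime_ns << BW_SHIFT) / period_ns;
}

/* Same conversion as the dl_bw line in the quoted diff: bandwidth -> capacity. */
static unsigned long bw_to_capacity(uint64_t bw)
{
    return (unsigned long)((bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT);
}

/* this_bw view: every task on the runqueue counts, blocked or not. */
static unsigned long request_this_bw(const struct toy_dl_task *tasks, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += tasks[i].bw;
    return bw_to_capacity(sum);
}

/* running_bw view: only tasks that are still active count. */
static unsigned long request_running_bw(const struct toy_dl_task *tasks, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].active)
            sum += tasks[i].bw;
    return bw_to_capacity(sum);
}
```

With a single task reserving 5 ms every 10 ms, request_this_bw() asks for 512 out of 1024 whether or not the task is running, while request_running_bw() drops to 0 once the task is marked inactive. That gap is the reclaimable bandwidth discussed above.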