Date: Tue, 5 Jun 2018 16:13:17 +0200
From: Juri Lelli
To: Quentin Perret
Cc: Vincent Guittot, Peter Zijlstra, Ingo Molnar, linux-kernel,
 "Rafael J. Wysocki", Dietmar Eggemann, Morten Rasmussen,
 Viresh Kumar, Valentin Schneider, Claudio Scordino, Luca Abeni
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180605141317.GJ16081@localhost.localdomain>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
 <20180605105721.GA12193@e108498-lin.cambridge.arm.com>
 <20180605121153.GD16081@localhost.localdomain>
 <20180605130548.GB12193@e108498-lin.cambridge.arm.com>
 <20180605131518.GG16081@localhost.localdomain>
 <20180605140101.GE12193@e108498-lin.cambridge.arm.com>
In-Reply-To: <20180605140101.GE12193@e108498-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05/06/18 15:01, Quentin Perret wrote:
> On Tuesday 05 Jun 2018 at 15:15:18 (+0200), Juri Lelli wrote:
> > On 05/06/18 14:05, Quentin Perret wrote:
> > > On Tuesday 05 Jun 2018 at 14:11:53 (+0200), Juri Lelli wrote:
> > > > Hi Quentin,
> > > > 
> > > > On 05/06/18 11:57, Quentin Perret wrote:
> > > > 
> > > > [...]
> > > > 
> > > > > What about the diff below (just a quick hack to show the idea) applied
> > > > > on tip/sched/core ?
> > > > > 
> > > > > ---8<---
> > > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > > index a8ba6d1f262a..23a4fb1c2c25 100644
> > > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > > @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> > > > >  	sg_cpu->util_dl = cpu_util_dl(rq);
> > > > >  }
> > > > >  
> > > > > +unsigned long scale_rt_capacity(int cpu);
> > > > >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > > > >  {
> > > > >  	struct rq *rq = cpu_rq(sg_cpu->cpu);
> > > > > +	int cpu = sg_cpu->cpu;
> > > > > +	unsigned long util, dl_bw;
> > > > >  
> > > > >  	if (rq->rt.rt_nr_running)
> > > > >  		return sg_cpu->max;
> > > > > @@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> > > > >  	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> > > > >  	 * ready for such an interface. So, we only do the latter for now.
> > > > >  	 */
> > > > > -	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > > > > +	util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);
> > > > 
> > > > Sorry to be pedantic, but this (ATM) includes DL avg contribution, so,
> > > > since we use max below, we will probably have the same problem that we
> > > > discussed on Vincent's approach (overestimation of DL contribution while
> > > > we could use running_bw).
> > > 
> > > Ah no, you're right, this isn't great for long running deadline tasks.
> > > We should definitely account for the running_bw here, not the dl avg...
> > > 
> > > I was trying to address the issue of RT stealing time from CFS here, but
> > > the DL integration isn't quite right with this patch as-is, I agree ...
> > > 
> > > > 
> > > > > +	util >>= SCHED_CAPACITY_SHIFT;
> > > > > +	util = arch_scale_cpu_capacity(NULL, cpu) - util;
> > > > > +	util += sg_cpu->util_cfs;
> > > > > +	dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> > > > 
> > > > Why this_bw instead of running_bw?
> > > 
> > > So IIUC, this_bw should basically give you the absolute reservation (== the
> > > sum of runtime/deadline ratios of all DL tasks on that rq).
> > 
> > Yep.
> > 
> > > The reason I added this max is because I'm still not sure I understand
> > > how we can safely drop the freq below that point. If we don't guarantee
> > > to always stay at least at the freq required by DL, aren't we risking
> > > starting a deadline task stuck at a low freq because of rate limiting? In
> > > this case, if that task uses all of its runtime then you might start
> > > missing deadlines ...
> > 
> > We decided to avoid (software) rate limiting for DL with e97a90f7069b
> > ("sched/cpufreq: Rate limits for SCHED_DEADLINE").
> 
> Right, I spotted that one, but yeah you could also be limited by HW ...
> 
> > > My feeling is that the only safe thing to do is to guarantee to never go
> > > below the freq required by DL, and to optimistically add CFS tasks
> > > without raising the OPP if we have good reasons to think that DL is
> > > using less than it required (which is what we should get by using
> > > running_bw above, I suppose). Does that make any sense?
> > 
> > Then we still can't avoid the hardware limits, so using running_bw is a
> > trade-off between safety (especially considering soft real-time
> > scenarios) and energy consumption (which seems to be working in
> > practice).
> 
> Ok, I see ... Have you guys already tried something like my patch above
> (keeping the freq >= this_bw) in real-world use cases? Is this costing
> that much energy in practice? If we fill the gaps left by DL (when it
> doesn't use all the runtime) with CFS tasks, that might not be so bad ...

IIRC, Claudio (now Cc-ed) did experiment a bit with both approaches, so
he might add some numbers to my words above. I didn't (yet). But please
consider that I might be reserving (for example) 50% of bandwidth for my
heavy and time-sensitive task and then have that task wake up only once
in a while (but I'll be keeping the clock speed up for the whole time). :/

> Thank you very much for taking the time to explain all this, I really
> appreciate it :-)

Sure. Thanks for participating in the discussion!

Best,

- Juri
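
For illustration only, here is a minimal, untested user-space sketch of the
trade-off discussed above. It is not the schedutil code from the thread; the
struct, function names and numbers (cpu_signals, aggregate_util, the 50%
reservation) are made up. It only shows how picking this_bw versus running_bw
as the DL floor changes the requested capacity when a large reservation is
mostly idle.

/*
 * Illustrative sketch, not the in-tree schedutil code. It mimics the
 * aggregation discussed above: CFS utilization plus the time "stolen"
 * by RT, never dropping below a DL floor, with the floor taken either
 * from this_bw (the full reservation) or running_bw (bandwidth of the
 * DL tasks that are currently active). Names and numbers are made up.
 */
#include <stdio.h>

#define SCALE 1024UL	/* stand-in for SCHED_CAPACITY_SCALE */

struct cpu_signals {
	unsigned long max;		/* CPU capacity */
	unsigned long util_cfs;		/* CFS (PELT) utilization */
	unsigned long util_rt;		/* capacity consumed by RT, as scale_rt_capacity() would report */
	unsigned long dl_this_bw;	/* sum of admitted DL runtime/deadline ratios */
	unsigned long dl_running_bw;	/* bandwidth of currently active DL tasks */
};

/* Return the capacity to request, clamped to the CPU's maximum. */
static unsigned long aggregate_util(const struct cpu_signals *s, int use_running_bw)
{
	unsigned long dl_floor = use_running_bw ? s->dl_running_bw : s->dl_this_bw;
	unsigned long util = s->util_cfs + s->util_rt;

	if (util < dl_floor)		/* never request less than DL needs */
		util = dl_floor;

	return util < s->max ? util : s->max;
}

int main(void)
{
	/* A 50% DL reservation whose task is mostly idle right now. */
	struct cpu_signals s = {
		.max = SCALE, .util_cfs = 100, .util_rt = 50,
		.dl_this_bw = SCALE / 2, .dl_running_bw = 64,
	};

	printf("floor = this_bw:    %lu\n", aggregate_util(&s, 0));	/* 512 */
	printf("floor = running_bw: %lu\n", aggregate_util(&s, 1));	/* 150 */
	return 0;
}

With these made-up numbers, the this_bw floor keeps the request at 512 out of
1024 even while the reservation sits idle, whereas the running_bw floor lets
it drop to the CFS+RT estimate of 150: that is the "clock speed kept up for
the whole time" concern versus the possible deadline misses weighed above.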