From: Vincent Guittot
Date: Wed, 6 Jun 2018 11:59:04 +0200
Subject: Re: [PATCH v5 00/10] track CPU utilization
To: Quentin Perret
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, "Rafael J. Wysocki", Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar, Valentin Schneider
In-Reply-To: <20180606094409.GA10870@e108498-lin.cambridge.arm.com>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org> <20180604165047.GU12180@hirez.programming.kicks-ass.net> <20180605141809.GV12180@hirez.programming.kicks-ass.net> <20180606094409.GA10870@e108498-lin.cambridge.arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 6 June 2018 at 11:44, Quentin Perret wrote:
> On Tuesday 05 Jun 2018 at 16:18:09 (+0200), Peter Zijlstra wrote:
>> On Mon, Jun 04, 2018 at 08:08:58PM +0200, Vincent Guittot wrote:
>> > On 4 June 2018 at 18:50, Peter Zijlstra wrote:
>>
>> > > So this patch-set tracks the !cfs occupation using the same function,
>> > > which is all good. But what, if instead of using that to compensate the
>> > > OPP selection, we employ that to renormalize the util signal?
>> > >
>> > > If we normalize util against the dynamic (rt_avg affected) cpu_capacity,
>> > > then I think your initial problem goes away.
>> > > Because while the RT task
>> > > will push the util to .5, it will at the same time push the CPU capacity
>> > > to .5, and renormalized that gives 1.
>> > >
>> > > NOTE: the renorm would then become something like:
>> > > scale_cpu = arch_scale_cpu_capacity() / rt_frac();
>>
>> Should probably be:
>>
>> scale_cpu = arch_scale_cpu_capacity() / (1 - rt_frac())
>>
>> > >
>> > >
>> > > On IRC I mentioned stopping the CFS clock when preempted, and while that
>> > > would result in fixed numbers, Vincent was right in pointing out the
>> > > numbers will be difficult to interpret, since the meaning will be purely
>> > > CPU local and I'm not sure you can actually fix it again with
>> > > normalization.
>> > >
>> > > Imagine, running a .3 RT task, that would push the (always running) CFS
>> > > down to .7, but because we discard all !cfs time, it actually has 1. If
>> > > we try and normalize that we'll end up with ~1.43, which is of course
>> > > completely broken.
>> > >
>> > >
>> > > _However_, all that happens for util, also happens for load. So the above
>> > > scenario will also make the CPU appear less loaded than it actually is.
>> >
>> > The load will continue to increase because we track runnable state and
>> > not running for the load
>>
>> Duh yes. So renormalizing it once, like proposed for util would actually
>> do the right thing there too. Would not that allow us to get rid of
>> much of the capacity magic in the load balance code?
>>
>> /me thinks more..
>>
>> Bah, no.. because you don't want this dynamic renormalization part of
>> the sums. So you want to keep it after the fact. :/
>>
>> > As you mentioned, scale_rt_capacity gives the remaining capacity for
>> > cfs and it will behave like cfs util_avg now that it uses PELT.
>> > So as long as cfs util_avg < scale_rt_capacity (we probably need a margin)
>> > we keep using dl bandwidth + cfs util_avg + rt util_avg for selecting
>> > OPP because we have remaining spare capacity but if cfs util_avg ==
>> > scale_rt_capacity, we make sure to use max OPP.
>>
>> Good point, when cfs-util < cfs-cap then there is idle time and the util
>> number is 'right', when cfs-util == cfs-cap we're overcommitted and
>> should go max.
>>
>> Since the util and cap values are aligned that should track nicely.
>
> So Vincent proposed to have a margin between cfs util and cfs cap to be
> sure there is a little bit of idle time. This is _exactly_ what the
> overutilized flag in EAS does. That would actually make a lot of sense
> to use that flag in schedutil. The idea is basically to say, if there
> isn't enough idle time on all CPUs, the util signals are kinda wrong, so
> let's not make any decisions (task placement or OPP selection) based on
> that. If overutilized, go to max freq. Does that make sense ?

Yes, it's similar to the overutilized flag, except that:
- this is done per CPU, whereas overutilization is for the whole system
- the test is done at every freq update, not only during some cfs events,
  and it uses the latest up-to-date value rather than a periodically
  updated snapshot of the value
- this is also done without EAS

Then, for the margin, it has to be discussed whether it is really needed
or not.

>
> Thanks,
> Quentin
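
For reference, here is a rough userspace C sketch of the two ideas discussed in this thread: the util renormalization against the capacity left by RT, and the per-CPU "go to max OPP when cfs has no idle time left" test. It is only an illustration under stated assumptions, not kernel code: struct cpu_util, renorm_util(), opp_util() and the margin_pct parameter are all hypothetical names, and only the 1024 fixed-point scale matches the kernel's SCHED_CAPACITY_SCALE.

```c
#include <assert.h>

/* Same fixed-point scale as the kernel's SCHED_CAPACITY_SCALE (1 << 10). */
#define CAPACITY_SCALE 1024UL

/* Hypothetical per-CPU snapshot; field names are illustrative only. */
struct cpu_util {
	unsigned long cfs;     /* cfs util_avg */
	unsigned long rt;      /* rt util_avg */
	unsigned long dl_bw;   /* deadline bandwidth */
	unsigned long cfs_cap; /* capacity left for cfs (what scale_rt_capacity reports) */
};

/*
 * Renormalize util against the capacity left by RT:
 * util / (1 - rt_frac), with both values scaled by CAPACITY_SCALE.
 */
static unsigned long renorm_util(unsigned long util, unsigned long rt_frac)
{
	assert(rt_frac < CAPACITY_SCALE); /* avoid dividing by zero */
	return util * CAPACITY_SCALE / (CAPACITY_SCALE - rt_frac);
}

/*
 * Utilization to feed into OPP selection for one CPU.
 * margin_pct is an assumed headroom; 0 gives the plain
 * "cfs util == cfs cap" test.
 */
static unsigned long opp_util(const struct cpu_util *u, unsigned int margin_pct)
{
	unsigned long sum;

	assert(margin_pct < 100);

	/* cfs has (almost) no idle time left: util is unreliable, go max. */
	if (u->cfs * 100 >= u->cfs_cap * (100 - margin_pct))
		return CAPACITY_SCALE;

	/* Spare capacity remains: the summed signals are meaningful. */
	sum = u->dl_bw + u->cfs + u->rt;
	return sum < CAPACITY_SCALE ? sum : CAPACITY_SCALE;
}
```

With a .5 RT task and a cfs util of .5, renorm_util(512, 512) gives the full scale, which is the "renormalized that gives 1" case. Feeding it a util of 1024 (all !cfs time discarded) with a .3 RT fraction overshoots the scale, which is the ~1.43 problem. In opp_util() the test is per CPU and uses whatever values are passed in at that moment, so it does not depend on EAS or on a system-wide flag.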