Date: Tue, 5 Jun 2018 16:38:26 +0100
From: Patrick Bellasi
To: Peter Zijlstra
Cc: Vincent Guittot, Ingo Molnar, linux-kernel, "Rafael J. Wysocki",
 Juri Lelli, Dietmar Eggemann, Morten Rasmussen, Viresh Kumar,
 Valentin Schneider, Quentin Perret
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180605153826.GE32302@e110439-lin>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
 <20180604165047.GU12180@hirez.programming.kicks-ass.net>
 <20180605141809.GV12180@hirez.programming.kicks-ass.net>
In-Reply-To: <20180605141809.GV12180@hirez.programming.kicks-ass.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05-Jun 16:18, Peter Zijlstra wrote:
> On Mon, Jun 04, 2018 at 08:08:58PM +0200, Vincent Guittot wrote:
> > On 4 June 2018 at 18:50, Peter Zijlstra wrote:
> > >
> > > So this patch-set tracks the !cfs occupation using the same function,
> > > which is all good. But what if, instead of using that to compensate
> > > the OPP selection, we employ that to renormalize the util signal?
> > >
> > > If we normalize util against the dynamic (rt_avg affected)
> > > cpu_capacity, then I think your initial problem goes away. Because
> > > while the RT task will push the util to .5, it will at the same time
> > > push the CPU capacity to .5, and renormalized that gives 1.

And would not that mean also that a 50% task co-scheduled with the same
50% RT task will be reported as a 100% util_avg task?
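To make the renormalization (and the ambiguity it raises) concrete, here is a small sketch in plain Python with values in [0, 1] rather than kernel fixed-point; `renorm_util` and `rt_frac` are illustrative names, not kernel symbols:

```python
# Illustrative sketch, not kernel code: renormalize the CFS util signal
# against the capacity left over after !CFS (e.g. RT) activity.

def renorm_util(cfs_util, rt_frac):
    """Scale cfs_util by the capacity remaining once rt_frac is removed."""
    remaining = 1.0 - rt_frac      # dynamic (rt_avg affected) cpu_capacity
    if remaining <= 0.0:
        return 1.0                 # no capacity left for CFS: clamp to max
    return min(cfs_util / remaining, 1.0)

# An always-running CFS task preempted by a 50% RT task settles at util
# 0.5; renormalized it reads 1.0 -- but so does a genuinely 50% CFS task
# co-scheduled with the same RT task, which is exactly the ambiguity
# raised above: both cases renormalize to the same value.
print(renorm_util(0.5, 0.5))
```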
> > > NOTE: the renorm would then become something like:
> > >
> > >   scale_cpu = arch_scale_cpu_capacity() / rt_frac();
>
> Should probably be:
>
>   scale_cpu = arch_scale_cpu_capacity() / (1 - rt_frac())
>
> > > On IRC I mentioned stopping the CFS clock when preempted, and while
> > > that would result in fixed numbers, Vincent was right in pointing out
> > > the numbers will be difficult to interpret, since the meaning will be
> > > purely CPU local and I'm not sure you can actually fix it again with
> > > normalization.
> > >
> > > Imagine running a .3 RT task: that would push the (always running)
> > > CFS task down to .7, but because we discard all !cfs time, it
> > > actually has 1. If we try and normalize that we'll end up with ~1.43,
> > > which is of course completely broken.
> > >
> > > _However_, all that happens for util also happens for load. So the
> > > above scenario will also make the CPU appear less loaded than it
> > > actually is.
> >
> > The load will continue to increase because we track runnable state and
> > not running for the load.
>
> Duh, yes. So renormalizing it once, like proposed for util, would
> actually do the right thing there too. Would not that allow us to get
> rid of much of the capacity magic in the load balance code?
>
> /me thinks more..
>
> Bah, no.. because you don't want this dynamic renormalization part of
> the sums. So you want to keep it after the fact. :/
>
> > As you mentioned, scale_rt_capacity gives the remaining capacity for
> > cfs, and it will behave like cfs util_avg now that it uses PELT. So as
> > long as cfs util_avg < scale_rt_capacity (we probably need a margin),
> > we keep using dl bandwidth + cfs util_avg + rt util_avg for selecting
> > the OPP, because we have remaining spare capacity; but if cfs util_avg
> > == scale_rt_capacity, we make sure to use the max OPP.

What will happen for the 50% task of the example above?
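For concreteness, the stopped-clock breakage quoted above works out numerically as follows (a sketch with unit-less values in [0, 1], not kernel code):

```python
# Sketch of why normalizing a "stopped clock" CFS signal breaks: with all
# !cfs time discarded, an always-running CFS task reads util 1.0 even
# though a .3 RT task leaves it only .7 of the CPU; dividing by
# (1 - rt_frac) afterwards then overshoots the maximum.
rt_frac = 0.3
stopped_clock_util = 1.0                      # CFS clock frozen while preempted
naive = stopped_clock_util / (1.0 - rt_frac)  # ~1.43, out of range: broken
print(round(naive, 2))
```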
> Good point, when cfs-util < cfs-cap then there is idle time and the
> util number is 'right', when cfs-util == cfs-cap we're overcommitted
> and should go max.

Again I cannot easily read the example above...

Would that mean that a 50% CFS task, preempted by a 50% RT task (which
already set the OPP to max while RUNNABLE), will end up running at the
max OPP too?

> Since the util and cap values are aligned that should track nicely.

True... the only potential issue I see is that we are steering PELT's
behavior towards better driving schedutil for high-demand workloads,
while _maybe_ significantly affecting PELT's capacity to describe how
much CPU a task actually uses.

Ultimately, utilization has always been a metric of "how much you use"...
while here it seems to me we are bending it into something that defines
"how fast you have to run".

-- 
#include <best/regards.h>

Patrick Bellasi
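As a postscript, the OPP-selection rule discussed in the thread (sum the per-class contributions while CFS has spare capacity, go max once it saturates) can be sketched as follows; all names here are illustrative, not the actual schedutil code, and the margin value is an assumption:

```python
# Hedged sketch of the discussed OPP-selection rule, not schedutil itself.
MAX_CAP = 1.0  # normalized stand-in for arch_scale_cpu_capacity()

def pick_freq_util(cfs_util, rt_util, dl_bw, margin=0.05):
    """Return the utilization value used to drive OPP selection."""
    cfs_cap = MAX_CAP - rt_util - dl_bw   # capacity remaining for CFS
    if cfs_util >= cfs_cap - margin:
        return MAX_CAP                    # overcommitted: go to max OPP
    return dl_bw + cfs_util + rt_util     # spare capacity: sum the classes
```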