Date: Wed, 6 Jun 2018 10:44:38 +0100
From: Quentin Perret
To: Peter Zijlstra
Cc: Vincent Guittot, Ingo Molnar, linux-kernel, "Rafael J. Wysocki", Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar, Valentin Schneider
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180606094409.GA10870@e108498-lin.cambridge.arm.com>
In-Reply-To: <20180605141809.GV12180@hirez.programming.kicks-ass.net>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org> <20180604165047.GU12180@hirez.programming.kicks-ass.net> <20180605141809.GV12180@hirez.programming.kicks-ass.net>

On Tuesday 05 Jun 2018 at 16:18:09 (+0200), Peter Zijlstra wrote:
> On Mon, Jun 04, 2018 at 08:08:58PM +0200, Vincent Guittot wrote:
> > On 4 June 2018 at 18:50, Peter Zijlstra wrote:
> >
> > > So this patch-set tracks the !cfs occupation using the same function,
> > > which is all good. But what, if instead of using that to compensate the
> > > OPP selection, we employ that to renormalize the util signal?
> > >
> > > If we normalize util against the dynamic (rt_avg affected) cpu_capacity,
> > > then I think your initial problem goes away. Because while the RT task
> > > will push the util to .5, it will at the same time push the CPU capacity
> > > to .5, and renormalized that gives 1.
> > >
> > > NOTE: the renorm would then become something like:
> > > scale_cpu = arch_scale_cpu_capacity() / rt_frac();
> >
> > Should probably be:
> >
> > scale_cpu = arch_scale_cpu_capacity() / (1 - rt_frac())
> >
> > >
> > >
> > > On IRC I mentioned stopping the CFS clock when preempted, and while that
> > > would result in fixed numbers, Vincent was right in pointing out the
> > > numbers will be difficult to interpret, since the meaning will be purely
> > > CPU local and I'm not sure you can actually fix it again with
> > > normalization.
> > >
> > > Imagine, running a .3 RT task, that would push the (always running) CFS
> > > down to .7, but because we discard all !cfs time, it actually has 1. If
> > > we try and normalize that we'll end up with ~1.43, which is of course
> > > completely broken.
> > >
> > >
> > > _However_, all that happens for util, also happens for load. So the above
> > > scenario will also make the CPU appear less loaded than it actually is.
> >
> > The load will continue to increase because we track runnable state and
> > not running for the load
>
> Duh yes. So renormalizing it once, like proposed for util would actually
> do the right thing there too. Would not that allow us to get rid of
> much of the capacity magic in the load balance code?
>
> /me thinks more..
>
> Bah, no.. because you don't want this dynamic renormalization part of
> the sums. So you want to keep it after the fact. :/
>
> > As you mentioned, scale_rt_capacity give the remaining capacity for
> > cfs and it will behave like cfs util_avg now that it uses PELT. So as
> > long as cfs util_avg < scale_rt_capacity(we probably need a margin)
> > we keep using dl bandwidth + cfs util_avg + rt util_avg for selecting
> > OPP because we have remaining spare capacity but if cfs util_avg ==
> > scale_rt_capacity, we make sure to use max OPP.
>
> Good point, when cfs-util < cfs-cap then there is idle time and the util
> number is 'right', when cfs-util == cfs-cap we're overcommitted and
> should go max.
>
> Since the util and cap values are aligned that should track nicely.

So Vincent proposed having a margin between cfs util and cfs cap to be
sure there is a little bit of idle time. This is _exactly_ what the
overutilized flag in EAS does. It would actually make a lot of sense
to use that flag in schedutil. The idea is basically to say: if there
isn't enough idle time on all CPUs, the util signals are kinda wrong, so
let's not make any decisions (task placement or OPP selection) based on
them.
If overutilized, go to max freq. Does that make sense?

Thanks,
Quentin
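The renormalization arithmetic and the margin rule discussed in this thread can be sketched in C. This is a minimal, hypothetical sketch: the 1024 fixed-point scale mirrors the scheduler's capacity scale, but renorm_util(), overutilized(), select_util() and the ~20% margin (MARGIN = 1280, i.e. a 1280/1024 ratio) are illustrative assumptions, not kernel code, and cfs_cap is only a stand-in for scale_rt_capacity().

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of the arithmetic discussed in this thread.
 * SCALE mirrors the 1024 fixed-point capacity scale; all helper names
 * and the margin value are illustrative assumptions, not kernel code.
 */
#define SCALE 1024UL
/* Hypothetical ~20% headroom: util must stay below cap * 1024 / 1280. */
#define MARGIN 1280UL

/*
 * Renormalize CFS util against the capacity RT leaves behind:
 * util / (1 - rt_frac), computed in fixed point. The result is not
 * clamped, so it can exceed SCALE when the inputs are inconsistent.
 */
static unsigned long renorm_util(unsigned long cfs_util, unsigned long rt_util)
{
	if (rt_util >= SCALE)
		return SCALE;	/* no capacity left for CFS at all */
	return cfs_util * SCALE / (SCALE - rt_util);
}

/* Overutilized when CFS util eats into the margin below its capacity. */
static bool overutilized(unsigned long cfs_util, unsigned long cfs_cap)
{
	return cfs_util * MARGIN >= cfs_cap * SCALE;
}

/*
 * Pick the utilization to feed into OPP selection. cfs_cap stands in for
 * scale_rt_capacity(): whatever DL and RT leave for CFS (an assumption).
 */
static unsigned long select_util(unsigned long dl_bw, unsigned long cfs_util,
				 unsigned long rt_util)
{
	unsigned long used = dl_bw + rt_util;
	unsigned long cfs_cap = used < SCALE ? SCALE - used : 0;
	unsigned long sum;

	/* No idle time left: the util signals can't be trusted, go to max. */
	if (overutilized(cfs_util, cfs_cap))
		return SCALE;

	/* Spare capacity exists: DL bandwidth + CFS util + RT util. */
	sum = dl_bw + cfs_util + rt_util;
	return sum < SCALE ? sum : SCALE;
}
```

With RT and CFS util both at 512 (.5 each), renorm_util(512, 512) returns 1024, i.e. Peter's "renormalized that gives 1". With the CFS clock stopped, CFS reads 1024 while RT takes 307 (~.3), and renorm_util(1024, 307) returns 1462 (~1.43), the broken result he describes. select_util(0, 300, 300) still has idle time and returns 600, while select_util(0, 700, 300) crosses the margin and returns 1024, i.e. max OPP.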