Date: Mon, 4 Jun 2018 18:13:40 +0100
From: Quentin Perret
To: Peter Zijlstra
Cc: Vincent Guittot, mingo@kernel.org, linux-kernel@vger.kernel.org,
	rjw@rjwysocki.net, juri.lelli@redhat.com, dietmar.eggemann@arm.com,
	Morten.Rasmussen@arm.com, viresh.kumar@linaro.org,
	valentin.schneider@arm.com
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180604171339.GA25372@e108498-lin.cambridge.arm.com>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
	<20180604165047.GU12180@hirez.programming.kicks-ass.net>
In-Reply-To: <20180604165047.GU12180@hirez.programming.kicks-ass.net>
User-Agent: Mutt/1.8.3 (2017-05-23)
On Monday 04 Jun 2018 at 18:50:47 (+0200), Peter Zijlstra wrote:
> On Fri, May 25, 2018 at 03:12:21PM +0200, Vincent Guittot wrote:
> > When both cfs and rt tasks compete to run on a CPU, we can see some
> > frequency drops with the schedutil governor. In that case, the
> > cfs_rq's utilization no longer reflects the utilization of cfs tasks,
> > but only the remaining part that is not used by rt tasks. We should
> > monitor the stolen utilization and take it into account when
> > selecting an OPP. This patchset doesn't change the OPP selection
> > policy for RT tasks, only for CFS tasks.
>
> So the problem is that when RT/DL/stop/IRQ happens and preempts CFS
> tasks, time continues and the CFS load tracking will see !running and
> decay things.
>
> Then, when we get back to CFS, we'll have lower load/util than we
> expected.
>
> In particular, your focus is on OPP selection, where we would have,
> say, u=1 (always running task); after being preempted by our RT task
> for a while, it will now have u=.5. With the effect that when the RT
> task goes to sleep we'll drop our OPP to .5 max -- which is 'wrong',
> right?
>
> Your solution is to track RT/DL/stop/IRQ with the same PELT average
> that we use to track cfs util, such that we can then add the various
> averages to reconstruct the actual utilization signal.
>
> This should work for the utilization signal on UP. It gets trickier on
> SMP, where PELT migrates the signal around with tasks, but we don't do
> that for the per-rq signals we have for RT/DL/stop/IRQ.
>
> There is also the 'complaint' that this ends up with 2 util signals
> for DL, complicating things.
>
> So this patch-set tracks the !cfs occupation using the same function,
> which is all good. But what if, instead of using that to compensate
> the OPP selection, we employ it to renormalize the util signal?
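[ As an aside, the u=1 -> u=.5 decay Peter describes can be reproduced
with a toy model of the PELT signal: a geometric moving average with a
32 ms half-life, updated in 1 ms steps. This is only an illustrative
sketch, not the kernel's implementation (the real signal uses 1024 us
periods and fixed-point arithmetic): ]

```python
# Toy PELT-style utilization signal: it halves every 32 ms of !running.
Y = 0.5 ** (1.0 / 32.0)  # per-ms decay factor, chosen so Y**32 == 0.5

def pelt_step(util, running):
    # Decay the accumulated history by one period, then add this
    # period's contribution if the entity was running.
    return util * Y + (1.0 - Y) * (1.0 if running else 0.0)

util = 0.0
for _ in range(1000):            # always-running CFS task
    util = pelt_step(util, True)
print(round(util, 2))            # ~1.0

for _ in range(32):              # preempted by RT for 32 ms: !running
    util = pelt_step(util, False)
print(round(util, 2))            # ~0.5
```

[ When the RT task then sleeps, schedutil would pick an OPP from the
decayed .5 rather than the task's real demand of 1, which is exactly
the frequency drop the cover letter describes. ]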
> If we normalize util against the dynamic (rt_avg affected)
> cpu_capacity, then I think your initial problem goes away. Because
> while the RT task will push the util to .5, it will at the same time
> push the CPU capacity to .5, and renormalized that gives 1.
>
> NOTE: the renorm would then become something like:
>	scale_cpu = arch_scale_cpu_capacity() / rt_frac();

Isn't it equivalent? I mean, you can remove RT/DL/stop/IRQ from the CPU
capacity and compare the CFS util_avg against that, or you can add
RT/DL/stop/IRQ to the CFS util_avg and compare it to
arch_scale_cpu_capacity(). Both should be interchangeable, no?

By adding the RT/DL/IRQ PELT signals to the CFS util_avg, Vincent is
proposing to go with the latter, I think. But aren't the signals we
currently use to account for RT/DL/stop/IRQ in cpu_capacity good enough
for that? Can't we just add the diff between capacity_orig_of and
capacity_of to the CFS util and do OPP selection with that (for
!nr_rt_running)? Maybe add a min with the DL running_bw to be on the
safe side ... ?

> On IRC I mentioned stopping the CFS clock when preempted; while that
> would result in fixed numbers, Vincent was right to point out that the
> numbers would be difficult to interpret, since their meaning would be
> purely CPU-local, and I'm not sure you could actually fix that again
> with normalization.
>
> Imagine running a .3 RT task: that would push the (always running) CFS
> task down to .7, but because we discard all !cfs time, it actually has
> 1. If we try to normalize that, we'll end up with ~1.43, which is of
> course completely broken.
>
> _However_, everything that happens to util also happens to load. So
> the above scenario will also make the CPU appear less loaded than it
> actually is.
>
> Now, we actually try to compensate for that by decreasing the capacity
> of the CPU. But because the existing rt_avg and PELT signals are so
> out-of-tune, this is likely to be less than ideal.
> With that fixed, however, the best this appears to do is, as per the
> above, preserve the actual load. But what we really wanted is to
> actually inflate the load, such that someone will take load from us --
> we're doing less actual work after all.
>
> Possibly, we can do something like:
>
>	scale_cpu_capacity / (rt_frac^2)
>
> for load; then we inflate the load and could maybe get rid of all this
> capacity_of() sprinkling, but that needs more thinking.
>
> But I really feel we need to consider both util and load, as this
> issue affects both.
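[ The renormalizations in this thread reduce to simple arithmetic that
can be sanity-checked with a small sketch. Here rt_frac is assumed to
mean the fraction of CPU capacity left to CFS after RT/DL/stop/IRQ (the
thread does not pin down its exact definition), and the numbers reuse
the examples above: ]

```python
# Back-of-the-envelope check of the renormalizations discussed above.
# Assumption: rt_frac is the fraction of capacity left to CFS (0 < rt_frac <= 1).

def renorm_util(cfs_util, rt_frac):
    # Normalize util against the rt-affected capacity: a task decayed to
    # .5 by a 50% RT preemption maps back to a full-capacity demand of 1.
    return cfs_util / rt_frac

def inflated_load(cfs_load, rt_frac):
    # Peter's load proposal (divide by rt_frac twice): a preempted CPU
    # then looks *more* loaded than it is, so load balancing pulls work
    # away from it.
    return cfs_load / (rt_frac ** 2)

print(renorm_util(0.5, 0.5))     # 1.0
print(inflated_load(0.5, 0.5))   # 2.0

# The rejected clock-stopping variant: CFS keeps util = 1 under a .3 RT
# task, and renormalizing against the remaining .7 capacity overshoots:
print(round(1.0 / 0.7, 2))       # 1.43
```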