Date: Mon, 4 Jun 2018 18:50:47 +0200
From: Peter Zijlstra
To: Vincent Guittot
Cc: mingo@kernel.org, linux-kernel@vger.kernel.org, rjw@rjwysocki.net,
    juri.lelli@redhat.com, dietmar.eggemann@arm.com,
    Morten.Rasmussen@arm.com, viresh.kumar@linaro.org,
    valentin.schneider@arm.com, quentin.perret@arm.com
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180604165047.GU12180@hirez.programming.kicks-ass.net>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>

On Fri, May 25, 2018 at 03:12:21PM +0200, Vincent Guittot wrote:
> When both cfs and rt tasks compete to run on a CPU, we can see some
> frequency drops with the schedutil governor. In such a case, the cfs_rq's
> utilization no longer reflects the utilization of cfs tasks but only the
> remaining part that is not used by rt tasks. We should monitor the stolen
> utilization and take it into account when selecting the OPP.
> This patchset doesn't change the OPP selection policy for RT tasks, only
> for CFS tasks.

So the problem is that when RT/DL/stop/IRQ happens and preempts CFS
tasks, time continues and CFS load tracking will see !running and decay
things. Then, when we get back to CFS, we'll have lower load/util than we
expected.

In particular, your focus is on OPP selection, where we would have, say,
u=1 (an always-running task); after being preempted by an RT task for a
while, it will now have u=.5. The effect is that when the RT task goes to
sleep we'll drop our OPP to .5 max -- which is 'wrong', right?

Your solution is to track RT/DL/stop/IRQ with the same PELT average we
use to track cfs util, such that we can then add the various averages to
reconstruct the actual utilization signal.

This should work for the utilization signal on UP. On SMP it gets
murkier: PELT migrates the cfs signal around with the tasks, but we don't
do that for the per-rq signals we have for RT/DL/stop/IRQ. There is also
the 'complaint' that this ends up with 2 util signals for DL,
complicating things.

So this patch-set tracks the !cfs occupation using the same function,
which is all good. But what if, instead of using that to compensate the
OPP selection, we employ it to renormalize the util signal?

If we normalize util against the dynamic (rt_avg affected) cpu_capacity,
then I think your initial problem goes away. While the RT task will push
the util down to .5, it will at the same time push the CPU capacity down
to .5, and renormalized that gives 1.

NOTE: the renorm would then become something like:

	scale_cpu = arch_scale_cpu_capacity() / rt_frac();

(a toy sketch of this arithmetic is at the end of this mail)

On IRC I mentioned stopping the CFS clock when preempted. While that
would result in fixed numbers, Vincent was right in pointing out that the
numbers would be difficult to interpret, since their meaning would be
purely CPU-local, and I'm not sure you could actually fix that again with
normalization.

Imagine running a .3 RT task: that would push the (always running) CFS
task down to .7, but because we discard all !cfs time, it actually reads
as 1. If we try to normalize that we'll end up with ~1.43, which is of
course completely broken.

_However_, all that happens for util also happens for load. So the above
scenario will also make the CPU appear less loaded than it actually is.

Now, we already try to compensate for that by decreasing the capacity of
the CPU. But because the existing rt_avg and PELT signals are so
out-of-tune, this is likely to be less than ideal. Even with that fixed,
the best this appears to do is, as per the above, preserve the actual
load. But what we really want is to actually inflate the load, such that
someone will take load from us -- we're doing less actual work, after
all.

Possibly, we can do something like:

	scale_cpu_capacity / (rt_frac^2)

for load; then we inflate the load and could maybe get rid of all this
capacity_of() sprinkling, but that needs more thinking.

But I really feel we need to consider both util and load, as this issue
affects both.
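
To make the renorm concrete, a toy user-space sketch of the arithmetic
(not against any tree; rt_frac() is a made-up helper returning the
fraction of the CPU left for CFS, in SCHED_CAPACITY_SCALE-style fixed
point):

  #include <stdio.h>

  #define SCALE 1024  /* SCHED_CAPACITY_SCALE-style fixed point */

  /* made-up helper: fraction of the CPU left for CFS; 512 == .5 */
  static unsigned long rt_frac(void)
  {
      return 512;
  }

  int main(void)
  {
      unsigned long cfs_util = 512;  /* u=.5 after decaying under RT preemption */

      /* renormalize against the rt-affected capacity: .5 / .5 == 1 */
      unsigned long util = cfs_util * SCALE / rt_frac();

      printf("renormalized util = %lu / %d\n", util, SCALE);  /* 1024 / 1024 */
      return 0;
  }

That is, the decayed cfs util and the shrunken capacity cancel out and we
recover u=1 for the always-running task.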
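
The same toy arithmetic shows why the clock-stopping variant breaks:
normalizing a signal that already discarded all !cfs time pushes it past
the scale (values made up, as before):

  #include <stdio.h>

  #define SCALE 1024

  int main(void)
  {
      /*
       * With the CFS clock stopped during preemption, an always-running
       * CFS task still measures u=1, even though a .3 RT task only
       * leaves .7 of the CPU.
       */
      unsigned long cfs_util = 1024;  /* reads as 1: !cfs time discarded */
      unsigned long cfs_frac = 717;   /* ~.7 of the CPU actually available */

      unsigned long util = cfs_util * SCALE / cfs_frac;

      /* ~1462 / 1024 ~= 1.43 -- exceeds the scale, i.e. broken */
      printf("normalized util = %lu / %d\n", util, SCALE);
      return 0;
  }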
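
And for the load inflation idea, dividing by rt_frac twice instead of
once (again just a sketch of the arithmetic; whether squaring is the
right inflation factor is exactly the part that needs more thinking):

  #include <stdio.h>

  #define SCALE 1024

  int main(void)
  {
      unsigned long load = 512;  /* .5: decayed while preempted by RT */
      unsigned long frac = 512;  /* .5 of the CPU left for CFS */

      /* dividing by rt_frac once merely preserves the load (gives 1) */
      unsigned long preserved = load * SCALE / frac;

      /* dividing by rt_frac^2 inflates it (gives 2), so others pull load */
      unsigned long inflated = preserved * SCALE / frac;

      printf("preserved = %lu / %d, inflated = %lu / %d\n",
             preserved, SCALE, inflated, SCALE);
      return 0;
  }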