From: Vincent Guittot
Date: Mon, 27 Mar 2017 08:59:05 +0200
Subject: Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
To: Joel Fernandes
Cc: Patrick Bellasi, "Rafael J. Wysocki", Viresh Kumar, Linux PM, LKML, Peter Zijlstra, Srinivas Pandruvada, Juri Lelli, Morten Rasmussen, Ingo Molnar
References: <4366682.tsferJN35u@aspire.rjw.lan> <2185243.flNrap3qq1@aspire.rjw.lan> <20170320035745.GC25659@vireshk-i7> <20170320123416.GB27896@e110439-lin>

On 25 March 2017 at 04:48, Joel Fernandes wrote:
> Hi Vincent,
>
> On Thu, Mar 23, 2017 at 3:08 PM, Vincent Guittot
> wrote:
> [..]
>>>>
>>>>> So I'm not really aligned with the description of your problem: PELT
>>>>> metric underestimates the load of the CPU. The PELT is just about
>>>>> tracking CFS task utilization but not whole CPU utilization and
>>>>> according to your description of the problem (time stolen by irq),
>>>>> your problem doesn't come from an underestimation of CFS task but from
>>>>> time spent in something else but not accounted in the value used by
>>>>> schedutil
>>>>
>>>> Quite likely. Indeed, it can really be that the CFS task is preempted
>>>> because of some RT activity generated by the IRQ handler.
>>>>
>>>> More in general, I've also noticed many suboptimal freq switches when
>>>> RT tasks interleave with CFS ones, because of:
>>>>  - relatively long down _and up_ throttling times
>>>>  - the way schedutil's flags are tracked and updated
>>>>  - the callsites from where we call schedutil updates
>>>>
>>>> For example it can really happen that we are running at the highest
>>>> OPP because of some RT activity. Then we switch back to a relatively
>>>> low utilization CFS workload and then:
>>>> 1. a tick happens which produces a frequency drop
>>>
>>> Any idea why this frequency drop would happen? Say a running CFS task
>>> gets preempted by RT task, the PELT signal shouldn't drop for the
>>> duration the CFS task is preempted because the task is runnable, so
>>
>> utilization only tracks the running state but not runnable state.
>> Runnable state is tracked in load_avg
>
> Thanks. I got it now.
>
> Correct me if I'm wrong but strictly speaking utilization for a cfs_rq
> (which drives the frequency for CFS) still tracks the blocked/runnable
> time of tasks although its decayed as time moves forward. Only when we
> migrate the rq of a cfs task is the util_avg contribution removed from
> the rq. But I can see now why running RT can decay this load tracking
> signal.

Yes. you're right

>
> Regards,
> Joel
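
To illustrate the point about running versus runnable time, here is a rough sketch of a PELT-style geometric signal. It is a simplified floating-point model and not the kernel's fixed-point code. It assumes one period per millisecond, a 32-period half-life and a full-scale value of 1024, and it ignores task weights, so "load" here only means "a signal that follows runnable time". The names `step`, `simulate` and the phase tuple format are made up for the example.

```python
# Simplified floating-point model of a PELT-style signal. The kernel's
# fixed-point implementation differs in detail; this is only a sketch.
HALF_LIFE_PERIODS = 32                  # assumed: y**32 == 0.5, one period ~ 1 ms
Y = 0.5 ** (1.0 / HALF_LIFE_PERIODS)    # per-period decay factor
SCALE = 1024                            # assumed full-scale value of the signal


def step(signal, active):
    """Advance one period: decay the history, then add the new contribution."""
    contribution = SCALE * (1.0 - Y) if active else 0.0
    return signal * Y + contribution


def simulate(phases):
    """Run a list of (periods, running, runnable) phases (hypothetical format).

    Returns (util, load): util follows the 'running' flag only, while
    load follows the 'runnable' flag (task weight is ignored here).
    """
    util = load = 0.0
    for periods, running, runnable in phases:
        for _ in range(periods):
            util = step(util, running)
            load = step(load, runnable)
    return util, load


if __name__ == "__main__":
    warmup = [(320, True, True)]                # CFS task runs uninterrupted
    preempted = warmup + [(64, False, True)]    # RT runs; CFS task only runnable
    print("before RT burst: util=%.0f load=%.0f" % simulate(warmup))
    print("after  RT burst: util=%.0f load=%.0f" % simulate(preempted))
```

In this model, a 64-period RT burst (two half-lives) cuts the running-based signal from about 1023 to about 256, while the runnable-based one stays close to 1024. That matches the frequency drop described above: the value handed to schedutil decays while the CFS task is preempted, even though the task never stopped being runnable.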