From: Vincent Guittot
Date: Thu, 23 Mar 2017 23:08:10 +0100
Subject: Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
To: Joel Fernandes
Cc: Patrick Bellasi, "Rafael J. Wysocki", Viresh Kumar, Linux PM, LKML,
 Peter Zijlstra, Srinivas Pandruvada, Juri Lelli, Morten Rasmussen,
 Ingo Molnar

On 23 March 2017 at 00:56, Joel Fernandes wrote:
> On Mon, Mar 20, 2017 at 5:34 AM, Patrick Bellasi wrote:
>> On 20-Mar 09:26, Vincent Guittot wrote:
>>> On 20 March 2017 at 04:57, Viresh Kumar wrote:
>>> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>>> >> From: Rafael J. Wysocki
>>> >>
>>> >> The PELT metric used by the schedutil governor underestimates the
>>> >> CPU utilization in some cases. The reason for that may be time spent
>>> >> in interrupt handlers and similar which is not accounted for by PELT.
>>>
>>> Are you sure of the root cause described above (time stolen by the irq
>>> handler), or is it just a hypothesis? It would be good to be sure of
>>> the root cause.
>>> Furthermore, IIRC the time spent in irq context is also accounted as
>>> run time for the running CFS task, but not as running time for RT and
>>> deadline tasks.
>>
>> As long as the IRQ processing does not generate a context switch,
>> which happens (eventually) if the top half schedules some deferred
>> work to be executed by a bottom half.
>>
>> Thus, I too would say that all the top-half time is accounted in
>> PELT, since the current task is still RUNNABLE/RUNNING.
>
> Sorry if I'm missing something, but doesn't this depend on whether you
> have CONFIG_IRQ_TIME_ACCOUNTING enabled?
>
> __update_load_avg uses rq->clock_task for deltas, which I think
> shouldn't account IRQ time with that config option. So it should be
> quite possible for IRQ time to reduce the PELT signal, right?
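That is what CONFIG_IRQ_TIME_ACCOUNTING=y changes: before advancing
rq->clock_task, update_rq_clock_task() clips the time spent in
hardirq/softirq out of the wall-clock delta, so PELT never sees it.
Below is a toy, compilable model of that clipping; the names mirror the
kernel's, but it is a simplified sketch, not the actual
kernel/sched/core.c code:

/*
 * Toy model (not kernel code) of how CONFIG_IRQ_TIME_ACCOUNTING makes
 * IRQ time invisible to PELT: rq->clock_task only advances by the
 * non-IRQ part of the delta, and __update_load_avg() computes its
 * deltas from clock_task.
 */
#include <stdio.h>

struct rq {
	unsigned long long clock;	/* wall clock, ns */
	unsigned long long clock_task;	/* clock seen by PELT, ns */
	unsigned long long prev_irq_time;
};

/* stand-in for the kernel's irq_time_read(): total ns this CPU has
 * spent in hardirq/softirq so far */
static unsigned long long irq_time;

static void update_rq_clock_task(struct rq *rq, long long delta)
{
	long long irq_delta = (long long)(irq_time - rq->prev_irq_time);

	/* never remove more than the whole delta */
	if (irq_delta > delta)
		irq_delta = delta;
	rq->prev_irq_time += irq_delta;
	delta -= irq_delta;

	rq->clock_task += delta;
}

int main(void)
{
	struct rq rq = { 0, 0, 0 };

	/* 10ms pass on the wall clock, 4ms of which were IRQ handling */
	irq_time += 4000000;
	rq.clock += 10000000;
	update_rq_clock_task(&rq, 10000000);

	/* PELT's deltas come from clock_task: the task only "ran" 6ms */
	printf("clock=%llu clock_task=%llu\n", rq.clock, rq.clock_task);
	return 0;
}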
>
>>
>>> So I'm not really aligned with the description of your problem, "the
>>> PELT metric underestimates the load of the CPU": PELT is about
>>> tracking CFS task utilization, not whole-CPU utilization, and
>>> according to your description of the problem (time stolen by irq),
>>> the problem doesn't come from an underestimation of CFS tasks but
>>> from time spent in something else that is not accounted for in the
>>> value used by schedutil.
>>
>> Quite likely. Indeed, it can really be that the CFS task is preempted
>> because of some RT activity generated by the IRQ handler.
>>
>> More in general, I've also noticed many suboptimal freq switches when
>> RT tasks interleave with CFS ones, because of:
>> - relatively long down _and up_ throttling times
>> - the way schedutil's flags are tracked and updated
>> - the callsites from where we call schedutil updates
>>
>> For example, it can really happen that we are running at the highest
>> OPP because of some RT activity. Then we switch back to a relatively
>> low-utilization CFS workload, and then:
>> 1. a tick happens, which produces a frequency drop
>
> Any idea why this frequency drop would happen? Say a running CFS task
> gets preempted by an RT task: the PELT signal shouldn't drop for the
> duration the CFS task is preempted, because the task is runnable, so

Utilization only tracks the running state, not the runnable state; the
runnable state is tracked in load_avg (see the sketch at the end of
this mail).

> once the CFS task gets CPU back, schedutil should still maintain the
> capacity, right?
>
> Regards,
> Joel
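To put numbers on the running vs. runnable distinction above, here is a
toy numerical model (not kernel code): util_avg accumulates only while
the entity actually runs, while load_avg also accumulates while it
waits on the runqueue. The decay constant matches PELT's y^32 = 1/2;
everything else (load weights, exact 1024us windows, scaling to 1024)
is simplified away:

/*
 * Toy PELT model: per ~1ms window, each signal decays by y and gains
 * (1 - y) when its condition held during the window.
 *   util_avg: condition = running (on the CPU)
 *   load_avg: condition = runnable (running or waiting on the rq)
 */
#include <stdio.h>

#define DECAY 0.97857206	/* y, chosen so that y^32 = 0.5 */

struct sched_avg {
	double util_avg;	/* grows only with running time */
	double load_avg;	/* grows with runnable time */
};

static void pelt_window(struct sched_avg *sa, int running, int runnable)
{
	sa->util_avg = sa->util_avg * DECAY + (running ? 1.0 - DECAY : 0.0);
	sa->load_avg = sa->load_avg * DECAY + (runnable ? 1.0 - DECAY : 0.0);
}

int main(void)
{
	struct sched_avg sa = { 0.0, 0.0 };
	int w;

	/* the CFS task runs alone for 200 windows... */
	for (w = 0; w < 200; w++)
		pelt_window(&sa, 1, 1);
	printf("running  : util=%.3f load=%.3f\n", sa.util_avg, sa.load_avg);

	/* ...then an RT task preempts it for 100 windows: the CFS task
	 * stays runnable, so load_avg holds up while util_avg decays */
	for (w = 0; w < 100; w++)
		pelt_window(&sa, 0, 1);
	printf("preempted: util=%.3f load=%.3f\n", sa.util_avg, sa.load_avg);
	return 0;
}

In this model util_avg roughly halves every 32 windows of preemption
while load_avg barely moves, which matches the point above: a signal
derived from utilization drops across an RT burst even though the CFS
task never stopped being runnable.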