From: Joel Fernandes
Date: Wed, 22 Mar 2017 16:56:40 -0700
Subject: Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
To: Patrick Bellasi
Cc: "Rafael J. Wysocki", Viresh Kumar, Vincent Guittot, Linux PM, LKML,
 Peter Zijlstra, Srinivas Pandruvada, Juri Lelli, Morten Rasmussen,
 Ingo Molnar

On Mon, Mar 20, 2017 at 5:34 AM, Patrick Bellasi wrote:
> On 20-Mar 09:26, Vincent Guittot wrote:
>> On 20 March 2017 at 04:57, Viresh Kumar wrote:
>> > On 19-03-17, 14:34, Rafael J. Wysocki wrote:
>> >> From: Rafael J. Wysocki
>> >>
>> >> The PELT metric used by the schedutil governor underestimates the
>> >> CPU utilization in some cases. The reason for that may be time spent
>> >> in interrupt handlers and similar which is not accounted for by PELT.
>>
>> Are you sure of the root cause described above (time stolen by the irq
>> handler), or is it just a hypothesis? It would be good to be sure of
>> the root cause.
>> Furthermore, IIRC the time spent in irq context is also accounted as
>> run time for the running cfs task, but not as RT and deadline task
>> running time.
>
> As long as the IRQ processing does not generate a context switch,
> which happens (eventually) if the top half schedules some deferred
> work to be executed by a bottom half.
>
> Thus, I too would say that all the top-half time is accounted in
> PELT, since the current task is still RUNNABLE/RUNNING.

Sorry if I'm missing something, but doesn't this depend on whether you
have CONFIG_IRQ_TIME_ACCOUNTING enabled? __update_load_avg() uses
rq->clock_task for its deltas, which I think shouldn't account IRQ time
with that config option set. So it should be quite possible for time
spent in IRQs to reduce the PELT signal, right?
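(For reference, this is roughly what update_rq_clock_task() in
kernel/sched/core.c does when that option is set -- paraphrased from
memory, not a verbatim copy, and with the paravirt steal-time handling
stripped out:)

static void update_rq_clock_task(struct rq *rq, s64 delta)
{
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
	s64 irq_delta;

	/* Hard/soft IRQ time accrued on this CPU since the last update. */
	irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;

	/* The IRQ time read can exceed the wall delta; clamp it. */
	if (irq_delta > delta)
		irq_delta = delta;

	rq->prev_irq_time += irq_delta;

	/* clock_task advances by wall time minus IRQ time. */
	delta -= irq_delta;
#endif
	rq->clock_task += delta;
}

So the deltas PELT integrates already have the IRQ time subtracted, and
a task that loses CPU time to interrupts accrues less utilization than
the CPU's wall-clock busyness would suggest.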
>> So I'm not really aligned with the description of your problem: "PELT
>> metric underestimates the load of the CPU". PELT is just about
>> tracking CFS task utilization, not whole-CPU utilization, and
>> according to your description of the problem (time stolen by irq),
>> your problem doesn't come from an underestimation of the CFS task but
>> from time spent in something else that is not accounted in the value
>> used by schedutil.
>
> Quite likely. Indeed, it can really be that the CFS task is preempted
> because of some RT activity generated by the IRQ handler.
>
> More in general, I've also noticed many suboptimal freq switches when
> RT tasks interleave with CFS ones, because of:
> - relatively long down _and up_ throttling times
> - the way schedutil's flags are tracked and updated
> - the callsites from where we call schedutil updates
>
> For example, it can really happen that we are running at the highest
> OPP because of some RT activity. Then we switch back to a relatively
> low-utilization CFS workload and then:
> 1. a tick happens which produces a frequency drop

Any idea why this frequency drop would happen? Say a running CFS task
gets preempted by an RT task: the PELT signal shouldn't drop for the
duration the CFS task is preempted, because the task is runnable, so
once the CFS task gets the CPU back, schedutil should still maintain
the capacity, right?

Regards,
Joel
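P.S. To make the running vs. runnable distinction concrete, here is a
toy userspace model of the PELT geometric series, assuming the usual
1024us periods and y^32 = 0.5 (a simplification for illustration, not
the kernel's actual __update_load_avg()):

#include <stdio.h>

/* y such that y^32 = 0.5, i.e. the signal halves every 32 periods. */
#define PELT_Y	0.97857

struct pelt {
	double util;	/* tracks time spent actually running */
	double load;	/* tracks time spent runnable (running or waiting) */
};

/* One 1024us PELT period: decay both sums, accrue where applicable. */
static void pelt_step(struct pelt *p, int running, int runnable)
{
	p->util = p->util * PELT_Y + (running  ? 1.0 - PELT_Y : 0.0);
	p->load = p->load * PELT_Y + (runnable ? 1.0 - PELT_Y : 0.0);
}

int main(void)
{
	struct pelt p = { 0.0, 0.0 };
	int i;

	/* CFS task runs alone for 256 periods (~256ms)... */
	for (i = 0; i < 256; i++)
		pelt_step(&p, 1, 1);
	printf("running:   util=%.3f load=%.3f\n", p.util, p.load);

	/* ...then an RT task preempts it for 32 periods; the CFS task
	 * stays runnable (on the runqueue) the whole time. */
	for (i = 0; i < 32; i++)
		pelt_step(&p, 0, 1);
	printf("preempted: util=%.3f load=%.3f\n", p.util, p.load);

	return 0;
}

In this toy model the load sum holds up across the RT burst while the
utilization sum halves, so whether "the PELT signal" drops during the
preemption depends on which of the two sums one looks at.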