References: <1530200714-4504-1-git-send-email-vincent.guittot@linaro.org> <1530200714-4504-7-git-send-email-vincent.guittot@linaro.org>
From: Wanpeng Li
Date: Tue, 31 Jul 2018 11:32:38 +0800
Subject: Re: [PATCH 06/11] sched/irq: add irq utilization tracking
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, LKML, "Rafael J. Wysocki", juri.lelli@redhat.com, Dietmar Eggemann, Morten Rasmussen, Viresh Kumar, valentin.schneider@arm.com, Patrick Bellasi, joel@joelfernandes.org, Daniel Lezcano, quentin.perret@arm.com, Luca Abeni, claudio@evidence.eu.com, Ingo Molnar, kvm
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 31 Jul 2018 at 00:43, Vincent Guittot wrote:
>
> Hi Wanpeng,
>
> On Thu, 26 Jul 2018 at 05:09, Wanpeng Li wrote:
> >
> > Hi Vincent,
> > On Fri, 29 Jun 2018 at 03:07, Vincent Guittot wrote:
> > >
> > > interrupt and steal time are the only remaining activities tracked by
> > > rt_avg. Like for sched classes, we can use PELT to track their average
> > > utilization of the CPU. But unlike sched class, we don't track when
> > > entering/leaving interrupt; Instead, we take into account the time spent
> > > under interrupt context when we update rqs' clock (rq_clock_task).
> > > This also means that we have to decay the normal context time and account
> > > for interrupt time during the update.
> > >
> > > It's also important to note that because
> > >   rq_clock == rq_clock_task + interrupt time
> > > and rq_clock_task is used by a sched class to compute its utilization, the
> > > util_avg of a sched class only reflects the utilization of the time spent
> > > in normal context and not of the whole time of the CPU. The utilization of
> > > interrupt gives a more accurate level of utilization of CPU.
> > > The CPU utilization is:
> > >   avg_irq + (1 - avg_irq / max capacity) * /Sum avg_rq
> > >
> > > Most of the time, avg_irq is small and negligible so the use of the
> > > approximation CPU utilization = /Sum avg_rq was enough
> > >
> > > Cc: Ingo Molnar
> > > Cc: Peter Zijlstra
> > > Signed-off-by: Vincent Guittot
> > > ---
> > >  kernel/sched/core.c  |  4 +++-
> > >  kernel/sched/fair.c  | 13 ++++++++++---
> > >  kernel/sched/pelt.c  | 40 ++++++++++++++++++++++++++++++++++++++++
> > >  kernel/sched/pelt.h  | 16 ++++++++++++++++
> > >  kernel/sched/sched.h |  3 +++
> > >  5 files changed, 72 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index 78d8fac..e5263a4 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -18,6 +18,8 @@
> > >  #include "../workqueue_internal.h"
> > >  #include "../smpboot.h"
> > >
> > > +#include "pelt.h"
> > > +
> > >  #define CREATE_TRACE_POINTS
> > >  #include
> > >
> > > @@ -186,7 +188,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
> > >
> > >  #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
> > >         if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
> > > -               sched_rt_avg_update(rq, irq_delta + steal);
> > > +               update_irq_load_avg(rq, irq_delta + steal);
> >
> > I think we should not add steal time into irq load tracking, steal
> > time is always 0 on native kernel which doesn't matter,
> > what will happen when guest disables IRQ_TIME_ACCOUNTING and enables
> > PARAVIRT_TIME_ACCOUNTING? Steal time is not the real irq util_avg. In
> > addition, we haven't exposed power management for performance which
> > means that e.g. schedutil governor can not cooperate with passive mode
> > intel_pstate driver to tune the OPP. To decay the old steal time avg
> > and add the new one just wastes cpu cycles.
>
> In fact, I have kept the same behavior as with rt_avg, which was
> already adding steal time when computing scale_rt_capacity, which is
> used to reflect the remaining capacity for FAIR tasks and is used in
> load balance. I'm not sure that it's worth using different variables
> for irq and steal.
> That being said, I see a possible optimization in schedutil when
> PARAVIRT_TIME_ACCOUNTING is enabled and IRQ_TIME_ACCOUNTING is disabled.
> With this kind of config, scale_irq_capacity can be a nop for
> schedutil but scales the utilization for scale_rt_capacity

Yeah, this is what I had in mind before, you can make a patch for that. :)

Regards,
Wanpeng Li
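
For reference, the CPU utilization formula quoted from the patch description,
avg_irq + (1 - avg_irq / max capacity) * Sum avg_rq, can be sketched in plain
userspace C. This is only an illustration under assumptions: the helper names
scale_by_irq() and cpu_util_estimate() are hypothetical and are not taken from
the patch, and the capacity is passed in as an ordinary integer (1024 is used
in the example values below).

```c
#include <assert.h>

/*
 * Hypothetical helper (not from the patch): rescale a utilization that was
 * measured against normal-context time only, so that it is expressed
 * against the whole CPU time:
 *
 *     util * (max - irq) / max
 *
 * The caller must guarantee irq <= max, otherwise (max - irq) wraps.
 */
static unsigned long scale_by_irq(unsigned long util, unsigned long irq,
                                  unsigned long max)
{
    assert(max > 0);
    return util * (max - irq) / max;
}

/*
 * Hypothetical helper (not from the patch): integer form of
 *
 *     avg_irq + (1 - avg_irq / max) * sum_avg_rq
 *
 * clamped to max so the result never exceeds the CPU capacity.
 */
static unsigned long cpu_util_estimate(unsigned long avg_irq,
                                       unsigned long sum_avg_rq,
                                       unsigned long max)
{
    unsigned long util;

    /* Interrupt time alone already saturates the CPU. */
    if (avg_irq >= max)
        return max;

    util = avg_irq + scale_by_irq(sum_avg_rq, avg_irq, max);
    return util < max ? util : max;
}
```

With max = 1024, cpu_util_estimate(0, 512, 1024) gives 512, which is the
"Sum avg_rq" approximation when avg_irq is negligible, while
cpu_util_estimate(256, 512, 1024) gives 256 + 512 * 768 / 1024 = 640.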