Subject: Re: [PATCH v5 07/10] sched/irq: add irq utilization tracking
From: Dietmar Eggemann
To: Vincent Guittot, peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org, rjw@rjwysocki.net
Cc: juri.lelli@redhat.com, Morten.Rasmussen@arm.com, viresh.kumar@linaro.org, valentin.schneider@arm.com, quentin.perret@arm.com
Date: Wed, 30 May 2018 17:55:24 +0200
Message-ID: <72473e6f-8ade-8e26-3282-276fcae4c4c7@arm.com>
In-Reply-To: <1527253951-22709-8-git-send-email-vincent.guittot@linaro.org>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org> <1527253951-22709-8-git-send-email-vincent.guittot@linaro.org>

On 05/25/2018 03:12 PM, Vincent Guittot wrote:
> interrupt and steal time are the only remaining activities tracked by
> rt_avg. Like for sched classes, we can use PELT to track their average
> utilization of the CPU. But unlike sched class, we don't track when
> entering/leaving interrupt; Instead, we take into account the time spent
> under interrupt context when we update rqs' clock (rq_clock_task).
> This also means that we have to decay the normal context time and account
> for interrupt time during the update.
>
> That's also important to note that because
>   rq_clock == rq_clock_task + interrupt time
> and rq_clock_task is used by a sched class to compute its utilization, the
> util_avg of a sched class only reflects the utilization of the time spent
> in normal context and not of the whole time of the CPU. The utilization of
> interrupt gives an more accurate level of utilization of CPU.
> The CPU utilization is :
>   avg_irq + (1 - avg_irq / max capacity) * /Sum avg_rq
>
> Most of the time, avg_irq is small and neglictible so the use of the
> approximation CPU utilization = /Sum avg_rq was enough

[...]

> @@ -7362,6 +7363,7 @@ static void update_blocked_averages(int cpu)
>  }
>  update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
>  update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
> + update_irq_load_avg(rq, 0);

So this one decays the signals only in case update_rq_clock_task()
didn't call update_irq_load_avg() because 'irq_delta + steal' is 0, right?

[...]

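To make the utilization formula quoted from the commit message concrete, here is a minimal sketch in plain userspace C. It is not kernel code: `MAX_CAPACITY` and `cpu_util_with_irq()` are hypothetical names chosen for illustration, and the value 1024 is an assumption mirroring the usual capacity scale. The final clamp is also my addition, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: capacity is expressed on a 0..1024 scale. */
#define MAX_CAPACITY 1024ULL

/*
 * Hypothetical helper, not part of the patch.
 * Computes: avg_irq + (1 - avg_irq / max capacity) * Sum avg_rq
 * using integer arithmetic so no precision is lost to early division.
 */
static uint64_t cpu_util_with_irq(uint64_t avg_irq, uint64_t sum_avg_rq)
{
	uint64_t scaled, util;

	assert(avg_irq <= MAX_CAPACITY);

	/* Scale normal-context utilization by the share of time not spent in irq. */
	scaled = sum_avg_rq * (MAX_CAPACITY - avg_irq) / MAX_CAPACITY;
	util = avg_irq + scaled;

	/* Clamp to the capacity scale (illustrative choice). */
	return util > MAX_CAPACITY ? MAX_CAPACITY : util;
}
```

With avg_irq == 0 this reduces to the old approximation (the plain sum of the rq averages), which is why that approximation was good enough while interrupt pressure stayed small.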
> diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
> index 3d5bd3a..d2e4f21 100644
> --- a/kernel/sched/pelt.c
> +++ b/kernel/sched/pelt.c
> @@ -355,3 +355,41 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
>
>  return 0;
>  }
> +
> +/*
> + * irq:
> + *
> + *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
> + *   util_sum = cpu_scale * load_sum
> + *   runnable_load_sum = load_sum
> + *
> + */
> +
> +int update_irq_load_avg(struct rq *rq, u64 running)
> +{
> + int ret = 0;
> + /*
> +  * We know the time that has been used by interrupt since last update
> +  * but we don't when. Let be pessimistic and assume that interrupt has
> +  * happened just before the update. This is not so far from reality
> +  * because interrupt will most probably wake up task and trig an update
> +  * of rq clock during which the metric si updated.
> +  * We start to decay with normal context time and then we add the
> +  * interrupt context time.
> +  * We can safely remove running from rq->clock because
> +  * rq->clock += delta with delta >= running

This is true as long as update_irq_load_avg() with 'running != 0' is
called only after rq->clock has moved forward (rq->clock += delta), which
is the case for update_rq_clock()->update_rq_clock_task().

> +  */
> + ret = ___update_load_sum(rq->clock - running, rq->cpu, &rq->avg_irq,
> +    0,
> +    0,
> +    0);
> + ret += ___update_load_sum(rq->clock, rq->cpu, &rq->avg_irq,
> +    1,
> +    1,
> +    1);

So you decay the signal in [sa->lut, rq->clock - running] (assumed to be
the portion of delta used by the task scheduler) and you increase it in
[rq->clock - running, rq->clock] (the irq and virt portion of delta).

That means that this signal is updated on rq->clock whereas the others
are updated on rq->clock_task.

What about the ever-growing clock diff between them? I see e.g. ~6s after
20min uptime and up to 1.5ms 'running'.

It should still be safe to sum the sched class and irq signals in
sugov_aggregate_util() because they are independent, I guess.

[...]
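For readers following the two-step update discussed above, here is a toy userspace model of "decay first, then accrue". It is only a sketch under stated assumptions: `struct toy_avg`, `toy_update()` and `toy_update_irq()` are hypothetical names, time is counted in whole periods, and the sum is a double decayed by a constant `TOY_Y` chosen so that TOY_Y^32 is roughly 0.5. The real ___update_load_sum() is considerably more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: per-period decay factor y with y^32 ~= 0.5. */
#define TOY_Y 0.97857206

/* Hypothetical stand-in for the tracked signal, not the kernel's struct. */
struct toy_avg {
	uint64_t last_update;	/* time of last update, in whole periods */
	double sum;		/* geometrically decayed sum of running periods */
};

/* Advance the signal to 'now', either accruing (running) or only decaying. */
static void toy_update(struct toy_avg *sa, uint64_t now, int running)
{
	uint64_t p;

	assert(now >= sa->last_update);
	for (p = sa->last_update; p < now; p++)
		sa->sum = sa->sum * TOY_Y + (running ? 1.0 : 0.0);
	sa->last_update = now;
}

/*
 * Pessimistic placement of interrupt time: assume all 'running' periods
 * happened right before 'clock'. Decay over [last_update, clock - running],
 * then accrue over [clock - running, clock].
 */
static void toy_update_irq(struct toy_avg *sa, uint64_t clock, uint64_t running)
{
	/* Mirrors the "delta >= running" precondition from the patch comment. */
	assert(clock >= running && clock - running >= sa->last_update);

	toy_update(sa, clock - running, 0);
	toy_update(sa, clock, 1);
}
```

Calling toy_update_irq() with running == 0 only decays the signal, which corresponds to the update_irq_load_avg(rq, 0) call in update_blocked_averages().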