Date: Tue, 5 Jun 2018 14:12:24 +0100
From: Quentin Perret
To: Vincent Guittot
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, "Rafael J.
 Wysocki", Juri Lelli, Dietmar Eggemann, Morten Rasmussen, viresh kumar, Valentin Schneider
Subject: Re: [PATCH v5 00/10] track CPU utilization
Message-ID: <20180605131224.GC12193@e108498-lin.cambridge.arm.com>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org> <20180605105721.GA12193@e108498-lin.cambridge.arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.8.3 (2017-05-23)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On Tuesday 05 Jun 2018 at 13:59:56 (+0200), Vincent Guittot wrote:
> On 5 June 2018 at 12:57, Quentin Perret wrote:
> > Hi Vincent,
> >
> > On Tuesday 05 Jun 2018 at 10:36:26 (+0200), Vincent Guittot wrote:
> >> Hi Quentin,
> >>
> >> On 25 May 2018 at 15:12, Vincent Guittot wrote:
> >> > This patchset initially tracked only the utilization of RT rq. During
> >> > OSPM summit, the opportunity to extend it in order to get an estimate
> >> > of the utilization of the CPU was discussed.
> >> >
> >> > - Patches 1-3 correspond to the content of patchset v4 and add utilization
> >> >   tracking for rt_rq.
> >> >
> >> > When both cfs and rt tasks compete to run on a CPU, we can see some frequency
> >> > drops with schedutil governor. In such case, the cfs_rq's utilization doesn't
> >> > reflect anymore the utilization of cfs tasks but only the remaining part that
> >> > is not used by rt tasks. We should monitor the stolen utilization and take
> >> > it into account when selecting OPP. This patchset doesn't change the OPP
> >> > selection policy for RT tasks but only for CFS tasks
> >> >
> >> > A rt-app use case which creates an always running cfs thread and a rt thread
> >> > that wakes up periodically, with both threads pinned on same CPU, shows a lot
> >> > of frequency switches of the CPU whereas the CPU never goes idle during the
> >> > test.
> >> > I can share the json file that I used for the test if someone is
> >> > interested.
> >> >
> >> > For a 15 seconds long test on a hikey 6220 (octo core cortex A53 platform),
> >> > the cpufreq statistics outputs (stats are reset just before the test) :
> >> > $ cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >> > without patchset : 1230
> >> > with patchset : 14
> >>
> >> I have attached the rt-app json file that I use for this test
> >
> > Thank you very much ! I did a quick test with a much simpler fix to this
> > RT-steals-time-from-CFS issue using just the existing scale_rt_capacity().
> > I get the following results on Hikey960:
> >
> > Without patch:
> >    cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >    12
> >    cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
> >    640
> > With patch
> >    cat /sys/devices/system/cpu/cpufreq/policy0/stats/total_trans
> >    8
> >    cat /sys/devices/system/cpu/cpufreq/policy4/stats/total_trans
> >    12
> >
> > Yes the rt_avg stuff is out of sync with the PELT signal, but do you think
> > this is an actual issue for realistic use-cases ?
>
> yes I think that it's worth syncing and consolidating things on the
> same metric. The result will be saner and more robust as we will have
> the same behavior

TBH I'm not disagreeing with that, the PELT-everywhere approach feels
cleaner in a way, but do you have a use-case in mind where this will
definitely help ? I mean, yes the rt_avg is a slow response to the RT
pressure, but is this always a problem ? Ramping down slower might
actually help in some cases no ?

>
> >
> > What about the diff below (just a quick hack to show the idea) applied
> > on tip/sched/core ?
> >
> > ---8<---
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index a8ba6d1f262a..23a4fb1c2c25 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -180,9 +180,12 @@ static void sugov_get_util(struct sugov_cpu *sg_cpu)
> >  	sg_cpu->util_dl = cpu_util_dl(rq);
> >  }
> >
> > +unsigned long scale_rt_capacity(int cpu);
> >  static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  {
> >  	struct rq *rq = cpu_rq(sg_cpu->cpu);
> > +	int cpu = sg_cpu->cpu;
> > +	unsigned long util, dl_bw;
> >
> >  	if (rq->rt.rt_nr_running)
> >  		return sg_cpu->max;
> > @@ -197,7 +200,14 @@ static unsigned long sugov_aggregate_util(struct sugov_cpu *sg_cpu)
> >  	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
> >  	 * ready for such an interface. So, we only do the latter for now.
> >  	 */
> > -	return min(sg_cpu->max, (sg_cpu->util_dl + sg_cpu->util_cfs));
> > +	util = arch_scale_cpu_capacity(NULL, cpu) * scale_rt_capacity(cpu);
> > +	util >>= SCHED_CAPACITY_SHIFT;
> > +	util = arch_scale_cpu_capacity(NULL, cpu) - util;
> > +	util += sg_cpu->util_cfs;
> > +	dl_bw = (rq->dl.this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> > +
> > +	/* Make sure to always provide the reserved freq to DL. */
> > +	return max(util, dl_bw);
> >  }
> >
> >  static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time, unsigned int flags)
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index f01f0f395f9a..0e87cbe47c8b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7868,7 +7868,7 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
> >  	return load_idx;
> >  }
> >
> > -static unsigned long scale_rt_capacity(int cpu)
> > +unsigned long scale_rt_capacity(int cpu)
> >  {
> >  	struct rq *rq = cpu_rq(cpu);
> >  	u64 total, used, age_stamp, avg;
> > --->8---
> >
> >
> >
> >>
> >> >
> >> > If we replace the cfs thread of rt-app by a sysbench cpu test, we can see
> >> > performance improvements:
> >> >
> >> > - Without patchset :
> >> > Test execution summary:
> >> >     total time:                          15.0009s
> >> >     total number of events:              4903
> >> >     total time taken by event execution: 14.9972
> >> >     per-request statistics:
> >> >          min:                            1.23ms
> >> >          avg:                            3.06ms
> >> >          max:                            13.16ms
> >> >          approx. 95 percentile:          12.73ms
> >> >
> >> > Threads fairness:
> >> >     events (avg/stddev):           4903.0000/0.00
> >> >     execution time (avg/stddev):   14.9972/0.00
> >> >
> >> > - With patchset:
> >> > Test execution summary:
> >> >     total time:                          15.0014s
> >> >     total number of events:              7694
> >> >     total time taken by event execution: 14.9979
> >> >     per-request statistics:
> >> >          min:                            1.23ms
> >> >          avg:                            1.95ms
> >> >          max:                            10.49ms
> >> >          approx. 95 percentile:          10.39ms
> >> >
> >> > Threads fairness:
> >> >     events (avg/stddev):           7694.0000/0.00
> >> >     execution time (avg/stddev):   14.9979/0.00
> >> >
> >> > The performance improvement is 56% for this use case.
> >> >
> >> > - Patches 4-5 add utilization tracking for dl_rq in order to solve similar
> >> >   problem as with rt_rq
> >> >
> >> > - Patch 6 uses dl and rt utilization in the scale_rt_capacity() and removes
> >> >   dl and rt from sched_rt_avg_update
> >> >
> >> > - Patches 7-8 add utilization tracking for interrupt and use it to select OPP
> >> >   A test with iperf on hikey 6220 gives:
> >> >        w/o patchset     w/ patchset
> >> >   Tx   276 Mbits/sec    304 Mbits/sec   +10%
> >> >   Rx   299 Mbits/sec    328 Mbits/sec   +09%
> >> >
> >> >   8 iterations of iperf -c server_address -r -t 5
> >> >   stdev is lower than 1%
> >> >   Only WFI idle state is enabled (shallowest arm idle state)
> >> >
> >> > - Patch 9 removes the unused sched_avg_update code
> >> >
> >> > - Patch 10 removes the unused sched_time_avg_ms
> >> >
> >> > Change since v3:
> >> > - add support of periodic update of blocked utilization
> >> > - rebase on latest tip/sched/core
> >> >
> >> > Change since v2:
> >> > - move pelt code into a dedicated pelt.c file
> >> > - rebase on load tracking changes
> >> >
> >> > Change since v1:
> >> > - Only a rebase.
> >> >   I have addressed the comments on previous version in
> >> >   patch 1/2
> >> >
> >> > Vincent Guittot (10):
> >> >   sched/pelt: Move pelt related code in a dedicated file
> >> >   sched/rt: add rt_rq utilization tracking
> >> >   cpufreq/schedutil: add rt utilization tracking
> >> >   sched/dl: add dl_rq utilization tracking
> >> >   cpufreq/schedutil: get max utilization
> >> >   sched: remove rt and dl from sched_avg
> >> >   sched/irq: add irq utilization tracking
> >> >   cpufreq/schedutil: take into account interrupt
> >> >   sched: remove rt_avg code
> >> >   proc/sched: remove unused sched_time_avg_ms
> >> >
> >> >  include/linux/sched/sysctl.h     |   1 -
> >> >  kernel/sched/Makefile            |   2 +-
> >> >  kernel/sched/core.c              |  38 +---
> >> >  kernel/sched/cpufreq_schedutil.c |  24 ++-
> >> >  kernel/sched/deadline.c          |   7 +-
> >> >  kernel/sched/fair.c              | 381 +++----------------------------------
> >> >  kernel/sched/pelt.c              | 395 +++++++++++++++++++++++++++++++++++++++
> >> >  kernel/sched/pelt.h              |  63 +++++++
> >> >  kernel/sched/rt.c                |  10 +-
> >> >  kernel/sched/sched.h             |  57 ++++--
> >> >  kernel/sysctl.c                  |   8 -
> >> >  11 files changed, 563 insertions(+), 423 deletions(-)
> >> >  create mode 100644 kernel/sched/pelt.c
> >> >  create mode 100644 kernel/sched/pelt.h
> >> >
> >> > --
> >> > 2.7.4
> >> >
> >
> >
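
For readers who want to follow the arithmetic of the sugov_aggregate_util() hack quoted above without a kernel tree, here is a minimal user-space sketch. It is only an illustration under stated assumptions: aggregate_util_sketch() and max_ul() are hypothetical names, the cpu_capacity and rt_scale parameters stand in for the return values of arch_scale_cpu_capacity() and scale_rt_capacity(), and rt_scale is assumed to be a fraction of SCHED_CAPACITY_SCALE describing how much of the CPU is left once RT pressure is accounted for.

```c
#include <assert.h>

/* Fixed-point constants mirroring the kernel's definitions. */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)
#define BW_SHIFT 20

/* Hypothetical helper; the kernel code uses its max() macro. */
static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * Hypothetical user-space rendering of the quoted hack.
 *
 * cpu_capacity: stand-in for arch_scale_cpu_capacity(NULL, cpu)
 * rt_scale:     stand-in for scale_rt_capacity(cpu), assumed to lie in
 *               [0, SCHED_CAPACITY_SCALE]
 * util_cfs:     stand-in for sg_cpu->util_cfs
 * dl_this_bw:   stand-in for rq->dl.this_bw, a fraction scaled by 2^BW_SHIFT
 */
static unsigned long aggregate_util_sketch(unsigned long cpu_capacity,
					   unsigned long rt_scale,
					   unsigned long util_cfs,
					   unsigned long long dl_this_bw)
{
	unsigned long util, dl_bw;

	/* Guard the subtraction below against wrapping. */
	assert(rt_scale <= SCHED_CAPACITY_SCALE);

	/* Capacity still available once RT pressure is removed. */
	util = (cpu_capacity * rt_scale) >> SCHED_CAPACITY_SHIFT;
	/* Invert it: the share of the capacity that was stolen. */
	util = cpu_capacity - util;
	/* Add what CFS asks for on top of the stolen share. */
	util += util_cfs;
	/* Convert the reserved DL bandwidth to capacity units. */
	dl_bw = (unsigned long)((dl_this_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT);

	/* Never request less than the bandwidth reserved for DL. */
	return max_ul(util, dl_bw);
}
```

With a capacity of 1024, a quarter of it stolen (rt_scale of 768) and a CFS utilization of 300, the sketch requests 256 + 300 = 556. Like the quoted diff, which replaces the min(sg_cpu->max, ...) line, it does not clamp the result to the maximum capacity.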