Subject: Re: [PATCH] sched/debug: Show intergroup and hierarchy sum wait time of a task group
To: ufo19890607@gmail.com, mingo@redhat.com, peterz@infradead.org, yuzhoujian@didichuxing.com
Cc: linux-kernel@vger.kernel.org
References: <1548236816-18712-1-git-send-email-ufo19890607@gmail.com>
From: 王贇
Message-ID: <3680160f-a439-02a3-3d40-56de18096c4b@linux.alibaba.com>
Date: Fri, 25 Jan 2019 11:11:40 +0800
In-Reply-To: <1548236816-18712-1-git-send-email-ufo19890607@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/1/23 5:46 PM, ufo19890607@gmail.com wrote:
> From: yuzhoujian
>
> We can monitor the sum wait time of a task group since 'commit 3d6c50c27bd6
> ("sched/debug: Show the sum wait time of a task group")'. However, this
> wait_sum only represents the conflict between different task groups, since
> it simply sums the wait time of the task_group's cfs_rq. We still cannot
> evaluate the conflict between all the tasks within the hierarchy of this
> group, so the hierarchy wait time is still needed.

Could you please give us a scenario where we do need this hierarchy wait_sum,
despite the extra overhead?

Regards,
Michael Wang

>
> Thus we introduce hierarchy wait_sum which summarizes the total wait sum of
> all the tasks in the hierarchy of a group.
>
> The 'cpu.stat' is modified to show the statistic, like:
>
>	nr_periods 0
>	nr_throttled 0
>	throttled_time 0
>	intergroup wait_sum 2842251984
>	hierarchy wait_sum 6389509389332798
>
> From now on we can monitor both the intergroup and the hierarchy wait_sum,
> which will help a system administrator know how intense the CPU
> competition is within a task group and between different task groups. We
> can calculate the wait rate of a task group based on hierarchy wait_sum and
> cpuacct.usage.
>
> For example:
>	X% = (current_wait_sum - last_wait_sum) / ((current_usage -
>	last_usage) + (current_wait_sum - last_wait_sum))
>
> That means the task group spent X percent of its time on the runqueue
> waiting for the CPU.
>
> Signed-off-by: yuzhoujian
> ---
>  kernel/sched/core.c  | 11 +++++++----
>  kernel/sched/fair.c  | 17 +++++++++++++++++
>  kernel/sched/sched.h |  3 +++
>  3 files changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ee77636..172e6fb 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6760,13 +6760,16 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
>  	seq_printf(sf, "throttled_time %llu\n", cfs_b->throttled_time);
>
>  	if (schedstat_enabled() && tg != &root_task_group) {
> -		u64 ws = 0;
> +		u64 inter_ws = 0, hierarchy_ws = 0;
>  		int i;
>
> -		for_each_possible_cpu(i)
> -			ws += schedstat_val(tg->se[i]->statistics.wait_sum);
> +		for_each_possible_cpu(i) {
> +			inter_ws += schedstat_val(tg->se[i]->statistics.wait_sum);
> +			hierarchy_ws += tg->cfs_rq[i]->hierarchy_wait_sum;
> +		}
>
> -		seq_printf(sf, "wait_sum %llu\n", ws);
> +		seq_printf(sf, "intergroup wait_sum %llu\n", inter_ws);
> +		seq_printf(sf, "hierarchy wait_sum %llu\n", hierarchy_ws);
>  	}
>
>  	return 0;
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e2ff4b6..35e89ca 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -858,6 +858,19 @@ static void update_curr_fair(struct rq *rq)
>  }
>
>  static inline void
> +update_hierarchy_wait_sum(struct sched_entity *se,
> +				u64 delta_wait)
> +{
> +	for_each_sched_entity(se) {
> +		struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +
> +		if (cfs_rq->tg != &root_task_group)
> +			__schedstat_add(cfs_rq->hierarchy_wait_sum,
> +					delta_wait);
> +	}
> +}
> +
> +static inline void
>  update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
>  	struct task_struct *p;
> @@ -880,6 +893,7 @@ static void update_curr_fair(struct rq *rq)
>  			return;
>  		}
>  		trace_sched_stat_wait(p, delta);
> +		update_hierarchy_wait_sum(se, delta);
>  	}
>
>  	__schedstat_set(se->statistics.wait_max,
> @@ -10273,6 +10287,9 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>  #ifndef CONFIG_64BIT
>  	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
>  #endif
> +#ifdef CONFIG_SCHEDSTATS
> +	cfs_rq->hierarchy_wait_sum = 0;
> +#endif
>  #ifdef CONFIG_SMP
>  	raw_spin_lock_init(&cfs_rq->removed.lock);
>  #endif
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index d27c1a5..c01ab99 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -496,6 +496,9 @@ struct cfs_rq {
>  #ifndef CONFIG_64BIT
>  	u64			min_vruntime_copy;
>  #endif
> +#ifdef CONFIG_SCHEDSTATS
> +	u64			hierarchy_wait_sum;
> +#endif
>
>  	struct rb_root_cached	tasks_timeline;
>
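
For reference, the wait-rate formula in the quoted commit message could be computed in userspace roughly as follows. This is only a sketch and not part of the patch: `struct tg_sample` and `tg_wait_rate()` are hypothetical names, and it assumes that both the hierarchy wait_sum and cpuacct.usage are reported in nanoseconds and that the two counters were sampled at the same two points in time.

```c
#include <stdint.h>

/* One sample of the two counters (assumed to be in nanoseconds). */
struct tg_sample {
	uint64_t wait_sum;	/* "hierarchy wait_sum" value from cpu.stat */
	uint64_t usage;		/* value from cpuacct.usage */
};

/*
 * Fraction of time (0.0 to 1.0) the group spent waiting on a runqueue
 * between two samples:
 *
 *   d_wait / (d_usage + d_wait)
 *
 * Returns 0.0 if neither counter advanced, and -1.0 if a counter went
 * backwards (for example, the group was recreated between samples).
 */
double tg_wait_rate(const struct tg_sample *last, const struct tg_sample *cur)
{
	uint64_t d_wait, d_usage;

	if (cur->wait_sum < last->wait_sum || cur->usage < last->usage)
		return -1.0;

	d_wait = cur->wait_sum - last->wait_sum;
	d_usage = cur->usage - last->usage;

	if (d_wait == 0 && d_usage == 0)
		return 0.0;

	return (double)d_wait / ((double)d_usage + (double)d_wait);
}
```

Multiplying the result by 100 gives the X% from the commit message. A caller would fill one `struct tg_sample` per polling interval and pass the previous and current samples.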