Date: Mon, 31 Jul 2017 13:39:40 +0200
From: Peter Zijlstra
To: Josef Bacik
Cc: mingo@redhat.com, linux-kernel@vger.kernel.org, umgwanakikbuti@gmail.com,
	tj@kernel.org, kernel-team@fb.com, Josef Bacik
Subject: Re: [PATCH 2/7] sched/fair: calculate runnable_weight slightly differently
Message-ID: <20170731113940.cj4rfqxsgcdi4dtw@hirez.programming.kicks-ass.net>
References: <1500038464-8742-1-git-send-email-josef@toxicpanda.com>
	<1500038464-8742-3-git-send-email-josef@toxicpanda.com>
In-Reply-To: <1500038464-8742-3-git-send-email-josef@toxicpanda.com>
User-Agent: NeoMutt/20170609 (1.8.3)

On Fri, Jul 14, 2017 at 01:20:59PM +0000, Josef Bacik wrote:
> From: Josef Bacik
>
> Our runnable_weight currently looks like this
>
> runnable_weight = shares * runnable_load_avg / load_avg
>
> The goal is to scale the runnable weight for the group based on its runnable to
> load_avg ratio. The problem with this is it biases us towards tasks that never
> go to sleep. Tasks that go to sleep are going to have their runnable_load_avg
> decayed pretty hard, which will drastically reduce the runnable weight of
> groups with interactive tasks. To solve this imbalance we tweak this slightly,
> so in the ideal case it is still the above, but in the interactive case it is
>
> runnable_weight = shares * runnable_weight / load_weight
>
> which will make the weight distribution fairer between interactive and
> non-interactive groups.
>
> Signed-off-by: Josef Bacik
> ---
>  kernel/sched/fair.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 326bc55..5d4489e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2880,9 +2880,15 @@ static void update_cfs_group(struct sched_entity *se)
>  	 * Note: we need to deal with very sporadic 'runnable > load' cases
>  	 * due to numerical instability.
>  	 */
> -	runnable = shares * gcfs_rq->avg.runnable_load_avg;
> -	if (runnable)
> -		runnable /= max(gcfs_rq->avg.load_avg, gcfs_rq->avg.runnable_load_avg);
> +	runnable = shares * max(scale_load_down(gcfs_rq->runnable_weight),
> +				gcfs_rq->avg.runnable_load_avg);
> +	if (runnable) {
> +		long divider = max(gcfs_rq->avg.load_avg,
> +				   scale_load_down(gcfs_rq->load.weight));
> +		divider = max_t(long, 1, divider);
> +		runnable /= divider;
> +	}
> +	runnable = clamp_t(long, runnable, MIN_SHARES, shares);

So what should be:

                       grq->runnable_load_avg
  runnable = shares * ------------------------
                          grq->load_avg

Turns into:

            max(gcfs_rq->avg.runnable_load_avg, gcfs_rq->runnable_weight)
  shares * ---------------------------------------------------------------
            max(gcfs_rq->avg.load_avg         , gcfs_rq->load.weight)

For which I think we can have an analogous argument to what we have for
calc_cfs_shares(), no?

That is, we use the immediate values to get better representation for
interactive, but use the avg values as lower bounds in case the immediate
value is 0.
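To make the bias concrete, here is a throw-away userspace toy (the numbers
and helper names are made up, this is not the kernel code) comparing the
avg-only formula with the max-of-immediate-and-avg one for a group whose
tasks mostly sleep:

	/* Toy comparison; userspace only, made-up numbers, not kernel code. */
	#include <stdio.h>

	static long max2(long a, long b) { return a > b ? a : b; }

	/* old: shares * runnable_load_avg / max(load_avg, runnable_load_avg) */
	static long weight_avg_only(long shares, long runnable_avg, long load_avg)
	{
		long runnable = shares * runnable_avg;

		if (runnable)
			runnable /= max2(load_avg, runnable_avg);
		return runnable;
	}

	/* new: immediate values, with the averages as lower bounds */
	static long weight_mixed(long shares, long runnable_weight, long runnable_avg,
				 long load_weight, long load_avg)
	{
		long runnable = shares * max2(runnable_weight, runnable_avg);
		long divider = max2(load_weight, load_avg);

		if (divider)
			runnable /= divider;
		return runnable;
	}

	int main(void)
	{
		/*
		 * A mostly-sleeping interactive group: runnable_load_avg has
		 * decayed hard, but while queued its runnable_weight equals
		 * its load.weight.
		 */
		long shares = 1024, runnable_avg = 64, load_avg = 512;
		long runnable_weight = 512, load_weight = 512;

		printf("avg only: %ld\n", weight_avg_only(shares, runnable_avg, load_avg));
		printf("mixed:    %ld\n", weight_mixed(shares, runnable_weight, runnable_avg,
						       load_weight, load_avg));
		return 0;
	}

With those numbers the avg-only form hands the sleepy group 128 out of 1024,
while the mixed form keeps it at its full 1024 for as long as it is actually
queued.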
I think we can write this better like:

  /* rename calc_cfs_shares to calc_group_shares */

  /*
   * big comment a-la calc_group_shares goes here
   */
  static long calc_group_runnable(...)
  {
	/*
	 * We need to deal with numerical instabilities that can result
	 * in sporadic cases where: runnable_avg > load_avg.
	 */
	load_avg = max(gcfs_rq->avg.runnable_load_avg, gcfs_rq->avg.load_avg);

	/*
	 * Since the immediate values can be 0, use the averages as
	 * lower bounds.
	 */
	runnable = max(gcfs_rq->runnable_weight, gcfs_rq->avg.runnable_load_avg);
	load     = max(gcfs_rq->load.weight    , load_avg);

	runnable *= shares;
	if (load)
		runnable /= load;

	return clamp_t(long, runnable, MIN_SHARES, shares);
  }

But yes..
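FWIW, a throw-away userspace sketch of that shape (the struct, the field
names and the TOY_MIN_SHARES value below only mimic the kernel, they are
assumptions for illustration), just to sanity-check the two awkward corners:
the sporadic runnable_avg > load_avg case and a completely idle group:

	#include <stdio.h>

	#define TOY_MIN_SHARES	2L

	struct toy_grq {
		long runnable_weight;	/* sum of queued entities' runnable weight */
		long load_weight;	/* mimics grq->load.weight */
		long runnable_avg;	/* mimics grq->avg.runnable_load_avg */
		long load_avg;		/* mimics grq->avg.load_avg */
	};

	static long max2(long a, long b) { return a > b ? a : b; }

	static long toy_group_runnable(struct toy_grq *grq, long shares)
	{
		/* runnable_avg can sporadically exceed load_avg; correct the divider */
		long load_avg = max2(grq->runnable_avg, grq->load_avg);

		/* immediate values can be 0, use the averages as lower bounds */
		long runnable = max2(grq->runnable_weight, grq->runnable_avg);
		long load = max2(grq->load_weight, load_avg);

		runnable *= shares;
		if (load)
			runnable /= load;

		if (runnable < TOY_MIN_SHARES)
			runnable = TOY_MIN_SHARES;
		if (runnable > shares)
			runnable = shares;
		return runnable;
	}

	int main(void)
	{
		/* instability: runnable_avg briefly above load_avg, nothing queued */
		struct toy_grq unstable = { 0, 0, 530, 512 };
		/* completely idle group, everything 0 */
		struct toy_grq idle = { 0, 0, 0, 0 };

		printf("unstable: %ld\n", toy_group_runnable(&unstable, 1024));
		printf("idle:     %ld\n", toy_group_runnable(&idle, 1024));
		return 0;
	}

The max() on the divider keeps the ratio at or below 1 when runnable_avg
overshoots load_avg, and the clamp hands an idle group TOY_MIN_SHARES rather
than 0.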