From: Vincent Guittot
Date: Mon, 9 Mar 2020 12:15:36 +0100
Subject: Re: [RFC PATCH] sched: fix the nonsense shares when load of cfs_rq is too, small
To: Ben Segall
Cc: 王贇, Peter Zijlstra, Ingo Molnar, Juri Lelli, Dietmar Eggemann,
    Steven Rostedt, Mel Gorman, "open list:SCHEDULER"

On Fri, 6 Mar 2020 at 20:17, wrote:
>
> 王贇 writes:
>
> > On 2020/3/5 2:47 AM, bsegall@google.com wrote:
> > [snip]
> >>> Argh, because A->cfs_rq.load.weight is B->se.load.weight which is
> >>> B->shares/nr_cpus.
> >>>
> >>>> While the se of D on root cfs_rq is far more bigger than 2, so it
> >>>> wins the battle.
> >>>>
> >>>> This patch add a check on the zero load and make it as MIN_SHARES
> >>>> to fix the nonsense shares, after applied the group C wins as
> >>>> expected.
> >>>>
> >>>> Signed-off-by: Michael Wang
> >>>> ---
> >>>>  kernel/sched/fair.c | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>>
> >>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>>> index 84594f8aeaf8..53d705f75fa4 100644
> >>>> --- a/kernel/sched/fair.c
> >>>> +++ b/kernel/sched/fair.c
> >>>> @@ -3182,6 +3182,8 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
> >>>>          tg_shares = READ_ONCE(tg->shares);
> >>>>
> >>>>          load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
> >>>> +        if (!load && cfs_rq->load.weight)
> >>>> +                load = MIN_SHARES;
> >>>>
> >>>>          tg_weight = atomic_long_read(&tg->load_avg);
> >>>
> >>> Yeah, I suppose that'll do. Hurmph, wants a comment though.
> >>>
> >>> But that has me looking at other users of scale_load_down(), and doesn't
> >>> at least update_tg_cfs_load() suffer the same problem?
> >>
> >> I think instead we should probably scale_load_down(tg_shares) and
> >> scale_load(load_avg). tg_shares is always a scaled integer, so just
> >> moving the source of the scaling in the multiply should do the job.
> >>
> >> ie
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index fcc968669aea..6d7a9d72d742 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -3179,9 +3179,9 @@ static long calc_group_shares(struct cfs_rq *cfs_rq)
> >>         long tg_weight, tg_shares, load, shares;
> >>         struct task_group *tg = cfs_rq->tg;
> >>
> >> -       tg_shares = READ_ONCE(tg->shares);
> >> +       tg_shares = scale_load_down(READ_ONCE(tg->shares));
> >>
> >> -       load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg);
> >> +       load = max(cfs_rq->load.weight, scale_load(cfs_rq->avg.load_avg));
> >>
> >>         tg_weight = atomic_long_read(&tg->load_avg);
> >
> > Get the point, but IMHO fix scale_load_down() sounds better, to
> > cover all the similar cases; let's first try that way and see if
> > it's working :-)
>
> Yeah, that might not be a bad idea as well; it's just that doing this
> fix would keep you from losing all your precision (and I'd have to
> think about whether that would result in fairness issues, like having
> all the group ses get the full tg shares, or something like that).

AFAICT, we already have a fairness problem, because scale_load_down() is
used in calc_delta_fair(), so all sched groups with a weight lower than
1024 end up with the same vruntime increase when running.

The load_avg is then used to balance between rqs, so load_balance() will
ensure at least one task per CPU, but not more, because the load_avg it
relies on stays null.

That being said, having a min of 2 for scale_load_down() will give us a
nonzero tg->load_avg, hence a nonzero tg_weight, and each sched group
will no longer get the full shares. But it still treats all those small
groups as equal anyway.

The best solution would be not to scale down the weight at all, but
that's a bigger change.