Date: Tue, 18 Oct 2016 14:07:44 +0200
From: Peter Zijlstra
To: Dietmar Eggemann
Cc: Vincent Guittot, Joseph Salisbury, Ingo Molnar, Linus Torvalds,
    Thomas Gleixner, LKML, Mike Galbraith, omer.akram@canonical.com
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
Message-ID: <20161018120744.GZ3142@twins.programming.kicks-ass.net>

On Tue, Oct 18, 2016 at 12:15:11PM +0100, Dietmar Eggemann wrote:
> On 18/10/16 10:07, Peter Zijlstra wrote:
> > On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:

> > On IRC you mentioned that adding list_add_leaf_cfs_rq() to
> > online_fair_sched_group() cures this, this would actually match with
> > unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
> > a few instructions on the enqueue path, so that's all good.
>
> Yes, I was able to recreate a similar problem (not related to the cpu
> masks) on ARM64 (6 logical cpus). I created 100 2nd-level tg's but only
> put one task (no cpu affinity, so it could run on multiple cpus) in one
> of these tg's (mainly to see the related cfs_rq's in /proc/sched_debug).
>
> I get a remaining .tg_load_avg : 49898 for cfs_rq[x]:/tg_1

Ah, and since all those CPUs are online, we decay all that load away.
OK, makes sense now.

> > I'm just not immediately seeing how that cures things. The only
> > relevant user of the leaf_cfs_rq list seems to be
> > update_blocked_averages(), which is called from the balance code
> > (idle_balance() and rebalance_domains()). But neither should call
> > that for offline (or !present) CPUs.
>
> Assuming this is load from the 99 2nd-level tg's which never had a task
> running, putting list_add_leaf_cfs_rq() into online_fair_sched_group()
> for all cpus makes sure that all the 'blocked load' gets decayed.
>
> Doing what Vincent just suggested, not initializing tg se's w/ 1024 but
> w/ 0 instead, prevents this from being necessary.

Indeed. I just worry about the cases where we do not propagate the load
up, e.g. the stuff fixed by:

  1476695653-12309-5-git-send-email-vincent.guittot@linaro.org

If we hit an intermediary cgroup with 0 load, we might get some
interactivity issues.

But it could be I got lost again :-)
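
For reference, a rough sketch of the online_fair_sched_group() change
being discussed; helper names are assumed from kernel/sched/fair.c
around v4.8, and this is an illustration, not the actual patch:

	/*
	 * Sketch only: hook each group cfs_rq onto the leaf_cfs_rq list
	 * when the task group comes online, so update_blocked_averages()
	 * also decays the blocked load of groups that never had a task
	 * enqueued on that CPU.
	 */
	void online_fair_sched_group(struct task_group *tg)
	{
		struct sched_entity *se;
		struct rq *rq;
		int i;

		for_each_possible_cpu(i) {
			rq = cpu_rq(i);
			se = tg->se[i];

			raw_spin_lock_irq(&rq->lock);
			post_init_entity_util_avg(se);
			list_add_leaf_cfs_rq(tg->cfs_rq[i]);	/* proposed addition */
			raw_spin_unlock_irq(&rq->lock);
		}
	}

The alternative Vincent suggested would instead leave a group se's
load_avg at 0 at init time (rather than starting it at 1024), which
avoids needing the list addition but runs into the 0-load intermediary
cgroup concern above.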