Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
To: Peter Zijlstra
Cc: Vincent Guittot, Joseph Salisbury, Ingo Molnar, Linus Torvalds, Thomas Gleixner, LKML, Mike Galbraith, omer.akram@canonical.com
From: Dietmar Eggemann
Date: Tue, 18 Oct 2016 12:15:11 +0100
References: <20161014151827.GA10379@linaro.org> <2bb765e7-8a5f-c525-a6ae-fbec6fae6354@canonical.com> <20161017090903.GA11962@linaro.org> <4e15ad55-beeb-e860-0420-8f439d076758@arm.com> <20161017131952.GR3117@twins.programming.kicks-ass.net> <94cc6deb-f93e-60ec-5834-e84a8b98e73c@arm.com> <20161018090747.GW3142@twins.programming.kicks-ass.net>
In-Reply-To: <20161018090747.GW3142@twins.programming.kicks-ass.net>

On 18/10/16 10:07, Peter Zijlstra wrote:
> On Mon, Oct 17, 2016 at 11:52:39PM +0100, Dietmar Eggemann wrote:

[...]

>> Using for_each_online_cpu(i) instead of for_each_possible_cpu(i) in
>> online_fair_sched_group() works on this machine, i.e. the .tg_load_avg
>> of system.slice tg is 0 after startup.
>
> Right, so the reason for using present_mask is that it avoids having to
> deal with hotplug, also all the per-cpu memory is allocated and present
> for !online CPUs anyway, so might as well set it up properly anyway.
>
> (You might want to start booting your laptop with "possible_cpus=4" to
> save some memory FWIW.)
The question for me is: could this be the reason on the X1 Carbon platform as well? The initial pastebin from Joseph (http://paste.ubuntu.com/23312351) showed .tg_load_avg : 381697 on a machine with 4 logical cpus. With somewhat more than 80 services this might be the problem.

> But yes, we have a bug here too... /me ponders
>
> So aside from funny BIOSes, this should also show up when creating
> cgroups when you have offlined a few CPUs, which is far more common I'd
> think.

Yes.

> On IRC you mentioned that adding list_add_leaf_cfs_rq() to
> online_fair_sched_group() cures this, this would actually match with
> unregister_fair_sched_group() doing list_del_leaf_cfs_rq() and avoid
> a few instructions on the enqueue path, so that's all good.

Yes, I was able to recreate a similar problem (not related to the cpu masks) on ARM64 (6 logical cpus). I created 100 2nd-level tg's but put only one task (no cpu affinity, so it could run on multiple cpus) into one of these tg's (mainly to see the related cfs_rq's in /proc/sched_debug). I get a remaining .tg_load_avg : 49898 for cfs_rq[x]:/tg_1.

> I'm just not immediately seeing how that cures things. The only relevant
> user of the leaf_cfs_rq list seems to be update_blocked_averages() which
> is called from the balance code (idle_balance() and
> rebalance_domains()). But neither should call that for offline (or
> !present) CPUs.

Assuming this is load from the 99 2nd-level tg's which never had a task running, putting list_add_leaf_cfs_rq() into online_fair_sched_group() for all cpus makes sure that all the 'blocked load' gets decayed. Doing what Vincent just suggested, initializing tg se's with 0 instead of 1024, prevents this from being necessary in the first place.

[...]