Date: Fri, 28 Jul 2017 14:59:24 +0200
From: Peter Zijlstra
To: Dietmar Eggemann
Cc: Jeffrey Hugo, Ingo Molnar, linux-kernel@vger.kernel.org, Austin Christ, Tyler Baicar, Timur Tabi
Subject: Re: [PATCH V6] sched/fair: Remove group imbalance from calculate_imbalance()
Message-ID: <20170728125924.eaw6unxude2qiyym@hirez.programming.kicks-ass.net>
References: <1499975708-31090-1-git-send-email-jhugo@codeaurora.org> <0f91065c-94cb-3780-0d77-f2be682086bf@arm.com> <20170726145407.rfswqxoclvezukwq@hirez.programming.kicks-ass.net> <5ddf061e-26a2-7151-adff-7ae339c848ac@arm.com>
In-Reply-To: <5ddf061e-26a2-7151-adff-7ae339c848ac@arm.com>

On Fri, Jul 28, 2017 at 01:16:24PM +0100, Dietmar Eggemann wrote:

> >> IIRC the topology you had in mind was MC + DIE level with n (n > 2) DIE
> >> level sched groups.
> >
> > That'd be a NUMA box?
>
> I don't think it's NUMA. SD levels are MC and DIE, w/ # DIE sg's >> 2.

Ah, I can't read. I thought >2 DIEs.

> > So this is 4 * 18 * 2 = 144 cpus:
>
> Impressive ;-) Takes forever to boot though :/

> > If I then start a 3rd loop, I see 100% 50%,50%. I then kill the 100%.
> > Then instantly they balance and I get 2x100% back.
>
> Yeah, could reproduce on IVB-EP (2x10x2).

OK, I have one of those. What should I do? Because I didn't actually see
anything odd.

> > Anything else I need to reproduce? (other than maybe a slightly less
> > insane machine :-)
>
> I guess what Jeff is trying to avoid is that 'busiest->load_per_task',
> lowered to 'sds->avg_load' in case of an imbalanced busiest sg:
>
>     if (busiest->group_type == group_imbalanced)
>         busiest->load_per_task = min(busiest->load_per_task, sds->avg_load);
>
> is so low that later fix_small_imbalance() won't be called, and
> 'env->imbalance' stays so low that load-balancing one 50% task to the
> now idle cpu won't happen:
>
>     if (env->imbalance < busiest->load_per_task)
>         fix_small_imbalance(env, sds);
>
> Having really a lot of otherwise idle DIE sg's helps to keep
> 'sds->avg_load' low in comparison to 'busiest->load_per_task'.

Right, but the whole load_per_task thing is a bit wonky, and since
that's the basis of fix_small_imbalance() I'm very suspicious of it.