Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752634AbdGELWi (ORCPT ); Wed, 5 Jul 2017 07:22:38 -0400 Received: from merlin.infradead.org ([205.233.59.134]:59532 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752547AbdGELWh (ORCPT ); Wed, 5 Jul 2017 07:22:37 -0400 Date: Wed, 5 Jul 2017 13:22:32 +0200 From: Peter Zijlstra To: Jeffrey Hugo Cc: Ingo Molnar , linux-kernel@vger.kernel.org, Dietmar Eggemann , Austin Christ , Tyler Baicar , Timur Tabi Subject: Re: [PATCH V5 1/2] sched/fair: Fix load_balance() affinity redo path Message-ID: <20170705112232.tkeqnwh2urlc6nbx@hirez.programming.kicks-ass.net> References: <1496863138-11322-1-git-send-email-jhugo@codeaurora.org> <1496863138-11322-2-git-send-email-jhugo@codeaurora.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1496863138-11322-2-git-send-email-jhugo@codeaurora.org> User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1422 Lines: 27 On Wed, Jun 07, 2017 at 01:18:57PM -0600, Jeffrey Hugo wrote: > If load_balance() fails to migrate any tasks because all tasks were > affined, load_balance() removes the source cpu from consideration and > attempts to redo and balance among the new subset of cpus. > > There is a bug in this code path where the algorithm considers all active > cpus in the system (minus the source that was just masked out). This is > not valid for two reasons: some active cpus may not be in the current > scheduling domain and one of the active cpus is dst_cpu. These cpus should > not be considered, as we cannot pull load from them. > > Instead of failing out of load_balance(), we may end up redoing the search > with no valid cpus and incorrectly concluding the domain is balanced. > Additionally, if the group_imbalance flag was just set, it may also be > incorrectly unset, thus the flag will not be seen by other cpus in future > load_balance() runs as that algorithm intends. > > Fix the check by removing cpus not in the current domain and the dst_cpu > from considertation, thus limiting the evaluation to valid remaining cpus > from which load might be migrated. > > Co-authored-by: Austin Christ > Co-authored-by: Dietmar Eggemann > Signed-off-by: Jeffrey Hugo > Tested-by: Tyler Baicar Yes, this looks good. Thanks!