Date: Thu, 9 Jul 2015 15:32:20 +0100
From: Morten Rasmussen
To: Yuyang Du
Cc: Peter Zijlstra, Mike Galbraith, Rabin Vincent, mingo@redhat.com, linux-kernel@vger.kernel.org, Paul Turner, Ben Segall
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Message-ID: <20150709143219.GB8668@e105550-lin.cambridge.arm.com>
In-Reply-To: <20150705223144.GG5197@intel.com>

On Mon, Jul 06, 2015 at 06:31:44AM +0800, Yuyang Du wrote:
> On Fri, Jul 03, 2015 at 06:38:31PM +0200, Peter Zijlstra wrote:
> > > I'm not against having a policy that sits somewhere in between, we just
> > > have to agree it is the right policy and clean up the load-balance code
> > > such that the implemented policy is clear.
> >
> > Right, for balancing it's a tricky question, but mixing them without
> > intent is, as you say, a bit of a mess.
> >
> > So clearly blocked load doesn't make sense for (new)idle balancing.
> > OTOH it does make some sense for the regular periodic balancing, because
> > there we really do care mostly about the averages, esp. so when we're
> > overloaded -- but there are issues there too.
> >
> > Now we can't track them both (or rather we could, but overhead).
> >
> > I like Yuyang's load tracking rewrite, but it changes exactly this part,
> > and I'm not sure I understand the full ramifications of that yet.

I don't think anybody does ;-) But I think we should try to make it work.

> Thanks. It would be a pure average policy, which is non-perfect like now,
> and certainly needs a mixing like now, but it is a worthwhile starting
> point, because it is simple and reasonable, and based on it, the other
> parts can be simple and reasonable.

I think we all agree on the benefits of taking blocked load into account,
but also that there are some policy questions to be addressed.

> > One way out would be to split the load balancer into 3 distinct regions:
> >
> > 1) get a task on every CPU, screw everything else.
> > 2) get each CPU fully utilized, still ignoring 'load'.
> > 3) when everybody is fully utilized, consider load.

Seems very reasonable to me. We more or less follow that idea in the
energy-model driven scheduling patches, at least for 2) and 3).

The difficult bit is detecting when to transition between 2) and 3). If
you want to enforce smp_nice, you have to start worrying about task
priority as soon as one cpu is fully utilized. For example, a fully
utilized cpu may have two high-priority tasks while all other cpus are
running low-priority tasks and are not fully utilized. The utilization
imbalance may be too small to cause any tasks to be migrated, so we end
up giving fewer cycles to the high-priority tasks.
> > If we make find_busiest_foo() select one of these 3, and make
> > calculate_imbalance() invariant to the metric passed in, and have things
> > like cpu_load() and task_load() return different, but coherent, numbers
> > depending on which region we're in, this almost sounds 'simple'.
> >
> > The devil is in the details, and the balancer is a hairy nest of details
> > which will make the above non-trivial.

Yes, but if we have an overall policy like the one you propose, we can at
least make it complicated and claim that we think we know what it is
supposed to do ;-)

I agree that there is some work to be done in find_busiest_*() and
calculate_imbalance() + friends. Maybe step one should be to clean them
up a bit.