Date: Fri, 3 Jul 2015 10:34:41 +0100
From: Morten Rasmussen
To: Yuyang Du
Cc: Mike Galbraith, Peter Zijlstra, Rabin Vincent, "mingo@redhat.com",
	"linux-kernel@vger.kernel.org", Paul Turner, Ben Segall
Subject: Re: [PATCH?] Livelock in pick_next_task_fair() / idle_balance()
Message-ID: <20150703093441.GA15477@e105550-lin.cambridge.arm.com>
In-Reply-To: <20150702193702.GD5197@intel.com>

On Fri, Jul 03, 2015 at 03:37:02AM +0800, Yuyang Du wrote:
> Hi Morten,
>
> On Thu, Jul 02, 2015 at 12:40:32PM +0100, Morten Rasmussen wrote:
> > detach_tasks() will attempt to pull 62 based on the tasks'
> > task_h_load(), but the task_h_load() sum is only 5 + 10 + 0 and hence
> > detach_tasks() will empty the src_rq.
> >
> > IOW, since task groups include blocked load in the load_avg_contrib
> > (see __update_group_entity_contrib() and
> > __update_cfs_rq_tg_load_contrib()) the imbalance includes blocked
> > load and hence env->imbalance >= sum(task_h_load(p)) for all tasks p
> > on the rq.
> > Which leads to detach_tasks() emptying the rq completely in the
> > reported scenario where blocked load > runnable load.
>
> Whenever I want to know the load avg concerning task group, I need to
> walk through the complete code again, and I prefer not to do it this
> time. But it should not be that simple to say "the 118 comes from the
> blocked load".

But the whole hierarchy of group entities is updated each time we
enqueue or dequeue a task. I don't see how the group entity
load_avg_contrib can be out of date. Why do you need to update it again?

In any case, we have one task in the group hierarchy which has a
load_avg_contrib of 0, while the grand-grandparent group entity has a
load_avg_contrib of 118 and no additional tasks. That load contribution
must come from tasks which are no longer around on the rq, no?

> Anyway, with blocked load, yes, we definitely can't move (or even
> find) some amount of the imbalance if we only look at the tasks on the
> queue. But this may or may not be a problem.
>
> Firstly, the question comes down to whether we want blocked load
> anywhere. This is just a "now vs. average" question.

That is what I meant in the paragraph below. It is a scheduling policy
question.

> Secondly, if we stick to average, we just need to treat the blocked
> load consistently: not that a group SE has it but a task SE does not,
> or that some places have it and others do not.

I agree that inconsistent use of blocked load will lead us into trouble.
The problem is that none of the load-balance logic was designed for
blocked load. It was written to deal with load that is currently on the
rq, slightly biased by average cpu load, not with the load contribution
of tasks which we can't migrate at the moment because they are blocked.
The load-balance code has to be updated to deal with blocked load. We
will run into all sorts of issues if we don't, and yet roll out use of
blocked load everywhere.
However, before we can rework the load-balance code, we have to agree on
the "now vs. average" balance policy. Your proposed patch implements a
policy somewhere in between: we try to balance based on average, but we
don't allow idle_balance() to empty a cpu completely. A pure average
balance policy would allow a cpu to go idle even if we have tasks
waiting on other rqs, as long as the blocked load indicates that other
tasks will show up shortly on the cpu. A pure "now" balance policy would
balance based on runnable_load_avg for all entities, including groups,
ignoring all blocked load, but that goes against the PELT group
balancing design. I'm not against having a policy that sits somewhere in
between, we just have to agree that it is the right policy and clean up
the load-balance code such that the implemented policy is clear.

Morten

> Thanks,
> Yuyang
>
> > Whether emptying the src_rq is the right thing to do depends on your
> > point of view. Does balanced load (runnable + blocked) take priority
> > over keeping cpus busy or not? For idle_balance() it seems
> > intuitively correct not to empty the rq, and hence you could consider
> > env->imbalance to be too big.
> >
> > I think we will see more of this kind of problem if we include
> > weighted_cpuload() as well. Parts of the imbalance calculation code
> > are quite old and could use some attention first.
> >
> > A short-term fix could be what Yuyang proposes: stop pulling tasks
> > when there is only one left in detach_tasks(). It won't affect active
> > load balance, where we may want to migrate the last task, as active
> > load balance doesn't use detach_tasks().
> >
> > Morten

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/