Date: Mon, 14 Dec 2015 13:07:26 +0000
From: Morten Rasmussen
To: Peter Zijlstra
Cc: Yuyang Du, Andrey Ryabinin, mingo@redhat.com,
    linux-kernel@vger.kernel.org, Paul Turner, Ben Segall
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems
Message-ID: <20151214130723.GB9870@e105550-lin.cambridge.arm.com>
In-Reply-To: <20151214115453.GN6357@twins.programming.kicks-ass.net>
References: <1449838518-26543-1-git-send-email-aryabinin@virtuozzo.com>
 <20151211132551.GO6356@twins.programming.kicks-ass.net>
 <20151211133612.GG6373@twins.programming.kicks-ass.net>
 <566AD6E1.2070005@virtuozzo.com>
 <20151211175751.GA27552@e105550-lin.cambridge.arm.com>
 <20151213224224.GC28098@intel.com>
 <20151214115453.GN6357@twins.programming.kicks-ass.net>

On Mon, Dec 14, 2015 at 12:54:53PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 14, 2015 at 06:42:24AM +0800, Yuyang Du wrote:
> > > In most cases 'r' shouldn't exceed 1024 and util_sum should not
> > > significantly exceed 1024*47742, but in extreme cases like spawning
> > > lots of new tasks it may potentially overflow 32 bit. Newly created
> > > tasks contribute 1024*47742 each to the rq util_sum, which means
> > > that more than ~87 new tasks on a single rq will get us in trouble,
> > > I think.
>
> > Both can work around the issue with additional overhead, but I suspect
> > they will end up going in the wrong direction for util_avg. The
> > question is whether a big util_sum (much bigger than 1024) is in the
> > right range for it to be used in load balancing.
>
> Right, it being >100% doesn't make any sense. We should look at
> ensuring it saturates at 100%, or at least have it be bounded much
> tighter to that, as currently it is entirely unbounded, which is quite
> horrible.

Agreed, >100% is a transient state (which can be rather long) that only
means over-utilized, nothing more. Would you like the metric itself to
be changed so that it saturates at 100%, or just to cap it at 100% when
it is used?

It is not straightforward to provide a bound on the sum. There isn't
one for load_avg either. If we want to guarantee an upper bound for
cfs_rq->avg.util_sum, we have to somehow cap the se->avg.util_avg
contribution of each sched_entity. That cap depends on the cpu and on
how many other tasks are associated with that cpu, and it may have to
change when tasks migrate.
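To put numbers on the overflow scenario quoted above, here is a quick
user-space sketch (plain C, not kernel code; the macro names simply
mirror the 1024 and 47742 figures from this thread):

#include <stdint.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024	/* full-capacity utilization */
#define LOAD_AVG_MAX		47742	/* max sum of the geometric series */

int main(void)
{
	/* each newly created task contributes 1024*47742 to util_sum */
	uint64_t per_task = (uint64_t)SCHED_CAPACITY_SCALE * LOAD_AVG_MAX;

	printf("per-task contribution: %llu\n",
	       (unsigned long long)per_task);
	printf("tasks before a 32-bit util_sum wraps: %llu\n",
	       (unsigned long long)(UINT32_MAX / per_task));
	return 0;
}

That gives 48887808 per task and 87 tasks before a u32 wraps, matching
the ~87 figure above.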
> > The problem is that it is not so good to initialize a new task's
> > util_avg to 1024. At least, it makes much less sense than a new
> > task's load_avg being initialized to its full weight, because the top
> > util_avg should be well bounded by 1024 - the CPU's full utilization.
> >
> > So, maybe give the initial util_sum an average of its cfs_rq, like:
> >
> >	cfs_rq->avg.util_sum / cfs_rq->load.weight * task->load.weight
> >
> > and make sure that the initial value is bounded under various
> > conditions.
>
> That more or less results in a harmonic series, which is still very
> much unbounded.
>
> However, I think that makes sense, but I would propose doing it
> differently. That condition is generally a maximum (assuming proper
> functioning of the weight-based scheduling etc.) for any one task, so
> on migrate we can hard clip to this value.
>
> That still doesn't get rid of the harmonic series though, so we need
> more. Now we can obviously also hard clip the sum on add, which I
> suspect we'll need to do.
>
> That leaves us with a problem on remove though, at which point we can
> clip to this max if needed, but that will add a fair amount of cost to
> remove :/
>
> Alternatively, and I still have to go look through the code, we should
> clip when we've already calculated the weight-based ratio anyway,
> avoiding the cost of that extra division.

Why use load.weight to scale util_avg? It is affected by priority.
Isn't the ratio 1/nr_running what you are really after?

IIUC, you propose to clip the sum itself, in which case you run into
trouble when removing tasks: you don't know how much to remove from the
clipped sum.

Another problem is that load.weight is just a snapshot, while
avg.util_avg includes tasks that are not currently on the rq, so the
scaling factor is probably bigger than what you want.

If we leave the sum as it is (unclipped), add/remove shouldn't give us
any problems. The only remaining problem is the overflow, which is
solved by using a 64-bit type for load_avg. Is that not an acceptable
solution?

> In any case, ideas, we'll have to play with I suppose.

Agreed.
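For completeness, the unclipped 64-bit sum plus capping at the point of
use would look roughly like the sketch below. It is a stand-alone
user-space illustration only; the struct and helpers are made up for
the example and are not the actual struct sched_avg or kernel code.

#include <stdint.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024
#define LOAD_AVG_MAX		47742

/* illustrative accumulator only, not the real struct sched_avg */
struct util_acc {
	uint64_t util_sum;	/* 64-bit: no wrap even with hundreds of tasks */
};

static uint64_t util_avg(const struct util_acc *acc)
{
	return acc->util_sum / LOAD_AVG_MAX;
}

/* cap only at the point of use; the sum itself stays unclipped */
static uint64_t util_avg_capped(const struct util_acc *acc)
{
	uint64_t avg = util_avg(acc);

	return avg > SCHED_CAPACITY_SCALE ? SCHED_CAPACITY_SCALE : avg;
}

int main(void)
{
	struct util_acc rq = { 0 };
	int i;

	/* 200 newly spawned tasks, each contributing the maximum */
	for (i = 0; i < 200; i++)
		rq.util_sum += (uint64_t)SCHED_CAPACITY_SCALE * LOAD_AVG_MAX;

	printf("raw util_avg:    %llu (>1024 just means over-utilized)\n",
	       (unsigned long long)util_avg(&rq));
	printf("capped util_avg: %llu\n",
	       (unsigned long long)util_avg_capped(&rq));
	return 0;
}

Add/remove stays a plain add/subtract on the unclipped sum, and the
clipping cost is paid only where the value is consumed.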