Date: Mon, 14 Dec 2015 15:20:21 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Yuyang Du <yuyang.du@intel.com>, Andrey Ryabinin <aryabinin@virtuozzo.com>,
        mingo@redhat.com, linux-kernel@vger.kernel.org,
        Paul Turner <pjt@google.com>, Ben Segall <bsegall@google.com>
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems
Message-ID: <20151214142021.GO6357@twins.programming.kicks-ass.net>
References: <1449838518-26543-1-git-send-email-aryabinin@virtuozzo.com>
 <20151211132551.GO6356@twins.programming.kicks-ass.net>
 <20151211133612.GG6373@twins.programming.kicks-ass.net>
 <566AD6E1.2070005@virtuozzo.com>
 <20151211175751.GA27552@e105550-lin.cambridge.arm.com>
 <20151213224224.GC28098@intel.com>
 <20151214115453.GN6357@twins.programming.kicks-ass.net>
 <20151214130723.GB9870@e105550-lin.cambridge.arm.com>
In-Reply-To: <20151214130723.GB9870@e105550-lin.cambridge.arm.com>

On Mon, Dec 14, 2015 at 01:07:26PM +0000, Morten Rasmussen wrote:

> Agreed, >100% is a transient state (which can be rather long) which only
> means over-utilized, nothing more. Would you like the metric itself to
> be changed to saturate at 100% or just cap it to 100% when used?

We already cap it when using it IIRC. But no, I was thinking of the
measure itself.
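Morten's two options can be contrasted in a small sketch. It assumes a full-capacity value of 1024 (the kernel's SCHED_CAPACITY_SCALE) and uses hypothetical helper names. The first helper clamps only when the value is consumed. The second saturates the stored metric itself. Neither helper is the actual scheduler code.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u /* assumed full-capacity value (SCHED_CAPACITY_SCALE) */

/* Option 1 (hypothetical): keep the raw metric, clamp only when it is read. */
static uint32_t util_capped_on_use(uint32_t raw_util, uint32_t capacity)
{
	return raw_util >= capacity ? capacity : raw_util;
}

/* Option 2 (hypothetical): saturate the stored metric itself on every update,
 * so it never reports more than the CPU's capacity. */
static void util_saturating_add(uint32_t *util, uint32_t delta, uint32_t capacity)
{
	assert(*util <= UINT32_MAX - delta); /* the raw sum must not wrap */
	uint32_t sum = *util + delta;
	*util = sum >= capacity ? capacity : sum;
}
```

The clamp-on-use variant keeps the add/remove arithmetic exact, but it lets the stored value sit above 100%. The saturating variant keeps the measure bounded, but it discards the excess.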

> It is not straightforward to provide a bound on the sum.

Agreed.

> There isn't one for load_avg either.

But that one is fundamentally unbounded, whereas the util thing is
fundamentally bounded; it's just our implementation that isn't.

> If we want to guarantee an upper bound for
> cfs_rq->avg.util_sum we have to somehow cap the se->avg.util_avg
> contributions for each sched_entity. This cap depends on the cpu and how
> many other tasks are associated with that cpu. The cap may have to
> change when tasks migrate.

Yep, blows :-)
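A sketch with made-up numbers shows why the per-CPU sum is unbounded in practice. Suppose two always-running tasks each built up a util_avg near 1024 while running alone on their own CPUs, and both are then migrated onto the same CPU. If the runqueue sum simply adds the migrated contributions, it reads about 200% until the per-entity averages decay toward their new share. The structure and helper names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u /* assumed full-capacity value */

/* Hypothetical per-CPU aggregate: the plain sum of attached entities' util_avg. */
struct rq_util {
	uint32_t util_avg;
};

/* Attach a migrated entity: its whole util_avg is added, with no per-entity cap. */
static void attach_entity_util(struct rq_util *rq, uint32_t se_util_avg)
{
	assert(rq->util_avg <= UINT32_MAX - se_util_avg); /* no 32-bit wrap */
	rq->util_avg += se_util_avg;
}

/* Report utilization as a percentage of one CPU's capacity (rounded down). */
static uint32_t util_percent(const struct rq_util *rq)
{
	return (uint32_t)(((uint64_t)rq->util_avg * 100u) / CAPACITY_SCALE);
}
```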

> > However, I think that makes sense, but I would propose doing it
> > differently. That condition is generally a maximum (assuming proper
> > functioning of the weight based scheduling etc..) for any one task, so
> > on migrate we can hard clip to this value.

> Why use load.weight to scale util_avg? It is affected by priority. Isn't
> it just the ratio 1/nr_running that you are after?

Remember, the util thing is based on running time, so assuming each task
always wants to run, each task gets to run a fraction w_i/\Sum_j w_j of
the time, due to CFS being a weighted fair queueing thingy.
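Under that assumption (every task always runnable), task i's long-run utilization on a full-capacity CPU tends to C * w_i / \Sum_j w_j, with C = 1024. The sketch below computes that bound and applies the "hard clip on migrate" idea. It also shows the 1/nr_running alternative Morten asks about. The helper names are hypothetical. The weights 1024 (nice 0) and 820 (nice 1) come from the kernel's nice-to-weight table.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u /* assumed full-capacity value */

/* Weight-based bound: the share CFS gives task i when all tasks are always
 * runnable, i.e. CAPACITY_SCALE * w_i / sum_j w_j (rounded down). */
static uint32_t util_bound_weighted(uint32_t w_i, uint64_t sum_w)
{
	assert(sum_w > 0 && sum_w >= w_i);
	return (uint32_t)(((uint64_t)CAPACITY_SCALE * w_i) / sum_w);
}

/* Morten's alternative: ignore priority and use 1/nr_running. */
static uint32_t util_bound_nr_running(uint32_t nr_running)
{
	assert(nr_running > 0);
	return CAPACITY_SCALE / nr_running;
}

/* Hypothetical "hard clip on migrate": a migrating entity never carries more
 * utilization than its fair share on the destination CPU. */
static uint32_t clip_on_migrate(uint32_t se_util_avg, uint32_t bound)
{
	return se_util_avg > bound ? bound : se_util_avg;
}
```

With one nice-0 task and one nice-1 task, the weighted bounds are 568 and 455. The flat 1/nr_running bound is 512 for both. That difference is exactly the priority effect Morten points out.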

> IIUC, you propose to clip the sum itself, in which case you run into
> trouble when removing tasks. You don't know how much to remove from
> the clipped sum.

Right, then we'll have to slowly gain it again.
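The removal problem can be shown with made-up numbers. Two tasks each contributing 600 are added to a sum clipped at 1024. When one task leaves, subtracting its 600 leaves 424. That underestimates the remaining task's real 600, and the running average then has to regain the difference over time. The structure and helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024u /* assumed full-capacity value */

/* Hypothetical runqueue aggregate that is clipped to capacity on every add. */
struct clipped_sum {
	uint32_t value;
};

static void clipped_add(struct clipped_sum *s, uint32_t contrib)
{
	uint64_t v = (uint64_t)s->value + contrib; /* widen so the add cannot wrap */
	s->value = v > CAPACITY_SCALE ? CAPACITY_SCALE : (uint32_t)v;
}

/* On removal we only know the departing task's own contribution, not how much
 * of it was lost to clipping, so all of it is subtracted (floored at 0). */
static void clipped_remove(struct clipped_sum *s, uint32_t contrib)
{
	s->value = contrib > s->value ? 0 : s->value - contrib;
}
```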

> Another problem is that load.weight is just a snapshot while
> avg.util_avg includes tasks that are not currently on the rq so the
> scaling factor is probably bigger than what you want.

Our weight guesstimates also include non-running (aka blocked) tasks,
right?

> If we leave the sum as it is (unclipped), add/remove shouldn't give us
> any problems. The only problem is the overflow, which is solved by using
> a 64bit type for load_avg. Is that not an acceptable solution?

It might be. After all, any time any of this is needed we're CPU bound
and the utilization measure is pointless anyway. That measure only
matters if it's small and the sum is 'small'. After that it's back to the
normal load-based thingy.
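The following sketches the 32-bit overflow and the 64-bit remedy being discussed. It assumes the offending expression has the form r * LOAD_AVG_MAX, where r is a removed utilization held in a 32-bit signed type. It also assumes LOAD_AVG_MAX = 47742, its value in kernels of this period. In a 32-bit product, any r above INT32_MAX / 47742 = 44981 overflows, and an unbounded util_avg does not rule such values out. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LOAD_AVG_MAX 47742 /* assumed: maximum value of the decayed running sum */

/* Would r * LOAD_AVG_MAX fit in a signed 32-bit product? */
static bool mul_fits_s32(int32_t r)
{
	return r >= INT32_MIN / LOAD_AVG_MAX && r <= INT32_MAX / LOAD_AVG_MAX;
}

/* Hypothetical removal step done in 64-bit arithmetic: subtract the removed
 * utilization's contribution from the running sum and floor the result at 0. */
static int64_t util_sum_after_removal(int64_t util_sum, int32_t r)
{
	int64_t delta = (int64_t)r * LOAD_AVG_MAX; /* widen before multiplying */
	return util_sum > delta ? util_sum - delta : 0;
}
```

Widening one operand before the multiply is enough. The product is then computed in 64 bits and cannot overflow for any 32-bit r.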