From: Vincent Guittot
Date: Tue, 18 Oct 2016 17:19:22 +0200
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
To: Peter Zijlstra
Cc: Matt Fleming, Wanpeng Li, Ingo Molnar, linux-kernel@vger.kernel.org,
    Mike Galbraith, Yuyang Du, Dietmar Eggemann

On 18 October 2016 at 13:09, Peter Zijlstra wrote:
> On Wed, Oct 12, 2016 at 09:41:36AM +0200, Vincent Guittot wrote:
>
>> OK. In fact, I have noticed another regression with tip/sched/core and
>> hackbench while looking at yours.
>> I have bisected it to:
>> 10e2f1acd0 ("sched/core: Rewrite and improve select_idle_siblings")
>>
>> hackbench -P -g 1
>>
>>            v4.8         tip/sched/core   tip/sched/core + revert of
>>                                          10e2f1acd010 and 1b568f0aabf2
>>   min      0.051        0.052            0.049
>>   avg      0.057 (0%)   0.062 (-7%)      0.056 (+1%)
>>   max      0.070        0.073            0.067
>>   stdev    +/-8%        +/-10%           +/-9%
>>
>> The issue seems to be that it prevents some migrations at wake-up at
>> the end of the hackbench test, so the last tasks compete for the same
>> CPU while other CPUs in the same MC domain are idle.
>> I haven't had time yet to look more deeply at which part of the patch
>> causes the regression.
>
> So select_idle_cpu(), which does the LLC-wide CPU scan, is now throttled
> by a comparison between avg_cost and avg_idle, where avg_cost is a
> historical measure of how costly it was to scan the entire LLC domain
> and avg_idle is our current idle-time guesstimate (also a historical
> average).
>
> The problem was that a number of workloads were spending quite a lot of
> time here scanning CPUs while they could be doing useful work (esp.
> since newer parts have silly amounts of CPUs per LLC).

Makes sense.

> The toggle is a heuristic with a random number in it.. we could see if
> there's anything better we can do. I know some people take the toggle
> out entirely, but that will regress other workloads.

OK, so removing the toggle fixes the problem in this test case too.
Maybe we can take the sd_llc_size into account in the toggle.