Date: Tue, 25 Apr 2017 11:49:41 -0700
From: Tejun Heo
To: Vincent Guittot
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Linus Torvalds,
    Mike Galbraith, Paul Turner, Chris Mason, kernel-team@fb.com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
Message-ID: <20170425184941.GB15593@wtj.duckdns.org>
References: <20170424201344.GA14169@wtj.duckdns.org> <20170424201444.GC14169@wtj.duckdns.org>
User-Agent: Mutt/1.8.0 (2017-02-23)

Hello,

On Tue, Apr 25, 2017 at 02:59:18PM +0200, Vincent Guittot wrote:
> >> So you are changing the purpose of propagate_entity_load_avg which
> >> aims to propagate load_avg/util_avg changes only when a task migrates
> >> and you also want to propagate the enqueue/dequeue in the parent
> >> cfs_rq->runnable_load_avg

Yeah, it always propagates runnable_load_avg, and load_avg/util_avg too on migrations.

> > In fact you want that sched_entity load_avg reflects
> > cfs_rq->runnable_load_avg and not cfs_rq->avg.load_avg

Yes, that's how it gets changed. The load balancer assumes that the root's runnable_load_avg is the total sum of all currently active tasks. Nesting cfs_rq's shouldn't change that, and how it should be mapped is clearly defined (scaled recursively till it reaches the root), which is what the code calculates.
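The recursive scaling can be sketched as a toy model (plain Python, not
the kernel code; the tree structure, class, and function names here are
purely illustrative assumptions):

```python
# Toy model of how a nested cfs_rq's runnable_load_avg maps to the root:
# each task's load is scaled by the group entity's share at every level
# on the way up. Structure and names are illustrative only.

class CfsRq:
    def __init__(self, task_load=0.0):
        self.task_load = task_load  # load of tasks queued directly here
        self.children = []          # list of (group_share, child CfsRq)

    def add_group(self, share, child):
        self.children.append((share, child))
        return child

def runnable_load_avg(rq):
    """Root-visible runnable load: direct task load plus each child
    cfs_rq's load scaled by its group entity's share, recursively."""
    total = rq.task_load
    for share, child in rq.children:
        total += share * runnable_load_avg(child)
    return total

root = CfsRq(task_load=2.0)
grp = root.add_group(0.5, CfsRq(task_load=4.0))  # group entity, half weight
grp.add_group(0.25, CfsRq(task_load=8.0))        # nested group, quarter weight

# 2.0 + 0.5 * (4.0 + 0.25 * 8.0) = 5.0
print(runnable_load_avg(root))
```

In this model, once every task dequeues (all task_load drop to 0) the
root's runnable_load_avg immediately drops to 0 as well, which is the
invariant the test discussion below is probing.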
The change in cfs_rq->avg.load_avg's behavior is to reflect that immediate propagation, as load_avg and runnable_load_avg are tightly coupled. While it does change a nested cfs_rq's load_avg behavior, it sheds off the extra layer of averaging and directly reflects the scaled load averages of its members, which are already time-averaged. I could have missed something but couldn't spot anything which can break from this.

> I have run a quick test with your patches and schbench on my platform.
> I haven't been able to reproduce your regression but my platform is
> quite different from yours (only 8 cores without SMT).
> But most importantly, the parent cfs_rq->runnable_load_avg never
> reaches 0 (or almost 0) when it is idle. Instead, it still has a
> runnable_load_avg (this is not due to rounding computation) whereas
> runnable_load_avg should be 0

Heh, let me try that out. Probably a silly mistake somewhere.

> Just to be curious, is your regression still there if you disable
> SMT/hyperthreading on your platform?

Will try that too. I can't see why HT would change it, because I see single-CPU queues misevaluated. Just in case: you need to tune the test params so that it doesn't load the machine too much, and so that there are some non-CPU-intensive workloads going on to perturb things a bit. Anyways, I'm gonna try disabling HT.

Thanks.

-- 
tejun