Date: Mon, 7 Aug 2017 13:51:43 +0100
From: Morten Rasmussen
To: Peter Zijlstra
Cc: Brendan Jackman, Ingo Molnar, linux-kernel@vger.kernel.org,
    Joel Fernandes, Andres Oportus, Dietmar Eggemann, Vincent Guittot,
    Josef Bacik
Subject: Re: [PATCH] sched/fair: Sync task util before slow-path wakeup
Message-ID: <20170807125143.GA498@morras01-work>
References: <20170802131002.31576-1-brendan.jackman@arm.com>
    <20170802132405.z5gvut7ecaygbhvy@hirez.programming.kicks-ass.net>
In-Reply-To: <20170802132405.z5gvut7ecaygbhvy@hirez.programming.kicks-ass.net>

On Wed, Aug 02, 2017 at 03:24:05PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 02, 2017 at 02:10:02PM +0100, Brendan Jackman wrote:
> > We use task_util in find_idlest_group via capacity_spare_wake. This
> > task_util is updated in wake_cap. However wake_cap is not the only
> > reason for ending up in find_idlest_group - we could have been sent
> > there by wake_wide. So explicitly sync the task util with prev_cpu
> > when we are about to head to find_idlest_group.
> >
> > We could simply do this at the beginning of
> > select_task_rq_fair (i.e. irrespective of whether we're heading to
> > select_idle_sibling or find_idlest_group & co), but I didn't want to
> > slow down the select_idle_sibling path more than necessary.
> >
> > Don't do this during fork balancing, we won't need the task_util and
> > we'd just clobber the last_update_time, which is supposed to be 0.
>
> So I remember Morten explicitly not aging util of tasks on wakeup
> because the old util was higher and better representative of what the
> new util would be, or something along those lines.
>
> Morten?

That was the intention, but when we discussed the wake_cap() stuff we
decided to drop that, hoping that decay clamping or some other magic would
be added on top later. So this patch is in line with current behaviour.

Using non-aged util is causing trouble when comparing prev_cpu to other
cpus. In cpu_util_wake() we compensate for the fact that the aged task
util is already included in the cpu util on the prev_cpu. For that to
work, we need to age the task util so we know how much is already
accounted for.

In the original wake_cap() series I think I had a patch that stored the
non-aged version so we could calculate the potential cpu util as:

  predicted_cpu_util(prev_cpu) = cpu_util(prev_cpu) - task_util_aged(task)
                                 + task_util_nonaged(task)

  predicted_cpu_util(other_cpu) = cpu_util(other_cpu) + task_util_nonaged(task)

This would be better than always under-estimating the task util by using
the aged util, as we currently do:

  predicted_cpu_util(prev_cpu) = cpu_util(prev_cpu) - task_util_aged(task)
                                 + task_util_aged(task)

  predicted_cpu_util(other_cpu) = cpu_util(other_cpu) + task_util_aged(task)

but at least it gives us a fair comparison between prev_cpu and other
cpus.

The Android kernel carries additional patches that track the max (peak)
utilization and use that as the non-aged util for wake-up placement. I'm
hoping we can discuss this topic again at LPC, as last year's idea of
clamping decay didn't work very well to solve this issue.
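
For illustration only, the two prediction schemes above can be written out
as a small standalone C sketch. This is not the kernel implementation: the
function names, the plain unsigned long units, and the clamp at zero are
all assumptions made for the example. Passing the non-aged util as
task_est gives the first scheme, passing the aged util gives the current
behaviour.

```c
/*
 * Hypothetical helpers, not kernel code. Utilization values are plain
 * unsigned longs in arbitrary units.
 */

/* prev_cpu utilization with the task's aged contribution removed. */
unsigned long util_without_task(unsigned long cpu_util,
				unsigned long task_util_aged)
{
	/* Clamp at zero (an assumption) so the subtraction cannot wrap. */
	return cpu_util > task_util_aged ? cpu_util - task_util_aged : 0;
}

/*
 * Predicted utilization of prev_cpu after the wakeup. The aged task util
 * is already part of cpu_util, so remove it before adding the estimate.
 */
unsigned long predict_prev_cpu(unsigned long cpu_util,
			       unsigned long task_util_aged,
			       unsigned long task_est)
{
	return util_without_task(cpu_util, task_util_aged) + task_est;
}

/* Predicted utilization of any other cpu: the task is not included yet. */
unsigned long predict_other_cpu(unsigned long cpu_util,
				unsigned long task_est)
{
	return cpu_util + task_est;
}
```

With made-up numbers (prev_cpu util 300, other cpu util 250, aged task
util 100, non-aged task util 200), the non-aged scheme predicts 400 vs.
450, while the aged scheme predicts 300 vs. 350. Both keep the same gap
between the two cpus, which is the fair comparison referred to above; the
aged scheme just predicts lower values for both.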