Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
To: Vincent Guittot, Matt Fleming
References: <20160923115808.2330-1-matt@codeblueprint.co.uk>
 <20160928101422.GR5016@twins.programming.kicks-ass.net>
 <20160928193731.GD16071@codeblueprint.co.uk>
 <20161010100107.GZ16071@codeblueprint.co.uk>
Cc: Wanpeng Li, Peter Zijlstra, Ingo Molnar,
 "linux-kernel@vger.kernel.org", Mike Galbraith, Yuyang Du
From: Dietmar Eggemann
Date: Mon, 10 Oct 2016 14:54:41 +0100

On 10/10/16 13:29, Vincent Guittot wrote:
> On 10 October 2016 at 12:01, Matt Fleming wrote:
>> On Sun, 09 Oct, at 11:39:27AM, Wanpeng Li wrote:
>>>
>>> The difference between this patch and Peterz's is your patch have a
>>> delta since activate_task()->enqueue_task() does do update_rq_clock(),
>>> so why don't have the delta will cause low cpu machines (4 or 8) to
>>> regress against your another reply in this thread?
>>
>> Both my patch and Peter's patch cause issues with low cpu machines. In
>> <20161004201105.GP16071@codeblueprint.co.uk> I said,
>>
>> "This patch causes some low cpu machines (4 or 8) to regress. It turns
>> out they regress with my patch too."
>>
>> Have I misunderstood your question?
>>
>> I ran out of time to investigate this last week, though I did try all
>> proposed patches, including Vincent's, and none of them produced wins
>> across the board.
>
> I have tried to reproduce your issue on my target, a hikey board (ARM
> based octo cores), but I failed to see a regression with commit
> 7dc603c9028e. Nevertheless, I can see tasks not being well spread

Wasn't this about the two patches mentioned in this thread? The one from
Matt using 'se->sum_exec_runtime' in the if condition in
enqueue_entity_load_avg() and Peterz's conditional call to
update_rq_clock(rq) in enqueue_task()?

> during fork as you mentioned. So I have studied a bit more the
> spreading issue during fork last week and I have a new version of my
> proposed patch that I'm going to send soon. With this patch, I can see
> a good spread of tasks during the fork sequence and some kind of perf
> improvement even if it's a bit difficult as the variance is quite
> important with hackbench test so it's mainly an improvement of
> repeatability of the result

Hikey (ARM64 2x4 cpus) board: cpufreq: performance, cpuidle: disabled

Performance counter stats for 'perf bench sched messaging -g 20 -l 500'
(10 runs):

(1) tip/sched/core: commit 447976ef4fd0

    5.902209533 seconds time elapsed    ( +- 0.31% )

(2) tip/sched/core + original patch on the 'sched/fair: Do not decay new
    task load on first enqueue' thread (23/09/16)

    5.919933030 seconds time elapsed    ( +- 0.44% )

(3) tip/sched/core + Peter's ENQUEUE_NEW patch on the 'sched/fair: Do not
    decay new task load on first enqueue' thread (28/09/16)

    5.970195534 seconds time elapsed    ( +- 0.37% )

Not sure if we can call this a regression but it also shows no
performance gain.

>>
>> I should get a bit further this week.
>>
>> Vincent, Dietmar, did you guys ever get around to submitting your PELT
>> tracepoint patches?
>> Getting some introspection into the scheduler's
>
> My tracepoints are not in a shape to be submitted and would need a
> cleanup as some are more hacks for debugging than real trace events.
> Nevertheless, I can push them on a git branch if they can be useful
> for someone

We carry two trace events locally, one for PELT on se and one for
cfs_rq's (I have to add the runnable bits here) which work for
CONFIG_FAIR_GROUP_SCHED and !CONFIG_FAIR_GROUP_SCHED.

I put them into __update_load_avg(), attach_entity_load_avg() and
detach_entity_load_avg().

I could post them but so far mainline has been reluctant to see the need
for PELT related trace events ...

[...]
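
For what it's worth, the three 'time elapsed' figures quoted earlier in
this mail can be put side by side with a few lines of Python. This is
only a sketch: the dictionary keys and the helper names (relative_delta,
compare) are made up for illustration, and adding the two '+-'
percentages together is an assumed, crude noise band, not a proper
statistical test.

```python
# Elapsed times (seconds) and "+-" percentages copied from the results
# quoted in this mail. The labels are illustrative, not official names.
RESULTS = {
    "tip/sched/core": (5.902209533, 0.31),
    "original patch": (5.919933030, 0.44),
    "ENQUEUE_NEW patch": (5.970195534, 0.37),
}


def relative_delta(baseline, value):
    """Return the change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


def compare(results, baseline_name):
    """Compare every entry against the baseline entry.

    Returns {name: (delta_percent, band_percent)}. band_percent is the sum
    of both "+-" figures, an assumed crude noise band (not a real test).
    """
    base_time, base_noise = results[baseline_name]
    out = {}
    for name, (elapsed, noise) in results.items():
        if name == baseline_name:
            continue
        out[name] = (relative_delta(base_time, elapsed), base_noise + noise)
    return out


if __name__ == "__main__":
    for name, (delta, band) in compare(RESULTS, "tip/sched/core").items():
        verdict = "outside" if abs(delta) > band else "within"
        print(f"{name}: {delta:+.2f}% ({verdict} +-{band:.2f}% band)")
```

With the numbers from this mail it reports roughly +0.30% for the
original patch (within the combined band) and +1.15% for the ENQUEUE_NEW
patch (outside it), so at most a small slowdown and no gain in either
case.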