From: Vincent Guittot
Date: Mon, 10 Oct 2016 20:29:01 +0200
Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
To: Dietmar Eggemann
Cc: Matt Fleming, Wanpeng Li, Peter Zijlstra, Ingo Molnar,
    linux-kernel@vger.kernel.org, Mike Galbraith, Yuyang Du

On 10 October 2016 at 15:54, Dietmar Eggemann wrote:
> On 10/10/16 13:29, Vincent Guittot wrote:
>> On 10 October 2016 at 12:01, Matt Fleming wrote:
>>> On Sun, 09 Oct, at 11:39:27AM, Wanpeng Li wrote:
>>>>
>>>> The difference between this patch and Peterz's is that your patch still
>>>> has a delta, since activate_task()->enqueue_task() does do
>>>> update_rq_clock(). So why does not having the delta cause low cpu
>>>> machines (4 or 8) to regress, as per your other reply in this thread?
>>>
>>> Both my patch and Peter's patch cause issues with low cpu machines. In
>>> <20161004201105.GP16071@codeblueprint.co.uk> I said,
>>>
>>> "This patch causes some low cpu machines (4 or 8) to regress. It turns
>>> out they regress with my patch too."
>>>
>>> Have I misunderstood your question?
>>>
>>> I ran out of time to investigate this last week, though I did try all
>>> proposed patches, including Vincent's, and none of them produced wins
>>> across the board.
>>
>> I have tried to reproduce your issue on my target, a Hikey board (ARM
>> based octo core), but I failed to see a regression with commit
>> 7dc603c9028e. Nevertheless, I can see tasks not being well spread
>
> Wasn't this about the two patches mentioned in this thread? The one from
> Matt using 'se->sum_exec_runtime' in the if condition in
> enqueue_entity_load_avg() and Peterz's conditional call to
> update_rq_clock(rq) in enqueue_task()?

I was trying to reproduce the regression that Matt mentioned at the
beginning of the thread, not the ones linked to the proposed fixes.
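To make the two proposed fixes concrete, they look roughly like the sketch
below. This is only a simplified illustration of the two ideas, not the
actual hunks posted in this thread, and the exact conditions differ; both
target the same symptom, namely the initial load of a freshly forked task
being decayed away by a stale clock delta on its first enqueue.

/*
 * (a) Matt's idea: a task that has never run still has
 *     se->sum_exec_runtime == 0, so that can be used inside
 *     enqueue_entity_load_avg() to skip decaying the freshly
 *     initialised load/util on the very first enqueue
 *     (fragment, simplified):
 */
	if (sa->last_update_time && se->sum_exec_runtime)
		__update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
				  se->on_rq * scale_load_down(se->load.weight),
				  cfs_rq->curr == se, NULL);

/*
 * (b) Peterz's idea: make the rq clock update in the enqueue path
 *     conditional, so that a brand new task does not see a large delta
 *     between its load init and its first enqueue. ENQUEUE_NEW comes
 *     from his proposed patch, not mainline, and the exact guard may
 *     differ; the rest of enqueue_task() is omitted here:
 */
static inline void enqueue_task(struct rq *rq, struct task_struct *p,
				int flags)
{
	if (!(flags & ENQUEUE_NEW))
		update_rq_clock(rq);

	p->sched_class->enqueue_task(rq, p, flags);
}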
>
>> during fork as you mentioned. So I have studied a bit more the
>> spreading issue during fork last week and I have a new version of my
>> proposed patch that I'm going to send soon. With this patch, I can see
>> a good spread of tasks during the fork sequence and some kind of perf
>> improvement, even if it's a bit difficult to measure as the variance is
>> quite high with the hackbench test, so it's mainly an improvement in
>> the repeatability of the result.
>
> Hikey (ARM64 2x4 cpus) board: cpufreq: performance, cpuidle: disabled
>
> Performance counter stats for 'perf bench sched messaging -g 20 -l 500'
> (10 runs):
>
> (1) tip/sched/core: commit 447976ef4fd0
>
>     5.902209533 seconds time elapsed ( +- 0.31% )

This seems to be too long to test the impact of the forking phase of
hackbench.

> (2) tip/sched/core + original patch on the 'sched/fair: Do not decay
>     new task load on first enqueue' thread (23/09/16)
>
>     5.919933030 seconds time elapsed ( +- 0.44% )
>
> (3) tip/sched/core + Peter's ENQUEUE_NEW patch on the 'sched/fair: Do
>     not decay new task load on first enqueue' thread (28/09/16)
>
>     5.970195534 seconds time elapsed ( +- 0.37% )
>
> Not sure if we can call this a regression, but it also shows no
> performance gain.
>
>>>
>>> I should get a bit further this week.
>>>
>>> Vincent, Dietmar, did you guys ever get around to submitting your PELT
>>> tracepoint patches? Getting some introspection into the scheduler's
>>
>> My tracepoints are not in a shape to be submitted and would need a
>> cleanup, as some are more hacks for debugging than real trace events.
>> Nevertheless, I can push them to a git branch if they can be useful
>> for someone.
>
> We carry two trace events locally, one for PELT on se and one for
> cfs_rq's (I have to add the runnable bits here), which work for both
> CONFIG_FAIR_GROUP_SCHED and !CONFIG_FAIR_GROUP_SCHED. I put them into
> __update_load_avg(), attach_entity_load_avg() and
> detach_entity_load_avg(). I could post them, but so far mainline has been
> reluctant to see the need for PELT related trace events ...
>
> [...]
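For illustration, a PELT trace event for a sched_entity along the lines
Dietmar describes could look roughly like the following. This is only a
sketch, not the events he actually carries: the event name, the field set
and the prototype are assumptions on my side.

/* include/trace/events/sched.h -- sketch only, not the local patches */
TRACE_EVENT(sched_pelt_se,

	TP_PROTO(int cpu, int pid, struct sched_avg *avg),

	TP_ARGS(cpu, pid, avg),

	TP_STRUCT__entry(
		__field( int,		cpu			)
		__field( int,		pid			)
		__field( unsigned long,	load_avg		)
		__field( unsigned long,	util_avg		)
		__field( u64,		last_update_time	)
	),

	TP_fast_assign(
		__entry->cpu			= cpu;
		__entry->pid			= pid;
		__entry->load_avg		= avg->load_avg;
		__entry->util_avg		= avg->util_avg;
		__entry->last_update_time	= avg->last_update_time;
	),

	TP_printk("cpu=%d pid=%d load_avg=%lu util_avg=%lu last_update_time=%llu",
		  __entry->cpu, __entry->pid, __entry->load_avg,
		  __entry->util_avg,
		  (unsigned long long)__entry->last_update_time)
);

The event would then be emitted from the call sites mentioned above, i.e.
at the end of __update_load_avg() and from attach_entity_load_avg() /
detach_entity_load_avg(), with pid set to -1 for group entities so that
the same event can be used with and without CONFIG_FAIR_GROUP_SCHED.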