Subject: Re: [PATCH] sched/fair: Do not decay new task load on first enqueue
To: Vincent Guittot <vincent.guittot@linaro.org>
References: <20160923115808.2330-1-matt@codeblueprint.co.uk>
 <20160928101422.GR5016@twins.programming.kicks-ass.net>
 <20160928193731.GD16071@codeblueprint.co.uk>
 <CANRm+CyVFuT3XJt7DZEBZgHb_hQPzDUfOGnkAqNexH4q2ex74Q@mail.gmail.com>
 <20161010100107.GZ16071@codeblueprint.co.uk>
 <CAKfTPtBFrahA2fBoG5S5CBiJHb8EZkUbPaOZ4jZFc1mVYH5zJQ@mail.gmail.com>
 <f2091da3-b96e-d26c-8db7-a1db2d9237ae@arm.com>
 <CAKfTPtCsM0T0wKMnbqu556kJLTu9gGQs874uLp84hDjqs3+U5Q@mail.gmail.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>,
        Wanpeng Li <kernellwp@gmail.com>,
        Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Mike Galbraith <umgwanakikbuti@gmail.com>,
        Yuyang Du <yuyang.du@intel.com>
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
Message-ID: <43581f1a-fee7-5547-7946-ec1ffcfd64f1@arm.com>
Date: Tue, 11 Oct 2016 10:44:25 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.2.0
MIME-Version: 1.0
In-Reply-To: <CAKfTPtCsM0T0wKMnbqu556kJLTu9gGQs874uLp84hDjqs3+U5Q@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2926
Lines: 109

On 10/10/16 19:29, Vincent Guittot wrote:
> On 10 October 2016 at 15:54, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 10/10/16 13:29, Vincent Guittot wrote:
>>> On 10 October 2016 at 12:01, Matt Fleming <matt@codeblueprint.co.uk> wrote:
>>>> On Sun, 09 Oct, at 11:39:27AM, Wanpeng Li wrote:

[...]

>>> I have tried to reproduce your issue on my target, a Hikey board (ARM
>>> based octa-core), but I failed to see a regression with commit
>>> 7dc603c9028e. Nevertheless, I can see tasks not being well spread
>>
>> Wasn't this about the two patches mentioned in this thread? The one from
>> Matt using 'se->sum_exec_runtime' in the if condition in
>> enqueue_entity_load_avg() and Peterz's conditional call to
>> update_rq_clock(rq) in enqueue_task()?
> 
> I was trying to reproduce the regression that Matt mentioned at the
> beginning of the thread, not those linked to the proposed fixes

OK.

> 
>>
>>> during fork as you mentioned. So I studied the spreading issue
>>> during fork a bit more last week and I have a new version of my
>>> proposed patch that I'm going to send soon. With this patch, I can see
>>> a good spread of tasks during the fork sequence and some kind of perf
>>> improvement, even if it's a bit difficult to tell as the variance is
>>> quite high with the hackbench test, so it's mainly an improvement in
>>> the repeatability of the results
>>
>> Hikey (ARM64 2x4 cpus) board: cpufreq: performance, cpuidle: disabled
>>
>> Performance counter stats for 'perf bench sched messaging -g 20 -l 500'
>> (10 runs):
>>
>> (1) tip/sched/core: commit 447976ef4fd0
>>
>>     5.902209533 seconds time elapsed ( +- 0.31% )
> 
> This seems to be too long to test the impact of the forking phase of hackbench

[...]

Yeah, you're right. But I can't see any significant difference. IMHO,
it's all in the noise.

(A) Performance counter stats for
    'perf bench sched messaging -g 100 -l 1 -t'
    # 20 sender and receiver threads per group
    # 100 groups == 4000 threads run

(1) tip/sched/core: commit 447976ef4fd0

    Total time: 0.188 [sec]

(2) tip/sched/core + original patch on the 'sched/fair: Do not decay
    new task load on first enqueue' thread (23/09/16)

    Total time: 0.199 [sec]

(3) tip/sched/core + Peter's ENQUEUE_NEW patch on the 'sched/fair: Do
    not decay new task load on first enqueue' thread (28/09/16)

    Total time: 0.178 [sec]

(B) hackbench -P -g 1
    Running in process mode with 1 groups using 40 file descriptors
    each (== 40 tasks)
    Each sender will pass 100 messages of 100 bytes

(1) 0.067 [sec]

(2) 0.083 [sec]

(3) 0.073 [sec]

(C) hackbench -T -g 1
    Running in threaded mode with 1 groups using 40 file descriptors
    each (== 40 tasks)
    Each sender will pass 100 messages of 100 bytes

(1) 0.077 [sec]

(2) 0.079 [sec]

(3) 0.072 [sec]
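
To make the comparison easier to read, here is a minimal sketch that
expresses (2) and (3) as a percent change against the baseline (1) for
each of the three tests. The values are copied from the results listed
above; the helper names relative_change() and summarize() are made up for
illustration. Without variance figures for these runs, the percentages
alone cannot separate a real change from noise.

```python
# Times in seconds, copied from the results above:
# (1) baseline, (2) original patch, (3) ENQUEUE_NEW patch.
results = {
    "A": (0.188, 0.199, 0.178),
    "B": (0.067, 0.083, 0.073),
    "C": (0.077, 0.079, 0.072),
}


def relative_change(baseline, value):
    """Percent change of value relative to baseline (hypothetical helper)."""
    return (value - baseline) / baseline * 100.0


def summarize(data):
    """Map each test name to the percent change of (2) and (3) vs. (1)."""
    rows = {}
    for name, (base, orig_patch, enqueue_new) in data.items():
        rows[name] = (
            relative_change(base, orig_patch),
            relative_change(base, enqueue_new),
        )
    return rows


if __name__ == "__main__":
    for name, (d2, d3) in summarize(results).items():
        print(f"({name}) (2): {d2:+.1f}%  (3): {d3:+.1f}%")
```

A positive value means the run took longer than the baseline, a negative
value means it was faster.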

Maybe, instead of using the performance governor, I should pin the CPU
frequency to a lower value to eliminate the thermal influence on this
Hikey board.
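
One way to do the pinning, as a rough sketch only: clamp every cpufreq
policy to a single frequency by writing the same value to
scaling_min_freq and scaling_max_freq. This assumes the policies are
exposed under /sys/devices/system/cpu/cpufreq/policy*; the function name
pin_frequency() is made up for illustration, and the write order assumes
the target lies at or below the current range.

```python
import glob
import os
import sys

# Assumption: cpufreq policies are exposed under this sysfs directory.
SYSFS_CPUFREQ = "/sys/devices/system/cpu/cpufreq"


def pin_frequency(khz, root=SYSFS_CPUFREQ):
    """Clamp every cpufreq policy under root to one frequency (in kHz).

    Returns the list of policy directories that were updated.
    """
    updated = []
    for policy in sorted(glob.glob(os.path.join(root, "policy*"))):
        # Lower the minimum first, then the maximum, so that min <= max
        # holds at every step when pinning below the current range.
        for knob in ("scaling_min_freq", "scaling_max_freq"):
            with open(os.path.join(policy, knob), "w") as f:
                f.write(str(khz))
        updated.append(policy)
    return updated


if __name__ == "__main__":
    # Usage: run as root and pass the target frequency in kHz.
    for path in pin_frequency(int(sys.argv[1])):
        print("pinned", path)
```

The value passed in has to be a frequency the cpufreq driver actually
supports on the board, and writing these files requires root.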