MIME-Version: 1.0
In-Reply-To: <de3b22f1-5f25-4be8-9678-ee5d272255fb@codeaurora.org>
References: <20161014151827.GA10379@linaro.org> <2bb765e7-8a5f-c525-a6ae-fbec6fae6354@canonical.com>
 <20161017090903.GA11962@linaro.org> <4e15ad55-beeb-e860-0420-8f439d076758@arm.com>
 <20161017131952.GR3117@twins.programming.kicks-ass.net> <CAKfTPtAxw1b-vy285HKtPUBFYuJdv2CFZH_gP3CMtZHs1wLPXg@mail.gmail.com>
 <94cc6deb-f93e-60ec-5834-e84a8b98e73c@arm.com> <20161018090747.GW3142@twins.programming.kicks-ass.net>
 <CAKfTPtCH4wbzHaDzgAtLLSLUegQfztgE5bO4JV-SYxEGd78KGA@mail.gmail.com>
 <20161018103412.GT3117@twins.programming.kicks-ass.net> <20161018115651.GA20956@linaro.org>
 <de3b22f1-5f25-4be8-9678-ee5d272255fb@codeaurora.org>
From: Vincent Guittot <vincent.guittot@linaro.org>
Date: Wed, 19 Oct 2016 08:42:43 +0200
Message-ID: <CAKfTPtDV6z7PY1JOombxpHs=BCq8ncXJJOW-wMw3SAu0grU2qg@mail.gmail.com>
Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
To: Joonwoo Park <joonwoop@codeaurora.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Joseph Salisbury <joseph.salisbury@canonical.com>,
        Ingo Molnar <mingo@kernel.org>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        LKML <linux-kernel@vger.kernel.org>, Mike Galbraith <efault@gmx.de>,
        omer.akram@canonical.com
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Transfer-Encoding: 8bit
Content-Length: 5857
Lines: 139

On 18 October 2016 at 23:58, Joonwoo Park <joonwoop@codeaurora.org> wrote:
>
>
> On 10/18/2016 04:56 AM, Vincent Guittot wrote:
>>
>> Le Tuesday 18 Oct 2016 à 12:34:12 (+0200), Peter Zijlstra a écrit :
>>>
>>> On Tue, Oct 18, 2016 at 11:45:48AM +0200, Vincent Guittot wrote:
>>>>
>>>> On 18 October 2016 at 11:07, Peter Zijlstra <peterz@infradead.org>
>>>> wrote:
>>>>>
>>>>> So aside from funny BIOSes, this should also show up when creating
>>>>> cgroups when you have offlined a few CPUs, which is far more common I'd
>>>>> think.
>>>>
>>>>
>>>> The problem is also that the load of the tg->se[cpu] that represents
>>>> the tg->cfs_rq[cpu] is initialized to 1024 in:
>>>> alloc_fair_sched_group
>>>>      for_each_possible_cpu(i) {
>>>>          init_entity_runnable_average(se);
>>>>             sa->load_avg = scale_load_down(se->load.weight);
>>>>
>>>> Initializing  sa->load_avg to 1024 for a newly created task makes
>>>> sense as we don't know yet what will be its real load but i'm not sure
>>>> that we have to do the same for se that represents a task group. This
>>>> load should be initialized to 0 and it will increase when task will be
>>>> moved/attached into task group
>>>
>>>
>>> Yes, I think that makes sense, not sure how horrible that is with the
>>
>>
>> That should not be that bad because this initial value is only useful for
>> the few dozens of ms that follow the creation of the task group
>>
>>>
>>> current state of things, but after your propagate patch, that
>>> reinstates the interactivity hack that should work for sure.
>>
>>
>> The patch below fixes the issue on my platform:
>>
>> Dietmar, Omer can you confirm that this fix the problem of your platform
>> too ?
>
>
> I just noticed this thread after posting
> https://lkml.org/lkml/2016/10/18/719...
> Noticed this bug while a ago and had the patch above at least a week but
> unfortunately didn't have time to post...
> I think Omer had same problem I was trying to fix and I believe patch I post
> should address it.
>
> Vincent, your version fixes my test case as well.

Thanks for testing.
Can i consider this as a Tested-by ?

> This is sched_stat from the same test case I had in my changelog.
> Note dd-2030 which is in root cgroup had same runtime as dd-2033 which is in
> child cgroup.
>
> dd (2030, #threads: 1)
> -------------------------------------------------------------------
> se.exec_start                                :        275700.024137
> se.vruntime                                  :         10589.114654
> se.sum_exec_runtime                          :          1576.837993
> se.nr_migrations                             :                    0
> nr_switches                                  :                  159
> nr_voluntary_switches                        :                    0
> nr_involuntary_switches                      :                  159
> se.load.weight                               :              1048576
> se.avg.load_sum                              :             48840575
> se.avg.util_sum                              :             19741820
> se.avg.load_avg                              :                 1022
> se.avg.util_avg                              :                  413
> se.avg.last_update_time                      :         275700024137
> policy                                       :                    0
> prio                                         :                  120
> clock-delta                                  :                   34
> dd (2033, #threads: 1)
> -------------------------------------------------------------------
> se.exec_start                                :        275710.037178
> se.vruntime                                  :          2383.802868
> se.sum_exec_runtime                          :          1576.547591
> se.nr_migrations                             :                    0
> nr_switches                                  :                  162
> nr_voluntary_switches                        :                    0
> nr_involuntary_switches                      :                  162
> se.load.weight                               :              1048576
> se.avg.load_sum                              :             48316646
> se.avg.util_sum                              :             21235249
> se.avg.load_avg                              :                 1011
> se.avg.util_avg                              :                  444
> se.avg.last_update_time                      :         275710037178
> policy                                       :                    0
> prio                                         :                  120
> clock-delta                                  :                   36
>
> Thanks,
> Joonwoo
>
>
>>
>> ---
>>  kernel/sched/fair.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8b03fb5..89776ac 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -690,7 +690,14 @@ void init_entity_runnable_average(struct sched_entity
>> *se)
>>          * will definitely be update (after enqueue).
>>          */
>>         sa->period_contrib = 1023;
>> -       sa->load_avg = scale_load_down(se->load.weight);
>> +       /*
>> +        * Tasks are intialized with full load to be seen as heavy task
>> until
>> +        * they get a chance to stabilize to their real load level.
>> +        * group entity are intialized with null load to reflect the fact
>> that
>> +        * nothing has been attached yet to the task group.
>> +        */
>> +       if (entity_is_task(se))
>> +               sa->load_avg = scale_load_down(se->load.weight);
>>         sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>>         /*
>>          * At this point, util_avg won't be used in select_task_rq_fair
>> anyway
>>
>>
>>
>>
>