Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
From: Vincent Guittot
Date: Thu, 13 Oct 2016 18:48:12 +0200
To: Joseph Salisbury
Cc: Ingo Molnar, Peter Zijlstra, Linus Torvalds, Thomas Gleixner,
 LKML, Mike Galbraith

On 13 October 2016 at 17:52, Joseph Salisbury wrote:
> On 10/13/2016 06:58 AM, Vincent Guittot wrote:
>> Hi,
>>
>> On 12 October 2016 at 18:21, Joseph Salisbury wrote:
>>> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>>>> On 8 October 2016 at 13:49, Mike Galbraith wrote:
>>>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>>>> On 8 October 2016 at 10:39, Ingo Molnar wrote:
>>>>>>> * Peter Zijlstra wrote:
>>>>>>>
>>>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>>>> Hello Peter,
>>>>>>>>>
>>>>>>>>> A kernel bug report was opened against Ubuntu [0]. After a
>>>>>>>>> kernel bisect, it was found that reverting the following
>>>>>>>>> commit resolved this bug:
>>>>>>>>>
>>>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>>>> Author: Peter Zijlstra
>>>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>>>
>>>>>>>>>     sched/fair: Apply more PELT fixes
>>>>>> This patch only speeds up the update of the task group load in
>>>>>> order to reflect the new load balance, but it should not change
>>>>>> the final value and, as a result, the final behavior. I will try
>>>>>> to reproduce it on my target later today.
>>>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>>>> Me too.
>>>>
>>>> Is it possible to get a dump of /proc/sched_debug while the
>>>> problem occurs?
>>>>
>>>> Vincent
>>>>
>>>>> -Mike
>>> The output from /proc/sched_debug can be seen here:
>>> http://paste.ubuntu.com/23312351/
>> I have looked at the dump and there is something very odd for the
>> system.slice task group, where the display manager is running.
>> system.slice->tg_load_avg is around 381697, but tg_load_avg should
>> normally equal the sum of system.slice[cpu]->tg_load_avg_contrib,
>> and that sum is 1013 in our case. Some difference is expected
>> because the dump of /proc/sched_debug is not atomic and changes can
>> happen meanwhile, but nothing like this.
>>
>> The main effect of this very high value is that the weight/prio of
>> the sched_entity that represents system.slice in the root cfs_rq is
>> very low (lower than that of a task with the smallest nice prio), so
>> the system.slice task group gets the CPU far less often than the
>> user.slice task group: less than 1% for system.slice, where lightDM
>> and xorg are running, compared to 99% for user.slice, where the
>> stress tasks are running.
>>
>> This is confirmed by the se->avg.util_avg values of the task groups,
>> which reflect how much time each task group effectively runs on a
>> CPU:
>> system.slice[CPU3].se->avg.util_avg = 8 whereas
>> user.slice[CPU3].se->avg.util_avg = 991
>>
>> This difference in weight/priority explains why the system becomes
>> unresponsive. What I can't explain for now is why
>> system.slice->tg_load_avg = 381697 whereas it should be around 1013,
>> and how the patch can generate this situation.
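To put a number on that weight effect: calc_cfs_shares() in
kernel/sched/fair.c computes the group entity's shares as roughly
tg->shares * cfs_rq_load / tg->load_avg (the real code also clamps the
result and corrects for the local cfs_rq's contribution). Below is a
back-of-the-envelope userspace sketch fed with the values from the
dump; it is an illustration of the arithmetic, not the kernel code:

/*
 * Why an inflated tg->load_avg starves a group: simplified model of
 * the shares computation in calc_cfs_shares(), kernel/sched/fair.c.
 * The 1024/1013/381697 figures come from the posted dump; everything
 * else is illustrative.
 */
#include <stdio.h>

#define MIN_SHARES	2L

int main(void)
{
	long tg_shares   = 1024;	/* default cpu.shares of a cgroup */
	long cfs_rq_load = 1013;	/* sum of tg_load_avg_contrib */
	long tg_load_avg = 381697;	/* bogus value seen in the dump */

	long shares = tg_shares * cfs_rq_load / tg_load_avg;

	if (shares < MIN_SHARES)
		shares = MIN_SHARES;

	/* prints "effective shares: 2 of 1024", i.e. well under 1% */
	printf("effective shares: %ld of %ld\n", shares, tg_shares);
	return 0;
}

With the bogus tg_load_avg, the system.slice entity collapses to the
minimum weight, which matches the less-than-1% CPU time observed above.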
>> Is it possible to have a dump of /proc/sched_debug before starting
>> the stress command, to check whether the problem is there from the
>> beginning but not visible because the system is not overloaded, or
>> whether the problem only appears when the user starts to load the
>> system?
> Here is the dump before stress is started:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy

This one is ok. The dump indicates:
Sched Debug Version: v0.11, 4.8.0-11-generic #12~lp1627108Commit3d30544Reverted
so this is without the culprit commit.

>
> Here it is after:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy
>

This one has the exact same odd values for system.slice->tg_load_avg
as the first dump that you sent yesterday. The dump indicates:
Sched Debug Version: v0.11, 4.8.0-22-generic #24-Ubuntu
so this dump has been taken on a different kernel than the one above.

As I can't find any stress task in the dump, I tend to believe that
the dump was taken before starting the stress tasks, not after. Can
you confirm? If I'm right, it means that the problem was already there
before the stress tasks were started.

>
>>
>> Thanks,
>>
>>> Ingo, the latest scheduler bits also still exhibit the bug:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>>>
>>>
>
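P.S. For reference, the consistency check used above (tg_load_avg
against the sum of the per-CPU tg_load_avg_contrib values) follows
from how the accounting is maintained. Here is a simplified userspace
model of update_tg_load_avg() from kernel/sched/fair.c; a sketch
assuming 4 CPUs, not the kernel code:

/*
 * Each cfs_rq only publishes the delta against its last contribution,
 * so tg_load_avg should always equal the sum of the per-CPU
 * tg_load_avg_contrib values. (The kernel additionally skips updates
 * when the delta is small, and uses atomic ops; both omitted here.)
 */
#include <stdio.h>

#define NR_CPUS 4

static long tg_load_avg;			/* models tg->load_avg */
static long tg_load_avg_contrib[NR_CPUS];	/* models cfs_rq->tg_load_avg_contrib */

static void update_tg_load_avg(int cpu, long cfs_rq_load_avg)
{
	long delta = cfs_rq_load_avg - tg_load_avg_contrib[cpu];

	tg_load_avg += delta;
	tg_load_avg_contrib[cpu] = cfs_rq_load_avg;
}

int main(void)
{
	long sum = 0;
	int cpu;

	update_tg_load_avg(0, 300);
	update_tg_load_avg(1, 413);
	update_tg_load_avg(2, 150);
	update_tg_load_avg(3, 150);
	update_tg_load_avg(1, 413);	/* re-publishing must not inflate */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += tg_load_avg_contrib[cpu];

	/* prints "tg_load_avg=1013 sum(contrib)=1013" */
	printf("tg_load_avg=%ld sum(contrib)=%ld\n", tg_load_avg, sum);
	return 0;
}

A value like 381697 against a contrib sum of 1013 means this delta
bookkeeping went wrong somewhere, which is exactly what the dumps show.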