Subject: Re: [v4.8-rc1 Regression] sched/fair: Apply more PELT fixes
From: Vincent Guittot
Date: Thu, 13 Oct 2016 18:48:12 +0200
To: Joseph Salisbury
Cc: Ingo Molnar, Peter Zijlstra, Linus Torvalds, Thomas Gleixner,
 LKML, Mike Galbraith

On 13 October 2016 at 17:52, Joseph Salisbury wrote:
> On 10/13/2016 06:58 AM, Vincent Guittot wrote:
>> Hi,
>>
>> On 12 October 2016 at 18:21, Joseph Salisbury wrote:
>>> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>>>> On 8 October 2016 at 13:49, Mike Galbraith wrote:
>>>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>>>> On 8 October 2016 at 10:39, Ingo Molnar wrote:
>>>>>>> * Peter Zijlstra wrote:
>>>>>>>
>>>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>>>> Hello Peter,
>>>>>>>>>
>>>>>>>>> A kernel bug report was opened against Ubuntu [0]. After a
>>>>>>>>> kernel bisect, it was found that reverting the following
>>>>>>>>> commit resolved this bug:
>>>>>>>>>
>>>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>>>> Author: Peter Zijlstra
>>>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>>>
>>>>>>>>>     sched/fair: Apply more PELT fixes
>>>>>> This patch only speeds up the update of the task group load in
>>>>>> order to reflect the new load balance, but it should not change
>>>>>> the final value and, as a result, the final behavior. I will try
>>>>>> to reproduce it on my target later today.
>>>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>>>> Me too.
>>>>
>>>> Is it possible to get a dump of /proc/sched_debug while the
>>>> problem occurs?
>>>>
>>>> Vincent
>>>>
>>>>> -Mike
>>> The output from /proc/sched_debug can be seen here:
>>> http://paste.ubuntu.com/23312351/
>> I have looked at the dump and there is something very odd for the
>> system.slice task group, where the display manager is running.
>> system.slice->tg_load_avg is around 381697, but tg_load_avg should
>> normally equal the sum of system.slice[cpu]->tg_load_avg_contrib,
>> and that sum is 1013 in our case. Some difference is expected
>> because the dump of /proc/sched_debug is not atomic and changes can
>> happen meanwhile, but nothing like this.
>>
>> The main effect of this very high value is that the weight/prio of
>> the sched_entity that represents system.slice in the root cfs_rq is
>> very low (lower than that of a task with the smallest nice prio), so
>> the system.slice task group gets the CPU far less often than the
>> user.slice task group: less than 1% for system.slice, where lightDM
>> and xorg are running, compared to 99% for user.slice, where the
>> stress tasks are running.
>>
>> This is confirmed by the se->avg.util_avg values of the task groups,
>> which reflect how much time each task group effectively runs on a
>> CPU:
>> system.slice[CPU3].se->avg.util_avg = 8 whereas
>> user.slice[CPU3].se->avg.util_avg = 991
>>
>> This difference in weight/priority explains why the system becomes
>> unresponsive. What I can't explain for now is why
>> system.slice->tg_load_avg = 381697 whereas it should be around 1013,
>> and how the patch can generate this situation.
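To put a number on that weight effect: calc_cfs_shares() in
kernel/sched/fair.c computes the group entity's shares as roughly
tg->shares * cfs_rq_load / tg->load_avg (the real code also clamps the
result and corrects for the local cfs_rq's contribution). Below is a
back-of-the-envelope userspace sketch fed with the values from the
dump; it is an illustration of the arithmetic, not the kernel code:

/*
 * Why an inflated tg->load_avg starves a group: simplified model of
 * the shares computation in calc_cfs_shares(), kernel/sched/fair.c.
 * The 1024/1013/381697 figures come from the posted dump; everything
 * else is illustrative.
 */
#include <stdio.h>

#define MIN_SHARES	2L

int main(void)
{
	long tg_shares   = 1024;	/* default cpu.shares of a cgroup */
	long cfs_rq_load = 1013;	/* sum of tg_load_avg_contrib */
	long tg_load_avg = 381697;	/* bogus value seen in the dump */

	long shares = tg_shares * cfs_rq_load / tg_load_avg;

	if (shares < MIN_SHARES)
		shares = MIN_SHARES;

	/* prints "effective shares: 2 of 1024", i.e. well under 1% */
	printf("effective shares: %ld of %ld\n", shares, tg_shares);
	return 0;
}

With the bogus tg_load_avg, the system.slice entity collapses to the
minimum weight, which matches the less-than-1% CPU time observed above.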
>> Is it possible to have a dump of /proc/sched_debug before starting
>> the stress command, to check whether the problem is there from the
>> beginning but not visible because the system is not overloaded, or
>> whether the problem only appears when the user starts to load the
>> system?
> Here is the dump before stress is started:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy

This one is ok. The dump indicates:
Sched Debug Version: v0.11, 4.8.0-11-generic #12~lp1627108Commit3d30544Reverted
so this is without the culprit commit.

>
> Here it is after:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy
>

This one has the exact same odd values for system.slice->tg_load_avg
as the first dump that you sent yesterday. The dump indicates:
Sched Debug Version: v0.11, 4.8.0-22-generic #24-Ubuntu
so this dump has been taken on a different kernel than the one above.

As I can't find any stress task in the dump, I tend to believe that
the dump was taken before starting the stress tasks, not after. Can
you confirm? If I'm right, it means that the problem was already there
before the stress tasks were started.

>
>>
>> Thanks,
>>
>>> Ingo, the latest scheduler bits also still exhibit the bug:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>>>
>>>
>
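P.S. For reference, the consistency check used above (tg_load_avg
against the sum of the per-CPU tg_load_avg_contrib values) follows
from how the accounting is maintained. Here is a simplified userspace
model of update_tg_load_avg() from kernel/sched/fair.c; a sketch
assuming 4 CPUs, not the kernel code:

/*
 * Each cfs_rq only publishes the delta against its last contribution,
 * so tg_load_avg should always equal the sum of the per-CPU
 * tg_load_avg_contrib values. (The kernel additionally skips updates
 * when the delta is small, and uses atomic ops; both omitted here.)
 */
#include <stdio.h>

#define NR_CPUS 4

static long tg_load_avg;			/* models tg->load_avg */
static long tg_load_avg_contrib[NR_CPUS];	/* models cfs_rq->tg_load_avg_contrib */

static void update_tg_load_avg(int cpu, long cfs_rq_load_avg)
{
	long delta = cfs_rq_load_avg - tg_load_avg_contrib[cpu];

	tg_load_avg += delta;
	tg_load_avg_contrib[cpu] = cfs_rq_load_avg;
}

int main(void)
{
	long sum = 0;
	int cpu;

	update_tg_load_avg(0, 300);
	update_tg_load_avg(1, 413);
	update_tg_load_avg(2, 150);
	update_tg_load_avg(3, 150);
	update_tg_load_avg(1, 413);	/* re-publishing must not inflate */

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += tg_load_avg_contrib[cpu];

	/* prints "tg_load_avg=1013 sum(contrib)=1013" */
	printf("tg_load_avg=%ld sum(contrib)=%ld\n", tg_load_avg, sum);
	return 0;
}

A value like 381697 against a contrib sum of 1013 means this delta
bookkeeping went wrong somewhere, which is exactly what the dumps show.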