Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753327AbcDOINu (ORCPT ); Fri, 15 Apr 2016 04:13:50 -0400 Received: from mx1.redhat.com ([209.132.183.28]:48113 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752021AbcDOINq (ORCPT ); Fri, 15 Apr 2016 04:13:46 -0400 Reply-To: xlpang@redhat.com Subject: Re: [PATCH] sched/deadline: Fix a bug in dl_overflow() References: <1460636368-1993-1-git-send-email-xlpang@redhat.com> <20160415070709.GA3908@pablo> To: Juri Lelli Cc: linux-kernel@vger.kernel.org, Peter Zijlstra , Ingo Molnar , Steven Rostedt , luca abeni From: Xunlei Pang Message-ID: <5710A2B5.2060706@redhat.com> Date: Fri, 15 Apr 2016 16:13:41 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <20160415070709.GA3908@pablo> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3831 Lines: 107 On 2016/04/15 at 15:07, Juri Lelli wrote: > [+Luca] > > Hi, > > On 14/04/16 20:19, Xunlei Pang wrote: >> I got a minus(very big) dl_b->total_bw during my deadline tests. >> >> # grep dl /proc/sched_debug >> dl_rq[0]: >> .dl_nr_running : 0 >> .dl_bw->bw : 996147 >> .dl_bw->total_bw : -222297900 >> >> Something unusual must have happened. >> >> After some digging, I finally noticed that when changing a deadline >> task to normal(cfs), and changing it back to deadline immediately, >> after it died, we will got the wrong dl_bw->total_bw. >> >> The root cause is in dl_overflow(), it has: >> if (new_bw == p->dl.dl_bw) >> return 0; >> >> 1) When a deadline task is changed to !deadline task, it will start >> dl timer in switched_from_dl(), and retain previous deadline parameter >> till the timer expires. >> 2) If we change it back to deadline with the same bandwidth parameter >> before the timer expires, as it keeps the old bandwidth although it >> is not a deadline task. dl_overflow() simply returns success without >> updating the right data, and got the wrong dl_bw->total_bw. >> >> The solution is simple, if @p is not deadline, don't return. >> >> Signed-off-by: Xunlei Pang >> --- >> kernel/sched/core.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index 4a2c79d..5988fee 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -2378,7 +2378,8 @@ static int dl_overflow(struct task_struct *p, int policy, >> u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0; >> int cpus, err = -1; >> >> - if (new_bw == p->dl.dl_bw) >> + /* !deadline task may carry old deadline bandwidth */ >> + if (new_bw == p->dl.dl_bw && task_has_dl_policy(p)) > Right. I got the same patch that I believe Luca is be already using for > his tests (and he also put together the changelog). I never managed to > send it out, sorry about that. We can take yours, mine follows just in > case we want to take something from the changelog or we want to reverse > the if condition. Ah, exactly the same issue. Thanks for your feedback :-) Regards, Xunlei > > Thanks, > > - Juri > > --->8--- > > From 4bf38111bd9383035e03d3dc3d42011aaa9e26e7 Mon Sep 17 00:00:00 2001 > From: Juri Lelli > Date: Thu, 25 Feb 2016 15:50:42 +0100 > Subject: [PATCH 2/3] fix a bug in the -deadline utilization tracking mechanism > > Currently, a task doing > while(1) { > switch to SCHED_DEADLINE > switch to SCHED_OTHER > } > brings dl_b->total_bw below 0. > This happens because when the task switches back from SCHED_DEADLINE > to SCHED_OTHER, switched_from_dl() does not clear its deadline > parameters (they will be cleared by the deadline timer when it fires). > But dl_overflow() removes its utilization from dl_b->total_bw. > When the task switches back to SCHED_DEADLINE, the > if (new_bw == p->dl.dl_bw) > check in dl_overflow() prevents __dl_add() from being called, and > so when the task switches back to SCHED_OTHER dl_b->total_bw becomes > negative. > This patch changes the check so that if the task is switching from > SCHED_OTHER to SCHED_DEADLINE __dl_add() is correctly invoked. > --- > kernel/sched/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 9503d59..d59fa20 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -2432,7 +2432,7 @@ static int dl_overflow(struct task_struct *p, int policy, > u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0; > int cpus, err = -1; > > - if (new_bw == p->dl.dl_bw) > + if (task_has_dl_policy(p) && new_bw == p->dl.dl_bw) > return 0; > > /*