Date: Mon, 9 Feb 2015 12:22:37 +0100
From: Peter Zijlstra
To: Zefan Li
Cc: Ingo Molnar, Mike Galbraith, LKML, Stefan Bader
Subject: Re: [PATCH] sched, autogroup: Fix failure when writing to cpu.rt_runtime_us
Message-ID: <20150209112237.GR5029@twins.programming.kicks-ass.net>
In-Reply-To: <54D5B873.5030208@huawei.com>

On Sat, Feb 07, 2015 at 03:02:11PM +0800, Zefan Li wrote:
> Before 8323f26ce342 "sched: Fix race in task_group()", task_group() of
> those RT tasks always return root_task_group, but the escape can't happen.

Ah, yes, I'm an idiot. I'm not sure what I was thinking, but I seemed to
have confused myself very well indeed.

> After commit 8323f26ce342, task_group always return root_task_group except
> for the case I showed:
>
> 1. Change scheduling policy before setsid():
>
> # cat /proc/sched_debug | grep test
> R test 4194 24851.893077 945 120 24851.893077 11196.482331 0.000000 /
>
> 2. Change policy after setsid():
>
> R test 4142 4962.517723 420 120 4962.517723 4974.126149 0.000000 /autogroup-44

Yes, which of course is inconsistent as well, it really is in the
autogroup, regardless of its class.

> I think we can fix it with:
>
> diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
> index 8a2e230..8c3a169 100644
> --- a/kernel/sched/auto_group.c
> +++ b/kernel/sched/auto_group.c
> @@ -115,9 +115,6 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
>  	if (tg != &root_task_group)
>  		return false;
>
> -	if (p->sched_class != &fair_sched_class)
> -		return false;
> -

Yes.

> This is exactly what I did at first, but besides the issue described above,
> seems it might lead to starving RT tasks.
>
> If there's some rt task in autogroups but none in root cgroup, it's allowed
> to set rt_runtime to 0, so I think we have to disallow this setting, like what
> we already do with global rt_runtime.

> @@ -7540,6 +7543,9 @@ static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
>  		.rt_runtime = runtime,
>  	};
>
> +	if (tg == &root_task_group && runtime == 0)
> +		return -EINVAL;
> +

Indeed, setting runtime=0 for the root group is a very bad thing
regardless of this patch. It would disallow the kernel from creating RT
threads, which it needs for 'correct' operation in a number of cases.

But let's make that a separate patch.

So how about this?

---
Subject: sched, autogroup: Fix failure to set cpu.rt_runtime_us
From: Peter Zijlstra
Date: Mon Feb 9 11:53:18 CET 2015

Because task_group() uses a cache of autogroup_task_group(), whose
output depends on sched_class, switching classes can generate
problems.

In particular, when started as fair, the cache points to the
autogroup, so when switching to RT the tg_rt_schedulable() test fails
for every cpu.rt_{runtime,period}_us change because now the autogroup
has tasks and no runtime.

Furthermore, going back to the previous semantics of varying
task_group() with sched_class has the down-side that the sched_debug
output varies as well, even though the task really is in the
autogroup.

Therefore add an autogroup exception to tg_has_rt_tasks() -- such that
both (all) task_group() usages in sched/core now have one. And remove
all the remnants of the variable task_group() output.

Cc: Mike Galbraith
Cc: Stefan Bader
Reported-by: Zefan Li
Fixes: 8323f26ce342 ("sched: Fix race in task_group()")
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/auto_group.c |    6 +-----
 kernel/sched/core.c       |    6 ++++++
 2 files changed, 7 insertions(+), 5 deletions(-)

--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -87,8 +87,7 @@ static inline struct autogroup *autogrou
 	 * so we don't have to move tasks around upon policy change,
 	 * or flail around trying to allocate bandwidth on the fly.
 	 * A bandwidth exception in __sched_setscheduler() allows
-	 * the policy change to proceed.  Thereafter, task_group()
-	 * returns &root_task_group, so zero bandwidth is required.
+	 * the policy change to proceed.
 	 */
 	free_rt_sched_group(tg);
 	tg->rt_se = root_task_group.rt_se;
@@ -115,9 +114,6 @@ bool task_wants_autogroup(struct task_st
 	if (tg != &root_task_group)
 		return false;
 
-	if (p->sched_class != &fair_sched_class)
-		return false;
-
 	/*
 	 * We can only assume the task group can't go away on us if
 	 * autogroup_move_group() can see us on ->thread_group list.
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7644,6 +7644,12 @@ static inline int tg_has_rt_tasks(struct
 {
 	struct task_struct *g, *p;
 
+	/*
+	 * Autogroups do not have RT tasks; see autogroup_create().
+	 */
+	if (task_group_is_autogroup(tg))
+		return 0;
+
 	for_each_process_thread(g, p) {
 		if (rt_task(p) && task_group(p) == tg)
 			return 1;