Date: Mon, 24 Jul 2017 10:06:09 +0200
From: Luca Abeni
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Juri Lelli,
    Claudio Scordino, Steven Rostedt, Tommaso Cucinotta,
    Daniel Bristot de Oliveira, Joel Fernandes, Mathieu Poirier
Subject: Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
Message-ID: <20170724100609.4ec38473@luca>
In-Reply-To: <20170324224715.4098dbfb@nowhere>
References: <1490327582-4376-1-git-send-email-luca.abeni@santannapisa.it>
    <1490327582-4376-3-git-send-email-luca.abeni@santannapisa.it>
    <20170324132041.6fxayfe3id4af3n5@hirez.programming.kicks-ass.net>
    <20170324224715.4098dbfb@nowhere>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter,

On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni wrote:

> Hi Peter,
> 
> On Fri, 24 Mar 2017 14:20:41 +0100
> Peter Zijlstra wrote:
> 
> > On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> > 
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index d67eee8..952cac8 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> > >  	 *
> > >  	 * @dl_yielded tells if task gave up the CPU before consuming
> > >  	 * all its available runtime during the last job.
> > > +	 *
> > > +	 * @dl_non_contending tells if task is inactive while still
> > > +	 * contributing to the active utilization.
> > > +	 * In other words, it
> > > +	 * indicates if the inactive timer has been armed and its
> > > +	 * handler has not been executed yet. This flag is useful to
> > > +	 * avoid race conditions between the inactive timer handler
> > > +	 * and the wakeup code.
> > >  	 */
> > >  	int			dl_throttled;
> > >  	int			dl_boosted;
> > >  	int			dl_yielded;
> > > +	int			dl_non_contending;
> > 
> > We should maybe look into making those bits :1, not something for this
> > patch though;
> 
> Yes, grouping all the flags in a single field was my intention too... I
> planned to submit a patch to do this after merging the reclaiming
> patches... But maybe it is better to do this first :)

I implemented this change, but before submitting the patch I have a
small question.

I implemented some helpers to access the various
{throttled,boosted,yielded,non_contending} flags: some
"dl_{throttled,boosted,...}()" inline functions for reading the values
of the flags, and some inline functions for setting / clearing the
flags. For the latter, I see two possibilities:
- using two separate "dl_set_{throttled,...}()" and
  "dl_clear_{throttled,...}()" functions for each flag;
- using one single "dl_set_{throttled,...}(dl, value)" function per
  flag, in which the flag's new value is specified.

I have no preference (with the first proposal I introduce more inline
functions, but I think they can be made more efficient / optimized).
Which of the two proposals is preferred? (Or is there a third, better
idea that I overlooked?)
Thanks,
	Luca

> > 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index cef9adb..86aed82 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
> > >  		dl_rq->running_bw = 0;
> > >  }
> > > 
> > > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > > +{
> > > +	if (!task_on_rq_queued(p)) {
> > > +		struct rq *rq = task_rq(p);
> > > +
> > > +		if (p->dl.dl_non_contending) {
> > > +			sub_running_bw(p->dl.dl_bw, &rq->dl);
> > > +			p->dl.dl_non_contending = 0;
> > > +			/*
> > > +			 * If the timer handler is currently running and the
> > > +			 * timer cannot be cancelled, inactive_task_timer()
> > > +			 * will see that dl_non_contending is not set, and
> > > +			 * will not touch the rq's active utilization,
> > > +			 * so we are still safe.
> > > +			 */
> > > +			if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> > > +				put_task_struct(p);
> > > +		}
> > > +	}
> > > +}
> > 
> > If we rearrange that slightly we can avoid the double indent:
> [...]
> 
> Ok; I'll rework the code in this way on Monday.
> 
> [...]
> > > +		if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1)
> > > +			put_task_struct(dl_task_of(dl_se));
> > > +		dl_se->dl_non_contending = 0;
> > 
> > This had me worried for a little while as being a use-after-free, but
> > this put_task_struct() cannot be the last in this case. Might be
> > worth a comment.
> 
> Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
> hrtimer_try_to_cancel()?
> 
> > > +	} else {
> > > +		/*
> > > +		 * Since "dl_non_contending" is not set, the
> > > +		 * task's utilization has already been removed from
> > > +		 * active utilization (either when the task blocked,
> > > +		 * or when the "inactive timer" fired).
> > > +		 * So, add it back.
> > > +		 */
> > > +		add_running_bw(dl_se->dl_bw, dl_rq);
> > > +	}
> > > +}
> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.
> 
> Ok... Since I am not good at ASCII art, would it be OK to add a textual
> description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active utilization
> when the task becomes active (is enqueued in the runqueue), and is
> removed when the task becomes inactive. A task does not become
> immediately inactive when it blocks, but becomes inactive at the so
> called "0 lag time"; so, we set up the "inactive timer" to fire at the
> "0 lag time". When the "inactive timer" fires, the task utilization is
> removed from the runqueue's active utilization. If the task wakes up
> again on the same runqueue before the "0 lag time", the active
> utilization must not be changed and the "inactive timer" must be
> cancelled. If the task wakes up again on a different runqueue before
> the "0 lag time", then the task's utilization must be removed from the
> previous runqueue's active utilization and must be added to the new
> runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while the
> "inactive timer" is running on a different CPU, the "dl_non_contending"
> flag is used to indicate that a task is not on a runqueue but is active
> (so, the flag is set when the task blocks and is cleared when the
> "inactive timer" fires or when the task wakes up).
> "
> (if this is OK, where can I add this comment?)
> > 
> > > +static void migrate_task_rq_dl(struct task_struct *p)
> > > +{
> > > +	if ((p->state == TASK_WAKING) && (p->dl.dl_non_contending)) {
> > > +		struct rq *rq = task_rq(p);
> > > +
> > > +		raw_spin_lock(&rq->lock);
> > > +		sub_running_bw(p->dl.dl_bw, &rq->dl);
> > > +		p->dl.dl_non_contending = 0;
> > > +		/*
> > > +		 * If the timer handler is currently running and the
> > > +		 * timer cannot be cancelled, inactive_task_timer()
> > > +		 * will see that dl_non_contending is not set, and
> > > +		 * will not touch the rq's active utilization,
> > > +		 * so we are still safe.
> > > +		 */
> > > +		if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> > > +			put_task_struct(p);
> > > +
> > > +		raw_spin_unlock(&rq->lock);
> > > +	}
> > > +}
> > 
> > This can similarly be reduced in indent, albeit this is only a single
> > indent level, so not as onerous as the other one.
> 
> Ok; I'll do this on Monday.
> 
> > What had me puzzled for a while here is taking the lock; because some
> > callers of set_task_cpu() must in fact hold this lock already. Worth a
> > comment I feel.
> 
> Ok; I'll add a comment mentioning that since the state is TASK_WAKING we
> do not hold the lock.
> 
> > Once I figured out the exact locking; I realized you do this on
> > cross-cpu wakeups. We spend a fair amount of effort not to take both
> > rq locks. But I suppose you have to here.
> 
> The problem is that when a task wakes up before the "0 lag time" on a
> different runqueue, we must "migrate" its utilization from the old
> runqueue to the new one (see the comment above). And I need the lock to
> avoid races... The only alternative I can think of is to add a new
> lock protecting the active utilization, and to take it instead of the
> runqueue lock (I do not know if this is enough; I need to check...).
> I'll have a look on Monday.
> 
> Thanks,
> 	Luca
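For reference, one possible early-return rearrangement of
migrate_task_rq_dl() along the lines Peter suggests could look like the
sketch below (untested kernel-style pseudocode, not a submitted patch;
it only inverts the condition and folds the comments discussed above):

```
static void migrate_task_rq_dl(struct task_struct *p)
{
	struct rq *rq;

	if (p->state != TASK_WAKING || !p->dl.dl_non_contending)
		return;

	rq = task_rq(p);
	/*
	 * Since p->state == TASK_WAKING, the caller does not hold the
	 * rq lock here (unlike other set_task_cpu() paths), so it must
	 * be taken explicitly.
	 */
	raw_spin_lock(&rq->lock);
	sub_running_bw(p->dl.dl_bw, &rq->dl);
	p->dl.dl_non_contending = 0;
	/*
	 * If the timer handler is currently running and the
	 * timer cannot be cancelled, inactive_task_timer()
	 * will see that dl_non_contending is not set, and
	 * will not touch the rq's active utilization,
	 * so we are still safe.
	 */
	if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
		put_task_struct(p);
	raw_spin_unlock(&rq->lock);
}
```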