Date: Fri, 24 Mar 2017 22:47:15 +0100
From: luca abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Juri Lelli,
 Claudio Scordino, Steven Rostedt, Tommaso Cucinotta,
 Daniel Bristot de Oliveira, Joel Fernandes, Mathieu Poirier
Subject: Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
Message-ID: <20170324224715.4098dbfb@nowhere>
In-Reply-To: <20170324132041.6fxayfe3id4af3n5@hirez.programming.kicks-ass.net>
References: <1490327582-4376-1-git-send-email-luca.abeni@santannapisa.it>
 <1490327582-4376-3-git-send-email-luca.abeni@santannapisa.it>
 <20170324132041.6fxayfe3id4af3n5@hirez.programming.kicks-ass.net>
Organization: Scuola Superiore S.Anna

Hi Peter,

On Fri, 24 Mar 2017 14:20:41 +0100
Peter Zijlstra wrote:

> On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
>
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index d67eee8..952cac8 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> >  	 *
> >  	 * @dl_yielded tells if task gave up the CPU before consuming
> >  	 * all its available runtime during the last job.
> > +	 *
> > +	 * @dl_non_contending tells if task is inactive while still
> > +	 * contributing to the active utilization.
> > +	 * In other words, it
> > +	 * indicates if the inactive timer has been armed and its handler
> > +	 * has not been executed yet. This flag is useful to avoid race
> > +	 * conditions between the inactive timer handler and the wakeup
> > +	 * code.
> >  	 */
> >  	int dl_throttled;
> >  	int dl_boosted;
> >  	int dl_yielded;
> > +	int dl_non_contending;
>
> We should maybe look into making those bits :1, not something for this
> patch though;

Yes, grouping all the flags in a single field was my intention too... I
planned to submit a patch to do this after merging the reclaiming
patches... But maybe it is better to do this first :)

> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index cef9adb..86aed82 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
> >  		dl_rq->running_bw = 0;
> >  }
> >
> > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > +{
> > +	if (!task_on_rq_queued(p)) {
> > +		struct rq *rq = task_rq(p);
> > +
> > +		if (p->dl.dl_non_contending) {
> > +			sub_running_bw(p->dl.dl_bw, &rq->dl);
> > +			p->dl.dl_non_contending = 0;
> > +			/*
> > +			 * If the timer handler is currently running and the
> > +			 * timer cannot be cancelled, inactive_task_timer()
> > +			 * will see that dl_not_contending is not set, and
> > +			 * will not touch the rq's active utilization,
> > +			 * so we are still safe.
> > +			 */
> > +			if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> > +				put_task_struct(p);
> > +		}
> > +	}
> > +}
>
> If we rearrange that slightly we can avoid the double indent:
[...]

Ok; I'll rework the code in this way on Monday.

[...]
> > +		if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1)
> > +			put_task_struct(dl_task_of(dl_se));
> > +		dl_se->dl_non_contending = 0;
>
> This had me worried for a little while as being a use-after-free, but
> this put_task_struct() cannot be the last in this case.
> Might be
> worth a comment.

Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
the hrtimer_try_to_cancel() call?

> > +	} else {
> > +		/*
> > +		 * Since "dl_non_contending" is not set, the
> > +		 * task's utilization has already been removed from
> > +		 * active utilization (either when the task blocked,
> > +		 * or when the "inactive timer" fired).
> > +		 * So, add it back.
> > +		 */
> > +		add_running_bw(dl_se->dl_bw, dl_rq);
> > +	}
> > +}
>
> In general I feel it would be nice to have a state diagram included
> somewhere near these two functions. It would be nice to not have to
> dig out the PDF every time.

Ok... Since I am not good at ascii art, would it be ok to add a textual
description? If yes, I'll add a comment like:

"
The utilization of a task is added to the runqueue's active utilization
when the task becomes active (is enqueued in the runqueue), and is
removed when the task becomes inactive. A task does not become
immediately inactive when it blocks, but becomes inactive at the
so-called "0 lag time"; so, we set up the "inactive timer" to fire at
the "0 lag time". When the "inactive timer" fires, the task utilization
is removed from the runqueue's active utilization.
If the task wakes up again on the same runqueue before the "0 lag
time", the active utilization must not be changed and the "inactive
timer" must be cancelled.
If the task wakes up again on a different runqueue before the "0 lag
time", then the task's utilization must be removed from the previous
runqueue's active utilization and must be added to the new runqueue's
active utilization.
In order to avoid races between a task waking up on a runqueue while
the "inactive timer" is running on a different CPU, the
"dl_non_contending" flag is used to indicate that a task is not on a
runqueue but is active (so, the flag is set when the task blocks and is
cleared when the "inactive timer" fires or when the task wakes up).
"

(if this is ok, where can I add this comment?)
> > +static void migrate_task_rq_dl(struct task_struct *p)
> > +{
> > +	if ((p->state == TASK_WAKING) && (p->dl.dl_non_contending)) {
> > +		struct rq *rq = task_rq(p);
> > +
> > +		raw_spin_lock(&rq->lock);
> > +		sub_running_bw(p->dl.dl_bw, &rq->dl);
> > +		p->dl.dl_non_contending = 0;
> > +		/*
> > +		 * If the timer handler is currently running and the
> > +		 * timer cannot be cancelled, inactive_task_timer()
> > +		 * will see that dl_not_contending is not set, and
> > +		 * will not touch the rq's active utilization,
> > +		 * so we are still safe.
> > +		 */
> > +		if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> > +			put_task_struct(p);
> > +
> > +		raw_spin_unlock(&rq->lock);
> > +	}
> > +}
>
> This can similarly be reduced in indent, albeit this is only a single
> indent level, so not as onerous as the other one.

Ok; I'll do this on Monday.

> What had me puzzled for a while here is taking the lock; because some
> callers of set_task_cpu() must in fact hold this lock already. Worth a
> comment I feel.

Ok; I'll add a comment mentioning that since the state is TASK_WAKING we
do not hold the lock here.

> Once I figured out the exact locking; I realized you do this on
> cross-cpu wakeups. We spend a fair amount of effort not to take both
> rq locks. But I suppose you have to here.

The problem is that when a task wakes up before the "0 lag time" on a
different runqueue, we must "migrate" its utilization from the old
runqueue to the new one (see the comment above). And I need the lock to
avoid races... The only alternative I can think of is to add a new lock
protecting the active utilization, and to take it instead of the
runqueue lock (I do not know if this is enough; I need to check...).
I'll have a look on Monday.


			Thanks,
				Luca