Date: Fri, 24 Mar 2017 14:20:41 +0100
From: Peter Zijlstra
To: luca abeni
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Juri Lelli, Claudio Scordino,
	Steven Rostedt, Tommaso Cucinotta, Daniel Bristot de Oliveira,
	Joel Fernandes, Mathieu Poirier
Subject: Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
Message-ID: <20170324132041.6fxayfe3id4af3n5@hirez.programming.kicks-ass.net>
References: <1490327582-4376-1-git-send-email-luca.abeni@santannapisa.it>
	<1490327582-4376-3-git-send-email-luca.abeni@santannapisa.it>
In-Reply-To: <1490327582-4376-3-git-send-email-luca.abeni@santannapisa.it>

On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d67eee8..952cac8 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -445,16 +445,33 @@ struct sched_dl_entity {
>  	 *
>  	 * @dl_yielded tells if task gave up the CPU before consuming
>  	 * all its available runtime during the last job.
> +	 *
> +	 * @dl_non_contending tells if task is inactive while still
> +	 * contributing to the active utilization. In other words, it
> +	 * indicates if the inactive timer has been armed and its handler
> +	 * has not been executed yet. This flag is useful to avoid race
> +	 * conditions between the inactive timer handler and the wakeup
> +	 * code.
>  	 */
>  	int				dl_throttled;
>  	int				dl_boosted;
>  	int				dl_yielded;
> +	int				dl_non_contending;

We should maybe look into making those bits :1; not something for this patch
though.

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cef9adb..86aed82 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
>  		dl_rq->running_bw = 0;
>  }
>  
> +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> +{
> +	if (!task_on_rq_queued(p)) {
> +		struct rq *rq = task_rq(p);
> +
> +		if (p->dl.dl_non_contending) {
> +			sub_running_bw(p->dl.dl_bw, &rq->dl);
> +			p->dl.dl_non_contending = 0;
> +			/*
> +			 * If the timer handler is currently running and the
> +			 * timer cannot be cancelled, inactive_task_timer()
> +			 * will see that dl_not_contending is not set, and
> +			 * will not touch the rq's active utilization,
> +			 * so we are still safe.
> +			 */
> +			if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> +				put_task_struct(p);
> +		}
> +	}
> +}

If we rearrange that slightly we can avoid the double indent:

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -67,23 +67,23 @@ void sub_running_bw(u64 dl_bw, struct dl
 
 void dl_change_utilization(struct task_struct *p, u64 new_bw)
 {
-	if (!task_on_rq_queued(p)) {
-		struct rq *rq = task_rq(p);
+	if (task_on_rq_queued(p))
+		return;
 
-		if (p->dl.dl_non_contending) {
-			sub_running_bw(p->dl.dl_bw, &rq->dl);
-			p->dl.dl_non_contending = 0;
-			/*
-			 * If the timer handler is currently running and the
-			 * timer cannot be cancelled, inactive_task_timer()
-			 * will see that dl_not_contending is not set, and
-			 * will not touch the rq's active utilization,
-			 * so we are still safe.
-			 */
-			if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
-				put_task_struct(p);
-		}
-	}
+	if (!p->dl.dl_non_contending)
+		return;
+
+	sub_running_bw(p->dl.dl_bw, &task_rq(p)->dl);
+	p->dl.dl_non_contending = 0;
+	/*
+	 * If the timer handler is currently running and the
+	 * timer cannot be cancelled, inactive_task_timer()
+	 * will see that dl_not_contending is not set, and
+	 * will not touch the rq's active utilization,
+	 * so we are still safe.
+	 */
+	if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
+		put_task_struct(p);
 }
 
 static void task_non_contending(struct task_struct *p)

> +
> +static void task_non_contending(struct task_struct *p)
> +{
> +}
> +
> +static void task_contending(struct sched_dl_entity *dl_se)
> +{
> +	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> +
> +	/*
> +	 * If this is a non-deadline task that has been boosted,
> +	 * do nothing
> +	 */
> +	if (dl_se->dl_runtime == 0)
> +		return;
> +
> +	if (dl_se->dl_non_contending) {
> +		/*
> +		 * If the timer handler is currently running and the
> +		 * timer cannot be cancelled, inactive_task_timer()
> +		 * will see that dl_not_contending is not set, and
> +		 * will not touch the rq's active utilization,
> +		 * so we are still safe.
> +		 */
> +		if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1)
> +			put_task_struct(dl_task_of(dl_se));
> +		dl_se->dl_non_contending = 0;

This had me worried for a little while as being a use-after-free, but this
put_task_struct() cannot be the last one in this case. Might be worth a
comment.

> +	} else {
> +		/*
> +		 * Since "dl_non_contending" is not set, the
> +		 * task's utilization has already been removed from
> +		 * active utilization (either when the task blocked,
> +		 * when the "inactive timer" fired).
> +		 * So, add it back.
> +		 */
> +		add_running_bw(dl_se->dl_bw, dl_rq);
> +	}
> +}

In general I feel it would be nice to have a state diagram included somewhere
near these two functions; it would save having to dig out the PDF every time.
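Something like the below is roughly what I have in mind; reconstructed from
reading this patch, so the transitions may well be incomplete or slightly off:

```
                     task blocks (arm inactive_timer)
   ACTIVE contending ----------------------------> ACTIVE non-contending
       ^    ^                                          |           |
       |    |       wakeup (cancel inactive_timer)     |           |
       |    +------------------------------------------+           |
       |                                                inactive_timer fires
       |           wakeup (add_running_bw())            (sub_running_bw())
       +------------------------------------- INACTIVE <-----------+
```

In words: the task's bandwidth stays in running_bw while "active
non-contending" (timer armed, dl_non_contending set), and only leaves it once
the inactive timer actually fires.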
> +static void migrate_task_rq_dl(struct task_struct *p)
> +{
> +	if ((p->state == TASK_WAKING) && (p->dl.dl_non_contending)) {
> +		struct rq *rq = task_rq(p);
> +
> +		raw_spin_lock(&rq->lock);
> +		sub_running_bw(p->dl.dl_bw, &rq->dl);
> +		p->dl.dl_non_contending = 0;
> +		/*
> +		 * If the timer handler is currently running and the
> +		 * timer cannot be cancelled, inactive_task_timer()
> +		 * will see that dl_not_contending is not set, and
> +		 * will not touch the rq's active utilization,
> +		 * so we are still safe.
> +		 */
> +		if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> +			put_task_struct(p);
> +
> +		raw_spin_unlock(&rq->lock);
> +	}
> +}

This can similarly be reduced in indent, albeit this is only a single indent
level, so not as onerous as the other one.

What had me puzzled for a while here is the taking of the lock, because some
callers of set_task_cpu() must in fact hold this lock already. Worth a
comment, I feel.

Once I figured out the exact locking, I realized you do this on cross-cpu
wakeups. We spend a fair amount of effort not to take both rq locks, but I
suppose you have to here.
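For completeness, the reduced-indent version I have in mind, with the locking
rationale as a comment right there (untested, and the comment wording is just
a suggestion):

```
static void migrate_task_rq_dl(struct task_struct *p)
{
	struct rq *rq;

	if (p->state != TASK_WAKING || !p->dl.dl_non_contending)
		return;

	/*
	 * This path is only taken on cross-cpu wakeups, where the old
	 * rq's lock is not already held; take it before touching the
	 * old rq's active utilization.
	 */
	rq = task_rq(p);
	raw_spin_lock(&rq->lock);
	sub_running_bw(p->dl.dl_bw, &rq->dl);
	p->dl.dl_non_contending = 0;
	/*
	 * If the timer handler is currently running and the timer
	 * cannot be cancelled, inactive_task_timer() will see that
	 * dl_non_contending is not set, and will not touch the rq's
	 * active utilization, so we are still safe.
	 */
	if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
		put_task_struct(p);
	raw_spin_unlock(&rq->lock);
}
```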