Date: Tue, 1 Nov 2016 22:46:33 +0100
From: luca abeni
To: Juri Lelli
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, Ingo Molnar,
 Claudio Scordino, Steven Rostedt
Subject: Re: [RFC v3 2/6] Improve the tracking of active utilisation
Message-ID: <20161101224633.4e5ee0ca@utopia>
In-Reply-To: <20161101164604.GB2769@ARMvm>
References: <1477317998-7487-1-git-send-email-luca.abeni@unitn.it>
 <1477317998-7487-3-git-send-email-luca.abeni@unitn.it>
 <20161101164604.GB2769@ARMvm>

Hi Juri,

On Tue, 1 Nov 2016 16:46:04 +0000
Juri Lelli wrote:
[...]
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 3d95c1d..80d1541 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -47,6 +47,7 @@ static void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
> >  {
> >          u64 se_bw = dl_se->dl_bw;
> >
> > +        lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
>
> This and the one below go in 1/6.

Ok; I'll move it there.

> >          dl_rq->running_bw += se_bw;
> >  }
> >
> > @@ -54,11 +55,52 @@ static void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
> >  {
> >          u64 se_bw = dl_se->dl_bw;
> >
> > +        lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
> >          dl_rq->running_bw -= se_bw;
> >          if (WARN_ON(dl_rq->running_bw < 0))
> >                  dl_rq->running_bw = 0;
> >  }
> >
> > +static void task_go_inactive(struct task_struct *p)
> > +{
> > +        struct sched_dl_entity *dl_se = &p->dl;
> > +        struct hrtimer *timer = &dl_se->inactive_timer;
> > +        struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > +        struct rq *rq = rq_of_dl_rq(dl_rq);
> > +        s64 zerolag_time;
> > +
> > +        WARN_ON(dl_se->dl_runtime == 0);
> > +
> > +        /* If the inactive timer is already armed, return immediately */
> > +        if (hrtimer_active(&dl_se->inactive_timer))
> > +                return;
> > +
> > +        zerolag_time = dl_se->deadline -
> > +                div64_long((dl_se->runtime * dl_se->dl_period),
> > +                        dl_se->dl_runtime);
> > +
> > +        /*
> > +         * Using relative times instead of the absolute "0-lag time"
> > +         * allows to simplify the code
> > +         */
> > +        zerolag_time -= rq_clock(rq);
> > +
> > +        /*
> > +         * If the "0-lag time" already passed, decrease the active
> > +         * utilization now, instead of starting a timer
> > +         */
> > +        if (zerolag_time < 0) {
> > +                sub_running_bw(dl_se, dl_rq);
> > +                if (!dl_task(p))
> > +                        __dl_clear_params(p);
> > +
> > +                return;
> > +        }
> > +
> > +        get_task_struct(p);
> > +        hrtimer_start(timer, ns_to_ktime(zerolag_time), HRTIMER_MODE_REL);
> > +}
> > +
> >  static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
> >  {
> >          struct sched_dl_entity *dl_se = &p->dl;
> > @@ -514,7 +556,20 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
> >          struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> >          struct rq *rq = rq_of_dl_rq(dl_rq);
> >
> > -        add_running_bw(dl_se, dl_rq);
> > +        if (hrtimer_is_queued(&dl_se->inactive_timer)) {
> > +                hrtimer_try_to_cancel(&dl_se->inactive_timer);
>
> Why are we OK with just trying to cancel the inactive timer?

This is my understanding of things: if the timer cannot be cancelled,
then either:
1) it already decreased the active utilisation (and the timer handler
   is just finishing its execution). In this case, cancelling it or
   not cancelling it does not make any difference
2) it still has to decrease the active utilisation (and is probably
   waiting for the runqueue lock). In this case, the timer handler
   will find the task RUNNING, and will not decrease the active
   utilisation
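As an aside, since the "0-lag time" drives all of this logic: here is
a stand-alone sketch of the computation in the task_go_inactive() hunk
quoted above, as plain user-space C with invented numbers (not kernel
code, just to make the arithmetic concrete):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        /* Invented CBS parameters, all in nanoseconds: */
        int64_t dl_runtime = 10000000;   /* Q: maximum runtime, 10ms */
        int64_t dl_period = 100000000;   /* P: reservation period, 100ms */
        int64_t runtime = 4000000;       /* q: remaining runtime, 4ms */
        int64_t deadline = 60000000;     /* d: absolute deadline */
        int64_t now = 0;                 /* stand-in for rq_clock(rq) */
        int64_t zerolag_time;

        /*
         * 0-lag time: t0 = d - q * P / Q. Up to t0 the task may still
         * legitimately use its reserved bandwidth, so the active
         * utilisation must not be decreased before t0.
         */
        zerolag_time = deadline - (runtime * dl_period) / dl_runtime;

        printf("relative 0-lag time: %lld ns\n",
               (long long)(zerolag_time - now));
        /* Prints 20000000: the inactive timer would fire in 20ms */
        return 0;
}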
> > +                WARN_ON(dl_task_of(dl_se)->nr_cpus_allowed > 1);
>
> What's wrong with nr_cpus_allowed > 1 tasks?

If nr_cpus_allowed > 1, then select_task_rq_dl() was executed before
enqueueing the task, and it already took care of cancelling the
inactive timer. I added this WARN_ON() to check for possible races
between the timer handler and select_task_rq / enqueue_task_dl...
I tried hard to trigger such a race, but the WARN_ON() never happened
(it only happened when I had tasks bound to one single CPU - I tried
this combination for testing purposes).

[...]

> > @@ -1074,6 +1161,14 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
> >          }
> >          rcu_read_unlock();
> >
> > +        rq = task_rq(p);
> > +        raw_spin_lock(&rq->lock);
> > +        if (hrtimer_active(&p->dl.inactive_timer)) {
> > +                sub_running_bw(&p->dl, &rq->dl);
> > +                hrtimer_try_to_cancel(&p->dl.inactive_timer);
>
> Can't we subtract twice if the timer fires right after we grab
> rq_lock, so that it is now waiting for that lock, and it goes ahead
> and calls sub_running_bw() again after we release the lock?

Uhm... I somehow convinced myself that this could not happen, but I
do not remember the details, sorry :( I'll check again in the next
days (BTW, wouldn't this trigger the WARN_ON you mentioned above?).
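In any case, the property we need here is that the active utilisation
is subtracted exactly once per deactivation: either by
select_task_rq_dl() (which then cancels the timer), or by the timer
handler (which finds the task running again and backs off). A
user-space mock of how I reason about it, with invented names
(timer_pending stands for hrtimer_active(), task_running for the
state check in the handler; whether the real code really establishes
the equivalent of task_running before dropping the lock is exactly
what I need to re-check):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static long running_bw = 40;     /* the task's bw is still accounted */
static int timer_pending = 1;    /* mock of hrtimer_active() */
static int task_running = 0;     /* mock of the handler's state check */

/* select_task_rq_dl() path, on a wakeup before the 0-lag time */
static void wakeup_path(void)
{
        pthread_mutex_lock(&rq_lock);
        if (timer_pending)       /* the handler did not run yet */
                running_bw -= 40;
        task_running = 1;        /* the handler must not subtract now */
        pthread_mutex_unlock(&rq_lock);
}

/* inactive timer handler, possibly already waiting for the lock */
static void *timer_handler(void *arg)
{
        pthread_mutex_lock(&rq_lock);
        if (!task_running)       /* the task woke up: nothing to do */
                running_bw -= 40;
        timer_pending = 0;
        pthread_mutex_unlock(&rq_lock);
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, timer_handler, NULL);
        wakeup_path();           /* races with the timer handler */
        pthread_join(t, NULL);

        printf("running_bw = %ld\n", running_bw);  /* 0 in both orders */
        return 0;
}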
> > +        }
> > +        raw_spin_unlock(&rq->lock);
> > +
> >  out:
> >          return cpu;
> >  }
> > @@ -1244,6 +1339,11 @@ static void task_dead_dl(struct task_struct *p)
> >          /* XXX we should retain the bw until 0-lag */
> >          dl_b->total_bw -= p->dl.dl_bw;
> >          raw_spin_unlock_irq(&dl_b->lock);
> > +        if (hrtimer_active(&p->dl.inactive_timer)) {
> > +                raw_spin_lock_irq(&task_rq(p)->lock);
> > +                sub_running_bw(&p->dl, dl_rq_of_se(&p->dl));
>
> Don't we still need to wait for the 0-lag? Or maybe since the task
> is dying we can release its bw instantaneously? In this case I'd add
> a comment about it.

Uhmm... I think you are right; I'll double-check this.

> >  static void set_curr_task_dl(struct rq *rq)
> > @@ -1720,15 +1820,22 @@ void __init init_sched_dl_class(void)
> >  static void switched_from_dl(struct rq *rq, struct task_struct *p)
> >  {
> >          /*
> > -         * Start the deadline timer; if we switch back to dl before this we'll
> > -         * continue consuming our current CBS slice. If we stay outside of
> > -         * SCHED_DEADLINE until the deadline passes, the timer will reset the
> > -         * task.
> > +         * task_go_inactive() can start the "inactive timer" (if the 0-lag
> > +         * time is in the future). If the task switches back to dl before
> > +         * the "inactive timer" fires, it can continue to consume its current
> > +         * runtime using its current deadline. If it stays outside of
> > +         * SCHED_DEADLINE until the 0-lag time passes, inactive_task_timer()
> > +         * will reset the task parameters.
> >          */
> > -        if (!start_dl_timer(p))
> > -                __dl_clear_params(p);
> > +        if (task_on_rq_queued(p) && p->dl.dl_runtime)
> > +                task_go_inactive(p);
> >
> > -        if (task_on_rq_queued(p))
> > +        /*
> > +         * We cannot use inactive_task_timer() to invoke sub_running_bw()
> > +         * at the 0-lag time, because the task could have been migrated
> > +         * while SCHED_OTHER in the meanwhile.
>
> But, from a theoretical PoV, we very much should, right?

Yes, we should.

> Is this taken care of in the next patch?

No, I left this small divergence between theory and practice. The
problem is that the inactive timer handler looks at the task's
runqueue, and decreases the active utilisation of that runqueue.
If the task is a SCHED_DEADLINE task, this is not an issue, because
the inactive timer is armed when the task is not on a runqueue...
When the task is inserted in a runqueue again, the timer is
cancelled. But when the task becomes SCHED_NORMAL, it can go on a
runqueue (and migrate to different runqueues) without cancelling the
inactive timer... So, when the inactive timer fires we have no
guarantee that the task's runqueue is still the one from which we
need to subtract the utilisation (I've actually seen this bug in
action, before changing the code to what it is now).

If this is not acceptable, I'll try to find another solution to this
issue.
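To show concretely the "wrong runqueue" failure mode I was seeing,
here is a toy user-space illustration (invented types, no locking,
not the kernel code):

#include <stdio.h>

struct rq { long running_bw; };
struct task { struct rq *rq; long bw; };

int main(void)
{
        struct rq rq0 = { .running_bw = 40 };  /* bw accounted here */
        struct rq rq1 = { .running_bw = 0 };
        struct task p = { .rq = &rq0, .bw = 40 };

        /* The task switches to SCHED_NORMAL and migrates; nothing
         * cancels the inactive timer */
        p.rq = &rq1;

        /* The handler fires and subtracts from the *current* rq: */
        p.rq->running_bw -= p.bw;

        printf("rq0 = %ld, rq1 = %ld\n", rq0.running_bw, rq1.running_bw);
        /* Prints "rq0 = 40, rq1 = -40": rq0 keeps utilisation that is
         * gone, and rq1 loses utilisation it never had */
        return 0;
}

Thanks,
Luca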