From: Kirill Tkhai
To: Juri Lelli
Cc: Peter Zijlstra, "linux-kernel@vger.kernel.org", Steven Rostedt, Ingo Molnar
Subject: Re: [RFC] sched/deadline: Prevent rt_time growth to infinity
Date: Sat, 22 Feb 2014 03:50:02 +0400
Message-Id: <57671393026602@web10j.yandex.ru>
In-Reply-To: <20140221175305.1e170b45be08fe05c93a33b4@gmail.com>
References: <230991392848160@web13m.yandex.ru> <20140221103715.GP9987@twins.programming.kicks-ass.net> <20140221173641.a060b3d6c0993c21e77f29c2@gmail.com> <20140221175305.1e170b45be08fe05c93a33b4@gmail.com>

21.02.2014, 20:52, "Juri Lelli":
> On Fri, 21 Feb 2014 17:36:41 +0100
> Juri Lelli wrote:
>
>>  On Fri, 21 Feb 2014 11:37:15 +0100
>>  Peter Zijlstra wrote:
>>>  On Thu, Feb 20, 2014 at 02:16:00AM +0400, Kirill Tkhai wrote:
>>>>  Since deadline tasks share rt bandwidth, we must care about
>>>>  bandwidth timer set. Otherwise rt_time may grow up to infinity
>>>>  in update_curr_dl(), if there are no other available RT tasks
>>>>  on top level bandwidth.
>>>>
>>>>  I'm going to decide the problem the way below. Almost untested
>>>>  because of I skipped almost all of recent patches which have
>>>>  to be applied from lkml.
>>>>
>>>>  Please say, if I skipped anything in idea. Maybe better put
>>>>  start_top_rt_bandwidth() into set_curr_task_dl()?
>>>  How about we only increment rt_time when there's an RT bandwidth timer
>>>  active?
>>>
>>>  ---
>>>  --- a/kernel/sched/rt.c
>>>  +++ b/kernel/sched/rt.c
>>>  @@ -568,6 +568,12 @@ static inline struct rt_bandwidth *sched
>>>
>>>   #endif /* CONFIG_RT_GROUP_SCHED */
>>>
>>>  +bool sched_rt_bandwidth_active(struct rt_rq *rt_rq)
>>>  +{
>>>  +        struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
>>>  +        return hrtimer_active(&rt_b->rt_period_timer);
>>>  +}
>>>  +
>>>   #ifdef CONFIG_SMP
>>>   /*
>>>    * We ran out of runtime, see if we can borrow some from our neighbours.
>>>  --- a/kernel/sched/deadline.c
>>>  +++ b/kernel/sched/deadline.c
>>>  @@ -587,6 +587,8 @@ int dl_runtime_exceeded(struct rq *rq, s
>>>           return 1;
>>>   }
>>>
>>>  +extern bool sched_rt_bandwidth_active(struct rt_rq *rt_rq);
>>>  +
>>>   /*
>>>    * Update the current task's runtime statistics (provided it is still
>>>    * a -deadline task and has not been removed from the dl_rq).
>>>  @@ -650,11 +652,13 @@ static void update_curr_dl(struct rq *rq
>>>                   struct rt_rq *rt_rq = &rq->rt;
>>>
>>>                   raw_spin_lock(&rt_rq->rt_runtime_lock);
>>>  -                rt_rq->rt_time += delta_exec;
>>>                   /*
>>>                    * We'll let actual RT tasks worry about the overflow here, we
>>>  -                 * have our own CBS to keep us inline -- see above.
>>>  +                 * have our own CBS to keep us inline; only account when RT
>>>  +                 * bandwidth is relevant.
>>>                    */
>>>  +                if (sched_rt_bandwidth_active(rt_rq))
>>>  +                        rt_rq->rt_time += delta_exec;
>>>                   raw_spin_unlock(&rt_rq->rt_runtime_lock);
>>>           }
>>>   }
>>  So, I ran some tests with the above and I'd like to share with you what
>>  I've found. You can find here a trace-cmd trace that should be fed
>>  to kernelshark to be able to understand what follows (or feel free to
>>  reproduce same scenario :)):
>>  http://retis.sssup.it/~jlelli/traces/trace_rt_time.dat
>>
>>  Here you have a DL task (4/10) and a while(1) RT task, both running
>>  inside a rt_bw of 0.5. RT task is activated 500ms after DL. As I
>>  filtered in sched_rt_period_timer(), you can search for time instants
>>  when the rt_bw is replenished. It is evident that the first time after
>>  rt timer is activated back (search for start_bandwidth_timer), we can
>>  eat some bw to FAIR tasks (if any). This is due to the fact that we
>>  reset rt_bw budget at this time, start decrementing rt_time for both DL
>
> The reset happens when rt_bw replenishment timer fires, after a bit:
>
>  sched_rt_period_timer <-- __run_hrtimer

Juri, sorry, I forgot to write that I mean the situation when only one
task is on_rq at any moment: DL, RT, DL, RT, ...

rt_runtime = n; rt_period = 2n;

| DL's working, RT's sleeping  | RT's working, DL's sleeping  | all sleep                 |
|------------------------------|------------------------------|---------------------------|
| (1) duration = n             | (2) duration = n             | (3) duration = n          | (repeat)
|------------------------------|------------------------------|---------------------------|
| (rt_bw timer is not running) | (rt_bw timer is running)                                 |

According to the patch, the rt_bw timer is running only if we have a queued
RT task. In the case above, part (1) has no queued RT tasks, so the timer
is not running and rt_time is not increased either.

So DL and RT together run for 2n out of every 3n: we have a ratio of 2/3,
while rt_runtime/rt_period is 1/2.

Thanks,
Kirill

>
> Apologies,
>
> - Juri
>
>>  and RT tasks, throttle RT tasks when rt_time > runtime, but, since DL
>>  tasks actually executes inside their own server, they don't care about
>>  rt_bw. Good news is that steady state is ok: keeping track of overruns
>>  we are able to stop eating bw to other guys.
>>
>>  My thoughts:
>>
>>   - Peter's patch is an easy fix to Kirill's problem (RT tasks were
>>     throttled too early);
>>   - something to add to this solution could be to pre-calculate bw of
>>     ready DL tasks and subtract it to rt_bw at replenishment time, but
>>     it sounds quite awkward, pessimistic, and I'm not sure it is gonna
>>     work;
>>   - we are stealing bw to best-effort tasks, and just at the beginning
>>     of the transition, is it really a problem?
>>   - I mean, if you want guarantees make your tasks DL! :);
>>   - in the long run we are gonna have RT tasks scheduled inside CBS
>>     servers, and all this will be properly fixed up.
>>
>>  Comments?
>>
>>  BTW, rt timer activation/deactivation should probably be fixed for
>>  !RT_GROUP_SCHED with something like this:
>>
>>  ---
>>   kernel/sched/rt.c |   10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>>  diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
>>  index 6161de8..274f992 100644
>>  --- a/kernel/sched/rt.c
>>  +++ b/kernel/sched/rt.c
>>  @@ -86,12 +86,12 @@ void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
>>           raw_spin_lock_init(&rt_rq->rt_runtime_lock);
>>   }
>>
>>  -#ifdef CONFIG_RT_GROUP_SCHED
>>   static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
>>   {
>>           hrtimer_cancel(&rt_b->rt_period_timer);
>>   }
>>
>>  +#ifdef CONFIG_RT_GROUP_SCHED
>>   #define rt_entity_is_task(rt_se) (!(rt_se)->my_q)
>>
>>   static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
>>  @@ -1017,8 +1017,12 @@ inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>>           start_rt_bandwidth(&def_rt_bandwidth);
>>   }
>>
>>  -static inline
>>  -void dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) {}
>>  +static void
>>  +dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
>>  +{
>>  +        if (!rt_rq->rt_nr_running)
>>  +                destroy_rt_bandwidth(&def_rt_bandwidth);
>>  +}
>>
>>   #endif /* CONFIG_RT_GROUP_SCHED */