Received: by 10.223.176.46 with SMTP id f43csp2729190wra; Mon, 22 Jan 2018 02:24:52 -0800 (PST) X-Google-Smtp-Source: AH8x227UfRLabVc18ZK7VhR6SnnZf6bU/XVAUuhAeY0hsukFbBYqCbYSfL9QPrmeIfWwMzLzZrvn X-Received: by 10.99.131.74 with SMTP id h71mr6804638pge.373.1516616692509; Mon, 22 Jan 2018 02:24:52 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516616692; cv=none; d=google.com; s=arc-20160816; b=SN47VJVNGTZtAuXHeGLhIbrQKxHCZdMguy9B3feNuRdd5ZPWc3XkZSODo3r+1tg2Zz Ya/L+ciB1IHJ6QYbS1cNOCB2GmWFGhuYbGoLQpf65ZfsdEu/Aclk3CwJyJWIKVI9DThA iWvOI30ha4zvqW9HeXQemKJm6ONk/6aGCDxJP95CF5HrAanCByXzE61IjcrePhK42n36 aHpAdEuyRnfTA2ysY79skw0PHjQJfB+/Ao1qhhULXgjlnD6qmQwUFe4alwteX4c5uGR+ qliFF7WgnnRYjE64Dm9PQPsVs2yeE25UHzHWfgVmChBhHdAAZNLoATfUw7X95aAkM3yp wRew== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:references:in-reply-to:mime-version :dkim-signature:arc-authentication-results; bh=yY2DAbLMbDH9dPsjv0I1wUN1mMutMTwl2vh4nKcQl9Y=; b=LoDzEc81jFcx439ch9lRKIfZF58TB+2kQT+T1NN/A11JVazedJQKiP/DEDSSfXC0Un 2n5tGTz+Hu2VivExGYClC76fbRhvLTCUOCE3966WcQWIl242ih+/aDlb5Z1Ch4yT55OA QKeQ7/16vGbZ0vPI/6t701jp9GjN9gEpx6fT/Zk3LAD3QqCeMuMOTXmT30pdZjxp05+N KldIKJjymEgK5xrroxzUrIGXfCEsWbfxyHEKwSu+uRqgZHNwwC3+umkl5j5swRKNT7zt 55yM3UplIQfaEm6kgTn/dRjd1tEj6VZJqs64s69SCZTx3rBVdk+phQxtcFnw0Vg4mVkx gvrg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=d0T6rJvH; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id g1si10306162pgo.780.2018.01.22.02.24.37; Mon, 22 Jan 2018 02:24:52 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=d0T6rJvH; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751255AbeAVKXf (ORCPT + 99 others); Mon, 22 Jan 2018 05:23:35 -0500 Received: from mail-it0-f67.google.com ([209.85.214.67]:42807 "EHLO mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751216AbeAVKXd (ORCPT ); Mon, 22 Jan 2018 05:23:33 -0500 Received: by mail-it0-f67.google.com with SMTP id p139so9321357itb.1 for ; Mon, 22 Jan 2018 02:23:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=yY2DAbLMbDH9dPsjv0I1wUN1mMutMTwl2vh4nKcQl9Y=; b=d0T6rJvHGoKoIIx7/Taz3LTwHe5TkOqhewFofGDCqDoXrqsJpjydZJtNIQDbbipey1 /Fuh6+tfga+1gTY6xSmlKLGPDeLBY3mRXuNOMyn6rdBbdFYEy5brRnrtDTUTFSE5khGp ojFT4LNNa4iqUdbb9k0oEt9pJx+Vc6fbvkIhc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=yY2DAbLMbDH9dPsjv0I1wUN1mMutMTwl2vh4nKcQl9Y=; b=Bcq1u+PXE3ynSEkyE6JxYvACcQoAZ4fI9h4COIpIhSa0ZB9UQU+7gv0I2iUtQPNZs0 UC1ljCHt2iOo3U5b4kmBp+g+aNSwKbTFERS0GZ0INqxHs2IQIMzGYVUMEG1MxL7NzcU5 isjjib5Yu1kA2/PYN8G802Exjkcn13nupXN6DmoLD6bb9wG+xyupI2WNpG+HlW/8zVil ZKCV6w+RnOUT8bqd+zuHX+IaOLD6p83nnk9QKkiT9lR+mmKW/RBHZ8sLDxxx/sLHVzO/ YnBkoLIz5u6qH8rXaTL7Tq0aeitzGq8K0ZmWkyBNtzZab7xNAAOQ9cSHlGhgaBxyaP0B bUcQ== X-Gm-Message-State: AKwxytfcU19NlWlZvW986yAm+KUjl6Fuii8yrR/4Na3tylPg2u78XdaZ M5pA3UwN9DNubxgr4It2z80HltCeMj8B8jsin/Tc6w== X-Received: by 10.36.33.66 with SMTP id e63mr7834729ita.74.1516616612755; Mon, 22 Jan 2018 02:23:32 -0800 (PST) MIME-Version: 1.0 Received: by 10.107.50.140 with HTTP; Mon, 22 Jan 2018 02:23:12 -0800 (PST) In-Reply-To: <8c4f4ee6-d6b4-edfb-1fe3-4f3b8ebe659c@arm.com> References: <20171222075934.f6yenvcb2zkf2ysd@hirez.programming.kicks-ass.net> <20171222082915.4lcb7xyyooqyjpia@hirez.programming.kicks-ass.net> <20171222091221.ow5vn3ydx3hj4nht@hirez.programming.kicks-ass.net> <20171222185629.lysjebfifgdwvvhu@hirez.programming.kicks-ass.net> <20171222204247.kyc6ugyyu3ei7zhs@hirez.programming.kicks-ass.net> <20180115082609.GA6320@linaro.org> <8c4f4ee6-d6b4-edfb-1fe3-4f3b8ebe659c@arm.com> From: Vincent Guittot Date: Mon, 22 Jan 2018 11:23:12 +0100 Message-ID: Subject: Re: [RFC PATCH 2/5] sched: Add NOHZ_STATS_KICK To: Dietmar Eggemann Cc: Peter Zijlstra , Ingo Molnar , linux-kernel , Brendan Jackman , Morten Rasmussen Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 22 January 2018 at 10:40, Dietmar Eggemann wr= ote: > On 01/15/2018 08:26 AM, Vincent Guittot wrote: >> >> Le Wednesday 03 Jan 2018 =C3=A0 10:16:00 (+0100), Vincent Guittot a =C3= =A9crit : >>> >>> Hi Peter, >>> >>> On 22 December 2017 at 21:42, Peter Zijlstra >>> wrote: >>>> >>>> On Fri, Dec 22, 2017 at 07:56:29PM +0100, Peter Zijlstra wrote: >>>>> >>>>> Right; but I figured we'd try and do it 'right' and see how horrible = it >>>>> is before we try and do funny things. >>>> >>>> >>>> So now it should have a 32ms tick for up to .5s when the system goes >>>> completely idle. >>>> >>>> No idea how bad that is.. >>> >>> >>> I have tested your branch but the timer doesn't seem to fire correctly >>> because i can still see blocked load in the use case i have run. >>> I haven't found the reason yet > > > Isn't the issue with the timer based implementation that > rq->has_blocked_load is never set to 1 ? Yes that's what I suggested to Peter on IRC > > Something you changed in your implementation by adding a > rq->has_blocked_load =3D 1 into nohz_balance_enter_idle(). > > >> >> Hi Peter, >> >> With the patch below on top of your branch, the blocked loads are update= d >> and >> decayed regularly. The main differences are: >> - It doesn't use a timer to trig ilb but the tick and when a cpu becomes >> idle. >> The main drawback of this solution is that the load is blocked when t= he >> system is fully idle with the advantage of not waking up a fully idle >> system. We have to wait for the next tick or newly idle event for >> updating >> blocked load when the system leaves idle stat which can be up to a ti= ck >> long. >> If this is too long, we can check for kicking ilb when task wakes up = so >> the >> blocked load will be updated as soon as the system leaves idle state. >> The main advantage is that we don't wake up a fully idle system every >> 32ms to >> update blocked load that will be not used. >> - I'm working on one more improvement to use nohz_idle_balance in the >> newly >> idle case when the system is not overloaded and >> (this_rq->avg_idle > sysctl_sched_migration_cost). In this case, we c= an >> try to >> use nohz_idle_balance with NOHZ_STATS_KICK and abort as soon as it >> exceed >> this_rq->avg_idle. This will remove some calls to kick_ilb and some >> wake up >> of an idle cpus. >> >> --- >> kernel/sched/fair.c | 72 >> +++++++++++++++++++++++++---------------------------- >> 1 file changed, 34 insertions(+), 38 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 52114c6..898785d 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -5386,7 +5386,6 @@ static struct { >> atomic_t stats_state; >> unsigned long next_balance; /* in jiffy units */ >> unsigned long next_stats; >> - struct timer_list timer; >> } nohz ____cacheline_aligned; >> #endif /* CONFIG_NO_HZ_COMMON */ >> @@ -8004,8 +8003,9 @@ static inline void update_sd_lb_stats(struct lb_en= v >> *env, struct sd_lb_stats *sd >> prefer_sibling =3D 1; >> #ifdef CONFIG_NO_HZ_COMMON >> - if (env->idle =3D=3D CPU_NEWLY_IDLE) >> + if (env->idle =3D=3D CPU_NEWLY_IDLE && atomic_read(&nohz.stats_s= tate)) >> { >> env->flags |=3D LBF_NOHZ_STATS; >> + } >> #endif >> load_idx =3D get_sd_load_idx(env->sd, env->idle); >> @@ -8818,6 +8818,8 @@ update_next_balance(struct sched_domain *sd, >> unsigned long *next_balance) >> *next_balance =3D next; >> } >> +static void kick_ilb(unsigned int flags); >> + >> /* >> * idle_balance is called by schedule() if this_cpu is about to become >> * idle. Attempts to pull tasks from other CPUs. >> @@ -8852,12 +8854,16 @@ static int idle_balance(struct rq *this_rq, stru= ct >> rq_flags *rf) >> if (this_rq->avg_idle < sysctl_sched_migration_cost || >> !this_rq->rd->overload) { >> + unsigned long next =3D READ_ONCE(nohz.next_stats); >> rcu_read_lock(); >> sd =3D rcu_dereference_check_sched_domain(this_rq->sd); >> if (sd) >> update_next_balance(sd, &next_balance); >> rcu_read_unlock(); >> + if (time_after(jiffies, next) && >> atomic_read(&nohz.stats_state)) >> + kick_ilb(NOHZ_STATS_KICK); >> + >> goto out; >> } >> @@ -9075,18 +9081,6 @@ static void kick_ilb(unsigned int flags) >> smp_send_reschedule(ilb_cpu); >> } >> -void nohz_balance_timer(struct timer_list *timer) >> -{ >> - unsigned long next =3D READ_ONCE(nohz.next_stats); >> - >> - if (time_before(jiffies, next)) { >> - mod_timer(timer, next); >> - return; >> - } >> - >> - kick_ilb(NOHZ_STATS_KICK); >> -} >> - >> /* >> * Current heuristic for kicking the idle load balancer in the presenc= e >> * of an idle cpu in the system. >> @@ -9122,6 +9116,9 @@ static void nohz_balancer_kick(struct rq *rq) >> if (likely(!atomic_read(&nohz.nr_cpus))) >> return; >> + if (time_after(now, nohz.next_stats) && >> atomic_read(&nohz.stats_state)) >> + flags =3D NOHZ_STATS_KICK; >> + >> if (time_before(now, nohz.next_balance)) >> goto out; >> @@ -9227,7 +9224,6 @@ static void set_cpu_sd_state_idle(int cpu) >> void nohz_balance_enter_idle(int cpu) >> { >> struct rq *rq =3D cpu_rq(cpu); >> - unsigned int val, new; >> SCHED_WARN_ON(cpu !=3D smp_processor_id()); >> @@ -9251,6 +9247,7 @@ void nohz_balance_enter_idle(int cpu) >> return; >> rq->nohz_tick_stopped =3D 1; >> + rq->has_blocked_load =3D 1; >> cpumask_set_cpu(cpu, nohz.idle_cpus_mask); >> atomic_inc(&nohz.nr_cpus); >> @@ -9258,21 +9255,11 @@ void nohz_balance_enter_idle(int cpu) >> set_cpu_sd_state_idle(cpu); >> /* >> - * implies a barrier such that if the stats_state update is >> observed >> - * the above updates are also visible. Pairs with stuff in >> - * update_sd_lb_stats() and nohz_idle_balance(). >> + * Each time a cpu enter idle, we assume that it has blocked loa= d >> and >> + * enable the periodic update of the load of idle cpus >> */ >> - val =3D atomic_read(&nohz.stats_state); >> - do { >> - new =3D val + 2; >> - new |=3D 1; >> - } while (!atomic_try_cmpxchg(&nohz.stats_state, &val, new)); >> + atomic_set(&nohz.stats_state, 1); >> - /* >> - * If the timer was stopped, restart the thing. >> - */ >> - if (!(val & 1)) >> - mod_timer(&nohz.timer, jiffies + >> msecs_to_jiffies(LOAD_AVG_PERIOD)); >> } >> #else >> static inline void nohz_balancer_kick(struct rq *rq) { } >> @@ -9409,7 +9396,6 @@ static bool nohz_idle_balance(struct rq *this_rq, >> enum cpu_idle_type idle) >> bool has_blocked_load =3D false; >> int update_next_balance =3D 0; >> int this_cpu =3D this_rq->cpu; >> - unsigned int stats_seq; >> unsigned int flags; >> int balance_cpu; >> struct rq *rq; >> @@ -9422,7 +9408,6 @@ static bool nohz_idle_balance(struct rq *this_rq, >> enum cpu_idle_type idle) >> return false; >> } >> - stats_seq =3D atomic_read(&nohz.stats_state); >> /* >> * barrier, pairs with nohz_balance_enter_idle(), ensures ... >> */ >> @@ -9432,6 +9417,16 @@ static bool nohz_idle_balance(struct rq *this_rq, >> enum cpu_idle_type idle) >> SCHED_WARN_ON((flags & NOHZ_KICK_MASK) =3D=3D NOHZ_BALANCE_KICK)= ; >> + /* >> + * We assume there will be no idle load after this update and >> clear >> + * the stats state. If a cpu enters idle in the mean time, it wi= ll >> + * set the stats state and trig another update of idle load. >> + * Because a cpu that becomes idle, is added to idle_cpus_mask >> before >> + * setting the stats state, we are sure to not clear the state a= nd >> not >> + * check the load of an idle cpu. >> + */ >> + atomic_set(&nohz.stats_state, 0); >> + >> for_each_cpu(balance_cpu, nohz.idle_cpus_mask) { >> if (balance_cpu =3D=3D this_cpu || !idle_cpu(balance_cpu= )) >> continue; >> @@ -9441,8 +9436,10 @@ static bool nohz_idle_balance(struct rq *this_rq, >> enum cpu_idle_type idle) >> * work being done for other cpus. Next load >> * balancing owner will pick it up. >> */ >> - if (need_resched()) >> + if (need_resched()) { >> + has_blocked_load =3D true; >> break; >> + } >> rq =3D cpu_rq(balance_cpu); >> @@ -9477,12 +9474,12 @@ static bool nohz_idle_balance(struct rq >> *this_rq, enum cpu_idle_type idle) >> if (flags & NOHZ_BALANCE_KICK) >> rebalance_domains(this_rq, CPU_IDLE); >> - if (has_blocked_load || >> - !atomic_try_cmpxchg(&nohz.stats_state, &stats_seq, 0)) { >> - WRITE_ONCE(nohz.next_stats, >> - now + msecs_to_jiffies(LOAD_AVG_PERIOD))= ; >> - mod_timer(&nohz.timer, nohz.next_stats); >> - } >> + WRITE_ONCE(nohz.next_stats, >> + now + msecs_to_jiffies(LOAD_AVG_PERIOD)); >> + >> + /* There is still blocked load, enable periodic update */ >> + if (has_blocked_load) >> + atomic_set(&nohz.stats_state, 1); >> /* >> * next_balance will be updated only when there is a need. >> @@ -10115,7 +10112,6 @@ __init void init_sched_fair_class(void) >> nohz.next_balance =3D jiffies; >> nohz.next_stats =3D jiffies; >> zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT); >> - timer_setup(&nohz.timer, nohz_balance_timer, 0); >> #endif >> #endif /* SMP */ >> > >