From: Vincent Guittot
Date: Fri, 9 Feb 2018 12:41:56 +0100
Subject: Re: [PATCH v2 1/3] sched: Stop nohz stats when decayed
To: Valentin Schneider
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen, Brendan Jackman, Dietmar Eggemann
References: <1517944987-343-1-git-send-email-vincent.guittot@linaro.org> <1517944987-343-2-git-send-email-vincent.guittot@linaro.org> <780a5b3a-4829-4195-c8fd-95da27248a82@arm.com>

On 8 February 2018 at 20:21, Valentin Schneider wrote:
> On 02/08/2018 01:36 PM, Vincent Guittot wrote:
>> On 8 February 2018 at 13:46, Valentin Schneider wrote:
>>> On 02/06/2018 07:23 PM, Vincent Guittot wrote:
>>>> [...]
>
> For now I've been using those made-up rt-app workloads to stress specific
> bits of code, but I agree it would be really nice to have a "real" workload
> to see how both power (additional kick_ilb()'s) and performance (additional
> atomic ops) are affected. As Vincent suggested offline, it could be worth
> giving it a shot on Android...
>
>
> In the meantime I've done some more accurate profiling with cpumasks on my
> Juno r2 (patch at the bottom). As a side note, I realised I don't have a test
> case for the load update through load_balance() in idle_balance() - I only
> test the !overload segment. Will think about that.
>
> In summary:
>
> 20 iterations per test case
> All non-mentioned CPUs are idling
>
> ---------------------
> kick_ilb() test case:
> ---------------------
>
> a, b = 100% rt-app tasks
> -    = idling
>
> Accumulating load before sleeping
>     ^
>     ^
> CPU1| a a a - - - a
> CPU2| - - b b b b b
>     v
>     v
> Periodically kicking ILBs to update nohz blocked load
>
> Baseline:
>   _nohz_idle_balance() takes 39µs on average
>   nohz_balance_enter_idle() takes 233ns on average
>
> W/ cpumask:
>   _nohz_idle_balance() takes 33µs on average
>   nohz_balance_enter_idle() takes 283ns on average
>
> Diff:
>   _nohz_idle_balance() -6µs on average (-16%)
>   nohz_balance_enter_idle() +50ns on average (+21%)

In your use case there is no contention when accessing the cpumask.
Have you tried a use case where tasks wake up and go back to idle
simultaneously on several/all CPUs, so that they fight to update the
atomic resource? It would be interesting to see the impact on the
runtime of the nohz_balance_enter_idle() function in that case.
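To make the concern concrete, below is a minimal userspace sketch of the
kind of stress I have in mind. This is only an illustration, not kernel
code: an atomic long stands in for nohz.stale_cpus_mask, each thread plays
the role of a CPU, and the thread/iteration counts are arbitrary. Every
set/clear pair lands on the same cache line, which is the worst case for
the added atomic ops:

/*
 * stale_mask_stress.c - model of concurrent updates to one shared mask.
 * Each "CPU" (thread) repeatedly sets and clears its own bit in a single
 * shared word, so all atomics contend on the same cache line, the way
 * simultaneous idle entries/exits would contend on nohz.stale_cpus_mask.
 *
 * Build: gcc -O2 -pthread stale_mask_stress.c -o stale_mask_stress
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 8              /* arbitrary: one thread per "CPU" */
#define ITERS    1000000

static atomic_ulong stale_mask; /* stands in for the cpumask */

static void *cpu_thread(void *arg)
{
        unsigned long bit = 1UL << (long)arg;
        int i;

        for (i = 0; i < ITERS; i++) {
                atomic_fetch_or(&stale_mask, bit);      /* "enter idle" */
                atomic_fetch_and(&stale_mask, ~bit);    /* "load decayed" */
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        struct timespec t0, t1;
        double ns;
        long i;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, cpu_thread, (void *)i);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("%.1f ns per set+clear pair\n", ns / (NTHREADS * (double)ITERS));
        return 0;
}

Comparing the per-pair cost at NTHREADS=1 vs. NTHREADS=8 would give a
rough upper bound on what a contended nohz_balance_enter_idle() path
could lose.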
>
>
> ---------------------------------------------------
> Local _nohz_idle_balance() for NEWLY_IDLE test case:
> ---------------------------------------------------
>
> a = 100% rt-app task
> b = periodic rt-app task
> - = idling
>
> Accumulating load before sleeping
>     ^
>     ^
> CPU1| a a a - - - - - a
> CPU2| - - b - b - b - b
>     v
>     v
> Periodically updating nohz blocked load through local
> _nohz_idle_balance() in idle_balance()
>
>
> (execution times are 2x as fast as in the previous test case because CPU2
> is a big CPU, whereas CPU0 - the ILB 99% of the time in the previous test -
> is a LITTLE CPU ~ half as powerful).
>
> Baseline:
>   _nohz_idle_balance() takes 20µs on average
>   nohz_balance_enter_idle() takes 183ns on average
>
> W/ cpumask:
>   _nohz_idle_balance() takes 13µs on average
>   nohz_balance_enter_idle() takes 217ns on average
>
> Diff:
>   _nohz_idle_balance() -7µs on average (-38%)
>   nohz_balance_enter_idle() +34ns on average (+18%)
>
> For more details: https://gist.github.com/valschneider/78099acee87a18057d56cc6cc03978b1
>
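As a side note before the patch itself: the mask bookkeeping it introduces
condenses to the small standalone model below. This is an illustration
under loose assumptions - an unsigned long and plain bit arithmetic stand
in for cpumask_var_t and the cpumask helpers - not the actual kernel code.
The point it shows: a per-CPU stale mask lets update_sd_lb_stats() ask
"does my domain span a stale CPU?" via cpumask_intersects(), whereas the
old global has_blocked flag could only say "some CPU somewhere is stale".

/* Standalone model of the nohz.stale_cpus_mask protocol in the patch.
 * An unsigned long stands in for cpumask_var_t; CPU i <-> bit i.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_ulong stale_cpus_mask;    /* idle CPUs with blocked load */

/* nohz_balance_enter_idle(): assume a freshly idle CPU carries blocked load */
static void enter_idle(int cpu)
{
        atomic_fetch_or(&stale_cpus_mask, 1UL << cpu);
}

/* update_nohz_stats(): after decaying, drop CPUs with no blocked load left */
static void update_stats(int cpu, bool still_has_blocked_load)
{
        if (!still_has_blocked_load)
                atomic_fetch_and(&stale_cpus_mask, ~(1UL << cpu));
}

/* update_sd_lb_stats(): only domains spanning a stale CPU have work to do */
static bool domain_has_blocked(unsigned long domain_span)
{
        return atomic_load(&stale_cpus_mask) & domain_span; /* cpumask_intersects() */
}

int main(void)
{
        enter_idle(1);
        enter_idle(2);
        update_stats(2, false);         /* CPU2's load fully decayed */

        /* a domain spanning CPUs 0-1 still sees stale load, one spanning 2-3 doesn't */
        printf("span 0-1: %d, span 2-3: %d\n",
               domain_has_blocked(0x3), domain_has_blocked(0xc));
        return 0;
}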
> ---
>  kernel/sched/fair.c | 82 +++++++++++++++++++++++++----------------------
>  1 file changed, 39 insertions(+), 43 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3abb3bc..4042025 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5404,8 +5404,8 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
>
>  static struct {
>          cpumask_var_t idle_cpus_mask;
> +        cpumask_var_t stale_cpus_mask;  /* Idle CPUs with blocked load */
>          atomic_t nr_cpus;
> -        int has_blocked;                /* Idle CPUS has blocked load */
>          unsigned long next_balance;     /* in jiffy units */
>          unsigned long next_blocked;     /* Next update of blocked load in jiffies */
>  } nohz ____cacheline_aligned;
> @@ -6968,7 +6968,6 @@ enum fbq_type { regular, remote, all };
>  #define LBF_DST_PINNED  0x04
>  #define LBF_SOME_PINNED 0x08
>  #define LBF_NOHZ_STATS  0x10
> -#define LBF_NOHZ_AGAIN  0x20
>
>  struct lb_env {
>          struct sched_domain     *sd;
> @@ -7827,25 +7826,24 @@ group_type group_classify(struct sched_group *group,
>          return group_other;
>  }
>
> -static bool update_nohz_stats(struct rq *rq)
> +static void update_nohz_stats(struct rq *rq)
>  {
>  #ifdef CONFIG_NO_HZ_COMMON
>          unsigned int cpu = rq->cpu;
>
> -        if (!rq->has_blocked_load)
> -                return false;
>
> -        if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
> -                return false;
> +        if (!cpumask_test_cpu(cpu, nohz.stale_cpus_mask))
> +                return;
>
>          if (!time_after(jiffies, rq->last_blocked_load_update_tick))
> -                return true;
> +                return;
>
>          update_blocked_averages(cpu);
>
> -        return rq->has_blocked_load;
> +        if (!rq->has_blocked_load)
> +                cpumask_clear_cpu(cpu, nohz.stale_cpus_mask);
>  #else
> -        return false;
> +        return;
>  #endif
>  }
>
> @@ -7871,8 +7869,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>          for_each_cpu_and(i, sched_group_span(group), env->cpus) {
>                  struct rq *rq = cpu_rq(i);
>
> -                if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq))
> -                        env->flags |= LBF_NOHZ_AGAIN;
> +                if (env->flags & LBF_NOHZ_STATS)
> +                        update_nohz_stats(rq);
>
>                  /* Bias balancing toward cpus of our domain */
>                  if (local_group)
> @@ -8024,7 +8022,8 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>          struct sg_lb_stats *local = &sds->local_stat;
>          struct sg_lb_stats tmp_sgs;
>          int load_idx, prefer_sibling = 0;
> -        int has_blocked = READ_ONCE(nohz.has_blocked);
> +        int has_blocked = cpumask_intersects(sched_domain_span(env->sd),
> +                                             nohz.stale_cpus_mask);
>          bool overload = false;
>
>          if (child && child->flags & SD_PREFER_SIBLING)
> @@ -8089,8 +8088,13 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>          } while (sg != env->sd->groups);
>
>  #ifdef CONFIG_NO_HZ_COMMON
> -        if ((env->flags & LBF_NOHZ_AGAIN) &&
> -            cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
> +        /*
> +         * All nohz CPUs with blocked load were visited but some haven't fully
> +         * decayed. Visit them again later.
> +         */
> +        if ((env->flags & LBF_NOHZ_STATS) &&
> +            cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd)) &&
> +            !cpumask_empty(nohz.stale_cpus_mask)) {
>
>                  WRITE_ONCE(nohz.next_blocked,
>                             jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
> @@ -8882,7 +8886,7 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>
>          if (this_rq->avg_idle < sysctl_sched_migration_cost ||
>              !this_rq->rd->overload) {
> -                unsigned long has_blocked = READ_ONCE(nohz.has_blocked);
> +                unsigned long has_blocked = !cpumask_empty(nohz.stale_cpus_mask);
>                  unsigned long next_blocked = READ_ONCE(nohz.next_blocked);
>
>                  rcu_read_lock();
> @@ -9137,7 +9141,7 @@ static void nohz_balancer_kick(struct rq *rq)
>          struct sched_domain *sd;
>          int nr_busy, i, cpu = rq->cpu;
>          unsigned int flags = 0;
> -        unsigned long has_blocked = READ_ONCE(nohz.has_blocked);
> +        unsigned long has_blocked = !cpumask_empty(nohz.stale_cpus_mask);
>          unsigned long next_blocked = READ_ONCE(nohz.next_blocked);
>
>          if (unlikely(rq->idle_balance))
> @@ -9297,10 +9301,10 @@ void nohz_balance_enter_idle(int cpu)
>
>  out:
>          /*
> -         * Each time a cpu enter idle, we assume that it has blocked load and
> -         * enable the periodic update of the load of idle cpus
> +         * Each time a cpu enters idle, we assume that it has blocked load and
> +         * enable the periodic update of its blocked load
>           */
> -        WRITE_ONCE(nohz.has_blocked, 1);
> +        cpumask_set_cpu(cpu, nohz.stale_cpus_mask);
>  }
>  #else
>  static inline void nohz_balancer_kick(struct rq *rq) { }
> @@ -9437,7 +9441,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, enum cpu_
>          /* Earliest time when we have to do rebalance again */
>          unsigned long now = jiffies;
>          unsigned long next_balance = now + 60*HZ;
> -        bool has_blocked_load = false;
>          int update_next_balance = 0;
>          int this_cpu = this_rq->cpu;
>          int balance_cpu;
> @@ -9447,16 +9450,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, enum cpu_
>
>          SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
>
> -        /*
> -         * We assume there will be no idle load after this update and clear
> -         * the has_blocked flag. If a cpu enters idle in the mean time, it will
> -         * set the has_blocked flag and trig another update of idle load.
> -         * Because a cpu that becomes idle, is added to idle_cpus_mask before
> -         * setting the flag, we are sure to not clear the state and not
> -         * check the load of an idle cpu.
> -         */
> -        WRITE_ONCE(nohz.has_blocked, 0);
> -
>          for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
>                  u64 t0, domain_cost;
>
> @@ -9466,29 +9459,34 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, enum cpu_
>                          continue;
>
>                  /*
> +                 * When using the nohz balance to only update blocked load,
> +                 * we can skip nohz CPUs which no longer have blocked load.
> +                 */
> +                if ((flags & NOHZ_KICK_MASK) == NOHZ_STATS_KICK &&
> +                    !cpumask_test_cpu(balance_cpu, nohz.stale_cpus_mask))
> +                        continue;
> +
> +                /*
>                   * If this cpu gets work to do, stop the load balancing
>                   * work being done for other cpus. Next load
>                   * balancing owner will pick it up.
>                   */
> -                if (need_resched()) {
> -                        has_blocked_load = true;
> +                if (need_resched())
>                          goto abort;
> -                }
>
>                  /*
>                   * If the update is done while CPU becomes idle, we abort
>                   * the update when its cost is higher than the average idle
>                   * time in orde to not delay a possible wake up.
>                   */
> -                if (idle == CPU_NEWLY_IDLE && this_rq->avg_idle < curr_cost) {
> -                        has_blocked_load = true;
> +                if (idle == CPU_NEWLY_IDLE && this_rq->avg_idle < curr_cost)
>                          goto abort;
> -                }
>
>                  rq = cpu_rq(balance_cpu);
>
>                  update_blocked_averages(rq->cpu);
> -                has_blocked_load |= rq->has_blocked_load;
> +                if (!rq->has_blocked_load)
> +                        cpumask_clear_cpu(balance_cpu, nohz.stale_cpus_mask);
>
>                  /*
>                   * If time for next balance is due,
> @@ -9519,7 +9517,8 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, enum cpu_
>          /* Newly idle CPU doesn't need an update */
>          if (idle != CPU_NEWLY_IDLE) {
>                  update_blocked_averages(this_cpu);
> -                has_blocked_load |= this_rq->has_blocked_load;
> +                if (!this_rq->has_blocked_load)
> +                        cpumask_clear_cpu(this_cpu, nohz.stale_cpus_mask);
>          }
>
>          if (flags & NOHZ_BALANCE_KICK)
> @@ -9532,10 +9531,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, enum cpu_
>                  ret = true;
>
>  abort:
> -        /* There is still blocked load, enable periodic update */
> -        if (has_blocked_load)
> -                WRITE_ONCE(nohz.has_blocked, 1);
> -
>          /*
>           * next_balance will be updated only when there is a need.
>           * When the CPU is attached to null domain for ex, it will not be
> @@ -10196,6 +10191,7 @@ __init void init_sched_fair_class(void)
>          nohz.next_balance = jiffies;
>          nohz.next_blocked = jiffies;
>          zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
> +        zalloc_cpumask_var(&nohz.stale_cpus_mask, GFP_NOWAIT);
>  #endif
>  #endif /* SMP */
>
> --
> 2.7.4
>
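A closing illustration of where the _nohz_idle_balance() savings above
come from: with the mask, a stats-only kick skips idle CPUs whose blocked
load has already fully decayed instead of visiting everything in
idle_cpus_mask. The standalone model below makes that visible; the mask
contents are made up, and plain unsigned longs again stand in for the
cpumasks.

/* Model of the new skip in _nohz_idle_balance(): a NOHZ_STATS_KICK pass
 * only decays CPUs still marked stale. With 8 idle CPUs of which 2 are
 * stale, 6 update_blocked_averages() calls are avoided.
 */
#include <stdio.h>

static unsigned long idle_cpus_mask  = 0xff;    /* CPUs 0-7 idle */
static unsigned long stale_cpus_mask = 0x05;    /* only CPUs 0 and 2 stale */

static void update_blocked_averages(int cpu)
{
        printf("decaying blocked load of CPU%d\n", cpu);
}

int main(void)
{
        int cpu;

        for (cpu = 0; cpu < 8; cpu++) {
                if (!(idle_cpus_mask & (1UL << cpu)))
                        continue;       /* for_each_cpu(balance_cpu, nohz.idle_cpus_mask) */
                if (!(stale_cpus_mask & (1UL << cpu)))
                        continue;       /* the new skip: nothing left to decay here */
                update_blocked_averages(cpu);   /* reaches only CPUs 0 and 2 */
        }
        return 0;
}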