From: Vincent Guittot
Date: Tue, 6 Feb 2018 17:17:25 +0100
Subject: Re: [PATCH 3/3] sched: update blocked load when newly idle
To: Valentin Schneider
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Morten Rasmussen, Brendan Jackman, Dietmar Eggemann

On 6 February 2018 at 15:32, Valentin Schneider wrote:
> Hi Vincent,
>
> On 02/06/2018 08:32 AM, Vincent Guittot wrote:
>> When NEWLY_IDLE load balance is not triggered, we might need to update the
>> blocked load anyway. We can kick an ilb so an idle CPU will take care of
>> updating the blocked load, or we can try to update it locally before
>> entering idle. In the latter case, we reuse part of nohz_idle_balance.
>>
>> Signed-off-by: Vincent Guittot
>> ---
>>  kernel/sched/fair.c | 102 ++++++++++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 84 insertions(+), 18 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 6998528..256defe 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8829,6 +8829,9 @@ update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
>>  		*next_balance = next;
>>  }
>>
>> +static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, enum cpu_idle_type idle);
>> +static void kick_ilb(unsigned int flags);
>> +
>>  /*
>>   * idle_balance is called by schedule() if this_cpu is about to become
>>   * idle. Attempts to pull tasks from other CPUs.
>> @@ -8863,12 +8866,26 @@ static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
>>
>>  	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
>>  	    !this_rq->rd->overload) {
>> +		unsigned long has_blocked = READ_ONCE(nohz.has_blocked);
>> +		unsigned long next = READ_ONCE(nohz.next_blocked);
>
> Ditto on 'next' - there's next_balance referenced in here so it'd be
> nice to make clear which is which.
>
>> +
>>  		rcu_read_lock();
>>  		sd = rcu_dereference_check_sched_domain(this_rq->sd);
>>  		if (sd)
>>  			update_next_balance(sd, &next_balance);
>>  		rcu_read_unlock();
>>
>> +		/*
>> +		 * Update blocked idle load if it has not been done for a
>> +		 * while. Try to do it locally before entering idle but kick an
>> +		 * ilb if it takes too much time and/or might delay next local
>> +		 * wake up
>> +		 */
>> +		if (has_blocked && time_after_eq(jiffies, next) &&
>> +		    (this_rq->avg_idle < sysctl_sched_migration_cost ||
>> +		     !_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE)))
>
> "this_rq->avg_idle < sysctl_sched_migration_cost" is used twice now,
> how about storing it in an "idle_too_short" variable ? In fact it's
> already the 3rd time

Why do you want it to be stored in an "idle_too_short" variable?
>
>> +			kick_ilb(NOHZ_STATS_KICK);
>> +
>>  		goto out;
>>  	}
>>
>> @@ -9393,30 +9410,24 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>>
>>  #ifdef CONFIG_NO_HZ_COMMON
>>  /*
>> - * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>> - * rebalancing for all the cpus for whom scheduler ticks are stopped.
>> + * Internal function that runs load balance for all idle cpus. The load balance
>> + * can be a simple update of blocked load or a complete load balance with
>> + * task movement depending on flags.
>> + * For newly idle mode, we abort the loop if it takes too much time and return
>> + * false to notify that the loop has not been completed and an ilb should be
>> + * kicked.
>>  */
>> -static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>> +static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, enum cpu_idle_type idle)
>>  {
>>  	/* Earliest time when we have to do rebalance again */
>>  	unsigned long now = jiffies;
>>  	unsigned long next_balance = now + 60*HZ;
>> -	unsigned long next_stats = now + msecs_to_jiffies(LOAD_AVG_PERIOD);
>> +	bool has_blocked_load = false;
>>  	int update_next_balance = 0;
>>  	int this_cpu = this_rq->cpu;
>> -	unsigned int flags;
>>  	int balance_cpu;
>> +	int ret = false;
>>  	struct rq *rq;
>> -
>> -	if (!(atomic_read(nohz_flags(this_cpu)) & NOHZ_KICK_MASK))
>> -		return false;
>> -
>> -	if (idle != CPU_IDLE) {
>> -		atomic_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
>> -		return false;
>> -	}
>> -
>> -	flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
>> +	u64 curr_cost = 0;
>>
>>  	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
>>
>> @@ -9431,6 +9442,10 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>>  	WRITE_ONCE(nohz.has_blocked, 0);
>>
>>  	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
>> +		u64 t0, domain_cost;
>> +
>> +		t0 = sched_clock_cpu(this_cpu);
>> +
>>  		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
>>  			continue;
>>
>> @@ -9444,6 +9459,16 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>>  			goto abort;
>>  		}
>>
>> +		/*
>> +		 * If the update is done while the CPU becomes idle, we abort
>> +		 * the update when its cost is higher than the average idle
>> +		 * time in order to not delay a possible wake up.
>> +		 */
>> +		if (idle == CPU_NEWLY_IDLE && this_rq->avg_idle < curr_cost) {
>> +			has_blocked_load = true;
>> +			goto abort;
>> +		}
>> +
>>  		rq = cpu_rq(balance_cpu);
>>
>>  		update_blocked_averages(rq->cpu);
>> @@ -9456,10 +9481,10 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>>  		if (time_after_eq(jiffies, rq->next_balance)) {
>>  			struct rq_flags rf;
>>
>> -			rq_lock_irq(rq, &rf);
>> +			rq_lock_irqsave(rq, &rf);
>>  			update_rq_clock(rq);
>>  			cpu_load_update_idle(rq);
>> -			rq_unlock_irq(rq, &rf);
>> +			rq_unlock_irqrestore(rq, &rf);
>>
>>  			if (flags & NOHZ_BALANCE_KICK)
>>  				rebalance_domains(rq, CPU_IDLE);
>> @@ -9469,15 +9494,27 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>>  			next_balance = rq->next_balance;
>>  			update_next_balance = 1;
>>  		}
>> +
>> +		domain_cost = sched_clock_cpu(this_cpu) - t0;
>> +		curr_cost += domain_cost;
>> +
>> +	}
>> +
>> +	/* Newly idle CPU doesn't need an update */
>> +	if (idle != CPU_NEWLY_IDLE) {
>> +		update_blocked_averages(this_cpu);
>> +		has_blocked_load |= this_rq->has_blocked_load;
>>  	}
>>
>> -	update_blocked_averages(this_cpu);
>>  	if (flags & NOHZ_BALANCE_KICK)
>>  		rebalance_domains(this_rq, CPU_IDLE);
>>
>>  	WRITE_ONCE(nohz.next_blocked,
>>  		now + msecs_to_jiffies(LOAD_AVG_PERIOD));
>>
>> +	/* The full idle balance loop has been done */
>> +	ret = true;
>> +
>> abort:
>>  	/* There is still blocked load, enable periodic update */
>>  	if (has_blocked_load)
>> @@ -9491,6 +9528,35 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>>  	if (likely(update_next_balance))
>>  		nohz.next_balance = next_balance;
>>
>> +	return ret;
>> +}
>> +
>> +/*
>> + * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
>> + * rebalancing for all the cpus for whom scheduler ticks are stopped.
>> + */
>> +static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
>> +{
>> +	int this_cpu = this_rq->cpu;
>> +	unsigned int flags;
>> +
>> +	if (!(atomic_read(nohz_flags(this_cpu)) & NOHZ_KICK_MASK))
>> +		return false;
>> +
>> +	if (idle != CPU_IDLE) {
>> +		atomic_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
>> +		return false;
>> +	}
>> +
>> +	/*
>> +	 * barrier, pairs with nohz_balance_enter_idle(), ensures ...
>> +	 */
>> +	flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
>> +	if (!(flags & NOHZ_KICK_MASK))
>> +		return false;
>> +
>> +	_nohz_idle_balance(this_rq, flags, idle);
>> +
>>  	return true;
>>  }
>>  #else
>>