Received: by 2002:a25:23cc:0:0:0:0:0 with SMTP id j195csp1862849ybj; Wed, 6 May 2020 06:47:26 -0700 (PDT) X-Google-Smtp-Source: APiQypKHm9Noi0tIYsDc8i4j/yGdJ64p6GdnPt2OTIqns6XeiQH69FufUELuhaSywjArUjnkh2Tf X-Received: by 2002:a50:ce08:: with SMTP id y8mr5087380edi.248.1588772846529; Wed, 06 May 2020 06:47:26 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1588772846; cv=none; d=google.com; s=arc-20160816; b=DGBLCXanqgBy1As0ua6MjHlZm73/VkpCNQdYn8jhAkUWNuZtsRWGPuecDlCFEkYe2H 37LwemHY+LEjIYq4nG+q2xvXJOL+bI/ExXNOx7NvcnHLq1u+wXpZkuXwQiPCQIOCwx3A bSHeSqxUT+Zo5Gs7hd1i3EY54vDpgoYRlKfDfHnfONqBUlE8nx1gm0CtGvzlnlflV0/S B6OPj34mKKY36ao9CHW+821B5zoxM+Amo0JNsf9+4EncMiY4fbF2wwLyfXZUDYvtMzQ2 J1FJETYeNFabvP93W2qRX02T0/Ojsk7UIz0+FzuEP//+HV7Mh0KZu5Sadpi8fDknpAx0 k+fg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=oNMw1/oUmQe9wJ5IG8wrbgriwK3wMZ1bOEkTKJypMs4=; b=r44r0a7x2C4Zt1dv2+PdNuZmn+AXVJhc5RoJ8PcIPS9n825n0M5yLLswW00bOZbylN cOQf06Ec4fzHhm49zOocQzahhZWVcXCsv0dCZY2p0mLl+3hvpcZEYnPeujz5tUGW8hdd XlKbh2j4lvdMFn00wzQ0nkH8GN5fCzhOsOawoqEYZ/sykqe6H1erir81ova3vnsMrz/g 6TgExl/vWwtMSVbsoqof7H6Or23Q4mvy0wzWmOeZzVd5a4tS1EndLnp3dX2lrayLsNH8 cUG3Pwc54cYHdDSz+PUoEp3uFhLD1UAHDZKIyzf4+VlzOuMGW8rkObMs9f4B1sOHRChs 5AVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=dMf6oT2D; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id z13si1036557ede.147.2020.05.06.06.47.02; Wed, 06 May 2020 06:47:26 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=dMf6oT2D; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728620AbgEFNpn (ORCPT + 99 others); Wed, 6 May 2020 09:45:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43250 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1725966AbgEFNpm (ORCPT ); Wed, 6 May 2020 09:45:42 -0400 Received: from mail-lf1-x144.google.com (mail-lf1-x144.google.com [IPv6:2a00:1450:4864:20::144]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D8DCC061A0F for ; Wed, 6 May 2020 06:45:42 -0700 (PDT) Received: by mail-lf1-x144.google.com with SMTP id 188so1376533lfa.10 for ; Wed, 06 May 2020 06:45:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=oNMw1/oUmQe9wJ5IG8wrbgriwK3wMZ1bOEkTKJypMs4=; b=dMf6oT2DO2yk/Rta1AQRBREJoVhXM6CqkriMdZHwYpd5adf/lEIDdhHN8FCarTxLov 2WjdtT3LffvxiLAJyvmR05yc1WPgxZHSnwquGGGjA7TEklgRR5uKoGK4nEGVdpwTOews xJ+U2Q4BiZTfk30AreuFL+ePDQqbWLnSbw5CytDhe1nmK2ErFNIF6Qk+JScOoF2cLupY nhUurXboYWL1FeJD3gQPgBfRACkrBnicRuEiB4H+3EYIS6kk33m9YxQH1l3CpgaJAoH9 fFG1W8X/wZEts5mRURxutxDhi4blnOt2ns/Vwg/dhQutv+XnS6nRICqfICOJSZMCfF+D WA/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=oNMw1/oUmQe9wJ5IG8wrbgriwK3wMZ1bOEkTKJypMs4=; b=DS7IRAaGlLNwzl8d/mLqQK2kkVh6Xg4z0SSmxfsqMEswvlmm9ZvvPm/Cd8J9RoHEea mmO1bqSlJMMyjC2FhfLs3b58zRerDZv8Su5xAnH97hCfp88GtleWGCCT4TmVgm2WvbXj FU3LmG3jZv4onn5sAgd+31n3kbAmEJcvekIi44+tBZ3Aee0xyCCDixIYwhkbCIauhBxM 6lowc3KD+GSjgfXGerDMbo7Vkg4JLXvbRwxRvp12QS3GBH1zw34bUMS477ymWNEiADUS dXPgFhC4yjec1VFvhe9nO65GGedbJZGiHD9PaFZF/73MttD6dV6tFcO6J16Ruzp1H6JN VViA== X-Gm-Message-State: AGi0PublSswhx/ZTKjKHWL8gvJonMtdWXQhNNkIGCkENYuA/S+hxTsyG 9pEWNYy6w/bFROCJ0DWMPtMlMzDTRO/Tz1WkMRPbPg== X-Received: by 2002:ac2:46c1:: with SMTP id p1mr4912523lfo.25.1588772739676; Wed, 06 May 2020 06:45:39 -0700 (PDT) MIME-Version: 1.0 References: <20200503083407.GA27766@iZj6chx1xj0e0buvshuecpZ> <20200505134056.GA31680@iZj6chx1xj0e0buvshuecpZ> <20200505142711.GA12952@vingu-book> In-Reply-To: From: Vincent Guittot Date: Wed, 6 May 2020 15:45:28 +0200 Message-ID: Subject: Re: [PATCH] sched/fair: Fix nohz.next_balance update To: Valentin Schneider Cc: Peng Liu , Dietmar Eggemann , Ingo Molnar , Peter Zijlstra , Juri Lelli , Steven Rostedt , Ben Segall , Mel Gorman , linux-kernel Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 6 May 2020 at 12:29, Valentin Schneider wrote: > > > On 05/05/20 15:27, Vincent Guittot wrote: > > So I would be in favor of something as simple as : > > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > > index 04098d678f3b..e028bc1c4744 100644 > > --- a/kernel/sched/fair.c > > +++ b/kernel/sched/fair.c > > @@ -10457,6 +10457,14 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, > > } > > } > > > > + /* > > + * next_balance will be updated only when there is a need. > > + * When the CPU is attached to null domain for ex, it will not be > > + * updated. > > + */ > > + if (likely(update_next_balance)) > > + nohz.next_balance = next_balance; > > + > > /* Newly idle CPU doesn't need an update */ > > if (idle != CPU_NEWLY_IDLE) { > > update_blocked_averages(this_cpu); > > @@ -10477,14 +10485,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, > > if (has_blocked_load) > > WRITE_ONCE(nohz.has_blocked, 1); > > > > - /* > > - * next_balance will be updated only when there is a need. > > - * When the CPU is attached to null domain for ex, it will not be > > - * updated. > > - */ > > - if (likely(update_next_balance)) > > - nohz.next_balance = next_balance; > > - > > return ret; > > } > > > > But then we may skip an update if we goto abort, no? Imagine we have just > NOHZ_STATS_KICK, so we don't call any rebalance_domains(), and then as we > go through the last NOHZ CPU in the loop we hit need_resched(). We would > end in the abort part without any update to nohz.next_balance, despite > having accumulated relevant data in the local next_balance variable. Yes but on the other end, the last CPU has not been able to run the rebalance_domain so we must not move nohz.next_balance otherwise it will have to wait for at least another full period In fact, I think that we have a problem with current implementation because if we abort because local cpu because busy we might end up skipping idle load balance for a lot of idle CPUs As an example, imagine that we have 10 idle CPUs with the same rq->next_balance which equal nohz.next_balance. _nohz_idle_balance starts on CPU0, it processes idle lb for CPU1 but then has to abort because of need_resched. If we update nohz.next_balance like currently, the next idle load balance will happen after a full balance interval whereas we still have 8 CPUs waiting for running an idle load balance. My proposal also fixes this problem > > Also note that in this case, nohz_idle_balance() will still return true. > > If we rip out just the one update we need from rebalance_domains(), then > perhaps we could go with what Peng was initially suggesting? i.e. something > like the below. > > --- > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 46b7bd41573f..0a292e0a0731 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -9934,22 +9934,8 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle) > * When the cpu is attached to null domain for ex, it will not be > * updated. > */ > - if (likely(update_next_balance)) { > + if (likely(update_next_balance)) > rq->next_balance = next_balance; > - > -#ifdef CONFIG_NO_HZ_COMMON > - /* > - * If this CPU has been elected to perform the nohz idle > - * balance. Other idle CPUs have already rebalanced with > - * nohz_idle_balance() and nohz.next_balance has been > - * updated accordingly. This CPU is now running the idle load > - * balance for itself and we need to update the > - * nohz.next_balance accordingly. > - */ > - if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance)) > - nohz.next_balance = rq->next_balance; > -#endif > - } > } > > static inline int on_null_domain(struct rq *rq) > @@ -10315,6 +10301,11 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags, > if (flags & NOHZ_BALANCE_KICK) > rebalance_domains(this_rq, CPU_IDLE); > > + if (time_after(next_balance, this_rq->next_balance)) { > + next_balance = this_rq->next_balance; > + update_next_balance = 1; > + } > + > WRITE_ONCE(nohz.next_blocked, > now + msecs_to_jiffies(LOAD_AVG_PERIOD)); > > @@ -10551,6 +10542,17 @@ static __latent_entropy void run_rebalance_domains(struct softirq_action *h) > /* normal load balance */ > update_blocked_averages(this_rq->cpu); > rebalance_domains(this_rq, idle); > + > +#ifdef CONFIG_NO_HZ_COMMON > + /* > + * NOHZ idle CPUs will be rebalanced with nohz_idle_balance() and thus > + * nohz.next_balance will be updated accordingly. If there was no NOHZ > + * kick, then we just need to update nohz.next_balance wrt *this* CPU. > + */ > + if ((idle == CPU_IDLE) && > + time_after(nohz.next_balance, this_rq->next_balance)) > + nohz.next_balance = this_rq->next_balance; > +#endif > } > > /* > ---