Subject: Re: [PATCH] sched/fair: Rate limit calls to update_blocked_averages() for NOHZ
To: Vincent Guittot
Cc: Joel Fernandes, linux-kernel, Paul McKenney, Frederic Weisbecker, Dietmar
    Eggemann, Qais Yousef, Ben Segall, Daniel Bristot de Oliveira, Ingo Molnar,
    Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt,
    "Uladzislau Rezki (Sony)", Neeraj Upadhyay, Aubrey Li
References: <20210122154600.1722680-1-joel@joelfernandes.org>
    <20210129172727.GA30719@vingu-book>
    <274d8ae5-8f4d-7662-0e04-2fbc92b416fc@linux.intel.com>
    <20210324134437.GA17675@vingu-book>
    <4aa674d9-db49-83d5-356f-a20f9e2a7935@linux.intel.com>
    <2d2294ce-f1d1-f827-754b-4541c1b43be8@linux.intel.com>
From: Tim Chen
Message-ID: <577b0aae-0111-97aa-0c99-c2a2fcfb5e2e@linux.intel.com>
Date: Tue, 11 May 2021 10:25:37 -0700

On 5/11/21 8:25 AM, Vincent Guittot wrote:
> Hi Tim,
>
> Sometimes, we want to set this_rq->next_balance backward compared to
> its current value. When a CPU is busy, the balance interval is
> multiplied by busy_factor, which is set to 16 by default. At the SMT2
> sched domain level, that means the interval will be 32ms when busy
> instead of 2ms. But if a busy load balance happens just before the CPU
> becomes idle, this_rq->next_balance will be set 32ms later, whereas it
> should go down to 2ms as the CPU is now idle. And this becomes even
> worse when you have 128 CPUs at the die sched_domain level, because
> the idle CPU will have to wait 2048ms instead of the correct 128ms
> interval.
>
>>
>> out:
>>         /* Move the next balance forward */
>> -       if (time_after(this_rq->next_balance, next_balance))
>> +       if (time_after(next_balance, this_rq->next_balance))
>
> The current comparison is correct, but next_balance should not be in the past.

I understand then that the intention is that, after the update,
this_rq->next_balance should have a minimum value of jiffies+1.
So we will need:

out:
        /* Move the next balance forward */
+       this_rq->next_balance = max(jiffies+1, this_rq->next_balance);
        if (time_after(this_rq->next_balance, next_balance))
                this_rq->next_balance = next_balance;

since the computed next_balance will be at least jiffies+1 after your fix
to update_next_balance(). We still need to take care of the case where
this_rq->next_balance <= jiffies.

So, combining this with your suggested fix to update_next_balance(), the
fix will be:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d75af1ecfb4..2dc471c1511c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9901,7 +9901,7 @@ update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
 	/* used by idle balance, so cpu_busy = 0 */
 	interval = get_sd_balance_interval(sd, 0);
-	next = sd->last_balance + interval;
+	next = max(jiffies+1, sd->last_balance + interval);
 
 	if (time_after(*next_balance, next))
 		*next_balance = next;
@@ -10681,6 +10681,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 out:
 	/* Move the next balance forward */
+	this_rq->next_balance = max(jiffies+1, this_rq->next_balance);
 	if (time_after(this_rq->next_balance, next_balance))
 		this_rq->next_balance = next_balance;

>
> update_next_balance() is only used in newidle_balance() so we could
> modify it to have:
>
> next = max(jiffies+1, next = sd->last_balance + interval)

Is the extra assignment "next = sd->last_balance + interval" needed?
This seems more straightforward:

next = max(jiffies+1, sd->last_balance + interval)
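
Just to sanity check my understanding of the combined behavior, here is a
quick userspace sketch of how the two max() clamps interact with the
time_after() comparison. The jiffies value, the stale next_balance, and the
last_balance numbers are made up for illustration, and time_after()/max()
are open-coded stand-ins for the kernel macros; this is only an
illustration, not the actual scheduler code:

#include <stdio.h>

/* Wrap-safe comparison, mirroring the definition in include/linux/jiffies.h */
#define time_after(a, b)   ((long)((b) - (a)) < 0)
#define max(a, b)          ((a) > (b) ? (a) : (b))

int main(void)
{
        unsigned long jiffies = 1000;

        /* Stale value left by a busy balance: 128ms die interval * busy_factor 16 */
        unsigned long rq_next_balance = jiffies + 2048;
        unsigned long next;

        /* Case 1: recent sd->last_balance (900), idle interval 128 */
        next = max(jiffies + 1, 900UL + 128);                     /* 1028 */
        rq_next_balance = max(jiffies + 1, rq_next_balance);
        if (time_after(rq_next_balance, next))
                rq_next_balance = next;
        printf("case 1: next_balance = %lu\n", rq_next_balance);  /* 1028 */

        /* Case 2: very old sd->last_balance (500), so last_balance + interval
         * would land in the past; the max() keeps it at jiffies + 1 */
        next = max(jiffies + 1, 500UL + 128);                     /* 1001 */
        rq_next_balance = max(jiffies + 1, rq_next_balance);
        if (time_after(rq_next_balance, next))
                rq_next_balance = next;
        printf("case 2: next_balance = %lu\n", rq_next_balance);  /* 1001 */

        return 0;
}

With both clamps in place, this_rq->next_balance gets pulled back down to
the idle interval after a busy period, but is never left behind jiffies.
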
I will try to get the benchmark folks to do another run with this change.
Hopefully I'll get some bandwidth from them soon.

Thanks.

Tim