From: "Joel Fernandes (Google)"
To: linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Ben Segall, Mel Gorman, Daniel Bristot de Oliveira,
	Valentin Schneider
Cc: Vineeth Pillai, Suleiman Souhlal, Hsin Yi, Frederic Weisbecker,
	"Paul E. McKenney", Joel Fernandes
Subject: [PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB
Date: Thu, 5 Oct 2023 16:17:26 +0000
Message-ID: <20231005161727.1855004-1-joel@joelfernandes.org>

From: Vineeth Pillai

Whenever a CPU stops its tick, it now requires another idle CPU to handle
the balancing for it because it can't perform its own periodic load
balancing. This means it might need to update 'nohz.next_balance' to
'rq->next_balance' if the upcoming nohz-idle load balancing is too
distant in the future.

This update process is done by triggering an ILB, as the general ILB
handler (_nohz_idle_balance) that manages regular nohz balancing also
refreshes 'nohz.next_balance' by looking at the 'rq->next_balance' of all
other idle CPUs and selecting the smallest value.

Triggering this ILB can be achieved by setting the NOHZ_NEXT_KICK flag.
This primarily results in the ILB handler updating 'nohz.next_balance'
while possibly not doing any load balancing at all. However, sending an
IPI merely to refresh 'nohz.next_balance' seems excessive, and there
ought to be a more efficient method to update 'nohz.next_balance' from
the local CPU.

Fortunately, there already exists a mechanism to directly invoke the ILB
handler (_nohz_idle_balance) without initiating an IPI. It's accomplished
by setting the NOHZ_NEWILB_KICK flag.
This flag is set during regular "newly idle" balancing and exists solely
to update a CPU's blocked load if it couldn't pull more tasks during
regular "newly idle" balancing, and it does so without having to send
any IPIs. Once the flag is set, the ILB handler is called directly from
do_idle() -> nohz_run_idle_balance(). While its goal is to update the
blocked load without an IPI, in our situation we aim to refresh
'nohz.next_balance' without an IPI, and we can piggyback on this.

So in this patch, we reuse this mechanism by also setting the
NOHZ_NEXT_KICK flag to indicate that nohz.next_balance needs an update
via this direct call shortcut. Note that we set this flag without
knowledge that the tick is about to be stopped, because at the point we
do it, we have no way of knowing that. However, we do know that the CPU
is about to enter idle. In our testing, the reduction in IPIs is well
worth updating nohz.next_balance a few more times.

Also, just to note, without this patch we observe the following pattern:

1. A CPU is about to stop its tick.
2. It sets nohz.needs_update to 1.
3. It then stops its tick and goes idle.
4. The scheduler tick on another CPU checks this flag and decides an ILB
   kick is needed.
5. The ILB CPU ends up being the one that just stopped its tick!
6. This results in an IPI to the tick-stopped CPU which ends up waking it
   up and disturbing it!

Testing shows a considerable reduction in IPIs when doing this:

Running "cyclictest -i 100 -d 100 --latency=1000 -t -m" on a 4vcpu VM,
the IPI call count profiled over a 10s period is as follows:
without fix: ~10500
with fix: ~1000

Fixes: 7fd7a9e0caba ("sched/fair: Trigger nohz.next_balance updates when a CPU goes NOHZ-idle")

[ Joel: wrote commit messages, collaborated on fix, helped reproduce issue etc. ]

Cc: Suleiman Souhlal
Cc: Steven Rostedt
Cc: Hsin Yi
Cc: Frederic Weisbecker
Cc: Paul E. McKenney
Signed-off-by: Vineeth Pillai
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
---
 kernel/sched/fair.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb225921bbca..2ece55f32782 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11786,13 +11786,12 @@ void nohz_balance_enter_idle(int cpu)
 	/*
 	 * Ensures that if nohz_idle_balance() fails to observe our
 	 * @idle_cpus_mask store, it must observe the @has_blocked
-	 * and @needs_update stores.
+	 * stores.
 	 */
 	smp_mb__after_atomic();
 
 	set_cpu_sd_state_idle(cpu);
 
-	WRITE_ONCE(nohz.needs_update, 1);
 out:
 	/*
 	 * Each time a cpu enter idle, we assume that it has blocked load and
@@ -11945,21 +11944,25 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 }
 
 /*
- * Check if we need to run the ILB for updating blocked load before entering
- * idle state.
+ * Check if we need to run the ILB for updating blocked load and/or updating
+ * nohz.next_balance before entering idle state.
  */
 void nohz_run_idle_balance(int cpu)
 {
 	unsigned int flags;
 
-	flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
+	flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK | NOHZ_NEXT_KICK, nohz_flags(cpu));
+
+	if (!flags)
+		return;
 
 	/*
 	 * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
 	 * (ie NOHZ_STATS_KICK set) and will do the same.
 	 */
-	if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
-		_nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK);
+	if ((flags == (flags & (NOHZ_NEXT_KICK | NOHZ_NEWILB_KICK))) &&
+	    !need_resched())
+		_nohz_idle_balance(cpu_rq(cpu), flags);
 }
 
 static void nohz_newidle_balance(struct rq *this_rq)
@@ -11977,6 +11980,10 @@ static void nohz_newidle_balance(struct rq *this_rq)
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
 		return;
 
+	/* If rq->next_balance before nohz.next_balance, trigger ILB */
+	if (time_before(this_rq->next_balance, READ_ONCE(nohz.next_balance)))
+		atomic_or(NOHZ_NEXT_KICK, nohz_flags(this_cpu));
+
 	/* Don't need to update blocked load of idle CPUs*/
 	if (!READ_ONCE(nohz.has_blocked) ||
 	    time_before(jiffies, READ_ONCE(nohz.next_blocked)))
-- 
2.42.0.609.gbb76f46606-goog
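
For anyone who wants to poke at the flag handling outside the kernel, here
is a rough userspace sketch of the two pieces the patch touches: setting
NOHZ_NEXT_KICK on the newly-idle path, and consuming it together with
NOHZ_NEWILB_KICK on the way into idle. This is not kernel code. The bit
values, the single global flags word, and all model_*() names are
assumptions made up for illustration; C11 atomic_fetch_and() with an
inverted mask stands in for the kernel's atomic_fetch_andnot().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed bit values, for illustration only (real ones: kernel/sched/sched.h). */
#define NOHZ_BALANCE_KICK 0x1u
#define NOHZ_STATS_KICK   0x2u
#define NOHZ_NEWILB_KICK  0x4u
#define NOHZ_NEXT_KICK    0x8u

/* Hypothetical stand-in for one CPU's nohz flags word. */
static atomic_uint model_nohz_flags;

/* Wrap-safe "a is before b" comparison, in the style of time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Models the new check in nohz_newidle_balance(): if this runqueue's
 * next balance is due earlier than the global nohz one, request a
 * next_balance refresh by setting NOHZ_NEXT_KICK.
 */
static void model_newidle_mark(unsigned long rq_next, unsigned long nohz_next)
{
	if (model_time_before(rq_next, nohz_next))
		atomic_fetch_or(&model_nohz_flags, NOHZ_NEXT_KICK);
}

/*
 * Models nohz_run_idle_balance() after the patch. Returns the flags that
 * would be handed to the ILB handler, or 0 if the handler is skipped.
 */
static unsigned int model_run_idle_balance(bool resched_pending)
{
	const unsigned int mask = NOHZ_NEWILB_KICK | NOHZ_NEXT_KICK;
	/* Clear our two bits and fetch the previous value of the whole word. */
	unsigned int flags = atomic_fetch_and(&model_nohz_flags, ~mask);

	if (!flags)
		return 0;

	/* Run only if no other kick bit was set and no reschedule is pending. */
	if (flags == (flags & mask) && !resched_pending)
		return flags;

	return 0;
}
```

In this model, calling model_newidle_mark(100, 200) followed by
model_run_idle_balance(false) hands NOHZ_NEXT_KICK to the (imaginary)
handler and leaves the flags word clear, with no IPI involved. If
NOHZ_STATS_KICK is also pending, the direct call is skipped and that bit
stays set for the softirq path to handle.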