Date: Tue, 10 Oct 2023 19:32:05 +0000
From: Joel Fernandes
To: Vincent Guittot
Cc: Ingo Molnar, linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Daniel Bristot de Oliveira, Valentin Schneider, Vineeth Pillai,
    Suleiman Souhlal, Hsin Yi, Frederic Weisbecker, "Paul E. McKenney"
Subject: Re: [PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB
Message-ID: <20231010193205.GA4011084@google.com>
References: <20231005161727.1855004-1-joel@joelfernandes.org> <20231008173535.GD2338308@google.com>

On Tue, Oct 10, 2023 at 3:15 AM Vincent Guittot wrote:
>
> On Sun, 8 Oct 2023 at 19:35, Joel Fernandes wrote:
[...]
> > One thing I am confused about in the original code is:
> >
> > tick_nohz_idle_stop_tick() is what sets the nohz.idle_cpus_mask.
> > However, nohz_run_idle_balance() is called before that can happen, in
> > do_idle(). So it is possible that NOHZ_NEWILB_KICK is set for a CPU but it is
> > not yet in the mask.
>
> 2 things:
> - update of nohz.idle_cpus_mask is not strictly aligned with cpu
> entering/exiting idle state. It is set when entering but only cleared
> during the next tick on the cpu because nohz.idle_cpus_mask is a
> bottleneck when all CPUs enter/exit idle at high rate (usec).
> This implies that a cpu entering idle can already be set in
> nohz.idle_cpus_mask
> - NOHZ_NEWILB_KICK is more about updating blocked load of others
> already idle CPUs than the one entering idle which has already updated
> its blocked load in newidle_balance()
>
> The goal of nohz_run_idle_balance() is to run ILB only for updating
> the blocked load of already idle CPUs without waking up one of those
> idle CPUs and outside the preempt disable / irq off phase of the local
> cpu about to enter idle because it can take a long time.

This makes complete sense, thank you for the background on this!

Vineeth was just telling me in a 1:1 that he also tried removing the CPU
from the idle mask in the restart-tick path. The result was that even
though the mask is modified less often than when doing it as the CPU comes
out of idle, it is still modified more often than when doing it from the
next busy tick, as in current mainline.

As next steps we are looking into:

1. Monitor how often we set NEXT_KICK -- we think we can reduce the
   frequency of these even more and keep the overhead low.

2. Look more into the parallelism of nohz.next_balance updates (due to our
   next NEXT_KICK setting) and handle any race conditions.

At the moment we are looking into whether nohz.next_balance can fail to
end up at the earliest value because of a race, and if so, retrying the
operation. Something like (untested):

	if (likely(update_next_balance)) {
		do {
			WRITE_ONCE(nohz.next_balance, next_balance);
			/* Done if the stored value is ours or an earlier one. */
			if (likely(READ_ONCE(nohz.next_balance) <= next_balance))
				break;
			cpu_relax();
		} while (1);
	}

Or a try_cmpxchg loop.

I think after these items and a bit more testing, we should be good to
send v2.

Thanks.
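
For reference, the try_cmpxchg alternative mentioned above could look
roughly like the following. This is only a user-space sketch using C11
atomics, not kernel code: next_balance_sim, earlier() and
update_next_balance_min() are hypothetical names made up for illustration,
and the real nohz.next_balance is a plain unsigned long accessed with
READ_ONCE()/WRITE_ONCE(). The idea is to store a new value only if it is
earlier than what is currently there, and to recheck whenever another
updater got in first.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for nohz.next_balance (a jiffies value). */
static _Atomic unsigned long next_balance_sim;

/* Wraparound-safe "a is earlier than b", in the spirit of time_before(). */
static bool earlier(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Lower the shared value to 'candidate' unless it already holds an equal
 * or earlier time. Returns true if this caller's value was stored.
 */
static bool update_next_balance_min(unsigned long candidate)
{
	unsigned long cur = atomic_load_explicit(&next_balance_sim,
						 memory_order_relaxed);

	while (earlier(candidate, cur)) {
		/* On failure 'cur' is refreshed, so the check above is redone. */
		if (atomic_compare_exchange_weak_explicit(&next_balance_sim,
							  &cur, candidate,
							  memory_order_relaxed,
							  memory_order_relaxed))
			return true;
	}
	return false;
}
```

Compared with the write-then-reread loop, a later value can never
overwrite an earlier one here, and the loop exits without writing at all
when somebody else has already stored an earlier time.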