From: Vincent Guittot
Date: Thu, 7 May 2020 14:53:05 +0200
Subject: Re: [PATCH] sched/fair: Fix nohz.next_balance update
To: Peng Liu
Cc: Valentin Schneider, Ingo Molnar, Peter Zijlstra, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, linux-kernel, Dietmar Eggemann

On Thu, 7 May 2020 at 14:41, Peng Liu wrote:
>
> On Wed, May 06, 2020 at 05:02:56PM +0100, Valentin Schneider wrote:
> >
> > On 06/05/20 14:45, Vincent Guittot wrote:
> > >> But then we may skip an update if we goto abort, no? Imagine we have just
> > >> NOHZ_STATS_KICK, so we don't call any rebalance_domains(), and then as we
> > >> go through the last NOHZ CPU in the loop we hit need_resched(). We would
> > >> end in the abort part without any update to nohz.next_balance, despite
> > >> having accumulated relevant data in the local next_balance variable.
> > >
> > > Yes, but on the other hand the last CPU has not been able to run
> > > rebalance_domains(), so we must not move nohz.next_balance, otherwise it
> > > will have to wait for at least another full period.
> > > In fact, I think we have a problem with the current implementation,
> > > because if we abort due to the local CPU becoming busy we might end up
> > > skipping the idle load balance for a lot of idle CPUs.
> > >
> > > As an example, imagine that we have 10 idle CPUs with the same
> > > rq->next_balance, which equals nohz.next_balance. _nohz_idle_balance()
> > > starts on CPU0; it processes the idle load balance for CPU1 but then has
> > > to abort because of need_resched(). If we update nohz.next_balance as we
> > > currently do, the next idle load balance will happen after a full
> > > balance interval, whereas we still have 8 CPUs waiting for an idle
> > > load balance.
> > >
> > > My proposal also fixes this problem.
> > >
> >
> > That's a very good point; so with NOHZ_BALANCE_KICK we can reduce
> > nohz.next_balance via rebalance_domains(), and otherwise we would only
> > increase it if we go through a complete for_each_cpu() loop in
> > _nohz_idle_balance().
> >
> > That said, if for some reason we keep bailing out of the loop, we won't
> > push nohz.next_balance forward and thus may repeatedly nohz-balance only
> > the first few CPUs in the NOHZ mask. I think that can happen if we have,
> > say, 2 tasks pinned to a single rq; in that case nohz_balancer_kick() will
> > kick a NOHZ balance whenever now >= nohz.next_balance.
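For reference, the loop we are discussing looks roughly like the sketch below. It is a simplified reconstruction based on the description above: the real _nohz_idle_balance() also updates blocked-load statistics, takes rq locks and tracks nohz.has_blocked, so the function name and details here are illustrative rather than verbatim kernel code.

```c
/*
 * Simplified sketch of the flow discussed above (not verbatim kernel
 * code): nohz.next_balance is only pushed forward when the loop over
 * all idle CPUs completes; the need_resched() abort jumps past that
 * update.
 */
static bool _nohz_idle_balance_sketch(struct rq *this_rq, unsigned int flags)
{
	unsigned long next_balance = jiffies + 60 * HZ;
	int update_next_balance = 0;
	int this_cpu = this_rq->cpu;
	int balance_cpu;
	struct rq *rq;

	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
			continue;

		/* The local CPU got work to do: stop balancing on behalf of others. */
		if (need_resched())
			goto abort;

		rq = cpu_rq(balance_cpu);

		/* Balance this remote idle CPU only when its own interval is due. */
		if (time_after_eq(jiffies, rq->next_balance)) {
			if (flags & NOHZ_BALANCE_KICK)
				rebalance_domains(rq, CPU_IDLE);
		}

		/* Track the earliest rq->next_balance among the CPUs visited. */
		if (time_after(next_balance, rq->next_balance)) {
			next_balance = rq->next_balance;
			update_next_balance = 1;
		}
	}

	/* Only reached after a complete pass over nohz.idle_cpus_mask. */
	if (update_next_balance)
		nohz.next_balance = next_balance;

	return true;

abort:
	/* Aborted: nohz.next_balance is left untouched. */
	return false;
}
```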
> If we face the risk of "repeatedly nohz-balancing only the first few CPUs",
> maybe we could remember the interrupted CPU and start the nohz balance from
> it next time. Just replace the loop in _nohz_idle_balance() with something
> like:
>
> 	for_each_cpu_wrap(cpu, nohz.idle_cpus_mask, nohz.anchor) {
> 		...
> 		if (need_resched()) {
> 			...
> 			nohz.anchor = cpu;
> 			...
> 		}
> 		...
> 	}
>
> This can mitigate the problem, but it can't help the extreme situation

If we rerun _nohz_idle_balance() before the balance interval, the first idle
CPUs that have already been balanced will be skipped, because their
rq->next_balance will be after jiffies, and we will start calling
rebalance_domains() for the idle CPUs which have not yet been balanced. So
I'm not sure that this will help a lot, because we still have to go through
all the idle CPUs to set nohz.next_balance at the end.

> as @Vincent put it, it always fails on the same CPU.

In the case that I described above, the problem comes from the CPU that is
selected to run the ILB, but if we kick the ILB again, it will not be
selected again.
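To illustrate that last point: the ILB CPU is chosen among the CPUs that are still idle at kick time, roughly along the lines of the sketch below (a simplified, approximate version of find_new_ilb(); details such as the housekeeping mask handling may differ). A CPU that aborted the previous pass because it got work to do is no longer idle, so it will not be picked for the next kick.

```c
/*
 * Rough sketch of how the ILB CPU is chosen when nohz_balancer_kick()
 * decides to kick an idle load balance (approximate, not verbatim):
 * only a CPU that is still idle can be picked, so the CPU that aborted
 * the previous pass because it became busy is not selected again.
 */
static inline int find_new_ilb_sketch(void)
{
	int ilb;

	for_each_cpu_and(ilb, nohz.idle_cpus_mask,
			 housekeeping_cpumask(HK_FLAG_MISC)) {
		if (idle_cpu(ilb))
			return ilb;	/* first CPU that is really idle */
	}

	return nr_cpu_ids;	/* no suitable CPU: don't kick */
}
```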