Date: Thu, 4 Jan 2024 17:45:02 +0100 (CET)
From: Julia Lawall
To: Vincent Guittot
cc: Peter Zijlstra, Ingo Molnar, Dietmar Eggemann, Mel Gorman,
    linux-kernel@vger.kernel.org
Subject: Re: EEVDF and NUMA balancing
Message-ID: <8daf59ab-5f73-1f3-251e-bb9cc72a598@inria.fr>

On Thu, 4 Jan 2024, Vincent Guittot wrote:

> On Fri, 29 Dec 2023 at 16:18, Julia Lawall wrote:
> >
> > On Thu, 28 Dec 2023, Julia Lawall wrote:
> >
> > > > > > > > > I'm surprised that you have mainly CPU_NEWLY_IDLE. Do you know the reason ?
> > > > > > > >
> > > > > > > > No. They come from do_idle calling the scheduler. I will look into why
> > > > > > > > this happens so often.
> > > > > > > >
> > > > > > > Hmm, the CPU was idle and received a need resched which triggered the
> > > > > > > scheduler but there was nothing to schedule so it goes back to idle
> > > > > > > after running a newly_idle load_balance.
> > > > > >
> > > > > > I spent quite some time thinking the same until I saw the following code
> > > > > > in do_idle:
> > > > > >
> > > > > >   preempt_set_need_resched();
> > > > > >
> > > > > > So I have the impression that do_idle sets need resched itself.
> > > > >
> > > > > But of course that code is only executed if need_resched is true. But I
> > > >
> > > > Yes, that is your root cause. Something, most probably in interrupt
> > > > context, wakes up your CPU and expects to wake up a thread
> > > >
> > > > > don't know who would be setting need resched on each clock tick.
> > > >
> > > > that can be a timer, interrupt, ipi, rcu ...
> > > > a trace should give you some hints
> > >
> > > I have the impression that it is the goal of calling nohz_csd_func on each
> > > clock tick that causes the calls to need_resched. If the idle process is
> > > polling, call_function_single_prep_ipi just sets need_resched to get the
>
> Your system is calling the polling mode and not the default
> cpuidle_idle_call() ? This could explain why I don't see such problem
> on my system which doesn't have polling
>
> Are you forcing the use of polling mode ?
> If yes, could you check that this problem disappears without forcing
> polling mode ?

I'll check. I didn't explicitly set anything, but I don't really know what
my configuration file does.

>
> > > idle process to stop polling. But there is no actual task that the idle
> > > process should schedule. The need_resched then prevents the idle process
> > > from stealing, due to the CPU_NEWLY_IDLE flag, contradicting the whole
> > > purpose of calling nohz_csd_func in the first place.
>
> Do I understand correctly that your sequence is :
>
> CPU A                                   CPU B
> cpu enters idle
> do_idle()
> ...
> loop in cpu_idle_poll
> ...
>                                         kick_ilb on CPU A
>                                         send_call_function_single_ipi
>                                         set_nr_if_polling
>                                         set TIF_NEED_RESCHED
>
> exit polling loop
> exit while (!need_resched())
>
> call nohz_csd_func but
> need_resched is true so it's a nope
>
> pick_next_task_fair
> newidle_balance
> load_balance(CPU_NEWLY_IDLE)

Yes, this looks correct.

thanks,
julia

>
> >
> > Looking in more detail, do_idle contains the following after exiting the
> > polling loop:
> >
> >   flush_smp_call_function_queue();
> >   schedule_idle();
> >
> > flush_smp_call_function_queue() does end up calling nohz_csd_func, but
> > this has no impact, because it first checks that need_resched() is false,
> > whereas it is currently true to cause exiting the polling loop. Removing
> > that test causes:
> >
> >   raise_softirq_irqoff(SCHED_SOFTIRQ);
> >
> > but that causes the load balancing code to be executed from a ksoftirqd
> > task, which means that there is now no load imbalance.
> >
> > So the only chance to detect an imbalance does seem to be to have the load
> > balance call be executed by the idle task, via schedule_idle(), as is
> > done currently. But that leads to the core being considered to be newly
> > idle.
> >
> > julia
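
For reference, the sequence discussed above can be modeled with a small
userspace sketch. This is not kernel code: struct cpu_model, model_kick(),
model_idle_exit() and the balance_kind values are hypothetical names invented
for illustration, and the logic only mirrors the behavior described in this
thread (a polling idle CPU gets need_resched set instead of an IPI, and the
need_resched test then makes the nohz request a no-op).

```c
#include <stdbool.h>

/* Toy model of one CPU's idle state. All names here are hypothetical. */
struct cpu_model {
    bool polling;       /* the idle task is spinning in the poll loop */
    bool need_resched;  /* stands in for TIF_NEED_RESCHED */
    bool nohz_kick;     /* a pending idle load balance request */
};

enum balance_kind { BALANCE_NONE, BALANCE_NOHZ_IDLE, BALANCE_NEWLY_IDLE };

/*
 * Sender side (CPU B in the diagram): queue the request. If the target is
 * polling, set need_resched to break the poll loop and skip the IPI.
 * Returns true when an IPI would have been sent.
 */
static bool model_kick(struct cpu_model *target)
{
    target->nohz_kick = true;
    if (target->polling) {
        target->need_resched = true;
        return false;
    }
    return true;
}

/*
 * Receiver side (CPU A): what happens once the idle CPU handles the request.
 * The first test models the need_resched() check described for
 * nohz_csd_func; the second models schedule_idle() finding nothing to run
 * and falling into a newly idle balance.
 */
static enum balance_kind model_idle_exit(struct cpu_model *cpu)
{
    enum balance_kind kind = BALANCE_NONE;

    cpu->polling = false;
    if (cpu->nohz_kick && !cpu->need_resched)
        kind = BALANCE_NOHZ_IDLE;
    cpu->nohz_kick = false;

    if (kind == BALANCE_NONE && cpu->need_resched)
        kind = BALANCE_NEWLY_IDLE;
    cpu->need_resched = false;

    return kind;
}
```

In this model, kicking a CPU whose polling flag is set and then calling
model_idle_exit() yields BALANCE_NEWLY_IDLE, whereas the same kick on a
non-polling CPU yields BALANCE_NOHZ_IDLE. That matches the asymmetry
discussed above, under the stated assumptions.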