Date: Thu, 18 Jan 2024 18:43:19 +0100 (CET)
From: Julia Lawall
To: Vincent Guittot
cc: Peter Zijlstra, Ingo Molnar, Dietmar Eggemann, Mel Gorman,
    linux-kernel@vger.kernel.org
Subject: Re: EEVDF and NUMA balancing
Message-ID: <7231bfb1-9acc-656-c6b6-20bd8624e08a@inria.fr>
References: <9dc451b5-9dd8-89f2-1c9c-7c358faeaad@inria.fr>
    <2359ab5-4556-1a73-9255-3fcf2fc57ec@inria.fr>
    <6618dcfa-a42f-567c-2a9d-a76786683b29@inria.fr>
    <7a845b43-bd8e-6c7d-6bca-2e6f174f671@inria.fr>
    <36f2cc93-db10-5977-78ab-d9d07c3f445@inria.fr>
    <424169db-49df-f168-d7f7-b48efe6ada@inria.fr>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 18 Jan 2024, Vincent Guittot wrote:

> On Thu, 18 Jan 2024 at 17:50, Julia Lawall wrote:
> >
> > On Thu, 18 Jan 2024, Vincent Guittot wrote:
> >
> > > Hi Julia,
> > >
> > > Sorry for the delay. I have been involved on other perf regression
> > >
> > > On Fri, 5 Jan 2024 at 18:27, Julia Lawall wrote:
> > > >
> > > > On Fri, 5 Jan 2024, Julia Lawall wrote:
> > > >
> > > > > On Fri, 5 Jan 2024, Vincent Guittot wrote:
> > > > >
> > > > > > On Fri, 5 Jan 2024 at 15:51, Julia Lawall wrote:
> > > > > > >
> > > > > > > > Your system is calling the polling mode and not the default
> > > > > > > > cpuidle_idle_call() ? This could explain why I don't see such problem
> > > > > > > > on my system which doesn't have polling
> > > > > > > >
> > > > > > > > Are you forcing the use of polling mode ?
> > > > > > > > If yes, could you check that this problem disappears without forcing
> > > > > > > > polling mode ?
> > > > > > >
> > > > > > > I expanded the code in do_idle to:
> > > > > > >
> > > > > > > if (cpu_idle_force_poll) { c1++;
> > > > > > >         tick_nohz_idle_restart_tick();
> > > > > > >         cpu_idle_poll();
> > > > > > > } else if (tick_check_broadcast_expired()) { c2++;
> > > > > > >         tick_nohz_idle_restart_tick();
> > > > > > >         cpu_idle_poll();
> > > > > > > } else { c3++;
> > > > > > >         cpuidle_idle_call();
> > > > > > > }
> > > > > > >
> > > > > > > Later, I have:
> > > > > > >
> > > > > > > trace_printk("force poll: %d: c1: %d, c2: %d, c3: %d\n",cpu_idle_force_poll, c1, c2, c3);
> > > > > > > flush_smp_call_function_queue();
> > > > > > > schedule_idle();
> > > > > > >
> > > > > > > force poll, c1 and c2 are always 0, and c3 is always some non-zero value.
> > > > > > > Sometimes small (often 1), and sometimes large (304 or 305).
> > > > > > >
> > > > > > > So I don't think it's calling cpu_idle_poll().
> > > > > >
> > > > > > I agree that something else
> > > > > >
> > > > > > >
> > > > > > > x86 has TIF_POLLING_NRFLAG defined to be a non zero value, which I think
> > > > > > > is sufficient to cause the issue.
> > > > > >
> > > > > > Could you trace trace_sched_wake_idle_without_ipi() and csd traces as well ?
> > > > > > I don't understand what set need_resched() in your case; having in
> > > > > > mind that I don't see the problem on my Arm systems and IIRC Peter
> > > > > > said that he didn't face the problem on his x86 system.
> > > > >
> > > > > TIF_POLLING_NRFLAG doesn't seem to be defined on Arm.
> > > > >
> > > > > Peter said that he didn't see the problem, but perhaps that was just
> > > > > random. It requires a NUMA move to occur. I make 20 runs to be sure to
> > > > > see the problem at least once. But another machine might behave
> > > > > differently.
> > > > >
> > > > > I believe the call chain is:
> > > > >
> > > > > scheduler_tick
> > > > > trigger_load_balance
> > > > > nohz_balancer_kick
> > > > > kick_ilb
> > > > > smp_call_function_single_async
> > > > > generic_exec_single
> > > > > __smp_call_single_queue
> > > > > send_call_function_single_ipi
> > > > > call_function_single_prep_ipi
> > > > > set_nr_if_polling <====== sets need_resched
> > > > >
> > > > > I'll make a trace to reverify that.
> > > >
> > > > This is what I see at a tick, which corresponds to the call chain shown
> > > > above:
> > > >
> > > > bt.B.x-4184 [046] 466.410605: bputs: scheduler_tick: calling trigger_load_balance
> > > > bt.B.x-4184 [046] 466.410605: bputs: trigger_load_balance: calling nohz_balancer_kick
> > > > bt.B.x-4184 [046] 466.410605: bputs: trigger_load_balance: calling kick_ilb
> > > > bt.B.x-4184 [046] 466.410607: bprint: trigger_load_balance: calling smp_call_function_single_async 22
> > > > bt.B.x-4184 [046] 466.410607: bputs: smp_call_function_single_async: calling generic_exec_single
> > > > bt.B.x-4184 [046] 466.410607: bputs: generic_exec_single: calling __smp_call_single_queue
> > > > bt.B.x-4184 [046] 466.410608: bputs: __smp_call_single_queue: calling send_call_function_single_ipi
> > > > bt.B.x-4184 [046] 466.410608: bputs: __smp_call_single_queue: calling call_function_single_prep_ipi
> > > > bt.B.x-4184 [046] 466.410608: bputs: call_function_single_prep_ipi: calling set_nr_if_polling
> > > > bt.B.x-4184 [046] 466.410609: sched_wake_idle_without_ipi: cpu=22
> > >
> > > I don't know if you have made progress on this in the meantime.
> >
> > Not really. Basically after do_idle, there is the call to
> > flush_smp_call_function_queue that invokes the deposited functions, which
> > in our case is at best going to raise a softirq, and the call to schedule.
> > Raising a softirq doesn't happen because of the check for need_resched.
> > But even if that test were removed, it would still not be useful because
> > there would be the ksoftirqd running on the idle core that would eliminate
> > the imbalance between the sockets. Maybe there could be some way to call
> > run_rebalance_domains directly from nohz_csd_func, since
> > run_rebalance_domains doesn't use its argument, but at the moment
> > run_rebalance_domains is not visible to nohz_csd_func.
>
> All this happens because we don't use an ipi, it should not use
> ksoftirqd with ipi
>
> >
> > >
> > > Regarding the trace above, do you know if anything happens on CPU22
> > > just before the scheduler tried to kick the ILB on it ?
> >
> > I don't think so. It's idle.
>
> Ok, so if it is idle for a while, I mean nothing happened on it, not
> even spurious irq, it should have cleared its TIF_POLLING_NRFLAG
>
> It would be good to trace the selected idle state
>
> >
> > > Have you found why TIF_POLLING_NRFLAG seems to be always set when the
> > > kick_ilb happens ? It should be cleared once entering the idle state.
> >
> > Actually, I don't think it is always set. It switches back and forth
> > between two cases. I will look for the traces that show that.
> >
> > > Could you check your cpuidle driver ?
> >
> > Check what specifically?
>
> $ cat /sys/devices/system/cpu/cpuidle/current_driver
> $ cat /sys/devices/system/cpu/cpuidle/current_governor

intel_idle and menu

julia