Date: Fri, 19 Jan 2024 12:33:54 +0100 (CET)
From: Julia Lawall
To: Vincent Guittot
cc: Peter Zijlstra, Ingo Molnar, Dietmar Eggemann, Mel Gorman,
    linux-kernel@vger.kernel.org
Subject: Re: EEVDF and NUMA balancing

On Fri, 19 Jan 2024, Vincent Guittot wrote:

> On Thu, 18 Jan 2024 at 23:13, Julia Lawall wrote:
> >
> >
> >
> > On Thu, 18 Jan 2024, Vincent Guittot wrote:
> >
> > > Hi Julia,
> > >
> > > Sorry for the delay. I have been involved on other perf regression
> > >
> > > On Fri, 5 Jan 2024 at 18:27, Julia Lawall wrote:
> > > >
> > > >
> > > >
> > > > On Fri, 5 Jan 2024, Julia Lawall wrote:
> > > >
> > > > >
> > > > >
> > > > > On Fri, 5 Jan 2024, Vincent Guittot wrote:
> > > > >
> > > > > > On Fri, 5 Jan 2024 at 15:51, Julia Lawall wrote:
> > > > > > >
> > > > > > > > Your system is calling the polling mode and not the default
> > > > > > > > cpuidle_idle_call() ? This could explain why I don't see such problem
> > > > > > > > on my system which doesn't have polling
> > > > > > > >
> > > > > > > > Are you forcing the use of polling mode ?
> > > > > > > > If yes, could you check that this problem disappears without forcing
> > > > > > > > polling mode ?
> > > > > > >
> > > > > > > I expanded the code in do_idle to:
> > > > > > >
> > > > > > > if (cpu_idle_force_poll) { c1++;
> > > > > > >         tick_nohz_idle_restart_tick();
> > > > > > >         cpu_idle_poll();
> > > > > > > } else if (tick_check_broadcast_expired()) { c2++;
> > > > > > >         tick_nohz_idle_restart_tick();
> > > > > > >         cpu_idle_poll();
> > > > > > > } else { c3++;
> > > > > > >         cpuidle_idle_call();
> > > > > > > }
> > > > > > >
> > > > > > > Later, I have:
> > > > > > >
> > > > > > > trace_printk("force poll: %d: c1: %d, c2: %d, c3: %d\n",cpu_idle_force_poll, c1, c2, c3);
> > > > > > > flush_smp_call_function_queue();
> > > > > > > schedule_idle();
> > > > > > >
> > > > > > > force poll, c1 and c2 are always 0, and c3 is always some non-zero value.
> > > > > > > Sometimes small (often 1), and sometimes large (304 or 305).
> > > > > > >
> > > > > > > So I don't think it's calling cpu_idle_poll().
> > > > > >
> > > > > > I agree that something else
> > > > > >
> > > > > > >
> > > > > > > x86 has TIF_POLLING_NRFLAG defined to be a non zero value, which I think
> > > > > > > is sufficient to cause the issue.
> > > > > >
> > > > > > Could you trace trace_sched_wake_idle_without_ipi() and csd traces as well ?
> > > > > > I don't understand what sets need_resched() in your case; having in
> > > > > > mind that I don't see the problem on my Arm systems and IIRC Peter
> > > > > > said that he didn't face the problem on his x86 system.
> > > > >
> > > > > TIF_POLLING_NRFLAG doesn't seem to be defined on Arm.
> > > > >
> > > > > Peter said that he didn't see the problem, but perhaps that was just
> > > > > random. It requires a NUMA move to occur. I make 20 runs to be sure to
> > > > > see the problem at least once. But another machine might behave
> > > > > differently.
> > > > >
> > > > > I believe the call chain is:
> > > > >
> > > > > scheduler_tick
> > > > >   trigger_load_balance
> > > > >     nohz_balancer_kick
> > > > >       kick_ilb
> > > > >         smp_call_function_single_async
> > > > >           generic_exec_single
> > > > >             __smp_call_single_queue
> > > > >               send_call_function_single_ipi
> > > > >                 call_function_single_prep_ipi
> > > > >                   set_nr_if_polling <====== sets need_resched
> > > > >
> > > > > I'll make a trace to reverify that.
> > > >
> > > > This is what I see at a tick, which corresponds to the call chain shown
> > > > above:
> > > >
> > > > bt.B.x-4184 [046] 466.410605: bputs: scheduler_tick: calling trigger_load_balance
> > > > bt.B.x-4184 [046] 466.410605: bputs: trigger_load_balance: calling nohz_balancer_kick
> > > > bt.B.x-4184 [046] 466.410605: bputs: trigger_load_balance: calling kick_ilb
> > > > bt.B.x-4184 [046] 466.410607: bprint: trigger_load_balance: calling smp_call_function_single_async 22
> > > > bt.B.x-4184 [046] 466.410607: bputs: smp_call_function_single_async: calling generic_exec_single
> > > > bt.B.x-4184 [046] 466.410607: bputs: generic_exec_single: calling __smp_call_single_queue
> > > > bt.B.x-4184 [046] 466.410608: bputs: __smp_call_single_queue: calling send_call_function_single_ipi
> > > > bt.B.x-4184 [046] 466.410608: bputs: __smp_call_single_queue: calling call_function_single_prep_ipi
> > > > bt.B.x-4184 [046] 466.410608: bputs: call_function_single_prep_ipi: calling set_nr_if_polling
> > > > bt.B.x-4184 [046] 466.410609: sched_wake_idle_without_ipi: cpu=22
> > >
> > > I don't know if you have made progress on this in the meantime.
> > >
> > > Regarding the trace above, do you know if anything happens on CPU22
> > > just before the scheduler tried to kick the ILB on it ?
> > >
> > > Have you found why TIF_POLLING_NRFLAG seems to be always set when the
> > > kick_ilb happens ? It should be cleared once entering the idle state.
> >
> > I haven't figured out everything, but the attached graph shows
> > that TIF_POLLING_NRFLAG is not always set. Sometimes it is and sometimes
> > it isn't.
> >
> > In the graph, on core 57, the blue box and the green x are before and
> > after the call to cpuidle_idle_call(), respectively. One can't see it in
> > this graph, but the green x comes before the blue box. So almost all of
> > the time, it is in cpuidle_idle_call(), only in the tiny gap between the x
> > and the box is it back in do_idle with TIF_POLLING_NRFLAG set.
> >
> > Afterwards, there is a diamond for the polling case and a triangle for the
> > non polling case. These also occur on clock ticks, and may be
> > microscopically closer to (polling) or further from (not polling) the
> > green x and blue box.
>
> Your problem really looks like weird timing.
>
> It would be good to know which idle states are selected ? or even
> better if it's possible, disable all but one idle state and see if one
> idle state in particular triggers your problem
>
> Idle states can be disabled here :
> echo 1 > /sys/devices/system/cpu/cpu*/cpuidle/state*/disable
>
> One possible sequence:
> tick is not stopped on the idle cpu
> tick fires on busy and idle cpus
> idle cpu wakes up and the wake up time varies depending on wakeup
> latency of the entered c-state
> busy cpu executes call_function_single_prep_ipi() and idle cpu could
> be already woken or not depending on the time to wake up
>
> >
> > I haven't yet studied what happens afterwards in the non polling case.
>
> Side point, according to your trace above, you have 2 consecutive real
> idle load balances so the patch that I proposed, should be able to
> trigger active migration because the nr_balance_failed will be != 0
> the 2nd idle load balance. Am I missing something ?

Thanks for the suggestions.  I will check both issues.

julia
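
For the "disable all but one idle state" experiment suggested above, a minimal
sketch in Python follows. The helper name keep_only_idle_state and its
cpu_root parameter are hypothetical, introduced only for this illustration;
the sysfs layout (cpuN/cpuidle/stateM/disable, where writing 1 disables a
state) is the one referred to in the quoted command. A loop is used because
in bash a redirection target that expands to several files is rejected as
ambiguous.

```python
import glob
import os
import re


def keep_only_idle_state(keep, cpu_root="/sys/devices/system/cpu"):
    """Disable every cpuidle state except state<keep> on all CPUs.

    Hypothetical helper: returns a dict mapping each 'disable' file
    that was written to the value written ("0" or "1").
    """
    written = {}
    # cpu[0-9]* skips sibling directories such as "cpuidle" or "cpufreq".
    pattern = os.path.join(
        cpu_root, "cpu[0-9]*", "cpuidle", "state[0-9]*", "disable"
    )
    for path in sorted(glob.glob(pattern)):
        # The parent directory is named "stateN"; N is the idle state index.
        state_dir = os.path.basename(os.path.dirname(path))
        index = int(re.fullmatch(r"state(\d+)", state_dir).group(1))
        value = "0" if index == keep else "1"  # 1 = disabled, 0 = enabled
        with open(path, "w") as f:
            f.write(value)
        written[path] = value
    return written
```

Run as root, keep_only_idle_state(1) would leave only state1 enabled on every
CPU; repeating the benchmark once per state index then shows whether one idle
state in particular is involved. The cpu_root parameter exists so the helper
can be tried against a scratch directory first.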