Date: Mon, 18 Dec 2023 14:58:47 +0100 (CET)
From: Julia Lawall
To: Peter Zijlstra
Cc: Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Mel Gorman, linux-kernel@vger.kernel.org
Subject: Re: EEVDF and NUMA balancing
In-Reply-To: <20231009102949.GC14330@noisy.programming.kicks-ass.net>

Hello,

I have looked further into the NUMA balancing issue.

The context is that there are 2N threads running on 2N cores. NUMA balancing moves one thread to the other socket, leaving N+1 threads on one socket and N-1 threads on the other. This condition typically persists for one or more seconds. I previously reported this on a 4-socket machine, but it can also occur on a 2-socket machine with other tests from the NAS benchmark suite (sp.B, bt.B, etc.).
Since there are N+1 threads on one of the sockets, one would expect load balancing to quickly kick in and bring a thread back to the socket that has only N-1 threads. This doesn't happen, though, because most of the threads are affected by NUMA placement and therefore have a preferred node. So there is a high chance that an attempt to steal a thread will fail, because both of the threads sharing a core on the busy socket prefer that socket.

At this point, the only hope is active balancing. However, triggering active balancing requires the following condition in imbalanced_active_balance() to succeed:

	if ((env->migration_type == migrate_task) &&
	    (sd->nr_balance_failed > sd->cache_nice_tries+2))

sd->nr_balance_failed does not increase, because the core is idle. When a core is idle, it reaches load_balance() from schedule() through newidle_balance(), and newidle_balance() always passes CPU_NEWLY_IDLE, even if the core has been idle for a long time. load_balance() only increments nr_balance_failed when the idle type is not CPU_NEWLY_IDLE.

Changing newidle_balance() to use CPU_IDLE rather than CPU_NEWLY_IDLE when the core was already idle before the call to schedule() is not enough, though, because there is also the constraint on the migration type, which turns out to be (mostly?) migrate_util. Removing the following code from find_busiest_queue():

	/*
	 * Don't try to pull utilization from a CPU with one
	 * running task. Whatever its utilization, we will fail
	 * detach the task.
	 */
	if (nr_running <= 1)
		continue;

and changing the above test to:

	if ((env->migration_type == migrate_task ||
	     env->migration_type == migrate_util) &&
	    (sd->nr_balance_failed > sd->cache_nice_tries+2))

seems to solve the problem. I will test this on more applications, but let me know if the above solution seems completely inappropriate. Maybe it violates some other constraints.

I have no idea why this problem became more visible with EEVDF. It seems to have to do with the time slices all turning out to be the same: I got the same behavior in 6.5 by overriding the timeslice calculation to always return 1.
But I don't see the connection between the timeslice and the behavior of the idle task.

thanks,
julia