Subject: Re: EEVDF and NUMA balancing
From: Vincent Guittot
Date: Mon, 18 Dec 2023 18:18:38 +0100
To: Julia Lawall
Cc: Peter Zijlstra, Ingo Molnar, Dietmar Eggemann, Mel Gorman,
    linux-kernel@vger.kernel.org

On Mon, 18 Dec 2023 at 14:58, Julia Lawall wrote:
>
> Hello,
>
> I have looked further into the NUMA balancing issue.
>
> The context is that there are 2N threads running on 2N cores, one thread
> gets NUMA balanced to the other socket, leaving N+1 threads on one socket
> and N-1 threads on the other socket. This condition typically persists
> for one or more seconds.
>
> Previously, I reported this on a 4-socket machine, but it can also occur
> on a 2-socket machine, with other tests from the NAS benchmark suite
> (sp.B, bt.B, etc).
>
> Since there are N+1 threads on one of the sockets, it would seem that load
> balancing would quickly kick in to bring some thread back to the socket
> with only N-1 threads. This doesn't happen, though, because most of the
> threads have NUMA effects that give them a preferred node. So there is a
> high chance that an attempt to steal will fail, because both threads have
> a preference for the same socket.
>
> At this point, the only hope is active balancing. However, triggering
> active balancing requires the success of the following condition in
> imbalanced_active_balance:
>
>     if ((env->migration_type == migrate_task) &&
>         (sd->nr_balance_failed > sd->cache_nice_tries+2))
>
> sd->nr_balance_failed does not increase because the core is idle. When a
> core is idle, it comes to the load_balance function from schedule()
> through newidle_balance. newidle_balance always sends in the flag
> CPU_NEWLY_IDLE, even if the core has been idle for a long time.

Do you mean that you never kick a normal idle load balance?

>
> Changing newidle_balance to use CPU_IDLE rather than CPU_NEWLY_IDLE when
> the core was already idle before the call to schedule() is not enough,
> though, because there is also a constraint on the migration type. That
> turns out to be (mostly?) migrate_util. Removing the following code from
> find_busiest_queue:
>
>     /*
>      * Don't try to pull utilization from a CPU with one
>      * running task. Whatever its utilization, we will fail
>      * detach the task.
>      */
>     if (nr_running <= 1)
>         continue;

I'm surprised that load_balance wants to "migrate_util" instead of
"migrate_task". You have N+1 threads on a group of 2N CPUs, so you should
have at most 1 thread per CPU in your busiest group. In theory, the local
group should be "group_has_spare" and the busiest group at most
"group_fully_busy". This means that no group should be overloaded, and
load_balance should not try to migrate util but only tasks.

>
> and changing the above test to:
>
>     if ((env->migration_type == migrate_task ||
>          env->migration_type == migrate_util) &&
>         (sd->nr_balance_failed > sd->cache_nice_tries+2))
>
> seems to solve the problem.
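
For reference, the check being discussed lives in imbalanced_active_balance()
in kernel/sched/fair.c. A simplified sketch of that function with the proposed
change applied (illustrative only; the exact mainline body and comments differ
between kernel versions):

    static inline bool imbalanced_active_balance(struct lb_env *env)
    {
            struct sched_domain *sd = env->sd;

            /*
             * Allow active balancing only after repeated failed balance
             * attempts, and (with the proposed change) also when
             * load_balance chose to migrate utilization rather than a task.
             */
            if ((env->migration_type == migrate_task ||
                 env->migration_type == migrate_util) &&   /* proposed addition */
                (sd->nr_balance_failed > sd->cache_nice_tries + 2))
                    return true;

            return false;
    }
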
> I will test this on more applications. But let me know if the above
> solution seems completely inappropriate. Maybe it violates some other
> constraints.
>
> I have no idea why this problem became more visible with EEVDF. It seems
> to have to do with the time slices all turning out to be the same. I got
> the same behavior in 6.5 by overriding the timeslice calculation to
> always return 1. But I don't see the connection between the timeslice and
> the behavior of the idle task.
>
> thanks,
> julia
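
For context on why sd->nr_balance_failed never grows on an idle core:
load_balance() only bumps the failure counter for non-newly-idle balancing,
and newidle_balance() always passes CPU_NEWLY_IDLE. A rough sketch of the
relevant fragment of load_balance() in kernel/sched/fair.c (paraphrased;
comments are illustrative and the exact code varies by version):

    	if (!ld_moved) {
    		schedstat_inc(sd->lb_failed[idle]);

    		/*
    		 * Only periodic balancing bumps the failure counter;
    		 * newidle balancing can be very frequent and would
    		 * otherwise inflate it, causing excessive cache-hot
    		 * migrations and active balances.
    		 */
    		if (idle != CPU_NEWLY_IDLE)
    			sd->nr_balance_failed++;

    		/* ... possibly kick active balancing ... */
    	}

So as long as an idle CPU only ever reaches load_balance() through the
newly-idle path, the counter stays below sd->cache_nice_tries + 2 and
imbalanced_active_balance() can never trigger; the normal (CPU_IDLE) balance
path that Vincent asks about is the one that would increment it.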