From: Valentin Schneider
To: Meelis Roos, LKML
Cc: Peter Zijlstra, Vincent Guittot, Barry Song, Mel Gorman, Dietmar Eggemann
Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
Date: Thu, 21 Jan 2021 15:05:32 +0000

(+Cc relevant folks)

Hi,

On 21/01/21 15:41, Meelis Roos wrote:
> This happens on Sun Fire X4600 M2 - 32 cores in 8 CPU slots. 5.10 was silent. Current git and
> 5.10.0-13256-g5814bc2d4cc2 exhibit this message in dmesg but otherwise seem to work fine
> (kernel compilation succeeds).
>

b5b217346de8 ("sched/topology: Warn when NUMA diameter > 2") was added in
5.11-rc1, and I believe it was marked for stable. It doesn't change scheduler
behaviour; it only flags topologies that the scheduler ends up misrepresenting /
misinterpreting, which until now happened silently unless running with
CONFIG_SCHED_DEBUG=y.

Until now I had only seen it fire on a single, somewhat unusual topology. Since
fixing the underlying issue is far from trivial, I figured this warning would
help us build a case for actually fixing it if more reports came in.

Could you paste the output of the following?

  $ cat /sys/devices/system/node/node*/distance

Additionally, booting your system with CONFIG_SCHED_DEBUG=y and appending
'sched_debug' to your cmdline should yield some extra data.