Date: Wed, 18 May 2022 12:15:39 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra
Cc: Ingo Molnar, Vincent Guittot, Valentin Schneider, Aubrey Li, LKML
Subject: Re: [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better approximation of memory channels
Message-ID: <20220518111539.GP3441@techsingularity.net>
References: <20220511143038.4620-1-mgorman@techsingularity.net> <20220511143038.4620-5-mgorman@techsingularity.net> <20220518094112.GE10117@worktop.programming.kicks-ass.net>
In-Reply-To: <20220518094112.GE10117@worktop.programming.kicks-ass.net>

On Wed, May 18, 2022 at 11:41:12AM +0200, Peter Zijlstra wrote:
> On Wed, May 11, 2022 at 03:30:38PM +0100, Mel Gorman wrote:
> > For a single LLC per node, a NUMA imbalance is allowed up until 25%
> > of CPUs sharing a node could be active. One intent of the cut-off is
> > to avoid an imbalance of memory channels but there is no topological
> > information based on active memory channels. Furthermore, there can
> > be differences between nodes depending on the number of populated
> > DIMMs.
> >
> > A cut-off of 25% was arbitrary but generally worked. It does have severe
> > corner cases though when a parallel workload using 25% of all available
> > CPUs over-saturates memory channels. This can happen due to the initial
> > forking of tasks that get pulled more to one node after early wakeups
> > (e.g. a barrier synchronisation) that is not quickly corrected by the
> > load balancer. The LB may fail to act quickly as the parallel tasks are
> > considered to be poor migrate candidates due to locality or cache hotness.
> >
> > On a range of modern Intel CPUs, 12.5% appears to be a better cut-off
> > assuming all memory channels are populated and is used as the new cut-off
> > point. A minimum of 1 is specified to allow a communicating pair to
> > remain local even for CPUs with low numbers of cores. For modern AMDs,
> > there are multiple LLCs and they are not affected.
>
> Can the hardware tell us about memory channels?

It's in the SMBIOS table somewhere as it's available via dmidecode. For
example, on a 2-socket machine:

$ dmidecode -t memory | grep -E "Size|Bank"
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel0_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel0_Dimm1
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel1_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel1_Dimm1
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel2_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel2_Dimm1
        Size: 8192 MB
        Bank Locator: P0_Node0_Channel3_Dimm0
        Size: No Module Installed
        Bank Locator: P0_Node0_Channel3_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel0_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel0_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel1_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel1_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel2_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel2_Dimm1
        Size: 8192 MB
        Bank Locator: P1_Node1_Channel3_Dimm0
        Size: No Module Installed
        Bank Locator: P1_Node1_Channel3_Dimm1

SMBIOS contains the information on the number of channels and whether they
are populated with at least one DIMM. I'm not aware of how it could be done
in-kernel on a cross-architecture basis. Reading through the arch manual,
it states how many channels are in a given processor family, and the
information is available during memory check errors (apparently via the
EDAC driver).
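As an illustration of how much dmidecode already exposes, the populated
DIMMs per node in output like the above can be counted with a short
pipeline. This is only a sketch: the Bank Locator naming scheme
(P0_Node0_Channel0_Dimm0) is vendor-specific, and `count_populated` is a
name made up here.

```shell
# Count populated DIMMs per node from "dmidecode -t memory" output on
# stdin. Relies on the vendor-specific Bank Locator format shown above,
# so treat it as a sketch rather than a general tool.
count_populated() {
    awk '$1 == "Size:" { populated = ($2 != "No") }
         $1 == "Bank" && populated {
             split($NF, f, "_"); count[f[2]]++; populated = 0 }
         END { for (n in count) print n, count[n] }'
}

# Usage (needs root):
#   dmidecode -t memory | count_populated
# For the example output above this reports 4 populated DIMMs (one per
# channel) on each of Node0 and Node1.
```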
It's sometimes available via PMUs, but I couldn't find a place where it's
generically available for topology.c that would work on all x86-64 machines,
let alone every other architecture. Even if SMBIOS were parsed in early
boot, it's not clear that would be a good idea. It could result in
different imbalance thresholds for each NUMA domain, or weird corner cases
where asymmetric NUMA node populations would result in run-to-run variance
that is difficult to analyse.

-- 
Mel Gorman
SUSE Labs