From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra
Cc: Ingo Molnar, Vincent Guittot, Valentin Schneider, Aubrey Li, LKML
Date: Wed, 18 May 2022 18:06:25 +0100
Subject: Re: [PATCH 4/4] sched/numa: Adjust imb_numa_nr to a better
 approximation of memory channels
Message-ID: <20220518170625.GT3441@techsingularity.net>
References: <20220511143038.4620-1-mgorman@techsingularity.net>
 <20220511143038.4620-5-mgorman@techsingularity.net>
 <20220518094112.GE10117@worktop.programming.kicks-ass.net>
 <20220518111539.GP3441@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 18, 2022 at 04:05:03PM +0200, Peter Zijlstra wrote:
> On Wed, May 18, 2022 at 12:15:39PM +0100, Mel Gorman wrote:
> > I'm not aware of how it can be done in-kernel on a cross-architectural
> > basis. Reading through the arch manual, it states how many channels are
> > in a given processor family, and it's available during memory check errors
> > (apparently via the EDAC driver). It's sometimes available via PMUs, but
> > I couldn't find a place where it's generically available for topology.c
> > that would work on all x86-64 machines, let alone every other architecture.
>
> So provided it is something we want (below), we can always start an arch
> interface and fill it out where needed.

It could start with a function returning a fixed value that architectures
can override, but discovering and wiring it all up might be a deep rabbit
hole. The most straightforward approach would be based on CPU family and
model, but that is time-consuming to maintain, and it gets fuzzy with
something like PowerKVM where channel details are hidden from the guest.
> > It's not even clear if SMBIOS was parsed in early boot whether
>
> We can always rebuild topology / update variables slightly later in
> boot.
>
> > it's a
> > good idea. It could result in different imbalance thresholds for each
> > NUMA domain or weird corner cases where asymmetric NUMA node populations
> > would result in run-to-run variance that is difficult to analyse.
>
> Yeah, maybe. OTOH having a magic value that's guesstimated based on
> hardware of the day is something that'll go bad any moment as well.
>
> I'm not too worried about run-to-run since people don't typically change
> DIMM population over a reboot, but yes, there's always going to be
> corner cases. Same with a fixed value though; that's also going to be
> wrong.

By run-to-run, I mean running the same workload in a loop without
rebooting between runs. If there are differences in how nodes are
populated, there will be some run-to-run variance based purely on which
node the workload started on, because the nodes will have different
"allowed imbalance" thresholds.

I'm rerunning the tests to check exactly how much impact this patch has
on peak performance. It takes a few hours, so I won't have anything
until tomorrow.

Initially, "get peak performance" and "stabilise run-to-run variance"
were my objectives. In the end, this series only targeted peak
performance, as the allowed NUMA imbalance was not the sole cause of the
problem. I still haven't spent time figuring out why c6f886546cb8
("sched/fair: Trigger the update of blocked load on newly idle cpu")
made such a big difference to variability.

--
Mel Gorman
SUSE Labs