Date: Fri, 4 Feb 2022 09:04:11 +0000
From: Mel Gorman <mgorman@techsingularity.net>
To: Srikar Dronamraju
Cc: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Valentin Schneider,
	Aubrey Li, Barry Song, Mike Galbraith, Gautham Shenoy, LKML
Subject: Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when
	SD_NUMA spans multiple LLCs
Message-ID: <20220204090411.GM3366@techsingularity.net>
References: <20220203144652.12540-1-mgorman@techsingularity.net>
	<20220203144652.12540-3-mgorman@techsingularity.net>
	<20220204070654.GF618915@linux.vnet.ibm.com>
In-Reply-To: <20220204070654.GF618915@linux.vnet.ibm.com>

On Fri, Feb 04, 2022 at 12:36:54PM +0530, Srikar Dronamraju wrote:
> * Mel Gorman <mgorman@techsingularity.net> [2022-02-03 14:46:52]:
>
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index d201a7052a29..e6cd55951304 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -2242,6 +2242,59 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr)
> >  		}
> >  	}
> >
> > +	/*
> > +	 * Calculate an allowed NUMA imbalance such that LLCs do not get
> > +	 * imbalanced.
> > +	 */
>
> We seem to be adding this hunk before the sched_domains may be degenerated.
> Wondering if we really want to do it before degeneration.
>

There was no obvious advantage to doing it later versus doing it at the
same time that other characteristics, like groups, were being determined.

> Let's say we have 3 sched domains and we calculated the sd->imb_numa_nr for
> all 3 domains, then let's say the middle sched_domain gets degenerated.
> Would the sd->imb_numa_nr's still be relevant?
>

It's expected that it is still relevant as the ratios with respect to
SD_SHARE_PKG_RESOURCES should still be consistent.

> > +	for_each_cpu(i, cpu_map) {
> > +		unsigned int imb = 0;
> > +		unsigned int imb_span = 1;
> > +
> > +		for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
> > +			struct sched_domain *child = sd->child;
> > +
> > +			if (!(sd->flags & SD_SHARE_PKG_RESOURCES) && child &&
> > +			    (child->flags & SD_SHARE_PKG_RESOURCES)) {
> > +				struct sched_domain *top, *top_p;
> > +				unsigned int nr_llcs;
> > +
> > +				/*
> > +				 * For a single LLC per node, allow an
> > +				 * imbalance up to 25% of the node. This is an
> > +				 * arbitrary cutoff based on SMT-2 to balance
> > +				 * between memory bandwidth and avoiding
> > +				 * premature sharing of HT resources and SMT-4
> > +				 * or SMT-8 *may* benefit from a different
> > +				 * cutoff.
> > +				 *
> > +				 * For multiple LLCs, allow an imbalance
> > +				 * until multiple tasks would share an LLC
> > +				 * on one node while LLCs on another node
> > +				 * remain idle.
> > +				 */
> > +				nr_llcs = sd->span_weight / child->span_weight;
> > +				if (nr_llcs == 1)
> > +					imb = sd->span_weight >> 2;
> > +				else
> > +					imb = nr_llcs;
> > +				sd->imb_numa_nr = imb;
> > +
> > +				/* Set span based on the first NUMA domain. */
> > +				top = sd;
> > +				top_p = top->parent;
> > +				while (top_p && !(top_p->flags & SD_NUMA)) {
> > +					top = top->parent;
> > +					top_p = top->parent;
> > +				}
> > +				imb_span = top_p ? top_p->span_weight : sd->span_weight;
>
> I am getting confused by imb_span.
> Let's say we have a topology of SMT -> MC -> DIE -> NUMA -> NUMA, with the
> SMT and MC domains having the SD_SHARE_PKG_RESOURCES flag set.
> We come here only for the DIE domain.
>
> The imb_span set here is being used for both of the subsequent sched
> domains; most likely they will be NUMA domains. Right?
>

Right.

> > +			} else {
> > +				int factor = max(1U, (sd->span_weight / imb_span));
> > +
> > +				sd->imb_numa_nr = imb * factor;
>
> For SMT (or any sched domain below the LLCs), factor would be
> sd->span_weight, but imb_numa_nr and imb would be 0.

Yes.

> For NUMA (or any sched domain just above DIE), factor would be 1 and
> sd->imb_numa_nr would be nr_llcs.
> For subsequent sched_domains, the sd->imb_numa_nr would be some multiple of
> nr_llcs. Right?
>

Right.

-- 
Mel Gorman
SUSE Labs
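
To make the arithmetic above concrete, here is a small standalone userspace
sketch that applies the same imb/imb_span/factor logic as the quoted hunk to
the SMT -> MC -> DIE -> NUMA -> NUMA topology from the discussion. The struct,
the span values (2-way SMT, 32-CPU LLCs, 128-CPU nodes, 4 nodes) and the
bottom-up array walk are invented for illustration; only the arithmetic
mirrors the patch.

#include <stdio.h>

#define SD_SHARE_PKG_RESOURCES	(1 << 0)
#define SD_NUMA			(1 << 1)

struct dom {
	const char *name;
	unsigned int span_weight;	/* number of CPUs the domain spans */
	unsigned int flags;
};

int main(void)
{
	/* Bottom-up: d[i - 1] is the child of d[i], d[i + 1] its parent. */
	struct dom d[] = {
		{ "SMT",   2,   SD_SHARE_PKG_RESOURCES },
		{ "MC",    32,  SD_SHARE_PKG_RESOURCES },  /* the LLC level */
		{ "DIE",   128, 0 },                       /* one node */
		{ "NUMA",  256, SD_NUMA },                 /* near node pair */
		{ "NUMA2", 512, SD_NUMA },                 /* whole machine */
	};
	int n = sizeof(d) / sizeof(d[0]);
	unsigned int imb = 0, imb_span = 1;

	for (int i = 0; i < n; i++) {
		struct dom *sd = &d[i], *child = i > 0 ? &d[i - 1] : NULL;
		unsigned int imb_numa_nr;

		if (!(sd->flags & SD_SHARE_PKG_RESOURCES) && child &&
		    (child->flags & SD_SHARE_PKG_RESOURCES)) {
			/* Only DIE hits this: first domain above the LLCs. */
			unsigned int nr_llcs = sd->span_weight / child->span_weight;

			imb = nr_llcs == 1 ? sd->span_weight >> 2 : nr_llcs;
			imb_numa_nr = imb;

			/*
			 * Walk up to the last non-NUMA domain; imb_span is
			 * the span of its parent, the first NUMA domain.
			 */
			int j = i;
			while (j + 1 < n && !(d[j + 1].flags & SD_NUMA))
				j++;
			imb_span = j + 1 < n ? d[j + 1].span_weight : sd->span_weight;
		} else {
			/* Open-coded max(1U, sd->span_weight / imb_span). */
			unsigned int factor = sd->span_weight / imb_span;

			if (factor < 1)
				factor = 1;
			imb_numa_nr = imb * factor;
		}
		printf("%-5s span=%3u imb_numa_nr=%u\n",
		       sd->name, sd->span_weight, imb_numa_nr);
	}
	return 0;
}

Running it prints imb_numa_nr = 0 for SMT and MC, 4 (nr_llcs) for DIE and for
the first NUMA level (factor = 256/256 = 1), and 8 for the machine-wide NUMA
level (factor = 512/256 = 2), matching the summary in the thread: zero below
the LLCs, nr_llcs at the first NUMA domain, and multiples of nr_llcs above it.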