Date: Tue, 16 Feb 2021 12:29:02 +0100
From: Peter Zijlstra
To: Alison Schofield
Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
    x86@kernel.org, linux-kernel@vger.kernel.org, Dave Hansen, Tony Luck,
    Tim Chen, "H. Peter Anvin", David Rientjes, Igor Mammedov,
    Prarit Bhargava, brice.goglin@gmail.com
Subject: Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
References: <20210209223943.9834-1-alison.schofield@intel.com>
    <20210210221134.GA12410@alison-desk>
In-Reply-To: <20210210221134.GA12410@alison-desk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 10, 2021 at 02:11:34PM -0800, Alison Schofield wrote:
> This is equivalent to determining if x86_has_numa_in_package.
> Do you think there is an opportunity to set x86_has_numa_in_package
> earlier, and use it here and in set_cpu_sibling_map()?

Sure. Not sure that's actually clearer, though.
> With that additional info (match_pkg()) how about -
>
> Instead of this:
>
> -	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> +	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu) && match_pkg(c, o))
>
> Do this:
>
> -	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> +	if (!topology_same_node(c, o) && match_pkg(c, o))
>
> Looking at commit 316ad248307f ("sched/x86: Rewrite set_cpu_sibling_map()"),
> which reworked topology WARNINGs, the intent was to "make sure to
> only warn when the check changes the end result".
>
> This check doesn't change the end result. It returns false directly,
> and if it were bypassed completely, it would still return false with
> a WARNING.

I'm not following the argument; we explicitly *do* modify the end result
for those SNC caches.

Also, by doing what you propose, we fail to get a warning if/when AMD
decides to do 'funny' things. Suppose AMD too thinks this is a swell
idea, but they have subtly different cache behaviour (just for giggles);
then it all goes undetected, which would be bad.

> If we add that additional match_pkg() check, is removing the WARNING
> for all cases possible?

How many parts had that Intel Cluster-on-Die thing? Seeing how all the
new parts have the SNC crud, that seems like a finite list. Wikipedia
seems to suggest Haswell and Broadwell were the only ones with COD;
Skylake and later have SNC.
How's something like this then (needs splitting into multiple patches I
suppose):

---
 arch/x86/kernel/smpboot.c | 76 +++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 117e24fbfd8a..cfe23badf9a3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -458,8 +458,31 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if ((c->phys_proc_id == o->phys_proc_id) &&
+	    (c->cpu_die_id == o->cpu_die_id))
+		return true;
+	return false;
+}
+
+/*
+ * Unlike the other levels, we do not enforce keeping a
+ * multicore group inside a NUMA node. If this happens, we will
+ * discard the MC level of the topology later.
+ */
+static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if (c->phys_proc_id == o->phys_proc_id)
+		return true;
+	return false;
+}
+
 /*
- * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
+ * Define intel_cod_cpu[] for Intel COD (Cluster-on-Die) CPUs.
+ *
+ * Any Intel CPU that has multiple nodes per package and doesn't match this
+ * will have the newer SNC (Sub-NUMA Cluster).
  *
  * These are Intel CPUs that enumerate an LLC that is shared by
  * multiple NUMA nodes. The LLC on these systems is shared for
@@ -473,14 +496,18 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
  * NUMA nodes).
  */
-static const struct x86_cpu_id snc_cpu[] = {
-	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, NULL),
+static const struct x86_cpu_id intel_cod_cpu[] = {
+	X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, 0),	/* COD */
+	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, 0),	/* COD */
+	X86_MATCH_INTEL_FAM6_MODEL(ANY, 1),		/* SNC */
 	{}
 };
 
 static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
+	const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu);
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+	bool intel_snc = id && id->driver_data;
 
 	/* Do not match if we do not have a valid APICID for cpu: */
 	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
@@ -495,32 +522,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	 * means 'c' does not share the LLC of 'o'. This will be
 	 * reflected to userspace.
 	 */
-	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+	if (match_pkg(c, o) && !topology_same_node(c, o) && intel_snc)
 		return false;
 
 	return topology_sane(c, o, "llc");
 }
 
-/*
- * Unlike the other levels, we do not enforce keeping a
- * multicore group inside a NUMA node. If this happens, we will
- * discard the MC level of the topology later.
- */
-static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
-	if (c->phys_proc_id == o->phys_proc_id)
-		return true;
-	return false;
-}
-
-static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
-	if ((c->phys_proc_id == o->phys_proc_id) &&
-	    (c->cpu_die_id == o->cpu_die_id))
-		return true;
-	return false;
-}
-
 #if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
 
 static inline int x86_sched_itmt_flags(void)
@@ -592,14 +599,23 @@ void set_cpu_sibling_map(int cpu)
 	for_each_cpu(i, cpu_sibling_setup_mask) {
 		o = &cpu_data(i);
 
+		if (match_pkg(c, o) && !topology_same_node(c, o))
+			x86_has_numa_in_package = true;
+
 		if ((i == cpu) || (has_smt && match_smt(c, o)))
 			link_mask(topology_sibling_cpumask, cpu, i);
 
 		if ((i == cpu) || (has_mp && match_llc(c, o)))
 			link_mask(cpu_llc_shared_mask, cpu, i);
+
+		if ((i == cpu) || (has_mp && match_die(c, o)))
+			link_mask(topology_die_cpumask, cpu, i);
 	}
 
+	threads = cpumask_weight(topology_sibling_cpumask(cpu));
+	if (threads > __max_smt_threads)
+		__max_smt_threads = threads;
+
 	/*
 	 * This needs a separate iteration over the cpus because we rely on all
 	 * topology_sibling_cpumask links to be set-up.
@@ -613,8 +629,7 @@ void set_cpu_sibling_map(int cpu)
 		/*
 		 * Does this new cpu bringup a new core?
 		 */
-		if (cpumask_weight(
-		    topology_sibling_cpumask(cpu)) == 1) {
+		if (threads == 1) {
 			/*
 			 * for each core in package, increment
 			 * the booted_cores for this new cpu
@@ -631,16 +646,7 @@
 			} else if (i != cpu && !c->booted_cores)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
-		if (match_pkg(c, o) && !topology_same_node(c, o))
-			x86_has_numa_in_package = true;
-
-		if ((i == cpu) || (has_mp && match_die(c, o)))
-			link_mask(topology_die_cpumask, cpu, i);
 	}
-
-	threads = cpumask_weight(topology_sibling_cpumask(cpu));
-	if (threads > __max_smt_threads)
-		__max_smt_threads = threads;
 }
 
 /* maps the cpu to the sched domain representing multi-core */