From: Valentin Schneider
To: Srikar Dronamraju
Cc: Ingo Molnar, Peter Zijlstra, Michael Ellerman, LKML, Mel Gorman,
	Rik van Riel, Thomas Gleixner, Vincent Guittot, Dietmar Eggemann,
	linuxppc-dev@lists.ozlabs.org, Nathan Lynch, Gautham R Shenoy,
	Geetika Moolchandani, Laurent Dufour
Subject: Re: [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes
In-Reply-To: <20210712124856.GA3836887@linux.vnet.ibm.com>
References: <20210701041552.112072-1-srikar@linux.vnet.ibm.com>
	<20210701041552.112072-2-srikar@linux.vnet.ibm.com>
	<875yxu85wi.mognet@arm.com>
	<20210712124856.GA3836887@linux.vnet.ibm.com>
Date: Tue, 13 Jul 2021 17:32:14 +0100
Message-ID: <87zguqmay9.mognet@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/07/21 18:18, Srikar Dronamraju wrote:
> Hi Valentin,
>
>> On 01/07/21 09:45, Srikar Dronamraju wrote:
>> > @@ -1891,12 +1894,30 @@ void sched_init_numa(void)
>> >  void sched_domains_numa_masks_set(unsigned int cpu)
>> >  {
>>
>> Hmph, so we're playing games with masks of offline nodes - is that really
>> necessary? Your modification of sched_init_numa() still scans all of the
>> nodes (regardless of their online status) to build the distance map, and
>> that is never updated (sched_init_numa() is pretty much an __init
>> function).
>>
>> So AFAICT this is all to cope with topology_span_sane() not applying
>> 'cpu_map' to its masks. That seemed fine to me back when I wrote it, but in
>> light of having bogus distance values for offline nodes, not so much...
>>
>> What about the below instead?
>>
>> ---
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index b77ad49dc14f..c2d9caad4aa6 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -2075,6 +2075,7 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
>>  static bool topology_span_sane(struct sched_domain_topology_level *tl,
>>  			       const struct cpumask *cpu_map, int cpu)
>>  {
>> +	struct cpumask *intersect = sched_domains_tmpmask;
>>  	int i;
>>
>>  	/* NUMA levels are allowed to overlap */
>> @@ -2090,14 +2091,17 @@ static bool topology_span_sane(struct sched_domain_topology_level *tl,
>>  	for_each_cpu(i, cpu_map) {
>>  		if (i == cpu)
>>  			continue;
>> +
>>  		/*
>> -		 * We should 'and' all those masks with 'cpu_map' to exactly
>> -		 * match the topology we're about to build, but that can only
>> -		 * remove CPUs, which only lessens our ability to detect
>> -		 * overlaps
>> +		 * We shouldn't have to bother with cpu_map here, unfortunately
>> +		 * some architectures (powerpc says hello) have to deal with
>> +		 * offline NUMA nodes reporting bogus distance values. This can
>> +		 * lead to funky NODE domain spans, but since those are offline
>> +		 * we can mask them out.
>>  		 */
>> +		cpumask_and(intersect, tl->mask(cpu), tl->mask(i));
>>  		if (!cpumask_equal(tl->mask(cpu), tl->mask(i)) &&
>> -		    cpumask_intersects(tl->mask(cpu), tl->mask(i)))
>> +		    cpumask_intersects(intersect, cpu_map))
>>  			return false;
>>  	}
>>
>
> Unfortunately this is not helping.
> I tried this patch alone and also with the 2/2 patch of this series where
> we update/fill fake topology numbers. However both cases are still failing.
>

Thanks for testing it. Now, let's take the example from your cover letter:

  node distances:
  node   0   1   2   3   4   5   6   7
    0:  10  20  40  40  40  40  40  40
    1:  20  10  40  40  40  40  40  40
    2:  40  40  10  20  40  40  40  40
    3:  40  40  20  10  40  40  40  40
    4:  40  40  40  40  10  20  40  40
    5:  40  40  40  40  20  10  40  40
    6:  40  40  40  40  40  40  10  20
    7:  40  40  40  40  40  40  20  10

But the system boots with just nodes 0 and 1, thus only this distance
matrix is valid:

  node   0   1
    0:  10  20
    1:  20  10

topology_span_sane() is going to use tl->mask(cpu), and as you reported the
NODE topology level should cause issues. Let's assume all offline nodes say
they're a distance of 10 away from everyone else, and that we have one CPU
per node. This would give us:

  NODE->mask(0) == 0,2-7
  NODE->mask(1) == 1-7

The intersection is 2-7, so we'll trigger the WARN_ON().

Now, with the above snippet, we'll check whether that intersection covers
any online CPU. For sched_init_domains(), cpu_map is cpu_active_mask, so
we'd end up with an empty intersection and we shouldn't warn - that's the
theory at least.

Looking at sd_numa_mask(), I think there's a bug with topology_span_sane():
it doesn't run in the right place wrt where sched_domains_curr_level is
updated. Could you try the below on top of the previous snippet?

If that doesn't help, could you share the node distances / topology masks
that lead to the WARN_ON()?

Thanks.
---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b77ad49dc14f..cda69dfa4065 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1516,13 +1516,6 @@ sd_init(struct sched_domain_topology_level *tl,
 	int sd_id, sd_weight, sd_flags = 0;
 	struct cpumask *sd_span;

-#ifdef CONFIG_NUMA
-	/*
-	 * Ugly hack to pass state to sd_numa_mask()...
-	 */
-	sched_domains_curr_level = tl->numa_level;
-#endif
-
 	sd_weight = cpumask_weight(tl->mask(cpu));

 	if (tl->sd_flags)
@@ -2131,7 +2124,12 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	sd = NULL;
 	for_each_sd_topology(tl) {
-
+#ifdef CONFIG_NUMA
+		/*
+		 * Ugly hack to pass state to sd_numa_mask()...
+		 */
+		sched_domains_curr_level = tl->numa_level;
+#endif
 		if (WARN_ON(!topology_span_sane(tl, cpu_map, i)))
 			goto error;