From: Srikar Dronamraju
To: Ingo Molnar, Peter Zijlstra, Michael Ellerman
Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju, Thomas Gleixner,
    Valentin Schneider, Vincent Guittot, Dietmar Eggemann,
    linuxppc-dev@lists.ozlabs.org, Nathan Lynch, Gautham R Shenoy,
    Geetika Moolchandani, Laurent Dufour
Subject: [PATCH v2 1/2] sched/topology: Skip updating masks for non-online nodes
Date: Thu, 1 Jul 2021 09:45:51 +0530
Message-Id: <20210701041552.112072-2-srikar@linux.vnet.ibm.com>
X-Mailer: git-send-email 2.26.3
In-Reply-To: <20210701041552.112072-1-srikar@linux.vnet.ibm.com>
References: <20210701041552.112072-1-srikar@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Currently the scheduler doesn't check if a node is online before adding
CPUs to the node mask. However, on some architectures, node distance is
only available for nodes that are online. It is not clear how much to
rely on the node distance when one of the nodes is offline.

If said node distance is fake (since one of the nodes is offline) and
the actual node distance is different, then the cpumask of such nodes
will be wrong when the nodes become online.

This can cause topology_span_sane() to throw a warning and the rest of
the topology to not be updated properly.

Resolve this by skipping the cpumask update for nodes that are not
online.

However, by skipping, relevant CPUs may not be set when nodes are
onlined, i.e. when coming up with NUMA masks at a certain NUMA distance,
CPUs that are part of other nodes, which are already online, will not be
part of the NUMA mask. Hence, the first time a CPU is added to the newly
onlined node, add the other CPUs to the numa_mask.

Cc: LKML
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch
Cc: Michael Ellerman
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Valentin Schneider
Cc: Gautham R Shenoy
Cc: Dietmar Eggemann
Cc: Mel Gorman
Cc: Vincent Guittot
Cc: Rik van Riel
Cc: Geetika Moolchandani
Cc: Laurent Dufour
Reported-by: Geetika Moolchandani
Signed-off-by: Srikar Dronamraju
---
Changelog v1->v2:
v1 link: http://lore.kernel.org/lkml/20210520154427.1041031-4-srikar@linux.vnet.ibm.com/t/#u
	Update the NUMA masks whenever the first CPU is added to a cpuless node

 kernel/sched/topology.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b77ad49dc14f..f25dbcab4fd2 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1833,6 +1833,9 @@ void sched_init_numa(void)
 			sched_domains_numa_masks[i][j] = mask;
 
 			for_each_node(k) {
+				if (!node_online(j))
+					continue;
+
 				if (sched_debug() && (node_distance(j, k) != node_distance(k, j)))
 					sched_numa_warn("Node-distance not symmetric");
 
@@ -1891,12 +1894,30 @@ void sched_init_numa(void)
 void sched_domains_numa_masks_set(unsigned int cpu)
 {
 	int node = cpu_to_node(cpu);
-	int i, j;
+	int i, j, empty;
 
+	empty = cpumask_empty(sched_domains_numa_masks[0][node]);
 	for (i = 0; i < sched_domains_numa_levels; i++) {
 		for (j = 0; j < nr_node_ids; j++) {
-			if (node_distance(j, node) <= sched_domains_numa_distance[i])
+			if (!node_online(j))
+				continue;
+
+			if (node_distance(j, node) <= sched_domains_numa_distance[i]) {
 				cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]);
+
+				/*
+				 * We skip updating numa_masks for offline
+				 * nodes. However now that the node is
+				 * finally online, CPUs that were added
+				 * earlier, should now be accommodated into
+				 * newly online node's numa mask.
+				 */
+				if (node != j && empty) {
+					cpumask_or(sched_domains_numa_masks[i][node],
+						   sched_domains_numa_masks[i][node],
+						   sched_domains_numa_masks[0][j]);
+				}
+			}
 		}
 	}
 }
-- 
2.27.0
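
For readers who want to trace the mask update by hand, here is a small user-space sketch of the logic in sched_domains_numa_masks_set() as changed by this patch. It is not kernel code, and everything in it is an assumption made for illustration: cpumasks are modeled as uint64_t bitmasks, and NR_NODES, NR_LEVELS, level_distance[], distance[][], online[], masks[][] and masks_set() are hypothetical stand-ins for nr_node_ids, sched_domains_numa_levels, sched_domains_numa_distance[], node_distance(), node_online(), sched_domains_numa_masks[][] and the kernel function. The node is passed in explicitly instead of being looked up with cpu_to_node().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_NODES  3	/* hypothetical node count */
#define NR_LEVELS 2	/* hypothetical number of NUMA levels */

/* Hypothetical distance per level: level 0 is local, level 1 is remote. */
static const int level_distance[NR_LEVELS] = { 10, 20 };

/* Hypothetical symmetric node distance table. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 20 },
	{ 20, 10, 20 },
	{ 20, 20, 10 },
};

static bool online[NR_NODES];			/* models node_online() */
static uint64_t masks[NR_LEVELS][NR_NODES];	/* one bit per CPU */

/* Models the patched mask update for a CPU being added to 'node'. */
static void masks_set(unsigned int cpu, int node)
{
	/* True when this is the first CPU ever added to 'node'. */
	bool empty = (masks[0][node] == 0);

	for (int i = 0; i < NR_LEVELS; i++) {
		for (int j = 0; j < NR_NODES; j++) {
			/* Offline nodes are skipped entirely. */
			if (!online[j])
				continue;

			if (distance[j][node] <= level_distance[i]) {
				masks[i][j] |= UINT64_C(1) << cpu;

				/*
				 * First CPU on a newly onlined node: pull in
				 * the CPUs already present on node j.
				 */
				if (node != j && empty)
					masks[i][node] |= masks[0][j];
			}
		}
	}
}
```

As a usage example under these assumptions: mark node 0 online and call masks_set(0, 0) and masks_set(1, 0), then mark node 1 online and call masks_set(2, 1). Node 1's masks stay empty while it is offline. When CPU 2 arrives, masks[1][1] picks up CPUs 0 and 1 from node 0 in addition to CPU 2, while masks[0][1] holds only CPU 2.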