Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp418022ybt; Wed, 24 Jun 2020 02:31:22 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwskvln8oJ9vHPjZFNDDivmgYdXtTJxMywz5/3q0+5/MQvXs3Ss5JCukz4EFCyj7be3G+qS X-Received: by 2002:aa7:cd6d:: with SMTP id ca13mr24786459edb.280.1592991082614; Wed, 24 Jun 2020 02:31:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1592991082; cv=none; d=google.com; s=arc-20160816; b=F4TTO7wcK5I1H55t7QXpmtv0SnpkuXyUEKCn1Z4A2J6m/mcfQexFWQUYtWnqQaa8rU ZnwUVwBH6B+RroQSVofm6wApfFD0r9nS6F1id2aqSo/lKYDHjzoaTwCMfnAFllKyq8zg +IrRNmzCdXx3EWzbRAtVfQCDnS4xOtUThRhePPxRFTGfqFTmqxblj38dzkeSLOtjJFCO YqQlHmFeV6SpYgse4MLng+kFXjyF4f4uuXlfk+/3rryx6jLcTmW5S+9d2iq91nE05UlC u4cCAkG2v7BfgUJWIHjshUPRgDp1v7ZRGZMLDQXfg6tfjGP4Q3gmQwOvOGnTC7umWCVV tNag== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from; bh=UEKufXFeNvENe3PxDRY96uB9op5MEtyN5vFiSQOfsKA=; b=txL3ugu2g0awmS3QejSUiMZiU2lbkQIeZ+PdMsw6D73lX/8SuIoPnhNXM+jkJsUDzU 6daQHnya31GRJazYdCDxjvQ0qABuhrye/to8fRqR6Spx+R4spdxucor7pk0cIBEZUmju q6LWG2yP1A04TPNvXST/5hjmyIo9+EmOdF6q894O/i41i0oJldVzabGSc2up+vWMLzdG B+urKn06RxJ2gvNItC4YPy3bZ2oOx949wE9sljXE85OMS04r1cac6QsKvQAeYPgGUE9P Ab60NRgnZMV5rSKue3m/rqYPYSeQfETOweqCFxADprLmQRpl1suT8S2EH2WLcZ8GiwL8 cfzw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id y2si12899027edu.320.2020.06.24.02.30.59; Wed, 24 Jun 2020 02:31:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2389600AbgFXJ3p (ORCPT + 99 others); Wed, 24 Jun 2020 05:29:45 -0400 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5]:31652 "EHLO mx0a-001b2d01.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S2388336AbgFXJ3l (ORCPT ); Wed, 24 Jun 2020 05:29:41 -0400 Received: from pps.filterd (m0098419.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.16.0.42/8.16.0.42) with SMTP id 05O94A7L183574; Wed, 24 Jun 2020 05:29:28 -0400 Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com with ESMTP id 31uwyehxx2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 24 Jun 2020 05:29:28 -0400 Received: from m0098419.ppops.net (m0098419.ppops.net [127.0.0.1]) by pps.reinject (8.16.0.36/8.16.0.36) with SMTP id 05O94RaN184884; Wed, 24 Jun 2020 05:29:28 -0400 Received: from ppma01fra.de.ibm.com (46.49.7a9f.ip4.static.sl-reverse.com [159.122.73.70]) by mx0b-001b2d01.pphosted.com with ESMTP id 31uwyehxwd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 24 Jun 2020 05:29:27 -0400 Received: from pps.filterd (ppma01fra.de.ibm.com [127.0.0.1]) by ppma01fra.de.ibm.com (8.16.0.42/8.16.0.42) with SMTP id 05O9JuKP021164; Wed, 24 Jun 2020 09:29:26 GMT Received: from b06cxnps3074.portsmouth.uk.ibm.com (d06relay09.portsmouth.uk.ibm.com [9.149.109.194]) by ppma01fra.de.ibm.com with ESMTP id 31uurur89p-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Wed, 24 Jun 2020 09:29:25 +0000 Received: from d06av21.portsmouth.uk.ibm.com (d06av21.portsmouth.uk.ibm.com [9.149.105.232]) by b06cxnps3074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 05O9TNO59503170 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 24 Jun 2020 09:29:23 GMT Received: from d06av21.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B20E25206B; Wed, 24 Jun 2020 09:29:23 +0000 (GMT) Received: from srikart450.in.ibm.com (unknown [9.102.29.235]) by d06av21.portsmouth.uk.ibm.com (Postfix) with ESMTP id 335C752057; Wed, 24 Jun 2020 09:29:17 +0000 (GMT) From: Srikar Dronamraju To: Andrew Morton Cc: Srikar Dronamraju , linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko , Mel Gorman , Vlastimil Babka , "Kirill A. Shutemov" , Christopher Lameter , Michael Ellerman , Linus Torvalds , Gautham R Shenoy , Satheesh Rajendran , David Hildenbrand Subject: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline Date: Wed, 24 Jun 2020 14:58:46 +0530 Message-Id: <20200624092846.9194-4-srikar@linux.vnet.ibm.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20200624092846.9194-1-srikar@linux.vnet.ibm.com> References: <20200624092846.9194-1-srikar@linux.vnet.ibm.com> X-TM-AS-GCONF: 00 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:6.0.216,18.0.687 definitions=2020-06-24_04:2020-06-24,2020-06-24 signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 spamscore=0 suspectscore=0 impostorscore=0 mlxlogscore=999 mlxscore=0 priorityscore=1501 phishscore=0 malwarescore=0 cotscore=-2147483648 lowpriorityscore=0 bulkscore=0 clxscore=1015 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2004280000 definitions=main-2006240063 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Currently Linux kernel with CONFIG_NUMA on a system with multiple possible nodes, marks node 0 as online at boot. However in practice, there are systems which have node 0 as memoryless and cpuless. This can cause numa_balancing to be enabled on systems with only one node with memory and CPUs. The existence of this dummy node which is cpuless and memoryless node can confuse users/scripts looking at output of lscpu / numactl. By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is always online. v5.8-rc2 available: 2 nodes (0,2) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31490 MB node distances: node 0 2 0: 10 20 2: 20 10 proc and sys files ------------------ /sys/devices/system/node/online: 0,2 /proc/sys/kernel/numa_balancing: 1 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory: 2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 v5.8-rc2 + patch ------------------ available: 1 nodes (2) node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31487 MB node distances: node 2 2: 10 proc and sys files ------------------ /sys/devices/system/node/online: 2 /proc/sys/kernel/numa_balancing: 0 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory: 2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 Note: On Powerpc, cpu_to_node of possible but not present cpus would previously return 0. Hence this commit depends on commit ("powerpc/numa: Set numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id queried from vphn"). Without the 2 commits, Powerpc system might crash. 1. User space applications like Numactl, lscpu, that parse the sysfs tend to believe there is an extra online node. This tends to confuse users and applications. Other user space applications start believing that system was not able to use all the resources (i.e missing resources) or the system was not setup correctly. 2. Also existence of dummy node also leads to inconsistent information. The number of online nodes is inconsistent with the information in the device-tree and resource-dump 3. When the dummy node is present, single node non-Numa systems end up showing up as NUMA systems and numa_balancing gets enabled. This will mean we take the hit from the unnecessary numa hinting faults. Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Cc: Gautham R Shenoy Cc: Satheesh Rajendran Cc: David Hildenbrand Signed-off-by: Srikar Dronamraju --- Changelog v4:->v5: - rebased to v5.8-rc2 link v4: http://lore.kernel.org/lkml/20200512132937.19295-1-srikar@linux.vnet.ibm.com/t/#u Changelog v1:->v2: - Rebased to v5.7-rc3 Link v2: https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-srikar@linux.vnet.ibm.com/t/#u mm/page_alloc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 48eb0f1410d4..5187664558e1 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -117,8 +117,10 @@ EXPORT_SYMBOL(latent_entropy); */ nodemask_t node_states[NR_NODE_STATES] __read_mostly = { [N_POSSIBLE] = NODE_MASK_ALL, +#ifdef CONFIG_NUMA + [N_ONLINE] = NODE_MASK_NONE, +#else [N_ONLINE] = { { [0] = 1UL } }, -#ifndef CONFIG_NUMA [N_NORMAL_MEMORY] = { { [0] = 1UL } }, #ifdef CONFIG_HIGHMEM [N_HIGH_MEMORY] = { { [0] = 1UL } }, -- 2.18.1