From: Weixi Zhu
Subject: [RFC PATCH 1/6]
 mm/gmem: add heterogeneous NUMA node
Date: Tue, 28 Nov 2023 20:50:20 +0800
Message-ID: <20231128125025.4449-2-weixi.zhu@huawei.com>
In-Reply-To: <20231128125025.4449-1-weixi.zhu@huawei.com>
References: <20231128125025.4449-1-weixi.zhu@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds a new NUMA node state, named N_HETEROGENEOUS. It is used
to identify heterogeneous NUMA (hNUMA) nodes. Note that an hNUMA node may
not be directly accessible by the CPU.

Each hNUMA node is identified by a NUMA id. This can be extended to
provide a NUMA topology that includes device-local DRAM, without requiring
a cache-coherent bus between the CPU and the device-local DRAM.
Furthermore, this allows an application to issue memory hints that bind
memory to specific hNUMA nodes.
Signed-off-by: Weixi Zhu
---
 drivers/base/node.c      |  6 ++++
 include/linux/gmem.h     | 19 ++++++++++
 include/linux/nodemask.h | 10 ++++++
 init/main.c              |  2 ++
 mm/Kconfig               | 14 ++++++++
 mm/Makefile              |  1 +
 mm/gmem.c                | 78 ++++++++++++++++++++++++++++++++++++++++
 mm/page_alloc.c          |  3 ++
 8 files changed, 133 insertions(+)
 create mode 100644 include/linux/gmem.h
 create mode 100644 mm/gmem.c

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 493d533f8375..aa4d2ca266aa 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -928,6 +928,9 @@ static struct node_attr node_state_attr[] = {
 	[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
 	[N_GENERIC_INITIATOR] = _NODE_ATTR(has_generic_initiator,
 					   N_GENERIC_INITIATOR),
+#ifdef CONFIG_GMEM
+	[N_HETEROGENEOUS] = _NODE_ATTR(has_hetero_memory, N_HETEROGENEOUS),
+#endif
 };
 
 static struct attribute *node_state_attrs[] = {
@@ -940,6 +943,9 @@ static struct attribute *node_state_attrs[] = {
 	&node_state_attr[N_MEMORY].attr.attr,
 	&node_state_attr[N_CPU].attr.attr,
 	&node_state_attr[N_GENERIC_INITIATOR].attr.attr,
+#ifdef CONFIG_GMEM
+	&node_state_attr[N_HETEROGENEOUS].attr.attr,
+#endif
 	NULL
 };
 
diff --git a/include/linux/gmem.h b/include/linux/gmem.h
new file mode 100644
index 000000000000..fff877873557
--- /dev/null
+++ b/include/linux/gmem.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generalized Memory Management.
+ *
+ * Copyright (C) 2023- Huawei, Inc.
+ * Author: Weixi Zhu
+ *
+ */
+#ifndef _GMEM_H
+#define _GMEM_H
+
+#ifdef CONFIG_GMEM
+/* h-NUMA topology */
+void __init hnuma_init(void);
+#else
+static inline void hnuma_init(void) {}
+#endif
+
+#endif /* _GMEM_H */
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 8d07116caaf1..66e4640a52ba 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -407,6 +407,9 @@ enum node_states {
 	N_MEMORY,		/* The node has memory(regular, high, movable) */
 	N_CPU,		/* The node has one or more cpus */
 	N_GENERIC_INITIATOR,	/* The node has one or more Generic Initiators */
+#ifdef CONFIG_GMEM
+	N_HETEROGENEOUS,	/* The node has heterogeneous memory */
+#endif
 	NR_NODE_STATES
 };
 
@@ -536,6 +539,13 @@ static inline int node_random(const nodemask_t *maskp)
 #define for_each_node(node)	   for_each_node_state(node, N_POSSIBLE)
 #define for_each_online_node(node) for_each_node_state(node, N_ONLINE)
 
+#ifdef CONFIG_GMEM
+/* For h-NUMA topology */
+#define hnode_map		node_states[N_HETEROGENEOUS]
+#define num_hnodes()		num_node_state(N_HETEROGENEOUS)
+#define for_each_hnode(node)	for_each_node_state(node, N_HETEROGENEOUS)
+#endif
+
 /*
  * For nodemask scratch area.
  * NODEMASK_ALLOC(type, name) allocates an object with a specified type and
diff --git a/init/main.c b/init/main.c
index e24b0780fdff..12dfb5b63d51 100644
--- a/init/main.c
+++ b/init/main.c
@@ -100,6 +100,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -901,6 +902,7 @@ void start_kernel(void)
 	setup_per_cpu_areas();
 	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
 	boot_cpu_hotplug_init();
+	hnuma_init();
 
 	pr_notice("Kernel command line: %s\n", saved_command_line);
 	/* parameters may set static keys */
diff --git a/mm/Kconfig b/mm/Kconfig
index 89971a894b60..1a7d8194513c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1270,6 +1270,20 @@ config LOCK_MM_AND_FIND_VMA
 	bool
 	depends on !STACK_GROWSUP
 
+config GMEM
+	bool "generalized memory management for external memory devices"
+	depends on (ARM64 || X86_64) && MMU && TRANSPARENT_HUGEPAGE
+	select ARCH_USES_HIGH_VMA_FLAGS
+	default y
+	help
+	  Supporting GMEM (generalized memory management) for external memory
+	  devices
+
+	  GMEM extends Linux MM to share its machine-independent MM code. Only
+	  high-level interface is provided for device drivers. This prevents
+	  accelerator drivers from reinventing the wheel, but relies on drivers to
+	  implement their hardware-dependent functions declared by GMEM.
+
 source "mm/damon/Kconfig"
 
 endmenu
diff --git a/mm/Makefile b/mm/Makefile
index 33873c8aedb3..f48ea2eb4a44 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -138,3 +138,4 @@ obj-$(CONFIG_IO_MAPPING) += io-mapping.o
 obj-$(CONFIG_HAVE_BOOTMEM_INFO_NODE) += bootmem_info.o
 obj-$(CONFIG_GENERIC_IOREMAP) += ioremap.o
 obj-$(CONFIG_SHRINKER_DEBUG) += shrinker_debug.o
+obj-$(CONFIG_GMEM) += gmem.o
diff --git a/mm/gmem.c b/mm/gmem.c
new file mode 100644
index 000000000000..767eb070b22e
--- /dev/null
+++ b/mm/gmem.c
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Generalized Memory Management.
+ *
+ * Copyright (C) 2023- Huawei, Inc.
+ * Author: Weixi Zhu
+ *
+ */
+#include
+#include
+
+DEFINE_SPINLOCK(hnode_lock);
+
+struct hnode {
+	unsigned int id;
+	struct gm_dev *dev;
+	struct xarray pages;
+};
+
+struct hnode *hnodes[MAX_NUMNODES];
+
+static bool is_hnode(int node)
+{
+	return !node_isset(node, node_possible_map) &&
+		node_isset(node, hnode_map);
+}
+
+static bool is_hnode_allowed(int node)
+{
+	return is_hnode(node) && node_isset(node, current->mems_allowed);
+}
+
+static struct hnode *get_hnode(unsigned int hnid)
+{
+	return hnodes[hnid];
+}
+
+void __init hnuma_init(void)
+{
+	unsigned int node;
+
+	for_each_node(node)
+		node_set(node, hnode_map);
+}
+
+static unsigned int alloc_hnode_id(void)
+{
+	unsigned int node;
+
+	spin_lock(&hnode_lock);
+	node = first_unset_node(hnode_map);
+	node_set(node, hnode_map);
+	spin_unlock(&hnode_lock);
+
+	return node;
+}
+
+static void free_hnode_id(unsigned int nid)
+{
+	node_clear(nid, hnode_map);
+}
+
+static void hnode_init(struct hnode *hnode, unsigned int hnid,
+		       struct gm_dev *dev)
+{
+	hnodes[hnid] = hnode;
+	hnodes[hnid]->id = hnid;
+	hnodes[hnid]->dev = dev;
+	xa_init(&hnodes[hnid]->pages);
+}
+
+static void hnode_deinit(unsigned int hnid, struct gm_dev *dev)
+{
+	hnodes[hnid]->id = 0;
+	hnodes[hnid]->dev = NULL;
+	xa_destroy(&hnodes[hnid]->pages);
+	hnodes[hnid] = NULL;
+}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 733732e7e0ba..a785b62a1542 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -192,6 +192,9 @@ EXPORT_SYMBOL(latent_entropy);
 nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 	[N_POSSIBLE] = NODE_MASK_ALL,
 	[N_ONLINE] = { { [0] = 1UL } },
+#ifdef CONFIG_GMEM
+	[N_HETEROGENEOUS] = NODE_MASK_NONE,
+#endif
 #ifndef CONFIG_NUMA
 	[N_NORMAL_MEMORY] = { { [0] = 1UL } },
 #ifdef CONFIG_HIGHMEM
-- 
2.25.1
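With CONFIG_GMEM=y, the drivers/base/node.c hunk registers N_HETEROGENEOUS next to the other node-state attributes, which the node subsystem exposes under /sys/devices/system/node/. The mask should therefore appear as /sys/devices/system/node/has_hetero_memory in the usual nodelist format (for example "0-1,4"). As we read the patch, hnuma_init() also copies every possible node into that mask, so a reader has to subtract /sys/devices/system/node/possible, which mirrors what is_hnode() does in the kernel. The userspace C sketch below assumes a kernel built with this patch; SKETCH_MAX_NODES and the helper names are hypothetical, and this is an illustration rather than an interface the series defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bound for this sketch; the kernel's MAX_NUMNODES is
 * 1 << CONFIG_NODES_SHIFT and varies by configuration. */
#define SKETCH_MAX_NODES 1024

/*
 * Parse a sysfs nodelist such as "0-3,8\n" (the "%*pbl" format) into
 * mask[]. Returns the number of nodes set, or -1 on malformed input.
 */
static int parse_nodelist(const char *s, bool *mask)
{
	int count = 0;

	memset(mask, 0, SKETCH_MAX_NODES * sizeof(*mask));
	while (*s != '\0' && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0 || lo >= SKETCH_MAX_NODES)
			return -1;
		s = end;
		if (*s == '-') {		/* range "lo-hi" */
			s++;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo || hi >= SKETCH_MAX_NODES)
				return -1;
			s = end;
		}
		for (long n = lo; n <= hi; n++) {
			if (!mask[n]) {
				mask[n] = true;
				count++;
			}
		}
		if (*s == ',')
			s++;
		else if (*s != '\0' && *s != '\n')
			return -1;
	}
	return count;
}

/* Read the first line of a sysfs file into buf; returns false on error. */
static bool read_sysfs_line(const char *path, char *buf, size_t size)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (f == NULL)
		return false;
	ok = fgets(buf, (int)size, f) != NULL;
	fclose(f);
	return ok;
}

/*
 * Mirrors is_hnode() in mm/gmem.c: an id counts as an hNUMA node only if
 * it is in the N_HETEROGENEOUS list but not in the possible-node list.
 * Writes up to max_out ids to out[]; returns the count, or -1 on error.
 */
static int list_hnodes(const char *hetero_list, const char *possible_list,
		       int *out, int max_out)
{
	bool hetero[SKETCH_MAX_NODES], possible[SKETCH_MAX_NODES];
	int n = 0;

	assert(out != NULL || max_out == 0);
	if (parse_nodelist(hetero_list, hetero) < 0 ||
	    parse_nodelist(possible_list, possible) < 0)
		return -1;
	for (int node = 0; node < SKETCH_MAX_NODES && n < max_out; node++)
		if (hetero[node] && !possible[node])
			out[n++] = node;
	return n;
}
```

On a patched kernel, one would fill two buffers with read_sysfs_line() from has_hetero_memory and possible, then pass them to list_hnodes(); on a kernel without the patch the first file is absent and read_sysfs_line() returns false. With only this patch applied nothing calls alloc_hnode_id(), so the resulting list should be empty.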
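The id scheme in mm/gmem.c reserves every possible (CPU-side) node id in hnode_map at boot, then hands out the lowest remaining id as an hNUMA node id. The following is a userspace model of that scheme, not kernel code: a pthread mutex stands in for the spinlock, bool arrays stand in for nodemask_t, and the model_* names and MODEL_MAX_NODES are hypothetical. It adds two guards the patch does not have: it reports exhaustion instead of setting an out-of-range bit, and it refuses to free a CPU-side id.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for MAX_NUMNODES in this sketch. */
#define MODEL_MAX_NODES 64

static bool possible_map[MODEL_MAX_NODES]; /* models node_possible_map */
static bool hnode_map[MODEL_MAX_NODES];    /* models node_states[N_HETEROGENEOUS] */
static pthread_mutex_t hnode_lock = PTHREAD_MUTEX_INITIALIZER; /* models hnode_lock */

/* Models hnuma_init(): ids 0..nr_possible-1 are CPU-side nodes and are
 * reserved in hnode_map so they are never handed out as hNUMA ids. */
static void model_hnuma_init(int nr_possible)
{
	pthread_mutex_lock(&hnode_lock);
	for (int node = 0; node < MODEL_MAX_NODES; node++) {
		possible_map[node] = node < nr_possible;
		hnode_map[node] = possible_map[node];
	}
	pthread_mutex_unlock(&hnode_lock);
}

/* Models is_hnode(): set in hnode_map but not a possible node. */
static bool model_is_hnode(int node)
{
	bool ret;

	if (node < 0 || node >= MODEL_MAX_NODES)
		return false;
	pthread_mutex_lock(&hnode_lock);
	ret = !possible_map[node] && hnode_map[node];
	pthread_mutex_unlock(&hnode_lock);
	return ret;
}

/*
 * Models alloc_hnode_id(): claim the first unset id under the lock, so
 * the search and the set cannot interleave with another allocator. This
 * sketch returns -1 when every id is taken; in the kernel,
 * first_unset_node() returns MAX_NUMNODES in that case.
 */
static int model_alloc_hnode_id(void)
{
	int node;

	pthread_mutex_lock(&hnode_lock);
	for (node = 0; node < MODEL_MAX_NODES; node++)
		if (!hnode_map[node])
			break;
	if (node < MODEL_MAX_NODES)
		hnode_map[node] = true;
	else
		node = -1;
	pthread_mutex_unlock(&hnode_lock);
	return node;
}

/*
 * Models free_hnode_id(). Plain bools are not atomic, so this model takes
 * the lock (the kernel's node_clear() uses the atomic clear_bit()). It
 * also ignores ids that belong to possible (CPU-side) nodes.
 */
static void model_free_hnode_id(int node)
{
	pthread_mutex_lock(&hnode_lock);
	if (node >= 0 && node < MODEL_MAX_NODES && !possible_map[node])
		hnode_map[node] = false;
	pthread_mutex_unlock(&hnode_lock);
}
```

For example, after model_hnuma_init(2) the first two allocations return ids 2 and 3, and a freed id is reused before any higher one, matching the first_unset_node() policy.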