From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-doc@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-kernel@vger.kernel.org, akpm@linux-foundation.org, arnd@arndb.de,
 tglx@linutronix.de, luto@kernel.org, mingo@redhat.com, bp@alien8.de,
 dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
 mhocko@kernel.org, tj@kernel.org, ying.huang@intel.com,
 gregory.price@memverge.com, corbet@lwn.net, rakie.kim@sk.com,
 hyeongtak.ji@sk.com, honggyu.kim@sk.com, vtavarespetr@micron.com,
 peterz@infradead.org, jgroves@micron.com, ravis.opensrc@micron.com,
 sthanneeru@micron.com, emirakhur@micron.com, Hasan.Maruf@amd.com,
 seungjun.ha@samsung.com
Subject: [PATCH v2 01/11] mm/mempolicy: implement the sysfs-based weighted_interleave interface
Date: Sat, 9 Dec 2023 01:59:21 -0500
Message-Id: <20231209065931.3458-2-gregory.price@memverge.com>
In-Reply-To: <20231209065931.3458-1-gregory.price@memverge.com>
References: <20231209065931.3458-1-gregory.price@memverge.com>

From: Rakie Kim

This patch provides a way to set interleave weight information under
sysfs at /sys/kernel/mm/mempolicy/weighted_interleave/nodeN/weight.

The sysfs structure is designed as follows.

  $ tree /sys/kernel/mm/mempolicy/
  /sys/kernel/mm/mempolicy/ [1]
  ├── possible_nodes [2]
  └── weighted_interleave [3]
      ├── node0 [4]
      │   └── weight [5]
      └── node1
          └── weight

Each file above can be explained as follows.

[1] mm/mempolicy: configuration interface for the mempolicy subsystem

[2] possible_nodes: list of possible nodes

    An informational interface which may be used across multiple memory
    policy configurations. It lists the `possible` nodes for which
    configurations may be required. A `possible` node is one which has
    been reserved by the kernel at boot, but may or may not be online.

    For example, the weighted_interleave policy generates a nodeN/
    directory for each possible node N.

[3] weighted_interleave/: configuration interface for the weighted
    interleave policy

[4] weighted_interleave/nodeN/: per-node configuration for possible node N

[5] weighted_interleave/nodeN/weight: interleave weight for nodeN
    (1-255, default 1)

Signed-off-by: Rakie Kim
Signed-off-by: Honggyu Kim
Co-developed-by: Gregory Price
Signed-off-by: Gregory Price
Co-developed-by: Hyeongtak Ji
Signed-off-by: Hyeongtak Ji
---
 .../ABI/testing/sysfs-kernel-mm-mempolicy     |  18 ++
 ...fs-kernel-mm-mempolicy-weighted-interleave |  21 +++
 mm/mempolicy.c                                | 172 ++++++++++++++++++
 3 files changed, 211 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-mempolicy
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave

diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy
new file mode 100644
index 000000000000..445377dfd232
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy
@@ -0,0 +1,18 @@
+What:		/sys/kernel/mm/mempolicy/
+Date:		December 2023
+Contact:	Linux memory management mailing list
+Description:	Interface for mempolicy configuration
+
+What:		/sys/kernel/mm/mempolicy/possible_nodes
+Date:		December 2023
+Contact:	Linux memory management mailing list
+Description:	The NUMA nodes which may come online
+
+		A possible NUMA node is one which has been reserved by the
+		system at boot, but may or may not be online at runtime.
+
+		Example output:
+
+		=========  ========================================
+		"0,1,2,3"  nodes 0-3 are possibly online or offline
+		=========  ========================================
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
new file mode 100644
index 000000000000..7c19a606725f
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-mempolicy-weighted-interleave
@@ -0,0 +1,21 @@
+What:		/sys/kernel/mm/mempolicy/weighted_interleave/
+Date:		December 2023
+Contact:	Linux memory management mailing list
+Description:	Configuration interface for the weighted interleave policy
+
+What:		/sys/kernel/mm/mempolicy/weighted_interleave/nodeN/
+		/sys/kernel/mm/mempolicy/weighted_interleave/nodeN/weight
+Date:		December 2023
+Contact:	Linux memory management mailing list
+Description:	Weight configuration interface for nodeN
+
+		The interleave weight for a memory node (N). These weights are
+		used by processes which have set their mempolicy to
+		MPOL_WEIGHTED_INTERLEAVE and have opted into global weights by
+		omitting a task-local weight array.
+
+		These weights only affect new allocations; changing them at
+		runtime does not migrate pages that are already allocated.
+
+		Minimum weight: 1
+		Maximum weight: 255
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 10a590ee1c89..28dfae195beb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -131,6 +131,8 @@ static struct mempolicy default_policy = {
 
 static struct mempolicy preferred_node_policy[MAX_NUMNODES];
 
+static u8 iw_table[MAX_NUMNODES];
+
 /**
  * numa_nearest_node - Find nearest node by state
  * @node: Node id to start the search
@@ -3067,3 +3069,173 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
 		p += scnprintf(p, buffer + maxlen - p, ":%*pbl",
 			       nodemask_pr_args(&nodes));
 }
+
+struct iw_node_info {
+	struct kobject kobj;
+	int nid;
+};
+
+static void iw_node_release(struct kobject *kobj)
+{
+	kfree(container_of(kobj, struct iw_node_info, kobj));
+}
+
+static ssize_t node_weight_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	struct iw_node_info *node_info = container_of(kobj, struct iw_node_info,
+						      kobj);
+	return sysfs_emit(buf, "%d\n", iw_table[node_info->nid]);
+}
+
+static ssize_t node_weight_store(struct kobject *kobj,
+				 struct kobj_attribute *attr,
+				 const char *buf, size_t count)
+{
+	u8 weight = 0;
+	struct iw_node_info *node_info = NULL;
+
+	node_info = container_of(kobj, struct iw_node_info, kobj);
+
+	if (kstrtou8(buf, 0, &weight) || !weight)
+		return -EINVAL;
+
+	iw_table[node_info->nid] = weight;
+
+	return count;
+}
+
+static struct kobj_attribute node_weight =
+	__ATTR(weight, 0664, node_weight_show, node_weight_store);
+
+static struct attribute *dst_node_attrs[] = {
+	&node_weight.attr,
+	NULL,
+};
+
+static struct attribute_group dst_node_attr_group = {
+	.attrs = dst_node_attrs,
+};
+
+static const struct attribute_group *dst_node_attr_groups[] = {
+	&dst_node_attr_group,
+	NULL,
+};
+
+static const struct kobj_type dst_node_kobj_ktype = {
+	.release = iw_node_release,
+	.sysfs_ops = &kobj_sysfs_ops,
+	.default_groups = dst_node_attr_groups,
+};
+
+static int add_weight_node(int nid, struct kobject *src_kobj)
+{
+	struct iw_node_info *node_info = NULL;
+	int ret;
+
+	node_info = kzalloc(sizeof(struct iw_node_info), GFP_KERNEL);
+	if (!node_info)
+		return -ENOMEM;
+	node_info->nid = nid;
+
+	kobject_init(&node_info->kobj, &dst_node_kobj_ktype);
+	ret = kobject_add(&node_info->kobj, src_kobj, "node%d", nid);
+	if (ret) {
+		pr_err("kobject_add error [node%d]: %d\n", nid, ret);
+		kobject_put(&node_info->kobj);
+	}
+	return ret;
+}
+
+static int add_weighted_interleave_group(struct kobject *root_kobj)
+{
+	struct kobject *wi_kobj;
+	int nid, err = 0;
+
+	wi_kobj = kobject_create_and_add("weighted_interleave", root_kobj);
+	if (!wi_kobj) {
+		pr_err("failed to create weighted_interleave kobject\n");
+		return -ENOMEM;
+	}
+
+	for_each_node_state(nid, N_POSSIBLE) {
+		err = add_weight_node(nid, wi_kobj);
+		if (err) {
+			pr_err("failed to add sysfs [node%d]\n", nid);
+			break;
+		}
+	}
+	if (err)
+		kobject_put(wi_kobj);
+	return err;
+}
+
+static ssize_t possible_nodes_show(struct kobject *kobj,
+				   struct kobj_attribute *attr, char *buf)
+{
+	int nid, next_nid;
+	int len = 0;
+
+	for_each_node_state(nid, N_POSSIBLE) {
+		len += sysfs_emit_at(buf, len, "%d", nid);
+		next_nid = next_node(nid, node_states[N_POSSIBLE]);
+		if (next_nid < MAX_NUMNODES)
+			len += sysfs_emit_at(buf, len, ",");
+	}
+	len += sysfs_emit_at(buf, len, "\n");
+
+	return len;
+}
+
+static struct kobj_attribute possible_nodes_attr = __ATTR_RO(possible_nodes);
+
+static struct attribute *mempolicy_attrs[] = {
+	&possible_nodes_attr.attr,
+	NULL,
+};
+
+static const struct attribute_group mempolicy_attr_group = {
+	.attrs = mempolicy_attrs,
+};
+
+static void mempolicy_kobj_release(struct kobject *kobj)
+{
+	kfree(kobj);
+}
+
+static const struct kobj_type mempolicy_kobj_ktype = {
+	.release = mempolicy_kobj_release,
+	.sysfs_ops = &kobj_sysfs_ops,
+};
+
+static int __init mempolicy_sysfs_init(void)
+{
+	int err;
+	struct kobject *root_kobj;
+
+	memset(&iw_table, 1, sizeof(iw_table));
+
+	root_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+	if (!root_kobj)
+		return -ENOMEM;
+
+	kobject_init(root_kobj, &mempolicy_kobj_ktype);
+	err = kobject_add(root_kobj, mm_kobj, "mempolicy");
+	if (err) {
+		pr_err("failed to add kobject to the system\n");
+		goto fail_obj;
+	}
+
+	err = sysfs_create_group(root_kobj, &mempolicy_attr_group);
+	if (err) {
+		pr_err("failed to register mempolicy group\n");
+		goto fail_obj;
+	}
+
+	err = add_weighted_interleave_group(root_kobj);
+fail_obj:
+	if (err)
+		kobject_put(root_kobj);
+	return err;
+}
+late_initcall(mempolicy_sysfs_init);
-- 
2.39.1
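The sysfs files described in this patch can be driven from user space with ordinary file I/O. The C sketch below is illustrative only and not part of the patch: the helpers `parse_node_list`, `set_node_weight` and `get_node_weight` are hypothetical names, and the root directory is taken as a parameter (normally /sys/kernel/mm/mempolicy) so the helpers can be pointed elsewhere. It assumes the comma-separated `possible_nodes` format shown in the ABI example and mirrors the kernel's 1-255 check before writing a weight.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the "0,1,2,3\n" format emitted by possible_nodes into ids[].
 * Returns the number of node ids parsed, or -1 on malformed input or
 * if more than max ids are present. (Hypothetical helper.)
 */
int parse_node_list(const char *s, int *ids, int max)
{
	int n = 0;

	while (*s != '\0' && *s != '\n') {
		char *end;
		long v = strtol(s, &end, 10);

		if (end == s || v < 0 || n >= max)
			return -1;
		ids[n++] = (int)v;
		s = end;
		if (*s == ',')
			s++;	/* skip the separator between ids */
	}
	return n;
}

/* Build "<root>/weighted_interleave/nodeN/weight"; 0 on success. */
static int weight_path(char *buf, size_t len, const char *root, int nid)
{
	int n = snprintf(buf, len, "%s/weighted_interleave/node%d/weight",
			 root, nid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a weight for node nid; returns 0 on success, -1 on failure. */
int set_node_weight(const char *root, int nid, unsigned int weight)
{
	char path[256];
	FILE *f;
	int ok;

	/* Same range the kernel accepts: 0 and values above 255 fail. */
	if (weight < 1 || weight > 255 ||
	    weight_path(path, sizeof(path), root, nid))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%u\n", weight) > 0;
	/* stdio buffers the write, so a store error surfaces at fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the weight for node nid; returns the weight, or -1 on failure. */
int get_node_weight(const char *root, int nid)
{
	char path[256];
	FILE *f;
	int w;

	if (weight_path(path, sizeof(path), root, nid))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &w) != 1)
		w = -1;
	fclose(f);
	return w;
}
```

For example, `set_node_weight("/sys/kernel/mm/mempolicy", 1, 3)` would request weight 3 for node1, given write permission on the file. Since `node_weight_store()` already rejects 0 and out-of-range values with -EINVAL, the local range check only avoids a write that is bound to fail; the kernel remains the authority on what is accepted.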
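To illustrate what a per-node interleave weight means, the C sketch below models weighted interleaving as a plain weighted round-robin: with weights 3 and 1 on node0 and node1, three of every four new pages land on node0. This is a simplified model under stated assumptions, not the kernel's MPOL_WEIGHTED_INTERLEAVE allocation path (which is not part of this patch); the function `weighted_interleave_sequence` is hypothetical.

```c
#include <stddef.h>

/*
 * Simplified model (assumption, not the kernel allocator): pages are
 * handed out round-robin over nodes 0..nnodes-1, and node i receives
 * weights[i] consecutive pages before the next node gets its turn.
 * Weights are expected in 1..255, the range the sysfs interface accepts.
 */
void weighted_interleave_sequence(const unsigned char *weights, int nnodes,
				  int *out, size_t npages)
{
	int node = 0;
	int used = 0;	/* pages given to `node` in its current turn */

	for (size_t i = 0; i < npages; i++) {
		out[i] = node;
		if (++used >= weights[node]) {
			used = 0;
			node = (node + 1) % nnodes;
		}
	}
}
```

With weights {3, 1} and eight pages, the sequence is 0,0,0,1,0,0,0,1. Per the ABI text, changing a weight only affects allocations made afterwards; pages already placed are not migrated.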