From: "Aneesh Kumar K.V"
To: Wei Xu
Cc: "ying.huang@intel.com", Andrew Morton, Greg Thelen, Yang Shi,
    Linux Kernel Mailing List, Jagdish Gediya, Michal Hocko, Tim C Chen,
    Dave Hansen, Alistair Popple, Baolin Wang, Feng Tang, Jonathan Cameron,
    Davidlohr Bueso, Dan Williams, David Rientjes, Linux MM, Brice Goglin,
    Hesham Almatary
Subject: Re: RFC: Memory Tiering Kernel Interfaces (v2)
References: <56b41ce6922ed5f640d9bd46a603fa27576532a9.camel@intel.com>
Date: Thu, 12 May 2022 13:06:10 +0530
Message-ID: <87y1z7jj85.fsf@linux.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Wei Xu writes:

> On Thu, May 12, 2022 at 12:12 AM Aneesh Kumar K V
> wrote:
>>
>> On 5/12/22 12:33 PM, ying.huang@intel.com wrote:
>> > On Wed, 2022-05-11 at 23:22 -0700, Wei Xu wrote:
>> >> Sysfs Interfaces
>> >> ================
>> >>
>> >> * /sys/devices/system/memtier/memtierN/nodelist
>> >>
>> >>   where N = 0, 1, 2 (the kernel supports only 3 tiers for now).
>> >>
>> >>   Format: node_list
>> >>
>> >>   Read-only.  When read, list the memory nodes in the specified tier.
>> >>
>> >>   Tier 0 is the highest tier, while tier 2 is the lowest tier.
>> >>
>> >>   The absolute value of a tier id number has no specific meaning.
>> >>   What matters is the relative order of the tier id numbers.
>> >>
>> >>   When a memory tier has no nodes, the kernel can hide its memtier
>> >>   sysfs files.
>> >>
>> >> * /sys/devices/system/node/nodeN/memtier
>> >>
>> >>   where N = 0, 1, ...
>> >>
>> >>   Format: int or empty
>> >>
>> >>   When read, list the memory tier that the node belongs to.  Its value
>> >>   is empty for a CPU-only NUMA node.
>> >>
>> >>   When written, the kernel moves the node into the specified memory
>> >>   tier if the move is allowed.  The tier assignment of all other nodes
>> >>   are not affected.
>> >>
>> >>   Initially, we can make this interface read-only.
>> >
>> > It seems that "/sys/devices/system/node/nodeN/memtier" has all
>> > information we needed.  Do we really need
>> > "/sys/devices/system/memtier/memtierN/nodelist"?
>> >
>> > That can be gotten via a simple shell command line,
>> >
>> > $ grep . /sys/devices/system/node/nodeN/memtier | sort -n -k 2 -t ':'
>> >
>>
>> It will be really useful to fetch the memory tier node list in an easy
>> fashion rather than reading multiple sysfs directories. If we don't have
>> other attributes for memorytier, we could keep
>> "/sys/devices/system/memtier/memtierN" a NUMA node list there by
>> avoiding /sys/devices/system/memtier/memtierN/nodelist
>>
>> -aneesh
>
> It is harder to implement memtierN as just a file and doesn't follow
> the existing sysfs pattern, either.  Besides, it is extensible to have
> memtierN as a directory.
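
To make the point of the exchange above concrete: the per-node memtier values already determine every per-tier node list, so a memtierN/nodelist file is a convenience rather than new information. The userspace sketch below is only an illustration of that regrouping, not kernel code from this thread. The function name group_nodes_by_tier, the -1 marker for a CPU-only node, and the 64-node limit are all assumptions made for the example; only MAX_TIER = 3 comes from the proposal.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TIER 3	/* the RFC supports only three tiers for now */

/*
 * Hypothetical helper, not part of the proposal.
 *
 * node_tier[i] is the tier that node i would report in its
 * nodeN/memtier file, or -1 for a CPU-only node (the file is empty).
 * On success, tier_nodes[t] has bit i set when node i is in tier t,
 * which is the information memtierN/nodelist would expose.
 *
 * Returns 0 on success, -1 if a node names a tier outside 0..MAX_TIER-1.
 */
static int group_nodes_by_tier(const int *node_tier, int nr_nodes,
			       uint64_t tier_nodes[MAX_TIER])
{
	/* One 64-bit word per tier keeps the sketch small. */
	assert(node_tier != 0 && nr_nodes >= 0 && nr_nodes <= 64);

	for (int t = 0; t < MAX_TIER; t++)
		tier_nodes[t] = 0;

	for (int i = 0; i < nr_nodes; i++) {
		int t = node_tier[i];

		if (t == -1)		/* CPU-only node: belongs to no tier */
			continue;
		if (t < 0 || t >= MAX_TIER)
			return -1;
		tier_nodes[t] |= UINT64_C(1) << i;
	}
	return 0;
}
```

With node_tier = {0, 0, 1, -1, 2}, tier 0 ends up holding nodes 0 and 1, tier 1 holds node 2, tier 2 holds node 4, and node 3 appears in no tier.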

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 6248326f944d..251f38ec3816 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -1097,12 +1097,49 @@ static struct attribute *node_state_attrs[] = {
 	NULL
 };
 
+#define MAX_TIER 3
+nodemask_t memory_tier[MAX_TIER];
+
+#define _TIER_ATTR_RO(name, tier_index)					\
+	{ __ATTR(name, 0444, show_tier, NULL), tier_index, NULL }
+
+struct memory_tier_attr {
+	struct device_attribute attr;
+	int tier_index;
+	int (*write)(nodemask_t nodes);
+};
+
+static ssize_t show_tier(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	struct memory_tier_attr *mt = container_of(attr, struct memory_tier_attr, attr);
+
+	return sysfs_emit(buf, "%*pbl\n",
+			  nodemask_pr_args(&memory_tier[mt->tier_index]));
+}
+
 static const struct attribute_group memory_root_attr_group = {
 	.attrs = node_state_attrs,
 };
 
+
+#define TOP_TIER 0
+static struct memory_tier_attr memory_tiers[] = {
+	[0] = _TIER_ATTR_RO(memory_top_tier, TOP_TIER),
+};
+
+static struct attribute *memory_tier_attrs[] = {
+	&memory_tiers[0].attr.attr,
+	NULL
+};
+
+static const struct attribute_group memory_tier_attr_group = {
+	.attrs = memory_tier_attrs,
+};
+
 static const struct attribute_group *cpu_root_attr_groups[] = {
 	&memory_root_attr_group,
+	&memory_tier_attr_group,
 	NULL,
 };
 

As long as we have the ability to see the nodelist, I am good with the
proposal.

-aneesh
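
For readers unfamiliar with the "%*pbl" specifier used in show_tier() above: it prints a bitmap as a comma-separated list of bit ranges, so a tier holding nodes 0, 1, 2 and 5 reads back as "0-2,5". The userspace sketch below imitates that output for a single 64-bit mask. It is a rough approximation written for this note, not the kernel's implementation; format_node_list and its return convention are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace imitation of the range-list output that
 * "%*pbl" produces, limited to one 64-bit word.
 *
 * Writes e.g. "0-2,5" into buf. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int format_node_list(uint64_t mask, char *buf, size_t len)
{
	size_t used = 0;
	int bit = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	while (bit < 64) {
		if (!((mask >> bit) & 1)) {
			bit++;
			continue;
		}

		/* Extend the run while the next bit is also set. */
		int start = bit;
		while (bit + 1 < 64 && ((mask >> (bit + 1)) & 1))
			bit++;

		const char *sep = used ? "," : "";
		int n;

		if (start == bit)
			n = snprintf(buf + used, len - used, "%s%d", sep, start);
		else
			n = snprintf(buf + used, len - used, "%s%d-%d",
				     sep, start, bit);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* output would be truncated */

		used += (size_t)n;
		bit++;
	}
	return (int)used;
}
```

An empty mask yields an empty string, which corresponds to a tier with no nodes; the proposal allows the kernel to hide the sysfs files of such a tier instead.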