Subject: Re: RFC: Memory Tiering Kernel Interfaces (v3)
From: Ying Huang
To: Wei Xu, Andrew Morton, Greg Thelen, Yang Shi, "Aneesh Kumar K.V", Davidlohr Bueso, Tim C Chen, Brice Goglin, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Feng Tang, Linux MM, Jagdish Gediya, Baolin Wang, David Rientjes
Date: Fri, 27 May 2022 10:58:39 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2022-05-26 at 14:22 -0700, Wei Xu wrote:
> Changes since v2
> ================
> * Updated the design and examples to use "rank" instead of device ID
>   to determine the order between memory tiers for better flexibility.
>
> Overview
> ========
>
> The current kernel has basic memory tiering support: Inactive
> pages on a higher tier NUMA node can be migrated (demoted) to a lower
> tier NUMA node to make room for new allocations on the higher tier
> NUMA node. Frequently accessed pages on a lower tier NUMA node can be
> migrated (promoted) to a higher tier NUMA node to improve the
> performance.
>
> In the current kernel, memory tiers are defined implicitly via a
> demotion path relationship between NUMA nodes, which is created during
> the kernel initialization and updated when a NUMA node is hot-added or
> hot-removed. The current implementation puts all nodes with CPU into
> the top tier, and builds the tier hierarchy tier-by-tier by
> establishing the per-node demotion targets based on the distances
> between nodes.
>
> This current memory tier kernel interface needs to be improved for
> several important use cases:
>
> * The current tier initialization code always initializes
>   each memory-only NUMA node into a lower tier. But a memory-only
>   NUMA node may have a high performance memory device (e.g. a DRAM
>   device attached via CXL.mem or a DRAM-backed memory-only node on
>   a virtual machine) and should be put into a higher tier.
>
> * The current tier hierarchy always puts CPU nodes into the top
>   tier. But on a system with HBM (e.g. GPU memory) devices, these
>   memory-only HBM NUMA nodes should be in the top tier, and DRAM nodes
>   with CPUs are better placed into the next lower tier.
>
> * Also because the current tier hierarchy always puts CPU nodes
>   into the top tier, when a CPU is hot-added (or hot-removed) and
>   turns a memory node from CPU-less into a CPU node (or vice
>   versa), the memory tier hierarchy gets changed, even though no
>   memory node is added or removed. This can make the tier
>   hierarchy unstable and make it difficult to support tier-based
>   memory accounting.
>
> * A higher tier node can only be demoted to selected nodes on the
>   next lower tier as defined by the demotion path, not any other
>   node from any lower tier. This strict, hard-coded demotion order
>   does not work in all use cases (e.g. some use cases may want to
>   allow cross-socket demotion to another node in the same demotion
>   tier as a fallback when the preferred demotion node is out of
>   space), and has resulted in the feature request for an interface to
>   override the system-wide, per-node demotion order from the
>   userspace. This demotion order is also inconsistent with the page
>   allocation fallback order when all the nodes in a higher tier are
>   out of space: The page allocation can fall back to any node from
>   any lower tier, whereas the demotion order doesn't allow that.
>
> * There are no interfaces for the userspace to learn about the memory
>   tier hierarchy in order to optimize its memory allocations.
>
> I'd like to propose revised memory tier kernel interfaces based on
> the discussions in the threads:
>
> - https://lore.kernel.org/lkml/20220425201728.5kzm4seu7rep7ndr@offworld/T/
> - https://lore.kernel.org/linux-mm/20220426114300.00003ad8@Huawei.com/t/
> - https://lore.kernel.org/linux-mm/867bc216386eb6cbf54648f23e5825830f5b922e.camel@intel.com/T/
> - https://lore.kernel.org/linux-mm/d6314cfe1c7898a6680bed1e7cc93b0ab93e3155.camel@intel.com/T/
>
>
> High-level Design Ideas
> =======================
>
> * Define memory tiers explicitly, not implicitly.
>
> * Memory tiers are defined based on hardware capabilities of memory
>   nodes, not their relative node distances between each other.
>
> * The tier assignments of nodes are independent of each other.
>   Moving a node from one tier to another tier doesn't affect the tier
>   assignment of any other node.
>
> * The node-tier association is stable.
>   A node can be reassigned to a
>   different tier only under specific conditions that don't block
>   future tier-based memory cgroup accounting.
>
> * A node can demote its pages to any nodes of any lower tiers. The
>   demotion target node selection follows the allocation fallback order
>   of the source node, which is built based on node distances. The
>   demotion targets are also restricted to only the nodes from the tiers
>   lower than the source node. We no longer need to maintain a separate
>   per-node demotion order (node_demotion[]).
>
>
> Sysfs Interfaces
> ================
>
> * /sys/devices/system/memtier/
>
>   This is the directory containing the information about memory tiers.
>
>   Each memory tier has its own subdirectory.
>
>   The order of memory tiers is determined by their rank values, not by
>   their memtier device names.
>
>   - /sys/devices/system/memtier/possible
>
>     Format: ordered list of "memtier(rank)"
>     Example: 0(64), 1(128), 2(192)
>
>     Read-only. When read, list all available memory tiers and their
>     associated ranks, ordered by the rank values (from the highest
>     tier to the lowest tier).

I like the idea of the "possible" file. And I think we can show the
default tier too. That is, if "1(128)" is the default tier (the tier
with DRAM), then the list can be,

"0/64 [1/128] 2/192"

To make it easier to parse in the shell, I would prefer something
like,

0 64
1 128 default
2 192

But a one-line format is OK for me too.

>
> * /sys/devices/system/memtier/memtierN/
>
>   This is the directory containing the information about a particular
>   memory tier, memtierN, where N is the memtier device ID (e.g. 0, 1).
>
>   The memtier device ID number itself is just an identifier and has no
>   special meaning, i.e. memtier device ID numbers do not determine the
>   order of memory tiers.
>
>   - /sys/devices/system/memtier/memtierN/rank
>
>     Format: int
>     Example: 100
>
>     Read-only.
>     When read, list the "rank" value associated with memtierN.
>
>     "Rank" is an opaque value. Its absolute value doesn't have any
>     special meaning. But the rank values of different memtiers can be
>     compared with each other to determine the memory tier order.
>     For example, if we have 3 memtiers: memtier0, memtier1, memtier2, and
>     their rank values are 10, 20, 15, then the memory tier order is:
>     memtier0 -> memtier2 -> memtier1, where memtier0 is the highest tier
>     and memtier1 is the lowest tier.
>
>     The rank value of each memtier should be unique.
>
>   - /sys/devices/system/memtier/memtierN/nodelist
>
>     Format: node_list
>     Example: 1-2
>
>     Read-only. When read, list the memory nodes in the specified tier.
>
>     If a memory tier has no memory nodes, the kernel can hide the sysfs
>     directory of this memory tier, though the tier itself can still be
>     visible from /sys/devices/system/memtier/possible.
>
> * /sys/devices/system/node/nodeN/memtier
>
>   where N = 0, 1, ...
>
>   Format: int or empty
>   Example: 1
>
>   When read, list the device ID of the memory tier that the node belongs
>   to. Its value is empty for a CPU-only NUMA node.
>
>   When written, the kernel moves the node into the specified memory
>   tier if the move is allowed. The tier assignment of all other nodes
>   is not affected.
>
>   Initially, we can make this interface read-only.
>
>
> Kernel Representation
> =====================
>
> * All memory tiering code is guarded by CONFIG_TIERED_MEMORY.
>
> * #define MAX_MEMORY_TIERS 3
>
>   Support 3 memory tiers for now. This can be a kconfig option.
>
> * #define MEMORY_DEFAULT_TIER_DEVICE 1
>
>   The default tier device that a memory node is assigned to.
>
> * struct memtier_dev {
>       nodemask_t nodelist;
>       int rank;
>       int tier;
>   } memtier_devices[MAX_MEMORY_TIERS];
>
>   Store memory tiers by device IDs.
>
> * struct memtier_dev *memory_tier(int tier)
>
>   Returns the memtier device for a given memory tier.
>
> * int node_tier_dev_map[MAX_NUMNODES]
>
>   Map a node to its tier device ID.
>
>   For each CPU-only node c, node_tier_dev_map[c] = -1.
>
>
> Memory Tier Initialization
> ==========================
>
> By default, all memory nodes are assigned to the default tier
> (MEMORY_DEFAULT_TIER_DEVICE). The default tier device has a rank value
> in the middle of the possible rank value range (e.g. 127 if the range
> is [0..255]).
>
> A device driver can move its memory nodes up or down from the default
> tier. For example, PMEM can move its memory nodes down below the
> default tier, whereas GPU can move its memory nodes up above the
> default tier.
>
> The kernel initialization code makes the decision on which exact tier
> a memory node should be assigned to based on the requests from the
> device drivers as well as the memory device hardware information
> provided by the firmware.
>
>
> Memory Tier Reassignment
> ========================
>
> After a memory node is hot-removed, it can be hot-added back to a
> different memory tier. This is useful for supporting dynamically
> provisioned CXL.mem NUMA nodes, which may connect to different
> memory devices across hot-plug events. Such tier changes should
> be compatible with tier-based memory accounting.
>
> The userspace may also reassign an existing online memory node to a
> different tier. However, this should only be allowed when no pages
> are allocated from the memory node or when there are no non-root
> memory cgroups (e.g. during the system boot). This restriction is
> important for keeping the memory tier hierarchy stable enough for
> tier-based memory cgroup accounting.

One way to do this is to hot-remove all memory of a node, change its
memtier, then hot-add its memory.

Best Regards,
Huang, Ying

> Hot-adding/removing CPUs doesn't affect the memory tier hierarchy.
>
>
> Memory Allocation for Demotion
> ==============================
>
> To allocate a new page as the demotion target for a page, the kernel
> calls the allocation function (__alloc_pages_nodemask) with the
> source page node as the preferred node and the union of all lower
> tier nodes as the allowed nodemask. The actual target node selection
> then follows the allocation fallback order that the kernel has
> already defined.
>
> The pseudo code looks like:
>
>     targets = NODE_MASK_NONE;
>     src_nid = page_to_nid(page);
>     src_tier = memtier_devices[node_tier_dev_map[src_nid]].tier;
>     for (i = src_tier + 1; i < MAX_MEMORY_TIERS; i++)
>             nodes_or(targets, targets, memory_tier(i)->nodelist);
>     new_page = __alloc_pages_nodemask(gfp, order, src_nid, &targets);
>
> The mempolicy of the cpuset, the vma and the owner task of the source
> page can be set to refine the demotion target nodemask, e.g. to prevent
> demotion or select a particular allowed node as the demotion target.
>
>
> Memory Allocation for Promotion
> ===============================
>
> The page allocation for promotion is similar to demotion, except that (1)
> the target nodemask uses the promotion tiers, (2) the preferred node can
> be the accessing CPU node, not the source page node.
>
>
> Examples
> ========
>
> * Example 1:
>
> Node 0 & 1 are DRAM nodes, node 2 & 3 are PMEM nodes.
>
>                   20
>   Node 0 (DRAM)  ----  Node 1 (DRAM)
>        |   \          /   |
>        | 30 40   X   40   | 30
>        |   /          \   |
>   Node 2 (PMEM)  ----  Node 3 (PMEM)
>                   40
>
> node distances:
> node   0    1    2    3
>    0  10   20   30   40
>    1  20   10   40   30
>    2  30   40   10   40
>    3  40   30   40   10
>
> $ cat /sys/devices/system/memtier/possible
> 0(64), 1(128), 2(192)
>
> $ grep '' /sys/devices/system/memtier/memtier*/rank
> /sys/devices/system/memtier/memtier1/rank:128
> /sys/devices/system/memtier/memtier2/rank:192
>
> $ grep '' /sys/devices/system/memtier/memtier*/nodelist
> /sys/devices/system/memtier/memtier1/nodelist:0-1
> /sys/devices/system/memtier/memtier2/nodelist:2-3
>
> $ grep '' /sys/devices/system/node/node*/memtier
> /sys/devices/system/node/node0/memtier:1
> /sys/devices/system/node/node1/memtier:1
> /sys/devices/system/node/node2/memtier:2
> /sys/devices/system/node/node3/memtier:2
>
> Demotion fallback order:
> node 0: 2, 3
> node 1: 3, 2
> node 2: empty
> node 3: empty
>
> To prevent cross-socket demotion and memory access, the user can set
> mempolicy, e.g. cpuset.mems=0,2.
>
>
> * Example 2:
>
> Node 0 & 1 are DRAM nodes.
> Node 2 is a PMEM node and closer to node 0.
>
>                   20
>   Node 0 (DRAM)  ----  Node 1 (DRAM)
>        |              /
>        | 30          / 40
>        |            /
>   Node 2 (PMEM)
>
> node distances:
> node   0    1    2
>    0  10   20   30
>    1  20   10   40
>    2  30   40   10
>
> $ cat /sys/devices/system/memtier/possible
> 0(64), 1(128), 2(192)
>
> $ grep '' /sys/devices/system/memtier/memtier*/rank
> /sys/devices/system/memtier/memtier1/rank:128
> /sys/devices/system/memtier/memtier2/rank:192
>
> $ grep '' /sys/devices/system/memtier/memtier*/nodelist
> /sys/devices/system/memtier/memtier1/nodelist:0-1
> /sys/devices/system/memtier/memtier2/nodelist:2
>
> $ grep '' /sys/devices/system/node/node*/memtier
> /sys/devices/system/node/node0/memtier:1
> /sys/devices/system/node/node1/memtier:1
> /sys/devices/system/node/node2/memtier:2
>
> Demotion fallback order:
> node 0: 2
> node 1: 2
> node 2: empty
>
>
> * Example 3:
>
> Node 0 & 1 are DRAM nodes, Node 2 is a memory-only DRAM node.
>
> All nodes are in the same tier.
>
>                   20
>   Node 0 (DRAM)  ----  Node 1 (DRAM)
>          \                /
>           \ 30        30 /
>            \            /
>              Node 2 (DRAM)
>
> node distances:
> node   0    1    2
>    0  10   20   30
>    1  20   10   30
>    2  30   30   10
>
> $ cat /sys/devices/system/memtier/possible
> 0(64), 1(128), 2(192)
>
> $ grep '' /sys/devices/system/memtier/memtier*/rank
> /sys/devices/system/memtier/memtier1/rank:128
>
> $ grep '' /sys/devices/system/memtier/memtier*/nodelist
> /sys/devices/system/memtier/memtier1/nodelist:0-2
>
> $ grep '' /sys/devices/system/node/node*/memtier
> /sys/devices/system/node/node0/memtier:1
> /sys/devices/system/node/node1/memtier:1
> /sys/devices/system/node/node2/memtier:1
>
> Demotion fallback order:
> node 0: empty
> node 1: empty
> node 2: empty
>
>
> * Example 4:
>
> Node 0 is a DRAM node with CPU.
> Node 1 is a PMEM node.
> Node 2 is a GPU node.
>
>                   50
>   Node 0 (DRAM)  ----  Node 2 (GPU)
>          \                /
>           \ 30        60 /
>            \            /
>              Node 1 (PMEM)
>
> node distances:
> node   0    1    2
>    0  10   30   50
>    1  30   10   60
>    2  50   60   10
>
> $ cat /sys/devices/system/memtier/possible
> 0(64), 1(128), 2(192)
>
> $ grep '' /sys/devices/system/memtier/memtier*/rank
> /sys/devices/system/memtier/memtier0/rank:64
> /sys/devices/system/memtier/memtier1/rank:128
> /sys/devices/system/memtier/memtier2/rank:192
>
> $ grep '' /sys/devices/system/memtier/memtier*/nodelist
> /sys/devices/system/memtier/memtier0/nodelist:2
> /sys/devices/system/memtier/memtier1/nodelist:0
> /sys/devices/system/memtier/memtier2/nodelist:1
>
> $ grep '' /sys/devices/system/node/node*/memtier
> /sys/devices/system/node/node0/memtier:1
> /sys/devices/system/node/node1/memtier:2
> /sys/devices/system/node/node2/memtier:0
>
> Demotion fallback order:
> node 0: 1
> node 1: empty
> node 2: 0, 1
>
>
> * Example 5:
>
> Node 0 is a DRAM node with CPU.
> Node 1 is a GPU node.
> Node 2 is a PMEM node.
> Node 3 is a large, slow DRAM node without CPU.
>
>                     100
>      Node 0 (DRAM)  ----  Node 1 (GPU)
>     /     |               /      |
>    /40    |30        120 /       | 110
>   |       |             /        |
>   |  Node 2 (PMEM) ----          /
>   |       \                     /
>    \     80 \                  /
>     ------- Node 3 (Slow DRAM)
>
> node distances:
> node    0    1    2    3
>    0   10  100   30   40
>    1  100   10  120  110
>    2   30  120   10   80
>    3   40  110   80   10
>
> MAX_MEMORY_TIERS=4 (memtier3 is a memory tier added later).
>
> $ cat /sys/devices/system/memtier/possible
> 0(64), 1(128), 3(160), 2(192)
>
> $ grep '' /sys/devices/system/memtier/memtier*/rank
> /sys/devices/system/memtier/memtier0/rank:64
> /sys/devices/system/memtier/memtier1/rank:128
> /sys/devices/system/memtier/memtier2/rank:192
> /sys/devices/system/memtier/memtier3/rank:160
>
> $ grep '' /sys/devices/system/memtier/memtier*/nodelist
> /sys/devices/system/memtier/memtier0/nodelist:1
> /sys/devices/system/memtier/memtier1/nodelist:0
> /sys/devices/system/memtier/memtier2/nodelist:2
> /sys/devices/system/memtier/memtier3/nodelist:3
>
> $ grep '' /sys/devices/system/node/node*/memtier
> /sys/devices/system/node/node0/memtier:1
> /sys/devices/system/node/node1/memtier:0
> /sys/devices/system/node/node2/memtier:2
> /sys/devices/system/node/node3/memtier:3
>
> Demotion fallback order:
> node 0: 2, 3
> node 1: 0, 3, 2
> node 2: empty
> node 3: 2