Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp59044pxb; Fri, 15 Oct 2021 00:10:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzd4x4Ei8Dq50eMso3ZFQw+6lsKJ42PwYq9fy44tquKsnGyVem2dQavqAUEw+ShtFGqwH7h X-Received: by 2002:a17:902:d2cd:b0:13f:14dd:aeff with SMTP id n13-20020a170902d2cd00b0013f14ddaeffmr9734119plc.67.1634281836902; Fri, 15 Oct 2021 00:10:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1634281836; cv=none; d=google.com; s=arc-20160816; b=G+8QiEJUMN7KqWIGEhkIfxel9cz0S21HXxu24wgIoyPuz3Jfc2DixqpKrqc883AFap sfHoS4ZNkvJqzURqEl1MKQwVcBQW8x+0XdAEnWvG6kE79gJ7Z5e4KykzCD+pEthofxNC XPLjiQzKwxCjsGQ40IZnEwfQxYFVtqXDGgyAee5Li46qadvk+FaRj0V8rR+kHq/yqAXO 5v5oQfEqWTs62xWqMfYjDhhO6Hdxv/sAn3FBlo1YazT76nyWjzYK+jeolu/c0CpDa1eu vkmX7n2PL6n/PWyJ5sUYLaLFO1/lv14IV/xqln8biXnsevlLesPU3FK4QenOH0/loYlM 0LvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=kRUEHQjVFbwgFLC2j3SnY9E5hqbJNKEa3Zb+0zxtKsY=; b=oWx6tqbTgTNHPcPeo6GdJK0Yxac3y2F9uxVBFq1BcHHPi188/pA/qe41msEn8kbCbw uzrm1IL1PG7nV7oBOdgb0cXmbCJ61YWwF0dt0BYJu31p2SwZskt6NdyOKv8gg4uHF7JF z0F9onmTWOU5qK3sJRZ6/RdNvP/K2KIdPMOWK5tpRU/AdUppj2pLdHXJslQsQKdFl9hy ogJjWsF7MOuetNK1W1AGj2Po4kuI7j227O9gs0NDggxw1hPPMXr65THsMLGHqYxI53Lz R3Lk0OD4PiI9RH6iBV2Yfhi8g/NcugAZ9eb3MD//KQKdrmg+MrbiNljDwepfQNhfYSmJ lnhg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=mCCK71nN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id u1si6952737pfl.167.2021.10.15.00.10.23; Fri, 15 Oct 2021 00:10:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=mCCK71nN; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234233AbhJNWya (ORCPT + 99 others); Thu, 14 Oct 2021 18:54:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234220AbhJNWy3 (ORCPT ); Thu, 14 Oct 2021 18:54:29 -0400 Received: from mail-ed1-x536.google.com (mail-ed1-x536.google.com [IPv6:2a00:1450:4864:20::536]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BAC39C061570 for ; Thu, 14 Oct 2021 15:52:23 -0700 (PDT) Received: by mail-ed1-x536.google.com with SMTP id y30so13278794edi.0 for ; Thu, 14 Oct 2021 15:52:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=kRUEHQjVFbwgFLC2j3SnY9E5hqbJNKEa3Zb+0zxtKsY=; b=mCCK71nNiu5+L7HO68suQEiXMn2682jSHAnzgA0GM4A9wYh5wwVqQ8SPMcTVXiEXfz vKvBHcDi6V6RKRWDmo0TY4glwiJeZDKdDhGkAq0mcKOhw+6F/g3jXWWD9cK+8rBuOsHw z6nIYI8aPdwKDzKrer6bBDy0mfXPR8vld+Ti0xZAnECi+uk9oyuXIsO3wBIdvdPrJlCF a1iVwqjnnZagJJu/8QxQQfgDR5gZUUegeuXOWhPyVqbUkdpaCkGJ3NY1lGPTGjNIuOve WiUO5j5yqfbT8nq1bD1bkUb8GoX/IvtrWi3oHXlAff9wCed22U+KlbqW/sdRv97LQPxq HQYA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=kRUEHQjVFbwgFLC2j3SnY9E5hqbJNKEa3Zb+0zxtKsY=; b=ogjudcU32ddHd09Lj/+878/M/ZbsAdsMSEQdrTnPSIl6uCh2juIEjp6GkdZd3Tlvfs 0+Ck88Pwr6iojkubhrylckBHEaLmVvkGex75GN4oyn9D84Vm7/fVAsQx7YPCez34gR8l Bedp54gx8VpXPTeJPjsSQgkZIIl5foHfqWPvVaS6KgvvmBR/mCusl3W1wSIO9Aei/TzN Ez6PFIpNWMXPvI8Xs89qCwRzM+/wLLsTEOD746LTok6iabSJe2gLM5pz76W7NihXh+7+ /bpGFRdJMVCsKrxjQ01FhDFCIvZ7rxC5Qi7r5X1uk50Kwwu6AEdt5WUHH1EslxIhCuBG 1pQQ== X-Gm-Message-State: AOAM533D5xyUX5WykecIZbJs8Gpulo+zFyHpMj0zDLEKYX9hYq1+XzLZ aOLkxlLrtX6Kx4ZXdNthOynEURjMLVTwruuRcoc= X-Received: by 2002:a17:907:170a:: with SMTP id le10mr2215386ejc.537.1634251942307; Thu, 14 Oct 2021 15:52:22 -0700 (PDT) MIME-Version: 1.0 References: <20211008083938.1702663-1-ying.huang@intel.com> <20211008083938.1702663-2-ying.huang@intel.com> <87a6jco2f9.fsf@yhuang6-desk2.ccr.corp.intel.com> In-Reply-To: <87a6jco2f9.fsf@yhuang6-desk2.ccr.corp.intel.com> From: Yang Shi Date: Thu, 14 Oct 2021 15:52:10 -0700 Message-ID: Subject: Re: [PATCH -V9 1/6] NUMA Balancing: add page promotion counter To: "Huang, Ying" Cc: Linux Kernel Mailing List , Andrew Morton , Michal Hocko , Rik van Riel , Mel Gorman , Peter Zijlstra , Dave Hansen , Zi Yan , Wei Xu , osalvador , Shakeel Butt , Linux MM Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Oct 13, 2021 at 5:50 PM Huang, Ying wrote: > > Yang Shi writes: > > > On Fri, Oct 8, 2021 at 1:40 AM Huang Ying wrote: > >> > >> In a system with multiple memory types, e.g. DRAM and PMEM, the CPU > >> and DRAM in one socket will be put in one NUMA node as before, while > >> the PMEM will be put in another NUMA node as described in the > >> description of the commit c221c0b0308f ("device-dax: "Hotplug" > >> persistent memory for use like normal RAM"). So, the NUMA balancing > >> mechanism will identify all PMEM accesses as remote access and try to > >> promote the PMEM pages to DRAM. > >> > >> To distinguish the number of the inter-type promoted pages from that > >> of the inter-socket migrated pages. A new vmstat count is added. The > >> counter is per-node (count in the target node). So this can be used > >> to identify promotion imbalance among the NUMA nodes. > >> > >> Signed-off-by: "Huang, Ying" > >> Cc: Andrew Morton > >> Cc: Michal Hocko > >> Cc: Rik van Riel > >> Cc: Mel Gorman > >> Cc: Peter Zijlstra > >> Cc: Dave Hansen > >> Cc: Yang Shi > >> Cc: Zi Yan > >> Cc: Wei Xu > >> Cc: osalvador > >> Cc: Shakeel Butt > >> Cc: linux-kernel@vger.kernel.org > >> Cc: linux-mm@kvack.org > >> --- > >> include/linux/mmzone.h | 3 +++ > >> include/linux/node.h | 5 +++++ > >> include/linux/vmstat.h | 2 ++ > >> mm/migrate.c | 10 ++++++++-- > >> mm/vmstat.c | 3 +++ > >> 5 files changed, 21 insertions(+), 2 deletions(-) > >> > >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > >> index 6a1d79d84675..37ccd6158765 100644 > >> --- a/include/linux/mmzone.h > >> +++ b/include/linux/mmzone.h > >> @@ -209,6 +209,9 @@ enum node_stat_item { > >> NR_PAGETABLE, /* used for pagetables */ > >> #ifdef CONFIG_SWAP > >> NR_SWAPCACHE, > >> +#endif > >> +#ifdef CONFIG_NUMA_BALANCING > >> + PGPROMOTE_SUCCESS, /* promote successfully */ > >> #endif > >> NR_VM_NODE_STAT_ITEMS > >> }; > >> diff --git a/include/linux/node.h b/include/linux/node.h > >> index 8e5a29897936..26e96fcc66af 100644 > >> --- a/include/linux/node.h > >> +++ b/include/linux/node.h > >> @@ -181,4 +181,9 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg, > >> > >> #define to_node(device) container_of(device, struct node, dev) > >> > >> +static inline bool node_is_toptier(int node) > >> +{ > >> + return node_state(node, N_CPU); > >> +} > >> + > >> #endif /* _LINUX_NODE_H_ */ > >> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h > >> index d6a6cf53b127..75c53b7d1539 100644 > >> --- a/include/linux/vmstat.h > >> +++ b/include/linux/vmstat.h > >> @@ -112,9 +112,11 @@ static inline void vm_events_fold_cpu(int cpu) > >> #ifdef CONFIG_NUMA_BALANCING > >> #define count_vm_numa_event(x) count_vm_event(x) > >> #define count_vm_numa_events(x, y) count_vm_events(x, y) > >> +#define mod_node_balancing_page_state(n, i, v) mod_node_page_state(n, i, v) > > > > I don't quite get why we need this new API. Doesn't __count_vm_events() work? > > PGPROMOTE_SUCCESS is a per-node counter. That is, its type is enum > node_stat_item instead of enum vm_event_item. So we need to use > mod_node_page_state() instead of count_vm_events(). The new API is to > avoid #ifdef CONFIG_NUMA_BALANCING/#endif in caller. Aha, I see, sorry for overlooking this. But I think you could just call mod_node_page_state() since migrate_misplaced_page() has been protected by #ifdef CONFIG_NUMA_BALANCING. The !CONFIG_NUMA_BALANCING version just returns -EFAULT. Other than this, another nit below. > > Best Regards, > Huang, Ying > > >> #else > >> #define count_vm_numa_event(x) do {} while (0) > >> #define count_vm_numa_events(x, y) do { (void)(y); } while (0) > >> +#define mod_node_balancing_page_state(n, i, v) do {} while (0) > >> #endif /* CONFIG_NUMA_BALANCING */ > >> > >> #ifdef CONFIG_DEBUG_TLBFLUSH > >> diff --git a/mm/migrate.c b/mm/migrate.c > >> index a6a7743ee98f..c3affc587902 100644 > >> --- a/mm/migrate.c > >> +++ b/mm/migrate.c > >> @@ -2148,6 +2148,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, > >> pg_data_t *pgdat = NODE_DATA(node); > >> int isolated; > >> int nr_remaining; > >> + int nr_succeeded; > >> LIST_HEAD(migratepages); > >> new_page_t *new; > >> bool compound; > >> @@ -2186,7 +2187,8 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, > >> > >> list_add(&page->lru, &migratepages); > >> nr_remaining = migrate_pages(&migratepages, *new, NULL, node, > >> - MIGRATE_ASYNC, MR_NUMA_MISPLACED, NULL); > >> + MIGRATE_ASYNC, MR_NUMA_MISPLACED, > >> + &nr_succeeded); > >> if (nr_remaining) { > >> if (!list_empty(&migratepages)) { > >> list_del(&page->lru); > >> @@ -2195,8 +2197,12 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, > >> putback_lru_page(page); > >> } > >> isolated = 0; > >> - } else > >> + } else { > >> count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_pages); > >> + if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node)) > >> + mod_node_balancing_page_state( > >> + NODE_DATA(node), PGPROMOTE_SUCCESS, nr_succeeded); > >> + } It looks the original code is already problematic. It just updates the counter when *all* pages are migrated successfully. But since we already has "nr_succeeded", so I think we could do: if (nr_remaining) { do_something(); } count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded); if (!node_is_toptier(page_to_nid(page)) && node_is_toptier(node)) mod_node_page_state(NODE_DATA(node), PGPROMOTE_SUCCESS, nr_succeeded); > >> BUG_ON(!list_empty(&migratepages)); > >> return isolated; > >> > >> diff --git a/mm/vmstat.c b/mm/vmstat.c > >> index 8ce2620344b2..fff0ec94d795 100644 > >> --- a/mm/vmstat.c > >> +++ b/mm/vmstat.c > >> @@ -1236,6 +1236,9 @@ const char * const vmstat_text[] = { > >> #ifdef CONFIG_SWAP > >> "nr_swapcached", > >> #endif > >> +#ifdef CONFIG_NUMA_BALANCING > >> + "pgpromote_success", > >> +#endif > >> > >> /* enum writeback_stat_item counters */ > >> "nr_dirty_threshold", > >> -- > >> 2.30.2 > >>