2023-11-02 02:57:25

by Li Zhijian

Subject: [PATCH RFC 0/4] Demotion Profiling Improvements

With the deployment of high-capacity CXL Type 3 memory, HBM, and NVDIMM,
the kernel now supports memory tiering. Building on that foundation,
it has introduced demotion and promotion to further improve memory
efficiency.

To give users a more intuitive way to observe demotion, this series
improves demotion profiling in two ways:

Patch 1 introduces a new interface: /sys/devices/system/node/node0/demotion_nodes
This interface is used to display the target nodes to which a node can demote.

Patch 2, Patch 3, and Patch 4 are improvements to demotion statistics.
Patch 2 changes the granularity of demotion statistics from global to per-node.
Patch 3 and Patch 4 further differentiate demotion statistics into demotion
source and demotion destination.
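
As a rough illustration of how a user-space tool might consume the proposed
demotion_nodes file, the sketch below parses a node list into a bitmask. It
assumes the file holds a comma-separated list of node ids and ranges (for
example "1-2,4"), as other node masks in sysfs do; the cover letter does not
state the format, so treat that as a guess. The helper names parse_nodelist()
and read_demotion_nodes() are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a node list such as "1-2,4\n" into a bitmask (bit N set == node N).
 * ASSUMPTION: demotion_nodes uses this list format; the RFC does not say so.
 * Returns 0 on success, -1 on malformed input or a node id >= 64.
 */
int parse_nodelist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			return -1;
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo < 0 || hi < lo || hi >= 64)
			return -1;
		for (long n = lo; n <= hi; n++)
			*mask |= UINT64_C(1) << n;
		if (*end == ',')
			end++;
		s = end;
	}
	return 0;
}

/* Read the proposed per-node file; fails if the interface is absent. */
int read_demotion_nodes(int nid, uint64_t *mask)
{
	char path[128], buf[256];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/demotion_nodes", nid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f))
		buf[0] = '\0';		/* empty file: no demotion targets */
	fclose(f);
	return parse_nodelist(buf, mask);
}
```

An empty list yields a zero mask, which a tool could read as "this node has
no demotion target".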

The names of the statistics are open to discussion; they could be named something
like pgdemote_from/to_* etc.
One issue with this patch set is that SUM(pgdemote_src_*) always equals SUM(pgdemote_dst_*)
in the global statistics (/proc/vmstat). Should we hide one of them?
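
The reason the two global sums always match can be seen in a small user-space
model: each demoted page is counted once on the node it leaves and once on
the node it lands on. This is only a sketch; MAX_NODES, struct demote_stats
and the function names are made up for illustration and are not taken from
the patches.

```c
#define MAX_NODES 4	/* hypothetical node count for the model */

struct demote_stats {
	unsigned long src[MAX_NODES];	/* pages demoted away from each node */
	unsigned long dst[MAX_NODES];	/* pages demoted onto each node */
};

/* Record one demotion batch: every page leaves `from` and lands on `to`. */
void record_demotion(struct demote_stats *st, int from, int to,
		     unsigned long nr_pages)
{
	st->src[from] += nr_pages;
	st->dst[to] += nr_pages;
}

/* Sum one set of per-node counters, as a global view would. */
unsigned long sum_nodes(const unsigned long *counters)
{
	unsigned long total = 0;

	for (int i = 0; i < MAX_NODES; i++)
		total += counters[i];
	return total;
}
```

Because record_demotion() adds the same nr_pages to both arrays, the totals
stay equal no matter how the demotions are spread across nodes; only the
per-node values differ.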

Any feedback is welcome.

TO: Andrew Morton <[email protected]>
TO: Greg Kroah-Hartman <[email protected]>
TO: "Rafael J. Wysocki" <[email protected]>
CC: "Huang, Ying" <[email protected]>
CC: [email protected]
CC: [email protected]
TO: [email protected]

Li Zhijian (4):
drivers/base/node: Add demotion_nodes sys interface
mm/vmstat: Move pgdemote_* to per-node stats
mm/vmstat: rename pgdemote_* to pgdemote_dst_* and add pgdemote_src_*
drivers/base/node: add demote_src and demote_dst to numastat

drivers/base/node.c | 29 +++++++++++++++++++++++++++--
include/linux/memory-tiers.h | 6 ++++++
include/linux/mmzone.h | 7 +++++++
include/linux/vm_event_item.h | 3 ---
mm/memory-tiers.c | 8 ++++++++
mm/vmscan.c | 14 +++++++++++---
mm/vmstat.c | 9 ++++++---
7 files changed, 65 insertions(+), 11 deletions(-)

--
2.29.2


2023-11-02 02:57:25

by Li Zhijian

Subject: [PATCH RFC 2/4] mm/vmstat: Move pgdemote_* to per-node stats

This is preparation for improving demotion profiling in later
patches.

Per-node demotion stats help users quickly identify which
node is under high stress, and take special action if needed.

Signed-off-by: Li Zhijian <[email protected]>
---
include/linux/mmzone.h | 4 ++++
include/linux/vm_event_item.h | 3 ---
mm/vmscan.c | 3 ++-
mm/vmstat.c | 6 +++---
4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4106fbc5b4b3..ad0309eea850 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -206,6 +206,10 @@ enum node_stat_item {
#ifdef CONFIG_NUMA_BALANCING
PGPROMOTE_SUCCESS, /* promote successfully */
PGPROMOTE_CANDIDATE, /* candidate pages to promote */
+ /* PGDEMOTE_*: pages demoted */
+ PGDEMOTE_KSWAPD,
+ PGDEMOTE_DIRECT,
+ PGDEMOTE_KHUGEPAGED,
#endif
NR_VM_NODE_STAT_ITEMS
};
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 8abfa1240040..d1b847502f09 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -41,9 +41,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGSTEAL_KSWAPD,
PGSTEAL_DIRECT,
PGSTEAL_KHUGEPAGED,
- PGDEMOTE_KSWAPD,
- PGDEMOTE_DIRECT,
- PGDEMOTE_KHUGEPAGED,
PGSCAN_KSWAPD,
PGSCAN_DIRECT,
PGSCAN_KHUGEPAGED,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6f13394b112e..2f1fb4ec3235 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1678,7 +1678,8 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
(unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
&nr_succeeded);

- __count_vm_events(PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
+ mod_node_page_state(NODE_DATA(target_nid),
+ PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);

return nr_succeeded;
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 00e81e99c6ee..f141c48c39e4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1244,6 +1244,9 @@ const char * const vmstat_text[] = {
#ifdef CONFIG_NUMA_BALANCING
"pgpromote_success",
"pgpromote_candidate",
+ "pgdemote_kswapd",
+ "pgdemote_direct",
+ "pgdemote_khugepaged",
#endif

/* enum writeback_stat_item counters */
@@ -1275,9 +1278,6 @@ const char * const vmstat_text[] = {
"pgsteal_kswapd",
"pgsteal_direct",
"pgsteal_khugepaged",
- "pgdemote_kswapd",
- "pgdemote_direct",
- "pgdemote_khugepaged",
"pgscan_kswapd",
"pgscan_direct",
"pgscan_khugepaged",
--
2.29.2
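
With the counters moved into node_stat_item, a monitoring tool could total
demotions per node from vmstat-style text. The sketch below assumes the
pgdemote_* entries appear in /sys/devices/system/node/nodeN/vmstat as
"<name> <value>" lines, like other per-node stats; sum_pgdemote() is a
hypothetical helper, not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sum every "pgdemote_*" counter in vmstat-style text, where each line
 * has the form "<name> <value>". Lines that do not match are skipped.
 */
unsigned long long sum_pgdemote(const char *text)
{
	unsigned long long total = 0;
	const char *line = text;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[128], name[64];
		unsigned long long val;

		/* Copy one line so sscanf() cannot read past its end. */
		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';
			if (sscanf(buf, "%63s %llu", name, &val) == 2 &&
			    strncmp(name, "pgdemote_", 9) == 0)
				total += val;
		}
		line = eol ? eol + 1 : line + len;
	}
	return total;
}
```

Feeding each node's vmstat contents through this helper and comparing the
results is one way to spot which node accounts for most demotions.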

2023-11-02 04:58:52

by Huang, Ying

Subject: Re: [PATCH RFC 2/4] mm/vmstat: Move pgdemote_* to per-node stats

Li Zhijian <[email protected]> writes:

> This is preparation for improving demotion profiling in later
> patches.

I think this patch has value on its own, even without the following
patches, so there is no need to describe it as preparation.

> Per-node demotion stats help users quickly identify which
> node is under high stress, and take special action if needed.

It would be better to add more description here, for example about
memory pressure on a particular node.

> Signed-off-by: Li Zhijian <[email protected]>

After addressing the comments above, feel free to add

Acked-by: "Huang, Ying" <[email protected]>

--
Best Regards,
Huang, Ying


2023-11-02 05:48:24

by Huang, Ying

Subject: Re: [PATCH RFC 2/4] mm/vmstat: Move pgdemote_* to per-node stats

Li Zhijian <[email protected]> writes:

> This is preparation for improving demotion profiling in later
> patches.
>
> Per-node demotion stats help users quickly identify which
> node is under high stress, and take special action if needed.
>
> Signed-off-by: Li Zhijian <[email protected]>
> ---
> include/linux/mmzone.h | 4 ++++
> include/linux/vm_event_item.h | 3 ---
> mm/vmscan.c | 3 ++-
> mm/vmstat.c | 6 +++---
> 4 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 4106fbc5b4b3..ad0309eea850 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -206,6 +206,10 @@ enum node_stat_item {
> #ifdef CONFIG_NUMA_BALANCING
> PGPROMOTE_SUCCESS, /* promote successfully */
> PGPROMOTE_CANDIDATE, /* candidate pages to promote */
> + /* PGDEMOTE_*: pages demoted */
> + PGDEMOTE_KSWAPD,
> + PGDEMOTE_DIRECT,
> + PGDEMOTE_KHUGEPAGED,
> #endif
> NR_VM_NODE_STAT_ITEMS
> };
> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> index 8abfa1240040..d1b847502f09 100644
> --- a/include/linux/vm_event_item.h
> +++ b/include/linux/vm_event_item.h
> @@ -41,9 +41,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> PGSTEAL_KSWAPD,
> PGSTEAL_DIRECT,
> PGSTEAL_KHUGEPAGED,
> - PGDEMOTE_KSWAPD,
> - PGDEMOTE_DIRECT,
> - PGDEMOTE_KHUGEPAGED,
> PGSCAN_KSWAPD,
> PGSCAN_DIRECT,
> PGSCAN_KHUGEPAGED,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 6f13394b112e..2f1fb4ec3235 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1678,7 +1678,8 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
> &nr_succeeded);
>
> - __count_vm_events(PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
> + mod_node_page_state(NODE_DATA(target_nid),
> + PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);

On second thought, it seems better to count demotion events against the
source node, because demotion is driven by memory pressure on the
source node. The target node is less important. Do you agree?

--
Best Regards,
Huang, Ying


2023-11-02 05:58:10

by Li Zhijian

Subject: Re: [PATCH RFC 2/4] mm/vmstat: Move pgdemote_* to per-node stats



On 02/11/2023 13:43, Huang, Ying wrote:
> Li Zhijian <[email protected]> writes:
>
>> This is preparation for improving demotion profiling in later
>> patches.
>>
>> Per-node demotion stats help users quickly identify which
>> node is under high stress, and take special action if needed.
>>
>> Signed-off-by: Li Zhijian <[email protected]>
>> ---
>> include/linux/mmzone.h | 4 ++++
>> include/linux/vm_event_item.h | 3 ---
>> mm/vmscan.c | 3 ++-
>> mm/vmstat.c | 6 +++---
>> 4 files changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 4106fbc5b4b3..ad0309eea850 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -206,6 +206,10 @@ enum node_stat_item {
>> #ifdef CONFIG_NUMA_BALANCING
>> PGPROMOTE_SUCCESS, /* promote successfully */
>> PGPROMOTE_CANDIDATE, /* candidate pages to promote */
>> + /* PGDEMOTE_*: pages demoted */
>> + PGDEMOTE_KSWAPD,
>> + PGDEMOTE_DIRECT,
>> + PGDEMOTE_KHUGEPAGED,
>> #endif
>> NR_VM_NODE_STAT_ITEMS
>> };
>> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>> index 8abfa1240040..d1b847502f09 100644
>> --- a/include/linux/vm_event_item.h
>> +++ b/include/linux/vm_event_item.h
>> @@ -41,9 +41,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>> PGSTEAL_KSWAPD,
>> PGSTEAL_DIRECT,
>> PGSTEAL_KHUGEPAGED,
>> - PGDEMOTE_KSWAPD,
>> - PGDEMOTE_DIRECT,
>> - PGDEMOTE_KHUGEPAGED,
>> PGSCAN_KSWAPD,
>> PGSCAN_DIRECT,
>> PGSCAN_KHUGEPAGED,
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 6f13394b112e..2f1fb4ec3235 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1678,7 +1678,8 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
>> (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
>> &nr_succeeded);
>>
>> - __count_vm_events(PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
>> + mod_node_page_state(NODE_DATA(target_nid),
>> + PGDEMOTE_KSWAPD + reclaimer_offset(), nr_succeeded);
>
> On second thought, it seems better to count demotion events against the
> source node, because demotion is driven by memory pressure on the
> source node. The target node is less important. Do you agree?

Good idea, I will update it.
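
For readers following along, the agreed change can be pictured with a small
user-space model in which the counter is charged to the node the pages leave,
indexed by which reclaimer did the work. This is a sketch only, not the
updated patch: enum reclaimer, struct node_stats and account_demotion() are
invented for illustration, and the enum merely mimics the
PGDEMOTE_KSWAPD + reclaimer_offset() indexing seen in the diff.

```c
#define MAX_NODES 4	/* hypothetical node count for the model */

/* Stand-in for PGDEMOTE_KSWAPD + reclaimer_offset() in the patch. */
enum reclaimer {
	RECLAIM_KSWAPD,
	RECLAIM_DIRECT,
	RECLAIM_KHUGEPAGED,
	NR_RECLAIMERS
};

struct node_stats {
	unsigned long pgdemote[NR_RECLAIMERS];
};

/*
 * Charge a successful demotion to the source node, since the memory
 * pressure that triggered it is on that node. The target node id is
 * accepted only to show that it is deliberately left uncounted.
 */
void account_demotion(struct node_stats *nodes, int src_nid, int target_nid,
		      enum reclaimer who, unsigned long nr_succeeded)
{
	(void)target_nid;
	nodes[src_nid].pgdemote[who] += nr_succeeded;
}
```

With this accounting, a node whose pgdemote counters climb quickly is one
that is under pressure itself, rather than one that merely receives pages.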


