2024-02-26 14:10:03

by Honggyu Kim

[permalink] [raw]
Subject: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
posted at [1].

It says there is no implementation of the demote/promote DAMOS action
are made. This RFC is about its implementation for physical address
space.


Introduction
============

With the advent of CXL/PCIe attached DRAM, which will be called simply
as CXL memory in this cover letter, some systems are becoming more
heterogeneous having memory systems with different latency and bandwidth
characteristics. They are usually handled as different NUMA nodes in
separate memory tiers and CXL memory is used as slow tiers because of
its protocol overhead compared to local DRAM.

In this kind of systems, we need to be careful placing memory pages on
proper NUMA nodes based on the memory access frequency. Otherwise, some
frequently accessed pages might reside on slow tiers and it makes
performance degradation unexpectedly. Moreover, the memory access
patterns can be changed at runtime.

To handle this problem, we need a way to monitor the memory access
patterns and migrate pages based on their access temperature. The
DAMON(Data Access MONitor) framework and its DAMOS(DAMON-based Operation
Schemes) can be useful features for monitoring and migrating pages.
DAMOS provides multiple actions based on DAMON monitoring results and it
can be used for proactive reclaim, which means swapping cold pages out
with DAMOS_PAGEOUT action, but it doesn't support migration actions such
as demotion and promotion between tiered memory nodes.

This series supports two new DAMOS actions; DAMOS_DEMOTE for demotion
from fast tiers and DAMOS_PROMOTE for promotion from slow tiers. This
prevents hot pages from being stuck on slow tiers, which makes
performance degradation and cold pages can be proactively demoted to
slow tiers so that the system can increase the chance to allocate more
hot pages to fast tiers.

The DAMON provides various tuning knobs but we found that the proactive
demotion for cold pages is especially useful when the system is running
out of memory on its fast tier nodes.

Our evaluation result shows that it reduces the performance slowdown
compared to the default memory policy from 15~17% to 4~5% when the
system runs under high memory pressure on its fast tier DRAM nodes.


DAMON configuration
===================

The specific DAMON configuration doesn't have to be in the scope of this
patch series, but some rough idea is better to be shared to explain the
evaluation result.

The DAMON provides many knobs for fine tuning but its configuration file
is generated by HMSDK[2]. It includes gen_config.py script that
generates a json file with the full config of DAMON knobs and it creates
multiple kdamonds for each NUMA node when the DAMON is enabled so that
it can run hot/cold based migration for tiered memory.


Evaluation Workload
===================

The performance evaluation is done with redis[3], which is a widely used
in-memory database and the memory access patterns are generated via
YCSB[4]. We have measured two different workloads with zipfian and
latest distributions but their configs are slightly modified to make
memory usage higher and execution time longer for better evaluation.

The idea of evaluation using these demote and promote actions covers
system-wide memory management rather than partitioning hot/cold pages of
a single workload. The default memory allocation policy creates pages
to the fast tier DRAM node first, then allocates newly created pages to
the slow tier CXL node when the DRAM node has insufficient free space.
Once the page allocation is done then those pages never move between
NUMA nodes. It's not true when using numa balancing, but it is not the
scope of this DAMON based 2-tier memory management support.

If the working set of redis can be fit fully into the DRAM node, then
the redis will access the fast DRAM only. Since the performance of DRAM
only is faster than partially accessing CXL memory in slow tiers, this
environment is not useful to evaluate this patch series.

To make pages of redis be distributed across fast DRAM node and slow
CXL node to evaluate our demote and promote actions, we pre-allocate
some cold memory externally using mmap and memset before launching
redis-server. We assumed that there are enough amount of cold memory in
datacenters as TMO[5] and TPP[6] papers mentioned.

The evaluation sequence is as follows.

1. Turn on DAMON with DAMOS_DEMOTE action for DRAM node and
DAMOS_PROMOTE action for CXL node. It demotes cold pages on DRAM
node and promotes hot pages on CXL node in a regular interval.
2. Allocate a huge block of cold memory by calling mmap and memset at
the fast tier DRAM node, then make the process sleep to make the fast
tier has insufficient memory for redis-server.
3. Launch redis-server and load prebaked snapshot image, dump.rdb. The
redis-server consumes 52GB of anon pages and 33GB of file pages, but
due to the cold memory allocated at 2, it fails allocating the entire
memory of redis-server on the fast tier DRAM node so it partially
allocates the remaining on the slow tier CXL node. The ratio of
DRAM:CXL depends on the size of the pre-allocated cold memory.
4. Run YCSB to make zipfian or latest distribution of memory accesses to
redis-server, then measure its execution time when it's completed.
5. Repeat 4 over 50 times to measure the average execution time for each
run.
6. Increase the cold memory size then repeat goes to 2.

For each test at 4 took about a minute so repeating it 50 times almost
took about 1 hour for each test with a specific cold memory from 440GB
to 500GB in 10GB increments for each evaluation. So it took about more
than 10 hours for both zipfian and latest workloads to get the entire
evaluation results. Repeating the same test set multiple times doesn't
show much difference so I think it might be enough to make the result
reliable.


Evaluation Results
==================

All the result values are normalized to DRAM-only execution time because
the workload cannot be faster than DRAM-only unless the workload hits
the bandwidth peak but our redis test doesn't go beyond the bandwidth
limit.

So the DRAM-only execution time is the ideal result without affected by
the gap between DRAM and CXL performance difference. The NUMA node
environment is as follows.

node0 - local DRAM, 512GB with a CPU socket (fast tier)
node1 - disabled
node2 - CXL DRAM, 96GB, no CPU attached (slow tier)

The following is the result of generating zipfian distribution to
redis-server and the numbers are averaged by 50 times of execution.

1. YCSB zipfian distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.21 - - - - - - - | 1.21
default | - 1.09 1.10 1.13 1.15 1.18 1.21 1.21 | 1.15
DAMON 2-tier | - 1.02 1.04 1.05 1.04 1.05 1.05 1.06 | 1.04
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 19.4 26.1 32.3 38.5 44.7 50.5 50.3 | 37.4
DAMON 2-tier | - 0.1 1.6 5.2 8.0 9.1 11.8 13.6 | 7.1
=============+================================================+=========

Each test result is based on the exeuction environment as follows.

DRAM-only : redis-server uses only local DRAM memory.
CXL-only : redis-server uses only CXL memory.
default : default memory policy(MPOL_DEFAULT).
numa balancing disabled.
DAMON 2-tier: DAMON enabled with DAMOS_DEMOTE for DRAM nodes and
DAMOS_PROMOTE for CXL nodes.

The above result shows the "default" execution time goes up as the size
of cold memory is increased from 440G to 500G because the more cold
memory used, the more CXL memory is used for the target redis workload
and this makes the execution time increase.

However, "DAMON 2-tier" result shows less slowdown because the
DAMOS_DEMOTE action at DRAM node proactively demotes pre-allocated cold
memory to CXL node and this free space at DRAM increases more chance to
allocate hot or warm pages of redis-server to fast DRAM node. Moreover,
DEMOS_PROMOTE action at CXL node also promotes hot pages of redis-server
to DRAM node actively.

As a result, it makes more memory of redis-server stay in DRAM node
compared to "default" memory policy and this makes the performance
improvement.

The following result of latest distribution workload shows similar data.

2. YCSB latest distribution read only workload
memory pressure with cold memory on node0 with 512GB of local DRAM.
=============+================================================+=========
| cold memory occupied by mmap and memset |
| 0G 440G 450G 460G 470G 480G 490G 500G |
=============+================================================+=========
Execution time normalized to DRAM-only values | GEOMEAN
-------------+------------------------------------------------+---------
DRAM-only | 1.00 - - - - - - - | 1.00
CXL-only | 1.18 - - - - - - - | 1.18
default | - 1.16 1.15 1.17 1.18 1.16 1.18 1.15 | 1.17
DAMON 2-tier | - 1.04 1.04 1.05 1.05 1.06 1.05 1.06 | 1.05
=============+================================================+=========
CXL usage of redis-server in GB | AVERAGE
-------------+------------------------------------------------+---------
DRAM-only | 0.0 - - - - - - - | 0.0
CXL-only | 52.6 - - - - - - - | 52.6
default | - 19.3 26.1 32.2 38.5 44.6 50.5 50.6 | 37.4
DAMON 2-tier | - 1.3 3.8 7.0 4.1 9.4 12.5 16.7 | 7.8
=============+================================================+=========

In summary of both results, our evaluation shows that "DAMON 2-tier"
memory management reduces the performance slowdown compared to the
"default" memory policy from 15~17% to 4~5% when the system runs with
high memory pressure on its fast tier DRAM nodes.

The similar evaluation was done in another machine that has 256GB of
local DRAM and 96GB of CXL memory. The performance slowdown is reduced
from 20~24% for "default" to 5~7% for "DAMON 2-tier".

Having these DAMOS_DEMOTE and DAMOS_PROMOTE actions can make 2-tier
memory systems run more efficiently under high memory pressures.

Signed-off-by: Honggyu Kim <[email protected]>
Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Rakie Kim <[email protected]>

[1] https://lore.kernel.org/damon/[email protected]
[2] https://github.com/skhynix/hmsdk
[3] https://github.com/redis/redis/tree/7.0.0
[4] https://github.com/brianfrankcooper/YCSB/tree/0.17.0
[5] https://dl.acm.org/doi/10.1145/3503222.3507731
[6] https://dl.acm.org/doi/10.1145/3582016.3582063

Changes from RFC:
1. Move most of implementation from mm/vmscan.c to mm/damon/paddr.c.
2. Simplify some functions of vmscan.c and used in paddr.c, but need
to be reviewed more in depth.
3. Refactor most functions for common usage for both promote and
demote actions and introduce an enum migration_mode for its control.
4. Add "target_nid" sysfs knob for migration destination node for both
promote and demote actions.
5. Move DAMOS_PROMOTE before DAMOS_DEMOTE and move then even above
DAMOS_STAT.

Honggyu Kim (3):
mm/damon: refactor DAMOS_PAGEOUT with migration_mode
mm: make alloc_demote_folio externally invokable for migration
mm/damon: introduce DAMOS_DEMOTE action for demotion

Hyeongtak Ji (4):
mm/memory-tiers: add next_promotion_node to find promotion target
mm/damon: introduce DAMOS_PROMOTE action for promotion
mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
mm/damon/sysfs-schemes: apply target_nid for promote and demote
actions

include/linux/damon.h | 15 +-
include/linux/memory-tiers.h | 11 ++
include/linux/migrate_mode.h | 1 +
include/linux/vm_event_item.h | 1 +
include/trace/events/migrate.h | 3 +-
mm/damon/core.c | 5 +-
mm/damon/dbgfs.c | 2 +-
mm/damon/lru_sort.c | 3 +-
mm/damon/paddr.c | 282 ++++++++++++++++++++++++++++++++-
mm/damon/reclaim.c | 3 +-
mm/damon/sysfs-schemes.c | 39 ++++-
mm/internal.h | 1 +
mm/memory-tiers.c | 43 +++++
mm/vmscan.c | 10 +-
mm/vmstat.c | 1 +
15 files changed, 404 insertions(+), 16 deletions(-)


base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
--
2.34.1



2024-02-26 14:10:30

by Honggyu Kim

[permalink] [raw]
Subject: [PATCH v2 4/7] mm/memory-tiers: add next_promotion_node to find promotion target

From: Hyeongtak Ji <[email protected]>

This patch adds next_promotion_node that can be used to identify the
appropriate promotion target based on memory tiers. When multiple
promotion target nodes are available, the nearest node is selected based
on numa distance.

Signed-off-by: Hyeongtak Ji <[email protected]>
---
include/linux/memory-tiers.h | 11 +++++++++
mm/memory-tiers.c | 43 ++++++++++++++++++++++++++++++++++++
2 files changed, 54 insertions(+)

diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
index 1e39d27bee41..0788e435fc50 100644
--- a/include/linux/memory-tiers.h
+++ b/include/linux/memory-tiers.h
@@ -50,6 +50,7 @@ int mt_set_default_dram_perf(int nid, struct node_hmem_attrs *perf,
int mt_perf_to_adistance(struct node_hmem_attrs *perf, int *adist);
#ifdef CONFIG_MIGRATION
int next_demotion_node(int node);
+int next_promotion_node(int node);
void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets);
bool node_is_toptier(int node);
#else
@@ -58,6 +59,11 @@ static inline int next_demotion_node(int node)
return NUMA_NO_NODE;
}

+static inline int next_promotion_node(int node)
+{
+ return NUMA_NO_NODE;
+}
+
static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
{
*targets = NODE_MASK_NONE;
@@ -101,6 +107,11 @@ static inline int next_demotion_node(int node)
return NUMA_NO_NODE;
}

+static inline int next_promotion_node(int node)
+{
+ return NUMA_NO_NODE;
+}
+
static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets)
{
*targets = NODE_MASK_NONE;
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 8d5291add2bc..0060ee571cf4 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -335,6 +335,49 @@ int next_demotion_node(int node)
return target;
}

+/*
+ * Select a promotion target that is close to the from node among the given
+ * two nodes.
+ *
+ * TODO: consider other decision policy as node_distance may not be precise.
+ */
+static int select_promotion_target(int a, int b, int from)
+{
+ if (node_distance(from, a) < node_distance(from, b))
+ return a;
+ else
+ return b;
+}
+
+/**
+ * next_promotion_node() - Get the next node in the promotion path
+ * @node: The starting node to lookup the next node
+ *
+ * Return: node id for next memory node in the promotion path hierarchy
+ * from @node; NUMA_NO_NODE if @node is the toptier.
+ */
+int next_promotion_node(int node)
+{
+ int target = NUMA_NO_NODE;
+ int nid;
+
+ if (node_is_toptier(node))
+ return NUMA_NO_NODE;
+
+ rcu_read_lock();
+ for_each_node_state(nid, N_MEMORY) {
+ if (node_isset(node, node_demotion[nid].preferred)) {
+ if (target == NUMA_NO_NODE)
+ target = nid;
+ else
+ target = select_promotion_target(nid, target, node);
+ }
+ }
+ rcu_read_unlock();
+
+ return target;
+}
+
static void disable_all_demotion_targets(void)
{
struct memory_tier *memtier;
--
2.34.1


2024-02-26 14:11:18

by Honggyu Kim

[permalink] [raw]
Subject: [PATCH v2 5/7] mm/damon: introduce DAMOS_PROMOTE action for promotion

From: Hyeongtak Ji <[email protected]>

This patch introduces DAMOS_PROMOTE action for paddr mode.

It includes renaming alloc_demote_folio to alloc_migrate_folio to use it
for promotion as well.

Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Honggyu Kim <[email protected]>
---
include/linux/damon.h | 2 ++
include/linux/migrate_mode.h | 1 +
include/linux/vm_event_item.h | 1 +
include/trace/events/migrate.h | 3 ++-
mm/damon/paddr.c | 45 ++++++++++++++++++++++++++++------
mm/damon/sysfs-schemes.c | 1 +
mm/vmstat.c | 1 +
7 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 86e66772766b..d7e52d0228b4 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -105,6 +105,7 @@ struct damon_target {
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
* @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
+ * @DAMOS_PROMOTE: Do promotion for the given region.
* @DAMOS_DEMOTE: Do demotion for the given region.
* @DAMOS_STAT: Do nothing but count the stat.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
@@ -123,6 +124,7 @@ enum damos_action {
DAMOS_NOHUGEPAGE,
DAMOS_LRU_PRIO,
DAMOS_LRU_DEPRIO,
+ DAMOS_PROMOTE,
DAMOS_DEMOTE,
DAMOS_STAT, /* Do nothing but only record the stat */
NR_DAMOS_ACTIONS,
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index f37cc03f9369..63f75eb9abf3 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -29,6 +29,7 @@ enum migrate_reason {
MR_CONTIG_RANGE,
MR_LONGTERM_PIN,
MR_DEMOTION,
+ MR_PROMOTION,
MR_TYPES
};

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 8abfa1240040..63cf920afeaa 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -44,6 +44,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGDEMOTE_KSWAPD,
PGDEMOTE_DIRECT,
PGDEMOTE_KHUGEPAGED,
+ PGPROMOTE,
PGSCAN_KSWAPD,
PGSCAN_DIRECT,
PGSCAN_KHUGEPAGED,
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 0190ef725b43..f0dd569c1e62 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -22,7 +22,8 @@
EM( MR_NUMA_MISPLACED, "numa_misplaced") \
EM( MR_CONTIG_RANGE, "contig_range") \
EM( MR_LONGTERM_PIN, "longterm_pin") \
- EMe(MR_DEMOTION, "demotion")
+ EM( MR_DEMOTION, "demotion") \
+ EMe(MR_PROMOTION, "promotion")

/*
* First define the enums in the above macros to be exported to userspace
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 23e37ce57202..37a7b34a36dd 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -229,6 +229,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)

enum migration_mode {
MIG_PAGEOUT,
+ MIG_PROMOTE,
MIG_DEMOTE,
};

@@ -241,9 +242,26 @@ static unsigned int migrate_folio_list(struct list_head *migrate_folios,
struct pglist_data *pgdat,
enum migration_mode mm)
{
- int target_nid = next_demotion_node(pgdat->node_id);
+ int target_nid;
unsigned int nr_succeeded;
nodemask_t allowed_mask;
+ int reason;
+ enum vm_event_item vm_event;
+
+ switch (mm) {
+ case MIG_PROMOTE:
+ target_nid = next_promotion_node(pgdat->node_id);
+ reason = MR_PROMOTION;
+ vm_event = PGPROMOTE;
+ break;
+ case MIG_DEMOTE:
+ target_nid = next_demotion_node(pgdat->node_id);
+ reason = MR_DEMOTION;
+ vm_event = PGDEMOTE_DIRECT;
+ break;
+ default:
+ return 0;
+ }

struct migration_target_control mtc = {
/*
@@ -263,14 +281,19 @@ static unsigned int migrate_folio_list(struct list_head *migrate_folios,
if (list_empty(migrate_folios))
return 0;

- node_get_allowed_targets(pgdat, &allowed_mask);
+ if (mm == MIG_DEMOTE) {
+ node_get_allowed_targets(pgdat, &allowed_mask);
+ } else if (mm == MIG_PROMOTE) {
+ /* TODO: Need to add upper_tier_mask at struct memory_tier. */
+ allowed_mask = NODE_MASK_NONE;
+ }

/* Migration ignores all cpuset and mempolicy settings */
migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
- (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
+ (unsigned long)&mtc, MIGRATE_ASYNC, reason,
&nr_succeeded);

- __count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
+ __count_vm_events(vm_event, nr_succeeded);

return nr_succeeded;
}
@@ -359,7 +382,8 @@ static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
VM_BUG_ON_FOLIO(folio_test_active(folio), folio);

references = folio_check_references(folio);
- if (references == FOLIOREF_KEEP)
+ if (references == FOLIOREF_KEEP ||
+ (references == FOLIOREF_RECLAIM && mm == MIG_PROMOTE))
goto keep_locked;

/* Relocate its contents to another node. */
@@ -452,8 +476,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
if (damos_pa_filter_out(s, folio))
goto put_folio;

- folio_clear_referenced(folio);
- folio_test_clear_young(folio);
+ if (mm != MIG_PROMOTE) {
+ folio_clear_referenced(folio);
+ folio_test_clear_young(folio);
+ }
if (!folio_isolate_lru(folio))
goto put_folio;
/*
@@ -471,6 +497,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
case MIG_PAGEOUT:
applied = reclaim_pages(&folio_list);
break;
+ case MIG_PROMOTE:
case MIG_DEMOTE:
applied = damon_pa_migrate_pages(&folio_list, mm);
break;
@@ -530,6 +557,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_pa_deactivate_pages(r, scheme);
+ case DAMOS_PROMOTE:
+ return damon_pa_migrate(r, scheme, MIG_PROMOTE);
case DAMOS_DEMOTE:
return damon_pa_migrate(r, scheme, MIG_DEMOTE);
case DAMOS_STAT:
@@ -552,6 +581,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
return damon_hot_score(context, r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_cold_score(context, r, scheme);
+ case DAMOS_PROMOTE:
+ return damon_hot_score(context, r, scheme);
case DAMOS_DEMOTE:
return damon_cold_score(context, r, scheme);
default:
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 53e47fad5021..9bc48932eb6c 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1186,6 +1186,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"nohugepage",
"lru_prio",
"lru_deprio",
+ "promote",
"demote",
"stat",
};
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 359460deb377..c703abdb8137 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1282,6 +1282,7 @@ const char * const vmstat_text[] = {
"pgdemote_kswapd",
"pgdemote_direct",
"pgdemote_khugepaged",
+ "pgpromote",
"pgscan_kswapd",
"pgscan_direct",
"pgscan_khugepaged",
--
2.34.1


2024-02-26 14:12:10

by Honggyu Kim

[permalink] [raw]
Subject: [PATCH v2 7/7] mm/damon/sysfs-schemes: apply target_nid for promote and demote actions

From: Hyeongtak Ji <[email protected]>

This patch changes DAMOS_PROMOTE and DAMOS_DEMOTE to use target_nid of
sysfs as the destination NUMA node of migration. This has been tested
on qemu as follows:

$ cd /sys/kernel/mm/damon/admin/kdamonds/<N>
$ cat contexts/<N>/schemes/<N>/action
promote
$ echo 1 > contexts/<N>/schemes/<N>/target_nid
$ echo commit > state
$ numactl -p 2 ./hot_cold 500M 600M &
$ numastat -c -p hot_cold

Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
-------------- ------ ------ ------ -----
701 (hot_cold) 0 501 601 1101

Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Honggyu Kim <[email protected]>
---
mm/damon/paddr.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 37a7b34a36dd..5e057a69464f 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -240,9 +240,9 @@ enum migration_mode {
*/
static unsigned int migrate_folio_list(struct list_head *migrate_folios,
struct pglist_data *pgdat,
- enum migration_mode mm)
+ enum migration_mode mm,
+ int target_nid)
{
- int target_nid;
unsigned int nr_succeeded;
nodemask_t allowed_mask;
int reason;
@@ -250,12 +250,14 @@ static unsigned int migrate_folio_list(struct list_head *migrate_folios,

switch (mm) {
case MIG_PROMOTE:
- target_nid = next_promotion_node(pgdat->node_id);
+ if (target_nid == NUMA_NO_NODE)
+ target_nid = next_promotion_node(pgdat->node_id);
reason = MR_PROMOTION;
vm_event = PGPROMOTE;
break;
case MIG_DEMOTE:
- target_nid = next_demotion_node(pgdat->node_id);
+ if (target_nid == NUMA_NO_NODE)
+ target_nid = next_demotion_node(pgdat->node_id);
reason = MR_DEMOTION;
vm_event = PGDEMOTE_DIRECT;
break;
@@ -358,7 +360,8 @@ static enum folio_references folio_check_references(struct folio *folio)
*/
static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
struct pglist_data *pgdat,
- enum migration_mode mm)
+ enum migration_mode mm,
+ int target_nid)
{
unsigned int nr_migrated = 0;
struct folio *folio;
@@ -399,7 +402,7 @@ static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
/* 'folio_list' is always empty here */

/* Migrate folios selected for migration */
- nr_migrated += migrate_folio_list(&migrate_folios, pgdat, mm);
+ nr_migrated += migrate_folio_list(&migrate_folios, pgdat, mm, target_nid);
/* Folios that could not be migrated are still in @migrate_folios */
if (!list_empty(&migrate_folios)) {
/* Folios which weren't migrated go back on @folio_list */
@@ -426,7 +429,8 @@ static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
* common function for both cases.
*/
static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
- enum migration_mode mm)
+ enum migration_mode mm,
+ int target_nid)
{
int nid;
unsigned int nr_migrated = 0;
@@ -449,12 +453,14 @@ static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
}

nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
- NODE_DATA(nid), mm);
+ NODE_DATA(nid), mm,
+ target_nid);
nid = folio_nid(lru_to_folio(folio_list));
} while (!list_empty(folio_list));

nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
- NODE_DATA(nid), mm);
+ NODE_DATA(nid), mm,
+ target_nid);

memalloc_noreclaim_restore(noreclaim_flag);

@@ -499,7 +505,8 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
break;
case MIG_PROMOTE:
case MIG_DEMOTE:
- applied = damon_pa_migrate_pages(&folio_list, mm);
+ applied = damon_pa_migrate_pages(&folio_list, mm,
+ s->target_nid);
break;
default:
/* Unexpected migration mode. */
--
2.34.1


2024-02-26 14:15:24

by Honggyu Kim

[permalink] [raw]
Subject: [PATCH v2 2/7] mm: make alloc_demote_folio externally invokable for migration

The alloc_demote_folio can be used out of vmscan.c so it'd be better to
remove static keyword from it.

This function can also be used for both demotion and promotion so it'd
be better to rename it from alloc_demote_folio to alloc_migrate_folio.

Signed-off-by: Honggyu Kim <[email protected]>
---
mm/internal.h | 1 +
mm/vmscan.c | 10 +++++++---
2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index b61034bd50f5..61af6641235d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -866,6 +866,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct file *, unsigned long,
unsigned long, unsigned long);

extern void set_pageblock_order(void);
+struct folio *alloc_migrate_folio(struct folio *src, unsigned long private);
unsigned long reclaim_pages(struct list_head *folio_list);
unsigned int reclaim_clean_pages_from_list(struct zone *zone,
struct list_head *folio_list);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index bba207f41b14..b8a1a1599e3c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -910,8 +910,7 @@ static void folio_check_dirty_writeback(struct folio *folio,
mapping->a_ops->is_dirty_writeback(folio, dirty, writeback);
}

-static struct folio *alloc_demote_folio(struct folio *src,
- unsigned long private)
+struct folio *alloc_migrate_folio(struct folio *src, unsigned long private)
{
struct folio *dst;
nodemask_t *allowed_mask;
@@ -935,6 +934,11 @@ static struct folio *alloc_demote_folio(struct folio *src,
if (dst)
return dst;

+ /*
+ * Allocation failed from the target node so try to allocate from
+ * fallback nodes based on allowed_mask.
+ * See fallback_alloc() at mm/slab.c.
+ */
mtc->gfp_mask &= ~__GFP_THISNODE;
mtc->nmask = allowed_mask;

@@ -973,7 +977,7 @@ static unsigned int demote_folio_list(struct list_head *demote_folios,
node_get_allowed_targets(pgdat, &allowed_mask);

/* Demotion ignores all cpuset and mempolicy settings */
- migrate_pages(demote_folios, alloc_demote_folio, NULL,
+ migrate_pages(demote_folios, alloc_migrate_folio, NULL,
(unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
&nr_succeeded);

--
2.34.1


2024-02-26 14:15:32

by Honggyu Kim

[permalink] [raw]
Subject: [PATCH v2 3/7] mm/damon: introduce DAMOS_DEMOTE action for demotion

This patch introduces DAMOS_DEMOTE action, which is similar to
DAMOS_PAGEOUT, but demote folios instead of swapping them out.

Since there are some common routines with pageout, many functions have
similar logics between pageout and demote.

damon_pa_migrate_folio_list() is a minimized version of
shrink_folio_list(), but it's minified only for demotion.

Signed-off-by: Honggyu Kim <[email protected]>
Signed-off-by: Hyeongtak Ji <[email protected]>
---
include/linux/damon.h | 2 +
mm/damon/paddr.c | 222 ++++++++++++++++++++++++++++++++++++++-
mm/damon/sysfs-schemes.c | 1 +
3 files changed, 224 insertions(+), 1 deletion(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index e00ddf1ed39c..86e66772766b 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -105,6 +105,7 @@ struct damon_target {
* @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
* @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists.
* @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists.
+ * @DAMOS_DEMOTE: Do demotion for the given region.
* @DAMOS_STAT: Do nothing but count the stat.
* @NR_DAMOS_ACTIONS: Total number of DAMOS actions
*
@@ -122,6 +123,7 @@ enum damos_action {
DAMOS_NOHUGEPAGE,
DAMOS_LRU_PRIO,
DAMOS_LRU_DEPRIO,
+ DAMOS_DEMOTE,
DAMOS_STAT, /* Do nothing but only record the stat */
NR_DAMOS_ACTIONS,
};
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 277a1c4d833c..23e37ce57202 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -12,6 +12,9 @@
#include <linux/pagemap.h>
#include <linux/rmap.h>
#include <linux/swap.h>
+#include <linux/memory-tiers.h>
+#include <linux/migrate.h>
+#include <linux/mm_inline.h>

#include "../internal.h"
#include "ops-common.h"
@@ -226,8 +229,214 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)

enum migration_mode {
MIG_PAGEOUT,
+ MIG_DEMOTE,
};

+/*
+ * XXX: This is copied from demote_folio_list as renamed as migrate_folio_list.
+ * Take folios on @migrate_folios and attempt to migrate them to another node.
+ * Folios which are not migrated are left on @migrate_folios.
+ */
+static unsigned int migrate_folio_list(struct list_head *migrate_folios,
+ struct pglist_data *pgdat,
+ enum migration_mode mm)
+{
+ int target_nid = next_demotion_node(pgdat->node_id);
+ unsigned int nr_succeeded;
+ nodemask_t allowed_mask;
+
+ struct migration_target_control mtc = {
+ /*
+ * Allocate from 'node', or fail quickly and quietly.
+ * When this happens, 'page' will likely just be discarded
+ * instead of migrated.
+ */
+ .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
+ __GFP_NOMEMALLOC | GFP_NOWAIT,
+ .nid = target_nid,
+ .nmask = &allowed_mask
+ };
+
+ if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
+ return 0;
+
+ if (list_empty(migrate_folios))
+ return 0;
+
+ node_get_allowed_targets(pgdat, &allowed_mask);
+
+ /* Migration ignores all cpuset and mempolicy settings */
+ migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
+ (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION,
+ &nr_succeeded);
+
+ __count_vm_events(PGDEMOTE_DIRECT, nr_succeeded);
+
+ return nr_succeeded;
+}
+
+enum folio_references {
+ FOLIOREF_RECLAIM,
+ FOLIOREF_KEEP,
+ FOLIOREF_ACTIVATE,
+};
+
+/*
+ * XXX: This is just copied and simplified from folio_check_references at
+ * mm/vmscan.c but without having scan_control.
+ */
+static enum folio_references folio_check_references(struct folio *folio)
+{
+ int referenced_ptes, referenced_folio;
+ unsigned long vm_flags;
+
+ referenced_ptes = folio_referenced(folio, 1, NULL, &vm_flags);
+ referenced_folio = folio_test_clear_referenced(folio);
+
+ /* rmap lock contention: rotate */
+ if (referenced_ptes == -1)
+ return FOLIOREF_KEEP;
+
+ if (referenced_ptes) {
+ /*
+ * All mapped folios start out with page table
+ * references from the instantiating fault, so we need
+ * to look twice if a mapped file/anon folio is used more
+ * than once.
+ *
+ * Mark it and spare it for another trip around the
+ * inactive list. Another page table reference will
+ * lead to its activation.
+ *
+ * Note: the mark is set for activated folios as well
+ * so that recently deactivated but used folios are
+ * quickly recovered.
+ */
+ folio_set_referenced(folio);
+
+ if (referenced_folio || referenced_ptes > 1)
+ return FOLIOREF_ACTIVATE;
+
+ /*
+ * Activate file-backed executable folios after first usage.
+ */
+ if ((vm_flags & VM_EXEC) && folio_is_file_lru(folio))
+ return FOLIOREF_ACTIVATE;
+
+ return FOLIOREF_KEEP;
+ }
+
+ return FOLIOREF_RECLAIM;
+}
+
+/*
+ * XXX: This is minimized implmentation based on shrink_folio_list only for
+ * the demotion calling demote_folio_list.
+ */
+static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list,
+ struct pglist_data *pgdat,
+ enum migration_mode mm)
+{
+ unsigned int nr_migrated = 0;
+ struct folio *folio;
+ LIST_HEAD(ret_folios);
+ LIST_HEAD(migrate_folios);
+
+ cond_resched();
+
+ while (!list_empty(folio_list)) {
+ struct folio *folio;
+ enum folio_references references;
+
+ cond_resched();
+
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+
+ if (!folio_trylock(folio))
+ goto keep;
+
+ VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
+
+ references = folio_check_references(folio);
+ if (references == FOLIOREF_KEEP)
+ goto keep_locked;
+
+ /* Relocate its contents to another node. */
+ list_add(&folio->lru, &migrate_folios);
+ folio_unlock(folio);
+ continue;
+keep_locked:
+ folio_unlock(folio);
+keep:
+ list_add(&folio->lru, &ret_folios);
+ VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
+ }
+ /* 'folio_list' is always empty here */
+
+ /* Migrate folios selected for migration */
+ nr_migrated += migrate_folio_list(&migrate_folios, pgdat, mm);
+ /* Folios that could not be migrated are still in @migrate_folios */
+ if (!list_empty(&migrate_folios)) {
+ /* Folios which weren't migrated go back on @folio_list */
+ list_splice_init(&migrate_folios, folio_list);
+ }
+
+ try_to_unmap_flush();
+
+ list_splice(&ret_folios, folio_list);
+
+ while (!list_empty(folio_list)) {
+ folio = lru_to_folio(folio_list);
+ list_del(&folio->lru);
+ folio_putback_lru(folio);
+ }
+
+ return nr_migrated;
+}
+
+/*
+ * XXX: This is almost identical to reclaim_pages() in mm/vmscan.c, but it
+ * internally calls damon_pa_migrate_folio_list() instead of
+ * reclaim_folio_list(). We might be better to think if we can have a
+ * common function for both cases.
+ */
+static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
+ enum migration_mode mm)
+{
+ int nid;
+ unsigned int nr_migrated = 0;
+ LIST_HEAD(node_folio_list);
+ unsigned int noreclaim_flag;
+
+ if (list_empty(folio_list))
+ return nr_migrated;
+
+ noreclaim_flag = memalloc_noreclaim_save();
+
+ nid = folio_nid(lru_to_folio(folio_list));
+ do {
+ struct folio *folio = lru_to_folio(folio_list);
+
+ if (nid == folio_nid(folio)) {
+ folio_clear_active(folio);
+ list_move(&folio->lru, &node_folio_list);
+ continue;
+ }
+
+ nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
+ NODE_DATA(nid), mm);
+ nid = folio_nid(lru_to_folio(folio_list));
+ } while (!list_empty(folio_list));
+
+ nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
+ NODE_DATA(nid), mm);
+
+ memalloc_noreclaim_restore(noreclaim_flag);
+
+ return nr_migrated;
+}
+
static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
enum migration_mode mm)
{
@@ -247,7 +456,11 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
folio_test_clear_young(folio);
if (!folio_isolate_lru(folio))
goto put_folio;
- if (folio_test_unevictable(folio))
+ /*
+ * Since unevictable folios can be demoted or promoted,
+ * unevictable test is needed only for pageout.
+ */
+ if (mm == MIG_PAGEOUT && folio_test_unevictable(folio))
folio_putback_lru(folio);
else
list_add(&folio->lru, &folio_list);
@@ -258,6 +471,9 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
case MIG_PAGEOUT:
applied = reclaim_pages(&folio_list);
break;
+ case MIG_DEMOTE:
+ applied = damon_pa_migrate_pages(&folio_list, mm);
+ break;
default:
/* Unexpected migration mode. */
return 0;
@@ -314,6 +530,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
return damon_pa_mark_accessed(r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_pa_deactivate_pages(r, scheme);
+ case DAMOS_DEMOTE:
+ return damon_pa_migrate(r, scheme, MIG_DEMOTE);
case DAMOS_STAT:
break;
default:
@@ -334,6 +552,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
return damon_hot_score(context, r, scheme);
case DAMOS_LRU_DEPRIO:
return damon_cold_score(context, r, scheme);
+ case DAMOS_DEMOTE:
+ return damon_cold_score(context, r, scheme);
default:
break;
}
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index fe0fe2562000..53e47fad5021 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1186,6 +1186,7 @@ static const char * const damon_sysfs_damos_action_strs[] = {
"nohugepage",
"lru_prio",
"lru_deprio",
+ "demote",
"stat",
};

--
2.34.1


2024-02-26 14:16:23

by Honggyu Kim

[permalink] [raw]
Subject: [PATCH v2 6/7] mm/damon/sysfs-schemes: add target_nid on sysfs-schemes

From: Hyeongtak Ji <[email protected]>

This patch adds target_nid under
/sys/kernel/mm/damon/admin/kdamonds/<N>/contexts/<N>/schemes/<N>

target_nid can be used as the destination node for DAMOS actions such as
DAMOS_DEMOTE or DAMOS_PROMOTE in the future.

Signed-off-by: Hyeongtak Ji <[email protected]>
Signed-off-by: Honggyu Kim <[email protected]>
---
include/linux/damon.h | 11 ++++++++++-
mm/damon/core.c | 5 ++++-
mm/damon/dbgfs.c | 2 +-
mm/damon/lru_sort.c | 3 ++-
mm/damon/reclaim.c | 3 ++-
mm/damon/sysfs-schemes.c | 37 ++++++++++++++++++++++++++++++++++++-
6 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index d7e52d0228b4..4d270956dbd0 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -321,6 +321,7 @@ struct damos_access_pattern {
* @apply_interval_us: The time between applying the @action.
* @quota: Control the aggressiveness of this scheme.
* @wmarks: Watermarks for automated (in)activation of this scheme.
+ * @target_nid: Destination node if @action is "promote" or "demote".
* @filters: Additional set of &struct damos_filter for &action.
* @stat: Statistics of this scheme.
* @list: List head for siblings.
@@ -336,6 +337,10 @@ struct damos_access_pattern {
* monitoring context are inactive, DAMON stops monitoring either, and just
* repeatedly checks the watermarks.
*
+ * @target_nid is used to set the destination node for promote or demote
+ * actions, which means it's only meaningful when @action is either "promote" or
+ * "demote".
+ *
* Before applying the &action to a memory region, &struct damon_operations
* implementation could check pages of the region and skip &action to respect
* &filters
@@ -357,6 +362,9 @@ struct damos {
/* public: */
struct damos_quota quota;
struct damos_watermarks wmarks;
+ union {
+ int target_nid;
+ };
struct list_head filters;
struct damos_stat stat;
struct list_head list;
@@ -661,7 +669,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
enum damos_action action,
unsigned long apply_interval_us,
struct damos_quota *quota,
- struct damos_watermarks *wmarks);
+ struct damos_watermarks *wmarks,
+ int target_nid);
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
void damon_destroy_scheme(struct damos *s);

diff --git a/mm/damon/core.c b/mm/damon/core.c
index 3a05e71509b9..0c2472818fb9 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -316,7 +316,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
enum damos_action action,
unsigned long apply_interval_us,
struct damos_quota *quota,
- struct damos_watermarks *wmarks)
+ struct damos_watermarks *wmarks,
+ int target_nid)
{
struct damos *scheme;

@@ -341,6 +342,8 @@ struct damos *damon_new_scheme(struct damos_access_pattern *pattern,
scheme->wmarks = *wmarks;
scheme->wmarks.activated = true;

+ scheme->target_nid = target_nid;
+
return scheme;
}

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index dc0ea1fc30ca..29b427dd1186 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -279,7 +279,7 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,

pos += parsed;
scheme = damon_new_scheme(&pattern, action, 0, &quota,
- &wmarks);
+ &wmarks, NUMA_NO_NODE);
if (!scheme)
goto fail;

diff --git a/mm/damon/lru_sort.c b/mm/damon/lru_sort.c
index f2e5f9431892..fd0492637fce 100644
--- a/mm/damon/lru_sort.c
+++ b/mm/damon/lru_sort.c
@@ -163,7 +163,8 @@ static struct damos *damon_lru_sort_new_scheme(
/* under the quota. */
&quota,
/* (De)activate this according to the watermarks. */
- &damon_lru_sort_wmarks);
+ &damon_lru_sort_wmarks,
+ NUMA_NO_NODE);
}

/* Create a DAMON-based operation scheme for hot memory regions */
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
index ab974e477d2f..973ac5df84eb 100644
--- a/mm/damon/reclaim.c
+++ b/mm/damon/reclaim.c
@@ -147,7 +147,8 @@ static struct damos *damon_reclaim_new_scheme(void)
/* under the quota. */
&damon_reclaim_quota,
/* (De)activate this according to the watermarks. */
- &damon_reclaim_wmarks);
+ &damon_reclaim_wmarks,
+ NUMA_NO_NODE);
}

static int damon_reclaim_apply_parameters(void)
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 9bc48932eb6c..8bf5aa98d916 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -6,6 +6,7 @@
*/

#include <linux/slab.h>
+#include <linux/numa.h>

#include "sysfs-common.h"

@@ -1175,6 +1176,7 @@ struct damon_sysfs_scheme {
struct damon_sysfs_scheme_filters *filters;
struct damon_sysfs_stats *stats;
struct damon_sysfs_scheme_regions *tried_regions;
+ int target_nid;
};

/* This should match with enum damos_action */
@@ -1202,6 +1204,7 @@ static struct damon_sysfs_scheme *damon_sysfs_scheme_alloc(
scheme->kobj = (struct kobject){};
scheme->action = action;
scheme->apply_interval_us = apply_interval_us;
+ scheme->target_nid = NUMA_NO_NODE;
return scheme;
}

@@ -1424,6 +1427,32 @@ static ssize_t apply_interval_us_store(struct kobject *kobj,
return err ? err : count;
}

+static ssize_t target_nid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+
+ return sysfs_emit(buf, "%d\n", scheme->target_nid);
+}
+
+static ssize_t target_nid_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damon_sysfs_scheme *scheme = container_of(kobj,
+ struct damon_sysfs_scheme, kobj);
+ int err = 0;
+
+ if (scheme->action != DAMOS_DEMOTE &&
+ scheme->action != DAMOS_PROMOTE)
+ return -EINVAL;
+
+ /* TODO: error handling for target_nid range. */
+ err = kstrtoint(buf, 0, &scheme->target_nid);
+
+ return err ? err : count;
+}
+
static void damon_sysfs_scheme_release(struct kobject *kobj)
{
kfree(container_of(kobj, struct damon_sysfs_scheme, kobj));
@@ -1435,9 +1464,13 @@ static struct kobj_attribute damon_sysfs_scheme_action_attr =
static struct kobj_attribute damon_sysfs_scheme_apply_interval_us_attr =
__ATTR_RW_MODE(apply_interval_us, 0600);

+static struct kobj_attribute damon_sysfs_scheme_target_nid_attr =
+ __ATTR_RW_MODE(target_nid, 0600);
+
static struct attribute *damon_sysfs_scheme_attrs[] = {
&damon_sysfs_scheme_action_attr.attr,
&damon_sysfs_scheme_apply_interval_us_attr.attr,
+ &damon_sysfs_scheme_target_nid_attr.attr,
NULL,
};
ATTRIBUTE_GROUPS(damon_sysfs_scheme);
@@ -1690,7 +1723,8 @@ static struct damos *damon_sysfs_mk_scheme(
};

scheme = damon_new_scheme(&pattern, sysfs_scheme->action,
- sysfs_scheme->apply_interval_us, &quota, &wmarks);
+ sysfs_scheme->apply_interval_us, &quota, &wmarks,
+ sysfs_scheme->target_nid);
if (!scheme)
return NULL;

@@ -1721,6 +1755,7 @@ static void damon_sysfs_update_scheme(struct damos *scheme,

scheme->action = sysfs_scheme->action;
scheme->apply_interval_us = sysfs_scheme->apply_interval_us;
+ scheme->target_nid = sysfs_scheme->target_nid;

scheme->quota.ms = sysfs_quotas->ms;
scheme->quota.sz = sysfs_quotas->sz;
--
2.34.1


2024-02-27 23:51:33

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim <[email protected]> wrote:

> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> It says there is no implementation of the demote/promote DAMOS action
> are made. This RFC is about its implementation for physical address
> space.
>
>
> Introduction
> ============
>
> With the advent of CXL/PCIe attached DRAM, which will be called simply
> as CXL memory in this cover letter, some systems are becoming more
> heterogeneous having memory systems with different latency and bandwidth
> characteristics. They are usually handled as different NUMA nodes in
> separate memory tiers and CXL memory is used as slow tiers because of
> its protocol overhead compared to local DRAM.
>
> In this kind of systems, we need to be careful placing memory pages on
> proper NUMA nodes based on the memory access frequency. Otherwise, some
> frequently accessed pages might reside on slow tiers and it makes
> performance degradation unexpectedly. Moreover, the memory access
> patterns can be changed at runtime.
>
> To handle this problem, we need a way to monitor the memory access
> patterns and migrate pages based on their access temperature. The
> DAMON(Data Access MONitor) framework and its DAMOS(DAMON-based Operation
> Schemes) can be useful features for monitoring and migrating pages.
> DAMOS provides multiple actions based on DAMON monitoring results and it
> can be used for proactive reclaim, which means swapping cold pages out
> with DAMOS_PAGEOUT action, but it doesn't support migration actions such
> as demotion and promotion between tiered memory nodes.
>
> This series supports two new DAMOS actions; DAMOS_DEMOTE for demotion
> from fast tiers and DAMOS_PROMOTE for promotion from slow tiers. This
> prevents hot pages from being stuck on slow tiers, which makes
> performance degradation and cold pages can be proactively demoted to
> slow tiers so that the system can increase the chance to allocate more
> hot pages to fast tiers.
>
> The DAMON provides various tuning knobs but we found that the proactive
> demotion for cold pages is especially useful when the system is running
> out of memory on its fast tier nodes.
>
> Our evaluation result shows that it reduces the performance slowdown
> compared to the default memory policy from 15~17% to 4~5% when the
> system runs under high memory pressure on its fast tier DRAM nodes.
>
>
> DAMON configuration
> ===================
>
> The specific DAMON configuration doesn't have to be in the scope of this
> patch series, but some rough idea is better to be shared to explain the
> evaluation result.
>
> The DAMON provides many knobs for fine tuning but its configuration file
> is generated by HMSDK[2]. It includes gen_config.py script that
> generates a json file with the full config of DAMON knobs and it creates
> multiple kdamonds for each NUMA node when the DAMON is enabled so that
> it can run hot/cold based migration for tiered memory.

I was feeling a bit confused from here since DAMON doesn't receive parameters
via a file. To my understanding, the 'configuration file' means the input file
for DAMON user-space tool, damo, not DAMON. Just a trivial thing, but making
it clear if possible could help readers in my opinion.

>
>
> Evaluation Workload
> ===================
>
> The performance evaluation is done with redis[3], which is a widely used
> in-memory database and the memory access patterns are generated via
> YCSB[4]. We have measured two different workloads with zipfian and
> latest distributions but their configs are slightly modified to make
> memory usage higher and execution time longer for better evaluation.
>
> The idea of evaluation using these demote and promote actions covers
> system-wide memory management rather than partitioning hot/cold pages of
> a single workload. The default memory allocation policy creates pages
> to the fast tier DRAM node first, then allocates newly created pages to
> the slow tier CXL node when the DRAM node has insufficient free space.
> Once the page allocation is done then those pages never move between
> NUMA nodes. It's not true when using numa balancing, but it is not the
> scope of this DAMON based 2-tier memory management support.
>
> If the working set of redis can be fit fully into the DRAM node, then
> the redis will access the fast DRAM only. Since the performance of DRAM
> only is faster than partially accessing CXL memory in slow tiers, this
> environment is not useful to evaluate this patch series.
>
> To make pages of redis be distributed across fast DRAM node and slow
> CXL node to evaluate our demote and promote actions, we pre-allocate
> some cold memory externally using mmap and memset before launching
> redis-server. We assumed that there are enough amount of cold memory in
> datacenters as TMO[5] and TPP[6] papers mentioned.
>
> The evaluation sequence is as follows.
>
> 1. Turn on DAMON with DAMOS_DEMOTE action for DRAM node and
> DAMOS_PROMOTE action for CXL node. It demotes cold pages on DRAM
> node and promotes hot pages on CXL node in a regular interval.
> 2. Allocate a huge block of cold memory by calling mmap and memset at
> the fast tier DRAM node, then make the process sleep to make the fast
> tier has insufficient memory for redis-server.
> 3. Launch redis-server and load prebaked snapshot image, dump.rdb. The
> redis-server consumes 52GB of anon pages and 33GB of file pages, but
> due to the cold memory allocated at 2, it fails allocating the entire
> memory of redis-server on the fast tier DRAM node so it partially
> allocates the remaining on the slow tier CXL node. The ratio of
> DRAM:CXL depends on the size of the pre-allocated cold memory.
> 4. Run YCSB to make zipfian or latest distribution of memory accesses to
> redis-server, then measure its execution time when it's completed.
> 5. Repeat 4 over 50 times to measure the average execution time for each
> run.
> 6. Increase the cold memory size then repeat goes to 2.
>
> For each test at 4 took about a minute so repeating it 50 times almost
> took about 1 hour for each test with a specific cold memory from 440GB
> to 500GB in 10GB increments for each evaluation. So it took about more
> than 10 hours for both zipfian and latest workloads to get the entire
> evaluation results. Repeating the same test set multiple times doesn't
> show much difference so I think it might be enough to make the result
> reliable.
>
>
> Evaluation Results
> ==================
>
> All the result values are normalized to DRAM-only execution time because
> the workload cannot be faster than DRAM-only unless the workload hits
> the bandwidth peak but our redis test doesn't go beyond the bandwidth
> limit.
>
> So the DRAM-only execution time is the ideal result without affected by
> the gap between DRAM and CXL performance difference. The NUMA node
> environment is as follows.
>
> node0 - local DRAM, 512GB with a CPU socket (fast tier)
> node1 - disabled
> node2 - CXL DRAM, 96GB, no CPU attached (slow tier)
>
> The following is the result of generating zipfian distribution to
> redis-server and the numbers are averaged by 50 times of execution.
>
> 1. YCSB zipfian distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
> | cold memory occupied by mmap and memset |
> | 0G 440G 450G 460G 470G 480G 490G 500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values | GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only | 1.00 - - - - - - - | 1.00
> CXL-only | 1.21 - - - - - - - | 1.21
> default | - 1.09 1.10 1.13 1.15 1.18 1.21 1.21 | 1.15
> DAMON 2-tier | - 1.02 1.04 1.05 1.04 1.05 1.05 1.06 | 1.04
> =============+================================================+=========
> CXL usage of redis-server in GB | AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only | 0.0 - - - - - - - | 0.0
> CXL-only | 52.6 - - - - - - - | 52.6
> default | - 19.4 26.1 32.3 38.5 44.7 50.5 50.3 | 37.4
> DAMON 2-tier | - 0.1 1.6 5.2 8.0 9.1 11.8 13.6 | 7.1
> =============+================================================+=========
>
> Each test result is based on the exeuction environment as follows.
>
> DRAM-only : redis-server uses only local DRAM memory.
> CXL-only : redis-server uses only CXL memory.
> default : default memory policy(MPOL_DEFAULT).
> numa balancing disabled.
> DAMON 2-tier: DAMON enabled with DAMOS_DEMOTE for DRAM nodes and
> DAMOS_PROMOTE for CXL nodes.
>
> The above result shows the "default" execution time goes up as the size
> of cold memory is increased from 440G to 500G because the more cold
> memory used, the more CXL memory is used for the target redis workload
> and this makes the execution time increase.
>
> However, "DAMON 2-tier" result shows less slowdown because the
> DAMOS_DEMOTE action at DRAM node proactively demotes pre-allocated cold
> memory to CXL node and this free space at DRAM increases more chance to
> allocate hot or warm pages of redis-server to fast DRAM node. Moreover,
> DEMOS_PROMOTE action at CXL node also promotes hot pages of redis-server
> to DRAM node actively.
>
> As a result, it makes more memory of redis-server stay in DRAM node
> compared to "default" memory policy and this makes the performance
> improvement.
>
> The following result of latest distribution workload shows similar data.
>
> 2. YCSB latest distribution read only workload
> memory pressure with cold memory on node0 with 512GB of local DRAM.
> =============+================================================+=========
> | cold memory occupied by mmap and memset |
> | 0G 440G 450G 460G 470G 480G 490G 500G |
> =============+================================================+=========
> Execution time normalized to DRAM-only values | GEOMEAN
> -------------+------------------------------------------------+---------
> DRAM-only | 1.00 - - - - - - - | 1.00
> CXL-only | 1.18 - - - - - - - | 1.18
> default | - 1.16 1.15 1.17 1.18 1.16 1.18 1.15 | 1.17
> DAMON 2-tier | - 1.04 1.04 1.05 1.05 1.06 1.05 1.06 | 1.05
> =============+================================================+=========
> CXL usage of redis-server in GB | AVERAGE
> -------------+------------------------------------------------+---------
> DRAM-only | 0.0 - - - - - - - | 0.0
> CXL-only | 52.6 - - - - - - - | 52.6
> default | - 19.3 26.1 32.2 38.5 44.6 50.5 50.6 | 37.4
> DAMON 2-tier | - 1.3 3.8 7.0 4.1 9.4 12.5 16.7 | 7.8
> =============+================================================+=========
>
> In summary of both results, our evaluation shows that "DAMON 2-tier"
> memory management reduces the performance slowdown compared to the
> "default" memory policy from 15~17% to 4~5% when the system runs with
> high memory pressure on its fast tier DRAM nodes.
>
> The similar evaluation was done in another machine that has 256GB of
> local DRAM and 96GB of CXL memory. The performance slowdown is reduced
> from 20~24% for "default" to 5~7% for "DAMON 2-tier".
>
> Having these DAMOS_DEMOTE and DAMOS_PROMOTE actions can make 2-tier
> memory systems run more efficiently under high memory pressures.

Thank you for running the tests again with the new version of the patches and
sharing the results!

>
> Signed-off-by: Honggyu Kim <[email protected]>
> Signed-off-by: Hyeongtak Ji <[email protected]>
> Signed-off-by: Rakie Kim <[email protected]>
>
> [1] https://lore.kernel.org/damon/[email protected]
> [2] https://github.com/skhynix/hmsdk
> [3] https://github.com/redis/redis/tree/7.0.0
> [4] https://github.com/brianfrankcooper/YCSB/tree/0.17.0
> [5] https://dl.acm.org/doi/10.1145/3503222.3507731
> [6] https://dl.acm.org/doi/10.1145/3582016.3582063
>
> Changes from RFC:
> 1. Move most of implementation from mm/vmscan.c to mm/damon/paddr.c.
> 2. Simplify some functions of vmscan.c and used in paddr.c, but need
> to be reviewed more in depth.
> 3. Refactor most functions for common usage for both promote and
> demote actions and introduce an enum migration_mode for its control.
> 4. Add "target_nid" sysfs knob for migration destination node for both
> promote and demote actions.
> 5. Move DAMOS_PROMOTE before DAMOS_DEMOTE and move then even above
> DAMOS_STAT.

Thank you very much for addressing many of my comments.

>
> Honggyu Kim (3):
> mm/damon: refactor DAMOS_PAGEOUT with migration_mode
> mm: make alloc_demote_folio externally invokable for migration
> mm/damon: introduce DAMOS_DEMOTE action for demotion
>
> Hyeongtak Ji (4):
> mm/memory-tiers: add next_promotion_node to find promotion target
> mm/damon: introduce DAMOS_PROMOTE action for promotion
> mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
> mm/damon/sysfs-schemes: apply target_nid for promote and demote
> actions

Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed about
this patchset in high level. Sharing the summary here for open discussion. As
also discussed on the first version of this patchset[2], we want to make single
action for general page migration with minimum changes, but would like to keep
page level access re-check. We also agreed the previously proposed DAMOS
filter-based approach could make sense for the purpose.

Because I was anyway planning making such DAMOS filter for not only
promotion/demotion but other types of DAMOS action, I will start developing the
page level access re-check results based DAMOS filter. Once the implementation
of the prototype is done, I will share the early implementation. Then, Honggyu
will adjust their implementation based on the filter, and run their tests again
and share the results.

[1] https://lore.kernel.org/damon/[email protected]/
[2] https://lore.kernel.org/damon/[email protected]


Thanks,
SJ

>
> include/linux/damon.h | 15 +-
> include/linux/memory-tiers.h | 11 ++
> include/linux/migrate_mode.h | 1 +
> include/linux/vm_event_item.h | 1 +
> include/trace/events/migrate.h | 3 +-
> mm/damon/core.c | 5 +-
> mm/damon/dbgfs.c | 2 +-
> mm/damon/lru_sort.c | 3 +-
> mm/damon/paddr.c | 282 ++++++++++++++++++++++++++++++++-
> mm/damon/reclaim.c | 3 +-
> mm/damon/sysfs-schemes.c | 39 ++++-
> mm/internal.h | 1 +
> mm/memory-tiers.c | 43 +++++
> mm/vmscan.c | 10 +-
> mm/vmstat.c | 1 +
> 15 files changed, 404 insertions(+), 16 deletions(-)
>
>
> base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
> --
> 2.34.1

2024-03-07 03:06:00

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hello,

On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park <[email protected]> wrote:

> On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim <[email protected]> wrote:
>
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > It says there is no implementation of the demote/promote DAMOS action
> > are made. This RFC is about its implementation for physical address
> > space.
[...]
> Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed about
> this patchset in high level. Sharing the summary here for open discussion. As
> also discussed on the first version of this patchset[2], we want to make single
> action for general page migration with minimum changes, but would like to keep
> page level access re-check. We also agreed the previously proposed DAMOS
> filter-based approach could make sense for the purpose.
>
> Because I was anyway planning making such DAMOS filter for not only
> promotion/demotion but other types of DAMOS action, I will start developing the
> page level access re-check results based DAMOS filter. Once the implementation
> of the prototype is done, I will share the early implementation. Then, Honggyu
> will adjust their implementation based on the filter, and run their tests again
> and share the results.

I just posted an RFC patchset for the page level access re-check DAMOS filter:
https://lore.kernel.org/r/[email protected]

I hope it to help you better understanding and testing the idea.

>
> [1] https://lore.kernel.org/damon/[email protected]/
> [2] https://lore.kernel.org/damon/[email protected]


Thanks,
SJ

[...]

2024-03-08 08:31:57

by Honggyu Kim

[permalink] [raw]
Subject: Re: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

I couldn't send email to LKML properly due to internal system issues,
but it's better now so I will be able to communicate better.

On Wed, 6 Mar 2024 19:05:50 -0800 SeongJae Park <[email protected]> wrote:
>
> Hello,
>
> On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park <[email protected]> wrote:
>
> > On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim <[email protected]> wrote:
> >
> > > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > > posted at [1].
> > >
> > > It says there is no implementation of the demote/promote DAMOS action
> > > are made. This RFC is about its implementation for physical address
> > > space.
> [...]
> > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed about
> > this patchset in high level. Sharing the summary here for open discussion. As
> > also discussed on the first version of this patchset[2], we want to make single
> > action for general page migration with minimum changes, but would like to keep
> > page level access re-check. We also agreed the previously proposed DAMOS
> > filter-based approach could make sense for the purpose.
> >
> > Because I was anyway planning making such DAMOS filter for not only
> > promotion/demotion but other types of DAMOS action, I will start developing the
> > page level access re-check results based DAMOS filter. Once the implementation
> > of the prototype is done, I will share the early implementation. Then, Honggyu
> > will adjust their implementation based on the filter, and run their tests again
> > and share the results.
>
> I just posted an RFC patchset for the page level access re-check DAMOS filter:
> https://lore.kernel.org/r/[email protected]
>
> I hope it to help you better understanding and testing the idea.

Thanks very much for your work! I will test it based on your changes.

Thanks,
Honggyu

>
> >
> > [1] https://lore.kernel.org/damon/[email protected]/
> > [2] https://lore.kernel.org/damon/[email protected]
>
>
> Thanks,
> SJ
>
> [...]
>
>
>

2024-03-17 08:52:20

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

Thanks for the confirmation. I have a few comments on young filter so
please read the inline comments again.

On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> Hi Honggyu,
>
> > > -----Original Message-----
> > > From: SeongJae Park <[email protected]>
> > > Sent: Tuesday, March 12, 2024 3:33 AM
> > > To: Honggyu Kim <[email protected]>
> > > Cc: SeongJae Park <[email protected]>; kernel_team <[email protected]>
> > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
> > >
> > > Hi Honggyu,
> > >
> > > On Mon, 11 Mar 2024 12:51:12 +0000 "[email protected]" <[email protected]> wrote:
> > >
> > > > Hi SeongJae,
> > > >
> > > > I've tested it again and found that "young" filter has to be set
> > > > differently as follows.
> > > > - demote action: set "young" filter with "matching" true
> > > > - promote action: set "young" filter with "matching" false
> > >
> > > DAMOS filter is basically for filtering "out" memory regions that matches to
> > > the condition. Hence in your setup, young pages are not filtered out from
> > > demote action target.
> >
> > I thought young filter true means "young pages ARE filtered out" for demotion.
>
> You're correct.

Ack.

> >
> > > That is, you're demoting pages that "not" young.
> >
> > Your explanation here looks opposite to the previous statement.
>
> Again, you're correct. My intention was "non-young pages are not ..." but
> maybe I was out of my mind and mistakenly removed "non-" part. Sorry for the
> confusion.

No problem. I also think it's quite confusing.

> >
> > > And vice versa, so you're applying promote to non-non-young (young) pages.
> >
> > Yes, I understand "promote non-non-young pages" means "promote young pages".
> > This might be understood as "young pages are NOT filtered out" for promotion
> > but it doesn't mean that "old pages are filtered out" instead.
> > And we just rely hot detection only on DAMOS logics such as access frequency
> > and age. Am I correct?
>
> You're correct.

Ack. But if it doesn't mean that "old pages are filtered out" instead,
then do we really need this filter for promotion? If not, maybe should
we create another "old" filter for promotion? As of now, the promotion
is mostly done inaccurately, but the accurate migration is done at
demotion level. To avoid this issue, I feel we should promotion only
"young" pages after filtering "old" pages out.

> >
> > > I understand this is somewhat complex, but what we have for now.
> >
> > Thanks for the explanation. I guess you mean my filter setup is correct.
> > Is it correct?
>
> Again, you're correct. Your filter setup is what I expected to :)

Thanks. I see that it works fine, but I would like to have more
discussion about "young" filter. What I think about filter is that if I
apply "young" filter "true" for demotion, then the action applies only
for "young" pages, but the current implementation works opposite.

I understand the function name of internal implementation is
"damos_pa_filter_out" so the basic action is filtering out, but the
cgroup filter works in the opposite way for now.

I would like to hear how you think about this.

> >
> > > > Then, I see that "hot_cold" migrates hot/cold memory correctly.
> > >
> > > Thank you so much for sharing this great news! My tests also show no bad
> > > signal so far.
> > >
> > > >
> > > > Could you please upload the "damon_folio_mkold" patch to LKML?
> > > > Then I will rebase our changes based on it and run the redis test again.
> > >
> > > I will do that soon.
> >
> > Thanks a lot for sharing the RFC v2 for DAMOS young filter.
> > https://lore.kernel.org/damon/[email protected]/
> >
> > I will rebase our work based on it and share the result.
>
> Cool, looking forward to it! Hopefully we will make it be merged into the
> mainline by v6.10!

I hope so. Thanks for your help!

Honggyu

2024-03-17 15:32:17

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi Honggyu,

On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:

> Hi SeongJae,
>
> Thanks for the confirmation. I have a few comments on young filter so
> please read the inline comments again.
>
> On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > Hi Honggyu,
> >
> > > > -----Original Message-----
> > > > From: SeongJae Park <[email protected]>
> > > > Sent: Tuesday, March 12, 2024 3:33 AM
> > > > To: Honggyu Kim <[email protected]>
> > > > Cc: SeongJae Park <[email protected]>; kernel_team <[email protected]>
> > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
> > > >
> > > > Hi Honggyu,
> > > >
> > > > On Mon, 11 Mar 2024 12:51:12 +0000 "[email protected]" <[email protected]> wrote:
> > > >
> > > > > Hi SeongJae,
> > > > >
> > > > > I've tested it again and found that "young" filter has to be set
> > > > > differently as follows.
> > > > > - demote action: set "young" filter with "matching" true
> > > > > - promote action: set "young" filter with "matching" false
> > > >
> > > > DAMOS filter is basically for filtering "out" memory regions that matches to
> > > > the condition. Hence in your setup, young pages are not filtered out from
> > > > demote action target.
> > >
> > > I thought young filter true means "young pages ARE filtered out" for demotion.
> >
> > You're correct.
>
> Ack.
>
> > >
> > > > That is, you're demoting pages that "not" young.
> > >
> > > Your explanation here looks opposite to the previous statement.
> >
> > Again, you're correct. My intention was "non-young pages are not ..." but
> > maybe I was out of my mind and mistakenly removed "non-" part. Sorry for the
> > confusion.
>
> No problem. I also think it's quite confusing.
>
> > >
> > > > And vice versa, so you're applying promote to non-non-young (young) pages.
> > >
> > > Yes, I understand "promote non-non-young pages" means "promote young pages".
> > > This might be understood as "young pages are NOT filtered out" for promotion
> > > but it doesn't mean that "old pages are filtered out" instead.
> > > And we just rely hot detection only on DAMOS logics such as access frequency
> > > and age. Am I correct?
> >
> > You're correct.
>
> Ack. But if it doesn't mean that "old pages are filtered out" instead,

It does mean that. Here, filtering is exclusive. Hence, "filter-in a type of
pages" means "filter-out pages of other types". At least that's the intention.
To quote the documentation
(https://docs.kernel.org/mm/damon/design.html#filters),

Each filter specifies the type of target memory, and whether it should
exclude the memory of the type (filter-out), or all except the memory of
the type (filter-in).

> then do we really need this filter for promotion? If not, maybe should
> we create another "old" filter for promotion? As of now, the promotion
> is mostly done inaccurately, but the accurate migration is done at
> demotion level.

Is this based on your theory? Or, a real behavior that you're seeing from your
setup? If this is a real behavior, I think that should be a bug that need to
be fixed.

> To avoid this issue, I feel we should promotion only "young" pages after
> filtering "old" pages out.
>
> > >
> > > > I understand this is somewhat complex, but what we have for now.
> > >
> > > Thanks for the explanation. I guess you mean my filter setup is correct.
> > > Is it correct?
> >
> > Again, you're correct. Your filter setup is what I expected to :)
>
> Thanks. I see that it works fine, but I would like to have more
> discussion about "young" filter. What I think about filter is that if I
> apply "young" filter "true" for demotion, then the action applies only
> for "young" pages, but the current implementation works opposite.
>
> I understand the function name of internal implementation is
> "damos_pa_filter_out" so the basic action is filtering out, but the
> cgroup filter works in the opposite way for now.

Does memcg filter works in the opposite way? I don't think so because
__damos_pa_filter_out() sets 'matches' as 'true' only if the the given folio is
contained in the given memcg. 'young' filter also simply sets 'matches' as
'true' only if the given folio is young.

If it works in the opposite way, it's a bug that need to be fixed. Please let
me know if I'm missing something.

>
> I would like to hear how you think about this.
>
> > >
> > > > > Then, I see that "hot_cold" migrates hot/cold memory correctly.
> > > >
> > > > Thank you so much for sharing this great news! My tests also show no bad
> > > > signal so far.
> > > >
> > > > >
> > > > > Could you please upload the "damon_folio_mkold" patch to LKML?
> > > > > Then I will rebase our changes based on it and run the redis test again.
> > > >
> > > > I will do that soon.
> > >
> > > Thanks a lot for sharing the RFC v2 for DAMOS young filter.
> > > https://lore.kernel.org/damon/[email protected]/
> > >
> > > I will rebase our work based on it and share the result.
> >
> > Cool, looking forward to it! Hopefully we will make it be merged into the
> > mainline by v6.10!
>
> I hope so. Thanks for your help!
>
> Honggyu


Thanks,
SJ

2024-03-17 19:14:15

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park <[email protected]> wrote:

> Hi Honggyu,
>
> On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:
>
> > Hi SeongJae,
> >
> > Thanks for the confirmation. I have a few comments on young filter so
> > please read the inline comments again.
> >
> > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > > Hi Honggyu,
[...]
> > Thanks. I see that it works fine, but I would like to have more
> > discussion about "young" filter. What I think about filter is that if I
> > apply "young" filter "true" for demotion, then the action applies only
> > for "young" pages, but the current implementation works opposite.
> >
> > I understand the function name of internal implementation is
> > "damos_pa_filter_out" so the basic action is filtering out, but the
> > cgroup filter works in the opposite way for now.
>
> Does memcg filter works in the opposite way? I don't think so because
> __damos_pa_filter_out() sets 'matches' as 'true' only if the the given folio is
> contained in the given memcg. 'young' filter also simply sets 'matches' as
> 'true' only if the given folio is young.
>
> If it works in the opposite way, it's a bug that need to be fixed. Please let
> me know if I'm missing something.

I just read the DAMOS filters part of the documentation for DAMON sysfs
interface again. I believe it is explaining the meaning of 'matching' as I
intended to, as below:

You can write ``Y`` or ``N`` to ``matching`` file to filter out pages that does
or does not match to the type, respectively. Then, the scheme's action will
not be applied to the pages that specified to be filtered out.

But, I found the following example for memcg filter is wrong, as below:

For example, below restricts a DAMOS action to be applied to only non-anonymous
pages of all memory cgroups except ``/having_care_already``.::

# echo 2 > nr_filters
# # filter out anonymous pages
echo anon > 0/type
echo Y > 0/matching
# # further filter out all cgroups except one at '/having_care_already'
echo memcg > 1/type
echo /having_care_already > 1/memcg_path
echo N > 1/matching

Specifically, the last line of the commands should write 'Y' instead of 'N' to
do what explained. Without the fix, the action will be applied to only
non-anonymous pages of 'having_care_already' memcg. This is definitely wrong.
I will fix this soon. I'm unsure if this is what made you to believe memcg
DAMOS filter is working in the opposite way, though.


Thanks,
SJ

[...]

2024-03-18 13:28:30

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park <[email protected]> wrote:
> Hi Honggyu,
>
> On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:
>
> > Hi SeongJae,
> >
> > Thanks for the confirmation. I have a few comments on young filter so
> > please read the inline comments again.
> >
> > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > > Hi Honggyu,
> > >
> > > > > -----Original Message-----
> > > > > From: SeongJae Park <[email protected]>
> > > > > Sent: Tuesday, March 12, 2024 3:33 AM
> > > > > To: Honggyu Kim <[email protected]>
> > > > > Cc: SeongJae Park <[email protected]>; kernel_team <[email protected]>
> > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
> > > > >
> > > > > Hi Honggyu,
> > > > >
> > > > > On Mon, 11 Mar 2024 12:51:12 +0000 "[email protected]" <[email protected]> wrote:
> > > > >
> > > > > > Hi SeongJae,
> > > > > >
> > > > > > I've tested it again and found that "young" filter has to be set
> > > > > > differently as follows.
> > > > > > - demote action: set "young" filter with "matching" true
> > > > > > - promote action: set "young" filter with "matching" false

Thinking it again, I feel like "matching" true or false looks quite
vague to me as a general user.

Instead, I would like to have more meaningful names for "matching" as
follows.

- matching "true" can be either (filter) "out" or "skip".
- matching "false" can be either (filter) "in" or "apply".

Internally, the type of "matching" can be boolean, but it'd be better
for general users have another ways to set it such as "out"/"in" or
"skip"/"apply" via sysfs interface. I prefer "skip" and "apply" looks
more intuitive, but I don't have strong objection on "out" and "in" as
well.

I also feel the filter name "young" is more for developers not for
general users. I think this can be changed to "accessed" filter
instead.

The demote and promote filters can be set as follows using these.

- demote action: set "accessed" filter with "matching" to "skip"
- promote action: set "accessed" filter with "matching" to "apply"

I also think that you can feel this is more complicated so I would like
to hear how you think about this.

> > > > >
> > > > > DAMOS filter is basically for filtering "out" memory regions that matches to
> > > > > the condition.

Right. In other tools, I see filters are more used as filtering "in"
rather than filtering "out". I feel this makes me more confused.

> > > > > Hence in your setup, young pages are not filtered out from
> > > > > demote action target.
> > > >
> > > > I thought young filter true means "young pages ARE filtered out" for demotion.
> > >
> > > You're correct.
> >
> > Ack.
> >
> > > >
> > > > > That is, you're demoting pages that "not" young.
> > > >
> > > > Your explanation here looks opposite to the previous statement.
> > >
> > > Again, you're correct. My intention was "non-young pages are not ..." but
> > > maybe I was out of my mind and mistakenly removed "non-" part. Sorry for the
> > > confusion.
> >
> > No problem. I also think it's quite confusing.
> >
> > > >
> > > > > And vice versa, so you're applying promote to non-non-young (young) pages.
> > > >
> > > > Yes, I understand "promote non-non-young pages" means "promote young pages".
> > > > This might be understood as "young pages are NOT filtered out" for promotion
> > > > but it doesn't mean that "old pages are filtered out" instead.
> > > > And we just rely hot detection only on DAMOS logics such as access frequency
> > > > and age. Am I correct?
> > >
> > > You're correct.
> >
> > Ack. But if it doesn't mean that "old pages are filtered out" instead,
>
> It does mean that. Here, filtering is exclusive. Hence, "filter-in a type of
> pages" means "filter-out pages of other types". At least that's the intention.
> To quote the documentation
> (https://docs.kernel.org/mm/damon/design.html#filters),
>
> Each filter specifies the type of target memory, and whether it should
> exclude the memory of the type (filter-out), or all except the memory of
> the type (filter-in).

Thanks for the correction.

> > then do we really need this filter for promotion? If not, maybe should
> > we create another "old" filter for promotion? As of now, the promotion
> > is mostly done inaccurately, but the accurate migration is done at
> > demotion level.
>
> Is this based on your theory? Or, a real behavior that you're seeing from your
> setup? If this is a real behavior, I think that should be a bug that need to
> be fixed.

I have observed this in the hot_cold example, but I also found that the
promotion is done very quickly because the age for promote action is set
to 0 to max in my json setup. It makes most pages of the region are
young because there is not enough time for those pages being old. That
means I was wrong.

> > To avoid this issue, I feel we should promotion only "young" pages after
> > filtering "old" pages out.
> >
> > > >
> > > > > I understand this is somewhat complex, but what we have for now.
> > > >
> > > > Thanks for the explanation. I guess you mean my filter setup is correct.
> > > > Is it correct?
> > >
> > > Again, you're correct. Your filter setup is what I expected to :)
> >
> > Thanks. I see that it works fine, but I would like to have more
> > discussion about "young" filter. What I think about filter is that if I
> > apply "young" filter "true" for demotion, then the action applies only
> > for "young" pages, but the current implementation works opposite.
> >
> > I understand the function name of internal implementation is
> > "damos_pa_filter_out" so the basic action is filtering out, but the
> > cgroup filter works in the opposite way for now.
>
> Does memcg filter works in the opposite way? I don't think so because
> __damos_pa_filter_out() sets 'matches' as 'true' only if the the given folio is
> contained in the given memcg. 'young' filter also simply sets 'matches' as
> 'true' only if the given folio is young.
>
> If it works in the opposite way, it's a bug that need to be fixed. Please let
> me know if I'm missing something.

No, it was also my misunderstanding. I used to set the matching false
using my script.

Thanks,
Honggyu

> >
> > I would like to hear how you think about this.
> >
> > > >
> > > > > > Then, I see that "hot_cold" migrates hot/cold memory correctly.
> > > > >
> > > > > Thank you so much for sharing this great news! My tests also show no bad
> > > > > signal so far.
> > > > >
> > > > > >
> > > > > > Could you please upload the "damon_folio_mkold" patch to LKML?
> > > > > > Then I will rebase our changes based on it and run the redis test again.
> > > > >
> > > > > I will do that soon.
> > > >
> > > > Thanks a lot for sharing the RFC v2 for DAMOS young filter.
> > > > https://lore.kernel.org/damon/[email protected]/
> > > >
> > > > I will rebase our work based on it and share the result.
> > >
> > > Cool, looking forward to it! Hopefully we will make it be merged into the
> > > mainline by v6.10!
> >
> > I hope so. Thanks for your help!
> >
> > Honggyu
>
>
> Thanks,
> SJ

2024-03-18 13:33:57

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

On Sun, 17 Mar 2024 12:13:57 -0700 SeongJae Park <[email protected]> wrote:
> On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park <[email protected]> wrote:
>
> > Hi Honggyu,
> >
> > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:
> >
> > > Hi SeongJae,
> > >
> > > Thanks for the confirmation. I have a few comments on young filter so
> > > please read the inline comments again.
> > >
> > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > > > Hi Honggyu,
> [...]
> > > Thanks. I see that it works fine, but I would like to have more
> > > discussion about "young" filter. What I think about filter is that if I
> > > apply "young" filter "true" for demotion, then the action applies only
> > > for "young" pages, but the current implementation works opposite.
> > >
> > > I understand the function name of internal implementation is
> > > "damos_pa_filter_out" so the basic action is filtering out, but the
> > > cgroup filter works in the opposite way for now.
> >
> > Does memcg filter works in the opposite way? I don't think so because
> > __damos_pa_filter_out() sets 'matches' as 'true' only if the the given folio is
> > contained in the given memcg. 'young' filter also simply sets 'matches' as
> > 'true' only if the given folio is young.
> >
> > If it works in the opposite way, it's a bug that need to be fixed. Please let
> > me know if I'm missing something.
>
> I just read the DAMOS filters part of the documentation for DAMON sysfs
> interface again. I believe it is explaining the meaning of 'matching' as I
> intended to, as below:
>
> You can write ``Y`` or ``N`` to ``matching`` file to filter out pages that does
> or does not match to the type, respectively. Then, the scheme's action will
> not be applied to the pages that specified to be filtered out.
>
> But, I found the following example for memcg filter is wrong, as below:
>
> For example, below restricts a DAMOS action to be applied to only non-anonymous
> pages of all memory cgroups except ``/having_care_already``.::
>
> # echo 2 > nr_filters
> # # filter out anonymous pages
> echo anon > 0/type
> echo Y > 0/matching
> # # further filter out all cgroups except one at '/having_care_already'
> echo memcg > 1/type
> echo /having_care_already > 1/memcg_path
> echo N > 1/matching
>
> Specifically, the last line of the commands should write 'Y' instead of 'N' to
> do what explained. Without the fix, the action will be applied to only
> non-anonymous pages of 'having_care_already' memcg. This is definitely wrong.
> I will fix this soon. I'm unsure if this is what made you to believe memcg
> DAMOS filter is working in the opposite way, though.

I got confused not because of this. I just think it again that this
user interface is better to be more intuitive as I mentioned in the
previous thread.

Thanks,
Honggyu

>
> Thanks,
> SJ
>
> [...]

2024-03-18 19:07:32

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim <[email protected]> wrote:

> Hi SeongJae,
>
> On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park <[email protected]> wrote:
> > Hi Honggyu,
> >
> > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:
> >
> > > Hi SeongJae,
> > >
> > > Thanks for the confirmation. I have a few comments on young filter so
> > > please read the inline comments again.
> > >
> > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > > > Hi Honggyu,
> > > >
> > > > > > -----Original Message-----
> > > > > > From: SeongJae Park <[email protected]>
> > > > > > Sent: Tuesday, March 12, 2024 3:33 AM
> > > > > > To: Honggyu Kim <[email protected]>
> > > > > > Cc: SeongJae Park <[email protected]>; kernel_team <[email protected]>
> > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
> > > > > >
> > > > > > Hi Honggyu,
> > > > > >
> > > > > > On Mon, 11 Mar 2024 12:51:12 +0000 "[email protected]" <[email protected]> wrote:
> > > > > >
> > > > > > > Hi SeongJae,
> > > > > > >
> > > > > > > I've tested it again and found that "young" filter has to be set
> > > > > > > differently as follows.
> > > > > > > - demote action: set "young" filter with "matching" true
> > > > > > > - promote action: set "young" filter with "matching" false
>
> Thinking it again, I feel like "matching" true or false looks quite
> vague to me as a general user.
>
> Instead, I would like to have more meaningful names for "matching" as
> follows.
>
> - matching "true" can be either (filter) "out" or "skip".
> - matching "false" can be either (filter) "in" or "apply".

I agree the naming could be done much better. And thank you for the nice
suggestions. I have a few concerns, though.

Firstly, increasing the number of behavioral concepts. DAMOS filter feature
has only single behavior: excluding some types of memory from DAMOS action
target. The "matching" is to provide a flexible way for further specifying the
target to exclude in a bit detail. Without it, we would need non-variant for
each filter type. Comapred to the current terms, the new terms feel like
implying there are two types of behaviors. I think one behavior is easier to
understand than two behaviors, and better match what internal code is doing.

Secondly, ambiguity in "in" and "apply". To me, the terms sound like _adding_
something more than _excluding_. I think that might confuse people in some
cases. Actually, I have used the term "filter-out" and "filter-in" on
this and several threads. When saying about "filter-in" scenario, I had to
emphasize the fact that it is not adding something but excluding others. I now
think that was not a good approach.

Finally, "apply" sounds a bit deterministic. I think it could be a bit
confusing in some cases such as when using multiple filters in a combined way.
For example, if we have two filters for 1) "apply" a memcg and 2) skip anon
pages, the given DAMOS action will not be applied to anon pages of the memcg.
I think this might be a bit confusing.

>
> Internally, the type of "matching" can be boolean, but it'd be better
> for general users have another ways to set it such as "out"/"in" or
> "skip"/"apply" via sysfs interface. I prefer "skip" and "apply" looks
> more intuitive, but I don't have strong objection on "out" and "in" as
> well.

Unfortunately, DAMON sysfs interface is an ABI that we want to keep stable. Of
course we could make some changes on it if really required. But I'm unsure if
the problem of current naming and benefit of the sugegsted change are big
enough to outweighs the stability risk and additional efforts.

Also, DAMON sysfs interface is arguably not for _very_ general users. DAMON
user-space tool is the one for _more_ general users. To quote DAMON usage
document,

- *DAMON user space tool.*
`This <https://github.com/awslabs/damo>`_ is for privileged people such as
system administrators who want a just-working human-friendly interface.
[...]
- *sysfs interface.*
:ref:`This <sysfs_interface>` is for privileged user space programmers who
want more optimized use of DAMON. [...]

If the concept is that confused, I think we could improve the documentation, or
the user space tool. But for DAMON sysfs interface, I think we need more
discussions for getting clear pros/cons that justifies the risk and the effort.

>
> I also feel the filter name "young" is more for developers not for
> general users. I think this can be changed to "accessed" filter
> instead.

In my humble opinion, "accessed" might be confusing with the term that being
used by DAMON, specifically, the concept of "nr_accesses". I also thought
about using more specific term such as "pg-accessed" or something else, but I
felt it is still unclear or making it too verbose.

I agree "young" sounds more for developers. But, again, DAMON sysfs is for not
_very_ general users. And the term is used commonly on relcaimation part and
LRU, so I think this is not too bad as long as we provide nice documentation or
abstraction from user-space tool.

>
> The demote and promote filters can be set as follows using these.
>
> - demote action: set "accessed" filter with "matching" to "skip"
> - promote action: set "accessed" filter with "matching" to "apply"
>
> I also think that you can feel this is more complicated so I would like
> to hear how you think about this.

To my humble brain, this looks intuitive for this use case. But as I also
mentioned above, I worry if this could keep simple and intuitive in complicated
filter usages.

>
> > > > > >
> > > > > > DAMOS filter is basically for filtering "out" memory regions that matches to
> > > > > > the condition.
>
> Right. In other tools, I see filters are more used as filtering "in"
> rather than filtering "out". I feel this makes me more confused.

I understand that the word, "filtering", can be used for both, and therefore
can be confused. I was also spending no small times at naming since I was
thinking about both coffee filters and color filters (of photoshop or glasses).
But it turned out that I'm more familiar with coffee filters, and might be same
for DAMON community, since the community is having beer/coffee/tea chat series
;) (https://lore.kernel.org/damon/[email protected]/)

That said, I think we may be able to make this better documented, or add a
layer of abstraction on DAMON user-space tool.

[...]
> > > > > Yes, I understand "promote non-non-young pages" means "promote young pages".
> > > > > This might be understood as "young pages are NOT filtered out" for promotion
> > > > > but it doesn't mean that "old pages are filtered out" instead.
> > > > > And we just rely hot detection only on DAMOS logics such as access frequency
> > > > > and age. Am I correct?
> > > >
> > > > You're correct.
> > >
> > > Ack. But if it doesn't mean that "old pages are filtered out" instead,
> >
> > It does mean that. Here, filtering is exclusive. Hence, "filter-in a type of
> > pages" means "filter-out pages of other types". At least that's the intention.
> > To quote the documentation
> > (https://docs.kernel.org/mm/damon/design.html#filters),
> >
> > Each filter specifies the type of target memory, and whether it should
> > exclude the memory of the type (filter-out), or all except the memory of
> > the type (filter-in).
>
> Thanks for the correction.
>
> > > then do we really need this filter for promotion? If not, maybe should
> > > we create another "old" filter for promotion? As of now, the promotion
> > > is mostly done inaccurately, but the accurate migration is done at
> > > demotion level.
> >
> > Is this based on your theory? Or, a real behavior that you're seeing from your
> > setup? If this is a real behavior, I think that should be a bug that need to
> > be fixed.
>
> I have observed this in the hot_cold example, but I also found that the
> promotion is done very quickly because the age for promote action is set
> to 0 to max in my json setup. It makes most pages of the region are
> young because there is not enough time for those pages being old. That
> means I was wrong.
[...]
> > > I understand the function name of internal implementation is
> > > "damos_pa_filter_out" so the basic action is filtering out, but the
> > > cgroup filter works in the opposite way for now.
> >
> > Does memcg filter works in the opposite way? I don't think so because
> > __damos_pa_filter_out() sets 'matches' as 'true' only if the the given folio is
> > contained in the given memcg. 'young' filter also simply sets 'matches' as
> > 'true' only if the given folio is young.
> >
> > If it works in the opposite way, it's a bug that need to be fixed. Please let
> > me know if I'm missing something.
>
> No, it was also my misunderstanding. I used to set the matching false
> using my script.

Thank you for confirming. I understand we found no bug at the moment.


To summarize my opinion again,

1. I agree the concept and names of DAMOS filters are confusing and not very
intuitive.
2. However, it's unclear if the problem and the benefit from the suggested new
names are huge enough to take the risk and effort on changing ABI.
3. We could improve documentation and/or user-space tool.

Thank you again for the suggestion and confirmations to my questions.


Thanks,
SJ

[...]

2024-03-20 07:09:29

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

On Mon, 18 Mar 2024 12:07:21 -0700 SeongJae Park <[email protected]> wrote:
> On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim <[email protected]> wrote:
>
> > Hi SeongJae,
> >
> > On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park <[email protected]> wrote:
> > > Hi Honggyu,
> > >
> > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:
> > >
> > > > Hi SeongJae,
> > > >
> > > > Thanks for the confirmation. I have a few comments on young filter so
> > > > please read the inline comments again.
> > > >
> > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > > > > Hi Honggyu,
> > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: SeongJae Park <[email protected]>
> > > > > > > Sent: Tuesday, March 12, 2024 3:33 AM
> > > > > > > To: Honggyu Kim <[email protected]>
> > > > > > > Cc: SeongJae Park <[email protected]>; kernel_team <[email protected]>
> > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
> > > > > > >
> > > > > > > Hi Honggyu,
> > > > > > >
> > > > > > > On Mon, 11 Mar 2024 12:51:12 +0000 "[email protected]" <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi SeongJae,
> > > > > > > >
> > > > > > > > I've tested it again and found that "young" filter has to be set
> > > > > > > > differently as follows.
> > > > > > > > - demote action: set "young" filter with "matching" true
> > > > > > > > - promote action: set "young" filter with "matching" false
> >
> > Thinking it again, I feel like "matching" true or false looks quite
> > vague to me as a general user.
> >
> > Instead, I would like to have more meaningful names for "matching" as
> > follows.
> >
> > - matching "true" can be either (filter) "out" or "skip".
> > - matching "false" can be either (filter) "in" or "apply".
>
> I agree the naming could be done much better. And thank you for the nice
> suggestions. I have a few concerns, though.

I don't think my suggestion is best. I just would like to have more
discussion about it.

> Firstly, increasing the number of behavioral concepts. DAMOS filter feature
> has only single behavior: excluding some types of memory from DAMOS action
> target. The "matching" is to provide a flexible way for further specifying the
> target to exclude in a bit detail. Without it, we would need non-variant for
> each filter type. Comapred to the current terms, the new terms feel like
> implying there are two types of behaviors. I think one behavior is easier to
> understand than two behaviors, and better match what internal code is doing.
>
> Secondly, ambiguity in "in" and "apply". To me, the terms sound like _adding_
> something more than _excluding_.

I understood that young filter "matching" "false" means apply action
only to young pages. Do I misunderstood something here? If not,
"apply" means _adding_ or _including_ only the matched pages for action.
It looks like you thought about _excluding_ non matched pages here.

> I think that might confuse people in some
> cases. Actually, I have used the term "filter-out" and "filter-in" on
> this and several threads. When saying about "filter-in" scenario, I had to
> emphasize the fact that it is not adding something but excluding others.

Excluding others also means including matched pages. I think we better
focus on what to do only for the matched pages.

> I now think that was not a good approach.
>
> Finally, "apply" sounds a bit deterministic. I think it could be a bit
> confusing in some cases such as when using multiple filters in a combined way.
> For example, if we have two filters for 1) "apply" a memcg and 2) skip anon
> pages, the given DAMOS action will not be applied to anon pages of the memcg.
> I think this might be a bit confusing.

No objection on this. If so, I think "in" sounds better than "apply".

> >
> > Internally, the type of "matching" can be boolean, but it'd be better
> > for general users have another ways to set it such as "out"/"in" or
> > "skip"/"apply" via sysfs interface. I prefer "skip" and "apply" looks
> > more intuitive, but I don't have strong objection on "out" and "in" as
> > well.
>
> Unfortunately, DAMON sysfs interface is an ABI that we want to keep stable. Of
> course we could make some changes on it if really required. But I'm unsure if
> the problem of current naming and benefit of the sugegsted change are big
> enough to outweighs the stability risk and additional efforts.

I don't ask to change the interface, but just provide another way for
the setting. For example, the current "matching" accepts either 1,
true, or Y but internally keeps as "true" as a boolean type.

$ cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/filters/0

$ echo 1 | tee matching && cat matching
1
Y

$ echo true | tee matching && cat matching
true
Y

$ echo Y | tee matching && cat matching
Y
Y

I'm asking if it's okay making "matching" receive "out" or "skip" as
follows.

$ echo out | tee matching && cat matching
out
Y

$ echo skip | tee matching && cat matching
skip
Y

> Also, DAMON sysfs interface is arguably not for _very_ general users. DAMON
> user-space tool is the one for _more_ general users. To quote DAMON usage
> document,
>
> - *DAMON user space tool.*
> `This <https://github.com/awslabs/damo>`_ is for privileged people such as
> system administrators who want a just-working human-friendly interface.
> [...]
> - *sysfs interface.*
> :ref:`This <sysfs_interface>` is for privileged user space programmers who
> want more optimized use of DAMON. [...]
>
> If the concept is that confused, I think we could improve the documentation, or
> the user space tool. But for DAMON sysfs interface, I think we need more
> discussions for getting clear pros/cons that justifies the risk and the effort.

If my suggestion is not what you want in sysfs interface, then "damo"
can receive these more meaningful names and translate to "true" or
"false" when writing to sysfs.

> >
> > I also feel the filter name "young" is more for developers not for
> > general users. I think this can be changed to "accessed" filter
> > instead.
>
> In my humble opinion, "accessed" might be confusing with the term that being
> used by DAMON, specifically, the concept of "nr_accesses". I also thought
> about using more specific term such as "pg-accessed" or something else, but I
> felt it is still unclear or making it too verbose.
>
> I agree "young" sounds more for developers. But, again, DAMON sysfs is for not
> _very_ general users.

I worried the developer term is also going to be used for "damo" user
space tool as "young" filter. But if you think it's good enough, then I
will follow the decision as I also think "accessed" is not the best term
for this.

> And the term is used commonly on relcaimation part and
> LRU, so I think this is not too bad as long as we provide nice documentation or
> abstraction from user-space tool.
>
> >
> > The demote and promote filters can be set as follows using these.
> >
> > - demote action: set "accessed" filter with "matching" to "skip"
> > - promote action: set "accessed" filter with "matching" to "apply"
> >
> > I also think that you can feel this is more complicated so I would like
> > to hear how you think about this.
>
> To my humble brain, this looks intuitive for this use case. But as I also
> mentioned above, I worry if this could keep simple and intuitive in complicated
> filter usages.
>
> >
> > > > > > >
> > > > > > > DAMOS filter is basically for filtering "out" memory regions that matches to
> > > > > > > the condition.
> >
> > Right. In other tools, I see filters are more used as filtering "in"
> > rather than filtering "out". I feel this makes me more confused.
>
> I understand that the word, "filtering", can be used for both, and therefore
> can be confused. I was also spending no small times at naming since I was
> thinking about both coffee filters and color filters (of photoshop or glasses).
> But it turned out that I'm more familiar with coffee filters, and might be same
> for DAMON community, since the community is having beer/coffee/tea chat series
> ;) (https://lore.kernel.org/damon/[email protected]/)

Yeah, I thought about filter for including pages for given config as
follows.

\ /
\ / only matched items pass this filter.
||

But the current DAMOS filter is about excluding pages for given config
as follows just like a strainer.
___
/###\
|#####| matched items are excluded via this filter.
\###/
---

I think I won't get confused after keeping this difference in mind.

> That said, I think we may be able to make this better documented, or add a
> layer of abstraction on DAMON user-space tool.
>
> [...]
> > > > > > Yes, I understand "promote non-non-young pages" means "promote young pages".
> > > > > > This might be understood as "young pages are NOT filtered out" for promotion
> > > > > > but it doesn't mean that "old pages are filtered out" instead.
> > > > > > And we just rely hot detection only on DAMOS logics such as access frequency
> > > > > > and age. Am I correct?
> > > > >
> > > > > You're correct.
> > > >
> > > > Ack. But if it doesn't mean that "old pages are filtered out" instead,
> > >
> > > It does mean that. Here, filtering is exclusive. Hence, "filter-in a type of
> > > pages" means "filter-out pages of other types". At least that's the intention.
> > > To quote the documentation
> > > (https://docs.kernel.org/mm/damon/design.html#filters),
> > >
> > > Each filter specifies the type of target memory, and whether it should
> > > exclude the memory of the type (filter-out), or all except the memory of
> > > the type (filter-in).
> >
> > Thanks for the correction.
> >
> > > > then do we really need this filter for promotion? If not, maybe should
> > > > we create another "old" filter for promotion? As of now, the promotion
> > > > is mostly done inaccurately, but the accurate migration is done at
> > > > demotion level.
> > >
> > > Is this based on your theory? Or, a real behavior that you're seeing from your
> > > setup? If this is a real behavior, I think that should be a bug that need to
> > > be fixed.
> >
> > I have observed this in the hot_cold example, but I also found that the
> > promotion is done very quickly because the age for promote action is set
> > to 0 to max in my json setup. It makes most pages of the region are
> > young because there is not enough time for those pages being old. That
> > means I was wrong.
> [...]
> > > > I understand the function name of internal implementation is
> > > > "damos_pa_filter_out" so the basic action is filtering out, but the
> > > > cgroup filter works in the opposite way for now.
> > >
> > > Does memcg filter works in the opposite way? I don't think so because
> > > __damos_pa_filter_out() sets 'matches' as 'true' only if the the given folio is
> > > contained in the given memcg. 'young' filter also simply sets 'matches' as
> > > 'true' only if the given folio is young.
> > >
> > > If it works in the opposite way, it's a bug that need to be fixed. Please let
> > > me know if I'm missing something.
> >
> > No, it was also my misunderstanding. I used to set the matching false
> > using my script.
>
> Thank you for confirming. I understand we found no bug at the moment.
>
>
> To summarize my opinion again,
>
> 1. I agree the concept and names of DAMOS filters are confusing and not very
> intuitive.
> 2. However, it's unclear if the problem and the benefit from the suggested new
> names are huge enough to take the risk and effort on changing ABI.
> 3. We could improve documentation and/or user-space tool.

I think improving "damo" can be a good solution.

> Thank you again for the suggestion and confirmations to my questions.

Likewise, thank you for the explanation in details.

Honggyu

>
>
> Thanks,
> SJ
>
> [...]

2024-03-20 16:56:38

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi Honggyu,

On Wed, 20 Mar 2024 16:07:48 +0900 Honggyu Kim <[email protected]> wrote:

> Hi SeongJae,
>
> On Mon, 18 Mar 2024 12:07:21 -0700 SeongJae Park <[email protected]> wrote:
> > On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim <[email protected]> wrote:
> >
> > > Hi SeongJae,
> > >
> > > On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park <[email protected]> wrote:
> > > > Hi Honggyu,
> > > >
> > > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:
> > > >
> > > > > Hi SeongJae,
> > > > >
> > > > > Thanks for the confirmation. I have a few comments on young filter so
> > > > > please read the inline comments again.
> > > > >
> > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > > > > > Hi Honggyu,
> > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: SeongJae Park <[email protected]>
> > > > > > > > Sent: Tuesday, March 12, 2024 3:33 AM
> > > > > > > > To: Honggyu Kim <[email protected]>
> > > > > > > > Cc: SeongJae Park <[email protected]>; kernel_team <[email protected]>
> > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
> > > > > > > >
> > > > > > > > Hi Honggyu,
> > > > > > > >
> > > > > > > > On Mon, 11 Mar 2024 12:51:12 +0000 "[email protected]" <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi SeongJae,
> > > > > > > > >
> > > > > > > > > I've tested it again and found that "young" filter has to be set
> > > > > > > > > differently as follows.
> > > > > > > > > - demote action: set "young" filter with "matching" true
> > > > > > > > > - promote action: set "young" filter with "matching" false
> > >
> > > Thinking it again, I feel like "matching" true or false looks quite
> > > vague to me as a general user.
> > >
> > > Instead, I would like to have more meaningful names for "matching" as
> > > follows.
> > >
> > > - matching "true" can be either (filter) "out" or "skip".
> > > - matching "false" can be either (filter) "in" or "apply".
> >
> > I agree the naming could be done much better. And thank you for the nice
> > suggestions. I have a few concerns, though.
>
> I don't think my suggestion is best. I just would like to have more
> discussion about it.

I also understand my naming sense is far from good :) I'm grateful to have
this constructive discussion!

>
> > Firstly, increasing the number of behavioral concepts. DAMOS filter feature
> > has only single behavior: excluding some types of memory from DAMOS action
> > target. The "matching" is to provide a flexible way for further specifying the
> > target to exclude in a bit detail. Without it, we would need non-variant for
> > each filter type. Comapred to the current terms, the new terms feel like
> > implying there are two types of behaviors. I think one behavior is easier to
> > understand than two behaviors, and better match what internal code is doing.
> >
> > Secondly, ambiguity in "in" and "apply". To me, the terms sound like _adding_
> > something more than _excluding_.
>
> I understood that young filter "matching" "false" means apply action
> only to young pages. Do I misunderstood something here? If not,

Technically speaking, having a DAMOS filter with 'matching' parameter as
'false' for 'young pages' type means you want DAMOS to "exclude pages that not
young from the scheme's action target". That's the only thing it truly does,
and what it tries to guarantee. Whether the action will be applied to young
pages or not depends on more factors including additional filters and DAMOS
parameter. IOW, that's not what the simple setting promises.

Of course, I know you are assuming there is only the single filter. Hence,
effectively you're correct. And the sentence may be a better wording for end
users. However, it tooke me a bit time to understand your assumption and
concluding whether your sentence is correct or not, since I had to think about
the assumptions.

I'd say this also reminds me the first concern that I raised on the previous
mail. IOW, I feel this sentence is introducing one more behavior and making it
bit taking longer time to digest, for developers who should judge it based on
the source code. I'd suggest use only one behavioral term, "exclude", since it
is what the code really does, unless it is wording for end users.

> "apply" means _adding_ or _including_ only the matched pages for action.
> It looks like you thought about _excluding_ non matched pages here.

Yes. I'd prefer using only single term, _excluding_. It fits with the code,
and require one word less that "adding" or "including", since "adding" or
"including" require one more word, "only".

Also, even with "only", the fact that there could be more filters makes me
unsure what is the consequence of having it. That is, if we have a filter that
includes only pages of type A, but if there could be yet another filter that
includes only pages of type B, would the consequence is the action being
applied to pages of type A and B? Or, type A or type B?

In my opinion, exclusion based approach is simpler for understanding the
consequence of such combinational usage.

>
> > I think that might confuse people in some
> > cases. Actually, I have used the term "filter-out" and "filter-in" on
> > this and several threads. When saying about "filter-in" scenario, I had to
> > emphasize the fact that it is not adding something but excluding others.
>
> Excluding others also means including matched pages. I think we better
> focus on what to do only for the matched pages.

I agree that is true for the end-users in many cases. But I think that depends
on the case, and at least this specific case (kernel ABI level discussion about
DAMOS filters), I don't really feel that's better.

>
> > I now think that was not a good approach.
> >
> > Finally, "apply" sounds a bit deterministic. I think it could be a bit
> > confusing in some cases such as when using multiple filters in a combined way.
> > For example, if we have two filters for 1) "apply" a memcg and 2) skip anon
> > pages, the given DAMOS action will not be applied to anon pages of the memcg.
> > I think this might be a bit confusing.
>
> No objection on this. If so, I think "in" sounds better than "apply".

Thanks for understanding. I think allowlists or denylists might also been
better names.

>
> > >
> > > Internally, the type of "matching" can be boolean, but it'd be better
> > > for general users have another ways to set it such as "out"/"in" or
> > > "skip"/"apply" via sysfs interface. I prefer "skip" and "apply" looks
> > > more intuitive, but I don't have strong objection on "out" and "in" as
> > > well.
> >
> > Unfortunately, DAMON sysfs interface is an ABI that we want to keep stable. Of
> > course we could make some changes on it if really required. But I'm unsure if
> > the problem of current naming and benefit of the sugegsted change are big
> > enough to outweighs the stability risk and additional efforts.
>
> I don't ask to change the interface, but just provide another way for
> the setting. For example, the current "matching" accepts either 1,
> true, or Y but internally keeps as "true" as a boolean type.
>
> $ cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/filters/0
>
> $ echo 1 | tee matching && cat matching
> 1
> Y
>
> $ echo true | tee matching && cat matching
> true
> Y
>
> $ echo Y | tee matching && cat matching
> Y
> Y
>
> I'm asking if it's okay making "matching" receive "out" or "skip" as
> follows.
>
> $ echo out | tee matching && cat matching
> out
> Y
>
> $ echo skip | tee matching && cat matching
> skip
> Y

I have no strong concern about this. But also not seeing significant benefit
of this change. This will definitely not regress user experience. But it will
require introducing more kernel code, though the amount will be fairly small.
And this new interface will be something that we need to keep maintain, so
adding a tiny bit of maintenance burden. I'd prefer improving the documents or
user-space tool and keep the kernel code simple.

IMHO, end users shouldn't deal directly with DAMOS filters at all, and kernel
ABI document should be clear enough to avoid confusion. But, if someone uses
kernel ABI on production without reading the document, I'd say it might better
to crash or OOPS to give clear warning and lessons.

>
> > Also, DAMON sysfs interface is arguably not for _very_ general users. DAMON
> > user-space tool is the one for _more_ general users. To quote DAMON usage
> > document,
> >
> > - *DAMON user space tool.*
> > `This <https://github.com/awslabs/damo>`_ is for privileged people such as
> > system administrators who want a just-working human-friendly interface.
> > [...]
> > - *sysfs interface.*
> > :ref:`This <sysfs_interface>` is for privileged user space programmers who
> > want more optimized use of DAMON. [...]
> >
> > If the concept is that confused, I think we could improve the documentation, or
> > the user space tool. But for DAMON sysfs interface, I think we need more
> > discussions for getting clear pros/cons that justifies the risk and the effort.
>
> If my suggestion is not what you want in sysfs interface, then "damo"
> can receive these more meaningful names and translate to "true" or
> "false" when writing to sysfs.

Yes, I agree. We could further hide filter concept at all. For example, we
could let damo user call "migrate" DAMOS action plus "non-young" filter as
"promote" action. Or, have a dedicated command for tiered-memory management.
Similar to the gen_config.py of HMSDK
(https://github.com/skhynix/hmsdk/blob/main/tools/gen_config.py). But this
would be something to further discuss on different threads.

>
> > >
> > > I also feel the filter name "young" is more for developers not for
> > > general users. I think this can be changed to "accessed" filter
> > > instead.
> >
> > In my humble opinion, "accessed" might be confusing with the term that being
> > used by DAMON, specifically, the concept of "nr_accesses". I also thought
> > about using more specific term such as "pg-accessed" or something else, but I
> > felt it is still unclear or making it too verbose.
> >
> > I agree "young" sounds more for developers. But, again, DAMON sysfs is for not
> > _very_ general users.
>
> I worried the developer term is also going to be used for "damo" user
> space tool as "young" filter. But if you think it's good enough, then I
> will follow the decision as I also think "accessed" is not the best term
> for this.

The line is not very clear, but I think the line for "damo" should be different
from that for DAMON sysfs interface.

[...]
> > > > > > > > DAMOS filter is basically for filtering "out" memory regions that matches to
> > > > > > > > the condition.
> > >
> > > Right. In other tools, I see filters are more used as filtering "in"
> > > rather than filtering "out". I feel this makes me more confused.
> >
> > I understand that the word, "filtering", can be used for both, and therefore
> > can be confused. I was also spending no small times at naming since I was
> > thinking about both coffee filters and color filters (of photoshop or glasses).
> > But it turned out that I'm more familiar with coffee filters, and might be same
> > for DAMON community, since the community is having beer/coffee/tea chat series
> > ;) (https://lore.kernel.org/damon/[email protected]/)
>
> Yeah, I thought about filter for including pages for given config as
> follows.
>
> \ /
> \ / only matched items pass this filter.
> ||
>
> But the current DAMOS filter is about excluding pages for given config
> as follows just like a strainer.
> ___
> /###\
> |#####| matched items are excluded via this filter.
> \###/
> ---
>
> I think I won't get confused after keeping this difference in mind.

My mind model was describing it as "excluding" coffee beans, but I'd say these
are just different perspectives, not a thing about right or wrong. I'm
grateful to learn one more perspective that is different from mine :)

>
> > That said, I think we may be able to make this better documented, or add a
> > layer of abstraction on DAMON user-space tool.
> >
[...]
> > To summarize my opinion again,
> >
> > 1. I agree the concept and names of DAMOS filters are confusing and not very
> > intuitive.
> > 2. However, it's unclear if the problem and the benefit from the suggested new
> > names are huge enough to take the risk and effort on changing ABI.
> > 3. We could improve documentation and/or user-space tool.
>
> I think improving "damo" can be a good solution.

Looking forward to the discussion on it! :)

>
> > Thank you again for the suggestion and confirmations to my questions.
>
> Likewise, thank you for the explanation in details.

My great pleasure, and thank you for patiently keeping this grateful
discussion!


Thanks,
SJ

[...]

2024-03-22 08:28:24

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

On Wed, 20 Mar 2024 09:56:19 -0700 SeongJae Park <[email protected]> wrote:
> Hi Honggyu,
>
> On Wed, 20 Mar 2024 16:07:48 +0900 Honggyu Kim <[email protected]> wrote:
>
> > Hi SeongJae,
> >
> > On Mon, 18 Mar 2024 12:07:21 -0700 SeongJae Park <[email protected]> wrote:
> > > On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim <[email protected]> wrote:
> > >
> > > > Hi SeongJae,
> > > >
> > > > On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park <[email protected]> wrote:
> > > > > Hi Honggyu,
> > > > >
> > > > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim <[email protected]> wrote:
> > > > >
> > > > > > Hi SeongJae,
> > > > > >
> > > > > > Thanks for the confirmation. I have a few comments on young filter so
> > > > > > please read the inline comments again.
> > > > > >
> > > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park <[email protected]> wrote:
> > > > > > > Hi Honggyu,
> > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: SeongJae Park <[email protected]>
> > > > > > > > > Sent: Tuesday, March 12, 2024 3:33 AM
> > > > > > > > > To: Honggyu Kim <[email protected]>
> > > > > > > > > Cc: SeongJae Park <[email protected]>; kernel_team <[email protected]>
> > > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
> > > > > > > > >
> > > > > > > > > Hi Honggyu,
> > > > > > > > >
> > > > > > > > > On Mon, 11 Mar 2024 12:51:12 +0000 "[email protected]" <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Hi SeongJae,
> > > > > > > > > >
> > > > > > > > > > I've tested it again and found that "young" filter has to be set
> > > > > > > > > > differently as follows.
> > > > > > > > > > - demote action: set "young" filter with "matching" true
> > > > > > > > > > - promote action: set "young" filter with "matching" false
> > > >
> > > > Thinking it again, I feel like "matching" true or false looks quite
> > > > vague to me as a general user.
> > > >
> > > > Instead, I would like to have more meaningful names for "matching" as
> > > > follows.
> > > >
> > > > - matching "true" can be either (filter) "out" or "skip".
> > > > - matching "false" can be either (filter) "in" or "apply".
> > >
> > > I agree the naming could be done much better. And thank you for the nice
> > > suggestions. I have a few concerns, though.
> >
> > I don't think my suggestion is best. I just would like to have more
> > discussion about it.
>
> I also understand my naming sense is far from good :) I'm grateful to have
> this constructive discussion!

Yeah, naming is always difficult. Thanks anyway :)

> >
> > > Firstly, increasing the number of behavioral concepts. DAMOS filter feature
> > > has only single behavior: excluding some types of memory from DAMOS action
> > > target. The "matching" is to provide a flexible way for further specifying the
> > > target to exclude in a bit detail. Without it, we would need non-variant for
> > > each filter type. Comapred to the current terms, the new terms feel like
> > > implying there are two types of behaviors. I think one behavior is easier to
> > > understand than two behaviors, and better match what internal code is doing.
> > >
> > > Secondly, ambiguity in "in" and "apply". To me, the terms sound like _adding_
> > > something more than _excluding_.
> >
> > I understood that young filter "matching" "false" means apply action
> > only to young pages. Do I misunderstood something here? If not,
>
> Technically speaking, having a DAMOS filter with 'matching' parameter as
> 'false' for 'young pages' type means you want DAMOS to "exclude pages that not
> young from the scheme's action target". That's the only thing it truly does,
> and what it tries to guarantee. Whether the action will be applied to young
> pages or not depends on more factors including additional filters and DAMOS
> parameter. IOW, that's not what the simple setting promises.
>
> Of course, I know you are assuming there is only the single filter. Hence,
> effectively you're correct. And the sentence may be a better wording for end
> users. However, it tooke me a bit time to understand your assumption and
> concluding whether your sentence is correct or not, since I had to think about
> the assumptions.
>
> I'd say this also reminds me the first concern that I raised on the previous
> mail. IOW, I feel this sentence is introducing one more behavior and making it
> bit taking longer time to digest, for developers who should judge it based on
> the source code. I'd suggest use only one behavioral term, "exclude", since it
> is what the code really does, unless it is wording for end users.

Okay, I will just think filter "exclude" something.

> > "apply" means _adding_ or _including_ only the matched pages for action.
> > It looks like you thought about _excluding_ non matched pages here.
>
> Yes. I'd prefer using only single term, _excluding_. It fits with the code,
> and require one word less that "adding" or "including", since "adding" or
> "including" require one more word, "only".
>
> Also, even with "only", the fact that there could be more filters makes me
> unsure what is the consequence of having it. That is, if we have a filter that
> includes only pages of type A, but if there could be yet another filter that
> includes only pages of type B, would the consequence is the action being
> applied to pages of type A and B? Or, type A or type B?
>
> In my opinion, exclusion based approach is simpler for understanding the
> consequence of such combinational usage.
>
> >
> > > I think that might confuse people in some
> > > cases. Actually, I have used the term "filter-out" and "filter-in" on
> > > this and several threads. When saying about "filter-in" scenario, I had to
> > > emphasize the fact that it is not adding something but excluding others.
> >
> > Excluding others also means including matched pages. I think we better
> > focus on what to do only for the matched pages.
>
> I agree that is true for the end-users in many cases. But I think that depends
> on the case, and at least this specific case (kernel ABI level discussion about
> DAMOS filters), I don't really feel that's better.

OK. It could be a matter of preference and the current filter is already
in the mainline so I won't insist more.

> >
> > > I now think that was not a good approach.
> > >
> > > Finally, "apply" sounds a bit deterministic. I think it could be a bit
> > > confusing in some cases such as when using multiple filters in a combined way.
> > > For example, if we have two filters for 1) "apply" a memcg and 2) skip anon
> > > pages, the given DAMOS action will not be applied to anon pages of the memcg.
> > > I think this might be a bit confusing.
> >
> > No objection on this. If so, I think "in" sounds better than "apply".
>
> Thanks for understanding. I think allowlists or denylists might also been
> better names.

"allow" and "deny" sound good to me as well. We don't have to change it
though.

> >
> > > >
> > > > Internally, the type of "matching" can be boolean, but it'd be better
> > > > for general users have another ways to set it such as "out"/"in" or
> > > > "skip"/"apply" via sysfs interface. I prefer "skip" and "apply" looks
> > > > more intuitive, but I don't have strong objection on "out" and "in" as
> > > > well.
> > >
> > > Unfortunately, DAMON sysfs interface is an ABI that we want to keep stable. Of
> > > course we could make some changes on it if really required. But I'm unsure if
> > > the problem of current naming and benefit of the sugegsted change are big
> > > enough to outweighs the stability risk and additional efforts.
> >
> > I don't ask to change the interface, but just provide another way for
> > the setting. For example, the current "matching" accepts either 1,
> > true, or Y but internally keeps as "true" as a boolean type.
> >
> > $ cd /sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/schemes/0/filters/0
> >
> > $ echo 1 | tee matching && cat matching
> > 1
> > Y
> >
> > $ echo true | tee matching && cat matching
> > true
> > Y
> >
> > $ echo Y | tee matching && cat matching
> > Y
> > Y
> >
> > I'm asking if it's okay making "matching" receive "out" or "skip" as
> > follows.
> >
> > $ echo out | tee matching && cat matching
> > out
> > Y
> >
> > $ echo skip | tee matching && cat matching
> > skip
> > Y
>
> I have no strong concern about this. But also not seeing significant benefit
> of this change. This will definitely not regress user experience. But it will
> require introducing more kernel code, though the amount will be fairly small.
> And this new interface will be something that we need to keep maintain, so
> adding a tiny bit of maintenance burden. I'd prefer improving the documents or
> user-space tool and keep the kernel code simple.

OK. I will see if there is a way to improve damo tool for this instead
of making changes on the kernel side.

> IMHO, end users shouldn't deal directly with DAMOS filters at all, and kernel
> ABI document should be clear enough to avoid confusion. But, if someone uses
> kernel ABI on production without reading the document, I'd say it might better
> to crash or OOPS to give clear warning and lessons.
>
> >
> > > Also, DAMON sysfs interface is arguably not for _very_ general users. DAMON
> > > user-space tool is the one for _more_ general users. To quote DAMON usage
> > > document,
> > >
> > > - *DAMON user space tool.*
> > > `This <https://github.com/awslabs/damo>`_ is for privileged people such as
> > > system administrators who want a just-working human-friendly interface.
> > > [...]
> > > - *sysfs interface.*
> > > :ref:`This <sysfs_interface>` is for privileged user space programmers who
> > > want more optimized use of DAMON. [...]
> > >
> > > If the concept is that confused, I think we could improve the documentation, or
> > > the user space tool. But for DAMON sysfs interface, I think we need more
> > > discussions for getting clear pros/cons that justifies the risk and the effort.
> >
> > If my suggestion is not what you want in sysfs interface, then "damo"
> > can receive these more meaningful names and translate to "true" or
> > "false" when writing to sysfs.
>
> Yes, I agree. We could further hide filter concept at all. For example, we
> could let damo user call "migrate" DAMOS action plus "non-young" filter as
> "promote" action. Or, have a dedicated command for tiered-memory management.
> Similar to the gen_config.py of HMSDK
> (https://github.com/skhynix/hmsdk/blob/main/tools/gen_config.py). But this
> would be something to further discuss on different threads.

Yeah, I made this thread too much about filter naming discussion rather
than tiered memory support.

> >
> > > >
> > > > I also feel the filter name "young" is more for developers not for
> > > > general users. I think this can be changed to "accessed" filter
> > > > instead.
> > >
> > > In my humble opinion, "accessed" might be confusing with the term that being
> > > used by DAMON, specifically, the concept of "nr_accesses". I also thought
> > > about using more specific term such as "pg-accessed" or something else, but I
> > > felt it is still unclear or making it too verbose.
> > >
> > > I agree "young" sounds more for developers. But, again, DAMON sysfs is for not
> > > _very_ general users.
> >
> > I worried the developer term is also going to be used for "damo" user
> > space tool as "young" filter. But if you think it's good enough, then I
> > will follow the decision as I also think "accessed" is not the best term
> > for this.
>
> The line is not very clear, but I think the line for "damo" should be different
> from that for DAMON sysfs interface.
>
> [...]
> > > > > > > > > DAMOS filter is basically for filtering "out" memory regions that matches to
> > > > > > > > > the condition.
> > > >
> > > > Right. In other tools, I see filters are more used as filtering "in"
> > > > rather than filtering "out". I feel this makes me more confused.
> > >
> > > I understand that the word, "filtering", can be used for both, and therefore
> > > can be confused. I was also spending no small times at naming since I was
> > > thinking about both coffee filters and color filters (of photoshop or glasses).
> > > But it turned out that I'm more familiar with coffee filters, and might be same
> > > for DAMON community, since the community is having beer/coffee/tea chat series
> > > ;) (https://lore.kernel.org/damon/[email protected]/)
> >
> > Yeah, I thought about filter for including pages for given config as
> > follows.
> >
> > \ /
> > \ / only matched items pass this filter.
> > ||
> >
> > But the current DAMOS filter is about excluding pages for given config
> > as follows just like a strainer.
> > ___
> > /###\
> > |#####| matched items are excluded via this filter.
> > \###/
> > ---
> >
> > I think I won't get confused after keeping this difference in mind.
>
> My mind model was describing it as "excluding" coffee beans, but I'd say these
> are just different perspectives, not a thing about right or wrong. I'm
> grateful to learn one more perspective that is different from mine :)

I'm more familiar with the filter in ftrace, which is set to
/sys/kernel/tracing/set_ftrace_filter and it means "including"
something. But I will keep thinking DAMOS filter is different.

> >
> > > That said, I think we may be able to make this better documented, or add a
> > > layer of abstraction on DAMON user-space tool.
> > >
> [...]
> > > To summarize my opinion again,
> > >
> > > 1. I agree the concept and names of DAMOS filters are confusing and not very
> > > intuitive.
> > > 2. However, it's unclear if the problem and the benefit from the suggested new
> > > names are huge enough to take the risk and effort on changing ABI.
> > > 3. We could improve documentation and/or user-space tool.
> >
> > I think improving "damo" can be a good solution.
>
> Looking forward to the discussion on it! :)
>
> >
> > > Thank you again for the suggestion and confirmations to my questions.
> >
> > Likewise, thank you for the explanation in details.
>
> My great pleasure, and thank you for patiently keeping this grateful
> discussion!

Thanks again for your feedback.

Honggyu

> Thanks,
> SJ
>
> [...]

2024-03-22 09:03:13

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park <[email protected]> wrote:
> On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim <[email protected]> wrote:
>
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > It says there is no implementation of the demote/promote DAMOS action
> > are made. This RFC is about its implementation for physical address
> > space.
> >
> >
> > Introduction
> > ============
> >
> > With the advent of CXL/PCIe attached DRAM, which will be called simply
> > as CXL memory in this cover letter, some systems are becoming more
> > heterogeneous having memory systems with different latency and bandwidth
> > characteristics. They are usually handled as different NUMA nodes in
> > separate memory tiers and CXL memory is used as slow tiers because of
> > its protocol overhead compared to local DRAM.
> >
> > In this kind of systems, we need to be careful placing memory pages on
> > proper NUMA nodes based on the memory access frequency. Otherwise, some
> > frequently accessed pages might reside on slow tiers and it makes
> > performance degradation unexpectedly. Moreover, the memory access
> > patterns can be changed at runtime.
> >
> > To handle this problem, we need a way to monitor the memory access
> > patterns and migrate pages based on their access temperature. The
> > DAMON(Data Access MONitor) framework and its DAMOS(DAMON-based Operation
> > Schemes) can be useful features for monitoring and migrating pages.
> > DAMOS provides multiple actions based on DAMON monitoring results and it
> > can be used for proactive reclaim, which means swapping cold pages out
> > with DAMOS_PAGEOUT action, but it doesn't support migration actions such
> > as demotion and promotion between tiered memory nodes.
> >
> > This series supports two new DAMOS actions; DAMOS_DEMOTE for demotion
> > from fast tiers and DAMOS_PROMOTE for promotion from slow tiers. This
> > prevents hot pages from being stuck on slow tiers, which makes
> > performance degradation and cold pages can be proactively demoted to
> > slow tiers so that the system can increase the chance to allocate more
> > hot pages to fast tiers.
> >
> > The DAMON provides various tuning knobs but we found that the proactive
> > demotion for cold pages is especially useful when the system is running
> > out of memory on its fast tier nodes.
> >
> > Our evaluation result shows that it reduces the performance slowdown
> > compared to the default memory policy from 15~17% to 4~5% when the
> > system runs under high memory pressure on its fast tier DRAM nodes.
> >
> >
> > DAMON configuration
> > ===================
> >
> > The specific DAMON configuration doesn't have to be in the scope of this
> > patch series, but some rough idea is better to be shared to explain the
> > evaluation result.
> >
> > The DAMON provides many knobs for fine tuning but its configuration file
> > is generated by HMSDK[2]. It includes gen_config.py script that
> > generates a json file with the full config of DAMON knobs and it creates
> > multiple kdamonds for each NUMA node when the DAMON is enabled so that
> > it can run hot/cold based migration for tiered memory.
>
> I was feeling a bit confused from here since DAMON doesn't receive parameters
> via a file. To my understanding, the 'configuration file' means the input file
> for DAMON user-space tool, damo, not DAMON. Just a trivial thing, but making
> it clear if possible could help readers in my opinion.
>
> >
> >
> > Evaluation Workload
> > ===================
> >
> > The performance evaluation is done with redis[3], which is a widely used
> > in-memory database and the memory access patterns are generated via
> > YCSB[4]. We have measured two different workloads with zipfian and
> > latest distributions but their configs are slightly modified to make
> > memory usage higher and execution time longer for better evaluation.
> >
> > The idea of evaluation using these demote and promote actions covers
> > system-wide memory management rather than partitioning hot/cold pages of
> > a single workload. The default memory allocation policy creates pages
> > to the fast tier DRAM node first, then allocates newly created pages to
> > the slow tier CXL node when the DRAM node has insufficient free space.
> > Once the page allocation is done then those pages never move between
> > NUMA nodes. It's not true when using numa balancing, but it is not the
> > scope of this DAMON based 2-tier memory management support.
> >
> > If the working set of redis can be fit fully into the DRAM node, then
> > the redis will access the fast DRAM only. Since the performance of DRAM
> > only is faster than partially accessing CXL memory in slow tiers, this
> > environment is not useful to evaluate this patch series.
> >
> > To make pages of redis be distributed across fast DRAM node and slow
> > CXL node to evaluate our demote and promote actions, we pre-allocate
> > some cold memory externally using mmap and memset before launching
> > redis-server. We assumed that there are enough amount of cold memory in
> > datacenters as TMO[5] and TPP[6] papers mentioned.
> >
> > The evaluation sequence is as follows.
> >
> > 1. Turn on DAMON with DAMOS_DEMOTE action for DRAM node and
> > DAMOS_PROMOTE action for CXL node. It demotes cold pages on DRAM
> > node and promotes hot pages on CXL node in a regular interval.
> > 2. Allocate a huge block of cold memory by calling mmap and memset at
> > the fast tier DRAM node, then make the process sleep to make the fast
> > tier has insufficient memory for redis-server.
> > 3. Launch redis-server and load prebaked snapshot image, dump.rdb. The
> > redis-server consumes 52GB of anon pages and 33GB of file pages, but
> > due to the cold memory allocated at 2, it fails allocating the entire
> > memory of redis-server on the fast tier DRAM node so it partially
> > allocates the remaining on the slow tier CXL node. The ratio of
> > DRAM:CXL depends on the size of the pre-allocated cold memory.
> > 4. Run YCSB to make zipfian or latest distribution of memory accesses to
> > redis-server, then measure its execution time when it's completed.
> > 5. Repeat 4 over 50 times to measure the average execution time for each
> > run.
> > 6. Increase the cold memory size then repeat goes to 2.
> >
> > For each test at 4 took about a minute so repeating it 50 times almost
> > took about 1 hour for each test with a specific cold memory from 440GB
> > to 500GB in 10GB increments for each evaluation. So it took about more
> > than 10 hours for both zipfian and latest workloads to get the entire
> > evaluation results. Repeating the same test set multiple times doesn't
> > show much difference so I think it might be enough to make the result
> > reliable.
> >
> >
> > Evaluation Results
> > ==================
> >
> > All the result values are normalized to DRAM-only execution time because
> > the workload cannot be faster than DRAM-only unless the workload hits
> > the bandwidth peak but our redis test doesn't go beyond the bandwidth
> > limit.
> >
> > So the DRAM-only execution time is the ideal result without affected by
> > the gap between DRAM and CXL performance difference. The NUMA node
> > environment is as follows.
> >
> > node0 - local DRAM, 512GB with a CPU socket (fast tier)
> > node1 - disabled
> > node2 - CXL DRAM, 96GB, no CPU attached (slow tier)
> >
> > The following is the result of generating zipfian distribution to
> > redis-server and the numbers are averaged by 50 times of execution.
> >
> > 1. YCSB zipfian distribution read only workload
> > memory pressure with cold memory on node0 with 512GB of local DRAM.
> > =============+================================================+=========
> > | cold memory occupied by mmap and memset |
> > | 0G 440G 450G 460G 470G 480G 490G 500G |
> > =============+================================================+=========
> > Execution time normalized to DRAM-only values | GEOMEAN
> > -------------+------------------------------------------------+---------
> > DRAM-only | 1.00 - - - - - - - | 1.00
> > CXL-only | 1.21 - - - - - - - | 1.21
> > default | - 1.09 1.10 1.13 1.15 1.18 1.21 1.21 | 1.15
> > DAMON 2-tier | - 1.02 1.04 1.05 1.04 1.05 1.05 1.06 | 1.04
> > =============+================================================+=========
> > CXL usage of redis-server in GB | AVERAGE
> > -------------+------------------------------------------------+---------
> > DRAM-only | 0.0 - - - - - - - | 0.0
> > CXL-only | 52.6 - - - - - - - | 52.6
> > default | - 19.4 26.1 32.3 38.5 44.7 50.5 50.3 | 37.4
> > DAMON 2-tier | - 0.1 1.6 5.2 8.0 9.1 11.8 13.6 | 7.1
> > =============+================================================+=========
> >
> > Each test result is based on the exeuction environment as follows.
> >
> > DRAM-only : redis-server uses only local DRAM memory.
> > CXL-only : redis-server uses only CXL memory.
> > default : default memory policy(MPOL_DEFAULT).
> > numa balancing disabled.
> > DAMON 2-tier: DAMON enabled with DAMOS_DEMOTE for DRAM nodes and
> > DAMOS_PROMOTE for CXL nodes.
> >
> > The above result shows the "default" execution time goes up as the size
> > of cold memory is increased from 440G to 500G because the more cold
> > memory used, the more CXL memory is used for the target redis workload
> > and this makes the execution time increase.
> >
> > However, "DAMON 2-tier" result shows less slowdown because the
> > DAMOS_DEMOTE action at DRAM node proactively demotes pre-allocated cold
> > memory to CXL node and this free space at DRAM increases more chance to
> > allocate hot or warm pages of redis-server to fast DRAM node. Moreover,
> > DEMOS_PROMOTE action at CXL node also promotes hot pages of redis-server
> > to DRAM node actively.
> >
> > As a result, it makes more memory of redis-server stay in DRAM node
> > compared to "default" memory policy and this makes the performance
> > improvement.
> >
> > The following result of latest distribution workload shows similar data.
> >
> > 2. YCSB latest distribution read only workload
> > memory pressure with cold memory on node0 with 512GB of local DRAM.
> > =============+================================================+=========
> > | cold memory occupied by mmap and memset |
> > | 0G 440G 450G 460G 470G 480G 490G 500G |
> > =============+================================================+=========
> > Execution time normalized to DRAM-only values | GEOMEAN
> > -------------+------------------------------------------------+---------
> > DRAM-only | 1.00 - - - - - - - | 1.00
> > CXL-only | 1.18 - - - - - - - | 1.18
> > default | - 1.16 1.15 1.17 1.18 1.16 1.18 1.15 | 1.17
> > DAMON 2-tier | - 1.04 1.04 1.05 1.05 1.06 1.05 1.06 | 1.05
> > =============+================================================+=========
> > CXL usage of redis-server in GB | AVERAGE
> > -------------+------------------------------------------------+---------
> > DRAM-only | 0.0 - - - - - - - | 0.0
> > CXL-only | 52.6 - - - - - - - | 52.6
> > default | - 19.3 26.1 32.2 38.5 44.6 50.5 50.6 | 37.4
> > DAMON 2-tier | - 1.3 3.8 7.0 4.1 9.4 12.5 16.7 | 7.8
> > =============+================================================+=========
> >
> > In summary of both results, our evaluation shows that "DAMON 2-tier"
> > memory management reduces the performance slowdown compared to the
> > "default" memory policy from 15~17% to 4~5% when the system runs with
> > high memory pressure on its fast tier DRAM nodes.
> >
> > The similar evaluation was done in another machine that has 256GB of
> > local DRAM and 96GB of CXL memory. The performance slowdown is reduced
> > from 20~24% for "default" to 5~7% for "DAMON 2-tier".
> >
> > Having these DAMOS_DEMOTE and DAMOS_PROMOTE actions can make 2-tier
> > memory systems run more efficiently under high memory pressures.
>
> Thank you for running the tests again with the new version of the patches and
> sharing the results!

It's a bit late answer, but the result was from the previous evaluation.
I ran it again with RFC v2, but didn't see much difference so just
pasted the same result here.

> >
> > Signed-off-by: Honggyu Kim <[email protected]>
> > Signed-off-by: Hyeongtak Ji <[email protected]>
> > Signed-off-by: Rakie Kim <[email protected]>
> >
> > [1] https://lore.kernel.org/damon/[email protected]
> > [2] https://github.com/skhynix/hmsdk
> > [3] https://github.com/redis/redis/tree/7.0.0
> > [4] https://github.com/brianfrankcooper/YCSB/tree/0.17.0
> > [5] https://dl.acm.org/doi/10.1145/3503222.3507731
> > [6] https://dl.acm.org/doi/10.1145/3582016.3582063
> >
> > Changes from RFC:
> > 1. Move most of implementation from mm/vmscan.c to mm/damon/paddr.c.
> > 2. Simplify some functions of vmscan.c and used in paddr.c, but need
> > to be reviewed more in depth.
> > 3. Refactor most functions for common usage for both promote and
> > demote actions and introduce an enum migration_mode for its control.
> > 4. Add "target_nid" sysfs knob for migration destination node for both
> > promote and demote actions.
> > 5. Move DAMOS_PROMOTE before DAMOS_DEMOTE and move then even above
> > DAMOS_STAT.
>
> Thank you very much for addressing many of my comments.

Thanks for your feedback in details.

> >
> > Honggyu Kim (3):
> > mm/damon: refactor DAMOS_PAGEOUT with migration_mode
> > mm: make alloc_demote_folio externally invokable for migration
> > mm/damon: introduce DAMOS_DEMOTE action for demotion
> >
> > Hyeongtak Ji (4):
> > mm/memory-tiers: add next_promotion_node to find promotion target
> > mm/damon: introduce DAMOS_PROMOTE action for promotion
> > mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
> > mm/damon/sysfs-schemes: apply target_nid for promote and demote
> > actions
>
> Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed about
> this patchset in high level. Sharing the summary here for open discussion. As
> also discussed on the first version of this patchset[2], we want to make single
> action for general page migration with minimum changes, but would like to keep
> page level access re-check. We also agreed the previously proposed DAMOS
> filter-based approach could make sense for the purpose.

Thanks very much for the summary. I have been trying to merge promote
and demote actions into a single migrate action, but I found an issue
regarding damon_pa_scheme_score. It currently calls damon_cold_score()
for demote action and damon_hot_score() for promote action, but what
should we call when we use a single migrate action?

Thanks,
Honggyu

> Because I was anyway planning making such DAMOS filter for not only
> promotion/demotion but other types of DAMOS action, I will start developing the
> page level access re-check results based DAMOS filter. Once the implementation
> of the prototype is done, I will share the early implementation. Then, Honggyu
> will adjust their implementation based on the filter, and run their tests again
> and share the results.
>
> [1] https://lore.kernel.org/damon/[email protected]/
> [2] https://lore.kernel.org/damon/[email protected]
>
>
> Thanks,
> SJ
>
> >
> > include/linux/damon.h | 15 +-
> > include/linux/memory-tiers.h | 11 ++
> > include/linux/migrate_mode.h | 1 +
> > include/linux/vm_event_item.h | 1 +
> > include/trace/events/migrate.h | 3 +-
> > mm/damon/core.c | 5 +-
> > mm/damon/dbgfs.c | 2 +-
> > mm/damon/lru_sort.c | 3 +-
> > mm/damon/paddr.c | 282 ++++++++++++++++++++++++++++++++-
> > mm/damon/reclaim.c | 3 +-
> > mm/damon/sysfs-schemes.c | 39 ++++-
> > mm/internal.h | 1 +
> > mm/memory-tiers.c | 43 +++++
> > mm/vmscan.c | 10 +-
> > mm/vmstat.c | 1 +
> > 15 files changed, 404 insertions(+), 16 deletions(-)
> >
> >
> > base-commit: 0dd3ee31125508cd67f7e7172247f05b7fd1753a
> > --
> > 2.34.1

2024-03-22 16:33:07

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim <[email protected]> wrote:

> Hi SeongJae,
>
> On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park <[email protected]> wrote:
> > On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim <[email protected]> wrote:
> >
> > > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > > posted at [1].
> > >
> > > It says there is no implementation of the demote/promote DAMOS action
> > > are made. This RFC is about its implementation for physical address
> > > space.
> > >
> > >
[...]
> > Thank you for running the tests again with the new version of the patches and
> > sharing the results!
>
> It's a bit late answer, but the result was from the previous evaluation.
> I ran it again with RFC v2, but didn't see much difference so just
> pasted the same result here.

No problem, thank you for clarifying :)

[...]
> > > Honggyu Kim (3):
> > > mm/damon: refactor DAMOS_PAGEOUT with migration_mode
> > > mm: make alloc_demote_folio externally invokable for migration
> > > mm/damon: introduce DAMOS_DEMOTE action for demotion
> > >
> > > Hyeongtak Ji (4):
> > > mm/memory-tiers: add next_promotion_node to find promotion target
> > > mm/damon: introduce DAMOS_PROMOTE action for promotion
> > > mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
> > > mm/damon/sysfs-schemes: apply target_nid for promote and demote
> > > actions
> >
> > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed about
> > this patchset in high level. Sharing the summary here for open discussion. As
> > also discussed on the first version of this patchset[2], we want to make single
> > action for general page migration with minimum changes, but would like to keep
> > page level access re-check. We also agreed the previously proposed DAMOS
> > filter-based approach could make sense for the purpose.
>
> Thanks very much for the summary. I have been trying to merge promote
> and demote actions into a single migrate action, but I found an issue
> regarding damon_pa_scheme_score. It currently calls damon_cold_score()
> for demote action and damon_hot_score() for promote action, but what
> should we call when we use a single migrate action?

Good point! This is what I didn't think about when suggesting that. Thank you
for letting me know this gap! I think there could be two approach, off the top
of my head.

The first one would be extending the interface so that the user can select the
score function. This would let flexible usage, but I'm bit concerned if this
could make things unnecessarily complex, and would really useful in many
general use case.

The second approach would be letting DAMON infer the intention. In this case,
I think we could know the intention is the demotion if the scheme has a youg
pages exclusion filter. Then, we could use the cold_score(). And vice versa.
To cover a case that there is no filter at all, I think we could have one
assumption. My humble intuition says the new action (migrate) may be used more
for promotion use case. So, in damon_pa_scheme_score(), if the action of the
given scheme is the new one (say, MIGRATE), the function will further check if
the scheme has a filter for excluding young pages. If so, the function will
use cold_score(). Otherwise, the function will use hot_score().

So I'd more prefer the second approach. I think it would be not too late to
consider the first approach after waiting for it turns out more actions have
such ambiguity and need more general interface for explicitly set the score
function.


Thanks,
SJ

[...]

2024-03-22 16:39:30

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

On Fri, 22 Mar 2024 17:27:34 +0900 Honggyu Kim <[email protected]> wrote:

[...]
> OK. It could be a matter of preference and the current filter is already
> in the mainline so I won't insist more.

Thank you for accepting my humble suggestion.

[...]
> > I'd prefer improving the documents or
> > user-space tool and keep the kernel code simple.
>
> OK. I will see if there is a way to improve damo tool for this instead
> of making changes on the kernel side.

Looking forward!

[...]
> Yeah, I made this thread too much about filter naming discussion rather
> than tiered memory support.

No problem at all. Thank you for keeping this productive discussion.

[...]
> Thanks again for your feedback.

That's my pleasure :)


Thanks,
SJ

[...]

2024-03-25 21:47:30

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park <[email protected]> wrote:
> On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim <[email protected]> wrote:
>
> > Hi SeongJae,
> >
> > On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park <[email protected]> wrote:
> > > On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim <[email protected]> wrote:
> > >
> > > > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > > > posted at [1].
> > > >
> > > > It says there is no implementation of the demote/promote DAMOS action
> > > > are made. This RFC is about its implementation for physical address
> > > > space.
> > > >
> > > >
> [...]
> > > Thank you for running the tests again with the new version of the patches and
> > > sharing the results!
> >
> > It's a bit late answer, but the result was from the previous evaluation.
> > I ran it again with RFC v2, but didn't see much difference so just
> > pasted the same result here.
>
> No problem, thank you for clarifying :)
>
> [...]
> > > > Honggyu Kim (3):
> > > > mm/damon: refactor DAMOS_PAGEOUT with migration_mode
> > > > mm: make alloc_demote_folio externally invokable for migration
> > > > mm/damon: introduce DAMOS_DEMOTE action for demotion
> > > >
> > > > Hyeongtak Ji (4):
> > > > mm/memory-tiers: add next_promotion_node to find promotion target
> > > > mm/damon: introduce DAMOS_PROMOTE action for promotion
> > > > mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
> > > > mm/damon/sysfs-schemes: apply target_nid for promote and demote
> > > > actions
> > >
> > > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed about
> > > this patchset in high level. Sharing the summary here for open discussion. As
> > > also discussed on the first version of this patchset[2], we want to make single
> > > action for general page migration with minimum changes, but would like to keep
> > > page level access re-check. We also agreed the previously proposed DAMOS
> > > filter-based approach could make sense for the purpose.
> >
> > Thanks very much for the summary. I have been trying to merge promote
> > and demote actions into a single migrate action, but I found an issue
> > regarding damon_pa_scheme_score. It currently calls damon_cold_score()
> > for demote action and damon_hot_score() for promote action, but what
> > should we call when we use a single migrate action?
>
> Good point! This is what I didn't think about when suggesting that. Thank you
> for letting me know this gap! I think there could be two approach, off the top
> of my head.
>
> The first one would be extending the interface so that the user can select the
> score function. This would let flexible usage, but I'm bit concerned if this
> could make things unnecessarily complex, and would really useful in many
> general use case.

I also think this looks complicated and may not be useful for general
users.

> The second approach would be letting DAMON infer the intention. In this case,
> I think we could know the intention is the demotion if the scheme has a youg
> pages exclusion filter. Then, we could use the cold_score(). And vice versa.
> To cover a case that there is no filter at all, I think we could have one
> assumption. My humble intuition says the new action (migrate) may be used more
> for promotion use case. So, in damon_pa_scheme_score(), if the action of the
> given scheme is the new one (say, MIGRATE), the function will further check if
> the scheme has a filter for excluding young pages. If so, the function will
> use cold_score(). Otherwise, the function will use hot_score().

Thanks for suggesting many ideas but I'm afraid that I feel this doesn't
look good. Thinking it again, I think we can think about keep using
DAMOS_PROMOTE and DAMOS_DEMOTE, but I can make them directly call
damon_folio_young() for access check instead of using young filter.

And we can internally handle the complicated combination such as demote
action sets "young" filter with "matching" true and promote action sets
"young" filter with "matching" false. IMHO, this will make the usage
simpler.

I would like to hear how you think about this.

> So I'd more prefer the second approach. I think it would be not too late to
> consider the first approach after waiting for it turns out more actions have
> such ambiguity and need more general interface for explicitly set the score
> function.

I will join the DAMON Beer/Coffee/Tea Chat tomorrow as scheduled so I
can talk more about this issue.

Thanks,
Honggyu

>
> Thanks,
> SJ
>
> [...]

2024-03-25 22:53:15

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

On Mon, 25 Mar 2024 21:01:04 +0900 Honggyu Kim <[email protected]> wrote:

> Hi SeongJae,
>
> On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park <[email protected]> wrote:
> > On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim <[email protected]> wrote:
[...]
> > > > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed about
> > > > this patchset in high level. Sharing the summary here for open discussion. As
> > > > also discussed on the first version of this patchset[2], we want to make single
> > > > action for general page migration with minimum changes, but would like to keep
> > > > page level access re-check. We also agreed the previously proposed DAMOS
> > > > filter-based approach could make sense for the purpose.
> > >
> > > Thanks very much for the summary. I have been trying to merge promote
> > > and demote actions into a single migrate action, but I found an issue
> > > regarding damon_pa_scheme_score. It currently calls damon_cold_score()
> > > for demote action and damon_hot_score() for promote action, but what
> > > should we call when we use a single migrate action?
> >
> > Good point! This is what I didn't think about when suggesting that. Thank you
> > for letting me know this gap! I think there could be two approach, off the top
> > of my head.
> >
> > The first one would be extending the interface so that the user can select the
> > score function. This would let flexible usage, but I'm bit concerned if this
> > could make things unnecessarily complex, and would really useful in many
> > general use case.
>
> I also think this looks complicated and may not be useful for general
> users.
>
> > The second approach would be letting DAMON infer the intention. In this case,
> > I think we could know the intention is the demotion if the scheme has a youg
> > pages exclusion filter. Then, we could use the cold_score(). And vice versa.
> > To cover a case that there is no filter at all, I think we could have one
> > assumption. My humble intuition says the new action (migrate) may be used more
> > for promotion use case. So, in damon_pa_scheme_score(), if the action of the
> > given scheme is the new one (say, MIGRATE), the function will further check if
> > the scheme has a filter for excluding young pages. If so, the function will
> > use cold_score(). Otherwise, the function will use hot_score().
>
> Thanks for suggesting many ideas but I'm afraid that I feel this doesn't
> look good. Thinking it again, I think we can think about keep using
> DAMOS_PROMOTE and DAMOS_DEMOTE,

In other words, keep having dedicated DAMOS action for intuitive prioritization
score function, or, coupling the prioritization with each action, right? I
think this makes sense, and fit well with the documentation.

The prioritization mechanism should be different for each action. For example,
rarely accessed (colder) memory regions would be prioritized for page-out
scheme action. In contrast, the colder regions would be deprioritized for huge
page collapse scheme action. Hence, the prioritization mechanisms for each
action are implemented in each DAMON operations set, together with the actions.

In other words, each DAMOS action should allow users intuitively understand
what types of regions will be prioritized. We already have such couples of
DAMOS actions such as DAMOS_[NO]HUGEPAGE and DAMOS_LRU_[DE]PRIO. So adding a
couple of action for this case sounds reasonable to me. And I think this is
better and simpler than having the inferrence based behavior.

That said, I concern if 'PROMOTE' and 'DEMOTE' still sound bit ambiguous to
people who don't know 'demote_folio_list()' and its friends. Meanwhile, the
name might sound too detail about what it does to people who know the
functions, so make it bit unflexible. They might also get confused since we
don't have 'promote_folio_list()'.

To my humble understanding, what you really want to do is migrating pages to
specific address range (or node) prioritizing the pages based on the hotness.
What about, say, MIGRATE_{HOT,COLD}?

> but I can make them directly call
> damon_folio_young() for access check instead of using young filter.
>
> And we can internally handle the complicated combination such as demote
> action sets "young" filter with "matching" true and promote action sets
> "young" filter with "matching" false. IMHO, this will make the usage
> simpler.

I think whether to exclude young/non-young (maybe idle is better than
non-young?) pages from the action is better to be decoupled for following
reasons.

Firstly, we want to check the page granularity youngness mainly because we
found DAMON's monitoring result is not accurate enough for this use case. Or,
we could say that's because you cannot wait until DAMON's monitoring result
becomes accurate enough. For more detail, you could increase minimum age of
your scheme's target access pattern. I show you set minimum age of your demote
scheme as 30 seconds, but set promote scheme as 0 sec
(https://github.com/skhynix/hmsdk/blob/main/tools/gen_config.py). Increasing
the minimum age for promote scheme may reduce amount of wrong promotion. But I
assume you don't want to do that, maybe because you want to make promotion
fast. That's 100% valid use case in my opinion. And for such use case, making
the DAMOS action to do the page granularity access check would be helpful. But
if the user can set minimum age of two schemes large enough, or somehow DAMON's
monitoring accuracy is much improved, this page granularity access check might
not really required.

Secondly, I think we might want to migrate pages to other nodes in coldest
pages first way, but even if the page is young. One such case would be when
the top tier cannot accommodate all young pages. Especially when there are no
small number of tiers, I think this could happen. That is, we would prefer
migrating coldest page first, but that depend on only relative temperature and
hence young pages could also need to be migrated. Of course, this would be the
real case only if the user is convinced with DAMON's monitoring accuracy.

>
> I would like to hear how you think about this.

So, to summarize my humble opinion,

1. I like the idea of having two actions. But I'd like to use names other than
'promote' and 'demote'.
2. I still prefer having a filter for the page granularity access re-check.

>
> > So I'd more prefer the second approach. I think it would be not too late to
> > consider the first approach after waiting for it turns out more actions have
> > such ambiguity and need more general interface for explicitly set the score
> > function.
>
> I will join the DAMON Beer/Coffee/Tea Chat tomorrow as scheduled so I
> can talk more about this issue.

Looking forward to chatting with you :)


Thanks,
SJ

>
> Thanks,
> Honggyu
>
> >
> > Thanks,
> > SJ
> >
> > [...]

2024-03-26 23:03:23

by SeongJae Park

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

On Mon, 25 Mar 2024 15:53:03 -0700 SeongJae Park <[email protected]> wrote:

> On Mon, 25 Mar 2024 21:01:04 +0900 Honggyu Kim <[email protected]> wrote:
[...]
> > On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park <[email protected]> wrote:
> > > On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim <[email protected]> wrote:
[...]
> >
> > I would like to hear how you think about this.
>
> So, to summarize my humble opinion,
>
> 1. I like the idea of having two actions. But I'd like to use names other than
> 'promote' and 'demote'.
> 2. I still prefer having a filter for the page granularity access re-check.
>
[...]
> > I will join the DAMON Beer/Coffee/Tea Chat tomorrow as scheduled so I
> > can talk more about this issue.
>
> Looking forward to chatting with you :)

We met and discussed about this topic in the chat series yesterday. Sharing
the summary for keeping the open discussion.

Honggyu thankfully accepted my humble suggestions on the last reply. Honggyu
will post the third version of this patchset soon. The patchset will implement
two new DAMOS actions, namely MIGRATE_HOT and MIGRATE_COLD. Those will migrate
the DAMOS target regions to a user-specified NUMA node, but will have different
prioritization score function. As name implies, they will prioritize more hot
regions and cold regions, respectively.

Honggyu, please feel free to fix if there is anything wrong or missed.

And thanks to Honggyu again for patiently keeping this productive discussion
and their awesome work.


Thanks,
SJ

[...]

2024-04-05 10:13:48

by Honggyu Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory

Hi SeongJae,

On Tue, 26 Mar 2024 16:03:09 -0700 SeongJae Park <[email protected]> wrote:
> On Mon, 25 Mar 2024 15:53:03 -0700 SeongJae Park <[email protected]> wrote:
>
> > On Mon, 25 Mar 2024 21:01:04 +0900 Honggyu Kim <[email protected]> wrote:
> [...]
> > > On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park <[email protected]> wrote:
> > > > On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim <[email protected]> wrote:
> [...]
> > >
> > > I would like to hear how you think about this.
> >
> > So, to summarize my humble opinion,
> >
> > 1. I like the idea of having two actions. But I'd like to use names other than
> > 'promote' and 'demote'.
> > 2. I still prefer having a filter for the page granularity access re-check.
> >
> [...]
> > > I will join the DAMON Beer/Coffee/Tea Chat tomorrow as scheduled so I
> > > can talk more about this issue.
> >
> > Looking forward to chatting with you :)
>
> We met and discussed about this topic in the chat series yesterday. Sharing
> the summary for keeping the open discussion.
>
> Honggyu thankfully accepted my humble suggestions on the last reply. Honggyu
> will post the third version of this patchset soon. The patchset will implement
> two new DAMOS actions, namely MIGRATE_HOT and MIGRATE_COLD. Those will migrate
> the DAMOS target regions to a user-specified NUMA node, but will have different
> prioritization score function. As name implies, they will prioritize more hot
> regions and cold regions, respectively.

It's a late answer but thanks very much for the summary. It was really
helpful discussion in the chat series.

I'm leaving a message so that anyone can follow for the update. The v3
is posted at https://lore.kernel.org/damon/[email protected].

> Honggyu, please feel free to fix if there is anything wrong or missed.

I don't have anything to fix from your summary.

> And thanks to Honggyu again for patiently keeping this productive discussion
> and their awesome work.

I appreciate your persistent support and kind reviews.

Thanks,
Honggyu

>
> Thanks,
> SJ
>
> [...]