From: SeongJae Park <[email protected]>
NOTE: This is only an RFC for future features of the DAMON patchset[1], which is
not yet merged in the mainline. The aim of this RFC is to show how DAMON would
evolve once it is merged in. So, if you have some interest here, please
consider reviewing the DAMON patchset as well.
[1] https://lore.kernel.org/linux-mm/[email protected]/
Introduction
============
In short, this patchset improves the engine for general data access
pattern-oriented memory management to production quality, and implements a
version of proactive reclamation that solves the monitoring overhead issue on
top of it.
Proactive Reclamation
---------------------
Proactively reclaiming cold pages helps save memory and reduce latency spikes
caused by the direct reclaim of processes or the CPU consumption of kswapd,
while incurring only minimal performance degradation on memory over-committed
systems[2].
Free Pages Reporting[9] based memory over-commit virtualization systems are
another use case. In that configuration, the guest VMs are supposed to report
free memory to the host, so that the host can reallocate the memory to other
guests. However, because Linux is designed to cache things in memory
aggressively, no guest would voluntarily free much memory without the host's
intervention. Proactive reclamation could make the situation much better.
Google has implemented the idea and is using it in their data centers. They
further proposed upstreaming it at LSFMM'19, and "the general consensus was
that, while this sort of proactive reclaim would be useful for a number of
users, the cost of this particular solution was too high to consider merging it
upstream"[3]. The cost mostly comes from the coldness tracking. Roughly
speaking, the implementation periodically scans the 'Accessed' bit of each
page. Therefore, the overhead increases linearly with the size of the memory
and the scanning frequency. As a result, Google is known to dedicate a CPU for
the work. That's reasonable for Google, but it wouldn't be for everyone.
DAMON and DAMOS: An engine for data access pattern-oriented memory management
-----------------------------------------------------------------------------
DAMON[4] is a framework for general data access monitoring. When its adaptive
monitoring overhead control feature is used, it incurs minimal monitoring
overhead. The overhead is not only small, but also upper-bounded, regardless
of the size of the monitoring target memory. Clients can set the upper limit
as they want. While monitoring 70 GB of memory of a production system every 5
milliseconds, it consumes less than 1% of a single CPU's time. For this, it
sacrifices some of the quality of the monitoring results. Nevertheless, the
lower bound of the quality is configurable, and it uses a best-effort
algorithm for the quality. Our test results[5] show the quality is practical
enough. From the production system monitoring, we were able to find a 4 KB
memory region showing the highest access frequency within the 70 GB of memory.
For those who are still not convinced, DAMON also supports page-granularity
monitoring[6], though it makes the overhead much higher and proportional to
the memory size again.
We normally don't monitor the data access pattern just for fun, but to use it
for something like memory management. Proactive reclamation is one such usage.
For such general cases, DAMON provides a feature called DAMON-based Operation
Schemes (DAMOS)[7], which makes DAMON an engine for general data access
pattern-oriented memory management. Using this, clients can ask DAMON to find
memory regions of a specific data access pattern and apply some memory
management action (e.g., page out, move to the head of the LRU list, use huge
pages, ...). Such a request is called a 'scheme' below.
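For illustration only, the core of such a scheme could be sketched in
userspace C as below; the struct and field names are simplified stand-ins for
this example, not the actual kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified mirror of a DAMOS scheme: a target data
 * access pattern paired with an action for the matching regions. */
enum damos_action { DAMOS_PAGEOUT, DAMOS_LRU_HEAD, DAMOS_HUGEPAGE, DAMOS_STAT };

struct scheme {
	unsigned long min_sz, max_sz;		/* region size range, bytes */
	unsigned int min_nr_accesses, max_nr_accesses;
	unsigned int min_age, max_age;		/* in aggregation intervals */
	enum damos_action action;
};

struct region {
	unsigned long sz;
	unsigned int nr_accesses;
	unsigned int age;
};

/* A region receives the action iff it fits the pattern (bounds inclusive). */
static bool scheme_matches(const struct scheme *s, const struct region *r)
{
	return s->min_sz <= r->sz && r->sz <= s->max_sz &&
	       s->min_nr_accesses <= r->nr_accesses &&
	       r->nr_accesses <= s->max_nr_accesses &&
	       s->min_age <= r->age && r->age <= s->max_age;
}
```

For example, a "page out regions of at least 4 KB that showed no access for 60
or more aggregation intervals" request would be a scheme with min_sz of 4096,
max_nr_accesses of 0, min_age of 60, and the DAMOS_PAGEOUT action.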
DAMON-based Reclaim
-------------------
Therefore, by using DAMON for the cold pages detection, the proactive
reclamation's monitoring overhead issue can be solved. If someone like Google
is ok with dedicating CPUs for the monitoring and wants page-granularity
quality, they can configure DAMON so.
Actually, we have already implemented a version of proactive reclamation on it
and shared its evaluation results before[5], which show noticeable
achievements. Nevertheless, it is only at a proof-of-concept level. Recently,
we further introduced a user space tool[8] that automatically tunes schemes
for specific workloads and systems. Google's proactive reclamation also uses
a similar ML-based approach[2]. But making it just work in the kernel would
be more convenient for more general users.
To this end, this patchset improves DAMOS to be suitable for such production
use, and implements another version of the proactive reclamation, namely
DAMON_RECLAIM, on top of it.
DAMOS Improvements: Speed Limit, Prioritization, and Watermarks
---------------------------------------------------------------
One major problem of the current version of DAMOS is the absence of
aggressiveness control. For example, if huge memory regions of the specified
data access pattern are found, applying the action to all of them could incur
significant overhead. This can be controlled by modifying the target data
access pattern, and some auto-tuning approaches are available. But for those
who are unable to use such tools, or who want it to just work with only
intuitive tuning or default values, at least some safeguards are required.
For this, we provide a speed limit. Using this, the client can specify up to
how much memory the action is allowed to be applied to within a specific time
duration. A follow-up question is: to which memory regions should the action
be applied within the limit? We implement a simple region prioritization
mechanism for each action and make DAMOS apply the action to high-priority
regions first. It also allows users to tune the prioritization by giving
different weights to regions' size, access frequency, and age.
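As an illustration of how the two knobs could compose, here is a hedged
userspace sketch; the score formula, the normalization of the inputs, and all
names are assumptions made for the example, not the kernel implementation.

```c
#include <assert.h>

/* Hypothetical weighted score: each component is assumed to be already
 * normalized to [0, 100] by the caller; the weights express their
 * relative importance for the prioritization. */
static unsigned int region_score(unsigned int sz_part, unsigned int freq_part,
				 unsigned int age_part, unsigned int w_sz,
				 unsigned int w_freq, unsigned int w_age)
{
	return (sz_part * w_sz + freq_part * w_freq + age_part * w_age) /
	       (w_sz + w_freq + w_age);
}

/* Charge 'sz' bytes against the speed limit; returns how many bytes may
 * actually be applied (0 once the window's budget is exhausted). */
static unsigned long charge(unsigned long *charged_sz, unsigned long limit_sz,
			    unsigned long sz)
{
	unsigned long allowed;

	if (*charged_sz >= limit_sz)
		return 0;
	allowed = limit_sz - *charged_sz;
	if (sz > allowed)
		sz = allowed;	/* apply the action to only part of the region */
	*charged_sz += sz;
	return sz;
}
```

Regions would be visited in descending score order, each charging its size
against the window budget until the budget runs out.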
Another problem of the current version of DAMOS is that it must be manually
turned on and off by clients. Though DAMON is very lightweight, some would
not be convinced. For such cases, we implement watermarks-based automatic
scheme activation. It allows the clients to configure a metric of their
interest and three watermarks for it. If the metric is higher than the high
watermark or lower than the low watermark, the scheme is deactivated. If the
metric is lower than the mid watermark but higher than the low watermark, the
scheme is activated. For example, in the case of the proactive reclamation,
the metric could be the amount of free memory. Using the watermarks, the
sysadmin could make it do nothing at all when free memory is sufficient, but
start proactive reclamation under light memory pressure. Then, if that
doesn't work well enough and the free memory becomes lower than the low
watermark, the system falls back to the LRU-based page-granularity
reclamation.
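The activation rule could be sketched as below; this is a userspace
illustration, and keeping the current state while the metric sits between the
mid and high watermarks is an assumption of this sketch, not a statement about
the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical watermark check: 'metric' could be, e.g., the amount of
 * free memory, and 'active' the scheme's current activation state. */
static bool wmarks_active(unsigned long metric, unsigned long high,
			  unsigned long mid, unsigned long low, bool active)
{
	if (metric > high || metric < low)
		return false;	/* plenty of memory, or fall back to LRU */
	if (metric < mid)
		return true;	/* light pressure: start reclaiming */
	return active;		/* between mid and high: keep current state */
}
```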
Evaluation
==========
We measured the system memory usage and runtime of 24 realistic workloads in
the PARSEC3 and SPLASH-2X benchmark suites on my QEMU/KVM based virtual
machine. The virtual machine runs on an i3.metal AWS instance and has 130 GiB
of memory. It utilizes a 4 GiB zram as its swap device. We repeat the
measurement 5 times and use the averages. We also measured the CPU
consumption of DAMON_RECLAIM.
Compared to v5.12, DAMON_RECLAIM achieves a 33% memory saving with only a 2%
performance degradation. For this, DAMON_RECLAIM consumed only 5.72% of
single CPU time. Of that CPU consumption, only about 1.448% of single CPU
time is expected to be used for the monitoring itself.
Baseline and Complete Git Tree
==============================
The patches are based on the v5.12 plus DAMON patchset[1] plus DAMOS
patchset[7] plus physical address space support patchset[6]. You can also
clone the complete git tree from:
$ git clone git://github.com/sjp38/linux -b damon_reclaim/rfc/v1
The web is also available:
https://github.com/sjp38/linux/releases/tag/damon_reclaim/rfc/v1
Development Trees
-----------------
There are a couple of trees for the entire DAMON patchset series and
features for future releases.
- For latest release: https://github.com/sjp38/linux/tree/damon/master
- For next release: https://github.com/sjp38/linux/tree/damon/next
Long-term Support Trees
-----------------------
For people who want to test the DAMON patchset series but are using LTS
kernels, there is another couple of trees based on the two latest LTS kernels
respectively, containing the 'damon/master' backports.
- For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y
- For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y
Sequence Of Patches
===================
The first patch makes DAMOS users able to describe cold pages to be paged out
via physical addresses. The following four patches (patches 2-5) implement
the speed limit. The next four patches (patches 6-9) implement the memory
region prioritization within the limit. Then, three patches (patches 10-12)
implementing the watermarks-based scheme activation follow. Finally, the 13th
patch implements the DAMON-based reclamation on top of DAMOS.
[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://research.google/pubs/pub48551/
[3] https://lwn.net/Articles/787611/
[4] https://damonitor.github.io
[5] https://damonitor.github.io/doc/html/latest/vm/damon/eval.html
[6] https://lore.kernel.org/linux-mm/[email protected]/
[7] https://lore.kernel.org/linux-mm/[email protected]/
[8] https://github.com/awslabs/damoos
[9] https://www.kernel.org/doc/html/latest/vm/free_page_reporting.html
SeongJae Park (13):
mm/damon/paddr: Support the pageout scheme
mm/damon/damos: Make schemes aggressiveness controllable
damon/core/schemes: Skip already charged targets and regions
mm/damon/dbgfs: Support schemes speed limit
mm/damon/selftests: Support schemes speed limit
mm/damon/schemes: Prioritize regions within speed limit
mm/damon/vaddr,paddr: Support pageout prioritization
mm/damon/dbgfs: Support prioritization weights
tools/selftests/damon: Update for regions prioritization of schemes
mm/damon/schemes: Activate schemes based on a watermarks mechanism
mm/damon/dbgfs: Support watermarks
selftests/damon: Support watermarks
mm/damon: Introduce DAMON-based reclamation
include/linux/damon.h | 118 ++++++++-
mm/damon/Kconfig | 128 ++++++++++
mm/damon/Makefile | 1 +
mm/damon/core.c | 215 +++++++++++++++-
mm/damon/dbgfs.c | 45 +++-
mm/damon/paddr.c | 52 +++-
mm/damon/prmtv-common.c | 48 +++-
mm/damon/prmtv-common.h | 5 +
mm/damon/reclaim.c | 230 ++++++++++++++++++
mm/damon/vaddr.c | 15 ++
.../testing/selftests/damon/debugfs_attrs.sh | 4 +-
11 files changed, 834 insertions(+), 27 deletions(-)
create mode 100644 mm/damon/reclaim.c
--
2.17.1
From: SeongJae Park <[email protected]>
This commit makes the DAMON primitives for the physical address space support
the pageout action of DAMON-based Operation Schemes. In other words, users
can now implement their own data access-aware reclamation for the whole
system using DAMOS.
Signed-off-by: SeongJae Park <[email protected]>
---
mm/damon/paddr.c | 38 +++++++++++++++++++++++++++++++++++++-
mm/damon/prmtv-common.c | 2 +-
mm/damon/prmtv-common.h | 2 ++
3 files changed, 40 insertions(+), 2 deletions(-)
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index b92b07a3ce53..303db372e53b 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -7,6 +7,9 @@
#define pr_fmt(fmt) "damon-pa: " fmt
+#include <linux/swap.h>
+
+#include "../internal.h"
#include "prmtv-common.h"
/*
@@ -85,6 +88,39 @@ bool damon_pa_target_valid(void *t)
return true;
}
+int damon_pa_apply_scheme(struct damon_ctx *ctx, struct damon_target *t,
+ struct damon_region *r, struct damos *scheme)
+{
+ unsigned long addr;
+ LIST_HEAD(page_list);
+
+ if (scheme->action != DAMOS_PAGEOUT)
+ return -EINVAL;
+
+ for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) {
+ struct page *page = damon_get_page(PHYS_PFN(addr));
+
+ if (!page)
+ continue;
+
+ ClearPageReferenced(page);
+ test_and_clear_page_young(page);
+ if (isolate_lru_page(page)) {
+ put_page(page);
+ continue;
+ }
+ if (PageUnevictable(page)) {
+ putback_lru_page(page);
+ } else {
+ list_add(&page->lru, &page_list);
+ put_page(page);
+ }
+ }
+ reclaim_pages(&page_list);
+ cond_resched();
+ return 0;
+}
+
void damon_pa_set_primitives(struct damon_ctx *ctx)
{
ctx->primitive.init = NULL;
@@ -94,5 +130,5 @@ void damon_pa_set_primitives(struct damon_ctx *ctx)
ctx->primitive.reset_aggregated = NULL;
ctx->primitive.target_valid = damon_pa_target_valid;
ctx->primitive.cleanup = NULL;
- ctx->primitive.apply_scheme = NULL;
+ ctx->primitive.apply_scheme = damon_pa_apply_scheme;
}
diff --git a/mm/damon/prmtv-common.c b/mm/damon/prmtv-common.c
index 08e9318d67ed..01c1c1b37859 100644
--- a/mm/damon/prmtv-common.c
+++ b/mm/damon/prmtv-common.c
@@ -14,7 +14,7 @@
* The body of this function is stolen from the 'page_idle_get_page()'. We
* steal rather than reuse it because the code is quite simple.
*/
-static struct page *damon_get_page(unsigned long pfn)
+struct page *damon_get_page(unsigned long pfn)
{
struct page *page = pfn_to_online_page(pfn);
diff --git a/mm/damon/prmtv-common.h b/mm/damon/prmtv-common.h
index 939c41af6b59..ba0c4eecbb79 100644
--- a/mm/damon/prmtv-common.h
+++ b/mm/damon/prmtv-common.h
@@ -18,6 +18,8 @@
/* Get a random number in [l, r) */
#define damon_rand(l, r) (l + prandom_u32_max(r - l))
+struct page *damon_get_page(unsigned long pfn);
+
void damon_va_mkold(struct mm_struct *mm, unsigned long addr);
bool damon_va_young(struct mm_struct *mm, unsigned long addr,
unsigned long *page_sz);
--
2.17.1
From: SeongJae Park <[email protected]>
If the memory regions fulfilling the target data access pattern of a
DAMON-based operation scheme are too large, applying the action of the scheme
could consume too much CPU. To avoid that, this commit implements a limit on
the action application speed. Using the feature, the client can set up to how
much memory the action can be applied to within a specific time duration.
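For illustration, the window accounting could look like the userspace sketch
below, with a plain millisecond counter standing in for jiffies; the names and
structure are simplified assumptions for the example.

```c
#include <assert.h>

/* Hypothetical per-scheme limit state; 'now_ms' stands in for jiffies. */
struct speed_limit {
	unsigned long sz;		/* bytes allowed per window (0: off) */
	unsigned long ms;		/* window length, milliseconds */
	unsigned long charged_sz;	/* bytes charged in current window */
	unsigned long charged_from;	/* window start time, milliseconds */
};

/* Reset the charge window once its duration has passed. */
static void maybe_reset_window(struct speed_limit *l, unsigned long now_ms)
{
	if (l->sz && now_ms >= l->charged_from + l->ms) {
		l->charged_from = now_ms;
		l->charged_sz = 0;
	}
}
```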
Signed-off-by: SeongJae Park <[email protected]>
---
include/linux/damon.h | 34 +++++++++++++++++++++++++++----
mm/damon/core.c | 47 +++++++++++++++++++++++++++++++++++++------
mm/damon/dbgfs.c | 4 +++-
3 files changed, 74 insertions(+), 11 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 684a3603ddac..35068b0ece6f 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -87,6 +87,25 @@ enum damos_action {
DAMOS_STAT, /* Do nothing but only record the stat */
};
+/**
+ * struct damos_speed_limit - Controls the aggressiveness of the given scheme.
+ * @sz: Scheme action amount limit in bytes.
+ * @ms: Scheme action amount charge duration.
+ *
+ * To avoid consuming too much CPU time for applying the &struct damos->action
+ * to large memory, DAMON applies it to only up to &sz bytes within &ms.
+ *
+ * If &sz is 0, the limit is disabled.
+ */
+struct damos_speed_limit {
+ unsigned long sz;
+ unsigned long ms;
+
+/* private: for limit accounting */
+ unsigned long charged_sz;
+ unsigned long charged_from;
+};
+
/**
* struct damos - Represents a Data Access Monitoring-based Operation Scheme.
* @min_sz_region: Minimum size of target regions.
@@ -96,13 +115,19 @@ enum damos_action {
* @min_age_region: Minimum age of target regions.
* @max_age_region: Maximum age of target regions.
* @action: &damo_action to be applied to the target regions.
+ * @limit: Control the aggressiveness of this scheme.
* @stat_count: Total number of regions that this scheme is applied.
* @stat_sz: Total size of regions that this scheme is applied.
* @list: List head for siblings.
*
- * For each aggregation interval, DAMON applies @action to monitoring target
- * regions fit in the condition and updates the statistics. Note that both
- * the minimums and the maximums are inclusive.
+ * For each aggregation interval, DAMON finds regions which fit in the
+ * condition (&min_sz_region, &max_sz_region, &min_nr_accesses,
+ * &max_nr_accesses, &min_age_region, &max_age_region) and applies &action to
+ * those. To avoid consuming too much CPU for the &action, &limit is used.
+ *
+ * After applying the &action to each region, &stat_count and &stat_sz is
+ * updated to reflect the number of regions and total size of regions that the
+ * &action is applied.
*/
struct damos {
unsigned long min_sz_region;
@@ -112,6 +137,7 @@ struct damos {
unsigned int min_age_region;
unsigned int max_age_region;
enum damos_action action;
+ struct damos_speed_limit limit;
unsigned long stat_count;
unsigned long stat_sz;
struct list_head list;
@@ -335,7 +361,7 @@ struct damos *damon_new_scheme(
unsigned long min_sz_region, unsigned long max_sz_region,
unsigned int min_nr_accesses, unsigned int max_nr_accesses,
unsigned int min_age_region, unsigned int max_age_region,
- enum damos_action action);
+ enum damos_action action, struct damos_speed_limit *limit);
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
void damon_destroy_scheme(struct damos *s);
diff --git a/mm/damon/core.c b/mm/damon/core.c
index a33b3a3b9e57..df784c72ea80 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -85,7 +85,7 @@ struct damos *damon_new_scheme(
unsigned long min_sz_region, unsigned long max_sz_region,
unsigned int min_nr_accesses, unsigned int max_nr_accesses,
unsigned int min_age_region, unsigned int max_age_region,
- enum damos_action action)
+ enum damos_action action, struct damos_speed_limit *limit)
{
struct damos *scheme;
@@ -103,6 +103,11 @@ struct damos *damon_new_scheme(
scheme->stat_sz = 0;
INIT_LIST_HEAD(&scheme->list);
+ scheme->limit.sz = limit->sz;
+ scheme->limit.ms = limit->ms;
+ scheme->limit.charged_sz = 0;
+ scheme->limit.charged_from = 0;
+
return scheme;
}
@@ -536,6 +541,9 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
}
}
+static void damon_split_region_at(struct damon_ctx *ctx,
+ struct damon_region *r, unsigned long sz_r);
+
static void damon_do_apply_schemes(struct damon_ctx *c,
struct damon_target *t,
struct damon_region *r)
@@ -544,7 +552,14 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
unsigned long sz;
damon_for_each_scheme(s, c) {
+ struct damos_speed_limit *limit = &s->limit;
+
+ /* Check the limit */
+ if (limit->sz && limit->charged_sz >= limit->sz)
+ continue;
+
sz = r->ar.end - r->ar.start;
+ /* Check the target regions condition */
if (sz < s->min_sz_region || s->max_sz_region < sz)
continue;
if (r->nr_accesses < s->min_nr_accesses ||
@@ -552,22 +567,42 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
continue;
if (r->age < s->min_age_region || s->max_age_region < r->age)
continue;
- s->stat_count++;
- s->stat_sz += sz;
- if (c->primitive.apply_scheme)
+
+ /* Apply the scheme */
+ if (c->primitive.apply_scheme) {
+ if (limit->sz && limit->charged_sz + sz > limit->sz) {
+ sz = limit->sz - limit->charged_sz;
+ damon_split_region_at(c, r, sz);
+ }
c->primitive.apply_scheme(c, t, r, s);
+ limit->charged_sz += sz;
+ }
if (s->action != DAMOS_STAT)
r->age = 0;
+
+ /* Update stat */
+ s->stat_count++;
+ s->stat_sz += sz;
}
}
static void kdamond_apply_schemes(struct damon_ctx *c)
{
struct damon_target *t;
- struct damon_region *r;
+ struct damon_region *r, *next_r;
+ struct damos *s;
+
+ damon_for_each_scheme(s, c) {
+ /* Reset charge window if the duration passed */
+ if (time_after_eq(jiffies, s->limit.charged_from +
+ msecs_to_jiffies(s->limit.ms))) {
+ s->limit.charged_from = jiffies;
+ s->limit.charged_sz = 0;
+ }
+ }
damon_for_each_target(t, c) {
- damon_for_each_region(r, t)
+ damon_for_each_region_safe(r, next_r, t)
damon_do_apply_schemes(c, t, r);
}
}
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 5b254eccdb43..4b45b69db697 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -310,6 +310,8 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
*nr_schemes = 0;
while (pos < len && *nr_schemes < max_nr_schemes) {
+ struct damos_speed_limit limit = {};
+
ret = sscanf(&str[pos], "%lu %lu %u %u %u %u %u%n",
&min_sz, &max_sz, &min_nr_a, &max_nr_a,
&min_age, &max_age, &action, &parsed);
@@ -322,7 +324,7 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
pos += parsed;
scheme = damon_new_scheme(min_sz, max_sz, min_nr_a, max_nr_a,
- min_age, max_age, action);
+ min_age, max_age, action, &limit);
if (!scheme)
goto fail;
--
2.17.1
From: SeongJae Park <[email protected]>
If DAMOS stops applying the action to memory regions due to the speed limit,
it does nothing until the next charge window starts. Then, it restarts the
work from the beginning of the address space. If there is a huge memory
region at the beginning of the address space that fulfills the target data
access pattern of the scheme, the action would be applied to only that
region.
This commit mitigates the case by skipping, at the beginning of the current
charge window, the memory regions that were charged in the previous charge
window.
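The skipping logic could be sketched in userspace as below; targets are
identified by plain integers here purely for illustration, while the kernel
walks the real damon_target and damon_region lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical resume point recorded when the previous window's budget
 * ran out: which target and address the work should continue from. */
struct resume_point {
	int target;			/* -1: no pending resume point */
	unsigned long addr;
};

/* Should this region be skipped because it lies before the point where
 * the previous charge window stopped? Clears the point once reached. */
static bool should_skip(struct resume_point *p, int target,
			unsigned long region_start)
{
	if (p->target < 0)
		return false;
	if (target != p->target || region_start < p->addr)
		return true;
	p->target = -1;			/* resumed; process normally again */
	p->addr = 0;
	return false;
}
```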
Signed-off-by: SeongJae Park <[email protected]>
---
include/linux/damon.h | 5 +++++
mm/damon/core.c | 29 ++++++++++++++++++++++++++---
2 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 35068b0ece6f..0df81dd2d560 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -104,6 +104,8 @@ struct damos_speed_limit {
/* private: for limit accounting */
unsigned long charged_sz;
unsigned long charged_from;
+ struct damon_target *charge_target_from;
+ unsigned long charge_addr_from;
};
/**
@@ -331,6 +333,9 @@ struct damon_ctx {
#define damon_prev_region(r) \
(container_of(r->list.prev, struct damon_region, list))
+#define damon_last_region(t) \
+ (list_last_entry(&t->regions_list, struct damon_region, list))
+
#define damon_for_each_region(r, t) \
list_for_each_entry(r, &t->regions_list, list)
diff --git a/mm/damon/core.c b/mm/damon/core.c
index df784c72ea80..fab687f18d9c 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -107,6 +107,8 @@ struct damos *damon_new_scheme(
scheme->limit.ms = limit->ms;
scheme->limit.charged_sz = 0;
scheme->limit.charged_from = 0;
+ scheme->limit.charge_target_from = NULL;
+ scheme->limit.charge_addr_from = 0;
return scheme;
}
@@ -558,6 +560,21 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
if (limit->sz && limit->charged_sz >= limit->sz)
continue;
+ if (limit->charge_target_from) {
+ if (t != limit->charge_target_from)
+ continue;
+ if (r == damon_last_region(t)) {
+ limit->charge_target_from = NULL;
+ limit->charge_addr_from = 0;
+ continue;
+ }
+ if (limit->charge_addr_from &&
+ r->ar.start < limit->charge_addr_from)
+ continue;
+ limit->charge_target_from = NULL;
+ limit->charge_addr_from = 0;
+ }
+
sz = r->ar.end - r->ar.start;
/* Check the target regions condition */
if (sz < s->min_sz_region || s->max_sz_region < sz)
@@ -576,6 +593,10 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
}
c->primitive.apply_scheme(c, t, r, s);
limit->charged_sz += sz;
+ if (limit->sz && limit->charged_sz >= limit->sz) {
+ limit->charge_target_from = t;
+ limit->charge_addr_from = r->ar.end + 1;
+ }
}
if (s->action != DAMOS_STAT)
r->age = 0;
@@ -593,11 +614,13 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
struct damos *s;
damon_for_each_scheme(s, c) {
+ struct damos_speed_limit *limit = &s->limit;
+
/* Reset charge window if the duration passed */
- if (time_after_eq(jiffies, s->limit.charged_from +
+ if (limit->sz && time_after_eq(jiffies, s->limit.charged_from +
msecs_to_jiffies(s->limit.ms))) {
- s->limit.charged_from = jiffies;
- s->limit.charged_sz = 0;
+ limit->charged_from = jiffies;
+ limit->charged_sz = 0;
}
}
--
2.17.1
From: SeongJae Park <[email protected]>
This commit makes the debugfs interface of DAMON support the schemes speed
limit by changing the format of the input for the schemes file.
Signed-off-by: SeongJae Park <[email protected]>
---
mm/damon/dbgfs.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 4b45b69db697..ea6d4fdb57fa 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -227,11 +227,12 @@ static ssize_t sprint_schemes(struct damon_ctx *c, char *buf, ssize_t len)
damon_for_each_scheme(s, c) {
rc = scnprintf(&buf[written], len - written,
- "%lu %lu %u %u %u %u %d %lu %lu\n",
+ "%lu %lu %u %u %u %u %d %lu %lu %lu %lu\n",
s->min_sz_region, s->max_sz_region,
s->min_nr_accesses, s->max_nr_accesses,
s->min_age_region, s->max_age_region,
- s->action, s->stat_count, s->stat_sz);
+ s->action, s->limit.sz, s->limit.ms,
+ s->stat_count, s->stat_sz);
if (!rc)
return -ENOMEM;
@@ -312,10 +313,11 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
while (pos < len && *nr_schemes < max_nr_schemes) {
struct damos_speed_limit limit = {};
- ret = sscanf(&str[pos], "%lu %lu %u %u %u %u %u%n",
+ ret = sscanf(&str[pos], "%lu %lu %u %u %u %u %u %lu %lu%n",
&min_sz, &max_sz, &min_nr_a, &max_nr_a,
- &min_age, &max_age, &action, &parsed);
- if (ret != 7)
+ &min_age, &max_age, &action, &limit.sz,
+ &limit.ms, &parsed);
+ if (ret != 9)
break;
if (!damos_action_valid(action)) {
pr_err("wrong action %d\n", action);
@@ -1133,6 +1135,15 @@ static ssize_t dbgfs_monitor_on_write(struct file *file,
return ret;
}
+/*
+ * v1: Add the scheme speed limit
+ */
+static ssize_t dbgfs_version_read(struct file *file,
+ char __user *buf, size_t count, loff_t *ppos)
+{
+ return simple_read_from_buffer(buf, count, ppos, "1\n", 2);
+}
+
static const struct file_operations mk_contexts_fops = {
.owner = THIS_MODULE,
.write = dbgfs_mk_context_write,
@@ -1149,13 +1160,18 @@ static const struct file_operations monitor_on_fops = {
.write = dbgfs_monitor_on_write,
};
+static const struct file_operations version_fops = {
+ .owner = THIS_MODULE,
+ .read = dbgfs_version_read,
+};
+
static int __init __damon_dbgfs_init(void)
{
struct dentry *dbgfs_root;
const char * const file_names[] = {"mk_contexts", "rm_contexts",
- "monitor_on"};
+ "monitor_on", "version"};
const struct file_operations *fops[] = {&mk_contexts_fops,
- &rm_contexts_fops, &monitor_on_fops};
+ &rm_contexts_fops, &monitor_on_fops, &version_fops};
int i;
dbgfs_root = debugfs_create_dir("damon", NULL);
--
2.17.1
From: SeongJae Park <[email protected]>
This commit makes DAMON apply schemes to regions having higher priority
first, if it cannot apply schemes to all regions due to the speed limit.
The prioritization function should be implemented by each monitoring
primitive. Those would commonly calculate the priority of a region using its
attributes, namely 'size', 'nr_accesses', and 'age'. For example, some
primitives would calculate the priority of each region using a weighted sum
of its 'nr_accesses' and 'age'.
The optimal weights depend on the given environment, so this commit allows
them to be customized. Nevertheless, the score calculation functions are
only encouraged to respect the weights, not mandated to. So, the
customization might not work for some primitives.
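As an illustration of how a per-window size budget could be translated into a
minimum priority score, here is a hedged userspace sketch; it mirrors the
histogram-based idea, and all names are simplified assumptions for the
example.

```c
#include <assert.h>

#define MAX_SCORE 99

/* Hypothetical min-score selection: given how many bytes of regions
 * exist at each priority score, find the lowest score whose inclusion
 * is still needed to fill the per-window size budget. Regions scoring
 * below the returned value can then be skipped entirely. */
static unsigned int min_score_for_budget(
		const unsigned long hist[MAX_SCORE + 1], unsigned long budget)
{
	unsigned long cumulated = 0;
	unsigned int score;

	for (score = MAX_SCORE; ; score--) {
		cumulated += hist[score];
		if (cumulated >= budget || !score)
			break;
	}
	return score;
}
```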
Signed-off-by: SeongJae Park <[email protected]>
---
include/linux/damon.h | 27 ++++++++++++++++-
mm/damon/core.c | 70 ++++++++++++++++++++++++++++++++++++-------
2 files changed, 86 insertions(+), 11 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 0df81dd2d560..8f35bd94fc2b 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -14,6 +14,8 @@
/* Minimal region size. Every damon_region is aligned by this. */
#define DAMON_MIN_REGION PAGE_SIZE
+/* Max priority score for DAMON-based operation schemes */
+#define DAMOS_MAX_SCORE (99)
/**
* struct damon_addr_range - Represents an address region of [@start, @end).
@@ -92,8 +94,18 @@ enum damos_action {
* @sz: Scheme action amount limit in bytes.
* @ms: Scheme action amount charge duration.
*
+ * @weight_sz: Weight of the region's size for prioritization.
+ * @weight_nr_accesses: Weight of the region's nr_accesses for prioritization.
+ * @weight_age: Weight of the region's age for prioritization.
+ *
* To avoid consuming too much CPU time for applying the &struct damos->action
- * to large memory, DAMON applies it to only up to &sz bytes within &ms.
+ * to large memory, DAMON applies it to only up to &sz bytes within &ms. For
+ * selecting regions within the limit, DAMON prioritizes current scheme's
+ * target memory regions using the given &struct
+ * damon_primitive->get_scheme_score. You could customize the prioritization
+ * logic for your environment by setting &weight_sz, &weight_nr_accesses, and
+ * &weight_age, because primitives are encouraged to respect those, though it's
+ * not mandatory.
*
* If &sz is 0, the limit is disabled.
*/
@@ -101,11 +113,18 @@ struct damos_speed_limit {
unsigned long sz;
unsigned long ms;
+ unsigned int weight_sz;
+ unsigned int weight_nr_accesses;
+ unsigned int weight_age;
+
/* private: for limit accounting */
unsigned long charged_sz;
unsigned long charged_from;
struct damon_target *charge_target_from;
unsigned long charge_addr_from;
+
+ unsigned long histogram[DAMOS_MAX_SCORE + 1];
+ unsigned int min_score;
};
/**
@@ -155,6 +174,7 @@ struct damon_ctx;
* @prepare_access_checks: Prepare next access check of target regions.
* @check_accesses: Check the accesses to target regions.
* @reset_aggregated: Reset aggregated accesses monitoring results.
+ * @get_scheme_score: Get the score of a region for a scheme.
* @apply_scheme: Apply a DAMON-based operation scheme.
* @target_valid: Determine if the target is valid.
* @cleanup: Clean up the context.
@@ -182,6 +202,8 @@ struct damon_ctx;
* of its update. The value will be used for regions adjustment threshold.
* @reset_aggregated should reset the access monitoring results that aggregated
* by @check_accesses.
+ * @get_scheme_score should return the priority score of a region for a scheme
+ * as an integer in [0, &DAMOS_MAX_SCORE].
* @apply_scheme is called from @kdamond when a region for user provided
* DAMON-based operation scheme is found. It should apply the scheme's action
* to the region. This is not used for &DAMON_ARBITRARY_TARGET case.
@@ -196,6 +218,9 @@ struct damon_primitive {
void (*prepare_access_checks)(struct damon_ctx *context);
unsigned int (*check_accesses)(struct damon_ctx *context);
void (*reset_aggregated)(struct damon_ctx *context);
+ int (*get_scheme_score)(struct damon_ctx *context,
+ struct damon_target *t, struct damon_region *r,
+ struct damos *scheme);
int (*apply_scheme)(struct damon_ctx *context, struct damon_target *t,
struct damon_region *r, struct damos *scheme);
bool (*target_valid)(void *target);
diff --git a/mm/damon/core.c b/mm/damon/core.c
index fab687f18d9c..15bcd05670d1 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -12,6 +12,7 @@
#include <linux/kthread.h>
#include <linux/random.h>
#include <linux/slab.h>
+#include <linux/string.h>
#define CREATE_TRACE_POINTS
#include <trace/events/damon.h>
@@ -105,11 +106,13 @@ struct damos *damon_new_scheme(
scheme->limit.sz = limit->sz;
scheme->limit.ms = limit->ms;
+ scheme->limit.weight_sz = limit->weight_sz;
+ scheme->limit.weight_nr_accesses = limit->weight_nr_accesses;
+ scheme->limit.weight_age = limit->weight_age;
scheme->limit.charged_sz = 0;
scheme->limit.charged_from = 0;
scheme->limit.charge_target_from = NULL;
scheme->limit.charge_addr_from = 0;
-
return scheme;
}
@@ -546,6 +549,28 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
static void damon_split_region_at(struct damon_ctx *ctx,
struct damon_region *r, unsigned long sz_r);
+static bool __damos_valid_target(struct damon_region *r, struct damos *s)
+{
+ unsigned long sz;
+
+ sz = r->ar.end - r->ar.start;
+ return s->min_sz_region <= sz && sz <= s->max_sz_region &&
+ s->min_nr_accesses <= r->nr_accesses &&
+ r->nr_accesses <= s->max_nr_accesses &&
+ s->min_age_region <= r->age && r->age <= s->max_age_region;
+}
+
+static bool damos_valid_target(struct damon_ctx *c, struct damon_target *t,
+ struct damon_region *r, struct damos *s)
+{
+ bool ret = __damos_valid_target(r, s);
+
+ if (!ret || !s->limit.sz || !c->primitive.get_scheme_score)
+ return ret;
+
+ return c->primitive.get_scheme_score(c, t, r, s) >= s->limit.min_score;
+}
+
static void damon_do_apply_schemes(struct damon_ctx *c,
struct damon_target *t,
struct damon_region *r)
@@ -575,17 +600,11 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
limit->charge_addr_from = 0;
}
- sz = r->ar.end - r->ar.start;
- /* Check the target regions condition */
- if (sz < s->min_sz_region || s->max_sz_region < sz)
- continue;
- if (r->nr_accesses < s->min_nr_accesses ||
- s->max_nr_accesses < r->nr_accesses)
- continue;
- if (r->age < s->min_age_region || s->max_age_region < r->age)
+ if (!damos_valid_target(c, t, r, s))
continue;
/* Apply the scheme */
+ sz = r->ar.end - r->ar.start;
if (c->primitive.apply_scheme) {
if (limit->sz && limit->charged_sz + sz > limit->sz) {
sz = limit->sz - limit->charged_sz;
@@ -615,13 +634,44 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
damon_for_each_scheme(s, c) {
struct damos_speed_limit *limit = &s->limit;
+ unsigned long cumulated_sz;
+ unsigned int score, max_score = 0;
+
+ if (!limit->sz)
+ continue;
/* Reset charge window if the duration passed */
- if (limit->sz && time_after_eq(jiffies, s->limit.charged_from +
+ if (time_after_eq(jiffies, s->limit.charged_from +
msecs_to_jiffies(s->limit.ms))) {
limit->charged_from = jiffies;
limit->charged_sz = 0;
}
+
+ if (!c->primitive.get_scheme_score)
+ continue;
+
+ /* Fill up the score histogram */
+ memset(limit->histogram, 0, sizeof(limit->histogram));
+ damon_for_each_target(t, c) {
+ damon_for_each_region(r, t) {
+ if (!__damos_valid_target(r, s))
+ continue;
+ score = c->primitive.get_scheme_score(
+ c, t, r, s);
+ limit->histogram[score] +=
+ r->ar.end - r->ar.start;
+ if (score > max_score)
+ max_score = score;
+ }
+ }
+
+ /* Set the min score limit */
+ for (cumulated_sz = 0, score = max_score; ; score--) {
+ cumulated_sz += limit->histogram[score];
+ if (cumulated_sz >= limit->sz || !score)
+ break;
+ }
+ limit->min_score = score;
}
damon_for_each_target(t, c) {
--
2.17.1
From: SeongJae Park <[email protected]>
This commit updates the DAMON selftests to support the updated 'schemes'
debugfs file format.
Signed-off-by: SeongJae Park <[email protected]>
---
tools/testing/selftests/damon/debugfs_attrs.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh
index 61fd3e5598e9..012b0c1fdbd3 100755
--- a/tools/testing/selftests/damon/debugfs_attrs.sh
+++ b/tools/testing/selftests/damon/debugfs_attrs.sh
@@ -101,7 +101,7 @@ echo $ORIG_CONTENT > $file
file="$DBGFS/schemes"
ORIG_CONTENT=$(cat $file)
-echo "1 2 3 4 5 6 3" > $file
+echo "1 2 3 4 5 6 3 0 0" > $file
if [ $? -ne 0 ]
then
echo "$file write fail"
@@ -110,7 +110,7 @@ then
fi
echo "1 2
-3 4 5 6 3" > $file
+3 4 5 6 3 0 0" > $file
if [ $? -eq 0 ]
then
echo "$file multi line write success (expected fail)"
--
2.17.1
From: SeongJae Park <[email protected]>
This commit makes the default monitoring primitives for the virtual
address spaces and the physical address space support memory region
prioritization for the 'PAGEOUT' DAMOS action. It calculates the
hotness of each region as a weighted sum of the 'nr_accesses' and 'age'
of the region, and sets the priority score to the reverse of the
hotness, so that colder regions can be paged out first.
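The scoring described above can be mirrored in a small user-space sketch
(a minimal sketch using the same integer arithmetic as the patch;
`pageout_score_sketch` is an illustrative name and the value of
DAMOS_MAX_SCORE (100) is an assumption, as its definition is not shown
here):

```c
#include <assert.h>

/* Constants from the patch; DAMOS_MAX_SCORE's value (100) is an assumption. */
#define DAMON_MAX_SUBSCORE	100
#define DAMON_MAX_AGE_IN_LOG	32
#define DAMOS_MAX_SCORE		100

/*
 * Coldness score of a region: higher return value means a better
 * page-out candidate.
 */
static int pageout_score_sketch(unsigned int nr_accesses,
				unsigned int max_nr_accesses,
				unsigned int age_in_sec,
				unsigned int freq_weight,
				unsigned int age_weight)
{
	int freq_subscore, age_in_log, age_subscore, hotness;

	freq_subscore = nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses;

	/* log2-scale the age so that very old regions saturate */
	for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
	     age_in_log++, age_in_sec >>= 1)
		;
	/* a never-accessed region gets colder as it ages */
	if (!freq_subscore)
		age_in_log *= -1;
	/* map [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG] to [0, 100] */
	age_subscore = (age_in_log + DAMON_MAX_AGE_IN_LOG) *
		DAMON_MAX_SUBSCORE / DAMON_MAX_AGE_IN_LOG / 2;

	hotness = freq_weight * freq_subscore + age_weight * age_subscore;
	if (freq_weight + age_weight)
		hotness /= freq_weight + age_weight;
	hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE;

	/* return coldness */
	return DAMOS_MAX_SCORE - hotness;
}
```

With equal weights, an idle region scores colder than a frequently
accessed one, and among idle regions, older ones score colder.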
Signed-off-by: SeongJae Park <[email protected]>
---
include/linux/damon.h | 4 ++++
mm/damon/paddr.c | 14 +++++++++++++
mm/damon/prmtv-common.c | 46 +++++++++++++++++++++++++++++++++++++++++
mm/damon/prmtv-common.h | 3 +++
mm/damon/vaddr.c | 15 ++++++++++++++
5 files changed, 82 insertions(+)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 8f35bd94fc2b..565f49d8ba44 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -428,6 +428,8 @@ bool damon_va_target_valid(void *t);
void damon_va_cleanup(struct damon_ctx *ctx);
int damon_va_apply_scheme(struct damon_ctx *context, struct damon_target *t,
struct damon_region *r, struct damos *scheme);
+int damon_va_scheme_score(struct damon_ctx *context, struct damon_target *t,
+ struct damon_region *r, struct damos *scheme);
void damon_va_set_primitives(struct damon_ctx *ctx);
#endif /* CONFIG_DAMON_VADDR */
@@ -438,6 +440,8 @@ void damon_va_set_primitives(struct damon_ctx *ctx);
void damon_pa_prepare_access_checks(struct damon_ctx *ctx);
unsigned int damon_pa_check_accesses(struct damon_ctx *ctx);
bool damon_pa_target_valid(void *t);
+int damon_pa_scheme_score(struct damon_ctx *context, struct damon_target *t,
+ struct damon_region *r, struct damos *scheme);
void damon_pa_set_primitives(struct damon_ctx *ctx);
#endif /* CONFIG_DAMON_PADDR */
diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
index 303db372e53b..99a579e8d046 100644
--- a/mm/damon/paddr.c
+++ b/mm/damon/paddr.c
@@ -121,6 +121,19 @@ int damon_pa_apply_scheme(struct damon_ctx *ctx, struct damon_target *t,
return 0;
}
+int damon_pa_scheme_score(struct damon_ctx *context, struct damon_target *t,
+ struct damon_region *r, struct damos *scheme)
+{
+ switch (scheme->action) {
+ case DAMOS_PAGEOUT:
+ return damon_pageout_score(context, r, scheme);
+ default:
+ break;
+ }
+
+ return DAMOS_MAX_SCORE;
+}
+
void damon_pa_set_primitives(struct damon_ctx *ctx)
{
ctx->primitive.init = NULL;
@@ -131,4 +144,5 @@ void damon_pa_set_primitives(struct damon_ctx *ctx)
ctx->primitive.target_valid = damon_pa_target_valid;
ctx->primitive.cleanup = NULL;
ctx->primitive.apply_scheme = damon_pa_apply_scheme;
+ ctx->primitive.get_scheme_score = damon_pa_scheme_score;
}
diff --git a/mm/damon/prmtv-common.c b/mm/damon/prmtv-common.c
index 01c1c1b37859..ca637a2bf7d8 100644
--- a/mm/damon/prmtv-common.c
+++ b/mm/damon/prmtv-common.c
@@ -236,3 +236,49 @@ bool damon_pa_young(unsigned long paddr, unsigned long *page_sz)
*page_sz = result.page_sz;
return result.accessed;
}
+
+#define DAMON_MAX_SUBSCORE (100)
+#define DAMON_MAX_AGE_IN_LOG (32)
+
+int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
+ struct damos *s)
+{
+ unsigned int max_nr_accesses;
+ int freq_subscore;
+ unsigned int age_in_sec;
+ int age_in_log, age_subscore;
+ unsigned int freq_weight = s->limit.weight_nr_accesses;
+ unsigned int age_weight = s->limit.weight_age;
+ int hotness;
+
+ max_nr_accesses = c->aggr_interval / c->sample_interval;
+ freq_subscore = r->nr_accesses * DAMON_MAX_SUBSCORE / max_nr_accesses;
+
+ age_in_sec = (unsigned long)r->age * c->aggr_interval / 1000000;
+ for (age_in_log = 0; age_in_log < DAMON_MAX_AGE_IN_LOG && age_in_sec;
+ age_in_log++, age_in_sec >>= 1)
+ ;
+
+ /* If frequency is 0, higher age means it's colder */
+ if (freq_subscore == 0)
+ age_in_log *= -1;
+
+ /*
+ * Now age_in_log is in [-DAMON_MAX_AGE_IN_LOG, DAMON_MAX_AGE_IN_LOG].
+ * Scale it to be in [0, 100] and set it as age subscore.
+ */
+ age_in_log += DAMON_MAX_AGE_IN_LOG;
+ age_subscore = age_in_log * DAMON_MAX_SUBSCORE /
+ DAMON_MAX_AGE_IN_LOG / 2;
+
+ hotness = (freq_weight * freq_subscore + age_weight * age_subscore);
+ if (freq_weight + age_weight)
+ hotness /= freq_weight + age_weight;
+ /*
+ * Transform it to fit in [0, DAMOS_MAX_SCORE]
+ */
+ hotness = hotness * DAMOS_MAX_SCORE / DAMON_MAX_SUBSCORE;
+
+ /* Return coldness of the region */
+ return DAMOS_MAX_SCORE - hotness;
+}
diff --git a/mm/damon/prmtv-common.h b/mm/damon/prmtv-common.h
index ba0c4eecbb79..b27c4e94917e 100644
--- a/mm/damon/prmtv-common.h
+++ b/mm/damon/prmtv-common.h
@@ -26,3 +26,6 @@ bool damon_va_young(struct mm_struct *mm, unsigned long addr,
void damon_pa_mkold(unsigned long paddr);
bool damon_pa_young(unsigned long paddr, unsigned long *page_sz);
+
+int damon_pageout_score(struct damon_ctx *c, struct damon_region *r,
+ struct damos *s);
diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c
index de54ca70955d..cc70991076be 100644
--- a/mm/damon/vaddr.c
+++ b/mm/damon/vaddr.c
@@ -525,6 +525,20 @@ int damon_va_apply_scheme(struct damon_ctx *ctx, struct damon_target *t,
return damos_madvise(t, r, madv_action);
}
+int damon_va_scheme_score(struct damon_ctx *context, struct damon_target *t,
+ struct damon_region *r, struct damos *scheme)
+{
+
+ switch (scheme->action) {
+ case DAMOS_PAGEOUT:
+ return damon_pageout_score(context, r, scheme);
+ default:
+ break;
+ }
+
+ return DAMOS_MAX_SCORE;
+}
+
void damon_va_set_primitives(struct damon_ctx *ctx)
{
ctx->primitive.init = damon_va_init;
@@ -535,6 +549,7 @@ void damon_va_set_primitives(struct damon_ctx *ctx)
ctx->primitive.target_valid = damon_va_target_valid;
ctx->primitive.cleanup = damon_va_cleanup;
ctx->primitive.apply_scheme = damon_va_apply_scheme;
+ ctx->primitive.get_scheme_score = damon_va_scheme_score;
}
#include "vaddr-test.h"
--
2.17.1
From: SeongJae Park <[email protected]>
This commit allows DAMON debugfs interface users to set the
prioritization weights by putting three more numbers into the 'schemes'
file.
Signed-off-by: SeongJae Park <[email protected]>
---
mm/damon/dbgfs.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index ea6d4fdb57fa..b90287b1e576 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -227,11 +227,14 @@ static ssize_t sprint_schemes(struct damon_ctx *c, char *buf, ssize_t len)
damon_for_each_scheme(s, c) {
rc = scnprintf(&buf[written], len - written,
- "%lu %lu %u %u %u %u %d %lu %lu %lu %lu\n",
+ "%lu %lu %u %u %u %u %d %lu %lu %u %u %u %lu %lu\n",
s->min_sz_region, s->max_sz_region,
s->min_nr_accesses, s->max_nr_accesses,
s->min_age_region, s->max_age_region,
s->action, s->limit.sz, s->limit.ms,
+ s->limit.weight_sz,
+ s->limit.weight_nr_accesses,
+ s->limit.weight_age,
s->stat_count, s->stat_sz);
if (!rc)
return -ENOMEM;
@@ -313,11 +316,14 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
while (pos < len && *nr_schemes < max_nr_schemes) {
struct damos_speed_limit limit = {};
- ret = sscanf(&str[pos], "%lu %lu %u %u %u %u %u %lu %lu%n",
+ ret = sscanf(&str[pos],
+ "%lu %lu %u %u %u %u %u %lu %lu %u %u %u%n",
&min_sz, &max_sz, &min_nr_a, &max_nr_a,
&min_age, &max_age, &action, &limit.sz,
- &limit.ms, &parsed);
- if (ret != 9)
+ &limit.ms, &limit.weight_sz,
+ &limit.weight_nr_accesses, &limit.weight_age,
+ &parsed);
+ if (ret != 12)
break;
if (!damos_action_valid(action)) {
pr_err("wrong action %d\n", action);
@@ -1141,7 +1147,7 @@ static ssize_t dbgfs_monitor_on_write(struct file *file,
static ssize_t dbgfs_version_read(struct file *file,
char __user *buf, size_t count, loff_t *ppos)
{
- return simple_read_from_buffer(buf, count, ppos, "1\n", 2);
+ return simple_read_from_buffer(buf, count, ppos, "2\n", 2);
}
static const struct file_operations mk_contexts_fops = {
--
2.17.1
From: SeongJae Park <[email protected]>
This commit updates the DAMON selftests for the 'schemes' debugfs file,
as the file format is updated.
Signed-off-by: SeongJae Park <[email protected]>
---
tools/testing/selftests/damon/debugfs_attrs.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh
index 012b0c1fdbd3..262034d8efa5 100755
--- a/tools/testing/selftests/damon/debugfs_attrs.sh
+++ b/tools/testing/selftests/damon/debugfs_attrs.sh
@@ -101,7 +101,7 @@ echo $ORIG_CONTENT > $file
file="$DBGFS/schemes"
ORIG_CONTENT=$(cat $file)
-echo "1 2 3 4 5 6 3 0 0" > $file
+echo "1 2 3 4 5 6 3 0 0 1 2 3" > $file
if [ $? -ne 0 ]
then
echo "$file write fail"
@@ -110,7 +110,7 @@ then
fi
echo "1 2
-3 4 5 6 3 0 0" > $file
+3 4 5 6 3 0 0 1 2 3" > $file
if [ $? -eq 0 ]
then
echo "$file multi line write success (expected fail)"
--
2.17.1
From: SeongJae Park <[email protected]>
DAMON-based operation schemes need to be manually turned on and off. In
some use cases, however, the condition for turning a scheme on or off
depends on the system's situation. For example, schemes for proactive
page reclamation would need to be turned on when some memory pressure
is detected, and turned off when the system has enough free memory.
For easier control of scheme activation based on the system's
situation, this commit introduces a watermarks-based mechanism. The
client can describe the watermark metric (e.g., amount of free memory
in the system), the watermark check interval, and three watermarks,
namely high, mid, and low. While a scheme is deactivated, DAMON only
reads the metric and compares it against the three watermarks at every
check interval. If the metric is higher than the high watermark, the
scheme is deactivated. If the metric is between the mid watermark and
the low watermark, the scheme is activated. If the metric is lower
than the low watermark, the scheme is deactivated again. The last rule
allows users to fall back to traditional page-granularity mechanisms
under heavy memory pressure.
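Taken together, the rules above form a small hysteresis state machine.
A minimal user-space sketch of the decision follows; `wmarks_sketch`
and `wmark_active` are illustrative names, not the patch's API:

```c
#include <assert.h>
#include <stdbool.h>

/* Three watermarks for one scheme; expects high > mid > low. */
struct wmarks_sketch {
	unsigned long high, mid, low;
	bool activated;
};

/* Decide whether the scheme should be active for the current metric value. */
static bool wmark_active(struct wmarks_sketch *w, unsigned long metric)
{
	/* above high or below low: deactivate */
	if (metric > w->high || metric < w->low) {
		w->activated = false;
		return false;
	}
	/* between high and mid: keep the previous state (hysteresis) */
	if (metric >= w->mid)
		return w->activated;
	/* between mid and low: activate */
	w->activated = true;
	return true;
}
```

For example, with a free-memory-rate metric and watermarks of 500/400/200
per thousand, a scheme activates only once free memory drops below the
mid watermark, stays active while it hovers between mid and high, and
deactivates again below the low watermark.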
Signed-off-by: SeongJae Park <[email protected]>
---
include/linux/damon.h | 52 +++++++++++++++++++++++++-
mm/damon/core.c | 87 ++++++++++++++++++++++++++++++++++++++++++-
mm/damon/dbgfs.c | 5 ++-
3 files changed, 141 insertions(+), 3 deletions(-)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 565f49d8ba44..2edd84e98056 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -127,6 +127,45 @@ struct damos_speed_limit {
unsigned int min_score;
};
+/**
+ * enum damos_wmark_metric - Represents the watermark metric.
+ *
+ * @DAMOS_WMARK_NONE: Ignore the watermarks of the given scheme.
+ * @DAMOS_WMARK_FREE_MEM_RATE: Free memory rate of the system in [0,1000].
+ */
+enum damos_wmark_metric {
+ DAMOS_WMARK_NONE,
+ DAMOS_WMARK_FREE_MEM_RATE,
+};
+
+/**
+ * struct damos_watermarks - Controls when a given scheme should be activated.
+ * @metric: Metric for the watermarks.
+ * @interval: Watermarks check time interval in microseconds.
+ * @high: High watermark.
+ * @mid: Middle watermark.
+ * @low: Low watermark.
+ *
+ * If &metric is &DAMOS_WMARK_NONE, the scheme is always active. Being active
+ * means DAMON does the monitoring and applies the action of the scheme to
+ * appropriate memory regions. Else, DAMON checks &metric of the system at
+ * least once per &interval microseconds and works as below.
+ *
+ * If &metric is higher than &high, the scheme is inactivated. If &metric is
+ * between &mid and &low, the scheme is activated. If &metric is lower than
+ * &low, the scheme is inactivated.
+ */
+struct damos_watermarks {
+ enum damos_wmark_metric metric;
+ unsigned long interval;
+ unsigned long high;
+ unsigned long mid;
+ unsigned long low;
+
+/* private: */
+ bool activated;
+};
+
/**
* struct damos - Represents a Data Access Monitoring-based Operation Scheme.
* @min_sz_region: Minimum size of target regions.
@@ -137,6 +176,7 @@ struct damos_speed_limit {
* @max_age_region: Maximum age of target regions.
* @action: &damo_action to be applied to the target regions.
* @limit: Control the aggressiveness of this scheme.
+ * @wmarks: Watermarks for automated (in)activation of this scheme.
* @stat_count: Total number of regions that this scheme is applied.
* @stat_sz: Total size of regions that this scheme is applied.
* @list: List head for siblings.
@@ -146,6 +186,14 @@ struct damos_speed_limit {
* &max_nr_accesses, &min_age_region, &max_age_region) and applies &action to
* those. To avoid consuming too much CPU for the &action, &limit is used.
*
+ * To do the work only when needed, schemes can be activated for specific
+ * system situations using &wmarks. If all schemes that are registered to the
+ * monitoring context are inactive, DAMON stops monitoring as well, and just
+ * repeatedly checks the watermarks.
+ *
* After applying the &action to each region, &stat_count and &stat_sz is
* updated to reflect the number of regions and total size of regions that the
* &action is applied.
@@ -159,6 +207,7 @@ struct damos {
unsigned int max_age_region;
enum damos_action action;
struct damos_speed_limit limit;
+ struct damos_watermarks wmarks;
unsigned long stat_count;
unsigned long stat_sz;
struct list_head list;
@@ -391,7 +440,8 @@ struct damos *damon_new_scheme(
unsigned long min_sz_region, unsigned long max_sz_region,
unsigned int min_nr_accesses, unsigned int max_nr_accesses,
unsigned int min_age_region, unsigned int max_age_region,
- enum damos_action action, struct damos_speed_limit *limit);
+ enum damos_action action, struct damos_speed_limit *limit,
+ struct damos_watermarks *wmarks);
void damon_add_scheme(struct damon_ctx *ctx, struct damos *s);
void damon_destroy_scheme(struct damos *s);
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 15bcd05670d1..1c5b581700ef 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -10,6 +10,7 @@
#include <linux/damon.h>
#include <linux/delay.h>
#include <linux/kthread.h>
+#include <linux/mm.h>
#include <linux/random.h>
#include <linux/slab.h>
#include <linux/string.h>
@@ -86,7 +87,8 @@ struct damos *damon_new_scheme(
unsigned long min_sz_region, unsigned long max_sz_region,
unsigned int min_nr_accesses, unsigned int max_nr_accesses,
unsigned int min_age_region, unsigned int max_age_region,
- enum damos_action action, struct damos_speed_limit *limit)
+ enum damos_action action, struct damos_speed_limit *limit,
+ struct damos_watermarks *wmarks)
{
struct damos *scheme;
@@ -113,6 +115,14 @@ struct damos *damon_new_scheme(
scheme->limit.charged_from = 0;
scheme->limit.charge_target_from = NULL;
scheme->limit.charge_addr_from = 0;
+
+ scheme->wmarks.metric = wmarks->metric;
+ scheme->wmarks.interval = wmarks->interval;
+ scheme->wmarks.high = wmarks->high;
+ scheme->wmarks.mid = wmarks->mid;
+ scheme->wmarks.low = wmarks->low;
+ scheme->wmarks.activated = true;
+
return scheme;
}
@@ -581,6 +591,9 @@ static void damon_do_apply_schemes(struct damon_ctx *c,
damon_for_each_scheme(s, c) {
struct damos_speed_limit *limit = &s->limit;
+ if (!s->wmarks.activated)
+ continue;
+
/* Check the limit */
if (limit->sz && limit->charged_sz >= limit->sz)
continue;
@@ -637,6 +650,9 @@ static void kdamond_apply_schemes(struct damon_ctx *c)
unsigned long cumulated_sz;
unsigned int score, max_score = 0;
+ if (!s->wmarks.activated)
+ continue;
+
if (!limit->sz)
continue;
@@ -876,6 +892,68 @@ static bool kdamond_need_stop(struct damon_ctx *ctx)
return true;
}
+static unsigned long damos_wmark_metric_value(enum damos_wmark_metric metric)
+{
+ struct sysinfo i;
+
+ switch (metric) {
+ case DAMOS_WMARK_FREE_MEM_RATE:
+ si_meminfo(&i);
+ return i.freeram * 1000 / i.totalram;
+ default:
+ break;
+ }
+ return -EINVAL;
+}
+
+/*
+ * Returns zero if the scheme is active. Else, returns time to wait for next
+ * watermark check in micro-seconds.
+ */
+static unsigned long damos_wmark_wait_us(struct damos *scheme)
+{
+ unsigned long metric;
+
+ if (scheme->wmarks.metric == DAMOS_WMARK_NONE)
+ return 0;
+
+ metric = damos_wmark_metric_value(scheme->wmarks.metric);
+ /* higher than high watermark or lower than low watermark */
+ if (metric > scheme->wmarks.high || scheme->wmarks.low > metric) {
+ if (scheme->wmarks.activated)
+ pr_info("inactivate a scheme (%d) for %s wmark\n",
+ scheme->action,
+ metric > scheme->wmarks.high ?
+ "high" : "low");
+ scheme->wmarks.activated = false;
+ return scheme->wmarks.interval;
+ }
+
+ /* inactive and higher than middle watermark */
+ if ((scheme->wmarks.high >= metric && metric >= scheme->wmarks.mid) &&
+ !scheme->wmarks.activated)
+ return scheme->wmarks.interval;
+
+ if (!scheme->wmarks.activated)
+ pr_info("activate a scheme (%d)\n", scheme->action);
+ scheme->wmarks.activated = true;
+ return 0;
+}
+
+static unsigned long kdamond_wmark_wait_us(struct damon_ctx *ctx)
+{
+ struct damos *s;
+ unsigned long wait_time;
+ unsigned long min_wait_time = 0;
+
+ damon_for_each_scheme(s, ctx) {
+ wait_time = damos_wmark_wait_us(s);
+ if (!min_wait_time || wait_time < min_wait_time)
+ min_wait_time = wait_time;
+ }
+ return min_wait_time;
+}
+
static void set_kdamond_stop(struct damon_ctx *ctx)
{
mutex_lock(&ctx->kdamond_lock);
@@ -904,6 +982,13 @@ static int kdamond_fn(void *data)
sz_limit = damon_region_sz_limit(ctx);
while (!kdamond_need_stop(ctx)) {
+ unsigned long wmark_wait_us = kdamond_wmark_wait_us(ctx);
+
+ if (wmark_wait_us) {
+ usleep_range(wmark_wait_us, wmark_wait_us + 1);
+ continue;
+ }
+
if (ctx->primitive.prepare_access_checks)
ctx->primitive.prepare_access_checks(ctx);
if (ctx->callback.after_sampling &&
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index b90287b1e576..1680fb1be8e1 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -315,6 +315,9 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
*nr_schemes = 0;
while (pos < len && *nr_schemes < max_nr_schemes) {
struct damos_speed_limit limit = {};
+ struct damos_watermarks wmarks = {
+ .metric = DAMOS_WMARK_NONE,
+ };
ret = sscanf(&str[pos],
"%lu %lu %u %u %u %u %u %lu %lu %u %u %u%n",
@@ -332,7 +335,7 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
pos += parsed;
scheme = damon_new_scheme(min_sz, max_sz, min_nr_a, max_nr_a,
- min_age, max_age, action, &limit);
+ min_age, max_age, action, &limit, &wmarks);
if (!scheme)
goto fail;
--
2.17.1
From: SeongJae Park <[email protected]>
This commit updates the DAMON selftests for the 'schemes' debugfs file
to reflect the changes in the format.
Signed-off-by: SeongJae Park <[email protected]>
---
tools/testing/selftests/damon/debugfs_attrs.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh
index 262034d8efa5..90440cb3aee8 100755
--- a/tools/testing/selftests/damon/debugfs_attrs.sh
+++ b/tools/testing/selftests/damon/debugfs_attrs.sh
@@ -101,7 +101,7 @@ echo $ORIG_CONTENT > $file
file="$DBGFS/schemes"
ORIG_CONTENT=$(cat $file)
-echo "1 2 3 4 5 6 3 0 0 1 2 3" > $file
+echo "1 2 3 4 5 6 3 0 0 1 2 3 1 100 3 2 1" > $file
if [ $? -ne 0 ]
then
echo "$file write fail"
@@ -110,7 +110,7 @@ then
fi
echo "1 2
-3 4 5 6 3 0 0 1 2 3" > $file
+3 4 5 6 3 0 0 1 2 3 1 100 3 2 1" > $file
if [ $? -eq 0 ]
then
echo "$file multi line write success (expected fail)"
--
2.17.1
From: SeongJae Park <[email protected]>
This commit updates the DAMON debugfs interface to support the
watermarks-based scheme activation. For this, the 'schemes' file now
receives five more values.
Signed-off-by: SeongJae Park <[email protected]>
---
mm/damon/dbgfs.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 1680fb1be8e1..768ef3eb9550 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -227,7 +227,7 @@ static ssize_t sprint_schemes(struct damon_ctx *c, char *buf, ssize_t len)
damon_for_each_scheme(s, c) {
rc = scnprintf(&buf[written], len - written,
- "%lu %lu %u %u %u %u %d %lu %lu %u %u %u %lu %lu\n",
+ "%lu %lu %u %u %u %u %d %lu %lu %u %u %u %d %lu %lu %lu %lu %lu %lu\n",
s->min_sz_region, s->max_sz_region,
s->min_nr_accesses, s->max_nr_accesses,
s->min_age_region, s->max_age_region,
@@ -235,6 +235,8 @@ static ssize_t sprint_schemes(struct damon_ctx *c, char *buf, ssize_t len)
s->limit.weight_sz,
s->limit.weight_nr_accesses,
s->limit.weight_age,
+ s->wmarks.metric, s->wmarks.interval,
+ s->wmarks.high, s->wmarks.mid, s->wmarks.low,
s->stat_count, s->stat_sz);
if (!rc)
return -ENOMEM;
@@ -315,18 +317,18 @@ static struct damos **str_to_schemes(const char *str, ssize_t len,
*nr_schemes = 0;
while (pos < len && *nr_schemes < max_nr_schemes) {
struct damos_speed_limit limit = {};
- struct damos_watermarks wmarks = {
- .metric = DAMOS_WMARK_NONE,
- };
+ struct damos_watermarks wmarks;
ret = sscanf(&str[pos],
- "%lu %lu %u %u %u %u %u %lu %lu %u %u %u%n",
+ "%lu %lu %u %u %u %u %u %lu %lu %u %u %u %u %lu %lu %lu %lu%n",
&min_sz, &max_sz, &min_nr_a, &max_nr_a,
&min_age, &max_age, &action, &limit.sz,
&limit.ms, &limit.weight_sz,
&limit.weight_nr_accesses, &limit.weight_age,
+ &wmarks.metric, &wmarks.interval,
+ &wmarks.high, &wmarks.mid, &wmarks.low,
&parsed);
- if (ret != 12)
+ if (ret != 17)
break;
if (!damos_action_valid(action)) {
pr_err("wrong action %d\n", action);
--
2.17.1
From: SeongJae Park <[email protected]>
This commit implements a new kernel subsystem that finds cold memory
regions using DAMON and reclaims those immediately. It is intended to
be used as a proactive and lightweight reclamation logic under light
memory pressure. Under heavy memory pressure, it could be deactivated
so that the system falls back to the traditional page scanning-based
reclamation.
It's implemented on top of DAMON framework to use the DAMON-based
Operation Schemes (DAMOS) feature. It utilizes all the DAMOS features
including speed limit, prioritization, and watermarks.
It could be enabled and tuned at build time via the kernel
configuration, at boot time via the kernel boot parameter, and at run
time via its module parameters ('/sys/module/damon_reclaim/parameters/')
interface.
Signed-off-by: SeongJae Park <[email protected]>
---
mm/damon/Kconfig | 128 +++++++++++++++++++++++++
mm/damon/Makefile | 1 +
mm/damon/reclaim.c | 230 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 359 insertions(+)
create mode 100644 mm/damon/reclaim.c
diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index eeefb5b633b6..f629818e4793 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -84,4 +84,132 @@ config DAMON_DBGFS_KUNIT_TEST
If unsure, say N.
+config DAMON_RECLAIM
+ bool "Build DAMON-based reclaim (DAMON_RECLAIM)"
+ depends on DAMON_PADDR
+ help
+ This builds the DAMON-based reclamation subsystem. It finds pages
+ that have not been accessed for a long time (cold) using DAMON and
+ reclaims those, if enabled.
+
+ This is suggested to be used as a proactive and lightweight
+ reclamation under light memory pressure, while the traditional page
+ scanning-based reclamation is used for heavy pressure.
+
+config DAMON_RECLAIM_ENABLE
+ bool "Enable DAMON-based reclaim"
+ depends on DAMON_RECLAIM
+ help
+ Make DAMON_RECLAIM start working after booting. If this is not set,
+ users can still enable it at runtime using its module parameters
+ interface.
+
+config DAMON_RECLAIM_MIN_AGE
+ int "Minimal age of a region, in microseconds, to be considered cold"
+ depends on DAMON_RECLAIM
+ default 5000000
+ help
+ If a memory region has not been accessed for this time or longer,
+ DAMON_RECLAIM identifies the region as cold and reclaims it.
+ 5 seconds by default.
+
+config DAMON_RECLAIM_LIMIT_SZ
+ int "Maximum bytes of memory to be reclaimed in each charging window"
+ depends on DAMON_RECLAIM
+ default 1073741824
+ help
+ DAMON_RECLAIM charges the amount of memory reclaimed within each
+ charging time window and ensures no more than this limit is charged.
+ This could be useful for limiting the CPU usage of DAMON_RECLAIM.
+ 1 GiB by default.
+
+config DAMON_RECLAIM_LIMIT_MS
+ int "The reclaimed memory charging window in milliseconds"
+ depends on DAMON_RECLAIM
+ default 1000
+ help
+ The charge window for DAMON_RECLAIM_LIMIT_SZ.
+ 1 second by default.
+
+config DAMON_RECLAIM_WATERMARK_CHECK_INTERVAL
+ int "DAMON_RECLAIM watermarks check time interval in microseconds"
+ depends on DAMON_RECLAIM
+ default 5000000
+ help
+ Minimal time to wait before checking the watermarks, when
+ DAMON_RECLAIM is enabled but inactive due to its watermarks rule.
+ 5 seconds by default.
+
+config DAMON_RECLAIM_WATERMARK_HIGH
+ int "Free memory rate (per thousand) for the high watermark"
+ range 0 1000
+ depends on DAMON_RECLAIM
+ default 500
+ help
+ If the free memory rate of the system (per thousand) is higher than
+ this, DAMON_RECLAIM becomes inactive, so it does nothing but
+ periodically check the watermarks. 500 by default.
+
+config DAMON_RECLAIM_WATERMARK_MID
+ int "Free memory rate (per thousand) for the middle watermark"
+ range 0 1000
+ depends on DAMON_RECLAIM
+ default 400
+ help
+ If the free memory rate of the system (per thousand) is between this
+ and the low watermark, DAMON_RECLAIM becomes active, so it starts
+ the work.
+ 400 by default.
+
+config DAMON_RECLAIM_WATERMARK_LOW
+ int "Free memory rate (per thousand) for the low watermark"
+ range 0 1000
+ depends on DAMON_RECLAIM
+ default 200
+ help
+ If the free memory rate of the system (per thousand) is lower than
+ this, DAMON_RECLAIM becomes inactive, so it does nothing but
+ periodically check the watermarks. In this case, the system falls
+ back to the traditional page scanning-based reclamation logic.
+ 200 by default.
+
+config DAMON_RECLAIM_SAMPLING_INTERVAL
+ int "Sampling interval for the monitoring in microseconds"
+ depends on DAMON_RECLAIM
+ default 5000
+ help
+ The sampling interval of DAMON for DAMON_RECLAIM. Please refer
+ to the DAMON documentation for more detail.
+ 5 ms by default.
+
+config DAMON_RECLAIM_AGGREGATION_INTERVAL
+ int "Aggregation interval for the monitoring in microseconds"
+ depends on DAMON_RECLAIM
+ default 100000
+ help
+ The aggregation interval of DAMON for DAMON_RECLAIM. Please
+ refer to the DAMON documentation for more detail.
+ 100 ms by default.
+
+config DAMON_RECLAIM_MIN_NR_REGIONS
+ int "Minimum number of monitoring regions"
+ depends on DAMON_RECLAIM
+ default 10
+ help
+ The minimal number of monitoring regions for DAMON_RECLAIM. Can be
+ used to set a lower bound on the monitoring quality. But setting
+ this too high could result in increased monitoring overhead. Please
+ refer to the DAMON documentation for more detail.
+ 10 by default.
+
+config DAMON_RECLAIM_MAX_NR_REGIONS
+ int "Maximum number of monitoring regions"
+ depends on DAMON_RECLAIM
+ default 1000
+ help
+ The maximum number of monitoring regions for DAMON-based reclaim.
+ Can be used to set an upper bound on the monitoring overhead.
+ However, setting this too low could result in bad monitoring quality.
+ Please refer to the DAMON documentation for more detail.
+ 1000 by default.
+
endmenu
diff --git a/mm/damon/Makefile b/mm/damon/Makefile
index 017799e5670a..39433e7d570c 100644
--- a/mm/damon/Makefile
+++ b/mm/damon/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_DAMON_VADDR) += prmtv-common.o vaddr.o
obj-$(CONFIG_DAMON_PADDR) += prmtv-common.o paddr.o
obj-$(CONFIG_DAMON_PGIDLE) += prmtv-common.o pgidle.o
obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o
+obj-$(CONFIG_DAMON_RECLAIM) += reclaim.o
diff --git a/mm/damon/reclaim.c b/mm/damon/reclaim.c
new file mode 100644
index 000000000000..a95131038377
--- /dev/null
+++ b/mm/damon/reclaim.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DAMON-based page reclamation
+ *
+ * Author: SeongJae Park <[email protected]>
+ */
+
+#define pr_fmt(fmt) "damon-reclaim: " fmt
+
+#include <linux/damon.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/workqueue.h>
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "damon_reclaim."
+
+#ifndef CONFIG_DAMON_RECLAIM_ENABLE
+static bool enabled __read_mostly;
+#else
+static bool enabled __read_mostly = CONFIG_DAMON_RECLAIM_ENABLE;
+#endif
+module_param(enabled, bool, 0600);
+
+static unsigned long min_age __read_mostly = CONFIG_DAMON_RECLAIM_MIN_AGE;
+module_param(min_age, ulong, 0600);
+
+static unsigned long limit_sz __read_mostly = CONFIG_DAMON_RECLAIM_LIMIT_SZ;
+module_param(limit_sz, ulong, 0600);
+
+static unsigned long limit_ms __read_mostly = CONFIG_DAMON_RECLAIM_LIMIT_MS;
+module_param(limit_ms, ulong, 0600);
+
+static unsigned long wmarks_interval __read_mostly =
+ CONFIG_DAMON_RECLAIM_WATERMARK_CHECK_INTERVAL;
+module_param(wmarks_interval, ulong, 0600);
+
+static unsigned long wmarks_high __read_mostly =
+ CONFIG_DAMON_RECLAIM_WATERMARK_HIGH;
+module_param(wmarks_high, ulong, 0600);
+
+static unsigned long wmarks_mid __read_mostly =
+ CONFIG_DAMON_RECLAIM_WATERMARK_MID;
+module_param(wmarks_mid, ulong, 0600);
+
+static unsigned long wmarks_low __read_mostly =
+ CONFIG_DAMON_RECLAIM_WATERMARK_LOW;
+module_param(wmarks_low, ulong, 0600);
+
+static unsigned long sample_interval __read_mostly =
+ CONFIG_DAMON_RECLAIM_SAMPLING_INTERVAL;
+module_param(sample_interval, ulong, 0600);
+
+static unsigned long aggr_interval __read_mostly =
+ CONFIG_DAMON_RECLAIM_AGGREGATION_INTERVAL;
+module_param(aggr_interval, ulong, 0600);
+
+static unsigned long min_nr_regions __read_mostly =
+ CONFIG_DAMON_RECLAIM_MIN_NR_REGIONS;
+module_param(min_nr_regions, ulong, 0600);
+
+static unsigned long max_nr_regions __read_mostly =
+ CONFIG_DAMON_RECLAIM_MAX_NR_REGIONS;
+module_param(max_nr_regions, ulong, 0600);
+
+static unsigned long monitor_region_start;
+module_param(monitor_region_start, ulong, 0400);
+
+static unsigned long monitor_region_end;
+module_param(monitor_region_end, ulong, 0400);
+
+static struct damon_ctx *ctx;
+static struct damon_target *target;
+
+struct damon_reclaim_ram_walk_arg {
+ unsigned long start;
+ unsigned long end;
+};
+
+static int walk_system_ram(struct resource *res, void *arg)
+{
+ struct damon_reclaim_ram_walk_arg *a = arg;
+
+ if (a->end - a->start < res->end - res->start) {
+ a->start = res->start;
+ a->end = res->end;
+ }
+ return 0;
+}
+
+/*
+ * Find biggest 'System RAM' resource and store its start and end address in
+ * @start and @end, respectively. If no System RAM is found, returns false.
+ */
+static bool get_monitoring_region(unsigned long *start, unsigned long *end)
+{
+ struct damon_reclaim_ram_walk_arg arg = {};
+
+ walk_system_ram_res(0, ULONG_MAX, &arg, walk_system_ram);
+
+ if (arg.end > arg.start) {
+ *start = arg.start;
+ *end = arg.end;
+ return true;
+ }
+
+ return false;
+}
+
+static struct damos *damon_reclaim_new_scheme(void)
+{
+ struct damos_watermarks wmarks = {
+ .metric = DAMOS_WMARK_FREE_MEM_RATE,
+ .interval = wmarks_interval,
+ .high = wmarks_high,
+ .mid = wmarks_mid,
+ .low = wmarks_low,
+ };
+ struct damos_speed_limit limit = {
+ .sz = limit_sz,
+ .ms = limit_ms,
+ /* Within the limit, page out older regions first. */
+ .weight_sz = 0,
+ .weight_nr_accesses = 0,
+ .weight_age = 1
+ };
+ struct damos *scheme = damon_new_scheme(
+ /* Find regions having PAGE_SIZE or larger size */
+ PAGE_SIZE, ULONG_MAX,
+ /* and not accessed at all */
+ 0, 0,
+	/* for min_age or more microseconds, and */
+ min_age / aggr_interval, UINT_MAX,
+ /* page out those, as soon as found */
+ DAMOS_PAGEOUT,
+ &limit,
+ /* Activate this based on the watermarks. */
+ &wmarks);
+
+ return scheme;
+}
+
+static int damon_reclaim_turn(bool on)
+{
+ struct damon_region *region;
+ struct damos *scheme;
+ int err;
+
+ if (!on)
+ return damon_stop(&ctx, 1);
+
+ err = damon_set_attrs(ctx, READ_ONCE(sample_interval),
+ READ_ONCE(aggr_interval),
+ READ_ONCE(aggr_interval) * 100,
+ min_nr_regions, max_nr_regions);
+ if (err)
+ return err;
+
+ if (!get_monitoring_region(&monitor_region_start, &monitor_region_end))
+ return -EINVAL;
+	/* DAMON will free this on its own when it finishes monitoring */
+ region = damon_new_region(monitor_region_start, monitor_region_end);
+ if (!region)
+ return -ENOMEM;
+ damon_add_region(region, target);
+
+ /* Will be freed by later 'damon_set_schemes()' */
+ scheme = damon_reclaim_new_scheme();
+	if (!scheme) {
+		err = -ENOMEM;
+		goto free_region_out;
+	}
+ err = damon_set_schemes(ctx, &scheme, 1);
+ if (err)
+ goto free_scheme_out;
+
+ err = damon_start(&ctx, 1);
+ if (err)
+ goto free_scheme_out;
+ goto out;
+
+free_scheme_out:
+ damon_destroy_scheme(scheme);
+free_region_out:
+ damon_destroy_region(region);
+out:
+ return err;
+}
+
+#define ENABLE_CHECK_INTERVAL_MS 1000
+static struct delayed_work damon_reclaim_timer;
+static void damon_reclaim_timer_fn(struct work_struct *work)
+{
+ static bool last_enabled;
+ bool now_enabled;
+
+ now_enabled = enabled;
+ if (last_enabled != now_enabled) {
+ if (!damon_reclaim_turn(now_enabled))
+ last_enabled = now_enabled;
+ else
+ enabled = last_enabled;
+ }
+
+ schedule_delayed_work(&damon_reclaim_timer,
+ msecs_to_jiffies(ENABLE_CHECK_INTERVAL_MS));
+}
+static DECLARE_DELAYED_WORK(damon_reclaim_timer, damon_reclaim_timer_fn);
+
+static int __init damon_reclaim_init(void)
+{
+ ctx = damon_new_ctx(DAMON_ADAPTIVE_TARGET);
+ if (!ctx)
+ return -ENOMEM;
+
+ damon_pa_set_primitives(ctx);
+
+	/* 4242 means nothing but a placeholder */
+	target = damon_new_target(4242);
+ if (!target) {
+ damon_destroy_ctx(ctx);
+ return -ENOMEM;
+ }
+ damon_add_target(ctx, target);
+
+ schedule_delayed_work(&damon_reclaim_timer, 0);
+ return 0;
+}
+
+module_init(damon_reclaim_init);
--
2.17.1
From: SeongJae Park <[email protected]>
On Mon, 31 May 2021 13:38:13 +0000 [email protected] wrote:
> From: SeongJae Park <[email protected]>
>
> DAMON-based operation schemes need to be manually turned on and off. In
> some use cases, however, the condition for turning a scheme on and off
> would depend on the system's situation. For example, schemes for
> proactive pages reclamation would need to be turned on when some memory
> pressure is detected, and turned off when the system has enough free
> memory.
>
> For easier control of schemes activation based on the system situation,
> this commit introduces a watermarks-based mechanism. The client can
> describe the watermark metric (e.g., amount of free memory in the
> system), watermark check interval, and three watermarks, namely high,
> mid, and low.  While a scheme is deactivated, DAMON only fetches the
> metric and compares it to the three watermarks at every check
> interval.  If the metric is higher than the high watermark, the scheme
> is deactivated.  If the metric is between the mid watermark and the
> low watermark, the scheme is activated.  If the metric is lower than
> the low watermark, the scheme is deactivated again.  This is to let
> users fall back to traditional page-granularity mechanisms.
>
> Signed-off-by: SeongJae Park <[email protected]>
> ---
> include/linux/damon.h | 52 +++++++++++++++++++++++++-
> mm/damon/core.c | 87 ++++++++++++++++++++++++++++++++++++++++++-
> mm/damon/dbgfs.c | 5 ++-
> 3 files changed, 141 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 565f49d8ba44..2edd84e98056 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -127,6 +127,45 @@ struct damos_speed_limit {
> unsigned int min_score;
> };
[...]
> static void set_kdamond_stop(struct damon_ctx *ctx)
> {
> mutex_lock(&ctx->kdamond_lock);
> @@ -904,6 +982,13 @@ static int kdamond_fn(void *data)
> sz_limit = damon_region_sz_limit(ctx);
>
> while (!kdamond_need_stop(ctx)) {
> + unsigned long wmark_wait_us = kdamond_wmark_wait_us(ctx);
> +
> + if (wmark_wait_us) {
> + usleep_range(wmark_wait_us, wmark_wait_us + 1);
> + continue;
> + }
James Gowans ([email protected]) found that this makes kdamond sleep in
TASK_UNINTERRUPTIBLE state.  So, when DAMON is deactivated due to the
watermarks rule, the sysadmin assumes it is doing nothing, and DAMON really
does nothing.  But, because it is sleeping in TASK_UNINTERRUPTIBLE state,
which is usually interpreted as waiting for I/O, '/proc/loadavg'-like
monitors will report I/O load, and the sysadmin gets confused.
In the next version of this RFC patchset, I will make this use
'schedule_timeout_interruptible()' instead when 'wmark_wait_us' is larger
than 100ms.  I will keep using 'usleep_range()' for shorter sleep times, to
keep the precision high.
Thanks,
SeongJae Park
[...]