2020-08-04 14:28:16

Subject: [RFC v14 0/7] Implement Data Access Monitoring-based Memory Operation Schemes

From: SeongJae Park <[email protected]>

Changes from Previous Version
=============================

- Drop loadable module support
- Use dedicated valid action checker function
- Rebase on v5.8 plus v19 DAMON

Introduction
============

DAMON[1] can be used as a primitive for data access-aware memory management
optimizations. For that, users who want such optimizations should run DAMON,
read the monitoring results, analyze them, plan a new memory management scheme,
and apply the new scheme by themselves. Such efforts will be inevitable for
some complicated optimizations.

However, in many other cases, the users would simply want the system to apply a
memory management action to a memory region of a specific size having a
specific access frequency for a specific time. For example, "page out a memory
region larger than 100 MiB that is only rarely accessed for more than 2
minutes", or "do not use THP for a memory region larger than 2 MiB that is
rarely accessed for more than 1 second".

This RFC patchset makes DAMON handle such data access monitoring-based
operation schemes. With this change, users can do the data access-aware
optimizations by simply specifying their schemes to DAMON.
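
As a rough illustration of how the first example above could map onto the
low-level input described in the later patches, the sketch below builds one
seven-field line (min-size, max-size, min-acc, max-acc, min-age, max-age,
action). The helper name is hypothetical and not part of this patchset. It
assumes the 5 ms sampling and 100 ms aggregation interval defaults of
tools/damon/_convert_damos.py, a 64-bit unsigned long, "rarely accessed"
meaning at most 5% access frequency, and action number 2 for page out as
listed in the usage document.

```python
# Illustrative sketch only; names and the 5% threshold are assumptions.
SAMPLE_INTERVAL_US = 5000   # default in tools/damon/_convert_damos.py
AGGR_INTERVAL_US = 100000   # default in tools/damon/_convert_damos.py
ULONG_MAX = 2**64 - 1       # assumes a 64-bit kernel
UINT_MAX = 2**32 - 1
DAMOS_PAGEOUT = 2           # action number from the usage document


def pageout_scheme_line(min_size_bytes, max_freq_percent, min_age_secs):
    """Build a '<debugfs>/damon/schemes' style line for a page-out scheme."""
    # Upper bound of monitored accesses per aggregation interval.
    limit_nr_accesses = AGGR_INTERVAL_US // SAMPLE_INTERVAL_US
    # Access frequency in percent -> number of accesses per aggregation.
    max_nr_accesses = max_freq_percent * limit_nr_accesses // 100
    # Age in seconds -> number of aggregation intervals.
    min_age = min_age_secs * 1000000 // AGGR_INTERVAL_US
    fields = [min_size_bytes, ULONG_MAX, 0, max_nr_accesses,
              min_age, UINT_MAX, DAMOS_PAGEOUT]
    return ' '.join(str(f) for f in fields)


# "page out a region larger than 100 MiB rarely accessed for 2 minutes"
line = pageout_scheme_line(100 * 1024 * 1024, 5, 120)
print(line)
```

With the assumed intervals, two minutes of age corresponds to 1200
aggregation intervals, which is why tuning the monitoring attributes also
changes what a given low-level scheme means.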

[1] https://lore.kernel.org/linux-mm/[email protected]/

Evaluations
===========

We evaluated DAMON's overhead, monitoring quality and usefulness using 25
realistic workloads on my QEMU/KVM based virtual machine running a kernel with
v12 of this patchset applied.

An experimental DAMON-based operation scheme for THP, ‘ethp’, removes 31.29% of
THP memory overheads while preserving 60.64% of THP speedup. Another
experimental DAMON-based ‘proactive reclamation’ implementation, ‘prcl’,
reduces resident sets by 87.95% and system memory footprint by 29.52% while
incurring only 2.15% runtime overhead in the best case (parsec3/freqmine).

NOTE that the experimental THP optimization and proactive reclamation are not
for production; they are only proofs of concept.

Please refer to the official document[1] or "Documentation/admin-guide/mm: Add
a document for DAMON" patch in the latest DAMON patchset for detailed
evaluation setup and results.

[1] https://damonitor.github.io/doc/html/latest-damos

More Information
================

We prepared a showcase web site[1] where you can get more information. It
provides

- the official documentation[2],
- the heatmap format dynamic access pattern of various realistic workloads for
heap area[3], mmap()-ed area[4], and stack[5] area,
- the dynamic working set size distribution[6] and chronological working set
size changes[7], and
- the latest performance test results[8].

[1] https://damonitor.github.io/_index
[2] https://damonitor.github.io/doc/html/latest-damos
[3] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.0.html
[4] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.html
[5] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.2.html
[6] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.html
[7] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.html
[8] https://damonitor.github.io/test/result/perf/latest/html/index.html

Baseline and Complete Git Tree
==============================

The patches are based on the v5.8 plus v19 DAMON patchset[1] and Minchan's
``do_madvise()`` patch[2], which was retrieved from the -next tree. You can
also clone the complete git tree:

$ git clone git://github.com/sjp38/linux -b damos/rfc/v14

The tree is also available on the web:
https://github.com/sjp38/linux/releases/tag/damos/rfc/v14

There are also a couple of trees for the entire DAMON patchset series, which
include future features. The first one[3] contains the changes for the latest
release, while the other one[4] contains the changes for the next release.

[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://github.com/sjp38/linux/tree/damon/master
[4] https://github.com/sjp38/linux/tree/damon/next

Sequence Of Patches
===================

The 1st patch accounts the age of each region. The 2nd patch implements the
handling of the schemes in DAMON and exports a kernel space programming
interface for it. The 3rd patch implements a debugfs interface for
privileged users and user space programs. The 4th patch implements a schemes
statistics feature for easier tuning of the schemes and runtime access pattern
analysis. The 5th patch adds selftests for these changes, and the 6th patch
adds human friendly schemes support to the user space tool for DAMON. Finally,
the 7th patch documents this new feature.
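
To illustrate how the statistics of the 4th patch might be consumed from user
space: per the usage document in the 7th patch, each line read from the
``schemes`` debugfs file carries the seven scheme fields followed by the total
number and the total size of the regions the scheme was applied to. The sketch
below parses one such line. The function and type names are hypothetical, and
the action names follow the human readable names used by the user space tool.

```python
from collections import namedtuple

# Field order follows the usage document; the type name is illustrative.
SchemeStat = namedtuple('SchemeStat', [
    'min_sz', 'max_sz', 'min_acc', 'max_acc', 'min_age', 'max_age',
    'action', 'nr_applied', 'sz_applied'])

# Action numbers as listed in the usage document.
ACTION_NAMES = {0: 'willneed', 1: 'cold', 2: 'pageout', 3: 'hugepage',
                4: 'nohugepage', 5: 'stat'}


def parse_scheme_line(line):
    """Parse one line read from the schemes debugfs file."""
    fields = [int(x) for x in line.split()]
    if len(fields) != 9:
        raise ValueError('expected 9 fields, got %d' % len(fields))
    return SchemeStat(*fields)


# The example line shown in the usage document of the 7th patch.
stat = parse_scheme_line('4096 8192 0 5 10 20 2 0 0')
print('%s applied to %d regions (%d bytes)' % (
    ACTION_NAMES[stat.action], stat.nr_applied, stat.sz_applied))
```

A tuning script could poll the file and compare ``nr_applied`` and
``sz_applied`` between reads to see whether a scheme matches any region at
all.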

Patch History
=============

Changes from RFC v13
(https://lore.kernel.org/linux-mm/[email protected]/)
- Drop loadable module support
- Use dedicated valid action checker function
- Rebase on v5.8 plus v19 DAMON

Changes from RFC v12
(https://lore.kernel.org/linux-mm/[email protected]/)
- Wordsmith the document, comment, commit messages
- Support a scheme of max access count 0
- Use 'unsigned long' for (min|max)_sz_region

Changes from RFC v11
(https://lore.kernel.org/linux-mm/[email protected]/)
- Refine the commit messages (David Hildenbrand)
- Clean up debugfs code

Changes from RFC v10
(https://lore.kernel.org/linux-mm/[email protected]/)
- Fix the wrong error handling for schemes debugfs file
- Handle the schemes stats from the user space tool
- Remove the schemes implementation plan from the document

Changes from RFC v9
(https://lore.kernel.org/linux-mm/[email protected]/)
- Rebase on v5.7
- Fix wrong comments and documents for schemes apply conditions

Changes from RFC v8
(https://lore.kernel.org/linux-mm/[email protected]/)
- Rewrite the document (Stefan Nuernberger)
- Make 'damon_for_each_*' argument order consistent (Leonard Foerster)
- Implement statistics for schemes
- Avoid races between debugfs readers and writers
- Reset age for only significant access frequency changes
- Add kernel-doc comments in damon.h

Please refer to RFC v8 for the previous history.

SeongJae Park (7):
mm/damon: Account age of target regions
mm/damon: Implement data access monitoring-based operation schemes
mm/damon/schemes: Implement a debugfs interface
mm/damon/schemes: Implement statistics feature
mm/damon/selftests: Add 'schemes' debugfs tests
damon/tools: Support more human friendly 'schemes' control
Docs/admin-guide/mm/damon: Document DAMON-based Operation Schemes

Documentation/admin-guide/mm/damon/guide.rst | 41 +-
Documentation/admin-guide/mm/damon/start.rst | 11 +
Documentation/admin-guide/mm/damon/usage.rst | 109 +++++-
Documentation/vm/damon/index.rst | 1 -
include/linux/damon.h | 64 ++++
mm/damon.c | 360 +++++++++++++++++-
tools/damon/_convert_damos.py | 141 +++++++
tools/damon/_damon.py | 28 +-
tools/damon/damo | 7 +
tools/damon/schemes.py | 110 ++++++
.../testing/selftests/damon/debugfs_attrs.sh | 29 ++
11 files changed, 884 insertions(+), 17 deletions(-)
create mode 100755 tools/damon/_convert_damos.py
create mode 100644 tools/damon/schemes.py

--
2.17.1

2020-08-04 14:29:23

Subject: [RFC v14 5/7] mm/damon/selftests: Add 'schemes' debugfs tests

From: SeongJae Park <[email protected]>

This commit adds simple selftests for the 'schemes' debugfs file of DAMON.

Signed-off-by: SeongJae Park <[email protected]>
---
.../testing/selftests/damon/debugfs_attrs.sh | 29 +++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh
index c75557e8ba58..61fd3e5598e9 100755
--- a/tools/testing/selftests/damon/debugfs_attrs.sh
+++ b/tools/testing/selftests/damon/debugfs_attrs.sh
@@ -97,6 +97,35 @@ fi

echo $ORIG_CONTENT > $file

+# Test schemes file
+file="$DBGFS/schemes"
+
+ORIG_CONTENT=$(cat $file)
+echo "1 2 3 4 5 6 3" > $file
+if [ $? -ne 0 ]
+then
+ echo "$file write fail"
+ echo $ORIG_CONTENT > $file
+ exit 1
+fi
+
+echo "1 2
+3 4 5 6 3" > $file
+if [ $? -eq 0 ]
+then
+ echo "$file multi line write success (expected fail)"
+ echo $ORIG_CONTENT > $file
+ exit 1
+fi
+
+echo > $file
+if [ $? -ne 0 ]
+then
+ echo "$file empty string writing fail"
+ echo $ORIG_CONTENT > $file
+ exit 1
+fi
+
# Test target_ids file
file="$DBGFS/target_ids"

--
2.17.1

2020-08-04 14:29:52

Subject: [RFC v14 7/7] Docs/admin-guide/mm/damon: Document DAMON-based Operation Schemes

From: SeongJae Park <[email protected]>

This commit adds a description of DAMON-based operation schemes to the
DAMON documents.

Signed-off-by: SeongJae Park <[email protected]>
---
Documentation/admin-guide/mm/damon/guide.rst | 41 ++++++-
Documentation/admin-guide/mm/damon/start.rst | 11 ++
Documentation/admin-guide/mm/damon/usage.rst | 109 ++++++++++++++++++-
Documentation/vm/damon/index.rst | 1 -
4 files changed, 156 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst
index c51fb843efaa..1f9aa2ebbdb6 100644
--- a/Documentation/admin-guide/mm/damon/guide.rst
+++ b/Documentation/admin-guide/mm/damon/guide.rst
@@ -53,6 +53,11 @@ heats``. If it shows a simple pattern consists of a small number of memory
regions having high contrast of access temperature, you could consider manual
`Program Modification`_.

+If the access pattern changes so frequently that you cannot figure out which
+regions are important for performance with the human eye, `Automated
+DAMON-based Memory Operations`_ might help, owing to its machine-level
+microscopic view.
+
If you still want to absorb more benefits, you should develop `Personalized
DAMON Application`_ for your special case.

@@ -120,6 +125,36 @@ shows the visualized access patterns of streamcluster workload in PARSEC3
benchmark suite. We can easily identify the 100 MiB sized hot object.

+Automated DAMON-based Memory Operations
+---------------------------------------
+
+Though `Program Modification`_ works well in many cases and DAMON can help
+it, modifying the source code is not a good option in many cases. First of
+all, the source code could be too old or unavailable. Also, many workloads
+have data access patterns so complex that it is hard even to distinguish hot
+memory objects from cold memory objects with the human eye. Finding the
+mapping from the visualized access pattern to the source code and injecting
+the hinting system calls inside the code will also be quite challenging.
+
+By using DAMON-based operation schemes (DAMOS) via ``damo schemes``, you will
+be able to easily optimize your workload in such a case. Our example schemes
+called 'efficient THP' and 'proactive reclamation' achieved significant speedup
+and memory savings on 25 realistic workloads [2]_.
+
+That said, note that you need to carefully tune the schemes (e.g., target
+region size and age) and the monitoring attributes for the successful use of
+this approach. Because the optimal values of the parameters depend on each
+system and workload, misconfiguring the parameters could result in worse
+memory management.
+
+For the tuning, you could measure performance metrics such as IPC, TLB
+misses, and swap in/out events and adjust the parameters based on their
+changes. The total number and the total size of the regions to which each
+scheme is applied, which are provided via the debugfs interface and the
+programming interface, can also be useful. Writing a program that automates
+this parameter tuning could be an option.
+
+
Personalized DAMON Application
------------------------------

@@ -146,9 +181,9 @@ Referencing previously done successful practices could help you getting the
sense for this kind of optimizations. There is an academic paper [1]_
reporting the visualized access pattern and manual `Program
Modification`_ results for a number of realistic workloads. You can also get
-the visualized access patterns [3]_ [4]_ [5]_ and automated DAMON-based memory
-operations results for other realistic workloads that collected with latest
-version of DAMON [2]_ .
+the visualized access patterns [3]_ [4]_ [5]_ and
+`Automated DAMON-based Memory Operations`_ results for other realistic
+workloads that were collected with the latest version of DAMON [2]_ .

.. [1] https://dl.acm.org/doi/10.1145/3366626.3368125
.. [2] https://damonitor.github.io/test/result/perf/latest/html/
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
index deed2ea2321e..35cf4e4ca6aa 100644
--- a/Documentation/admin-guide/mm/damon/start.rst
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -90,6 +90,17 @@ image files. ::
You can show the images in a web page [1]_ . Those made with other realistic
workloads are also available [2]_ [3]_ [4]_.

+
+Data Access Pattern Aware Memory Management
+===========================================
+
+The three commands below swap out every memory region of size >=4K that has not
+been accessed for >=60 seconds in your workload. ::
+
+ $ echo "#min-size max-size min-acc max-acc min-age max-age action" > scheme
+ $ echo "4K max 0 0 60s max pageout" >> scheme
+ $ damo schemes -c scheme <pid of your workload>
+
.. [1] https://damonitor.github.io/doc/html/v17/admin-guide/mm/damon/start.html#visualizing-recorded-patterns
.. [2] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
.. [3] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index a6606d27a559..96278227f925 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -219,11 +219,70 @@ Similar to that of ``heats --heatmap``, it also supports 'gnuplot' based simple
visualization of the distribution via ``--plot`` option.

+DAMON-based Operation Schemes
+-----------------------------
+
+The ``schemes`` subcommand allows users to do DAMON-based memory management
+optimizations in a few seconds. Similar to ``record``, it receives monitoring
+attributes and target. However, in addition to those, ``schemes`` receives
+data access pattern-based memory operation schemes, which describe what memory
+operation action should be applied to memory regions showing a specific data
+access pattern. Then, it starts the data access monitoring and automatically
+applies the schemes to the targets.
+
+The operation schemes should be saved in a text file in the format below and
+passed to the ``schemes`` subcommand via the ``--schemes`` option. ::
+
+ min-size max-size min-acc max-acc min-age max-age action
+
+The format also supports comments, several units for size and age of regions,
+and human readable action names. Currently supported operation actions are
+``willneed``, ``cold``, ``pageout``, ``hugepage`` and ``nohugepage``. Each of
+the actions works the same as the madvise() system call hint of the same name.
+Please also note that the range is inclusive (closed interval), and ``0`` for
+max values means infinite. The example schemes below are possible. ::
+
+ # format is:
+ # <min/max size> <min/max frequency (0-100)> <min/max age> <action>
+ #
+ # B/K/M/G/T for Bytes/KiB/MiB/GiB/TiB
+ # us/ms/s/m/h/d for micro-seconds/milli-seconds/seconds/minutes/hours/days
+ # 'min/max' for possible min/max value.
+
+ # if a region keeps a high access frequency for >=100ms, put the region on
+ # the head of the LRU list (call madvise() with MADV_WILLNEED).
+ min max 80 max 100ms max willneed
+
+ # if a region keeps a low access frequency for >=200ms and <=one hour, put
+ # the region on the tail of the LRU list (call madvise() with MADV_COLD).
+ min max 10 20 200ms 1h cold
+
+ # if a region keeps a very low access frequency for >=60 seconds, swap out
+ # the region immediately (call madvise() with MADV_PAGEOUT).
+ min max 0 10 60s max pageout
+
+ # if a region of a size >=2MiB keeps a very high access frequency for
+ # >=100ms, let the region use huge pages (call madvise() with
+ # MADV_HUGEPAGE).
+ 2M max 90 100 100ms max hugepage
+
+ # if a region of a size >=2MiB keeps a low access frequency for >=100ms,
+ # avoid the region using huge pages (call madvise() with MADV_NOHUGEPAGE).
+ 2M max 0 25 100ms max nohugepage
+
+For example, you can make a running process named 'foo' use huge pages for
+memory regions of 2MB or larger size that keep a very high access frequency
+for at least 100 milliseconds using the commands below::
+
+ $ echo "2M max 90 max 100ms max hugepage" > my_thp_scheme
+ $ ./damo schemes --schemes my_thp_scheme `pidof foo`
+
+
debugfs Interface
=================

-DAMON exports four files, ``attrs``, ``target_ids``, ``record``, and
-``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.
+DAMON exports five files, ``attrs``, ``target_ids``, ``record``, ``schemes``
+and ``monitor_on`` under its debugfs directory, ``<debugfs>/damon/``.

Attributes
@@ -280,6 +339,52 @@ saved in ``/damon.data``. ::
The recording can be disabled by setting the buffer size zero.

+Schemes
+-------
+
+For usual DAMON-based data access aware memory management optimizations, users
+would simply want the system to apply a memory management action to a memory
+region of a specific size having a specific access frequency for a specific
+time. DAMON receives such formalized operation schemes from the user and
+applies those to the target processes. It also counts the total number and
+size of regions to which each scheme is applied. These statistics can be used
+for online analysis or tuning of the schemes.
+
+Users can get and set the schemes by reading from and writing to the ``schemes``
+debugfs file. Reading the file also shows the statistics of each scheme. Each
+scheme should be written to the file on its own line, in the form below::
+
+ min-size max-size min-acc max-acc min-age max-age action
+
+Note that the ranges are closed intervals. Bytes for the size of regions
+(``min-size`` and ``max-size``), number of monitored accesses per aggregate
+interval for access frequency (``min-acc`` and ``max-acc``), number of
+aggregate intervals for the age of regions (``min-age`` and ``max-age``), and a
+predefined integer for memory management actions should be used. The supported
+numbers and their meanings are as below.
+
+ - 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``
+ - 1: Call ``madvise()`` for the region with ``MADV_COLD``
+ - 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
+ - 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
+ - 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
+ - 5: Do nothing but count the statistics
+
+You can disable schemes by simply writing an empty string to the file. For
+example, the commands below apply a scheme saying "If a memory region of size
+in [4KiB, 8KiB] shows accesses per aggregate interval in [0, 5] for aggregate
+intervals in [10, 20], page out the region", check the entered scheme again,
+and finally remove the scheme. ::
+
+ # cd <debugfs>/damon
+ # echo "4096 8192 0 5 10 20 2" > schemes
+ # cat schemes
+ 4096 8192 0 5 10 20 2 0 0
+ # echo > schemes
+
+The last two integers in the 4th line of the above example are the total number
+and the total size of the regions to which the scheme is applied.
+
Turning On/Off
--------------

diff --git a/Documentation/vm/damon/index.rst b/Documentation/vm/damon/index.rst
index 17dca3c12aad..69aec1287aaf 100644
--- a/Documentation/vm/damon/index.rst
+++ b/Documentation/vm/damon/index.rst
@@ -28,4 +28,3 @@ workloads and systems.
design
eval
api
- plans
--
2.17.1

2020-08-04 14:30:12

Subject: [RFC v14 3/7] mm/damon/schemes: Implement a debugfs interface

From: SeongJae Park <[email protected]>

This commit implements a debugfs interface for the data access
monitoring oriented memory management schemes. It is supposed to be
used by administrators and/or privileged user space programs. Users can
read and update the rules using the ``<debugfs>/damon/schemes`` file. The
format is::

<min/max size> <min/max access frequency> <min/max age> <action>

Signed-off-by: SeongJae Park <[email protected]>
---
mm/damon.c | 187 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 185 insertions(+), 2 deletions(-)

diff --git a/mm/damon.c b/mm/damon.c
index e402717a2c0e..d6b4181c5d70 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -194,6 +194,29 @@ static void damon_destroy_target(struct damon_target *t)
damon_free_target(t);
}

+static struct damos *damon_new_scheme(
+ unsigned long min_sz_region, unsigned long max_sz_region,
+ unsigned int min_nr_accesses, unsigned int max_nr_accesses,
+ unsigned int min_age_region, unsigned int max_age_region,
+ enum damos_action action)
+{
+ struct damos *scheme;
+
+ scheme = kmalloc(sizeof(*scheme), GFP_KERNEL);
+ if (!scheme)
+ return NULL;
+ scheme->min_sz_region = min_sz_region;
+ scheme->max_sz_region = max_sz_region;
+ scheme->min_nr_accesses = min_nr_accesses;
+ scheme->max_nr_accesses = max_nr_accesses;
+ scheme->min_age_region = min_age_region;
+ scheme->max_age_region = max_age_region;
+ scheme->action = action;
+ INIT_LIST_HEAD(&scheme->list);
+
+ return scheme;
+}
+
static void damon_add_scheme(struct damon_ctx *ctx, struct damos *s)
{
list_add_tail(&s->list, &ctx->schemes_list);
@@ -1540,6 +1563,160 @@ static ssize_t debugfs_monitor_on_write(struct file *file,
return ret;
}

+static ssize_t sprint_schemes(struct damon_ctx *c, char *buf, ssize_t len)
+{
+ struct damos *s;
+ int written = 0;
+ int rc;
+
+ damon_for_each_scheme(s, c) {
+ rc = snprintf(&buf[written], len - written,
+ "%lu %lu %u %u %u %u %d\n",
+ s->min_sz_region, s->max_sz_region,
+ s->min_nr_accesses, s->max_nr_accesses,
+ s->min_age_region, s->max_age_region,
+ s->action);
+ if (!rc)
+ return -ENOMEM;
+
+ written += rc;
+ }
+ return written;
+}
+
+static ssize_t debugfs_schemes_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct damon_ctx *ctx = &damon_user_ctx;
+ char *kbuf;
+ ssize_t len;
+
+ kbuf = kmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ mutex_lock(&ctx->kdamond_lock);
+ len = sprint_schemes(ctx, kbuf, count);
+ mutex_unlock(&ctx->kdamond_lock);
+ if (len < 0)
+ goto out;
+ len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
+
+out:
+ kfree(kbuf);
+ return len;
+}
+
+static void free_schemes_arr(struct damos **schemes, ssize_t nr_schemes)
+{
+ ssize_t i;
+
+ for (i = 0; i < nr_schemes; i++)
+ kfree(schemes[i]);
+ kfree(schemes);
+}
+
+static bool damos_action_valid(int action)
+{
+ switch (action) {
+ case DAMOS_WILLNEED:
+ case DAMOS_COLD:
+ case DAMOS_PAGEOUT:
+ case DAMOS_HUGEPAGE:
+ case DAMOS_NOHUGEPAGE:
+ case DAMOS_STAT:
+ return true;
+ default:
+ return false;
+ }
+}
+
+/*
+ * Converts a string into an array of struct damos pointers
+ *
+ * Returns an array of the converted struct damos pointers if the conversion
+ * succeeds, or NULL otherwise.
+ */
+static struct damos **str_to_schemes(const char *str, ssize_t len,
+ ssize_t *nr_schemes)
+{
+ struct damos *scheme, **schemes;
+ const int max_nr_schemes = 256;
+ int pos = 0, parsed, ret;
+ unsigned long min_sz, max_sz;
+ unsigned int min_nr_a, max_nr_a, min_age, max_age;
+ unsigned int action;
+
+ schemes = kmalloc_array(max_nr_schemes, sizeof(scheme),
+ GFP_KERNEL);
+ if (!schemes)
+ return NULL;
+
+ *nr_schemes = 0;
+ while (pos < len && *nr_schemes < max_nr_schemes) {
+ ret = sscanf(&str[pos], "%lu %lu %u %u %u %u %u%n",
+ &min_sz, &max_sz, &min_nr_a, &max_nr_a,
+ &min_age, &max_age, &action, &parsed);
+ if (ret != 7)
+ break;
+ if (!damos_action_valid(action)) {
+ pr_err("wrong action %u\n", action);
+ goto fail;
+ }
+
+ pos += parsed;
+ scheme = damon_new_scheme(min_sz, max_sz, min_nr_a, max_nr_a,
+ min_age, max_age, action);
+ if (!scheme)
+ goto fail;
+
+ schemes[*nr_schemes] = scheme;
+ *nr_schemes += 1;
+ }
+ return schemes;
+fail:
+ free_schemes_arr(schemes, *nr_schemes);
+ return NULL;
+}
+
+static ssize_t debugfs_schemes_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct damon_ctx *ctx = &damon_user_ctx;
+ char *kbuf;
+ struct damos **schemes;
+ ssize_t nr_schemes = 0, ret = count;
+ int err;
+
+ kbuf = user_input_str(buf, count, ppos);
+ if (IS_ERR(kbuf))
+ return PTR_ERR(kbuf);
+
+ schemes = str_to_schemes(kbuf, ret, &nr_schemes);
+ if (!schemes) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ mutex_lock(&ctx->kdamond_lock);
+ if (ctx->kdamond) {
+ ret = -EBUSY;
+ goto unlock_out;
+ }
+
+ err = damon_set_schemes(ctx, schemes, nr_schemes);
+ if (err)
+ ret = err;
+ else
+ nr_schemes = 0;
+unlock_out:
+ mutex_unlock(&ctx->kdamond_lock);
+ free_schemes_arr(schemes, nr_schemes);
+out:
+ kfree(kbuf);
+ return ret;
+}
+
#define targetid_is_pid(ctx) \
(ctx->target_valid == kdamond_vm_target_valid)

@@ -1808,6 +1985,12 @@ static const struct file_operations target_ids_fops = {
.write = debugfs_target_ids_write,
};

+static const struct file_operations schemes_fops = {
+ .owner = THIS_MODULE,
+ .read = debugfs_schemes_read,
+ .write = debugfs_schemes_write,
+};
+
static const struct file_operations record_fops = {
.owner = THIS_MODULE,
.read = debugfs_record_read,
@@ -1824,10 +2007,10 @@ static struct dentry *debugfs_root;

static int __init damon_debugfs_init(void)
{
- const char * const file_names[] = {"attrs", "record",
+ const char * const file_names[] = {"attrs", "record", "schemes",
"target_ids", "monitor_on"};
const struct file_operations *fops[] = {&attrs_fops, &record_fops,
- &target_ids_fops, &monitor_on_fops};
+ &schemes_fops, &target_ids_fops, &monitor_on_fops};
int i;

debugfs_root = debugfs_create_dir("damon", NULL);
--
2.17.1

2020-08-04 14:31:48

Subject: [RFC v14 6/7] damon/tools: Support more human friendly 'schemes' control

From: SeongJae Park <[email protected]>

This commit implements the 'schemes' subcommand of the damon userspace tool.
It can be used to describe and apply the data access monitoring-based
operation schemes in a more human friendly fashion.

Signed-off-by: SeongJae Park <[email protected]>
---
tools/damon/_convert_damos.py | 141 ++++++++++++++++++++++++++++++++++
tools/damon/_damon.py | 28 +++++--
tools/damon/damo | 7 ++
tools/damon/schemes.py | 110 ++++++++++++++++++++++++++
4 files changed, 280 insertions(+), 6 deletions(-)
create mode 100755 tools/damon/_convert_damos.py
create mode 100644 tools/damon/schemes.py

diff --git a/tools/damon/_convert_damos.py b/tools/damon/_convert_damos.py
new file mode 100755
index 000000000000..0fd84b3701c9
--- /dev/null
+++ b/tools/damon/_convert_damos.py
@@ -0,0 +1,141 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Change human readable data access monitoring-based operation schemes to the low
+level input for the '<debugfs>/damon/schemes' file. Below is an example of the
+schemes written in the human readable format:
+
+ # format is:
+ # <min/max size> <min/max frequency (0-100)> <min/max age> <action>
+ #
+ # B/K/M/G/T for Bytes/KiB/MiB/GiB/TiB
+ # us/ms/s/m/h/d for micro-seconds/milli-seconds/seconds/minutes/hours/days
+ # 'min/max' for possible min/max value.
+
+ # if a region keeps a high access frequency for >=100ms, put the region on
+ # the head of the LRU list (call madvise() with MADV_WILLNEED).
+ min max 80 max 100ms max willneed
+
+ # if a region keeps a low access frequency for >=200ms and <=one hour, put
+ # the region on the tail of the LRU list (call madvise() with MADV_COLD).
+ min max 10 20 200ms 1h cold
+
+ # if a region keeps a very low access frequency for >=60 seconds, swap out
+ # the region immediately (call madvise() with MADV_PAGEOUT).
+ min max 0 10 60s max pageout
+
+ # if a region of a size >=2MiB keeps a very high access frequency for
+ # >=100ms, let the region use huge pages (call madvise() with
+ # MADV_HUGEPAGE).
+ 2M max 90 100 100ms max hugepage
+
+ # if a region of a size >=2MiB keeps a low access frequency for >=100ms,
+ # avoid the region using huge pages (call madvise() with MADV_NOHUGEPAGE).
+ 2M max 0 25 100ms max nohugepage
+"""
+
+import argparse
+import platform
+
+uint_max = 2**32 - 1
+ulong_max = 2**64 - 1
+if platform.architecture()[0] != '64bit':
+    ulong_max = 2**32 - 1
+
+unit_to_bytes = {'B': 1, 'K': 1024, 'M': 1024 * 1024, 'G': 1024 * 1024 * 1024,
+        'T': 1024 * 1024 * 1024 * 1024}
+
+def text_to_bytes(txt):
+    if txt == 'min':
+        return 0
+    if txt == 'max':
+        return ulong_max
+
+    unit = txt[-1]
+    number = float(txt[:-1])
+    return int(number * unit_to_bytes[unit])
+
+unit_to_usecs = {'us': 1, 'ms': 1000, 's': 1000 * 1000, 'm': 60 * 1000 * 1000,
+        'h': 60 * 60 * 1000 * 1000, 'd': 24 * 60 * 60 * 1000 * 1000}
+
+def text_to_aggr_intervals(txt, aggr_interval):
+    if txt == 'min':
+        return 0
+    if txt == 'max':
+        return uint_max
+
+    unit = txt[-2:]
+    if unit in ['us', 'ms']:
+        number = float(txt[:-2])
+    else:
+        unit = txt[-1]
+        number = float(txt[:-1])
+    return int(number * unit_to_usecs[unit]) / aggr_interval
+
+damos_action_to_int = {'DAMOS_WILLNEED': 0, 'DAMOS_COLD': 1,
+        'DAMOS_PAGEOUT': 2, 'DAMOS_HUGEPAGE': 3, 'DAMOS_NOHUGEPAGE': 4,
+        'DAMOS_STAT': 5}
+
+def text_to_damos_action(txt):
+    return damos_action_to_int['DAMOS_' + txt.upper()]
+
+def text_to_nr_accesses(txt, max_nr_accesses):
+    if txt == 'min':
+        return 0
+    if txt == 'max':
+        return max_nr_accesses
+
+    return int(float(txt) * max_nr_accesses / 100)
+
+def debugfs_scheme(line, sample_interval, aggr_interval):
+    fields = line.split()
+    if len(fields) != 7:
+        print('wrong input line: %s' % line)
+        exit(1)
+
+    limit_nr_accesses = aggr_interval / sample_interval
+    try:
+        min_sz = text_to_bytes(fields[0])
+        max_sz = text_to_bytes(fields[1])
+        min_nr_accesses = text_to_nr_accesses(fields[2], limit_nr_accesses)
+        max_nr_accesses = text_to_nr_accesses(fields[3], limit_nr_accesses)
+        min_age = text_to_aggr_intervals(fields[4], aggr_interval)
+        max_age = text_to_aggr_intervals(fields[5], aggr_interval)
+        action = text_to_damos_action(fields[6])
+    except:
+        print('wrong input field')
+        raise
+    return '%d\t%d\t%d\t%d\t%d\t%d\t%d' % (min_sz, max_sz, min_nr_accesses,
+            max_nr_accesses, min_age, max_age, action)
+
+def convert(schemes_file, sample_interval, aggr_interval):
+    lines = []
+    with open(schemes_file, 'r') as f:
+        for line in f:
+            if line.startswith('#'):
+                continue
+            line = line.strip()
+            if line == '':
+                continue
+            lines.append(debugfs_scheme(line, sample_interval, aggr_interval))
+    return '\n'.join(lines)
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('input', metavar='<file>',
+            help='input file describing the schemes')
+    parser.add_argument('-s', '--sample', metavar='<interval>', type=int,
+            default=5000, help='sampling interval (us)')
+    parser.add_argument('-a', '--aggr', metavar='<interval>', type=int,
+            default=100000, help='aggregation interval (us)')
+    args = parser.parse_args()
+
+    schemes_file = args.input
+    sample_interval = args.sample
+    aggr_interval = args.aggr
+
+    print(convert(schemes_file, sample_interval, aggr_interval))
+
+if __name__ == '__main__':
+    main()
diff --git a/tools/damon/_damon.py b/tools/damon/_damon.py
index 1f6a292e8c25..a4f6c03c23e4 100644
--- a/tools/damon/_damon.py
+++ b/tools/damon/_damon.py
@@ -10,6 +10,7 @@ import subprocess

debugfs_attrs = None
debugfs_record = None
+debugfs_schemes = None
debugfs_target_ids = None
debugfs_monitor_on = None

@@ -33,8 +34,9 @@ class Attrs:
     max_nr_regions = None
     rbuf_len = None
     rfile_path = None
+    schemes = None

-    def __init__(self, s, a, r, n, x, l, f):
+    def __init__(self, s, a, r, n, x, l, f, c):
         self.sample_interval = s
         self.aggr_interval = a
         self.regions_update_interval = r
@@ -42,12 +44,13 @@ class Attrs:
         self.max_nr_regions = x
         self.rbuf_len = l
         self.rfile_path = f
+        self.schemes = c

     def __str__(self):
-        return "%s %s %s %s %s %s %s" % (self.sample_interval,
+        return "%s %s %s %s %s %s %s\n%s" % (self.sample_interval,
                 self.aggr_interval, self.regions_update_interval,
                 self.min_nr_regions, self.max_nr_regions, self.rbuf_len,
-                self.rfile_path)
+                self.rfile_path, self.schemes)

     def attr_str(self):
         return "%s %s %s %s %s " % (self.sample_interval, self.aggr_interval,
@@ -66,6 +69,9 @@ class Attrs:
                 debugfs_record), shell=True, executable='/bin/bash')
         if ret:
             return ret
+        return subprocess.call('echo %s > %s' % (
+            self.schemes.replace('\n', ' '), debugfs_schemes), shell=True,
+            executable='/bin/bash')

 def current_attrs():
     with open(debugfs_attrs, 'r') as f:
@@ -77,17 +83,26 @@ def current_attrs():
     attrs.append(int(rattrs[0]))
     attrs.append(rattrs[1])

+    with open(debugfs_schemes, 'r') as f:
+        schemes = f.read()
+
+    # The last two fields in each line are statistics. Remove those.
+    schemes = [' '.join(x.split()[:-2]) for x in schemes.strip().split('\n')]
+    attrs.append('\n'.join(schemes))
+
     return Attrs(*attrs)

 def chk_update_debugfs(debugfs):
     global debugfs_attrs
     global debugfs_record
+    global debugfs_schemes
     global debugfs_target_ids
     global debugfs_monitor_on

     debugfs_damon = os.path.join(debugfs, 'damon')
     debugfs_attrs = os.path.join(debugfs_damon, 'attrs')
     debugfs_record = os.path.join(debugfs_damon, 'record')
+    debugfs_schemes = os.path.join(debugfs_damon, 'schemes')
     debugfs_target_ids = os.path.join(debugfs_damon, 'target_ids')
     debugfs_monitor_on = os.path.join(debugfs_damon, 'monitor_on')

@@ -95,8 +110,8 @@ def chk_update_debugfs(debugfs):
         print("damon debugfs dir (%s) not found", debugfs_damon)
         exit(1)

-    for f in [debugfs_attrs, debugfs_record, debugfs_target_ids,
-            debugfs_monitor_on]:
+    for f in [debugfs_attrs, debugfs_record, debugfs_schemes,
+            debugfs_target_ids, debugfs_monitor_on]:
         if not os.path.isfile(f):
             print("damon debugfs file (%s) not found" % f)
             exit(1)
@@ -112,8 +127,9 @@ def cmd_args_to_attrs(args):
     if not os.path.isabs(args.out):
         args.out = os.path.join(os.getcwd(), args.out)
     rfile_path = args.out
+    schemes = args.schemes
     return Attrs(sample_interval, aggr_interval, regions_update_interval,
-            min_nr_regions, max_nr_regions, rbuf_len, rfile_path)
+            min_nr_regions, max_nr_regions, rbuf_len, rfile_path, schemes)

 def set_attrs_argparser(parser):
     parser.add_argument('-d', '--debugfs', metavar='<debugfs>', type=str,
diff --git a/tools/damon/damo b/tools/damon/damo
index 58e1099ae5fc..ce7180069bef 100755
--- a/tools/damon/damo
+++ b/tools/damon/damo
@@ -5,6 +5,7 @@ import argparse

 import record
 import report
+import schemes

 class SubCmdHelpFormatter(argparse.RawDescriptionHelpFormatter):
     def _format_action(self, action):
@@ -25,6 +26,10 @@ parser_record = subparser.add_parser('record',
         help='record data accesses of the given target processes')
 record.set_argparser(parser_record)

+parser_schemes = subparser.add_parser('schemes',
+        help='apply operation schemes to the given target process')
+schemes.set_argparser(parser_schemes)
+
 parser_report = subparser.add_parser('report',
         help='report the recorded data accesses in the specified form')
 report.set_argparser(parser_report)
@@ -33,5 +38,7 @@ args = parser.parse_args()

 if args.command == 'record':
     record.main(args)
+elif args.command == 'schemes':
+    schemes.main(args)
 elif args.command == 'report':
     report.main(args)
diff --git a/tools/damon/schemes.py b/tools/damon/schemes.py
new file mode 100644
index 000000000000..9095835f6133
--- /dev/null
+++ b/tools/damon/schemes.py
@@ -0,0 +1,110 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""
+Apply given operation schemes to the target process.
+"""
+
+import argparse
+import os
+import signal
+import subprocess
+import time
+
+import _convert_damos
+import _damon
+
+def run_damon(target, is_target_cmd, attrs, old_attrs):
+    if os.path.isfile(attrs.rfile_path):
+        os.rename(attrs.rfile_path, attrs.rfile_path + '.old')
+
+    if attrs.apply():
+        print('attributes (%s) failed to be applied' % attrs)
+        cleanup_exit(old_attrs, -1)
+    print('# damon attrs: %s %s' % (attrs.attr_str(), attrs.record_str()))
+    for line in attrs.schemes.split('\n'):
+        print('# scheme: %s' % line)
+    if is_target_cmd:
+        p = subprocess.Popen(target, shell=True, executable='/bin/bash')
+        target = p.pid
+    if _damon.set_target_pid(target):
+        print('pid setting (%s) failed' % target)
+        cleanup_exit(old_attrs, -2)
+    if _damon.turn_damon('on'):
+        print('could not turn on damon')
+        cleanup_exit(old_attrs, -3)
+    while not _damon.is_damon_running():
+        time.sleep(1)
+    print('Press Ctrl+C to stop')
+    if is_target_cmd:
+        p.wait()
+    while True:
+        # damon will turn it off by itself if the target tasks are terminated.
+        if not _damon.is_damon_running():
+            break
+        time.sleep(1)
+
+    cleanup_exit(old_attrs, 0)
+
+def cleanup_exit(orig_attrs, exit_code):
+    if _damon.is_damon_running():
+        if _damon.turn_damon('off'):
+            print('failed to turn damon off!')
+        while _damon.is_damon_running():
+            time.sleep(1)
+    if orig_attrs:
+        if orig_attrs.apply():
+            print('original attributes (%s) restoration failed!' % orig_attrs)
+    exit(exit_code)
+
+def sighandler(signum, frame):
+    print('\nsignal %s received' % signum)
+    cleanup_exit(orig_attrs, signum)
+
+def chk_permission():
+    if os.geteuid() != 0:
+        print("Run as root")
+        exit(1)
+
+def set_argparser(parser):
+    _damon.set_attrs_argparser(parser)
+    parser.add_argument('target', type=str, metavar='<target>',
+            help='the target command or the pid to apply the schemes to')
+    parser.add_argument('-c', '--schemes', metavar='<file>', type=str,
+            default='damon.schemes',
+            help='data access monitoring-based operation schemes')
+
+def main(args=None):
+    global orig_attrs
+    if not args:
+        parser = argparse.ArgumentParser()
+        set_argparser(parser)
+        args = parser.parse_args()
+
+    chk_permission()
+    _damon.chk_update_debugfs(args.debugfs)
+
+    orig_attrs = _damon.current_attrs()
+    signal.signal(signal.SIGINT, sighandler)
+    signal.signal(signal.SIGTERM, sighandler)
+
+    args.rbuf = 0
+    args.out = 'null'
+    args.schemes = _convert_damos.convert(args.schemes, args.sample, args.aggr)
+    new_attrs = _damon.cmd_args_to_attrs(args)
+    target = args.target
+
+    target_fields = target.split()
+    if not subprocess.call('which %s &> /dev/null' % target_fields[0],
+            shell=True, executable='/bin/bash'):
+        run_damon(target, True, new_attrs, orig_attrs)
+    else:
+        try:
+            pid = int(target)
+        except ValueError:
+            print('target \'%s\' is neither a command, nor a pid' % target)
+            exit(1)
+        run_damon(target, False, new_attrs, orig_attrs)
+
+if __name__ == '__main__':
+    main()
--
2.17.1
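
Note on reading back the schemes file: current_attrs() in _damon.py above
drops the last two whitespace-separated fields of every line read from the
'schemes' debugfs file, because those are statistics and not part of the
scheme itself. Below is a minimal standalone sketch of that step, for
illustration only. The helper name strip_scheme_stats and the sample values
are made up here; only the "last two fields are statistics" rule comes from
the patch.

```python
def strip_scheme_stats(schemes_text):
    """Drop the two trailing statistics fields from each scheme line.

    Hypothetical helper that mirrors the list comprehension used in
    current_attrs(); it is not part of the patch.
    """
    lines = schemes_text.strip().split('\n')
    return '\n'.join(' '.join(line.split()[:-2]) for line in lines)


if __name__ == '__main__':
    # Made-up file content: seven scheme fields followed by two
    # statistics fields on each line.
    sample = ('4096 8192 0 5 10 20 2 3 12288\n'
              '2097152 4294967295 0 0 600 4294967295 4 0 0\n')
    print(strip_scheme_stats(sample))
```

An empty file yields an empty string, which matches what the patch would
store in Attrs.schemes when no scheme is installed.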