From: SeongJae Park
CC: SeongJae Park
Subject: [RFC v12 8/8] Documentation/admin-guide/mm: Document DAMON-based operation schemes
Date: Tue, 16 Jun 2020 09:38:28 +0200
Message-ID: <20200616073828.16509-9-sjpark@amazon.com>
In-Reply-To: <20200616073828.16509-1-sjpark@amazon.com>
References: <20200616073828.16509-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: SeongJae Park

This commit documents DAMON-based operation schemes in the DAMON document.

Signed-off-by: SeongJae Park
---
 Documentation/admin-guide/mm/damon/guide.rst |  35 +++++
 Documentation/admin-guide/mm/damon/plans.rst |  23 +---
 Documentation/admin-guide/mm/damon/usage.rst | 127 +++++++++++++++++--
 3 files changed, 153 insertions(+), 32 deletions(-)

diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst
index 4a840d1b02d4..c10f65ce721c 100644
--- a/Documentation/admin-guide/mm/damon/guide.rst
+++ b/Documentation/admin-guide/mm/damon/guide.rst
@@ -55,6 +55,11 @@ heats``. If it shows a simple pattern consists of a small number of memory
 regions having high contrast of access temperature, you could consider `Manual
 Program Optimization`_.
 
+If the access pattern changes so frequently that you cannot figure out which
+regions are important for performance with the human eye, `Automated
+DAMON-based Memory Operations`_ might help the case owing to its machine-level
+microscope view.
+
 You don't need to take only one approach among the above plans, but you could
 use multiple of the above approaches to maximize the benefit. If you still
 want to absorb more benefits, you should develop `Personalized DAMON
@@ -158,6 +163,36 @@ object is the hot object.
 The chronological changes of working set size.
 
 
+Automated DAMON-based Memory Operations
+---------------------------------------
+
+Though `Manual Program Optimization`_ works well in many cases and DAMON can
+help with it, modifying the source code is often not a good option. First of
+all, the source code could be too old or unavailable. Also, many workloads
+will have complex data access patterns in which it is hard to distinguish hot
+memory objects from cold memory objects with the human eye. Finding the
+mapping from the visualized access pattern to the source code and injecting
+the hinting system calls into the code will also be quite challenging.
+
+By using DAMON-based operation schemes (DAMOS) via ``damo schemes``, you will
+be able to easily optimize your workload in such a case. Our example schemes
+called 'efficient THP' and 'proactive reclamation' achieved significant speedup
+and memory space savings against 25 realistic workloads [2]_, [3]_.
+
+That said, note that you need to carefully tune the schemes (e.g., target
+region size and age) and the monitoring attributes to use this approach
+successfully. Because the optimal values of the parameters depend on each
+system and workload, misconfiguring the parameters could result in worse
+memory management.
+
+For the tuning, you could measure performance metrics such as IPC, TLB
+misses, and swap in/out events, and adjust the parameters based on their
+changes. The total number and the total size of the regions to which each
+scheme is applied, which are provided via the debugfs interface and the
+programming interface, can also be useful. Writing a program that automates
+this parameter tuning could be an option.
+
+
 Personalized DAMON Application
 ------------------------------
 
diff --git a/Documentation/admin-guide/mm/damon/plans.rst b/Documentation/admin-guide/mm/damon/plans.rst
index e0ad178a01d1..765344f02eb3 100644
--- a/Documentation/admin-guide/mm/damon/plans.rst
+++ b/Documentation/admin-guide/mm/damon/plans.rst
@@ -4,25 +4,4 @@
 Future Plans
 ============
 
-DAMON is still on its first stage. Below plans are still under development.
-
-
-Automate Data Access Monitoring-based Memory Operation Schemes Execution
-========================================================================
-
-The ultimate goal of DAMON is being used as a building block of the data access
-pattern aware kernel memory management subsystem optimization. However, as
-always, some users having very special workloads will want to do their
-optimization. DAMON will automate most of the tasks for such manual
-optimizations soon. Users will be required to only describe what kind of data
-access pattern-based operation schemes they want in a simple form.
-
-By applying a very simple scheme for THP promotion/demotion with a prototype
-implementation, DAMON reduced 60% of THP memory footprint overhead while
-preserving 50% of the THP performance benefit. The detailed results can be
-seen on an external web page [1]_.
-
-Several RFC patchsets for this plan are available [2]_.
-
-.. [1] https://damonitor.github.io/test/result/perf/latest/html/
-.. [2] https://lore.kernel.org/linux-mm/20200429124540.32232-1-sjpark@amazon.com/
+TBD.
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index c265de653aef..2f5b67302094 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -231,11 +231,71 @@ Similar to that of ``heats --heatmap``, it also supports 'gnuplot' based simple
 visualization of the distribution via ``--plot`` option.
 
 
+DAMON-based Operation Schemes
+-----------------------------
+
+The ``schemes`` subcommand allows users to do DAMON-based memory management
+optimizations in a few seconds. Similar to ``record``, it receives monitoring
+attributes and target. However, in addition to those, ``schemes`` receives
+data access pattern-based memory operation schemes, which describe what memory
+operation action should be applied to memory regions showing a specific data
+access pattern. Then, it starts the data access monitoring and automatically
+applies the schemes to the targets.
+
+The operation schemes should be saved in a text file in the format below and
+passed to the ``schemes`` subcommand via the ``--schemes`` option. ::
+
+    min-size max-size min-acc max-acc min-age max-age action
+
+The format also supports comments, several units for size and age of regions,
+and human readable action names. Currently supported operation actions are
+``willneed``, ``cold``, ``pageout``, ``hugepage`` and ``nohugepage``. Each of
+the actions works the same as the madvise() system call hint of that name.
+Please also note that the range is inclusive (closed interval), and ``0`` for
+max values means infinite. The example schemes below are possible. ::
+
+    # format is:
+    # min-size max-size min-acc max-acc min-age max-age action
+    #
+    # B/K/M/G/T for Bytes/KiB/MiB/GiB/TiB
+    # us/ms/s/m/h/d for micro-seconds/milli-seconds/seconds/minutes/hours/days
+    # 'null' means zero for size and age.
+
+    # if a region keeps a high access frequency for >=100ms, put the region on
+    # the head of the LRU list (call madvise() with MADV_WILLNEED).
+    null null 80 null 100ms 0s willneed
+
+    # if a region keeps a low access frequency for at least 200ms and at most
+    # one hour, put the region on the tail of the LRU list (call madvise()
+    # with MADV_COLD).
+    0B 0B 10 20 200ms 1h cold
+
+    # if a region keeps a very low access frequency for >=1 minute, swap
+    # out the region immediately (call madvise() with MADV_PAGEOUT).
+    0B null 0 10 60s 0s pageout
+
+    # if a region of a size >=2MiB keeps a very high access frequency for
+    # >=100ms, let the region use huge pages (call madvise() with
+    # MADV_HUGEPAGE).
+    2M null 90 99 100ms 0s hugepage
+
+    # if a region of a size >=2MiB keeps a low access frequency for >=100ms,
+    # avoid the region using huge pages (call madvise() with MADV_NOHUGEPAGE).
+    2M null 0 25 100ms 0s nohugepage
+
+For example, you can make a running process named 'foo' use huge pages for
+memory regions that are 2MiB or larger and keep a very high access frequency
+for at least 100 milliseconds using the commands below::
+
+    $ echo "2M null 90 99 100ms 0s hugepage" > my_thp_scheme
+    $ ./damo schemes --schemes my_thp_scheme `pidof foo`
+
+
 debugfs Interface
 =================
 
-DAMON exports four files, ``attrs``, ``pids``, ``record``, and ``monitor_on``
-under its debugfs directory, ``/damon/``.
+DAMON exports five files, ``attrs``, ``pids``, ``record``, ``schemes`` and
+``monitor_on`` under its debugfs directory, ``/damon/``.
 
 
 Attributes
@@ -285,17 +345,64 @@ saved in ``/damon.data``. ::
     4096 /damon.data
 
 
+Schemes
+-------
+
+For usual DAMON-based data access aware memory management optimizations, users
+would simply want the system to apply a memory management action to a memory
+region of a specific size having a specific access frequency for a specific
+time. DAMON receives such formalized operation schemes from the user and
+applies those to the target processes. It also counts the total number and
+size of regions to which each scheme is applied. These statistics can be
+used for online analysis or tuning of the schemes.
+
+Users can get and set the schemes by reading from and writing to the
+``schemes`` debugfs file. Reading the file also shows the statistics of each
+scheme. Each scheme should be written to the file on its own line, as below::
+
+    min-size max-size min-acc max-acc min-age max-age action
+
+Note that the ranges are closed intervals. Use bytes for the size of
+regions (``min-size`` and ``max-size``), the number of monitored accesses
+per aggregate interval for the access frequency (``min-acc`` and
+``max-acc``), the number of aggregate intervals for the age of regions
+(``min-age`` and ``max-age``), and a predefined integer for the memory
+management action. The supported integers and their meanings are as
+below.
+
+ - 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``
+ - 1: Call ``madvise()`` for the region with ``MADV_COLD``
+ - 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
+ - 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
+ - 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
+ - 5: Do nothing but count the statistics
+
+You can disable schemes by simply writing an empty string to the file. For
+example, the commands below apply a scheme saying "If a memory region >=4KiB
+(4096 0) shows <=5 accesses per aggregate interval (0 5) for >=5 aggregate
+intervals (5 0), page out the region (2)", check the entered scheme again, and
+finally remove the scheme. ::
+
+    # cd /damon
+    # echo "4096 0 0 5 5 0 2" > schemes
+    # cat schemes
+    4096 0 0 5 5 0 2 0 0
+    # echo > schemes
+
+The last two integers in the 4th line of the above example are the total
+number and the total size of the regions to which the scheme is applied.
+
 Turning On/Off
 --------------
 
-Setting the attributes as described above doesn't incur effect unless you
-explicitly start the monitoring. You can start, stop, and check the current
-status of the monitoring by writing to and reading from the ``monitor_on``
-file. Writing ``on`` to the file make DAMON start monitoring of the target
-processes with the attributes. Recording will also start if requested before.
-Writing ``off`` to the file stops those. DAMON also stops if every target
-process is terminated. Below example commands turn on, off, and check the
-status of DAMON::
+Setting the attributes and schemes as described above has no effect unless
+you explicitly start the monitoring. You can start, stop, and check the
+current status of the monitoring by writing to and reading from the
+``monitor_on`` file. Writing ``on`` to the file makes DAMON start monitoring
+the target processes with the attributes. Recording and scheme application
+will also start if requested before. Writing ``off`` to the file stops those.
+DAMON also stops if every target process is terminated. The example commands
+below turn on, turn off, and check the status of DAMON::
 
     # cd /damon
     # echo on > monitor_on
-- 
2.17.1
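
For readers who want to experiment with the human-readable scheme format documented for ``damo schemes`` above, the following is a minimal Python sketch of a parser for one scheme line. It is not taken from the damo source. The names ``Scheme`` and ``parse_scheme_line`` are hypothetical, the unit tables and action numbers are copied from the text of this patch, and two behaviours are assumptions of the sketch: ``null`` is treated as zero for the access fields as well, and every size and age value must carry a unit. Converting ages and access counts into the aggregate-interval units that the debugfs file expects depends on the monitoring attributes and is left out.

```python
import re
from typing import NamedTuple, Optional

# Unit tables copied from the comment block of the example scheme file.
SIZE_UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}
AGE_UNITS_US = {
    "us": 1,
    "ms": 1_000,
    "s": 1_000_000,
    "m": 60 * 1_000_000,
    "h": 3_600 * 1_000_000,
    "d": 86_400 * 1_000_000,
}
# Action numbers as listed for the debugfs ``schemes`` file.
ACTIONS = {"willneed": 0, "cold": 1, "pageout": 2, "hugepage": 3, "nohugepage": 4}


class Scheme(NamedTuple):
    """Hypothetical container; sizes in bytes, ages in microseconds."""
    min_size: int
    max_size: int      # 0 means no upper bound
    min_acc: int
    max_acc: int       # 0 means no upper bound
    min_age_us: int
    max_age_us: int    # 0 means no upper bound
    action: int


def _with_unit(text: str, units: dict) -> int:
    """Convert a value such as '2M' or '100ms' to the table's base unit."""
    if text == "null":
        return 0
    match = re.fullmatch(r"(\d+)([A-Za-z]+)", text)
    if match is None or match.group(2) not in units:
        raise ValueError(f"bad value or unit: {text!r}")
    return int(match.group(1)) * units[match.group(2)]


def _acc(text: str) -> int:
    """Parse an access-frequency field; 'null' as zero is an assumption."""
    return 0 if text == "null" else int(text)


def parse_scheme_line(line: str) -> Optional[Scheme]:
    """Parse one line; return None for blank lines and comment-only lines."""
    fields = line.split("#", 1)[0].split()
    if not fields:
        return None
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    min_sz, max_sz, min_acc, max_acc, min_age, max_age, action = fields
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return Scheme(
        _with_unit(min_sz, SIZE_UNITS),
        _with_unit(max_sz, SIZE_UNITS),
        _acc(min_acc),
        _acc(max_acc),
        _with_unit(min_age, AGE_UNITS_US),
        _with_unit(max_age, AGE_UNITS_US),
        ACTIONS[action],
    )
```

With this sketch, ``parse_scheme_line("2M null 90 99 100ms 0s hugepage")`` yields a minimum size of 2097152 bytes, a minimum age of 100000 microseconds, and action number 3. Rejecting values without a unit is a choice of the sketch, made so that a mistyped field fails loudly instead of being silently interpreted.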