From: SeongJae Park
Subject: [PATCH v16 03/14] mm/damon: Implement region based sampling
Date: Mon, 15 Jun 2020 18:19:16 +0200
Message-ID: <20200615161927.12637-4-sjpark@amazon.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200615161927.12637-1-sjpark@amazon.com>
References: <20200615161927.12637-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This commit implements DAMON's target address space independent high level
logic for the basic access check and region based sampling.  The target
address space specific logic for constructing the monitoring target address
regions and for checking the accesses is still required, though.  The
following commits will provide reference implementations of those for the
general virtual address spaces and the physical address space.  Users can
also implement and use their own versions for their specific use cases.

Basic Access Check
------------------

DAMON basically reports which pages are accessed and how frequently.  The
frequency is not an absolute number of accesses, but a ratio.
For this, DAMON first calls the monitoring target construction callback
(``init_target_regions``), and then calls the access check callbacks for
every ``sampling interval``.  The access check callbacks are assumed to check
the access to each page and aggregate the number of observed accesses to each
page.  Finally, DAMON resets the aggregated count per ``aggregation
interval``.

This is thus similar to the common periodic access check based monitoring
mechanisms, but it provides the access frequency.  The overhead will
increase as the size of the target process grows.

Region Based Sampling
---------------------

To avoid the unbounded increase of the overhead, DAMON groups a number of
adjacent pages that are assumed to have the same access frequencies into a
region.  As long as the assumption (pages in a region have the same access
frequencies) is kept, only one page in the region is required to be checked.
Therefore, the monitoring overhead is controllable by setting the number of
regions.

Nonetheless, this scheme cannot preserve the quality of the output if the
assumption is not kept.  A following commit will introduce how we can make
the guarantee with some sort of best effort.

Signed-off-by: SeongJae Park
---
 include/linux/damon.h |  80 ++++++++++++-
 mm/damon.c            | 258 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 335 insertions(+), 3 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index c8f8c1c41a45..649fb8b6209f 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -11,6 +11,8 @@
 #define _DAMON_H_
 
 #include
+#include
+#include
 #include
 
 /**
@@ -53,11 +55,87 @@ struct damon_task {
 };
 
 /**
- * struct damon_ctx - Represents a context for each monitoring.
+ * struct damon_ctx - Represents a context for each monitoring.  This is the
+ * main interface that allows users to set the attributes and get the results
+ * of the monitoring.
+ *
+ * For each monitoring request (damon_start()), a kernel thread for the
+ * monitoring is created.  The pointer to the thread is stored in @kdamond.
+ *
+ * @sample_interval:		The time between access samplings.
+ * @aggr_interval:		The time between monitor results aggregations.
+ * @min_nr_regions:		The number of initial monitoring regions.
+ *
+ * For each @sample_interval, DAMON checks whether each region is accessed or
+ * not.  It aggregates and keeps the access information (number of accesses to
+ * each region) for @aggr_interval time.  All time intervals are in
+ * micro-seconds.
+ *
+ * @kdamond:		Kernel thread that does the monitoring.
+ * @kdamond_stop:	Notifies whether kdamond should stop.
+ * @kdamond_lock:	Mutex for the synchronizations with @kdamond.
+ *
+ * The monitoring thread sets @kdamond to NULL when it terminates.  Therefore,
+ * users can know whether the monitoring is ongoing or terminated by reading
+ * @kdamond.  Also, users can ask @kdamond to be terminated by writing non-zero
+ * to @kdamond_stop.  Reads and writes to @kdamond and @kdamond_stop from
+ * outside of the monitoring thread must be protected by @kdamond_lock.
+ *
+ * Note that the monitoring thread protects only @kdamond and @kdamond_stop via
+ * @kdamond_lock.  Accesses to other fields must be protected by themselves.
+ *
  * @tasks_list:		Head of monitoring target tasks (&damon_task) list.
+ *
+ * @init_target_regions:	Constructs initial monitoring target regions.
+ * @prepare_access_checks:	Prepares next access check of target regions.
+ * @check_accesses:		Checks the access of target regions.
+ * @sample_cb:			Called for each sampling interval.
+ * @aggregate_cb:		Called for each aggregation interval.
+ *
+ * DAMON can be extended for various address spaces by users.  For this, users
+ * can register the target address space dependent low level functions for
+ * their usecases via the callback pointers of the context.  The monitoring
+ * thread calls @init_target_regions before starting the monitoring, and
+ * @prepare_access_checks and @check_accesses for each @sample_interval.
+ *
+ * @init_target_regions should construct proper monitoring target regions and
+ * link those to the DAMON context struct.
+ * @prepare_access_checks should manipulate the monitoring regions to be
+ * prepared for the next access check.
+ * @check_accesses should check the accesses to each region that were made
+ * after the last preparation and update the `->nr_accesses` of each region.
+ *
+ * @sample_cb and @aggregate_cb are called from @kdamond for each of the
+ * sampling intervals and aggregation intervals, respectively.  Therefore,
+ * users can safely access the monitoring results via @tasks_list without
+ * additional protection of @kdamond_lock.  For this reason, users are
+ * recommended to use these callbacks for the accesses to the results.
  */
 struct damon_ctx {
+	unsigned long sample_interval;
+	unsigned long aggr_interval;
+	unsigned long min_nr_regions;
+
+	struct timespec64 last_aggregation;
+
+	struct task_struct *kdamond;
+	bool kdamond_stop;
+	struct mutex kdamond_lock;
+
 	struct list_head tasks_list;	/* 'damon_task' objects */
+
+	/* callbacks */
+	void (*init_target_regions)(struct damon_ctx *context);
+	void (*prepare_access_checks)(struct damon_ctx *context);
+	unsigned int (*check_accesses)(struct damon_ctx *context);
+	void (*sample_cb)(struct damon_ctx *context);
+	void (*aggregate_cb)(struct damon_ctx *context);
 };
 
+int damon_set_pids(struct damon_ctx *ctx, int *pids, ssize_t nr_pids);
+int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
+		unsigned long aggr_int, unsigned long min_nr_reg);
+int damon_start(struct damon_ctx *ctx);
+int damon_stop(struct damon_ctx *ctx);
+
 #endif
diff --git a/mm/damon.c b/mm/damon.c
index 2bf35bdc0470..aba02c652b51 100644
--- a/mm/damon.c
+++ b/mm/damon.c
@@ -9,18 +9,27 @@
  * This file is constructed in below parts.
  *
  * - Functions and macros for DAMON data structures
+ * - Functions for DAMON core logics and features
+ * - Functions for the DAMON programming interface
  * - Functions for the module loading/unloading
- *
- * The core parts are not implemented yet.
  */
 
 #define pr_fmt(fmt) "damon: " fmt
 
 #include
+#include
+#include
 #include
 #include
+#include
+#include
+#include
+#include
 #include
 
+/* Minimal region size.  Every damon_region is aligned by this. */
+#define MIN_REGION PAGE_SIZE
+
 /*
  * Functions and macros for DAMON data structures
  */
@@ -167,6 +176,251 @@ static unsigned int nr_damon_regions(struct damon_task *t)
 	return nr_regions;
 }
 
+/*
+ * Functions for DAMON core logics and features
+ */
+
+/*
+ * damon_check_reset_time_interval() - Check if a time interval has elapsed.
+ * @baseline:	the time to check whether the interval has elapsed since
+ * @interval:	the time interval (microseconds)
+ *
+ * See whether the given time interval has passed since the given baseline
+ * time.  If so, it also updates the baseline to current time for next check.
+ *
+ * Return:	true if the time interval has passed, or false otherwise.
+ */
+static bool damon_check_reset_time_interval(struct timespec64 *baseline,
+		unsigned long interval)
+{
+	struct timespec64 now;
+
+	ktime_get_coarse_ts64(&now);
+	if ((timespec64_to_ns(&now) - timespec64_to_ns(baseline)) <
+			interval * 1000)
+		return false;
+	*baseline = now;
+	return true;
+}
+
+/*
+ * Check whether it is time to flush the aggregated information
+ */
+static bool kdamond_aggregate_interval_passed(struct damon_ctx *ctx)
+{
+	return damon_check_reset_time_interval(&ctx->last_aggregation,
+			ctx->aggr_interval);
+}
+
+/*
+ * Reset the aggregated monitoring results
+ */
+static void kdamond_reset_aggregated(struct damon_ctx *c)
+{
+	struct damon_task *t;
+	struct damon_region *r;
+
+	damon_for_each_task(t, c) {
+		damon_for_each_region(r, t)
+			r->nr_accesses = 0;
+	}
+}
+
+/*
+ * Check whether current monitoring should be stopped
+ *
+ * The monitoring is stopped when either the user requested to stop, or all
+ * monitoring target tasks are dead.
+ *
+ * Returns true if the current monitoring needs to stop.
+ */
+static bool kdamond_need_stop(struct damon_ctx *ctx)
+{
+	struct damon_task *t;
+	struct task_struct *task;
+	bool stop;
+
+	mutex_lock(&ctx->kdamond_lock);
+	stop = ctx->kdamond_stop;
+	mutex_unlock(&ctx->kdamond_lock);
+	if (stop)
+		return true;
+
+	damon_for_each_task(t, ctx) {
+		/* -1 is reserved for non-process bounded monitoring */
+		if (t->pid == -1)
+			return false;
+
+		task = damon_get_task_struct(t);
+		if (task) {
+			put_task_struct(task);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * The monitoring daemon that runs as a kernel thread
+ */
+static int kdamond_fn(void *data)
+{
+	struct damon_ctx *ctx = (struct damon_ctx *)data;
+	struct damon_task *t;
+	struct damon_region *r, *next;
+
+	pr_info("kdamond (%d) starts\n", ctx->kdamond->pid);
+	if (ctx->init_target_regions)
+		ctx->init_target_regions(ctx);
+	while (!kdamond_need_stop(ctx)) {
+		if (ctx->prepare_access_checks)
+			ctx->prepare_access_checks(ctx);
+		if (ctx->sample_cb)
+			ctx->sample_cb(ctx);
+
+		usleep_range(ctx->sample_interval, ctx->sample_interval + 1);
+
+		if (ctx->check_accesses)
+			ctx->check_accesses(ctx);
+
+		if (kdamond_aggregate_interval_passed(ctx)) {
+			if (ctx->aggregate_cb)
+				ctx->aggregate_cb(ctx);
+			kdamond_reset_aggregated(ctx);
+		}
+
+	}
+	damon_for_each_task(t, ctx) {
+		damon_for_each_region_safe(r, next, t)
+			damon_destroy_region(r);
+	}
+	pr_debug("kdamond (%d) finishes\n", ctx->kdamond->pid);
+	mutex_lock(&ctx->kdamond_lock);
+	ctx->kdamond = NULL;
+	mutex_unlock(&ctx->kdamond_lock);
+
+	do_exit(0);
+}
+
+/*
+ * Functions for the DAMON programming interface
+ */
+
+static bool damon_kdamond_running(struct damon_ctx *ctx)
+{
+	bool running;
+
+	mutex_lock(&ctx->kdamond_lock);
+	running = ctx->kdamond != NULL;
+	mutex_unlock(&ctx->kdamond_lock);
+
+	return running;
+}
+
+/**
+ * damon_start() - Starts monitoring with given context.
+ * @ctx:	monitoring context
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_start(struct damon_ctx *ctx)
+{
+	int err = -EBUSY;
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (!ctx->kdamond) {
+		err = 0;
+		ctx->kdamond_stop = false;
+		ctx->kdamond = kthread_run(kdamond_fn, ctx, "kdamond");
+		if (IS_ERR(ctx->kdamond))
+			err = PTR_ERR(ctx->kdamond);
+	}
+	mutex_unlock(&ctx->kdamond_lock);
+
+	return err;
+}
+
+/**
+ * damon_stop() - Stops monitoring of given context.
+ * @ctx:	monitoring context
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_stop(struct damon_ctx *ctx)
+{
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond) {
+		ctx->kdamond_stop = true;
+		mutex_unlock(&ctx->kdamond_lock);
+		while (damon_kdamond_running(ctx))
+			usleep_range(ctx->sample_interval,
+					ctx->sample_interval * 2);
+		return 0;
+	}
+	mutex_unlock(&ctx->kdamond_lock);
+
+	return -EPERM;
+}
+
+/**
+ * damon_set_pids() - Set monitoring target processes.
+ * @ctx:	monitoring context
+ * @pids:	array of target processes pids
+ * @nr_pids:	number of entries in @pids
+ *
+ * This function should not be called while the kdamond is running.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_set_pids(struct damon_ctx *ctx, int *pids, ssize_t nr_pids)
+{
+	ssize_t i;
+	struct damon_task *t, *next;
+
+	damon_for_each_task_safe(t, next, ctx)
+		damon_destroy_task(t);
+
+	for (i = 0; i < nr_pids; i++) {
+		t = damon_new_task(pids[i]);
+		if (!t) {
+			pr_err("Failed to alloc damon_task\n");
+			return -ENOMEM;
+		}
+		damon_add_task(ctx, t);
+	}
+
+	return 0;
+}
+
+/**
+ * damon_set_attrs() - Set attributes for the monitoring.
+ * @ctx:		monitoring context
+ * @sample_int:		time interval between samplings
+ * @aggr_int:		time interval between aggregations
+ * @min_nr_reg:		minimal number of regions
+ *
+ * This function should not be called while the kdamond is running.
+ * Every time interval is in micro-seconds.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
+		unsigned long aggr_int, unsigned long min_nr_reg)
+{
+	if (min_nr_reg < 3) {
+		pr_err("min_nr_regions (%lu) must be at least 3\n",
+				min_nr_reg);
+		return -EINVAL;
+	}
+
+	ctx->sample_interval = sample_int;
+	ctx->aggr_interval = aggr_int;
+	ctx->min_nr_regions = min_nr_reg;
+
+	return 0;
+}
+
 /*
  * Functions for the module loading/unloading
  */
-- 
2.17.1
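
For readers who want to experiment with the sampling and aggregation logic
outside the kernel, the loop in kdamond_fn() could be approximated in
userspace roughly as below.  This is only a sketch under assumptions and not
part of the patch: every sketch_* name is hypothetical, time is a simulated
microsecond counter that stands in for ktime_get_coarse_ts64() and
usleep_range(), and the check callback stands in for whatever an address
space specific check_accesses implementation would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a monitoring region. */
struct sketch_region {
	unsigned long start;
	unsigned long end;
	unsigned int nr_accesses;
};

/* Hypothetical callback types; 'step' is the index of the sampling step. */
typedef bool (*sketch_check_fn)(const struct sketch_region *r,
		unsigned int step);
typedef void (*sketch_aggr_fn)(const struct sketch_region *regions,
		size_t nr);

/*
 * Same idea as damon_check_reset_time_interval(), but on plain microsecond
 * counters: report whether the interval has passed since the baseline and,
 * if so, move the baseline to the current time.
 */
static bool sketch_interval_passed(unsigned long long *baseline_us,
		unsigned long long now_us, unsigned long interval_us)
{
	if (now_us - *baseline_us < interval_us)
		return false;
	*baseline_us = now_us;
	return true;
}

/* Example check callback: only the region starting at address 0 is hot. */
static bool sketch_first_region_hot(const struct sketch_region *r,
		unsigned int step)
{
	(void)step;
	return r->start == 0;
}

/*
 * Run nr_steps sampling steps over the regions.  One region costs one check
 * per step, regardless of how many pages it covers.  Returns the number of
 * aggregation intervals that have passed.
 */
static unsigned int sketch_monitor(struct sketch_region *regions, size_t nr,
		unsigned long sample_us, unsigned long aggr_us,
		unsigned int nr_steps, sketch_check_fn check,
		sketch_aggr_fn aggregate_cb)
{
	unsigned long long now_us = 0, last_aggr_us = 0;
	unsigned int nr_aggrs = 0;
	unsigned int step;
	size_t i;

	assert(sample_us > 0 && aggr_us >= sample_us);

	for (step = 0; step < nr_steps; step++) {
		now_us += sample_us;	/* simulated sleep for one sampling interval */

		for (i = 0; i < nr; i++) {
			if (check(&regions[i], step))
				regions[i].nr_accesses++;
		}

		if (!sketch_interval_passed(&last_aggr_us, now_us, aggr_us))
			continue;

		/* Let the user read the results, then reset them. */
		if (aggregate_cb)
			aggregate_cb(regions, nr);
		for (i = 0; i < nr; i++)
			regions[i].nr_accesses = 0;
		nr_aggrs++;
	}
	return nr_aggrs;
}
```

With a 5000 microsecond sampling interval and a 100000 microsecond
aggregation interval, a region in this sketch is checked 20 times per
aggregation window, so nr_accesses stays within 0..20 when a callback reads
it.  That bounded count is one way to read the "ratio" wording in the commit
message; the check cost depends on the number of regions, not on the number
of pages they cover.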