Received: by 2002:a05:6358:7058:b0:131:369:b2a3 with SMTP id 24csp9223086rwp; Thu, 20 Jul 2023 01:11:49 -0700 (PDT) X-Google-Smtp-Source: APBJJlF/r4ohYOVWt5JUTlxKLfgKdF4YkVELtiEcXVravEHG9MdnX1X7XWtE1A2lCw3OXa4+Hww1 X-Received: by 2002:a17:906:c189:b0:99b:4a6f:9c72 with SMTP id g9-20020a170906c18900b0099b4a6f9c72mr1740535ejz.72.1689840709130; Thu, 20 Jul 2023 01:11:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689840709; cv=none; d=google.com; s=arc-20160816; b=kEQV8JWQMmr20lDAEmSLrQ+krrOdU/s2zxqJ1nY5MNb/Sza0ZbtTlMcZGR0tr/j4YW k5cnHY6UvmOB7rlEvyT4bXnFvanNZsc8zI5LAMAKXAnRJXDX4Ma9QJckLxip2mv9pg0+ ROOGiFJjHm4yXZyd3i4m3UaAYtAFbMIwPjf9BBDkJj/Ltsfdf3TN8YnYu956iZ9I3U/w ZKi1eLc1WoDPitb0qa9TFhh3Z6Dg18PSzJD1Rg/twP/5Uw3SUQLFBUcfJXhIvt7vhLRB Z9c8P6eHZ5wxyeMRWZDFbHYe3lzsNpWmpCPlIxHuMEd6ZpOjzMCqY3N2+nqt4y34KMaE BBzg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=VyMIM9+tcYiRGMTsBGjvi0vU3Mwws8djANktEDDglwA=; fh=5QaWCNvsrziG43oP6cMwR1zyvEmdAKz+qAWSnAk1QIY=; b=skt7rc0WMuW1SJOerh337UUgC1QrB4m2pfSsivs4nBSqA9dOs6Emt+QzYmoZ/lx8/N XO9vDtpYPYdzUB4hfViEmEb0JR3vs0CuTc+tEihXZMW+GLpguCAhxamRCeJ6tPP0pBaW UrEAWE4842SUUf/BicOJjPQQH+ZskRiAYD146I+bnMJYPo2Z49bdLB8L7Q/7vdqKcmSP Xkb47fB9Dl7CV0njgT2srpIaaISF1gdlGJJw6irYThLfMJ352vQCObovpzg3S5J6KfKV hQK4Beuq//LhhXjvucDZ2W2lwHpMlSog0pw125USrm9M8+4Ww9jig9j1W4a0unALuuLK I/1Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=byOeZzDw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id l4-20020a170906a40400b00992beea60c2si318659ejz.826.2023.07.20.01.11.24; Thu, 20 Jul 2023 01:11:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=byOeZzDw; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231497AbjGTHJe (ORCPT + 99 others); Thu, 20 Jul 2023 03:09:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43058 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230291AbjGTHIu (ORCPT ); Thu, 20 Jul 2023 03:08:50 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C9AC270C for ; Thu, 20 Jul 2023 00:08:43 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-cb8263615d7so475357276.1 for ; Thu, 20 Jul 2023 00:08:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1689836922; x=1692428922; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=VyMIM9+tcYiRGMTsBGjvi0vU3Mwws8djANktEDDglwA=; b=byOeZzDwSe+TlCjYRKBoXqqWywqR6C1yG/A40dOXCXNDbvhvWImB25mBWajRKzlpsd 7pw0yGkSnP78uE/qIJ91xiFtjZ8WsotdIkdUAUo9pCwMIjy6NapBzMWDaD5jSQ5WRKqG Se0fctUKe/3aEu5ubRL9pul76zHCCtINkNBEyKURGH2aFpknJ55HycotBhqzgpzYPQKE UJtv9JLEiNXgfyyCUh834Zn94be/WG5kodBP7zmNrBOk5cu/mZ3dShiUkQYtjSSYO1Rs 4gRgPSThLVz2L65vzOc7OySwGSHuufaREX+F7eqZX0F2Ko4WmwxK9yLR5upAijtlIKWO +cGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689836922; x=1692428922; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=VyMIM9+tcYiRGMTsBGjvi0vU3Mwws8djANktEDDglwA=; b=l3lnzggkIXNlIM57sO+q5/QTKbraS6Ef+Q2r199w1mOkh8UTQj7uXC0GX1x3vax9E8 RTjxMQCwvwYKyscw14Do4jwFfGX9fhAfn/CbseZQ1bibfwKP8n+Fcom8ke+IqR5NIdDM jU0M5BaOUcvhrfhOdUfr4ZUCSW7TNQPofqtZAZzWcAwtD7LKAP89bRVZtLLdKZdfKOnY wozW30atVEu5DL7Ibugxd8g38S4KqCIj95+AXEHAmubT1OnRlsm+gFI8MQfIyDYWVu62 Pt2OOuu7O6cjKMvbSI5EW2P9nqbBwggA/RxsfDAcQHklpqf6NEWapZUNdgbtWDGSgLGO mCTg== X-Gm-Message-State: ABy/qLaJjnGHAs31CmIpaLaXHgmwfZX6RsVrKiARY8dRHIp7yVy5CGhB yKbWzTVSI/P+nhqCKnNl2RsxL+fV/BAYoqQB X-Received: from yosry.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:2327]) (user=yosryahmed job=sendgmr) by 2002:a05:6902:1709:b0:cab:e42c:876b with SMTP id by9-20020a056902170900b00cabe42c876bmr40935ybb.3.1689836922401; Thu, 20 Jul 2023 00:08:42 -0700 (PDT) Date: Thu, 20 Jul 2023 07:08:25 +0000 In-Reply-To: <20230720070825.992023-1-yosryahmed@google.com> Mime-Version: 1.0 References: <20230720070825.992023-1-yosryahmed@google.com> X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog Message-ID: <20230720070825.992023-9-yosryahmed@google.com> Subject: [RFC PATCH 8/8] selftests: cgroup: test_memcontrol: add a selftest for memcg recharging From: Yosry Ahmed To: Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Shakeel Butt Cc: Muchun Song , "Matthew Wilcox (Oracle)" , Tejun Heo , Zefan Li , Yu Zhao , Luis Chamberlain , Kees Cook , Iurii Zaikin , "T.J. Mercier" , Greg Thelen , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, Yosry Ahmed Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When a memcg is removed, any mapped pages charged to it are recharged to the memcg of the process(es) mapping them. Any remaining pages are recharged using deferred recharge on the next time they are accessed or ditied. Add a selftest that exercises these paths for shmem and normal files: - A page is recharged on offlining if it is already mapped into the address space of a process in a different memcg. - A page is recharged after offlining when written to by a process in a different memcg (if the write results in dirtying the page). - A page is recharged after offlining when read by a process in a different memcg. - A page is recharged after offlining when mapped by a process in a different memcg. Signed-off-by: Yosry Ahmed --- tools/testing/selftests/cgroup/cgroup_util.c | 14 + tools/testing/selftests/cgroup/cgroup_util.h | 1 + .../selftests/cgroup/test_memcontrol.c | 310 ++++++++++++++++++ 3 files changed, 325 insertions(+) diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c index e8bbbdb77e0d..e853b2a4db77 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.c +++ b/tools/testing/selftests/cgroup/cgroup_util.c @@ -519,6 +519,20 @@ int is_swap_enabled(void) return cnt > 1; } + +int is_memcg_recharging_enabled(void) +{ + char buf[10]; + bool enabled; + + if (read_text("/proc/sys/vm/recharge_offline_memcgs", + buf, sizeof(buf)) <= 0) + return -1; + + enabled = strtol(buf, NULL, 10); + return enabled; +} + int set_oom_adj_score(int pid, int score) { char path[PATH_MAX]; diff --git a/tools/testing/selftests/cgroup/cgroup_util.h b/tools/testing/selftests/cgroup/cgroup_util.h index c92df4e5d395..10c0fa36bfd7 100644 --- a/tools/testing/selftests/cgroup/cgroup_util.h +++ b/tools/testing/selftests/cgroup/cgroup_util.h @@ -49,6 +49,7 @@ extern int get_temp_fd(void); extern int alloc_pagecache(int fd, size_t size); extern int alloc_anon(const char *cgroup, void *arg); extern int is_swap_enabled(void); +extern int is_memcg_recharging_enabled(void); extern int set_oom_adj_score(int pid, int score); extern int cg_wait_for_proc_count(const char *cgroup, int count); extern int cg_killall(const char *cgroup); diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index c7c9572003a8..4e1ea93e0a54 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include "../kselftest.h" #include "cgroup_util.h" @@ -1287,6 +1289,313 @@ static int test_memcg_oom_group_score_events(const char *root) return ret; } +/* Map 50M from the beginning of a file */ +static int map_fd_50M_noexit(const char *cgroup, void *arg) +{ + size_t size = MB(50); + int ppid = getppid(); + int fd = (long)arg; + char *memory; + + memory = mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0); + if (memory == MAP_FAILED) { + fprintf(stderr, "error: mmap, errno %d\n", errno); + return -1; + } + + while (getppid() == ppid) + sleep(1); + + munmap(memory, size); + return 0; +} + +/* + * Write 50M to the beginning of a file. + * The file is sync'ed first to make sure any dirty pages are laundered before + * we dirty them again. + */ +static int write_fd_50M(const char *cgroup, void *arg) +{ + size_t size = MB(50); + int fd = (long)arg; + char buf[PAGE_SIZE]; + int i; + + fsync(fd); + lseek(fd, 0, SEEK_SET); + for (i = 0; i < size; i += sizeof(buf)) + write(fd, buf, sizeof(buf)); + + return 0; +} + +/* See write_fd_50M() */ +static int write_fd_50M_noexit(const char *cgroup, void *arg) +{ + int ppid = getppid(); + + write_fd_50M(cgroup, arg); + + while (getppid() == ppid) + sleep(1); + + return 0; +} + +/* Read 50M from the beginning of a file */ +static int read_fd_50M_noexit(const char *cgroup, void *arg) +{ + size_t size = MB(50); + int ppid = getppid(); + int fd = (long)arg; + char buf[PAGE_SIZE]; + int i; + + lseek(fd, 0, SEEK_SET); + for (i = 0; i < size; i += sizeof(buf)) + read(fd, buf, sizeof(buf)); + + while (getppid() == ppid) + sleep(1); + + return 0; +} + +#define TEST_RECHARGE_DIR "/test-recharge" + +static int __test_memcg_recharge(const char *root, char *stat_name) +{ + char *parent = NULL, *child1 = NULL, *child2 = NULL; + long stat, prev, pstat, current; + int ret = KSFT_FAIL; + char file_path[256]; + int i, pid; + struct { + int fd; + int (*before_fn)(const char *cgroup, void *arg); + int (*after_fn)(const char *cgroup, void *arg); + } test_files[] = { + /* test recharge for already mapped file */ + { + .before_fn = map_fd_50M_noexit, + }, + /* test recharge on new mapping after offline */ + { + .after_fn = map_fd_50M_noexit, + }, + /* test recharge on write after offline */ + { + .after_fn = write_fd_50M_noexit, + }, + /* test recharge on read after offline */ + { + .after_fn = read_fd_50M_noexit, + } + }; + + parent = cg_name(root, "parent"); + if (!parent) + goto cleanup; + + if (cg_create(parent)) + goto cleanup; + + if (cg_write(parent, "cgroup.subtree_control", "+memory")) + goto cleanup; + + child1 = cg_name(parent, "child1"); + if (!child1) + goto cleanup; + + if (cg_create(child1)) + goto cleanup; + + child2 = cg_name(parent, "child2"); + if (!child2) + goto cleanup; + + if (cg_create(child2)) + goto cleanup; + + for (i = 0; i < ARRAY_SIZE(test_files); i++) { + long target = MB(50) * (i+1); /* 50MB per file */ + int fd; + + snprintf(file_path, sizeof(file_path), "%s/file%d", + TEST_RECHARGE_DIR, i); + + fd = open(file_path, O_CREAT | O_RDWR); + if (fd < 0) + goto cleanup; + + test_files[i].fd = fd; + if (cg_run(child1, write_fd_50M, (void *)(long) fd)) + goto cleanup; + + stat = 0; + do { + sleep(1); + prev = stat; + stat = cg_read_key_long(child1, "memory.stat", + stat_name); + } while (stat < target && stat > prev); + + if (stat < target) { + fprintf(stderr, "error: child1 %s %ld < %ld", + stat_name, stat, target); + goto cleanup; + } + + current = cg_read_long(child1, "memory.current"); + if (current < target) { + fprintf(stderr, "error: child1 current %ld < %ld", + current, target); + goto cleanup; + } + + if (test_files[i].before_fn) { + pid = cg_run_nowait(child2, test_files[i].before_fn, + (void *)(long)fd); + if (pid < 0) + goto cleanup; + /* make sure before_fn() finishes executing before offlining */ + sleep(1); + } + } + + current = cg_read_long(child2, "memory.current"); + if (current > MB(1)) { + fprintf(stderr, "error: child2 current %ld > 1M\n", current); + goto cleanup; + } + + stat = cg_read_key_long(child2, "memory.stat", stat_name); + if (stat > 0) { + fprintf(stderr, "error: child2 %s %ld > 0\n", + stat_name, stat); + goto cleanup; + } + + if (cg_destroy(child1) < 0) + goto cleanup; + + for (i = 0; i < ARRAY_SIZE(test_files); i++) { + long target = MB(50) * (i+1); + int fd = test_files[i].fd; + + if (test_files[i].after_fn) { + pid = cg_run_nowait(child2, test_files[i].after_fn, + (void *)(long)fd); + if (pid < 0) + goto cleanup; + } + + stat = 0; + do { + sleep(1); + prev = stat; + stat = cg_read_key_long(child2, "memory.stat", + stat_name); + } while (stat < target && stat > prev); + + if (stat < target) { + fprintf(stderr, "error: child2 %s %ld < %ld\n", + stat_name, stat, target); + goto cleanup; + } + + current = cg_read_long(child2, "memory.current"); + if (current < target) { + fprintf(stderr, "error: child2 current %ld < %ld\n", + current, target); + goto cleanup; + } + } + + pstat = cg_read_key_long(parent, "memory.stat", stat_name); + if (stat < pstat) { + fprintf(stderr, "error: recharged %s (%ld) < total (%ld)\n", + stat_name, stat, pstat); + goto cleanup; + } + + ret = KSFT_PASS; +cleanup: + if (child2) { + cg_destroy(child2); + free(child2); + } + if (child1) { + cg_destroy(child1); + free(child1); + } + if (parent) { + cg_destroy(parent); + free(parent); + } + for (i = 0; i < ARRAY_SIZE(test_files); i++) { + close(test_files[i].fd); + snprintf(file_path, sizeof(file_path), "%s/file%d", + TEST_RECHARGE_DIR, i); + remove(file_path); + } + return ret; +} + +static int test_memcg_recharge(const char *root) +{ + int i, ret = KSFT_PASS; + struct { + char *mount_type, *stat_name; + } test_setups[] = { + /* test both shmem & normal files */ + { + .mount_type = "tmpfs", + .stat_name = "shmem", + }, + { + .stat_name = "file", + } + }; + + if (!is_memcg_recharging_enabled()) + return KSFT_SKIP; + + if (unshare(CLONE_NEWNS) < 0) + return KSFT_FAIL; + + if (mount(NULL, "/", "", MS_REC | MS_PRIVATE, NULL) < 0) + return KSFT_FAIL; + + for (i = 0; i < ARRAY_SIZE(test_setups); i++) { + int setup_ret = KSFT_FAIL; + char *mount_type = test_setups[i].mount_type; + char *stat_name = test_setups[i].stat_name; + + if (mkdir(TEST_RECHARGE_DIR, 0777) < 0) + goto next; + + if (mount_type && + mount(NULL, TEST_RECHARGE_DIR, mount_type, 0, NULL) < 0) + goto next; + + setup_ret = __test_memcg_recharge(root, stat_name); + +next: + if (mount_type) + umount(TEST_RECHARGE_DIR); + remove(TEST_RECHARGE_DIR); + + if (setup_ret == KSFT_FAIL) { + ret = KSFT_FAIL; + break; + } + } + umount("/"); + return ret; +} + #define T(x) { x, #x } struct memcg_test { int (*fn)(const char *root); @@ -1306,6 +1615,7 @@ struct memcg_test { T(test_memcg_oom_group_leaf_events), T(test_memcg_oom_group_parent_events), T(test_memcg_oom_group_score_events), + T(test_memcg_recharge), }; #undef T -- 2.41.0.255.g8b1d071c50-goog