From: Yang Shi <yang.shi@linux.alibaba.com>
To: mhocko@suse.com, hannes@cmpxchg.org, shakeelb@google.com,
	akpm@linux-foundation.org
Cc: yang.shi@linux.alibaba.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [v3 PATCH 3/5] mm: memcontrol: introduce wipe_on_offline interface
Date: Thu, 10 Jan 2019 03:14:43 +0800
Message-Id: <1547061285-100329-4-git-send-email-yang.shi@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1547061285-100329-1-git-send-email-yang.shi@linux.alibaba.com>
References: <1547061285-100329-1-git-send-email-yang.shi@linux.alibaba.com>
We have some use cases that create and remove memcgs very frequently,
and the tasks in those memcgs may access only files that are unlikely
to be accessed by anyone else.  So we prefer to force_empty a memcg
before rmdir'ing it in order to reclaim its page cache, so that those
pages do not accumulate and incur unnecessary memory pressure; such
pressure may trigger direct reclaim and hurt latency-sensitive
applications.

force_empty serves this purpose, but it reclaims memory synchronously
when writing to memory.force_empty: the write may take some time to
return, and subsequent operations are blocked behind it.  Although the
write can be issued from the background, some use cases need to create
a new memcg with the same name as soon as the old one is deleted, so
the creation may still be blocked by the preceding reclaim/remove
operation.

Deferring the memory reclaim to cgroup offline sounds reasonable for
such use cases.  Introduce a new interface, wipe_on_offline, for both
the default and the legacy hierarchy, which performs the memory
reclaim in the css offline kworker.  Writing 1 enables it; writing 0
disables it.

Suggested-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 include/linux/memcontrol.h |  3 +++
 mm/memcontrol.c            | 53 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 83ae11c..2f1258a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -311,6 +311,9 @@ struct mem_cgroup {
 	struct list_head event_list;
 	spinlock_t event_list_lock;
 
+	/* Reclaim as much memory as possible in the offline kworker */
+	bool wipe_on_offline;
+
 	struct mem_cgroup_per_node *nodeinfo[0];
 	/* WARNING: nodeinfo must be the last member here */
 };
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index eaa3970..ff50810 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2918,6 +2918,35 @@ static ssize_t mem_cgroup_force_empty_write(struct kernfs_open_file *of,
 	return mem_cgroup_force_empty(memcg, true) ?: nbytes;
 }
 
+static int wipe_on_offline_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(seq_css(m));
+
+	seq_printf(m, "%lu\n", (unsigned long)memcg->wipe_on_offline);
+
+	return 0;
+}
+
+static int wipe_on_offline_write(struct cgroup_subsys_state *css,
+				 struct cftype *cft, u64 val)
+{
+	int ret = 0;
+
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	if (mem_cgroup_is_root(memcg))
+		return -EINVAL;
+
+	if (val == 0)
+		memcg->wipe_on_offline = false;
+	else if (val == 1)
+		memcg->wipe_on_offline = true;
+	else
+		ret = -EINVAL;
+
+	return ret;
+}
+
 static u64 mem_cgroup_hierarchy_read(struct cgroup_subsys_state *css,
 				     struct cftype *cft)
 {
@@ -4283,6 +4312,11 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
 		.write = mem_cgroup_reset,
 		.read_u64 = mem_cgroup_read_u64,
 	},
+	{
+		.name = "wipe_on_offline",
+		.seq_show = wipe_on_offline_show,
+		.write_u64 = wipe_on_offline_write,
+	},
 	{ },	/* terminate */
 };
 
@@ -4569,11 +4603,20 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	page_counter_set_min(&memcg->memory, 0);
 	page_counter_set_low(&memcg->memory, 0);
 
+	/*
+	 * Reclaim as much memory as possible when offlining.
+	 *
+	 * Do it after min/low have been reset, otherwise some memory
+	 * might still be protected by min/low.
+	 */
+	if (memcg->wipe_on_offline)
+		mem_cgroup_force_empty(memcg, false);
+	else
+		drain_all_stock(memcg);
+
 	memcg_offline_kmem(memcg);
 	wb_memcg_offline(memcg);
 
-	drain_all_stock(memcg);
-
 	mem_cgroup_id_put(memcg);
 }
 
@@ -5694,6 +5737,12 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 		.seq_show = memory_oom_group_show,
 		.write = memory_oom_group_write,
 	},
+	{
+		.name = "wipe_on_offline",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = wipe_on_offline_show,
+		.write_u64 = wipe_on_offline_write,
+	},
 	{ }	/* terminate */
 };
 
-- 
1.8.3.1
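
Usage note (illustrative, not part of the patch): the sketch below
shows how a task might enable wipe_on_offline before removing a memcg.
The mount point and the group name ("mygroup") are assumptions; adjust
them for your hierarchy.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Assumed path: legacy memory controller mounted at
	 * /sys/fs/cgroup/memory with a child memcg named "mygroup". */
	const char *knob =
		"/sys/fs/cgroup/memory/mygroup/memory.wipe_on_offline";
	int fd = open(knob, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Writing "1" defers the reclaim to the css offline kworker;
	 * writing "0" restores the default behavior.  A subsequent
	 * rmdir of the memcg then returns without waiting for the
	 * reclaim to finish. */
	if (write(fd, "1", 1) != 1)
		perror("write");

	close(fd);
	return 0;
}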