Date: Thu, 10 Jan 2019 09:44:32 -0800
Message-Id: <20190110174432.82064-1-shakeelb@google.com>
Subject: [PATCH v3] memcg: schedule high reclaim for remote memcgs on high_work
From: Shakeel Butt
To: Michal Hocko, Andrew Morton, Johannes Weiner, Vladimir Davydov
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Shakeel Butt

If a memcg is over its high limit, memory reclaim is scheduled to run on
return-to-userland. However, that reclaim assumes the memcg is the
current process's memcg. With remote memcg charging for kmem, or when
swapping in a page charged to a remote memcg, the current process can
trigger reclaim on a remote memcg. Scheduling reclaim on
return-to-userland for a remote memcg therefore ignores the high reclaim
for that memcg altogether.

So, record the memcg needing high reclaim and trigger high reclaim for
that memcg on return-to-userland. However, if a memcg is already
recorded for high reclaim and the recorded memcg is not a descendant of
the memcg needing high reclaim, punt the high reclaim to the work queue.

Signed-off-by: Shakeel Butt
---
Changelog since v2:
- TIF_NOTIFY_RESUME can be set from places other than try_charge(), in
  which case current->memcg_high_reclaim will be NULL. Handle such
  scenarios correctly.

Changelog since v1:
- Punt high reclaim of a memcg to the work queue only if the recorded
  memcg is not its descendant.

 include/linux/sched.h |  3 +++
 kernel/fork.c         |  1 +
 mm/memcontrol.c       | 22 ++++++++++++++++------
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7d08562eeec7..5e6690042497 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1172,6 +1172,9 @@ struct task_struct {
 
 	/* Used by memcontrol for targeted memcg charge: */
 	struct mem_cgroup		*active_memcg;
+
+	/* Used by memcontrol for high reclaim: */
+	struct mem_cgroup		*memcg_high_reclaim;
 #endif
 
 #ifdef CONFIG_BLK_CGROUP
diff --git a/kernel/fork.c b/kernel/fork.c
index 1b0fde63d831..85da44137847 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -918,6 +918,7 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 
 #ifdef CONFIG_MEMCG
 	tsk->active_memcg = NULL;
+	tsk->memcg_high_reclaim = NULL;
 #endif
 	return tsk;
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 953d4ba8a595..18f4aefbe0bf 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2168,14 +2168,17 @@ static void high_work_func(struct work_struct *work)
 void mem_cgroup_handle_over_high(void)
 {
 	unsigned int nr_pages = current->memcg_nr_pages_over_high;
-	struct mem_cgroup *memcg;
+	struct mem_cgroup *memcg = current->memcg_high_reclaim;
 
 	if (likely(!nr_pages))
 		return;
 
-	memcg = get_mem_cgroup_from_mm(current->mm);
+	if (!memcg)
+		memcg = get_mem_cgroup_from_mm(current->mm);
+
 	reclaim_high(memcg, nr_pages, GFP_KERNEL);
 	css_put(&memcg->css);
+	current->memcg_high_reclaim = NULL;
 	current->memcg_nr_pages_over_high = 0;
 }
 
@@ -2329,10 +2332,10 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	 * If the hierarchy is above the normal consumption range, schedule
 	 * reclaim on returning to userland.  We can perform reclaim here
 	 * if __GFP_RECLAIM but let's always punt for simplicity and so that
-	 * GFP_KERNEL can consistently be used during reclaim.  @memcg is
-	 * not recorded as it most likely matches current's and won't
-	 * change in the meantime.  As high limit is checked again before
-	 * reclaim, the cost of mismatch is negligible.
+	 * GFP_KERNEL can consistently be used during reclaim. Record the memcg
+	 * for the return-to-userland high reclaim. If the memcg is already
+	 * recorded and the recorded memcg is not the descendant of the memcg
+	 * needing high reclaim, punt the high reclaim to the work queue.
 	 */
 	do {
 		if (page_counter_read(&memcg->memory) > memcg->high) {
@@ -2340,6 +2343,13 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			if (in_interrupt()) {
 				schedule_work(&memcg->high_work);
 				break;
+			} else if (!current->memcg_high_reclaim) {
+				css_get(&memcg->css);
+				current->memcg_high_reclaim = memcg;
+			} else if (!mem_cgroup_is_descendant(
+					current->memcg_high_reclaim, memcg)) {
+				schedule_work(&memcg->high_work);
+				break;
 			}
 			current->memcg_nr_pages_over_high += batch;
 			set_notify_resume(current);
-- 
2.20.1.97.g81188d93c3-goog
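
For review purposes, the record-or-punt decision added to try_charge() can be modelled in userspace. The sketch below is illustrative only and is not part of the patch. struct memcg, struct task, enum high_action, note_over_high() and high_reclaim_example() are hypothetical names. Reference counting (css_get()/css_put()), the in_interrupt() case and the real schedule_work() call are deliberately left out. It assumes, as the commit message implies, that reclaim started from a recorded memcg also covers that memcg's ancestors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only the parent link. */
struct memcg {
	struct memcg *parent;
};

/* Hypothetical stand-in for the new task_struct field. */
struct task {
	struct memcg *memcg_high_reclaim;
};

enum high_action {
	HIGH_RECORDED,		/* memcg stored for return-to-userland reclaim */
	HIGH_COVERED,		/* recorded memcg is a descendant, nothing to add */
	HIGH_PUNT_TO_WORK,	/* the patch calls schedule_work() here */
};

/* True if memcg is root itself or lies below root in the hierarchy. */
static bool is_descendant(const struct memcg *memcg, const struct memcg *root)
{
	for (; memcg; memcg = memcg->parent)
		if (memcg == root)
			return true;
	return false;
}

/* Models the else-if chain added to try_charge(), minus refcounting. */
static enum high_action note_over_high(struct task *t, struct memcg *memcg)
{
	if (!t->memcg_high_reclaim) {
		t->memcg_high_reclaim = memcg;
		return HIGH_RECORDED;
	}
	if (!is_descendant(t->memcg_high_reclaim, memcg))
		return HIGH_PUNT_TO_WORK;
	return HIGH_COVERED;
}

/* Walks one small hierarchy: root -> A -> A1, and root -> B. */
int high_reclaim_example(void)
{
	struct memcg root = { NULL };
	struct memcg a = { &root };
	struct memcg a1 = { &a };
	struct memcg b = { &root };
	struct task t = { NULL };

	assert(note_over_high(&t, &a1) == HIGH_RECORDED);
	assert(note_over_high(&t, &a) == HIGH_COVERED);
	assert(note_over_high(&t, &b) == HIGH_PUNT_TO_WORK);
	return 0;
}
```

In this model, once A1 is recorded, a later over-high event on A or on root needs no extra work because A1 is their descendant, while an event on the sibling B has to go to the work queue because the task can record only one memcg at a time.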