Date: Fri, 25 May 2018 01:37:10 +0200
From: Andrea Parri
To: Andrew Morton
Biederman" , Johannes Weiner , Kirill Tkhai , peterz@infradead.org, viro@zeniv.linux.org.uk, mingo@kernel.org, paulmck@linux.vnet.ibm.com, keescook@chromium.org, riel@redhat.com, tglx@linutronix.de, kirill.shutemov@linux.intel.com, marcos.souza.org@gmail.com, hoeun.ryu@gmail.com, pasha.tatashin@oracle.com, gs051095@gmail.com, dhowells@redhat.com, rppt@linux.vnet.ibm.com, linux-kernel@vger.kernel.org, Balbir Singh , Tejun Heo , Oleg Nesterov Subject: Re: [PATCH 0/2] mm->owner to mm->memcg fixes Message-ID: <20180524233710.GA2993@andrea> References: <20180504145435.GA26573@redhat.com> <87y3gzfmjt.fsf@xmission.com> <20180504162209.GB26573@redhat.com> <871serfk77.fsf@xmission.com> <87tvrncoyc.fsf_-_@xmission.com> <20180510121418.GD5325@dhcp22.suse.cz> <20180522125757.GL20020@dhcp22.suse.cz> <87wovu889o.fsf@xmission.com> <20180524111002.GB20441@dhcp22.suse.cz> <20180524141635.c99b7025a73a709e179f92a2@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180524141635.c99b7025a73a709e179f92a2@linux-foundation.org> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 24, 2018 at 02:16:35PM -0700, Andrew Morton wrote: > On Thu, 24 May 2018 13:10:02 +0200 Michal Hocko wrote: > > > I would really prefer and appreciate a repost with all the fixes folded > > in. > > [1/2] > > From: "Eric W. Biederman" > Subject: memcg: replace mm->owner with mm->memcg > > Recently it was reported that mm_update_next_owner could get into cases > where it was executing its fallback for_each_process part of the loop and > thus taking up a lot of time. Reference? > > To deal with this replace mm->owner with mm->memcg. This just reduces the > complexity of everything. "the complexity of everything"? > As much as possible I have maintained the > current semantics. "As much as possible"? > There are two siginificant exceptions. s/siginificant/significant > During fork > the memcg of the process calling fork is charged rather than init_css_set. > During memory cgroup migration the charges are migrated not if the > process is the owner of the mm, but if the process being migrated has the > same memory cgroup as the mm. > > I believe it was a bug It was a bug or not?? > if init_css_set is charged for memory activity > during fork, and the old behavior was simply a consequence of the new task > not having tsk->cgroup not initialized to it's proper cgroup. > > During cgroup migration only thread group leaders are allowed to migrate. > Which means in practice there should only be one. "in practice there should"?? > Linux tasks created > with CLONE_VM are the only exception, but the common cases are already > ruled out. Processes created with vfork have a suspended parent and can > do nothing but call exec so they should never show up. Threads of the > same cgroup are not the thread group leader so also should not show up. > That leaves the old LinuxThreads library which is probably out of use by "probably"??? > now, and someone doing something very creative with cgroups, "very creative"? > and rolling > their own threads with CLONE_VM. So in practice I don't think "in practice I don't think"?? Andrea > the > difference charge migration will affect anyone. > > To ensure that mm->memcg is updated appropriately I have implemented > cgroup "attach" and "fork" methods. 
> This ensures that at those points the
> mm pointed to the task has the appropriate memory cgroup.
>
> For simplicity instead of introducing a new mm lock I simply use exchange
> on the pointer where the mm->memcg is updated to get atomic updates.
>
> Looking at the history effectively this change is a revert. The reason
> given for adding mm->owner is so that multiple cgroups can be attached to
> the same mm. In the last 8 years a second user of mm->owner has not
> appeared. A feature that has never used, makes the code more complicated
> and has horrible worst case performance should go.
>
> [ebiederm@xmission.com: update to work when !CONFIG_MMU]
> Link: http://lkml.kernel.org/r/87lgczcox0.fsf_-_@xmission.com
> [ebiederm@xmission.com: close race between migration and installing bprm->mm as mm]
> Link: http://lkml.kernel.org/r/87fu37cow4.fsf_-_@xmission.com
> Link: http://lkml.kernel.org/r/87lgd1zww0.fsf_-_@xmission.com
> Fixes: cf475ad28ac3 ("cgroups: add an owner to the mm_struct")
> Signed-off-by: "Eric W. Biederman"
> Reported-by: Kirill Tkhai
> Acked-by: Johannes Weiner
> Cc: Michal Hocko
> Cc: "Kirill A. Shutemov"
> Cc: Tejun Heo
> Cc: Oleg Nesterov
> Signed-off-by: Andrew Morton
> ---
>
>  fs/exec.c                  |    3 -
>  include/linux/memcontrol.h |   16 +++++-
>  include/linux/mm_types.h   |   12 ----
>  include/linux/sched/mm.h   |    8 ---
>  kernel/exit.c              |   89 -----------------------------------
>  kernel/fork.c              |   17 +++++-
>  mm/debug.c                 |    4 -
>  mm/memcontrol.c            |   81 +++++++++++++++++++++++--------
>  8 files changed, 93 insertions(+), 137 deletions(-)
>
> diff -puN fs/exec.c~memcg-replace-mm-owner-with-mm-memcg fs/exec.c
> --- a/fs/exec.c~memcg-replace-mm-owner-with-mm-memcg
> +++ a/fs/exec.c
> @@ -1040,11 +1040,12 @@ static int exec_mmap(struct mm_struct *m
>                  up_read(&old_mm->mmap_sem);
>                  BUG_ON(active_mm != old_mm);
>                  setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
> -                mm_update_next_owner(old_mm);
>                  mmput(old_mm);
>                  return 0;
>          }
>          mmdrop(active_mm);
> +        /* The tsk may have migrated before the new mm was attached */
> +        mm_sync_memcg_from_task(tsk);
>          return 0;
>  }
>
> diff -puN include/linux/memcontrol.h~memcg-replace-mm-owner-with-mm-memcg include/linux/memcontrol.h
> --- a/include/linux/memcontrol.h~memcg-replace-mm-owner-with-mm-memcg
> +++ a/include/linux/memcontrol.h
> @@ -345,7 +345,6 @@ out:
>  struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *);
>
>  bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
> -struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
>
>  struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
>
> @@ -408,6 +407,9 @@ static inline bool mem_cgroup_is_descend
>          return cgroup_is_descendant(memcg->css.cgroup, root->css.cgroup);
>  }
>
> +void mm_update_memcg(struct mm_struct *mm, struct mem_cgroup *new);
> +void mm_sync_memcg_from_task(struct task_struct *tsk);
> +
>  static inline bool mm_match_cgroup(struct mm_struct *mm,
>                                     struct mem_cgroup *memcg)
>  {
> @@ -415,7 +417,7 @@ static inline bool mm_match_cgroup(struc
>          bool match = false;
>
>          rcu_read_lock();
> -        task_memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> +        task_memcg = rcu_dereference(mm->memcg);
>          if (task_memcg)
>                  match = mem_cgroup_is_descendant(task_memcg, memcg);
>          rcu_read_unlock();
> @@ -699,7 +701,7 @@ static inline void count_memcg_event_mm(
>                  return;
>
>          rcu_read_lock();
> -        memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> +        memcg = rcu_dereference(mm->memcg);
>          if (likely(memcg)) {
>                  count_memcg_events(memcg, idx, 1);
>                  if (idx == OOM_KILL)
> @@ -787,6 +789,14 @@ static inline struct lruvec *mem_cgroup_
>          return &pgdat->lruvec;
>  }
>
> +static inline void mm_update_memcg(struct mm_struct *mm, struct mem_cgroup *new)
> +{
> +}
> +
> +static inline void mm_sync_memcg_from_task(struct task_struct *tsk)
> +{
> +}
> +
>  static inline bool mm_match_cgroup(struct mm_struct *mm,
>                                     struct mem_cgroup *memcg)
>  {
> diff -puN include/linux/mm_types.h~memcg-replace-mm-owner-with-mm-memcg include/linux/mm_types.h
> --- a/include/linux/mm_types.h~memcg-replace-mm-owner-with-mm-memcg
> +++ a/include/linux/mm_types.h
> @@ -445,17 +445,7 @@ struct mm_struct {
>          struct kioctx_table __rcu *ioctx_table;
>  #endif
>  #ifdef CONFIG_MEMCG
> -        /*
> -         * "owner" points to a task that is regarded as the canonical
> -         * user/owner of this mm. All of the following must be true in
> -         * order for it to be changed:
> -         *
> -         * current == mm->owner
> -         * current->mm != mm
> -         * new_owner->mm == mm
> -         * new_owner->alloc_lock is held
> -         */
> -        struct task_struct __rcu *owner;
> +        struct mem_cgroup __rcu *memcg;
>  #endif
>          struct user_namespace *user_ns;
>
> diff -puN include/linux/sched/mm.h~memcg-replace-mm-owner-with-mm-memcg include/linux/sched/mm.h
> --- a/include/linux/sched/mm.h~memcg-replace-mm-owner-with-mm-memcg
> +++ a/include/linux/sched/mm.h
> @@ -95,14 +95,6 @@ extern struct mm_struct *mm_access(struc
>  /* Remove the current tasks stale references to the old mm_struct */
>  extern void mm_release(struct task_struct *, struct mm_struct *);
>
> -#ifdef CONFIG_MEMCG
> -extern void mm_update_next_owner(struct mm_struct *mm);
> -#else
> -static inline void mm_update_next_owner(struct mm_struct *mm)
> -{
> -}
> -#endif /* CONFIG_MEMCG */
> -
>  #ifdef CONFIG_MMU
>  extern void arch_pick_mmap_layout(struct mm_struct *mm,
>                                    struct rlimit *rlim_stack);
> diff -puN kernel/exit.c~memcg-replace-mm-owner-with-mm-memcg kernel/exit.c
> --- a/kernel/exit.c~memcg-replace-mm-owner-with-mm-memcg
> +++ a/kernel/exit.c
> @@ -399,94 +399,6 @@ kill_orphaned_pgrp(struct task_struct *t
>          }
>  }
>
> -#ifdef CONFIG_MEMCG
> -/*
> - * A task is exiting. If it owned this mm, find a new owner for the mm.
> - */
> -void mm_update_next_owner(struct mm_struct *mm)
> -{
> -        struct task_struct *c, *g, *p = current;
> -
> -retry:
> -        /*
> -         * If the exiting or execing task is not the owner, it's
> -         * someone else's problem.
> -         */
> -        if (mm->owner != p)
> -                return;
> -        /*
> -         * The current owner is exiting/execing and there are no other
> -         * candidates. Do not leave the mm pointing to a possibly
> -         * freed task structure.
> -         */
> -        if (atomic_read(&mm->mm_users) <= 1) {
> -                mm->owner = NULL;
> -                return;
> -        }
> -
> -        read_lock(&tasklist_lock);
> -        /*
> -         * Search in the children
> -         */
> -        list_for_each_entry(c, &p->children, sibling) {
> -                if (c->mm == mm)
> -                        goto assign_new_owner;
> -        }
> -
> -        /*
> -         * Search in the siblings
> -         */
> -        list_for_each_entry(c, &p->real_parent->children, sibling) {
> -                if (c->mm == mm)
> -                        goto assign_new_owner;
> -        }
> -
> -        /*
> -         * Search through everything else, we should not get here often.
> -         */
> -        for_each_process(g) {
> -                if (g->flags & PF_KTHREAD)
> -                        continue;
> -                for_each_thread(g, c) {
> -                        if (c->mm == mm)
> -                                goto assign_new_owner;
> -                        if (c->mm)
> -                                break;
> -                }
> -        }
> -        read_unlock(&tasklist_lock);
> -        /*
> -         * We found no owner yet mm_users > 1: this implies that we are
> -         * most likely racing with swapoff (try_to_unuse()) or /proc or
> -         * ptrace or page migration (get_task_mm()). Mark owner as NULL.
> -         */
> -        mm->owner = NULL;
> -        return;
> -
> -assign_new_owner:
> -        BUG_ON(c == p);
> -        get_task_struct(c);
> -        /*
> -         * The task_lock protects c->mm from changing.
> -         * We always want mm->owner->mm == mm
> -         */
> -        task_lock(c);
> -        /*
> -         * Delay read_unlock() till we have the task_lock()
> -         * to ensure that c does not slip away underneath us
> -         */
> -        read_unlock(&tasklist_lock);
> -        if (c->mm != mm) {
> -                task_unlock(c);
> -                put_task_struct(c);
> -                goto retry;
> -        }
> -        mm->owner = c;
> -        task_unlock(c);
> -        put_task_struct(c);
> -}
> -#endif /* CONFIG_MEMCG */
> -
>  /*
>   * Turn us into a lazy TLB process if we
>   * aren't already..
> @@ -540,7 +452,6 @@ static void exit_mm(void)
>          up_read(&mm->mmap_sem);
>          enter_lazy_tlb(mm, current);
>          task_unlock(current);
> -        mm_update_next_owner(mm);
>          mmput(mm);
>          if (test_thread_flag(TIF_MEMDIE))
>                  exit_oom_victim();
> diff -puN kernel/fork.c~memcg-replace-mm-owner-with-mm-memcg kernel/fork.c
> --- a/kernel/fork.c~memcg-replace-mm-owner-with-mm-memcg
> +++ a/kernel/fork.c
> @@ -878,10 +878,19 @@ static void mm_init_aio(struct mm_struct
>  #endif
>  }
>
> -static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
> +static void mm_init_memcg(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_MEMCG
> -        mm->owner = p;
> +        struct cgroup_subsys_state *css;
> +
> +        /* Ensure mm->memcg is initialized */
> +        mm->memcg = NULL;
> +
> +        rcu_read_lock();
> +        css = task_css(current, memory_cgrp_id);
> +        if (css && css_tryget(css))
> +                mm_update_memcg(mm, mem_cgroup_from_css(css));
> +        rcu_read_unlock();
>  #endif
>  }
>
> @@ -912,7 +921,7 @@ static struct mm_struct *mm_init(struct
>          spin_lock_init(&mm->arg_lock);
>          mm_init_cpumask(mm);
>          mm_init_aio(mm);
> -        mm_init_owner(mm, p);
> +        mm_init_memcg(mm);
>          RCU_INIT_POINTER(mm->exe_file, NULL);
>          mmu_notifier_mm_init(mm);
>          hmm_mm_init(mm);
> @@ -942,6 +951,7 @@ static struct mm_struct *mm_init(struct
>  fail_nocontext:
>          mm_free_pgd(mm);
>  fail_nopgd:
> +        mm_update_memcg(mm, NULL);
>          free_mm(mm);
>          return NULL;
>  }
> @@ -979,6 +989,7 @@ static inline void __mmput(struct mm_str
>          }
>          if (mm->binfmt)
>                  module_put(mm->binfmt->module);
> +        mm_update_memcg(mm, NULL);
>          mmdrop(mm);
>  }
>
> diff -puN mm/debug.c~memcg-replace-mm-owner-with-mm-memcg mm/debug.c
> --- a/mm/debug.c~memcg-replace-mm-owner-with-mm-memcg
> +++ a/mm/debug.c
> @@ -116,7 +116,7 @@ void dump_mm(const struct mm_struct *mm)
>                  "ioctx_table %px\n"
>  #endif
>  #ifdef CONFIG_MEMCG
> -                "owner %px "
> +                "memcg %px "
>  #endif
>                  "exe_file %px\n"
>  #ifdef CONFIG_MMU_NOTIFIER
> @@ -147,7 +147,7 @@ void dump_mm(const struct mm_struct *mm)
>                  mm->ioctx_table,
>  #endif
>  #ifdef CONFIG_MEMCG
> -                mm->owner,
> +                mm->memcg,
>  #endif
>                  mm->exe_file,
>  #ifdef CONFIG_MMU_NOTIFIER
> diff -puN mm/memcontrol.c~memcg-replace-mm-owner-with-mm-memcg mm/memcontrol.c
> --- a/mm/memcontrol.c~memcg-replace-mm-owner-with-mm-memcg
> +++ a/mm/memcontrol.c
> @@ -664,20 +664,6 @@ static void memcg_check_events(struct me
>          }
>  }
>
> -struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
> -{
> -        /*
> -         * mm_update_next_owner() may clear mm->owner to NULL
> -         * if it races with swapoff, page migration, etc.
> -         * So this can be called with p == NULL.
> -         */
> -        if (unlikely(!p))
> -                return NULL;
> -
> -        return mem_cgroup_from_css(task_css(p, memory_cgrp_id));
> -}
> -EXPORT_SYMBOL(mem_cgroup_from_task);
> -
>  struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
>  {
>          struct mem_cgroup *memcg = NULL;
> @@ -692,7 +678,7 @@ struct mem_cgroup *get_mem_cgroup_from_m
>                  if (unlikely(!mm))
>                          memcg = root_mem_cgroup;
>                  else {
> -                        memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> +                        memcg = rcu_dereference(mm->memcg);
>                          if (unlikely(!memcg))
>                                  memcg = root_mem_cgroup;
>                  }
> @@ -1025,7 +1011,7 @@ bool task_in_mem_cgroup(struct task_stru
>                   * killed to prevent needlessly killing additional tasks.
>                   */
>                  rcu_read_lock();
> -                task_memcg = mem_cgroup_from_task(task);
> +                task_memcg = mem_cgroup_from_css(task_css(task, memory_cgrp_id));
>                  css_get(&task_memcg->css);
>                  rcu_read_unlock();
>          }
> @@ -4850,15 +4836,16 @@ static int mem_cgroup_can_attach(struct
>          if (!move_flags)
>                  return 0;
>
> -        from = mem_cgroup_from_task(p);
> +        from = mem_cgroup_from_css(task_css(p, memory_cgrp_id));
>
>          VM_BUG_ON(from == memcg);
>
>          mm = get_task_mm(p);
>          if (!mm)
>                  return 0;
> -        /* We move charges only when we move a owner of the mm */
> -        if (mm->owner == p) {
> +
> +        /* We move charges except for creative uses of CLONE_VM */
> +        if (mm->memcg == from) {
>                  VM_BUG_ON(mc.from);
>                  VM_BUG_ON(mc.to);
>                  VM_BUG_ON(mc.precharge);
> @@ -5058,6 +5045,58 @@ static void mem_cgroup_move_task(void)
>  }
>  #endif
>
> +/**
> + * mm_update_memcg - Update the memory cgroup of a mm_struct
> + * @mm: mm struct
> + * @new: new memory cgroup value
> + *
> + * Called whenever mm->memcg needs to change. Consumes a reference
> + * to new (unless new is NULL). The reference to the old memory
> + * cgroup is decreased.
> + */
> +void mm_update_memcg(struct mm_struct *mm, struct mem_cgroup *new)
> +{
> +        /* This is the only place where mm->memcg is changed */
> +        struct mem_cgroup *old;
> +
> +        old = xchg(&mm->memcg, new);
> +        if (old)
> +                css_put(&old->css);
> +}
> +
> +static void task_update_memcg(struct task_struct *tsk, struct mem_cgroup *new)
> +{
> +        struct mm_struct *mm;
> +        task_lock(tsk);
> +        mm = tsk->mm;
> +        if (mm && !(tsk->flags & PF_KTHREAD))
> +                mm_update_memcg(mm, new);
> +        task_unlock(tsk);
> +}
> +
> +static void mem_cgroup_attach(struct cgroup_taskset *tset)
> +{
> +        struct cgroup_subsys_state *css;
> +        struct task_struct *tsk;
> +
> +        cgroup_taskset_for_each(tsk, css, tset) {
> +                struct mem_cgroup *new = mem_cgroup_from_css(css);
> +                css_get(css);
> +                task_update_memcg(tsk, new);
> +        }
> +}
> +
> +void mm_sync_memcg_from_task(struct task_struct *tsk)
> +{
> +        struct cgroup_subsys_state *css;
> +
> +        rcu_read_lock();
> +        css = task_css(tsk, memory_cgrp_id);
> +        if (css && css_tryget(css))
> +                task_update_memcg(tsk, mem_cgroup_from_css(css));
> +        rcu_read_unlock();
> +}
> +
>  /*
>   * Cgroup retains root cgroups across [un]mount cycles making it necessary
>   * to verify whether we're attached to the default hierarchy on each mount
> @@ -5358,8 +5397,10 @@ struct cgroup_subsys memory_cgrp_subsys
>          .css_free = mem_cgroup_css_free,
>          .css_reset = mem_cgroup_css_reset,
>          .can_attach = mem_cgroup_can_attach,
> +        .attach = mem_cgroup_attach,
>          .cancel_attach = mem_cgroup_cancel_attach,
>          .post_attach = mem_cgroup_move_task,
> +        .fork = mm_sync_memcg_from_task,
>          .bind = mem_cgroup_bind,
>          .dfl_cftypes = memory_files,
>          .legacy_cftypes = mem_cgroup_legacy_files,
> @@ -5846,7 +5887,7 @@ void mem_cgroup_sk_alloc(struct sock *sk
>          }
>
>          rcu_read_lock();
> -        memcg = mem_cgroup_from_task(current);
> +        memcg = mem_cgroup_from_css(task_css(current, memory_cgrp_id));
>          if (memcg == root_mem_cgroup)
>                  goto out;
>          if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcpmem_active)
> _
>
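[Editorial note: for readers unfamiliar with the update scheme the changelog describes ("use exchange on the pointer ... to get atomic updates"), below is a minimal, self-contained user-space sketch of the same pattern. It is only an illustration, not kernel code: the names (struct group, struct fake_mm, mm_update_group, group_put) are made up for the example; the actual implementation is mm_update_memcg() in the mm/memcontrol.c hunk above, which pairs xchg() with css_put().]

/* Illustrative sketch only: the owning pointer is swapped with an atomic
 * exchange and the previous object's reference is dropped afterwards, so
 * no dedicated lock is needed around the pointer itself. */
#include <stdatomic.h>
#include <stdlib.h>

struct group {                          /* stand-in for a mem_cgroup */
        atomic_int refcount;
};

struct fake_mm {                        /* stand-in for mm_struct::memcg */
        _Atomic(struct group *) group;
};

static void group_put(struct group *g)
{
        /* Drop one reference; free when the last one goes away. */
        if (g && atomic_fetch_sub(&g->refcount, 1) == 1)
                free(g);
}

/* Consumes the caller's reference on @new (which may be NULL). */
static void mm_update_group(struct fake_mm *mm, struct group *new)
{
        struct group *old = atomic_exchange(&mm->group, new);

        group_put(old);                 /* release the previous reference */
}

int main(void)
{
        struct fake_mm mm = { NULL };
        struct group *g = malloc(sizeof(*g));

        if (!g)
                return 1;
        atomic_init(&g->refcount, 1);

        mm_update_group(&mm, g);        /* attach: mm now owns the reference */
        mm_update_group(&mm, NULL);     /* detach: the old group is released */
        return 0;
}

Concurrent updaters need no lock here: whichever caller's exchange returns the other's object is the one that drops that reference, which mirrors how the patch's mm_update_memcg() serializes mm->memcg updates without a new mm lock.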