Date: Thu, 3 May 2018 09:59:52 +1000
From: Balbir Singh
To: ebiederm@xmission.com (Eric W. Biederman)
Cc: Michal Hocko, Kirill Tkhai, akpm@linux-foundation.org, peterz@infradead.org,
    oleg@redhat.com, viro@zeniv.linux.org.uk, mingo@kernel.org,
    paulmck@linux.vnet.ibm.com, keescook@chromium.org, riel@redhat.com,
    tglx@linutronix.de, kirill.shutemov@linux.intel.com,
    marcos.souza.org@gmail.com, hoeun.ryu@gmail.com, pasha.tatashin@oracle.com,
    gs051095@gmail.com, dhowells@redhat.com, rppt@linux.vnet.ibm.com,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] memcg: Replace mm->owner with mm->memcg
Message-ID: <20180503095952.70bffde1@balbir.ozlabs.ibm.com>
In-Reply-To: <87h8nr2sa3.fsf_-_@xmission.com>
References: <152473763015.29458.1131542311542381803.stgit@localhost.localdomain>
    <20180426130700.GP17484@dhcp22.suse.cz>
    <87efj2q6sq.fsf@xmission.com>
    <20180426192818.GX17484@dhcp22.suse.cz>
    <20180427070848.GA17484@dhcp22.suse.cz>
    <87r2n01q58.fsf@xmission.com>
    <87o9hz2sw3.fsf@xmission.com>
    <87h8nr2sa3.fsf_-_@xmission.com>

On Tue, 01 May 2018 12:35:16 -0500 ebiederm@xmission.com (Eric W. Biederman) wrote:

> Recently it was reported that mm_update_next_owner could get into
> cases where it was executing its fallback for_each_process part of
> the loop and thus taking up a lot of time.
>
> To deal with this replace mm->owner with mm->memcg. This just reduces
> the complexity of everything. As much as possible I have maintained
> the current semantics. There are two significant exceptions. During
> fork the memcg of the process calling fork is charged rather than
> init_css_set. During memory cgroup migration the charges are migrated
> not if the process is the owner of the mm, but if the process being
> migrated has the same memory cgroup as the mm.
>
> I believe it was a bug if init_css_set is charged for memory activity
> during fork, and the old behavior was simply a consequence of the new
> task not having tsk->cgroup initialized to its proper cgroup.

That does sound like a bug; I guess we've not seen it because we did not
track any slab allocations initially.

>
> During cgroup migration only thread group leaders are allowed to
> migrate. Which means in practice there should only be one. Linux
> tasks created with CLONE_VM are the only exception, but the common
> cases are already ruled out. Processes created with vfork have a
> suspended parent and can do nothing but call exec so they should never
> show up. Threads of the same cgroup are not the thread group leader
> so also should not show up. That leaves the old LinuxThreads library
> which is probably out of use by now, and someone doing something very
> creative with cgroups, and rolling their own threads with CLONE_VM.
> So in practice I don't think the difference in charge migration will
> affect anyone.
>
> To ensure that mm->memcg is updated appropriately I have implemented
> cgroup "attach" and "fork" methods. This ensures that at those
> points the mm pointed to by the task has the appropriate memory cgroup.
>
> For simplicity instead of introducing a new mm lock I simply use
> exchange on the pointer where the mm->memcg is updated to get
> atomic updates.
>
> Looking at the history effectively this change is a revert. The
> reason given for adding mm->owner is so that multiple cgroups can be
> attached to the same mm.
> In the last 8 years a second user of mm->owner has not appeared. A
> feature that has never been used, makes the code more complicated, and
> has horrible worst case performance should go.

The idea was to track the mm back to the right cgroup; we did find that
the mm could be confused as belonging to two cgroups. tsk->cgroup is not
sufficient, and when the tgid left we needed an owner to track where the
current allocations were. But this is from 8-year-old history, I don't
have my notes anymore :)

>
> Fixes: cf475ad28ac3 ("cgroups: add an owner to the mm_struct")
> Reported-by: Kirill Tkhai
> Signed-off-by: "Eric W. Biederman"
> ---
>  fs/exec.c                  |  1 -
>  include/linux/memcontrol.h | 11 ++++--
>  include/linux/mm_types.h   | 12 +------
>  include/linux/sched/mm.h   |  8 -----
>  kernel/exit.c              | 89 ----------------------------------------------
>  kernel/fork.c              | 17 +++++++--
>  mm/memcontrol.c            | 86 ++++++++++++++++++++++++++++++++++----------
>  7 files changed, 90 insertions(+), 134 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 183059c427b9..a8be9318d1a8 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1040,7 +1040,6 @@ static int exec_mmap(struct mm_struct *mm)
>                  up_read(&old_mm->mmap_sem);
>                  BUG_ON(active_mm != old_mm);
>                  setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
> -                mm_update_next_owner(old_mm);
>                  mmput(old_mm);
>                  return 0;
>          }
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index d99b71bc2c66..147e04bfcaee 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -341,7 +341,6 @@ static inline struct lruvec *mem_cgroup_lruvec(struct pglist_data *pgdat,
>  struct lruvec *mem_cgroup_page_lruvec(struct page *, struct pglist_data *);
>
>  bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
> -struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
>
>  static inline
>  struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
> @@ -402,6 +401,8 @@ static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
>          return cgroup_is_descendant(memcg->css.cgroup, root->css.cgroup);
>  }
>
> +void mm_update_memcg(struct mm_struct *mm, struct mem_cgroup *new);
> +
>  static inline bool mm_match_cgroup(struct mm_struct *mm,
>                                     struct mem_cgroup *memcg)
>  {
> @@ -409,7 +410,7 @@ static inline bool mm_match_cgroup(struct mm_struct *mm,
>          bool match = false;
>
>          rcu_read_lock();
> -        task_memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> +        task_memcg = rcu_dereference(mm->memcg);
>          if (task_memcg)
>                  match = mem_cgroup_is_descendant(task_memcg, memcg);
>          rcu_read_unlock();
> @@ -693,7 +694,7 @@ static inline void count_memcg_event_mm(struct mm_struct *mm,
>                  return;
>
>          rcu_read_lock();
> -        memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> +        memcg = rcu_dereference(mm->memcg);
>          if (likely(memcg)) {
>                  count_memcg_events(memcg, idx, 1);
>                  if (idx == OOM_KILL)
> @@ -781,6 +782,10 @@ static inline struct lruvec *mem_cgroup_page_lruvec(struct page *page,
>          return &pgdat->lruvec;
>  }
>
> +static inline void mm_update_memcg(struct mm_struct *mm, struct mem_cgroup *new)
> +{
> +}
> +
>  static inline bool mm_match_cgroup(struct mm_struct *mm,
>                                     struct mem_cgroup *memcg)
>  {
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 21612347d311..ea5efd40a5d1 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -443,17 +443,7 @@ struct mm_struct {
>          struct kioctx_table __rcu *ioctx_table;
>  #endif
>  #ifdef CONFIG_MEMCG
> -        /*
> -         * "owner" points to a task that is regarded as the canonical
> -         * user/owner of this mm. All of the following must be true in
> -         * order for it to be changed:
> -         *
> -         * current == mm->owner
> -         * current->mm != mm
> -         * new_owner->mm == mm
> -         * new_owner->alloc_lock is held
> -         */
> -        struct task_struct __rcu *owner;
> +        struct mem_cgroup __rcu *memcg;
>  #endif
>          struct user_namespace *user_ns;
>
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 2c570cd934af..cc8e68d36fc2 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -95,14 +95,6 @@ extern struct mm_struct *mm_access(struct task_struct *task, unsigned int mode);
>  /* Remove the current tasks stale references to the old mm_struct */
>  extern void mm_release(struct task_struct *, struct mm_struct *);
>
> -#ifdef CONFIG_MEMCG
> -extern void mm_update_next_owner(struct mm_struct *mm);
> -#else
> -static inline void mm_update_next_owner(struct mm_struct *mm)
> -{
> -}
> -#endif /* CONFIG_MEMCG */
> -
>  #ifdef CONFIG_MMU
>  extern void arch_pick_mmap_layout(struct mm_struct *mm,
>                                    struct rlimit *rlim_stack);
> diff --git a/kernel/exit.c b/kernel/exit.c
> index c3c7ac560114..be967d2da0ce 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -399,94 +399,6 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent)
>          }
>  }
>
> -#ifdef CONFIG_MEMCG
> -/*
> - * A task is exiting. If it owned this mm, find a new owner for the mm.
> - */
> -void mm_update_next_owner(struct mm_struct *mm)
> -{
> -        struct task_struct *c, *g, *p = current;
> -
> -retry:
> -        /*
> -         * If the exiting or execing task is not the owner, it's
> -         * someone else's problem.
> -         */
> -        if (mm->owner != p)
> -                return;
> -        /*
> -         * The current owner is exiting/execing and there are no other
> -         * candidates. Do not leave the mm pointing to a possibly
> -         * freed task structure.
> -         */
> -        if (atomic_read(&mm->mm_users) <= 1) {
> -                mm->owner = NULL;
> -                return;
> -        }
> -
> -        read_lock(&tasklist_lock);
> -        /*
> -         * Search in the children
> -         */
> -        list_for_each_entry(c, &p->children, sibling) {
> -                if (c->mm == mm)
> -                        goto assign_new_owner;
> -        }
> -
> -        /*
> -         * Search in the siblings
> -         */
> -        list_for_each_entry(c, &p->real_parent->children, sibling) {
> -                if (c->mm == mm)
> -                        goto assign_new_owner;
> -        }
> -
> -        /*
> -         * Search through everything else, we should not get here often.
> -         */
> -        for_each_process(g) {
> -                if (g->flags & PF_KTHREAD)
> -                        continue;
> -                for_each_thread(g, c) {
> -                        if (c->mm == mm)
> -                                goto assign_new_owner;
> -                        if (c->mm)
> -                                break;
> -                }
> -        }
> -        read_unlock(&tasklist_lock);
> -        /*
> -         * We found no owner yet mm_users > 1: this implies that we are
> -         * most likely racing with swapoff (try_to_unuse()) or /proc or
> -         * ptrace or page migration (get_task_mm()). Mark owner as NULL.
> -         */
> -        mm->owner = NULL;
> -        return;
> -
> -assign_new_owner:
> -        BUG_ON(c == p);
> -        get_task_struct(c);
> -        /*
> -         * The task_lock protects c->mm from changing.
> -         * We always want mm->owner->mm == mm
> -         */
> -        task_lock(c);
> -        /*
> -         * Delay read_unlock() till we have the task_lock()
> -         * to ensure that c does not slip away underneath us
> -         */
> -        read_unlock(&tasklist_lock);
> -        if (c->mm != mm) {
> -                task_unlock(c);
> -                put_task_struct(c);
> -                goto retry;
> -        }
> -        mm->owner = c;
> -        task_unlock(c);
> -        put_task_struct(c);
> -}
> -#endif /* CONFIG_MEMCG */
> -
>  /*
>   * Turn us into a lazy TLB process if we
>   * aren't already..
> @@ -540,7 +452,6 @@ static void exit_mm(void)
>          up_read(&mm->mmap_sem);
>          enter_lazy_tlb(mm, current);
>          task_unlock(current);
> -        mm_update_next_owner(mm);
>          mmput(mm);
>          if (test_thread_flag(TIF_MEMDIE))
>                  exit_oom_victim();
> diff --git a/kernel/fork.c b/kernel/fork.c
> index a5d21c42acfc..f284acf22aad 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -868,10 +868,19 @@ static void mm_init_aio(struct mm_struct *mm)
>  #endif
>  }
>
> -static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
> +static void mm_init_memcg(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_MEMCG
> -        mm->owner = p;
> +        struct cgroup_subsys_state *css;
> +
> +        /* Ensure mm->memcg is initialized */
> +        mm->memcg = NULL;
> +
> +        rcu_read_lock();
> +        css = task_css(current, memory_cgrp_id);
> +        if (css && css_tryget(css))
> +                mm_update_memcg(mm, mem_cgroup_from_css(css));
> +        rcu_read_unlock();
>  #endif
>  }
>
> @@ -901,7 +910,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>          spin_lock_init(&mm->page_table_lock);
>          mm_init_cpumask(mm);
>          mm_init_aio(mm);
> -        mm_init_owner(mm, p);
> +        mm_init_memcg(mm);
>          RCU_INIT_POINTER(mm->exe_file, NULL);
>          mmu_notifier_mm_init(mm);
>          hmm_mm_init(mm);
> @@ -931,6 +940,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>  fail_nocontext:
>          mm_free_pgd(mm);
>  fail_nopgd:
> +        mm_update_memcg(mm, NULL);
>          free_mm(mm);
>          return NULL;
>  }
> @@ -968,6 +978,7 @@ static inline void __mmput(struct mm_struct *mm)
>          }
>          if (mm->binfmt)
>                  module_put(mm->binfmt->module);
> +        mm_update_memcg(mm, NULL);
>          mmdrop(mm);
>  }
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 2bd3df3d101a..5dce8a7fa65b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -664,20 +664,6 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
>          }
>  }
>
> -struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
> -{
> -        /*
> -         * mm_update_next_owner() may clear mm->owner to NULL
> -         * if it races with swapoff, page migration, etc.
> -         * So this can be called with p == NULL.
> -         */
> -        if (unlikely(!p))
> -                return NULL;
> -
> -        return mem_cgroup_from_css(task_css(p, memory_cgrp_id));
> -}
> -EXPORT_SYMBOL(mem_cgroup_from_task);
> -
>  static struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
>  {
>          struct mem_cgroup *memcg = NULL;
> @@ -692,7 +678,7 @@ static struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
>                  if (unlikely(!mm))
>                          memcg = root_mem_cgroup;
>                  else {
> -                        memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
> +                        memcg = rcu_dereference(mm->memcg);
>                          if (unlikely(!memcg))
>                                  memcg = root_mem_cgroup;
>                  }
> @@ -1011,7 +997,7 @@ bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)
>                   * killed to prevent needlessly killing additional tasks.
>                   */
>                  rcu_read_lock();
> -                task_memcg = mem_cgroup_from_task(task);
> +                task_memcg = mem_cgroup_from_css(task_css(task, memory_cgrp_id));
>                  css_get(&task_memcg->css);
>                  rcu_read_unlock();
>          }
> @@ -4827,15 +4813,16 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
>          if (!move_flags)
>                  return 0;
>
> -        from = mem_cgroup_from_task(p);
> +        from = mem_cgroup_from_css(task_css(p, memory_cgrp_id));
>
>          VM_BUG_ON(from == memcg);
>
>          mm = get_task_mm(p);
>          if (!mm)
>                  return 0;
> +
>          /* We move charges only when we move a owner of the mm */
> -        if (mm->owner == p) {
> +        if (mm->memcg == from) {
>                  VM_BUG_ON(mc.from);
>                  VM_BUG_ON(mc.to);
>                  VM_BUG_ON(mc.precharge);
> @@ -4859,6 +4846,59 @@ static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
>          return ret;
>  }
>
> +/**
> + * mm_update_memcg - Update the memory cgroup of a mm_struct
> + * @mm: mm struct
> + * @new: new memory cgroup value
> + *
> + * Called whenever mm->memcg needs to change. Consumes a reference
> + * to new (unless new is NULL). The reference to the old memory
> + * cgroup is decreased.
> + */
> +void mm_update_memcg(struct mm_struct *mm, struct mem_cgroup *new)
> +{
> +        /* This is the only place where mm->memcg is changed */
> +        struct mem_cgroup *old;
> +
> +        old = xchg(&mm->memcg, new);
> +        if (old)
> +                css_put(&old->css);
> +}
> +
> +static void task_update_memcg(struct task_struct *tsk, struct mem_cgroup *new)
> +{
> +        struct mm_struct *mm;
> +        task_lock(tsk);
> +        mm = tsk->mm;
> +        if (mm && !(tsk->flags & PF_KTHREAD))
> +                mm_update_memcg(mm, new);
> +        task_unlock(tsk);
> +}
> +
> +static void mem_cgroup_attach(struct cgroup_taskset *tset)
> +{
> +        struct cgroup_subsys_state *css;
> +        struct task_struct *tsk;
> +
> +        cgroup_taskset_for_each(tsk, css, tset) {
> +                struct mem_cgroup *new = mem_cgroup_from_css(css);
> +                css_get(css);
> +                task_update_memcg(tsk, new);

I'd have to go back and check, and I think your comment above refers to this,
but we don't expect non-tgid tasks to show up here? My concern is that I
can't find the guarantee that task_update_memcg(tsk, new) is not:

1. Duplicated for each thread in the process or attached to the mm, or
2. Updating mm->memcg to point to different places, so that the one that
   sticks is the one that updated things last.

Balbir Singh
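
To spell out the second concern, here is a hypothetical interleaving (not part
of the patch; it only assumes mem_cgroup_attach(), task_update_memcg() and
mm_update_memcg() as posted above, and the memcg1/memcg2 names are made up):
tasks A and B share one mm via CLONE_VM but are attached to different memory
cgroups, each attach calls task_update_memcg() against the same mm, and
whichever xchg() in mm_update_memcg() runs last decides where mm->memcg points:

        migrate A into memcg1                   migrate B into memcg2
        ---------------------                   ---------------------
        mem_cgroup_attach()
          task_update_memcg(A, memcg1)
            task_lock(A)
            mm_update_memcg(mm, memcg1)
              xchg(&mm->memcg, memcg1)          mem_cgroup_attach()
                                                  task_update_memcg(B, memcg2)
                                                    task_lock(B)
                                                    mm_update_memcg(mm, memcg2)
                                                      old = xchg(&mm->memcg, memcg2)
                                                      css_put(&old->css)   (old == memcg1)

        No css reference is leaked, since each displaced memcg gets its
        css_put(), but mm->memcg ends up at whichever attach ran last rather
        than at a single well-defined owner cgroup.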