Date: Tue, 2 Apr 2013 17:04:22 +0200
From: Michal Hocko
To: Glauber Costa
Cc: Li Zefan, Johannes Weiner, KAMEZAWA Hiroyuki, LKML, Cgroups,
	linux-mm@kvack.org
Subject: [PATCH -v2] memcg: don't do cleanup manually if mem_cgroup_css_online() fails
Message-ID: <20130402150422.GB32520@dhcp22.suse.cz>
In-Reply-To: <515AEC3A.2030401@parallels.com>

On Tue 02-04-13 18:33:30, Glauber Costa wrote:
> On 04/02/2013 06:28 PM, Michal Hocko wrote:
> > On Tue 02-04-13 18:20:56, Glauber Costa wrote:
> >> On 04/02/2013 06:16 PM, Michal Hocko wrote:
> >>> mem_cgroup_css_online
> >>>   memcg_init_kmem
> >>>     mem_cgroup_get		# refcnt = 2
> >>>     memcg_update_all_caches
> >>>       memcg_update_cache_size	# fails with ENOMEM
> >>
> >> Here is the thing: this one in kmem only happens for kmem enabled
> >> memcgs. For those, we tend to do a get once, and put only when the last
> >> kmem reference is gone.
> >>
> >> For non-kmem memcgs, refcnt will be 1 here, and will be balanced out by
> >> the mem_cgroup_put() in css_free.
> >
> > So we need this, right?
> > ---
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index f608546..2ef875d 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
> >  	ret = memcg_update_cache_sizes(memcg);
> >  	mutex_unlock(&set_limit_mutex);
> >  out:
> > +	if (ret)
> > +		mem_cgroup_put(memcg);
> >  	return ret;
> >  }
> >  #endif /* CONFIG_MEMCG_KMEM */
> > @@ -6417,16 +6419,6 @@ mem_cgroup_css_online(struct cgroup *cont)
> >  
> >  	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
> >  	mutex_unlock(&memcg_create_mutex);
> > -	if (error) {
> > -		/*
> > -		 * We call put now because our (and parent's) refcnts
> > -		 * are already in place. mem_cgroup_put() will internally
> > -		 * call __mem_cgroup_free, so return directly
> > -		 */
> > -		mem_cgroup_put(memcg);
> > -		if (parent->use_hierarchy)
> > -			mem_cgroup_put(parent);
> > -	}
> >  	return error;
> >  }
> >  
> >
> Yes, indeed you are very right - and thanks for looking at such depth.

So what about the patch below? It seems that I provoked all this mess
but my brain managed to push it away so I do not remember why I thought
the parent needs a reference drop...

It is "only" a 3.9 thing fortunately.
---
From 3aff5d958f1d0717795018f7d0d6b63d53ad1dd3 Mon Sep 17 00:00:00 2001
From: Li Zefan
Date: Tue, 2 Apr 2013 16:37:39 +0200
Subject: [PATCH] memcg: don't do cleanup manually if mem_cgroup_css_online()
 fails

mem_cgroup_css_online is called with memcg with refcnt = 1 and it
expects that mem_cgroup_css_free will drop this last reference. This
doesn't hold when memcg_init_kmem fails though and a reference is
dropped for both memcg and its parent explicitly if it returns with an
error.

This is not correct for two reasons. Firstly mem_cgroup_put on parent is
excessive because mem_cgroup_put is hierarchy aware and secondly only
memcg_propagate_kmem takes an additional reference.
The first one is a real use-after-free bug introduced by e4715f01
(memcg: avoid dangling reference count in creation failure).

The latter one is a non-issue right now because the only implementation
of init_cgroup seems to be tcp_init_cgroup which doesn't fail but it is
better to make the error handling saner and move the
mem_cgroup_put(memcg) to memcg_propagate_kmem where it belongs.

Signed-off-by: Li Zefan
Signed-off-by: Michal Hocko
---
 mm/memcontrol.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f608546..cf9ba7e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5306,6 +5306,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
 	ret = memcg_update_cache_sizes(memcg);
 	mutex_unlock(&set_limit_mutex);
 out:
+	if (ret)
+		mem_cgroup_put(memcg);
 	return ret;
 }
 #endif /* CONFIG_MEMCG_KMEM */
@@ -6417,16 +6419,7 @@ mem_cgroup_css_online(struct cgroup *cont)
 
 	error = memcg_init_kmem(memcg, &mem_cgroup_subsys);
 	mutex_unlock(&memcg_create_mutex);
-	if (error) {
-		/*
-		 * We call put now because our (and parent's) refcnts
-		 * are already in place. mem_cgroup_put() will internally
-		 * call __mem_cgroup_free, so return directly
-		 */
-		mem_cgroup_put(memcg);
-		if (parent->use_hierarchy)
-			mem_cgroup_put(parent);
-	}
+
 	return error;
 }
 
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs
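
For anyone following the refcounting argument, here is a minimal
userspace sketch of why the explicit put on the parent is one too many.
It is not kernel code: struct toy_memcg, toy_put() and
toy_parent_refcnt_after_failure() are hypothetical names, and the model
assumes only what the changelog states, namely that the put is hierarchy
aware (dropping the last reference on a child also drops the reference
held on its parent) and that the parent starts with its own reference
plus one taken on behalf of the child.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mem_cgroup (not the kernel type). */
struct toy_memcg {
	int refcnt;
	bool use_hierarchy;
	struct toy_memcg *parent;
};

/*
 * Assumed model of a hierarchy-aware put: dropping the last reference on a
 * child also drops the reference the child held on its parent.
 */
static void toy_put(struct toy_memcg *memcg)
{
	assert(memcg->refcnt > 0);	/* a put on a dead object is a bug */
	if (--memcg->refcnt > 0)
		return;
	if (memcg->parent && memcg->parent->use_hierarchy)
		toy_put(memcg->parent);
}

/*
 * Build a parent/child pair, run one of the two failure paths, and return
 * the parent's refcount afterwards. 0 means the still-live parent was freed.
 */
int toy_parent_refcnt_after_failure(bool old_error_path)
{
	/* parent: its own reference plus the one taken on behalf of the child */
	struct toy_memcg parent = { .refcnt = 2, .use_hierarchy = true, .parent = NULL };
	struct toy_memcg child = { .refcnt = 1, .use_hierarchy = true, .parent = &parent };

	if (old_error_path) {
		/* removed code: explicit put on both memcg and parent */
		toy_put(&child);	/* child 1 -> 0, parent 2 -> 1 */
		toy_put(&parent);	/* parent 1 -> 0: one put too many */
	} else {
		/* patched code: online only returns the error, free does the put */
		toy_put(&child);	/* child 1 -> 0, parent 2 -> 1 */
	}
	return parent.refcnt;
}
```

In this model the old error path leaves the parent with no references
while its cgroup is still alive, which corresponds to the use-after-free
described in the changelog. With the patched path the parent keeps its
own reference.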