Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754847Ab1CVHb7 (ORCPT ); Tue, 22 Mar 2011 03:31:59 -0400 Received: from cantor2.suse.de ([195.135.220.15]:57839 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751361Ab1CVHb4 (ORCPT ); Tue, 22 Mar 2011 03:31:56 -0400 Date: Tue, 22 Mar 2011 08:31:50 +0100 From: Michal Hocko To: Daisuke Nishimura Cc: KAMEZAWA Hiroyuki , Andrew Morton , linux-mm@kvack.org, LKML Subject: Re: [PATCH] memcg: consider per-cpu stock reserves when returning RES_USAGE for _MEM Message-ID: <20110322073150.GA12940@tiehlicka.suse.cz> References: <20110318152532.GB18450@tiehlicka.suse.cz> <20110321093419.GA26047@tiehlicka.suse.cz> <20110321102420.GB26047@tiehlicka.suse.cz> <20110322091014.27677ab3.kamezawa.hiroyu@jp.fujitsu.com> <20110322104723.fd81dddc.nishimura@mxp.nes.nec.co.jp> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110322104723.fd81dddc.nishimura@mxp.nes.nec.co.jp> User-Agent: Mutt/1.5.20 (2009-06-14) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5405 Lines: 141 On Tue 22-03-11 10:47:23, Daisuke Nishimura wrote: > On Tue, 22 Mar 2011 09:10:14 +0900 > KAMEZAWA Hiroyuki wrote: > > > On Mon, 21 Mar 2011 11:24:20 +0100 > > Michal Hocko wrote: > > > > > [Sorry for reposting but I forgot to fully refresh the patch before > > > posting...] > > > > > > On Mon 21-03-11 10:34:19, Michal Hocko wrote: > > > > On Fri 18-03-11 16:25:32, Michal Hocko wrote: > > > > [...] > > > > > According to our documention this is a reasonable test case: > > > > > Documentation/cgroups/memory.txt: > > > > > memory.usage_in_bytes # show current memory(RSS+Cache) usage. > > > > > > > > > > This however doesn't work after your commit: > > > > > cdec2e4265d (memcg: coalesce charging via percpu storage) > > > > > > > > > > because since then we are charging in bulks so we can end up with > > > > > rss+cache <= usage_in_bytes. > > > > [...] > > > > > I think we have several options here > > > > > 1) document that the value is actually >= rss+cache and it shows > > > > > the guaranteed charges for the group > > > > > 2) use rss+cache rather then res->count > > > > > 3) remove the file > > > > > 4) call drain_all_stock_sync before asking for the value in > > > > > mem_cgroup_read > > > > > 5) collect the current amount of stock charges and subtract it > > > > > from the current res->count value > > > > > > > > > > 1) and 2) would suggest that the file is actually not very much useful. > > > > > 3) is basically the interface change as well > > > > > 4) sounds little bit invasive as we basically lose the advantage of the > > > > > pool whenever somebody reads the file. Btw. for who is this file > > > > > intended? > > > > > 5) sounds like a compromise > > > > > > > > I guess that 4) is really too invasive - for no good reason so here we > > > > go with the 5) solution. > > > > I think the test in LTP is bad...(it should be fuzzy.) because we cannot > > avoid races... > I agree. I think that as well. In fact, I quite do not understand what it is testing actually (that we account charges correctly? If yes then what if both values are wrong?). The other point is, though, we have exported this interface and documented its semantic. This is the reason I have asked for the initial purpose of the file in the first place. Is this something for debugging only? Can we make use of the value somehow (other than a shortcut for rss+cache)? If there is realy no strong reason for the file existence I would rather vote for its removing than having a unusable semantic. [...] > > > Index: linus_tree/mm/memcontrol.c > > > =================================================================== > > > --- linus_tree.orig/mm/memcontrol.c 2011-03-18 16:09:11.000000000 +0100 > > > +++ linus_tree/mm/memcontrol.c 2011-03-21 10:21:55.000000000 +0100 > > > @@ -3579,13 +3579,30 @@ static unsigned long mem_cgroup_recursiv > > > return val; > > > } > > > > > > +static u64 mem_cgroup_current_usage(struct mem_cgroup *mem) > > > +{ > > > + u64 val = res_counter_read_u64(&mem->res, RES_USAGE); > > > + u64 per_cpu_val = 0; > > > + int cpu; > > > + > > > + get_online_cpus(); > > > + for_each_online_cpu(cpu) { > > > + struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu); > > > + > > > + per_cpu_val += stock->nr_pages * PAGE_SIZE; > > > > if (memcg_stock->cached == mem) > > per_cpu_val += stock->nr_pages * PAGE_SIZE; OK, thanks for pointing that out. > > AND I think you doesn't handle batched uncharge. > > Do you have any idea ? Hmm, missed that... And this will be really hard as the batch structure is embeded in the task struct (strange why we simply cannot uncharge same way as we do charges?). > > (Peter Zilstra's patch will make error size of > > bached uncharge bigger.) > > > > So....rather than this, just always using root memcg's code is > > a good way. Could you try ? > > == > > usage = mem_cgroup_recursive_stat(mem, MEM_CGROUP_STAT_CACHE); > > usage += mem_cgroup_recursive_stat(mem, MEM_CGROUP_STAT_RSS); > > > > if (swap) > > val += mem_cgroup_recursive_stat(mem, MEM_CGROUP_STAT_SWAPOUT); > > > > return val << PAGE_SHIFT; > > == > > > So, option 2) above. > I have tried that and it really works but what is the point of the file then? Or should we do this as a workaround for some time frame until we remove the file completely? > As Michal already said, this change will make *.usage_in_bytes not so useful, > i.e. we can use memory.stat instead. It would be useful in that regard that we do not have to grep out two values from .stat file and sum them up. But I am missing the point. If we want to have a sum we should add it to the .stat file, right? (like we do in other exported files - e.g. meminfo etc...) > > I don't have any good idea, but I tend to agree to 1) or 3)(or rename the file names) now. The more I think about it the more I think the same. > Considering batched uncharge, I think 4) and 5) is difficult. Thanks for comments. -- Michal Hocko SUSE Labs SUSE LINUX s.r.o. Lihovarska 1060/12 190 00 Praha 9 Czech Republic -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/