Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756749Ab2HOUUS (ORCPT ); Wed, 15 Aug 2012 16:20:18 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:43884 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756690Ab2HOTu4 (ORCPT ); Wed, 15 Aug 2012 15:50:56 -0400 MIME-Version: 1.0 In-Reply-To: <20120814162144.GC6905@dhcp22.suse.cz> References: <1344517279-30646-1-git-send-email-glommer@parallels.com> <1344517279-30646-5-git-send-email-glommer@parallels.com> <20120814162144.GC6905@dhcp22.suse.cz> Date: Wed, 15 Aug 2012 12:50:55 -0700 Message-ID: Subject: Re: [PATCH v2 04/11] kmem accounting basic infrastructure From: Ying Han To: Michal Hocko Cc: Glauber Costa , linux-kernel@vger.kernel.org, linux-mm@kvack.org, cgroups@vger.kernel.org, devel@openvz.org, Johannes Weiner , Andrew Morton , kamezawa.hiroyu@jp.fujitsu.com, Christoph Lameter , David Rientjes , Pekka Enberg Content-Type: text/plain; charset=ISO-8859-1 X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5918 Lines: 139 On Tue, Aug 14, 2012 at 9:21 AM, Michal Hocko wrote: > On Thu 09-08-12 17:01:12, Glauber Costa wrote: >> This patch adds the basic infrastructure for the accounting of the slab >> caches. To control that, the following files are created: >> >> * memory.kmem.usage_in_bytes >> * memory.kmem.limit_in_bytes >> * memory.kmem.failcnt >> * memory.kmem.max_usage_in_bytes >> >> They have the same meaning of their user memory counterparts. They >> reflect the state of the "kmem" res_counter. >> >> The code is not enabled until a limit is set. This can be tested by the >> flag "kmem_accounted". This means that after the patch is applied, no >> behavioral changes exists for whoever is still using memcg to control >> their memory usage. >> >> We always account to both user and kernel resource_counters. This >> effectively means that an independent kernel limit is in place when the >> limit is set to a lower value than the user memory. A equal or higher >> value means that the user limit will always hit first, meaning that kmem >> is effectively unlimited. > > Well, it contributes to the user limit so it is not unlimited. It just > falls under a different limit and it tends to contribute less. This can > be quite confusing. I am still not sure whether we should mix the two > things together. If somebody wants to limit the kernel memory he has to > touch the other limit anyway. Do you have a strong reason to mix the > user and kernel counters? The reason to mix the two together is a compromise of the two use cases we've heard by far. In google, we only need one limit which limits u & k, and the reclaim kicks in when the total usage hits the limit. > My impression was that kernel allocation should simply fail while user > allocations might reclaim as well. Why should we reclaim just because of > the kernel allocation (which is unreclaimable from hard limit reclaim > point of view)? Some of kernel objects are reclaimable if we have per-memcg shrinker. > I also think that the whole thing would get much simpler if those two > are split. Anyway if this is really a must then this should be > documented here. What would be the use case you have in your end? --Ying > One nit bellow. > >> People who want to track kernel memory but not limit it, can set this >> limit to a very high number (like RESOURCE_MAX - 1page - that no one >> will ever hit, or equal to the user memory) >> >> Signed-off-by: Glauber Costa >> CC: Michal Hocko >> CC: Johannes Weiner >> Reviewed-by: Kamezawa Hiroyuki >> --- >> mm/memcontrol.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 68 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index b0e29f4..54e93de 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c > [...] >> @@ -4046,8 +4059,23 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, >> break; >> if (type == _MEM) >> ret = mem_cgroup_resize_limit(memcg, val); >> - else >> + else if (type == _MEMSWAP) >> ret = mem_cgroup_resize_memsw_limit(memcg, val); >> + else if (type == _KMEM) { >> + ret = res_counter_set_limit(&memcg->kmem, val); >> + if (ret) >> + break; >> + /* >> + * Once enabled, can't be disabled. We could in theory >> + * disable it if we haven't yet created any caches, or >> + * if we can shrink them all to death. >> + * >> + * But it is not worth the trouble >> + */ >> + if (!memcg->kmem_accounted && val != RESOURCE_MAX) >> + memcg->kmem_accounted = true; >> + } else >> + return -EINVAL; >> break; > > This doesn't check for the hierachy so kmem_accounted might not be in > sync with it's parents. mem_cgroup_create (below) needs to copy > kmem_accounted down from the parent and the above needs to check if this > is a similar dance like mem_cgroup_oom_control_write. > > [...] > >> @@ -5033,6 +5098,7 @@ mem_cgroup_create(struct cgroup *cont) >> if (parent && parent->use_hierarchy) { >> res_counter_init(&memcg->res, &parent->res); >> res_counter_init(&memcg->memsw, &parent->memsw); >> + res_counter_init(&memcg->kmem, &parent->kmem); >> /* >> * We increment refcnt of the parent to ensure that we can >> * safely access it on res_counter_charge/uncharge. >> @@ -5043,6 +5109,7 @@ mem_cgroup_create(struct cgroup *cont) >> } else { >> res_counter_init(&memcg->res, NULL); >> res_counter_init(&memcg->memsw, NULL); >> + res_counter_init(&memcg->kmem, NULL); >> } >> memcg->last_scanned_node = MAX_NUMNODES; >> INIT_LIST_HEAD(&memcg->oom_notify); >> -- >> 1.7.11.2 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe cgroups" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > Michal Hocko > SUSE Labs > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/