Message-ID: <47EB5B27.2050907@linux.vnet.ibm.com>
Date: Thu, 27 Mar 2008 14:00:31 +0530
From: Balbir Singh
Reply-To: balbir@linux.vnet.ibm.com
Organization: IBM
To: Pavel Emelyanov
CC: Andrew Morton, Hugh Dickins, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf@cn.fujitsu.com, linux-kernel@vger.kernel.org, taka@valinux.co.jp, linux-mm@kvack.org, David Rientjes, KAMEZAWA Hiroyuki
Subject: Re: [RFC][2/3] Account and control virtual address space allocations (v2)
References: <20080326184954.9465.19379.sendpatchset@localhost.localdomain> <20080326185017.9465.29950.sendpatchset@localhost.localdomain> <47EB4A7E.6060505@openvz.org> <47EB548D.2050609@linux.vnet.ibm.com> <47EB59C3.3080803@openvz.org>
In-Reply-To: <47EB59C3.3080803@openvz.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Pavel Emelyanov wrote:
> Balbir Singh wrote:
>> Pavel Emelyanov wrote:
>>> Balbir Singh wrote:
>>>> Changelog v2
>>>> ------------
>>>> Change the accounting to what is already present in the kernel. Split
>>>> the address space accounting into mem_cgroup_charge_as and
>>>> mem_cgroup_uncharge_as.
>>>> At the time of VM expansion, call
>>>> mem_cgroup_cannot_expand_as to check if the new allocation will push
>>>> us over the limit
>>>>
>>>> This patch implements accounting and control of virtual address space.
>>>> Accounting is done when the virtual address space of any task/mm_struct
>>>> belonging to the cgroup is incremented or decremented. This patch
>>>> fails the expansion if the cgroup goes over its limit.
>>>>
>>>> TODOs
>>>>
>>>> 1. Only when CONFIG_MMU is enabled, is the virtual address space control
>>>> enabled. Should we do this for nommu cases as well? My suspicion is
>>>> that we don't have to.
>>>>
>>>> Signed-off-by: Balbir Singh
>>>> ---
>>>>
>>>>  arch/ia64/kernel/perfmon.c  |    2 +
>>>>  arch/x86/kernel/ptrace.c    |    7 +++
>>>>  fs/exec.c                   |    2 +
>>>>  include/linux/memcontrol.h  |   26 +++++++++++++
>>>>  include/linux/res_counter.h |   19 ++++++++--
>>>>  init/Kconfig                |    2 -
>>>>  kernel/fork.c               |   17 +++++++--
>>>>  mm/memcontrol.c             |   83 ++++++++++++++++++++++++++++++++++++++++++++
>>>>  mm/mmap.c                   |   11 +++++
>>>>  mm/mremap.c                 |    2 +
>>>>  10 files changed, 163 insertions(+), 8 deletions(-)
>>>>
>>>> diff -puN mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control mm/memcontrol.c
>>>> --- linux-2.6.25-rc5/mm/memcontrol.c~memory-controller-virtual-address-space-accounting-and-control	2008-03-26 16:27:59.000000000 +0530
>>>> +++ linux-2.6.25-rc5-balbir/mm/memcontrol.c	2008-03-27 00:18:16.000000000 +0530
>>>> @@ -526,6 +526,76 @@ unsigned long mem_cgroup_isolate_pages(u
>>>>  	return nr_taken;
>>>>  }
>>>>  
>>>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_AS
>>>> +/*
>>>> + * Charge the address space usage for cgroup. This routine is most
>>>> + * likely to be called from places that expand the total_vm of a mm_struct.
>>>> + */
>>>> +void mem_cgroup_charge_as(struct mm_struct *mm, long nr_pages)
>>>> +{
>>>> +	struct mem_cgroup *mem;
>>>> +
>>>> +	if (mem_cgroup_subsys.disabled)
>>>> +		return;
>>>> +
>>>> +	rcu_read_lock();
>>>> +	mem = rcu_dereference(mm->mem_cgroup);
>>>> +	css_get(&mem->css);
>>>> +	rcu_read_unlock();
>>>> +
>>>> +	res_counter_charge(&mem->as_res, (nr_pages * PAGE_SIZE));
>>>> +	css_put(&mem->css);
>>> Why don't you check whether the counter is charged? This is
>>> bad for two reasons:
>>> 1. you allow for some growth above the limit (e.g. in expand_stack)
>> I was doing that earlier and then decided to keep the virtual address space code
>> in sync with the RLIMIT_AS checking code in the kernel. If you see the flow, it
>> closely resembles what we do with mm->total_vm and may_expand_vm().
>> expand_stack() in turn calls acct_stack_growth() which calls may_expand_vm()
>
> But this is racy! Look - you do expand_stack on two CPUs and the limit is
> almost reached - so that there's room for a single expansion. In this case
> may_expand_vm will return true for both, since it only checks the limit,
> while the subsequent charge will fail on one of them, since it actually
> tries to raise the usage...
>

Hmm... yes, possibly. Thanks for pointing this out. For a single mm_struct,
the check is done under mmap_sem, so it's OK for processes. I suspect I'll
have to go back to what I had earlier. I don't want to add a mutex to
mem_cgroup; that would hurt parallelism badly.

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL