Date: Fri, 7 Nov 2008 18:19:32 +0900
From: KAMEZAWA Hiroyuki
To: Daisuke Nishimura
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, balbir@linux.vnet.ibm.com, menage@google.com
Subject: Re: [RFC][PATCH 5/6] memcg: mem+swap controller
Message-Id: <20081107181932.94e6f307.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20081107180248.39251a80.nishimura@mxp.nes.nec.co.jp>
References: <20081105171637.1b393333.kamezawa.hiroyu@jp.fujitsu.com>
	<20081105172316.354c00fb.kamezawa.hiroyu@jp.fujitsu.com>
	<20081107180248.39251a80.nishimura@mxp.nes.nec.co.jp>
Organization: FUJITSU Co. LTD.

On Fri, 7 Nov 2008 18:02:48 +0900
Daisuke Nishimura wrote:

> On Wed, 5 Nov 2008 17:23:16 +0900, KAMEZAWA Hiroyuki wrote:
> > Mem+Swap controller core.
> >
> > This patch implements a per-cgroup limit on the usage of memory+swap.
> > However, for SwapCache, double counting of the swap-cache page and the
> > swap entry is avoided.
> >
> > The mem+swap controller works as follows.
> >  - memory usage is limited by memory.limit_in_bytes.
> >  - memory + swap usage is limited by memory.memsw_limit_in_bytes.
> >
> > This has the following benefits.
> >  - A user can limit the total resource usage of mem+swap.
> >
> > Without this, because the memory resource controller does not account
> > for swap usage, a process can exhaust all of swap (e.g. by a memory
> > leak). We can avoid that case.
> >
> > Also, swap is a shared resource, but used swap space cannot be
> > reclaimed (brought back to memory) until the page is referenced again.
> > This characteristic can cause trouble when memory is divided into
> > parts by cpuset or memcg. Assume groups A and B. After some
> > applications run, the system can end up as:
> >
> >   Group A -- very large free memory space, but occupies 99% of swap.
> >   Group B -- under memory shortage, but cannot use swap... it's nearly full.
> >
> > The ability to set an appropriate swap limit for each group is required.
> >
> > One may wonder, "why mem+swap rather than just swap?"
> >
> > - The global LRU (kswapd) can swap out arbitrary pages. Swap-out means
> >   moving the account from memory to swap... there is no change in the
> >   usage of mem+swap.
> >
> >   In other words, when we want to limit swap usage without affecting
> >   the global LRU, a mem+swap limit is better than limiting swap alone.
> >
> > The accounting target information is stored in swap_cgroup, a
> > per-swap-entry record.
> >
> > Charging is done as follows.
> >
> > map
> >  - charge page and memsw.
> >
> > unmap
> >  - uncharge page/memsw if not SwapCache.
> >
> > swap-out (__delete_from_swap_cache)
> >  - uncharge page.
> >  - record mem_cgroup information into swap_cgroup.
> >
> > swap-in (do_swap_page)
> >  - charged as page and memsw.
> >    The record in swap_cgroup is cleared and its memsw accounting is
> >    decremented.
> >
> > swap-free (swap_free())
> >  - if the swap entry is freed, memsw is uncharged by PAGE_SIZE.
> >
> > After this, the usual memory resource controller handles SwapCache.
> > (This was a missing/ignored feature in the current memcg, but it must
> > be handled.)
> >
> SwapCache has been handled in [2/6] already :)
>
yes. I'll rewrite this.
> (snip)
> > @@ -514,12 +534,25 @@ static int __mem_cgroup_try_charge(struc
> >  		css_get(&mem->css);
> >  	}
> >
> > +	while (1) {
> > +		int ret;
> > +		bool noswap = false;
> >
> > -	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
> > +		ret = res_counter_charge(&mem->res, PAGE_SIZE);
> > +		if (likely(!ret)) {
> > +			if (!do_swap_account)
> > +				break;
> > +			ret = res_counter_charge(&mem->memsw, PAGE_SIZE);
> > +			if (likely(!ret))
> > +				break;
> > +			/* mem+swap counter fails */
> > +			res_counter_uncharge(&mem->res, PAGE_SIZE);
> > +			noswap = true;
> > +		}
> >  		if (!(gfp_mask & __GFP_WAIT))
> >  			goto nomem;
> >
> > -		if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
> > +		if (try_to_free_mem_cgroup_pages(mem, gfp_mask, noswap))
> >  			continue;
> >
> >  		/*
>
> I have two comments about try_charge.
>
> 1. It would be better, if possible, to avoid charging memsw at swap-in
>    (and uncharging it again at mem_cgroup_cache_charge_swapin/
>    mem_cgroup_commit_charge_swapin).
>    How about adding a new argument "charge_memsw"?
>    (It has many args already now...)

Hmm, maybe possible and good. I'll consider this again.

> 2. Should we use swap when exceeding mem.limit but mem.limit == memsw.limit?
>    I'd like to put that special case into the "TODO" list.

Hmm... maybe setting noswap=true in that case is enough, but we have to
be careful.

> (snip)
> > void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
> > @@ -838,6 +947,7 @@ void mem_cgroup_cancel_charge_swapin(str
> >  	if (!mem)
> >  		return;
> >  	res_counter_uncharge(&mem->res, PAGE_SIZE);
> > +	res_counter_uncharge(&mem->memsw, PAGE_SIZE);
> >  	css_put(&mem->css);
> > }
>
> "if (do_swap_account)" is needed before uncharging memsw.

Good catch!
> (snip)
> >  static struct cftype mem_cgroup_files[] = {
> >  	{
> >  		.name = "usage_in_bytes",
> > -		.private = RES_USAGE,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> >  	{
> >  		.name = "max_usage_in_bytes",
> > -		.private = RES_MAX_USAGE,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE),
> >  		.trigger = mem_cgroup_reset,
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> >  	{
> >  		.name = "limit_in_bytes",
> > -		.private = RES_LIMIT,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_LIMIT),
> >  		.write_string = mem_cgroup_write,
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> >  	{
> >  		.name = "failcnt",
> > -		.private = RES_FAILCNT,
> > +		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
> >  		.trigger = mem_cgroup_reset,
> >  		.read_u64 = mem_cgroup_read,
> >  	},
> > @@ -1317,6 +1541,31 @@ static struct cftype mem_cgroup_files[]
> >  		.name = "stat",
> >  		.read_map = mem_control_stat_show,
> >  	},
> > +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > +	{
> > +		.name = "memsw.usage_in_bytes",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +	{
> > +		.name = "memsw.max_usage_in_bytes",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
> > +		.trigger = mem_cgroup_reset,
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +	{
> > +		.name = "memsw.limit_in_bytes",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
> > +		.write_string = mem_cgroup_write,
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +	{
> > +		.name = "memsw.failcnt",
> > +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
> > +		.trigger = mem_cgroup_reset,
> > +		.read_u64 = mem_cgroup_read,
> > +	},
> > +#endif
> >  };
>
> IMHO, it would be better to define those "memsw.*" files in a separate
> memsw_cgroup_files[] and change mem_cgroup_populate() like:
>
> static int mem_cgroup_populate(struct cgroup_subsys *ss,
> 				struct cgroup *cont)
> {
> 	int ret;
>
> 	ret = cgroup_add_files(cont, ss, mem_cgroup_files,
> 				ARRAY_SIZE(mem_cgroup_files));
> 	if (!ret && do_swap_account)
> 		ret = cgroup_add_files(cont, ss, memsw_cgroup_files,
> 				ARRAY_SIZE(memsw_cgroup_files));
>
> 	return ret;
> }
>
> so that those files appear only when swap accounting is enabled.

Nice idea. I'll try that.

Thanks,
-Kame