Date: Fri, 7 Nov 2008 18:02:48 +0900
From: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
To: KAMEZAWA Hiroyuki
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    balbir@linux.vnet.ibm.com, menage@google.com,
    nishimura@mxp.nes.nec.co.jp
Subject: Re: [RFC][PATCH 5/6] memcg: mem+swap controller
Message-Id: <20081107180248.39251a80.nishimura@mxp.nes.nec.co.jp>
In-Reply-To: <20081105172316.354c00fb.kamezawa.hiroyu@jp.fujitsu.com>
References: <20081105171637.1b393333.kamezawa.hiroyu@jp.fujitsu.com>
	<20081105172316.354c00fb.kamezawa.hiroyu@jp.fujitsu.com>

On Wed, 5 Nov 2008 17:23:16 +0900, KAMEZAWA Hiroyuki wrote:
> Mem+Swap controller core.
>
> This patch implements a per-cgroup limit on the usage of memory+swap.
> However, when a page is in SwapCache, double counting of the swap cache
> and the swap entry is avoided.
>
> The mem+swap controller works as follows.
>  - memory usage is limited by memory.limit_in_bytes.
>  - memory + swap usage is limited by memory.memsw_limit_in_bytes.
>
> This has the following benefits.
>  - A user can limit the total resource usage of mem+swap.
>
>    Without this, because the memory resource controller does not take
>    care of swap usage, a process can exhaust all of the swap (by a
>    memory leak, for example). We can avoid this case.
>
>    Also, swap is a shared resource, but it cannot be reclaimed (it does
>    not go back to memory) until it is actually used. This characteristic
>    can be trouble when memory is divided into parts by cpuset or memcg.
>    Assume groups A and B. After some applications have run, the system
>    can end up as:
>
>      Group A -- very large free memory space, but occupies 99% of swap.
>      Group B -- under memory shortage, but cannot use swap... it is
>                 nearly full.
>
>    The ability to set an appropriate swap limit for each group is
>    required.
>
> Maybe someone wonders: "why not just swap, but mem+swap?"
>
>  - The global LRU (kswapd) can swap out arbitrary pages. Swap-out means
>    moving an account from memory to swap... there is no change in the
>    usage of mem+swap.
>
>    In other words, when we want to limit the usage of swap without
>    affecting the global LRU, a mem+swap limit is better than just a
>    swap limit.
>
> The accounting target information is stored in swap_cgroup, which is a
> per-swap-entry record.
>
> Charging is done as follows.
>
>   map
>    - charge page and memsw.
>
>   unmap
>    - uncharge page/memsw if not SwapCache.
>
>   swap-out (__delete_from_swap_cache)
>    - uncharge page
>    - record mem_cgroup information in swap_cgroup.
>
>   swap-in (do_swap_page)
>    - charged as page and memsw.
>      The record in swap_cgroup is cleared and memsw accounting is
>      decremented.
>
>   swap-free (swap_free())
>    - if the swap entry is freed, memsw is uncharged by PAGE_SIZE.
>
> After this, the usual memory resource controller handles SwapCache.
> (This was a lacking (ignored) feature in the current memcg, but it must
> be handled.)

SwapCache has been handled in [2/6] already :)
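Just to confirm my reading of the charge/uncharge flow you describe
above, I think it boils down to something like this (a rough sketch
only, not code from the patch; swap_cgroup_record() and
swap_cgroup_clear() are made-up helper names standing in for the real
swap_cgroup access, and error handling / css refcounting are omitted):

/* swap-out: the page leaves memory; the swap entry inherits memsw. */
static void sketch_swap_out(struct mem_cgroup *mem, swp_entry_t ent)
{
	/* drop only the memory charge... */
	res_counter_uncharge(&mem->res, PAGE_SIZE);
	/* ...keep memsw charged, and remember the owner on the entry. */
	swap_cgroup_record(ent, mem);
}

/* swap-in: charge as a normal page, release the entry's memsw charge. */
static void sketch_swap_in(struct mem_cgroup *mem, swp_entry_t ent)
{
	struct mem_cgroup *owner;

	res_counter_charge(&mem->res, PAGE_SIZE);
	res_counter_charge(&mem->memsw, PAGE_SIZE);
	owner = swap_cgroup_clear(ent);
	if (owner)	/* net memsw usage is unchanged if owner == mem */
		res_counter_uncharge(&owner->memsw, PAGE_SIZE);
}

/* swap-free: the entry dies without swap-in; release its memsw charge. */
static void sketch_swap_free(swp_entry_t ent)
{
	struct mem_cgroup *owner = swap_cgroup_clear(ent);

	if (owner)
		res_counter_uncharge(&owner->memsw, PAGE_SIZE);
}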
(snip)

> @@ -514,12 +534,25 @@ static int __mem_cgroup_try_charge(struc
>  		css_get(&mem->css);
>  	}
>
> +	while (1) {
> +		int ret;
> +		bool noswap = false;
>
> -	while (unlikely(res_counter_charge(&mem->res, PAGE_SIZE))) {
> +		ret = res_counter_charge(&mem->res, PAGE_SIZE);
> +		if (likely(!ret)) {
> +			if (!do_swap_account)
> +				break;
> +			ret = res_counter_charge(&mem->memsw, PAGE_SIZE);
> +			if (likely(!ret))
> +				break;
> +			/* mem+swap counter fails */
> +			res_counter_uncharge(&mem->res, PAGE_SIZE);
> +			noswap = true;
> +		}
>  		if (!(gfp_mask & __GFP_WAIT))
>  			goto nomem;
>
> -		if (try_to_free_mem_cgroup_pages(mem, gfp_mask))
> +		if (try_to_free_mem_cgroup_pages(mem, gfp_mask, noswap))
>  			continue;
>
>  		/*

I have two comments about try_charge.

1. It would be better, if possible, to avoid charging memsw at swap-in
   (and uncharging it again in mem_cgroup_cache_charge_swapin/
   mem_cgroup_commit_charge_swapin). How about adding a new argument
   "charge_memsw"? (It has many args already now...)

2. Should we use swap when exceeding mem.limit but
   mem.limit == memsw.limit?

(snip)

> void mem_cgroup_cancel_charge_swapin(struct mem_cgroup *mem)
> @@ -838,6 +947,7 @@ void mem_cgroup_cancel_charge_swapin(str
>  	if (!mem)
>  		return;
>  	res_counter_uncharge(&mem->res, PAGE_SIZE);
> +	res_counter_uncharge(&mem->memsw, PAGE_SIZE);
>  	css_put(&mem->css);
>  }
>

"if (do_swap_account)" is needed before uncharging memsw.

(snip)

>  static struct cftype mem_cgroup_files[] = {
>  	{
>  		.name = "usage_in_bytes",
> -		.private = RES_USAGE,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_USAGE),
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "max_usage_in_bytes",
> -		.private = RES_MAX_USAGE,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE),
>  		.trigger = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "limit_in_bytes",
> -		.private = RES_LIMIT,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_LIMIT),
>  		.write_string = mem_cgroup_write,
>  		.read_u64 = mem_cgroup_read,
>  	},
>  	{
>  		.name = "failcnt",
> -		.private = RES_FAILCNT,
> +		.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT),
>  		.trigger = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read,
>  	},
> @@ -1317,6 +1541,31 @@ static struct cftype mem_cgroup_files[]
>  		.name = "stat",
>  		.read_map = mem_control_stat_show,
>  	},
> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> +	{
> +		.name = "memsw.usage_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.max_usage_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
> +		.trigger = mem_cgroup_reset,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.limit_in_bytes",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
> +		.write_string = mem_cgroup_write,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +	{
> +		.name = "memsw.failcnt",
> +		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
> +		.trigger = mem_cgroup_reset,
> +		.read_u64 = mem_cgroup_read,
> +	},
> +#endif
>  };
>

IMHO, it would be better to define those "memsw.*" files in a separate
memsw_cgroup_files[] array and to change mem_cgroup_populate() like
this:

static int mem_cgroup_populate(struct cgroup_subsys *ss,
			       struct cgroup *cont)
{
	int ret;

	ret = cgroup_add_files(cont, ss, mem_cgroup_files,
			       ARRAY_SIZE(mem_cgroup_files));
	if (!ret && do_swap_account)
		ret = cgroup_add_files(cont, ss, memsw_cgroup_files,
				       ARRAY_SIZE(memsw_cgroup_files));
	return ret;
}

so that those files appear only when swap accounting is enabled.
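Concretely, memsw_cgroup_files[] would just collect the four "memsw.*"
entries from the #ifdef block of your patch, unchanged (an untested
sketch, only to illustrate the split):

#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
/* registered by mem_cgroup_populate() only when do_swap_account is set */
static struct cftype memsw_cgroup_files[] = {
	{
		.name = "memsw.usage_in_bytes",
		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE),
		.read_u64 = mem_cgroup_read,
	},
	{
		.name = "memsw.max_usage_in_bytes",
		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE),
		.trigger = mem_cgroup_reset,
		.read_u64 = mem_cgroup_read,
	},
	{
		.name = "memsw.limit_in_bytes",
		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT),
		.write_string = mem_cgroup_write,
		.read_u64 = mem_cgroup_read,
	},
	{
		.name = "memsw.failcnt",
		.private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT),
		.trigger = mem_cgroup_reset,
		.read_u64 = mem_cgroup_read,
	},
};
#endif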
Thanks,
Daisuke Nishimura.