Date: Fri, 10 Nov 2006 14:46:18 +0530
From: Balbir Singh <balbir@in.ibm.com>
To: Pavel Emelianov
Cc: Linux MM, dev@openvz.org, ckrm-tech@lists.sourceforge.net,
    Linux Kernel Mailing List, haveblue@us.ibm.com, rohitseth@google.com
Subject: Re: [RFC][PATCH 8/8] RSS controller support reclamation
Message-ID: <45544362.9040805@in.ibm.com>
In-Reply-To: <45543E36.2080600@openvz.org>
References: <20061109193523.21437.86224.sendpatchset@balbir.in.ibm.com>
 <20061109193636.21437.11778.sendpatchset@balbir.in.ibm.com>
 <45543E36.2080600@openvz.org>

Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Reclaim memory as we hit the max_shares limit. The code for reclamation
>> is inspired by Dave Hansen's challenged memory controller and by the
>> shrink_all_memory() code.
>>
>> Reclamation can be triggered from two paths:
>>
>> 1. While incrementing the RSS, we hit the limit of the container
>> 2. A container is resized, such that its new limit is below its
>>    current RSS
>>
>> In (1), reclamation takes place in the background.
>
> Hmm... This is not a hard limit in this case, right? And in the case
> of an overloaded system, from the moment the reclamation thread is
> woken up till the moment it starts shrinking zones, the container may
> touch too many pages...
>
> That's not good.

Yes, please see my comments in the TODOs. Hard limits should be easy to
implement; it's a question of calling the correct routine based on
policy.

>
>> TODOs
>>
>> 1. max_shares currently works like a soft limit. The RSS can grow
>>    beyond its limit. One possible fix is to introduce a soft limit
>>    (reclaim when the container hits the soft limit) and fail when we
>>    hit the hard limit.
>
> Such a soft limit doesn't help either. It just makes the effects on a
> low-loaded system smoother.
>
> And what about a hard limit - how would you fail a page fault when the
> limit is hit? SIGKILL/SEGV is not an option - in this case we should
> run synchronous reclamation. This is done in the beancounter patches
> v6 we've sent recently.
>

I thought about running synchronous reclamation, but then did not follow
that approach; I was not sure whether calling the reclaim routines from
the page fault context is a good thing to do. It's worth trying out,
since it would provide better control over RSS.
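To make that concrete, here is a minimal sketch of what a synchronous,
hard-limit charge path could look like. memctlr_charge(), the rss and
max_shares fields, and SC_OVERLIMIT_ONE are illustrative assumptions
rather than code from this patch; only memctlr_shrink_container_memory()
appears in the patch quoted below, and all locking and per-zone details
are omitted:

/* Fields assumed on the container for this sketch. */
struct container {
	unsigned long rss;		/* pages currently charged */
	unsigned long max_shares;	/* hard limit, in pages */
};

/*
 * Sketch only: charge nr_pages against the container, reclaiming
 * synchronously (i.e. from the page fault path) when over the limit.
 * SC_OVERLIMIT_ONE is a hypothetical mode meaning "shrink only this
 * container".
 */
int memctlr_charge(struct container *cnt, unsigned long nr_pages)
{
	/* Fast path: the container stays within its hard limit. */
	if (cnt->rss + nr_pages <= cnt->max_shares) {
		cnt->rss += nr_pages;
		return 0;
	}

	/* Over the limit: shrink this container before the RSS may grow. */
	if (memctlr_shrink_container_memory(nr_pages, cnt,
					    SC_OVERLIMIT_ONE) < nr_pages)
		return -ENOMEM;		/* reclaim failed; fail the charge */

	cnt->rss += nr_pages;
	return 0;
}

Failing the charge with -ENOMEM could let the fault path decide whether
to retry or invoke the OOM killer, which would avoid the SIGKILL/SEGV
problem raised above.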
>> Signed-off-by: Balbir Singh <balbir@in.ibm.com>
>> ---
>>
>> --- linux-2.6.19-rc2/mm/vmscan.c~container-memctlr-reclaim	2006-11-09 22:21:11.000000000 +0530
>> +++ linux-2.6.19-rc2-balbir/mm/vmscan.c	2006-11-09 22:21:11.000000000 +0530
>> @@ -36,6 +36,8 @@
>>  #include
>>  #include
>>  #include
>> +#include
>> +#include
>>
>>  #include
>>  #include
>> @@ -65,6 +67,9 @@ struct scan_control {
>>  	int swappiness;
>>
>>  	int all_unreclaimable;
>> +
>> +	int overlimit;
>> +	void *container;	/* Added as void * to avoid #ifdef's */
>>  };
>>
>>  /*
>> @@ -811,6 +816,10 @@ force_reclaim_mapped:
>>  		cond_resched();
>>  		page = lru_to_page(&l_hold);
>>  		list_del(&page->lru);
>> +		if (!memctlr_page_reclaim(page, sc->container, sc->overlimit)) {
>> +			list_add(&page->lru, &l_active);
>> +			continue;
>> +		}
>>  		if (page_mapped(page)) {
>>  			if (!reclaim_mapped ||
>>  			    (total_swap_pages == 0 && PageAnon(page)) ||
>
> [snip]

See comment below.

>
>>
>> +#ifdef CONFIG_RES_GROUPS_MEMORY
>> +/*
>> + * Modelled after shrink_all_memory
>> + */
>> +unsigned long memctlr_shrink_container_memory(unsigned long nr_pages,
>> +						struct container *container,
>> +						int overlimit)
>> +{
>> +	unsigned long lru_pages;
>> +	unsigned long ret = 0;
>> +	int pass;
>> +	struct zone *zone;
>> +	struct scan_control sc = {
>> +		.gfp_mask = GFP_KERNEL,
>> +		.may_swap = 0,
>> +		.swap_cluster_max = nr_pages,
>> +		.may_writepage = 1,
>> +		.swappiness = vm_swappiness,
>> +		.overlimit = overlimit,
>> +		.container = container,
>> +	};
>> +
>
> [snip]
>
>> +	for (prio = DEF_PRIORITY; prio >= 0; prio--) {
>> +		unsigned long nr_to_scan = nr_pages - ret;
>> +
>> +		sc.nr_scanned = 0;
>> +		ret += shrink_all_zones(nr_to_scan, prio, pass, &sc);
>> +		if (ret >= nr_pages)
>> +			break;
>> +
>> +		if (sc.nr_scanned && prio < DEF_PRIORITY - 2)
>> +			blk_congestion_wait(WRITE, HZ / 10);
>> +	}
>> +	}
>> +	return ret;
>> +}
>> +#endif
>
> Please correct me if I'm wrong, but does this reclamation work like
> "run over all the zones' lists searching for pages whose controller
> is sc->container"?
>

Yeah, that's correct. The code can also reclaim memory from all
over-the-limit containers (by passing SC_OVERLIMIT_ALL). The idea behind
using such a scheme is to ensure that the global LRU list is not broken.

--
	Thanks for the feedback,
	Balbir Singh,
	Linux Technology Center,
	IBM Software Labs
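For completeness, a minimal sketch of the per-container filter described
above, assuming each page can be mapped back to its owning container.
page_container() is a hypothetical helper, and the rss/max_shares fields
are as in the earlier charge sketch; the patch itself only shows the
memctlr_page_reclaim() call site in the active-list scan:

/*
 * Sketch only: decide whether a page on the global LRU may be reclaimed
 * by this scan. Returning 0 keeps the page on the active list.
 */
int memctlr_page_reclaim(struct page *page, void *container, int overlimit)
{
	struct container *cnt = page_container(page);	/* assumed helper */

	if (!cnt)
		return 1;	/* untracked page: treat as reclaimable */
	if (overlimit == SC_OVERLIMIT_ALL)
		return cnt->rss > cnt->max_shares;	/* any over-limit container */
	return cnt == (struct container *)container;	/* only this container */
}

Skipping foreign pages in place, rather than keeping per-container LRU
lists, is what lets the scheme leave the global LRU intact.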