Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752993Ab1FCFZg (ORCPT ); Fri, 3 Jun 2011 01:25:36 -0400 Received: from smtp-out.google.com ([74.125.121.67]:47603 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752221Ab1FCFZf convert rfc822-to-8bit (ORCPT ); Fri, 3 Jun 2011 01:25:35 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=google.com; s=beta; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=A+SkuRc89mB31x6oFzi3CL4pL6RS93MOdDKzKrYDhm7AIkkJhHfI/aXANQj+X6jxRr KNIU0KkpDSYvT8ztExxg== MIME-Version: 1.0 In-Reply-To: References: <1306909519-7286-1-git-send-email-hannes@cmpxchg.org> <1306909519-7286-5-git-send-email-hannes@cmpxchg.org> Date: Thu, 2 Jun 2011 22:25:29 -0700 Message-ID: Subject: Re: [patch 4/8] memcg: rework soft limit reclaim From: Ying Han To: Johannes Weiner Cc: KAMEZAWA Hiroyuki , Daisuke Nishimura , Balbir Singh , Michal Hocko , Andrew Morton , Rik van Riel , Minchan Kim , KOSAKI Motohiro , Mel Gorman , Greg Thelen , Michel Lespinasse , "linux-mm@kvack.org" , linux-kernel Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 8361 Lines: 201 On Thu, Jun 2, 2011 at 2:55 PM, Ying Han wrote: > On Tue, May 31, 2011 at 11:25 PM, Johannes Weiner wrote: >> Currently, soft limit reclaim is entered from kswapd, where it selects >> the memcg with the biggest soft limit excess in absolute bytes, and >> reclaims pages from it with maximum aggressiveness (priority 0). >> >> This has the following disadvantages: >> >> ? ?1. because of the aggressiveness, kswapd can be stalled on a memcg >> ? ?that is hard to reclaim from for a long time, sending the rest of >> ? ?the allocators into direct reclaim in the meantime. >> >> ? ?2. it only considers the biggest offender (in absolute bytes, no >> ? ?less, so very unhandy for setups with different-sized memcgs) and >> ? ?does not apply any pressure at all on other memcgs in excess. >> >> ? ?3. because it is only invoked from kswapd, the soft limit is >> ? ?meaningful during global memory pressure, but it is not taken into >> ? ?account during hierarchical target reclaim where it could allow >> ? ?prioritizing memcgs as well. ?So while it does hierarchical >> ? ?reclaim once triggered, it is not a truly hierarchical mechanism. >> >> Here is a different approach. ?Instead of having a soft limit reclaim >> cycle separate from the rest of reclaim, this patch ensures that each >> time a group of memcgs is reclaimed - be it because of global memory >> pressure or because of a hard limit - memcgs that exceed their soft >> limit, or contribute to the soft limit excess of one their parents, >> are reclaimed from at a higher priority than their siblings. >> >> This results in the following: >> >> ? ?1. all relevant memcgs are scanned with increasing priority during >> ? ?memory pressure. ?The primary goal is to free pages, not to punish >> ? ?soft limit offenders. >> >> ? ?2. increased pressure is applied to all memcgs in excess of their >> ? ?soft limit, not only the biggest offender. >> >> ? ?3. the soft limit becomes meaningful for target reclaim as well, >> ? ?where it allows prioritizing children of a hierarchy when the >> ? ?parent hits its limit. >> >> ? ?4. direct reclaim now also applies increased soft limit pressure, >> ? ?not just kswapd anymore. >> >> Signed-off-by: Johannes Weiner >> --- >> ?include/linux/memcontrol.h | ? ?7 +++++++ >> ?mm/memcontrol.c ? ? ? ? ? ?| ? 26 ++++++++++++++++++++++++++ >> ?mm/vmscan.c ? ? ? ? ? ? ? ?| ? ?8 ++++++-- >> ?3 files changed, 39 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h >> index 8f402b9..7d99e87 100644 >> --- a/include/linux/memcontrol.h >> +++ b/include/linux/memcontrol.h >> @@ -104,6 +104,7 @@ extern void mem_cgroup_end_migration(struct mem_cgroup *mem, >> ?struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct mem_cgroup *); >> ?void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *); >> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *, struct mem_cgroup *); >> >> ?/* >> ?* For memory reclaim. >> @@ -345,6 +346,12 @@ static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *r, >> ?{ >> ?} >> >> +static inline bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root, >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct mem_cgroup *mem) >> +{ >> + ? ? ? return false; >> +} >> + >> ?static inline void >> ?mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p) >> ?{ >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c >> index 983efe4..94f77cc3 100644 >> --- a/mm/memcontrol.c >> +++ b/mm/memcontrol.c >> @@ -1460,6 +1460,32 @@ void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *root, >> ? ? ? ? ? ? ? ?css_put(&mem->css); >> ?} >> >> +/** >> + * mem_cgroup_soft_limit_exceeded - check if a memcg (hierarchically) >> + * ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?exceeds a soft limit >> + * @root: highest ancestor of @mem to consider >> + * @mem: memcg to check for excess >> + * >> + * The function indicates whether @mem has exceeded its own soft >> + * limit, or contributes to the soft limit excess of one of its >> + * parents in the hierarchy below @root. >> + */ >> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root, >> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? struct mem_cgroup *mem) >> +{ >> + ? ? ? for (;;) { >> + ? ? ? ? ? ? ? if (mem == root_mem_cgroup) >> + ? ? ? ? ? ? ? ? ? ? ? return false; >> + ? ? ? ? ? ? ? if (res_counter_soft_limit_excess(&mem->res)) >> + ? ? ? ? ? ? ? ? ? ? ? return true; >> + ? ? ? ? ? ? ? if (mem == root) >> + ? ? ? ? ? ? ? ? ? ? ? return false; >> + ? ? ? ? ? ? ? mem = parent_mem_cgroup(mem); >> + ? ? ? ? ? ? ? if (!mem) >> + ? ? ? ? ? ? ? ? ? ? ? return false; >> + ? ? ? } >> +} >> + >> ?static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?gfp_t gfp_mask, >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?unsigned long flags) >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index c7d4b44..0163840 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1988,9 +1988,13 @@ static void shrink_zone(int priority, struct zone *zone, >> ? ? ? ? ? ? ? ?unsigned long reclaimed = sc->nr_reclaimed; >> ? ? ? ? ? ? ? ?unsigned long scanned = sc->nr_scanned; >> ? ? ? ? ? ? ? ?unsigned long nr_reclaimed; >> + ? ? ? ? ? ? ? int epriority = priority; >> + >> + ? ? ? ? ? ? ? if (mem_cgroup_soft_limit_exceeded(root, mem)) >> + ? ? ? ? ? ? ? ? ? ? ? epriority -= 1; > > Here we grant the ability to shrink from all the memcgs, but only > higher the priority for those exceed the soft_limit. That is a design > change > for the "soft_limit" which giving a hint to which memcgs to reclaim > from first under global memory pressure. Basically, we shouldn't reclaim from a memcg under its soft_limit unless we have trouble reclaim pages from others. Something like the following makes better sense: diff --git a/mm/vmscan.c b/mm/vmscan.c index bdc2fd3..b82ba8c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1989,6 +1989,8 @@ restart: throttle_vm_writeout(sc->gfp_mask); } +#define MEMCG_SOFTLIMIT_RECLAIM_PRIORITY 2 + static void shrink_zone(int priority, struct zone *zone, struct scan_control *sc) { @@ -2001,13 +2003,13 @@ static void shrink_zone(int priority, struct zone *zone, unsigned long reclaimed = sc->nr_reclaimed; unsigned long scanned = sc->nr_scanned; unsigned long nr_reclaimed; - int epriority = priority; - if (mem_cgroup_soft_limit_exceeded(root, mem)) - epriority -= 1; + if (!mem_cgroup_soft_limit_exceeded(root, mem) && + priority > MEMCG_SOFTLIMIT_RECLAIM_PRIORITY) + continue; sc->mem_cgroup = mem; - do_shrink_zone(epriority, zone, sc); + do_shrink_zone(priority, zone, sc); mem_cgroup_count_reclaim(mem, current_is_kswapd(), mem != root, /* limit or hierarchy? */ sc->nr_scanned - scanned, --Ying > > --Ying > > >> >> ? ? ? ? ? ? ? ?sc->mem_cgroup = mem; >> - ? ? ? ? ? ? ? do_shrink_zone(priority, zone, sc); >> + ? ? ? ? ? ? ? do_shrink_zone(epriority, zone, sc); >> ? ? ? ? ? ? ? ?mem_cgroup_count_reclaim(mem, current_is_kswapd(), >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? mem != root, /* limit or hierarchy? */ >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? sc->nr_scanned - scanned, >> @@ -2480,7 +2484,7 @@ loop_again: >> ? ? ? ? ? ? ? ? ? ? ? ? * Call soft limit reclaim before calling shrink_zone. >> ? ? ? ? ? ? ? ? ? ? ? ? * For now we ignore the return value >> ? ? ? ? ? ? ? ? ? ? ? ? */ >> - ? ? ? ? ? ? ? ? ? ? ? mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask); >> + ? ? ? ? ? ? ? ? ? ? ? //mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask); >> >> ? ? ? ? ? ? ? ? ? ? ? ?/* >> ? ? ? ? ? ? ? ? ? ? ? ? * We put equal pressure on every zone, unless >> -- >> 1.7.5.2 >> >> > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/