Date: Wed, 1 Jun 2011 22:37:29 -0700
Subject: Re: [patch 4/8] memcg: rework soft limit reclaim
From: Ying Han
To: Johannes Weiner
Cc: KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh, Michal Hocko,
    Andrew Morton, Rik van Riel, Minchan Kim, KOSAKI Motohiro, Mel Gorman,
    Greg Thelen, Michel Lespinasse, "linux-mm@kvack.org", linux-kernel
In-Reply-To: <1306909519-7286-5-git-send-email-hannes@cmpxchg.org>
References: <1306909519-7286-1-git-send-email-hannes@cmpxchg.org>
    <1306909519-7286-5-git-send-email-hannes@cmpxchg.org>

On Tue, May 31, 2011 at 11:25 PM, Johannes Weiner wrote:
> Currently, soft limit reclaim is entered from kswapd, where it selects
> the memcg with the biggest soft limit excess in absolute bytes, and
> reclaims pages from it with maximum aggressiveness (priority 0).
>
> This has the following disadvantages:
>
>    1. because of the aggressiveness, kswapd can be stalled on a memcg
>    that is hard to reclaim from for a long time, sending the rest of
>    the allocators into direct reclaim in the meantime.
>
>    2. it only considers the biggest offender (in absolute bytes, no
>    less, so very unhandy for setups with different-sized memcgs) and
>    does not apply any pressure at all on other memcgs in excess.
>
>    3. because it is only invoked from kswapd, the soft limit is
>    meaningful during global memory pressure, but it is not taken into
>    account during hierarchical target reclaim where it could allow
>    prioritizing memcgs as well.  So while it does hierarchical
>    reclaim once triggered, it is not a truly hierarchical mechanism.
>
> Here is a different approach.  Instead of having a soft limit reclaim
> cycle separate from the rest of reclaim, this patch ensures that each
> time a group of memcgs is reclaimed - be it because of global memory
> pressure or because of a hard limit - memcgs that exceed their soft
> limit, or contribute to the soft limit excess of one of their parents,
> are reclaimed from at a higher priority than their siblings.
>
> This results in the following:
>
>    1. all relevant memcgs are scanned with increasing priority during
>    memory pressure.  The primary goal is to free pages, not to punish
>    soft limit offenders.
>
>    2. increased pressure is applied to all memcgs in excess of their
>    soft limit, not only the biggest offender.
>
>    3. the soft limit becomes meaningful for target reclaim as well,
>    where it allows prioritizing children of a hierarchy when the
>    parent hits its limit.
>
>    4. direct reclaim now also applies increased soft limit pressure,
>    not just kswapd anymore.

So I see that we have now removed the per-zone soft_limit reclaim logic
entirely (including the next patch). Instead, we iterate over the whole
memcg hierarchy under global memory pressure. Is there a reason we
didn't keep the per-zone memcg list, which lets us scan only the memcgs
with pages on that zone?

--Ying

>
> Signed-off-by: Johannes Weiner
> ---
>  include/linux/memcontrol.h |    7 +++++++
>  mm/memcontrol.c            |   26 ++++++++++++++++++++++++++
>  mm/vmscan.c                |    8 ++++++--
>  3 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 8f402b9..7d99e87 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -104,6 +104,7 @@ extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
>  struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *,
>                                               struct mem_cgroup *);
>  void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *);
> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *, struct mem_cgroup *);
>
>  /*
>   * For memory reclaim.
> @@ -345,6 +346,12 @@ static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *r,
>  {
>  }
>
> +static inline bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
> +                                                  struct mem_cgroup *mem)
> +{
> +       return false;
> +}
> +
>  static inline void
>  mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 983efe4..94f77cc3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1460,6 +1460,32 @@ void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *root,
>                 css_put(&mem->css);
>  }
>
> +/**
> + * mem_cgroup_soft_limit_exceeded - check if a memcg (hierarchically)
> + *                                  exceeds a soft limit
> + * @root: highest ancestor of @mem to consider
> + * @mem: memcg to check for excess
> + *
> + * The function indicates whether @mem has exceeded its own soft
> + * limit, or contributes to the soft limit excess of one of its
> + * parents in the hierarchy below @root.
> + */
> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
> +                                   struct mem_cgroup *mem)
> +{
> +       for (;;) {
> +               if (mem == root_mem_cgroup)
> +                       return false;
> +               if (res_counter_soft_limit_excess(&mem->res))
> +                       return true;
> +               if (mem == root)
> +                       return false;
> +               mem = parent_mem_cgroup(mem);
> +               if (!mem)
> +                       return false;
> +       }
> +}
> +
>  static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem,
>                                         gfp_t gfp_mask,
>                                         unsigned long flags)
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index c7d4b44..0163840 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1988,9 +1988,13 @@ static void shrink_zone(int priority, struct zone *zone,
>                 unsigned long reclaimed = sc->nr_reclaimed;
>                 unsigned long scanned = sc->nr_scanned;
>                 unsigned long nr_reclaimed;
> +               int epriority = priority;
> +
> +               if (mem_cgroup_soft_limit_exceeded(root, mem))
> +                       epriority -= 1;
>
>                 sc->mem_cgroup = mem;
> -               do_shrink_zone(priority, zone, sc);
> +               do_shrink_zone(epriority, zone, sc);
>                 mem_cgroup_count_reclaim(mem, current_is_kswapd(),
>                                          mem != root, /* limit or hierarchy? */
>                                          sc->nr_scanned - scanned,
> @@ -2480,7 +2484,7 @@ loop_again:
>                          * Call soft limit reclaim before calling shrink_zone.
>                          * For now we ignore the return value
>                          */
> -                       mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
> +                       //mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
>
>                         /*
>                          * We put equal pressure on every zone, unless
> --
> 1.7.5.2
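
For readers following the thread, here is a rough user-space sketch of
the hierarchical check and the priority adjustment the quoted patch
describes. It is not the kernel code: struct toy_memcg, its fields and
the toy_* helpers are made-up names for illustration, a group without a
parent stands in for root_mem_cgroup, and a plain usage/limit comparison
stands in for res_counter_soft_limit_excess(). A numerically lower
priority means more aggressive reclaim, as with priority 0 in the patch
description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mem_cgroup; all names here are hypothetical. */
struct toy_memcg {
    struct toy_memcg *parent;   /* NULL models root_mem_cgroup */
    unsigned long usage;        /* pages currently charged */
    unsigned long soft_limit;   /* soft limit, in pages */
};

/*
 * Walk from @mem up to @root and report whether @mem or one of the
 * ancestors on that path is above its soft limit. The top-level group
 * (no parent) is never treated as being in excess.
 */
static bool toy_soft_limit_exceeded(const struct toy_memcg *root,
                                    const struct toy_memcg *mem)
{
    assert(mem != NULL);
    for (;;) {
        if (mem->parent == NULL)        /* reached the top-level group */
            return false;
        if (mem->usage > mem->soft_limit)
            return true;
        if (mem == root)                /* do not look above @root */
            return false;
        mem = mem->parent;
    }
}

/* Groups in (hierarchical) excess are reclaimed one priority level harder. */
static int toy_effective_priority(int priority,
                                  const struct toy_memcg *root,
                                  const struct toy_memcg *mem)
{
    return toy_soft_limit_exceeded(root, mem) ? priority - 1 : priority;
}
```

With a tree top -> a -> b where only a is above its soft limit, both a
and b get the lowered priority when reclaim is rooted at top, while a
sibling of a that is within its limit keeps the caller's priority.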