Date: Tue, 3 May 2011 08:11:56 +0200
From: Johannes Weiner
To: Ying Han
Cc: James Bottomley, Chris Mason, linux-fsdevel, linux-mm, linux-kernel,
    Paul Menage, Li Zefan, containers@lists.linux-foundation.org, Balbir Singh
Subject: Re: memcg: fix fatal livelock in kswapd
Message-ID: <20110503061156.GC10278@cmpxchg.org>
References: <1304366849.15370.27.camel@mulgrave.site> <20110502224838.GB10278@cmpxchg.org>

On Mon, May 02, 2011 at 04:14:09PM -0700, Ying Han wrote:
> On Mon, May 2, 2011 at 3:48 PM, Johannes Weiner wrote:
> > Hi,
> >
> > On Mon, May 02, 2011 at 03:07:29PM -0500, James Bottomley wrote:
> >> The fatal livelock in kswapd, reported in this thread:
> >>
> >> http://marc.info/?t=130392066000001
> >>
> >> is mitigable if we prevent the cgroups code from being so aggressive
> >> in its zone shrinking (by reducing its default shrink from 0
> >> [everything] to DEF_PRIORITY [some things]).  This will have an
> >> obvious knock-on effect on cgroup accounting, but it's better than
> >> hanging systems.
> >
> > Actually, it's not that obvious.  At least not to me.  I added Balbir,
> > who added said comment and code in the first place, to CC.  Here is
> > the comment in full quote:
> >
> >         /*
> >          * NOTE: Although we can get the priority field, using it
> >          * here is not a good idea, since it limits the pages we can scan.
> >          * if we don't reclaim here, the shrink_zone from balance_pgdat
> >          * will pick up pages from other mem cgroup's as well. We hack
> >          * the priority and make it zero.
> >          */
> >
> > The idea is that if one memcg is above its soft limit, we prefer
> > reclaiming pages from this memcg over reclaiming random other pages,
> > including those of other memcgs.
> >
> > But the code flow looks like this:
> >
> >         balance_pgdat
> >           mem_cgroup_soft_limit_reclaim
> >             mem_cgroup_shrink_node_zone
> >               shrink_zone(0, zone, &sc)
> >           shrink_zone(prio, zone, &sc)
> >
> > so the success of the inner memcg shrink_zone does at least not
> > explicitly result in the outer, global shrink_zone steering clear of
> > other memcgs' pages.  It just tries to move the pressure of balancing
> > the zones to the memcg with the biggest soft limit excess.  That can
> > only really work if the memcg is a large enough contributor to the
> > zone's total number of lru pages, though, and looks very likely to
> > hit the exceeding memcg too hard in other cases.
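
To put a number on how aggressive priority 0 is: reclaim scales the
per-LRU scan target roughly as lru_pages >> priority, so DEF_PRIORITY
(12) starts by scanning 1/4096th of the list while priority 0 scans all
of it in one go.  A compilable toy illustration of just that arithmetic
(my own sketch, not the mm/vmscan.c code):

#include <stdio.h>

#define DEF_PRIORITY	12	/* the kernel's default reclaim priority */

/* per-LRU scan target: roughly how reclaim scales scanning with priority */
static unsigned long scan_target(unsigned long lru_pages, int priority)
{
	return lru_pages >> priority;
}

int main(void)
{
	unsigned long lru_pages = 1UL << 20;	/* pretend the zone has ~1M LRU pages */

	printf("priority %2d: scan %lu pages\n",
	       DEF_PRIORITY, scan_target(lru_pages, DEF_PRIORITY));	/* 256 */
	printf("priority %2d: scan %lu pages\n",
	       0, scan_target(lru_pages, 0));				/* 1048576 */
	return 0;
}

That factor of 4096 on the first pass is the gap James's change is
trying to close by starting at DEF_PRIORITY instead of 0.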
> yes, the logic is selecting one memcg (the one exceeding its soft
> limit the most) and starting hierarchical reclaim on it.  It will loop
> until one of the following conditions becomes true:
> 1. memcg usage is below its soft_limit
> 2. we have looped 100 times
> 3. reclaimed pages are equal to or greater than (excess >> 2), where
>    excess is (usage - soft_limit)

There is no need to loop if we beat up the memcg in question with a
hammer during the first iteration ;-)  That is, we already did the
aggressive scan by the time these conditions are checked.

> hmm, the worst case I can think of is a memcg that has only one page
> allocated on the zone; we end up looping 100 times each round and not
> contributing much to the global reclaim.

Good point, it should probably bail out earlier on a zone that does not
really contribute to the soft limit excess.
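
For completeness, here is the exit logic as a compilable toy, the way I
read the three conditions quoted above.  The names (memcg_stub,
shrink_one_pass, MAX_SOFT_LIMIT_LOOPS) and the page counts are made up
for illustration; only the three conditions and the excess >> 2 cutoff
come from the thread, not from memcontrol.c itself:

#include <stdio.h>

#define MAX_SOFT_LIMIT_LOOPS	100	/* "looping 100 times"; macro name is mine */

struct memcg_stub {			/* toy stand-in for struct mem_cgroup */
	unsigned long usage;
	unsigned long soft_limit;
};

/* toy stand-in for one aggressive (priority 0) reclaim pass */
static unsigned long shrink_one_pass(struct memcg_stub *memcg)
{
	unsigned long got = 64;		/* pretend each pass frees 64 pages */

	memcg->usage -= got;
	return got;
}

int main(void)
{
	struct memcg_stub memcg = { .usage = 10240, .soft_limit = 8192 };
	unsigned long excess = memcg.usage - memcg.soft_limit;
	unsigned long reclaimed = 0;
	int loop;

	for (loop = 0; loop < MAX_SOFT_LIMIT_LOOPS; loop++) {	/* condition 2 */
		reclaimed += shrink_one_pass(&memcg);
		/* note: conditions only get checked after a full aggressive pass */
		if (memcg.usage <= memcg.soft_limit)		/* condition 1 */
			break;
		if (reclaimed >= (excess >> 2))			/* condition 3 */
			break;
	}
	printf("stopped after %d pass(es), reclaimed %lu pages\n",
	       loop + 1, reclaimed);
	return 0;
}

With any halfway realistic per-pass reclaim, the excess >> 2 cutoff
fires long before the 100-loop cap, which is the point above: the
priority-0 pass is already the sledgehammer.  The loop cap only really
matters in the pathological case Ying Han describes, where the memcg
has almost nothing on the zone to begin with.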