Date: Thu, 9 Oct 2008 15:26:53 +0900
From: KAMEZAWA Hiroyuki
To: Daisuke Nishimura
Cc: "linux-mm@kvack.org", LKML, "balbir@linux.vnet.ibm.com"
Subject: Re: [PATCH 5/6] memcg: lazy lru freeing
Message-Id: <20081009152653.83b5ffac.kamezawa.hiroyu@jp.fujitsu.com>
In-Reply-To: <20081009143949.b3cf91b7.nishimura@mxp.nes.nec.co.jp>
References: <20081001165233.404c8b9c.kamezawa.hiroyu@jp.fujitsu.com> <20081001170005.1997d7c8.kamezawa.hiroyu@jp.fujitsu.com> <20081009143949.b3cf91b7.nishimura@mxp.nes.nec.co.jp>
Organization: FUJITSU Co. LTD.
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 9 Oct 2008 14:39:49 +0900
Daisuke Nishimura wrote:

> On Wed, 1 Oct 2008 17:00:05 +0900, KAMEZAWA Hiroyuki wrote:
> > Free page_cgroup from its LRU in a batched manner.
> >
> > When uncharge() is called, the page is pushed onto a per-cpu vector and
> > removed from the LRU later. This routine resembles the global LRU's pagevec.
> > This patch is one half of the whole change and forms a set with the
> > following lazy LRU add patch.
> >
> > After this, a pc for which PageCgroupLRU(pc)==true is on the LRU.
> > This LRU bit is guarded by lru_lock().
> >
> > PageCgroupUsed(pc) && PageCgroupLRU(pc) means "pc" is used and on LRU.
> > This check makes sense only when both locks, lock_page_cgroup() and
> > lru_lock(), are acquired.
> >
> > PageCgroupUsed(pc) && !PageCgroupLRU(pc) means "pc" is used but not on LRU.
> > !PageCgroupUsed(pc) && PageCgroupLRU(pc) means "pc" is unused but still on
> > LRU. The lru walk routine should avoid touching this.
> >
> > Changelog (v5) => (v6):
> > - Fixed a race and added the PCG_LRU bit
> >
> > Signed-off-by: KAMEZAWA Hiroyuki
> >
>
> (snip)
>
> > +static void
> > +__release_page_cgroup(struct memcg_percpu_vec *mpv)
> > +{
> > +	unsigned long flags;
> > +	struct mem_cgroup_per_zone *mz, *prev_mz;
> > +	struct page_cgroup *pc;
> > +	int i, nr;
> > +
> > +	local_irq_save(flags);
> > +	nr = mpv->nr;
> > +	mpv->nr = 0;
> > +	prev_mz = NULL;
> > +	for (i = nr - 1; i >= 0; i--) {
> > +		pc = mpv->vec[i];
> > +		mz = page_cgroup_zoneinfo(pc);
> > +		if (prev_mz != mz) {
> > +			if (prev_mz)
> > +				spin_unlock(&prev_mz->lru_lock);
> > +			prev_mz = mz;
> > +			spin_lock(&mz->lru_lock);
> > +		}
> > +		/*
> > +		 * this "pc" may be charge()->uncharge() while we are waiting
> > +		 * for this. But charge() path check LRU bit and remove this
> > +		 * from LRU if necessary.
> > +		 */
> > +		if (!PageCgroupUsed(pc) && PageCgroupLRU(pc)) {
> > +			ClearPageCgroupLRU(pc);
> > +			__mem_cgroup_remove_list(mz, pc);
> > +			css_put(&pc->mem_cgroup->css);
> > +		}
> > +	}
> > +	if (prev_mz)
> > +		spin_unlock(&prev_mz->lru_lock);
> > +	local_irq_restore(flags);
> > +
> > +}
> > +
> I'm wondering if page_cgroup_zoneinfo is safe without lock_page_cgroup
> because it dereferences pc->mem_cgroup.
> I'm worried that the pc may have been moved to another lru by re-charge
> (and re-uncharge), so that __mem_cgroup_remove_list touches a wrong (old) group.
>
> Hmm, there are many things to be done for re-charge and re-uncharge,
> so "if (!PageCgroupUsed(pc) && PageCgroupLRU(pc))" would be enough.
> (it can avoid a race with re-charge.)
>
It's safe only because I added the following check.

+	/*
+	 * This page_cgroup is not used but may be on LRU.
+	 */
+	if (unlikely(PageCgroupLRU(pc))) {
+		/*
+		 * pc->mem_cgroup has old information.
+		 * force_empty() guarantee
+		 * that we never see stale mem_cgroup here.
+		 */
+		mz = page_cgroup_zoneinfo(pc);
+		spin_lock_irqsave(&mz->lru_lock, flags);
+		if (PageCgroupLRU(pc)) {
+			ClearPageCgroupLRU(pc);
+			__mem_cgroup_remove_list(mz, pc);
+			css_put(&pc->mem_cgroup->css);
+		}
+		spin_unlock_irqrestore(&mz->lru_lock, flags);
+	}
+	/* Here, PCG_LRU bit is cleared */

So the LRU bit is always cleared before the pc is reused.

> Another user of page_cgroup_zoneinfo without lock_page_cgroup is
> __mem_cgroup_move_lists called by mem_cgroup_isolate_pages,
> but mem_cgroup_isolate_pages handles a pc which is actually on the mz->lru
> so it would be ok.
> (I think adding VM_BUG_ON(mz != page_cgroup_zoneinfo(pc)) would make sense,
> or add a new arg *mz to __mem_cgroup_move_lists?)
>
OK, I'll add VM_BUG_ON().

Thanks,
-Kame
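
For readers following the flag discussion, the drain loop and the Used/LRU check can be modelled in plain userspace C. This is only a rough sketch under stated assumptions, not the patch itself: struct zone_lru, struct pc_model, lru_lock()/lru_unlock() and drain_vec() are hypothetical stand-ins, the "lock" is just a counter, and irq disabling and css_put() are left out. It shows two things from the thread: the lock is retaken only when the zone changes between consecutive entries, and an entry is unlinked only in the "unused but still on LRU" state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mem_cgroup_per_zone; not the kernel type. */
struct zone_lru {
	bool locked;      /* models holding mz->lru_lock */
	int lock_count;   /* how many times the lock was taken */
	int nr_on_lru;    /* entries currently linked on this LRU */
};

/* Hypothetical stand-in for page_cgroup. */
struct pc_model {
	bool used;            /* models PageCgroupUsed(pc) */
	bool on_lru;          /* models PageCgroupLRU(pc) */
	struct zone_lru *mz;  /* models page_cgroup_zoneinfo(pc) */
};

static void lru_lock(struct zone_lru *mz)
{
	assert(!mz->locked);
	mz->locked = true;
	mz->lock_count++;
}

static void lru_unlock(struct zone_lru *mz)
{
	assert(mz->locked);
	mz->locked = false;
}

/*
 * Drain a vector of deferred frees, walking from the last entry to the
 * first. The lock is dropped and retaken only when the zone changes.
 * Returns the number of entries unlinked from their LRU.
 */
static int drain_vec(struct pc_model **vec, int nr)
{
	struct zone_lru *prev = NULL;
	int removed = 0;

	for (int i = nr - 1; i >= 0; i--) {
		struct pc_model *pc = vec[i];
		struct zone_lru *mz = pc->mz;

		if (prev != mz) {
			if (prev)
				lru_unlock(prev);
			prev = mz;
			lru_lock(mz);
		}
		/*
		 * Only "unused but still on LRU" is unlinked here. A pc that
		 * was re-charged (used) or already unlinked is skipped.
		 */
		if (!pc->used && pc->on_lru) {
			pc->on_lru = false;
			mz->nr_on_lru--;
			removed++;
		}
	}
	if (prev)
		lru_unlock(prev);
	return removed;
}
```

With a vector holding three entries from one zone (one unused and on LRU, one re-charged, one already unlinked) and one unused entry from a second zone, drain_vec() takes each zone's lock once and unlinks only the two unused, still-linked entries.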