Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758203AbYLLJxq (ORCPT ); Fri, 12 Dec 2008 04:53:46 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756873AbYLLJxi (ORCPT ); Fri, 12 Dec 2008 04:53:38 -0500 Received: from TYO202.gate.nec.co.jp ([202.32.8.206]:57998 "EHLO tyo202.gate.nec.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756694AbYLLJxh (ORCPT ); Fri, 12 Dec 2008 04:53:37 -0500 Date: Fri, 12 Dec 2008 18:43:41 +0900 From: Daisuke Nishimura To: KAMEZAWA Hiroyuki Cc: "linux-mm@kvack.org" , "linux-kernel@vger.kernel.org" , "balbir@linux.vnet.ibm.com" , "akpm@linux-foundation.org" , Hugh Dickins , nishimura@mxp.nes.nec.co.jp Subject: Re: [BUGFIX][PATCH mmotm] memcg fix swap accounting leak Message-Id: <20081212184341.b62903a7.nishimura@mxp.nes.nec.co.jp> In-Reply-To: <20081212172930.282caa38.kamezawa.hiroyu@jp.fujitsu.com> References: <20081212172930.282caa38.kamezawa.hiroyu@jp.fujitsu.com> Organization: NEC Soft, Ltd. X-Mailer: Sylpheed 2.4.8 (GTK+ 2.10.14; i686-pc-mingw32) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3643 Lines: 99 (add CC: Hugh Dickins ) On Fri, 12 Dec 2008 17:29:30 +0900, KAMEZAWA Hiroyuki wrote: > > From: KAMEZAWA Hiroyuki > > Fix swap-page-fault charge leak of memcg. > > Now, memcg has hooks to swap-out operation and checks SwapCache is really > unused or not. That check depends on contents of struct page. > I.e. If PageAnon(page) && page_mapped(page), the page is recoginized as > still-in-use. > > Now, reuse_swap_page() calles delete_from_swap_cache() before establishment > of any rmap. Then, in followinig sequence > > (Page fault with WRITE) > Assume the page is SwapCache "on memory (still charged)" > try_charge() (charge += PAGESIZE) > commit_charge() > => (Check page_cgroup and found PCG_USED bit, charge-=PAGE_SIZE > because it seems already charged.) > reuse_swap_page() > -> delete_from_swapcache() > -> mem_cgroup_uncharge_swapcache() (charge -= PAGESIZE) > ...... > > too much uncharge..... > > To avoid this, move commit_charge() after page_mapcount() goes up to 1. > By this, > Assume the page is SwapCache "on memory" > try_charge() (charge += PAGESIZE) > reuse_swap_page() (may charge -= PAGESIZE if PCG_USED is set) > commit_charge() (Ony if page_cgroup is marked as PCG_USED, > charge -= PAGESIZE) > Accounting will be correct. > > Signed-off-by: KAMEZAWA Hiroyuki I've confirmed that the problem you reported offline is fixed, but... > --- > mm/memory.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > Index: mmotm-2.6.28-Dec11/mm/memory.c > =================================================================== > --- mmotm-2.6.28-Dec11.orig/mm/memory.c > +++ mmotm-2.6.28-Dec11/mm/memory.c > @@ -2433,17 +2433,17 @@ static int do_swap_page(struct mm_struct > * which may delete_from_swap_cache(). > */ > The comment here says: /* * The page isn't present yet, go ahead with the fault. * * Be careful about the sequence of operations here. * To get its accounting right, reuse_swap_page() must be called * while the page is counted on swap but not yet in mapcount i.e. * before page_add_anon_rmap() and swap_free(); try_to_free_swap() * must be called after the swap_free(), or it will never succeed. * And mem_cgroup_commit_charge_swapin(), which uses the swp_entry * in page->private, must be called before reuse_swap_page(), * which may delete_from_swap_cache(). */ Hmm.. should we save page->private before calling reuse_swap_page and pass it to mem_cgroup_commit_charge_swapin(I think it cannot be changed because the page is locked)? Thanks, Daisuke Nishimura. > - mem_cgroup_commit_charge_swapin(page, ptr); > inc_mm_counter(mm, anon_rss); > pte = mk_pte(page, vma->vm_page_prot); > if (write_access && reuse_swap_page(page)) { > pte = maybe_mkwrite(pte_mkdirty(pte), vma); > write_access = 0; > } > - > flush_icache_page(vma, page); > set_pte_at(mm, address, page_table, pte); > page_add_anon_rmap(page, vma, address); > + /* It's better to call commit-charge after rmap is established */ > + mem_cgroup_commit_charge_swapin(page, ptr); > > swap_free(entry); > if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page)) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/