Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753435AbZIHCjh (ORCPT ); Mon, 7 Sep 2009 22:39:37 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753282AbZIHCjg (ORCPT ); Mon, 7 Sep 2009 22:39:36 -0400 Received: from fgwmail6.fujitsu.co.jp ([192.51.44.36]:48518 "EHLO fgwmail6.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753237AbZIHCjg (ORCPT ); Mon, 7 Sep 2009 22:39:36 -0400 X-SecurityPolicyCheck-FJ: OK by FujitsuOutboundMailChecker v1.3.1 Date: Tue, 8 Sep 2009 11:37:34 +0900 From: KAMEZAWA Hiroyuki To: Hugh Dickins Cc: Andrew Morton , KOSAKI Motohiro , Linus Torvalds , Nick Piggin , Rik van Riel , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 7/8] mm: reinstate ZERO_PAGE Message-Id: <20090908113734.869cdad7.kamezawa.hiroyu@jp.fujitsu.com> In-Reply-To: References: Organization: FUJITSU Co. LTD. X-Mailer: Sylpheed 2.5.0 (GTK+ 2.10.14; i686-pc-mingw32) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5627 Lines: 173 On Mon, 7 Sep 2009 22:39:34 +0100 (BST) Hugh Dickins wrote: > KAMEZAWA Hiroyuki has observed customers of earlier kernels taking > advantage of the ZERO_PAGE: which we stopped do_anonymous_page() from > using in 2.6.24. And there were a couple of regression reports on LKML. > > Following suggestions from Linus, reinstate do_anonymous_page() use of > the ZERO_PAGE; but this time avoid dirtying its struct page cacheline > with (map)count updates - let vm_normal_page() regard it as abnormal. > > Use it only on arches which __HAVE_ARCH_PTE_SPECIAL (x86, s390, sh32, > most powerpc): that's not essential, but minimizes additional branches > (keeping them in the unlikely pte_special case); and incidentally > excludes mips (some models of which needed eight colours of ZERO_PAGE > to avoid costly exceptions). > > Don't be fanatical about avoiding ZERO_PAGE updates: get_user_pages() > callers won't want to make exceptions for it, so increment its count > there. Changes to mlock and migration? happily seems not needed. > > In most places it's quicker to check pfn than struct page address: > prepare a __read_mostly zero_pfn for that. Does get_dump_page() > still need its ZERO_PAGE check? probably not, but keep it anyway. > > Signed-off-by: Hugh Dickins A nitpick but this was a concern you shown, IIUC. == __get_user_pages().. if (pages) { pages[i] = page; flush_anon_page(vma, page, start); flush_dcache_page(page); } == This part will call flush_dcache_page() even when ZERO_PAGE is found. Don't we need to mask this ? Thanks, -Kame > --- > I have not studied the performance of this at all: I'd rather it go > into mmotm where others may decide whether it's a good thing or not. > > mm/memory.c | 53 +++++++++++++++++++++++++++++++++++++++++--------- > 1 file changed, 44 insertions(+), 9 deletions(-) > > --- mm6/mm/memory.c 2009-09-07 13:16:53.000000000 +0100 > +++ mm7/mm/memory.c 2009-09-07 13:17:01.000000000 +0100 > @@ -107,6 +107,17 @@ static int __init disable_randmaps(char > } > __setup("norandmaps", disable_randmaps); > > +static unsigned long zero_pfn __read_mostly; > + > +/* > + * CONFIG_MMU architectures set up ZERO_PAGE in their paging_init() > + */ > +static int __init init_zero_pfn(void) > +{ > + zero_pfn = page_to_pfn(ZERO_PAGE(0)); > + return 0; > +} > +core_initcall(init_zero_pfn); > > /* > * If a p?d_bad entry is found while walking page tables, report > @@ -499,7 +510,9 @@ struct page *vm_normal_page(struct vm_ar > if (HAVE_PTE_SPECIAL) { > if (likely(!pte_special(pte))) > goto check_pfn; > - if (!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) > + return NULL; > + if (pfn != zero_pfn) > print_bad_pte(vma, addr, pte, NULL); > return NULL; > } > @@ -1144,9 +1157,14 @@ struct page *follow_page(struct vm_area_ > goto no_page; > if ((flags & FOLL_WRITE) && !pte_write(pte)) > goto unlock; > + > page = vm_normal_page(vma, address, pte); > - if (unlikely(!page)) > - goto bad_page; > + if (unlikely(!page)) { > + if ((flags & FOLL_DUMP) || > + pte_pfn(pte) != zero_pfn) > + goto bad_page; > + page = pte_page(pte); > + } > > if (flags & FOLL_GET) > get_page(page); > @@ -2085,10 +2103,19 @@ gotten: > > if (unlikely(anon_vma_prepare(vma))) > goto oom; > - VM_BUG_ON(old_page == ZERO_PAGE(0)); > - new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address); > - if (!new_page) > - goto oom; > + > + if (pte_pfn(orig_pte) == zero_pfn) { > + new_page = alloc_zeroed_user_highpage_movable(vma, address); > + if (!new_page) > + goto oom; > + } else { > + new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address); > + if (!new_page) > + goto oom; > + cow_user_page(new_page, old_page, address, vma); > + } > + __SetPageUptodate(new_page); > + > /* > * Don't let another task, with possibly unlocked vma, > * keep the mlocked page. > @@ -2098,8 +2125,6 @@ gotten: > clear_page_mlock(old_page); > unlock_page(old_page); > } > - cow_user_page(new_page, old_page, address, vma); > - __SetPageUptodate(new_page); > > if (mem_cgroup_newpage_charge(new_page, mm, GFP_KERNEL)) > goto oom_free_new; > @@ -2594,6 +2619,15 @@ static int do_anonymous_page(struct mm_s > spinlock_t *ptl; > pte_t entry; > > + if (HAVE_PTE_SPECIAL && !(flags & FAULT_FLAG_WRITE)) { > + entry = pte_mkspecial(pfn_pte(zero_pfn, vma->vm_page_prot)); > + ptl = pte_lockptr(mm, pmd); > + spin_lock(ptl); > + if (!pte_none(*page_table)) > + goto unlock; > + goto setpte; > + } > + > /* Allocate our own private page. */ > pte_unmap(page_table); > > @@ -2617,6 +2651,7 @@ static int do_anonymous_page(struct mm_s > > inc_mm_counter(mm, anon_rss); > page_add_new_anon_rmap(page, vma, address); > +setpte: > set_pte_at(mm, address, page_table, entry); > > /* No need to invalidate - it was non-present before */ > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/