Date: Thu, 11 Oct 2012 20:54:50 +0100
From: Mel Gorman
To: Andrea Arcangeli
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Linus Torvalds,
    Andrew Morton, Peter Zijlstra, Ingo Molnar, Hugh Dickins, Rik van Riel,
    Johannes Weiner, Hillf Danton, Andrew Jones, Dan Smith, Thomas Gleixner,
    Paul Turner, Christoph Lameter, Suresh Siddha, Mike Galbraith,
    "Paul E. McKenney"
Subject: Re: [PATCH 05/33] autonuma: pte_numa() and pmd_numa()
Message-ID: <20121011195450.GM3317@csn.ul.ie>
References: <1349308275-2174-1-git-send-email-aarcange@redhat.com>
    <1349308275-2174-6-git-send-email-aarcange@redhat.com>
    <20121011111545.GR3317@csn.ul.ie>
    <20121011165847.GO1818@redhat.com>
In-Reply-To: <20121011165847.GO1818@redhat.com>

On Thu, Oct 11, 2012 at 06:58:47PM +0200, Andrea Arcangeli wrote:
> On Thu, Oct 11, 2012 at 12:15:45PM +0100, Mel Gorman wrote:
> > huh?
> >
> > #define _PAGE_NUMA _PAGE_PROTNONE
> >
> > so this is effective _PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PROTNONE
> >
> > I suspect you are doing this because there is no requirement for
> > _PAGE_NUMA == _PAGE_PROTNONE for other architectures and it was best to
> > describe your intent. Is that really the case or did I miss something
> > stupid?
>
> Exactly.
>
> It reminds that we need to return true in pte_present when the NUMA
> hinting page fault is on.
>
> Hardwiring _PAGE_NUMA to _PAGE_PROTNONE conceptually is not necessary
> and it's actually an artificial restrictions. Other archs without a
> bitflag for _PAGE_PROTNONE, may want to use something else and they'll
> have to deal with pte_present too, somehow. So this is a reminder for
> them as well.
>

That's all very reasonable.

> > >  static inline int pte_hidden(pte_t pte)
> > > @@ -420,7 +421,63 @@ static inline int pmd_present(pmd_t pmd)
> > >     * the _PAGE_PSE flag will remain set at all times while the
> > >     * _PAGE_PRESENT bit is clear).
> > >     */
> > > -  return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE);
> > > +  return pmd_flags(pmd) & (_PAGE_PRESENT | _PAGE_PROTNONE | _PAGE_PSE |
> > > +                           _PAGE_NUMA);
> > > +}
> > > +
> > > +#ifdef CONFIG_AUTONUMA
> > > +/*
> > > + * _PAGE_NUMA works identical to _PAGE_PROTNONE (it's actually the
> > > + * same bit too). It's set only when _PAGE_PRESET is not set and it's
> >
> > same bit on x86, not necessarily anywhere else.
>
> Yep. In fact before using _PAGE_PRESENT the two bits were different
> even on x86. But I unified them. If I vary them then they will become
> _PAGE_PTE_NUMA/_PAGE_PMD_NUMA and the above will fail to build without
> risk of errors.
>

Ok.

> >
> > _PAGE_PRESENT?
>
> good eye ;) corrected.
>
> > > +/*
> > > + * pte/pmd_mknuma sets the _PAGE_ACCESSED bitflag automatically
> > > + * because they're called by the NUMA hinting minor page fault.
> >
> > automatically or atomically?
> >
> > I assume you meant atomically but what stops two threads faulting at the
> > same time and doing to the same update? mmap_sem will be insufficient in
> > that case so what is guaranteeing the atomicity. PTL?
>
> I meant automatically. I explained myself wrong and automatically may
> be the wrong word. It also is atomic of course but it wasn't about the
> atomic part.
>
> So the thing is: the numa hinting page fault hooking point is this:
>
>   if (pte_numa(entry))
>           return pte_numa_fixup(mm, vma, address, entry, pte, pmd);
>
> It won't get this far:
>
>   entry = pte_mkyoung(entry);
>   if (ptep_set_access_flags(vma, address, pte, entry, flags & FAULT_FLAG_WRITE)) {
>
> So if I don't set _PAGE_ACCESSED in pte/pmd_mknuma, the TLB miss
> handler will have to set _PAGE_ACCESSED itself with an additional
> write on the pte/pmd later when userland touches the page. And that
> will slow us down for no good.
>

All clear now. Letting it fall through to reach that point would be
convoluted and messy. This is a better option.

> Because mknuma is only called in the numa hinting page fault context,
> it's optimal to set _PAGE_ACCESSED too, not only _PAGE_PRESENT (and
> clearing _PAGE_NUMA of course).
>
> The basic idea, is that the numa hinting page fault can only trigger
> if userland touches the page, and after such an event, _PAGE_ACCESSED
> would be set by the hardware no matter if there is a NUMA hinting page
> fault or not (so we can optimize away the hardware action when the NUMA
> hinting page fault triggers).
>
> I tried to reword it:
>
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index cf1d3f0..3dc6a9b 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -449,12 +449,12 @@ static inline int pmd_numa(pmd_t pmd)
>  #endif
>
>  /*
> - * pte/pmd_mknuma sets the _PAGE_ACCESSED bitflag automatically
> - * because they're called by the NUMA hinting minor page fault. If we
> - * wouldn't set the _PAGE_ACCESSED bitflag here, the TLB miss handler
> - * would be forced to set it later while filling the TLB after we
> - * return to userland. That would trigger a second write to memory
> - * that we optimize away by setting _PAGE_ACCESSED here.
> + * pte/pmd_mknuma sets the _PAGE_ACCESSED bitflag too because they're
> + * only called by the NUMA hinting minor page fault. If we wouldn't
> + * set the _PAGE_ACCESSED bitflag here, the TLB miss handler would be
> + * forced to set it later while filling the TLB after we return to
> + * userland. That would trigger a second write to memory that we
> + * optimize away by setting _PAGE_ACCESSED here.
>   */
>  static inline pte_t pte_mknonnuma(pte_t pte)
>  {
>

Much better.

-- 
Mel Gorman
SUSE Labs
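
For anyone following the thread, the flag handling discussed above can be
modelled in user space. This is only a sketch under stated assumptions, not
the code from the patch: the bit positions, the pte_model_t type and every
model_* name are made up for illustration, and the "NUMA set, PRESENT clear"
test is inferred from the quoted comment rather than copied from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit positions only; the kernel's definitions may differ. */
#define MODEL_PAGE_PRESENT  (1u << 0)
#define MODEL_PAGE_ACCESSED (1u << 5)
#define MODEL_PAGE_PROTNONE (1u << 8)
/* Same bit as PROTNONE here, mirroring the x86 choice in the thread. */
#define MODEL_PAGE_NUMA     MODEL_PAGE_PROTNONE

typedef uint32_t pte_model_t;   /* hypothetical stand-in for pte_t */

/* An entry counts as present if any of the three flags is set. */
static int model_pte_present(pte_model_t pte)
{
    return (pte & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE |
                   MODEL_PAGE_NUMA)) != 0;
}

/* NUMA hinting entry: NUMA bit set while PRESENT is clear (assumed). */
static int model_pte_numa(pte_model_t pte)
{
    return (pte & (MODEL_PAGE_NUMA | MODEL_PAGE_PRESENT)) == MODEL_PAGE_NUMA;
}

/* Arm the hinting fault: set NUMA and clear PRESENT. */
static pte_model_t model_mark_numa(pte_model_t pte)
{
    pte |= MODEL_PAGE_NUMA;
    pte &= ~MODEL_PAGE_PRESENT;
    return pte;
}

/*
 * Hinting fault fixup: clear NUMA, set PRESENT, and set ACCESSED in the
 * same step so that no second write to the entry is needed later.
 */
static pte_model_t model_clear_numa_hint(pte_model_t pte)
{
    pte &= ~MODEL_PAGE_NUMA;
    pte |= MODEL_PAGE_PRESENT | MODEL_PAGE_ACCESSED;
    return pte;
}

/* Small self-check exercising the transitions described in the thread. */
static void model_selfcheck(void)
{
    pte_model_t pte = model_mark_numa(MODEL_PAGE_PRESENT);

    assert(model_pte_numa(pte));
    assert(model_pte_present(pte));      /* still "present" for the VM */

    pte = model_clear_numa_hint(pte);
    assert(!model_pte_numa(pte));
    assert(pte & MODEL_PAGE_ACCESSED);   /* set without a second write */
}
```

Calling model_selfcheck() from a test harness walks one entry through both
transitions. The point of the model is the middle assertion: an entry armed
for a hinting fault must still be reported as present, which is why
_PAGE_NUMA is listed explicitly in the present check even though it aliases
_PAGE_PROTNONE on x86.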