Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933324Ab2HVW5q (ORCPT ); Wed, 22 Aug 2012 18:57:46 -0400 Received: from gate.crashing.org ([63.228.1.57]:50204 "EHLO gate.crashing.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755268Ab2HVW5m (ORCPT ); Wed, 22 Aug 2012 18:57:42 -0400 Message-ID: <1345676194.13399.1.camel@pasglop> Subject: Re: [PATCH 33/36] autonuma: powerpc port From: Benjamin Herrenschmidt To: Vaidyanathan Srinivasan Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Hillf Danton , Dan Smith , Linus Torvalds , Andrew Morton , Thomas Gleixner , Ingo Molnar , Paul Turner , Suresh Siddha , Mike Galbraith , "Paul E. McKenney" , Lai Jiangshan , Bharata B Rao , Lee Schermerhorn , Rik van Riel , Johannes Weiner , Srivatsa Vaddagiri , Christoph Lameter , Alex Shi , Mauricio Faria de Oliveira , Konrad Rzeszutek Wilk , Don Morris , Andrea Arcangeli Date: Thu, 23 Aug 2012 08:56:34 +1000 In-Reply-To: <1345672907.2617.44.camel@pasglop> References: <1345647560-30387-1-git-send-email-aarcange@redhat.com> <1345647560-30387-34-git-send-email-aarcange@redhat.com> <1345672907.2617.44.camel@pasglop> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.2.3-0ubuntu6 Content-Transfer-Encoding: 7bit Mime-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6489 Lines: 182 On Thu, 2012-08-23 at 08:01 +1000, Benjamin Herrenschmidt wrote: > On Wed, 2012-08-22 at 16:59 +0200, Andrea Arcangeli wrote: > > From: Vaidyanathan Srinivasan > > > > * PMD flaging is not required in powerpc since large pages > > are tracked in ptes. > > * Yet to be tested with large pages > > * This is an initial patch that partially works > > * knuma_scand and numa hinting page faults works > > * Page migration is yet to be observed/verified > > > > Signed-off-by: Vaidyanathan Srinivasan > > Signed-off-by: Andrea Arcangeli > > I don't like this. What I mean here is that it's fine as a proof of concept ;-) I don't like it being in a series aimed at upstream... We can try to flush out the issues, but as it is, the patch isn't upstreamable imho. As for finding PTE bits, I have a few ideas we need to discuss, but nothing simple I'm afraid. Cheers, Ben. > --- > > arch/powerpc/include/asm/pgtable.h | 48 ++++++++++++++++++++++++++++- > > arch/powerpc/include/asm/pte-hash64-64k.h | 4 ++- > > arch/powerpc/mm/numa.c | 3 +- > > mm/autonuma.c | 2 +- > > 4 files changed, 53 insertions(+), 4 deletions(-) > > > > diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h > > index 2e0e411..5f03079 100644 > > --- a/arch/powerpc/include/asm/pgtable.h > > +++ b/arch/powerpc/include/asm/pgtable.h > > @@ -33,10 +33,56 @@ static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } > > static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } > > static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } > > static inline int pte_special(pte_t pte) { return pte_val(pte) & _PAGE_SPECIAL; } > > -static inline int pte_present(pte_t pte) { return pte_val(pte) & _PAGE_PRESENT; } > > +static inline int pte_present(pte_t pte) { return pte_val(pte) & > > + (_PAGE_PRESENT|_PAGE_NUMA_PTE); } > > Is this absolutely necessary ? (testing two bits). It somewhat changes > the semantics of "pte_present" which I don't really like. > > > static inline int pte_none(pte_t pte) { return (pte_val(pte) & ~_PTE_NONE_MASK) == 0; } > > static inline pgprot_t pte_pgprot(pte_t pte) { return __pgprot(pte_val(pte) & PAGE_PROT_BITS); } > > > > +#ifdef CONFIG_AUTONUMA > > +static inline int pte_numa(pte_t pte) > > +{ > > + return (pte_val(pte) & > > + (_PAGE_NUMA_PTE|_PAGE_PRESENT)) == _PAGE_NUMA_PTE; > > +} > > + > > +#endif > > Why the ifdef and not anywhere else ? > > > +static inline pte_t pte_mknonnuma(pte_t pte) > > +{ > > + pte_val(pte) &= ~_PAGE_NUMA_PTE; > > + pte_val(pte) |= (_PAGE_PRESENT|_PAGE_ACCESSED); > > + > > + return pte; > > +} > > + > > +static inline pte_t pte_mknuma(pte_t pte) > > +{ > > + pte_val(pte) |= _PAGE_NUMA_PTE; > > + pte_val(pte) &= ~_PAGE_PRESENT; > > + return pte; > > +} > > + > > +static inline int pmd_numa(pmd_t pmd) > > +{ > > + /* PMD tracking not implemented */ > > + return 0; > > +} > > + > > +static inline pmd_t pmd_mknonnuma(pmd_t pmd) > > +{ > > + BUG(); > > + return pmd; > > +} > > + > > +static inline pmd_t pmd_mknuma(pmd_t pmd) > > +{ > > + BUG(); > > + return pmd; > > +} > > + > > +/* No pmd flags on powerpc */ > > +#define set_pmd_at(mm, addr, pmdp, pmd) do { } while (0) > > + > > /* Conversion functions: convert a page and protection to a page entry, > > * and a page entry and page directory to the page they refer to. > > * > > diff --git a/arch/powerpc/include/asm/pte-hash64-64k.h b/arch/powerpc/include/asm/pte-hash64-64k.h > > index 59247e8..f7e1468 100644 > > --- a/arch/powerpc/include/asm/pte-hash64-64k.h > > +++ b/arch/powerpc/include/asm/pte-hash64-64k.h > > @@ -7,6 +7,8 @@ > > #define _PAGE_COMBO 0x10000000 /* this is a combo 4k page */ > > #define _PAGE_4K_PFN 0x20000000 /* PFN is for a single 4k page */ > > > > +#define _PAGE_NUMA_PTE 0x40000000 /* Adjust PTE_RPN_SHIFT below */ > > + > > /* For 64K page, we don't have a separate _PAGE_HASHPTE bit. Instead, > > * we set that to be the whole sub-bits mask. The C code will only > > * test this, so a multi-bit mask will work. For combo pages, this > > @@ -36,7 +38,7 @@ > > * That gives us a max RPN of 34 bits, which means a max of 50 bits > > * of addressable physical space, or 46 bits for the special 4k PFNs. > > */ > > -#define PTE_RPN_SHIFT (30) > > +#define PTE_RPN_SHIFT (31) > > I'm concerned. We are already running short on RPN bits. We can't spare > more. If you absolutely need a PTE bit, we'll need to explore ways to > free some, but just reducing the RPN isn't an option. > > Think of what happens if PTE_4K_PFN is set... > > Also you conveniently avoided all the other pte-*.h variants meaning you > broke the build for everything except ppc64 with 64k pages. > > > #ifndef __ASSEMBLY__ > > > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > > index 39b1597..80af41e 100644 > > --- a/arch/powerpc/mm/numa.c > > +++ b/arch/powerpc/mm/numa.c > > @@ -22,6 +22,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -1045,7 +1046,7 @@ void __init do_init_bootmem(void) > > * all reserved areas marked. > > */ > > NODE_DATA(nid) = careful_zallocation(nid, > > - sizeof(struct pglist_data), > > + autonuma_pglist_data_size(), > > SMP_CACHE_BYTES, end_pfn); > > > > dbg("node %d\n", nid); > > diff --git a/mm/autonuma.c b/mm/autonuma.c > > index ada6c57..a4da3f3 100644 > > --- a/mm/autonuma.c > > +++ b/mm/autonuma.c > > @@ -25,7 +25,7 @@ unsigned long autonuma_flags __read_mostly = > > #ifdef CONFIG_AUTONUMA_DEFAULT_ENABLED > > |(1< > #endif > > - |(1< > + |(0< > That changes the default accross all architectures, is that ok vs. > Andrea ? > > Cheers, > Ben. > > > static DEFINE_MUTEX(knumad_mm_mutex); > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/