Message-ID: <1340867364.20977.65.camel@pasglop>
Subject: Re: [PATCH 02/20] mm: Add optional TLB flush to generic RCU page-table freeing
From: Benjamin Herrenschmidt
To: Peter Zijlstra
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, akpm@linux-foundation.org,
    Rik van Riel, Hugh Dickins, Mel Gorman, Nick Piggin, Alex Shi,
    "Nikunj A. Dadhania", Konrad Rzeszutek Wilk, David Miller, Russell King,
    Catalin Marinas, Chris Metcalf, Martin Schwidefsky, Tony Luck, Paul Mundt,
    Jeff Dike, Richard Weinberger, Ralf Baechle, Kyle McMartin, James Bottomley,
    Chris Zankel
Date: Thu, 28 Jun 2012 17:09:24 +1000
In-Reply-To: <1340838106.10063.85.camel@twins>
References: <20120627211540.459910855@chello.nl> <20120627212830.693232452@chello.nl> <1340838106.10063.85.camel@twins>

On Thu, 2012-06-28 at 01:01 +0200, Peter Zijlstra wrote:
> On Wed, 2012-06-27 at 15:23 -0700, Linus Torvalds wrote:
>
> > Plus it really isn't about hardware page table walkers at all. It's
> > more about the possibility of speculative TLB fills, it has nothing to
> > do with *how* they are done. Sure, it's likely that a software
> > pagetable walker wouldn't be something that gets called speculatively,
> > but it's not out of the question.
> >
> Hmm, I would call gup_fast() as speculative as we can get in software.
> It does a lock-less walk of the page-tables. That's what the RCU free'd
> page-table stuff is for to begin with.

Strictly speaking it's not :-) To *begin with* (as in the origin of that
code) it comes from powerpc hash table code which walks the linux page
tables locklessly :-) It then came in handy with gup_fast :-)

> > IOW, if Sparc/PPC really want to guarantee that they never fill TLB
> > entries speculatively, and that if we are in a kernel thread they will
> > *never* fill the TLB with anything else, then make them enable
> > CONFIG_STRICT_TLB_FILL or something in their architecture Kconfig
> > files.
>
> Since we've dealt with the speculative software side by using RCU-ish
> stuff, the only thing that's left is hardware, now neither sparc64 nor
> ppc actually know about the linux page-tables from what I understood,
> they only look at their hash-table thing.

Some embedded ppc's know about the lowest level (SW loaded PMD) but
that's not an issue here. We flush these special TLB entries
specifically and synchronously in __pte_free_tlb().

> So even if the hardware did do speculative tlb fills, it would do them
> from the hash-table, but that's already cleared out.

Right,

Cheers,
Ben.

>
> How about something like this
>
> ---
> Subject: mm: Add missing TLB invalidate to RCU page-table freeing
> From: Peter Zijlstra
> Date: Thu Jun 28 00:49:33 CEST 2012
>
> For normal systems we need a TLB invalidate before freeing the
> page-tables, the generic RCU based page-table freeing code lacked
> this.
>
> This is because this code originally came from ppc where the hardware
> never walks the linux page-tables and thus this invalidate is not
> required.
>
> Others, notably s390 which ran into this problem in cd94154cc6a
> ("[S390] fix tlb flushing for page table pages"), do very much need
> this TLB invalidation.
>
> Therefore add it, with a Kconfig option to disable it so as to not
> unduly slow down PPC and SPARC64, neither of which needs it.
>
> Signed-off-by: Peter Zijlstra
> ---
>  arch/Kconfig         |    3 +++
>  arch/powerpc/Kconfig |    1 +
>  arch/sparc/Kconfig   |    1 +
>  mm/memory.c          |   18 ++++++++++++++++++
>  4 files changed, 23 insertions(+)
>
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -231,6 +231,9 @@ config HAVE_ARCH_MUTEX_CPU_RELAX
>  config HAVE_RCU_TABLE_FREE
>  	bool
>  
> +config STRICT_TLB_FILL
> +	bool
> +
>  config ARCH_HAVE_NMI_SAFE_CMPXCHG
>  	bool
>  
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -127,6 +127,7 @@ config PPC
>  	select GENERIC_IRQ_SHOW_LEVEL
>  	select IRQ_FORCED_THREADING
>  	select HAVE_RCU_TABLE_FREE if SMP
> +	select STRICT_TLB_FILL
>  	select HAVE_SYSCALL_TRACEPOINTS
>  	select HAVE_BPF_JIT if PPC64
>  	select HAVE_ARCH_JUMP_LABEL
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -52,6 +52,7 @@ config SPARC64
>  	select HAVE_KRETPROBES
>  	select HAVE_KPROBES
>  	select HAVE_RCU_TABLE_FREE if SMP
> +	select STRICT_TLB_FILL
>  	select HAVE_MEMBLOCK
>  	select HAVE_MEMBLOCK_NODE_MAP
>  	select HAVE_SYSCALL_WRAPPERS
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -329,11 +329,27 @@ static void tlb_remove_table_rcu(struct
>  	free_page((unsigned long)batch);
>  }
>  
> +#ifdef CONFIG_STRICT_TLB_FILL
> +/*
> + * Some architectures (sparc64, ppc) cannot refill TLBs after they've removed
> + * the PTE entries from their hash-table. Their hardware never looks at the
> + * linux page-table structures, so they don't need a hardware TLB invalidate
> + * when tearing down the page-table structure itself.
> + */
> +static inline void tlb_table_flush_mmu(struct mmu_gather *tlb) { }
> +#else
> +static inline void tlb_table_flush_mmu(struct mmu_gather *tlb)
> +{
> +	tlb_flush_mmu(tlb);
> +}
> +#endif
> +
>  void tlb_table_flush(struct mmu_gather *tlb)
>  {
>  	struct mmu_table_batch **batch = &tlb->batch;
>  
>  	if (*batch) {
> +		tlb_table_flush_mmu(tlb);
>  		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
>  		*batch = NULL;
>  	}
> @@ -345,6 +361,7 @@ void tlb_remove_table(struct mmu_gather
>  
>  	tlb->need_flush = 1;
>  
> +#ifdef CONFIG_STRICT_TLB_FILL
>  	/*
>  	 * When there's less then two users of this mm there cannot be a
>  	 * concurrent page-table walk.
> @@ -353,6 +370,7 @@ void tlb_remove_table(struct mmu_gather
>  		__tlb_remove_table(table);
>  		return;
>  	}
> +#endif
>  
>  	if (*batch == NULL) {
>  		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
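
For anyone following the thread, the ordering the quoted patch is after can be modelled in userspace. What follows is only a rough sketch under stated assumptions, not kernel code: struct gather, flush_tlb(), defer_free() and the bare STRICT_TLB_FILL macro are hypothetical stand-ins for struct mmu_gather, tlb_flush_mmu(), call_rcu_sched() and the proposed Kconfig symbol. It shows that the invalidate is issued before the batch is handed off for deferred freeing, and that the invalidate compiles away when the macro is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct batch {
	int nr;			/* number of queued page-table pages */
};

struct gather {
	struct batch *batch;	/* pending tables, or NULL */
	int flushes;		/* TLB invalidates issued so far */
	int deferred;		/* batches handed to deferred free */
	int flushes_at_defer;	/* 'flushes' at the last hand-off */
};

/* Stand-in for tlb_flush_mmu(): only counts the invalidate. */
static void flush_tlb(struct gather *g)
{
	g->flushes++;
}

/* Stand-in for call_rcu_sched(): records the hand-off, frees nothing. */
static void defer_free(struct gather *g, struct batch *b)
{
	assert(b != NULL);
	g->deferred++;
	g->flushes_at_defer = g->flushes;
}

#ifdef STRICT_TLB_FILL
/* Models ppc/sparc64: hardware never walks these tables, so no invalidate. */
static inline void table_flush_mmu(struct gather *g) { (void)g; }
#else
/* Models everyone else: invalidate before the tables can be freed. */
static inline void table_flush_mmu(struct gather *g)
{
	flush_tlb(g);
}
#endif

/* Same shape as tlb_table_flush() in the quoted patch. */
static void table_flush(struct gather *g)
{
	if (g->batch) {
		table_flush_mmu(g);
		defer_free(g, g->batch);
		g->batch = NULL;
	}
}
```

Built without -DSTRICT_TLB_FILL, one call to table_flush() on a gather with a pending batch issues a single invalidate and then a single hand-off; a second call does nothing because the batch pointer has been cleared. Built with -DSTRICT_TLB_FILL, the hand-off still happens but no invalidate is counted.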