Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752444AbdHPVnw (ORCPT ); Wed, 16 Aug 2017 17:43:52 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:43473 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752339AbdHPVnu (ORCPT ); Wed, 16 Aug 2017 17:43:50 -0400 From: Boris Ostrovsky Subject: Re: [tip:x86/platform] x86/hyper-v: Use hypercall for remote TLB flush To: Vitaly Kuznetsov , Peter Zijlstra References: <20170802160921.21791-8-vkuznets@redhat.com> <20170810185646.GI6524@worktop.programming.kicks-ass.net> <20170810192742.GJ6524@worktop.programming.kicks-ass.net> <20170811090336.lfznz6qzrbhiqwvi@hirez.programming.kicks-ass.net> <20170811162605.tr4tig4av3q4fll6@hirez.programming.kicks-ass.net> <87fucup9ou.fsf@vitty.brq.redhat.com> <87shgrmpko.fsf@vitty.brq.redhat.com> Cc: Linus Torvalds , Jork Loeser , KY Srinivasan , Simon Xiao , Haiyang Zhang , Stephen Hemminger , "luto@kernel.org" , "hpa@zytor.com" , "linux-kernel@vger.kernel.org" , "rostedt@goodmis.org" , "andy.shevchenko@gmail.com" , "tglx@linutronix.de" , "mingo@kernel.org" , "linux-tip-commits@vger.kernel.org" , "Kirill A. Shutemov" , Juergen Gross Message-ID: <817e2efa-6154-a1c6-8ec7-de04324d6df8@oracle.com> Date: Wed, 16 Aug 2017 17:41:12 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <87shgrmpko.fsf@vitty.brq.redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Source-IP: aserv0021.oracle.com [141.146.126.233] Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7482 Lines: 201 On 08/16/2017 12:42 PM, Vitaly Kuznetsov wrote: > Vitaly Kuznetsov writes: > >> Peter Zijlstra writes: >> >>> On Fri, Aug 11, 2017 at 09:16:29AM -0700, Linus Torvalds wrote: >>>> On Fri, Aug 11, 2017 at 2:03 AM, Peter Zijlstra wrote: >>>>> I'm sure we talked about using HAVE_RCU_TABLE_FREE for x86 (and yes that >>>>> would make it work again), but this was some years ago and I cannot >>>>> readily find those emails. >>>> I think the only time we really talked about HAVE_RCU_TABLE_FREE for >>>> x86 (at least that I was cc'd on) was not because of RCU freeing, but >>>> because we just wanted to use the generic page table lookup code on >>>> x86 *despite* not using RCU freeing. >>>> >>>> And we just ended up renaming HAVE_GENERIC_RCU_GUP as HAVE_GENERIC_GUP. >>>> >>>> There was only passing mention of maybe making x86 use RCU, but the >>>> discussion was really about why the IF flag meant that x86 didn't need >>>> to, iirc. >>>> >>>> I don't recall us ever discussing *really* making x86 use RCU. >>> Google finds me this: >>> >>> https://lwn.net/Articles/500188/ >>> >>> Which includes: >>> >>> http://www.mail-archive.com/kvm@vger.kernel.org/msg72918.html >>> >>> which does as was suggested here, selects HAVE_RCU_TABLE_FREE for >>> PARAVIRT_TLB_FLUSH. >>> >>> But yes, this is very much virt specific nonsense, native would never >>> need this. >> In case we decide to go HAVE_RCU_TABLE_FREE for all PARAVIRT-enabled >> kernels (as it seems to be the easiest/fastest way to fix Xen PV) - what >> do you think about the required testing? Any suggestion for a >> specifically crafted micro benchmark in addition to standard >> ebizzy/kernbench/...? > In the meantime I tested HAVE_RCU_TABLE_FREE with kernbench (enablement > patch I used is attached; I know that it breaks other architectures) on > bare metal with PARAVIRT enabled in config. The results are: > > 6-CPU host: > > Average Half load -j 3 Run (std deviation): > CURRENT HAVE_RCU_TABLE_FREE > ======= =================== > Elapsed Time 400.498 (0.179679) Elapsed Time 399.909 (0.162853) > User Time 1098.72 (0.278536) User Time 1097.59 (0.283894) > System Time 100.301 (0.201629) System Time 99.736 (0.196254) > Percent CPU 299 (0) Percent CPU 299 (0) > Context Switches 5774.1 (69.2121) Context Switches 5744.4 (79.4162) > Sleeps 87621.2 (78.1093) Sleeps 87586.1 (99.7079) > > Average Optimal load -j 24 Run (std deviation): > CURRENT HAVE_RCU_TABLE_FREE > ======= =================== > Elapsed Time 219.03 (0.652534) Elapsed Time 218.959 (0.598674) > User Time 1119.51 (21.3284) User Time 1118.81 (21.7793) > System Time 100.499 (0.389308) System Time 99.8335 (0.251423) > Percent CPU 432.5 (136.974) Percent CPU 432.45 (136.922) > Context Switches 81827.4 (78029.5) Context Switches 81818.5 (78051) > Sleeps 97124.8 (9822.4) Sleeps 97207.9 (9955.04) > > 16-CPU host: > > Average Half load -j 8 Run (std deviation): > CURRENT HAVE_RCU_TABLE_FREE > ======= =================== > Elapsed Time 213.538 (3.7891) Elapsed Time 212.5 (3.10939) > User Time 1306.4 (1.83399) User Time 1307.65 (1.01364) > System Time 194.59 (0.864378) System Time 195.478 (0.794588) > Percent CPU 702.6 (13.5388) Percent CPU 707 (11.1131) > Context Switches 21189.2 (1199.4) Context Switches 21288.2 (552.388) > Sleeps 89390.2 (482.325) Sleeps 89677 (277.06) > > Average Optimal load -j 64 Run (std deviation): > CURRENT HAVE_RCU_TABLE_FREE > ======= =================== > Elapsed Time 137.866 (0.787928) Elapsed Time 138.438 (0.218792) > User Time 1488.92 (192.399) User Time 1489.92 (192.135) > System Time 234.981 (42.5806) System Time 236.09 (42.8138) > Percent CPU 1057.1 (373.826) Percent CPU 1057.1 (369.114) > Context Switches 187514 (175324) Context Switches 187358 (175060) > Sleeps 112633 (24535.5) Sleeps 111743 (23297.6) > > As you can see, there's no notable difference. I'll think of a > microbenchmark though. FWIW, I was about to send a very similar patch (but with only Xen-PV enabling RCU-based free by default) and saw similar results with kernbench, both Xen PV and baremetal. >> Additionally, I see another option for us: enable 'rcu table free' on >> boot (e.g. by taking tlb_remove_table to pv_ops and doing boot-time >> patching for it) so bare metal and other hypervisors are not affected >> by the change. > It seems there's no need for that and we can keep things simple... > > -- Vitaly > > 0001-x86-enable-RCU-based-table-free-when-PARAVIRT.patch > > > >From daf5117706920aebe793d1239fccac2edd4d680c Mon Sep 17 00:00:00 2001 > From: Vitaly Kuznetsov > Date: Mon, 14 Aug 2017 16:05:05 +0200 > Subject: [PATCH] x86: enable RCU based table free when PARAVIRT > > Signed-off-by: Vitaly Kuznetsov > --- > arch/x86/Kconfig | 1 + > arch/x86/mm/pgtable.c | 16 ++++++++++++++++ > mm/memory.c | 5 +++++ > 3 files changed, 22 insertions(+) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 781521b7cf9e..9c1666ea04c9 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -167,6 +167,7 @@ config X86 > select HAVE_PERF_REGS > select HAVE_PERF_USER_STACK_DUMP > select HAVE_REGS_AND_STACK_ACCESS_API > + select HAVE_RCU_TABLE_FREE if SMP && PARAVIRT > select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER && STACK_VALIDATION > select HAVE_STACK_VALIDATION if X86_64 > select HAVE_SYSCALL_TRACEPOINTS > diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c > index 508a708eb9a6..b6aaab9fb3b8 100644 > --- a/arch/x86/mm/pgtable.c > +++ b/arch/x86/mm/pgtable.c > @@ -56,7 +56,11 @@ void ___pte_free_tlb(struct mmu_gather *tlb, struct page *pte) > { > pgtable_page_dtor(pte); > paravirt_release_pte(page_to_pfn(pte)); > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE > + tlb_remove_table(tlb, pte); > +#else > tlb_remove_page(tlb, pte); > +#endif > } > > #if CONFIG_PGTABLE_LEVELS > 2 > @@ -72,21 +76,33 @@ void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd) > tlb->need_flush_all = 1; > #endif > pgtable_pmd_page_dtor(page); > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE > + tlb_remove_table(tlb, page); > +#else > tlb_remove_page(tlb, page); > +#endif > } > > #if CONFIG_PGTABLE_LEVELS > 3 > void ___pud_free_tlb(struct mmu_gather *tlb, pud_t *pud) > { > paravirt_release_pud(__pa(pud) >> PAGE_SHIFT); > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE > + tlb_remove_table(tlb, virt_to_page(pud)); > +#else > tlb_remove_page(tlb, virt_to_page(pud)); > +#endif > } > > #if CONFIG_PGTABLE_LEVELS > 4 > void ___p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d) > { > paravirt_release_p4d(__pa(p4d) >> PAGE_SHIFT); > +#ifdef CONFIG_HAVE_RCU_TABLE_FREE > + tlb_remove_table(tlb, virt_to_page(p4d)); > +#else > tlb_remove_page(tlb, virt_to_page(p4d)); > +#endif This can probably be factored out. > } > #endif /* CONFIG_PGTABLE_LEVELS > 4 */ > #endif /* CONFIG_PGTABLE_LEVELS > 3 */ > diff --git a/mm/memory.c b/mm/memory.c > index e158f7ac6730..18d6671b6ae2 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -329,6 +329,11 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_ > * See the comment near struct mmu_table_batch. > */ > > +static void __tlb_remove_table(void *table) > +{ > + free_page_and_swap_cache(table); > +} > + This needs to be a per-arch routine (e.g. see arch/arm64/include/asm/tlb.h). -boris