From: KY Srinivasan
To: Vitaly Kuznetsov, "devel@linuxdriverproject.org", "x86@kernel.org"
CC: "linux-kernel@vger.kernel.org", "Haiyang Zhang", Stephen Hemminger, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Steven Rostedt, "Jork Loeser"
Subject: RE: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
Date: Sat, 8 Apr 2017 16:47:27 +0000
References: <20170407112701.17157-1-vkuznets@redhat.com> <20170407112701.17157-7-vkuznets@redhat.com>
In-Reply-To: <20170407112701.17157-7-vkuznets@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Friday, April 7, 2017 4:27 AM
> To: devel@linuxdriverproject.org; x86@kernel.org
> Cc: linux-kernel@vger.kernel.org; KY Srinivasan; Haiyang Zhang;
> Stephen Hemminger; Thomas Gleixner; Ingo Molnar; H. Peter Anvin;
> Steven Rostedt; Jork Loeser
> Subject: [PATCH 6/7] x86/hyper-v: use hypercall for remove TLB flush
>
> Hyper-V host can suggest us to use hypercall for doing remote TLB flush,
> this is supposed to work faster than IPIs.
>
> Implementation details: to do HvFlushVirtualAddress{Space,List} hypercalls
> we need to put the input somewhere in memory and we don't really want to
> have memory allocation on each call so we pre-allocate per cpu memory areas
> on boot. These areas are of fixed size, limit them with an arbitrary number
> of 16 (16 gvas are able to specify 16 * 4096 pages).
>
> pv_ops patching is happening very early so we need to separate
> hyperv_setup_mmu_ops() and hyper_alloc_mmu().
>
> It is possible and easy to implement local TLB flushing too and there is
> even a hint for that. However, I don't see room for optimization on the
> host side as both hypercall and native tlb flush will result in vmexit. The
> hint is also not set on modern Hyper-V versions.
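
For reference, the "16 * 4096 pages" figure follows from each list entry
carrying a page-aligned base address plus, in its low 12 bits, a count of
up to 4095 additional pages. Below is a minimal userspace sketch of that
encoding, assuming 4 KiB pages, a page-aligned start address and no
address overflow. sketch_encode_gvas() and the SKETCH_* macros are
hypothetical stand-ins for illustration only, not names from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; userspace stand-ins for the kernel's PAGE_* macros. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))
/* One entry covers at most 4096 pages: the base page plus 4095 more. */
#define SKETCH_ENTRY_SPAN (SKETCH_PAGE_SIZE * SKETCH_PAGE_SIZE)

/*
 * Hypothetical helper: encode the range [*cur, end) into at most 'max'
 * list entries. Returns the number of entries written and advances *cur
 * so the caller can issue another batch if *cur is still below 'end'.
 */
static size_t sketch_encode_gvas(uint64_t *cur, uint64_t end,
                                 uint64_t *gva_list, size_t max)
{
    size_t n = 0;

    assert(cur && gva_list);

    while (*cur < end && n < max) {
        /* High bits: page-aligned base address of this entry. */
        uint64_t entry = *cur & SKETCH_PAGE_MASK;

        if (end >= *cur + SKETCH_ENTRY_SPAN)
            entry |= ~SKETCH_PAGE_MASK;   /* 0xfff: 4095 additional pages */
        else
            entry |= (end - *cur - 1) >> SKETCH_PAGE_SHIFT;

        gva_list[n++] = entry;
        *cur += SKETCH_ENTRY_SPAN;
    }
    return n;
}
```

With 16 entries per call, one hypercall can cover 16 * 4096 = 65536 pages,
which is 256MB with 4 KiB pages and matches the comment in the patch.
Unlike the do/while loop in the patch, this sketch emits no entry for an
empty range.
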
>
> Signed-off-by: Vitaly Kuznetsov
> ---
>  arch/x86/hyperv/Makefile           |   2 +-
>  arch/x86/hyperv/hv_init.c          |   2 +
>  arch/x86/hyperv/mmu.c              | 128 +++++++++++++++++++++++++++++++++++++
>  arch/x86/include/asm/mshyperv.h    |   2 +
>  arch/x86/include/uapi/asm/hyperv.h |   7 ++
>  arch/x86/kernel/cpu/mshyperv.c     |   1 +
>  6 files changed, 141 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/mmu.c
>
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 171ae09..367a820 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1 +1 @@
> -obj-y := hv_init.o
> +obj-y := hv_init.o mmu.o
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1c14088..2cf8a98 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -163,6 +163,8 @@ void hyperv_init(void)
>  	hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>
> +	hyper_alloc_mmu();
> +
>  	/*
>  	 * Register Hyper-V specific clocksource.
>  	 */
> diff --git a/arch/x86/hyperv/mmu.c b/arch/x86/hyperv/mmu.c
> new file mode 100644
> index 0000000..fb487cb
> --- /dev/null
> +++ b/arch/x86/hyperv/mmu.c
> @@ -0,0 +1,128 @@
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +/*
> + * Arbitrary number; we need to pre-allocate per-cpu struct for doing TLB
> + * flush hypercalls and we need to pick a size. '16' means we'll be able
> + * to flush 16 * 4096 pages (256MB) with one hypercall.
> + */
> +#define HV_MMU_MAX_GVAS 16

Did you experiment with different sizes here?

> +
> +/* HvFlushVirtualAddressSpace*, HvFlushVirtualAddressList hypercalls */
> +struct hv_flush_pcpu {
> +	struct {
> +		__u64 address_space;
> +		__u64 flags;
> +		__u64 processor_mask;
> +		__u64 gva_list[HV_MMU_MAX_GVAS];
> +	} flush;
> +
> +	spinlock_t lock;
> +};
> +

We may be supporting more than 64 CPUs in this hypercall.
I am going to inquire with the Windows folks and get back to you.

> +static struct hv_flush_pcpu __percpu *pcpu_flush;
> +
> +static void hyperv_flush_tlb_others(const struct cpumask *cpus,
> +				    struct mm_struct *mm, unsigned long start,
> +				    unsigned long end)
> +{
> +	struct hv_flush_pcpu *flush;
> +	unsigned long cur, flags;
> +	u64 status = -1ULL;
> +	int cpu, vcpu, gva_n;
> +
> +	if (!pcpu_flush || !hv_hypercall_pg)
> +		goto do_native;
> +
> +	if (cpumask_empty(cpus))
> +		return;
> +
> +	flush = this_cpu_ptr(pcpu_flush);
> +	spin_lock_irqsave(&flush->lock, flags);
> +
> +	flush->flush.address_space = virt_to_phys(mm->pgd);
> +	flush->flush.processor_mask = 0;
> +	if (cpumask_equal(cpus, cpu_present_mask)) {
> +		flush->flush.flags = HV_FLUSH_ALL_PROCESSORS;
> +	} else {
> +		flush->flush.flags = 0;
> +		for_each_cpu(cpu, cpus) {
> +			vcpu = vmbus_cpu_number_to_vp_number(cpu);
> +			if (vcpu != -1 && vcpu < 64)
> +				flush->flush.processor_mask |= 1 << vcpu;
> +			else
> +				goto unlock_do_native;
> +		}
> +	}
> +
> +	if (end == TLB_FLUSH_ALL) {
> +		flush->flush.flags = HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY;
> +		status = hv_do_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE,
> +					 &flush->flush, NULL);
> +	} else {
> +		cur = start;
> +more_gvas:
> +		gva_n = 0;
> +
> +		do {
> +			flush->flush.gva_list[gva_n] = cur & PAGE_MASK;
> +			/*
> +			 * Lower 12 bits encode the number of additional
> +			 * pages to flush (in addition to the 'cur' page).
> +			 */
> +			if (end >= cur + PAGE_SIZE * PAGE_SIZE)
> +				flush->flush.gva_list[gva_n] |= ~PAGE_MASK;
> +			else if (end > cur)
> +				flush->flush.gva_list[gva_n] |=
> +					(end - cur - 1) >> PAGE_SHIFT;
> +
> +			cur += PAGE_SIZE * PAGE_SIZE;
> +			++gva_n;
> +
> +		} while (cur < end && gva_n < HV_MMU_MAX_GVAS);
> +
> +		status = hv_do_rep_hypercall(HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST,
> +					     gva_n, &flush->flush, NULL);
> +
> +		if (!(status & 0xffff) && cur < end)
> +			goto more_gvas;
> +	}
> +
> +unlock_do_native:
> +	spin_unlock_irqrestore(&flush->lock, flags);
> +
> +	if (!(status & 0xffff))
> +		return;
> +do_native:
> +	native_flush_tlb_others(cpus, mm, start, end);
> +}
> +
> +void hyperv_setup_mmu_ops(void)
> +{
> +	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
> +		pr_info("Hyper-V: Using hypercall for remote TLB flush\n");
> +		pv_mmu_ops.flush_tlb_others = hyperv_flush_tlb_others;
> +	}
> +}
> +
> +void hyper_alloc_mmu(void)
> +{
> +	int cpu;
> +	struct hv_flush_pcpu *flush;
> +
> +	if (ms_hyperv.hints & HV_X64_REMOTE_TLB_FLUSH_RECOMMENDED) {
> +		pcpu_flush = alloc_percpu(struct hv_flush_pcpu);
> +		if (!pcpu_flush)
> +			return;
> +
> +		for_each_possible_cpu(cpu) {
> +			flush = per_cpu_ptr(pcpu_flush, cpu);
> +			spin_lock_init(&flush->lock);
> +		}
> +	}
> +}
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 1293c84..a5041c3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -301,6 +301,8 @@ static inline int vmbus_cpu_number_to_vp_number(int cpu_number)
>  }
>
>  void hyperv_init(void);
> +void hyperv_setup_mmu_ops(void);
> +void hyper_alloc_mmu(void);
>  void hyperv_report_panic(struct pt_regs *regs);
>  bool hv_is_hypercall_page_setup(void);
>  void hyperv_cleanup(void);
> diff --git a/arch/x86/include/uapi/asm/hyperv.h b/arch/x86/include/uapi/asm/hyperv.h
> index c87e900..3d44036 100644
> --- a/arch/x86/include/uapi/asm/hyperv.h
> +++ b/arch/x86/include/uapi/asm/hyperv.h
> @@ -239,6 +239,8 @@
>  		(~((1ull << HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT) - 1))
>
>  /* Declare the various hypercall operations. */
> +#define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE 0x0002
> +#define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST 0x0003
>  #define HVCALL_NOTIFY_LONG_SPIN_WAIT 0x0008
>  #define HVCALL_POST_MESSAGE 0x005c
>  #define HVCALL_SIGNAL_EVENT 0x005d
> @@ -256,6 +258,11 @@
>  #define HV_PROCESSOR_POWER_STATE_C2 2
>  #define HV_PROCESSOR_POWER_STATE_C3 3
>
> +#define HV_FLUSH_ALL_PROCESSORS 0x00000001
> +#define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES 0x00000002
> +#define HV_FLUSH_NON_GLOBAL_MAPPINGS_ONLY 0x00000004
> +#define HV_FLUSH_USE_EXTENDED_RANGE_FORMAT 0x00000008
> +
>  /* Hypercall interface */
>  union hv_hypercall_input {
>  	u64 as_uint64;
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index 04cb8d3..fc228d8 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -233,6 +233,7 @@ static void __init ms_hyperv_init_platform(void)
>  	 * Setup the hook to get control post apic initialization.
>  	 */
>  	x86_platform.apic_post_init = hyperv_init;
> +	hyperv_setup_mmu_ops();
>  #endif
>  }
>
> --
> 2.9.3
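
For reference, the processor_mask handling in hyperv_flush_tlb_others()
can be sketched in userspace as below. This is only an illustration under
stated assumptions: the vp_numbers[] array stands in for the values that
vmbus_cpu_number_to_vp_number() would return for each target CPU, and
sketch_build_processor_mask() is a hypothetical name. Note that the sketch
shifts a 64-bit one; in C, shifting a plain int 1 by 31 or more is not
well defined, so a 64-bit mask cannot be filled portably with an int
shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a 64-bit virtual-processor mask.
 * vp_numbers[] is an assumed stand-in for per-CPU VP numbers, where a
 * negative value means "no known VP number".
 * Returns false when any VP number cannot be represented in the mask,
 * in which case the caller would fall back to the native flush path.
 */
static bool sketch_build_processor_mask(const int *vp_numbers, size_t count,
                                        uint64_t *mask)
{
    assert(vp_numbers && mask);

    *mask = 0;
    for (size_t i = 0; i < count; i++) {
        int vcpu = vp_numbers[i];

        if (vcpu < 0 || vcpu >= 64)
            return false;               /* not representable: fall back */

        *mask |= (uint64_t)1 << vcpu;   /* 64-bit shift, valid for 0..63 */
    }
    return true;
}
```

A caller would try the hypercall only when the helper returns true, and
use the IPI-based flush otherwise, which mirrors the unlock_do_native
fallback in the patch.
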