In-Reply-To: <1510567280-19376-3-git-send-email-wanpeng.li@hotmail.com>
References: <1510567280-19376-1-git-send-email-wanpeng.li@hotmail.com> <1510567280-19376-3-git-send-email-wanpeng.li@hotmail.com>
From: Wanpeng Li
Date: Thu, 16 Nov 2017 13:30:57 +0800
Subject: Re: [PATCH v5 2/4] KVM: X86: Add paravirt remote TLB flush
To: "linux-kernel@vger.kernel.org", kvm
Cc: Paolo Bonzini, Radim Krčmář, Peter Zijlstra, Wanpeng Li
X-Mailing-List: linux-kernel@vger.kernel.org

2017-11-13 18:01 GMT+08:00 Wanpeng Li:
> From: Wanpeng Li
>
> Remote flushing APIs do a busy wait, which is fine in a bare-metal
> scenario. But within the guest, the vcpus might have been pre-empted
> or blocked. In this scenario, the initiator vcpu would end up
> busy-waiting for a long time.
>
> This patch set implements para-virt TLB flush, making sure that it does
> not wait for vcpus that are sleeping. All the sleeping vcpus flush
> the tlb on guest enter.
>
> The best result is achieved when we're overcommitting the host by running
> multiple vCPUs on each pCPU.
> In this case PV tlb flush avoids touching
> vCPUs which are not scheduled and avoids the wait on the main CPU.
>
> Test on a Haswell i7 desktop 4 cores (2HT), so 8 pCPUs, running ebizzy
> in one linux guest.
>
> ebizzy -M
>               vanilla    optimized     boost
>  8 vCPUs       10152       10083       -0.68%
> 16 vCPUs        1224        4866       297.5%
> 24 vCPUs        1109        3871       249%
> 32 vCPUs        1025        3375       229.3%
>
> Cc: Paolo Bonzini
> Cc: Radim Krčmář
> Cc: Peter Zijlstra
> Signed-off-by: Wanpeng Li
> ---
>  Documentation/virtual/kvm/cpuid.txt  |  4 ++++
>  arch/x86/include/uapi/asm/kvm_para.h |  2 ++
>  arch/x86/kernel/kvm.c                | 42 +++++++++++++++++++++++++++++++++++-
>  3 files changed, 47 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/virtual/kvm/cpuid.txt b/Documentation/virtual/kvm/cpuid.txt
> index 117066a..9693fcc 100644
> --- a/Documentation/virtual/kvm/cpuid.txt
> +++ b/Documentation/virtual/kvm/cpuid.txt
> @@ -60,6 +60,10 @@ KVM_FEATURE_PV_DEDICATED           ||     8 || guest checks this feature bit
>                                     ||       || mizations such as usage of
>                                     ||       || qspinlocks.
>  ------------------------------------------------------------------------------
> +KVM_FEATURE_PV_TLB_FLUSH           ||     9 || guest checks this feature bit
> +                                   ||       || before enabling paravirtualized
> +                                   ||       || tlb flush.
> +------------------------------------------------------------------------------
>  KVM_FEATURE_CLOCKSOURCE_STABLE_BIT ||    24 || host will warn if no guest-side
>                                     ||       || per-cpu warps are expected in
>                                     ||       || kvmclock.
> diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
> index 6d66556..e267d83 100644
> --- a/arch/x86/include/uapi/asm/kvm_para.h
> +++ b/arch/x86/include/uapi/asm/kvm_para.h
> @@ -26,6 +26,7 @@
>  #define KVM_FEATURE_PV_EOI		6
>  #define KVM_FEATURE_PV_UNHALT		7
>  #define KVM_FEATURE_PV_DEDICATED	8
> +#define KVM_FEATURE_PV_TLB_FLUSH	9
>
>  /* The last 8 bits are used to indicate how to interpret the flags field
>   * in pvclock structure. If no bits are set, all flags are ignored.
> @@ -54,6 +55,7 @@ struct kvm_steal_time {
>
>  #define KVM_VCPU_NOT_PREEMPTED	(0 << 0)
>  #define KVM_VCPU_PREEMPTED	(1 << 0)
> +#define KVM_VCPU_SHOULD_FLUSH	(1 << 1)
>
>  #define KVM_CLOCK_PAIRING_WALLCLOCK 0
>  struct kvm_clock_pairing {
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 66ed3bc..78794c1 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -465,9 +465,40 @@ static void __init kvm_apf_trap_init(void)
>  	update_intr_gate(X86_TRAP_PF, async_page_fault);
>  }
>
> +static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
> +
> +static void kvm_flush_tlb_others(const struct cpumask *cpumask,
> +			const struct flush_tlb_info *info)
> +{
> +	u8 state;
> +	int cpu;
> +	struct kvm_steal_time *src;
> +	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_tlb_mask);
> +
> +	if (unlikely(!flushmask))
> +		return;
> +
> +	cpumask_copy(flushmask, cpumask);
> +	/*
> +	 * We have to call flush only on online vCPUs. And
> +	 * queue flush_on_enter for pre-empted vCPUs
> +	 */
> +	for_each_cpu(cpu, flushmask) {
> +		src = &per_cpu(steal_time, cpu);
> +		state = READ_ONCE(src->preempted);
> +		if ((state & KVM_VCPU_PREEMPTED)) {
> +			if (try_cmpxchg(&src->preempted, &state,
> +				state | KVM_VCPU_SHOULD_FLUSH))
> +				__cpumask_clear_cpu(cpu, flushmask);
> +		}
> +	}
> +
> +	native_flush_tlb_others(flushmask, info);
> +}
> +
>  void __init kvm_guest_init(void)
>  {
> -	int i;
> +	int i, cpu;
>
>  	if (!kvm_para_available())
>  		return;
> @@ -484,6 +515,15 @@ void __init kvm_guest_init(void)
>  		pv_time_ops.steal_clock = kvm_steal_clock;
>  	}
>
> +	if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
> +	    !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) {
> +		for_each_possible_cpu(cpu) {
> +			zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
> +				GFP_KERNEL, cpu_to_node(cpu));
> +		}
> +		pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
> +	}
> +
>  	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
>  		apic_set_eoi_write(kvm_guest_apic_eoi_write);
>

The change below fixes the per-cpu variable memory allocation issue.
kvm_guest_init() is called too early during boot, before the buddy/slab
allocators are ready, and I didn't find a suitable memblock API, so the
mask is now simply defined as a per-cpu cpumask_t.

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 707ee68..d8fbaf5 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -465,7 +465,7 @@ static void __init kvm_apf_trap_init(void)
 	update_intr_gate(X86_TRAP_PF, async_page_fault);
 }

-static DEFINE_PER_CPU(cpumask_var_t, __pv_tlb_mask);
+static DEFINE_PER_CPU(cpumask_t, __pv_tlb_mask);

 static void kvm_flush_tlb_others(const struct cpumask *cpumask,
 			const struct flush_tlb_info *info)
@@ -473,7 +473,7 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,
 	u8 state;
 	int cpu;
 	struct kvm_steal_time *src;
-	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_tlb_mask);
+	cpumask_t *flushmask = &per_cpu(__pv_tlb_mask, smp_processor_id());

 	if (unlikely(!flushmask))
 		return;
@@ -498,7 +498,7 @@ static void kvm_flush_tlb_others(const struct cpumask *cpumask,

 void __init kvm_guest_init(void)
 {
-	int i, cpu;
+	int i;

 	if (!kvm_para_available())
 		return;
@@ -516,13 +516,8 @@ void __init kvm_guest_init(void)
 	}

 	if (kvm_para_has_feature(KVM_FEATURE_PV_TLB_FLUSH) &&
-	    !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED)) {
-		for_each_possible_cpu(cpu) {
-			zalloc_cpumask_var_node(per_cpu_ptr(&__pv_tlb_mask, cpu),
-				GFP_KERNEL, cpu_to_node(cpu));
-		}
+	    !kvm_para_has_feature(KVM_FEATURE_PV_DEDICATED))
 		pv_mmu_ops.flush_tlb_others = kvm_flush_tlb_others;
-	}

 	if (kvm_para_has_feature(KVM_FEATURE_PV_EOI))
 		apic_set_eoi_write(kvm_guest_apic_eoi_write);
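
For readers who want to see the filtering step of kvm_flush_tlb_others()
in isolation, here is a rough userspace sketch, not kernel code. It
assumes C11 atomics in place of READ_ONCE()/try_cmpxchg(), a plain
32-bit bitmask in place of struct cpumask, and an array standing in for
the per-vCPU steal_time.preempted byte. The names filter_flush_mask,
preempted[], NR_VCPUS and the VCPU_* flags are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_VCPUS          8          /* hypothetical guest size */
#define VCPU_PREEMPTED    (1u << 0)  /* mirrors KVM_VCPU_PREEMPTED */
#define VCPU_SHOULD_FLUSH (1u << 1)  /* mirrors KVM_VCPU_SHOULD_FLUSH */

/* Hypothetical stand-in for each vCPU's steal_time.preempted byte. */
static _Atomic uint8_t preempted[NR_VCPUS];

/*
 * Return the subset of `mask` that still needs a synchronous flush.
 * For every preempted vCPU in the mask, try once to queue a deferred
 * flush by setting VCPU_SHOULD_FLUSH, and drop it from the result on
 * success.
 */
static uint32_t filter_flush_mask(uint32_t mask)
{
	uint32_t flushmask = mask;

	assert(mask < (1u << NR_VCPUS));

	for (int cpu = 0; cpu < NR_VCPUS; cpu++) {
		uint8_t state;

		if (!(mask & (1u << cpu)))
			continue;

		state = atomic_load(&preempted[cpu]);
		if (!(state & VCPU_PREEMPTED))
			continue;

		/*
		 * Single attempt, as in the patch: if the byte changed
		 * underneath us (e.g. the vCPU was scheduled back in),
		 * keep the vCPU in the mask and flush it the normal way.
		 */
		if (atomic_compare_exchange_strong(&preempted[cpu], &state,
				(uint8_t)(state | VCPU_SHOULD_FLUSH)))
			flushmask &= ~(1u << cpu);
	}

	return flushmask;
}
```

With vCPUs 1 and 3 marked VCPU_PREEMPTED, filter_flush_mask(0x0f)
returns 0x05, and both preempted bytes end up with VCPU_SHOULD_FLUSH
set. Only the running vCPUs are left for the native flush path, which
is where the busy wait is avoided.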