2015-07-08 15:18:41

by Paolo Bonzini

Subject: [RFC/RFT PATCH v3 0/4] KVM: x86: full virtualization of guest MTRR

This part of the MTRR patches was dropped by Xiao. Bring SVM to feature
parity with VMX, and then do guest MTRR virtualization for both VMX and SVM.

The IPAT bit of VMX extended page tables is emulated by mangling the guest
PAT value.
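In sketch form, the idea is the following (the helper and constant names here are illustrative, not the exact kernel symbols):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * SVM has no IPAT bit, so the effect of IPAT=1 (ignore guest PAT,
 * use write-back) is emulated by writing an all-WB value into the
 * VMCB's g_pat field whenever no device is assigned to the guest.
 */
#define PAT_ALL_WB 0x0606060606060606ULL  /* memory type 6 == write-back */

static uint64_t effective_g_pat(uint64_t guest_pat, bool has_assigned_device)
{
	if (!has_assigned_device)
		return PAT_ALL_WB;	/* guest PAT carries no useful info */
	return guest_pat;		/* passthrough: respect the guest */
}
```

When no device is assigned, host-side caching is safe regardless of what the guest programs, so forcing WB mirrors what VMX achieves with IPAT=1.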

I do not have any AMD machines that support an IOMMU, so I would like
some help testing these patches. Thanks,

Paolo

v1->v2: AMD IOMMUs do have snooping control [Joerg]
New patch 1

v2->v3: Split __KVM_ARCH_* defines [Alex]
SVM: correctly map MTRR values to pgprot [Xiao]

Jan Kiszka (1):
KVM: SVM: Sync g_pat with guest-written PAT value

Paolo Bonzini (3):
KVM: count number of assigned devices
KVM: SVM: use NPT page attributes
KVM: x86: apply guest MTRR virtualization on host reserved pages

arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/iommu.c | 2 +
arch/x86/kvm/svm.c | 108 ++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/vmx.c | 11 ++--
arch/x86/kvm/x86.c | 18 +++++++
include/linux/kvm_host.h | 18 +++++++
virt/kvm/vfio.c | 5 ++
7 files changed, 151 insertions(+), 13 deletions(-)

--
1.8.3.1


2015-07-08 15:19:59

by Paolo Bonzini

Subject: [PATCH 1/4] KVM: count number of assigned devices

If there are no assigned devices, the guest PAT is not providing
any useful information and can be overridden to writeback; VMX
always does this because it has the "IPAT" bit in its extended
page table entries, but SVM does not have anything similar.
Hook into VFIO and legacy device assignment so that they
provide this information to KVM.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/iommu.c | 2 ++
arch/x86/kvm/x86.c | 18 ++++++++++++++++++
include/linux/kvm_host.h | 18 ++++++++++++++++++
virt/kvm/vfio.c | 5 +++++
5 files changed, 45 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2a7f5d782c33..49ec9038ec14 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -604,6 +604,8 @@ struct kvm_arch {
bool iommu_noncoherent;
#define __KVM_HAVE_ARCH_NONCOHERENT_DMA
atomic_t noncoherent_dma_count;
+#define __KVM_HAVE_ARCH_ASSIGNED_DEVICE
+ atomic_t assigned_device_count;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
diff --git a/arch/x86/kvm/iommu.c b/arch/x86/kvm/iommu.c
index 7dbced309ddb..5c520ebf6343 100644
--- a/arch/x86/kvm/iommu.c
+++ b/arch/x86/kvm/iommu.c
@@ -200,6 +200,7 @@ int kvm_assign_device(struct kvm *kvm, struct pci_dev *pdev)
goto out_unmap;
}

+ kvm_arch_start_assignment(kvm);
pci_set_dev_assigned(pdev);

dev_info(&pdev->dev, "kvm assign device\n");
@@ -224,6 +225,7 @@ int kvm_deassign_device(struct kvm *kvm, struct pci_dev *pdev)
iommu_detach_device(domain, &pdev->dev);

pci_clear_dev_assigned(pdev);
+ kvm_arch_end_assignment(kvm);

dev_info(&pdev->dev, "kvm deassign device\n");

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6bd19c7abc65..0024968b342d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8213,6 +8213,24 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
kvm_x86_ops->interrupt_allowed(vcpu);
}

+void kvm_arch_start_assignment(struct kvm *kvm)
+{
+ atomic_inc(&kvm->arch.assigned_device_count);
+}
+EXPORT_SYMBOL_GPL(kvm_arch_start_assignment);
+
+void kvm_arch_end_assignment(struct kvm *kvm)
+{
+ atomic_dec(&kvm->arch.assigned_device_count);
+}
+EXPORT_SYMBOL_GPL(kvm_arch_end_assignment);
+
+bool kvm_arch_has_assigned_device(struct kvm *kvm)
+{
+ return atomic_read(&kvm->arch.assigned_device_count);
+}
+EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
+
void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
{
atomic_inc(&kvm->arch.noncoherent_dma_count);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9564fd78c547..05e99b8ef465 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -734,6 +734,24 @@ static inline bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
return false;
}
#endif
+#ifdef __KVM_HAVE_ARCH_ASSIGNED_DEVICE
+void kvm_arch_start_assignment(struct kvm *kvm);
+void kvm_arch_end_assignment(struct kvm *kvm);
+bool kvm_arch_has_assigned_device(struct kvm *kvm);
+#else
+static inline void kvm_arch_start_assignment(struct kvm *kvm)
+{
+}
+
+static inline void kvm_arch_end_assignment(struct kvm *kvm)
+{
+}
+
+static inline bool kvm_arch_has_assigned_device(struct kvm *kvm)
+{
+ return false;
+}
+#endif

static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
{
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 620e37f741b8..1dd087da6f31 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -155,6 +155,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
list_add_tail(&kvg->node, &kv->group_list);
kvg->vfio_group = vfio_group;

+ kvm_arch_start_assignment(dev->kvm);
+
mutex_unlock(&kv->lock);

kvm_vfio_update_coherency(dev);
@@ -190,6 +192,8 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
break;
}

+ kvm_arch_end_assignment(dev->kvm);
+
mutex_unlock(&kv->lock);

kvm_vfio_group_put_external_user(vfio_group);
@@ -239,6 +243,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
kvm_vfio_group_put_external_user(kvg->vfio_group);
list_del(&kvg->node);
kfree(kvg);
+ kvm_arch_end_assignment(dev->kvm);
}

kvm_vfio_update_coherency(dev);
--
1.8.3.1

2015-07-08 15:19:35

by Paolo Bonzini

Subject: [PATCH 2/4] KVM: SVM: use NPT page attributes

Right now, NPT page attributes are not used, and the final page
attribute depends solely on gPAT (which, however, is not synced
correctly), the guest MTRRs and the guest page attributes.

However, we can do better by mimicking what is done for VMX.
In the absence of PCI passthrough, the guest PAT can be ignored
and the page attributes can be just WB. If passthrough is being
used, instead, keep respecting the guest PAT, and emulate the guest
MTRRs through the PAT field of the nested page tables.

The only snag is that MTRRs can only be emulated correctly if
Linux's PAT setting includes the type.
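Concretely, the series scans the host IA32_PAT MSR for an entry that already encodes the wanted memory type; a standalone sketch of that lookup (the PAT value used in the comment below is one common Linux layout, not a guarantee for every host):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scan the eight PAT entries of the host's IA32_PAT MSR for one
 * that already encodes the wanted memory type.  Returns the PAT
 * index, or -1 when the type is absent -- e.g. WT (4) and WP (5)
 * with a typical Linux PAT of {WB, WC, UC-, UC, WB, WC, UC-, UC}
 * (0x0007010600070106) -- in which case the patch falls back to
 * UC / UC- via fallback_mtrr_type().
 */
static int pat_index_for_type(uint64_t pat, uint8_t type)
{
	for (int i = 0; i < 8; i++)
		if (((pat >> (8 * i)) & 0xff) == type)
			return i;
	return -1;
}
```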

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/svm.c | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 96 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 602b974a60a6..414ec25b673e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -865,6 +865,64 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
}

+#define MTRR_TYPE_UC_MINUS 7
+#define MTRR2PROTVAL_INVALID 0xff
+
+static u8 mtrr2protval[8];
+
+static u8 fallback_mtrr_type(int mtrr)
+{
+ /*
+ * WT and WP aren't always available in the host PAT. Treat
+ * them as UC and UC- respectively. Everything else should be
+ * there.
+ */
+ switch (mtrr)
+ {
+ case MTRR_TYPE_WRTHROUGH:
+ return MTRR_TYPE_UNCACHABLE;
+ case MTRR_TYPE_WRPROT:
+ return MTRR_TYPE_UC_MINUS;
+ default:
+ BUG();
+ }
+}
+
+static void build_mtrr2protval(void)
+{
+ int i;
+ u64 pat;
+
+ for (i = 0; i < 8; i++)
+ mtrr2protval[i] = MTRR2PROTVAL_INVALID;
+
+ /* Ignore the invalid MTRR types. */
+ mtrr2protval[2] = 0;
+ mtrr2protval[3] = 0;
+
+ /*
+ * Use host PAT value to figure out the mapping from guest MTRR
+ * values to nested page table PAT/PCD/PWT values. We do not
+ * want to change the host PAT value every time we enter the
+ * guest.
+ */
+ rdmsrl(MSR_IA32_CR_PAT, pat);
+ for (i = 0; i < 8; i++) {
+ u8 mtrr = pat >> (8 * i);
+
+ if (mtrr2protval[mtrr] == MTRR2PROTVAL_INVALID)
+ mtrr2protval[mtrr] = __cm_idx2pte(i);
+ }
+
+ for (i = 0; i < 8; i++) {
+ if (mtrr2protval[i] == MTRR2PROTVAL_INVALID) {
+ u8 fallback = fallback_mtrr_type(i);
+ mtrr2protval[i] = mtrr2protval[fallback];
+ BUG_ON(mtrr2protval[i] == MTRR2PROTVAL_INVALID);
+ }
+ }
+}
+
static __init int svm_hardware_setup(void)
{
int cpu;
@@ -931,6 +989,7 @@ static __init int svm_hardware_setup(void)
} else
kvm_disable_tdp();

+ build_mtrr2protval();
return 0;

err:
@@ -1085,6 +1144,42 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
return target_tsc - tsc;
}

+static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat)
+{
+ struct kvm_vcpu *vcpu = &svm->vcpu;
+
+ /* Unlike Intel, AMD takes the guest's CR0.CD into account.
+ *
+ * AMD does not have IPAT. To emulate it for the case of guests
+ * with no assigned devices, just set everything to WB. If guests
+ * have assigned devices, however, we cannot force WB for RAM
+ * pages only, so use the guest PAT directly.
+ */
+ if (!kvm_arch_has_assigned_device(vcpu->kvm))
+ *g_pat = 0x0606060606060606;
+ else
+ *g_pat = vcpu->arch.pat;
+}
+
+static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+{
+ u8 mtrr;
+
+ /*
+ * 1. MMIO: always map as UC
+ * 2. No passthrough: always map as WB, and force guest PAT to WB as well
+ * 3. Passthrough: can't guarantee the result, try to trust guest.
+ */
+ if (is_mmio)
+ return _PAGE_NOCACHE;
+
+ if (!kvm_arch_has_assigned_device(vcpu->kvm))
+ return 0;
+
+ mtrr = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
+ return mtrr2protval[mtrr];
+}
+
static void init_vmcb(struct vcpu_svm *svm, bool init_event)
{
struct vmcb_control_area *control = &svm->vmcb->control;
@@ -1180,6 +1275,7 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
clr_cr_intercept(svm, INTERCEPT_CR3_READ);
clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
save->g_pat = svm->vcpu.arch.pat;
+ svm_set_guest_pat(svm, &save->g_pat);
save->cr3 = 0;
save->cr4 = 0;
}
@@ -4088,11 +4184,6 @@ static bool svm_has_high_real_mode_segbase(void)
return true;
}

-static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
-{
- return 0;
-}
-
static void svm_cpuid_update(struct kvm_vcpu *vcpu)
{
}
--
1.8.3.1

2015-07-08 15:18:50

by Paolo Bonzini

Subject: [PATCH 3/4] KVM: SVM: Sync g_pat with guest-written PAT value

From: Jan Kiszka <[email protected]>

When hardware supports the g_pat VMCB field, we can use it for emulating
the PAT configuration that the guest configures by writing to the
corresponding MSR.
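In simplified, hypothetical form, the validity check that gates the write (kvm_mtrr_valid in the hunk below) amounts to rejecting any PAT byte that is not an architecturally defined memory type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Each of the eight PAT bytes must be one of the architecturally
 * defined memory types (UC=0, WC=1, WT=4, WP=5, WB=6, UC-=7);
 * encodings 2 and 3 are reserved and must be rejected.  This is a
 * simplified stand-in for what kvm_mtrr_valid() checks for
 * MSR_IA32_CR_PAT, not the kernel function itself.
 */
static bool pat_value_valid(uint64_t data)
{
	for (int i = 0; i < 8; i++) {
		uint8_t type = (data >> (8 * i)) & 0xff;

		if (type == 2 || type == 3 || type > 7)
			return false;
	}
	return true;
}
```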

Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/svm.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 414ec25b673e..d36cfaf5a97a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3350,6 +3350,16 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
case MSR_VM_IGNNE:
vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
break;
+ case MSR_IA32_CR_PAT:
+ if (npt_enabled) {
+ if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+ return 1;
+ vcpu->arch.pat = data;
+ svm_set_guest_pat(svm, &svm->vmcb->save.g_pat);
+ mark_dirty(svm->vmcb, VMCB_NPT);
+ break;
+ }
+ /* fall through */
default:
return kvm_set_msr_common(vcpu, msr);
}
--
1.8.3.1

2015-07-08 15:18:47

by Paolo Bonzini

Subject: [PATCH 4/4] KVM: x86: apply guest MTRR virtualization on host reserved pages

Currently the guest MTRRs are ignored if kvm_is_reserved_pfn returns true.
However, the guest could prefer a page type other than UC for
such pages. A good example is a passed-through VGA frame buffer, which is
not always UC as the host expects.

This patch enables full use of virtual guest MTRRs.

Suggested-by: Xiao Guangrong <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/svm.c | 7 ++-----
arch/x86/kvm/vmx.c | 11 +++--------
2 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d36cfaf5a97a..bbc678a66b18 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1166,14 +1166,11 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
u8 mtrr;

/*
- * 1. MMIO: always map as UC
+ * 1. MMIO: trust guest MTRR, so same as item 3.
* 2. No passthrough: always map as WB, and force guest PAT to WB as well
* 3. Passthrough: can't guarantee the result, try to trust guest.
*/
- if (is_mmio)
- return _PAGE_NOCACHE;
-
- if (!kvm_arch_has_assigned_device(vcpu->kvm))
+ if (!is_mmio && !kvm_arch_has_assigned_device(vcpu->kvm))
return 0;

mtrr = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e856dd566f4c..5b4e9384717a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8632,22 +8632,17 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
u64 ipat = 0;

/* For VT-d and EPT combination
- * 1. MMIO: always map as UC
+ * 1. MMIO: guest may want to apply WC, trust it.
* 2. EPT with VT-d:
* a. VT-d without snooping control feature: can't guarantee the
- * result, try to trust guest.
+ * result, try to trust guest. So the same as item 1.
* b. VT-d with snooping control feature: snooping control feature of
* VT-d engine can guarantee the cache correctness. Just set it
* to WB to keep consistent with host. So the same as item 3.
* 3. EPT without VT-d: always map as WB and set IPAT=1 to keep
* consistent with host MTRR
*/
- if (is_mmio) {
- cache = MTRR_TYPE_UNCACHABLE;
- goto exit;
- }
-
- if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+ if (!is_mmio && !kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
ipat = VMX_EPT_IPAT_BIT;
cache = MTRR_TYPE_WRBACK;
goto exit;
--
1.8.3.1

2015-07-08 15:29:15

by Alex Williamson

Subject: Re: [PATCH 1/4] KVM: count number of assigned devices

On Wed, 2015-07-08 at 17:18 +0200, Paolo Bonzini wrote:
> If there are no assigned devices, the guest PAT is not providing
> any useful information and can be overridden to writeback; VMX
> always does this because it has the "IPAT" bit in its extended
> page table entries, but SVM does not have anything similar.
> Hook into VFIO and legacy device assignment so that they
> provide this information to KVM.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/iommu.c | 2 ++
> arch/x86/kvm/x86.c | 18 ++++++++++++++++++
> include/linux/kvm_host.h | 18 ++++++++++++++++++
> virt/kvm/vfio.c | 5 +++++
> 5 files changed, 45 insertions(+)


Reviewed-by: Alex Williamson <[email protected]>



2015-07-08 17:46:35

by Jörg Rödel

Subject: Re: [RFC/RFT PATCH v3 0/4] KVM: x86: full virtualization of guest MTRR

On Wed, Jul 08, 2015 at 05:18:26PM +0200, Paolo Bonzini wrote:
> I do not have any AMD machines that support an IOMMU, so I would like
> some help testing these patches.

Works here, tested on an AMD IOMMUv2 machine and could successfully
download a file over the assigned NIC.

Tested-by: Joerg Roedel <[email protected]>

2015-07-29 14:08:13

by Or Gerlitz

Subject: Re: [RFC/RFT PATCH v3 0/4] KVM: x86: full virtualization of guest MTRR

On 7/8/2015 6:18 PM, Paolo Bonzini wrote:
> This part of the MTRR patches was dropped by Xiao. Bring SVM to feature
> parity with VMX, and then do guest MTRR virtualization for both VMX and SVM.
>
> The IPAT bit of VMX extended page tables is emulated by mangling the guest
> PAT value.
>
> I do not have any AMD machines that support an IOMMU, so I would like
> some help testing these patches. Thanks,
>
>

Hi Paolo,

We (finally) have results showing that the patches work well and provide
benefit.

For getting better latency with ConnectX RDMA devices, we write send
descriptors to a write-combining (WC) mapped buffer instead of ringing
a doorbell and having the HW fetch the descriptor from system memory.
In the mlx4 jargon, this optimization is called Blue-Flame (BF).
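Roughly, the BF path copies the descriptor straight into a WC-mapped device register rather than ringing a doorbell; the sketch below simulates that with a plain buffer (the real mlx4 driver writes to an MMIO window and brackets the copy with write barriers, and the names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Simulation of BlueFlame posting: the send descriptor is copied
 * into the (normally WC-mapped) BlueFlame register in 64-byte
 * bursts, so the CPU pushes the whole descriptor to the device and
 * no doorbell write plus DMA fetch round trip is needed.  A plain
 * buffer stands in for the WC MMIO window here; the real driver
 * also issues wmb() barriers around the copy.
 */
#define BF_REG_SIZE 256

static void post_via_blueflame(uint8_t *bf_reg, const uint8_t *desc, size_t len)
{
	/* copy in 64-byte chunks, as WC buffers flush in bursts */
	for (size_t off = 0; off < len; off += 64)
		memcpy(bf_reg + off, desc + off,
		       len - off < 64 ? len - off : 64);
}
```

The WC mapping is exactly what the guest can only obtain when its MTRR/PAT configuration is honored, which is why this series makes the optimization usable inside VMs.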

To test the patches, we booted two hosts with 4.2-rc, patched with
this series, over which legacy (RHEL 6.x) guests are running.

Under SRIOV, the mlx4 VF driver queries the mlx4 PF driver on the
host to learn whether BF is available for it to use.

We did two runs:

In [1] the guest VF driver was told by the host PF driver
that BF isn't supported, and hence it didn't use WC.

In [2] the VFs were told by the host PF driver that BF is supported
and hence used WC.

The results are given in microseconds and represent the half-RTT of a
native RDMA latency test. The WC advantage is notable, so +1 for this series!

Or.


[1] guests not-using Blue-Flame / Write-Combining

root@host-194-168-80-68 ~]# ib_send_lat -a
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec]
2 1000 1.13 11.53 1.16
4 1000 1.13 6.33 1.16
8 1000 1.13 5.17 1.17
16 1000 1.14 4.37 1.17
32 1000 1.15 5.01 1.18
64 1000 1.19 7.96 1.22
128 1000 1.28 5.44 1.31
256 1000 1.62 6.90 1.65
512 1000 1.78 5.65 1.82


[2] guests using Blue-Flame when the host allows Write-Combining mapping
by VMs

root@host-194-168-80-68 ~]# ib_send_lat -a
#bytes #iterations t_min[usec] t_max[usec] t_typical[usec]
2 1000 0.86 16.97 0.89
4 1000 0.87 4.81 0.90
8 1000 0.87 4.89 0.90
16 1000 0.87 6.46 0.90
32 1000 0.88 4.22 0.91
64 1000 0.94 4.03 0.97
128 1000 1.03 6.50 1.06
256 1000 1.36 7.71 1.39
512 1000 1.49 5.92 1.52