2015-08-25 09:03:44

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 00/17] Add VT-d Posted-Interrupts support

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrtups Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

v7:
* Define two weak irq bypass callbacks:
- kvm_arch_irq_bypass_start()
- kvm_arch_irq_bypass_stop()
* Remove the x86 dummy implementation of the above two functions.
* Print some useful information instead of WARN_ON() when the
irq bypass consumer unregistration fails.
* Fix an issue when calling pi_pre_block and pi_post_block.

v6:
* Rebase on 4.2.0-rc6
* Rebase on https://lkml.org/lkml/2015/8/6/526 and http://www.gossamer-threads.com/lists/linux/kernel/2235623
* Make the add_consumer and del_consumer callbacks static
* Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
* Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
* Remove optional dummy callbacks for irq producer

v4:
* For lowest-priority interrupt, only support single-CPU destination
interrupts at the current stage, more common lowest priority support
will be added later.
* Accoring to Marcelo's suggestion, when vCPU is blocked, we handle
the posted-interrupts in the HLT emulation path.
* Some small changes (coding style, typo, add some code comments)

v3:
* Adjust the Posted-interrupts Descriptor updating logic when vCPU is
preempted or blocked.
* KVM_DEV_VFIO_DEVICE_POSTING_IRQ --> KVM_DEV_VFIO_DEVICE_POST_IRQ
* __KVM_HAVE_ARCH_KVM_VFIO_POSTING --> __KVM_HAVE_ARCH_KVM_VFIO_POST
* Add KVM_DEV_VFIO_DEVICE_UNPOST_IRQ attribute for VFIO irq, which
can be used to change back to remapping mode.
* Fix typo

v2:
* Use VFIO framework to enable this feature, the VFIO part of this series is
base on Eric's patch "[PATCH v3 0/8] KVM-VFIO IRQ forward control"
* Rebase this patchset on git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git,
then revise some irq logic based on the new hierarchy irqdomain patches provided
by Jiang Liu <[email protected]>

Feng Wu (17):
KVM: Extend struct pi_desc for VT-d Posted-Interrupts
KVM: Add some helper functions for Posted-Interrupts
KVM: Define a new interface kvm_intr_is_single_vcpu()
KVM: Get Posted-Interrupts descriptor address from 'struct kvm_vcpu'
KVM: Add interfaces to control PI outside vmx
KVM: Make struct kvm_irq_routing_table accessible
KVM: make kvm_set_msi_irq() public
vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices
vfio: Register/unregister irq_bypass_producer
KVM: x86: Update IRTE for posted-interrupts
KVM: Define two weak arch callbacks for irq bypass manager
KVM: Implement IRQ bypass consumer callbacks for x86
KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
KVM: Warn if 'SN' is set during posting interrupts by software
iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

Documentation/kernel-parameters.txt | 1 +
arch/x86/include/asm/kvm_host.h | 20 +++
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/irq_comm.c | 28 +++-
arch/x86/kvm/vmx.c | 288 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 167 +++++++++++++++++++--
drivers/iommu/irq_remapping.c | 12 +-
drivers/vfio/pci/Kconfig | 1 +
drivers/vfio/pci/vfio_pci_intrs.c | 9 ++
drivers/vfio/pci/vfio_pci_private.h | 2 +
include/linux/kvm_host.h | 28 ++++
include/linux/kvm_irqfd.h | 2 +
virt/kvm/eventfd.c | 22 ++-
virt/kvm/irqchip.c | 10 --
virt/kvm/kvm_main.c | 3 +
15 files changed, 565 insertions(+), 29 deletions(-)

--
2.1.0


2015-08-25 09:23:34

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 01/17] KVM: Extend struct pi_desc for VT-d Posted-Interrupts

Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 83b7b5c..271dd70 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -446,8 +446,24 @@ struct nested_vmx {
/* Posted-Interrupt Descriptor */
struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
- u32 control; /* bit 0 of control is outstanding notification bit */
- u32 rsvd[7];
+ union {
+ struct {
+ /* bit 256 - Outstanding Notification */
+ u16 on : 1,
+ /* bit 257 - Suppress Notification */
+ sn : 1,
+ /* bit 271:258 - Reserved */
+ rsvd_1 : 14;
+ /* bit 279:272 - Notification Vector */
+ u8 nv;
+ /* bit 287:280 - Reserved */
+ u8 rsvd_2;
+ /* bit 319:288 - Notification Destination */
+ u32 ndst;
+ };
+ u64 control;
+ };
+ u32 rsvd[6];
} __aligned(64);

static bool pi_test_and_set_on(struct pi_desc *pi_desc)
--
2.1.0

2015-08-25 09:03:51

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 02/17] KVM: Add some helper functions for Posted-Interrupts

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 271dd70..316f9bf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -443,6 +443,8 @@ struct nested_vmx {
};

#define POSTED_INTR_ON 0
+#define POSTED_INTR_SN 1
+
/* Posted-Interrupt Descriptor */
struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
@@ -483,6 +485,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
}

+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+ return clear_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+ return set_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_ON,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
struct vcpu_vmx {
struct kvm_vcpu vcpu;
unsigned long host_rsp;
--
2.1.0

2015-08-25 09:22:53

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 03/17] KVM: Define a new interface kvm_intr_is_single_vcpu()

This patch defines a new interface kvm_intr_is_single_vcpu(),
which can returns whether the interrupt is for single-CPU or not.

It is used by VT-d PI, since now we only support single-CPU
interrupts, For lowest-priority interrupts, if user configures
it via /proc/irq or uses irqbalance to make it single-CPU, we
can use PI to deliver the interrupts to it. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/irq_comm.c | 24 ++++++++++++++++++++++++
2 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49ec903..af11bca 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1204,4 +1204,7 @@ int __x86_set_memory_region(struct kvm *kvm,
int x86_set_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem);

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu);
+
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 9efff9e..a9572a13 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -297,6 +297,30 @@ out:
return r;
}

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu)
+{
+ int i, r = 0;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_apic_present(vcpu))
+ continue;
+
+ if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+ irq->dest_id, irq->dest_mode))
+ continue;
+
+ r++;
+ *dest_vcpu = vcpu;
+ }
+
+ if (r == 1)
+ return true;
+ else
+ return false;
+}
+
#define IOAPIC_ROUTING_ENTRY(irq) \
{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
.u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
--
2.1.0

2015-08-25 09:03:58

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 04/17] KVM: Get Posted-Interrupts descriptor address from 'struct kvm_vcpu'

Define an interface to get PI descriptor address from the vCPU structure.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx.c | 11 +++++++++++
2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index af11bca..d50c1d3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -858,6 +858,8 @@ struct kvm_x86_ops {
void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t offset, unsigned long mask);
+
+ u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
};
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 316f9bf..81a995c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -610,6 +610,10 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
#define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
[number##_HIGH] = VMCS12_OFFSET(name)+4

+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+ return &(to_vmx(vcpu)->pi_desc);
+}

static unsigned long shadow_read_only_fields[] = {
/*
@@ -4487,6 +4491,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
return;
}

+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+ return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
/*
* Set up the vmcs's constant host-state fields, i.e., host-state fields that
* will not change in the lifetime of the guest.
@@ -10460,6 +10469,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.flush_log_dirty = vmx_flush_log_dirty,
.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,

+ .get_pi_desc_addr = vmx_get_pi_desc_addr,
+
.pmu_ops = &intel_pmu_ops,
};

--
2.1.0

2015-08-25 09:03:56

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 05/17] KVM: Add interfaces to control PI outside vmx

This patch adds pi_clear_sn and pi_set_sn to struct kvm_x86_ops,
so we can set/clear SN outside vmx.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/vmx.c | 13 +++++++++++++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d50c1d3..c4f99f1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -860,6 +860,9 @@ struct kvm_x86_ops {
gfn_t offset, unsigned long mask);

u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+
+ void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
+ void (*pi_set_sn)(struct kvm_vcpu *vcpu);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
};
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81a995c..234f720 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -615,6 +615,16 @@ struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
return &(to_vmx(vcpu)->pi_desc);
}

+static void vmx_pi_clear_sn(struct kvm_vcpu *vcpu)
+{
+ pi_clear_sn(vcpu_to_pi_desc(vcpu));
+}
+
+static void vmx_pi_set_sn(struct kvm_vcpu *vcpu)
+{
+ pi_set_sn(vcpu_to_pi_desc(vcpu));
+}
+
static unsigned long shadow_read_only_fields[] = {
/*
* We do NOT shadow fields that are modified when L0
@@ -10471,6 +10481,9 @@ static struct kvm_x86_ops vmx_x86_ops = {

.get_pi_desc_addr = vmx_get_pi_desc_addr,

+ .pi_clear_sn = vmx_pi_clear_sn,
+ .pi_set_sn = vmx_pi_set_sn,
+
.pmu_ops = &intel_pmu_ops,
};

--
2.1.0

2015-08-25 09:21:54

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 06/17] KVM: Make struct kvm_irq_routing_table accessible

Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu <[email protected]>
---
include/linux/kvm_host.h | 14 ++++++++++++++
virt/kvm/irqchip.c | 10 ----------
2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5ac8d21..5f183fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -328,6 +328,20 @@ struct kvm_kernel_irq_routing_entry {
struct hlist_node link;
};

+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+ int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+ u32 nr_rt_entries;
+ /*
+ * Array indexed by gsi. Each entry contains list of irq chips
+ * the gsi is connected to.
+ */
+ struct hlist_head map[0];
+};
+
+#endif
+
#ifndef KVM_PRIVATE_MEM_SLOTS
#define KVM_PRIVATE_MEM_SLOTS 0
#endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 21c1424..2cf45d3 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,16 +31,6 @@
#include <trace/events/kvm.h>
#include "irq.h"

-struct kvm_irq_routing_table {
- int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
- u32 nr_rt_entries;
- /*
- * Array indexed by gsi. Each entry contains list of irq chips
- * the gsi is connected to.
- */
- struct hlist_head map[0];
-};
-
int kvm_irq_map_gsi(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *entries, int gsi)
{
--
2.1.0

2015-08-25 09:20:44

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 07/17] KVM: make kvm_set_msi_irq() public

Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++++
arch/x86/kvm/irq_comm.c | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c4f99f1..82d0709 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ enum {
*/
#define KVM_APIC_PV_EOI_PENDING 1

+struct kvm_kernel_irq_routing_entry;
+
/*
* We don't want allocation failures within the mmu code, so we preallocate
* enough memory for a single page fault in a cache.
@@ -1212,4 +1214,6 @@ int x86_set_memory_region(struct kvm *kvm,
bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
struct kvm_vcpu **dest_vcpu);

+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq);
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index a9572a13..1319c60 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -91,8 +91,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
return r;
}

-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
- struct kvm_lapic_irq *irq)
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq)
{
trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);

--
2.1.0

2015-08-25 09:04:05

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 08/17] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices

Enable irq bypass manager for vfio PCI devices.

Signed-off-by: Feng Wu <[email protected]>
---
drivers/vfio/pci/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
tristate "VFIO support for PCI devices"
depends on VFIO && PCI && EVENTFD
select VFIO_VIRQFD
+ select IRQ_BYPASS_MANAGER
help
Support for the PCI VFIO bus driver. This is required to make
use of PCI drivers using the VFIO framework.
--
2.1.0

2015-08-25 09:19:58

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 09/17] vfio: Register/unregister irq_bypass_producer

This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSIx on vfio pci devices.

v6:
- Make the add_consumer and del_consumer callbacks static
- Remove pointless INIT_LIST_HEAD to 'vdev->ctx[vector].producer.node)'
- Use dev_info instead of WARN_ON() when irq_bypass_register_producer fails
- Remove optional dummy callbacks for irq producer

Signed-off-by: Feng Wu <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 9 +++++++++
drivers/vfio/pci/vfio_pci_private.h | 2 ++
2 files changed, 11 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..c65299d 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -319,6 +319,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,

if (vdev->ctx[vector].trigger) {
free_irq(irq, vdev->ctx[vector].trigger);
+ irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
vdev->ctx[vector].trigger = NULL;
@@ -360,6 +361,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return ret;
}

+ vdev->ctx[vector].producer.token = trigger;
+ vdev->ctx[vector].producer.irq = irq;
+ ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+ if (unlikely(ret))
+ dev_info(&pdev->dev,
+ "irq bypass producer (token %p) registeration fails: %d\n",
+ vdev->ctx[vector].producer.token, ret);
+
vdev->ctx[vector].trigger = trigger;

return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ae0e1b4..0e7394f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@

#include <linux/mutex.h>
#include <linux/pci.h>
+#include <linux/irqbypass.h>

#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
struct virqfd *mask;
char *name;
bool masked;
+ struct irq_bypass_producer producer;
};

struct vfio_pci_device {
--
2.1.0

2015-08-25 09:19:15

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts

This patch adds the routine to update IRTE for posted-interrupts
when guest changes the interrupt configuration.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/x86.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5ef2560..8f09a76 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
#include <asm/fpu/internal.h> /* Ugh! */
#include <asm/pvclock.h>
#include <asm/div64.h>
+#include <asm/irq_remapping.h>

#define MAX_IO_MSRS 256
#define KVM_MAX_MCE_BANKS 32
@@ -8248,6 +8249,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);

+/*
+ * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+ uint32_t guest_irq, bool set)
+{
+ struct kvm_kernel_irq_routing_entry *e;
+ struct kvm_irq_routing_table *irq_rt;
+ struct kvm_lapic_irq irq;
+ struct kvm_vcpu *vcpu;
+ struct vcpu_data vcpu_info;
+ int idx, ret = -EINVAL;
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return 0;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+ BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+ hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+ if (e->type != KVM_IRQ_ROUTING_MSI)
+ continue;
+ /*
+ * VT-d PI cannot support posting multicast/broadcast
+ * interrupts to a VCPU, we still use interrupt remapping
+ * for these kind of interrupts.
+ *
+ * For lowest-priority interrupts, we only support
+ * those with single CPU as the destination, e.g. user
+ * configures the interrupts via /proc/irq or uses
+ * irqbalance to make the interrupts single-CPU.
+ *
+ * We will support full lowest-priority interrupt later.
+ *
+ */
+
+ kvm_set_msi_irq(e, &irq);
+ if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+ continue;
+
+ vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
+ vcpu_info.vector = irq.vector;
+
+ if (set)
+ ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+ else {
+ /* suppress notification event before unposting */
+ kvm_x86_ops->pi_set_sn(vcpu);
+ ret = irq_set_vcpu_affinity(host_irq, NULL);
+ kvm_x86_ops->pi_clear_sn(vcpu);
+ }
+
+ if (ret < 0) {
+ printk(KERN_INFO "%s: failed to update PI IRTE\n",
+ __func__);
+ goto out;
+ }
+ }
+
+ ret = 0;
+out:
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+ return ret;
+}
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
--
2.1.0

2015-08-25 09:04:20

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 11/17] KVM: Define two weak arch callbacks for irq bypass manager

Define two weak arch callbacks so that archs that don't need
them don't need define them.

Signed-off-by: Feng Wu <[email protected]>
---
virt/kvm/eventfd.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index d7a230f..f3050b9 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -256,6 +256,16 @@ static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
write_seqcount_end(&irqfd->irq_entry_sc);
}

+void __attribute__((weak)) kvm_arch_irq_bypass_stop(
+ struct irq_bypass_consumer *cons)
+{
+}
+
+void __attribute__((weak)) kvm_arch_irq_bypass_start(
+ struct irq_bypass_consumer *cons)
+{
+}
+
static int
kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
{
--
2.1.0

2015-08-25 09:04:14

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 12/17] KVM: Implement IRQ bypass consumer callbacks for x86

Implement the following callbacks for x86:

- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer
- kvm_arch_irq_bypass_stop: dummy callback
- kvm_arch_irq_bypass_resume: dummy callback

and set CONFIG_HAVE_KVM_IRQ_BYPASS for x86.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/x86.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 36 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 82d0709..3038c1b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
#include <linux/perf_event.h>
#include <linux/pvclock_gtod.h>
#include <linux/clocksource.h>
+#include <linux/irqbypass.h>

#include <asm/pvclock-abi.h>
#include <asm/desc.h>
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index c951d44..b90776f 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -30,6 +30,7 @@ config KVM
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQFD
select IRQ_BYPASS_MANAGER
+ select HAVE_KVM_IRQ_BYPASS
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_EVENTFD
select KVM_APIC_ARCHITECTURE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f09a76..be4b561 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -50,6 +50,8 @@
#include <linux/pci.h>
#include <linux/timekeeper_internal.h>
#include <linux/pvclock_gtod.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
#include <trace/events/kvm.h>

#define CREATE_TRACE_POINTS
@@ -8321,6 +8323,38 @@ out:
return ret;
}

+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ irqfd->producer = prod;
+
+ return kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 1);
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ int ret;
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ irqfd->producer = NULL;
+
+ /*
+ * When producer of consumer is unregistered, we change back to
+ * remapped mode, so we can re-use the current implementation
+ * when the irq is masked/disabed or the consumer side (KVM
+ * int this case doesn't want to receive the interrupts.
+ */
+ ret = kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
+ if (ret)
+ printk(KERN_INFO "irq bypass consumer (token %p) unregistration"
+ " fails: %d\n", irqfd->consumer.token, ret);
+}
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
--
2.1.0

2015-08-25 09:04:17

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 13/17] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'

This patch adds an arch specific hooks 'arch_update' in
'struct kvm_kernel_irqfd'. On Intel side, it is used to
update the IRTE when VT-d posted-interrupts is used.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/x86.c | 5 +++++
include/linux/kvm_host.h | 11 +++++++++++
include/linux/kvm_irqfd.h | 2 ++
virt/kvm/eventfd.c | 12 +++++++++++-
5 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3038c1b..22269b4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -176,6 +176,8 @@ enum {
*/
#define KVM_APIC_PV_EOI_PENDING 1

+#define __KVM_HAVE_ARCH_IRQFD_INIT 1
+
struct kvm_kernel_irq_routing_entry;

/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be4b561..ef93fdc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8355,6 +8355,11 @@ void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
" fails: %d\n", irqfd->consumer.token, ret);
}

+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
+{
+ irqfd->arch_update = kvm_arch_update_pi_irte;
+}
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5f183fb..f4005dc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -34,6 +34,8 @@

#include <asm/kvm_host.h>

+struct kvm_kernel_irqfd;
+
/*
* The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
* in kvm, other bits are visible for userspace which are defined in
@@ -1145,6 +1147,15 @@ extern struct kvm_device_ops kvm_xics_ops;
extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
extern struct kvm_device_ops kvm_arm_vgic_v3_ops;

+#ifdef __KVM_HAVE_ARCH_IRQFD_INIT
+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd);
+#else
+static inline void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
+{
+ irqfd->arch_update = NULL;
+}
+#endif
+
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index 0c1de05..b7aab52 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -66,6 +66,8 @@ struct kvm_kernel_irqfd {
struct work_struct shutdown;
struct irq_bypass_consumer consumer;
struct irq_bypass_producer *producer;
+ int (*arch_update)(struct kvm *kvm, unsigned int host_irq,
+ uint32_t guest_irq, bool set);
};

#endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f3050b9..b2d9066 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -288,6 +288,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
+ kvm_arch_irqfd_init(irqfd);
seqcount_init(&irqfd->irq_entry_sc);

f = fdget(args->fd);
@@ -580,13 +581,22 @@ kvm_irqfd_release(struct kvm *kvm)
*/
void kvm_irq_routing_update(struct kvm *kvm)
{
+ int ret;
struct kvm_kernel_irqfd *irqfd;

spin_lock_irq(&kvm->irqfds.lock);

- list_for_each_entry(irqfd, &kvm->irqfds.items, list)
+ list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
irqfd_update(kvm, irqfd);

+ if (irqfd->arch_update && irqfd->producer) {
+ ret = irqfd->arch_update(
+ irqfd->kvm, irqfd->producer->irq,
+ irqfd->gsi, 1);
+ WARN_ON(ret);
+ }
+ }
+
spin_unlock_irq(&kvm->irqfds.lock);
}

--
2.1.0

2015-08-25 09:18:19

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 14/17] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress furture non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 234f720..9c87064 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
#include <asm/debugreg.h>
#include <asm/kexec.h>
#include <asm/apic.h>
+#include <asm/irq_remapping.h>

#include "trace.h"
#include "pmu.h"
@@ -2001,10 +2002,60 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
vmx->loaded_vmcs->cpu = cpu;
}
+
+ if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old, new;
+ unsigned int dest;
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ /*
+ * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
+ * are two possible cases:
+ * 1. After running 'pi_pre_block', context switch
+ * happened. For this case, 'sn' was set in
+ * vmx_vcpu_put(), so we need to clear it here.
+ * 2. After running 'pi_pre_block', we were blocked,
+ * and woken up by some other guy. For this case,
+ * we don't need to do anything, 'pi_post_block'
+ * will do everything for us. However, we cannot
+ * check whether it is case #1 or case #2 here
+ * (maybe, not needed), so we also clear sn here,
+ * I think it is not a big deal.
+ */
+ if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) {
+ if (vcpu->cpu != cpu) {
+ dest = cpu_physical_id(cpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+ }
+
+ /* set 'NV' to 'notification vector' */
+ new.nv = POSTED_INTR_VECTOR;
+ }
+
+ /* Allow posting non-urgent interrupts */
+ new.sn = 0;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+ }
}

static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
{
+ if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ /* Set SN when the vCPU is preempted */
+ if (vcpu->preempted)
+ pi_set_sn(pi_desc);
+ }
+
__vmx_load_host_state(to_vmx(vcpu));
if (!vmm_exclusive) {
__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
--
2.1.0

2015-08-25 09:05:58

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 15/17] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 ++
arch/x86/kvm/vmx.c | 151 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 55 ++++++++++++---
include/linux/kvm_host.h | 3 +
virt/kvm/kvm_main.c | 3 +
5 files changed, 207 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 22269b4..32af275 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -554,6 +554,8 @@ struct kvm_vcpu_arch {
*/
bool write_fault_to_shadow_pgtable;

+ bool halted;
+
/* set at EPT violation at this point */
unsigned long exit_qualification;

@@ -868,6 +870,9 @@ struct kvm_x86_ops {

void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
void (*pi_set_sn)(struct kvm_vcpu *vcpu);
+
+ int (*pi_pre_block)(struct kvm_vcpu *vcpu);
+ void (*pi_post_block)(struct kvm_vcpu *vcpu);
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
};
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9c87064..64e35ea 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -888,6 +888,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
static DEFINE_PER_CPU(struct desc_ptr, host_gdt);

+/*
+ * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
+ * can find which vCPU should be waken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
static unsigned long *vmx_io_bitmap_a;
static unsigned long *vmx_io_bitmap_b;
static unsigned long *vmx_msr_bitmap_legacy;
@@ -2981,6 +2988,8 @@ static int hardware_enable(void)
return -EBUSY;

INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+ INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+ spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));

/*
* Now we can enable the vmclear operation in kdump
@@ -6106,6 +6115,25 @@ static void update_ple_window_actual_max(void)
ple_window_grow, INT_MIN);
}

+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+static void wakeup_handler(void)
+{
+ struct kvm_vcpu *vcpu;
+ int cpu = smp_processor_id();
+
+ spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+ list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+ blocked_vcpu_list) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ if (pi_test_on(pi_desc) == 1)
+ kvm_vcpu_kick(vcpu);
+ }
+ spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
static __init int hardware_setup(void)
{
int r = -ENOMEM, i, msr;
@@ -6290,6 +6318,8 @@ static __init int hardware_setup(void)
kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
}

+ kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
return alloc_kvm_area();

out8:
@@ -10414,6 +10444,124 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
}

+/*
+ * This routine does the following things for vCPU which is going
+ * to be blocked if VT-d PI is enabled.
+ * - Store the vCPU to the wakeup list, so when interrupts happen
+ * we can find the right vCPU to wake up.
+ * - Change the Posted-interrupt descriptor as below:
+ * 'NDST' <-- vcpu->pre_pcpu
+ * 'NV' <-- POSTED_INTR_WAKEUP_VECTOR
+ * - If 'ON' is set during this process, which means at least one
+ * interrupt is posted for this vCPU, we cannot block it, in
+ * this case, return 1, otherwise, return 0.
+ *
+ */
+static int vmx_pi_pre_block(struct kvm_vcpu *vcpu)
+{
+ unsigned long flags;
+ unsigned int dest;
+ struct pi_desc old, new;
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return 0;
+
+ vcpu->pre_pcpu = vcpu->cpu;
+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_add_tail(&vcpu->blocked_vcpu_list,
+ &per_cpu(blocked_vcpu_on_cpu,
+ vcpu->pre_pcpu));
+ spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ /*
+ * We should not block the vCPU if
+ * an interrupt is posted for it.
+ */
+ if (pi_test_on(pi_desc) == 1) {
+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_del(&vcpu->blocked_vcpu_list);
+ spin_unlock_irqrestore(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ vcpu->pre_pcpu = -1;
+
+ return 1;
+ }
+
+ WARN((pi_desc->sn == 1),
+ "Warning: SN field of posted-interrupts "
+ "is set before blocking\n");
+
+ /*
+ * Since vCPU can be preempted during this process,
+ * vcpu->cpu could be different with pre_pcpu, we
+ * need to set pre_pcpu as the destination of wakeup
+ * notification event, then we can find the right vCPU
+ * to wakeup in wakeup handler if interrupts happen
+ * when the vCPU is in blocked state.
+ */
+ dest = cpu_physical_id(vcpu->pre_pcpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ /* set 'NV' to 'wakeup vector' */
+ new.nv = POSTED_INTR_WAKEUP_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+
+ return 0;
+}
+
+static void vmx_pi_post_block(struct kvm_vcpu *vcpu)
+{
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old, new;
+ unsigned int dest;
+ unsigned long flags;
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return;
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ dest = cpu_physical_id(vcpu->cpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ /* Allow posting non-urgent interrupts */
+ new.sn = 0;
+
+ /* set 'NV' to 'notification vector' */
+ new.nv = POSTED_INTR_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+
+ if(vcpu->pre_pcpu != -1) {
+ spin_lock_irqsave(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_del(&vcpu->blocked_vcpu_list);
+ spin_unlock_irqrestore(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ vcpu->pre_pcpu = -1;
+ }
+}
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -10535,6 +10683,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
.pi_clear_sn = vmx_pi_clear_sn,
.pi_set_sn = vmx_pi_set_sn,

+ .pi_pre_block = vmx_pi_pre_block,
+ .pi_post_block = vmx_pi_post_block,
+
.pmu_ops = &intel_pmu_ops,
};

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ef93fdc..fc7f222 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5869,7 +5869,13 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
{
++vcpu->stat.halt_exits;
if (irqchip_in_kernel(vcpu->kvm)) {
- vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+ /* Handle posted-interrupt when vCPU is to be halted */
+ if (!kvm_x86_ops->pi_pre_block ||
+ (kvm_x86_ops->pi_pre_block &&
+ kvm_x86_ops->pi_pre_block(vcpu) == 0)) {
+ vcpu->arch.halted = true;
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+ }
return 1;
} else {
vcpu->run->exit_reason = KVM_EXIT_HLT;
@@ -6518,6 +6524,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_vcpu_reload_apic_access_page(vcpu);
}

+ /*
+ * Since posted-interrupts can be set by VT-d HW now, in this
+ * case, KVM_REQ_EVENT is not set. We move the following
+ * operations out of the if statement.
+ */
+ if (kvm_lapic_enabled(vcpu)) {
+ /*
+ * Update architecture specific hints for APIC
+ * virtual interrupt delivery.
+ */
+ if (kvm_x86_ops->hwapic_irr_update)
+ kvm_x86_ops->hwapic_irr_update(vcpu,
+ kvm_lapic_find_highest_irr(vcpu));
+ }
+
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
kvm_apic_accept_events(vcpu);
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6534,13 +6555,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_x86_ops->enable_irq_window(vcpu);

if (kvm_lapic_enabled(vcpu)) {
- /*
- * Update architecture specific hints for APIC
- * virtual interrupt delivery.
- */
- if (kvm_x86_ops->hwapic_irr_update)
- kvm_x86_ops->hwapic_irr_update(vcpu,
- kvm_lapic_find_highest_irr(vcpu));
update_cr8_intercept(vcpu);
kvm_lapic_sync_to_vapic(vcpu);
}
@@ -6711,10 +6725,31 @@ static int vcpu_run(struct kvm_vcpu *vcpu)

for (;;) {
if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
- !vcpu->arch.apf.halted)
+ !vcpu->arch.apf.halted) {
+ /*
+ * For some cases, we can get here with
+ * vcpu->arch.halted being true.
+ */
+ if (kvm_x86_ops->pi_post_block && vcpu->arch.halted) {
+ kvm_x86_ops->pi_post_block(vcpu);
+ vcpu->arch.halted = false;
+ }
+
r = vcpu_enter_guest(vcpu);
- else
+ } else {
r = vcpu_block(kvm, vcpu);
+
+ /*
+ * pi_post_block() must be called after
+ * pi_pre_block() which is called in
+ * kvm_vcpu_halt().
+ */
+ if (kvm_x86_ops->pi_post_block && vcpu->arch.halted) {
+ kvm_x86_ops->pi_post_block(vcpu);
+ vcpu->arch.halted = false;
+ }
+ }
+
if (r <= 0)
break;

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4005dc..6aa69f4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -233,6 +233,9 @@ struct kvm_vcpu {
unsigned long requests;
unsigned long guest_debug;

+ int pre_pcpu;
+ struct list_head blocked_vcpu_list;
+
struct mutex mutex;
struct kvm_run *run;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8b8a444..191c7eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -220,6 +220,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);

+ vcpu->pre_pcpu = -1;
+ INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
+
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
--
2.1.0

2015-08-25 09:05:11

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 16/17] KVM: Warn if 'SN' is set during posting interrupts by software

Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot post
interrupts when 'SN' is set.

If the vcpu is in guest mode, it cannot have been scheduled out,
and that's the only case when SN is set currently, warning if
SN is set.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 64e35ea..eb640a1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4494,6 +4494,22 @@ static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_SMP
if (vcpu->mode == IN_GUEST_MODE) {
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ /*
+ * Currently, we don't support urgent interrupt,
+ * all interrupts are recognized as non-urgent
+ * interrupt, so we cannot post interrupts when
+ * 'SN' is set.
+ *
+ * If the vcpu is in guest mode, it means it is
+ * running instead of being scheduled out and
+ * waiting in the run queue, and that's the only
+ * case when 'SN' is set currently, warning if
+ * 'SN' is set.
+ */
+ WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
+
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
POSTED_INTR_VECTOR);
return true;
--
2.1.0

2015-08-25 09:04:24

by Wu, Feng

[permalink] [raw]
Subject: [PATCH v7 17/17] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

Enable VT-d Posted-Interrtups and add a command line
parameter for it.

Signed-off-by: Feng Wu <[email protected]>
---
Documentation/kernel-parameters.txt | 1 +
drivers/iommu/irq_remapping.c | 12 ++++++++----
2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..52aca36 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1547,6 +1547,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
nosid disable Source ID checking
no_x2apic_optout
BIOS x2APIC opt-out request will be ignored
+ nopost disable Interrupt Posting

iomem= Disable strict checking of access to MMIO memory
strict regions from userspace.
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2d99930..d8c3997 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -22,7 +22,7 @@ int irq_remap_broken;
int disable_sourceid_checking;
int no_x2apic_optout;

-int disable_irq_post = 1;
+int disable_irq_post = 0;

static int disable_irq_remap;
static struct irq_remap_ops *remap_ops;
@@ -58,14 +58,18 @@ static __init int setup_irqremap(char *str)
return -EINVAL;

while (*str) {
- if (!strncmp(str, "on", 2))
+ if (!strncmp(str, "on", 2)) {
disable_irq_remap = 0;
- else if (!strncmp(str, "off", 3))
+ disable_irq_post = 0;
+ } else if (!strncmp(str, "off", 3)) {
disable_irq_remap = 1;
- else if (!strncmp(str, "nosid", 5))
+ disable_irq_post = 1;
+ } else if (!strncmp(str, "nosid", 5))
disable_sourceid_checking = 1;
else if (!strncmp(str, "no_x2apic_optout", 16))
no_x2apic_optout = 1;
+ else if (!strncmp(str, "nopost", 6))
+ disable_irq_post = 1;

str += strcspn(str, ",");
while (*str == ',')
--
2.1.0

2015-08-25 19:57:48

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts

On Tue, 2015-08-25 at 16:50 +0800, Feng Wu wrote:
> This patch adds the routine to update IRTE for posted-interrupts
> when guest changes the interrupt configuration.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> arch/x86/kvm/x86.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 73 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5ef2560..8f09a76 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -63,6 +63,7 @@
> #include <asm/fpu/internal.h> /* Ugh! */
> #include <asm/pvclock.h>
> #include <asm/div64.h>
> +#include <asm/irq_remapping.h>
>
> #define MAX_IO_MSRS 256
> #define KVM_MAX_MCE_BANKS 32
> @@ -8248,6 +8249,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
> }
> EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
>
> +/*
> + * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
> + *
> + * @kvm: kvm
> + * @host_irq: host irq of the interrupt
> + * @guest_irq: gsi of the interrupt
> + * @set: set or unset PI
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> + uint32_t guest_irq, bool set)
> +{
> + struct kvm_kernel_irq_routing_entry *e;
> + struct kvm_irq_routing_table *irq_rt;
> + struct kvm_lapic_irq irq;
> + struct kvm_vcpu *vcpu;
> + struct vcpu_data vcpu_info;
> + int idx, ret = -EINVAL;
> +
> + if (!irq_remapping_cap(IRQ_POSTING_CAP))
> + return 0;
> +
> + idx = srcu_read_lock(&kvm->irq_srcu);
> + irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> + BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
> +
> + hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> + if (e->type != KVM_IRQ_ROUTING_MSI)
> + continue;
> + /*
> + * VT-d PI cannot support posting multicast/broadcast
> + * interrupts to a VCPU, we still use interrupt remapping
> + * for these kind of interrupts.
> + *
> + * For lowest-priority interrupts, we only support
> + * those with single CPU as the destination, e.g. user
> + * configures the interrupts via /proc/irq or uses
> + * irqbalance to make the interrupts single-CPU.
> + *
> + * We will support full lowest-priority interrupt later.
> + *
> + */
> +
> + kvm_set_msi_irq(e, &irq);
> + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> + continue;
> +
> + vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> + vcpu_info.vector = irq.vector;
> +
> + if (set)
> + ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
> + else {
> + /* suppress notification event before unposting */
> + kvm_x86_ops->pi_set_sn(vcpu);
> + ret = irq_set_vcpu_affinity(host_irq, NULL);
> + kvm_x86_ops->pi_clear_sn(vcpu);
> + }

Can we add trace events so that we have a way to tell when PI is being
enabled/disabled other than performance heuristics? Thanks,

Alex

> +
> + if (ret < 0) {
> + printk(KERN_INFO "%s: failed to update PI IRTE\n",
> + __func__);
> + goto out;
> + }
> + }
> +
> + ret = 0;
> +out:
> + srcu_read_unlock(&kvm->irq_srcu, idx);
> + return ret;
> +}
> +
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);


2015-08-26 00:36:20

by Wu, Feng

[permalink] [raw]
Subject: RE: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Wednesday, August 26, 2015 3:58 AM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH v7 10/17] KVM: x86: Update IRTE for posted-interrupts
>
> On Tue, 2015-08-25 at 16:50 +0800, Feng Wu wrote:
> > This patch adds the routine to update IRTE for posted-interrupts
> > when guest changes the interrupt configuration.
> >
> > Signed-off-by: Feng Wu <[email protected]>
> > ---
> > arch/x86/kvm/x86.c | 73
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 73 insertions(+)
> > + kvm_set_msi_irq(e, &irq);
> > + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> > + continue;
> > +
> > + vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> > + vcpu_info.vector = irq.vector;
> > +
> > + if (set)
> > + ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
> > + else {
> > + /* suppress notification event before unposting */
> > + kvm_x86_ops->pi_set_sn(vcpu);
> > + ret = irq_set_vcpu_affinity(host_irq, NULL);
> > + kvm_x86_ops->pi_clear_sn(vcpu);
> > + }
>
> Can we add trace events so that we have a way to tell when PI is being
> enabled/disabled other than performance heuristics? Thanks,

Sure, I will add it.

Thanks,
Feng

>
> Alex
> >
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?