2015-07-13 09:58:34

by Wu, Feng

Subject: [v5 00/19] Add VT-d Posted-Interrupts support

VT-d Posted-Interrupts is an enhancement to CPU-side Posted-Interrupts.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

You can find the VT-d Posted-Interrupts Spec. at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

This series was part of http://thread.gmane.org/gmane.linux.kernel.iommu/7708. To keep things clear, the KVM and VFIO parts are sent out separately here.

This patch-set is based on the latest x86/apic branch of the tip tree.

The whole series, which contains multiple components, is divided into three parts:
- Prerequisite changes to irq subsystem (already merged)
- IOMMU part (already merged)
- KVM and VFIO parts (this series)

v5:
- Based on Alex and Eric's irq bypass manager:
https://lkml.org/lkml/2015/7/10/663
- Reuse some common patches from Eric

Eric Auger (3):
KVM: create kvm_irqfd.h
KVM: eventfd: add irq bypass information in irqfd
KVM: eventfd: add irq bypass consumer management

Feng Wu (16):
KVM: Extend struct pi_desc for VT-d Posted-Interrupts
KVM: Add some helper functions for Posted-Interrupts
KVM: Define a new interface kvm_intr_is_single_vcpu()
KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
KVM: Add interfaces to control PI outside vmx
KVM: Make struct kvm_irq_routing_table accessible
KVM: make kvm_set_msi_irq() public
vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices
vfio: Register/unregister irq_bypass_producer
KVM, x86: Select IRQ_BYPASS_MANAGER for KVM_INTEL
KVM: x86: Update IRTE for posted-interrupts
KVM: x86: Add arch specific routines for irqbypass manager
KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
KVM: Warn if 'SN' is set during posting interrupts by software

arch/x86/include/asm/kvm_host.h | 15 ++
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/irq_comm.c | 28 +++-
arch/x86/kvm/vmx.c | 278 +++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 160 +++++++++++++++++++--
drivers/vfio/pci/Kconfig | 1 +
drivers/vfio/pci/vfio_pci_intrs.c | 19 +++
drivers/vfio/pci/vfio_pci_private.h | 2 +
include/linux/kvm_host.h | 23 +++
include/linux/kvm_irqfd.h | 74 ++++++++++
virt/kvm/eventfd.c | 115 ++++++---------
virt/kvm/irqchip.c | 11 --
virt/kvm/kvm_main.c | 3 +
13 files changed, 632 insertions(+), 98 deletions(-)
create mode 100644 include/linux/kvm_irqfd.h

--
2.1.0


2015-07-13 09:58:37

by Wu, Feng

Subject: [v5 01/19] KVM: Extend struct pi_desc for VT-d Posted-Interrupts

Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e11dd59..765539e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -445,8 +445,24 @@ struct nested_vmx {
/* Posted-Interrupt Descriptor */
struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
- u32 control; /* bit 0 of control is outstanding notification bit */
- u32 rsvd[7];
+ union {
+ struct {
+ /* bit 256 - Outstanding Notification */
+ u16 on : 1,
+ /* bit 257 - Suppress Notification */
+ sn : 1,
+ /* bit 271:258 - Reserved */
+ rsvd_1 : 14;
+ /* bit 279:272 - Notification Vector */
+ u8 nv;
+ /* bit 287:280 - Reserved */
+ u8 rsvd_2;
+ /* bit 319:288 - Notification Destination */
+ u32 ndst;
+ };
+ u64 control;
+ };
+ u32 rsvd[6];
} __aligned(64);

static bool pi_test_and_set_on(struct pi_desc *pi_desc)
--
2.1.0
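
The bit positions in the comments above can be cross-checked outside the kernel. What follows is only a minimal userspace sketch, assuming GCC or Clang on little-endian x86-64 (bit-field placement is implementation-defined); struct pi_desc_sketch and pi_control_for() are illustrative names and are not part of this patch.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Userspace mirror of the extended descriptor; field names follow the patch. */
struct pi_desc_sketch {
	uint32_t pir[8];                      /* bits 255:0   - posted interrupt requests */
	union {
		struct {
			uint16_t on     : 1,  /* bit 256      - Outstanding Notification */
				 sn     : 1,  /* bit 257      - Suppress Notification */
				 rsvd_1 : 14; /* bits 271:258 - reserved */
			uint8_t  nv;          /* bits 279:272 - Notification Vector */
			uint8_t  rsvd_2;      /* bits 287:280 - reserved */
			uint32_t ndst;        /* bits 319:288 - Notification Destination */
		};
		uint64_t control;             /* the same 64 bits as one word */
	};
	uint32_t rsvd[6];
} __attribute__((aligned(64)));

/* Compile-time checks that byte offsets agree with the documented bit numbers. */
static_assert(sizeof(struct pi_desc_sketch) == 64, "descriptor must be 64 bytes");
static_assert(offsetof(struct pi_desc_sketch, control) == 32, "control starts at bit 256");
static_assert(offsetof(struct pi_desc_sketch, nv) == 34, "nv starts at bit 272");
static_assert(offsetof(struct pi_desc_sketch, ndst) == 36, "ndst starts at bit 288");
static_assert(offsetof(struct pi_desc_sketch, rsvd) == 40, "tail starts at bit 320");

/* Build the 64-bit control word for a given vector and destination. */
static uint64_t pi_control_for(uint8_t nv, uint32_t ndst)
{
	struct pi_desc_sketch d = { .control = 0 };

	d.nv = nv;
	d.ndst = ndst;
	return d.control;
}
```

Under these assumptions the vector lands in bits 23:16 of control and the destination in bits 63:32, which is why the union lets one 64-bit access update the whole control area.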

2015-07-13 10:04:36

by Wu, Feng

Subject: [v5 02/19] KVM: Add some helper functions for Posted-Interrupts

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 765539e..1e815b6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -442,6 +442,8 @@ struct nested_vmx {
};

#define POSTED_INTR_ON 0
+#define POSTED_INTR_SN 1
+
/* Posted-Interrupt Descriptor */
struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
@@ -482,6 +484,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
}

+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+ clear_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+ set_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_ON,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
struct vcpu_vmx {
struct kvm_vcpu vcpu;
unsigned long host_rsp;
--
2.1.0
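
To illustrate what these helpers do to the control word, here is a rough userspace model using C11 atomics in place of the kernel's set_bit()/clear_bit()/test_bit(). It is a sketch under that substitution, not the kernel implementation; pi_control_t and the sketch_* functions are hypothetical names.

```c
#include <assert.h>     /* static_assert */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define POSTED_INTR_ON 0 /* bit 0 of control: Outstanding Notification */
#define POSTED_INTR_SN 1 /* bit 1 of control: Suppress Notification */

/* Stand-in for the 64-bit control word of struct pi_desc. */
typedef _Atomic uint64_t pi_control_t;

static_assert(sizeof(pi_control_t) == 8, "control word is 64 bits wide");

/* Atomically set bit nr. */
static void sketch_set_bit(int nr, pi_control_t *ctl)
{
	atomic_fetch_or(ctl, UINT64_C(1) << nr);
}

/* Atomically clear bit nr. */
static void sketch_clear_bit(int nr, pi_control_t *ctl)
{
	atomic_fetch_and(ctl, ~(UINT64_C(1) << nr));
}

/* Read bit nr without modifying the word. */
static bool sketch_test_bit(int nr, pi_control_t *ctl)
{
	return (atomic_load(ctl) >> nr) & 1;
}

/* Set bit nr and return its previous value, like test_and_set_bit(). */
static bool sketch_test_and_set_bit(int nr, pi_control_t *ctl)
{
	return (atomic_fetch_or(ctl, UINT64_C(1) << nr) >> nr) & 1;
}
```

The point of the atomic read-modify-write is that hardware may update ON in the same word concurrently, so a plain load/modify/store from software could lose that update.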

2015-07-13 09:58:43

by Wu, Feng

Subject: [v5 03/19] KVM: Define a new interface kvm_intr_is_single_vcpu()

This patch defines a new interface kvm_intr_is_single_vcpu(),
which returns whether the interrupt targets a single vCPU.

It is used by VT-d PI, since for now we only support single-CPU
interrupts. For lowest-priority interrupts, if the user configures
them via /proc/irq or uses irqbalance to make them single-CPU, we
can use PI to deliver them. Full lowest-priority support will be
added later.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/irq_comm.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8c0ec3..b8832e5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1180,4 +1180,6 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
void kvm_deliver_pmi(struct kvm_vcpu *vcpu);

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu);
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 72298b3..9e42645 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -299,6 +299,30 @@ out:
return r;
}

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu)
+{
+ int i, r = 0;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_apic_present(vcpu))
+ continue;
+
+ if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+ irq->dest_id, irq->dest_mode))
+ continue;
+
+ r++;
+ *dest_vcpu = vcpu;
+ }
+
+ if (r == 1)
+ return true;
+ else
+ return false;
+}
+
#define IOAPIC_ROUTING_ENTRY(irq) \
{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
.u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
--
2.1.0
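
The counting logic can be tried in isolation with a simplified model. The sketch below assumes only physical and flat-logical destination matching and uses hypothetical sketch_* types; the real kvm_apic_match_dest() handles more cases. Unlike the patch, it stops at the second match and writes the output pointer only on success.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified vCPU: only what the sketch needs. */
struct sketch_vcpu {
	bool    apic_present;
	uint8_t apic_id;       /* physical APIC ID */
	uint8_t logical_mask;  /* flat logical destination bits */
};

struct sketch_irq {
	uint8_t dest_id;
	bool    logical;       /* false: physical mode, true: flat logical mode */
};

/* Simplified destination match; stands in for kvm_apic_match_dest(). */
static bool sketch_match_dest(const struct sketch_vcpu *v,
			      const struct sketch_irq *irq)
{
	if (irq->logical)
		return (v->logical_mask & irq->dest_id) != 0;
	return v->apic_id == irq->dest_id;
}

/* True only if exactly one present vCPU matches; *dest is written only then. */
static bool sketch_intr_is_single_vcpu(struct sketch_vcpu *vcpus, size_t n,
				       const struct sketch_irq *irq,
				       struct sketch_vcpu **dest)
{
	struct sketch_vcpu *match = NULL;
	size_t i, hits = 0;

	assert(irq && dest);
	for (i = 0; i < n; i++) {
		if (!vcpus[i].apic_present || !sketch_match_dest(&vcpus[i], irq))
			continue;
		match = &vcpus[i];
		if (++hits > 1)
			return false;   /* multicast: no need to scan further */
	}
	if (hits == 1)
		*dest = match;
	return hits == 1;
}
```

A physical-mode interrupt addressed to one APIC ID yields true; a logical-mode interrupt whose mask covers two vCPUs yields false and would stay in remapped mode.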

2015-07-13 09:58:44

by Wu, Feng

Subject: [v5 04/19] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu

Define an interface to get PI descriptor address from the vCPU structure.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx.c | 11 +++++++++++
2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b8832e5..9df0724 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -836,6 +836,8 @@ struct kvm_x86_ops {
void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t offset, unsigned long mask);
+
+ u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1e815b6..1b33e33 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -609,6 +609,10 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
#define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
[number##_HIGH] = VMCS12_OFFSET(name)+4

+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+ return &(to_vmx(vcpu)->pi_desc);
+}

static unsigned long shadow_read_only_fields[] = {
/*
@@ -4494,6 +4498,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
return;
}

+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+ return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
/*
* Set up the vmcs's constant host-state fields, i.e., host-state fields that
* will not change in the lifetime of the guest.
@@ -10296,6 +10305,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
.flush_log_dirty = vmx_flush_log_dirty,
.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
+
+ .get_pi_desc_addr = vmx_get_pi_desc_addr,
};

static int __init vmx_init(void)
--
2.1.0

2015-07-13 10:03:51

by Wu, Feng

Subject: [v5 05/19] KVM: Add interfaces to control PI outside vmx

This patch adds pi_clear_sn and pi_set_sn to struct kvm_x86_ops,
so we can set/clear SN outside vmx.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/vmx.c | 13 +++++++++++++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9df0724..739fd14 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -838,6 +838,9 @@ struct kvm_x86_ops {
gfn_t offset, unsigned long mask);

u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+
+ void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
+ void (*pi_set_sn)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1b33e33..35ef4c6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -614,6 +614,16 @@ struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
return &(to_vmx(vcpu)->pi_desc);
}

+static void vmx_pi_clear_sn(struct kvm_vcpu *vcpu)
+{
+ pi_clear_sn(vcpu_to_pi_desc(vcpu));
+}
+
+static void vmx_pi_set_sn(struct kvm_vcpu *vcpu)
+{
+ pi_set_sn(vcpu_to_pi_desc(vcpu));
+}
+
static unsigned long shadow_read_only_fields[] = {
/*
* We do NOT shadow fields that are modified when L0
@@ -10307,6 +10317,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,

.get_pi_desc_addr = vmx_get_pi_desc_addr,
+
+ .pi_clear_sn = vmx_pi_clear_sn,
+ .pi_set_sn = vmx_pi_set_sn,
};

static int __init vmx_init(void)
--
2.1.0

2015-07-13 09:58:47

by Wu, Feng

Subject: [v5 06/19] KVM: Make struct kvm_irq_routing_table accessible

Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu <[email protected]>
---
include/linux/kvm_host.h | 15 +++++++++++++++
virt/kvm/irqchip.c | 11 -----------
2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad45054..f591f7c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -321,6 +321,21 @@ struct kvm_kernel_irq_routing_entry {
struct hlist_node link;
};

+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+ int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+ struct kvm_kernel_irq_routing_entry *rt_entries;
+ u32 nr_rt_entries;
+ /*
+ * Array indexed by gsi. Each entry contains list of irq chips
+ * the gsi is connected to.
+ */
+ struct hlist_head map[0];
+};
+
+#endif
+
#ifndef KVM_PRIVATE_MEM_SLOTS
#define KVM_PRIVATE_MEM_SLOTS 0
#endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 1d56a90..bac3b52 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,17 +31,6 @@
#include <trace/events/kvm.h>
#include "irq.h"

-struct kvm_irq_routing_table {
- int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
- struct kvm_kernel_irq_routing_entry *rt_entries;
- u32 nr_rt_entries;
- /*
- * Array indexed by gsi. Each entry contains list of irq chips
- * the gsi is connected to.
- */
- struct hlist_head map[0];
-};
-
int kvm_irq_map_gsi(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *entries, int gsi)
{
--
2.1.0

2015-07-13 09:58:50

by Wu, Feng

Subject: [v5 07/19] KVM: make kvm_set_msi_irq() public

Make kvm_set_msi_irq() public so that it can be used outside irq_comm.c.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++++
arch/x86/kvm/irq_comm.c | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 739fd14..1b0278e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ enum {
*/
#define KVM_APIC_PV_EOI_PENDING 1

+struct kvm_kernel_irq_routing_entry;
+
/*
* We don't want allocation failures within the mmu code, so we preallocate
* enough memory for a single page fault in a cache.
@@ -1187,4 +1189,6 @@ void kvm_deliver_pmi(struct kvm_vcpu *vcpu);

bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
struct kvm_vcpu **dest_vcpu);
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq);
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 9e42645..58d7d49 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -94,8 +94,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
return r;
}

-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
- struct kvm_lapic_irq *irq)
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq)
{
trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);

--
2.1.0

2015-07-13 10:03:03

by Wu, Feng

Subject: [v5 08/19] vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices

Enable irq bypass manager for vfio PCI devices.

Signed-off-by: Feng Wu <[email protected]>
---
drivers/vfio/pci/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
tristate "VFIO support for PCI devices"
depends on VFIO && PCI && EVENTFD
select VFIO_VIRQFD
+ select IRQ_BYPASS_MANAGER
help
Support for the PCI VFIO bus driver. This is required to make
use of PCI drivers using the VFIO framework.
--
2.1.0

2015-07-13 09:58:53

by Wu, Feng

Subject: [v5 09/19] vfio: Register/unregister irq_bypass_producer

This patch adds the registration/unregistration of an
irq_bypass_producer for MSI/MSI-X on VFIO PCI devices.

Signed-off-by: Feng Wu <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++
drivers/vfio/pci/vfio_pci_private.h | 2 ++
2 files changed, 21 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..4795606 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -305,6 +305,16 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
return 0;
}

+void vfio_pci_add_consumer(struct irq_bypass_producer *prod,
+ struct irq_bypass_consumer *cons)
+{
+}
+
+void vfio_pci_del_consumer(struct irq_bypass_producer *prod,
+ struct irq_bypass_consumer *cons)
+{
+}
+
static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
int vector, int fd, bool msix)
{
@@ -319,6 +329,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,

if (vdev->ctx[vector].trigger) {
free_irq(irq, vdev->ctx[vector].trigger);
+ irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
vdev->ctx[vector].trigger = NULL;
@@ -360,6 +371,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return ret;
}

+ INIT_LIST_HEAD(&vdev->ctx[vector].producer.node);
+ vdev->ctx[vector].producer.token = trigger;
+ vdev->ctx[vector].producer.irq = irq;
+ vdev->ctx[vector].producer.add_consumer = vfio_pci_add_consumer;
+ vdev->ctx[vector].producer.del_consumer = vfio_pci_del_consumer;
+ ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
+ WARN_ON(ret);
+
vdev->ctx[vector].trigger = trigger;

return 0;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ae0e1b4..0e7394f 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -13,6 +13,7 @@

#include <linux/mutex.h>
#include <linux/pci.h>
+#include <linux/irqbypass.h>

#ifndef VFIO_PCI_PRIVATE_H
#define VFIO_PCI_PRIVATE_H
@@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
struct virqfd *mask;
char *name;
bool masked;
+ struct irq_bypass_producer producer;
};

struct vfio_pci_device {
--
2.1.0
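
The producer registered here carries the eventfd trigger as its token, and the bypass manager is expected to pair it with a consumer holding the same token. The toy model below only illustrates that pairing idea with fixed-size tables; all sketch_* names are hypothetical and the real irq bypass manager (lists, locking, callbacks) is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX 8

struct sketch_consumer;

struct sketch_producer {
	void *token;                   /* e.g. the eventfd context */
	int   irq;                     /* host irq */
	struct sketch_consumer *peer;  /* non-NULL once paired */
};

struct sketch_consumer {
	void *token;
	struct sketch_producer *peer;
};

static struct sketch_producer *producers[SKETCH_MAX];
static struct sketch_consumer *consumers[SKETCH_MAX];

static void sketch_connect(struct sketch_producer *p, struct sketch_consumer *c)
{
	p->peer = c;
	c->peer = p;
}

/* Returns 0 on success, -1 if the table is full or the token is already used. */
static int sketch_register_producer(struct sketch_producer *p)
{
	int i, slot = -1;

	assert(p && p->token);
	for (i = 0; i < SKETCH_MAX; i++) {
		if (!producers[i]) {
			if (slot < 0)
				slot = i;
		} else if (producers[i]->token == p->token) {
			return -1;
		}
	}
	if (slot < 0)
		return -1;
	producers[slot] = p;
	p->peer = NULL;
	/* Pair with an already registered consumer that has the same token. */
	for (i = 0; i < SKETCH_MAX; i++)
		if (consumers[i] && consumers[i]->token == p->token)
			sketch_connect(p, consumers[i]);
	return 0;
}

/* Mirror image of sketch_register_producer() for the consumer side. */
static int sketch_register_consumer(struct sketch_consumer *c)
{
	int i, slot = -1;

	assert(c && c->token);
	for (i = 0; i < SKETCH_MAX; i++) {
		if (!consumers[i]) {
			if (slot < 0)
				slot = i;
		} else if (consumers[i]->token == c->token) {
			return -1;
		}
	}
	if (slot < 0)
		return -1;
	consumers[slot] = c;
	c->peer = NULL;
	for (i = 0; i < SKETCH_MAX; i++)
		if (producers[i] && producers[i]->token == c->token)
			sketch_connect(producers[i], c);
	return 0;
}

/* Drop the producer from the table and break any existing pairing. */
static void sketch_unregister_producer(struct sketch_producer *p)
{
	int i;

	for (i = 0; i < SKETCH_MAX; i++)
		if (producers[i] == p)
			producers[i] = NULL;
	if (p->peer)
		p->peer->peer = NULL;
	p->peer = NULL;
}
```

In this model registration order does not matter: whichever side arrives second finds its peer by token, which matches the way the patch registers the producer without knowing whether KVM has set up an irqfd yet.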

2015-07-13 09:58:55

by Wu, Feng

Subject: [v5 10/19] KVM, x86: Select IRQ_BYPASS_MANAGER for KVM_INTEL

Enable irq bypass manager for kvm-intel.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 921a8f9..be125bc 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -61,6 +61,7 @@ config KVM_INTEL
depends on KVM
# for perf_guest_get_msrs():
depends on CPU_SUP_INTEL
+ select IRQ_BYPASS_MANAGER
---help---
Provides support for KVM on Intel processors equipped with the VT
extensions.
--
2.1.0

2015-07-13 09:58:58

by Wu, Feng

Subject: [v5 11/19] KVM: create kvm_irqfd.h

From: Eric Auger <[email protected]>

Move the _irqfd_resampler and _irqfd struct declarations into a new
public header: kvm_irqfd.h. They are renamed to
kvm_kernel_irqfd_resampler and kvm_kernel_irqfd, respectively. These
datatypes will be used by architecture-specific code, in the context
of IRQ bypass manager integration.

Signed-off-by: Eric Auger <[email protected]>
---
include/linux/kvm_irqfd.h | 69 ++++++++++++++++++++++++++++++++++
virt/kvm/eventfd.c | 95 ++++++++++++-----------------------------------
2 files changed, 92 insertions(+), 72 deletions(-)
create mode 100644 include/linux/kvm_irqfd.h

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
new file mode 100644
index 0000000..f926b39
--- /dev/null
+++ b/include/linux/kvm_irqfd.h
@@ -0,0 +1,69 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * irqfd: Allows an fd to be used to inject an interrupt to the guest
+ * Credit goes to Avi Kivity for the original idea.
+ */
+
+#ifndef __LINUX_KVM_IRQFD_H
+#define __LINUX_KVM_IRQFD_H
+
+#include <linux/kvm_host.h>
+#include <linux/poll.h>
+
+/*
+ * Resampling irqfds are a special variety of irqfds used to emulate
+ * level triggered interrupts. The interrupt is asserted on eventfd
+ * trigger. On acknowledgment through the irq ack notifier, the
+ * interrupt is de-asserted and userspace is notified through the
+ * resamplefd. All resamplers on the same gsi are de-asserted
+ * together, so we don't need to track the state of each individual
+ * user. We can also therefore share the same irq source ID.
+ */
+struct kvm_kernel_irqfd_resampler {
+ struct kvm *kvm;
+ /*
+ * List of resampling struct kvm_kernel_irqfd objects sharing this gsi.
+ * RCU list modified under kvm->irqfds.resampler_lock
+ */
+ struct list_head list;
+ struct kvm_irq_ack_notifier notifier;
+ /*
+ * Entry in list of kvm->irqfd.resampler_list. Use for sharing
+ * resamplers among irqfds on the same gsi.
+ * Accessed and modified under kvm->irqfds.resampler_lock
+ */
+ struct list_head link;
+};
+
+struct kvm_kernel_irqfd {
+ /* Used for MSI fast-path */
+ struct kvm *kvm;
+ wait_queue_t wait;
+ /* Update side is protected by irqfds.lock */
+ struct kvm_kernel_irq_routing_entry irq_entry;
+ seqcount_t irq_entry_sc;
+ /* Used for level IRQ fast-path */
+ int gsi;
+ struct work_struct inject;
+ /* The resampler used by this irqfd (resampler-only) */
+ struct kvm_kernel_irqfd_resampler *resampler;
+ /* Eventfd notified on resample (resampler-only) */
+ struct eventfd_ctx *resamplefd;
+ /* Entry in list of irqfds for a resampler (resampler-only) */
+ struct list_head resampler_link;
+ /* Used for setup/shutdown */
+ struct eventfd_ctx *eventfd;
+ struct list_head list;
+ poll_table pt;
+ struct work_struct shutdown;
+};
+
+#endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..647ffb8 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -23,6 +23,7 @@

#include <linux/kvm_host.h>
#include <linux/kvm.h>
+#include <linux/kvm_irqfd.h>
#include <linux/workqueue.h>
#include <linux/syscalls.h>
#include <linux/wait.h>
@@ -39,68 +40,14 @@
#include <kvm/iodev.h>

#ifdef CONFIG_HAVE_KVM_IRQFD
-/*
- * --------------------------------------------------------------------
- * irqfd: Allows an fd to be used to inject an interrupt to the guest
- *
- * Credit goes to Avi Kivity for the original idea.
- * --------------------------------------------------------------------
- */
-
-/*
- * Resampling irqfds are a special variety of irqfds used to emulate
- * level triggered interrupts. The interrupt is asserted on eventfd
- * trigger. On acknowledgement through the irq ack notifier, the
- * interrupt is de-asserted and userspace is notified through the
- * resamplefd. All resamplers on the same gsi are de-asserted
- * together, so we don't need to track the state of each individual
- * user. We can also therefore share the same irq source ID.
- */
-struct _irqfd_resampler {
- struct kvm *kvm;
- /*
- * List of resampling struct _irqfd objects sharing this gsi.
- * RCU list modified under kvm->irqfds.resampler_lock
- */
- struct list_head list;
- struct kvm_irq_ack_notifier notifier;
- /*
- * Entry in list of kvm->irqfd.resampler_list. Use for sharing
- * resamplers among irqfds on the same gsi.
- * Accessed and modified under kvm->irqfds.resampler_lock
- */
- struct list_head link;
-};
-
-struct _irqfd {
- /* Used for MSI fast-path */
- struct kvm *kvm;
- wait_queue_t wait;
- /* Update side is protected by irqfds.lock */
- struct kvm_kernel_irq_routing_entry irq_entry;
- seqcount_t irq_entry_sc;
- /* Used for level IRQ fast-path */
- int gsi;
- struct work_struct inject;
- /* The resampler used by this irqfd (resampler-only) */
- struct _irqfd_resampler *resampler;
- /* Eventfd notified on resample (resampler-only) */
- struct eventfd_ctx *resamplefd;
- /* Entry in list of irqfds for a resampler (resampler-only) */
- struct list_head resampler_link;
- /* Used for setup/shutdown */
- struct eventfd_ctx *eventfd;
- struct list_head list;
- poll_table pt;
- struct work_struct shutdown;
-};

static struct workqueue_struct *irqfd_cleanup_wq;

static void
irqfd_inject(struct work_struct *work)
{
- struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(work, struct kvm_kernel_irqfd, inject);
struct kvm *kvm = irqfd->kvm;

if (!irqfd->resampler) {
@@ -121,12 +68,13 @@ irqfd_inject(struct work_struct *work)
static void
irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
{
- struct _irqfd_resampler *resampler;
+ struct kvm_kernel_irqfd_resampler *resampler;
struct kvm *kvm;
- struct _irqfd *irqfd;
+ struct kvm_kernel_irqfd *irqfd;
int idx;

- resampler = container_of(kian, struct _irqfd_resampler, notifier);
+ resampler = container_of(kian,
+ struct kvm_kernel_irqfd_resampler, notifier);
kvm = resampler->kvm;

kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
@@ -141,9 +89,9 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
}

static void
-irqfd_resampler_shutdown(struct _irqfd *irqfd)
+irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
{
- struct _irqfd_resampler *resampler = irqfd->resampler;
+ struct kvm_kernel_irqfd_resampler *resampler = irqfd->resampler;
struct kvm *kvm = resampler->kvm;

mutex_lock(&kvm->irqfds.resampler_lock);
@@ -168,7 +116,8 @@ irqfd_resampler_shutdown(struct _irqfd *irqfd)
static void
irqfd_shutdown(struct work_struct *work)
{
- struct _irqfd *irqfd = container_of(work, struct _irqfd, shutdown);
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(work, struct kvm_kernel_irqfd, shutdown);
u64 cnt;

/*
@@ -198,7 +147,7 @@ irqfd_shutdown(struct work_struct *work)

/* assumes kvm->irqfds.lock is held */
static bool
-irqfd_is_active(struct _irqfd *irqfd)
+irqfd_is_active(struct kvm_kernel_irqfd *irqfd)
{
return list_empty(&irqfd->list) ? false : true;
}
@@ -209,7 +158,7 @@ irqfd_is_active(struct _irqfd *irqfd)
* assumes kvm->irqfds.lock is held
*/
static void
-irqfd_deactivate(struct _irqfd *irqfd)
+irqfd_deactivate(struct kvm_kernel_irqfd *irqfd)
{
BUG_ON(!irqfd_is_active(irqfd));

@@ -224,7 +173,8 @@ irqfd_deactivate(struct _irqfd *irqfd)
static int
irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
{
- struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(wait, struct kvm_kernel_irqfd, wait);
unsigned long flags = (unsigned long)key;
struct kvm_kernel_irq_routing_entry irq;
struct kvm *kvm = irqfd->kvm;
@@ -274,12 +224,13 @@ static void
irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
poll_table *pt)
{
- struct _irqfd *irqfd = container_of(pt, struct _irqfd, pt);
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(pt, struct kvm_kernel_irqfd, pt);
add_wait_queue(wqh, &irqfd->wait);
}

/* Must be called under irqfds.lock */
-static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd)
+static void irqfd_update(struct kvm *kvm, struct kvm_kernel_irqfd *irqfd)
{
struct kvm_kernel_irq_routing_entry *e;
struct kvm_kernel_irq_routing_entry entries[KVM_NR_IRQCHIPS];
@@ -304,7 +255,7 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd)
static int
kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
{
- struct _irqfd *irqfd, *tmp;
+ struct kvm_kernel_irqfd *irqfd, *tmp;
struct fd f;
struct eventfd_ctx *eventfd = NULL, *resamplefd = NULL;
int ret;
@@ -340,7 +291,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
irqfd->eventfd = eventfd;

if (args->flags & KVM_IRQFD_FLAG_RESAMPLE) {
- struct _irqfd_resampler *resampler;
+ struct kvm_kernel_irqfd_resampler *resampler;

resamplefd = eventfd_ctx_fdget(args->resamplefd);
if (IS_ERR(resamplefd)) {
@@ -525,7 +476,7 @@ kvm_eventfd_init(struct kvm *kvm)
static int
kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
{
- struct _irqfd *irqfd, *tmp;
+ struct kvm_kernel_irqfd *irqfd, *tmp;
struct eventfd_ctx *eventfd;

eventfd = eventfd_ctx_fdget(args->fd);
@@ -581,7 +532,7 @@ kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
void
kvm_irqfd_release(struct kvm *kvm)
{
- struct _irqfd *irqfd, *tmp;
+ struct kvm_kernel_irqfd *irqfd, *tmp;

spin_lock_irq(&kvm->irqfds.lock);

@@ -604,7 +555,7 @@ kvm_irqfd_release(struct kvm *kvm)
*/
void kvm_irq_routing_update(struct kvm *kvm)
{
- struct _irqfd *irqfd;
+ struct kvm_kernel_irqfd *irqfd;

spin_lock_irq(&kvm->irqfds.lock);

--
2.1.0

2015-07-13 10:01:50

by Wu, Feng

Subject: [v5 12/19] KVM: eventfd: add irq bypass information in irqfd

From: Eric Auger <[email protected]>

This patch adds the following new members to 'struct kvm_kernel_irqfd':
- struct irq_bypass_consumer consumer
- struct irq_bypass_producer *producer

Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Feng Wu <[email protected]>
---
include/linux/kvm_irqfd.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index f926b39..cf9aad4 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -17,6 +17,7 @@

#include <linux/kvm_host.h>
#include <linux/poll.h>
+#include <linux/irqbypass.h>

/*
* Resampling irqfds are a special variety of irqfds used to emulate
@@ -64,6 +65,8 @@ struct kvm_kernel_irqfd {
struct list_head list;
poll_table pt;
struct work_struct shutdown;
+ struct irq_bypass_consumer consumer;
+ struct irq_bypass_producer *producer;
};

#endif /* __LINUX_KVM_IRQFD_H */
--
2.1.0

2015-07-13 09:59:03

by Wu, Feng

Subject: [v5 13/19] KVM: x86: Update IRTE for posted-interrupts

This patch adds the routine to update the IRTE for posted-interrupts
when the guest changes the interrupt configuration.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/x86.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 73 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 26eaeb5..d81ac02 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
#include <asm/fpu/internal.h> /* Ugh! */
#include <asm/pvclock.h>
#include <asm/div64.h>
+#include <asm/irq_remapping.h>

#define MAX_IO_MSRS 256
#define KVM_MAX_MCE_BANKS 32
@@ -7950,6 +7951,78 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);

+/*
+ * kvm_arch_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+ uint32_t guest_irq, bool set)
+{
+ struct kvm_kernel_irq_routing_entry *e;
+ struct kvm_irq_routing_table *irq_rt;
+ struct kvm_lapic_irq irq;
+ struct kvm_vcpu *vcpu;
+ struct vcpu_data vcpu_info;
+ int idx, ret = -EINVAL;
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return 0;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+ BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+ hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+ if (e->type != KVM_IRQ_ROUTING_MSI)
+ continue;
+ /*
+ * VT-d PI cannot support posting multicast/broadcast
+ * interrupts to a VCPU, we still use interrupt remapping
+ * for these kinds of interrupts.
+ *
+ * For lowest-priority interrupts, we only support
+ * those with single CPU as the destination, e.g. user
+ * configures the interrupts via /proc/irq or uses
+ * irqbalance to make the interrupts single-CPU.
+ *
+ * We will support full lowest-priority interrupt later.
+ *
+ */
+
+ kvm_set_msi_irq(e, &irq);
+ if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+ continue;
+
+ vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
+ vcpu_info.vector = irq.vector;
+
+ if (set)
+ ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+ else {
+ /* suppress notification event before unposting */
+ kvm_x86_ops->pi_set_sn(vcpu);
+ ret = irq_set_vcpu_affinity(host_irq, NULL);
+ kvm_x86_ops->pi_clear_sn(vcpu);
+ }
+
+ if (ret < 0) {
+ printk(KERN_INFO "%s: failed to update PI IRTE\n",
+ __func__);
+ goto out;
+ }
+ }
+
+ ret = 0;
+out:
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+ return ret;
+}
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
--
2.1.0

2015-07-13 10:01:22

by Wu, Feng

Subject: [v5 14/19] KVM: x86: Add arch specific routines for irqbypass manager

Add the following x86 specific routines for the irqbypass manager:

- kvm_arch_irq_bypass_add_producer
- kvm_arch_irq_bypass_del_producer

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 40 ++++++++++++++++++++++++++++++++++++++++
include/linux/kvm_host.h | 2 ++
3 files changed, 43 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1b0278e..6db761b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -24,6 +24,7 @@
#include <linux/perf_event.h>
#include <linux/pvclock_gtod.h>
#include <linux/clocksource.h>
+#include <linux/irqbypass.h>

#include <asm/pvclock-abi.h>
#include <asm/desc.h>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d81ac02..62bbafe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -49,6 +49,8 @@
#include <linux/pci.h>
#include <linux/timekeeper_internal.h>
#include <linux/pvclock_gtod.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
#include <trace/events/kvm.h>

#define CREATE_TRACE_POINTS
@@ -8023,6 +8025,44 @@ out:
return ret;
}

+void kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ int ret;
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ irqfd->producer = prod;
+
+ ret = kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 1);
+ WARN_ON(ret);
+}
+
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ int ret;
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ irqfd->producer = NULL;
+
+ /*
+ * When the producer of a consumer is unregistered, we change back
+ * to remapped mode, so we can re-use the current implementation
+ * when the irq is masked/disabled or the consumer side (KVM
+ * in this case) doesn't want to receive the interrupts.
+ */
+ ret = kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
+ WARN_ON(ret);
+}
+
+void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons)
+{
+ cons->add_producer = kvm_arch_irq_bypass_add_producer;
+ cons->del_producer = kvm_arch_irq_bypass_del_producer;
+}
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f591f7c..e693b3a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1073,6 +1073,8 @@ extern struct kvm_device_ops kvm_xics_ops;
extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
extern struct kvm_device_ops kvm_arm_vgic_v3_ops;

+void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons);
+
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
--
2.1.0
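
Both callbacks in this patch receive only the embedded `struct irq_bypass_consumer` and recover the enclosing irqfd with container_of(). A minimal userspace sketch of that pattern follows; `struct bypass_consumer`, `struct irqfd_like` and the two callbacks are hypothetical miniatures, and the macro is a plain offsetof-based equivalent of the kernel one.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature versions of the structures in the patch. */
struct bypass_consumer {
	void *token;
};

struct irqfd_like {
	int gsi;
	struct bypass_consumer consumer;	/* embedded, not a pointer */
	int producer_irq;			/* -1 while no producer is attached */
};

/* Callback that only receives the embedded consumer, as in the patch. */
static void add_producer(struct bypass_consumer *cons, int host_irq)
{
	struct irqfd_like *irqfd =
		container_of(cons, struct irqfd_like, consumer);

	irqfd->producer_irq = host_irq;
}

/* Detach the producer again; the irqfd falls back to "no producer". */
static void del_producer(struct bypass_consumer *cons)
{
	struct irqfd_like *irqfd =
		container_of(cons, struct irqfd_like, consumer);

	irqfd->producer_irq = -1;
}
```

Embedding the consumer rather than pointing to it is what makes the pointer arithmetic valid; with a separately allocated consumer the callbacks would need an explicit back-pointer instead.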

2015-07-13 09:59:08

by Wu, Feng

Subject: [v5 15/19] KVM: eventfd: add irq bypass consumer management

From: Eric Auger <[email protected]>

This patch adds the registration/unregistration of an
irq_bypass_consumer on irqfd assignment/deassignment.

Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Feng Wu <[email protected]>
---
virt/kvm/eventfd.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 647ffb8..4225eea 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -35,6 +35,7 @@
#include <linux/srcu.h>
#include <linux/slab.h>
#include <linux/seqlock.h>
+#include <linux/irqbypass.h>
#include <trace/events/kvm.h>

#include <kvm/iodev.h>
@@ -140,6 +141,7 @@ irqfd_shutdown(struct work_struct *work)
/*
* It is now safe to release the object's resources
*/
+ irq_bypass_unregister_consumer(&irqfd->consumer);
eventfd_ctx_put(irqfd->eventfd);
kfree(irqfd);
}
@@ -380,6 +382,11 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
*/
fdput(f);

+ irqfd->consumer.token = (void *)irqfd->eventfd;
+ kvm_arch_irq_consumer_init(&irqfd->consumer);
+ ret = irq_bypass_register_consumer(&irqfd->consumer);
+ WARN_ON(ret);
+
return 0;

fail:
--
2.1.0
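
The consumer registered here uses the eventfd context as its token, and the VFIO producer in patch 09 uses its trigger eventfd the same way, so the two sides can be paired by comparing tokens. The sketch below models that pairing in userspace under heavy simplification: the table, `register_producer()` and `register_consumer()` are hypothetical, there is no locking, and only the producer-first ordering is handled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical, much-reduced model of a bypass manager. */
struct producer { void *token; int irq; };
struct consumer { void *token; const struct producer *prod; };

static struct producer *producers[MAX_SLOTS];

/* Register a producer; returns 0 on success, -1 when the table is full. */
static int register_producer(struct producer *p)
{
	size_t i;

	for (i = 0; i < MAX_SLOTS; i++) {
		if (!producers[i]) {
			producers[i] = p;
			return 0;
		}
	}
	return -1;
}

/* Register a consumer and connect it to the producer with the same token. */
static int register_consumer(struct consumer *c)
{
	size_t i;

	c->prod = NULL;
	for (i = 0; i < MAX_SLOTS; i++) {
		if (producers[i] && producers[i]->token == c->token) {
			c->prod = producers[i];
			break;
		}
	}
	return 0;
}
```

A consumer whose token matches no producer is still registered; it simply stays unconnected.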

2015-07-13 09:59:06

by Wu, Feng

Subject: [v5 16/19] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'

This patch adds an arch-specific hook, 'arch_update', to
'struct kvm_kernel_irqfd'. On Intel, it is used to
update the IRTE when VT-d posted interrupts are in use.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/x86.c | 5 +++++
include/linux/kvm_host.h | 3 +++
include/linux/kvm_irqfd.h | 2 ++
virt/kvm/eventfd.c | 13 ++++++++++++-
4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 62bbafe..a88e659 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8063,6 +8063,11 @@ void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons)
cons->del_producer = kvm_arch_irq_bypass_del_producer;
}

+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
+{
+ irqfd->arch_update = kvm_arch_update_pi_irte;
+}
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e693b3a..b37ebca 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -33,6 +33,8 @@

#include <asm/kvm_host.h>

+struct kvm_kernel_irqfd;
+
/*
* The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
* in kvm, other bits are visible for userspace which are defined in
@@ -1074,6 +1076,7 @@ extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
extern struct kvm_device_ops kvm_arm_vgic_v3_ops;

void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons);
+void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd);

#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index cf9aad4..47a2696 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -67,6 +67,8 @@ struct kvm_kernel_irqfd {
struct work_struct shutdown;
struct irq_bypass_consumer consumer;
struct irq_bypass_producer *producer;
+ int (*arch_update)(struct kvm *kvm, unsigned int host_irq,
+ uint32_t guest_irq, bool set);
};

#endif /* __LINUX_KVM_IRQFD_H */
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 4225eea..762282c 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -276,6 +276,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
+ kvm_arch_irqfd_init(irqfd);
seqcount_init(&irqfd->irq_entry_sc);

f = fdget(args->fd);
@@ -562,13 +563,23 @@ kvm_irqfd_release(struct kvm *kvm)
*/
void kvm_irq_routing_update(struct kvm *kvm)
{
+ int ret;
struct kvm_kernel_irqfd *irqfd;

spin_lock_irq(&kvm->irqfds.lock);

- list_for_each_entry(irqfd, &kvm->irqfds.items, list)
+ list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
irqfd_update(kvm, irqfd);

+ if (irqfd->arch_update) {
+ BUG_ON(!irqfd->producer);
+ ret = irqfd->arch_update(
+ irqfd->kvm, irqfd->producer->irq,
+ irqfd->gsi, 1);
+ WARN_ON(ret);
+ }
+ }
+
spin_unlock_irq(&kvm->irqfds.lock);
}

--
2.1.0

2015-07-13 09:59:10

by Wu, Feng

Subject: [v5 17/19] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress future non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 35ef4c6..dd6f3d5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
#include <asm/debugreg.h>
#include <asm/kexec.h>
#include <asm/apic.h>
+#include <asm/irq_remapping.h>

#include "trace.h"

@@ -2000,10 +2001,43 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
vmx->loaded_vmcs->cpu = cpu;
}
+
+ if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old, new;
+ unsigned int dest;
+
+ do {
+ old.control = new.control = pi_desc->control;
+ if (vcpu->cpu != cpu) {
+ dest = cpu_physical_id(cpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+ }
+
+ /* Allow posting non-urgent interrupts */
+ new.sn = 0;
+
+ /* set 'NV' to 'notification vector' */
+ new.nv = POSTED_INTR_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+ }
}

static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
{
+ if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ /* Set SN when the vCPU is preempted */
+ if (vcpu->preempted)
+ pi_set_sn(pi_desc);
+ }
+
__vmx_load_host_state(to_vmx(vcpu));
if (!vmm_exclusive) {
__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
--
2.1.0
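
The sched-in path above updates SN, NV and NDST with a cmpxchg retry loop so that bits changed concurrently (such as ON) are not lost. Below is a userspace sketch of the same idea using C11 atomics. The bit positions (ON bit 0, SN bit 1, NV bits 23:16, NDST bits 63:32) are an assumption about the descriptor's control word, and `pi_sched_in()` and `pi_encode_ndst()` are hypothetical names. Unlike the patch, the sketch always rewrites NDST instead of doing so only when the CPU changed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the 64-bit control word (see the note above). */
#define PI_ON		(1ULL << 0)
#define PI_SN		(1ULL << 1)
#define PI_NV_SHIFT	16
#define PI_NV_MASK	(0xFFULL << PI_NV_SHIFT)
#define PI_NDST_SHIFT	32
#define PI_NDST_MASK	(0xFFFFFFFFULL << PI_NDST_SHIFT)

/* xAPIC keeps the destination in bits 15:8, x2APIC uses all 32 bits. */
static uint32_t pi_encode_ndst(uint32_t apic_id, bool x2apic)
{
	return x2apic ? apic_id : (apic_id << 8) & 0xFF00;
}

/*
 * Clear SN, set NV and NDST in one atomic step. Bits that change
 * concurrently (for example ON) are preserved because a failed
 * compare-exchange reloads 'old' and the loop retries.
 */
static void pi_sched_in(_Atomic uint64_t *control, uint32_t apic_id,
			bool x2apic, uint8_t vector)
{
	uint64_t old = atomic_load(control);
	uint64_t new;

	do {
		new = old & ~(PI_SN | PI_NV_MASK | PI_NDST_MASK);
		new |= (uint64_t)vector << PI_NV_SHIFT;
		new |= (uint64_t)pi_encode_ndst(apic_id, x2apic) << PI_NDST_SHIFT;
	} while (!atomic_compare_exchange_weak(control, &old, new));
}
```

Working on the whole 64-bit word rather than on individual bitfields is what lets a single compare-exchange cover all three fields at once.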

2015-07-13 10:00:04

by Wu, Feng

Subject: [v5 18/19] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +
arch/x86/kvm/vmx.c | 158 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 42 ++++++++---
include/linux/kvm_host.h | 3 +
virt/kvm/kvm_main.c | 3 +
5 files changed, 199 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6db761b..68548d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -844,6 +844,9 @@ struct kvm_x86_ops {

void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
void (*pi_set_sn)(struct kvm_vcpu *vcpu);
+
+ int (*pi_pre_block)(struct kvm_vcpu *vcpu);
+ void (*pi_post_block)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dd6f3d5..cecd018 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -887,6 +887,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
static DEFINE_PER_CPU(struct desc_ptr, host_gdt);

+/*
+ * We maintain a per-CPU linked list of vCPUs, so in wakeup_handler() we
+ * can find which vCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
static unsigned long *vmx_io_bitmap_a;
static unsigned long *vmx_io_bitmap_b;
static unsigned long *vmx_msr_bitmap_legacy;
@@ -2971,6 +2978,8 @@ static int hardware_enable(void)
return -EBUSY;

INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+ INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+ spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));

/*
* Now we can enable the vmclear operation in kdump
@@ -6098,6 +6107,25 @@ static void update_ple_window_actual_max(void)
ple_window_grow, INT_MIN);
}

+/*
+ * Handler for POSTED_INTR_WAKEUP_VECTOR.
+ */
+static void wakeup_handler(void)
+{
+ struct kvm_vcpu *vcpu;
+ int cpu = smp_processor_id();
+
+ spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+ list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+ blocked_vcpu_list) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ if (pi_test_on(pi_desc) == 1)
+ kvm_vcpu_kick(vcpu);
+ }
+ spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
static __init int hardware_setup(void)
{
int r = -ENOMEM, i, msr;
@@ -6282,6 +6310,8 @@ static __init int hardware_setup(void)
kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
}

+ kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
return alloc_kvm_area();

out8:
@@ -10235,6 +10265,131 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
}

+/*
+ * This routine does the following for a vCPU that is about
+ * to be blocked when VT-d PI is enabled.
+ * - Add the vCPU to the wakeup list, so when interrupts happen
+ * we can find the right vCPU to wake up.
+ * - Change the Posted-interrupt descriptor as below:
+ * 'NDST' <-- vcpu->pre_pcpu
+ * 'NV' <-- POSTED_INTR_WAKEUP_VECTOR
+ * - If 'ON' is set during this process, which means at least one
+ * interrupt is posted for this vCPU, we cannot block it. In
+ * this case, return 1; otherwise, return 0.
+ *
+ */
+static int vmx_pi_pre_block(struct kvm_vcpu *vcpu)
+{
+ unsigned long flags;
+ unsigned int dest;
+ struct pi_desc old, new;
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return 0;
+
+ vcpu->pre_pcpu = vcpu->cpu;
+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_add_tail(&vcpu->blocked_vcpu_list,
+ &per_cpu(blocked_vcpu_on_cpu,
+ vcpu->pre_pcpu));
+ spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ /*
+ * We should not block the vCPU if
+ * an interrupt is posted for it.
+ */
+ if (pi_test_on(pi_desc) == 1) {
+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_del(&vcpu->blocked_vcpu_list);
+ spin_unlock_irqrestore(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ vcpu->pre_pcpu = -1;
+
+ return 1;
+ }
+
+ WARN((pi_desc->sn == 1),
+ "Warning: SN field of posted-interrupts "
+ "is set before blocking\n");
+
+ /*
+ * Since the vCPU can be preempted during this process,
+ * vcpu->cpu could be different from pre_pcpu, so we
+ * need to set pre_pcpu as the destination of the wakeup
+ * notification event. Then we can find the right vCPU
+ * to wake up in the wakeup handler if interrupts happen
+ * when the vCPU is in the blocked state.
+ */
+ dest = cpu_physical_id(vcpu->pre_pcpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ /* set 'NV' to 'wakeup vector' */
+ new.nv = POSTED_INTR_WAKEUP_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+
+ return 0;
+}
+
+static void vmx_pi_post_block(struct kvm_vcpu *vcpu)
+{
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old, new;
+ unsigned int dest;
+ unsigned long flags;
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return;
+
+ /*
+ * If the vCPU is not really blocked and vmx_vcpu_load()
+ * doesn't have a chance to run before this point, we should
+ * set the posted-interrupt descriptor back, just as we
+ * do in vmx_vcpu_load().
+ */
+ if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
+ do {
+ old.control = new.control = pi_desc->control;
+
+ dest = cpu_physical_id(vcpu->cpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ /* Allow posting non-urgent interrupts */
+ new.sn = 0;
+
+ /* set 'NV' to 'notification vector' */
+ new.nv = POSTED_INTR_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+
+ if (vcpu->pre_pcpu != -1) {
+ spin_lock_irqsave(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_del(&vcpu->blocked_vcpu_list);
+ spin_unlock_irqrestore(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ vcpu->pre_pcpu = -1;
+ }
+}
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -10354,6 +10509,9 @@ static struct kvm_x86_ops vmx_x86_ops = {

.pi_clear_sn = vmx_pi_clear_sn,
.pi_set_sn = vmx_pi_set_sn,
+
+ .pi_pre_block = vmx_pi_pre_block,
+ .pi_post_block = vmx_pi_post_block,
};

static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a88e659..8294f2d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5866,7 +5866,11 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
{
++vcpu->stat.halt_exits;
if (irqchip_in_kernel(vcpu->kvm)) {
- vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+ /* Handle posted-interrupt when vCPU is to be halted */
+ if (!kvm_x86_ops->pi_pre_block ||
+ (kvm_x86_ops->pi_pre_block &&
+ kvm_x86_ops->pi_pre_block(vcpu) == 0))
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
return 1;
} else {
vcpu->run->exit_reason = KVM_EXIT_HLT;
@@ -6284,6 +6288,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_vcpu_reload_apic_access_page(vcpu);
}

+ /*
+ * Since posted-interrupts can be set by VT-d HW now, in this
+ * case, KVM_REQ_EVENT is not set. We move the following
+ * operations out of the if statement.
+ */
+ if (kvm_lapic_enabled(vcpu)) {
+ /*
+ * Update architecture specific hints for APIC
+ * virtual interrupt delivery.
+ */
+ if (kvm_x86_ops->hwapic_irr_update)
+ kvm_x86_ops->hwapic_irr_update(vcpu,
+ kvm_lapic_find_highest_irr(vcpu));
+ }
+
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
kvm_apic_accept_events(vcpu);
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6300,13 +6319,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_x86_ops->enable_irq_window(vcpu);

if (kvm_lapic_enabled(vcpu)) {
- /*
- * Update architecture specific hints for APIC
- * virtual interrupt delivery.
- */
- if (kvm_x86_ops->hwapic_irr_update)
- kvm_x86_ops->hwapic_irr_update(vcpu,
- kvm_lapic_find_highest_irr(vcpu));
update_cr8_intercept(vcpu);
kvm_lapic_sync_to_vapic(vcpu);
}
@@ -6477,10 +6489,20 @@ static int vcpu_run(struct kvm_vcpu *vcpu)

for (;;) {
if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
- !vcpu->arch.apf.halted)
+ !vcpu->arch.apf.halted) {
r = vcpu_enter_guest(vcpu);
- else
+ } else {
r = vcpu_block(kvm, vcpu);
+
+ /*
+ * pi_post_block() must be called after
+ * pi_pre_block() which is called in
+ * kvm_vcpu_halt().
+ */
+ if (kvm_x86_ops->pi_post_block)
+ kvm_x86_ops->pi_post_block(vcpu);
+ }
+
if (r <= 0)
break;

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b37ebca..bebff2c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -227,6 +227,9 @@ struct kvm_vcpu {
unsigned long requests;
unsigned long guest_debug;

+ int pre_pcpu;
+ struct list_head blocked_vcpu_list;
+
struct mutex mutex;
struct kvm_run *run;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9097741..6938554 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -221,6 +221,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);

+ vcpu->pre_pcpu = -1;
+ INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
+
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
--
2.1.0
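
The pre-block / post-block / wakeup-handler interplay in this patch can be modelled in a few lines of single-threaded C. This is only a sketch under stated assumptions: `struct vcpu` here is a hypothetical reduction with a plain `on` flag in place of the descriptor's ON bit, `kicked` stands in for kvm_vcpu_kick(), and the per-CPU spinlocks are omitted. It also checks ON before queueing, whereas the patch queues first and removes the vCPU again if ON turns out to be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical vCPU model: only the fields the sketch needs. */
struct vcpu {
	bool on;		/* stands in for the descriptor's ON bit */
	bool kicked;		/* set when the wakeup handler kicks us */
	int pre_pcpu;		/* CPU whose list we are on, or -1 */
	struct vcpu *next;
};

static struct vcpu *blocked[NR_CPUS];	/* one singly linked list per CPU */

/* Returns 1 if an interrupt is already pending and the vCPU must not block. */
static int pre_block(struct vcpu *v, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (v->on)
		return 1;
	v->pre_pcpu = cpu;
	v->next = blocked[cpu];
	blocked[cpu] = v;
	return 0;
}

/* Take the vCPU off the list it was queued on, if any. */
static void post_block(struct vcpu *v)
{
	struct vcpu **pp;

	if (v->pre_pcpu == -1)
		return;
	for (pp = &blocked[v->pre_pcpu]; *pp; pp = &(*pp)->next) {
		if (*pp == v) {
			*pp = v->next;
			break;
		}
	}
	v->next = NULL;
	v->pre_pcpu = -1;
}

/* What the wakeup vector's handler does on 'cpu': kick vCPUs with ON set. */
static void wakeup_handler(int cpu)
{
	struct vcpu *v;

	for (v = blocked[cpu]; v; v = v->next)
		if (v->on)
			v->kicked = true;
}
```

Recording `pre_pcpu` at queueing time matters because the vCPU may later run elsewhere; removal has to target the list it was actually added to.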

2015-07-13 09:59:14

by Wu, Feng

Subject: [v5 19/19] KVM: Warn if 'SN' is set during posting interrupts by software

Currently, we don't support urgent interrupts; all interrupts
are treated as non-urgent, so we cannot post
interrupts when 'SN' is set.

If the vcpu is in guest mode, it cannot have been scheduled out,
and that is currently the only case in which SN is set, so warn if
SN is set.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cecd018..d4d5abc 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4484,6 +4484,22 @@ static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_SMP
if (vcpu->mode == IN_GUEST_MODE) {
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ /*
+ * Currently, we don't support urgent interrupts;
+ * all interrupts are treated as non-urgent,
+ * so we cannot post interrupts when
+ * 'SN' is set.
+ *
+ * If the vcpu is in guest mode, it means it is
+ * running instead of being scheduled out and
+ * waiting in the run queue, and that's the only
+ * case when 'SN' is set currently, so warn if
+ * 'SN' is set.
+ */
+ WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
+
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
POSTED_INTR_VECTOR);
return true;
--
2.1.0

2015-07-13 12:56:26

by Eric Auger

Subject: Re: [v5 09/19] vfio: Register/unregister irq_bypass_producer

Hi Feng,
On 07/13/2015 11:47 AM, Feng Wu wrote:
> This patch adds the registration/unregistration of an
> irq_bypass_producer for MSI/MSIx on vfio pci devices.
may be worth mentioning this is a dummy producer with callbacks not yet
implemented.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++
> drivers/vfio/pci/vfio_pci_private.h | 2 ++
> 2 files changed, 21 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..4795606 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -305,6 +305,16 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
> return 0;
> }
>
> +void vfio_pci_add_consumer(struct irq_bypass_producer *prod,
> + struct irq_bypass_consumer *cons)
static?
Also Paolo encouraged me to use vfio_pci_irq_bypass_add_consumer naming.
Same below.
> +{
> +}
> +
> +void vfio_pci_del_consumer(struct irq_bypass_producer *prod,
> + struct irq_bypass_consumer *cons)
static?
Eric
> +{
> +}
> +
> static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> int vector, int fd, bool msix)
> {
> @@ -319,6 +329,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>
> if (vdev->ctx[vector].trigger) {
> free_irq(irq, vdev->ctx[vector].trigger);
> + irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
> kfree(vdev->ctx[vector].name);
> eventfd_ctx_put(vdev->ctx[vector].trigger);
> vdev->ctx[vector].trigger = NULL;
> @@ -360,6 +371,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> return ret;
> }
>
> + INIT_LIST_HEAD(&vdev->ctx[vector].producer.node);
> + vdev->ctx[vector].producer.token = trigger;
> + vdev->ctx[vector].producer.irq = irq;
> + vdev->ctx[vector].producer.add_consumer = vfio_pci_add_consumer;
> + vdev->ctx[vector].producer.del_consumer = vfio_pci_del_consumer;
> + ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> + WARN_ON(ret);
> +
> vdev->ctx[vector].trigger = trigger;
>
> return 0;
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index ae0e1b4..0e7394f 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -13,6 +13,7 @@
>
> #include <linux/mutex.h>
> #include <linux/pci.h>
> +#include <linux/irqbypass.h>
>
> #ifndef VFIO_PCI_PRIVATE_H
> #define VFIO_PCI_PRIVATE_H
> @@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
> struct virqfd *mask;
> char *name;
> bool masked;
> + struct irq_bypass_producer producer;
> };
>
> struct vfio_pci_device {
>

2015-07-13 13:10:26

by Eric Auger

Subject: Re: [v5 16/19] KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'

On 07/13/2015 11:47 AM, Feng Wu wrote:
> This patch adds an arch-specific hook, 'arch_update', to
> 'struct kvm_kernel_irqfd'. On Intel, it is used to
> update the IRTE when VT-d posted interrupts are in use.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> arch/x86/kvm/x86.c | 5 +++++
> include/linux/kvm_host.h | 3 +++
> include/linux/kvm_irqfd.h | 2 ++
> virt/kvm/eventfd.c | 13 ++++++++++++-
> 4 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 62bbafe..a88e659 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8063,6 +8063,11 @@ void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons)
> cons->del_producer = kvm_arch_irq_bypass_del_producer;
> }
>
> +void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd)
> +{
> + irqfd->arch_update = kvm_arch_update_pi_irte;
> +}
> +
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e693b3a..b37ebca 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -33,6 +33,8 @@
>
> #include <asm/kvm_host.h>
>
> +struct kvm_kernel_irqfd;
> +
> /*
> * The bit 16 ~ bit 31 of kvm_memory_region::flags are internally used
> * in kvm, other bits are visible for userspace which are defined in
> @@ -1074,6 +1076,7 @@ extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
>
> void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons);
> +void kvm_arch_irqfd_init(struct kvm_kernel_irqfd *irqfd);
need to handle the case this is not implemented by the arch?


>
> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>
> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
> index cf9aad4..47a2696 100644
> --- a/include/linux/kvm_irqfd.h
> +++ b/include/linux/kvm_irqfd.h
> @@ -67,6 +67,8 @@ struct kvm_kernel_irqfd {
> struct work_struct shutdown;
> struct irq_bypass_consumer consumer;
> struct irq_bypass_producer *producer;
> + int (*arch_update)(struct kvm *kvm, unsigned int host_irq,
> + uint32_t guest_irq, bool set);
you must document this cb cannot sleep I think
Eric
> };
>
> #endif /* __LINUX_KVM_IRQFD_H */
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 4225eea..762282c 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -276,6 +276,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> INIT_LIST_HEAD(&irqfd->list);
> INIT_WORK(&irqfd->inject, irqfd_inject);
> INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> + kvm_arch_irqfd_init(irqfd);
> seqcount_init(&irqfd->irq_entry_sc);
>
> f = fdget(args->fd);
> @@ -562,13 +563,23 @@ kvm_irqfd_release(struct kvm *kvm)
> */
> void kvm_irq_routing_update(struct kvm *kvm)
> {
> + int ret;
> struct kvm_kernel_irqfd *irqfd;
>
> spin_lock_irq(&kvm->irqfds.lock);
>
> - list_for_each_entry(irqfd, &kvm->irqfds.items, list)
> + list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
> irqfd_update(kvm, irqfd);
>
> + if (irqfd->arch_update) {
> + BUG_ON(!irqfd->producer);
> + ret = irqfd->arch_update(
> + irqfd->kvm, irqfd->producer->irq,
> + irqfd->gsi, 1);
> + WARN_ON(ret);
> + }
> + }
> +
> spin_unlock_irq(&kvm->irqfds.lock);
> }
>
>
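
On the question raised above about architectures that do not implement kvm_arch_irqfd_init(): one common way to handle this is a weak default that leaves the hook unset, combined with a NULL check in the generic caller. The sketch below shows that approach in userspace C and is not what the series does; `struct irqfd_like`, `arch_irqfd_init()` and `routing_update()` are hypothetical, and `__attribute__((weak))` is the GCC/Clang extension that the kernel wraps as `__weak`.

```c
#include <assert.h>
#include <stddef.h>

struct irqfd_like {
	/* Optional hook; NULL means the architecture has nothing to update. */
	int (*arch_update)(unsigned int host_irq, unsigned int guest_irq, int set);
};

/*
 * Default used when no architecture supplies its own definition.
 * A strong definition of the same symbol elsewhere would replace it
 * at link time.
 */
__attribute__((weak)) void arch_irqfd_init(struct irqfd_like *irqfd)
{
	irqfd->arch_update = NULL;
}

/* Generic caller: invoke the hook only when one was installed. */
static int routing_update(struct irqfd_like *irqfd, unsigned int host_irq,
			  unsigned int guest_irq)
{
	if (!irqfd->arch_update)
		return 0;
	return irqfd->arch_update(host_irq, guest_irq, 1);
}
```

With this arrangement generic code can call the init function unconditionally, and architectures without posted interrupts pay only for the NULL check.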

2015-07-13 13:16:57

by Eric Auger

Subject: Re: [v5 15/19] KVM: eventfd: add irq bypass consumer management

Hi Feng,
On 07/13/2015 11:47 AM, Feng Wu wrote:
> From: Eric Auger <[email protected]>
>
> This patch adds the registration/unregistration of an
> irq_bypass_consumer on irqfd assignment/deassignment.
>
> Signed-off-by: Eric Auger <[email protected]>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> virt/kvm/eventfd.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 647ffb8..4225eea 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -35,6 +35,7 @@
> #include <linux/srcu.h>
> #include <linux/slab.h>
> #include <linux/seqlock.h>
> +#include <linux/irqbypass.h>
> #include <trace/events/kvm.h>
>
> #include <kvm/iodev.h>
> @@ -140,6 +141,7 @@ irqfd_shutdown(struct work_struct *work)
> /*
> * It is now safe to release the object's resources
> */
> + irq_bypass_unregister_consumer(&irqfd->consumer);
> eventfd_ctx_put(irqfd->eventfd);
> kfree(irqfd);
> }
> @@ -380,6 +382,11 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> */
> fdput(f);
>
> + irqfd->consumer.token = (void *)irqfd->eventfd;
> + kvm_arch_irq_consumer_init(&irqfd->consumer);
what if the architecture does not implement kvm_arch_irq_consumer_init?

Also you are using here this single function kvm_arch_irq_consumer_init
to do some irq bypass manager settings + attaching your
irqfd->arch_update cb which does not really relate to IRQ bypass
manager. I think I preferred the approach where start/stop/add/del were
exposed separately ([RFC v2 5/6] KVM: introduce kvm_arch functions for
IRQ bypass).

Why not add another kvm_arch_irq_routing_update then, not necessarily
linked to the irq bypass manager?

Best Regards

Eric
> + ret = irq_bypass_register_consumer(&irqfd->consumer);
> + WARN_ON(ret);
> +
> return 0;
>
> fail:
>

2015-07-13 13:17:50

by Eric Auger

Subject: Re: [v5 14/19] KVM: x86: Add arch specific routines for irqbypass manager

On 07/13/2015 11:47 AM, Feng Wu wrote:
> Add the following x86 specific routines for irqbypass manager:
>
> - kvm_arch_irq_bypass_add_producer
> - kvm_arch_irq_bypass_del_producer
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/x86.c | 40 ++++++++++++++++++++++++++++++++++++++++
> include/linux/kvm_host.h | 2 ++
> 3 files changed, 43 insertions(+)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1b0278e..6db761b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -24,6 +24,7 @@
> #include <linux/perf_event.h>
> #include <linux/pvclock_gtod.h>
> #include <linux/clocksource.h>
> +#include <linux/irqbypass.h>
>
> #include <asm/pvclock-abi.h>
> #include <asm/desc.h>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d81ac02..62bbafe 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -49,6 +49,8 @@
> #include <linux/pci.h>
> #include <linux/timekeeper_internal.h>
> #include <linux/pvclock_gtod.h>
> +#include <linux/kvm_irqfd.h>
> +#include <linux/irqbypass.h>
> #include <trace/events/kvm.h>
>
> #define CREATE_TRACE_POINTS
> @@ -8023,6 +8025,44 @@ out:
> return ret;
> }
>
> +void kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
> + struct irq_bypass_producer *prod)
static?
> +{
> + int ret;
> + struct kvm_kernel_irqfd *irqfd =
> + container_of(cons, struct kvm_kernel_irqfd, consumer);
> +
> + irqfd->producer = prod;
> +
> + ret = kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 1);
> + WARN_ON(ret);
> +}
> +
> +void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
> + struct irq_bypass_producer *prod)
> +{
static?
Since those 2 functions are not exposed to generic code anymore, why
this kvm_arch naming?

Eric
> + int ret;
> + struct kvm_kernel_irqfd *irqfd =
> + container_of(cons, struct kvm_kernel_irqfd, consumer);
> +
> + irqfd->producer = NULL;
> +
> + /*
> + * When the producer of a consumer is unregistered, we change back
> + * to remapped mode, so we can re-use the current implementation
> + * when the irq is masked/disabled or the consumer side (KVM
> + * in this case) doesn't want to receive the interrupts.
> + */
> + ret = kvm_arch_update_pi_irte(irqfd->kvm, prod->irq, irqfd->gsi, 0);
> + WARN_ON(ret);
> +}
> +
> +void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons)
> +{
> + cons->add_producer = kvm_arch_irq_bypass_add_producer;
> + cons->del_producer = kvm_arch_irq_bypass_del_producer;
> +}
> +
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f591f7c..e693b3a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1073,6 +1073,8 @@ extern struct kvm_device_ops kvm_xics_ops;
> extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
>
> +void kvm_arch_irq_consumer_init(struct irq_bypass_consumer *cons);
> +
> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>
> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>

2015-07-13 13:19:51

by Eric Auger

Subject: Re: [v5 00/19] Add VT-d Posted-Interrupts support

Hi Feng,
On 07/13/2015 11:47 AM, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
>
> You can find the VT-d Posted-Interrtups Spec. in the following URL:
> http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
>
> This series was part of http://thread.gmane.org/gmane.linux.kernel.iommu/7708. To make things clear, send out IOMMU part here.
>
> This patch-set is based on the latest x86/apic branch of tip tree.
>
> Divide the whole series which contain multiple components into three parts:
> - Prerequisite changes to irq subsystem (already merged)
> - IOMMU part (already merged)
> - KVM and VFIO parts (this series)
>
> v5:
> - Based on Alex and Eric's irq bypass manager:
> https://lkml.org/lkml/2015/7/10/663
> - Reuse some common patch from Eric

A comment about the overall structure. Previously you preferred to have 2
separate series, one usable by both of us and one with my forwarding
stuff. Why did you change your mind?

Best Regards

Eric
>
> Eric Auger (3):
> KVM: create kvm_irqfd.h
> KVM: eventfd: add irq bypass information in irqfd
> KVM: eventfd: add irq bypass consumer management
>
> Feng Wu (16):
> KVM: Extend struct pi_desc for VT-d Posted-Interrupts
> KVM: Add some helper functions for Posted-Interrupts
> KVM: Define a new interface kvm_intr_is_single_vcpu()
> KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
> KVM: Add interfaces to control PI outside vmx
> KVM: Make struct kvm_irq_routing_table accessible
> KVM: make kvm_set_msi_irq() public
> vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices
> vfio: Register/unregister irq_bypass_producer
> KVM, x86: Select IRQ_BYPASS_MANAGER for KVM_INTEL
> KVM: x86: Update IRTE for posted-interrupts
> KVM: x86: Add arch specific routines for irqbypass manager
> KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
> KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
> KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
> KVM: Warn if 'SN' is set during posting interrupts by software
>
> arch/x86/include/asm/kvm_host.h | 15 ++
> arch/x86/kvm/Kconfig | 1 +
> arch/x86/kvm/irq_comm.c | 28 +++-
> arch/x86/kvm/vmx.c | 278 +++++++++++++++++++++++++++++++++++-
> arch/x86/kvm/x86.c | 160 +++++++++++++++++++--
> drivers/vfio/pci/Kconfig | 1 +
> drivers/vfio/pci/vfio_pci_intrs.c | 19 +++
> drivers/vfio/pci/vfio_pci_private.h | 2 +
> include/linux/kvm_host.h | 23 +++
> include/linux/kvm_irqfd.h | 74 ++++++++++
> virt/kvm/eventfd.c | 115 ++++++---------
> virt/kvm/irqchip.c | 11 --
> virt/kvm/kvm_main.c | 3 +
> 13 files changed, 632 insertions(+), 98 deletions(-)
> create mode 100644 include/linux/kvm_irqfd.h
>

2015-07-13 13:47:31

by Paolo Bonzini

Subject: Re: [v5 15/19] KVM: eventfd: add irq bypass consumer management

13/07/2015 15:16, Eric Auger wrote:
>> >
>> > + irqfd->consumer.token = (void *)irqfd->eventfd;
>> > + kvm_arch_irq_consumer_init(&irqfd->consumer);
> what if the architecture does not implement kvm_arch_irq_consumer_init?
>
> Also you are using here this single function kvm_arch_irq_consumer_init
> to do some irq bypass manager settings + attaching your
> irqfd->arch_update cb which does not really relate to IRQ bypass
> manager. I think I preferred the approach where start/stop/add/del were
> exposed separately ([RFC v2 5/6] KVM: introduce kvm_arch functions for
> IRQ bypass).
>
> Why not adding another kvm_arch_irq_routing_update then, not necessarily
> linked to irq bypass manager.

Yes, I also preferred the dummy kvm_arch_* functions to this approach
with an init function. You'd have to add dummy init functions anyway
for non-ARM, non-x86 architectures.

Paolo
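
The dummy kvm_arch_* approach discussed above can be sketched in user-space C with a weak default definition. This is only an illustration under stated assumptions: the struct layouts are hypothetical stand-ins for the kernel types, the signature of kvm_arch_irq_bypass_add_producer() is assumed rather than taken from the series, irqfd_attach() is an invented caller, and `__attribute__((weak))` (GCC/Clang) plays the role of the kernel's `__weak` annotation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct irq_bypass_producer { int irq; };
struct irq_bypass_consumer { void *token; };

/*
 * Generic dummy: architectures without bypass support get this no-op.
 * An architecture that does support it would provide a strong definition
 * with the same name in its own file, and the linker would pick that one
 * instead of this weak default.
 */
__attribute__((weak))
int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
                                     struct irq_bypass_producer *prod)
{
    (void)cons;
    (void)prod;
    return 0; /* nothing to do, report success */
}

/*
 * Hypothetical generic caller: it can invoke the arch hook unconditionally
 * because a definition (dummy or real) always exists at link time.
 */
int irqfd_attach(struct irq_bypass_consumer *cons,
                 struct irq_bypass_producer *prod)
{
    assert(cons != NULL && prod != NULL);
    return kvm_arch_irq_bypass_add_producer(cons, prod);
}
```

With only the weak dummy linked in, irqfd_attach() succeeds and does nothing, which is the behaviour a non-ARM, non-x86 architecture would see.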

2015-07-13 18:57:31

by Alex Williamson

Subject: Re: [v5 09/19] vfio: Register/unregister irq_bypass_producer

On Mon, 2015-07-13 at 17:47 +0800, Feng Wu wrote:
> This patch adds the registration/unregistration of an
> irq_bypass_producer for MSI/MSIx on vfio pci devices.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 19 +++++++++++++++++++
> drivers/vfio/pci/vfio_pci_private.h | 2 ++
> 2 files changed, 21 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..4795606 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -305,6 +305,16 @@ static int vfio_msi_enable(struct vfio_pci_device *vdev, int nvec, bool msix)
> return 0;
> }
>
> +void vfio_pci_add_consumer(struct irq_bypass_producer *prod,
> + struct irq_bypass_consumer *cons)
> +{
> +}
> +
> +void vfio_pci_del_consumer(struct irq_bypass_producer *prod,
> + struct irq_bypass_consumer *cons)
> +{
> +}
> +

Yes, as Eric says, these should be static

> static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> int vector, int fd, bool msix)
> {
> @@ -319,6 +329,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>
> if (vdev->ctx[vector].trigger) {
> free_irq(irq, vdev->ctx[vector].trigger);
> + irq_bypass_unregister_producer(&vdev->ctx[vector].producer);
> kfree(vdev->ctx[vector].name);
> eventfd_ctx_put(vdev->ctx[vector].trigger);
> vdev->ctx[vector].trigger = NULL;
> @@ -360,6 +371,14 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> return ret;
> }
>
> + INIT_LIST_HEAD(&vdev->ctx[vector].producer.node);

nit, INIT_LIST_HEAD isn't really needed, irq-bypass-manager shouldn't
trust incoming data here anyway.

> + vdev->ctx[vector].producer.token = trigger;
> + vdev->ctx[vector].producer.irq = irq;
> + vdev->ctx[vector].producer.add_consumer = vfio_pci_add_consumer;
> + vdev->ctx[vector].producer.del_consumer = vfio_pci_del_consumer;
> + ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
> + WARN_ON(ret);

This is only an acceleration path, so a dev_dbg() or dev_info() is
probably more appropriate. Same for kvm side.

> +
> vdev->ctx[vector].trigger = trigger;
>
> return 0;
> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> index ae0e1b4..0e7394f 100644
> --- a/drivers/vfio/pci/vfio_pci_private.h
> +++ b/drivers/vfio/pci/vfio_pci_private.h
> @@ -13,6 +13,7 @@
>
> #include <linux/mutex.h>
> #include <linux/pci.h>
> +#include <linux/irqbypass.h>
>
> #ifndef VFIO_PCI_PRIVATE_H
> #define VFIO_PCI_PRIVATE_H
> @@ -29,6 +30,7 @@ struct vfio_pci_irq_ctx {
> struct virqfd *mask;
> char *name;
> bool masked;
> + struct irq_bypass_producer producer;
> };
>
> struct vfio_pci_device {
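
The review comment above about WARN_ON(ret) can be illustrated with a small user-space sketch: since bypass is only an acceleration path, a registration failure is logged and the interrupt setup still succeeds. Everything here is a hypothetical stand-in, not the vfio code: struct producer, register_producer() with its 'fail' switch, and fprintf(stderr, ...) in place of dev_dbg()/dev_info().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct irq_bypass_producer. */
struct producer {
    int irq;
    bool registered;
};

/*
 * Hypothetical stand-in for irq_bypass_register_producer().
 * The 'fail' flag exists only to simulate an error in this sketch.
 */
int register_producer(struct producer *p, bool fail)
{
    assert(p != NULL);
    return fail ? -EBUSY : 0;
}

/* Set up a vector; bypass registration is best-effort. */
int set_vector_signal(struct producer *p, int irq, bool fail)
{
    int ret;

    p->irq = irq;
    ret = register_producer(p, fail);
    if (ret)
        /* Informational only: the interrupt works without bypass. */
        fprintf(stderr, "irq %d: bypass registration failed (%d), "
                "using the non-bypass path\n", irq, ret);

    p->registered = (ret == 0);
    return 0; /* setup succeeds either way */
}

/* Tear down a vector; only undo a registration that actually happened. */
void clear_vector_signal(struct producer *p)
{
    if (p->registered) {
        /* irq_bypass_unregister_producer() would be called here. */
        p->registered = false;
    }
}
```

Tracking whether registration succeeded is one possible design choice; it keeps the teardown path from unregistering a producer that was never added.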


2015-07-14 01:09:56

by Wu, Feng

Subject: RE: [v5 00/19] Add VT-d Posted-Interrupts support



> -----Original Message-----
> From: Eric Auger [mailto:[email protected]]
> Sent: Monday, July 13, 2015 9:19 PM
> To: Wu, Feng; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected]
> Subject: Re: [v5 00/19] Add VT-d Posted-Interrupts support
>
> Hi Feng,
> On 07/13/2015 11:47 AM, Feng Wu wrote:
> > VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> > With VT-d Posted-Interrupts enabled, external interrupts from
> > direct-assigned devices can be delivered to guests without VMM
> > intervention when guest is running in non-root mode.
> >
> > You can find the VT-d Posted-Interrupts Spec. in the following URL:
> > http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html
> >
> > This series was part of http://thread.gmane.org/gmane.linux.kernel.iommu/7708. To make things clear, send out IOMMU part here.
> >
> > This patch-set is based on the latest x86/apic branch of tip tree.
> >
> > Divide the whole series which contain multiple components into three parts:
> > - Prerequisite changes to irq subsystem (already merged)
> > - IOMMU part (already merged)
> > - KVM and VFIO parts (this series)
> >
> > v5:
> > - Based on Alex and Eric's irq bypass manager:
> > https://lkml.org/lkml/2015/7/10/663
> > - Reuse some common patch from Eric
>
> A comment about the overall structure. Previously you preferred to have 2
> separate series, one usable by both of us and one with my forwarding
> stuff. Why did you change your mind?

I didn't change my mind. Since Alex sent out the latest irq bypass manager
patch, in which some callbacks are renamed and some are made optional,
I feel some changes may be needed in your series below:
[RFC v2 0/6] IRQ bypass manager and irqfd consumer

So I integrated it here; sorry for the inconvenience. Could you please send
out a new version of that patch-set, so that I can follow it? Thanks a lot!

Thanks,
Feng

>
> Best Regards
>
> Eric
> >
> > Eric Auger (3):
> > KVM: create kvm_irqfd.h
> > KVM: eventfd: add irq bypass information in irqfd
> > KVM: eventfd: add irq bypass consumer management
> >
> > Feng Wu (16):
> > KVM: Extend struct pi_desc for VT-d Posted-Interrupts
> > KVM: Add some helper functions for Posted-Interrupts
> > KVM: Define a new interface kvm_intr_is_single_vcpu()
> > KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
> > KVM: Add interfaces to control PI outside vmx
> > KVM: Make struct kvm_irq_routing_table accessible
> > KVM: make kvm_set_msi_irq() public
> > vfio: Select IRQ_BYPASS_MANAGER for vfio PCI devices
> > vfio: Register/unregister irq_bypass_producer
> > KVM, x86: Select IRQ_BYPASS_MANAGER for KVM_INTEL
> > KVM: x86: Update IRTE for posted-interrupts
> > KVM: x86: Add arch specific routines for irqbypass manager
> > KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
> > KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
> > KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
> > KVM: Warn if 'SN' is set during posting interrupts by software
> >
> > arch/x86/include/asm/kvm_host.h | 15 ++
> > arch/x86/kvm/Kconfig | 1 +
> > arch/x86/kvm/irq_comm.c | 28 +++-
> > arch/x86/kvm/vmx.c | 278 +++++++++++++++++++++++++++++++++++-
> > arch/x86/kvm/x86.c | 160 +++++++++++++++++++--
> > drivers/vfio/pci/Kconfig | 1 +
> > drivers/vfio/pci/vfio_pci_intrs.c | 19 +++
> > drivers/vfio/pci/vfio_pci_private.h | 2 +
> > include/linux/kvm_host.h | 23 +++
> > include/linux/kvm_irqfd.h | 74 ++++++++++
> > virt/kvm/eventfd.c | 115 ++++++---------
> > virt/kvm/irqchip.c | 11 --
> > virt/kvm/kvm_main.c | 3 +
> > 13 files changed, 632 insertions(+), 98 deletions(-)
> > create mode 100644 include/linux/kvm_irqfd.h
> >

2015-07-28 05:09:12

by Wu, Feng

Subject: RE: [v5 15/19] KVM: eventfd: add irq bypass consumer management



> -----Original Message-----
> From: Paolo Bonzini [mailto:[email protected]]
> Sent: Monday, July 13, 2015 9:47 PM
> To: Eric Auger; Wu, Feng; [email protected]; [email protected]
> Cc: [email protected]; [email protected]
> Subject: Re: [v5 15/19] KVM: eventfd: add irq bypass consumer management
>
> 13/07/2015 15:16, Eric Auger wrote:
> >> >
> >> > + irqfd->consumer.token = (void *)irqfd->eventfd;
> >> > + kvm_arch_irq_consumer_init(&irqfd->consumer);
> > what if the architecture does not implement kvm_arch_irq_consumer_init?
> >
> > Also you are using here this single function kvm_arch_irq_consumer_init
> > to do some irq bypass manager settings + attaching your
> > irqfd->arch_update cb which does not really relate to IRQ bypass
> > manager. I think I preferred the approach where start/stop/add/del were
> > exposed separately ([RFC v2 5/6] KVM: introduce kvm_arch functions for
> > IRQ bypass).
> >
> > Why not adding another kvm_arch_irq_routing_update then, not necessarily
> > linked to irq bypass manager.
>
> Yes, I also preferred the dummy kvm_arch_* functions to this approach
> with an init function. You'd have to add dummy init functions anyway
> for non-ARM, non-x86 architectures.

I think the dummy kvm_arch_* approach is okay for me. However, my point is that
currently 'add_producer' and 'del_producer' are mandatory, and the other
callbacks are optional. Patches "[RFC v2 5/6] KVM: introduce kvm_arch functions
for IRQ bypass" and "[RFC v2 6/6] KVM: eventfd: add irq bypass consumer
management" provide all the callbacks, which means we need to implement dummy
arch-specific functions whether or not they are necessary. In that case, it
seems pointless to make some of the callbacks optional. Anyway, if you guys are
fine with the dummy approach, I am good! :)

Thanks,
Feng

>
> Paolo
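
The mandatory/optional split described above can be sketched as follows. This is a user-space illustration, not the irq bypass manager itself: the struct layouts, bypass_connect(), the demo_* callbacks, and the attached_irq field are all hypothetical. The only point shown is that a manager which checks optional callbacks for NULL lets a consumer leave them unset instead of supplying dummies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel definitions. */
struct producer { int irq; };

struct consumer {
    /* Mandatory callbacks. */
    int  (*add_producer)(struct consumer *cons, struct producer *prod);
    void (*del_producer)(struct consumer *cons, struct producer *prod);
    /* Optional callbacks: may be left NULL. */
    void (*stop)(struct consumer *cons);
    void (*start)(struct consumer *cons);
    int attached_irq; /* bookkeeping for this sketch only */
};

/* Connect a producer to a consumer, skipping any unset optional hooks. */
int bypass_connect(struct consumer *cons, struct producer *prod)
{
    int ret;

    assert(cons != NULL && prod != NULL);

    /* Reject consumers that lack the mandatory callbacks. */
    if (!cons->add_producer || !cons->del_producer)
        return -1;

    if (cons->stop)
        cons->stop(cons);
    ret = cons->add_producer(cons, prod);
    if (cons->start)
        cons->start(cons);

    return ret;
}

/* Example mandatory callbacks for a consumer with no optional hooks. */
int demo_add_producer(struct consumer *cons, struct producer *prod)
{
    cons->attached_irq = prod->irq;
    return 0;
}

void demo_del_producer(struct consumer *cons, struct producer *prod)
{
    (void)prod;
    cons->attached_irq = -1;
}
```

If generic code instead filled in every callback with a kvm_arch_* function, each architecture would need a dummy for the hooks it does not use, which is the trade-off raised in the reply above.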