2015-06-11 11:01:33

by Wu, Feng

[permalink] [raw]
Subject: [v4 00/16] Add VT-d Posted-Interrupts support

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when guest is running in non-root mode.

You can find the VT-d Posted-Interrtups Spec. in the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

This series was part of http://thread.gmane.org/gmane.linux.kernel.iommu/7708. To make things clear, send out IOMMU part here.

This patch-set is based on the lastest x86/apic branch of tip tree.

Divide the whole series which contain multiple components into three parts:
- Prerequisite changes to irq subsystem (already merged in tip/x86/apic)
- IOMMU part (about to be merged in merged in tip/x86/apic), here is the
latest version: https://lkml.org/lkml/2015/6/9/22
- KVM and VFIO parts (this series)

Patch 8, 9, and 10 are from Eric Auger, there are some common VFIO APIs
defined in them. I integrate them in this series and use the APIs in
my patches.

v4:
* For lowest-priority interrupt, only support single-CPU destination
interrupts at the current stage, more common lowest priority support
will be added later.
* Accoring to Marcelo's suggestion, when vCPU is blocked, we handle
the posted-interrupts in the HLT emulation path.
* Some small changes (coding style, typo, add some code comments)

Eric Auger (3):
KVM: kvm-vfio: User API for IRQ forwarding
VFIO: external user API for interaction
KVM: kvm-vfio: wrappers to VFIO external API device helpers

Feng Wu (13):
KVM: Extend struct pi_desc for VT-d Posted-Interrupts
KVM: Add some helper functions for Posted-Interrupts
KVM: Define a new interface kvm_intr_is_single_vcpu()
KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu
KVM: Add interfaces to control PI outside vmx
KVM: Make struct kvm_irq_routing_table accessible
KVM: make kvm_set_msi_irq() public
KVM: kvm-vfio: User API for VT-d Posted-Interrupts
KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts
KVM: x86: kvm-vfio: VT-d posted-interrupts setup
KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
KVM: Warn if 'SN' is set during posting interrupts by software

Documentation/virtual/kvm/devices/vfio.txt | 43 ++++-
arch/x86/include/asm/kvm_host.h | 16 ++
arch/x86/kvm/Makefile | 3 +-
arch/x86/kvm/irq_comm.c | 28 ++-
arch/x86/kvm/kvm_vfio_x86.c | 85 +++++++++
arch/x86/kvm/vmx.c | 278 ++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 42 +++--
drivers/vfio/vfio.c | 24 +++
include/linux/kvm_host.h | 40 +++++
include/linux/vfio.h | 3 +
include/uapi/linux/kvm.h | 14 ++
virt/kvm/irqchip.c | 11 --
virt/kvm/kvm_main.c | 3 +
virt/kvm/vfio.c | 200 +++++++++++++++++++++
14 files changed, 758 insertions(+), 32 deletions(-)
create mode 100644 arch/x86/kvm/kvm_vfio_x86.c

--
2.1.0


2015-06-11 11:01:53

by Wu, Feng

[permalink] [raw]
Subject: [v4 01/16] KVM: Extend struct pi_desc for VT-d Posted-Interrupts

Extend struct pi_desc for VT-d Posted-Interrupts.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f7b6168..bd26501 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -446,8 +446,24 @@ struct nested_vmx {
/* Posted-Interrupt Descriptor */
struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
- u32 control; /* bit 0 of control is outstanding notification bit */
- u32 rsvd[7];
+ union {
+ struct {
+ /* bit 256 - Outstanding Notification */
+ u64 on : 1,
+ /* bit 257 - Suppress Notification */
+ sn : 1,
+ /* bit 271:258 - Reserved */
+ rsvd_1 : 14,
+ /* bit 279:272 - Notification Vector */
+ nv : 8,
+ /* bit 287:280 - Reserved */
+ rsvd_2 : 8,
+ /* bit 319:288 - Notification Destination */
+ ndst : 32;
+ };
+ u64 control;
+ };
+ u32 rsvd[6];
} __aligned(64);

static bool pi_test_and_set_on(struct pi_desc *pi_desc)
--
2.1.0

2015-06-11 11:02:04

by Wu, Feng

[permalink] [raw]
Subject: [v4 02/16] KVM: Add some helper functions for Posted-Interrupts

This patch adds some helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bd26501..8be6aa4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -443,6 +443,8 @@ struct nested_vmx {
};

#define POSTED_INTR_ON 0
+#define POSTED_INTR_SN 1
+
/* Posted-Interrupt Descriptor */
struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
@@ -483,6 +485,30 @@ static int pi_test_and_set_pir(int vector, struct pi_desc *pi_desc)
return test_and_set_bit(vector, (unsigned long *)pi_desc->pir);
}

+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+ return clear_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+ return set_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_ON,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
struct vcpu_vmx {
struct kvm_vcpu vcpu;
unsigned long host_rsp;
--
2.1.0

2015-06-11 11:02:11

by Wu, Feng

[permalink] [raw]
Subject: [v4 03/16] KVM: Define a new interface kvm_intr_is_single_vcpu()

This patch defines a new interface kvm_intr_is_single_vcpu(),
which can returns whether the interrupt is for single-CPU or not.

It is used by VT-d PI, since now we only support single-CPU
interrupts, For lowest-priority interrupts, if user configures
it via /proc/irq or uses irqbalance to make it single-CPU, we
can use PI to deliver the interrupts to it. Full functionality
of lowest-priority support will be added later.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/irq_comm.c | 24 ++++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dea2e7e..cab4141 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1179,4 +1179,6 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
void kvm_deliver_pmi(struct kvm_vcpu *vcpu);

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu);
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 72298b3..9e42645 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -299,6 +299,30 @@ out:
return r;
}

+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu)
+{
+ int i, r = 0;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_apic_present(vcpu))
+ continue;
+
+ if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+ irq->dest_id, irq->dest_mode))
+ continue;
+
+ r++;
+ *dest_vcpu = vcpu;
+ }
+
+ if (r == 1)
+ return true;
+ else
+ return false;
+}
+
#define IOAPIC_ROUTING_ENTRY(irq) \
{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
.u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
--
2.1.0

2015-06-11 11:02:18

by Wu, Feng

[permalink] [raw]
Subject: [v4 04/16] KVM: Get Posted-Interrupts descriptor address from struct kvm_vcpu

Define an interface to get PI descriptor address from the vCPU structure.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx.c | 11 +++++++++++
2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cab4141..c7a09fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -833,6 +833,8 @@ struct kvm_x86_ops {
void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t offset, unsigned long mask);
+
+ u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8be6aa4..ccc9ce4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -610,6 +610,10 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
#define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
[number##_HIGH] = VMCS12_OFFSET(name)+4

+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+ return &(to_vmx(vcpu)->pi_desc);
+}

static unsigned long shadow_read_only_fields[] = {
/*
@@ -4495,6 +4499,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
return;
}

+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+ return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
/*
* Set up the vmcs's constant host-state fields, i.e., host-state fields that
* will not change in the lifetime of the guest.
@@ -10296,6 +10305,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
.flush_log_dirty = vmx_flush_log_dirty,
.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
+
+ .get_pi_desc_addr = vmx_get_pi_desc_addr,
};

static int __init vmx_init(void)
--
2.1.0

2015-06-11 11:02:23

by Wu, Feng

[permalink] [raw]
Subject: [v4 05/16] KVM: Add interfaces to control PI outside vmx

This patch adds pi_clear_sn and pi_set_sn to struct kvm_x86_ops,
so we can set/clear SN outside vmx.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/vmx.c | 13 +++++++++++++
2 files changed, 16 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c7a09fb..69bc770 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -835,6 +835,9 @@ struct kvm_x86_ops {
gfn_t offset, unsigned long mask);

u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+
+ void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
+ void (*pi_set_sn)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ccc9ce4..dd3c63a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -615,6 +615,16 @@ struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
return &(to_vmx(vcpu)->pi_desc);
}

+static void vmx_pi_clear_sn(struct kvm_vcpu *vcpu)
+{
+ pi_clear_sn(vcpu_to_pi_desc(vcpu));
+}
+
+static void vmx_pi_set_sn(struct kvm_vcpu *vcpu)
+{
+ pi_set_sn(vcpu_to_pi_desc(vcpu));
+}
+
static unsigned long shadow_read_only_fields[] = {
/*
* We do NOT shadow fields that are modified when L0
@@ -10307,6 +10317,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,

.get_pi_desc_addr = vmx_get_pi_desc_addr,
+
+ .pi_clear_sn = vmx_pi_clear_sn,
+ .pi_set_sn = vmx_pi_set_sn,
};

static int __init vmx_init(void)
--
2.1.0

2015-06-11 11:02:26

by Wu, Feng

[permalink] [raw]
Subject: [v4 06/16] KVM: Make struct kvm_irq_routing_table accessible

Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
so we can use it outside of irqchip.c.

Signed-off-by: Feng Wu <[email protected]>
---
include/linux/kvm_host.h | 15 +++++++++++++++
virt/kvm/irqchip.c | 11 -----------
2 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad45054..f591f7c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -321,6 +321,21 @@ struct kvm_kernel_irq_routing_entry {
struct hlist_node link;
};

+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+ int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+ struct kvm_kernel_irq_routing_entry *rt_entries;
+ u32 nr_rt_entries;
+ /*
+ * Array indexed by gsi. Each entry contains list of irq chips
+ * the gsi is connected to.
+ */
+ struct hlist_head map[0];
+};
+
+#endif
+
#ifndef KVM_PRIVATE_MEM_SLOTS
#define KVM_PRIVATE_MEM_SLOTS 0
#endif
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 1d56a90..bac3b52 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,17 +31,6 @@
#include <trace/events/kvm.h>
#include "irq.h"

-struct kvm_irq_routing_table {
- int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
- struct kvm_kernel_irq_routing_entry *rt_entries;
- u32 nr_rt_entries;
- /*
- * Array indexed by gsi. Each entry contains list of irq chips
- * the gsi is connected to.
- */
- struct hlist_head map[0];
-};
-
int kvm_irq_map_gsi(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *entries, int gsi)
{
--
2.1.0

2015-06-11 11:02:30

by Wu, Feng

[permalink] [raw]
Subject: [v4 07/16] KVM: make kvm_set_msi_irq() public

Make kvm_set_msi_irq() public, we can use this function outside.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 ++++
arch/x86/kvm/irq_comm.c | 4 ++--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69bc770..31a495f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -175,6 +175,8 @@ enum {
*/
#define KVM_APIC_PV_EOI_PENDING 1

+struct kvm_kernel_irq_routing_entry;
+
/*
* We don't want allocation failures within the mmu code, so we preallocate
* enough memory for a single page fault in a cache.
@@ -1186,4 +1188,6 @@ void kvm_deliver_pmi(struct kvm_vcpu *vcpu);

bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
struct kvm_vcpu **dest_vcpu);
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq);
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 9e42645..58d7d49 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -94,8 +94,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
return r;
}

-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
- struct kvm_lapic_irq *irq)
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq)
{
trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);

--
2.1.0

2015-06-11 11:02:40

by Wu, Feng

[permalink] [raw]
Subject: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

From: Eric Auger <[email protected]>

This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
to set a VFIO device IRQ as forwarded or not forwarded.
the command takes as argument a handle to a new struct named
kvm_vfio_dev_irq.

Signed-off-by: Eric Auger <[email protected]>
---
Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++++++++------
include/uapi/linux/kvm.h | 12 +++++++++++
2 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..6186e6d 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -4,15 +4,20 @@ VFIO virtual device
Device types supported:
KVM_DEV_TYPE_VFIO

-Only one VFIO instance may be created per VM. The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM. As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence. When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+Only one VFIO instance may be created per VM.
+
+The created device tracks VFIO groups in use by the VM and features
+of those groups important to the correctness and acceleration of
+the VM. As groups are enabled and disabled for use by the VM, KVM
+should be updated about their presence. When registered with KVM,
+a reference to the VFIO-group is held by KVM.
+
+The device also enables to control some IRQ settings of VFIO devices:
+forwarding/posting.

Groups:
KVM_DEV_VFIO_GROUP
+ KVM_DEV_VFIO_DEVICE

KVM_DEV_VFIO_GROUP attributes:
KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:

For each, kvm_device_attr.addr points to an int32_t file descriptor
for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+ KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
+ KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
+
+For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When forwarded, a physical IRQ is completed by the guest and not by the
+host. This requires HW support in the interrupt controller.
+
+Forwarding can only be set when the corresponding VFIO IRQ is not masked
+(would it be through VFIO_DEVICE_SET_IRQS command or as a consequence of this
+IRQ being currently handled) or active at interrupt controller level.
+In such a situation, -EAGAIN is returned. It is advised to to set the
+forwarding before the VFIO signaling is set up, this avoids trial and errors.
+
+Unforwarding can happen at any time.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4b60056..798f3e4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -999,6 +999,9 @@ struct kvm_device_attr {
#define KVM_DEV_VFIO_GROUP 1
#define KVM_DEV_VFIO_GROUP_ADD 1
#define KVM_DEV_VFIO_GROUP_DEL 2
+#define KVM_DEV_VFIO_DEVICE 2
+#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
+#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2

enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20 = 1,
@@ -1018,6 +1021,15 @@ enum kvm_device_type {
KVM_DEV_TYPE_MAX,
};

+struct kvm_vfio_dev_irq {
+ __u32 argsz; /* structure length */
+ __u32 fd; /* file descriptor of the VFIO device */
+ __u32 index; /* VFIO device IRQ index */
+ __u32 start; /* start of subindex range */
+ __u32 count; /* size of subindex range */
+ __u32 gsi[]; /* gsi, ie. virtual IRQ number */
+};
+
/*
* ioctls for VM fds
*/
--
2.1.0

2015-06-11 11:06:20

by Wu, Feng

[permalink] [raw]
Subject: [v4 09/16] VFIO: external user API for interaction

From: Eric Auger <[email protected]>

The VFIO external user API is enriched with 3 new functions that
allows a kernel user external to VFIO to retrieve some information
from a VFIO device.

- vfio_device_get_external_user enables to get a vfio device from
its fd and increments its reference counter
- vfio_device_put_external_user decrements the reference counter
- vfio_external_base_device returns a handle to the struct device

Signed-off-by: Eric Auger <[email protected]>
---
drivers/vfio/vfio.c | 24 ++++++++++++++++++++++++
include/linux/vfio.h | 3 +++
2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index e1278fe..c10b3cb 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1504,6 +1504,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
}
EXPORT_SYMBOL_GPL(vfio_group_put_external_user);

+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+ struct vfio_device *vdev = filep->private_data;
+
+ if (filep->f_op != &vfio_device_fops)
+ return ERR_PTR(-EINVAL);
+
+ vfio_device_get(vdev);
+ return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+ vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+ return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
int vfio_external_user_iommu_id(struct vfio_group *group)
{
return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index ddb4409..e120e9a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -88,6 +88,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
extern int vfio_external_user_iommu_id(struct vfio_group *group);
extern long vfio_external_check_extension(struct vfio_group *group,
unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);

struct pci_dev;
#ifdef CONFIG_EEH
--
2.1.0

2015-06-11 11:05:36

by Wu, Feng

[permalink] [raw]
Subject: [v4 10/16] KVM: kvm-vfio: wrappers to VFIO external API device helpers

From: Eric Auger <[email protected]>

Provide wrapper functions that allow KVM-VFIO device code to
interact with a vfio device:
- kvm_vfio_device_get_external_user gets a handle to a struct
vfio_device from the vfio device file descriptor and increments
its reference counter,
- kvm_vfio_device_put_external_user decrements the reference counter
to a vfio device,
- kvm_vfio_external_base_device returns a handle to the struct device
of the vfio device.

Also kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device helpers
are introduced.

Signed-off-by: Eric Auger <[email protected]>
---
virt/kvm/vfio.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 620e37f..80a45e4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,80 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
symbol_put(vfio_group_put_external_user);
}

+static struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
+{
+ struct vfio_device *vdev;
+ struct vfio_device *(*fn)(struct file *);
+
+ fn = symbol_get(vfio_device_get_external_user);
+ if (!fn)
+ return ERR_PTR(-EINVAL);
+
+ vdev = fn(filep);
+
+ symbol_put(vfio_device_get_external_user);
+
+ return vdev;
+}
+
+static void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
+{
+ void (*fn)(struct vfio_device *);
+
+ fn = symbol_get(vfio_device_put_external_user);
+ if (!fn)
+ return;
+
+ fn(vdev);
+
+ symbol_put(vfio_device_put_external_user);
+}
+
+static struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
+{
+ struct device *(*fn)(struct vfio_device *);
+ struct device *dev;
+
+ fn = symbol_get(vfio_external_base_device);
+ if (!fn)
+ return NULL;
+
+ dev = fn(vdev);
+
+ symbol_put(vfio_external_base_device);
+
+ return dev;
+}
+
+/**
+ * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
+ *
+ * Checks it is a valid vfio device and increments its reference counter
+ * @fd: file descriptor of the vfio platform device
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+ struct fd f = fdget(fd);
+ struct vfio_device *vdev;
+
+ if (!f.file)
+ return ERR_PTR(-EINVAL);
+ vdev = kvm_vfio_device_get_external_user(f.file);
+ fdput(f);
+ return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device: decrements the reference counter of the
+ * vfio platform * device
+ *
+ * @vdev: vfio_device handle to release
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+ kvm_vfio_device_put_external_user(vdev);
+}
+
static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
{
long (*fn)(struct vfio_group *, unsigned long);
--
2.1.0

2015-06-11 11:05:23

by Wu, Feng

[permalink] [raw]
Subject: [v4 11/16] KVM: kvm-vfio: User API for VT-d Posted-Interrupts

This patch adds and documents two new attributes
KVM_DEV_VFIO_DEVICE_POST_IRQ and KVM_DEV_VFIO_DEVICE_UNPOST_IRQ
in KVM_DEV_VFIO_DEVICE group. The new attributes are used for
VT-d Posted-Interrupts.

When guest OS changes the interrupt configuration for an
assigned device, such as, MSI/MSIx data/address fields,
QEMU will use this IRQ attribute to tell KVM to update the
related IRTE according the VT-d Posted-Interrrupts Specification,
such as, the guest vector should be updated in the related IRTE.

Signed-off-by: Feng Wu <[email protected]>
---
Documentation/virtual/kvm/devices/vfio.txt | 9 +++++++++
include/uapi/linux/kvm.h | 2 ++
2 files changed, 11 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index 6186e6d..34925e1 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -42,3 +42,12 @@ In such a situation, -EAGAIN is returned. It is advised to to set the
forwarding before the VFIO signaling is set up, this avoids trial and errors.

Unforwarding can happen at any time.
+
+ KVM_DEV_VFIO_DEVICE_POST_IRQ: set a VFIO device IRQ as posted
+ KVM_DEV_VFIO_DEVICE_UNPOST_IRQ: set a VFIO device IRQ as remapped
+For this attribute, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When guest OS changes the interrupt configuration for an assigned device,
+such as, MSI/MSIx data/address fields, QEMU will use this IRQ attribute
+to tell KVM to update the related IRTE according the VT-d Posted-Interrrupts
+Specification, such as, the guest vector should be updated in the related IRTE.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 798f3e4..6e17596 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1002,6 +1002,8 @@ struct kvm_device_attr {
#define KVM_DEV_VFIO_DEVICE 2
#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
+#define KVM_DEV_VFIO_DEVICE_POST_IRQ 3
+#define KVM_DEV_VFIO_DEVICE_UNPOST_IRQ 4

enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20 = 1,
--
2.1.0

2015-06-11 11:05:13

by Wu, Feng

[permalink] [raw]
Subject: [v4 12/16] KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts

This patch adds the kvm-vfio interface for VT-d Posted-Interrupts.
When guests update MSI/MSI-x information for an assigned-device,
QEMU will use KVM_DEV_VFIO_DEVICE_POST_IRQ attribute to setup
IRTE for VT-d PI. Userspace program can also use
KVM_DEV_VFIO_DEVICE_UNPOST_IRQ to change back to irq remapping mode.
This patch implements these IRQ attributes.

Signed-off-by: Feng Wu <[email protected]>
---
include/linux/kvm_host.h | 22 +++++++++
virt/kvm/vfio.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 148 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f591f7c..69f8711 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1073,6 +1073,28 @@ extern struct kvm_device_ops kvm_xics_ops;
extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
extern struct kvm_device_ops kvm_arm_vgic_v3_ops;

+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
+/*
+ * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+ uint32_t guest_irq, bool set);
+#else
+static inline int kvm_arch_vfio_update_pi_irte(struct kvm *kvm,
+ unsigned int host_irq,
+ uint32_t guest_irq,
+ bool set)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 80a45e4..547fc51 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -18,6 +18,7 @@
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/vfio.h>
+#include <asm/irq_remapping.h>
#include "vfio.h"

struct kvm_vfio_group {
@@ -276,12 +277,128 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
return -ENXIO;
}

+static int kvm_vfio_pci_get_irq_count(struct pci_dev *pdev, int irq_type)
+{
+ if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
+ u8 pin;
+
+ pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
+ if (pin)
+ return 1;
+ } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
+ return pci_msi_vec_count(pdev);
+ } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
+ return pci_msix_vec_count(pdev);
+ }
+
+ return 0;
+}
+
+static int kvm_vfio_control_pi(struct kvm_device *kdev,
+ int32_t __user *argp, bool set)
+{
+ struct kvm_vfio_dev_irq pi_info;
+ uint32_t *gsi;
+ unsigned long minsz;
+ struct vfio_device *vdev;
+ struct msi_desc *entry;
+ struct device *dev;
+ struct pci_dev *pdev;
+ int i, max, ret;
+
+ minsz = offsetofend(struct kvm_vfio_dev_irq, count);
+
+ if (copy_from_user(&pi_info, (void __user *)argp, minsz))
+ return -EFAULT;
+
+ if (pi_info.argsz < minsz || pi_info.index >= VFIO_PCI_NUM_IRQS)
+ return -EINVAL;
+
+ vdev = kvm_vfio_get_vfio_device(pi_info.fd);
+ if (IS_ERR(vdev))
+ return PTR_ERR(vdev);
+
+ dev = kvm_vfio_external_base_device(vdev);
+ if (!dev || !dev_is_pci(dev)) {
+ ret = -EFAULT;
+ goto put_vfio_device;
+ }
+
+ pdev = to_pci_dev(dev);
+
+ max = kvm_vfio_pci_get_irq_count(pdev, pi_info.index);
+ if (max <= 0) {
+ ret = -EFAULT;
+ goto put_vfio_device;
+ }
+
+ if (pi_info.argsz - minsz < pi_info.count * sizeof(u32) ||
+ pi_info.start >= max || pi_info.start + pi_info.count > max) {
+ ret = -EINVAL;
+ goto put_vfio_device;
+ }
+
+ gsi = memdup_user((void __user *)((unsigned long)argp + minsz),
+ pi_info.count * sizeof(u32));
+ if (IS_ERR(gsi)) {
+ ret = PTR_ERR(gsi);
+ goto put_vfio_device;
+ }
+
+#ifdef CONFIG_PCI_MSI
+ for (i = 0; i < pi_info.count; i++) {
+ list_for_each_entry(entry, &pdev->msi_list, list) {
+ if (entry->msi_attrib.entry_nr != pi_info.start+i)
+ continue;
+
+ ret = kvm_arch_vfio_update_pi_irte(kdev->kvm,
+ entry->irq,
+ gsi[i],
+ set);
+ if (ret)
+ goto free_gsi;
+ }
+ }
+#endif
+
+ ret = 0;
+
+free_gsi:
+ kfree(gsi);
+
+put_vfio_device:
+ kvm_vfio_put_vfio_device(vdev);
+ return ret;
+}
+
+static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
+{
+ int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
+ int ret;
+
+ switch (attr) {
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
+ case KVM_DEV_VFIO_DEVICE_POST_IRQ:
+ ret = kvm_vfio_control_pi(kdev, argp, 1);
+ break;
+ case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
+ ret = kvm_vfio_control_pi(kdev, argp, 0);
+ break;
+#endif
+ default:
+ ret = -ENXIO;
+ }
+ return ret;
+}
+
static int kvm_vfio_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
case KVM_DEV_VFIO_GROUP:
return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+ case KVM_DEV_VFIO_DEVICE:
+ return kvm_vfio_set_device(dev, attr->attr, attr->addr);
}

return -ENXIO;
@@ -299,6 +416,15 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
}

break;
+ case KVM_DEV_VFIO_DEVICE:
+ switch (attr->attr) {
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
+ case KVM_DEV_VFIO_DEVICE_POST_IRQ:
+ case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
+ return irq_remapping_cap(IRQ_POSTING_CAP) ? 0 : -ENXIO;
+#endif
+ }
+ break;
}

return -ENXIO;
--
2.1.0

2015-06-11 11:04:57

by Wu, Feng

[permalink] [raw]
Subject: [v4 13/16] KVM: x86: kvm-vfio: VT-d posted-interrupts setup

This patch defines macro __KVM_HAVE_ARCH_KVM_VFIO_POST and
implement kvm_arch_vfio_update_pi_irte for x86 architecture.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/Makefile | 3 +-
arch/x86/kvm/kvm_vfio_x86.c | 85 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 89 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kvm/kvm_vfio_x86.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 31a495f..1605bf8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -81,6 +81,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
(base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
}

+#define __KVM_HAVE_ARCH_KVM_VFIO_POST
+
#define KVM_PERMILLE_MMU_PAGES 20
#define KVM_MIN_ALLOC_MMU_PAGES 64
#define KVM_MMU_HASH_SHIFT 10
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 16e8f96..6bafc89 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -12,7 +12,8 @@ kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o

kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
- i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
+ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o \
+ kvm_vfio_x86.o
kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o
kvm-intel-y += vmx.o
kvm-amd-y += svm.o
diff --git a/arch/x86/kvm/kvm_vfio_x86.c b/arch/x86/kvm/kvm_vfio_x86.c
new file mode 100644
index 0000000..a2d74f9
--- /dev/null
+++ b/arch/x86/kvm/kvm_vfio_x86.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright (C) 2014 Intel Corporation.
+ * Authors: Feng Wu <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/irq_remapping.h>
+
+/*
+ * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
+ *
+ * @kvm: kvm
+ * @host_irq: host irq of the interrupt
+ * @guest_irq: gsi of the interrupt
+ * @set: set or unset PI
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
+ uint32_t guest_irq, bool set)
+{
+ struct kvm_kernel_irq_routing_entry *e;
+ struct kvm_irq_routing_table *irq_rt;
+ struct kvm_lapic_irq irq;
+ struct kvm_vcpu *vcpu;
+ struct vcpu_data vcpu_info;
+ int idx, ret = -EINVAL;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+ BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
+
+ hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+ if (e->type != KVM_IRQ_ROUTING_MSI)
+ continue;
+ /*
+ * VT-d PI cannot support posting multicast/broadcast
+ * interrupts to a VCPU, we still use interrupt remapping
+ * for these kind of interrupts.
+ *
+ * For lowest-priority interrupts, we only support
+ * those with single CPU as the destination, e.g. user
+ * configures the interrupts via /proc/irq or uses
+ * irqbalance to make the interrupts single-CPU.
+ *
+ * We will support full lowest-priority interrupt later.
+ *
+ */
+
+ kvm_set_msi_irq(e, &irq);
+ if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
+ continue;
+
+ vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
+ vcpu_info.vector = irq.vector;
+
+ if (set)
+ ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
+ else {
+ /* suppress notification event before unposting */
+ kvm_x86_ops->pi_set_sn(vcpu);
+ ret = irq_set_vcpu_affinity(host_irq, NULL);
+ kvm_x86_ops->pi_clear_sn(vcpu);
+ }
+
+ if (ret < 0) {
+ printk(KERN_INFO "%s: failed to update PI IRTE\n",
+ __func__);
+ goto out;
+ }
+ }
+
+ ret = 0;
+out:
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+ return ret;
+}
--
2.1.0

2015-06-11 11:04:38

by Wu, Feng

[permalink] [raw]
Subject: [v4 14/16] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted

This patch updates the Posted-Interrupts Descriptor when vCPU
is preempted.

sched out:
- Set 'SN' to suppress furture non-urgent interrupts posted for
the vCPU.

sched in:
- Clear 'SN'
- Change NDST if vCPU is scheduled to a different CPU
- Set 'NV' to POSTED_INTR_VECTOR

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dd3c63a..9d9b403 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -46,6 +46,7 @@
#include <asm/debugreg.h>
#include <asm/kexec.h>
#include <asm/apic.h>
+#include <asm/irq_remapping.h>

#include "trace.h"

@@ -2001,10 +2002,43 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
vmx->loaded_vmcs->cpu = cpu;
}
+
+ if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old, new;
+ unsigned int dest;
+
+ do {
+ old.control = new.control = pi_desc->control;
+ if (vcpu->cpu != cpu) {
+ dest = cpu_physical_id(cpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+ }
+
+ /* Allow posting non-urgent interrupts */
+ new.sn = 0;
+
+ /* set 'NV' to 'notification vector' */
+ new.nv = POSTED_INTR_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+ }
}

static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
{
+ if (irq_remapping_cap(IRQ_POSTING_CAP)) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ /* Set SN when the vCPU is preempted */
+ if (vcpu->preempted)
+ pi_set_sn(pi_desc);
+ }
+
__vmx_load_host_state(to_vmx(vcpu));
if (!vmm_exclusive) {
__loaded_vmcs_clear(to_vmx(vcpu)->loaded_vmcs);
--
2.1.0

2015-06-11 11:03:50

by Wu, Feng

[permalink] [raw]
Subject: [v4 15/16] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

This patch updates the Posted-Interrupts Descriptor when vCPU
is blocked.

pre-block:
- Add the vCPU to the blocked per-CPU list
- Set 'NV' to POSTED_INTR_WAKEUP_VECTOR

post-block:
- Remove the vCPU from the per-CPU list

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +
arch/x86/kvm/vmx.c | 158 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 42 ++++++++---
include/linux/kvm_host.h | 3 +
virt/kvm/kvm_main.c | 3 +
5 files changed, 199 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1605bf8..2ef6f2a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -842,6 +842,9 @@ struct kvm_x86_ops {

void (*pi_clear_sn)(struct kvm_vcpu *vcpu);
void (*pi_set_sn)(struct kvm_vcpu *vcpu);
+
+ int (*pi_pre_block)(struct kvm_vcpu *vcpu);
+ void (*pi_post_block)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9d9b403..7e8a800 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -888,6 +888,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
static DEFINE_PER_CPU(struct desc_ptr, host_gdt);

+/*
+ * We maintian a per-CPU linked-list of vCPU, so in wakeup_handler() we
+ * can find which vCPU should be waken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
static unsigned long *vmx_io_bitmap_a;
static unsigned long *vmx_io_bitmap_b;
static unsigned long *vmx_msr_bitmap_legacy;
@@ -2972,6 +2979,8 @@ static int hardware_enable(void)
return -EBUSY;

INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+ INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+ spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));

/*
* Now we can enable the vmclear operation in kdump
@@ -6099,6 +6108,25 @@ static void update_ple_window_actual_max(void)
ple_window_grow, INT_MIN);
}

+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+static void wakeup_handler(void)
+{
+ struct kvm_vcpu *vcpu;
+ int cpu = smp_processor_id();
+
+ spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+ list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+ blocked_vcpu_list) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ if (pi_test_on(pi_desc) == 1)
+ kvm_vcpu_kick(vcpu);
+ }
+ spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
static __init int hardware_setup(void)
{
int r = -ENOMEM, i, msr;
@@ -6283,6 +6311,8 @@ static __init int hardware_setup(void)
kvm_x86_ops->enable_log_dirty_pt_masked = NULL;
}

+ kvm_set_posted_intr_wakeup_handler(wakeup_handler);
+
return alloc_kvm_area();

out8:
@@ -10236,6 +10266,131 @@ static void vmx_enable_log_dirty_pt_masked(struct kvm *kvm,
kvm_mmu_clear_dirty_pt_masked(kvm, memslot, offset, mask);
}

+/*
+ * This routine does the following things for vCPU which is going
+ * to be blocked if VT-d PI is enabled.
+ * - Store the vCPU to the wakeup list, so when interrupts happen
+ * we can find the right vCPU to wake up.
+ * - Change the Posted-interrupt descriptor as below:
+ * 'NDST' <-- vcpu->pre_pcpu
+ * 'NV' <-- POSTED_INTR_WAKEUP_VECTOR
+ * - If 'ON' is set during this process, which means at least one
+ * interrupt is posted for this vCPU, we cannot block it, in
+ * this case, return 1, otherwise, return 0.
+ *
+ */
+static int vmx_pi_pre_block(struct kvm_vcpu *vcpu)
+{
+ unsigned long flags;
+ unsigned int dest;
+ struct pi_desc old, new;
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return 0;
+
+ vcpu->pre_pcpu = vcpu->cpu;
+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_add_tail(&vcpu->blocked_vcpu_list,
+ &per_cpu(blocked_vcpu_on_cpu,
+ vcpu->pre_pcpu));
+ spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ /*
+ * We should not block the vCPU if
+ * an interrupt is posted for it.
+ */
+ if (pi_test_on(pi_desc) == 1) {
+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_del(&vcpu->blocked_vcpu_list);
+ spin_unlock_irqrestore(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ vcpu->pre_pcpu = -1;
+
+ return 1;
+ }
+
+ WARN((pi_desc->sn == 1),
+ "Warning: SN field of posted-interrupts "
+ "is set before blocking\n");
+
+ /*
+ * Since vCPU can be preempted during this process,
+ * vcpu->cpu could be different with pre_pcpu, we
+ * need to set pre_pcpu as the destination of wakeup
+ * notification event, then we can find the right vCPU
+ * to wakeup in wakeup handler if interrupts happen
+ * when the vCPU is in blocked state.
+ */
+ dest = cpu_physical_id(vcpu->pre_pcpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ /* set 'NV' to 'wakeup vector' */
+ new.nv = POSTED_INTR_WAKEUP_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+
+ return 0;
+}
+
+static void vmx_pi_post_block(struct kvm_vcpu *vcpu)
+{
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old, new;
+ unsigned int dest;
+ unsigned long flags;
+
+ if (!irq_remapping_cap(IRQ_POSTING_CAP))
+ return;
+
+ /*
+ * If the vCPU is not really blocked and vmx_vcpu_load()
+ * doesn't have chance to run before this point, we should
+ * set posted-interrupt descriptor back, just like what we
+ * do in vmx_vcpu_load().
+ */
+ if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR)
+ do {
+ old.control = new.control = pi_desc->control;
+
+ dest = cpu_physical_id(vcpu->cpu);
+
+ if (x2apic_enabled())
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ /* Allow posting non-urgent interrupts */
+ new.sn = 0;
+
+ /* set 'NV' to 'notification vector' */
+ new.nv = POSTED_INTR_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+
+ if(vcpu->pre_pcpu != -1) {
+ spin_lock_irqsave(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ list_del(&vcpu->blocked_vcpu_list);
+ spin_unlock_irqrestore(
+ &per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->pre_pcpu), flags);
+ vcpu->pre_pcpu = -1;
+ }
+}
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -10354,6 +10509,9 @@ static struct kvm_x86_ops vmx_x86_ops = {

.pi_clear_sn = vmx_pi_clear_sn,
.pi_set_sn = vmx_pi_set_sn,
+
+ .pi_pre_block = vmx_pi_pre_block,
+ .pi_post_block = vmx_pi_post_block,
};

static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c73efcd..fabf4a7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5866,7 +5866,11 @@ int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
{
++vcpu->stat.halt_exits;
if (irqchip_in_kernel(vcpu->kvm)) {
- vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
+ /* Handle posted-interrupt when vCPU is to be halted */
+ if (!kvm_x86_ops->pi_pre_block ||
+ (kvm_x86_ops->pi_pre_block &&
+ kvm_x86_ops->pi_pre_block(vcpu) == 0))
+ vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
return 1;
} else {
vcpu->run->exit_reason = KVM_EXIT_HLT;
@@ -6282,6 +6286,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_vcpu_reload_apic_access_page(vcpu);
}

+ /*
+ * Since posted-interrupts can be set by VT-d HW now, in this
+ * case, KVM_REQ_EVENT is not set. We move the following
+ * operations out of the if statement.
+ */
+ if (kvm_lapic_enabled(vcpu)) {
+ /*
+ * Update architecture specific hints for APIC
+ * virtual interrupt delivery.
+ */
+ if (kvm_x86_ops->hwapic_irr_update)
+ kvm_x86_ops->hwapic_irr_update(vcpu,
+ kvm_lapic_find_highest_irr(vcpu));
+ }
+
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
kvm_apic_accept_events(vcpu);
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6298,13 +6317,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_x86_ops->enable_irq_window(vcpu);

if (kvm_lapic_enabled(vcpu)) {
- /*
- * Update architecture specific hints for APIC
- * virtual interrupt delivery.
- */
- if (kvm_x86_ops->hwapic_irr_update)
- kvm_x86_ops->hwapic_irr_update(vcpu,
- kvm_lapic_find_highest_irr(vcpu));
update_cr8_intercept(vcpu);
kvm_lapic_sync_to_vapic(vcpu);
}
@@ -6475,10 +6487,20 @@ static int vcpu_run(struct kvm_vcpu *vcpu)

for (;;) {
if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
- !vcpu->arch.apf.halted)
+ !vcpu->arch.apf.halted) {
r = vcpu_enter_guest(vcpu);
- else
+ } else {
r = vcpu_block(kvm, vcpu);
+
+ /*
+ * pi_post_block() must be called after
+ * pi_pre_block() which is called in
+ * kvm_vcpu_halt().
+ */
+ if (kvm_x86_ops->pi_post_block)
+ kvm_x86_ops->pi_post_block(vcpu);
+ }
+
if (r <= 0)
break;

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 69f8711..6c90ef2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -225,6 +225,9 @@ struct kvm_vcpu {
unsigned long requests;
unsigned long guest_debug;

+ int pre_pcpu;
+ struct list_head blocked_vcpu_list;
+
struct mutex mutex;
struct kvm_run *run;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9097741..6938554 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -221,6 +221,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);

+ vcpu->pre_pcpu = -1;
+ INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
+
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
--
2.1.0

2015-06-11 11:03:38

by Wu, Feng

[permalink] [raw]
Subject: [v4 16/16] KVM: Warn if 'SN' is set during posting interrupts by software

Currently, we don't support urgent interrupt, all interrupts
are recognized as non-urgent interrupt, so we cannot post
interrupts when 'SN' is set.

If the vcpu is in guest mode, it cannot have been scheduled out,
and that's the only case when SN is set currently, warning if
SN is set.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7e8a800..f1daa8b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4485,6 +4485,22 @@ static inline bool kvm_vcpu_trigger_posted_interrupt(struct kvm_vcpu *vcpu)
{
#ifdef CONFIG_SMP
if (vcpu->mode == IN_GUEST_MODE) {
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ /*
+ * Currently, we don't support urgent interrupt,
+ * all interrupts are recognized as non-urgent
+ * interrupt, so we cannot post interrupts when
+ * 'SN' is set.
+ *
+ * If the vcpu is in guest mode, it means it is
+ * running instead of being scheduled out and
+ * waiting in the run queue, and that's the only
+ * case when 'SN' is set currently, warning if
+ * 'SN' is set.
+ */
+ WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc));
+
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
POSTED_INTR_VECTOR);
return true;
--
2.1.0

2015-06-11 13:37:58

by Eric Auger

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

Hi Feng,
On 06/11/2015 12:51 PM, Feng Wu wrote:
> From: Eric Auger <[email protected]>
>
> This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> to set a VFIO device IRQ as forwarded or not forwarded.
> the command takes as argument a handle to a new struct named
> kvm_vfio_dev_irq.
>
> Signed-off-by: Eric Auger <[email protected]>
> ---
> Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++++++++------
> include/uapi/linux/kvm.h | 12 +++++++++++
> 2 files changed, 40 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
> index ef51740..6186e6d 100644
> --- a/Documentation/virtual/kvm/devices/vfio.txt
> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> @@ -4,15 +4,20 @@ VFIO virtual device
> Device types supported:
> KVM_DEV_TYPE_VFIO
>
> -Only one VFIO instance may be created per VM. The created device
> -tracks VFIO groups in use by the VM and features of those groups
> -important to the correctness and acceleration of the VM. As groups
> -are enabled and disabled for use by the VM, KVM should be updated
> -about their presence. When registered with KVM, a reference to the
> -VFIO-group is held by KVM.
> +Only one VFIO instance may be created per VM.
> +
> +The created device tracks VFIO groups in use by the VM and features
> +of those groups important to the correctness and acceleration of
> +the VM. As groups are enabled and disabled for use by the VM, KVM
> +should be updated about their presence. When registered with KVM,
> +a reference to the VFIO-group is held by KVM.
> +
> +The device also enables to control some IRQ settings of VFIO devices:
> +forwarding/posting.
>
> Groups:
> KVM_DEV_VFIO_GROUP
> + KVM_DEV_VFIO_DEVICE
>
> KVM_DEV_VFIO_GROUP attributes:
> KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
> @@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
>
> For each, kvm_device_attr.addr points to an int32_t file descriptor
> for the VFIO group.
> +
> +KVM_DEV_VFIO_DEVICE attributes:
> + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
> + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
> +
> +For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
> +
> +When forwarded, a physical IRQ is completed by the guest and not by the
> +host. This requires HW support in the interrupt controller.
> +
> +Forwarding can only be set when the corresponding VFIO IRQ is not masked
> +(would it be through VFIO_DEVICE_SET_IRQS command or as a consequence of this
> +IRQ being currently handled) or active at interrupt controller level.
> +In such a situation, -EAGAIN is returned. It is advised to to set the
> +forwarding before the VFIO signaling is set up, this avoids trial and errors.
> +
> +Unforwarding can happen at any time.
Unfortunately the above text is out of context of your series. If it
simplifies things take the ownership of that patch file and remove my
description of KVM_DEV_VFIO_DEVICE_FORWARD_IRQ.
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4b60056..798f3e4 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -999,6 +999,9 @@ struct kvm_device_attr {
> #define KVM_DEV_VFIO_GROUP 1
> #define KVM_DEV_VFIO_GROUP_ADD 1
> #define KVM_DEV_VFIO_GROUP_DEL 2
> +#define KVM_DEV_VFIO_DEVICE 2
> +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
> +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
same here
Best Regards

Eric
>
> enum kvm_device_type {
> KVM_DEV_TYPE_FSL_MPIC_20 = 1,
> @@ -1018,6 +1021,15 @@ enum kvm_device_type {
> KVM_DEV_TYPE_MAX,
> };
>
> +struct kvm_vfio_dev_irq {
> + __u32 argsz; /* structure length */
> + __u32 fd; /* file descriptor of the VFIO device */
> + __u32 index; /* VFIO device IRQ index */
> + __u32 start; /* start of subindex range */
> + __u32 count; /* size of subindex range */
> + __u32 gsi[]; /* gsi, ie. virtual IRQ number */
> +};
> +
> /*
> * ioctls for VM fds
> */
>

2015-06-11 17:15:11

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 12/16] KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts

On Thu, 2015-06-11 at 18:51 +0800, Feng Wu wrote:
> This patch adds the kvm-vfio interface for VT-d Posted-Interrupts.
> When guests update MSI/MSI-x information for an assigned-device,
> QEMU will use KVM_DEV_VFIO_DEVICE_POST_IRQ attribute to setup
> IRTE for VT-d PI. Userspace program can also use
> KVM_DEV_VFIO_DEVICE_UNPOST_IRQ to change back to irq remapping mode.
> This patch implements these IRQ attributes.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> include/linux/kvm_host.h | 22 +++++++++
> virt/kvm/vfio.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 148 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f591f7c..69f8711 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1073,6 +1073,28 @@ extern struct kvm_device_ops kvm_xics_ops;
> extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
>
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> +/*
> + * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
> + *
> + * @kvm: kvm
> + * @host_irq: host irq of the interrupt
> + * @guest_irq: gsi of the interrupt
> + * @set: set or unset PI
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> + uint32_t guest_irq, bool set);
> +#else
> +static inline int kvm_arch_vfio_update_pi_irte(struct kvm *kvm,
> + unsigned int host_irq,
> + uint32_t guest_irq,
> + bool set)
> +{
> + return 0;
> +}

The code below can't get to this function without
__KVM_HAVE_ARCH_KVM_VFIO_POST, but this seems like it should return an
error if not implemented.

> +#endif
> +
> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>
> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 80a45e4..547fc51 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -18,6 +18,7 @@
> #include <linux/slab.h>
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> +#include <asm/irq_remapping.h>

This only exists on x86. Are we also getting lucky with some of the
include chains that give us the PCI related defines? It looks like
we're implicitly assuming CONFIG_PCI

> #include "vfio.h"
>
> struct kvm_vfio_group {
> @@ -276,12 +277,128 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> return -ENXIO;
> }
>
> +static int kvm_vfio_pci_get_irq_count(struct pci_dev *pdev, int irq_type)
> +{
> + if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> + u8 pin;
> +
> + pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
> + if (pin)
> + return 1;
> + } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> + return pci_msi_vec_count(pdev);
> + } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
> + return pci_msix_vec_count(pdev);
> + }
> +
> + return 0;
> +}
> +
> +static int kvm_vfio_control_pi(struct kvm_device *kdev,
> + int32_t __user *argp, bool set)
> +{
> + struct kvm_vfio_dev_irq pi_info;
> + uint32_t *gsi;
> + unsigned long minsz;
> + struct vfio_device *vdev;
> + struct msi_desc *entry;
> + struct device *dev;
> + struct pci_dev *pdev;
> + int i, max, ret;
> +
> + minsz = offsetofend(struct kvm_vfio_dev_irq, count);
> +
> + if (copy_from_user(&pi_info, (void __user *)argp, minsz))
> + return -EFAULT;
> +
> + if (pi_info.argsz < minsz || pi_info.index >= VFIO_PCI_NUM_IRQS)
> + return -EINVAL;

Could we also abort on pi_info.count == 0?

> +
> + vdev = kvm_vfio_get_vfio_device(pi_info.fd);
> + if (IS_ERR(vdev))
> + return PTR_ERR(vdev);
> +
> + dev = kvm_vfio_external_base_device(vdev);
> + if (!dev || !dev_is_pci(dev)) {
> + ret = -EFAULT;
> + goto put_vfio_device;
> + }
> +
> + pdev = to_pci_dev(dev);
> +
> + max = kvm_vfio_pci_get_irq_count(pdev, pi_info.index);
> + if (max <= 0) {
> + ret = -EFAULT;
> + goto put_vfio_device;
> + }
> +
> + if (pi_info.argsz - minsz < pi_info.count * sizeof(u32) ||
> + pi_info.start >= max || pi_info.start + pi_info.count > max) {
> + ret = -EINVAL;
> + goto put_vfio_device;
> + }
> +
> + gsi = memdup_user((void __user *)((unsigned long)argp + minsz),
> + pi_info.count * sizeof(u32));
> + if (IS_ERR(gsi)) {
> + ret = PTR_ERR(gsi);
> + goto put_vfio_device;
> + }
> +
> +#ifdef CONFIG_PCI_MSI
> + for (i = 0; i < pi_info.count; i++) {
> + list_for_each_entry(entry, &pdev->msi_list, list) {

Should we be able to get here for INTx?

> + if (entry->msi_attrib.entry_nr != pi_info.start+i)
> + continue;
> +
> + ret = kvm_arch_vfio_update_pi_irte(kdev->kvm,
> + entry->irq,
> + gsi[i],
> + set);
> + if (ret)
> + goto free_gsi;
> + }
> + }
> +#endif
> +
> + ret = 0;

So if we didn't do anything, return success? That seems strange.
Should we also be doing some unwind on failure? Thanks,

Alex

> +
> +free_gsi:
> + kfree(gsi);
> +
> +put_vfio_device:
> + kvm_vfio_put_vfio_device(vdev);
> + return ret;
> +}
> +
> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> +{
> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> + int ret;
> +
> + switch (attr) {
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> + case KVM_DEV_VFIO_DEVICE_POST_IRQ:
> + ret = kvm_vfio_control_pi(kdev, argp, 1);
> + break;
> + case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
> + ret = kvm_vfio_control_pi(kdev, argp, 0);
> + break;
> +#endif
> + default:
> + ret = -ENXIO;
> + }
> + return ret;
> +}
> +
> static int kvm_vfio_set_attr(struct kvm_device *dev,
> struct kvm_device_attr *attr)
> {
> switch (attr->group) {
> case KVM_DEV_VFIO_GROUP:
> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> + case KVM_DEV_VFIO_DEVICE:
> + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> }
>
> return -ENXIO;
> @@ -299,6 +416,15 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> }
>
> break;
> + case KVM_DEV_VFIO_DEVICE:
> + switch (attr->attr) {
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> + case KVM_DEV_VFIO_DEVICE_POST_IRQ:
> + case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
> + return irq_remapping_cap(IRQ_POSTING_CAP) ? 0 : -ENXIO;
> +#endif
> + }
> + break;
> }
>
> return -ENXIO;


2015-06-11 17:16:11

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 13/16] KVM: x86: kvm-vfio: VT-d posted-interrupts setup

On Thu, 2015-06-11 at 18:51 +0800, Feng Wu wrote:
> This patch defines macro __KVM_HAVE_ARCH_KVM_VFIO_POST and
> implement kvm_arch_vfio_update_pi_irte for x86 architecture.
>

What's vfio specific in any of this? It's obviously called from the
kvm-vfio device interface, but nothing below is vfio related. This is
simply the posted interrupt interface to kvm. Thanks,

Alex


> Signed-off-by: Feng Wu <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/Makefile | 3 +-
> arch/x86/kvm/kvm_vfio_x86.c | 85 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 89 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/kvm/kvm_vfio_x86.c
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 31a495f..1605bf8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -81,6 +81,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
> (base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
> }
>
> +#define __KVM_HAVE_ARCH_KVM_VFIO_POST
> +
> #define KVM_PERMILLE_MMU_PAGES 20
> #define KVM_MIN_ALLOC_MMU_PAGES 64
> #define KVM_MMU_HASH_SHIFT 10
> diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> index 16e8f96..6bafc89 100644
> --- a/arch/x86/kvm/Makefile
> +++ b/arch/x86/kvm/Makefile
> @@ -12,7 +12,8 @@ kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
> kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
>
> kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
> - i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
> + i8254.o ioapic.o irq_comm.o cpuid.o pmu.o \
> + kvm_vfio_x86.o
> kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o
> kvm-intel-y += vmx.o
> kvm-amd-y += svm.o
> diff --git a/arch/x86/kvm/kvm_vfio_x86.c b/arch/x86/kvm/kvm_vfio_x86.c
> new file mode 100644
> index 0000000..a2d74f9
> --- /dev/null
> +++ b/arch/x86/kvm/kvm_vfio_x86.c
> @@ -0,0 +1,85 @@
> +/*
> + * Copyright (C) 2014 Intel Corporation.
> + * Authors: Feng Wu <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <asm/irq_remapping.h>
> +
> +/*
> + * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
> + *
> + * @kvm: kvm
> + * @host_irq: host irq of the interrupt
> + * @guest_irq: gsi of the interrupt
> + * @set: set or unset PI
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> + uint32_t guest_irq, bool set)
> +{
> + struct kvm_kernel_irq_routing_entry *e;
> + struct kvm_irq_routing_table *irq_rt;
> + struct kvm_lapic_irq irq;
> + struct kvm_vcpu *vcpu;
> + struct vcpu_data vcpu_info;
> + int idx, ret = -EINVAL;
> +
> + idx = srcu_read_lock(&kvm->irq_srcu);
> + irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> + BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
> +
> + hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> + if (e->type != KVM_IRQ_ROUTING_MSI)
> + continue;
> + /*
> + * VT-d PI cannot support posting multicast/broadcast
> + * interrupts to a VCPU, we still use interrupt remapping
> + * for these kind of interrupts.
> + *
> + * For lowest-priority interrupts, we only support
> + * those with single CPU as the destination, e.g. user
> + * configures the interrupts via /proc/irq or uses
> + * irqbalance to make the interrupts single-CPU.
> + *
> + * We will support full lowest-priority interrupt later.
> + *
> + */
> +
> + kvm_set_msi_irq(e, &irq);
> + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> + continue;
> +
> + vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> + vcpu_info.vector = irq.vector;
> +
> + if (set)
> + ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
> + else {
> + /* suppress notification event before unposting */
> + kvm_x86_ops->pi_set_sn(vcpu);
> + ret = irq_set_vcpu_affinity(host_irq, NULL);
> + kvm_x86_ops->pi_clear_sn(vcpu);
> + }
> +
> + if (ret < 0) {
> + printk(KERN_INFO "%s: failed to update PI IRTE\n",
> + __func__);
> + goto out;
> + }
> + }
> +
> + ret = 0;
> +out:
> + srcu_read_unlock(&kvm->irq_srcu, idx);
> + return ret;
> +}


2015-06-11 19:59:12

by Avi Kivity

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On 06/11/2015 01:51 PM, Feng Wu wrote:
> From: Eric Auger <[email protected]>
>
> This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> to set a VFIO device IRQ as forwarded or not forwarded.
> the command takes as argument a handle to a new struct named
> kvm_vfio_dev_irq.

Is there no way to do this automatically? After all, vfio knows that a
device interrupt is forwarded to some eventfd, and kvm knows that some
eventfd is forwarded to a guest interrupt. If they compare notes
through a central registry, they can figure out that the interrupt needs
to be forwarded.


> Signed-off-by: Eric Auger <[email protected]>
> ---
> Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++++++++------
> include/uapi/linux/kvm.h | 12 +++++++++++
> 2 files changed, 40 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
> index ef51740..6186e6d 100644
> --- a/Documentation/virtual/kvm/devices/vfio.txt
> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> @@ -4,15 +4,20 @@ VFIO virtual device
> Device types supported:
> KVM_DEV_TYPE_VFIO
>
> -Only one VFIO instance may be created per VM. The created device
> -tracks VFIO groups in use by the VM and features of those groups
> -important to the correctness and acceleration of the VM. As groups
> -are enabled and disabled for use by the VM, KVM should be updated
> -about their presence. When registered with KVM, a reference to the
> -VFIO-group is held by KVM.
> +Only one VFIO instance may be created per VM.
> +
> +The created device tracks VFIO groups in use by the VM and features
> +of those groups important to the correctness and acceleration of
> +the VM. As groups are enabled and disabled for use by the VM, KVM
> +should be updated about their presence. When registered with KVM,
> +a reference to the VFIO-group is held by KVM.
> +
> +The device also enables to control some IRQ settings of VFIO devices:
> +forwarding/posting.
>
> Groups:
> KVM_DEV_VFIO_GROUP
> + KVM_DEV_VFIO_DEVICE
>
> KVM_DEV_VFIO_GROUP attributes:
> KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
> @@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
>
> For each, kvm_device_attr.addr points to an int32_t file descriptor
> for the VFIO group.
> +
> +KVM_DEV_VFIO_DEVICE attributes:
> + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
> + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
> +
> +For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
> +
> +When forwarded, a physical IRQ is completed by the guest and not by the
> +host. This requires HW support in the interrupt controller.
> +
> +Forwarding can only be set when the corresponding VFIO IRQ is not masked
> +(would it be through VFIO_DEVICE_SET_IRQS command or as a consequence of this
> +IRQ being currently handled) or active at interrupt controller level.
> +In such a situation, -EAGAIN is returned. It is advised to to set the
> +forwarding before the VFIO signaling is set up, this avoids trial and errors.
> +
> +Unforwarding can happen at any time.
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4b60056..798f3e4 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -999,6 +999,9 @@ struct kvm_device_attr {
> #define KVM_DEV_VFIO_GROUP 1
> #define KVM_DEV_VFIO_GROUP_ADD 1
> #define KVM_DEV_VFIO_GROUP_DEL 2
> +#define KVM_DEV_VFIO_DEVICE 2
> +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
> +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
>
> enum kvm_device_type {
> KVM_DEV_TYPE_FSL_MPIC_20 = 1,
> @@ -1018,6 +1021,15 @@ enum kvm_device_type {
> KVM_DEV_TYPE_MAX,
> };
>
> +struct kvm_vfio_dev_irq {
> + __u32 argsz; /* structure length */
> + __u32 fd; /* file descriptor of the VFIO device */
> + __u32 index; /* VFIO device IRQ index */
> + __u32 start; /* start of subindex range */
> + __u32 count; /* size of subindex range */
> + __u32 gsi[]; /* gsi, ie. virtual IRQ number */
> +};
> +
> /*
> * ioctls for VM fds
> */

2015-06-12 00:20:56

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Eric Auger [mailto:[email protected]]
> Sent: Thursday, June 11, 2015 9:38 PM
> To: Wu, Feng; [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> Hi Feng,
> On 06/11/2015 12:51 PM, Feng Wu wrote:
> > From: Eric Auger <[email protected]>
> >
> > This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> > and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> > to set a VFIO device IRQ as forwarded or not forwarded.
> > the command takes as argument a handle to a new struct named
> > kvm_vfio_dev_irq.
> >
> > Signed-off-by: Eric Auger <[email protected]>
> > ---
> > Documentation/virtual/kvm/devices/vfio.txt | 34
> ++++++++++++++++++++++++------
> > include/uapi/linux/kvm.h | 12 +++++++++++
> > 2 files changed, 40 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/devices/vfio.txt
> b/Documentation/virtual/kvm/devices/vfio.txt
> > index ef51740..6186e6d 100644
> > --- a/Documentation/virtual/kvm/devices/vfio.txt
> > +++ b/Documentation/virtual/kvm/devices/vfio.txt
> > @@ -4,15 +4,20 @@ VFIO virtual device
> > Device types supported:
> > KVM_DEV_TYPE_VFIO
> >
> > -Only one VFIO instance may be created per VM. The created device
> > -tracks VFIO groups in use by the VM and features of those groups
> > -important to the correctness and acceleration of the VM. As groups
> > -are enabled and disabled for use by the VM, KVM should be updated
> > -about their presence. When registered with KVM, a reference to the
> > -VFIO-group is held by KVM.
> > +Only one VFIO instance may be created per VM.
> > +
> > +The created device tracks VFIO groups in use by the VM and features
> > +of those groups important to the correctness and acceleration of
> > +the VM. As groups are enabled and disabled for use by the VM, KVM
> > +should be updated about their presence. When registered with KVM,
> > +a reference to the VFIO-group is held by KVM.
> > +
> > +The device also enables to control some IRQ settings of VFIO devices:
> > +forwarding/posting.
> >
> > Groups:
> > KVM_DEV_VFIO_GROUP
> > + KVM_DEV_VFIO_DEVICE
> >
> > KVM_DEV_VFIO_GROUP attributes:
> > KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device
> tracking
> > @@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
> >
> > For each, kvm_device_attr.addr points to an int32_t file descriptor
> > for the VFIO group.
> > +
> > +KVM_DEV_VFIO_DEVICE attributes:
> > + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as
> forwarded
> > + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as
> not forwarded
> > +
> > +For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
> > +
> > +When forwarded, a physical IRQ is completed by the guest and not by the
> > +host. This requires HW support in the interrupt controller.
> > +
> > +Forwarding can only be set when the corresponding VFIO IRQ is not masked
> > +(would it be through VFIO_DEVICE_SET_IRQS command or as a consequence
> of this
> > +IRQ being currently handled) or active at interrupt controller level.
> > +In such a situation, -EAGAIN is returned. It is advised to to set the
> > +forwarding before the VFIO signaling is set up, this avoids trial and errors.
> > +
> > +Unforwarding can happen at any time.
> Unfortunately the above text is out of context of your series. If it
> simplifies things take the ownership of that patch file and remove my
> description of KVM_DEV_VFIO_DEVICE_FORWARD_IRQ.

Oh, I didn't notice that. It is a good suggestion, Eric!

Thanks,
Feng


> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 4b60056..798f3e4 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -999,6 +999,9 @@ struct kvm_device_attr {
> > #define KVM_DEV_VFIO_GROUP 1
> > #define KVM_DEV_VFIO_GROUP_ADD 1
> > #define KVM_DEV_VFIO_GROUP_DEL 2
> > +#define KVM_DEV_VFIO_DEVICE 2
> > +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
> > +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
> same here
> Best Regards
>
> Eric
> >
> > enum kvm_device_type {
> > KVM_DEV_TYPE_FSL_MPIC_20 = 1,
> > @@ -1018,6 +1021,15 @@ enum kvm_device_type {
> > KVM_DEV_TYPE_MAX,
> > };
> >
> > +struct kvm_vfio_dev_irq {
> > + __u32 argsz; /* structure length */
> > + __u32 fd; /* file descriptor of the VFIO device */
> > + __u32 index; /* VFIO device IRQ index */
> > + __u32 start; /* start of subindex range */
> > + __u32 count; /* size of subindex range */
> > + __u32 gsi[]; /* gsi, ie. virtual IRQ number */
> > +};
> > +
> > /*
> > * ioctls for VM fds
> > */
> >

2015-06-12 00:23:12

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Avi Kivity [mailto:[email protected]]
> Sent: Friday, June 12, 2015 3:59 AM
> To: Wu, Feng; [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > From: Eric Auger <[email protected]>
> >
> > This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> > and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> > to set a VFIO device IRQ as forwarded or not forwarded.
> > the command takes as argument a handle to a new struct named
> > kvm_vfio_dev_irq.
>
> Is there no way to do this automatically? After all, vfio knows that a
> device interrupt is forwarded to some eventfd, and kvm knows that some
> eventfd is forwarded to a guest interrupt. If they compare notes
> through a central registry, they can figure out that the interrupt needs
> to be forwarded.

Oh, just like Eric mentioned in his reply, this description is out of context of
this series, I will remove them in the next version.

Thanks,
Feng

>
>
> > Signed-off-by: Eric Auger <[email protected]>
> > ---
> > Documentation/virtual/kvm/devices/vfio.txt | 34
> ++++++++++++++++++++++++------
> > include/uapi/linux/kvm.h | 12 +++++++++++
> > 2 files changed, 40 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/devices/vfio.txt
> b/Documentation/virtual/kvm/devices/vfio.txt
> > index ef51740..6186e6d 100644
> > --- a/Documentation/virtual/kvm/devices/vfio.txt
> > +++ b/Documentation/virtual/kvm/devices/vfio.txt
> > @@ -4,15 +4,20 @@ VFIO virtual device
> > Device types supported:
> > KVM_DEV_TYPE_VFIO
> >
> > -Only one VFIO instance may be created per VM. The created device
> > -tracks VFIO groups in use by the VM and features of those groups
> > -important to the correctness and acceleration of the VM. As groups
> > -are enabled and disabled for use by the VM, KVM should be updated
> > -about their presence. When registered with KVM, a reference to the
> > -VFIO-group is held by KVM.
> > +Only one VFIO instance may be created per VM.
> > +
> > +The created device tracks VFIO groups in use by the VM and features
> > +of those groups important to the correctness and acceleration of
> > +the VM. As groups are enabled and disabled for use by the VM, KVM
> > +should be updated about their presence. When registered with KVM,
> > +a reference to the VFIO-group is held by KVM.
> > +
> > +The device also enables to control some IRQ settings of VFIO devices:
> > +forwarding/posting.
> >
> > Groups:
> > KVM_DEV_VFIO_GROUP
> > + KVM_DEV_VFIO_DEVICE
> >
> > KVM_DEV_VFIO_GROUP attributes:
> > KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device
> tracking
> > @@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
> >
> > For each, kvm_device_attr.addr points to an int32_t file descriptor
> > for the VFIO group.
> > +
> > +KVM_DEV_VFIO_DEVICE attributes:
> > + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as
> forwarded
> > + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as
> not forwarded
> > +
> > +For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
> > +
> > +When forwarded, a physical IRQ is completed by the guest and not by the
> > +host. This requires HW support in the interrupt controller.
> > +
> > +Forwarding can only be set when the corresponding VFIO IRQ is not masked
> > +(would it be through VFIO_DEVICE_SET_IRQS command or as a consequence
> of this
> > +IRQ being currently handled) or active at interrupt controller level.
> > +In such a situation, -EAGAIN is returned. It is advised to to set the
> > +forwarding before the VFIO signaling is set up, this avoids trial and errors.
> > +
> > +Unforwarding can happen at any time.
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 4b60056..798f3e4 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -999,6 +999,9 @@ struct kvm_device_attr {
> > #define KVM_DEV_VFIO_GROUP 1
> > #define KVM_DEV_VFIO_GROUP_ADD 1
> > #define KVM_DEV_VFIO_GROUP_DEL 2
> > +#define KVM_DEV_VFIO_DEVICE 2
> > +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
> > +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
> >
> > enum kvm_device_type {
> > KVM_DEV_TYPE_FSL_MPIC_20 = 1,
> > @@ -1018,6 +1021,15 @@ enum kvm_device_type {
> > KVM_DEV_TYPE_MAX,
> > };
> >
> > +struct kvm_vfio_dev_irq {
> > + __u32 argsz; /* structure length */
> > + __u32 fd; /* file descriptor of the VFIO device */
> > + __u32 index; /* VFIO device IRQ index */
> > + __u32 start; /* start of subindex range */
> > + __u32 count; /* size of subindex range */
> > + __u32 gsi[]; /* gsi, ie. virtual IRQ number */
> > +};
> > +
> > /*
> > * ioctls for VM fds
> > */

2015-06-12 04:59:36

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 12/16] KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Friday, June 12, 2015 1:15 AM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v4 12/16] KVM: kvm-vfio: implement the VFIO skeleton for VT-d
> Posted-Interrupts
>
> On Thu, 2015-06-11 at 18:51 +0800, Feng Wu wrote:
> > This patch adds the kvm-vfio interface for VT-d Posted-Interrupts.
> > When guests update MSI/MSI-x information for an assigned-device,
> > QEMU will use KVM_DEV_VFIO_DEVICE_POST_IRQ attribute to setup
> > IRTE for VT-d PI. Userspace program can also use
> > KVM_DEV_VFIO_DEVICE_UNPOST_IRQ to change back to irq remapping
> mode.
> > This patch implements these IRQ attributes.
> >
> > Signed-off-by: Feng Wu <[email protected]>
> > ---
> > include/linux/kvm_host.h | 22 +++++++++
> > virt/kvm/vfio.c | 126
> +++++++++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 148 insertions(+)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f591f7c..69f8711 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1073,6 +1073,28 @@ extern struct kvm_device_ops kvm_xics_ops;
> > extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> > extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
> >
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> > +/*
> > + * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
> > + *
> > + * @kvm: kvm
> > + * @host_irq: host irq of the interrupt
> > + * @guest_irq: gsi of the interrupt
> > + * @set: set or unset PI
> > + * returns 0 on success, < 0 on failure
> > + */
> > +int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> > + uint32_t guest_irq, bool set);
> > +#else
> > +static inline int kvm_arch_vfio_update_pi_irte(struct kvm *kvm,
> > + unsigned int host_irq,
> > + uint32_t guest_irq,
> > + bool set)
> > +{
> > + return 0;
> > +}
>
> The code below can't get to this function without
> __KVM_HAVE_ARCH_KVM_VFIO_POST, but this seems like it should return an
> error if not implemented.

kvm_arch_vfio_update_pi_irte() is called by kvm_vfio_control_pi(), if we remove the
dummy definition of kvm_arch_vfio_update_pi_irte(), kvm_vfio_control_pi() is also
needed to be included in __KVM_HAVE_ARCH_KVM_VFIO_POST, I will handle this
in the next version.

>
> > +#endif
> > +
> > #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >
> > static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool
> val)
> > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > index 80a45e4..547fc51 100644
> > --- a/virt/kvm/vfio.c
> > +++ b/virt/kvm/vfio.c
> > @@ -18,6 +18,7 @@
> > #include <linux/slab.h>
> > #include <linux/uaccess.h>
> > #include <linux/vfio.h>
> > +#include <asm/irq_remapping.h>
>
> This only exists on x86.

But in kvm_vfio_has_attr(), we can only return 0 when posted interrupt is supported
via calling " irq_remapping_cap(IRQ_POSTING_CAP)" which needs this header file.
Do you think how can I handle this?

> Are we also getting lucky with some of the
> include chains that give us the PCI related defines? It looks like
> we're implicitly assuming CONFIG_PCI

Yes, I think the PCI related header files are included implicitly here. Anyway
I can add "#include <linux/pci.h>" explicitly.

> > #include "vfio.h"
> >
> > struct kvm_vfio_group {
> > @@ -276,12 +277,128 @@ static int kvm_vfio_set_group(struct kvm_device
> *dev, long attr, u64 arg)
> > return -ENXIO;
> > }
> >
> > +static int kvm_vfio_pci_get_irq_count(struct pci_dev *pdev, int irq_type)
> > +{
> > + if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> > + u8 pin;
> > +
> > + pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
> > + if (pin)
> > + return 1;
> > + } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> > + return pci_msi_vec_count(pdev);
> > + } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
> > + return pci_msix_vec_count(pdev);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int kvm_vfio_control_pi(struct kvm_device *kdev,
> > + int32_t __user *argp, bool set)
> > +{
> > + struct kvm_vfio_dev_irq pi_info;
> > + uint32_t *gsi;
> > + unsigned long minsz;
> > + struct vfio_device *vdev;
> > + struct msi_desc *entry;
> > + struct device *dev;
> > + struct pci_dev *pdev;
> > + int i, max, ret;
> > +
> > + minsz = offsetofend(struct kvm_vfio_dev_irq, count);
> > +
> > + if (copy_from_user(&pi_info, (void __user *)argp, minsz))
> > + return -EFAULT;
> > +
> > + if (pi_info.argsz < minsz || pi_info.index >= VFIO_PCI_NUM_IRQS)
> > + return -EINVAL;
>
> Could we also abort on pi_info.count == 0?

Yes, that is a good point.

>
> > +
> > + vdev = kvm_vfio_get_vfio_device(pi_info.fd);
> > + if (IS_ERR(vdev))
> > + return PTR_ERR(vdev);
> > +
> > + dev = kvm_vfio_external_base_device(vdev);
> > + if (!dev || !dev_is_pci(dev)) {
> > + ret = -EFAULT;
> > + goto put_vfio_device;
> > + }
> > +
> > + pdev = to_pci_dev(dev);
> > +
> > + max = kvm_vfio_pci_get_irq_count(pdev, pi_info.index);
> > + if (max <= 0) {
> > + ret = -EFAULT;
> > + goto put_vfio_device;
> > + }
> > +
> > + if (pi_info.argsz - minsz < pi_info.count * sizeof(u32) ||
> > + pi_info.start >= max || pi_info.start + pi_info.count > max) {
> > + ret = -EINVAL;
> > + goto put_vfio_device;
> > + }
> > +
> > + gsi = memdup_user((void __user *)((unsigned long)argp + minsz),
> > + pi_info.count * sizeof(u32));
> > + if (IS_ERR(gsi)) {
> > + ret = PTR_ERR(gsi);
> > + goto put_vfio_device;
> > + }
> > +
> > +#ifdef CONFIG_PCI_MSI
> > + for (i = 0; i < pi_info.count; i++) {
> > + list_for_each_entry(entry, &pdev->msi_list, list) {
>
> Should we be able to get here for INTx?

We only support PI for MSI/MSIx. I think I can return earlier in this function if
the index is not VFIO_PCI_MSI_IRQ_INDEX or VFIO_PCI_MSIX_IRQ_INDEX,
is this okay for you?

>
> > + if (entry->msi_attrib.entry_nr != pi_info.start+i)
> > + continue;
> > +
> > + ret = kvm_arch_vfio_update_pi_irte(kdev->kvm,
> > + entry->irq,
> > + gsi[i],
> > + set);
> > + if (ret)
> > + goto free_gsi;
> > + }
> > + }
> > +#endif
> > +
> > + ret = 0;
>
> So if we didn't do anything, return success? That seems strange.
> Should we also be doing some unwind on failure? Thanks,
>
I can't think of what I need to do on failure. Do you have any ideas?

Thanks,
Feng

> Alex
>
> > +
> > +free_gsi:
> > + kfree(gsi);
> > +
> > +put_vfio_device:
> > + kvm_vfio_put_vfio_device(vdev);
> > + return ret;
> > +}
> > +
> > +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> > +{
> > + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> > + int ret;
> > +
> > + switch (attr) {
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> > + case KVM_DEV_VFIO_DEVICE_POST_IRQ:
> > + ret = kvm_vfio_control_pi(kdev, argp, 1);
> > + break;
> > + case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
> > + ret = kvm_vfio_control_pi(kdev, argp, 0);
> > + break;
> > +#endif
> > + default:
> > + ret = -ENXIO;
> > + }
> > + return ret;
> > +}
> > +
> > static int kvm_vfio_set_attr(struct kvm_device *dev,
> > struct kvm_device_attr *attr)
> > {
> > switch (attr->group) {
> > case KVM_DEV_VFIO_GROUP:
> > return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> > + case KVM_DEV_VFIO_DEVICE:
> > + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> > }
> >
> > return -ENXIO;
> > @@ -299,6 +416,15 @@ static int kvm_vfio_has_attr(struct kvm_device
> *dev,
> > }
> >
> > break;
> > + case KVM_DEV_VFIO_DEVICE:
> > + switch (attr->attr) {
> > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> > + case KVM_DEV_VFIO_DEVICE_POST_IRQ:
> > + case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
> > + return irq_remapping_cap(IRQ_POSTING_CAP) ? 0 : -ENXIO;
> > +#endif
> > + }
> > + break;
> > }
> >
> > return -ENXIO;
>
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-06-12 04:58:46

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 13/16] KVM: x86: kvm-vfio: VT-d posted-interrupts setup



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Friday, June 12, 2015 1:16 AM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v4 13/16] KVM: x86: kvm-vfio: VT-d posted-interrupts setup
>
> On Thu, 2015-06-11 at 18:51 +0800, Feng Wu wrote:
> > This patch defines macro __KVM_HAVE_ARCH_KVM_VFIO_POST and
> > implement kvm_arch_vfio_update_pi_irte for x86 architecture.
> >
>
> What's vfio specific in any of this? It's obviously called from the
> kvm-vfio device interface, but nothing below is vfio related. This is
> simply the posted interrupt interface to kvm. Thanks,
>
> Alex
>

Oh, yes, maybe I need to move this part to vmx.c.

Thanks,
Feng

>
> > Signed-off-by: Feng Wu <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/Makefile | 3 +-
> > arch/x86/kvm/kvm_vfio_x86.c | 85
> +++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 89 insertions(+), 1 deletion(-)
> > create mode 100644 arch/x86/kvm/kvm_vfio_x86.c
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> > index 31a495f..1605bf8 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -81,6 +81,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t
> base_gfn, int level)
> > (base_gfn >> KVM_HPAGE_GFN_SHIFT(level));
> > }
> >
> > +#define __KVM_HAVE_ARCH_KVM_VFIO_POST
> > +
> > #define KVM_PERMILLE_MMU_PAGES 20
> > #define KVM_MIN_ALLOC_MMU_PAGES 64
> > #define KVM_MMU_HASH_SHIFT 10
> > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > index 16e8f96..6bafc89 100644
> > --- a/arch/x86/kvm/Makefile
> > +++ b/arch/x86/kvm/Makefile
> > @@ -12,7 +12,8 @@ kvm-y += $(KVM)/kvm_main.o
> $(KVM)/coalesced_mmio.o \
> > kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
> >
> > kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
> > - i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
> > + i8254.o ioapic.o irq_comm.o cpuid.o pmu.o \
> > + kvm_vfio_x86.o
> > kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o
> > kvm-intel-y += vmx.o
> > kvm-amd-y += svm.o
> > diff --git a/arch/x86/kvm/kvm_vfio_x86.c b/arch/x86/kvm/kvm_vfio_x86.c
> > new file mode 100644
> > index 0000000..a2d74f9
> > --- /dev/null
> > +++ b/arch/x86/kvm/kvm_vfio_x86.c
> > @@ -0,0 +1,85 @@
> > +/*
> > + * Copyright (C) 2014 Intel Corporation.
> > + * Authors: Feng Wu <[email protected]>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License, version 2, as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + */
> > +
> > +#include <linux/kvm_host.h>
> > +#include <asm/irq_remapping.h>
> > +
> > +/*
> > + * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
> > + *
> > + * @kvm: kvm
> > + * @host_irq: host irq of the interrupt
> > + * @guest_irq: gsi of the interrupt
> > + * @set: set or unset PI
> > + * returns 0 on success, < 0 on failure
> > + */
> > +int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> > + uint32_t guest_irq, bool set)
> > +{
> > + struct kvm_kernel_irq_routing_entry *e;
> > + struct kvm_irq_routing_table *irq_rt;
> > + struct kvm_lapic_irq irq;
> > + struct kvm_vcpu *vcpu;
> > + struct vcpu_data vcpu_info;
> > + int idx, ret = -EINVAL;
> > +
> > + idx = srcu_read_lock(&kvm->irq_srcu);
> > + irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> > + BUG_ON(guest_irq >= irq_rt->nr_rt_entries);
> > +
> > + hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> > + if (e->type != KVM_IRQ_ROUTING_MSI)
> > + continue;
> > + /*
> > + * VT-d PI cannot support posting multicast/broadcast
> > + * interrupts to a VCPU, we still use interrupt remapping
> > + * for these kind of interrupts.
> > + *
> > + * For lowest-priority interrupts, we only support
> > + * those with single CPU as the destination, e.g. user
> > + * configures the interrupts via /proc/irq or uses
> > + * irqbalance to make the interrupts single-CPU.
> > + *
> > + * We will support full lowest-priority interrupt later.
> > + *
> > + */
> > +
> > + kvm_set_msi_irq(e, &irq);
> > + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu))
> > + continue;
> > +
> > + vcpu_info.pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> > + vcpu_info.vector = irq.vector;
> > +
> > + if (set)
> > + ret = irq_set_vcpu_affinity(host_irq, &vcpu_info);
> > + else {
> > + /* suppress notification event before unposting */
> > + kvm_x86_ops->pi_set_sn(vcpu);
> > + ret = irq_set_vcpu_affinity(host_irq, NULL);
> > + kvm_x86_ops->pi_clear_sn(vcpu);
> > + }
> > +
> > + if (ret < 0) {
> > + printk(KERN_INFO "%s: failed to update PI IRTE\n",
> > + __func__);
> > + goto out;
> > + }
> > + }
> > +
> > + ret = 0;
> > +out:
> > + srcu_read_unlock(&kvm->irq_srcu, idx);
> > + return ret;
> > +}
>
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-06-12 14:51:07

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 12/16] KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts

On Fri, 2015-06-12 at 04:54 +0000, Wu, Feng wrote:
>
>
> > -----Original Message-----
> > From: Alex Williamson [mailto:[email protected]]
> > Sent: Friday, June 12, 2015 1:15 AM
> > To: Wu, Feng
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [v4 12/16] KVM: kvm-vfio: implement the VFIO skeleton for VT-d
> > Posted-Interrupts
> >
> > On Thu, 2015-06-11 at 18:51 +0800, Feng Wu wrote:
> > > This patch adds the kvm-vfio interface for VT-d Posted-Interrupts.
> > > When guests update MSI/MSI-x information for an assigned-device,
> > > QEMU will use KVM_DEV_VFIO_DEVICE_POST_IRQ attribute to setup
> > > IRTE for VT-d PI. Userspace program can also use
> > > KVM_DEV_VFIO_DEVICE_UNPOST_IRQ to change back to irq remapping
> > mode.
> > > This patch implements these IRQ attributes.
> > >
> > > Signed-off-by: Feng Wu <[email protected]>
> > > ---
> > > include/linux/kvm_host.h | 22 +++++++++
> > > virt/kvm/vfio.c | 126
> > +++++++++++++++++++++++++++++++++++++++++++++++
> > > 2 files changed, 148 insertions(+)
> > >
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index f591f7c..69f8711 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -1073,6 +1073,28 @@ extern struct kvm_device_ops kvm_xics_ops;
> > > extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
> > > extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
> > >
> > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> > > +/*
> > > + * kvm_arch_vfio_update_pi_irte - set IRTE for Posted-Interrupts
> > > + *
> > > + * @kvm: kvm
> > > + * @host_irq: host irq of the interrupt
> > > + * @guest_irq: gsi of the interrupt
> > > + * @set: set or unset PI
> > > + * returns 0 on success, < 0 on failure
> > > + */
> > > +int kvm_arch_vfio_update_pi_irte(struct kvm *kvm, unsigned int host_irq,
> > > + uint32_t guest_irq, bool set);
> > > +#else
> > > +static inline int kvm_arch_vfio_update_pi_irte(struct kvm *kvm,
> > > + unsigned int host_irq,
> > > + uint32_t guest_irq,
> > > + bool set)
> > > +{
> > > + return 0;
> > > +}
> >
> > The code below can't get to this function without
> > __KVM_HAVE_ARCH_KVM_VFIO_POST, but this seems like it should return an
> > error if not implemented.
>
> kvm_arch_vfio_update_pi_irte() is called by kvm_vfio_control_pi(), if we remove the
> dummy definition of kvm_arch_vfio_update_pi_irte(), kvm_vfio_control_pi() is also
> needed to be included in __KVM_HAVE_ARCH_KVM_VFIO_POST, I will handle this
> in the next version.

But typically we wouldn't return success for a function that's not
implemented.

> > > +#endif
> > > +
> > > #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> > >
> > > static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool
> > val)
> > > diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> > > index 80a45e4..547fc51 100644
> > > --- a/virt/kvm/vfio.c
> > > +++ b/virt/kvm/vfio.c
> > > @@ -18,6 +18,7 @@
> > > #include <linux/slab.h>
> > > #include <linux/uaccess.h>
> > > #include <linux/vfio.h>
> > > +#include <asm/irq_remapping.h>
> >
> > This only exists on x86.
>
> But in kvm_vfio_has_attr(), we can only return 0 when posted interrupt is supported
> via calling " irq_remapping_cap(IRQ_POSTING_CAP)" which needs this header file.
> Do you think how can I handle this?

Further abstraction, #ifdef... This file is not x86 specific.

> > Are we also getting lucky with some of the
> > include chains that give us the PCI related defines? It looks like
> > we're implicitly assuming CONFIG_PCI
>
> Yes, I think the PCI related header files are included implicitly here. Anyway
> I can add "#include <linux/pci.h>" explicitly.

Ok, but that doesn't solve the problem that this file should not assume
CONFIG_PCI. Posted interrupts functionality may requires PCI, but
neither KVM nor VFIO necessarily do and therefore the kvm-vfio interface
cannot assume it.

> > > #include "vfio.h"
> > >
> > > struct kvm_vfio_group {
> > > @@ -276,12 +277,128 @@ static int kvm_vfio_set_group(struct kvm_device
> > *dev, long attr, u64 arg)
> > > return -ENXIO;
> > > }
> > >
> > > +static int kvm_vfio_pci_get_irq_count(struct pci_dev *pdev, int irq_type)
> > > +{
> > > + if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> > > + u8 pin;
> > > +
> > > + pci_read_config_byte(pdev, PCI_INTERRUPT_PIN, &pin);
> > > + if (pin)
> > > + return 1;
> > > + } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> > > + return pci_msi_vec_count(pdev);
> > > + } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) {
> > > + return pci_msix_vec_count(pdev);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int kvm_vfio_control_pi(struct kvm_device *kdev,
> > > + int32_t __user *argp, bool set)
> > > +{
> > > + struct kvm_vfio_dev_irq pi_info;
> > > + uint32_t *gsi;
> > > + unsigned long minsz;
> > > + struct vfio_device *vdev;
> > > + struct msi_desc *entry;
> > > + struct device *dev;
> > > + struct pci_dev *pdev;
> > > + int i, max, ret;
> > > +
> > > + minsz = offsetofend(struct kvm_vfio_dev_irq, count);
> > > +
> > > + if (copy_from_user(&pi_info, (void __user *)argp, minsz))
> > > + return -EFAULT;
> > > +
> > > + if (pi_info.argsz < minsz || pi_info.index >= VFIO_PCI_NUM_IRQS)
> > > + return -EINVAL;
> >
> > Could we also abort on pi_info.count == 0?
>
> Yes, that is a good point.
>
> >
> > > +
> > > + vdev = kvm_vfio_get_vfio_device(pi_info.fd);
> > > + if (IS_ERR(vdev))
> > > + return PTR_ERR(vdev);
> > > +
> > > + dev = kvm_vfio_external_base_device(vdev);
> > > + if (!dev || !dev_is_pci(dev)) {
> > > + ret = -EFAULT;
> > > + goto put_vfio_device;
> > > + }
> > > +
> > > + pdev = to_pci_dev(dev);
> > > +
> > > + max = kvm_vfio_pci_get_irq_count(pdev, pi_info.index);
> > > + if (max <= 0) {
> > > + ret = -EFAULT;
> > > + goto put_vfio_device;
> > > + }
> > > +
> > > + if (pi_info.argsz - minsz < pi_info.count * sizeof(u32) ||
> > > + pi_info.start >= max || pi_info.start + pi_info.count > max) {
> > > + ret = -EINVAL;
> > > + goto put_vfio_device;
> > > + }
> > > +
> > > + gsi = memdup_user((void __user *)((unsigned long)argp + minsz),
> > > + pi_info.count * sizeof(u32));
> > > + if (IS_ERR(gsi)) {
> > > + ret = PTR_ERR(gsi);
> > > + goto put_vfio_device;
> > > + }
> > > +
> > > +#ifdef CONFIG_PCI_MSI
> > > + for (i = 0; i < pi_info.count; i++) {
> > > + list_for_each_entry(entry, &pdev->msi_list, list) {
> >
> > Should we be able to get here for INTx?
>
> We only support PI for MSI/MSIx. I think I can return earlier in this function if
> the index is not VFIO_PCI_MSI_IRQ_INDEX or VFIO_PCI_MSIX_IRQ_INDEX,
> is this okay for you?

Yes, preferably early enough that you don't need to read the pin
register to determine if the device supports INTx.

> >
> > > + if (entry->msi_attrib.entry_nr != pi_info.start+i)
> > > + continue;
> > > +
> > > + ret = kvm_arch_vfio_update_pi_irte(kdev->kvm,
> > > + entry->irq,
> > > + gsi[i],
> > > + set);
> > > + if (ret)
> > > + goto free_gsi;
> > > + }
> > > + }
> > > +#endif
> > > +
> > > + ret = 0;
> >
> > So if we didn't do anything, return success? That seems strange.
> > Should we also be doing some unwind on failure? Thanks,
> >
> I can't think of what I need to do on failure. Do you have any ideas?

This is a user interface, so I think a user would expect that success
means that all of the parameters were correct and all of the indexes
were configured. If something generated an error, everything should be
returned to the initial state and an error returned. That may mean that
we need to reverse the loop and unset things that we set, or set things
that were unset. Thanks,

Alex

> > > +free_gsi:
> > > + kfree(gsi);
> > > +
> > > +put_vfio_device:
> > > + kvm_vfio_put_vfio_device(vdev);
> > > + return ret;
> > > +}
> > > +
> > > +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> > > +{
> > > + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> > > + int ret;
> > > +
> > > + switch (attr) {
> > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> > > + case KVM_DEV_VFIO_DEVICE_POST_IRQ:
> > > + ret = kvm_vfio_control_pi(kdev, argp, 1);
> > > + break;
> > > + case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
> > > + ret = kvm_vfio_control_pi(kdev, argp, 0);
> > > + break;
> > > +#endif
> > > + default:
> > > + ret = -ENXIO;
> > > + }
> > > + return ret;
> > > +}
> > > +
> > > static int kvm_vfio_set_attr(struct kvm_device *dev,
> > > struct kvm_device_attr *attr)
> > > {
> > > switch (attr->group) {
> > > case KVM_DEV_VFIO_GROUP:
> > > return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> > > + case KVM_DEV_VFIO_DEVICE:
> > > + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> > > }
> > >
> > > return -ENXIO;
> > > @@ -299,6 +416,15 @@ static int kvm_vfio_has_attr(struct kvm_device
> > *dev,
> > > }
> > >
> > > break;
> > > + case KVM_DEV_VFIO_DEVICE:
> > > + switch (attr->attr) {
> > > +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_POST
> > > + case KVM_DEV_VFIO_DEVICE_POST_IRQ:
> > > + case KVM_DEV_VFIO_DEVICE_UNPOST_IRQ:
> > > + return irq_remapping_cap(IRQ_POSTING_CAP) ? 0 : -ENXIO;
> > > +#endif
> > > + }
> > > + break;
> > > }
> > >
> > > return -ENXIO;
> >
> >
>
> NrybXǧv^)޺{.n+hܨ}Ơz&j:+vzZ++zfh~izw?&)ߢf


2015-06-12 15:41:38

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
>
> > -----Original Message-----
> > From: Avi Kivity [mailto:[email protected]]
> > Sent: Friday, June 12, 2015 3:59 AM
> > To: Wu, Feng; [email protected]; [email protected]
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> >
> > On 06/11/2015 01:51 PM, Feng Wu wrote:
> > > From: Eric Auger <[email protected]>
> > >
> > > This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> > > and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > > KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> > > to set a VFIO device IRQ as forwarded or not forwarded.
> > > the command takes as argument a handle to a new struct named
> > > kvm_vfio_dev_irq.
> >
> > Is there no way to do this automatically? After all, vfio knows that a
> > device interrupt is forwarded to some eventfd, and kvm knows that some
> > eventfd is forwarded to a guest interrupt. If they compare notes
> > through a central registry, they can figure out that the interrupt needs
> > to be forwarded.
>
> Oh, just like Eric mentioned in his reply, this description is out of context of
> this series, I will remove them in the next version.


I suspect Avi's question was more general. While forward/unforward is
out of context for this series, it's very similar in nature to
enabling/disabling posted interrupts. So I think the question remains
whether we really need userspace to participate in creating this
shortcut or if kvm and vfio can some how orchestrate figuring it out
automatically.

Personally I don't know how we could do it automatically. We've always
relied on userspace to independently setup vfio and kvm such that
neither have any idea that the other is there and update each side
independently when anything changes. So it seems consistent to continue
that here. It doesn't seem like there's much to gain performance-wise
either, updates should be a relatively rare event I'd expect.

There's really no metadata associated with an eventfd, so "comparing
notes" automatically might imply some central registration entity. That
immediately sounds like a much more complex solution, but maybe Avi has
some ideas to manage it. Thanks,

Alex

> > > Signed-off-by: Eric Auger <[email protected]>
> > > ---
> > > Documentation/virtual/kvm/devices/vfio.txt | 34
> > ++++++++++++++++++++++++------
> > > include/uapi/linux/kvm.h | 12 +++++++++++
> > > 2 files changed, 40 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/Documentation/virtual/kvm/devices/vfio.txt
> > b/Documentation/virtual/kvm/devices/vfio.txt
> > > index ef51740..6186e6d 100644
> > > --- a/Documentation/virtual/kvm/devices/vfio.txt
> > > +++ b/Documentation/virtual/kvm/devices/vfio.txt
> > > @@ -4,15 +4,20 @@ VFIO virtual device
> > > Device types supported:
> > > KVM_DEV_TYPE_VFIO
> > >
> > > -Only one VFIO instance may be created per VM. The created device
> > > -tracks VFIO groups in use by the VM and features of those groups
> > > -important to the correctness and acceleration of the VM. As groups
> > > -are enabled and disabled for use by the VM, KVM should be updated
> > > -about their presence. When registered with KVM, a reference to the
> > > -VFIO-group is held by KVM.
> > > +Only one VFIO instance may be created per VM.
> > > +
> > > +The created device tracks VFIO groups in use by the VM and features
> > > +of those groups important to the correctness and acceleration of
> > > +the VM. As groups are enabled and disabled for use by the VM, KVM
> > > +should be updated about their presence. When registered with KVM,
> > > +a reference to the VFIO-group is held by KVM.
> > > +
> > > +The device also enables to control some IRQ settings of VFIO devices:
> > > +forwarding/posting.
> > >
> > > Groups:
> > > KVM_DEV_VFIO_GROUP
> > > + KVM_DEV_VFIO_DEVICE
> > >
> > > KVM_DEV_VFIO_GROUP attributes:
> > > KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device
> > tracking
> > > @@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
> > >
> > > For each, kvm_device_attr.addr points to an int32_t file descriptor
> > > for the VFIO group.
> > > +
> > > +KVM_DEV_VFIO_DEVICE attributes:
> > > + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as
> > forwarded
> > > + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as
> > not forwarded
> > > +
> > > +For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
> > > +
> > > +When forwarded, a physical IRQ is completed by the guest and not by the
> > > +host. This requires HW support in the interrupt controller.
> > > +
> > > +Forwarding can only be set when the corresponding VFIO IRQ is not masked
> > > +(would it be through VFIO_DEVICE_SET_IRQS command or as a consequence
> > of this
> > > +IRQ being currently handled) or active at interrupt controller level.
> > > +In such a situation, -EAGAIN is returned. It is advised to to set the
> > > +forwarding before the VFIO signaling is set up, this avoids trial and errors.
> > > +
> > > +Unforwarding can happen at any time.
> > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > > index 4b60056..798f3e4 100644
> > > --- a/include/uapi/linux/kvm.h
> > > +++ b/include/uapi/linux/kvm.h
> > > @@ -999,6 +999,9 @@ struct kvm_device_attr {
> > > #define KVM_DEV_VFIO_GROUP 1
> > > #define KVM_DEV_VFIO_GROUP_ADD 1
> > > #define KVM_DEV_VFIO_GROUP_DEL 2
> > > +#define KVM_DEV_VFIO_DEVICE 2
> > > +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
> > > +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
> > >
> > > enum kvm_device_type {
> > > KVM_DEV_TYPE_FSL_MPIC_20 = 1,
> > > @@ -1018,6 +1021,15 @@ enum kvm_device_type {
> > > KVM_DEV_TYPE_MAX,
> > > };
> > >
> > > +struct kvm_vfio_dev_irq {
> > > + __u32 argsz; /* structure length */
> > > + __u32 fd; /* file descriptor of the VFIO device */
> > > + __u32 index; /* VFIO device IRQ index */
> > > + __u32 start; /* start of subindex range */
> > > + __u32 count; /* size of subindex range */
> > > + __u32 gsi[]; /* gsi, ie. virtual IRQ number */
> > > +};
> > > +
> > > /*
> > > * ioctls for VM fds
> > > */
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2015-06-12 18:49:01

by Avi Kivity

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On 06/12/2015 06:41 PM, Alex Williamson wrote:
> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
>>> -----Original Message-----
>>> From: Avi Kivity [mailto:[email protected]]
>>> Sent: Friday, June 12, 2015 3:59 AM
>>> To: Wu, Feng; [email protected]; [email protected]
>>> Cc: [email protected]; [email protected];
>>> [email protected]; [email protected]
>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>>>
>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
>>>> From: Eric Auger <[email protected]>
>>>>
>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
>>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
>>>> to set a VFIO device IRQ as forwarded or not forwarded.
>>>> the command takes as argument a handle to a new struct named
>>>> kvm_vfio_dev_irq.
>>> Is there no way to do this automatically? After all, vfio knows that a
>>> device interrupt is forwarded to some eventfd, and kvm knows that some
>>> eventfd is forwarded to a guest interrupt. If they compare notes
>>> through a central registry, they can figure out that the interrupt needs
>>> to be forwarded.
>> Oh, just like Eric mentioned in his reply, this description is out of context of
>> this series, I will remove them in the next version.
>
> I suspect Avi's question was more general. While forward/unforward is
> out of context for this series, it's very similar in nature to
> enabling/disabling posted interrupts. So I think the question remains
> whether we really need userspace to participate in creating this
> shortcut or if kvm and vfio can some how orchestrate figuring it out
> automatically.
>
> Personally I don't know how we could do it automatically. We've always
> relied on userspace to independently setup vfio and kvm such that
> neither have any idea that the other is there and update each side
> independently when anything changes. So it seems consistent to continue
> that here. It doesn't seem like there's much to gain performance-wise
> either, updates should be a relatively rare event I'd expect.
>
> There's really no metadata associated with an eventfd, so "comparing
> notes" automatically might imply some central registration entity. That
> immediately sounds like a much more complex solution, but maybe Avi has
> some ideas to manage it. Thanks,
>

The idea is to have a central registry maintained by a posted interrupts
manager. Both vfio and kvm pass the filp (along with extra information)
to the posted interrupts manager, which, when it detects a filp match,
tells each of them what to do.

The advantages are:
- old userspace gains the optimization without change
- a userspace API is more expensive to maintain than internal kernel
interfaces (CVEs, documentation, maintaining backwards compatibility)
- if you can do it without a new interface, this indicates that all the
information in the new interface is redundant. That means you have to
check it for consistency with the existing information, so it's extra
work (likely, it's exactly what the posted interrupt manager would be
doing anyway).

2015-06-12 19:03:38

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> >>> -----Original Message-----
> >>> From: Avi Kivity [mailto:[email protected]]
> >>> Sent: Friday, June 12, 2015 3:59 AM
> >>> To: Wu, Feng; [email protected]; [email protected]
> >>> Cc: [email protected]; [email protected];
> >>> [email protected]; [email protected]
> >>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> >>>
> >>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> >>>> From: Eric Auger <[email protected]>
> >>>>
> >>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> >>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> >>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> >>>> to set a VFIO device IRQ as forwarded or not forwarded.
> >>>> the command takes as argument a handle to a new struct named
> >>>> kvm_vfio_dev_irq.
> >>> Is there no way to do this automatically? After all, vfio knows that a
> >>> device interrupt is forwarded to some eventfd, and kvm knows that some
> >>> eventfd is forwarded to a guest interrupt. If they compare notes
> >>> through a central registry, they can figure out that the interrupt needs
> >>> to be forwarded.
> >> Oh, just like Eric mentioned in his reply, this description is out of context of
> >> this series, I will remove them in the next version.
> >
> > I suspect Avi's question was more general. While forward/unforward is
> > out of context for this series, it's very similar in nature to
> > enabling/disabling posted interrupts. So I think the question remains
> > whether we really need userspace to participate in creating this
> > shortcut or if kvm and vfio can some how orchestrate figuring it out
> > automatically.
> >
> > Personally I don't know how we could do it automatically. We've always
> > relied on userspace to independently setup vfio and kvm such that
> > neither have any idea that the other is there and update each side
> > independently when anything changes. So it seems consistent to continue
> > that here. It doesn't seem like there's much to gain performance-wise
> > either, updates should be a relatively rare event I'd expect.
> >
> > There's really no metadata associated with an eventfd, so "comparing
> > notes" automatically might imply some central registration entity. That
> > immediately sounds like a much more complex solution, but maybe Avi has
> > some ideas to manage it. Thanks,
> >
>
> The idea is to have a central registry maintained by a posted interrupts
> manager. Both vfio and kvm pass the filp (along with extra information)
> to the posted interrupts manager, which, when it detects a filp match,
> tells each of them what to do.
>
> The advantages are:
> - old userspace gains the optimization without change
> - a userspace API is more expensive to maintain than internal kernel
> interfaces (CVEs, documentation, maintaining backwards compatibility)
> - if you can do it without a new interface, this indicates that all the
> information in the new interface is redundant. That means you have to
> check it for consistency with the existing information, so it's extra
> work (likely, it's exactly what the posted interrupt manager would be
> doing anyway).

Yep, those all sound like good things and I believe that's similar in
design to the way we had originally discussed this interaction at
LPC/KVM Forum several years ago. I'd be in favor of that approach.
Thanks,

Alex

2015-06-15 06:42:44

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Saturday, June 13, 2015 3:04 AM
> To: Avi Kivity
> Cc: Wu, Feng; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> > On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > > On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> > >>> -----Original Message-----
> > >>> From: Avi Kivity [mailto:[email protected]]
> > >>> Sent: Friday, June 12, 2015 3:59 AM
> > >>> To: Wu, Feng; [email protected]; [email protected]
> > >>> Cc: [email protected]; [email protected];
> > >>> [email protected]; [email protected]
> > >>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > >>>
> > >>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > >>>> From: Eric Auger <[email protected]>
> > >>>>
> > >>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> > >>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > >>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> > >>>> to set a VFIO device IRQ as forwarded or not forwarded.
> > >>>> the command takes as argument a handle to a new struct named
> > >>>> kvm_vfio_dev_irq.
> > >>> Is there no way to do this automatically? After all, vfio knows that a
> > >>> device interrupt is forwarded to some eventfd, and kvm knows that some
> > >>> eventfd is forwarded to a guest interrupt. If they compare notes
> > >>> through a central registry, they can figure out that the interrupt needs
> > >>> to be forwarded.
> > >> Oh, just like Eric mentioned in his reply, this description is out of context of
> > >> this series, I will remove them in the next version.
> > >
> > > I suspect Avi's question was more general. While forward/unforward is
> > > out of context for this series, it's very similar in nature to
> > > enabling/disabling posted interrupts. So I think the question remains
> > > whether we really need userspace to participate in creating this
> > > shortcut or if kvm and vfio can some how orchestrate figuring it out
> > > automatically.
> > >
> > > Personally I don't know how we could do it automatically. We've always
> > > relied on userspace to independently setup vfio and kvm such that
> > > neither have any idea that the other is there and update each side
> > > independently when anything changes. So it seems consistent to continue
> > > that here. It doesn't seem like there's much to gain performance-wise
> > > either, updates should be a relatively rare event I'd expect.
> > >
> > > There's really no metadata associated with an eventfd, so "comparing
> > > notes" automatically might imply some central registration entity. That
> > > immediately sounds like a much more complex solution, but maybe Avi has
> > > some ideas to manage it. Thanks,
> > >
> >
> > The idea is to have a central registry maintained by a posted interrupts
> > manager. Both vfio and kvm pass the filp (along with extra information)
> > to the posted interrupts manager, which, when it detects a filp match,
> > tells each of them what to do.
> >
> > The advantages are:
> > - old userspace gains the optimization without change
> > - a userspace API is more expensive to maintain than internal kernel
> > interfaces (CVEs, documentation, maintaining backwards compatibility)
> > - if you can do it without a new interface, this indicates that all the
> > information in the new interface is redundant. That means you have to
> > check it for consistency with the existing information, so it's extra
> > work (likely, it's exactly what the posted interrupt manager would be
> > doing anyway).
>
> Yep, those all sound like good things and I believe that's similar in
> design to the way we had originally discussed this interaction at
> LPC/KVM Forum several years ago. I'd be in favor of that approach.
> Thanks,

This seems a little complex compared to the current solution, since I am
not quite familiar with VFIO, Alex, can you help on this if we need to do this
that way, especially for the VFIO part?

Thanks,
Feng

>
> Alex

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-06-15 16:17:33

by Eric Auger

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

Hi Alex, all,
On 06/12/2015 09:03 PM, Alex Williamson wrote:
> On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
>> On 06/12/2015 06:41 PM, Alex Williamson wrote:
>>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
>>>>> -----Original Message-----
>>>>> From: Avi Kivity [mailto:[email protected]]
>>>>> Sent: Friday, June 12, 2015 3:59 AM
>>>>> To: Wu, Feng; [email protected]; [email protected]
>>>>> Cc: [email protected]; [email protected];
>>>>> [email protected]; [email protected]
>>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>>>>>
>>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
>>>>>> From: Eric Auger <[email protected]>
>>>>>>
>>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
>>>>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
>>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
>>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
>>>>>> the command takes as argument a handle to a new struct named
>>>>>> kvm_vfio_dev_irq.
>>>>> Is there no way to do this automatically? After all, vfio knows that a
>>>>> device interrupt is forwarded to some eventfd, and kvm knows that some
>>>>> eventfd is forwarded to a guest interrupt. If they compare notes
>>>>> through a central registry, they can figure out that the interrupt needs
>>>>> to be forwarded.
>>>> Oh, just like Eric mentioned in his reply, this description is out of context of
>>>> this series, I will remove them in the next version.
>>>
>>> I suspect Avi's question was more general. While forward/unforward is
>>> out of context for this series, it's very similar in nature to
>>> enabling/disabling posted interrupts. So I think the question remains
>>> whether we really need userspace to participate in creating this
>>> shortcut or if kvm and vfio can some how orchestrate figuring it out
>>> automatically.
>>>
>>> Personally I don't know how we could do it automatically. We've always
>>> relied on userspace to independently setup vfio and kvm such that
>>> neither have any idea that the other is there and update each side
>>> independently when anything changes. So it seems consistent to continue
>>> that here. It doesn't seem like there's much to gain performance-wise
>>> either, updates should be a relatively rare event I'd expect.
>>>
>>> There's really no metadata associated with an eventfd, so "comparing
>>> notes" automatically might imply some central registration entity. That
>>> immediately sounds like a much more complex solution, but maybe Avi has
>>> some ideas to manage it. Thanks,
>>>
>>
>> The idea is to have a central registry maintained by a posted interrupts
>> manager. Both vfio and kvm pass the filp (along with extra information)
>> to the posted interrupts manager, which, when it detects a filp match,
>> tells each of them what to do.
>>
>> The advantages are:
>> - old userspace gains the optimization without change
>> - a userspace API is more expensive to maintain than internal kernel
>> interfaces (CVEs, documentation, maintaining backwards compatibility)
>> - if you can do it without a new interface, this indicates that all the
>> information in the new interface is redundant. That means you have to
>> check it for consistency with the existing information, so it's extra
>> work (likely, it's exactly what the posted interrupt manager would be
>> doing anyway).
>
> Yep, those all sound like good things and I believe that's similar in
> design to the way we had originally discussed this interaction at
> LPC/KVM Forum several years ago. I'd be in favor of that approach.

I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
IRQ forward control" series? Or is that "central registry maintained by
a posted interrupts manager" something more specific to x86?

Thank you in advance

Best Regards

Eric

> Thanks,
>
> Alex
>

2015-06-15 16:45:10

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote:
> Hi Alex, all,
> On 06/12/2015 09:03 PM, Alex Williamson wrote:
> > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> >> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> >>>>> -----Original Message-----
> >>>>> From: Avi Kivity [mailto:[email protected]]
> >>>>> Sent: Friday, June 12, 2015 3:59 AM
> >>>>> To: Wu, Feng; [email protected]; [email protected]
> >>>>> Cc: [email protected]; [email protected];
> >>>>> [email protected]; [email protected]
> >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> >>>>>
> >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> >>>>>> From: Eric Auger <[email protected]>
> >>>>>>
> >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
> >>>>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> >>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
> >>>>>> the command takes as argument a handle to a new struct named
> >>>>>> kvm_vfio_dev_irq.
> >>>>> Is there no way to do this automatically? After all, vfio knows that a
> >>>>> device interrupt is forwarded to some eventfd, and kvm knows that some
> >>>>> eventfd is forwarded to a guest interrupt. If they compare notes
> >>>>> through a central registry, they can figure out that the interrupt needs
> >>>>> to be forwarded.
> >>>> Oh, just like Eric mentioned in his reply, this description is out of context of
> >>>> this series, I will remove them in the next version.
> >>>
> >>> I suspect Avi's question was more general. While forward/unforward is
> >>> out of context for this series, it's very similar in nature to
> >>> enabling/disabling posted interrupts. So I think the question remains
> >>> whether we really need userspace to participate in creating this
> >>> shortcut or if kvm and vfio can some how orchestrate figuring it out
> >>> automatically.
> >>>
> >>> Personally I don't know how we could do it automatically. We've always
> >>> relied on userspace to independently setup vfio and kvm such that
> >>> neither have any idea that the other is there and update each side
> >>> independently when anything changes. So it seems consistent to continue
> >>> that here. It doesn't seem like there's much to gain performance-wise
> >>> either, updates should be a relatively rare event I'd expect.
> >>>
> >>> There's really no metadata associated with an eventfd, so "comparing
> >>> notes" automatically might imply some central registration entity. That
> >>> immediately sounds like a much more complex solution, but maybe Avi has
> >>> some ideas to manage it. Thanks,
> >>>
> >>
> >> The idea is to have a central registry maintained by a posted interrupts
> >> manager. Both vfio and kvm pass the filp (along with extra information)
> >> to the posted interrupts manager, which, when it detects a filp match,
> >> tells each of them what to do.
> >>
> >> The advantages are:
> >> - old userspace gains the optimization without change
> >> - a userspace API is more expensive to maintain than internal kernel
> >> interfaces (CVEs, documentation, maintaining backwards compatibility)
> >> - if you can do it without a new interface, this indicates that all the
> >> information in the new interface is redundant. That means you have to
> >> check it for consistency with the existing information, so it's extra
> >> work (likely, it's exactly what the posted interrupt manager would be
> >> doing anyway).
> >
> > Yep, those all sound like good things and I believe that's similar in
> > design to the way we had originally discussed this interaction at
> > LPC/KVM Forum several years ago. I'd be in favor of that approach.
>
> I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> IRQ forward control" series? Or is that "central registry maintained by
> a posted interrupts manager" something more specific to x86?

I'd think we'd want it for any sort of offload and supporting both
posted-interrupts and irq-forwarding would be a good validation. I
imagine there would be registration/de-registration callbacks separate
for interrupt producers vs interrupt consumers. Each registration
function would likely provide a struct of callbacks, probably similar to
the get_symbol callbacks proposed for the kvm-vfio device on the IRQ
producer side. The eventfd would be the token that the manager would
use to match producers and consumers. The hard part is probably
figuring out what information to retrieve from the producer and provide
to the consumer in a generic way between pci and platform, but as an
internal interface, it's not a big deal if we screw it up a few times to
start. Thanks,

Alex

2015-06-18 09:16:35

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Tuesday, June 16, 2015 12:45 AM
> To: Eric Auger
> Cc: Avi Kivity; Wu, Feng; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote:
> > Hi Alex, all,
> > On 06/12/2015 09:03 PM, Alex Williamson wrote:
> > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> > >> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> > >>>>> -----Original Message-----
> > >>>>> From: Avi Kivity [mailto:[email protected]]
> > >>>>> Sent: Friday, June 12, 2015 3:59 AM
> > >>>>> To: Wu, Feng; [email protected]; [email protected]
> > >>>>> Cc: [email protected]; [email protected];
> > >>>>> [email protected]; [email protected]
> > >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > >>>>>
> > >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > >>>>>> From: Eric Auger <[email protected]>
> > >>>>>>
> > >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE
> group
> > >>>>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be
> able
> > >>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
> > >>>>>> the command takes as argument a handle to a new struct named
> > >>>>>> kvm_vfio_dev_irq.
> > >>>>> Is there no way to do this automatically? After all, vfio knows that a
> > >>>>> device interrupt is forwarded to some eventfd, and kvm knows that
> some
> > >>>>> eventfd is forwarded to a guest interrupt. If they compare notes
> > >>>>> through a central registry, they can figure out that the interrupt needs
> > >>>>> to be forwarded.
> > >>>> Oh, just like Eric mentioned in his reply, this description is out of context
> of
> > >>>> this series, I will remove them in the next version.
> > >>>
> > >>> I suspect Avi's question was more general. While forward/unforward is
> > >>> out of context for this series, it's very similar in nature to
> > >>> enabling/disabling posted interrupts. So I think the question remains
> > >>> whether we really need userspace to participate in creating this
> > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out
> > >>> automatically.
> > >>>
> > >>> Personally I don't know how we could do it automatically. We've always
> > >>> relied on userspace to independently setup vfio and kvm such that
> > >>> neither have any idea that the other is there and update each side
> > >>> independently when anything changes. So it seems consistent to
> continue
> > >>> that here. It doesn't seem like there's much to gain performance-wise
> > >>> either, updates should be a relatively rare event I'd expect.
> > >>>
> > >>> There's really no metadata associated with an eventfd, so "comparing
> > >>> notes" automatically might imply some central registration entity. That
> > >>> immediately sounds like a much more complex solution, but maybe Avi
> has
> > >>> some ideas to manage it. Thanks,
> > >>>
> > >>
> > >> The idea is to have a central registry maintained by a posted interrupts
> > >> manager. Both vfio and kvm pass the filp (along with extra information)
> > >> to the posted interrupts manager, which, when it detects a filp match,
> > >> tells each of them what to do.
> > >>
> > >> The advantages are:
> > >> - old userspace gains the optimization without change
> > >> - a userspace API is more expensive to maintain than internal kernel
> > >> interfaces (CVEs, documentation, maintaining backwards compatibility)
> > >> - if you can do it without a new interface, this indicates that all the
> > >> information in the new interface is redundant. That means you have to
> > >> check it for consistency with the existing information, so it's extra
> > >> work (likely, it's exactly what the posted interrupt manager would be
> > >> doing anyway).
> > >
> > > Yep, those all sound like good things and I believe that's similar in
> > > design to the way we had originally discussed this interaction at
> > > LPC/KVM Forum several years ago. I'd be in favor of that approach.
> >
> > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > IRQ forward control" series? Or is that "central registry maintained by
> > a posted interrupts manager" something more specific to x86?
>
> I'd think we'd want it for any sort of offload and supporting both
> posted-interrupts and irq-forwarding would be a good validation. I
> imagine there would be registration/de-registration callbacks separate
> for interrupt producers vs interrupt consumers. Each registration
> function would likely provide a struct of callbacks, probably similar to
> the get_symbol callbacks proposed for the kvm-vfio device on the IRQ
> producer side. The eventfd would be the token that the manager would
> use to match producers and consumers. The hard part is probably
> figuring out what information to retrieve from the producer and provide
> to the consumer in a generic way between pci and platform, but as an
> internal interface, it's not a big deal if we screw it up a few times to
> start. Thanks,

On posted-interrupts side, the main purpose of the new APIs is to update
the IRTE when guest changes vMSI/vMSIx configuration. Alex, do you have
any detailed ideas for the new solution to achieve this purpose? It should
be helpful if you can share some!

Thanks,
Feng

>
> Alex

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-06-18 20:04:19

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

[Adding Joerg since he was part of this original idea]

On Thu, 2015-06-18 at 09:16 +0000, Wu, Feng wrote:
>
>
> > -----Original Message-----
> > From: Alex Williamson [mailto:[email protected]]
> > Sent: Tuesday, June 16, 2015 12:45 AM
> > To: Eric Auger
> > Cc: Avi Kivity; Wu, Feng; [email protected]; [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> >
> > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote:
> > > Hi Alex, all,
> > > On 06/12/2015 09:03 PM, Alex Williamson wrote:
> > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > > >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> > > >>>>> -----Original Message-----
> > > >>>>> From: Avi Kivity [mailto:[email protected]]
> > > >>>>> Sent: Friday, June 12, 2015 3:59 AM
> > > >>>>> To: Wu, Feng; [email protected]; [email protected]
> > > >>>>> Cc: [email protected]; [email protected];
> > > >>>>> [email protected]; [email protected]
> > > >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > > >>>>>
> > > >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > > >>>>>> From: Eric Auger <[email protected]>
> > > >>>>>>
> > > >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE
> > group
> > > >>>>>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > > >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be
> > able
> > > >>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
> > > >>>>>> the command takes as argument a handle to a new struct named
> > > >>>>>> kvm_vfio_dev_irq.
> > > >>>>> Is there no way to do this automatically? After all, vfio knows that a
> > > >>>>> device interrupt is forwarded to some eventfd, and kvm knows that
> > some
> > > >>>>> eventfd is forwarded to a guest interrupt. If they compare notes
> > > >>>>> through a central registry, they can figure out that the interrupt needs
> > > >>>>> to be forwarded.
> > > >>>> Oh, just like Eric mentioned in his reply, this description is out of context
> > of
> > > >>>> this series, I will remove them in the next version.
> > > >>>
> > > >>> I suspect Avi's question was more general. While forward/unforward is
> > > >>> out of context for this series, it's very similar in nature to
> > > >>> enabling/disabling posted interrupts. So I think the question remains
> > > >>> whether we really need userspace to participate in creating this
> > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out
> > > >>> automatically.
> > > >>>
> > > >>> Personally I don't know how we could do it automatically. We've always
> > > >>> relied on userspace to independently setup vfio and kvm such that
> > > >>> neither have any idea that the other is there and update each side
> > > >>> independently when anything changes. So it seems consistent to
> > continue
> > > >>> that here. It doesn't seem like there's much to gain performance-wise
> > > >>> either, updates should be a relatively rare event I'd expect.
> > > >>>
> > > >>> There's really no metadata associated with an eventfd, so "comparing
> > > >>> notes" automatically might imply some central registration entity. That
> > > >>> immediately sounds like a much more complex solution, but maybe Avi
> > has
> > > >>> some ideas to manage it. Thanks,
> > > >>>
> > > >>
> > > >> The idea is to have a central registry maintained by a posted interrupts
> > > >> manager. Both vfio and kvm pass the filp (along with extra information)
> > > >> to the posted interrupts manager, which, when it detects a filp match,
> > > >> tells each of them what to do.
> > > >>
> > > >> The advantages are:
> > > >> - old userspace gains the optimization without change
> > > >> - a userspace API is more expensive to maintain than internal kernel
> > > >> interfaces (CVEs, documentation, maintaining backwards compatibility)
> > > >> - if you can do it without a new interface, this indicates that all the
> > > >> information in the new interface is redundant. That means you have to
> > > >> check it for consistency with the existing information, so it's extra
> > > >> work (likely, it's exactly what the posted interrupt manager would be
> > > >> doing anyway).
> > > >
> > > > Yep, those all sound like good things and I believe that's similar in
> > > > design to the way we had originally discussed this interaction at
> > > > LPC/KVM Forum several years ago. I'd be in favor of that approach.
> > >
> > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > > IRQ forward control" series? Or is that "central registry maintained by
> > > a posted interrupts manager" something more specific to x86?
> >
> > I'd think we'd want it for any sort of offload and supporting both
> > posted-interrupts and irq-forwarding would be a good validation. I
> > imagine there would be registration/de-registration callbacks separate
> > for interrupt producers vs interrupt consumers. Each registration
> > function would likely provide a struct of callbacks, probably similar to
> > the get_symbol callbacks proposed for the kvm-vfio device on the IRQ
> > producer side. The eventfd would be the token that the manager would
> > use to match producers and consumers. The hard part is probably
> > figuring out what information to retrieve from the producer and provide
> > to the consumer in a generic way between pci and platform, but as an
> > internal interface, it's not a big deal if we screw it up a few times to
> > start. Thanks,
>
> On posted-interrupts side, the main purpose of the new APIs is to update
> the IRTE when guest changes vMSI/vMSIx configuration. Alex, do you have
> any detailed ideas for the new solution to achieve this purpose? It should
> be helpful if you can share some!


There are plenty of details to be filled in, but I think the basics
looks something like the code below. The IRQ bypass manager just
defines a pair of structures, one for interrupt producers and one for
interrupt consumers. I'm certain that we'll need more callbacks than
I've defined below, but figuring out what those should be for the best
abstraction is the hardest part of this idea. The manager provides both
registration and de-registration interfaces for both types of objects
and keeps lists for each, protected by a lock. The manager doesn't even
really need to know what the match token is, but I assume for our
purposes it will be an eventfd_ctx.

On the vfio side, the producer struct would be embedded in the
vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct
in _irqfd. As I've coded below, the IRQ bypass manager calls the
consumer callbacks, so the producer struct would need fields or
callbacks to provide the consumer the info it needs. AIUI the Posted
Interrupt model, VFIO only needs to provide data to the consumer. For
IRQ Forwarding, I think the producer needs to be informed when bypass is
active to model the incoming interrupt as edge vs level.

I've prototyped the base IRQ bypass manager here as static, but I don't
see any reason it couldn't be a module that's loaded by dependency when
either vfio-pci or kvm-intel is loaded (or other producer/consumer
objects).

Is this a reasonable starting point to craft the additional fields and
callbacks and interaction of who calls who that we need to support
Posted Interrupts and IRQ Forwarding? Is the AMD version of this still
alive? Thanks,

Alex

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 413a7bf..22f6fcb 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -61,6 +61,7 @@ config KVM_INTEL
depends on KVM
# for perf_guest_get_msrs():
depends on CPU_SUP_INTEL
+ select IRQ_BYPASS_MANAGER
---help---
Provides support for KVM on Intel processors equipped with the VT
extensions.
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 579d83b..02912f1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PCI
tristate "VFIO support for PCI devices"
depends on VFIO && PCI && EVENTFD
select VFIO_VIRQFD
+ select IRQ_BYPASS_MANAGER
help
Support for the PCI VFIO bus driver. This is required to make
use of PCI drivers using the VFIO framework.
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..4e053be 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)

if (vdev->ctx[0].trigger) {
free_irq(pdev->irq, vdev);
+ /* irq_bypass_unregister_producer(); */
kfree(vdev->ctx[0].name);
eventfd_ctx_put(vdev->ctx[0].trigger);
vdev->ctx[0].trigger = NULL;
@@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
return ret;
}

+ /* irq_bypass_register_producer(); */
+
/*
* INTx disable will stick across the new irq setup,
* disable_irq won't.
@@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,

if (vdev->ctx[vector].trigger) {
free_irq(irq, vdev->ctx[vector].trigger);
+ /* irq_bypass_unregister_producer(); */
kfree(vdev->ctx[vector].name);
eventfd_ctx_put(vdev->ctx[vector].trigger);
vdev->ctx[vector].trigger = NULL;
@@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return ret;
}

+ /* irq_bypass_register_producer(); */
+
vdev->ctx[vector].trigger = trigger;

return 0;
diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
new file mode 100644
index 0000000..718508e
--- /dev/null
+++ b/include/linux/irqbypass.h
@@ -0,0 +1,23 @@
+#ifndef IRQBYPASS_H
+#define IRQBYPASS_H
+
+#include <linux/list.h>
+
+struct irq_bypass_producer {
+ struct list_head node;
+ void *token;
+ /* TBD */
+};
+
+struct irq_bypass_consumer {
+ struct list_head node;
+ void *token;
+ void (*add_producer)(struct irq_bypass_producer *);
+ void (*del_producer)(struct irq_bypass_producer *);
+};
+
+int irq_bypass_register_producer(struct irq_bypass_producer *);
+void irq_bypass_unregister_producer(struct irq_bypass_producer *);
+int irq_bypass_register_consumer(struct irq_bypass_consumer *);
+void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
+#endif /* IRQBYPASS_H */
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 9a76e3b..4502cdc 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -100,4 +100,7 @@ config SPARSE_IRQ

If you don't know what to do here, say N.

+config IRQ_BYPASS_MANAGER
+ bool
+
endmenu
diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
index d121235..a30ed77 100644
--- a/kernel/irq/Makefile
+++ b/kernel/irq/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
obj-$(CONFIG_PM_SLEEP) += pm.o
obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
+obj-$(CONFIG_IRQ_BYPASS_MANAGER) += bypass.o
diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c
new file mode 100644
index 0000000..5d0f92b
--- /dev/null
+++ b/kernel/irq/bypass.c
@@ -0,0 +1,116 @@
+/*
+ * IRQ offload/bypass manager
+ *
+ * Various virtualization hardware acceleration techniques allow bypassing
+ * or offloading interrupts receieved from devices around the host kernel.
+ * Posted Interrupts on Intel VT-d systems can allow interrupts to be
+ * recieved directly by a virtual machine. ARM IRQ Forwarding can allow
+ * level triggered device interrupts to be de-asserted directly by the VM.
+ * This manager allows interrupt producers and consumers to find each other
+ * to enable this sort of bypass.
+ */
+
+#include <linux/irqbypass.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+
+static LIST_HEAD(producers);
+static LIST_HEAD(consumers);
+static DEFINE_MUTEX(lock);
+
+int irq_bypass_register_producer(struct irq_bypass_producer *producer)
+{
+ struct irq_bypass_producer *tmp;
+ struct irq_bypass_consumer *consumer;
+ int ret = 0;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(tmp, &producers, node) {
+ if (tmp->token == producer->token) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ }
+
+ list_add(&producer->node, &producers);
+
+ list_for_each_entry(consumer, &consumers, node) {
+ if (consumer->token == producer->token) {
+ consumer->add_producer(producer);
+ break;
+ }
+ }
+unlock:
+ mutex_unlock(&lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
+
+void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
+{
+ struct irq_bypass_consumer *consumer;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(consumer, &consumers, node) {
+ if (consumer->token == producer->token) {
+ consumer->del_producer(producer);
+ break;
+ }
+ }
+
+ list_del(&producer->node);
+
+ mutex_unlock(&lock);
+}
+EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
+
+int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
+{
+ struct irq_bypass_consumer *tmp;
+ struct irq_bypass_producer *producer;
+ int ret = 0;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(tmp, &consumers, node) {
+ if (tmp->token == consumer->token) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ }
+
+ list_add(&consumer->node, &consumers);
+
+ list_for_each_entry(producer, &producers, node) {
+ if (producer->token == consumer->token) {
+ consumer->add_producer(producer);
+ break;
+ }
+ }
+unlock:
+ mutex_unlock(&lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
+
+void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
+{
+ struct irq_bypass_producer *producer;
+
+ mutex_lock(&lock);
+
+ list_for_each_entry(producer, &producers, node) {
+ if (producer->token == consumer->token) {
+ consumer->del_producer(producer);
+ break;
+ }
+ }
+
+ list_del(&consumer->node);
+
+ mutex_unlock(&lock);
+}
+EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..f3da161 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -429,6 +429,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
*/
fdput(f);

+ /* irq_bypass_register_consumer(); */
+
return 0;

fail:
@@ -528,6 +530,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
struct _irqfd *irqfd, *tmp;
struct eventfd_ctx *eventfd;

+ /* irq_bypass_unregister_consumer() */
+
eventfd = eventfd_ctx_fdget(args->fd);
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);


2015-06-23 15:50:04

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [v4 01/16] KVM: Extend struct pi_desc for VT-d Posted-Interrupts



On 11/06/2015 12:51, Feng Wu wrote:
> + union {
> + struct {
> + /* bit 256 - Outstanding Notification */
> + u64 on : 1,
> + /* bit 257 - Suppress Notification */
> + sn : 1,
> + /* bit 271:258 - Reserved */
> + rsvd_1 : 14,
> + /* bit 279:272 - Notification Vector */
> + nv : 8,
> + /* bit 287:280 - Reserved */
> + rsvd_2 : 8,
> + /* bit 319:288 - Notification Destination */
> + ndst : 32;

Please do not use bitfields, Linus doesn't like them. Use static
inlines instead to extract or set the fields from the control field.

Paolo

> + };
> + u64 control;
> + };
> + u32 rsvd[6];

2015-06-23 16:00:51

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [v4 12/16] KVM: kvm-vfio: implement the VFIO skeleton for VT-d Posted-Interrupts



On 12/06/2015 16:51, Alex Williamson wrote:
> > > The code below can't get to this function without
> > > __KVM_HAVE_ARCH_KVM_VFIO_POST, but this seems like it should return an
> > > error if not implemented.
> >
> > kvm_arch_vfio_update_pi_irte() is called by kvm_vfio_control_pi(), if we remove the
> > dummy definition of kvm_arch_vfio_update_pi_irte(), kvm_vfio_control_pi() is also
> > needed to be included in __KVM_HAVE_ARCH_KVM_VFIO_POST, I will handle this
> > in the next version.
>
> But typically we wouldn't return success for a function that's not
> implemented.

I agree. Moving the implementation to vmx.c would also let you make
kvm_arch_update_pi_irte() the only entry-point to the functionality in
vmx.c.

Paolo

2015-06-23 16:05:50

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [v4 15/16] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked



On 11/06/2015 12:51, Feng Wu wrote:
> - vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
> + /* Handle posted-interrupt when vCPU is to be halted */
> + if (!kvm_x86_ops->pi_pre_block ||
> + (kvm_x86_ops->pi_pre_block &&

This condition is unnecessary.

Paolo

> + kvm_x86_ops->pi_pre_block(vcpu) == 0))
> + vcpu->arch.mp_state = KVM_MP_STATE_HALTED;

2015-06-24 05:42:39

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 01/16] KVM: Extend struct pi_desc for VT-d Posted-Interrupts



> -----Original Message-----
> From: Paolo Bonzini [mailto:[email protected]]
> Sent: Tuesday, June 23, 2015 11:50 PM
> To: Wu, Feng; [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]
> Subject: Re: [v4 01/16] KVM: Extend struct pi_desc for VT-d Posted-Interrupts
>
>
>
> On 11/06/2015 12:51, Feng Wu wrote:
> > + union {
> > + struct {
> > + /* bit 256 - Outstanding Notification */
> > + u64 on : 1,
> > + /* bit 257 - Suppress Notification */
> > + sn : 1,
> > + /* bit 271:258 - Reserved */
> > + rsvd_1 : 14,
> > + /* bit 279:272 - Notification Vector */
> > + nv : 8,
> > + /* bit 287:280 - Reserved */
> > + rsvd_2 : 8,
> > + /* bit 319:288 - Notification Destination */
> > + ndst : 32;
>
> Please do not use bitfields, Linus doesn't like them. Use static
> inlines instead to extract or set the fields from the control field.

Do you mean we don't use bitfields at all, or the following format is
acceptable, in which, we use bitfileds as less as possible?

union {
struct
{
u16 on : 1, /* bit 256 - Outstanding Notification */
sn : 1, /* bit 257 - Suppress Notification */
rsvd_1 : 14; /* bit 271:258 - Reserved */
u8 nv; /* bit 279:272 - Notification Vector */
u8 rsvd_2; /* bit 287:280 - Reserved */
u32 ndst; /* bit 319:288 - Notification Destination */
};
u64 control;
};

Thanks,
Feng

>
> Paolo
>
> > + };
> > + u64 control;
> > + };
> > + u32 rsvd[6];

2015-06-24 08:46:21

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [v4 01/16] KVM: Extend struct pi_desc for VT-d Posted-Interrupts



On 24/06/2015 07:42, Wu, Feng wrote:
> Do you mean we don't use bitfields at all, or the following format is
> acceptable, in which, we use bitfileds as less as possible?

> union {
> struct
> {
> u16 on : 1, /* bit 256 - Outstanding Notification */
> sn : 1, /* bit 257 - Suppress Notification */
> rsvd_1 : 14; /* bit 271:258 - Reserved */
> u8 nv; /* bit 279:272 - Notification Vector */
> u8 rsvd_2; /* bit 287:280 - Reserved */
> u32 ndst; /* bit 319:288 - Notification Destination */
> };
> u64 control;

This is okay.

Paolo

2015-06-24 15:46:29

by Joerg Roedel

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
> There are plenty of details to be filled in,

I also need to fill plenty of details in my head first, so here are some
suggestions based on my current understanding. Please don't hesitate to
correct me if where I got something wrong.

So first I totally agree that the handling of PI/non-PI configurations
should be transparent to user-space.

I read a bit through the VT-d spec, and my understanding of posted
interrupts so far is that:

1) Each VCPU gets a PI-Descriptor with its pending Posted
Interrupts. This descriptor needs to be updated when a VCPU
is migrated to another PCPU and should thus be under control
of KVM.

This is similar to the vAPIC backing page in the AMD version
of this, except that the PCPU routing information is stored
somewhere else on AMD.

2) As long as the VCPU runs the IRTEs are configured for
posting, when the VCPU goes to sleep the old remapped entry is
established again. So when the VCPU sleeps the interrupt
would get routed to VFIO and forwarded through the eventfd.

This would be different to the AMD version, where we have a
running bit. When this is clear the IOMMU will trigger an event
in its event-log. This might need special handling in VFIO
('might' because VFIO does not need to forward the interrupt,
it just needs to make sure the VCPU wakes up).

Please correct me if my understanding of the Intel version is
wrong.

So most of the data structures the IOMMU reads for this need to be
updated from KVM code (either x86-generic or AMD/Intel specific code),
as KVM has the information about VCPU load/unload and the IRQ routing.

What KVM needs from VFIO are the informations about the physical
interrupts, and it makes total sense to attach them as metadata to the
eventfd.

But the problems start at how this metadata should look like. It would
be good to have some generic description, but not sure if this is
possible. Otherwise this metadata would need to be requested by VFIO
from the IOMMU driver and passed on to KVM, which it then passes back to
the IOMMU driver. Or something like that.



Joerg

2015-06-24 15:50:09

by Joerg Roedel

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Mon, Jun 15, 2015 at 06:17:03PM +0200, Eric Auger wrote:
> I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> IRQ forward control" series? Or is that "central registry maintained by
> a posted interrupts manager" something more specific to x86?

>From what I understood so far, the feature you implemented for ARM is a
bit different from the ones that get introduced to x86.

Can you please share some details on how the ARM version works? I am
interested in how the GICv2 is configured for IRQ forwarding. The
question is whether the forwarding information needs to be updated from
KVM and what information about the IRQ KVM needs for this.


Joerg

2015-06-24 16:25:37

by Eric Auger

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

Hi Joerg,

On 06/24/2015 05:50 PM, Joerg Roedel wrote:
> On Mon, Jun 15, 2015 at 06:17:03PM +0200, Eric Auger wrote:
>> I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
>> IRQ forward control" series? Or is that "central registry maintained by
>> a posted interrupts manager" something more specific to x86?
>
> From what I understood so far, the feature you implemented for ARM is a
> bit different from the ones that get introduced to x86.
>
> Can you please share some details on how the ARM version works? I am
> interested in how the GICv2 is configured for IRQ forwarding. The
> question is whether the forwarding information needs to be updated from
> KVM and what information about the IRQ KVM needs for this.

The principle is that when you inject a virtual IRQ to a guest, you
program a register in the GIC, known as a list register. There you put
both the virtual IRQ you want to inject but also the physical IRQ it is
linked with (HWbit mode set = forwarding set). When the guest completes
the virtual IRQ the GIC HW automatically deactivates the physical IRQ
found in the list register. In that mode the physical IRQ deactivation
is under the ownership of the guest (actually automatically done by the HW).

If HWbit mode is *not* set (forwarding not set), you do not specify the
HW IRQ in the list register. The host deactivates the physical IRQ &
masks it before triggering the virtual IRQ. Only the virtual IRQ ID is
programmed in the list register. When the guest completes the virtual
IRQ, a physical maintenance IRQ is triggered. The hyp mode is entered
and eventually the host unmasks the IRQ.

Some illustrations can be found in
http://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf

Hope it helps

Eric
>
>
> Joerg
>

2015-06-24 19:49:33

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Wed, 2015-06-24 at 18:25 +0200, Eric Auger wrote:
> Hi Joerg,
>
> On 06/24/2015 05:50 PM, Joerg Roedel wrote:
> > On Mon, Jun 15, 2015 at 06:17:03PM +0200, Eric Auger wrote:
> >> I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> >> IRQ forward control" series? Or is that "central registry maintained by
> >> a posted interrupts manager" something more specific to x86?
> >
> > From what I understood so far, the feature you implemented for ARM is a
> > bit different from the ones that get introduced to x86.
> >
> > Can you please share some details on how the ARM version works? I am
> > interested in how the GICv2 is configured for IRQ forwarding. The
> > question is whether the forwarding information needs to be updated from
> > KVM and what information about the IRQ KVM needs for this.
>
> The principle is that when you inject a virtual IRQ to a guest, you
> program a register in the GIC, known as a list register. There you put
> both the virtual IRQ you want to inject but also the physical IRQ it is
> linked with (HWbit mode set = forwarding set). When the guest completes
> the virtual IRQ the GIC HW automatically deactivates the physical IRQ
> found in the list register. In that mode the physical IRQ deactivation
> is under the ownership of the guest (actually automatically done by the HW).
>
> If HWbit mode is *not* set (forwarding not set), you do not specify the
> HW IRQ in the list register. The host deactivates the physical IRQ &
> masks it before triggering the virtual IRQ. Only the virtual IRQ ID is
> programmed in the list register. When the guest completes the virtual
> IRQ, a physical maintenance IRQ is triggered. The hyp mode is entered
> and eventually the host unmasks the IRQ.
>
> Some illustrations can be found in
> http://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf


I think an important aspect for our design is that in the case of Posted
Interrupts, they're only used for edge triggered interrupts so VFIO is
only an information provider for KVM to configure it. VFIO will
hopefully just see fewer interrupts as they magically appear directly in
the guest. IRQ Forwarding however affects the de-assertion of level
triggered interrupts. VFIO needs to switch to something more like an
edge handler when IRQ Forwarding is enabled. So in that model, VFIO
needs to provide information as well as consume it to change behavior.
Thanks,

Alex

2015-06-25 01:54:45

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Joerg Roedel [mailto:[email protected]]
> Sent: Wednesday, June 24, 2015 11:46 PM
> To: Alex Williamson
> Cc: Wu, Feng; Eric Auger; Avi Kivity; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
> > There are plenty of details to be filled in,
>
> I also need to fill plenty of details in my head first, so here are some
> suggestions based on my current understanding. Please don't hesitate to
> correct me if where I got something wrong.
>
> So first I totally agree that the handling of PI/non-PI configurations
> should be transparent to user-space.
>
> I read a bit through the VT-d spec, and my understanding of posted
> interrupts so far is that:
>
> 1) Each VCPU gets a PI-Descriptor with its pending Posted
> Interrupts. This descriptor needs to be updated when a VCPU
> is migrated to another PCPU and should thus be under control
> of KVM.
>
> This is similar to the vAPIC backing page in the AMD version
> of this, except that the PCPU routing information is stored
> somewhere else on AMD.
>
> 2) As long as the VCPU runs the IRTEs are configured for
> posting, when the VCPU goes to sleep the old remapped entry is
> established again. So when the VCPU sleeps the interrupt
> would get routed to VFIO and forwarded through the eventfd.

When the vCPU sleeps, says, blocked when guest is running HLT, the
interrupt is still in posted mode. The solution is when the vCPU is blocked,
we use another notification vector (named wakeup notification vector) to
wakeup the blocked vCPU when interrupts happens. And in the wakeup
event handler, we unblock the vCPU.

Thanks,
Feng

>
> This would be different to the AMD version, where we have a
> running bit. When this is clear the IOMMU will trigger an event
> in its event-log. This might need special handling in VFIO
> ('might' because VFIO does not need to forward the interrupt,
> it just needs to make sure the VCPU wakes up).
>
> Please correct me if my understanding of the Intel version is
> wrong.
>
> So most of the data structures the IOMMU reads for this need to be
> updated from KVM code (either x86-generic or AMD/Intel specific code),
> as KVM has the information about VCPU load/unload and the IRQ routing.

Yes, this part has nothing to do with VFIO, KVM itself can handle it well.

>
> What KVM needs from VFIO are the informations about the physical
> interrupts, and it makes total sense to attach them as metadata to the
> eventfd.

When guest set the irq affinity, QEMU first gets the MSI/MSIx configuration,
then it passes these information to kernel space via VFIO infrastructure, we
need these MSI/MSIx configuration to update the associated posted-format
IRTE according. This is the key point for PI in term of VFIO.

Thanks,
Feng

>
> But the problems start at how this metadata should look like. It would
> be good to have some generic description, but not sure if this is
> possible. Otherwise this metadata would need to be requested by VFIO
> from the IOMMU driver and passed on to KVM, which it then passes back to
> the IOMMU driver. Or something like that.
>
>
>
> Joerg

2015-06-25 01:57:40

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Thursday, June 25, 2015 3:49 AM
> To: Eric Auger
> Cc: Joerg Roedel; Avi Kivity; Wu, Feng; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Wed, 2015-06-24 at 18:25 +0200, Eric Auger wrote:
> > Hi Joerg,
> >
> > On 06/24/2015 05:50 PM, Joerg Roedel wrote:
> > > On Mon, Jun 15, 2015 at 06:17:03PM +0200, Eric Auger wrote:
> > >> I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > >> IRQ forward control" series? Or is that "central registry maintained by
> > >> a posted interrupts manager" something more specific to x86?
> > >
> > > From what I understood so far, the feature you implemented for ARM is a
> > > bit different from the ones that get introduced to x86.
> > >
> > > Can you please share some details on how the ARM version works? I am
> > > interested in how the GICv2 is configured for IRQ forwarding. The
> > > question is whether the forwarding information needs to be updated from
> > > KVM and what information about the IRQ KVM needs for this.
> >
> > The principle is that when you inject a virtual IRQ to a guest, you
> > program a register in the GIC, known as a list register. There you put
> > both the virtual IRQ you want to inject but also the physical IRQ it is
> > linked with (HWbit mode set = forwarding set). When the guest completes
> > the virtual IRQ the GIC HW automatically deactivates the physical IRQ
> > found in the list register. In that mode the physical IRQ deactivation
> > is under the ownership of the guest (actually automatically done by the HW).
> >
> > If HWbit mode is *not* set (forwarding not set), you do not specify the
> > HW IRQ in the list register. The host deactivates the physical IRQ &
> > masks it before triggering the virtual IRQ. Only the virtual IRQ ID is
> > programmed in the list register. When the guest completes the virtual
> > IRQ, a physical maintenance IRQ is triggered. The hyp mode is entered
> > and eventually the host unmasks the IRQ.
> >
> > Some illustrations can be found in
> > http://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
>
>
> I think an important aspect for our design is that in the case of Posted
> Interrupts, they're only used for edge triggered interrupts so VFIO is
> only an information provider for KVM to configure it.

Exactly! For PI, KVM only needs some information from VFIO when the
guests set the irq affinity.

Thanks,
Feng

VFIO will
> hopefully just see fewer interrupts as they magically appear directly in
> the guest. IRQ Forwarding however affects the de-assertion of level
> triggered interrupts. VFIO needs to switch to something more like an
> edge handler when IRQ Forwarding is enabled. So in that model, VFIO
> needs to provide information as well as consume it to change behavior.
> Thanks,
>
> Alex

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-06-25 09:37:20

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Joerg Roedel [mailto:[email protected]]
> Sent: Wednesday, June 24, 2015 11:46 PM
> To: Alex Williamson
> Cc: Wu, Feng; Eric Auger; Avi Kivity; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
> > There are plenty of details to be filled in,
>
> I also need to fill plenty of details in my head first, so here are some
> suggestions based on my current understanding. Please don't hesitate to
> correct me if where I got something wrong.
>
> So first I totally agree that the handling of PI/non-PI configurations
> should be transparent to user-space.

After thinking about this a bit more, I recall that why I used user-space
to trigger the IRTE update for posted-interrupts, here is the reason:

Let's take MSI for an example:
When guest updates the MSI configuration, here is the code path in
QEMU and KVM:

vfio_update_msi() --> vfio_update_kvm_msi_virq() -->
kvm_irqchip_update_msi_route() --> kvm_update_routing_entry() -->
kvm_irqchip_commit_routes() --> kvm_irqchip_commit_routes() -->
KVM_SET_GSI_ROUTING --> kvm_set_irq_routing()

It will finally go to kvm_set_irq_routing() in KVM, there are two problem:
1. It use RCU in this function, it is hard to find which entry in the irq routing
table is being updated.
2. Even we find the updated entry, it is hard to find the associated assigned
device with this irq routing entry.

So I used a VFIO API to notify KVM the updated MSI/MSIx configuration and
the associated assigned devices. I think we need to find a way to address
the above two issues before going forward. Alex, what is your opinion?
Thanks a lot!

Thanks,
Feng


>
> I read a bit through the VT-d spec, and my understanding of posted
> interrupts so far is that:
>
> 1) Each VCPU gets a PI-Descriptor with its pending Posted
> Interrupts. This descriptor needs to be updated when a VCPU
> is migrated to another PCPU and should thus be under control
> of KVM.
>
> This is similar to the vAPIC backing page in the AMD version
> of this, except that the PCPU routing information is stored
> somewhere else on AMD.
>
> 2) As long as the VCPU runs the IRTEs are configured for
> posting, when the VCPU goes to sleep the old remapped entry is
> established again. So when the VCPU sleeps the interrupt
> would get routed to VFIO and forwarded through the eventfd.
>
> This would be different to the AMD version, where we have a
> running bit. When this is clear the IOMMU will trigger an event
> in its event-log. This might need special handling in VFIO
> ('might' because VFIO does not need to forward the interrupt,
> it just needs to make sure the VCPU wakes up).
>
> Please correct me if my understanding of the Intel version is
> wrong.
>
> So most of the data structures the IOMMU reads for this need to be
> updated from KVM code (either x86-generic or AMD/Intel specific code),
> as KVM has the information about VCPU load/unload and the IRQ routing.
>
> What KVM needs from VFIO are the informations about the physical
> interrupts, and it makes total sense to attach them as metadata to the
> eventfd.
>
> But the problems start at how this metadata should look like. It would
> be good to have some generic description, but not sure if this is
> possible. Otherwise this metadata would need to be requested by VFIO
> from the IOMMU driver and passed on to KVM, which it then passes back to
> the IOMMU driver. Or something like that.
>
>
>
> Joerg

2015-06-25 15:12:03

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Thu, 2015-06-25 at 09:37 +0000, Wu, Feng wrote:
>
> > -----Original Message-----
> > From: Joerg Roedel [mailto:[email protected]]
> > Sent: Wednesday, June 24, 2015 11:46 PM
> > To: Alex Williamson
> > Cc: Wu, Feng; Eric Auger; Avi Kivity; [email protected];
> > [email protected]; [email protected]; [email protected]
> > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> >
> > On Thu, Jun 18, 2015 at 02:04:08PM -0600, Alex Williamson wrote:
> > > There are plenty of details to be filled in,
> >
> > I also need to fill plenty of details in my head first, so here are some
> > suggestions based on my current understanding. Please don't hesitate to
> > correct me if where I got something wrong.
> >
> > So first I totally agree that the handling of PI/non-PI configurations
> > should be transparent to user-space.
>
> After thinking about this a bit more, I recall that why I used user-space
> to trigger the IRTE update for posted-interrupts, here is the reason:
>
> Let's take MSI for an example:
> When guest updates the MSI configuration, here is the code path in
> QEMU and KVM:
>
> vfio_update_msi() --> vfio_update_kvm_msi_virq() -->
> kvm_irqchip_update_msi_route() --> kvm_update_routing_entry() -->
> kvm_irqchip_commit_routes() --> kvm_irqchip_commit_routes() -->
> KVM_SET_GSI_ROUTING --> kvm_set_irq_routing()
>
> It will finally go to kvm_set_irq_routing() in KVM, there are two problem:
> 1. It use RCU in this function, it is hard to find which entry in the irq routing
> table is being updated.
> 2. Even we find the updated entry, it is hard to find the associated assigned
> device with this irq routing entry.
>
> So I used a VFIO API to notify KVM the updated MSI/MSIx configuration and
> the associated assigned devices. I think we need to find a way to address
> the above two issues before going forward. Alex, what is your opinion?

So the trouble is that QEMU vfio updates a single MSI vector, but that
just updates a single entry within a whole table of routes, then the
whole table is pushed to KVM. But in kvm_set_irq_routing() we have
access to both the new and the old tables, so we do have the ability to
detect the change. We can therefore detect which GSI changed and
cross-reference that to KVMs irqfds. If we have an irqfd that matches
the GSI then we have all the information we need, right? We can use the
eventfd_ctx of the irqfd to call into the IRQ bypass manager if we need
to. If it's an irqfd that's already enabled for bypass then we may
already have the data we need to tweak the PI config.

Yes, I agree it's more difficult, but it doesn't appear to be
impossible, right? Thanks,

Alex

2015-06-29 09:06:34

by Joerg Roedel

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

Hi Feng,

On Thu, Jun 25, 2015 at 09:11:52AM -0600, Alex Williamson wrote:
> So the trouble is that QEMU vfio updates a single MSI vector, but that
> just updates a single entry within a whole table of routes, then the
> whole table is pushed to KVM. But in kvm_set_irq_routing() we have
> access to both the new and the old tables, so we do have the ability to
> detect the change. We can therefore detect which GSI changed and
> cross-reference that to KVMs irqfds. If we have an irqfd that matches
> the GSI then we have all the information we need, right? We can use the
> eventfd_ctx of the irqfd to call into the IRQ bypass manager if we need
> to. If it's an irqfd that's already enabled for bypass then we may
> already have the data we need to tweak the PI config.
>
> Yes, I agree it's more difficult, but it doesn't appear to be
> impossible, right?

Since this also doesn't happen very often, you could also just update _all_
PI data-structures from kvm_set_irq_routing, no? This would just
resemble the way the API works anyway.

You just need to be careful to update the data structures only when the
function can't fail anymore, so that you don't have to roll back
anything.


Joerg

2015-06-29 09:15:09

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Joerg Roedel [mailto:[email protected]]
> Sent: Monday, June 29, 2015 5:06 PM
> To: Wu, Feng
> Cc: Alex Williamson; Eric Auger; Avi Kivity; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> Hi Feng,
>
> On Thu, Jun 25, 2015 at 09:11:52AM -0600, Alex Williamson wrote:
> > So the trouble is that QEMU vfio updates a single MSI vector, but that
> > just updates a single entry within a whole table of routes, then the
> > whole table is pushed to KVM. But in kvm_set_irq_routing() we have
> > access to both the new and the old tables, so we do have the ability to
> > detect the change. We can therefore detect which GSI changed and
> > cross-reference that to KVMs irqfds. If we have an irqfd that matches
> > the GSI then we have all the information we need, right? We can use the
> > eventfd_ctx of the irqfd to call into the IRQ bypass manager if we need
> > to. If it's an irqfd that's already enabled for bypass then we may
> > already have the data we need to tweak the PI config.
> >
> > Yes, I agree it's more difficult, but it doesn't appear to be
> > impossible, right?
>
> Since this also doesn't happen very often, you could also just update _all_
> PI data-structures from kvm_set_irq_routing, no? This would just
> resemble the way the API works anyway.

Thanks a lot for your suggestion, Joerg!

Do you mean updating the hardware IRTEs for all the entries in the irq
routing table, no matter whether it is the updated one?

Thanks,
Feng

>
> You just need to be careful to update the data structures only when the
> function can't fail anymore, so that you don't have to roll back
> anything.
>
>
> Joerg

2015-06-29 09:22:51

by Joerg Roedel

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Mon, Jun 29, 2015 at 09:14:54AM +0000, Wu, Feng wrote:
> Do you mean updating the hardware IRTEs for all the entries in the irq
> routing table, no matter whether it is the updated one?

Right, that's what I mean. It seems wrong to me to work around the API
interface by creating a diff between the old and the new routing table.
It is much simpler (and easier to maintain) to just update the IRTE
and PI structures for all IRQs in the routing table, especially since
this is not a hot-path.


Joerg

2015-06-29 13:01:45

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Joerg Roedel [mailto:[email protected]]
> Sent: Monday, June 29, 2015 5:23 PM
> To: Wu, Feng
> Cc: Alex Williamson; Eric Auger; Avi Kivity; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> On Mon, Jun 29, 2015 at 09:14:54AM +0000, Wu, Feng wrote:
> > Do you mean updating the hardware IRTEs for all the entries in the irq
> > routing table, no matter whether it is the updated one?
>
> Right, that's what I mean. It seems wrong to me to work around the API
> interface by creating a diff between the old and the new routing table.

Yes the original usage model here doesn't care about the diff between
the old and new, it is a little intrusive to add the comparison code here.

> It is much simpler (and easier to maintain) to just update the IRTE
> and PI structures for all IRQs in the routing table, especially since
> this is not a hot-path.

Agree.

Thanks,
Feng

>
>
> Joerg

2015-06-29 13:28:13

by Wu, Feng

[permalink] [raw]
Subject: RE: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Friday, June 19, 2015 4:04 AM
> To: Wu, Feng
> Cc: Eric Auger; Avi Kivity; [email protected]; [email protected];
> [email protected]; [email protected]; Joerg Roedel
> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
>
> [Adding Joerg since he was part of this original idea]
>
> On Thu, 2015-06-18 at 09:16 +0000, Wu, Feng wrote:
> >
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:[email protected]]
> > > Sent: Tuesday, June 16, 2015 12:45 AM
> > > To: Eric Auger
> > > Cc: Avi Kivity; Wu, Feng; [email protected];
> [email protected];
> > > [email protected]; [email protected]
> > > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > >
> > > On Mon, 2015-06-15 at 18:17 +0200, Eric Auger wrote:
> > > > Hi Alex, all,
> > > > On 06/12/2015 09:03 PM, Alex Williamson wrote:
> > > > > On Fri, 2015-06-12 at 21:48 +0300, Avi Kivity wrote:
> > > > >> On 06/12/2015 06:41 PM, Alex Williamson wrote:
> > > > >>> On Fri, 2015-06-12 at 00:23 +0000, Wu, Feng wrote:
> > > > >>>>> -----Original Message-----
> > > > >>>>> From: Avi Kivity [mailto:[email protected]]
> > > > >>>>> Sent: Friday, June 12, 2015 3:59 AM
> > > > >>>>> To: Wu, Feng; [email protected]; [email protected]
> > > > >>>>> Cc: [email protected]; [email protected];
> > > > >>>>> [email protected]; [email protected]
> > > > >>>>> Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> > > > >>>>>
> > > > >>>>> On 06/11/2015 01:51 PM, Feng Wu wrote:
> > > > >>>>>> From: Eric Auger <[email protected]>
> > > > >>>>>>
> > > > >>>>>> This patch adds and documents a new KVM_DEV_VFIO_DEVICE
> > > group
> > > > >>>>>> and 2 device attributes:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> > > > >>>>>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be
> > > able
> > > > >>>>>> to set a VFIO device IRQ as forwarded or not forwarded.
> > > > >>>>>> the command takes as argument a handle to a new struct named
> > > > >>>>>> kvm_vfio_dev_irq.
> > > > >>>>> Is there no way to do this automatically? After all, vfio knows that
> a
> > > > >>>>> device interrupt is forwarded to some eventfd, and kvm knows that
> > > some
> > > > >>>>> eventfd is forwarded to a guest interrupt. If they compare notes
> > > > >>>>> through a central registry, they can figure out that the interrupt
> needs
> > > > >>>>> to be forwarded.
> > > > >>>> Oh, just like Eric mentioned in his reply, this description is out of
> context
> > > of
> > > > >>>> this series, I will remove them in the next version.
> > > > >>>
> > > > >>> I suspect Avi's question was more general. While forward/unforward
> is
> > > > >>> out of context for this series, it's very similar in nature to
> > > > >>> enabling/disabling posted interrupts. So I think the question remains
> > > > >>> whether we really need userspace to participate in creating this
> > > > >>> shortcut or if kvm and vfio can some how orchestrate figuring it out
> > > > >>> automatically.
> > > > >>>
> > > > >>> Personally I don't know how we could do it automatically. We've
> always
> > > > >>> relied on userspace to independently setup vfio and kvm such that
> > > > >>> neither have any idea that the other is there and update each side
> > > > >>> independently when anything changes. So it seems consistent to
> > > continue
> > > > >>> that here. It doesn't seem like there's much to gain
> performance-wise
> > > > >>> either, updates should be a relatively rare event I'd expect.
> > > > >>>
> > > > >>> There's really no metadata associated with an eventfd, so "comparing
> > > > >>> notes" automatically might imply some central registration entity.
> That
> > > > >>> immediately sounds like a much more complex solution, but maybe Avi
> > > has
> > > > >>> some ideas to manage it. Thanks,
> > > > >>>
> > > > >>
> > > > >> The idea is to have a central registry maintained by a posted interrupts
> > > > >> manager. Both vfio and kvm pass the filp (along with extra
> information)
> > > > >> to the posted interrupts manager, which, when it detects a filp match,
> > > > >> tells each of them what to do.
> > > > >>
> > > > >> The advantages are:
> > > > >> - old userspace gains the optimization without change
> > > > >> - a userspace API is more expensive to maintain than internal kernel
> > > > >> interfaces (CVEs, documentation, maintaining backwards compatibility)
> > > > >> - if you can do it without a new interface, this indicates that all the
> > > > >> information in the new interface is redundant. That means you have
> to
> > > > >> check it for consistency with the existing information, so it's extra
> > > > >> work (likely, it's exactly what the posted interrupt manager would be
> > > > >> doing anyway).
> > > > >
> > > > > Yep, those all sound like good things and I believe that's similar in
> > > > > design to the way we had originally discussed this interaction at
> > > > > LPC/KVM Forum several years ago. I'd be in favor of that approach.
> > > >
> > > > I guess this discussion also is relevant wrt "[RFC v6 00/16] KVM-VFIO
> > > > IRQ forward control" series? Or is that "central registry maintained by
> > > > a posted interrupts manager" something more specific to x86?
> > >
> > > I'd think we'd want it for any sort of offload and supporting both
> > > posted-interrupts and irq-forwarding would be a good validation. I
> > > imagine there would be registration/de-registration callbacks separate
> > > for interrupt producers vs interrupt consumers. Each registration
> > > function would likely provide a struct of callbacks, probably similar to
> > > the get_symbol callbacks proposed for the kvm-vfio device on the IRQ
> > > producer side. The eventfd would be the token that the manager would
> > > use to match producers and consumers. The hard part is probably
> > > figuring out what information to retrieve from the producer and provide
> > > to the consumer in a generic way between pci and platform, but as an
> > > internal interface, it's not a big deal if we screw it up a few times to
> > > start. Thanks,
> >
> > On posted-interrupts side, the main purpose of the new APIs is to update
> > the IRTE when guest changes vMSI/vMSIx configuration. Alex, do you have
> > any detailed ideas for the new solution to achieve this purpose? It should
> > be helpful if you can share some!
>
>
> There are plenty of details to be filled in, but I think the basics
> looks something like the code below. The IRQ bypass manager just
> defines a pair of structures, one for interrupt producers and one for
> interrupt consumers. I'm certain that we'll need more callbacks than
> I've defined below, but figuring out what those should be for the best
> abstraction is the hardest part of this idea. The manager provides both
> registration and de-registration interfaces for both types of objects
> and keeps lists for each, protected by a lock. The manager doesn't even
> really need to know what the match token is, but I assume for our
> purposes it will be an eventfd_ctx.
>
> On the vfio side, the producer struct would be embedded in the
> vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct
> in _irqfd. As I've coded below, the IRQ bypass manager calls the
> consumer callbacks, so the producer struct would need fields or
> callbacks to provide the consumer the info it needs. AIUI the Posted
> Interrupt model, VFIO only needs to provide data to the consumer. For
> IRQ Forwarding, I think the producer needs to be informed when bypass is
> active to model the incoming interrupt as edge vs level.
>
> I've prototyped the base IRQ bypass manager here as static, but I don't
> see any reason it couldn't be a module that's loaded by dependency when
> either vfio-pci or kvm-intel is loaded (or other producer/consumer
> objects).
>
> Is this a reasonable starting point to craft the additional fields and
> callbacks and interaction of who calls who that we need to support
> Posted Interrupts and IRQ Forwarding? Is the AMD version of this still
> alive? Thanks,
>
> Alex
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 413a7bf..22f6fcb 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -61,6 +61,7 @@ config KVM_INTEL
> depends on KVM
> # for perf_guest_get_msrs():
> depends on CPU_SUP_INTEL
> + select IRQ_BYPASS_MANAGER
> ---help---
> Provides support for KVM on Intel processors equipped with the VT
> extensions.
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index 579d83b..02912f1 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -2,6 +2,7 @@ config VFIO_PCI
> tristate "VFIO support for PCI devices"
> depends on VFIO && PCI && EVENTFD
> select VFIO_VIRQFD
> + select IRQ_BYPASS_MANAGER
> help
> Support for the PCI VFIO bus driver. This is required to make
> use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..4e053be 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> *vdev, int fd)
>
> if (vdev->ctx[0].trigger) {
> free_irq(pdev->irq, vdev);
> + /* irq_bypass_unregister_producer(); */
> kfree(vdev->ctx[0].name);
> eventfd_ctx_put(vdev->ctx[0].trigger);
> vdev->ctx[0].trigger = NULL;
> @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> *vdev, int fd)
> return ret;
> }
>
> + /* irq_bypass_register_producer(); */
> +
> /*
> * INTx disable will stick across the new irq setup,
> * disable_irq won't.
> @@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
>
> if (vdev->ctx[vector].trigger) {
> free_irq(irq, vdev->ctx[vector].trigger);
> + /* irq_bypass_unregister_producer(); */
> kfree(vdev->ctx[vector].name);
> eventfd_ctx_put(vdev->ctx[vector].trigger);
> vdev->ctx[vector].trigger = NULL;
> @@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct
> vfio_pci_device *vdev,
> return ret;
> }
>
> + /* irq_bypass_register_producer(); */
> +
> vdev->ctx[vector].trigger = trigger;
>
> return 0;
> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> new file mode 100644
> index 0000000..718508e
> --- /dev/null
> +++ b/include/linux/irqbypass.h
> @@ -0,0 +1,23 @@
> +#ifndef IRQBYPASS_H
> +#define IRQBYPASS_H
> +
> +#include <linux/list.h>
> +
> +struct irq_bypass_producer {
> + struct list_head node;
> + void *token;
> + /* TBD */
> +};
> +
> +struct irq_bypass_consumer {
> + struct list_head node;
> + void *token;
> + void (*add_producer)(struct irq_bypass_producer *);
> + void (*del_producer)(struct irq_bypass_producer *);

These two callbacks should be common function, for PI, I need to add
something specific to x86, such as, updating the associated IRTE, how
should I do for this?

> +};
> +
> +int irq_bypass_register_producer(struct irq_bypass_producer *);
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *);
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *);
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
> +#endif /* IRQBYPASS_H */
> diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> index 9a76e3b..4502cdc 100644
> --- a/kernel/irq/Kconfig
> +++ b/kernel/irq/Kconfig
> @@ -100,4 +100,7 @@ config SPARSE_IRQ
>
> If you don't know what to do here, say N.
>
> +config IRQ_BYPASS_MANAGER
> + bool
> +
> endmenu
> diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> index d121235..a30ed77 100644
> --- a/kernel/irq/Makefile
> +++ b/kernel/irq/Makefile
> @@ -7,3 +7,4 @@ obj-$(CONFIG_PROC_FS) += proc.o
> obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
> obj-$(CONFIG_PM_SLEEP) += pm.o
> obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
> +obj-$(CONFIG_IRQ_BYPASS_MANAGER) += bypass.o
> diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c
> new file mode 100644
> index 0000000..5d0f92b
> --- /dev/null
> +++ b/kernel/irq/bypass.c

Is it better to put this code here or in vfio folder?

Thanks,
Feng

> @@ -0,0 +1,116 @@
> +/*
> + * IRQ offload/bypass manager
> + *
> + * Various virtualization hardware acceleration techniques allow bypassing
> + * or offloading interrupts receieved from devices around the host kernel.
> + * Posted Interrupts on Intel VT-d systems can allow interrupts to be
> + * recieved directly by a virtual machine. ARM IRQ Forwarding can allow
> + * level triggered device interrupts to be de-asserted directly by the VM.
> + * This manager allows interrupt producers and consumers to find each other
> + * to enable this sort of bypass.
> + */
> +
> +#include <linux/irqbypass.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +
> +static LIST_HEAD(producers);
> +static LIST_HEAD(consumers);
> +static DEFINE_MUTEX(lock);
> +
> +int irq_bypass_register_producer(struct irq_bypass_producer *producer)
> +{
> + struct irq_bypass_producer *tmp;
> + struct irq_bypass_consumer *consumer;
> + int ret = 0;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(tmp, &producers, node) {
> + if (tmp->token == producer->token) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + }
> +
> + list_add(&producer->node, &producers);
> +
> + list_for_each_entry(consumer, &consumers, node) {
> + if (consumer->token == producer->token) {
> + consumer->add_producer(producer);
> + break;
> + }
> + }
> +unlock:
> + mutex_unlock(&lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
> +
> +void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
> +{
> + struct irq_bypass_consumer *consumer;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(consumer, &consumers, node) {
> + if (consumer->token == producer->token) {
> + consumer->del_producer(producer);
> + break;
> + }
> + }
> +
> + list_del(&producer->node);
> +
> + mutex_unlock(&lock);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
> +
> +int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
> +{
> + struct irq_bypass_consumer *tmp;
> + struct irq_bypass_producer *producer;
> + int ret = 0;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(tmp, &consumers, node) {
> + if (tmp->token == consumer->token) {
> + ret = -EINVAL;
> + goto unlock;
> + }
> + }
> +
> + list_add(&consumer->node, &consumers);
> +
> + list_for_each_entry(producer, &producers, node) {
> + if (producer->token == consumer->token) {
> + consumer->add_producer(producer);
> + break;
> + }
> + }
> +unlock:
> + mutex_unlock(&lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
> +
> +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
> +{
> + struct irq_bypass_producer *producer;
> +
> + mutex_lock(&lock);
> +
> + list_for_each_entry(producer, &producers, node) {
> + if (producer->token == consumer->token) {
> + consumer->del_producer(producer);
> + break;
> + }
> + }
> +
> + list_del(&consumer->node);
> +
> + mutex_unlock(&lock);
> +}
> +EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..f3da161 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -429,6 +429,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd
> *args)
> */
> fdput(f);
>
> + /* irq_bypass_register_consumer(); */
> +
> return 0;
>
> fail:
> @@ -528,6 +530,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd
> *args)
> struct _irqfd *irqfd, *tmp;
> struct eventfd_ctx *eventfd;
>
> + /* irq_bypass_unregister_consumer() */
> +
> eventfd = eventfd_ctx_fdget(args->fd);
> if (IS_ERR(eventfd))
> return PTR_ERR(eventfd);
>
>

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-06-29 15:18:48

by Alex Williamson

[permalink] [raw]
Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding

On Mon, 2015-06-29 at 13:27 +0000, Wu, Feng wrote:
>
>
> > -----Original Message-----
> > From: Alex Williamson [mailto:[email protected]]
> > Sent: Friday, June 19, 2015 4:04 AM
> > To: Wu, Feng
> > Cc: Eric Auger; Avi Kivity; [email protected]; [email protected];
> > [email protected]; [email protected]; Joerg Roedel
> > Subject: Re: [v4 08/16] KVM: kvm-vfio: User API for IRQ forwarding
> >
> > [Adding Joerg since he was part of this original idea]
> >
> >
> > There are plenty of details to be filled in, but I think the basics
> > looks something like the code below. The IRQ bypass manager just
> > defines a pair of structures, one for interrupt producers and one for
> > interrupt consumers. I'm certain that we'll need more callbacks than
> > I've defined below, but figuring out what those should be for the best
> > abstraction is the hardest part of this idea. The manager provides both
> > registration and de-registration interfaces for both types of objects
> > and keeps lists for each, protected by a lock. The manager doesn't even
> > really need to know what the match token is, but I assume for our
> > purposes it will be an eventfd_ctx.
> >
> > On the vfio side, the producer struct would be embedded in the
> > vfio_pci_irq_ctx struct. KVM would probably embed the consumer struct
> > in _irqfd. As I've coded below, the IRQ bypass manager calls the
> > consumer callbacks, so the producer struct would need fields or
> > callbacks to provide the consumer the info it needs. AIUI the Posted
> > Interrupt model, VFIO only needs to provide data to the consumer. For
> > IRQ Forwarding, I think the producer needs to be informed when bypass is
> > active to model the incoming interrupt as edge vs level.
> >
> > I've prototyped the base IRQ bypass manager here as static, but I don't
> > see any reason it couldn't be a module that's loaded by dependency when
> > either vfio-pci or kvm-intel is loaded (or other producer/consumer
> > objects).
> >
> > Is this a reasonable starting point to craft the additional fields and
> > callbacks and interaction of who calls who that we need to support
> > Posted Interrupts and IRQ Forwarding? Is the AMD version of this still
> > alive? Thanks,
> >
> > Alex
> >
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index 413a7bf..22f6fcb 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -61,6 +61,7 @@ config KVM_INTEL
> > depends on KVM
> > # for perf_guest_get_msrs():
> > depends on CPU_SUP_INTEL
> > + select IRQ_BYPASS_MANAGER
> > ---help---
> > Provides support for KVM on Intel processors equipped with the VT
> > extensions.
> > diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> > index 579d83b..02912f1 100644
> > --- a/drivers/vfio/pci/Kconfig
> > +++ b/drivers/vfio/pci/Kconfig
> > @@ -2,6 +2,7 @@ config VFIO_PCI
> > tristate "VFIO support for PCI devices"
> > depends on VFIO && PCI && EVENTFD
> > select VFIO_VIRQFD
> > + select IRQ_BYPASS_MANAGER
> > help
> > Support for the PCI VFIO bus driver. This is required to make
> > use of PCI drivers using the VFIO framework.
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1f577b4..4e053be 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -181,6 +181,7 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> > *vdev, int fd)
> >
> > if (vdev->ctx[0].trigger) {
> > free_irq(pdev->irq, vdev);
> > + /* irq_bypass_unregister_producer(); */
> > kfree(vdev->ctx[0].name);
> > eventfd_ctx_put(vdev->ctx[0].trigger);
> > vdev->ctx[0].trigger = NULL;
> > @@ -214,6 +215,8 @@ static int vfio_intx_set_signal(struct vfio_pci_device
> > *vdev, int fd)
> > return ret;
> > }
> >
> > + /* irq_bypass_register_producer(); */
> > +
> > /*
> > * INTx disable will stick across the new irq setup,
> > * disable_irq won't.
> > @@ -319,6 +322,7 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> >
> > if (vdev->ctx[vector].trigger) {
> > free_irq(irq, vdev->ctx[vector].trigger);
> > + /* irq_bypass_unregister_producer(); */
> > kfree(vdev->ctx[vector].name);
> > eventfd_ctx_put(vdev->ctx[vector].trigger);
> > vdev->ctx[vector].trigger = NULL;
> > @@ -360,6 +364,8 @@ static int vfio_msi_set_vector_signal(struct
> > vfio_pci_device *vdev,
> > return ret;
> > }
> >
> > + /* irq_bypass_register_producer(); */
> > +
> > vdev->ctx[vector].trigger = trigger;
> >
> > return 0;
> > diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> > new file mode 100644
> > index 0000000..718508e
> > --- /dev/null
> > +++ b/include/linux/irqbypass.h
> > @@ -0,0 +1,23 @@
> > +#ifndef IRQBYPASS_H
> > +#define IRQBYPASS_H
> > +
> > +#include <linux/list.h>
> > +
> > +struct irq_bypass_producer {
> > + struct list_head node;
> > + void *token;
> > + /* TBD */
> > +};
> > +
> > +struct irq_bypass_consumer {
> > + struct list_head node;
> > + void *token;
> > + void (*add_producer)(struct irq_bypass_producer *);
> > + void (*del_producer)(struct irq_bypass_producer *);
>
> These two callbacks should be common function, for PI, I need to add
> something specific to x86, such as, updating the associated IRTE, how
> should I do for this?

These are function pointers, the consumer (kvm in this case) can
populate them with whatever implementation it needs. The details of
updating the IRTE should be completely hidden from this interface. This
interface only handles identifying matches between producer and consumer
and providing an API for the handshake. Feel free to use more
appropriate callbacks and structure fields, these are only meant as a
rough sketch of the idea and possible interaction, but please keep
layering in mind to make a generic interface.

> > +};
> > +
> > +int irq_bypass_register_producer(struct irq_bypass_producer *);
> > +void irq_bypass_unregister_producer(struct irq_bypass_producer *);
> > +int irq_bypass_register_consumer(struct irq_bypass_consumer *);
> > +void irq_bypass_unregister_consumer(struct irq_bypass_consumer *);
> > +#endif /* IRQBYPASS_H */
> > diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
> > index 9a76e3b..4502cdc 100644
> > --- a/kernel/irq/Kconfig
> > +++ b/kernel/irq/Kconfig
> > @@ -100,4 +100,7 @@ config SPARSE_IRQ
> >
> > If you don't know what to do here, say N.
> >
> > +config IRQ_BYPASS_MANAGER
> > + bool
> > +
> > endmenu
> > diff --git a/kernel/irq/Makefile b/kernel/irq/Makefile
> > index d121235..a30ed77 100644
> > --- a/kernel/irq/Makefile
> > +++ b/kernel/irq/Makefile
> > @@ -7,3 +7,4 @@ obj-$(CONFIG_PROC_FS) += proc.o
> > obj-$(CONFIG_GENERIC_PENDING_IRQ) += migration.o
> > obj-$(CONFIG_PM_SLEEP) += pm.o
> > obj-$(CONFIG_GENERIC_MSI_IRQ) += msi.o
> > +obj-$(CONFIG_IRQ_BYPASS_MANAGER) += bypass.o
> > diff --git a/kernel/irq/bypass.c b/kernel/irq/bypass.c
> > new file mode 100644
> > index 0000000..5d0f92b
> > --- /dev/null
> > +++ b/kernel/irq/bypass.c
>
> Is it better to put this code here or in vfio folder?

What about it is specific to vfio? Both vfio and kvm are clients to the
interface, but I don't think we want to add any barriers that restrict
it to that pair. I think we originally thought of this as an IOMMU
service, so drivers/iommu might be another possible home for it.
Thanks,

Alex