2014-11-10 06:36:32

by Wu, Feng

Subject: [PATCH 00/13] Add VT-d Posted-Interrupts support for KVM

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

The VT-d Posted-Interrupts specification is available at the following URL:
http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html

Feng Wu (13):
iommu/vt-d: VT-d Posted-Interrupts feature detection
KVM: Initialize VT-d Posted-Interrupts Descriptor
KVM: Add KVM_CAP_PI to detect VT-d Posted-Interrupts
iommu/vt-d: Adjust 'struct irte' to better suit for VT-d
Posted-Interrupts
KVM: Update IRTE according to guest interrupt configuration changes
KVM: Add some helper functions for Posted-Interrupts
x86, irq: Define a global vector for VT-d Posted-Interrupts
KVM: Update Posted-Interrupts descriptor during VCPU scheduling
KVM: Change NDST field after VCPU scheduling
KVM: Add the handler for Wake-up Vector
KVM: Suppress posted-interrupt when 'SN' is set
iommu/vt-d: No need to migrate irq for VT-d Posted-Interrupts
iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

arch/x86/include/asm/entry_arch.h | 2 +
arch/x86/include/asm/hardirq.h | 1 +
arch/x86/include/asm/hw_irq.h | 2 +
arch/x86/include/asm/irq_remapping.h | 7 +
arch/x86/include/asm/irq_vectors.h | 1 +
arch/x86/include/asm/kvm_host.h | 9 ++
arch/x86/kernel/apic/apic.c | 1 +
arch/x86/kernel/entry_64.S | 2 +
arch/x86/kernel/irq.c | 27 ++++
arch/x86/kernel/irqinit.c | 2 +
arch/x86/kvm/vmx.c | 257 +++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 53 ++++++-
drivers/iommu/amd_iommu.c | 6 +
drivers/iommu/intel_irq_remapping.c | 83 +++++++++--
drivers/iommu/irq_remapping.c | 20 +++
drivers/iommu/irq_remapping.h | 8 +
include/linux/dmar.h | 30 ++++-
include/linux/intel-iommu.h | 1 +
include/linux/kvm_host.h | 25 ++++
include/uapi/linux/kvm.h | 2 +
virt/kvm/assigned-dev.c | 141 +++++++++++++++++++
virt/kvm/irq_comm.c | 4 +-
virt/kvm/irqchip.c | 11 --
virt/kvm/kvm_main.c | 14 ++
24 files changed, 667 insertions(+), 42 deletions(-)


2014-11-10 06:36:33

by Wu, Feng

Subject: [PATCH 01/13] iommu/vt-d: VT-d Posted-Interrupts feature detection

VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
With VT-d Posted-Interrupts enabled, external interrupts from
direct-assigned devices can be delivered to guests without VMM
intervention when the guest is running in non-root mode.

This patch adds feature detection logic for VT-d posted-interrupt.
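
The detection rule can be sketched in plain user-space C as follows. This
is only an illustration, not the kernel code: all_units_support_pi() and
the caps[] array are hypothetical stand-ins for the loop over DRHD units
and their iommu->cap values. Posting is treated as usable only when every
remapping unit reports bit 59 of its Capability Register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 59 of the VT-d Capability Register reports posted-interrupt
 * support (the same test as cap_pi_support() in this patch). */
static inline bool cap_pi_support(uint64_t cap)
{
	return (cap >> 59) & 1;
}

/* Hypothetical helper: caps[] stands in for iommu->cap of each DRHD
 * unit. Returns true only if every unit supports posting. */
static bool all_units_support_pi(const uint64_t *caps, size_t n)
{
	assert(caps != NULL || n == 0);

	if (n == 0)
		return false;	/* no units: posting stays disabled */

	for (size_t i = 0; i < n; i++)
		if (!cap_pi_support(caps[i]))
			return false;	/* one unit without PI disables it */

	return true;
}
```

A single unit lacking the capability is enough to leave the feature off,
which matches the 'break' in the detection loop.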

Signed-off-by: Feng Wu <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 13 +++++++++++++
drivers/iommu/irq_remapping.c | 4 ++++
drivers/iommu/irq_remapping.h | 5 +++++
include/linux/intel-iommu.h | 1 +
4 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 7c80661..f99f0f1 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -580,6 +580,19 @@ static int __init intel_irq_remapping_supported(void)
if (!ecap_ir_support(iommu->ecap))
return 0;

+ /* VT-d posted-interrupt feature detection */
+ if (disable_irq_post == 0)
+ for_each_drhd_unit(drhd) {
+ struct intel_iommu *iommu = drhd->iommu;
+
+ if (!cap_pi_support(iommu->cap)) {
+ irq_post_enabled = 0;
+ disable_irq_post = 1;
+ break;
+ }
+ irq_post_enabled = 1;
+ }
+
return 1;
}

diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 74a1767..2f8ee00 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -23,6 +23,10 @@ int irq_remap_broken;
int disable_sourceid_checking;
int no_x2apic_optout;

+int disable_irq_post = 1;
+int irq_post_enabled = 0;
+EXPORT_SYMBOL_GPL(irq_post_enabled);
+
static struct irq_remap_ops *remap_ops;

static int msi_alloc_remapped_irq(struct pci_dev *pdev, int irq, int nvec);
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index fde250f..7bb5913 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -37,6 +37,9 @@ extern int disable_sourceid_checking;
extern int no_x2apic_optout;
extern int irq_remapping_enabled;

+extern int disable_irq_post;
+extern int irq_post_enabled;
+
struct irq_remap_ops {
/* Check whether Interrupt Remapping is supported */
int (*supported)(void);
@@ -91,6 +94,8 @@ extern struct irq_remap_ops amd_iommu_irq_ops;
#define irq_remapping_enabled 0
#define disable_irq_remap 1
#define irq_remap_broken 0
+#define disable_irq_post 1
+#define irq_post_enabled 0

#endif /* CONFIG_IRQ_REMAP */

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a65208a..5b1a124 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -87,6 +87,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
/*
* Decoding Capability Register
*/
+#define cap_pi_support(c) (((c) >> 59) & 1)
#define cap_read_drain(c) (((c) >> 55) & 1)
#define cap_write_drain(c) (((c) >> 54) & 1)
#define cap_max_amask_val(c) (((c) >> 48) & 0x3f)
--
1.7.1

2014-11-10 06:36:36

by Wu, Feng

Subject: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

When the guest changes its interrupt configuration (such as the vector)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.
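
As an illustration of how the descriptor address ends up in the IRTE, the
user-space sketch below splits a 64-bit address into the 26-bit 'pda_l'
and 32-bit 'pda_h' fields defined in this patch. It assumes the
Posted-Interrupts Descriptor is 64-byte aligned, so address bits 5:0 are
zero and are not stored. struct pda_parts and both helper names are
hypothetical, used only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define PDA_LOW_BIT  26	/* width of pda_l, as in this patch */
#define PDA_HIGH_BIT 32	/* width of pda_h, as in this patch */

/* Hypothetical container for the two IRTE address fields. */
struct pda_parts {
	uint32_t pda_l;	/* descriptor address bits 31:6 */
	uint32_t pda_h;	/* descriptor address bits 63:32 */
};

static struct pda_parts split_pi_desc_addr(uint64_t addr)
{
	struct pda_parts p;

	/* Assumption: the descriptor is 64-byte aligned. */
	assert((addr & 0x3f) == 0);

	p.pda_l = (uint32_t)((addr >> (32 - PDA_LOW_BIT)) &
			     ((1ULL << PDA_LOW_BIT) - 1));
	p.pda_h = (uint32_t)(addr >> 32);
	return p;
}

/* Inverse operation, useful for checking the encoding. */
static uint64_t join_pi_desc_addr(struct pda_parts p)
{
	return ((uint64_t)p.pda_h << 32) |
	       ((uint64_t)p.pda_l << (32 - PDA_LOW_BIT));
}
```

Joining the two parts again yields the original address, which is a quick
way to check the shifts and masks.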

The current method of handling guest lowest-priority interrupts
is to use a counter 'apic_arb_prio' for each VCPU: we choose the
VCPU with the smallest 'apic_arb_prio' and then increase it by 1.
However, for VT-d PI we cannot reuse this, since we no longer
have control over 'apic_arb_prio' when posted interrupts are
delivered directly by hardware.

Here, we introduce a mechanism similar to 'apic_arb_prio' to handle
guest lowest-priority interrupts when VT-d PI is used. The idea
is:
- Each VCPU has a counter 'round_robin_counter'.
- When the guest sets an interrupt to lowest priority, we choose
the VCPU with the smallest 'round_robin_counter' as the destination,
then increase it.
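
The selection rule can be sketched in user-space C as below. This is a
simplified model, not the KVM code: struct vcpu_sketch and
pick_lowest_prio_dest() are hypothetical, and the 'eligible' flag stands
in for the checks that the APIC is present and enabled and matches the
destination. Unlike kvm_compare_rr_counter(), the sketch compares the
counters directly instead of subtracting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a VCPU for this example. */
struct vcpu_sketch {
	bool    eligible;		/* APIC usable and destination matches */
	int32_t round_robin_counter;
};

/* Returns the index of the chosen VCPU, or -1 if none is eligible.
 * The chosen VCPU's counter is incremented so that the next
 * lowest-priority interrupt prefers a different VCPU. */
static int pick_lowest_prio_dest(struct vcpu_sketch *vcpus, size_t n)
{
	int best = -1;

	assert(vcpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!vcpus[i].eligible)
			continue;
		if (best < 0 || vcpus[i].round_robin_counter <
				vcpus[best].round_robin_counter)
			best = (int)i;
	}

	if (best >= 0)
		vcpus[best].round_robin_counter++;

	return best;
}
```

With equal counters the first eligible VCPU wins, so repeated calls
alternate between the eligible VCPUs.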

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/irq_remapping.h | 6 ++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/vmx.c | 12 +++
arch/x86/kvm/x86.c | 11 +++
drivers/iommu/amd_iommu.c | 6 ++
drivers/iommu/intel_irq_remapping.c | 28 +++++++
drivers/iommu/irq_remapping.c | 9 ++
drivers/iommu/irq_remapping.h | 3 +
include/linux/dmar.h | 26 ++++++
include/linux/kvm_host.h | 22 +++++
include/uapi/linux/kvm.h | 1 +
virt/kvm/assigned-dev.c | 141 ++++++++++++++++++++++++++++++++++
virt/kvm/irq_comm.c | 4 +-
virt/kvm/irqchip.c | 11 ---
14 files changed, 269 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index a3cc437..32d6cc4 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -51,6 +51,7 @@ extern void compose_remapped_msi_msg(struct pci_dev *pdev,
unsigned int irq, unsigned int dest,
struct msi_msg *msg, u8 hpet_id);
extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
+extern int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector);
extern void panic_if_irq_remap(const char *msg);
extern bool setup_remapped_irq(int irq,
struct irq_cfg *cfg,
@@ -88,6 +89,11 @@ static inline int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
return -ENODEV;
}

+static inline int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
+{
+ return -ENODEV;
+}
+
static inline void panic_if_irq_remap(const char *msg)
{
}
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6ed0c30..0630161 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
struct kvm_lapic *apic; /* kernel irqchip context */
unsigned long apic_attention;
int32_t apic_arb_prio;
+ int32_t round_robin_counter;
int mp_state;
u64 ia32_misc_enable_msr;
bool tpr_access_reporting;
@@ -771,6 +772,7 @@ struct kvm_x86_ops {
int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);

void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
+ u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4670d3..ae91b72 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -544,6 +544,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
return container_of(vcpu, struct vcpu_vmx, vcpu);
}

+struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
+{
+ return &(to_vmx(vcpu)->pi_desc);
+}
+
#define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
#define FIELD(number, name) [number] = VMCS12_OFFSET(name)
#define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
@@ -4280,6 +4285,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
return;
}

+static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
+{
+ return __pa((u64)vcpu_to_pi_desc(vcpu));
+}
+
/*
* Set up the vmcs's constant host-state fields, i.e., host-state fields that
* will not change in the lifetime of the guest.
@@ -9232,6 +9242,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
.check_nested_events = vmx_check_nested_events,

.sched_in = vmx_sched_in,
+
+ .get_pi_desc_addr = vmx_get_pi_desc_addr,
};

static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b447a98..0c19d15 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7735,6 +7735,17 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);

+int kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu *vcpu,
+ u32 guest_vector, int host_irq)
+{
+ u64 pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
+
+ if (update_pi_irte(host_irq, pi_desc_addr, guest_vector))
+ return -1;
+
+ return 0;
+}
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 505a9ad..a36fdc7 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4280,6 +4280,11 @@ static int alloc_hpet_msi(unsigned int irq, unsigned int id)
return 0;
}

+static int dummy_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
+{
+ return -EINVAL;
+}
+
struct irq_remap_ops amd_iommu_irq_ops = {
.supported = amd_iommu_supported,
.prepare = amd_iommu_prepare,
@@ -4294,5 +4299,6 @@ struct irq_remap_ops amd_iommu_irq_ops = {
.msi_alloc_irq = msi_alloc_irq,
.msi_setup_irq = msi_setup_irq,
.alloc_hpet_msi = alloc_hpet_msi,
+ .update_pi_irte = dummy_update_pi_irte,
};
#endif
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 776da10..87c02fe 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1172,6 +1172,33 @@ static int intel_alloc_hpet_msi(unsigned int irq, unsigned int id)
return ret;
}

+static int intel_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
+{
+ struct irte irte;
+
+ if (get_irte(irq, &irte))
+ return -1;
+
+ irte.irq_post_low.urg = 0;
+ irte.irq_post_low.vector = vector;
+ irte.irq_post_low.pda_l = (pi_desc_addr >> (32 - PDA_LOW_BIT)) &
+ ~(-1UL << PDA_LOW_BIT);
+ irte.irq_post_high.pda_h = (pi_desc_addr >> 32) &
+ ~(-1UL << PDA_HIGH_BIT);
+
+ irte.irq_post_low.__reserved_1 = 0;
+ irte.irq_post_low.__reserved_2 = 0;
+ irte.irq_post_low.__reserved_3 = 0;
+ irte.irq_post_high.__reserved_4 = 0;
+
+ irte.irq_post_low.pst = 1;
+
+ if (modify_irte(irq, &irte))
+ return -1;
+
+ return 0;
+}
+
struct irq_remap_ops intel_irq_remap_ops = {
.supported = intel_irq_remapping_supported,
.prepare = dmar_table_init,
@@ -1186,4 +1213,5 @@ struct irq_remap_ops intel_irq_remap_ops = {
.msi_alloc_irq = intel_msi_alloc_irq,
.msi_setup_irq = intel_msi_setup_irq,
.alloc_hpet_msi = intel_alloc_hpet_msi,
+ .update_pi_irte = intel_update_pi_irte,
};
diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 2f8ee00..0e36860 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -362,6 +362,15 @@ int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
return default_setup_hpet_msi(irq, id);
}

+int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
+{
+ if (!remap_ops || !remap_ops->update_pi_irte)
+ return -ENODEV;
+
+ return remap_ops->update_pi_irte(irq, pi_desc_addr, vector);
+}
+EXPORT_SYMBOL_GPL(update_pi_irte);
+
void panic_if_irq_remap(const char *msg)
{
if (irq_remapping_enabled)
diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
index 7bb5913..2d8f740 100644
--- a/drivers/iommu/irq_remapping.h
+++ b/drivers/iommu/irq_remapping.h
@@ -84,6 +84,9 @@ struct irq_remap_ops {

/* Setup interrupt remapping for an HPET MSI */
int (*alloc_hpet_msi)(unsigned int, unsigned int);
+
+ /* Update IRTE for posted-interrupt */
+ int (*update_pi_irte)(int irq, u64 pi_desc_addr, u32 vector);
};

extern struct irq_remap_ops intel_irq_remap_ops;
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 8be5d42..e1ff4f7 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -160,6 +160,20 @@ struct irte {
__reserved_2 : 8,
dest_id : 32;
} irq_remap_low;
+
+ struct {
+ __u64 present : 1,
+ fpd : 1,
+ __reserved_1 : 6,
+ avail : 4,
+ __reserved_2 : 2,
+ urg : 1,
+ pst : 1,
+ vector : 8,
+ __reserved_3 : 14,
+ pda_l : 26;
+ } irq_post_low;
+
__u64 low;
};

@@ -170,10 +184,22 @@ struct irte {
svt : 2,
__reserved_3 : 44;
} irq_remap_high;
+
+ struct {
+ __u64 sid: 16,
+ sq: 2,
+ svt: 2,
+ __reserved_4: 12,
+ pda_h: 32;
+ } irq_post_high;
+
__u64 high;
};
};

+#define PDA_LOW_BIT 26
+#define PDA_HIGH_BIT 32
+
enum {
IRQ_REMAP_XAPIC_MODE,
IRQ_REMAP_X2APIC_MODE,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ea53b04..6bb8287 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
struct hlist_node link;
};

+#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
+
+struct kvm_irq_routing_table {
+ int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
+ struct kvm_kernel_irq_routing_entry *rt_entries;
+ u32 nr_rt_entries;
+ /*
+ * Array indexed by gsi. Each entry contains list of irq chips
+ * the gsi is connected to.
+ */
+ struct hlist_head map[0];
+};
+
+#else
+
+struct kvm_irq_routing_table {};
+
+#endif
+
#ifndef KVM_PRIVATE_MEM_SLOTS
#define KVM_PRIVATE_MEM_SLOTS 0
#endif
@@ -766,6 +785,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian);
int kvm_request_irq_source_id(struct kvm *kvm);
void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq);
+bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq);

#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7593c52..509223a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1027,6 +1027,7 @@ struct kvm_s390_ucas_mapping {
#define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
#define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
#define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
+#define KVM_ASSIGN_DEV_PI_UPDATE _IOR(KVMIO, 0x7d, __u32)
/* Available with KVM_CAP_PIT_STATE2 */
#define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2)
#define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index e05000e..e154009 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -326,6 +326,135 @@ void kvm_free_all_assigned_devices(struct kvm *kvm)
}
}

+int __weak kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu *vcpu,
+ u32 guest_vector, int host_irq)
+{
+ return 0;
+}
+
+int kvm_compare_rr_counter(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
+{
+ return vcpu1->arch.round_robin_counter -
+ vcpu2->arch.round_robin_counter;
+}
+
+bool kvm_pi_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu)
+{
+ int i, r = 0;
+ struct kvm_vcpu *vcpu, *dest = NULL;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_apic_present(vcpu))
+ continue;
+
+ if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+ irq->dest_id, irq->dest_mode))
+ continue;
+
+ if (!kvm_is_dm_lowest_prio(irq)) {
+ r++;
+ *dest_vcpu = vcpu;
+ } else if (kvm_lapic_enabled(vcpu)) {
+ if (!dest)
+ dest = vcpu;
+ else if (kvm_compare_rr_counter(vcpu, dest) < 0)
+ dest = vcpu;
+ }
+ }
+
+ if (dest) {
+ dest->arch.round_robin_counter++;
+ *dest_vcpu = dest;
+ return true;
+ } else if (r == 1)
+ return true;
+
+ return false;
+}
+
+static int __kvm_update_pi_irte(struct kvm *kvm, int host_irq, int guest_irq)
+{
+ struct kvm_kernel_irq_routing_entry *e;
+ struct kvm_irq_routing_table *irq_rt;
+ struct kvm_lapic_irq irq;
+ struct kvm_vcpu *vcpu;
+ int idx, ret = -EINVAL;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+ ASSERT(guest_irq < irq_rt->nr_rt_entries);
+
+ hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
+ if (e->type != KVM_IRQ_ROUTING_MSI)
+ continue;
+ /*
+ * VT-d posted-interrupt has the following
+ * limitations:
+ * - No support for posting multicast/broadcast
+ * interrupts to a VCPU
+ * Still use interrupt remapping for these
+ * kind of interrupts
+ */
+
+ kvm_set_msi_irq(e, &irq);
+ if (!kvm_pi_find_dest_vcpu(kvm, &irq, &vcpu)) {
+ printk(KERN_INFO "%s: can not find the target VCPU\n",
+ __func__);
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (kvm_update_pi_irte_common(kvm, vcpu, irq.vector,
+ host_irq)) {
+ printk(KERN_INFO "%s: failed to update PI IRTE\n",
+ __func__);
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ ret = 0;
+out:
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+ return ret;
+}
+
+int kvm_update_pi_irte(struct kvm *kvm, u32 dev_id)
+{
+ int i, rc = -1;
+ struct kvm_assigned_dev_kernel *dev;
+
+ mutex_lock(&kvm->lock);
+ dev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head, dev_id);
+ if (!dev) {
+ printk(KERN_INFO "%s: cannot find the assigned dev.\n",
+ __func__);
+ rc = -1;
+ goto out;
+ }
+
+ BUG_ON(dev->irq_requested_type == 0);
+
+ if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSI) &&
+ (dev->dev->msi_enabled == 1)) {
+ __kvm_update_pi_irte(kvm,
+ dev->host_irq, dev->guest_irq);
+ } else if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) &&
+ (dev->dev->msix_enabled == 1)) {
+ for (i = 0; i < dev->entries_nr; i++) {
+ __kvm_update_pi_irte(kvm,
+ dev->host_msix_entries[i].vector,
+ dev->guest_msix_entries[i].vector);
+ }
+ }
+
+ rc = 0;
+out:
+ mutex_unlock(&kvm->lock);
+ return rc;
+}
+
static int assigned_device_enable_host_intx(struct kvm *kvm,
struct kvm_assigned_dev_kernel *dev)
{
@@ -1017,6 +1146,18 @@ long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
r = kvm_vm_ioctl_set_pci_irq_mask(kvm, &assigned_dev);
break;
}
+ case KVM_ASSIGN_DEV_PI_UPDATE: {
+ u32 dev_id;
+
+ r = -EFAULT;
+ if (copy_from_user(&dev_id, argp, sizeof(dev_id)))
+ goto out;
+ r = kvm_update_pi_irte(kvm, dev_id);
+ if (r)
+ goto out;
+ break;
+
+ }
default:
r = -ENOTTY;
break;
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 963b899..f51aed3 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -55,7 +55,7 @@ static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
line_status);
}

-inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
+bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
{
#ifdef CONFIG_IA64
return irq->delivery_mode ==
@@ -106,7 +106,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
return r;
}

-static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
struct kvm_lapic_irq *irq)
{
trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 7f256f3..cdf29a6 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -31,17 +31,6 @@
#include <trace/events/kvm.h>
#include "irq.h"

-struct kvm_irq_routing_table {
- int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
- struct kvm_kernel_irq_routing_entry *rt_entries;
- u32 nr_rt_entries;
- /*
- * Array indexed by gsi. Each entry contains list of irq chips
- * the gsi is connected to.
- */
- struct hlist_head map[0];
-};
-
int kvm_irq_map_gsi(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *entries, int gsi)
{
--
1.7.1

2014-11-10 06:36:42

by Wu, Feng

Subject: [PATCH 13/13] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

Enable VT-d Posted-Interrupts by default and add a command line
parameter, 'nointpost', to disable it.

Signed-off-by: Feng Wu <[email protected]>
---
drivers/iommu/irq_remapping.c | 9 ++++++++-
1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
index 0e36860..3cb9429 100644
--- a/drivers/iommu/irq_remapping.c
+++ b/drivers/iommu/irq_remapping.c
@@ -23,7 +23,7 @@ int irq_remap_broken;
int disable_sourceid_checking;
int no_x2apic_optout;

-int disable_irq_post = 1;
+int disable_irq_post = 0;
int irq_post_enabled = 0;
EXPORT_SYMBOL_GPL(irq_post_enabled);

@@ -206,6 +206,13 @@ static __init int setup_irqremap(char *str)
}
early_param("intremap", setup_irqremap);

+static __init int setup_nointpost(char *str)
+{
+ disable_irq_post = 1;
+ return 0;
+}
+early_param("nointpost", setup_nointpost);
+
void __init setup_irq_remapping_ops(void)
{
remap_ops = &intel_irq_remap_ops;
--
1.7.1

2014-11-10 06:37:12

by Wu, Feng

Subject: [PATCH 08/13] KVM: Update Posted-Interrupts descriptor during VCPU scheduling

Update Posted-Interrupts descriptor according to the
following rules:
- Before VCPU block, set 'NV' to POSTED_INTR_WAKEUP_VECTOR
- After VCPU block, set 'NV' back to POSTED_INTR_VECTOR
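
The two rules can be modelled in user-space C with C11 atomics, as in the
sketch below. It is a simplified illustration under stated assumptions:
the control word is taken to hold ON in bit 0, SN in bit 1 and NV in bits
23:16, and the NDST update and SN handling done by the real post-block
path are left out. pre_block() and post_block() here are hypothetical
names, and pre_block() returns true when blocking is safe, which is the
opposite convention to vmx_vcpu_pre_block().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the 64-bit control word of the descriptor. */
#define PI_ON		(1ULL << 0)	/* outstanding notification */
#define PI_SN		(1ULL << 1)	/* suppress notification */
#define PI_NV_SHIFT	16
#define PI_NV_MASK	(0xffULL << PI_NV_SHIFT)

#define POSTED_INTR_VECTOR		0xf2
#define POSTED_INTR_WAKEUP_VECTOR	0xf1

static uint64_t with_nv(uint64_t control, uint8_t nv)
{
	return (control & ~PI_NV_MASK) | ((uint64_t)nv << PI_NV_SHIFT);
}

/* Before blocking: switch NV to the wakeup vector unless an interrupt
 * is already pending. Returns false if the VCPU must not block. */
static bool pre_block(_Atomic uint64_t *control)
{
	uint64_t old, new;

	assert(control != NULL);
	old = atomic_load(control);
	do {
		if (old & PI_ON)
			return false;	/* interrupt posted: do not block */
		new = with_nv(old & ~PI_SN, POSTED_INTR_WAKEUP_VECTOR);
		/* On failure, 'old' is reloaded and the loop retries. */
	} while (!atomic_compare_exchange_weak(control, &old, new));

	return true;
}

/* After blocking: switch NV back to the notification vector. */
static void post_block(_Atomic uint64_t *control)
{
	uint64_t old, new;

	assert(control != NULL);
	old = atomic_load(control);
	do {
		new = with_nv(old, POSTED_INTR_VECTOR);
	} while (!atomic_compare_exchange_weak(control, &old, new));
}
```

The compare-and-exchange loop matters because hardware may set ON at any
time; a plain read-modify-write could overwrite that update.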

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 ++
arch/x86/kvm/vmx.c | 83 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 16 +++++++
virt/kvm/kvm_main.c | 11 +++++
4 files changed, 115 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0630161..71cfe3e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -773,6 +773,8 @@ struct kvm_x86_ops {

void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
+ int (*vcpu_pre_block)(struct kvm_vcpu *vcpu);
+ void (*vcpu_post_block)(struct kvm_vcpu *vcpu);
};

struct kvm_arch_async_pf {
@@ -1095,4 +1097,7 @@ int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
void kvm_deliver_pmi(struct kvm_vcpu *vcpu);

+int kvm_arch_vcpu_pre_block(struct kvm_vcpu *vcpu);
+void kvm_arch_vcpu_post_block(struct kvm_vcpu *vcpu);
+
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f41111f..4c1a966 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9153,6 +9153,86 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
shrink_ple_window(vcpu);
}

+static int vmx_vcpu_pre_block(struct kvm_vcpu *vcpu)
+{
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old;
+ struct pi_desc new;
+
+ if (!irq_post_enabled)
+ return 0;
+
+ memset(&old, 0, sizeof(old));
+ memset(&new, 0, sizeof(new));
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ /*
+ * A posted interrupt has arrived in one of the
+ * following two cases:
+ * 1. After the latest PIR-to-vIRR sync operation
+ * in the kvm_arch_vcpu_runnable() function
+ * 2. While this do-while() loop was
+ * running.
+ *
+ * In either of the above cases, we should not
+ * block the VCPU.
+ */
+ if (pi_test_on(pi_desc) == 1) {
+ /*
+ * Set this request so that the interrupt is
+ * synced from PIR to vIRR before VM entry.
+ * For the guest IPI case, the function
+ * vmx_deliver_posted_interrupt() has already
+ * set this flag, but if the interrupt is
+ * posted by VT-d PI hardware, we need to
+ * set it here.
+ */
+ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ return 1;
+ }
+
+ pi_clear_sn(&new);
+
+ /* set 'NV' to 'wakeup vector' */
+ new.nv = POSTED_INTR_WAKEUP_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control, new.control)
+ != old.control);
+
+ return 0;
+}
+
+static void vmx_vcpu_post_block(struct kvm_vcpu *vcpu)
+{
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old;
+ struct pi_desc new;
+ unsigned int dest = 0;
+
+ if (!irq_post_enabled)
+ return;
+
+ pi_set_sn(pi_desc);
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ dest = cpu_physical_id(vcpu->cpu);
+
+ if (x2apic_mode)
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ /* set 'NV' to 'notification vector' */
+ new.nv = POSTED_INTR_VECTOR;
+ } while (cmpxchg(&pi_desc->control, old.control, new.control)
+ != old.control);
+
+ pi_clear_sn(pi_desc);
+}
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -9262,6 +9342,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
.sched_in = vmx_sched_in,

.get_pi_desc_addr = vmx_get_pi_desc_addr,
+
+ .vcpu_pre_block = vmx_vcpu_pre_block,
+ .vcpu_post_block = vmx_vcpu_post_block,
};

static int __init vmx_init(void)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0c19d15..d0c8bb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7746,6 +7746,22 @@ int kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu *vcpu,
return 0;
}

+int kvm_arch_vcpu_pre_block(struct kvm_vcpu *vcpu)
+{
+ if (kvm_x86_ops->vcpu_pre_block)
+ return kvm_x86_ops->vcpu_pre_block(vcpu);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_arch_vcpu_pre_block);
+
+void kvm_arch_vcpu_post_block(struct kvm_vcpu *vcpu)
+{
+ if (kvm_x86_ops->vcpu_post_block)
+ kvm_x86_ops->vcpu_post_block(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_arch_vcpu_post_block);
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 25ffac9..1be1a45 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1754,7 +1754,18 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
if (signal_pending(current))
break;

+#ifdef CONFIG_X86
+ if (kvm_arch_vcpu_pre_block(vcpu) == 1) {
+ kvm_make_request(KVM_REQ_UNHALT, vcpu);
+ break;
+ }
+#endif
+
schedule();
+
+#ifdef CONFIG_X86
+ kvm_arch_vcpu_post_block(vcpu);
+#endif
}

finish_wait(&vcpu->wq, &wait);
--
1.7.1

2014-11-10 06:37:48

by Wu, Feng

Subject: [PATCH 06/13] KVM: Add some helper functions for Posted-Interrupts

This patch adds three helper functions to manipulate the
Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 18 ++++++++++++++++++
1 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ae91b72..f41111f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -435,6 +435,24 @@ static void pi_clear_sn(struct pi_desc *pi_desc)
(unsigned long *)&pi_desc->control);
}

+static void pi_set_sn(struct pi_desc *pi_desc)
+{
+ return set_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_on(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_ON,
+ (unsigned long *)&pi_desc->control);
+}
+
+static int pi_test_sn(struct pi_desc *pi_desc)
+{
+ return test_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
static bool pi_test_and_set_on(struct pi_desc *pi_desc)
{
return test_and_set_bit(POSTED_INTR_ON,
--
1.7.1

2014-11-10 06:36:39

by Wu, Feng

Subject: [PATCH 07/13] x86, irq: Define a global vector for VT-d Posted-Interrupts

Currently, we use a global vector as the Posted-Interrupts
Notification Event for all the VCPUs in the system. We need
to introduce another global vector for VT-d Posted-Interrupts,
which will be used to wake up the sleeping VCPU when an external
interrupt from a direct-assigned device happens for that VCPU.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/entry_arch.h | 2 ++
arch/x86/include/asm/hardirq.h | 1 +
arch/x86/include/asm/hw_irq.h | 2 ++
arch/x86/include/asm/irq_vectors.h | 1 +
arch/x86/kernel/entry_64.S | 2 ++
arch/x86/kernel/irq.c | 27 +++++++++++++++++++++++++++
arch/x86/kernel/irqinit.c | 2 ++
7 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
index dc5fa66..27ca0af 100644
--- a/arch/x86/include/asm/entry_arch.h
+++ b/arch/x86/include/asm/entry_arch.h
@@ -23,6 +23,8 @@ BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)
#ifdef CONFIG_HAVE_KVM
BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
smp_kvm_posted_intr_ipi)
+BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
+ smp_kvm_posted_intr_wakeup_ipi)
#endif

/*
diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
index 0f5fb6b..9866065 100644
--- a/arch/x86/include/asm/hardirq.h
+++ b/arch/x86/include/asm/hardirq.h
@@ -14,6 +14,7 @@ typedef struct {
#endif
#ifdef CONFIG_HAVE_KVM
unsigned int kvm_posted_intr_ipis;
+ unsigned int kvm_posted_intr_wakeup_ipis;
#endif
unsigned int x86_platform_ipis; /* arch dependent */
unsigned int apic_perf_irqs;
diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
index 4615906..559563c 100644
--- a/arch/x86/include/asm/hw_irq.h
+++ b/arch/x86/include/asm/hw_irq.h
@@ -29,6 +29,7 @@
extern asmlinkage void apic_timer_interrupt(void);
extern asmlinkage void x86_platform_ipi(void);
extern asmlinkage void kvm_posted_intr_ipi(void);
+extern asmlinkage void kvm_posted_intr_wakeup_ipi(void);
extern asmlinkage void error_interrupt(void);
extern asmlinkage void irq_work_interrupt(void);

@@ -92,6 +93,7 @@ extern void trace_call_function_single_interrupt(void);
#define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt
#define trace_reboot_interrupt reboot_interrupt
#define trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
+#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
#endif /* CONFIG_TRACING */

/* IOAPIC */
diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 5702d7e..1343349 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -105,6 +105,7 @@
/* Vector for KVM to deliver posted interrupt IPI */
#ifdef CONFIG_HAVE_KVM
#define POSTED_INTR_VECTOR 0xf2
+#define POSTED_INTR_WAKEUP_VECTOR 0xf1
#endif

/*
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index df088bb..7663aaa 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1004,6 +1004,8 @@ apicinterrupt X86_PLATFORM_IPI_VECTOR \
#ifdef CONFIG_HAVE_KVM
apicinterrupt3 POSTED_INTR_VECTOR \
kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
+apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
+ kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
#endif

#ifdef CONFIG_X86_MCE_THRESHOLD
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 922d285..47408c3 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs *regs)
}

#ifdef CONFIG_HAVE_KVM
+void (*wakeup_handler_callback)(void) = NULL;
+EXPORT_SYMBOL_GPL(wakeup_handler_callback);
+
/*
* Handler for POSTED_INTERRUPT_VECTOR.
*/
@@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct pt_regs *regs)

set_irq_regs(old_regs);
}
+
+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs)
+{
+ struct pt_regs *old_regs = set_irq_regs(regs);
+
+ ack_APIC_irq();
+
+ irq_enter();
+
+ exit_idle();
+
+ inc_irq_stat(kvm_posted_intr_wakeup_ipis);
+
+ if (wakeup_handler_callback)
+ wakeup_handler_callback();
+
+ irq_exit();
+
+ set_irq_regs(old_regs);
+}
+
#endif

__visible void smp_trace_x86_platform_ipi(struct pt_regs *regs)
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index 4de73ee..659cde3 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -168,6 +168,8 @@ static void __init apic_intr_init(void)
#ifdef CONFIG_HAVE_KVM
/* IPI for KVM to deliver posted interrupt */
alloc_intr_gate(POSTED_INTR_VECTOR, kvm_posted_intr_ipi);
+ /* IPI for KVM to deliver interrupt to wake up tasks */
+ alloc_intr_gate(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi);
#endif

/* IPI vectors for APIC spurious and error interrupts */
--
1.7.1
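
The wake-up vector handler in the patch above dispatches through a global function pointer that KVM fills in later (patch 10). A minimal user-space sketch of that hook pattern follows. All `demo_` names are hypothetical stand-ins rather than kernel symbols, and the APIC acknowledgement and irq_enter()/irq_exit() bookkeeping are left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for wakeup_handler_callback; NULL until registered. */
static void (*demo_wakeup_handler_callback)(void) = NULL;

/* Hypothetical stand-in for the kvm_posted_intr_wakeup_ipis statistic. */
static unsigned int demo_wakeup_ipis;

/* Counts how often the registered handler actually ran. */
static unsigned int demo_handler_runs;

static void demo_wakeup_handler(void)
{
	demo_handler_runs++;
}

/* Same shape as smp_kvm_posted_intr_wakeup_ipi(): count, then dispatch. */
static void demo_wakeup_ipi(void)
{
	demo_wakeup_ipis++;
	if (demo_wakeup_handler_callback)	/* nothing registered: just count */
		demo_wakeup_handler_callback();
}
```

The NULL check is what keeps the vector harmless while no handler is registered. The hunks shown in this thread set the pointer at init time but do not show it being cleared again, so a real user of this pattern would probably want to reset it on module exit.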

2014-11-10 06:38:29

by Wu, Feng

Subject: [PATCH 12/13] iommu/vt-d: No need to migrating irq for VT-d Posted-Interrtups

We don't need to migrate the irqs for VT-d Posted-Interrupts here.
When 'pst' is set in the IRTE, the associated irq is posted to the
guest instead of being remapped. The destination of the interrupt
is set in the Posted-Interrupts Descriptor, and the migration
happens during VCPU scheduling.

Signed-off-by: Feng Wu <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 87c02fe..249e2b1 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -1038,6 +1038,13 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
if (get_irte(irq, &irte))
return -EBUSY;

+ /*
+ * If the interrupt is for posting, it is used by guests,
+ * we cannot change IRTE here.
+ */
+ if (irte.irq_post_low.pst == 1)
+ return 0;
+
err = assign_irq_vector(irq, cfg, mask);
if (err)
return err;
--
1.7.1
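
The early return added above can be sketched in user space as follows. This is an illustration under stated assumptions, not the driver code: `struct demo_irte` and the `demo_` helpers are hypothetical, the bit positions are derived from the `irq_post_low` layout in patch 05 ('pst' at bit 15, vector at bits 16-23), and the `modified` out-parameter exists only so the effect can be observed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the low 64 bits of an IRTE. */
struct demo_irte {
	uint64_t low;
};

#define DEMO_IRTE_PST_BIT	15	/* 'pst' in the posted format */
#define DEMO_IRTE_VECTOR_SHIFT	16	/* 'vector' in both formats */

static bool demo_irte_is_posted(const struct demo_irte *irte)
{
	return (irte->low >> DEMO_IRTE_PST_BIT) & 1;
}

/*
 * Returns 0 in both cases, like the patched set_affinity path.
 * A posted entry is left untouched; a remapped entry gets the new vector.
 */
static int demo_set_affinity(struct demo_irte *irte, uint8_t new_vector,
			     bool *modified)
{
	*modified = false;

	if (demo_irte_is_posted(irte))
		return 0;	/* destination lives in the PI descriptor */

	irte->low &= ~(0xFFull << DEMO_IRTE_VECTOR_SHIFT);
	irte->low |= (uint64_t)new_vector << DEMO_IRTE_VECTOR_SHIFT;
	*modified = true;
	return 0;
}
```

Returning 0 rather than an error means the caller treats the affinity request as successful even though nothing changed, which matches the patch's choice of `return 0`.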

2014-11-10 06:38:49

by Wu, Feng

Subject: [PATCH 11/13] KVM: Suppress posted-interrupt when 'SN' is set

Currently, we don't support urgent interrupts: all interrupts
are treated as non-urgent, so we cannot send a posted interrupt
when 'SN' is set.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 51d2c8a..495cfbd 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4306,15 +4306,22 @@ static int vmx_vm_has_apicv(struct kvm *kvm)
static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
- int r;
+ int r, sn;

if (pi_test_and_set_pir(vector, &vmx->pi_desc))
return;

+ /*
+ * Currently, we don't support urgent interrupt, all interrupts
+ * are recognized as non-urgent interrupt, so we cannot send
+ * posted-interrupt when 'SN' is set.
+ */
+ sn = pi_test_sn(&vmx->pi_desc);
+
r = pi_test_and_set_on(&vmx->pi_desc);
kvm_make_request(KVM_REQ_EVENT, vcpu);
#ifdef CONFIG_SMP
- if (!r && (vcpu->mode == IN_GUEST_MODE))
+ if (!r && !sn && (vcpu->mode == IN_GUEST_MODE))
apic->send_IPI_mask(get_cpu_mask(vcpu->cpu),
POSTED_INTR_VECTOR);
else
--
1.7.1
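
The delivery decision after this change can be written as a small truth function. This is a sketch only: the `demo_` names are hypothetical, and the fallback branch is an assumption, since the hunk above is cut off after `else` (it is assumed here to kick the VCPU).

```c
#include <stdbool.h>

enum demo_action {
	DEMO_NONE,		/* vector was already pending in PIR */
	DEMO_SEND_IPI,		/* send POSTED_INTR_VECTOR to the target CPU */
	DEMO_KICK_VCPU		/* assumed fallback path, not shown in the hunk */
};

/*
 * pir_already_set: result of pi_test_and_set_pir()
 * on_was_set:      result of pi_test_and_set_on()
 * sn:              result of pi_test_sn()
 * in_guest_mode:   vcpu->mode == IN_GUEST_MODE
 */
static enum demo_action demo_deliver(bool pir_already_set, bool on_was_set,
				     bool sn, bool in_guest_mode)
{
	if (pir_already_set)
		return DEMO_NONE;

	if (!on_was_set && !sn && in_guest_mode)
		return DEMO_SEND_IPI;

	return DEMO_KICK_VCPU;
}
```

The only input combination whose result changes with this patch is "ON was clear, VCPU in guest mode, SN set": it used to send the notification IPI and now takes the fallback path.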

2014-11-10 06:39:13

by Wu, Feng

Subject: [PATCH 09/13] KVM: Change NDST field after VCPU scheduling

This patch changes the NDST field of the Posted-Interrupts
Descriptor after a VCPU is scheduled onto another physical
CPU.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/vmx.c | 25 +++++++++++++++++++++++++
1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4c1a966..fa77714 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1906,6 +1906,31 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
vmx->loaded_vmcs->cpu = cpu;
}
+
+ if (irq_post_enabled && (vcpu->cpu != cpu)) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+ struct pi_desc old, new;
+ unsigned int dest;
+
+ memset(&old, 0, sizeof(old));
+ memset(&new, 0, sizeof(new));
+
+ pi_set_sn(pi_desc);
+
+ do {
+ old.control = new.control = pi_desc->control;
+
+ dest = cpu_physical_id(cpu);
+
+ if (x2apic_mode)
+ new.ndst = dest;
+ else
+ new.ndst = (dest << 8) & 0xFF00;
+
+ } while (cmpxchg(&pi_desc->control, old.control,
+ new.control) != old.control);
+ pi_clear_sn(pi_desc);
+ }
}

static void vmx_vcpu_put(struct kvm_vcpu *vcpu)
--
1.7.1
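
Two details of the hunk above are easy to miss: the NDST encoding differs between xAPIC and x2APIC mode, and the update is a compare-and-swap retry loop because hardware may modify the same 64-bit control word concurrently. A user-space sketch using C11 atomics in place of the kernel's cmpxchg() follows. The `demo_` names are hypothetical, and NDST is assumed to occupy bits 32-63 of the control word, as in the `struct pi_desc` layout from patch 02.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* x2APIC: full 32-bit APIC ID. xAPIC: 8-bit APIC ID placed in bits 15:8. */
static uint32_t demo_encode_ndst(uint32_t apic_id, bool x2apic)
{
	return x2apic ? apic_id : (apic_id << 8) & 0xFF00;
}

/* Replace only the NDST half of the control word, retrying on contention. */
static void demo_update_ndst(_Atomic uint64_t *control, uint32_t ndst)
{
	uint64_t old = atomic_load(control);
	uint64_t new;

	do {
		/* keep the low 32 bits (ON, SN, NDM, NV), swap in the new NDST */
		new = (old & 0xFFFFFFFFull) | ((uint64_t)ndst << 32);
	} while (!atomic_compare_exchange_weak(control, &old, new));
}
```

On a failed exchange, `atomic_compare_exchange_weak()` reloads `old` with the current value, so each retry recomputes `new` from fresh state. This is the same effect the patch gets by re-reading `pi_desc->control` at the top of its do-while loop.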

2014-11-10 06:39:11

by Wu, Feng

Subject: [PATCH 10/13] KVM: Add the handler for Wake-up Vector

When a VCPU is blocked and an external interrupt from an assigned
device is delivered to it, the VT-d Posted-Interrupts mechanism
delivers an interrupt to the associated physical CPU with the
Wake-up Vector. In its handler, we find the destination VCPU
and wake it up.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/vmx.c | 52 +++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 22 +++++++++++-----
include/linux/kvm_host.h | 3 ++
virt/kvm/kvm_main.c | 3 ++
5 files changed, 75 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 71cfe3e..ca231a3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -99,6 +99,8 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)

#define ASYNC_PF_PER_VCPU 64

+extern void (*wakeup_handler_callback)(void);
+
enum kvm_reg {
VCPU_REGS_RAX = 0,
VCPU_REGS_RCX = 1,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fa77714..51d2c8a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -822,6 +822,13 @@ static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
static DEFINE_PER_CPU(struct desc_ptr, host_gdt);

+/*
+ * We maintain a per-CPU linked list of VCPUs, so that wakeup_handler()
+ * can find which VCPU should be woken up.
+ */
+static DEFINE_PER_CPU(struct list_head, blocked_vcpu_on_cpu);
+static DEFINE_PER_CPU(spinlock_t, blocked_vcpu_on_cpu_lock);
+
static unsigned long *vmx_io_bitmap_a;
static unsigned long *vmx_io_bitmap_b;
static unsigned long *vmx_msr_bitmap_legacy;
@@ -2813,6 +2820,8 @@ static int hardware_enable(void)
return -EBUSY;

INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+ INIT_LIST_HEAD(&per_cpu(blocked_vcpu_on_cpu, cpu));
+ spin_lock_init(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));

/*
* Now we can enable the vmclear operation in kdump
@@ -9183,6 +9192,7 @@ static int vmx_vcpu_pre_block(struct kvm_vcpu *vcpu)
struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
struct pi_desc old;
struct pi_desc new;
+ unsigned long flags;

if (!irq_post_enabled)
return 0;
@@ -9222,9 +9232,22 @@ static int vmx_vcpu_pre_block(struct kvm_vcpu *vcpu)

/* set 'NV' to 'wakeup vector' */
new.nv = POSTED_INTR_WAKEUP_VECTOR;
+
+ /*
+ * We should save physical cpu id here, vcpu->cpu may
+ * be changed due to preemption, in that case, this
+ * do-while loop will run again.
+ */
+ vcpu->wakeup_cpu = vcpu->cpu;
} while (cmpxchg(&pi_desc->control, old.control, new.control)
!= old.control);

+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->wakeup_cpu), flags);
+ list_add_tail(&vcpu->blocked_vcpu_list,
+ &per_cpu(blocked_vcpu_on_cpu, vcpu->wakeup_cpu));
+ spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->wakeup_cpu), flags);
return 0;
}

@@ -9234,6 +9257,7 @@ static void vmx_vcpu_post_block(struct kvm_vcpu *vcpu)
struct pi_desc old;
struct pi_desc new;
unsigned int dest = 0;
+ unsigned long flags;

if (!irq_post_enabled)
return;
@@ -9255,6 +9279,13 @@ static void vmx_vcpu_post_block(struct kvm_vcpu *vcpu)
} while (cmpxchg(&pi_desc->control, old.control, new.control)
!= old.control);

+ spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->wakeup_cpu), flags);
+ list_del(&vcpu->blocked_vcpu_list);
+ spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
+ vcpu->wakeup_cpu), flags);
+ vcpu->wakeup_cpu = -1;
+
pi_clear_sn(pi_desc);
}

@@ -9372,6 +9403,25 @@ static struct kvm_x86_ops vmx_x86_ops = {
.vcpu_post_block = vmx_vcpu_post_block,
};

+/*
+ * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
+ */
+void wakeup_handler(void)
+{
+ struct kvm_vcpu *vcpu;
+ int cpu = smp_processor_id();
+
+ spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+ list_for_each_entry(vcpu, &per_cpu(blocked_vcpu_on_cpu, cpu),
+ blocked_vcpu_list) {
+ struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
+
+ if (pi_test_on(pi_desc) == 1)
+ kvm_vcpu_kick(vcpu);
+ }
+ spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, cpu));
+}
+
static int __init vmx_init(void)
{
int r, i, msr;
@@ -9486,6 +9536,8 @@ static int __init vmx_init(void)

update_ple_window_actual_max();

+ wakeup_handler_callback = wakeup_handler;
+
return 0;

out7:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d0c8bb2..2061b3d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6156,6 +6156,21 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_vcpu_reload_apic_access_page(vcpu);
}

+ /*
+ * Since posted-interrupts can be set by VT-d HW now, in this
+ * case, KVM_REQ_EVENT is not set. We move the following
+ * operations out of the if statement.
+ */
+ if (kvm_lapic_enabled(vcpu)) {
+ /*
+ * Update architecture specific hints for APIC
+ * virtual interrupt delivery.
+ */
+ if (kvm_x86_ops->hwapic_irr_update)
+ kvm_x86_ops->hwapic_irr_update(vcpu,
+ kvm_lapic_find_highest_irr(vcpu));
+ }
+
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
kvm_apic_accept_events(vcpu);
if (vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
@@ -6172,13 +6187,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_x86_ops->enable_irq_window(vcpu);

if (kvm_lapic_enabled(vcpu)) {
- /*
- * Update architecture specific hints for APIC
- * virtual interrupt delivery.
- */
- if (kvm_x86_ops->hwapic_irr_update)
- kvm_x86_ops->hwapic_irr_update(vcpu,
- kvm_lapic_find_highest_irr(vcpu));
update_cr8_intercept(vcpu);
kvm_lapic_sync_to_vapic(vcpu);
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 6bb8287..614b4ba 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -239,6 +239,9 @@ struct kvm_vcpu {
unsigned long requests;
unsigned long guest_debug;

+ int wakeup_cpu;
+ struct list_head blocked_vcpu_list;
+
struct mutex mutex;
struct kvm_run *run;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1be1a45..fb3e504 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -224,6 +224,9 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
init_waitqueue_head(&vcpu->wq);
kvm_async_pf_vcpu_init(vcpu);

+ vcpu->wakeup_cpu = -1;
+ INIT_LIST_HEAD(&vcpu->blocked_vcpu_list);
+
page = alloc_page(GFP_KERNEL | __GFP_ZERO);
if (!page) {
r = -ENOMEM;
--
1.7.1
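
The core of wakeup_handler() above is a walk over the VCPUs blocked on the current CPU, kicking every one whose descriptor has 'ON' set. A user-space sketch of that walk follows, with hypothetical `demo_` types: a plain singly linked list stands in for the per-CPU `blocked_vcpu_on_cpu` list, a boolean stands in for the 'ON' bit, and the per-CPU spinlock is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_vcpu {
	bool on;		/* stands in for pi_test_on(pi_desc) */
	bool kicked;		/* set where the patch calls kvm_vcpu_kick() */
	struct demo_vcpu *next;	/* stands in for blocked_vcpu_list */
};

/* Returns the number of VCPUs that were kicked. */
static int demo_wakeup_handler(struct demo_vcpu *blocked_head)
{
	int kicked = 0;

	for (struct demo_vcpu *v = blocked_head; v; v = v->next) {
		if (v->on) {
			v->kicked = true;
			kicked++;
		}
	}
	return kicked;
}
```

Because one wake-up vector is shared by every VCPU blocked on a given CPU, the handler cannot know which VCPU the interrupt was for and has to test each descriptor. The cost of one wake-up therefore grows with the length of that CPU's list.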

2014-11-10 06:40:31

by Wu, Feng

Subject: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor

This patch initializes the VT-d Posted-Interrupts Descriptor.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/include/asm/irq_remapping.h | 1 +
arch/x86/kernel/apic/apic.c | 1 +
arch/x86/kvm/vmx.c | 56 ++++++++++++++++++++++++++++++++-
3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
index b7747c4..a3cc437 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -57,6 +57,7 @@ extern bool setup_remapped_irq(int irq,
struct irq_chip *chip);

void irq_remap_modify_chip_defaults(struct irq_chip *chip);
+extern int irq_post_enabled;

#else /* CONFIG_IRQ_REMAP */

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index ba6cc04..987408d 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -162,6 +162,7 @@ __setup("apicpmtimer", setup_apicpmtimer);
#endif

int x2apic_mode;
+EXPORT_SYMBOL_GPL(x2apic_mode);
#ifdef CONFIG_X86_X2APIC
/* x2apic enabled before OS handover */
int x2apic_preenabled;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3e556c6..a4670d3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -45,6 +45,7 @@
#include <asm/perf_event.h>
#include <asm/debugreg.h>
#include <asm/kexec.h>
+#include <asm/irq_remapping.h>

#include "trace.h"

@@ -408,13 +409,32 @@ struct nested_vmx {
};

#define POSTED_INTR_ON 0
+#define POSTED_INTR_SN 1
+
/* Posted-Interrupt Descriptor */
struct pi_desc {
u32 pir[8]; /* Posted interrupt requested */
- u32 control; /* bit 0 of control is outstanding notification bit */
- u32 rsvd[7];
+ union {
+ struct {
+ u64 on : 1,
+ sn : 1,
+ rsvd_1 : 13,
+ ndm : 1,
+ nv : 8,
+ rsvd_2 : 8,
+ ndst : 32;
+ };
+ u64 control;
+ };
+ u32 rsvd[6];
} __aligned(64);

+static void pi_clear_sn(struct pi_desc *pi_desc)
+{
+ return clear_bit(POSTED_INTR_SN,
+ (unsigned long *)&pi_desc->control);
+}
+
static bool pi_test_and_set_on(struct pi_desc *pi_desc)
{
return test_and_set_bit(POSTED_INTR_ON,
@@ -4396,6 +4416,33 @@ static void ept_set_mmio_spte_mask(void)
kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
}

+static bool pi_desc_init(struct vcpu_vmx *vmx)
+{
+ unsigned int dest;
+
+ if (irq_post_enabled == 0)
+ return true;
+
+ /*
+ * Initialize Posted-Interrupt Descriptor
+ */
+
+ pi_clear_sn(&vmx->pi_desc);
+ vmx->pi_desc.nv = POSTED_INTR_VECTOR;
+
+ /* Physical mode for Notification Event */
+ vmx->pi_desc.ndm = 0;
+ dest = cpu_physical_id(vmx->vcpu.cpu);
+
+ if (x2apic_mode)
+ vmx->pi_desc.ndst = dest;
+ else
+ vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
+
+ return true;
+}
+
+
/*
* Sets up the vmcs for emulated real mode.
*/
@@ -4439,6 +4486,11 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)

vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
+
+ if (!pi_desc_init(vmx)) {
+ printk(KERN_ERR "Initialize PI descriptor error!\n");
+ return 1;
+ }
}

if (ple_gap) {
--
1.7.1
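
The new descriptor layout can be checked with a small user-space sketch. It uses explicit shifts instead of bit-fields, since bit-field ordering is implementation-defined. The `demo_` names are hypothetical, and the bit positions are read off the `struct pi_desc` definition in the patch above (ON bit 0, SN bit 1, NDM bit 15, NV bits 16-23, NDST bits 32-63).

```c
#include <stdint.h>

/* Same member sizes as the patched struct pi_desc: 32 + 8 + 24 bytes. */
struct demo_pi_desc {
	_Alignas(64) uint32_t pir[8];	/* posted interrupt requests */
	uint64_t control;
	uint32_t rsvd[6];
};

_Static_assert(sizeof(struct demo_pi_desc) == 64,
	       "PI descriptor must stay 64 bytes");

#define DEMO_PI_ON		(1ull << 0)
#define DEMO_PI_SN		(1ull << 1)
#define DEMO_PI_NDM		(1ull << 15)
#define DEMO_PI_NV_SHIFT	16
#define DEMO_PI_NDST_SHIFT	32

/* Control word as pi_desc_init() leaves it: SN and NDM clear, NV and NDST set. */
static uint64_t demo_pi_control_init(uint8_t nv, uint32_t ndst)
{
	return ((uint64_t)nv << DEMO_PI_NV_SHIFT) |
	       ((uint64_t)ndst << DEMO_PI_NDST_SHIFT);
}
```

Shrinking `rsvd` from seven to six 32-bit words is what keeps the structure at 64 bytes once `control` grows from 32 to 64 bits. The static assertion captures that invariant.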

2014-11-10 06:41:09

by Wu, Feng

Subject: [PATCH 04/13] iommu/vt-d: Adjust 'struct irte' to better suit for VT-d Posted-Interrupts

This patch adjusts the definition of 'struct irte' so that we can
add the VT-d Posted-Interrupts format to this structure later.

Signed-off-by: Feng Wu <[email protected]>
---
drivers/iommu/intel_irq_remapping.c | 35 +++++++++++++++++++----------------
include/linux/dmar.h | 4 ++--
2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index f99f0f1..776da10 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -310,9 +310,9 @@ static void set_irte_sid(struct irte *irte, unsigned int svt,
{
if (disable_sourceid_checking)
svt = SVT_NO_VERIFY;
- irte->svt = svt;
- irte->sq = sq;
- irte->sid = sid;
+ irte->irq_remap_high.svt = svt;
+ irte->irq_remap_high.sq = sq;
+ irte->irq_remap_high.sid = sid;
}

static int set_ioapic_sid(struct irte *irte, int apic)
@@ -917,8 +917,8 @@ static void prepare_irte(struct irte *irte, int vector,
{
memset(irte, 0, sizeof(*irte));

- irte->present = 1;
- irte->dst_mode = apic->irq_dest_mode;
+ irte->irq_remap_low.present = 1;
+ irte->irq_remap_low.dst_mode = apic->irq_dest_mode;
/*
* Trigger mode in the IRTE will always be edge, and for IO-APIC, the
* actual level or edge trigger will be setup in the IO-APIC
@@ -926,11 +926,11 @@ static void prepare_irte(struct irte *irte, int vector,
* For more details, see the comments (in io_apic.c) explainig IO-APIC
* irq migration in the presence of interrupt-remapping.
*/
- irte->trigger_mode = 0;
- irte->dlvry_mode = apic->irq_delivery_mode;
- irte->vector = vector;
- irte->dest_id = IRTE_DEST(dest);
- irte->redir_hint = 1;
+ irte->irq_remap_low.trigger_mode = 0;
+ irte->irq_remap_low.dlvry_mode = apic->irq_delivery_mode;
+ irte->irq_remap_low.vector = vector;
+ irte->irq_remap_low.dest_id = IRTE_DEST(dest);
+ irte->irq_remap_low.redir_hint = 1;
}

static int intel_setup_ioapic_entry(int irq,
@@ -973,10 +973,13 @@ static int intel_setup_ioapic_entry(int irq,
"Redir_hint:%d Trig_Mode:%d Dlvry_Mode:%X "
"Avail:%X Vector:%02X Dest:%08X "
"SID:%04X SQ:%X SVT:%X)\n",
- attr->ioapic, irte.present, irte.fpd, irte.dst_mode,
- irte.redir_hint, irte.trigger_mode, irte.dlvry_mode,
- irte.avail, irte.vector, irte.dest_id,
- irte.sid, irte.sq, irte.svt);
+ attr->ioapic, irte.irq_remap_low.present,
+ irte.irq_remap_low.fpd, irte.irq_remap_low.dst_mode,
+ irte.irq_remap_low.redir_hint, irte.irq_remap_low.trigger_mode,
+ irte.irq_remap_low.dlvry_mode, irte.irq_remap_low.avail,
+ irte.irq_remap_low.vector, irte.irq_remap_low.dest_id,
+ irte.irq_remap_high.sid, irte.irq_remap_high.sq,
+ irte.irq_remap_high.svt);

entry = (struct IR_IO_APIC_route_entry *)route_entry;
memset(entry, 0, sizeof(*entry));
@@ -1046,8 +1049,8 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
return err;
}

- irte.vector = cfg->vector;
- irte.dest_id = IRTE_DEST(dest);
+ irte.irq_remap_low.vector = cfg->vector;
+ irte.irq_remap_low.dest_id = IRTE_DEST(dest);

/*
* Atomically updates the IRTE with the new destination, vector
diff --git a/include/linux/dmar.h b/include/linux/dmar.h
index 593fff9..8be5d42 100644
--- a/include/linux/dmar.h
+++ b/include/linux/dmar.h
@@ -159,7 +159,7 @@ struct irte {
vector : 8,
__reserved_2 : 8,
dest_id : 32;
- };
+ } irq_remap_low;
__u64 low;
};

@@ -169,7 +169,7 @@ struct irte {
sq : 2,
svt : 2,
__reserved_3 : 44;
- };
+ } irq_remap_high;
__u64 high;
};
};
--
1.7.1
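
Naming the previously anonymous bit-field struct is what lets a second view of the same 64 bits sit next to it in the union. The sketch below shows the idea for the low half only. `struct demo_irte` is hypothetical, fields irrelevant to the illustration are collapsed into `other`, and 64-bit bit-fields rely on compiler support that GCC and Clang provide.

```c
#include <stdint.h>

struct demo_irte {
	union {
		struct {
			uint64_t present : 1,
				 other   : 15,	/* collapsed for this sketch */
				 vector  : 8,
				 rsvd    : 8,
				 dest_id : 32;
		} irq_remap_low;		/* remapped-format view */

		struct {
			uint64_t present : 1,
				 other   : 13,	/* collapsed for this sketch */
				 urg     : 1,
				 pst     : 1,
				 vector  : 8,
				 rsvd    : 14,
				 pda_l   : 26;
		} irq_post_low;			/* posted-format view */

		uint64_t low;			/* raw view */
	};
};
```

With two named views, each access states which format it assumes (`irte.irq_remap_low.vector` versus `irte.irq_post_low.vector`). That is the reason every existing user in intel_irq_remapping.c has to be touched by this patch.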

2014-11-10 06:41:31

by Wu, Feng

Subject: [PATCH 03/13] KVM: Add KVM_CAP_PI to detect VT-d Posted-Interrtups

This patch adds KVM_CAP_PI so that QEMU can detect the VT-d
Posted-Interrupts feature.

Signed-off-by: Feng Wu <[email protected]>
---
arch/x86/kvm/x86.c | 4 ++++
include/uapi/linux/kvm.h | 1 +
2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0033df3..b447a98 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -63,6 +63,7 @@
#include <asm/xcr.h>
#include <asm/pvclock.h>
#include <asm/div64.h>
+#include <asm/irq_remapping.h>

#define MAX_IO_MSRS 256
#define KVM_MAX_MCE_BANKS 32
@@ -2775,6 +2776,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_TSC_DEADLINE_TIMER:
r = boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER);
break;
+ case KVM_CAP_PI:
+ r = irq_post_enabled;
+ break;
default:
r = 0;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6076882..7593c52 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -761,6 +761,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_PPC_FIXUP_HCALL 103
#define KVM_CAP_PPC_ENABLE_HCALL 104
#define KVM_CAP_CHECK_EXTENSION_VM 105
+#define KVM_CAP_PI 106

#ifdef KVM_CAP_IRQ_ROUTING

--
1.7.1
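
From user space, a capability like this would be probed with the existing KVM_CHECK_EXTENSION ioctl. A minimal sketch follows. `demo_kvm_has_cap()` and `DEMO_KVM_CAP_PI` are hypothetical names, and the value 106 is taken from this patch only; in other kernel trees that number may belong to a different capability, so the sketch is meaningful only against a kernel carrying this series.

```c
#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Value proposed by this patch; an assumption outside this series. */
#define DEMO_KVM_CAP_PI 106

/*
 * kvm_fd is a descriptor obtained by opening /dev/kvm.
 * KVM_CHECK_EXTENSION returns 0 for "unsupported" and a positive value
 * for "supported"; a failed ioctl (-1) is treated as unsupported here.
 */
static bool demo_kvm_has_cap(int kvm_fd, long cap)
{
	int r = ioctl(kvm_fd, KVM_CHECK_EXTENSION, cap);

	return r > 0;
}
```

With the handler added above, the returned value would simply be `irq_post_enabled`, so a positive result means posting was detected and enabled on the host.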

2014-11-10 18:16:31

by Thomas Gleixner

Subject: Re: several messages

On Mon, 10 Nov 2014, Feng Wu wrote:

> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.

Can you please talk to Jiang and synchronize your work with his
refactoring of the x86 interrupt handling subsystem?

I want this stuff cleaned up first before we add new stuff to it.

Thanks,

tglx

2014-11-10 21:57:54

by Alex Williamson

Subject: Re: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor

On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
> This patch initializes the VT-d Posted-Interrupts Descriptor.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> arch/x86/include/asm/irq_remapping.h | 1 +
> arch/x86/kernel/apic/apic.c | 1 +
> arch/x86/kvm/vmx.c | 56 ++++++++++++++++++++++++++++++++-
> 3 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
> index b7747c4..a3cc437 100644
> --- a/arch/x86/include/asm/irq_remapping.h
> +++ b/arch/x86/include/asm/irq_remapping.h
> @@ -57,6 +57,7 @@ extern bool setup_remapped_irq(int irq,
> struct irq_chip *chip);
>
> void irq_remap_modify_chip_defaults(struct irq_chip *chip);
> +extern int irq_post_enabled;
>
> #else /* CONFIG_IRQ_REMAP */
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index ba6cc04..987408d 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -162,6 +162,7 @@ __setup("apicpmtimer", setup_apicpmtimer);
> #endif
>
> int x2apic_mode;
> +EXPORT_SYMBOL_GPL(x2apic_mode);
> #ifdef CONFIG_X86_X2APIC
> /* x2apic enabled before OS handover */
> int x2apic_preenabled;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 3e556c6..a4670d3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -45,6 +45,7 @@
> #include <asm/perf_event.h>
> #include <asm/debugreg.h>
> #include <asm/kexec.h>
> +#include <asm/irq_remapping.h>
>
> #include "trace.h"
>
> @@ -408,13 +409,32 @@ struct nested_vmx {
> };
>
> #define POSTED_INTR_ON 0
> +#define POSTED_INTR_SN 1
> +
> /* Posted-Interrupt Descriptor */
> struct pi_desc {
> u32 pir[8]; /* Posted interrupt requested */
> - u32 control; /* bit 0 of control is outstanding notification bit */
> - u32 rsvd[7];
> + union {
> + struct {
> + u64 on : 1,
> + sn : 1,
> + rsvd_1 : 13,
> + ndm : 1,
> + nv : 8,
> + rsvd_2 : 8,
> + ndst : 32;
> + };
> + u64 control;
> + };
> + u32 rsvd[6];
> } __aligned(64);
>
> +static void pi_clear_sn(struct pi_desc *pi_desc)
> +{
> + return clear_bit(POSTED_INTR_SN,
> + (unsigned long *)&pi_desc->control);
> +}
> +
> static bool pi_test_and_set_on(struct pi_desc *pi_desc)
> {
> return test_and_set_bit(POSTED_INTR_ON,
> @@ -4396,6 +4416,33 @@ static void ept_set_mmio_spte_mask(void)
> kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
> }
>
> +static bool pi_desc_init(struct vcpu_vmx *vmx)
> +{
> + unsigned int dest;
> +
> + if (irq_post_enabled == 0)
> + return true;
> +
> + /*
> + * Initialize Posted-Interrupt Descriptor
> + */
> +
> + pi_clear_sn(&vmx->pi_desc);
> + vmx->pi_desc.nv = POSTED_INTR_VECTOR;
> +
> + /* Physical mode for Notificaiton Event */
> + vmx->pi_desc.ndm = 0;
> + dest = cpu_physical_id(vmx->vcpu.cpu);
> +
> + if (x2apic_mode)
> + vmx->pi_desc.ndst = dest;
> + else
> + vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
> +
> + return true;

Why does this bother to return anything since it can only return true?

> +}
> +
> +
> /*
> * Sets up the vmcs for emulated real mode.
> */
> @@ -4439,6 +4486,11 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>
> vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
> +
> + if (!pi_desc_init(vmx)) {

And therefore this cannot happen.

> + printk(KERN_ERR "Initialize PI descriptor error!\n");
> + return 1;

This is the wrong error value anyway: vmx_create_vcpu() returns
ERR_PTR(1), which the caller's IS_ERR() check does not recognize as
an error.

Thanks,
Alex

> + }
> }
>
> if (ple_gap) {
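
The ERR_PTR()/IS_ERR() point can be reproduced in user space. The helpers below are hypothetical re-implementations written for this illustration (the kernel's versions live in include/linux/err.h). The assumption is that only the top 4095 values of the address space, i.e. negated errno codes, count as error pointers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode an error code in a pointer value. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True only for pointer values in the range [-DEMO_MAX_ERRNO, -1]. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}
```

`demo_is_err(demo_err_ptr(-ENOMEM))` is true, but `demo_is_err(demo_err_ptr(1))` is false: a positive return value such as `return 1` ends up looking like a valid (if bogus) pointer to the caller. A negative errno would be needed for the failure to propagate.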


2014-11-10 21:58:15

by Alex Williamson

Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
> When a guest changes its interrupt configuration (such as the vector)
> for direct-assigned devices, we need to update the associated IRTE
> with the new guest vector, so external interrupts from the assigned
> devices can be injected into guests without a VM-Exit.
>
> The current method of handling guest lowest-priority interrupts
> is to use a counter 'apic_arb_prio' for each VCPU: we choose the
> VCPU with the smallest 'apic_arb_prio' and then increase it by 1.
> However, for VT-d PI, we cannot re-use this, since we no longer
> have control over 'apic_arb_prio' when posted interrupts are
> delivered directly by hardware.
>
> Here, we introduce a scheme similar to 'apic_arb_prio' to handle
> guest lowest-priority interrupts when VT-d PI is used. The idea
> is:
> - Each VCPU has a counter 'round_robin_counter'.
> - When a guest sets an interrupt to lowest priority, we choose
> the VCPU with the smallest 'round_robin_counter' as the destination,
> then increase it.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> arch/x86/include/asm/irq_remapping.h | 6 ++
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/vmx.c | 12 +++
> arch/x86/kvm/x86.c | 11 +++
> drivers/iommu/amd_iommu.c | 6 ++
> drivers/iommu/intel_irq_remapping.c | 28 +++++++
> drivers/iommu/irq_remapping.c | 9 ++
> drivers/iommu/irq_remapping.h | 3 +
> include/linux/dmar.h | 26 ++++++
> include/linux/kvm_host.h | 22 +++++
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/assigned-dev.c | 141 ++++++++++++++++++++++++++++++++++
> virt/kvm/irq_comm.c | 4 +-
> virt/kvm/irqchip.c | 11 ---
> 14 files changed, 269 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
> index a3cc437..32d6cc4 100644
> --- a/arch/x86/include/asm/irq_remapping.h
> +++ b/arch/x86/include/asm/irq_remapping.h
> @@ -51,6 +51,7 @@ extern void compose_remapped_msi_msg(struct pci_dev *pdev,
> unsigned int irq, unsigned int dest,
> struct msi_msg *msg, u8 hpet_id);
> extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
> +extern int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector);
> extern void panic_if_irq_remap(const char *msg);
> extern bool setup_remapped_irq(int irq,
> struct irq_cfg *cfg,
> @@ -88,6 +89,11 @@ static inline int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
> return -ENODEV;
> }
>
> +static inline int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
> +{
> + return -ENODEV;
> +}
> +
> static inline void panic_if_irq_remap(const char *msg)
> {
> }
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 6ed0c30..0630161 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
> struct kvm_lapic *apic; /* kernel irqchip context */
> unsigned long apic_attention;
> int32_t apic_arb_prio;
> + int32_t round_robin_counter;
> int mp_state;
> u64 ia32_misc_enable_msr;
> bool tpr_access_reporting;
> @@ -771,6 +772,7 @@ struct kvm_x86_ops {
> int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
>
> void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
> + u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
> };
>
> struct kvm_arch_async_pf {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index a4670d3..ae91b72 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -544,6 +544,11 @@ static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
> return container_of(vcpu, struct vcpu_vmx, vcpu);
> }
>
> +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
> +{
> + return &(to_vmx(vcpu)->pi_desc);
> +}
> +
> #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
> #define FIELD(number, name) [number] = VMCS12_OFFSET(name)
> #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
> @@ -4280,6 +4285,11 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu)
> return;
> }
>
> +static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
> +{
> + return __pa((u64)vcpu_to_pi_desc(vcpu));
> +}
> +
> /*
> * Set up the vmcs's constant host-state fields, i.e., host-state fields that
> * will not change in the lifetime of the guest.
> @@ -9232,6 +9242,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
> .check_nested_events = vmx_check_nested_events,
>
> .sched_in = vmx_sched_in,
> +
> + .get_pi_desc_addr = vmx_get_pi_desc_addr,
> };
>
> static int __init vmx_init(void)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b447a98..0c19d15 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7735,6 +7735,17 @@ bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
> }
> EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
>
> +int kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu *vcpu,
> + u32 guest_vector, int host_irq)
> +{
> + u64 pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> +
> + if (update_pi_irte(host_irq, pi_desc_addr, guest_vector))
> + return -1;
> +
> + return 0;
> +}
> +
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 505a9ad..a36fdc7 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -4280,6 +4280,11 @@ static int alloc_hpet_msi(unsigned int irq, unsigned int id)
> return 0;
> }
>
> +static int dummy_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
> +{
> + return -EINVAL;
> +}
> +
> struct irq_remap_ops amd_iommu_irq_ops = {
> .supported = amd_iommu_supported,
> .prepare = amd_iommu_prepare,
> @@ -4294,5 +4299,6 @@ struct irq_remap_ops amd_iommu_irq_ops = {
> .msi_alloc_irq = msi_alloc_irq,
> .msi_setup_irq = msi_setup_irq,
> .alloc_hpet_msi = alloc_hpet_msi,
> + .update_pi_irte = dummy_update_pi_irte,
> };
> #endif
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 776da10..87c02fe 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -1172,6 +1172,33 @@ static int intel_alloc_hpet_msi(unsigned int irq, unsigned int id)
> return ret;
> }
>
> +static int intel_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
> +{
> + struct irte irte;
> +
> + if (get_irte(irq, &irte))
> + return -1;
> +
> + irte.irq_post_low.urg = 0;
> + irte.irq_post_low.vector = vector;
> + irte.irq_post_low.pda_l = (pi_desc_addr >> (32 - PDA_LOW_BIT)) &
> + ~(-1UL << PDA_LOW_BIT);
> + irte.irq_post_high.pda_h = (pi_desc_addr >> 32) &
> + ~(-1UL << PDA_HIGH_BIT);
> +
> + irte.irq_post_low.__reserved_1 = 0;
> + irte.irq_post_low.__reserved_2 = 0;
> + irte.irq_post_low.__reserved_3 = 0;
> + irte.irq_post_high.__reserved_4 = 0;
> +
> + irte.irq_post_low.pst = 1;
> +
> + if (modify_irte(irq, &irte))
> + return -1;
> +
> + return 0;
> +}
> +
> struct irq_remap_ops intel_irq_remap_ops = {
> .supported = intel_irq_remapping_supported,
> .prepare = dmar_table_init,
> @@ -1186,4 +1213,5 @@ struct irq_remap_ops intel_irq_remap_ops = {
> .msi_alloc_irq = intel_msi_alloc_irq,
> .msi_setup_irq = intel_msi_setup_irq,
> .alloc_hpet_msi = intel_alloc_hpet_msi,
> + .update_pi_irte = intel_update_pi_irte,

Extending irq_remap_ops should really be a separate patch from its use
by KVM.

> };
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 2f8ee00..0e36860 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -362,6 +362,15 @@ int setup_hpet_msi_remapped(unsigned int irq, unsigned int id)
> return default_setup_hpet_msi(irq, id);
> }
>
> +int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
> +{
> + if (!remap_ops || !remap_ops->update_pi_irte)
> + return -ENODEV;
> +
> + return remap_ops->update_pi_irte(irq, pi_desc_addr, vector);
> +}
> +EXPORT_SYMBOL_GPL(update_pi_irte);
> +
> void panic_if_irq_remap(const char *msg)
> {
> if (irq_remapping_enabled)
> diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> index 7bb5913..2d8f740 100644
> --- a/drivers/iommu/irq_remapping.h
> +++ b/drivers/iommu/irq_remapping.h
> @@ -84,6 +84,9 @@ struct irq_remap_ops {
>
> /* Setup interrupt remapping for an HPET MSI */
> int (*alloc_hpet_msi)(unsigned int, unsigned int);
> +
> + /* Update IRTE for posted-interrupt */
> + int (*update_pi_irte)(int irq, u64 pi_desc_addr, u32 vector);
> };
>
> extern struct irq_remap_ops intel_irq_remap_ops;
> diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> index 8be5d42..e1ff4f7 100644
> --- a/include/linux/dmar.h
> +++ b/include/linux/dmar.h
> @@ -160,6 +160,20 @@ struct irte {
> __reserved_2 : 8,
> dest_id : 32;
> } irq_remap_low;
> +
> + struct {
> + __u64 present : 1,
> + fpd : 1,
> + __reserved_1 : 6,
> + avail : 4,
> + __reserved_2 : 2,
> + urg : 1,
> + pst : 1,
> + vector : 8,
> + __reserved_3 : 14,
> + pda_l : 26;
> + } irq_post_low;
> +
> __u64 low;
> };
>
> @@ -170,10 +184,22 @@ struct irte {
> svt : 2,
> __reserved_3 : 44;
> } irq_remap_high;
> +
> + struct {
> + __u64 sid: 16,
> + sq: 2,
> + svt: 2,
> + __reserved_4: 12,
> + pda_h: 32;
> + } irq_post_high;
> +
> __u64 high;
> };
> };
>
> +#define PDA_LOW_BIT 26
> +#define PDA_HIGH_BIT 32
> +
> enum {
> IRQ_REMAP_XAPIC_MODE,
> IRQ_REMAP_X2APIC_MODE,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea53b04..6bb8287 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
> struct hlist_node link;
> };
>
> +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> +
> +struct kvm_irq_routing_table {
> + int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> + struct kvm_kernel_irq_routing_entry *rt_entries;
> + u32 nr_rt_entries;
> + /*
> + * Array indexed by gsi. Each entry contains list of irq chips
> + * the gsi is connected to.
> + */
> + struct hlist_head map[0];
> +};
> +
> +#else
> +
> +struct kvm_irq_routing_table {};
> +
> +#endif
> +
> #ifndef KVM_PRIVATE_MEM_SLOTS
> #define KVM_PRIVATE_MEM_SLOTS 0
> #endif
> @@ -766,6 +785,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> struct kvm_irq_ack_notifier *kian);
> int kvm_request_irq_source_id(struct kvm *kvm);
> void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> + struct kvm_lapic_irq *irq);
> +bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq);
>
> #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 7593c52..509223a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1027,6 +1027,7 @@ struct kvm_s390_ucas_mapping {
> #define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct kvm_xen_hvm_config)
> #define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct kvm_clock_data)
> #define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct kvm_clock_data)
> +#define KVM_ASSIGN_DEV_PI_UPDATE _IOR(KVMIO, 0x7d, __u32)
> /* Available with KVM_CAP_PIT_STATE2 */
> #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct kvm_pit_state2)
> #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct kvm_pit_state2)

Needs an accompanying Documentation/virtual/kvm/api.txt update.
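
When documenting the new ioctl, the direction encoded in the request number
is worth spelling out. A minimal sketch, assuming Linux userspace headers:
_IOR() marks the argument as read by userspace (kernel to user), _IOW() as
written by userspace (user to kernel). The handler later in this patch
fetches dev_id with copy_from_user(). The DEMO_* names below are
hypothetical and exist only for this illustration; 0xAE is the value of
KVMIO.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

#define DEMO_KVMIO 0xAE	/* same value as KVMIO */

/* Request number as written in the quoted patch. */
#define DEMO_PI_UPDATE_IOR _IOR(DEMO_KVMIO, 0x7d, uint32_t)
/* Same number with the user-to-kernel direction, for comparison. */
#define DEMO_PI_UPDATE_IOW _IOW(DEMO_KVMIO, 0x7d, uint32_t)

/* Nonzero if the request encodes "userspace writes the argument". */
static inline int demo_user_writes_arg(unsigned int req)
{
	return (_IOC_DIR(req) & _IOC_WRITE) != 0;
}

/* Nonzero if the request encodes "userspace reads the argument back". */
static inline int demo_user_reads_arg(unsigned int req)
{
	return (_IOC_DIR(req) & _IOC_READ) != 0;
}
```

Both variants share the same type, number and size fields; only the two
direction bits differ.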

> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index e05000e..e154009 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c


Since legacy KVM device assignment is effectively deprecated, have you
considered how we might do this with VFIO? Thanks,

Alex


> @@ -326,6 +326,135 @@ void kvm_free_all_assigned_devices(struct kvm *kvm)
> }
> }
>
> +int __weak kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu *vcpu,
> + u32 guest_vector, int host_irq)
> +{
> + return 0;
> +}
> +
> +int kvm_compare_rr_counter(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
> +{
> + return vcpu1->arch.round_robin_counter -
> + vcpu2->arch.round_robin_counter;
> +}
> +
> +bool kvm_pi_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> + struct kvm_vcpu **dest_vcpu)
> +{
> + int i, r = 0;
> + struct kvm_vcpu *vcpu, *dest = NULL;
> +
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + if (!kvm_apic_present(vcpu))
> + continue;
> +
> + if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
> + irq->dest_id, irq->dest_mode))
> + continue;
> +
> + if (!kvm_is_dm_lowest_prio(irq)) {
> + r++;
> + *dest_vcpu = vcpu;
> + } else if (kvm_lapic_enabled(vcpu)) {
> + if (!dest)
> + dest = vcpu;
> + else if (kvm_compare_rr_counter(vcpu, dest) < 0)
> + dest = vcpu;
> + }
> + }
> +
> + if (dest) {
> + dest->arch.round_robin_counter++;
> + *dest_vcpu = dest;
> + return true;
> + } else if (r == 1)
> + return true;
> +
> + return false;
> +}
> +
> +static int __kvm_update_pi_irte(struct kvm *kvm, int host_irq, int guest_irq)
> +{
> + struct kvm_kernel_irq_routing_entry *e;
> + struct kvm_irq_routing_table *irq_rt;
> + struct kvm_lapic_irq irq;
> + struct kvm_vcpu *vcpu;
> + int idx, ret = -EINVAL;
> +
> + idx = srcu_read_lock(&kvm->irq_srcu);
> + irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> + ASSERT(guest_irq < irq_rt->nr_rt_entries);
> +
> + hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> + if (e->type != KVM_IRQ_ROUTING_MSI)
> + continue;
> + /*
> + * VT-d posted-interrupt has the following
> + * limitations:
> + * - No support for posting multicast/broadcast
> + * interrupts to a VCPU
> + * Still use interrupt remapping for these
> + * kind of interrupts
> + */
> +
> + kvm_set_msi_irq(e, &irq);
> + if (!kvm_pi_find_dest_vcpu(kvm, &irq, &vcpu)) {
> + printk(KERN_INFO "%s: can not find the target VCPU\n",
> + __func__);
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (kvm_update_pi_irte_common(kvm, vcpu, irq.vector,
> + host_irq)) {
> + printk(KERN_INFO "%s: failed to update PI IRTE\n",
> + __func__);
> + ret = -EINVAL;
> + goto out;
> + }
> + }
> +
> + ret = 0;
> +out:
> + srcu_read_unlock(&kvm->irq_srcu, idx);
> + return ret;
> +}
> +
> +int kvm_update_pi_irte(struct kvm *kvm, u32 dev_id)
> +{
> + int i, rc = -1;
> + struct kvm_assigned_dev_kernel *dev;
> +
> + mutex_lock(&kvm->lock);
> + dev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head, dev_id);
> + if (!dev) {
> + printk(KERN_INFO "%s: cannot find the assigned dev.\n",
> + __func__);
> + rc = -1;
> + goto out;
> + }
> +
> + BUG_ON(dev->irq_requested_type == 0);
> +
> + if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSI) &&
> + (dev->dev->msi_enabled == 1)) {
> + __kvm_update_pi_irte(kvm,
> + dev->host_irq, dev->guest_irq);
> + } else if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) &&
> + (dev->dev->msix_enabled == 1)) {
> + for (i = 0; i < dev->entries_nr; i++) {
> + __kvm_update_pi_irte(kvm,
> + dev->host_msix_entries[i].vector,
> + dev->guest_msix_entries[i].vector);
> + }
> + }
> +
> +out:
> + rc = 0;
> + mutex_unlock(&kvm->lock);
> + return rc;
> +}
> +
> static int assigned_device_enable_host_intx(struct kvm *kvm,
> struct kvm_assigned_dev_kernel *dev)
> {
> @@ -1017,6 +1146,18 @@ long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
> r = kvm_vm_ioctl_set_pci_irq_mask(kvm, &assigned_dev);
> break;
> }
> + case KVM_ASSIGN_DEV_PI_UPDATE: {
> + u32 dev_id;
> +
> + r = -EFAULT;
> + if (copy_from_user(&dev_id, argp, sizeof(dev_id)))
> + goto out;
> + r = kvm_update_pi_irte(kvm, dev_id);
> + if (r)
> + goto out;
> + break;
> +
> + }
> default:
> r = -ENOTTY;
> break;
> diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> index 963b899..f51aed3 100644
> --- a/virt/kvm/irq_comm.c
> +++ b/virt/kvm/irq_comm.c
> @@ -55,7 +55,7 @@ static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
> line_status);
> }
>
> -inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
> +bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
> {
> #ifdef CONFIG_IA64
> return irq->delivery_mode ==
> @@ -106,7 +106,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> return r;
> }
>
> -static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> struct kvm_lapic_irq *irq)
> {
> trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index 7f256f3..cdf29a6 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -31,17 +31,6 @@
> #include <trace/events/kvm.h>
> #include "irq.h"
>
> -struct kvm_irq_routing_table {
> - int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> - struct kvm_kernel_irq_routing_entry *rt_entries;
> - u32 nr_rt_entries;
> - /*
> - * Array indexed by gsi. Each entry contains list of irq chips
> - * the gsi is connected to.
> - */
> - struct hlist_head map[0];
> -};
> -
> int kvm_irq_map_gsi(struct kvm *kvm,
> struct kvm_kernel_irq_routing_entry *entries, int gsi)
> {


2014-11-10 21:58:22

by Alex Williamson

Subject: Re: [PATCH 13/13] iommu/vt-d: Add a command line parameter for VT-d posted-interrupts

On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
> Enable VT-d Posted-Interrupts and add a command line
> parameter for it.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> drivers/iommu/irq_remapping.c | 9 ++++++++-
> 1 files changed, 8 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 0e36860..3cb9429 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -23,7 +23,7 @@ int irq_remap_broken;
> int disable_sourceid_checking;
> int no_x2apic_optout;
>
> -int disable_irq_post = 1;
> +int disable_irq_post = 0;
> int irq_post_enabled = 0;
> EXPORT_SYMBOL_GPL(irq_post_enabled);
>
> @@ -206,6 +206,13 @@ static __init int setup_irqremap(char *str)
> }
> early_param("intremap", setup_irqremap);
>
> +static __init int setup_nointpost(char *str)
> +{
> + disable_irq_post = 1;
> + return 0;
> +}
> +early_param("nointpost", setup_nointpost);
> +

Documentation/kernel-parameters.txt? Thanks,

Alex

> void __init setup_irq_remapping_ops(void)
> {
> remap_ops = &intel_irq_remap_ops;


2014-11-11 02:28:53

by Jiang Liu

Subject: Re: several messages

On 2014/11/11 2:15, Thomas Gleixner wrote:
> On Mon, 10 Nov 2014, Feng Wu wrote:
>
>> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
>> With VT-d Posted-Interrupts enabled, external interrupts from
>> direct-assigned devices can be delivered to guests without VMM
>> intervention when guest is running in non-root mode.
>
> Can you please talk to Jiang and synchronize your work with his
> refactoring of the x86 interrupt handling subsystem.
>
> I want this stuff cleaned up first before we add new stuff to it.
Hi Thomas,
Just talked with Feng; we will focus on the refactoring first and
then add posted-interrupt support.
Regards!
Gerry

>
> Thanks,
>
> tglx
>

2014-11-11 06:40:08

by Wu, Feng

Subject: RE: several messages



> -----Original Message-----
> From: Jiang Liu [mailto:[email protected]]
> Sent: Tuesday, November 11, 2014 10:29 AM
> To: Thomas Gleixner; Wu, Feng
> Cc: [email protected]; [email protected]; David Woodhouse;
> [email protected]; [email protected]; H. Peter Anvin; [email protected];
> [email protected]; [email protected]; LKML
> Subject: Re: several messages
>
> On 2014/11/11 2:15, Thomas Gleixner wrote:
> > On Mon, 10 Nov 2014, Feng Wu wrote:
> >
> >> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> >> With VT-d Posted-Interrupts enabled, external interrupts from
> >> direct-assigned devices can be delivered to guests without VMM
> >> intervention when guest is running in non-root mode.
> >
> > Can you please talk to Jiang and synchronize your work with his
> > refactoring of the x86 interrupt handling subsystem.
> >
> > I want this stuff cleaned up first before we add new stuff to it.
> Hi Thomas,
> Just talked with Feng; we will focus on the refactoring first and
> then add posted-interrupt support.
> Regards!
> Gerry

No problem!

Thanks,
Feng

>
> >
> > Thanks,
> >
> > tglx
> >

2014-11-11 09:21:20

by Wu, Feng

Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Tuesday, November 11, 2014 5:58 AM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
> configuration changes
>
> On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
> > When a guest changes its interrupt configuration (such as the vector)
> > for direct-assigned devices, we need to update the associated IRTE
> > with the new guest vector, so external interrupts from the assigned
> > devices can be injected into guests without a VM-Exit.
> >
> > The current method of handling guest lowest-priority interrupts
> > is to use a counter 'apic_arb_prio' for each VCPU: we choose the
> > VCPU with the smallest 'apic_arb_prio' and then increase it by 1.
> > However, for VT-d PI, we cannot re-use this, since we no longer
> > have control over 'apic_arb_prio' when posted interrupts are
> > delivered directly by hardware.
> >
> > Here, we introduce a way similar to 'apic_arb_prio' to handle
> > guest lowest-priority interrupts when VT-d PI is used. The idea is:
> > - Each VCPU has a counter 'round_robin_counter'.
> > - When a guest sets an interrupt to lowest priority, we choose
> > the VCPU with the smallest 'round_robin_counter' as the destination,
> > then increase it.
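
The selection rule described in the commit message can be sketched in
standalone C as follows. This is an illustration under stated assumptions,
not the patch's code: struct demo_vcpu and pick_lowest_prio() are
hypothetical stand-ins, the 'eligible' flag stands for the APIC
present/match/enabled checks, and the counter is unsigned here (the patch
uses int32_t) so that the wraparound comparison stays well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU state named in the commit message. */
struct demo_vcpu {
	int id;
	int eligible;			/* nonzero if the VCPU matches the destination */
	uint32_t round_robin_counter;
};

/*
 * Return the eligible VCPU with the smallest counter and increment that
 * counter; return NULL when no VCPU is eligible. Ties go to the VCPU that
 * appears first in the array.
 */
static struct demo_vcpu *pick_lowest_prio(struct demo_vcpu *vcpus, size_t n)
{
	struct demo_vcpu *dest = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct demo_vcpu *v = &vcpus[i];

		if (!v->eligible)
			continue;
		/* Signed view of the unsigned difference tolerates wraparound. */
		if (!dest || (int32_t)(v->round_robin_counter -
				       dest->round_robin_counter) < 0)
			dest = v;
	}

	if (dest)
		dest->round_robin_counter++;
	return dest;
}
```

With two eligible VCPUs starting at zero, successive calls alternate between
them, which is the round-robin behaviour the counter is meant to provide.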
> >
> > Signed-off-by: Feng Wu <[email protected]>
> > ---
> > arch/x86/include/asm/irq_remapping.h | 6 ++
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/vmx.c | 12 +++
> > arch/x86/kvm/x86.c | 11 +++
> > drivers/iommu/amd_iommu.c | 6 ++
> > drivers/iommu/intel_irq_remapping.c | 28 +++++++
> > drivers/iommu/irq_remapping.c | 9 ++
> > drivers/iommu/irq_remapping.h | 3 +
> > include/linux/dmar.h | 26 ++++++
> > include/linux/kvm_host.h | 22 +++++
> > include/uapi/linux/kvm.h | 1 +
> > virt/kvm/assigned-dev.c | 141
> ++++++++++++++++++++++++++++++++++
> > virt/kvm/irq_comm.c | 4 +-
> > virt/kvm/irqchip.c | 11 ---
> > 14 files changed, 269 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/irq_remapping.h
> b/arch/x86/include/asm/irq_remapping.h
> > index a3cc437..32d6cc4 100644
> > --- a/arch/x86/include/asm/irq_remapping.h
> > +++ b/arch/x86/include/asm/irq_remapping.h
> > @@ -51,6 +51,7 @@ extern void compose_remapped_msi_msg(struct
> pci_dev *pdev,
> > unsigned int irq, unsigned int dest,
> > struct msi_msg *msg, u8 hpet_id);
> > extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
> > +extern int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector);
> > extern void panic_if_irq_remap(const char *msg);
> > extern bool setup_remapped_irq(int irq,
> > struct irq_cfg *cfg,
> > @@ -88,6 +89,11 @@ static inline int setup_hpet_msi_remapped(unsigned
> int irq, unsigned int id)
> > return -ENODEV;
> > }
> >
> > +static inline int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32
> vector)
> > +{
> > + return -ENODEV;
> > +}
> > +
> > static inline void panic_if_irq_remap(const char *msg)
> > {
> > }
> > diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> > index 6ed0c30..0630161 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
> > struct kvm_lapic *apic; /* kernel irqchip context */
> > unsigned long apic_attention;
> > int32_t apic_arb_prio;
> > + int32_t round_robin_counter;
> > int mp_state;
> > u64 ia32_misc_enable_msr;
> > bool tpr_access_reporting;
> > @@ -771,6 +772,7 @@ struct kvm_x86_ops {
> > int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
> >
> > void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
> > + u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
> > };
> >
> > struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index a4670d3..ae91b72 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -544,6 +544,11 @@ static inline struct vcpu_vmx *to_vmx(struct
> kvm_vcpu *vcpu)
> > return container_of(vcpu, struct vcpu_vmx, vcpu);
> > }
> >
> > +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
> > +{
> > + return &(to_vmx(vcpu)->pi_desc);
> > +}
> > +
> > #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
> > #define FIELD(number, name) [number] = VMCS12_OFFSET(name)
> > #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
> > @@ -4280,6 +4285,11 @@ static void vmx_sync_pir_to_irr_dummy(struct
> kvm_vcpu *vcpu)
> > return;
> > }
> >
> > +static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
> > +{
> > + return __pa((u64)vcpu_to_pi_desc(vcpu));
> > +}
> > +
> > /*
> > * Set up the vmcs's constant host-state fields, i.e., host-state fields that
> > * will not change in the lifetime of the guest.
> > @@ -9232,6 +9242,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
> > .check_nested_events = vmx_check_nested_events,
> >
> > .sched_in = vmx_sched_in,
> > +
> > + .get_pi_desc_addr = vmx_get_pi_desc_addr,
> > };
> >
> > static int __init vmx_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index b447a98..0c19d15 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -7735,6 +7735,17 @@ bool kvm_arch_has_noncoherent_dma(struct
> kvm *kvm)
> > }
> > EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
> >
> > +int kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu *vcpu,
> > + u32 guest_vector, int host_irq)
> > +{
> > + u64 pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> > +
> > + if (update_pi_irte(host_irq, pi_desc_addr, guest_vector))
> > + return -1;
> > +
> > + return 0;
> > +}
> > +
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index 505a9ad..a36fdc7 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -4280,6 +4280,11 @@ static int alloc_hpet_msi(unsigned int irq,
> unsigned int id)
> > return 0;
> > }
> >
> > +static int dummy_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
> > +{
> > + return -EINVAL;
> > +}
> > +
> > struct irq_remap_ops amd_iommu_irq_ops = {
> > .supported = amd_iommu_supported,
> > .prepare = amd_iommu_prepare,
> > @@ -4294,5 +4299,6 @@ struct irq_remap_ops amd_iommu_irq_ops = {
> > .msi_alloc_irq = msi_alloc_irq,
> > .msi_setup_irq = msi_setup_irq,
> > .alloc_hpet_msi = alloc_hpet_msi,
> > + .update_pi_irte = dummy_update_pi_irte,
> > };
> > #endif
> > diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c
> > index 776da10..87c02fe 100644
> > --- a/drivers/iommu/intel_irq_remapping.c
> > +++ b/drivers/iommu/intel_irq_remapping.c
> > @@ -1172,6 +1172,33 @@ static int intel_alloc_hpet_msi(unsigned int irq,
> unsigned int id)
> > return ret;
> > }
> >
> > +static int intel_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
> > +{
> > + struct irte irte;
> > +
> > + if (get_irte(irq, &irte))
> > + return -1;
> > +
> > + irte.irq_post_low.urg = 0;
> > + irte.irq_post_low.vector = vector;
> > + irte.irq_post_low.pda_l = (pi_desc_addr >> (32 - PDA_LOW_BIT)) &
> > + ~(-1UL << PDA_LOW_BIT);
> > + irte.irq_post_high.pda_h = (pi_desc_addr >> 32) &
> > + ~(-1UL << PDA_HIGH_BIT);
> > +
> > + irte.irq_post_low.__reserved_1 = 0;
> > + irte.irq_post_low.__reserved_2 = 0;
> > + irte.irq_post_low.__reserved_3 = 0;
> > + irte.irq_post_high.__reserved_4 = 0;
> > +
> > + irte.irq_post_low.pst = 1;
> > +
> > + if (modify_irte(irq, &irte))
> > + return -1;
> > +
> > + return 0;
> > +}
> > +
> > struct irq_remap_ops intel_irq_remap_ops = {
> > .supported = intel_irq_remapping_supported,
> > .prepare = dmar_table_init,
> > @@ -1186,4 +1213,5 @@ struct irq_remap_ops intel_irq_remap_ops = {
> > .msi_alloc_irq = intel_msi_alloc_irq,
> > .msi_setup_irq = intel_msi_setup_irq,
> > .alloc_hpet_msi = intel_alloc_hpet_msi,
> > + .update_pi_irte = intel_update_pi_irte,
>
> Extending irq_remap_ops should really be a separate patch from its use
> by KVM.

Will do.

>
> > };
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index 2f8ee00..0e36860 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -362,6 +362,15 @@ int setup_hpet_msi_remapped(unsigned int irq,
> unsigned int id)
> > return default_setup_hpet_msi(irq, id);
> > }
> >
> > +int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
> > +{
> > + if (!remap_ops || !remap_ops->update_pi_irte)
> > + return -ENODEV;
> > +
> > + return remap_ops->update_pi_irte(irq, pi_desc_addr, vector);
> > +}
> > +EXPORT_SYMBOL_GPL(update_pi_irte);
> > +
> > void panic_if_irq_remap(const char *msg)
> > {
> > if (irq_remapping_enabled)
> > diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> > index 7bb5913..2d8f740 100644
> > --- a/drivers/iommu/irq_remapping.h
> > +++ b/drivers/iommu/irq_remapping.h
> > @@ -84,6 +84,9 @@ struct irq_remap_ops {
> >
> > /* Setup interrupt remapping for an HPET MSI */
> > int (*alloc_hpet_msi)(unsigned int, unsigned int);
> > +
> > + /* Update IRTE for posted-interrupt */
> > + int (*update_pi_irte)(int irq, u64 pi_desc_addr, u32 vector);
> > };
> >
> > extern struct irq_remap_ops intel_irq_remap_ops;
> > diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> > index 8be5d42..e1ff4f7 100644
> > --- a/include/linux/dmar.h
> > +++ b/include/linux/dmar.h
> > @@ -160,6 +160,20 @@ struct irte {
> > __reserved_2 : 8,
> > dest_id : 32;
> > } irq_remap_low;
> > +
> > + struct {
> > + __u64 present : 1,
> > + fpd : 1,
> > + __reserved_1 : 6,
> > + avail : 4,
> > + __reserved_2 : 2,
> > + urg : 1,
> > + pst : 1,
> > + vector : 8,
> > + __reserved_3 : 14,
> > + pda_l : 26;
> > + } irq_post_low;
> > +
> > __u64 low;
> > };
> >
> > @@ -170,10 +184,22 @@ struct irte {
> > svt : 2,
> > __reserved_3 : 44;
> > } irq_remap_high;
> > +
> > + struct {
> > + __u64 sid: 16,
> > + sq: 2,
> > + svt: 2,
> > + __reserved_4: 12,
> > + pda_h: 32;
> > + } irq_post_high;
> > +
> > __u64 high;
> > };
> > };
> >
> > +#define PDA_LOW_BIT 26
> > +#define PDA_HIGH_BIT 32
> > +
> > enum {
> > IRQ_REMAP_XAPIC_MODE,
> > IRQ_REMAP_X2APIC_MODE,
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ea53b04..6bb8287 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
> > struct hlist_node link;
> > };
> >
> > +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> > +
> > +struct kvm_irq_routing_table {
> > + int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > + struct kvm_kernel_irq_routing_entry *rt_entries;
> > + u32 nr_rt_entries;
> > + /*
> > + * Array indexed by gsi. Each entry contains list of irq chips
> > + * the gsi is connected to.
> > + */
> > + struct hlist_head map[0];
> > +};
> > +
> > +#else
> > +
> > +struct kvm_irq_routing_table {};
> > +
> > +#endif
> > +
> > #ifndef KVM_PRIVATE_MEM_SLOTS
> > #define KVM_PRIVATE_MEM_SLOTS 0
> > #endif
> > @@ -766,6 +785,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm
> *kvm,
> > struct kvm_irq_ack_notifier *kian);
> > int kvm_request_irq_source_id(struct kvm *kvm);
> > void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> > +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> > + struct kvm_lapic_irq *irq);
> > +bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq);
> >
> > #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> > int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot
> *slot);
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 7593c52..509223a 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1027,6 +1027,7 @@ struct kvm_s390_ucas_mapping {
> > #define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct
> kvm_xen_hvm_config)
> > #define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct
> kvm_clock_data)
> > #define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct
> kvm_clock_data)
> > +#define KVM_ASSIGN_DEV_PI_UPDATE _IOR(KVMIO, 0x7d, __u32)
> > /* Available with KVM_CAP_PIT_STATE2 */
> > #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct
> kvm_pit_state2)
> > #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct
> kvm_pit_state2)
>
> Needs an accompanying Documentation/virtual/kvm/api.txt update.
>
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index e05000e..e154009 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
>
>
> Since legacy KVM device assignment is effectively deprecated, have you
> considered how we might do this with VFIO? Thanks,
>
> Alex
>
I haven't yet thought about how to enable this with VFIO. I can implement
that, if needed, once this patch set is finished. What do you think?

Thanks,
Feng


>
> > @@ -326,6 +326,135 @@ void kvm_free_all_assigned_devices(struct kvm
> *kvm)
> > }
> > }
> >
> > +int __weak kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu
> *vcpu,
> > + u32 guest_vector, int host_irq)
> > +{
> > + return 0;
> > +}
> > +
> > +int kvm_compare_rr_counter(struct kvm_vcpu *vcpu1, struct kvm_vcpu
> *vcpu2)
> > +{
> > + return vcpu1->arch.round_robin_counter -
> > + vcpu2->arch.round_robin_counter;
> > +}
> > +
> > +bool kvm_pi_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> > + struct kvm_vcpu **dest_vcpu)
> > +{
> > + int i, r = 0;
> > + struct kvm_vcpu *vcpu, *dest = NULL;
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + if (!kvm_apic_present(vcpu))
> > + continue;
> > +
> > + if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
> > + irq->dest_id, irq->dest_mode))
> > + continue;
> > +
> > + if (!kvm_is_dm_lowest_prio(irq)) {
> > + r++;
> > + *dest_vcpu = vcpu;
> > + } else if (kvm_lapic_enabled(vcpu)) {
> > + if (!dest)
> > + dest = vcpu;
> > + else if (kvm_compare_rr_counter(vcpu, dest) < 0)
> > + dest = vcpu;
> > + }
> > + }
> > +
> > + if (dest) {
> > + dest->arch.round_robin_counter++;
> > + *dest_vcpu = dest;
> > + return true;
> > + } else if (r == 1)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +static int __kvm_update_pi_irte(struct kvm *kvm, int host_irq, int
> guest_irq)
> > +{
> > + struct kvm_kernel_irq_routing_entry *e;
> > + struct kvm_irq_routing_table *irq_rt;
> > + struct kvm_lapic_irq irq;
> > + struct kvm_vcpu *vcpu;
> > + int idx, ret = -EINVAL;
> > +
> > + idx = srcu_read_lock(&kvm->irq_srcu);
> > + irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> > + ASSERT(guest_irq < irq_rt->nr_rt_entries);
> > +
> > + hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> > + if (e->type != KVM_IRQ_ROUTING_MSI)
> > + continue;
> > + /*
> > + * VT-d posted-interrupt has the following
> > + * limitations:
> > + * - No support for posting multicast/broadcast
> > + * interrupts to a VCPU
> > + * Still use interrupt remapping for these
> > + * kind of interrupts
> > + */
> > +
> > + kvm_set_msi_irq(e, &irq);
> > + if (!kvm_pi_find_dest_vcpu(kvm, &irq, &vcpu)) {
> > + printk(KERN_INFO "%s: can not find the target VCPU\n",
> > + __func__);
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (kvm_update_pi_irte_common(kvm, vcpu, irq.vector,
> > + host_irq)) {
> > + printk(KERN_INFO "%s: failed to update PI IRTE\n",
> > + __func__);
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > + }
> > +
> > + ret = 0;
> > +out:
> > + srcu_read_unlock(&kvm->irq_srcu, idx);
> > + return ret;
> > +}
> > +
> > +int kvm_update_pi_irte(struct kvm *kvm, u32 dev_id)
> > +{
> > + int i, rc = -1;
> > + struct kvm_assigned_dev_kernel *dev;
> > +
> > + mutex_lock(&kvm->lock);
> > + dev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head, dev_id);
> > + if (!dev) {
> > + printk(KERN_INFO "%s: cannot find the assigned dev.\n",
> > + __func__);
> > + rc = -1;
> > + goto out;
> > + }
> > +
> > + BUG_ON(dev->irq_requested_type == 0);
> > +
> > + if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSI) &&
> > + (dev->dev->msi_enabled == 1)) {
> > + __kvm_update_pi_irte(kvm,
> > + dev->host_irq, dev->guest_irq);
> > + } else if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) &&
> > + (dev->dev->msix_enabled == 1)) {
> > + for (i = 0; i < dev->entries_nr; i++) {
> > + __kvm_update_pi_irte(kvm,
> > + dev->host_msix_entries[i].vector,
> > + dev->guest_msix_entries[i].vector);
> > + }
> > + }
> > +
> > +out:
> > + rc = 0;
> > + mutex_unlock(&kvm->lock);
> > + return rc;
> > +}
> > +
> > static int assigned_device_enable_host_intx(struct kvm *kvm,
> > struct kvm_assigned_dev_kernel *dev)
> > {
> > @@ -1017,6 +1146,18 @@ long kvm_vm_ioctl_assigned_device(struct kvm
> *kvm, unsigned ioctl,
> > r = kvm_vm_ioctl_set_pci_irq_mask(kvm, &assigned_dev);
> > break;
> > }
> > + case KVM_ASSIGN_DEV_PI_UPDATE: {
> > + u32 dev_id;
> > +
> > + r = -EFAULT;
> > + if (copy_from_user(&dev_id, argp, sizeof(dev_id)))
> > + goto out;
> > + r = kvm_update_pi_irte(kvm, dev_id);
> > + if (r)
> > + goto out;
> > + break;
> > +
> > + }
> > default:
> > r = -ENOTTY;
> > break;
> > diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> > index 963b899..f51aed3 100644
> > --- a/virt/kvm/irq_comm.c
> > +++ b/virt/kvm/irq_comm.c
> > @@ -55,7 +55,7 @@ static int kvm_set_ioapic_irq(struct
> kvm_kernel_irq_routing_entry *e,
> > line_status);
> > }
> >
> > -inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
> > +bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
> > {
> > #ifdef CONFIG_IA64
> > return irq->delivery_mode ==
> > @@ -106,7 +106,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm,
> struct kvm_lapic *src,
> > return r;
> > }
> >
> > -static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> > +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> > struct kvm_lapic_irq *irq)
> > {
> > trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
> > diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> > index 7f256f3..cdf29a6 100644
> > --- a/virt/kvm/irqchip.c
> > +++ b/virt/kvm/irqchip.c
> > @@ -31,17 +31,6 @@
> > #include <trace/events/kvm.h>
> > #include "irq.h"
> >
> > -struct kvm_irq_routing_table {
> > - int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > - struct kvm_kernel_irq_routing_entry *rt_entries;
> > - u32 nr_rt_entries;
> > - /*
> > - * Array indexed by gsi. Each entry contains list of irq chips
> > - * the gsi is connected to.
> > - */
> > - struct hlist_head map[0];
> > -};
> > -
> > int kvm_irq_map_gsi(struct kvm *kvm,
> > struct kvm_kernel_irq_routing_entry *entries, int gsi)
> > {
>
>


2014-11-11 09:21:42

by Wu, Feng

Subject: RE: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Tuesday, November 11, 2014 5:57 AM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor
>
> On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
> > This patch initializes the VT-d Posted-Interrupt Descriptor.
> >
> > Signed-off-by: Feng Wu <[email protected]>
> > ---
> > arch/x86/include/asm/irq_remapping.h | 1 +
> > arch/x86/kernel/apic/apic.c | 1 +
> > arch/x86/kvm/vmx.c | 56
> ++++++++++++++++++++++++++++++++-
> > 3 files changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/irq_remapping.h
> b/arch/x86/include/asm/irq_remapping.h
> > index b7747c4..a3cc437 100644
> > --- a/arch/x86/include/asm/irq_remapping.h
> > +++ b/arch/x86/include/asm/irq_remapping.h
> > @@ -57,6 +57,7 @@ extern bool setup_remapped_irq(int irq,
> > struct irq_chip *chip);
> >
> > void irq_remap_modify_chip_defaults(struct irq_chip *chip);
> > +extern int irq_post_enabled;
> >
> > #else /* CONFIG_IRQ_REMAP */
> >
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index ba6cc04..987408d 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -162,6 +162,7 @@ __setup("apicpmtimer", setup_apicpmtimer);
> > #endif
> >
> > int x2apic_mode;
> > +EXPORT_SYMBOL_GPL(x2apic_mode);
> > #ifdef CONFIG_X86_X2APIC
> > /* x2apic enabled before OS handover */
> > int x2apic_preenabled;
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 3e556c6..a4670d3 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -45,6 +45,7 @@
> > #include <asm/perf_event.h>
> > #include <asm/debugreg.h>
> > #include <asm/kexec.h>
> > +#include <asm/irq_remapping.h>
> >
> > #include "trace.h"
> >
> > @@ -408,13 +409,32 @@ struct nested_vmx {
> > };
> >
> > #define POSTED_INTR_ON 0
> > +#define POSTED_INTR_SN 1
> > +
> > /* Posted-Interrupt Descriptor */
> > struct pi_desc {
> > u32 pir[8]; /* Posted interrupt requested */
> > - u32 control; /* bit 0 of control is outstanding notification bit */
> > - u32 rsvd[7];
> > + union {
> > + struct {
> > + u64 on : 1,
> > + sn : 1,
> > + rsvd_1 : 13,
> > + ndm : 1,
> > + nv : 8,
> > + rsvd_2 : 8,
> > + ndst : 32;
> > + };
> > + u64 control;
> > + };
> > + u32 rsvd[6];
> > } __aligned(64);
> >
> > +static void pi_clear_sn(struct pi_desc *pi_desc)
> > +{
> > + return clear_bit(POSTED_INTR_SN,
> > + (unsigned long *)&pi_desc->control);
> > +}
> > +
> > static bool pi_test_and_set_on(struct pi_desc *pi_desc)
> > {
> > return test_and_set_bit(POSTED_INTR_ON,
> > @@ -4396,6 +4416,33 @@ static void ept_set_mmio_spte_mask(void)
> > kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
> > }
> >
> > +static bool pi_desc_init(struct vcpu_vmx *vmx)
> > +{
> > + unsigned int dest;
> > +
> > + if (irq_post_enabled == 0)
> > + return true;
> > +
> > + /*
> > + * Initialize Posted-Interrupt Descriptor
> > + */
> > +
> > + pi_clear_sn(&vmx->pi_desc);
> > + vmx->pi_desc.nv = POSTED_INTR_VECTOR;
> > +
> > + /* Physical mode for Notificaiton Event */
> > + vmx->pi_desc.ndm = 0;
> > + dest = cpu_physical_id(vmx->vcpu.cpu);
> > +
> > + if (x2apic_mode)
> > + vmx->pi_desc.ndst = dest;
> > + else
> > + vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
> > +
> > + return true;
>
> Why does this bother to return anything since it can only return true?

Yes, I will make this function void. Thanks for the comments!

Thanks,
Feng

>
> > +}
> > +
> > +
> > /*
> > * Sets up the vmcs for emulated real mode.
> > */
> > @@ -4439,6 +4486,11 @@ static int vmx_vcpu_setup(struct vcpu_vmx
> *vmx)
> >
> > vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> > vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
> > +
> > + if (!pi_desc_init(vmx)) {
>
> And therefore this cannot happen.
>
> > + printk(KERN_ERR "Initialize PI descriptor error!\n");
> > + return 1;
>
> This is the wrong error anyway, vmx_create_vcpu() returns ERR_PTR(1)
> which fails the reverse IS_ERR()
>
> Thanks,
> Alex
>
> > + }
> > }
> >
> > if (ple_gap) {
>
>
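
To make the point about "return 1" concrete, here is a minimal userspace sketch of the kernel's error-pointer convention. It assumes the usual definition from include/linux/err.h (MAX_ERRNO of 4095); err_ptr() and is_err() are local stand-ins for ERR_PTR() and IS_ERR(), not the kernel macros themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror MAX_ERRNO in include/linux/err.h. */
#define DEMO_MAX_ERRNO 4095UL

/* Stand-in for ERR_PTR(): encode an errno value in a pointer. */
static void *err_ptr(long error)
{
	return (void *)error;
}

/* Stand-in for IS_ERR(): only the top DEMO_MAX_ERRNO addresses are errors. */
static bool is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-DEMO_MAX_ERRNO;
}
```

With these definitions is_err(err_ptr(1)) is false while is_err(err_ptr(-22)) is true, so a positive return value propagated through ERR_PTR() would look like a valid pointer to the caller. Returning a negative errno avoids that.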

2014-11-11 11:22:30

by Wu, Feng

[permalink] [raw]
Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Tuesday, November 11, 2014 5:58 AM
> To: Wu, Feng
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
> configuration changes
>
> On Mon, 2014-11-10 at 14:26 +0800, Feng Wu wrote:
> > When the guest changes its interrupt configuration (such as the vector)
> > for direct-assigned devices, we need to update the associated IRTE
> > with the new guest vector, so external interrupts from the assigned
> > devices can be injected into guests without a VM-Exit.
> >
> > The current method of handling guest lowest-priority interrupts
> > is to use a counter 'apic_arb_prio' for each VCPU: we choose the
> > VCPU with the smallest 'apic_arb_prio' and then increase it by 1.
> > However, for VT-d PI, we cannot reuse this, since we no longer
> > have control over 'apic_arb_prio' when posted interrupts are
> > delivered directly by hardware.
> >
> > Here, we introduce a scheme similar to 'apic_arb_prio' to handle
> > guest lowest-priority interrupts when VT-d PI is used. The idea
> > is:
> > - Each VCPU has a counter 'round_robin_counter'.
> > - When the guest sets an interrupt to lowest priority, we choose
> > the VCPU with the smallest 'round_robin_counter' as the destination,
> > then increase it.
> >
> > Signed-off-by: Feng Wu <[email protected]>
> > ---
> > arch/x86/include/asm/irq_remapping.h | 6 ++
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/vmx.c | 12 +++
> > arch/x86/kvm/x86.c | 11 +++
> > drivers/iommu/amd_iommu.c | 6 ++
> > drivers/iommu/intel_irq_remapping.c | 28 +++++++
> > drivers/iommu/irq_remapping.c | 9 ++
> > drivers/iommu/irq_remapping.h | 3 +
> > include/linux/dmar.h | 26 ++++++
> > include/linux/kvm_host.h | 22 +++++
> > include/uapi/linux/kvm.h | 1 +
> > virt/kvm/assigned-dev.c | 141
> ++++++++++++++++++++++++++++++++++
> > virt/kvm/irq_comm.c | 4 +-
> > virt/kvm/irqchip.c | 11 ---
> > 14 files changed, 269 insertions(+), 13 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/irq_remapping.h
> b/arch/x86/include/asm/irq_remapping.h
> > index a3cc437..32d6cc4 100644
> > --- a/arch/x86/include/asm/irq_remapping.h
> > +++ b/arch/x86/include/asm/irq_remapping.h
> > @@ -51,6 +51,7 @@ extern void compose_remapped_msi_msg(struct
> pci_dev *pdev,
> > unsigned int irq, unsigned int dest,
> > struct msi_msg *msg, u8 hpet_id);
> > extern int setup_hpet_msi_remapped(unsigned int irq, unsigned int id);
> > +extern int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector);
> > extern void panic_if_irq_remap(const char *msg);
> > extern bool setup_remapped_irq(int irq,
> > struct irq_cfg *cfg,
> > @@ -88,6 +89,11 @@ static inline int setup_hpet_msi_remapped(unsigned
> int irq, unsigned int id)
> > return -ENODEV;
> > }
> >
> > +static inline int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32
> vector)
> > +{
> > + return -ENODEV;
> > +}
> > +
> > static inline void panic_if_irq_remap(const char *msg)
> > {
> > }
> > diff --git a/arch/x86/include/asm/kvm_host.h
> b/arch/x86/include/asm/kvm_host.h
> > index 6ed0c30..0630161 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -358,6 +358,7 @@ struct kvm_vcpu_arch {
> > struct kvm_lapic *apic; /* kernel irqchip context */
> > unsigned long apic_attention;
> > int32_t apic_arb_prio;
> > + int32_t round_robin_counter;
> > int mp_state;
> > u64 ia32_misc_enable_msr;
> > bool tpr_access_reporting;
> > @@ -771,6 +772,7 @@ struct kvm_x86_ops {
> > int (*check_nested_events)(struct kvm_vcpu *vcpu, bool external_intr);
> >
> > void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
> > + u64 (*get_pi_desc_addr)(struct kvm_vcpu *vcpu);
> > };
> >
> > struct kvm_arch_async_pf {
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index a4670d3..ae91b72 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -544,6 +544,11 @@ static inline struct vcpu_vmx *to_vmx(struct
> kvm_vcpu *vcpu)
> > return container_of(vcpu, struct vcpu_vmx, vcpu);
> > }
> >
> > +struct pi_desc *vcpu_to_pi_desc(struct kvm_vcpu *vcpu)
> > +{
> > + return &(to_vmx(vcpu)->pi_desc);
> > +}
> > +
> > #define VMCS12_OFFSET(x) offsetof(struct vmcs12, x)
> > #define FIELD(number, name) [number] = VMCS12_OFFSET(name)
> > #define FIELD64(number, name) [number] = VMCS12_OFFSET(name), \
> > @@ -4280,6 +4285,11 @@ static void vmx_sync_pir_to_irr_dummy(struct
> kvm_vcpu *vcpu)
> > return;
> > }
> >
> > +static u64 vmx_get_pi_desc_addr(struct kvm_vcpu *vcpu)
> > +{
> > + return __pa((u64)vcpu_to_pi_desc(vcpu));
> > +}
> > +
> > /*
> > * Set up the vmcs's constant host-state fields, i.e., host-state fields that
> > * will not change in the lifetime of the guest.
> > @@ -9232,6 +9242,8 @@ static struct kvm_x86_ops vmx_x86_ops = {
> > .check_nested_events = vmx_check_nested_events,
> >
> > .sched_in = vmx_sched_in,
> > +
> > + .get_pi_desc_addr = vmx_get_pi_desc_addr,
> > };
> >
> > static int __init vmx_init(void)
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index b447a98..0c19d15 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -7735,6 +7735,17 @@ bool kvm_arch_has_noncoherent_dma(struct
> kvm *kvm)
> > }
> > EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
> >
> > +int kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu *vcpu,
> > + u32 guest_vector, int host_irq)
> > +{
> > + u64 pi_desc_addr = kvm_x86_ops->get_pi_desc_addr(vcpu);
> > +
> > + if (update_pi_irte(host_irq, pi_desc_addr, guest_vector))
> > + return -1;
> > +
> > + return 0;
> > +}
> > +
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index 505a9ad..a36fdc7 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -4280,6 +4280,11 @@ static int alloc_hpet_msi(unsigned int irq,
> unsigned int id)
> > return 0;
> > }
> >
> > +static int dummy_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
> > +{
> > + return -EINVAL;
> > +}
> > +
> > struct irq_remap_ops amd_iommu_irq_ops = {
> > .supported = amd_iommu_supported,
> > .prepare = amd_iommu_prepare,
> > @@ -4294,5 +4299,6 @@ struct irq_remap_ops amd_iommu_irq_ops = {
> > .msi_alloc_irq = msi_alloc_irq,
> > .msi_setup_irq = msi_setup_irq,
> > .alloc_hpet_msi = alloc_hpet_msi,
> > + .update_pi_irte = dummy_update_pi_irte,
> > };
> > #endif
> > diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c
> > index 776da10..87c02fe 100644
> > --- a/drivers/iommu/intel_irq_remapping.c
> > +++ b/drivers/iommu/intel_irq_remapping.c
> > @@ -1172,6 +1172,33 @@ static int intel_alloc_hpet_msi(unsigned int irq,
> unsigned int id)
> > return ret;
> > }
> >
> > +static int intel_update_pi_irte(int irq, u64 pi_desc_addr, u32 vector)
> > +{
> > + struct irte irte;
> > +
> > + if (get_irte(irq, &irte))
> > + return -1;
> > +
> > + irte.irq_post_low.urg = 0;
> > + irte.irq_post_low.vector = vector;
> > + irte.irq_post_low.pda_l = (pi_desc_addr >> (32 - PDA_LOW_BIT)) &
> > + ~(-1UL << PDA_LOW_BIT);
> > + irte.irq_post_high.pda_h = (pi_desc_addr >> 32) &
> > + ~(-1UL << PDA_HIGH_BIT);
> > +
> > + irte.irq_post_low.__reserved_1 = 0;
> > + irte.irq_post_low.__reserved_2 = 0;
> > + irte.irq_post_low.__reserved_3 = 0;
> > + irte.irq_post_high.__reserved_4 = 0;
> > +
> > + irte.irq_post_low.pst = 1;
> > +
> > + if (modify_irte(irq, &irte))
> > + return -1;
> > +
> > + return 0;
> > +}
> > +
> > struct irq_remap_ops intel_irq_remap_ops = {
> > .supported = intel_irq_remapping_supported,
> > .prepare = dmar_table_init,
> > @@ -1186,4 +1213,5 @@ struct irq_remap_ops intel_irq_remap_ops = {
> > .msi_alloc_irq = intel_msi_alloc_irq,
> > .msi_setup_irq = intel_msi_setup_irq,
> > .alloc_hpet_msi = intel_alloc_hpet_msi,
> > + .update_pi_irte = intel_update_pi_irte,
>
> Extending irq_remap_ops should really be a separate patch from its use
> by KVM.

Will do.
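
For readers following the intel_update_pi_irte() hunk quoted above, the address split can be sketched in userspace as follows. This assumes a 64-byte-aligned descriptor address, as implied by __aligned(64) on struct pi_desc; the demo_* names are hypothetical, and the mask is written as (1 << n) - 1, which is equivalent to the patch's ~(-1UL << n) on a 64-bit build.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PDA_LOW_BIT  26	/* width of pda_l in the IRTE */
#define DEMO_PDA_HIGH_BIT 32	/* width of pda_h in the IRTE */

struct demo_pda {
	uint32_t pda_l;	/* descriptor address bits 31:6 */
	uint32_t pda_h;	/* descriptor address bits 63:32 */
};

/* Split a descriptor address the same way the quoted hunk does. */
static struct demo_pda demo_split_pda(uint64_t pi_desc_addr)
{
	struct demo_pda p;

	p.pda_l = (uint32_t)((pi_desc_addr >> (32 - DEMO_PDA_LOW_BIT)) &
			     ((1ULL << DEMO_PDA_LOW_BIT) - 1));
	p.pda_h = (uint32_t)((pi_desc_addr >> 32) &
			     ((1ULL << DEMO_PDA_HIGH_BIT) - 1));
	return p;
}

/* Rebuild the address; the low 6 bits are zero by alignment. */
static uint64_t demo_join_pda(struct demo_pda p)
{
	return ((uint64_t)p.pda_h << 32) | ((uint64_t)p.pda_l << 6);
}
```

Splitting and rejoining an aligned address returns the original value, which is a quick way to check that the shift by 32 - PDA_LOW_BIT (that is, 6) matches the field layout.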

>
> > };
> > diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> > index 2f8ee00..0e36860 100644
> > --- a/drivers/iommu/irq_remapping.c
> > +++ b/drivers/iommu/irq_remapping.c
> > @@ -362,6 +362,15 @@ int setup_hpet_msi_remapped(unsigned int irq,
> unsigned int id)
> > return default_setup_hpet_msi(irq, id);
> > }
> >
> > +int update_pi_irte(unsigned int irq, u64 pi_desc_addr, u32 vector)
> > +{
> > + if (!remap_ops || !remap_ops->update_pi_irte)
> > + return -ENODEV;
> > +
> > + return remap_ops->update_pi_irte(irq, pi_desc_addr, vector);
> > +}
> > +EXPORT_SYMBOL_GPL(update_pi_irte);
> > +
> > void panic_if_irq_remap(const char *msg)
> > {
> > if (irq_remapping_enabled)
> > diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> > index 7bb5913..2d8f740 100644
> > --- a/drivers/iommu/irq_remapping.h
> > +++ b/drivers/iommu/irq_remapping.h
> > @@ -84,6 +84,9 @@ struct irq_remap_ops {
> >
> > /* Setup interrupt remapping for an HPET MSI */
> > int (*alloc_hpet_msi)(unsigned int, unsigned int);
> > +
> > + /* Update IRTE for posted-interrupt */
> > + int (*update_pi_irte)(int irq, u64 pi_desc_addr, u32 vector);
> > };
> >
> > extern struct irq_remap_ops intel_irq_remap_ops;
> > diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> > index 8be5d42..e1ff4f7 100644
> > --- a/include/linux/dmar.h
> > +++ b/include/linux/dmar.h
> > @@ -160,6 +160,20 @@ struct irte {
> > __reserved_2 : 8,
> > dest_id : 32;
> > } irq_remap_low;
> > +
> > + struct {
> > + __u64 present : 1,
> > + fpd : 1,
> > + __reserved_1 : 6,
> > + avail : 4,
> > + __reserved_2 : 2,
> > + urg : 1,
> > + pst : 1,
> > + vector : 8,
> > + __reserved_3 : 14,
> > + pda_l : 26;
> > + } irq_post_low;
> > +
> > __u64 low;
> > };
> >
> > @@ -170,10 +184,22 @@ struct irte {
> > svt : 2,
> > __reserved_3 : 44;
> > } irq_remap_high;
> > +
> > + struct {
> > + __u64 sid: 16,
> > + sq: 2,
> > + svt: 2,
> > + __reserved_4: 12,
> > + pda_h: 32;
> > + } irq_post_high;
> > +
> > __u64 high;
> > };
> > };
> >
> > +#define PDA_LOW_BIT 26
> > +#define PDA_HIGH_BIT 32
> > +
> > enum {
> > IRQ_REMAP_XAPIC_MODE,
> > IRQ_REMAP_X2APIC_MODE,
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index ea53b04..6bb8287 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
> > struct hlist_node link;
> > };
> >
> > +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> > +
> > +struct kvm_irq_routing_table {
> > + int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > + struct kvm_kernel_irq_routing_entry *rt_entries;
> > + u32 nr_rt_entries;
> > + /*
> > + * Array indexed by gsi. Each entry contains list of irq chips
> > + * the gsi is connected to.
> > + */
> > + struct hlist_head map[0];
> > +};
> > +
> > +#else
> > +
> > +struct kvm_irq_routing_table {};
> > +
> > +#endif
> > +
> > #ifndef KVM_PRIVATE_MEM_SLOTS
> > #define KVM_PRIVATE_MEM_SLOTS 0
> > #endif
> > @@ -766,6 +785,9 @@ void kvm_unregister_irq_ack_notifier(struct kvm
> *kvm,
> > struct kvm_irq_ack_notifier *kian);
> > int kvm_request_irq_source_id(struct kvm *kvm);
> > void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
> > +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> > + struct kvm_lapic_irq *irq);
> > +bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq);
> >
> > #ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
> > int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot
> *slot);
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 7593c52..509223a 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1027,6 +1027,7 @@ struct kvm_s390_ucas_mapping {
> > #define KVM_XEN_HVM_CONFIG _IOW(KVMIO, 0x7a, struct
> kvm_xen_hvm_config)
> > #define KVM_SET_CLOCK _IOW(KVMIO, 0x7b, struct
> kvm_clock_data)
> > #define KVM_GET_CLOCK _IOR(KVMIO, 0x7c, struct
> kvm_clock_data)
> > +#define KVM_ASSIGN_DEV_PI_UPDATE _IOR(KVMIO, 0x7d, __u32)
> > /* Available with KVM_CAP_PIT_STATE2 */
> > #define KVM_GET_PIT2 _IOR(KVMIO, 0x9f, struct
> kvm_pit_state2)
> > #define KVM_SET_PIT2 _IOW(KVMIO, 0xa0, struct
> kvm_pit_state2)
>
> Needs an accompanying Documentation/virtual/kvm/api.txt update.
>
> > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > index e05000e..e154009 100644
> > --- a/virt/kvm/assigned-dev.c
> > +++ b/virt/kvm/assigned-dev.c
>
>
> Since legacy KVM device assignment is effectively deprecated, have you
> considered how we might do this with VFIO? Thanks,
>
> Alex
>
I haven't thought about how to enable this in VFIO so far. I think I can continue to
implement that if needed after this patch set is finished. What do you think of this?

Thanks,
Feng


>
> > @@ -326,6 +326,135 @@ void kvm_free_all_assigned_devices(struct kvm
> *kvm)
> > }
> > }
> >
> > +int __weak kvm_update_pi_irte_common(struct kvm *kvm, struct kvm_vcpu
> *vcpu,
> > + u32 guest_vector, int host_irq)
> > +{
> > + return 0;
> > +}
> > +
> > +int kvm_compare_rr_counter(struct kvm_vcpu *vcpu1, struct kvm_vcpu
> *vcpu2)
> > +{
> > + return vcpu1->arch.round_robin_counter -
> > + vcpu2->arch.round_robin_counter;
> > +}
> > +
> > +bool kvm_pi_find_dest_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> > + struct kvm_vcpu **dest_vcpu)
> > +{
> > + int i, r = 0;
> > + struct kvm_vcpu *vcpu, *dest = NULL;
> > +
> > + kvm_for_each_vcpu(i, vcpu, kvm) {
> > + if (!kvm_apic_present(vcpu))
> > + continue;
> > +
> > + if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
> > + irq->dest_id, irq->dest_mode))
> > + continue;
> > +
> > + if (!kvm_is_dm_lowest_prio(irq)) {
> > + r++;
> > + *dest_vcpu = vcpu;
> > + } else if (kvm_lapic_enabled(vcpu)) {
> > + if (!dest)
> > + dest = vcpu;
> > + else if (kvm_compare_rr_counter(vcpu, dest) < 0)
> > + dest = vcpu;
> > + }
> > + }
> > +
> > + if (dest) {
> > + dest->arch.round_robin_counter++;
> > + *dest_vcpu = dest;
> > + return true;
> > + } else if (r == 1)
> > + return true;
> > +
> > + return false;
> > +}
> > +
> > +static int __kvm_update_pi_irte(struct kvm *kvm, int host_irq, int
> guest_irq)
> > +{
> > + struct kvm_kernel_irq_routing_entry *e;
> > + struct kvm_irq_routing_table *irq_rt;
> > + struct kvm_lapic_irq irq;
> > + struct kvm_vcpu *vcpu;
> > + int idx, ret = -EINVAL;
> > +
> > + idx = srcu_read_lock(&kvm->irq_srcu);
> > + irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
> > + ASSERT(guest_irq < irq_rt->nr_rt_entries);
> > +
> > + hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) {
> > + if (e->type != KVM_IRQ_ROUTING_MSI)
> > + continue;
> > + /*
> > + * VT-d posted-interrupt has the following
> > + * limitations:
> > + * - No support for posting multicast/broadcast
> > + * interrupts to a VCPU
> > + * Still use interrupt remapping for these
> > + * kind of interrupts
> > + */
> > +
> > + kvm_set_msi_irq(e, &irq);
> > + if (!kvm_pi_find_dest_vcpu(kvm, &irq, &vcpu)) {
> > + printk(KERN_INFO "%s: can not find the target VCPU\n",
> > + __func__);
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + if (kvm_update_pi_irte_common(kvm, vcpu, irq.vector,
> > + host_irq)) {
> > + printk(KERN_INFO "%s: failed to update PI IRTE\n",
> > + __func__);
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > + }
> > +
> > + ret = 0;
> > +out:
> > + srcu_read_unlock(&kvm->irq_srcu, idx);
> > + return ret;
> > +}
> > +
> > +int kvm_update_pi_irte(struct kvm *kvm, u32 dev_id)
> > +{
> > + int i, rc = -1;
> > + struct kvm_assigned_dev_kernel *dev;
> > +
> > + mutex_lock(&kvm->lock);
> > + dev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head, dev_id);
> > + if (!dev) {
> > + printk(KERN_INFO "%s: cannot find the assigned dev.\n",
> > + __func__);
> > + rc = -1;
> > + goto out;
> > + }
> > +
> > + BUG_ON(dev->irq_requested_type == 0);
> > +
> > + if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSI) &&
> > + (dev->dev->msi_enabled == 1)) {
> > + __kvm_update_pi_irte(kvm,
> > + dev->host_irq, dev->guest_irq);
> > + } else if ((dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) &&
> > + (dev->dev->msix_enabled == 1)) {
> > + for (i = 0; i < dev->entries_nr; i++) {
> > + __kvm_update_pi_irte(kvm,
> > + dev->host_msix_entries[i].vector,
> > + dev->guest_msix_entries[i].vector);
> > + }
> > + }
> > +
> > +out:
> > + rc = 0;
> > + mutex_unlock(&kvm->lock);
> > + return rc;
> > +}
> > +
> > static int assigned_device_enable_host_intx(struct kvm *kvm,
> > struct kvm_assigned_dev_kernel *dev)
> > {
> > @@ -1017,6 +1146,18 @@ long kvm_vm_ioctl_assigned_device(struct kvm
> *kvm, unsigned ioctl,
> > r = kvm_vm_ioctl_set_pci_irq_mask(kvm, &assigned_dev);
> > break;
> > }
> > + case KVM_ASSIGN_DEV_PI_UPDATE: {
> > + u32 dev_id;
> > +
> > + r = -EFAULT;
> > + if (copy_from_user(&dev_id, argp, sizeof(dev_id)))
> > + goto out;
> > + r = kvm_update_pi_irte(kvm, dev_id);
> > + if (r)
> > + goto out;
> > + break;
> > +
> > + }
> > default:
> > r = -ENOTTY;
> > break;
> > diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
> > index 963b899..f51aed3 100644
> > --- a/virt/kvm/irq_comm.c
> > +++ b/virt/kvm/irq_comm.c
> > @@ -55,7 +55,7 @@ static int kvm_set_ioapic_irq(struct
> kvm_kernel_irq_routing_entry *e,
> > line_status);
> > }
> >
> > -inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
> > +bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
> > {
> > #ifdef CONFIG_IA64
> > return irq->delivery_mode ==
> > @@ -106,7 +106,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm,
> struct kvm_lapic *src,
> > return r;
> > }
> >
> > -static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> > +void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
> > struct kvm_lapic_irq *irq)
> > {
> > trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
> > diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> > index 7f256f3..cdf29a6 100644
> > --- a/virt/kvm/irqchip.c
> > +++ b/virt/kvm/irqchip.c
> > @@ -31,17 +31,6 @@
> > #include <trace/events/kvm.h>
> > #include "irq.h"
> >
> > -struct kvm_irq_routing_table {
> > - int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > - struct kvm_kernel_irq_routing_entry *rt_entries;
> > - u32 nr_rt_entries;
> > - /*
> > - * Array indexed by gsi. Each entry contains list of irq chips
> > - * the gsi is connected to.
> > - */
> > - struct hlist_head map[0];
> > -};
> > -
> > int kvm_irq_map_gsi(struct kvm *kvm,
> > struct kvm_kernel_irq_routing_entry *entries, int gsi)
> > {
>
>
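
The lowest-priority handling described in the commit message can be sketched outside the kernel like this. struct demo_vcpu and demo_pick_lowest_prio() are hypothetical stand-ins, not names from the patch, and the sketch compares counters directly where the patch subtracts them in kvm_compare_rr_counter().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-VCPU state the patch touches. */
struct demo_vcpu {
	int id;
	int32_t round_robin_counter;
	int lapic_enabled;
};

/*
 * Pick the enabled VCPU with the smallest counter and bump that counter,
 * so successive lowest-priority interrupts spread across the VCPUs.
 * Ties go to the first VCPU seen. Returns NULL if none is eligible.
 */
static struct demo_vcpu *demo_pick_lowest_prio(struct demo_vcpu *vcpus,
					       size_t n)
{
	struct demo_vcpu *dest = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!vcpus[i].lapic_enabled)
			continue;
		if (!dest ||
		    vcpus[i].round_robin_counter < dest->round_robin_counter)
			dest = &vcpus[i];
	}

	if (dest)
		dest->round_robin_counter++;

	return dest;
}
```

With three VCPUs whose counters start at zero and the middle one disabled, repeated calls alternate between the first and the last VCPU.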

2014-11-11 11:57:36

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



On 11/11/2014 10:20, Wu, Feng wrote:
> > Since legacy KVM device assignment is effectively deprecated, have you
> > considered how we might do this with VFIO? Thanks,
>
> I haven't thought about how to enable this in VFIO so far. I think I can continue to
> implement that if needed after this patch set is finished. What do you think of this?

Hi Feng,

we are not applying new features to legacy KVM device assignment, since
it is unsafe (it does not honor ACS).

I and Alex can help you with designing a way to interface VFIO with KVM
posted interrupts. Give us a few days to study these patches more, or
feel free to request comments if you have ideas about it yourself.

Paolo

2014-11-11 12:28:26

by Wu, Feng

[permalink] [raw]
Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



> -----Original Message-----
> From: Paolo Bonzini [mailto:[email protected]]
> Sent: Tuesday, November 11, 2014 7:02 PM
> To: Wu, Feng; Alex Williamson
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
> configuration changes
>
>
>
> On 11/11/2014 10:20, Wu, Feng wrote:
> > > Since legacy KVM device assignment is effectively deprecated, have you
> > > considered how we might do this with VFIO? Thanks,
> >
> > I haven't thought about how to enable this in VFIO so far. I think I can continue
> to
> > implement that if needed after this patch set is finished. What do you think of
> this?
>
> Hi Feng,
>
> we are not applying new features to legacy KVM device assignment, since
> it is unsafe (it does not honor ACS).
>
> I and Alex can help you with designing a way to interface VFIO with KVM
> posted interrupts. Give us a few days to study these patches more, or
> feel free to request comments if you have ideas about it yourself.
>
> Paolo

Okay, then I will put some effort into getting familiar with the VFIO mechanism. If
you have any questions about these patches, we can discuss them together.

Thanks,
Feng

2014-11-11 13:35:52

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor

On 2014/11/10 14:26, Feng Wu wrote:
> This patch initializes the VT-d Posted-Interrupt Descriptor.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> arch/x86/include/asm/irq_remapping.h | 1 +
> arch/x86/kernel/apic/apic.c | 1 +
> arch/x86/kvm/vmx.c | 56 ++++++++++++++++++++++++++++++++-
> 3 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/irq_remapping.h b/arch/x86/include/asm/irq_remapping.h
> index b7747c4..a3cc437 100644
> --- a/arch/x86/include/asm/irq_remapping.h
> +++ b/arch/x86/include/asm/irq_remapping.h
> @@ -57,6 +57,7 @@ extern bool setup_remapped_irq(int irq,
> struct irq_chip *chip);
>
> void irq_remap_modify_chip_defaults(struct irq_chip *chip);
> +extern int irq_post_enabled;
>
> #else /* CONFIG_IRQ_REMAP */
>
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index ba6cc04..987408d 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -162,6 +162,7 @@ __setup("apicpmtimer", setup_apicpmtimer);
> #endif
>
> int x2apic_mode;
> +EXPORT_SYMBOL_GPL(x2apic_mode);
> #ifdef CONFIG_X86_X2APIC
> /* x2apic enabled before OS handover */
> int x2apic_preenabled;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 3e556c6..a4670d3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -45,6 +45,7 @@
> #include <asm/perf_event.h>
> #include <asm/debugreg.h>
> #include <asm/kexec.h>
> +#include <asm/irq_remapping.h>
>
> #include "trace.h"
>
> @@ -408,13 +409,32 @@ struct nested_vmx {
> };
>
> #define POSTED_INTR_ON 0
> +#define POSTED_INTR_SN 1
> +
> /* Posted-Interrupt Descriptor */
> struct pi_desc {
> u32 pir[8]; /* Posted interrupt requested */
> - u32 control; /* bit 0 of control is outstanding notification bit */
> - u32 rsvd[7];
> + union {
> + struct {
> + u64 on : 1,
> + sn : 1,
> + rsvd_1 : 13,
> + ndm : 1,
> + nv : 8,
> + rsvd_2 : 8,
> + ndst : 32;
> + };
> + u64 control;
> + };
> + u32 rsvd[6];
> } __aligned(64);
>
> +static void pi_clear_sn(struct pi_desc *pi_desc)
> +{
> + return clear_bit(POSTED_INTR_SN,
> + (unsigned long *)&pi_desc->control);
> +}
> +
> static bool pi_test_and_set_on(struct pi_desc *pi_desc)
> {
> return test_and_set_bit(POSTED_INTR_ON,
> @@ -4396,6 +4416,33 @@ static void ept_set_mmio_spte_mask(void)
> kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
> }
>
> +static bool pi_desc_init(struct vcpu_vmx *vmx)
> +{
> + unsigned int dest;
> +
> + if (irq_post_enabled == 0)
> + return true;
> +
> + /*
> + * Initialize Posted-Interrupt Descriptor
> + */
> +
> + pi_clear_sn(&vmx->pi_desc);
> + vmx->pi_desc.nv = POSTED_INTR_VECTOR;
> +
> + /* Physical mode for Notificaiton Event */
> + vmx->pi_desc.ndm = 0;
> + dest = cpu_physical_id(vmx->vcpu.cpu);
> +
> + if (x2apic_mode)
Hi Feng,
Could you try to use x2apic_enabled() here so you don't
need to export x2apic_mode?
Regards!
Gerry
> + vmx->pi_desc.ndst = dest;
> + else
> + vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
> +
> + return true;
> +}
> +
> +
> /*
> * Sets up the vmcs for emulated real mode.
> */
> @@ -4439,6 +4486,11 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>
> vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
> +
> + if (!pi_desc_init(vmx)) {
> + printk(KERN_ERR "Initialize PI descriptor error!\n");
> + return 1;
> + }
> }
>
> if (ple_gap) {
>
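
As a small illustration of the NDST encoding in pi_desc_init(), here is a userspace sketch. demo_pi_ndst() is a hypothetical helper, and its x2apic argument stands in for whichever check ends up being used (the exported x2apic_mode variable or x2apic_enabled() as suggested above).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode the notification destination for a physical APIC ID.
 * x2APIC: the 32-bit ID is used as-is.
 * xAPIC:  the 8-bit ID is placed in bits 15:8, as in the quoted patch.
 */
static uint32_t demo_pi_ndst(uint32_t apic_id, int x2apic)
{
	if (x2apic)
		return apic_id;

	return (apic_id << 8) & 0xFF00;
}
```

For example, APIC ID 0x12 encodes to 0x1200 in xAPIC mode and stays 0x12 in x2APIC mode. The 0xFF00 mask also drops any ID bits above the low eight in xAPIC mode.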

2014-11-11 13:38:53

by Jiang Liu

[permalink] [raw]
Subject: Re: [PATCH 01/13] iommu/vt-d: VT-d Posted-Interrupts feature detection



On 2014/11/10 14:26, Feng Wu wrote:
> VT-d Posted-Interrupts is an enhancement to CPU side Posted-Interrupt.
> With VT-d Posted-Interrupts enabled, external interrupts from
> direct-assigned devices can be delivered to guests without VMM
> intervention when guest is running in non-root mode.
>
> This patch adds feature detection logic for VT-d posted-interrupt.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> drivers/iommu/intel_irq_remapping.c | 13 +++++++++++++
> drivers/iommu/irq_remapping.c | 4 ++++
> drivers/iommu/irq_remapping.h | 5 +++++
> include/linux/intel-iommu.h | 1 +
> 4 files changed, 23 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 7c80661..f99f0f1 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -580,6 +580,19 @@ static int __init intel_irq_remapping_supported(void)
> if (!ecap_ir_support(iommu->ecap))
> return 0;
>
> + /* VT-d posted-interrupt feature detection*/
> + if (disable_irq_post == 0)
> + for_each_drhd_unit(drhd) {
> + struct intel_iommu *iommu = drhd->iommu;
Hi Feng,
You may use for_each_active_iommu() here.
Regards!
Gerry

> +
> + if (!cap_pi_support(iommu->cap)) {
> + irq_post_enabled = 0;
> + disable_irq_post = 1;
> + break;
> + }
> + irq_post_enabled = 1;
> + }
> +
> return 1;
> }
>
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index 74a1767..2f8ee00 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -23,6 +23,10 @@ int irq_remap_broken;
> int disable_sourceid_checking;
> int no_x2apic_optout;
>
> +int disable_irq_post = 1;
> +int irq_post_enabled = 0;
> +EXPORT_SYMBOL_GPL(irq_post_enabled);
> +
> static struct irq_remap_ops *remap_ops;
>
> static int msi_alloc_remapped_irq(struct pci_dev *pdev, int irq, int nvec);
> diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> index fde250f..7bb5913 100644
> --- a/drivers/iommu/irq_remapping.h
> +++ b/drivers/iommu/irq_remapping.h
> @@ -37,6 +37,9 @@ extern int disable_sourceid_checking;
> extern int no_x2apic_optout;
> extern int irq_remapping_enabled;
>
> +extern int disable_irq_post;
> +extern int irq_post_enabled;
> +
> struct irq_remap_ops {
> /* Check whether Interrupt Remapping is supported */
> int (*supported)(void);
> @@ -91,6 +94,8 @@ extern struct irq_remap_ops amd_iommu_irq_ops;
> #define irq_remapping_enabled 0
> #define disable_irq_remap 1
> #define irq_remap_broken 0
> +#define disable_irq_post 1
> +#define irq_post_enabled 0
>
> #endif /* CONFIG_IRQ_REMAP */
>
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index a65208a..5b1a124 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -87,6 +87,7 @@ static inline void dmar_writeq(void __iomem *addr, u64 val)
> /*
> * Decoding Capability Register
> */
> +#define cap_pi_support(c) (((c) >> 59) & 1)
> #define cap_read_drain(c) (((c) >> 55) & 1)
> #define cap_write_drain(c) (((c) >> 54) & 1)
> #define cap_max_amask_val(c) (((c) >> 48) & 0x3f)
>
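
A minimal userspace sketch of the detection logic under review, for readers who want to try the all-or-nothing rule outside the kernel. It assumes a mock structure (struct mock_iommu and its "ignored" flag are hypothetical stand-ins, not kernel types) and models the suggested for_each_active_iommu() iteration simply as "skip units marked ignored". Only the bit position of cap_pi_support() is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 59 of the VT-d Capability Register, as in the patch's cap_pi_support(). */
#define cap_pi_support(c) (((c) >> 59) & 1)

/* Hypothetical stand-in for struct intel_iommu; only what this sketch needs. */
struct mock_iommu {
	uint64_t cap;  /* raw Capability Register value */
	int ignored;   /* nonzero: unit is skipped (assumed behaviour of an "active" iterator) */
};

/*
 * Return 1 only if at least one active unit exists and every active
 * unit reports posted-interrupt support; otherwise return 0.
 */
static int all_active_iommus_support_pi(const struct mock_iommu *units, size_t n)
{
	int seen = 0;
	size_t i;

	assert(units != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (units[i].ignored)
			continue;
		if (!cap_pi_support(units[i].cap))
			return 0;  /* one unit without PI disables the feature globally */
		seen = 1;
	}
	return seen;
}
```

As in the patch, a single active unit without the capability turns posting off for the whole system, and with no active units the feature stays disabled.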

2014-11-11 13:43:56

by Jiang Liu

Subject: Re: [PATCH 04/13] iommu/vt-d: Adjust 'struct irte' to better suit for VT-d Posted-Interrupts

Hi Feng,
Instead of this approach, how about introducing a new
struct irte_pi for posted interrupts?

On 2014/11/10 14:26, Feng Wu wrote:
> This patch adjusts the definition of 'struct irte', so that we can
> add the VT-d Posted-Interrtups format in this structure later.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> drivers/iommu/intel_irq_remapping.c | 35 +++++++++++++++++++----------------
> include/linux/dmar.h | 4 ++--
> 2 files changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index f99f0f1..776da10 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -310,9 +310,9 @@ static void set_irte_sid(struct irte *irte, unsigned int svt,
> {
> if (disable_sourceid_checking)
> svt = SVT_NO_VERIFY;
> - irte->svt = svt;
> - irte->sq = sq;
> - irte->sid = sid;
> + irte->irq_remap_high.svt = svt;
> + irte->irq_remap_high.sq = sq;
> + irte->irq_remap_high.sid = sid;
> }
>
> static int set_ioapic_sid(struct irte *irte, int apic)
> @@ -917,8 +917,8 @@ static void prepare_irte(struct irte *irte, int vector,
> {
> memset(irte, 0, sizeof(*irte));
>
> - irte->present = 1;
> - irte->dst_mode = apic->irq_dest_mode;
> + irte->irq_remap_low.present = 1;
> + irte->irq_remap_low.dst_mode = apic->irq_dest_mode;
> /*
> * Trigger mode in the IRTE will always be edge, and for IO-APIC, the
> * actual level or edge trigger will be setup in the IO-APIC
> @@ -926,11 +926,11 @@ static void prepare_irte(struct irte *irte, int vector,
> * For more details, see the comments (in io_apic.c) explainig IO-APIC
> * irq migration in the presence of interrupt-remapping.
> */
> - irte->trigger_mode = 0;
> - irte->dlvry_mode = apic->irq_delivery_mode;
> - irte->vector = vector;
> - irte->dest_id = IRTE_DEST(dest);
> - irte->redir_hint = 1;
> + irte->irq_remap_low.trigger_mode = 0;
> + irte->irq_remap_low.dlvry_mode = apic->irq_delivery_mode;
> + irte->irq_remap_low.vector = vector;
> + irte->irq_remap_low.dest_id = IRTE_DEST(dest);
> + irte->irq_remap_low.redir_hint = 1;
> }
>
> static int intel_setup_ioapic_entry(int irq,
> @@ -973,10 +973,13 @@ static int intel_setup_ioapic_entry(int irq,
> "Redir_hint:%d Trig_Mode:%d Dlvry_Mode:%X "
> "Avail:%X Vector:%02X Dest:%08X "
> "SID:%04X SQ:%X SVT:%X)\n",
> - attr->ioapic, irte.present, irte.fpd, irte.dst_mode,
> - irte.redir_hint, irte.trigger_mode, irte.dlvry_mode,
> - irte.avail, irte.vector, irte.dest_id,
> - irte.sid, irte.sq, irte.svt);
> + attr->ioapic, irte.irq_remap_low.present,
> + irte.irq_remap_low.fpd, irte.irq_remap_low.dst_mode,
> + irte.irq_remap_low.redir_hint, irte.irq_remap_low.trigger_mode,
> + irte.irq_remap_low.dlvry_mode, irte.irq_remap_low.avail,
> + irte.irq_remap_low.vector, irte.irq_remap_low.dest_id,
> + irte.irq_remap_high.sid, irte.irq_remap_high.sq,
> + irte.irq_remap_high.svt);
>
> entry = (struct IR_IO_APIC_route_entry *)route_entry;
> memset(entry, 0, sizeof(*entry));
> @@ -1046,8 +1049,8 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
> return err;
> }
>
> - irte.vector = cfg->vector;
> - irte.dest_id = IRTE_DEST(dest);
> + irte.irq_remap_low.vector = cfg->vector;
> + irte.irq_remap_low.dest_id = IRTE_DEST(dest);
>
> /*
> * Atomically updates the IRTE with the new destination, vector
> diff --git a/include/linux/dmar.h b/include/linux/dmar.h
> index 593fff9..8be5d42 100644
> --- a/include/linux/dmar.h
> +++ b/include/linux/dmar.h
> @@ -159,7 +159,7 @@ struct irte {
> vector : 8,
> __reserved_2 : 8,
> dest_id : 32;
> - };
> + } irq_remap_low;
> __u64 low;
> };
>
> @@ -169,7 +169,7 @@ struct irte {
> sq : 2,
> svt : 2,
> __reserved_3 : 44;
> - };
> + } irq_remap_high;
> __u64 high;
> };
> };
>

2014-11-11 13:48:53

by Jiang Liu

Subject: Re: [PATCH 12/13] iommu/vt-d: No need to migrating irq for VT-d Posted-Interrtups



On 2014/11/10 14:26, Feng Wu wrote:
> We don't need to migrate the irqs for VT-d Posted-Interrtups here.
> When 'pst' is set in IRTE, the associated irq will be posted to
> guests instead of interrupt remapping. The destination of the
> interrupt is set in Posted-Interrupts Descriptor, and the migration
> happens during VCPU scheduling.
>
> Signed-off-by: Feng Wu <[email protected]>
> ---
> drivers/iommu/intel_irq_remapping.c | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
> index 87c02fe..249e2b1 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -1038,6 +1038,13 @@ intel_ioapic_set_affinity(struct irq_data *data, const struct cpumask *mask,
> if (get_irte(irq, &irte))
> return -EBUSY;
>
> + /*
> + * If the interrupt is for posting, it is used by guests,
> + * we cannot change IRTE here.
> + */
> + if (irte.irq_post_low.pst == 1)
> + return 0;
Hi Feng,
You should return some error code instead of 0, otherwise the
irq core will get confused.

> +
> err = assign_irq_vector(irq, cfg, mask);
> if (err)
> return err;
>
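
A minimal userspace sketch of the early exit being discussed, assuming a reduced mock of the IRTE (struct mock_irte and mock_set_affinity() are hypothetical names). The choice of -EINVAL is an assumption made for illustration; the review only asks for "some error code".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced view of an IRTE: only the fields this sketch needs. */
struct mock_irte {
	unsigned int pst;      /* 1: entry is in posted format, used by a guest */
	unsigned int dest_id;  /* remapped destination, meaningful when pst == 0 */
};

/*
 * Sketch of the early exit in a set_affinity-style callback.
 * -EINVAL is an assumed error code, not one named in the thread.
 */
static int mock_set_affinity(struct mock_irte *irte, unsigned int new_dest)
{
	assert(irte != NULL);

	if (irte->pst == 1)
		return -EINVAL;  /* refuse, so the caller does not assume the move happened */

	irte->dest_id = new_dest;
	return 0;
}
```

Returning a negative value lets the caller distinguish "affinity changed" from "entry left untouched", which a plain 0 cannot do.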

2014-11-12 03:42:44

by Zhang, Yang Z

Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

Paolo Bonzini wrote on 2014-11-11:
>
>
> On 11/11/2014 10:20, Wu, Feng wrote:
>>> Since legacy KVM device assignment is effectively deprecated, have
>>> you considered how we might do this with VFIO? Thanks,
>>
>> I haven't thought about how to enable this in VFIO so far. I think I
>> can continue to implement that if needed after this patch set is finished.
> What do you think of this?
>
> Hi Feng,
>
> we are not applying new features to legacy KVM device assignment,
> since it is unsafe (it does not honor ACS).

Personally, I think this feature will also be helpful for legacy device assignment. I agree that VFIO is the right solution for enabling future features. But old KVM versions without good VFIO support are still widely used today. Users really want this feature, but they will not upgrade their kernels. It is easy for us to backport this feature to old KVM with legacy device assignment, but it is impossible to backport all of VFIO. So I think you could consider adding this feature to both VFIO and legacy device assignment.

>
> I and Alex can help you with designing a way to interface VFIO with
> KVM posted interrupts. Give us a few days to study these patches
> more, or feel free to request comments if you have ideas about it yourself.
>
> Paolo


Best regards,
Yang


2014-11-12 09:15:17

by Paolo Bonzini

Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



On 12/11/2014 04:42, Zhang, Yang Z wrote:
> Personally, I think this feature will be helpful to the legacy device
> assignment. Agree, vfio is the right solution for future feature
> enabling. But the old kvm without the good vfio supporting is still
> used largely today. The user really looking for this feature but they
> will not upgrade their kernel. It's easy for us to backport this
> feature to old kvm with the legacy device assignment, but it is
> impossible to backport the whole vfio.

You can certainly backport these patches to distros that do not have
VFIO. But upstream we should work on VFIO first. VFIO has feature
parity with legacy device assignment, and adding a new feature that is
not in VFIO would be a bad idea.

By the way, do you have benchmark results for it? We have not been able
to see any performance improvement for APICv on e.g. netperf.

Paolo

2014-11-12 09:23:32

by Wu, Feng

Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



> -----Original Message-----
> From: Paolo Bonzini [mailto:[email protected]]
> Sent: Wednesday, November 12, 2014 5:14 PM
> To: Zhang, Yang Z; Wu, Feng; Alex Williamson
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
> configuration changes
>
>
>
> On 12/11/2014 04:42, Zhang, Yang Z wrote:
> > Personally, I think this feature will be helpful to the legacy device
> > assignment. Agree, vfio is the right solution for future feature
> > enabling. But the old kvm without the good vfio supporting is still
> > used largely today. The user really looking for this feature but they
> > will not upgrade their kernel. It's easy for us to backport this
> > feature to old kvm with the legacy device assignment, but it is
> > impossible to backport the whole vfio.
>
> You can certainly backport these patches to distros that do not have
> VFIO. But upstream we should work on VFIO first. VFIO has feature
> parity with legacy device assignment, and adding a new feature that is
> not in VFIO would be a bad idea.
>
> By the way, do you have benchmark results for it? We have not been able
> to see any performance improvement for APICv on e.g. netperf.

Do you mean benchmark results for APICv itself or for VT-d Posted-Interrupts?

Thanks,
Feng

>
> Paolo

2014-11-12 09:57:17

by Paolo Bonzini

Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



On 12/11/2014 10:19, Wu, Feng wrote:
>> You can certainly backport these patches to distros that do not have
>> VFIO. But upstream we should work on VFIO first. VFIO has feature
>> parity with legacy device assignment, and adding a new feature that is
>> not in VFIO would be a bad idea.
>>
>> By the way, do you have benchmark results for it? We have not been able
>> to see any performance improvement for APICv on e.g. netperf.
>
> Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?

Especially for VT-d posted interrupts---but it'd be great to know which
workloads see the biggest speedup from APICv.

Paolo

2014-11-12 17:12:09

by Alex Williamson

Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

On Wed, 2014-11-12 at 10:14 +0100, Paolo Bonzini wrote:
>
> On 12/11/2014 04:42, Zhang, Yang Z wrote:
> > Personally, I think this feature will be helpful to the legacy device
> > assignment. Agree, vfio is the right solution for future feature
> > enabling. But the old kvm without the good vfio supporting is still
> > used largely today. The user really looking for this feature but they
> > will not upgrade their kernel. It's easy for us to backport this
> > feature to old kvm with the legacy device assignment, but it is
> > impossible to backport the whole vfio.
>
> You can certainly backport these patches to distros that do not have
> VFIO. But upstream we should work on VFIO first. VFIO has feature
> parity with legacy device assignment, and adding a new feature that is
> not in VFIO would be a bad idea.

Thanks Paolo, I agree. We should design the interfaces for VFIO since
we expect legacy KVM assignment to be deprecated and eventually removed.
I think that some of the platform device work for ARM's IRQ forwarding
should probably be leveraged for this interface. IRQ forwarding
effectively allows level triggered interrupts to be handled as edge,
eliminating the mask/unmask overhead and EOI path entirely. To do this
through VFIO they make use of the KVM-VFIO device to register the device
and set attributes for the forwarded IRQ. This enables KVM to use the
VFIO external user interfaces to acquire a VFIO device reference and
access the struct device. From there it can do some IRQ manipulation on
the device to reconfigure how the host handles the interrupt. Ideally
we could use the same base KVM-VFIO device interface, perhaps
with different attributes, and obviously with different architecture
backing. Thanks,

Alex

2014-11-13 01:14:22

by Wu, Feng

Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On
> Behalf Of Paolo Bonzini
> Sent: Wednesday, November 12, 2014 5:56 PM
> To: Wu, Feng; Zhang, Yang Z; Alex Williamson
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
> configuration changes
>
>
>
> On 12/11/2014 10:19, Wu, Feng wrote:
> >> You can certainly backport these patches to distros that do not have
> >> VFIO. But upstream we should work on VFIO first. VFIO has feature
> >> parity with legacy device assignment, and adding a new feature that is
> >> not in VFIO would be a bad idea.
> >>
> >> By the way, do you have benchmark results for it? We have not been able
> >> to see any performance improvement for APICv on e.g. netperf.
> >
> > Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?
>
> Especially for VT-d posted interrupts---but it'd be great to know which
> workloads see the biggest speedup from APICv.

We have some draft performance data internally; please see the attachment. For VT-d PI,
I think we get the biggest performance gain when the VCPU is running in non-root mode
most of the time (not in the HLT state), since external interrupts from assigned devices
are delivered to the guest directly in this case. That means we can run some
CPU-intensive workloads in the guests.

Thanks,
Feng

>
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Attachments:
VT-d PI Performance on KVM.pdf (288.72 kB)

2014-11-13 01:21:58

by Zhang, Yang Z

Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

Wu, Feng wrote on 2014-11-13:
>
>
> [email protected] wrote on 2014-11-12:
>> [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest
>> interrupt configuration changes
>>
>>
>>
>> On 12/11/2014 10:19, Wu, Feng wrote:
>>>> You can certainly backport these patches to distros that do not
>>>> have VFIO. But upstream we should work on VFIO first. VFIO has
>>>> feature parity with legacy device assignment, and adding a new
>>>> feature that is not in VFIO would be a bad idea.
>>>>
>>>> By the way, do you have benchmark results for it? We have not been
>>>> able to see any performance improvement for APICv on e.g. netperf.
>>>
>>> Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?
>>
>> Especially for VT-d posted interrupts---but it'd be great to know
>> which workloads see the biggest speedup from APICv.
>
> We have some draft performance data internally, please see the
> attached. For VT-d PI, I think we can get the biggest performance gain
> if the VCPU is running in non-root mode for most of the time (not in
> HLT state), since external interrupt from assigned devices will be delivered by guest directly in this case.
> That means we can run some cpu intensive workload in the guests.

Have you checked that CPU-side posted interrupts are taking effect in the case without VT-d PI? As I understand it, the performance gap should not be so large if you use CPU-side posted interrupts. This data looks more like VT-d PI vs. no PI at all (neither VT-d nor CPU).

>
> Thanks,
> Feng
>
>>
>> Paolo
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in the
>> body of a message to [email protected] More majordomo info at
>> http://vger.kernel.org/majordomo-info.html


Best regards,
Yang



2014-11-13 01:32:45

by Wu, Feng

Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes



> -----Original Message-----
> From: Zhang, Yang Z
> Sent: Thursday, November 13, 2014 9:21 AM
> To: Wu, Feng; Paolo Bonzini; Alex Williamson
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt
> configuration changes
>
> Wu, Feng wrote on 2014-11-13:
> >
> >
> > [email protected] wrote on 2014-11-12:
> >> [email protected]; [email protected];
> >> [email protected]
> >> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest
> >> interrupt configuration changes
> >>
> >>
> >>
> >> On 12/11/2014 10:19, Wu, Feng wrote:
> >>>> You can certainly backport these patches to distros that do not
> >>>> have VFIO. But upstream we should work on VFIO first. VFIO has
> >>>> feature parity with legacy device assignment, and adding a new
> >>>> feature that is not in VFIO would be a bad idea.
> >>>>
> >>>> By the way, do you have benchmark results for it? We have not been
> >>>> able to see any performance improvement for APICv on e.g. netperf.
> >>>
> >>> Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?
> >>
> >> Especially for VT-d posted interrupts---but it'd be great to know
> >> which workloads see the biggest speedup from APICv.
> >
> > We have some draft performance data internally, please see the
> > attached. For VT-d PI, I think we can get the biggest performance gain
> > if the VCPU is running in non-root mode for most of the time (not in
> > HLT state), since external interrupt from assigned devices will be delivered by
> guest directly in this case.
> > That means we can run some cpu intensive workload in the guests.
>
> Have you check that the CPU side posted interrupt is taking effect in w/o VT-D
> PI case? Per my understanding, the performance gap should be so large if you
> use CPU side posted interrupt. This data more like the VT-d PI vs non PI(both
> VT-d and CPU).

Yes, this data is VT-d PI vs. non-VT-d PI. The CPU-side APICv mechanism (including CPU-side Posted-Interrupts) is enabled.

Thanks,
Feng

>
> >
> > Thanks,
> > Feng
> >
> >>
> >> Paolo
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in the
> >> body of a message to [email protected] More majordomo info at
> >> http://vger.kernel.org/majordomo-info.html
>
>
> Best regards,
> Yang
>


2014-11-13 01:49:56

by Zhang, Yang Z

Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

Wu, Feng wrote on 2014-11-13:
>
>
> Zhang, Yang Z wrote on 2014-11-13:
>> [email protected]; [email protected];
>> [email protected]
>> Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest
>> interrupt configuration changes
>>
>> Wu, Feng wrote on 2014-11-13:
>>>
>>>
>>> [email protected] wrote on 2014-11-12:
>>>> [email protected]; [email protected];
>>>> [email protected]
>>>> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest
>>>> interrupt configuration changes
>>>>
>>>>
>>>>
>>>> On 12/11/2014 10:19, Wu, Feng wrote:
>>>>>> You can certainly backport these patches to distros that do not
>>>>>> have VFIO. But upstream we should work on VFIO first. VFIO
>>>>>> has feature parity with legacy device assignment, and adding a
>>>>>> new feature that is not in VFIO would be a bad idea.
>>>>>>
>>>>>> By the way, do you have benchmark results for it? We have not
>>>>>> been able to see any performance improvement for APICv on e.g.
> netperf.
>>>>>
>>>>> Do you mean benchmark results for APICv itself or VT-d
> Posted-Interrtups?
>>>>
>>>> Especially for VT-d posted interrupts---but it'd be great to know
>>>> which workloads see the biggest speedup from APICv.
>>>
>>> We have some draft performance data internally, please see the
>>> attached. For VT-d PI, I think we can get the biggest performance gain
>>> if the VCPU is running in non-root mode for most of the time (not in
>>> HLT state), since external interrupt from assigned devices will be
>>> delivered by guest directly in this case. That means we can run some
>>> cpu intensive workload in the guests.
>>
>> Have you check that the CPU side posted interrupt is taking effect
>> in w/o VT-D PI case? Per my understanding, the performance gap
>> should be so large if you use CPU side posted interrupt. This data
>> more like the VT-d PI vs non PI(both VT-d and CPU).
>
> Yes, this data is VT-d PI vs Non VT-d PI. The CPU side APICv mechanism
> (including CPU side Posted-Interrtups) is enabled.

From the CPU utilization data, the APICv environment does not look reasonable to me. With current APICv, the interrupt should not be delivered to the PCPU where the VCPU is running. Otherwise, it will force a VM exit on the VCPU and the CPU-side posted interrupt cannot take effect. Did you set the interrupt affinity manually?

>
> Thanks,
> Feng
>
>>
>>>
>>> Thanks,
>>> Feng
>>>
>>>>
>>>> Paolo
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to [email protected] More majordomo
>>>> info at http://vger.kernel.org/majordomo-info.html
>>
>>
>> Best regards,
>> Yang
>>


Best regards,
Yang



2014-11-20 04:53:10

by Wu, Feng

Subject: RE: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor



> -----Original Message-----
> From: Jiang Liu [mailto:[email protected]]
> Sent: Tuesday, November 11, 2014 9:36 PM
> To: Wu, Feng; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor
>
> On 2014/11/10 14:26, Feng Wu wrote:
> > This patch initialize the VT-d Posted-interrupt Descritpor.
> >
> > Signed-off-by: Feng Wu <[email protected]>
> > ---
> > arch/x86/include/asm/irq_remapping.h | 1 +
> > arch/x86/kernel/apic/apic.c | 1 +
> > arch/x86/kvm/vmx.c | 56
> ++++++++++++++++++++++++++++++++-
> > 3 files changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/irq_remapping.h
> b/arch/x86/include/asm/irq_remapping.h
> > index b7747c4..a3cc437 100644
> > --- a/arch/x86/include/asm/irq_remapping.h
> > +++ b/arch/x86/include/asm/irq_remapping.h
> > @@ -57,6 +57,7 @@ extern bool setup_remapped_irq(int irq,
> > struct irq_chip *chip);
> >
> > void irq_remap_modify_chip_defaults(struct irq_chip *chip);
> > +extern int irq_post_enabled;
> >
> > #else /* CONFIG_IRQ_REMAP */
> >
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index ba6cc04..987408d 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -162,6 +162,7 @@ __setup("apicpmtimer", setup_apicpmtimer);
> > #endif
> >
> > int x2apic_mode;
> > +EXPORT_SYMBOL_GPL(x2apic_mode);
> > #ifdef CONFIG_X86_X2APIC
> > /* x2apic enabled before OS handover */
> > int x2apic_preenabled;
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 3e556c6..a4670d3 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -45,6 +45,7 @@
> > #include <asm/perf_event.h>
> > #include <asm/debugreg.h>
> > #include <asm/kexec.h>
> > +#include <asm/irq_remapping.h>
> >
> > #include "trace.h"
> >
> > @@ -408,13 +409,32 @@ struct nested_vmx {
> > };
> >
> > #define POSTED_INTR_ON 0
> > +#define POSTED_INTR_SN 1
> > +
> > /* Posted-Interrupt Descriptor */
> > struct pi_desc {
> > u32 pir[8]; /* Posted interrupt requested */
> > - u32 control; /* bit 0 of control is outstanding notification bit */
> > - u32 rsvd[7];
> > + union {
> > + struct {
> > + u64 on : 1,
> > + sn : 1,
> > + rsvd_1 : 13,
> > + ndm : 1,
> > + nv : 8,
> > + rsvd_2 : 8,
> > + ndst : 32;
> > + };
> > + u64 control;
> > + };
> > + u32 rsvd[6];
> > } __aligned(64);
> >
> > +static void pi_clear_sn(struct pi_desc *pi_desc)
> > +{
> > + return clear_bit(POSTED_INTR_SN,
> > + (unsigned long *)&pi_desc->control);
> > +}
> > +
> > static bool pi_test_and_set_on(struct pi_desc *pi_desc)
> > {
> > return test_and_set_bit(POSTED_INTR_ON,
> > @@ -4396,6 +4416,33 @@ static void ept_set_mmio_spte_mask(void)
> > kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
> > }
> >
> > +static bool pi_desc_init(struct vcpu_vmx *vmx)
> > +{
> > + unsigned int dest;
> > +
> > + if (irq_post_enabled == 0)
> > + return true;
> > +
> > + /*
> > + * Initialize Posted-Interrupt Descriptor
> > + */
> > +
> > + pi_clear_sn(&vmx->pi_desc);
> > + vmx->pi_desc.nv = POSTED_INTR_VECTOR;
> > +
> > + /* Physical mode for Notificaiton Event */
> > + vmx->pi_desc.ndm = 0;
> > + dest = cpu_physical_id(vmx->vcpu.cpu);
> > +
> > + if (x2apic_mode)
> Hi Feng,
> Could you try to use x2apic_enabled() here so you don't
> need to export x2apic_mode?
> Regards!
> Gerry

In that case, we should also export x2apic_enabled(), right?

Thanks,
Feng

> > + vmx->pi_desc.ndst = dest;
> > + else
> > + vmx->pi_desc.ndst = (dest << 8) & 0xFF00;
> > +
> > + return true;
> > +}
> > +
> > +
> > /*
> > * Sets up the vmcs for emulated real mode.
> > */
> > @@ -4439,6 +4486,11 @@ static int vmx_vcpu_setup(struct vcpu_vmx
> *vmx)
> >
> > vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
> > vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
> > +
> > + if (!pi_desc_init(vmx)) {
> > + printk(KERN_ERR "Initialize PI descriptor error!\n");
> > + return 1;
> > + }
> > }
> >
> > if (ple_gap) {
> >
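
A minimal userspace sketch of how the descriptor's 64-bit control word is laid out and how NDST is encoded in pi_desc_init() above. The shift values are derived from the bitfield widths in the quoted struct pi_desc and assume the usual x86 allocation (first bitfield in the least significant bits). The helper names pi_ndst() and pi_control_init() are hypothetical, and 0xF2 in the usage below is just an example vector value.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions implied by the bitfield widths in the quoted struct pi_desc. */
#define PI_ON_SHIFT    0   /* on:   1 bit  */
#define PI_SN_SHIFT    1   /* sn:   1 bit  */
#define PI_NDM_SHIFT   15  /* ndm:  1 bit, after 13 reserved bits */
#define PI_NV_SHIFT    16  /* nv:   8 bits */
#define PI_NDST_SHIFT  32  /* ndst: 32 bits, after 8 reserved bits */

/* NDST as computed in the patch: raw APIC ID in x2APIC mode, else ID in bits 15:8. */
static uint32_t pi_ndst(uint32_t apic_id, int x2apic)
{
	return x2apic ? apic_id : (apic_id << 8) & 0xFF00u;
}

/* Build the initial control word: ON = 0, SN = 0, NDM = 0 (physical mode). */
static uint64_t pi_control_init(uint8_t nv, uint32_t apic_id, int x2apic)
{
	uint64_t control = 0;

	assert(x2apic == 0 || x2apic == 1);

	control |= (uint64_t)nv << PI_NV_SHIFT;
	control |= (uint64_t)pi_ndst(apic_id, x2apic) << PI_NDST_SHIFT;
	return control;
}
```

For example, pi_control_init(0xF2, 3, 0) places 0xF2 in bits 23:16 and 0x300 in bits 63:32, leaving ON, SN and NDM clear.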

2014-11-20 05:01:00

by Jiang Liu

Subject: Re: [PATCH 02/13] KVM: Initialize VT-d Posted-Interrtups Descriptor

On 2014/11/20 12:53, Wu, Feng wrote:
>
>
>> -----Original Message-----
<snip>
>>> + /*
>>> + * Initialize Posted-Interrupt Descriptor
>>> + */
>>> +
>>> + pi_clear_sn(&vmx->pi_desc);
>>> + vmx->pi_desc.nv = POSTED_INTR_VECTOR;
>>> +
>>> + /* Physical mode for Notificaiton Event */
>>> + vmx->pi_desc.ndm = 0;
>>> + dest = cpu_physical_id(vmx->vcpu.cpu);
>>> +
>>> + if (x2apic_mode)
>> Hi Feng,
>> Could you try to use x2apic_enabled() here so you don't
>> need to export x2apic_mode?
>> Regards!
>> Gerry
>
> In that case, we should also export x2apic_enabled(), right?
Hi Feng,
x2apic_enabled() is a static inline function:)
Regards!
Gerry