This series optimizes the deactivation of virtual interrupts
associated with vfio platform device interrupts.
This is a revival of "[PATCH v4 00/13] ARM IRQ forward control based on
IRQ bypass manager" (https://lkml.org/lkml/2015/11/19/351), whose development
stalled due to its dependency on the new VGIC design and a drop in priority.
Without this optimization, the deactivation of the physical IRQ is performed
by the host. Also, for level-sensitive interrupts, the VFIO driver disables
the physical IRQ. The deactivation of the virtual IRQ by the guest is trapped,
and the physical IRQ is re-enabled at that time.
The ARM GIC supports direct EOI for virtual interrupts directly mapped
to physical interrupts. When this mode is set, the host does not
deactivate the physical interrupt anymore, but simply drops the
interrupt priority on EOI. When the guest deactivates the virtual IRQ,
the GIC automatically deactivates the physical IRQ. This avoids a world
switch on deactivation.
This series sets direct EOI mode on ARM/ARM64 for vfio platform interrupts.
This relies on a negotiation between the vfio platform driver and KVM/irqfd
through the irq bypass manager.
The setup sequence is:
preamble:
- disable the physical IRQ
- halt guest execution
forwarding setting:
- program the VFIO driver for forwarding (select the right physical
interrupt handler)
- program the VGIC and IRQCHIP for forwarding
postamble:
- resume guest execution
- enable the physical IRQ
When destroying the optimized path, the following sequence is executed:
- preamble
- unset forwarding at VGIC and IRQCHIP level
- unset forwarding at VFIO level
- postamble
Injection is still based on irqfd triggering. For level-sensitive
interrupts, though, the resamplefd is no longer triggered since
deactivation is not trapped by KVM.
This was tested with:
- AMD Seattle xgmac platform device assignment
- MSI non-regression testing
The series can be found at:
https://github.com/eauger/linux/tree/v4.12-rc5-deoi-v2
It is based on v4.12-rc5.
Best Regards
Eric
History:
v1 -> v2:
- drop VFIO-PCI INTx support due to shared interrupt incompatibility
- introduce a producer type
- restructured kvm_vgic_[set|unset]_forwarding for reuse in forwarding
setting
Eric Auger (8):
VFIO: platform: Differentiate auto-masking from user masking
VFIO: platform: Introduce direct EOI interrupt handler
VFIO: platform: Direct EOI irq bypass for ARM/ARM64
KVM: arm/arm64: vgic: restructure kvm_vgic_(un)map_phys_irq
KVM: arm/arm64: vgic: Handle mapped level sensitive SPIs
KVM: arm/arm64: vgic: Implement forwarding setting
virt: irqbypass: Add a type field to the irqbypass producer
KVM: arm/arm64: register DEOI irq bypass consumer on ARM/ARM64
arch/arm/kvm/Kconfig | 3 +
arch/arm64/kvm/Kconfig | 3 +
drivers/vfio/pci/vfio_pci_intrs.c | 1 +
drivers/vfio/platform/Kconfig | 5 +
drivers/vfio/platform/Makefile | 2 +-
drivers/vfio/platform/vfio_platform_irq.c | 60 ++++++---
drivers/vfio/platform/vfio_platform_irq_bypass.c | 119 ++++++++++++++++
drivers/vfio/platform/vfio_platform_private.h | 30 ++++-
include/kvm/arm_vgic.h | 13 +-
include/linux/irqbypass.h | 7 +
virt/kvm/arm/arch_timer.c | 24 +---
virt/kvm/arm/arm.c | 48 +++++++
virt/kvm/arm/vgic/vgic.c | 164 ++++++++++++++++++++---
virt/kvm/arm/vgic/vgic.h | 7 +-
14 files changed, 425 insertions(+), 61 deletions(-)
create mode 100644 drivers/vfio/platform/vfio_platform_irq_bypass.c
--
2.5.5
For the direct EOI modality, we will need to differentiate userspace
masking from IRQ handler auto-masking.
The level-sensitive IRQ handler sets the automasked flag, while
VFIO_DEVICE_SET_IRQS with VFIO_IRQ_SET_ACTION_MASK sets the
usermasked flag. VFIO_IRQ_SET_ACTION_UNMASK resets both flags.
Signed-off-by: Eric Auger <[email protected]>
---
v1 -> v2:
- rename masked into usermasked
- add is_masked() macro
- commit message reworded to better explain role of each flag
- change commit title
---
drivers/vfio/platform/vfio_platform_irq.c | 18 ++++++++++--------
drivers/vfio/platform/vfio_platform_private.h | 6 +++++-
2 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 46d4750..6740dbe 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -29,9 +29,9 @@ static void vfio_platform_mask(struct vfio_platform_irq *irq_ctx)
spin_lock_irqsave(&irq_ctx->lock, flags);
- if (!irq_ctx->masked) {
+ if (!is_masked(irq_ctx)) {
disable_irq_nosync(irq_ctx->hwirq);
- irq_ctx->masked = true;
+ irq_ctx->usermasked = true;
}
spin_unlock_irqrestore(&irq_ctx->lock, flags);
@@ -89,9 +89,10 @@ static void vfio_platform_unmask(struct vfio_platform_irq *irq_ctx)
spin_lock_irqsave(&irq_ctx->lock, flags);
- if (irq_ctx->masked) {
+ if (is_masked(irq_ctx)) {
enable_irq(irq_ctx->hwirq);
- irq_ctx->masked = false;
+ irq_ctx->usermasked = false;
+ irq_ctx->automasked = false;
}
spin_unlock_irqrestore(&irq_ctx->lock, flags);
@@ -152,12 +153,12 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
spin_lock_irqsave(&irq_ctx->lock, flags);
- if (!irq_ctx->masked) {
+ if (!is_masked(irq_ctx)) {
ret = IRQ_HANDLED;
/* automask maskable interrupts */
disable_irq_nosync(irq_ctx->hwirq);
- irq_ctx->masked = true;
+ irq_ctx->automasked = true;
}
spin_unlock_irqrestore(&irq_ctx->lock, flags);
@@ -217,7 +218,7 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
return ret;
}
- if (!irq->masked)
+ if (!irq->usermasked)
enable_irq(irq->hwirq);
return 0;
@@ -314,7 +315,8 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
vdev->irqs[i].count = 1;
vdev->irqs[i].hwirq = hwirq;
- vdev->irqs[i].masked = false;
+ vdev->irqs[i].usermasked = false;
+ vdev->irqs[i].automasked = false;
}
vdev->num_irqs = cnt;
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 85ffe5d..e8db291 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -33,7 +33,8 @@ struct vfio_platform_irq {
int hwirq;
char *name;
struct eventfd_ctx *trigger;
- bool masked;
+ bool usermasked;
+ bool automasked;
spinlock_t lock;
struct virqfd *unmask;
struct virqfd *mask;
@@ -102,6 +103,9 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
extern void __vfio_platform_register_reset(struct vfio_platform_reset_node *n);
extern void vfio_platform_unregister_reset(const char *compat,
vfio_platform_reset_fn_t fn);
+
+#define is_masked(irq) ((irq)->usermasked || (irq)->automasked)
+
#define vfio_platform_register_reset(__compat, __reset) \
static struct vfio_platform_reset_node __reset ## _node = { \
.owner = THIS_MODULE, \
--
2.5.5
We add two new fields in vfio_platform_irq: deoi and handler.
If deoi is set, the physical IRQ attached to the virtual IRQ is
directly deactivated by the guest, and the VFIO driver does not
need to disable the physical IRQ or mask it at VFIO level.
The handler pointer points to either the automasked or the "deoi"
handler; the latter is the same as the one used for edge-sensitive IRQs.
A wrapper handler is introduced that calls the chosen handler function.
The irq lock is taken/released in the wrapper handler; eventfd_signal
can be called from regions that are not allowed to sleep.
Signed-off-by: Eric Auger <[email protected]>
---
v1 -> v2:
- fix a weird indentation
---
drivers/vfio/platform/vfio_platform_irq.c | 23 ++++++++++++++++-------
drivers/vfio/platform/vfio_platform_private.h | 2 ++
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 6740dbe..0228a96 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -148,11 +148,8 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
{
struct vfio_platform_irq *irq_ctx = dev_id;
- unsigned long flags;
int ret = IRQ_NONE;
- spin_lock_irqsave(&irq_ctx->lock, flags);
-
if (!is_masked(irq_ctx)) {
ret = IRQ_HANDLED;
@@ -161,8 +158,6 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
irq_ctx->automasked = true;
}
- spin_unlock_irqrestore(&irq_ctx->lock, flags);
-
if (ret == IRQ_HANDLED)
eventfd_signal(irq_ctx->trigger, 1);
@@ -178,6 +173,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
}
+static irqreturn_t vfio_wrapper_handler(int irq, void *dev_id)
+{
+ struct vfio_platform_irq *irq_ctx = dev_id;
+ unsigned long flags;
+ irqreturn_t ret;
+
+ spin_lock_irqsave(&irq_ctx->lock, flags);
+ ret = irq_ctx->handler(irq, dev_id);
+ spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+ return ret;
+}
+
static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int fd, irq_handler_t handler)
{
@@ -208,9 +216,10 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
}
irq->trigger = trigger;
+ irq->handler = handler;
irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
- ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
+ ret = request_irq(irq->hwirq, vfio_wrapper_handler, 0, irq->name, irq);
if (ret) {
kfree(irq->name);
eventfd_ctx_put(trigger);
@@ -232,7 +241,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
struct vfio_platform_irq *irq = &vdev->irqs[index];
irq_handler_t handler;
- if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED)
+ if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED && !irq->deoi)
handler = vfio_automasked_irq_handler;
else
handler = vfio_irq_handler;
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index e8db291..66a9ef5 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -38,6 +38,8 @@ struct vfio_platform_irq {
spinlock_t lock;
struct virqfd *unmask;
struct virqfd *mask;
+ bool deoi;
+ irqreturn_t (*handler)(int irq, void *dev_id);
};
struct vfio_platform_region {
--
2.5.5
We want to reuse the core of the map/unmap functions for IRQ
forwarding. Let's move the computation of the hwirq into
kvm_vgic_map_phys_irq and pass the Linux IRQ as a parameter.
The host_irq is added to struct vgic_irq.
We introduce kvm_vgic_map/unmap_irq, which take a struct vgic_irq
handle as a parameter.
Signed-off-by: Eric Auger <[email protected]>
---
include/kvm/arm_vgic.h | 8 ++++---
virt/kvm/arm/arch_timer.c | 24 +------------------
virt/kvm/arm/vgic/vgic.c | 60 +++++++++++++++++++++++++++++++++++------------
3 files changed, 51 insertions(+), 41 deletions(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index ef71858..cceb31d 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -112,6 +112,7 @@ struct vgic_irq {
bool hw; /* Tied to HW IRQ */
struct kref refcount; /* Used for LPIs */
u32 hwintid; /* HW INTID number */
+ unsigned int host_irq; /* linux irq corresponding to hwintid */
union {
u8 targets; /* GICv2 target VCPUs mask */
u32 mpidr; /* GICv3 target VCPU */
@@ -301,9 +302,10 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
bool level);
int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
bool level);
-int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq);
-int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq);
-bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq);
+int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
+ u32 vintid);
+int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
+bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 5976609..57a30ab 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -617,9 +617,6 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
{
struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
- struct irq_desc *desc;
- struct irq_data *data;
- int phys_irq;
int ret;
if (timer->enabled)
@@ -632,26 +629,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
if (!vgic_initialized(vcpu->kvm))
return -ENODEV;
- /*
- * Find the physical IRQ number corresponding to the host_vtimer_irq
- */
- desc = irq_to_desc(host_vtimer_irq);
- if (!desc) {
- kvm_err("%s: no interrupt descriptor\n", __func__);
- return -EINVAL;
- }
-
- data = irq_desc_get_irq_data(desc);
- while (data->parent_data)
- data = data->parent_data;
-
- phys_irq = data->hwirq;
-
- /*
- * Tell the VGIC that the virtual interrupt is tied to a
- * physical interrupt. We do that once per VCPU.
- */
- ret = kvm_vgic_map_phys_irq(vcpu, vtimer->irq.irq, phys_irq);
+ ret = kvm_vgic_map_phys_irq(vcpu, host_vtimer_irq, vtimer->irq.irq);
if (ret)
return ret;
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 83b24d2..075f073 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -17,6 +17,8 @@
#include <linux/kvm.h>
#include <linux/kvm_host.h>
#include <linux/list_sort.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
#include "vgic.h"
@@ -392,38 +394,66 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
return 0;
}
-int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
+/* @irq->irq_lock must be held */
+static int kvm_vgic_map_irq(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
+ unsigned int host_irq)
{
- struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
+ struct irq_desc *desc;
+ struct irq_data *data;
- BUG_ON(!irq);
-
- spin_lock(&irq->irq_lock);
+ /*
+ * Find the physical IRQ number corresponding to @host_irq
+ */
+ desc = irq_to_desc(host_irq);
+ if (!desc) {
+ kvm_err("%s: no interrupt descriptor\n", __func__);
+ return -EINVAL;
+ }
+ data = irq_desc_get_irq_data(desc);
+ while (data->parent_data)
+ data = data->parent_data;
irq->hw = true;
- irq->hwintid = phys_irq;
+ irq->host_irq = host_irq;
+ irq->hwintid = data->hwirq;
+ return 0;
+}
+
+/* @irq->irq_lock must be held */
+static inline void kvm_vgic_unmap_irq(struct vgic_irq *irq)
+{
+ irq->hw = false;
+ irq->hwintid = 0;
+}
+
+int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
+ u32 vintid)
+{
+ struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
+ int ret;
+ BUG_ON(!irq);
+
+ spin_lock(&irq->irq_lock);
+ ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
spin_unlock(&irq->irq_lock);
vgic_put_irq(vcpu->kvm, irq);
- return 0;
+ return ret;
}
-int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
+int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
{
struct vgic_irq *irq;
if (!vgic_initialized(vcpu->kvm))
return -EAGAIN;
- irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
+ irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
BUG_ON(!irq);
spin_lock(&irq->irq_lock);
-
- irq->hw = false;
- irq->hwintid = 0;
-
+ kvm_vgic_unmap_irq(irq);
spin_unlock(&irq->irq_lock);
vgic_put_irq(vcpu->kvm, irq);
@@ -726,9 +756,9 @@ void vgic_kick_vcpus(struct kvm *kvm)
}
}
-bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
+bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
{
- struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
+ struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
bool map_is_active;
spin_lock(&irq->irq_lock);
--
2.5.5
This patch adds the registration/unregistration of an
irq_bypass_producer for vfio platform device interrupts.
Its callbacks handle the direct EOI modality on the VFIO side.
- stop/start: disable/enable the host irq
- add/del consumer: set the VFIO direct EOI mode, i.e. select the
  appropriate physical IRQ handler (automasked or not)
Signed-off-by: Eric Auger <[email protected]>
---
v1 -> v2:
- add a comment explaining we allow direct EOI for edge sensitive
interrupts without modifying the physical IRQ handler.
---
drivers/vfio/platform/Kconfig | 5 +
drivers/vfio/platform/Makefile | 2 +-
drivers/vfio/platform/vfio_platform_irq.c | 19 ++++
drivers/vfio/platform/vfio_platform_irq_bypass.c | 118 +++++++++++++++++++++++
drivers/vfio/platform/vfio_platform_private.h | 22 +++++
5 files changed, 165 insertions(+), 1 deletion(-)
create mode 100644 drivers/vfio/platform/vfio_platform_irq_bypass.c
diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
index bb30128..33ec3d9 100644
--- a/drivers/vfio/platform/Kconfig
+++ b/drivers/vfio/platform/Kconfig
@@ -2,6 +2,7 @@ config VFIO_PLATFORM
tristate "VFIO support for platform devices"
depends on VFIO && EVENTFD && (ARM || ARM64)
select VFIO_VIRQFD
+ select IRQ_BYPASS_MANAGER
help
Support for platform devices with VFIO. This is required to make
use of platform devices present on the system using the VFIO
@@ -19,4 +20,8 @@ config VFIO_AMBA
If you don't know what to do here, say N.
+config VFIO_PLATFORM_IRQ_BYPASS_DEOI
+ depends on VFIO_PLATFORM
+ def_bool y
+
source "drivers/vfio/platform/reset/Kconfig"
diff --git a/drivers/vfio/platform/Makefile b/drivers/vfio/platform/Makefile
index 41a6224..324f3e7 100644
--- a/drivers/vfio/platform/Makefile
+++ b/drivers/vfio/platform/Makefile
@@ -1,4 +1,4 @@
-vfio-platform-base-y := vfio_platform_common.o vfio_platform_irq.o
+vfio-platform-base-y := vfio_platform_common.o vfio_platform_irq.o vfio_platform_irq_bypass.o
vfio-platform-y := vfio_platform.o
obj-$(CONFIG_VFIO_PLATFORM) += vfio-platform.o
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 0228a96..452dc3d 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -20,6 +20,7 @@
#include <linux/types.h>
#include <linux/vfio.h>
#include <linux/irq.h>
+#include <linux/irqbypass.h>
#include "vfio_platform_private.h"
@@ -186,6 +187,19 @@ static irqreturn_t vfio_wrapper_handler(int irq, void *dev_id)
return ret;
}
+/* must be called with irq_ctx->lock held */
+int vfio_platform_set_deoi(struct vfio_platform_irq *irq_ctx, bool deoi)
+{
+ irq_ctx->deoi = deoi;
+
+ if (!deoi && (irq_ctx->flags & VFIO_IRQ_INFO_AUTOMASKED))
+ irq_ctx->handler = vfio_automasked_irq_handler;
+ else
+ irq_ctx->handler = vfio_irq_handler;
+
+ return 0;
+}
+
static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int fd, irq_handler_t handler)
{
@@ -196,6 +210,7 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
if (irq->trigger) {
irq_clear_status_flags(irq->hwirq, IRQ_NOAUTOEN);
free_irq(irq->hwirq, irq);
+ irq_bypass_unregister_producer(&irq->producer);
kfree(irq->name);
eventfd_ctx_put(irq->trigger);
irq->trigger = NULL;
@@ -227,6 +242,10 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
return ret;
}
+ if (vfio_platform_has_deoi())
+ vfio_platform_register_deoi_producer(vdev, irq,
+ trigger, irq->hwirq);
+
if (!irq->usermasked)
enable_irq(irq->hwirq);
diff --git a/drivers/vfio/platform/vfio_platform_irq_bypass.c b/drivers/vfio/platform/vfio_platform_irq_bypass.c
new file mode 100644
index 0000000..692b081
--- /dev/null
+++ b/drivers/vfio/platform/vfio_platform_irq_bypass.c
@@ -0,0 +1,118 @@
+/*
+ * VFIO platform device irqbypass callback implementation for DEOI
+ *
+ * Copyright (C) 2017 Red Hat, Inc. All rights reserved.
+ * Author: Eric Auger <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/err.h>
+#include <linux/device.h>
+#include <linux/irq.h>
+#include <linux/irqbypass.h>
+#include "vfio_platform_private.h"
+
+#ifdef CONFIG_VFIO_PLATFORM_IRQ_BYPASS_DEOI
+
+static void irq_bypass_deoi_start(struct irq_bypass_producer *prod)
+{
+ enable_irq(prod->irq);
+}
+
+static void irq_bypass_deoi_stop(struct irq_bypass_producer *prod)
+{
+ disable_irq(prod->irq);
+}
+
+/**
+ * irq_bypass_deoi_add_consumer - turns irq direct EOI on
+ *
+ * The linux irq is disabled when the function is called.
+ * The operation succeeds only if the irq is not active at irqchip level
+ * and the irq is not automasked at VFIO level, meaning the IRQ is under
+ * injection into the guest.
+ */
+static int irq_bypass_deoi_add_consumer(struct irq_bypass_producer *prod,
+ struct irq_bypass_consumer *cons)
+{
+ struct vfio_platform_irq *irq_ctx =
+ container_of(prod, struct vfio_platform_irq, producer);
+ unsigned long flags;
+ bool active;
+ int ret;
+
+ spin_lock_irqsave(&irq_ctx->lock, flags);
+
+ ret = irq_get_irqchip_state(irq_ctx->hwirq, IRQCHIP_STATE_ACTIVE,
+ &active);
+ if (ret)
+ goto out;
+
+ if (active || irq_ctx->automasked) {
+ ret = -EAGAIN;
+ goto out;
+ }
+
+ /*
+ * Direct EOI is enabled for edge sensitive interrupts without any
+ * change with respect to the physical interrupt handler
+ */
+ if (!(irq_get_trigger_type(irq_ctx->hwirq) & IRQ_TYPE_LEVEL_MASK))
+ goto out;
+
+ ret = vfio_platform_set_deoi(irq_ctx, true);
+out:
+ spin_unlock_irqrestore(&irq_ctx->lock, flags);
+ return ret;
+}
+
+static void irq_bypass_deoi_del_consumer(struct irq_bypass_producer *prod,
+ struct irq_bypass_consumer *cons)
+{
+ struct vfio_platform_irq *irq_ctx =
+ container_of(prod, struct vfio_platform_irq, producer);
+ unsigned long flags;
+
+ spin_lock_irqsave(&irq_ctx->lock, flags);
+ if (irq_get_trigger_type(irq_ctx->hwirq) & IRQ_TYPE_LEVEL_MASK)
+ vfio_platform_set_deoi(irq_ctx, false);
+ spin_unlock_irqrestore(&irq_ctx->lock, flags);
+}
+
+bool vfio_platform_has_deoi(void)
+{
+ return true;
+}
+
+void vfio_platform_register_deoi_producer(struct vfio_platform_device *vdev,
+ struct vfio_platform_irq *irq,
+ struct eventfd_ctx *trigger,
+ unsigned int host_irq)
+{
+ struct irq_bypass_producer *prod = &irq->producer;
+ int ret;
+
+ prod->token = trigger;
+ prod->irq = host_irq;
+ prod->add_consumer = irq_bypass_deoi_add_consumer;
+ prod->del_consumer = irq_bypass_deoi_del_consumer;
+ prod->stop = irq_bypass_deoi_stop;
+ prod->start = irq_bypass_deoi_start;
+
+ ret = irq_bypass_register_producer(prod);
+ if (unlikely(ret))
+ dev_info(vdev->device,
+ "irq bypass producer (token %p) registration fails: %d\n",
+ prod->token, ret);
+}
+
+#endif
+
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 66a9ef5..ab85c58 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -17,6 +17,7 @@
#include <linux/types.h>
#include <linux/interrupt.h>
+#include <linux/irqbypass.h>
#define VFIO_PLATFORM_OFFSET_SHIFT 40
#define VFIO_PLATFORM_OFFSET_MASK (((u64)(1) << VFIO_PLATFORM_OFFSET_SHIFT) - 1)
@@ -40,6 +41,7 @@ struct vfio_platform_irq {
struct virqfd *mask;
bool deoi;
irqreturn_t (*handler)(int irq, void *dev_id);
+ struct irq_bypass_producer producer;
};
struct vfio_platform_region {
@@ -102,12 +104,32 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
unsigned start, unsigned count,
void *data);
+extern int vfio_platform_set_deoi(struct vfio_platform_irq *irq_ctx, bool deoi);
+
extern void __vfio_platform_register_reset(struct vfio_platform_reset_node *n);
extern void vfio_platform_unregister_reset(const char *compat,
vfio_platform_reset_fn_t fn);
#define is_masked(irq) ((irq)->usermasked || (irq)->automasked)
+#ifdef CONFIG_VFIO_PLATFORM_IRQ_BYPASS_DEOI
+bool vfio_platform_has_deoi(void);
+void vfio_platform_register_deoi_producer(struct vfio_platform_device *vdev,
+ struct vfio_platform_irq *irq,
+ struct eventfd_ctx *trigger,
+ unsigned int host_irq);
+#else
+static inline bool vfio_platform_has_deoi(void)
+{
+ return false;
+}
+static inline
+void vfio_platform_register_deoi_producer(struct vfio_platform_device *vdev,
+ struct vfio_platform_irq *irq,
+ struct eventfd_ctx *trigger,
+ unsigned int host_irq) {}
+#endif
+
#define vfio_platform_register_reset(__compat, __reset) \
static struct vfio_platform_reset_node __reset ## _node = { \
.owner = THIS_MODULE, \
--
2.5.5
Currently, the line level of unmapped level-sensitive SPIs is
toggled down by the maintenance IRQ handler/resamplefd mechanism.
As mapped SPI completion is not trapped, we cannot rely on this
mechanism; the line level needs to be observed at distributor
level instead.
This patch handles the physical IRQ case in vgic_validate_injection
and gets the line level of a mapped SPI at distributor level.
Signed-off-by: Eric Auger <[email protected]>
---
v1 -> v2:
- renamed is_unshared_mapped into is_mapped_spi
- changes to kvm_vgic_map_phys_irq moved in the previous patch
- make vgic_validate_injection more readable
- reword the commit message
---
virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
virt/kvm/arm/vgic/vgic.h | 7 ++++++-
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 075f073..2e35ac7 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
kfree(irq);
}
+bool irq_line_level(struct vgic_irq *irq)
+{
+ bool line_level = irq->line_level;
+
+ if (unlikely(is_mapped_spi(irq)))
+ WARN_ON(irq_get_irqchip_state(irq->host_irq,
+ IRQCHIP_STATE_PENDING,
+ &line_level));
+ return line_level;
+}
+
/**
* kvm_vgic_target_oracle - compute the target vcpu for an irq
*
@@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
/*
* Only valid injection if changing level for level-triggered IRQs or for a
- * rising edge.
+ * rising edge. Injection of virtual interrupts associated to physical
+ * interrupts always is valid.
*/
static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
{
switch (irq->config) {
case VGIC_CONFIG_LEVEL:
- return irq->line_level != level;
+ return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
case VGIC_CONFIG_EDGE:
return level;
}
diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
index bba7fa2..da254ae 100644
--- a/virt/kvm/arm/vgic/vgic.h
+++ b/virt/kvm/arm/vgic/vgic.h
@@ -96,14 +96,19 @@
/* we only support 64 kB translation table page size */
#define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
+bool irq_line_level(struct vgic_irq *irq);
+
static inline bool irq_is_pending(struct vgic_irq *irq)
{
if (irq->config == VGIC_CONFIG_EDGE)
return irq->pending_latch;
else
- return irq->pending_latch || irq->line_level;
+ return irq->pending_latch || irq_line_level(irq);
}
+#define is_mapped_spi(i) \
+((i)->hw && (i)->intid >= VGIC_NR_PRIVATE_IRQS && (i)->intid < 1020)
+
/*
* This struct provides an intermediate representation of the fields contained
* in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
--
2.5.5
This patch selects IRQ_BYPASS_MANAGER and HAVE_KVM_IRQ_BYPASS
configs for ARM/ARM64.
kvm_arch_has_irq_bypass() is now implemented and returns true.
As a consequence, the irq bypass consumer will be registered for
ARM/ARM64 with direct EOI/IRQ forwarding callbacks:
- stop/start: halt/resume guest execution
- add/del_producer: set/unset forwarding/DEOI at vgic/irqchip level
The consumer currently only works with a VFIO_PLATFORM producer.
In other cases, start/stop do nothing and return without error
to avoid outputting a spurious warning.
Signed-off-by: Eric Auger <[email protected]>
---
v1 -> v2:
- check the type of the producer to avoid attachment to the VFIO-PCI
  dummy MSI producer
---
arch/arm/kvm/Kconfig | 3 +++
arch/arm64/kvm/Kconfig | 3 +++
virt/kvm/arm/arm.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 90d0176..4e2b192 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -3,6 +3,7 @@
#
source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
menuconfig VIRTUALIZATION
bool "Virtualization"
@@ -35,6 +36,8 @@ config KVM
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQ_ROUTING
select HAVE_KVM_MSI
+ select IRQ_BYPASS_MANAGER
+ select HAVE_KVM_IRQ_BYPASS
depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
---help---
Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 52cb7ad..7e0d6e6 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -3,6 +3,7 @@
#
source "virt/kvm/Kconfig"
+source "virt/lib/Kconfig"
menuconfig VIRTUALIZATION
bool "Virtualization"
@@ -35,6 +36,8 @@ config KVM
select HAVE_KVM_MSI
select HAVE_KVM_IRQCHIP
select HAVE_KVM_IRQ_ROUTING
+ select IRQ_BYPASS_MANAGER
+ select HAVE_KVM_IRQ_BYPASS
---help---
Support hosting virtualized guest machines.
We don't support KVM with 16K page tables yet, due to the multiple
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 3417e18..58871f8 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -27,6 +27,8 @@
#include <linux/mman.h>
#include <linux/sched.h>
#include <linux/kvm.h>
+#include <linux/kvm_irqfd.h>
+#include <linux/irqbypass.h>
#include <trace/events/kvm.h>
#include <kvm/arm_pmu.h>
@@ -1420,6 +1422,52 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
return NULL;
}
+bool kvm_arch_has_irq_bypass(void)
+{
+ return true;
+}
+
+int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ if (prod->type != IRQ_BYPASS_VFIO_PLATFORM)
+ return 0;
+
+ return kvm_vgic_set_forwarding(irqfd->kvm, prod->irq,
+ irqfd->gsi + VGIC_NR_PRIVATE_IRQS);
+}
+void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
+ struct irq_bypass_producer *prod)
+{
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ if (prod->type != IRQ_BYPASS_VFIO_PLATFORM)
+ return;
+
+ kvm_vgic_unset_forwarding(irqfd->kvm, prod->irq,
+ irqfd->gsi + VGIC_NR_PRIVATE_IRQS);
+}
+
+void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons)
+{
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ kvm_arm_halt_guest(irqfd->kvm);
+}
+
+void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons)
+{
+ struct kvm_kernel_irqfd *irqfd =
+ container_of(cons, struct kvm_kernel_irqfd, consumer);
+
+ kvm_arm_resume_guest(irqfd->kvm);
+}
+
/**
* Initialize Hyp-mode and memory mappings on all CPUs.
*/
--
2.5.5
On ARM, we are about to introduce a vfio-platform producer. The
associated architecture-specific consumer is only compatible with
this producer and does not work with the dummy vfio-pci MSI producer.
Adding a type to the producer allows us to easily discriminate
between producers instead of complicating things by analyzing the
interrupt type.
Signed-off-by: Eric Auger <[email protected]>
---
v2: new
---
drivers/vfio/pci/vfio_pci_intrs.c | 1 +
drivers/vfio/platform/vfio_platform_irq_bypass.c | 1 +
include/linux/irqbypass.h | 7 +++++++
3 files changed, 9 insertions(+)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1c46045..40a6b21 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -343,6 +343,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
vdev->ctx[vector].producer.token = trigger;
vdev->ctx[vector].producer.irq = irq;
+ vdev->ctx[vector].producer.type = IRQ_BYPASS_VFIO_PCI_MSI;
ret = irq_bypass_register_producer(&vdev->ctx[vector].producer);
if (unlikely(ret))
dev_info(&pdev->dev,
diff --git a/drivers/vfio/platform/vfio_platform_irq_bypass.c b/drivers/vfio/platform/vfio_platform_irq_bypass.c
index 692b081..603b4b3 100644
--- a/drivers/vfio/platform/vfio_platform_irq_bypass.c
+++ b/drivers/vfio/platform/vfio_platform_irq_bypass.c
@@ -102,6 +102,7 @@ void vfio_platform_register_deoi_producer(struct vfio_platform_device *vdev,
prod->token = trigger;
prod->irq = host_irq;
+ prod->type = IRQ_BYPASS_VFIO_PLATFORM;
prod->add_consumer = irq_bypass_deoi_add_consumer;
prod->del_consumer = irq_bypass_deoi_del_consumer;
prod->stop = irq_bypass_deoi_stop;
diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
index f0f5d26..331c19a 100644
--- a/include/linux/irqbypass.h
+++ b/include/linux/irqbypass.h
@@ -31,11 +31,17 @@ struct irq_bypass_consumer;
* are not supported.
*/
+enum irq_bypass_producer_type {
+ IRQ_BYPASS_VFIO_PCI_MSI,
+ IRQ_BYPASS_VFIO_PLATFORM,
+};
+
/**
* struct irq_bypass_producer - IRQ bypass producer definition
* @node: IRQ bypass manager private list management
* @token: opaque token to match between producer and consumer (non-NULL)
* @irq: Linux IRQ number for the producer device
+ * @type: type of the producer to easily assess interoperability with consumer
* @add_consumer: Connect the IRQ producer to an IRQ consumer (optional)
* @del_consumer: Disconnect the IRQ producer from an IRQ consumer (optional)
* @stop: Perform any quiesce operations necessary prior to add/del (optional)
@@ -49,6 +55,7 @@ struct irq_bypass_producer {
struct list_head node;
void *token;
int irq;
+ enum irq_bypass_producer_type type;
int (*add_consumer)(struct irq_bypass_producer *,
struct irq_bypass_consumer *);
void (*del_consumer)(struct irq_bypass_producer *,
--
2.5.5
Implement kvm_vgic_[set|unset]_forwarding.
These handle the low-level VGIC programming and the consistent
irqchip programming.
Signed-off-by: Eric Auger <[email protected]>
---
v1 -> v2:
- change the parameter names used in the declaration
- use kvm_vgic_map/unmap_phys_irq and handle their returned value
---
include/kvm/arm_vgic.h | 5 +++
virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 93 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index cceb31d..5064a57 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -343,4 +343,9 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
*/
int kvm_vgic_setup_default_irq_routing(struct kvm *kvm);
+int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
+ unsigned int vintid);
+void kvm_vgic_unset_forwarding(struct kvm *kvm, unsigned int host_irq,
+ unsigned int vintid);
+
#endif /* __KVM_ARM_VGIC_H */
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 2e35ac7..9ee3e77 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -781,3 +781,91 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
return map_is_active;
}
+/**
+ * kvm_vgic_set_forwarding - Set IRQ forwarding
+ *
+ * @kvm: kvm handle
+ * @host_irq: the host linux IRQ
+ * @vintid: the virtual INTID
+ *
+ * This function must be called when the IRQ is not active:
+ * i.e. not active at GIC level and not currently under injection
+ * into the guest using the unforwarded mode. The physical IRQ must
+ * be disabled and all vCPUs must have been exited and prevented
+ * from being re-entered.
+ */
+int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
+ unsigned int vintid)
+{
+ struct kvm_vcpu *vcpu;
+ struct vgic_irq *irq;
+ int ret;
+
+ kvm_debug("%s host_irq=%d vintid=%d\n", __func__, host_irq, vintid);
+
+ if (!vgic_valid_spi(kvm, vintid))
+ return -EINVAL;
+
+ irq = vgic_get_irq(kvm, NULL, vintid);
+ spin_lock(&irq->irq_lock);
+
+ if (irq->hw) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+ vcpu = irq->target_vcpu;
+ if (!vcpu) {
+ ret = -EAGAIN;
+ goto unlock;
+ }
+
+ ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
+ if (!ret)
+ irq_set_vcpu_affinity(host_irq, vcpu);
+unlock:
+ spin_unlock(&irq->irq_lock);
+ vgic_put_irq(kvm, irq);
+ return ret;
+}
+
+/**
+ * kvm_vgic_unset_forwarding - Unset IRQ forwarding
+ *
+ * @kvm: KVM handle
+ * @host_irq: the host Linux IRQ number
+ * @vintid: virtual INTID
+ *
+ * This function must be called when the host irq is disabled and
+ * all vCPUs have been exited and prevented from being re-entered.
+ */
+void kvm_vgic_unset_forwarding(struct kvm *kvm,
+ unsigned int host_irq,
+ unsigned int vintid)
+{
+ struct vgic_irq *irq;
+ bool active;
+
+ kvm_debug("%s host_irq=%d vintid=%d\n", __func__, host_irq, vintid);
+
+ if (!vgic_valid_spi(kvm, vintid))
+ return;
+
+ irq = vgic_get_irq(kvm, NULL, vintid);
+ spin_lock(&irq->irq_lock);
+
+ if (!irq->hw)
+ goto unlock;
+
+ WARN_ON(irq_get_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, &active));
+
+ if (active)
+ irq_set_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, false);
+
+ kvm_vgic_unmap_irq(irq);
+ irq_set_vcpu_affinity(host_irq, NULL);
+
+unlock:
+ spin_unlock(&irq->irq_lock);
+ vgic_put_irq(kvm, irq);
+}
+
--
2.5.5
Hi Eric,
On 15/06/17 13:52, Eric Auger wrote:
> Currently, the line level of unmapped level sensitive SPIs is
> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>
> As mapped SPI completion is not trapped, we cannot rely on this
> mechanism and the line level needs to be observed at distributor
> level instead.
>
> This patch handles the physical IRQ case in vgic_validate_injection
> and get the line level of a mapped SPI at distributor level.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v1 -> v2:
> - renamed is_unshared_mapped into is_mapped_spi
> - changes to kvm_vgic_map_phys_irq moved in the previous patch
> - make vgic_validate_injection more readable
> - reword the commit message
> ---
> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
> 2 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 075f073..2e35ac7 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> kfree(irq);
> }
>
> +bool irq_line_level(struct vgic_irq *irq)
> +{
> + bool line_level = irq->line_level;
> +
> + if (unlikely(is_mapped_spi(irq)))
> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
> + IRQCHIP_STATE_PENDING,
> + &line_level));
> + return line_level;
> +}
> +
> /**
> * kvm_vgic_target_oracle - compute the target vcpu for an irq
> *
> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
>
> /*
> * Only valid injection if changing level for level-triggered IRQs or for a
> - * rising edge.
> + * rising edge. Injection of virtual interrupts associated to physical
> + * interrupts always is valid.
> */
> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
> {
> switch (irq->config) {
> case VGIC_CONFIG_LEVEL:
> - return irq->line_level != level;
> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
> case VGIC_CONFIG_EDGE:
> return level;
> }
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index bba7fa2..da254ae 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -96,14 +96,19 @@
> /* we only support 64 kB translation table page size */
> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
>
> +bool irq_line_level(struct vgic_irq *irq);
> +
> static inline bool irq_is_pending(struct vgic_irq *irq)
> {
> if (irq->config == VGIC_CONFIG_EDGE)
> return irq->pending_latch;
> else
> - return irq->pending_latch || irq->line_level;
> + return irq->pending_latch || irq_line_level(irq);
I'm a bit concerned that an edge interrupt doesn't take the distributor
state into account here. Why is that so? Once an SPI is forwarded to a
guest, a large part of the edge vs level differences move into the HW,
and are not that different anymore from a SW PoV.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
Hi Marc,
On 04/07/2017 14:15, Marc Zyngier wrote:
> Hi Eric,
>
> On 15/06/17 13:52, Eric Auger wrote:
>> Currently, the line level of unmapped level sensitive SPIs is
>> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>>
>> As mapped SPI completion is not trapped, we cannot rely on this
>> mechanism and the line level needs to be observed at distributor
>> level instead.
>>
>> This patch handles the physical IRQ case in vgic_validate_injection
>> and get the line level of a mapped SPI at distributor level.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v1 -> v2:
>> - renamed is_unshared_mapped into is_mapped_spi
>> - changes to kvm_vgic_map_phys_irq moved in the previous patch
>> - make vgic_validate_injection more readable
>> - reword the commit message
>> ---
>> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
>> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
>> 2 files changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index 075f073..2e35ac7 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>> kfree(irq);
>> }
>>
>> +bool irq_line_level(struct vgic_irq *irq)
>> +{
>> + bool line_level = irq->line_level;
>> +
>> + if (unlikely(is_mapped_spi(irq)))
>> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
>> + IRQCHIP_STATE_PENDING,
>> + &line_level));
>> + return line_level;
>> +}
>> +
>> /**
>> * kvm_vgic_target_oracle - compute the target vcpu for an irq
>> *
>> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
>>
>> /*
>> * Only valid injection if changing level for level-triggered IRQs or for a
>> - * rising edge.
>> + * rising edge. Injection of virtual interrupts associated to physical
>> + * interrupts always is valid.
>> */
>> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
>> {
>> switch (irq->config) {
>> case VGIC_CONFIG_LEVEL:
>> - return irq->line_level != level;
>> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
>> case VGIC_CONFIG_EDGE:
>> return level;
>> }
>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>> index bba7fa2..da254ae 100644
>> --- a/virt/kvm/arm/vgic/vgic.h
>> +++ b/virt/kvm/arm/vgic/vgic.h
>> @@ -96,14 +96,19 @@
>> /* we only support 64 kB translation table page size */
>> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
>>
>> +bool irq_line_level(struct vgic_irq *irq);
>> +
>> static inline bool irq_is_pending(struct vgic_irq *irq)
>> {
>> if (irq->config == VGIC_CONFIG_EDGE)
>> return irq->pending_latch;
>> else
>> - return irq->pending_latch || irq->line_level;
>> + return irq->pending_latch || irq_line_level(irq);
>
> I'm a bit concerned that an edge interrupt doesn't take the distributor
> state into account here. Why is that so? Once an SPI is forwarded to a
> guest, a large part of the edge vs level differences move into the HW,
> and are not that different anymore from a SW PoV.
As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
isn't it a bit risky in general to poke the physical state instead of
the virtual state? For level-sensitive interrupts, to me we don't really
have many other alternatives. For edge, we are not obliged to.
Don't we have situations, due to the lazy disable approach, where the
physical IRQ hits, enters the genirq handler, and the actual handler is
not called, i.e. the virtual IRQ is not injected?
Thanks
Eric
>
> Thanks,
>
> M.
>
On Thu, Jun 15, 2017 at 02:52:36PM +0200, Eric Auger wrote:
> We want to reuse the core of the map/unmap functions for IRQ
> forwarding. Let's move the computation of the hwirq in
> kvm_vgic_map_phys_irq and pass the linux IRQ as parameter.
> the host_irq is added to struct vgic_irq.
>
> We introduce kvm_vgic_map/unmap_irq which take a struct vgic_irq
> handle as a parameter.
So this is to avoid the linux-to-hardware irq number translation in
other callers?
I am sort of guessing that we need to store the host_irq number so that
we can probe into the physical GIC in later patches? That may be good
to note in this commit message if you respin.
Thanks,
-Christoffer
>
> Signed-off-by: Eric Auger <[email protected]>
> ---
> include/kvm/arm_vgic.h | 8 ++++---
> virt/kvm/arm/arch_timer.c | 24 +------------------
> virt/kvm/arm/vgic/vgic.c | 60 +++++++++++++++++++++++++++++++++++------------
> 3 files changed, 51 insertions(+), 41 deletions(-)
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index ef71858..cceb31d 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -112,6 +112,7 @@ struct vgic_irq {
> bool hw; /* Tied to HW IRQ */
> struct kref refcount; /* Used for LPIs */
> u32 hwintid; /* HW INTID number */
> + unsigned int host_irq; /* linux irq corresponding to hwintid */
> union {
> u8 targets; /* GICv2 target VCPUs mask */
> u32 mpidr; /* GICv3 target VCPU */
> @@ -301,9 +302,10 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> bool level);
> int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> bool level);
> -int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq);
> -int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq);
> -bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq);
> +int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
> + u32 vintid);
> +int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
> +bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
>
> int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 5976609..57a30ab 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -617,9 +617,6 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
> {
> struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
> - struct irq_desc *desc;
> - struct irq_data *data;
> - int phys_irq;
> int ret;
>
> if (timer->enabled)
> @@ -632,26 +629,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
> if (!vgic_initialized(vcpu->kvm))
> return -ENODEV;
>
> - /*
> - * Find the physical IRQ number corresponding to the host_vtimer_irq
> - */
> - desc = irq_to_desc(host_vtimer_irq);
> - if (!desc) {
> - kvm_err("%s: no interrupt descriptor\n", __func__);
> - return -EINVAL;
> - }
> -
> - data = irq_desc_get_irq_data(desc);
> - while (data->parent_data)
> - data = data->parent_data;
> -
> - phys_irq = data->hwirq;
> -
> - /*
> - * Tell the VGIC that the virtual interrupt is tied to a
> - * physical interrupt. We do that once per VCPU.
> - */
> - ret = kvm_vgic_map_phys_irq(vcpu, vtimer->irq.irq, phys_irq);
> + ret = kvm_vgic_map_phys_irq(vcpu, host_vtimer_irq, vtimer->irq.irq);
> if (ret)
> return ret;
>
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 83b24d2..075f073 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -17,6 +17,8 @@
> #include <linux/kvm.h>
> #include <linux/kvm_host.h>
> #include <linux/list_sort.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
>
> #include "vgic.h"
>
> @@ -392,38 +394,66 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
> return 0;
> }
>
> -int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
> +/* @irq->irq_lock must be held */
> +static int kvm_vgic_map_irq(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> + unsigned int host_irq)
> {
> - struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> + struct irq_desc *desc;
> + struct irq_data *data;
>
> - BUG_ON(!irq);
> -
> - spin_lock(&irq->irq_lock);
> + /*
> + * Find the physical IRQ number corresponding to @host_irq
> + */
> + desc = irq_to_desc(host_irq);
> + if (!desc) {
> + kvm_err("%s: no interrupt descriptor\n", __func__);
> + return -EINVAL;
> + }
> + data = irq_desc_get_irq_data(desc);
> + while (data->parent_data)
> + data = data->parent_data;
>
> irq->hw = true;
> - irq->hwintid = phys_irq;
> + irq->host_irq = host_irq;
> + irq->hwintid = data->hwirq;
> + return 0;
> +}
> +
> +/* @irq->irq_lock must be held */
> +static inline void kvm_vgic_unmap_irq(struct vgic_irq *irq)
> +{
> + irq->hw = false;
> + irq->hwintid = 0;
> +}
> +
> +int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
> + u32 vintid)
> +{
> + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
> + int ret;
>
> + BUG_ON(!irq);
> +
> + spin_lock(&irq->irq_lock);
> + ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
> spin_unlock(&irq->irq_lock);
> vgic_put_irq(vcpu->kvm, irq);
>
> - return 0;
> + return ret;
> }
>
> -int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> +int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
> {
> struct vgic_irq *irq;
>
> if (!vgic_initialized(vcpu->kvm))
> return -EAGAIN;
>
> - irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> + irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
> BUG_ON(!irq);
>
> spin_lock(&irq->irq_lock);
> -
> - irq->hw = false;
> - irq->hwintid = 0;
> -
> + kvm_vgic_unmap_irq(irq);
> spin_unlock(&irq->irq_lock);
> vgic_put_irq(vcpu->kvm, irq);
>
> @@ -726,9 +756,9 @@ void vgic_kick_vcpus(struct kvm *kvm)
> }
> }
>
> -bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
> +bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
> {
> - struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
> + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
> bool map_is_active;
>
> spin_lock(&irq->irq_lock);
> --
> 2.5.5
>
On Thu, Jun 15, 2017 at 02:52:37PM +0200, Eric Auger wrote:
> Currently, the line level of unmapped level sensitive SPIs is
> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>
> As mapped SPI completion is not trapped, we cannot rely on this
> mechanism and the line level needs to be observed at distributor
> level instead.
>
> This patch handles the physical IRQ case in vgic_validate_injection
> and get the line level of a mapped SPI at distributor level.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v1 -> v2:
> - renamed is_unshared_mapped into is_mapped_spi
> - changes to kvm_vgic_map_phys_irq moved in the previous patch
> - make vgic_validate_injection more readable
> - reword the commit message
> ---
> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
> 2 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 075f073..2e35ac7 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> kfree(irq);
> }
>
> +bool irq_line_level(struct vgic_irq *irq)
> +{
> + bool line_level = irq->line_level;
> +
> + if (unlikely(is_mapped_spi(irq)))
> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
> + IRQCHIP_STATE_PENDING,
> + &line_level));
> + return line_level;
> +}
> +
> /**
> * kvm_vgic_target_oracle - compute the target vcpu for an irq
> *
> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
>
> /*
> * Only valid injection if changing level for level-triggered IRQs or for a
> - * rising edge.
> + * rising edge. Injection of virtual interrupts associated to physical
> + * interrupts always is valid.
why? I don't remember this now, and that means I probably won't in the
future either.
When I look at this now, I'm thinking, if we're not going to change
anything, why proceed beyond validate injection?
> */
> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
> {
> switch (irq->config) {
> case VGIC_CONFIG_LEVEL:
> - return irq->line_level != level;
> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
> case VGIC_CONFIG_EDGE:
> return level;
> }
> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> index bba7fa2..da254ae 100644
> --- a/virt/kvm/arm/vgic/vgic.h
> +++ b/virt/kvm/arm/vgic/vgic.h
> @@ -96,14 +96,19 @@
> /* we only support 64 kB translation table page size */
> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
>
> +bool irq_line_level(struct vgic_irq *irq);
> +
> static inline bool irq_is_pending(struct vgic_irq *irq)
> {
> if (irq->config == VGIC_CONFIG_EDGE)
> return irq->pending_latch;
> else
> - return irq->pending_latch || irq->line_level;
> + return irq->pending_latch || irq_line_level(irq);
> }
>
> +#define is_mapped_spi(i) \
> +((i)->hw && (i)->intid >= VGIC_NR_PRIVATE_IRQS && (i)->intid < 1020)
> +
nit: why is this not a static inline ?
> /*
> * This struct provides an intermediate representation of the fields contained
> * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
> --
> 2.5.5
>
Thanks,
-Christoffer
On Fri, Jul 07, 2017 at 09:41:42AM +0200, Auger Eric wrote:
> Hi Marc,
>
> On 04/07/2017 14:15, Marc Zyngier wrote:
> > Hi Eric,
> >
> > On 15/06/17 13:52, Eric Auger wrote:
> >> Currently, the line level of unmapped level sensitive SPIs is
> >> toggled down by the maintenance IRQ handler/resamplefd mechanism.
> >>
> >> As mapped SPI completion is not trapped, we cannot rely on this
> >> mechanism and the line level needs to be observed at distributor
> >> level instead.
> >>
> >> This patch handles the physical IRQ case in vgic_validate_injection
> >> and get the line level of a mapped SPI at distributor level.
> >>
> >> Signed-off-by: Eric Auger <[email protected]>
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - renamed is_unshared_mapped into is_mapped_spi
> >> - changes to kvm_vgic_map_phys_irq moved in the previous patch
> >> - make vgic_validate_injection more readable
> >> - reword the commit message
> >> ---
> >> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
> >> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
> >> 2 files changed, 20 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> >> index 075f073..2e35ac7 100644
> >> --- a/virt/kvm/arm/vgic/vgic.c
> >> +++ b/virt/kvm/arm/vgic/vgic.c
> >> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> >> kfree(irq);
> >> }
> >>
> >> +bool irq_line_level(struct vgic_irq *irq)
> >> +{
> >> + bool line_level = irq->line_level;
> >> +
> >> + if (unlikely(is_mapped_spi(irq)))
> >> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
> >> + IRQCHIP_STATE_PENDING,
> >> + &line_level));
> >> + return line_level;
> >> +}
> >> +
> >> /**
> >> * kvm_vgic_target_oracle - compute the target vcpu for an irq
> >> *
> >> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
> >>
> >> /*
> >> * Only valid injection if changing level for level-triggered IRQs or for a
> >> - * rising edge.
> >> + * rising edge. Injection of virtual interrupts associated to physical
> >> + * interrupts always is valid.
> >> */
> >> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
> >> {
> >> switch (irq->config) {
> >> case VGIC_CONFIG_LEVEL:
> >> - return irq->line_level != level;
> >> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
> >> case VGIC_CONFIG_EDGE:
> >> return level;
> >> }
> >> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> >> index bba7fa2..da254ae 100644
> >> --- a/virt/kvm/arm/vgic/vgic.h
> >> +++ b/virt/kvm/arm/vgic/vgic.h
> >> @@ -96,14 +96,19 @@
> >> /* we only support 64 kB translation table page size */
> >> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
> >>
> >> +bool irq_line_level(struct vgic_irq *irq);
> >> +
> >> static inline bool irq_is_pending(struct vgic_irq *irq)
> >> {
> >> if (irq->config == VGIC_CONFIG_EDGE)
> >> return irq->pending_latch;
> >> else
> >> - return irq->pending_latch || irq->line_level;
> >> + return irq->pending_latch || irq_line_level(irq);
> >
> > I'm a bit concerned that an edge interrupt doesn't take the distributor
> > state into account here. Why is that so? Once an SPI is forwarded to a
> > guest, a large part of the edge vs level differences move into the HW,
> > and are not that different anymore from a SW PoV.
>
> As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
> isn't it a bit risky in general to poke the physical state instead of
> the virtual state? For level-sensitive interrupts, to me we don't really
> have many other alternatives. For edge, we are not obliged to.
I think we need to be clear on the fundamental question of whether or
not we consider pending_latch and/or line_level for mapped interrupts.
I can definitely see the argument that the pending state is kept in
hardware, so if you want to know that for a mapped interrupt, ask the
hardware.
The upside of this approach is a clean separation of state, and we avoid
any logic to synchronize a virtual state with the physical state.
The downside is that it's slower to peek into the physical GIC than to
read a variable from memory, and we need to special case the validate
path (which I now understand).
If we move to keeping the state in HW, how do we deal with GICD_SPENDR?
Does that mean we will forward a write from the VM, handled by the VGIC,
to the physical GIC?
>
> Don't we have situations, due to the lazy disable approach, where the
> physical IRQ hits, enters the genirq handler, and the actual handler is
> not called, i.e. the virtual IRQ is not injected?
>
I'm not sure I remember what these situations were, specifically, but
certainly if we ever have a situation where a mapped irq's pending state
should be different from that of the physical one, then it doesn't work.
Thanks,
-Christoffer
On Thu, Jun 15, 2017 at 02:52:38PM +0200, Eric Auger wrote:
> Implements kvm_vgic_[set|unset]_forwarding.
>
> Handle low-level VGIC programming and consistent irqchip
> programming.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v1 -> v2:
> - change the parameter names used in the declaration
> - use kvm_vgic_map/unmap_phys_irq and handle their returned value
> ---
> include/kvm/arm_vgic.h | 5 +++
> virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 93 insertions(+)
>
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index cceb31d..5064a57 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -343,4 +343,9 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
> */
> int kvm_vgic_setup_default_irq_routing(struct kvm *kvm);
>
> +int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
> + unsigned int vintid);
> +void kvm_vgic_unset_forwarding(struct kvm *kvm, unsigned int host_irq,
> + unsigned int vintid);
> +
> #endif /* __KVM_ARM_VGIC_H */
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 2e35ac7..9ee3e77 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -781,3 +781,91 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
> return map_is_active;
> }
>
> +/**
> + * kvm_vgic_set_forwarding - Set IRQ forwarding
> + *
> + * @kvm: kvm handle
> + * @host_irq: the host linux IRQ
> + * @vintid: the virtual INTID
> + *
> + * This function must be called when the IRQ is not active:
> + * ie. not active at GIC level and not currently under injection
> + * into the guest using the unforwarded mode. The physical IRQ must
> + * be disabled and all vCPUs must have been exited and prevented
> + * from being re-entered.
> + */
> +int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
> + unsigned int vintid)
> +{
> + struct kvm_vcpu *vcpu;
> + struct vgic_irq *irq;
> + int ret;
> +
> + kvm_debug("%s host_irq=%d vintid=%d\n", __func__, host_irq, vintid);
do you need to check if the vgic is initialized etc. here?
> +
> + if (!vgic_valid_spi(kvm, vintid))
> + return -EINVAL;
> +
> + irq = vgic_get_irq(kvm, NULL, vintid);
> + spin_lock(&irq->irq_lock);
> +
> + if (irq->hw) {
> + ret = -EINVAL;
is this because it's already forwarded? How about EBUSY or EEXIST
instead then?
> + goto unlock;
> + }
> + vcpu = irq->target_vcpu;
> + if (!vcpu) {
> + ret = -EAGAIN;
what is this case exactly?
> + goto unlock;
> + }
> +
> + ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
> + if (!ret)
> + irq_set_vcpu_affinity(host_irq, vcpu);
so this is essentially map + set_vcpu_affinity. Why do we want the GIC
to do this in one go as opposed to leaving it up to the caller?
> +unlock:
> + spin_unlock(&irq->irq_lock);
> + vgic_put_irq(kvm, irq);
> + return ret;
> +}
> +
> +/**
> + * kvm_vgic_unset_forwarding - Unset IRQ forwarding
> + *
> + * @kvm: KVM handle
> + * @host_irq: the host Linux IRQ number
> + * @vintid: virtual INTID
> + *
> + * This function must be called when the host irq is disabled and
> + * all vCPUs have been exited and prevented from being re-entered.
> + */
> +void kvm_vgic_unset_forwarding(struct kvm *kvm,
> + unsigned int host_irq,
> + unsigned int vintid)
> +{
> + struct vgic_irq *irq;
> + bool active;
> +
> + kvm_debug("%s host_irq=%d vintid=%d\n", __func__, host_irq, vintid);
do you need to check if the vgic is initialized etc. here?
> +
> + if (!vgic_valid_spi(kvm, vintid))
> + return;
> +
> + irq = vgic_get_irq(kvm, NULL, vintid);
> + spin_lock(&irq->irq_lock);
> +
> + if (!irq->hw)
> + goto unlock;
> +
> + WARN_ON(irq_get_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, &active));
> +
> + if (active)
> + irq_set_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, false);
> +
> + kvm_vgic_unmap_irq(irq);
> + irq_set_vcpu_affinity(host_irq, NULL);
> +
> +unlock:
> + spin_unlock(&irq->irq_lock);
> + vgic_put_irq(kvm, irq);
> +}
> +
> --
> 2.5.5
>
Thanks,
-Christoffer
On Thu, Jun 15, 2017 at 02:52:40PM +0200, Eric Auger wrote:
> This patch selects IRQ_BYPASS_MANAGER and HAVE_KVM_IRQ_BYPASS
> configs for ARM/ARM64.
>
> kvm_arch_has_irq_bypass() now is implemented and returns true.
> As a consequence the irq bypass consumer will be registered for
> ARM/ARM64 with Direct EOI/IRQ forwarding callbacks:
>
> - stop/start: halt/resume guest execution
> - add/del_producer: set/unset forwarding/DEOI at vgic/irqchip level
>
> The consumer currently only works with a VFIO_PLATFORM producer.
> In other cases, start/stop do nothing and return without error
> to avoid outputting a spurious warning.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v1 -> v2:
> - check the type of the producer to avoid attachement with VFIO-PCI
> dummy MSI producer
> ---
> arch/arm/kvm/Kconfig | 3 +++
> arch/arm64/kvm/Kconfig | 3 +++
> virt/kvm/arm/arm.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 54 insertions(+)
>
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 90d0176..4e2b192 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -3,6 +3,7 @@
> #
>
> source "virt/kvm/Kconfig"
> +source "virt/lib/Kconfig"
>
> menuconfig VIRTUALIZATION
> bool "Virtualization"
> @@ -35,6 +36,8 @@ config KVM
> select HAVE_KVM_IRQCHIP
> select HAVE_KVM_IRQ_ROUTING
> select HAVE_KVM_MSI
> + select IRQ_BYPASS_MANAGER
> + select HAVE_KVM_IRQ_BYPASS
> depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
> ---help---
> Support hosting virtualized guest machines.
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 52cb7ad..7e0d6e6 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -3,6 +3,7 @@
> #
>
> source "virt/kvm/Kconfig"
> +source "virt/lib/Kconfig"
>
> menuconfig VIRTUALIZATION
> bool "Virtualization"
> @@ -35,6 +36,8 @@ config KVM
> select HAVE_KVM_MSI
> select HAVE_KVM_IRQCHIP
> select HAVE_KVM_IRQ_ROUTING
> + select IRQ_BYPASS_MANAGER
> + select HAVE_KVM_IRQ_BYPASS
> ---help---
> Support hosting virtualized guest machines.
> We don't support KVM with 16K page tables yet, due to the multiple
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 3417e18..58871f8 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -27,6 +27,8 @@
> #include <linux/mman.h>
> #include <linux/sched.h>
> #include <linux/kvm.h>
> +#include <linux/kvm_irqfd.h>
> +#include <linux/irqbypass.h>
> #include <trace/events/kvm.h>
> #include <kvm/arm_pmu.h>
>
> @@ -1420,6 +1422,52 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
> return NULL;
> }
>
> +bool kvm_arch_has_irq_bypass(void)
> +{
> + return true;
> +}
> +
> +int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons,
> + struct irq_bypass_producer *prod)
> +{
> + struct kvm_kernel_irqfd *irqfd =
> + container_of(cons, struct kvm_kernel_irqfd, consumer);
> +
> + if (prod->type != IRQ_BYPASS_VFIO_PLATFORM)
> + return 0;
> +
> + return kvm_vgic_set_forwarding(irqfd->kvm, prod->irq,
> + irqfd->gsi + VGIC_NR_PRIVATE_IRQS);
> +}
> +void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons,
> + struct irq_bypass_producer *prod)
> +{
> + struct kvm_kernel_irqfd *irqfd =
> + container_of(cons, struct kvm_kernel_irqfd, consumer);
> +
> + if (prod->type != IRQ_BYPASS_VFIO_PLATFORM)
> + return;
> +
> + kvm_vgic_unset_forwarding(irqfd->kvm, prod->irq,
> + irqfd->gsi + VGIC_NR_PRIVATE_IRQS);
> +}
> +
> +void kvm_arch_irq_bypass_stop(struct irq_bypass_consumer *cons)
> +{
> + struct kvm_kernel_irqfd *irqfd =
> + container_of(cons, struct kvm_kernel_irqfd, consumer);
> +
> + kvm_arm_halt_guest(irqfd->kvm);
> +}
> +
> +void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons)
> +{
> + struct kvm_kernel_irqfd *irqfd =
> + container_of(cons, struct kvm_kernel_irqfd, consumer);
> +
> + kvm_arm_resume_guest(irqfd->kvm);
> +}
> +
> /**
> * Initialize Hyp-mode and memory mappings on all CPUs.
> */
> --
> 2.5.5
Nit: I'm wondering if this should go in a different file, like
virt/kvm/arm/irqbypass.c, or virt/kvm/arm/vgic/vgic-irqbypass.c?
Thanks,
-Christoffer
On 21/07/17 14:03, Christoffer Dall wrote:
> On Fri, Jul 07, 2017 at 09:41:42AM +0200, Auger Eric wrote:
>> Hi Marc,
>>
>> On 04/07/2017 14:15, Marc Zyngier wrote:
>>> Hi Eric,
>>>
>>> On 15/06/17 13:52, Eric Auger wrote:
>>>> Currently, the line level of unmapped level sensitive SPIs is
>>>> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>>>>
>>>> As mapped SPI completion is not trapped, we cannot rely on this
>>>> mechanism and the line level needs to be observed at distributor
>>>> level instead.
>>>>
>>>> This patch handles the physical IRQ case in vgic_validate_injection
>>>> and get the line level of a mapped SPI at distributor level.
>>>>
>>>> Signed-off-by: Eric Auger <[email protected]>
>>>>
>>>> ---
>>>>
>>>> v1 -> v2:
>>>> - renamed is_unshared_mapped into is_mapped_spi
>>>> - changes to kvm_vgic_map_phys_irq moved in the previous patch
>>>> - make vgic_validate_injection more readable
>>>> - reword the commit message
>>>> ---
>>>> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
>>>> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
>>>> 2 files changed, 20 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>>> index 075f073..2e35ac7 100644
>>>> --- a/virt/kvm/arm/vgic/vgic.c
>>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>>> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>>>> kfree(irq);
>>>> }
>>>>
>>>> +bool irq_line_level(struct vgic_irq *irq)
>>>> +{
>>>> + bool line_level = irq->line_level;
>>>> +
>>>> + if (unlikely(is_mapped_spi(irq)))
>>>> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
>>>> + IRQCHIP_STATE_PENDING,
>>>> + &line_level));
>>>> + return line_level;
>>>> +}
>>>> +
>>>> /**
>>>> * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>> *
>>>> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
>>>>
>>>> /*
>>>> * Only valid injection if changing level for level-triggered IRQs or for a
>>>> - * rising edge.
>>>> + * rising edge. Injection of virtual interrupts associated to physical
>>>> + * interrupts always is valid.
>>>> */
>>>> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
>>>> {
>>>> switch (irq->config) {
>>>> case VGIC_CONFIG_LEVEL:
>>>> - return irq->line_level != level;
>>>> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
>>>> case VGIC_CONFIG_EDGE:
>>>> return level;
>>>> }
>>>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>>>> index bba7fa2..da254ae 100644
>>>> --- a/virt/kvm/arm/vgic/vgic.h
>>>> +++ b/virt/kvm/arm/vgic/vgic.h
>>>> @@ -96,14 +96,19 @@
>>>> /* we only support 64 kB translation table page size */
>>>> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
>>>>
>>>> +bool irq_line_level(struct vgic_irq *irq);
>>>> +
>>>> static inline bool irq_is_pending(struct vgic_irq *irq)
>>>> {
>>>> if (irq->config == VGIC_CONFIG_EDGE)
>>>> return irq->pending_latch;
>>>> else
>>>> - return irq->pending_latch || irq->line_level;
>>>> + return irq->pending_latch || irq_line_level(irq);
>>>
>>> I'm a bit concerned that an edge interrupt doesn't take the distributor
>>> state into account here. Why is that so? Once an SPI is forwarded to a
>>> guest, a large part of the edge vs level differences move into the HW,
>>> and are not that different anymore from a SW PoV.
>>
>> As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
>> isn't it a bit risky in general to poke the physical state instead of
>> the virtual state. For level sensitive, to me we don't really have many
>> other alternatives. For edge, we are not obliged to.
>
> I think we need to be clear on the fundamental question of whether or
> not we consider pending_latch and/or line_level for mapped interrupts.
>
> I can definitely see the argument that the pending state is kept in
> hardware, so if you want to know that for a mapped interrupt, ask the
> hardware.
>
> The upside of this approach is a clean separation of state and we avoid
> any logic to synchronize a virtual state with the physical state.
>
> The downside is that it's slower to peek into the physical GIC than to
> read a variable from memory, and we need to special case the validate
> path (which I now understand).
>
> If we move to keeping the state in HW, how do we deal with GICD_SPENDR?
> Does that mean we will forward a write from the VM handled by the VGIC to the
> physical GIC?
Sounds like it to me. Otherwise, we start losing some state. Note that
this is what my GICv4 patches are also doing, by forwarding the INT and
CLEAR commands to the physical ITS.
>> Don't we have situations, due to the lazy disable approach, where the
>> physical IRQ hits, enters the genirq handler and the actual handler is
>> not called, ie. the virtual IRQ is not injected?
>>
>
> I'm not sure I remember what these situations were, specifically, but
> certainly if we ever have a situation where a mapped irq's pending state
> should be different from that of the physical one, then it doesn't work.
There is a very simple way around the issue Eric mentions, which is to
use the "IRQ_DISABLE_UNLAZY" flag, which disables the lazy disabling of
interrupts. That's also something the GICv4 patches use, as when we
mask an interrupt, we want to be sure that it is immediately done (the
host side will never see the interrupt firing, and thus can never
disable it).
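For illustration, the difference between lazy and unlazy disable can be
sketched as a small user-space model (toy code, not the genirq
implementation; all names below are made up): with the lazy scheme,
disable_irq() only records the request and the line is actually masked
the next time it fires, while the unlazy flag masks it immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of genirq lazy vs. unlazy disable (illustrative only). */
struct model_irq {
	bool unlazy;      /* models IRQ_DISABLE_UNLAZY being set */
	bool sw_disabled; /* driver asked for a disable */
	bool hw_masked;   /* line actually masked at the irqchip */
};

static void model_disable_irq(struct model_irq *irq)
{
	irq->sw_disabled = true;
	if (irq->unlazy)
		irq->hw_masked = true;	/* mask immediately */
	/* lazy case: leave the line unmasked, mask on the next firing */
}

/* Returns true if the handler would run for this firing. */
static bool model_irq_fires(struct model_irq *irq)
{
	if (irq->hw_masked)
		return false;		/* never reaches the CPU */
	if (irq->sw_disabled) {
		irq->hw_masked = true;	/* lazy disable completes here */
		return false;		/* handler is not invoked */
	}
	return true;
}
```

The point above is the lazy case: the host relies on one more firing to
finish masking the line, which never happens once the guest deactivates
the interrupt directly.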
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Tue, Jul 25, 2017 at 02:47:55PM +0100, Marc Zyngier wrote:
> On 21/07/17 14:03, Christoffer Dall wrote:
> > On Fri, Jul 07, 2017 at 09:41:42AM +0200, Auger Eric wrote:
> >> Hi Marc,
> >>
> >> On 04/07/2017 14:15, Marc Zyngier wrote:
> >>> Hi Eric,
> >>>
> >>> On 15/06/17 13:52, Eric Auger wrote:
> >>>>>> [...]
> >>>> +bool irq_line_level(struct vgic_irq *irq);
> >>>> +
> >>>> static inline bool irq_is_pending(struct vgic_irq *irq)
> >>>> {
> >>>> if (irq->config == VGIC_CONFIG_EDGE)
> >>>> return irq->pending_latch;
> >>>> else
> >>>> - return irq->pending_latch || irq->line_level;
> >>>> + return irq->pending_latch || irq_line_level(irq);
> >>>
> >>> I'm a bit concerned that an edge interrupt doesn't take the distributor
> >>> state into account here. Why is that so? Once an SPI is forwarded to a
> >>> guest, a large part of the edge vs level differences move into the HW,
> >>> and are not that different anymore from a SW PoV.
> >>
> >> As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
> >> isn't it a bit risky in general to poke the physical state instead of
> >> the virtual state. For level sensitive, to me we don't really have many
> >> other alternatives. For edge, we are not obliged to.
> >
> > I think we need to be clear on the fundamental question of whether or
> > not we consider pending_latch and/or line_level for mapped interrupts.
> >
> > I can definitely see the argument that the pending state is kept in
> > hardware, so if you want to know that for a mapped interrupt, ask the
> > hardware.
> >
> > The upside of this approach is a clean separation of state and we avoid
> > any logic to synchronize a virtual state with the physical state.
> >
> > The downside is that it's slower to peek into the physical GIC than to
> > read a variable from memory, and we need to special case the validate
> > path (which I now understand).
> >
> > If we move to keeping the state in HW, how do we deal with GICD_SPENDR?
> > Does that mean we will forward a write from the VM handled by the VGIC to the
> > physical GIC?
>
> Sounds like it to me. Otherwise, we start losing some state.
How do we lose state? Is it not more a question of complexity to make
sure the 'cached' vgic state is up to date with the real state? (like
what we currently do for the timer mapped interrupt).
On GICv2 this is likely going to make injecting timer interrupts slower,
because we'll check the pending state of whatever's in the AP list on
entry to the guest and peek into the physical GIC again.
> Note that
> this is what my GICv4 patches are also doing, by forwarding the INT and
> CLEAR commands to the physical ITS.
>
> >> Don't we have situations, due to the lazy disable approach, where the
> >> physical IRQ hits, enters the genirq handler and the actual handler is
> >> not called, ie. the virtual IRQ is not injected?
> >>
> >
> > I'm not sure I remember what these situations were, specifically, but
> > certainly if we ever have a situation where a mapped irq's pending state
> > should be different from that of the physical one, then it doesn't work.
>
> There is a very simple way around the issue Eric mentions, which is to
> use the "IRQ_DISABLE_UNLAZY" flag, which disables the lazy disabling of
> interrupts. That's also something the GICv4 patches use, as when we
> mask an interrupt, we want to be sure that it is immediately done (the
> host side will never see the interrupt firing, and thus can never
> disable it).
>
If we don't care about the potential performance hit mentioned above, it
sounds like a good solution to me.
Thanks,
-Christoffer
On 25/07/17 15:48, Christoffer Dall wrote:
> On Tue, Jul 25, 2017 at 02:47:55PM +0100, Marc Zyngier wrote:
>> On 21/07/17 14:03, Christoffer Dall wrote:
>>> On Fri, Jul 07, 2017 at 09:41:42AM +0200, Auger Eric wrote:
>>>> Hi Marc,
>>>>
>>>> On 04/07/2017 14:15, Marc Zyngier wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On 15/06/17 13:52, Eric Auger wrote:
>>>>>> [...]
>>>>>> +bool irq_line_level(struct vgic_irq *irq);
>>>>>> +
>>>>>> static inline bool irq_is_pending(struct vgic_irq *irq)
>>>>>> {
>>>>>> if (irq->config == VGIC_CONFIG_EDGE)
>>>>>> return irq->pending_latch;
>>>>>> else
>>>>>> - return irq->pending_latch || irq->line_level;
>>>>>> + return irq->pending_latch || irq_line_level(irq);
>>>>>
>>>>> I'm a bit concerned that an edge interrupt doesn't take the distributor
>>>>> state into account here. Why is that so? Once an SPI is forwarded to a
>>>>> guest, a large part of the edge vs level differences move into the HW,
>>>>> and are not that different anymore from a SW PoV.
>>>>
>>>> As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
>>>> isn't it a bit risky in general to poke the physical state instead of
>>>> the virtual state. For level sensitive, to me we don't really have many
>>>> other alternatives. For edge, we are not obliged to.
>>>
>>> I think we need to be clear on the fundamental question of whether or
>>> not we consider pending_latch and/or line_level for mapped interrupts.
>>>
>>> I can definitely see the argument that the pending state is kept in
>>> hardware, so if you want to know that for a mapped interrupt, ask the
>>> hardware.
>>>
>>> The upside of this approach is a clean separation of state and we avoid
>>> any logic to synchronize a virtual state with the physical state.
>>>
>>> The downside is that it's slower to peek into the physical GIC than to
>>> read a variable from memory, and we need to special case the validate
>>> path (which I now understand).
>>>
>>> If we move to keeping the state in HW, how do we deal with GICD_SPENDR?
>>> Does that mean we will forward a write from the VM handled by the VGIC to the
>>> physical GIC?
>>
>> Sounds like it to me. Otherwise, we start losing some state.
>
> How do we lose state? Is it not more a question of complexity to make
> sure the 'cached' vgic state is up to date with the real state? (like
> what we currently do for the timer mapped interrupt).
Sorry, I was very imprecise here. It is not so much that we'd lose
state, but that we'd have some state at the wrong location. If we have a
guest playing with the pending state, we need to make sure the physical
side is up to date. Otherwise, we can end-up in situations where we'd
inject an interrupt for the guest based on a pending state that only
exists in the virtual distributor, and yet the virtual CPUIF is going to
try and deactivate it in the physical side on EOI.
> On GICv2 this is likely going to make injecting timer interrupts slower,
> because we'll check the pending state of whatever's in the AP list on
> entry to the guest and peek into the physical GIC again.
That's a very valid concern. Though there is a slight distinction with
the timer, in that we entirely control the injection of the interrupt,
while an SPI can fire at any particular moment.
>
>> Note that
>> this is what my GICv4 patches are also doing, by forwarding the INT and
>> CLEAR commands to the physical ITS.
>>
>>>> Don't we have situations, due to the lazy disable approach, where the
>>>> physical IRQ hits, enters the genirq handler and the actual handler is
>>>> not called, ie. the virtual IRQ is not injected?
>>>>
>>>
>>> I'm not sure I remember what these situations were, specifically, but
>>> certainly if we ever have a situation where a mapped irq's pending state
>>> should be different from that of the physical one, then it doesn't work.
>>
>> There is a very simple way around the issue Eric mentions, which is to
> >> use the "IRQ_DISABLE_UNLAZY" flag, which disables the lazy disabling of
>> interrupts. That's also something the GICv4 patches use, as when we
>> mask an interrupt, we want to be sure that it is immediately done (the
>> host side will never see the interrupt firing, and thus can never
> >> disable it).
>>
> If we don't care about the potential performance hit mentioned above, it
> sounds like a good solution to me.
I think we need to measure the damage this would cause.
Thanks,
M.
--
Jazz is not dead. It just smells funny...
On Tue, Jul 25, 2017 at 04:41:52PM +0100, Marc Zyngier wrote:
> On 25/07/17 15:48, Christoffer Dall wrote:
> > On Tue, Jul 25, 2017 at 02:47:55PM +0100, Marc Zyngier wrote:
> >> On 21/07/17 14:03, Christoffer Dall wrote:
> >>> On Fri, Jul 07, 2017 at 09:41:42AM +0200, Auger Eric wrote:
> >>>> Hi Marc,
> >>>>
> >>>> On 04/07/2017 14:15, Marc Zyngier wrote:
> >>>>> Hi Eric,
> >>>>>
> >>>>> On 15/06/17 13:52, Eric Auger wrote:
> >>>>>> [...]
> >>>>>> +bool irq_line_level(struct vgic_irq *irq);
> >>>>>> +
> >>>>>> static inline bool irq_is_pending(struct vgic_irq *irq)
> >>>>>> {
> >>>>>> if (irq->config == VGIC_CONFIG_EDGE)
> >>>>>> return irq->pending_latch;
> >>>>>> else
> >>>>>> - return irq->pending_latch || irq->line_level;
> >>>>>> + return irq->pending_latch || irq_line_level(irq);
> >>>>>
> >>>>> I'm a bit concerned that an edge interrupt doesn't take the distributor
> >>>>> state into account here. Why is that so? Once an SPI is forwarded to a
> >>>>> guest, a large part of the edge vs level differences move into the HW,
> >>>>> and are not that different anymore from a SW PoV.
> >>>>
> >>>> As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
> >>>> isn't it a bit risky in general to poke the physical state instead of
> >>>> the virtual state. For level sensitive, to me we don't really have many
> >>>> other alternatives. For edge, we are not obliged to.
> >>>
> >>> I think we need to be clear on the fundamental question of whether or
> >>> not we consider pending_latch and/or line_level for mapped interrupts.
> >>>
> >>> I can definitely see the argument that the pending state is kept in
> >>> hardware, so if you want to know that for a mapped interrupt, ask the
> >>> hardware.
> >>>
> >>> The upside of this approach is a clean separation of state and we avoid
> >>> any logic to synchronize a virtual state with the physical state.
> >>>
> >>> The downside is that it's slower to peek into the physical GIC than to
> >>> read a variable from memory, and we need to special case the validate
> >>> path (which I now understand).
> >>>
> >>> If we move to keeping the state in HW, how do we deal with GICD_SPENDR?
> >>> Does that mean we will forward a write from the VM handled by the VGIC to the
> >>> physical GIC?
> >>
> >> Sounds like it to me. Otherwise, we start losing some state.
> >
> > How do we lose state? Is it not more a question of complexity to make
> > sure the 'cached' vgic state is up to date with the real state? (like
> > what we currently do for the timer mapped interrupt).
>
> Sorry, I was very imprecise here. It is not so much that we'd lose
> state, but that we'd have some state at the wrong location. If we have a
> guest playing with the pending state, we need to make sure the physical
> side is up to date. Otherwise, we can end-up in situations where we'd
> inject an interrupt for the guest based on a pending state that only
> exists in the virtual distributor, and yet the virtual CPUIF is going to
> try and deactivate it in the physical side on EOI.
>
So my understanding was that we'd always have to make sure that
something presented as an active interrupt to the guest is also active
on the physical distributor, since otherwise the whole
deactivate-via-the-hw-bit thing is unpredictable, but that this doesn't
really apply to the pending state.
Actually, I can't seem to fully understand how we can move the pending
state of edge-triggered interrupts to the physical distributor.
If an edge-triggered interrupt fires on the host, it will be pending,
but the host will ack the interrupt to figure out what happened and
forward it to the VM, and the physical interrupt is no longer pending
but instead active, while the virtual interrupt is now pending. Should
we in this case write to the physical distributor and set the interrupt
pending again?
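To spell out the sequence I mean, here is a tiny user-space model (toy
code, not the vgic sources; the names are invented for illustration):
after the host acks the physical interrupt, the pending state only
exists on the virtual side, so "keeping the pending state in HW" would
require writing it back to the distributor.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of an edge-triggered forwarded interrupt. */
struct fwd_irq {
	bool phys_pending;
	bool phys_active;
	bool virt_pending;
};

/* The edge fires on the host: the physical line becomes pending. */
static void host_edge_fires(struct fwd_irq *i)
{
	i->phys_pending = true;
}

/* The host acks the interrupt and injects it into the VM. */
static void host_ack_and_inject(struct fwd_irq *i)
{
	i->phys_pending = false;	/* ack: pending -> active */
	i->phys_active = true;
	i->virt_pending = true;		/* pending state moves to the virtual side */
}
```

After host_ack_and_inject(), asking the physical distributor for the
pending state would return false even though the interrupt is pending
for the guest.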
> > On GICv2 this is likely going to make injecting timer interrupts slower,
> > because we'll check the pending state of whatever's in the AP list on
> > entry to the guest and peek into the physical GIC again.
>
> That's a very valid concern. Though there is a slight distinction with
> the timer, in that we entirely control the injection of the interrupt,
> while an SPI can fire at any particular moment.
>
Right, but these patches treat the timer and other mapped interrupts the
same, or am I missing something?
> >
> >> Note that
> >> this is what my GICv4 patches are also doing, by forwarding the INT and
> >> CLEAR commands to the physical ITS.
> >>
> >>>> Don't we have situations, due to the lazy disable approach, where the
> >>>> physical IRQ hits, enters the genirq handler and the actual handler is
> >>>> not called, ie. the virtual IRQ is not injected?
> >>>>
> >>>
> >>> I'm not sure I remember what these situations were, specifically, but
> >>> certainly if we ever have a situation where a mapped irq's pending state
> >>> should be different from that of the physical one, then it doesn't work.
> >>
> >> There is a very simple way around the issue Eric mentions, which is to
> >> use the "IRQ_DISABLE_UNLAZY" flag, which disables the lazy disabling of
> >> interrupts. That's also something the GICv4 patches use, as when we
> >> mask an interrupt, we want to be sure that it is immediately done (the
> >> host side will never see the interrupt firing, and thus can never
> >> disable it).
> >>
> > If we don't care about the potential performance hit mentioned above, it
> > sounds like a good solution to me.
>
> I think we need to measure the damage this would cause.
>
I agree, we have to measure it.
Thanks,
-Christoffer
Hi Christoffer,
On 21/07/2017 14:11, Christoffer Dall wrote:
> On Thu, Jun 15, 2017 at 02:52:37PM +0200, Eric Auger wrote:
>> Currently, the line level of unmapped level sensitive SPIs is
>> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>>
>> As mapped SPI completion is not trapped, we cannot rely on this
>> mechanism and the line level needs to be observed at distributor
>> level instead.
>>
>> This patch handles the physical IRQ case in vgic_validate_injection
>> and get the line level of a mapped SPI at distributor level.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v1 -> v2:
>> - renamed is_unshared_mapped into is_mapped_spi
>> - changes to kvm_vgic_map_phys_irq moved in the previous patch
>> - make vgic_validate_injection more readable
>> - reword the commit message
>> ---
>> [...]
>> /*
>> * Only valid injection if changing level for level-triggered IRQs or for a
>> - * rising edge.
>> + * rising edge. Injection of virtual interrupts associated to physical
>> + * interrupts always is valid.
>
> why? I don't remember this now, and that means I probably won't in the
> future either.
Sorry for the late reply.
The life cycle of the physically mapped IRQ is as follows:
- pIRQ becomes pending
- pIRQ is acknowledged by the physical handler and becomes active
- vIRQ gets injected as part of the physical handler chain
(VFIO->irqfd kvm_vgic_inject_irq for instance); the Linux IRQ cannot
hit again until the vIRQ/pIRQ pair is deactivated
- guest deactivates the vIRQ which automatically deactivates the pIRQ
So to me, if we are about to vgic_validate_injection() an injection of
a physically mapped vIRQ, this means the vgic state is ready to accept
it: the previous occurrence was deactivated. There cannot be any state
inconsistency around the line_level/level.
Do you agree?
I will add this description at least in the commit message.
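For what it's worth, the rule this relies on can be modelled in a few
lines (a user-space sketch mirroring the validate logic in the patch,
not the kernel source; the type and field names are made up): for a
level-triggered interrupt that is physically mapped, injection is
always considered valid, because the life cycle above guarantees the
previous occurrence was deactivated.

```c
#include <assert.h>
#include <stdbool.h>

enum cfg { CFG_LEVEL, CFG_EDGE };

struct model_virq {
	enum cfg config;
	bool line_level;
	bool mapped;	/* mapped to a physical SPI */
};

/* Mirrors the logic of vgic_validate_injection() in the patch. */
static bool validate_injection(const struct model_virq *irq, bool level)
{
	switch (irq->config) {
	case CFG_LEVEL:
		/* mapped SPIs always validate; others need a level change */
		return irq->line_level != level || irq->mapped;
	case CFG_EDGE:
		return level;
	}
	return false;
}
```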
>
> When I look at this now, I'm thinking, if we're not going to change
> anything, why proceed beyond validate injection?
I don't catch this one: validation always succeeds and then we further
handle the IRQ.
Thanks
Eric
>
>> */
>> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
>> {
>> switch (irq->config) {
>> case VGIC_CONFIG_LEVEL:
>> - return irq->line_level != level;
>> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
>> case VGIC_CONFIG_EDGE:
>> return level;
>> }
>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>> index bba7fa2..da254ae 100644
>> --- a/virt/kvm/arm/vgic/vgic.h
>> +++ b/virt/kvm/arm/vgic/vgic.h
>> @@ -96,14 +96,19 @@
>> /* we only support 64 kB translation table page size */
>> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
>>
>> +bool irq_line_level(struct vgic_irq *irq);
>> +
>> static inline bool irq_is_pending(struct vgic_irq *irq)
>> {
>> if (irq->config == VGIC_CONFIG_EDGE)
>> return irq->pending_latch;
>> else
>> - return irq->pending_latch || irq->line_level;
>> + return irq->pending_latch || irq_line_level(irq);
>> }
>>
>> +#define is_mapped_spi(i) \
>> +((i)->hw && (i)->intid >= VGIC_NR_PRIVATE_IRQS && (i)->intid < 1020)
>> +
>
> nit: why is this not a static inline ?
>
>> /*
>> * This struct provides an intermediate representation of the fields contained
>> * in the GICH_VMCR and ICH_VMCR registers, such that code exporting the GIC
>> --
>> 2.5.5
>>
>
> Thanks,
> -Christoffer
>
Hi Christoffer,
On 21/07/2017 13:44, Christoffer Dall wrote:
> On Thu, Jun 15, 2017 at 02:52:36PM +0200, Eric Auger wrote:
>> We want to reuse the core of the map/unmap functions for IRQ
>> forwarding. Let's move the computation of the hwirq in
>> kvm_vgic_map_phys_irq and pass the Linux IRQ as a parameter.
>> The host_irq is added to struct vgic_irq.
>>
>> We introduce kvm_vgic_map/unmap_irq which take a struct vgic_irq
>> handle as a parameter.
>
> So this is to avoid the linux-to-hardware irq number translation in
> other callers?
Yes, that's correct. Also, the caller of the forwarding code needs to
hold the lock, which is why I pass a vgic_irq as a parameter.
>
> I am sort of guessing that we need to store the host_irq number so that
> we can probe into the physical GIC in later patches? That may be good
> to note in this commit message if you respin.
done
Thanks
Eric
>
> Thanks,
> -Christoffer
>
>>
>> Signed-off-by: Eric Auger <[email protected]>
>> ---
>> include/kvm/arm_vgic.h | 8 ++++---
>> virt/kvm/arm/arch_timer.c | 24 +------------------
>> virt/kvm/arm/vgic/vgic.c | 60 +++++++++++++++++++++++++++++++++++------------
>> 3 files changed, 51 insertions(+), 41 deletions(-)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index ef71858..cceb31d 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -112,6 +112,7 @@ struct vgic_irq {
>> bool hw; /* Tied to HW IRQ */
>> struct kref refcount; /* Used for LPIs */
>> u32 hwintid; /* HW INTID number */
>> + unsigned int host_irq; /* linux irq corresponding to hwintid */
>> union {
>> u8 targets; /* GICv2 target VCPUs mask */
>> u32 mpidr; /* GICv3 target VCPU */
>> @@ -301,9 +302,10 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>> bool level);
>> int kvm_vgic_inject_mapped_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>> bool level);
>> -int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq);
>> -int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq);
>> -bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq);
>> +int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
>> + u32 vintid);
>> +int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
>> +bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
>>
>> int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
>>
>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>> index 5976609..57a30ab 100644
>> --- a/virt/kvm/arm/arch_timer.c
>> +++ b/virt/kvm/arm/arch_timer.c
>> @@ -617,9 +617,6 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>> {
>> struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>> struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
>> - struct irq_desc *desc;
>> - struct irq_data *data;
>> - int phys_irq;
>> int ret;
>>
>> if (timer->enabled)
>> @@ -632,26 +629,7 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>> if (!vgic_initialized(vcpu->kvm))
>> return -ENODEV;
>>
>> - /*
>> - * Find the physical IRQ number corresponding to the host_vtimer_irq
>> - */
>> - desc = irq_to_desc(host_vtimer_irq);
>> - if (!desc) {
>> - kvm_err("%s: no interrupt descriptor\n", __func__);
>> - return -EINVAL;
>> - }
>> -
>> - data = irq_desc_get_irq_data(desc);
>> - while (data->parent_data)
>> - data = data->parent_data;
>> -
>> - phys_irq = data->hwirq;
>> -
>> - /*
>> - * Tell the VGIC that the virtual interrupt is tied to a
>> - * physical interrupt. We do that once per VCPU.
>> - */
>> - ret = kvm_vgic_map_phys_irq(vcpu, vtimer->irq.irq, phys_irq);
>> + ret = kvm_vgic_map_phys_irq(vcpu, host_vtimer_irq, vtimer->irq.irq);
>> if (ret)
>> return ret;
>>
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index 83b24d2..075f073 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -17,6 +17,8 @@
>> #include <linux/kvm.h>
>> #include <linux/kvm_host.h>
>> #include <linux/list_sort.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/irq.h>
>>
>> #include "vgic.h"
>>
>> @@ -392,38 +394,66 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
>> return 0;
>> }
>>
>> -int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, u32 virt_irq, u32 phys_irq)
>> +/* @irq->irq_lock must be held */
>> +static int kvm_vgic_map_irq(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
>> + unsigned int host_irq)
>> {
>> - struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>> + struct irq_desc *desc;
>> + struct irq_data *data;
>>
>> - BUG_ON(!irq);
>> -
>> - spin_lock(&irq->irq_lock);
>> + /*
>> + * Find the physical IRQ number corresponding to @host_irq
>> + */
>> + desc = irq_to_desc(host_irq);
>> + if (!desc) {
>> + kvm_err("%s: no interrupt descriptor\n", __func__);
>> + return -EINVAL;
>> + }
>> + data = irq_desc_get_irq_data(desc);
>> + while (data->parent_data)
>> + data = data->parent_data;
>>
>> irq->hw = true;
>> - irq->hwintid = phys_irq;
>> + irq->host_irq = host_irq;
>> + irq->hwintid = data->hwirq;
>> + return 0;
>> +}
>> +
>> +/* @irq->irq_lock must be held */
>> +static inline void kvm_vgic_unmap_irq(struct vgic_irq *irq)
>> +{
>> + irq->hw = false;
>> + irq->hwintid = 0;
>> +}
>> +
>> +int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
>> + u32 vintid)
>> +{
>> + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
>> + int ret;
>>
>> + BUG_ON(!irq);
>> +
>> + spin_lock(&irq->irq_lock);
>> + ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
>> spin_unlock(&irq->irq_lock);
>> vgic_put_irq(vcpu->kvm, irq);
>>
>> - return 0;
>> + return ret;
>> }
>>
>> -int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>> +int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
>> {
>> struct vgic_irq *irq;
>>
>> if (!vgic_initialized(vcpu->kvm))
>> return -EAGAIN;
>>
>> - irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>> + irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
>> BUG_ON(!irq);
>>
>> spin_lock(&irq->irq_lock);
>> -
>> - irq->hw = false;
>> - irq->hwintid = 0;
>> -
>> + kvm_vgic_unmap_irq(irq);
>> spin_unlock(&irq->irq_lock);
>> vgic_put_irq(vcpu->kvm, irq);
>>
>> @@ -726,9 +756,9 @@ void vgic_kick_vcpus(struct kvm *kvm)
>> }
>> }
>>
>> -bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int virt_irq)
>> +bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
>> {
>> - struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, virt_irq);
>> + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
>> bool map_is_active;
>>
>> spin_lock(&irq->irq_lock);
>> --
>> 2.5.5
>>
Hi,
On 25/07/2017 17:41, Marc Zyngier wrote:
> On 25/07/17 15:48, Christoffer Dall wrote:
>> On Tue, Jul 25, 2017 at 02:47:55PM +0100, Marc Zyngier wrote:
>>> On 21/07/17 14:03, Christoffer Dall wrote:
>>>> On Fri, Jul 07, 2017 at 09:41:42AM +0200, Auger Eric wrote:
>>>>> Hi Marc,
>>>>>
>>>>> On 04/07/2017 14:15, Marc Zyngier wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> On 15/06/17 13:52, Eric Auger wrote:
>>>>>>> Currently, the line level of unmapped level sensitive SPIs is
>>>>>>> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>>>>>>>
>>>>>>> As mapped SPI completion is not trapped, we cannot rely on this
>>>>>>> mechanism and the line level needs to be observed at distributor
>>>>>>> level instead.
>>>>>>>
>>>>>>> This patch handles the physical IRQ case in vgic_validate_injection
>>>>>>> and get the line level of a mapped SPI at distributor level.
>>>>>>>
>>>>>>> Signed-off-by: Eric Auger <[email protected]>
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> v1 -> v2:
>>>>>>> - renamed is_unshared_mapped into is_mapped_spi
>>>>>>> - changes to kvm_vgic_map_phys_irq moved in the previous patch
>>>>>>> - make vgic_validate_injection more readable
>>>>>>> - reword the commit message
>>>>>>> ---
>>>>>>> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
>>>>>>> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
>>>>>>> 2 files changed, 20 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>>>>>> index 075f073..2e35ac7 100644
>>>>>>> --- a/virt/kvm/arm/vgic/vgic.c
>>>>>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>>>>>> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>>>>>>> kfree(irq);
>>>>>>> }
>>>>>>>
>>>>>>> +bool irq_line_level(struct vgic_irq *irq)
>>>>>>> +{
>>>>>>> + bool line_level = irq->line_level;
>>>>>>> +
>>>>>>> + if (unlikely(is_mapped_spi(irq)))
>>>>>>> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
>>>>>>> + IRQCHIP_STATE_PENDING,
>>>>>>> + &line_level));
>>>>>>> + return line_level;
>>>>>>> +}
>>>>>>> +
>>>>>>> /**
>>>>>>> * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>>>>> *
>>>>>>> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
>>>>>>>
>>>>>>> /*
>>>>>>> * Only valid injection if changing level for level-triggered IRQs or for a
>>>>>>> - * rising edge.
>>>>>>> + * rising edge. Injection of virtual interrupts associated to physical
>>>>>>> + * interrupts always is valid.
>>>>>>> */
>>>>>>> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
>>>>>>> {
>>>>>>> switch (irq->config) {
>>>>>>> case VGIC_CONFIG_LEVEL:
>>>>>>> - return irq->line_level != level;
>>>>>>> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
>>>>>>> case VGIC_CONFIG_EDGE:
>>>>>>> return level;
>>>>>>> }
>>>>>>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>>>>>>> index bba7fa2..da254ae 100644
>>>>>>> --- a/virt/kvm/arm/vgic/vgic.h
>>>>>>> +++ b/virt/kvm/arm/vgic/vgic.h
>>>>>>> @@ -96,14 +96,19 @@
>>>>>>> /* we only support 64 kB translation table page size */
>>>>>>> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
>>>>>>>
>>>>>>> +bool irq_line_level(struct vgic_irq *irq);
>>>>>>> +
>>>>>>> static inline bool irq_is_pending(struct vgic_irq *irq)
>>>>>>> {
>>>>>>> if (irq->config == VGIC_CONFIG_EDGE)
>>>>>>> return irq->pending_latch;
>>>>>>> else
>>>>>>> - return irq->pending_latch || irq->line_level;
>>>>>>> + return irq->pending_latch || irq_line_level(irq);
>>>>>>
>>>>>> I'm a bit concerned that an edge interrupt doesn't take the distributor
>>>>>> state into account here. Why is that so? Once an SPI is forwarded to a
>>>>>> guest, a large part of the edge vs level differences move into the HW,
>>>>>> and are not that different anymore from a SW PoV.
>>>>>
>>>>> As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
>>>>> isn't it a bit risky in general to poke the physical state instead of
>>>>> the virtual state. For level sensitive, to me we don't really have many
>>>>> other alternatives. For edge, we are not obliged to.
>>>>
>>>> I think we need to be clear on the fundamental question of whether or
>>>> not we consider pending_latch and/or line_level for mapped interrupts.
>>>>
>>>> I can definitely see the argument that the pending state is kept in
>>>> hardware, so if you want to know that for a mapped interrupt, ask the
>>>> hardware.
>>>>
>>>> The upside of this approach is a clean separation of state and we avoid
>>>> any logic to synchronize a virtual state with the physical state.
>>>>
>>>> The downside is that it's slower to peek into the physical GIC than to
>>>> read a variable from memory, and we need to special case the validate
>>>> path (which I now understand).
>>>>
>>>> If we move to keeping the state in HW, how do we deal with GICD_ISPENDR?
>>>> Does that mean we will forward a write from the VM handled by the VGIC to the
>>>> physical GIC?
>>>
>>> Sounds like it to me. Otherwise, we start losing some state.
>>
>> How do we lose state? Is it not more a question of complexity to make
>> sure the 'cached' vgic state is up to date with the real state? (like
>> what we currently do for the timer mapped interrupt).
>
> Sorry, I was very imprecise here. It is not so much that we'd loose
> state, but that we'd have some state at the wrong location. If we have a
> guest playing with the pending state, we need to make sure the physical
> side is up to date. Otherwise, we can end-up in situations where we'd
> inject an interrupt for the guest based on a pending state that only
> exists in the virtual distributor, and yet the virtual CPUIF is going to
> try and deactivate it in the physical side on EOI.
At the moment a write to GICD_ISPENDR (vgic_mmio_write_spending) sets
the pending_latch virtual state, and a read uses the new
irq_is_pending() wrapper, which looks at both the physical state and
the pending_latch virtual state in the case of a physically mapped IRQ.
But as Marc mentioned, since I don't propagate pending_latch to the
physical distributor, we can effectively end up with a guest "DIR" that
tries to deactivate a physical IRQ that is not active, and that's wrong.
>
>> On GICv2 this is likely going to make injecting timer interrupts slower,
>> because we'll check the pending state of whatever's in the AP list on
>> entry to the guest and peek into the physical GIC again.
Aren't timer interrupts PPIs? My series only affects the state machine
for physically mapped SPIs, so this shouldn't have any impact, no?
Thanks
Eric
>
> That's a very valid concern. Though there is a slight distinction with
> the timer, in that we entirely control the injection of the interrupt,
> while an SPI can fire an any particular moment.
>
>>
>>> Note that
>>> this is what my GICv4 patches are also doing, by forwarding the INT and
>>> CLEAR commands to the physical ITS.
>>>
>>>>> Don't we have situations, due to the lazy disable approach, where the
>>>>> physical IRQ hits, enters the genirq handler and the actual handler is
>>>>> not called, ie. the virtual IRQ is not injected?
>>>>>
>>>>
>>>> I'm not sure I remember what these situations were, specifically, but
>>>> certainly if we ever have a situation where a mapped irq's pending state
>>>> should be different from that of the physical one, then it doesn't work.
>>>
>>> There is a very simple way around the issue Eric mentions, which is to
>>> use the "IRQ_DISABLE_UNLAZY" flag, which disable the lazy disabling of
>>> interrupts. That's also something the GICv4 patches use, as when we
>>> mask an interrupt, we want to be sure that it is immediately done (the
>>> host side will never see the interrupt firing, and thus can never
>>> disabled it).
>>>
>> If we don't care about the potential performance hit mentioned above, it
>> sounds like a good solution to me.
>
> I think we need to measure the damage this would cause.
>
> Thanks,
>
> M.
>
Hi Marc,
On 04/07/2017 14:15, Marc Zyngier wrote:
> Hi Eric,
>
> On 15/06/17 13:52, Eric Auger wrote:
>> Currently, the line level of unmapped level sensitive SPIs is
>> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>>
>> As mapped SPI completion is not trapped, we cannot rely on this
>> mechanism and the line level needs to be observed at distributor
>> level instead.
>>
>> This patch handles the physical IRQ case in vgic_validate_injection
>> and get the line level of a mapped SPI at distributor level.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v1 -> v2:
>> - renamed is_unshared_mapped into is_mapped_spi
>> - changes to kvm_vgic_map_phys_irq moved in the previous patch
>> - make vgic_validate_injection more readable
>> - reword the commit message
>> ---
>> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
>> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
>> 2 files changed, 20 insertions(+), 3 deletions(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index 075f073..2e35ac7 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>> kfree(irq);
>> }
>>
>> +bool irq_line_level(struct vgic_irq *irq)
>> +{
>> + bool line_level = irq->line_level;
>> +
>> + if (unlikely(is_mapped_spi(irq)))
>> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
>> + IRQCHIP_STATE_PENDING,
>> + &line_level));
>> + return line_level;
>> +}
>> +
>> /**
>> * kvm_vgic_target_oracle - compute the target vcpu for an irq
>> *
>> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
>>
>> /*
>> * Only valid injection if changing level for level-triggered IRQs or for a
>> - * rising edge.
>> + * rising edge. Injection of virtual interrupts associated to physical
>> + * interrupts always is valid.
>> */
>> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
>> {
>> switch (irq->config) {
>> case VGIC_CONFIG_LEVEL:
>> - return irq->line_level != level;
>> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
>> case VGIC_CONFIG_EDGE:
>> return level;
>> }
>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
>> index bba7fa2..da254ae 100644
>> --- a/virt/kvm/arm/vgic/vgic.h
>> +++ b/virt/kvm/arm/vgic/vgic.h
>> @@ -96,14 +96,19 @@
>> /* we only support 64 kB translation table page size */
>> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
>>
>> +bool irq_line_level(struct vgic_irq *irq);
>> +
>> static inline bool irq_is_pending(struct vgic_irq *irq)
>> {
>> if (irq->config == VGIC_CONFIG_EDGE)
>> return irq->pending_latch;
>> else
>> - return irq->pending_latch || irq->line_level;
>> + return irq->pending_latch || irq_line_level(irq);
>
> I'm a bit concerned that an edge interrupt doesn't take the distributor
> state into account here. Why is that so? Once an SPI is forwarded to a
> guest, a large part of the edge vs level differences move into the HW,
> and are not that different anymore from a SW PoV.
As mentioned by Christoffer, when a physically mapped edge-sensitive
vSPI is pending (irq->pending_latch set), the physical state can be
active, pending & active, or maybe even pending (if the guest has
completed the vSPI, we have not yet folded the LR, and a new pSPI has
triggered). That's a real mess. Assuming I propagate the pending latch
to the pDIST, do we really have an issue with state inconsistency?
So I would tend to think:
- I need to fix the ISPENDRn/ICPENDRn issue and propagate the pending
state to the pDIST on those guest writes.
- This raises the question of the save/restore (uaccess) of a
physically mapped SPI. I guess we should never try to save/restore the
state of a physically mapped SPI?
- I need to prevent the lazy disable approach
Thoughts?
Thanks
Eric
>
> Thanks,
>
> M.
>
Hi Christoffer,
On 21/07/2017 15:13, Christoffer Dall wrote:
> On Thu, Jun 15, 2017 at 02:52:38PM +0200, Eric Auger wrote:
>> Implements kvm_vgic_[set|unset]_forwarding.
>>
>> Handle low-level VGIC programming and consistent irqchip
>> programming.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v1 -> v2:
>> - change the parameter names used in the declaration
>> - use kvm_vgic_map/unmap_phys_irq and handle their returned value
>> ---
>> include/kvm/arm_vgic.h | 5 +++
>> virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 93 insertions(+)
>>
>> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
>> index cceb31d..5064a57 100644
>> --- a/include/kvm/arm_vgic.h
>> +++ b/include/kvm/arm_vgic.h
>> @@ -343,4 +343,9 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
>> */
>> int kvm_vgic_setup_default_irq_routing(struct kvm *kvm);
>>
>> +int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
>> + unsigned int vintid);
>> +void kvm_vgic_unset_forwarding(struct kvm *kvm, unsigned int host_irq,
>> + unsigned int vintid);
>> +
>> #endif /* __KVM_ARM_VGIC_H */
>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>> index 2e35ac7..9ee3e77 100644
>> --- a/virt/kvm/arm/vgic/vgic.c
>> +++ b/virt/kvm/arm/vgic/vgic.c
>> @@ -781,3 +781,91 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
>> return map_is_active;
>> }
>>
>> +/**
>> + * kvm_vgic_set_forwarding - Set IRQ forwarding
>> + *
>> + * @kvm: kvm handle
>> + * @host_irq: the host linux IRQ
>> + * @vintid: the virtual INTID
>> + *
>> + * This function must be called when the IRQ is not active:
>> + * ie. not active at GIC level and not currently under injection
>> + * into the guest using the unforwarded mode. The physical IRQ must
>> + * be disabled and all vCPUs must have been exited and prevented
>> + * from being re-entered.
>> + */
>> +int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
>> + unsigned int vintid)
>> +{
>> + struct kvm_vcpu *vcpu;
>> + struct vgic_irq *irq;
>> + int ret;
>> +
>> + kvm_debug("%s host_irq=%d vintid=%d\n", __func__, host_irq, vintid);
>
> do you need to check if the vgic is initialized etc. here?
yes
>
>> +
>> + if (!vgic_valid_spi(kvm, vintid))
>> + return -EINVAL;
>> +
>> + irq = vgic_get_irq(kvm, NULL, vintid);
>> + spin_lock(&irq->irq_lock);
>> +
>> + if (irq->hw) {
>> + ret = -EINVAL;
>
> is this because it's already forwarded? How about EBUSY or EEXIST
> instead then?
OK
>
>> + goto unlock;
>> + }
>> + vcpu = irq->target_vcpu;
>> + if (!vcpu) {
>> + ret = -EAGAIN;
>
> what is this case exactly?
This was discussed previously with Marc
(https://patchwork.kernel.org/patch/9746841/). In the GICv3 case the
vcpu parameter is not used by irq_set_vcpu_affinity(); what this
function does is tell the GIC not to deactivate (DIR) the physical IRQ.
So in my case I just need a non-NULL vcpu passed as the
irq_set_vcpu_affinity() parameter. kvm_vgic_map_irq() does not use it
because we are handling SPIs. But in GICv4 the actual target vcpu will
be needed, so I decided to use the latter and return an error in case
it is not known.
>
>> + goto unlock;
>> + }
>> +
>> + ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
>> + if (!ret)
>> + irq_set_vcpu_affinity(host_irq, vcpu);
>
> so this is essentially map + set_vcpu_affinity. Why do we want the GIC
> to do this in one go as opposed to leaving it up to the caller?
The VGIC code already uses some genirq functions such as
irq_set/get_irqchip_state(). Holding the irq->irq_lock prevents the two
actions from racing with a concurrent kvm_vgic_unset_forwarding().
The GIC and VGIC programming must stay consistent.
Thanks
Eric
>
>> +unlock:
>> + spin_unlock(&irq->irq_lock);
>> + vgic_put_irq(kvm, irq);
>> + return ret;
>> +}
>> +
>> +/**
>> + * kvm_vgic_unset_forwarding - Unset IRQ forwarding
>> + *
>> + * @kvm: KVM handle
>> + * @host_irq: the host Linux IRQ number
>> + * @vintid: virtual INTID
>> + *
>> + * This function must be called when the host irq is disabled and
>> + * all vCPUs have been exited and prevented from being re-entered.
>> + */
>> +void kvm_vgic_unset_forwarding(struct kvm *kvm,
>> + unsigned int host_irq,
>> + unsigned int vintid)
>> +{
>> + struct vgic_irq *irq;
>> + bool active;
>> +
>> + kvm_debug("%s host_irq=%d vintid=%d\n", __func__, host_irq, vintid);
>
> do you need to check if the vgic is initialized etc. here?
>
>> +
>> + if (!vgic_valid_spi(kvm, vintid))
>> + return;
>> +
>> + irq = vgic_get_irq(kvm, NULL, vintid);
>> + spin_lock(&irq->irq_lock);
>> +
>> + if (!irq->hw)
>> + goto unlock;
>> +
>> + WARN_ON(irq_get_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, &active));
>> +
>> + if (active)
>> + irq_set_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, false);
>> +
>> + kvm_vgic_unmap_irq(irq);
>> + irq_set_vcpu_affinity(host_irq, NULL);
>> +
>> +unlock:
>> + spin_unlock(&irq->irq_lock);
>> + vgic_put_irq(kvm, irq);
>> +}
>> +
>> --
>> 2.5.5
>>
>
> Thanks,
> -Christoffer
>
On Tue, Aug 22, 2017 at 04:35:26PM +0200, Auger Eric wrote:
> Hi,
>
> On 25/07/2017 17:41, Marc Zyngier wrote:
> > On 25/07/17 15:48, Christoffer Dall wrote:
> >> On Tue, Jul 25, 2017 at 02:47:55PM +0100, Marc Zyngier wrote:
> >>> On 21/07/17 14:03, Christoffer Dall wrote:
> >>>> On Fri, Jul 07, 2017 at 09:41:42AM +0200, Auger Eric wrote:
> >>>>> Hi Marc,
> >>>>>
> >>>>> On 04/07/2017 14:15, Marc Zyngier wrote:
> >>>>>> Hi Eric,
> >>>>>>
> >>>>>> On 15/06/17 13:52, Eric Auger wrote:
> >>>>>>> Currently, the line level of unmapped level sensitive SPIs is
> >>>>>>> toggled down by the maintenance IRQ handler/resamplefd mechanism.
> >>>>>>>
> >>>>>>> As mapped SPI completion is not trapped, we cannot rely on this
> >>>>>>> mechanism and the line level needs to be observed at distributor
> >>>>>>> level instead.
> >>>>>>>
> >>>>>>> This patch handles the physical IRQ case in vgic_validate_injection
> >>>>>>> and get the line level of a mapped SPI at distributor level.
> >>>>>>>
> >>>>>>> Signed-off-by: Eric Auger <[email protected]>
> >>>>>>>
> >>>>>>> ---
> >>>>>>>
> >>>>>>> v1 -> v2:
> >>>>>>> - renamed is_unshared_mapped into is_mapped_spi
> >>>>>>> - changes to kvm_vgic_map_phys_irq moved in the previous patch
> >>>>>>> - make vgic_validate_injection more readable
> >>>>>>> - reword the commit message
> >>>>>>> ---
> >>>>>>> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
> >>>>>>> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
> >>>>>>> 2 files changed, 20 insertions(+), 3 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> >>>>>>> index 075f073..2e35ac7 100644
> >>>>>>> --- a/virt/kvm/arm/vgic/vgic.c
> >>>>>>> +++ b/virt/kvm/arm/vgic/vgic.c
> >>>>>>> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> >>>>>>> kfree(irq);
> >>>>>>> }
> >>>>>>>
> >>>>>>> +bool irq_line_level(struct vgic_irq *irq)
> >>>>>>> +{
> >>>>>>> + bool line_level = irq->line_level;
> >>>>>>> +
> >>>>>>> + if (unlikely(is_mapped_spi(irq)))
> >>>>>>> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
> >>>>>>> + IRQCHIP_STATE_PENDING,
> >>>>>>> + &line_level));
> >>>>>>> + return line_level;
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> /**
> >>>>>>> * kvm_vgic_target_oracle - compute the target vcpu for an irq
> >>>>>>> *
> >>>>>>> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
> >>>>>>>
> >>>>>>> /*
> >>>>>>> * Only valid injection if changing level for level-triggered IRQs or for a
> >>>>>>> - * rising edge.
> >>>>>>> + * rising edge. Injection of virtual interrupts associated to physical
> >>>>>>> + * interrupts always is valid.
> >>>>>>> */
> >>>>>>> static bool vgic_validate_injection(struct vgic_irq *irq, bool level)
> >>>>>>> {
> >>>>>>> switch (irq->config) {
> >>>>>>> case VGIC_CONFIG_LEVEL:
> >>>>>>> - return irq->line_level != level;
> >>>>>>> + return (irq->line_level != level || unlikely(is_mapped_spi(irq)));
> >>>>>>> case VGIC_CONFIG_EDGE:
> >>>>>>> return level;
> >>>>>>> }
> >>>>>>> diff --git a/virt/kvm/arm/vgic/vgic.h b/virt/kvm/arm/vgic/vgic.h
> >>>>>>> index bba7fa2..da254ae 100644
> >>>>>>> --- a/virt/kvm/arm/vgic/vgic.h
> >>>>>>> +++ b/virt/kvm/arm/vgic/vgic.h
> >>>>>>> @@ -96,14 +96,19 @@
> >>>>>>> /* we only support 64 kB translation table page size */
> >>>>>>> #define KVM_ITS_L1E_ADDR_MASK GENMASK_ULL(51, 16)
> >>>>>>>
> >>>>>>> +bool irq_line_level(struct vgic_irq *irq);
> >>>>>>> +
> >>>>>>> static inline bool irq_is_pending(struct vgic_irq *irq)
> >>>>>>> {
> >>>>>>> if (irq->config == VGIC_CONFIG_EDGE)
> >>>>>>> return irq->pending_latch;
> >>>>>>> else
> >>>>>>> - return irq->pending_latch || irq->line_level;
> >>>>>>> + return irq->pending_latch || irq_line_level(irq);
> >>>>>>
> >>>>>> I'm a bit concerned that an edge interrupt doesn't take the distributor
> >>>>>> state into account here. Why is that so? Once an SPI is forwarded to a
> >>>>>> guest, a large part of the edge vs level differences move into the HW,
> >>>>>> and are not that different anymore from a SW PoV.
> >>>>>
> >>>>> As pointed out by Christoffer in https://lkml.org/lkml/2017/6/8/322,
> >>>>> isn't it a bit risky in general to poke the physical state instead of
> >>>>> the virtual state. For level sensitive, to me we don't really have many
> >>>>> other alternatives. For edge, we are not obliged to.
> >>>>
> >>>> I think we need to be clear on the fundamental question of whether or
> >>>> not we consider pending_latch and/or line_level for mapped interrupts.
> >>>>
> >>>> I can definitely see the argument that the pending state is kept in
> >>>> hardware, so if you want to know that for a mapped interrupt, ask the
> >>>> hardware.
> >>>>
> >>>> The upside of this approach is a clean separation of state and we avoid
> >>>> any logic to synchronize a virtual state with the physical state.
> >>>>
> >>>> The downside is that it's slower to peek into the physical GIC than to
> >>>> read a variable from memory, and we need to special case the validate
> >>>> path (which I now understand).
> >>>>
> >>>> If we move to keeping the state in HW, how do we deal with GICD_ISPENDR?
> >>>> Does that mean we will forward a write from the VM handled by the VGIC to the
> >>>> physical GIC?
> >>>
> >>> Sounds like it to me. Otherwise, we start losing some state.
> >>
> >> How do we lose state? Is it not more a question of complexity to make
> >> sure the 'cached' vgic state is up to date with the real state? (like
> >> what we currently do for the timer mapped interrupt).
> >
> > Sorry, I was very imprecise here. It is not so much that we'd loose
> > state, but that we'd have some state at the wrong location. If we have a
> > guest playing with the pending state, we need to make sure the physical
> > side is up to date. Otherwise, we can end-up in situations where we'd
> > inject an interrupt for the guest based on a pending state that only
> > exists in the virtual distributor, and yet the virtual CPUIF is going to
> > try and deactivate it in the physical side on EOI.
>
> At the moment a write to GICD_ISPENDR (vgic_mmio_write_spending) sets
> the pending_latch virtual state, and a read uses the new
> irq_is_pending() wrapper, which looks at both the physical state and
> the pending_latch virtual state in the case of a physically mapped IRQ.
>
> But as Marc mentioned, since I don't propagate pending_latch to the
> physical distributor, we can effectively end up with a guest "DIR" that
> tries to deactivate a physical IRQ that is not active, and that's wrong.
>
> >
> >> On GICv2 this is likely going to make injecting timer interrupts slower,
> >> because we'll check the pending state of whatever's in the AP list on
> >> entry to the guest and peek into the physical GIC again.
> Aren't timer interrupts PPIs? My series only affects the state machine
> for physically mapped SPIs, so this shouldn't have any impact, no?
>
That's true. I guess we got lucky.
Thanks,
-Christoffer
On Tue, Aug 22, 2017 at 04:33:42PM +0200, Auger Eric wrote:
> Hi Christoffer,
>
> On 21/07/2017 14:11, Christoffer Dall wrote:
> > On Thu, Jun 15, 2017 at 02:52:37PM +0200, Eric Auger wrote:
> >> Currently, the line level of unmapped level sensitive SPIs is
> >> toggled down by the maintenance IRQ handler/resamplefd mechanism.
> >>
> >> As mapped SPI completion is not trapped, we cannot rely on this
> >> mechanism and the line level needs to be observed at distributor
> >> level instead.
> >>
> >> This patch handles the physical IRQ case in vgic_validate_injection
> >> and get the line level of a mapped SPI at distributor level.
> >>
> >> Signed-off-by: Eric Auger <[email protected]>
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - renamed is_unshared_mapped into is_mapped_spi
> >> - changes to kvm_vgic_map_phys_irq moved in the previous patch
> >> - make vgic_validate_injection more readable
> >> - reword the commit message
> >> ---
> >> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
> >> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
> >> 2 files changed, 20 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> >> index 075f073..2e35ac7 100644
> >> --- a/virt/kvm/arm/vgic/vgic.c
> >> +++ b/virt/kvm/arm/vgic/vgic.c
> >> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> >> kfree(irq);
> >> }
> >>
> >> +bool irq_line_level(struct vgic_irq *irq)
> >> +{
> >> + bool line_level = irq->line_level;
> >> +
> >> + if (unlikely(is_mapped_spi(irq)))
> >> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
> >> + IRQCHIP_STATE_PENDING,
> >> + &line_level));
> >> + return line_level;
> >> +}
> >> +
> >> /**
> >> * kvm_vgic_target_oracle - compute the target vcpu for an irq
> >> *
> >> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
> >>
> >> /*
> >> * Only valid injection if changing level for level-triggered IRQs or for a
> >> - * rising edge.
> >> + * rising edge. Injection of virtual interrupts associated to physical
> >> + * interrupts always is valid.
> >
> > why? I don't remember this now, and that means I probably won't in the
> > future either.
>
> Sorry for the late reply.
>
> The life cycle of the physically mapped IRQ is as follows:
> - pIRQ becomes pending
> - pIRQ is acknowledged by the physical handler and becomes active
> - vIRQ gets injected as part of the physical handler chain
> (VFIO->irqfd kvm_vgic_inject_irq, for instance). The Linux IRQ cannot
> fire again until the vIRQ (= pIRQ) is deactivated
> - guest deactivates the vIRQ which automatically deactivates the pIRQ
>
>
> So to me, if we are about to vgic_validate_injection() an injection of a
> physically mapped vIRQ, this means the vgic state is ready to accept it:
> the previous occurrence was deactivated. There cannot be any state
> inconsistency around line_level/level.
>
> Do you agree?
>
> I will add this description at least in the commit message.
I think the important point is that even though we don't change the
level, we still add the IRQ to the AP list if it is not already there,
and therefore we still do this.
> >
> > When I look at this now, I'm thinking, if we're not going to change
> > anything, why proceed beyond validate injection?
>
> I don't follow this one. Validation always succeeds and then we handle
> the IRQ further.
The problem is that the code suggests that we will not change anything,
but in fact, later on, in the caller, we do queue this IRQ if it is not
on the AP list, even though there was no state change in the struct
vgic_irq.
But Marc and I sketched out another proposal which could benefit the
timer as well. Let me try to verify if it works and send it to you and
see if that is an improvement over this one.
Thanks,
-Christoffer
Hi Christoffer,
On 29/08/2017 08:45, Christoffer Dall wrote:
> On Tue, Aug 22, 2017 at 04:33:42PM +0200, Auger Eric wrote:
>> Hi Christoffer,
>>
>> On 21/07/2017 14:11, Christoffer Dall wrote:
>>> On Thu, Jun 15, 2017 at 02:52:37PM +0200, Eric Auger wrote:
>>>> Currently, the line level of unmapped level sensitive SPIs is
>>>> toggled down by the maintenance IRQ handler/resamplefd mechanism.
>>>>
>>>> As mapped SPI completion is not trapped, we cannot rely on this
>>>> mechanism and the line level needs to be observed at distributor
>>>> level instead.
>>>>
>>>> This patch handles the physical IRQ case in vgic_validate_injection
>>>> and gets the line level of a mapped SPI at distributor level.
>>>>
>>>> Signed-off-by: Eric Auger <[email protected]>
>>>>
>>>> ---
>>>>
>>>> v1 -> v2:
>>>> - renamed is_unshared_mapped into is_mapped_spi
>>>> - changes to kvm_vgic_map_phys_irq moved in the previous patch
>>>> - make vgic_validate_injection more readable
>>>> - reword the commit message
>>>> ---
>>>> virt/kvm/arm/vgic/vgic.c | 16 ++++++++++++++--
>>>> virt/kvm/arm/vgic/vgic.h | 7 ++++++-
>>>> 2 files changed, 20 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>>> index 075f073..2e35ac7 100644
>>>> --- a/virt/kvm/arm/vgic/vgic.c
>>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>>> @@ -139,6 +139,17 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>>>> kfree(irq);
>>>> }
>>>>
>>>> +bool irq_line_level(struct vgic_irq *irq)
>>>> +{
>>>> + bool line_level = irq->line_level;
>>>> +
>>>> + if (unlikely(is_mapped_spi(irq)))
>>>> + WARN_ON(irq_get_irqchip_state(irq->host_irq,
>>>> + IRQCHIP_STATE_PENDING,
>>>> + &line_level));
>>>> + return line_level;
>>>> +}
>>>> +
>>>> /**
>>>> * kvm_vgic_target_oracle - compute the target vcpu for an irq
>>>> *
>>>> @@ -236,13 +247,14 @@ static void vgic_sort_ap_list(struct kvm_vcpu *vcpu)
>>>>
>>>> /*
>>>> * Only valid injection if changing level for level-triggered IRQs or for a
>>>> - * rising edge.
>>>> + * rising edge. Injection of virtual interrupts associated to physical
>>>> + * interrupts always is valid.
>>>
>>> why? I don't remember this now, and that means I probably won't in the
>>> future either.
>>
>> Sorry for the late reply.
>>
>> The life cycle of the physically mapped IRQ is as follows:
>> - pIRQ becomes pending
>> - pIRQ is acknowledged by the physical handler and becomes active
>> - vIRQ gets injected as part of the physical handler chain
>> (VFIO->irqfd kvm_vgic_inject_irq, for instance). The Linux IRQ cannot
>> fire again until the vIRQ (= pIRQ) is deactivated
>> - guest deactivates the vIRQ which automatically deactivates the pIRQ
>>
>>
>> So to me, if we are about to vgic_validate_injection() an injection of a
>> physically mapped vIRQ, this means the vgic state is ready to accept it:
>> the previous occurrence was deactivated. There cannot be any state
>> inconsistency around line_level/level.
>>
>> Do you agree?
>>
>> I will add this description at least in the commit message.
>
> I think the important point is that even though we don't change the
> level, we still add the IRQ to the AP list if it is not already there,
> and therefore we still do this.
>
>>>
>>> When I look at this now, I'm thinking, if we're not going to change
>>> anything, why proceed beyond validate injection?
>>
>> I don't follow this one. Validation always succeeds and then we handle
>> the IRQ further.
>
> The problem is that the code suggests that we will not change anything,
> but in fact, later on, in the caller, we do queue this IRQ if it is not
> on the AP list, even though there was no state change in the struct
> vgic_irq.
>
> But Marc and I sketched out another proposal which could benefit the
> timer as well. Let me try to verify if it works and send it to you and
> see if that is an improvement over this one.
OK, looking forward to studying your proposal
Thanks
Eric
>
> Thanks,
> -Christoffer
>
On Wed, Aug 23, 2017 at 10:58:38AM +0200, Auger Eric wrote:
> Hi Christoffer,
>
> On 21/07/2017 15:13, Christoffer Dall wrote:
> > On Thu, Jun 15, 2017 at 02:52:38PM +0200, Eric Auger wrote:
> >> Implements kvm_vgic_[set|unset]_forwarding.
> >>
> >> Handle low-level VGIC programming and consistent irqchip
> >> programming.
> >>
> >> Signed-off-by: Eric Auger <[email protected]>
> >>
> >> ---
> >>
> >> v1 -> v2:
> >> - change the parameter names used in the declaration
> >> - use kvm_vgic_map/unmap_phys_irq and handle their returned value
> >> ---
> >> include/kvm/arm_vgic.h | 5 +++
> >> virt/kvm/arm/vgic/vgic.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++
> >> 2 files changed, 93 insertions(+)
> >>
> >> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> >> index cceb31d..5064a57 100644
> >> --- a/include/kvm/arm_vgic.h
> >> +++ b/include/kvm/arm_vgic.h
> >> @@ -343,4 +343,9 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
> >> */
> >> int kvm_vgic_setup_default_irq_routing(struct kvm *kvm);
> >>
> >> +int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
> >> + unsigned int vintid);
> >> +void kvm_vgic_unset_forwarding(struct kvm *kvm, unsigned int host_irq,
> >> + unsigned int vintid);
> >> +
> >> #endif /* __KVM_ARM_VGIC_H */
> >> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> >> index 2e35ac7..9ee3e77 100644
> >> --- a/virt/kvm/arm/vgic/vgic.c
> >> +++ b/virt/kvm/arm/vgic/vgic.c
> >> @@ -781,3 +781,91 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid)
> >> return map_is_active;
> >> }
> >>
> >> +/**
> >> + * kvm_vgic_set_forwarding - Set IRQ forwarding
> >> + *
> >> + * @kvm: kvm handle
> >> + * @host_irq: the host linux IRQ
> >> + * @vintid: the virtual INTID
> >> + *
> >> + * This function must be called when the IRQ is not active:
> >> + * ie. not active at GIC level and not currently under injection
> >> + * into the guest using the unforwarded mode. The physical IRQ must
> >> + * be disabled and all vCPUs must have been exited and prevented
> >> + * from being re-entered.
> >> + */
> >> +int kvm_vgic_set_forwarding(struct kvm *kvm, unsigned int host_irq,
> >> + unsigned int vintid)
> >> +{
> >> + struct kvm_vcpu *vcpu;
> >> + struct vgic_irq *irq;
> >> + int ret;
> >> +
> >> + kvm_debug("%s host_irq=%d vintid=%d\n", __func__, host_irq, vintid);
> >
> > do you need to check if the vgic is initialized etc. here?
> yes
> >
> >> +
> >> + if (!vgic_valid_spi(kvm, vintid))
> >> + return -EINVAL;
> >> +
> >> + irq = vgic_get_irq(kvm, NULL, vintid);
> >> + spin_lock(&irq->irq_lock);
> >> +
> >> + if (irq->hw) {
> >> + ret = -EINVAL;
> >
> > is this because it's already forwarded? How about EBUSY or EEXIST
> > instead then?
> OK
> >
> >> + goto unlock;
> >> + }
> >> + vcpu = irq->target_vcpu;
> >> + if (!vcpu) {
> >> + ret = -EAGAIN;
> >
> > what is this case exactly?
> This was discussed previously with Marc
> (https://patchwork.kernel.org/patch/9746841/). In the GICv3 case the vcpu
> parameter is not used by irq_set_vcpu_affinity. What this function does
> is tell the GIC not to deactivate (DIR) the physical IRQ.
>
> So in my case I just need a non-NULL vcpu passed as a parameter to
> irq_set_vcpu_affinity. kvm_vgic_map_irq is not using it because we are
> handling SPIs. But in GICv4 the actual target vcpu will be needed, so I
> decided to use the latter and return an error in case it is not known.
Right, but my comment was about the fact that I don't think
irq->target_vcpu could ever be NULL; if you want to simply assert this,
you should instead do:
BUG_ON(!vcpu);
> >
> >> + goto unlock;
> >> + }
> >> +
> >> + ret = kvm_vgic_map_irq(vcpu, irq, host_irq);
> >> + if (!ret)
> >> + irq_set_vcpu_affinity(host_irq, vcpu);
> >
> > so this is essentially map + set_vcpu_affinity. Why do we want the GIC
> > to do this in one go as opposed to leaving it up to the caller?
> The VGIC code already uses some genirq functions like
> irq_set/get_irqchip_state. Holding irq->irq_lock prevents the two
> actions from racing with a kvm_vgic_unset_forwarding(). The GIC and
> VGIC programming must be kept consistent.
>
OK, I guess this makes sense.
Thanks,
-Christoffer