This series proposes an integration of "ARM: Forwarding physical
interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
KVM.
It makes it possible to set/unset forwarding for a VFIO platform device IRQ.
A forwarded IRQ is deactivated by the guest and not by the host.
When the guest deactivates the associated virtual IRQ, the interrupt
controller automatically completes the physical IRQ. Obviously
this requires some HW support in the interrupt controller. This is
the case for ARM GICv2.
The direct benefit is that, for a level sensitive IRQ, a VM exit
can be avoided on forwarded IRQ completion.
When the IRQ is forwarded, the VFIO platform driver no longer needs to
mask the physical IRQ before signaling the eventfd. Indeed
genirq lowers the running priority, enabling other physical IRQs to hit
except that one.
Besides, injection is still based on irqfd triggering. The only
impact on the irqfd process is that the resamplefd is no longer called on
virtual IRQ completion, since deactivation is not trapped by KVM.
The current integration is based on an extension of the KVM-VFIO
device, previously used by KVM to interact with VFIO groups. The
patch series now enables KVM to directly interact with a VFIO
platform device. The VFIO external API was extended for that purpose.
The IRQ forward programming is architecture specific (virtual interrupt
controller programming basically). However the whole infrastructure is
kept generic.
From a user point of view, the functionality is provided through a
new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and two associated
attributes:
- KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
- KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
The capability can be checked with KVM_HAS_DEVICE_ATTR.
Forwarding must be activated while the VFIO IRQ is neither active at
the physical level nor under injection into the guest (VFIO masked).
Forwarding can be unset at any time.
This patch series has the following dependencies:
- RFC "ARM: Forwarding physical interrupts to a guest VM"
(http://lwn.net/Articles/603514/)
Note that part of this RFC has not evolved since June 2014; only the
subset below has progressed.
- [PATCH v4 0/3] genirq: Saving/restoring the irqchip state of an irq line
http://lkml.iu.edu/hypermail/linux/kernel/1503.2/02462.html
- [RFC v2] chip/vgic adaptations for forwarded irq
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-February/323183.html
Integrated pieces can be found at:
https://git.linaro.org/people/eric.auger/linux.git/shortlog/refs/heads/4.O_forward_v6
This was tested on Calxeda Midway, assigning the xgmac main IRQ.
Unforwarding was tested by doing periodic forward/unforward with random
offsets under netcat traffic, to make sure unforwarding often occurs
while the IRQ is in progress.
v5 -> v6:
Took Alex's comments into account:
- vfio
x introduced vfio_device_external_ops to hold external callbacks:
mask, is_active, set_automasked
x their prototypes now feature index, start, count
x implementation of vfio_external_[mask, is_active, set_automasked] moved
to vfio.c. The functions just call bus-specific callbacks, currently
only implemented on the vfio_platform side.
- kvm-vfio
x does not use struct vfio_platform_device handles anymore. Use vfio_device.
x remove DEBUG flags
x rename kvm_vfio_platform_get_irq into kvm_vfio_get_hwirq
v4 -> v5:
- fix arm64 compilation issues
- arch/arm64/include/asm/kvm_host.h now defines
x __KVM_HAVE_ARCH_KVM_VFIO_FORWARD for arm64
x __KVM_HAVE_ARCH_HALT_GUEST
x and features pause renamed into power_off
v3 -> v4:
- reverted to RFC again due to the many changes, the extra complexity induced
by the new set/unset_forward implementation, and dependencies on RFC patches
- kvm_vfio_dev_irq struct is used at user level to pass the parameters
to KVM-VFIO KVM_DEV_VFIO_DEVICE/KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. Shared
with Intel posted IRQs.
- unforward now can happen any time with no constraint and cannot fail
- new VFIO platform external functions introduced:
vfio_external_set_automasked, vfio_external_mask, vfio_external_is_active,
- introduce a modality to force guest to exit & prevent it from being
re-entered and rename older ARM pause modality into power-off
(related to PSCI power-off start)
- kvm_vfio_arm.c no longer exists; architecture-specific code moved into
arm/gic.c. This code is not that VFIO-dependent anymore, although
some references still exist in comments.
- 2 separate architecture specific functions for set and unset (only one
has a return value).
v2 -> v3:
- kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP does not
exist anymore)
- a new struct local to vfio.c was introduced to wrap kvm_fw_irq and make it
linkable: kvm_vfio_fwd_irq_node
- kvm_fwd_irq now is self-contained (includes struct vfio_device *)
- a single list of kvm_vfio_fwd_irq_node is used instead of having
a list of devices and a list of forwarded IRQs per device. Having two lists
brought extra complexity.
- the VFIO device ref counter is incremented each time a new IRQ is forwarded.
It is not attempted anymore to hold a single reference whatever the number
of forwarded IRQs.
- subindex added on top of index to be closer to VFIO API
- platform device check moved in the arm specific implementation
- enable the KVM-VFIO device for arm64
- the forwarded state can now only change while the VFIO IRQ handler is not
set; in other words, while VFIO IRQ signaling is not set up.
v1 -> v2:
- forward control is moved from the architecture-specific file into the
generic vfio.c module;
only kvm_arch_set_fwd_state remains architecture-specific
- integrate Kim's patch which enables KVM-VFIO for ARM
- fix vgic state bypass in vgic_queue_hwirq
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
to include/uapi/linux/kvm.h
also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed
- kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
Eric Auger (15):
VFIO: platform: test forwarded state when selecting IRQ handler
VFIO: platform: single handler using function pointer
KVM: kvm-vfio: User API for IRQ forwarding
VFIO: external user API for interaction with vfio devices
VFIO: Introduce vfio_device_external_ops
VFIO: pci: initialize vfio_device_external_ops
VFIO: platform: implement vfio_device_external_ops callbacks
VFIO: add vfio_external_{mask|is_active|set_automasked}
KVM: kvm-vfio: wrappers to VFIO external API device helpers
KVM: kvm-vfio: wrappers for
vfio_external_{mask|is_active|set_automasked}
KVM: arm: rename pause into power_off
kvm: introduce kvm_arch_halt_guest and kvm_arch_resume_guest
kvm: arm/arm64: implement kvm_arch_halt_guest and
kvm_arch_resume_guest
KVM: kvm-vfio: generic forwarding control
KVM: arm/arm64: vgic: forwarding control
Kim Phillips (1):
KVM: arm/arm64: Enable the KVM-VFIO device
Documentation/virtual/kvm/devices/vfio.txt | 34 +-
arch/arm/include/asm/kvm_host.h | 7 +-
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/Makefile | 2 +-
arch/arm/kvm/arm.c | 38 ++-
arch/arm/kvm/psci.c | 10 +-
arch/arm64/include/asm/kvm_host.h | 5 +
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/Makefile | 2 +-
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/platform/vfio_platform_common.c | 7 +
drivers/vfio/platform/vfio_platform_irq.c | 77 ++++-
drivers/vfio/platform/vfio_platform_private.h | 12 +
drivers/vfio/vfio.c | 63 ++++
include/linux/kvm_host.h | 59 ++++
include/linux/vfio.h | 37 +++
include/uapi/linux/kvm.h | 12 +
virt/kvm/arm/vgic.c | 190 +++++++++++
virt/kvm/vfio.c | 436 +++++++++++++++++++++++++-
19 files changed, 964 insertions(+), 30 deletions(-)
--
1.9.1
From: Kim Phillips <[email protected]>
The KVM-VFIO device is used by the QEMU VFIO device to record the list
of in-use VFIO groups so that KVM can manipulate them. With this
series, it will also be used to record the forwarded IRQs.
Signed-off-by: Kim Phillips <[email protected]>
Signed-off-by: Eric Auger <[email protected]>
---
v4 -> v5:
- reword the commit message to explain both usages of the KVM-VFIO
device in QEMU
- squash both arm and arm64 enables
---
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/Makefile | 2 +-
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/Makefile | 2 +-
4 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index f1f79d1..bfb915d 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select SRCU
select MMU_NOTIFIER
+ select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index a093bf1..a4dbb97 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)
KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
obj-y += kvm-arm.o init.o interrupts.o
obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 5105e29..bfffe8f 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -28,6 +28,7 @@ config KVM
select KVM_ARM_HOST
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select SRCU
+ select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
---help---
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index b22c636..7104ca1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm
obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
-kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
--
1.9.1
If the IRQ is forwarded, the VFIO platform IRQ handler no longer needs
to disable the IRQ.
When setting the IRQ handler we now also test the forwarded state: if
the IRQ is forwarded we select vfio_irq_handler.
Signed-off-by: Eric Auger <[email protected]>
---
v3 -> v4:
- change title
v2 -> v3:
- the forwarded state was tested in the handler. Now it is tested
before setting the handler. This definitely limits the dynamics of
forwarded state changes, but I don't think there is a use case where
we need to be able to change the state at any time.
Conflicts:
drivers/vfio/platform/vfio_platform_irq.c
---
drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 88bba57..132bb3f 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -229,8 +229,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
{
struct vfio_platform_irq *irq = &vdev->irqs[index];
irq_handler_t handler;
+ struct irq_data *d;
+ bool is_forwarded;
- if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED)
+ d = irq_get_irq_data(irq->hwirq);
+ is_forwarded = irqd_irq_forwarded(d);
+
+ if (vdev->irqs[index].flags & VFIO_IRQ_INFO_AUTOMASKED && !is_forwarded)
handler = vfio_automasked_irq_handler;
else
handler = vfio_irq_handler;
--
1.9.1
A single handler is now registered whatever the use case, automasked
or not. A function pointer is set according to the wanted behavior
and the root handler calls this function.
The IRQ lock is taken/released in the root handler; eventfd_signal can
be called from regions that are not allowed to sleep.
Signed-off-by: Eric Auger <[email protected]>
---
v4: creation
---
drivers/vfio/platform/vfio_platform_irq.c | 21 +++++++++++++++------
drivers/vfio/platform/vfio_platform_private.h | 1 +
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 132bb3f..8eb65c1 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -147,11 +147,8 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
{
struct vfio_platform_irq *irq_ctx = dev_id;
- unsigned long flags;
int ret = IRQ_NONE;
- spin_lock_irqsave(&irq_ctx->lock, flags);
-
if (!irq_ctx->masked) {
ret = IRQ_HANDLED;
@@ -160,8 +157,6 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
irq_ctx->masked = true;
}
- spin_unlock_irqrestore(&irq_ctx->lock, flags);
-
if (ret == IRQ_HANDLED)
eventfd_signal(irq_ctx->trigger, 1);
@@ -177,6 +172,19 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
return IRQ_HANDLED;
}
+static irqreturn_t vfio_handler(int irq, void *dev_id)
+{
+ struct vfio_platform_irq *irq_ctx = dev_id;
+ unsigned long flags;
+ irqreturn_t ret;
+
+ spin_lock_irqsave(&irq_ctx->lock, flags);
+ ret = irq_ctx->handler(irq, dev_id);
+ spin_unlock_irqrestore(&irq_ctx->lock, flags);
+
+ return ret;
+}
+
static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
int fd, irq_handler_t handler)
{
@@ -206,9 +214,10 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
}
irq->trigger = trigger;
+ irq->handler = handler;
irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
- ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
+ ret = request_irq(irq->hwirq, vfio_handler, 0, irq->name, irq);
if (ret) {
kfree(irq->name);
eventfd_ctx_put(trigger);
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 5d31e04..eb91deb 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -37,6 +37,7 @@ struct vfio_platform_irq {
spinlock_t lock;
struct virqfd *unmask;
struct virqfd *mask;
+ irqreturn_t (*handler)(int irq, void *dev_id);
};
struct vfio_platform_region {
--
1.9.1
This patch adds and documents a new KVM_DEV_VFIO_DEVICE group
and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
to set a VFIO device IRQ as forwarded or not forwarded.
The command takes as argument a handle to a new struct named
kvm_vfio_dev_irq.
Signed-off-by: Eric Auger <[email protected]>
---
v3 -> v4:
- rename kvm_arch_forwarded_irq into kvm_vfio_dev_irq
- some rewording in commit message
- document forwarding restrictions and remove unforwarding ones
v2 -> v3:
- rework vfio kvm device documentation
- reword commit message and title
- add subindex in kvm_arch_forwarded_irq to be closer to VFIO API
- forwarding state can only be changed while VFIO IRQ signaling is off
v1 -> v2:
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
to include/uapi/linux/kvm.h
also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
---
Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++++++++------
include/uapi/linux/kvm.h | 12 +++++++++++
2 files changed, 40 insertions(+), 6 deletions(-)
diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..6186e6d 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -4,15 +4,20 @@ VFIO virtual device
Device types supported:
KVM_DEV_TYPE_VFIO
-Only one VFIO instance may be created per VM. The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM. As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence. When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+Only one VFIO instance may be created per VM.
+
+The created device tracks VFIO groups in use by the VM and features
+of those groups important to the correctness and acceleration of
+the VM. As groups are enabled and disabled for use by the VM, KVM
+should be updated about their presence. When registered with KVM,
+a reference to the VFIO-group is held by KVM.
+
+The device also enables to control some IRQ settings of VFIO devices:
+forwarding/posting.
Groups:
KVM_DEV_VFIO_GROUP
+ KVM_DEV_VFIO_DEVICE
KVM_DEV_VFIO_GROUP attributes:
KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +25,20 @@ KVM_DEV_VFIO_GROUP attributes:
For each, kvm_device_attr.addr points to an int32_t file descriptor
for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+ KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
+ KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
+
+For each, kvm_device_attr.addr points to a kvm_vfio_dev_irq struct.
+
+When forwarded, a physical IRQ is completed by the guest and not by the
+host. This requires HW support in the interrupt controller.
+
+Forwarding can only be set when the corresponding VFIO IRQ is not masked
+(be it through the VFIO_DEVICE_SET_IRQS command or as a consequence of this
+IRQ being currently handled) or active at interrupt controller level.
+In such a situation, -EAGAIN is returned. It is advised to set the
+forwarding before the VFIO signaling is set up; this avoids trial and error.
+
+Unforwarding can happen at any time.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8055706..1b78dd3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -945,6 +945,9 @@ struct kvm_device_attr {
#define KVM_DEV_VFIO_GROUP 1
#define KVM_DEV_VFIO_GROUP_ADD 1
#define KVM_DEV_VFIO_GROUP_DEL 2
+#define KVM_DEV_VFIO_DEVICE 2
+#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
+#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20 = 1,
@@ -964,6 +967,15 @@ enum kvm_device_type {
KVM_DEV_TYPE_MAX,
};
+struct kvm_vfio_dev_irq {
+ __u32 argsz; /* structure length */
+ __u32 fd; /* file descriptor of the VFIO device */
+ __u32 index; /* VFIO device IRQ index */
+ __u32 start; /* start of subindex range */
+ __u32 count; /* size of subindex range */
+ __u32 gsi[]; /* gsi, ie. virtual IRQ number */
+};
+
/*
* ioctls for VM fds
*/
--
1.9.1
The VFIO external user API is enriched with 3 new functions that
allow a kernel user external to VFIO to retrieve some information
from a VFIO device.
- vfio_device_get_external_user gets a vfio device from
its fd and increments its reference counter
- vfio_device_put_external_user decrements the reference counter
- vfio_external_base_device returns a handle to the struct device
---
v3 -> v4:
- change the commit title
v2 -> v3:
- reword the commit message
v1 -> v2:
- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed
Signed-off-by: Eric Auger <[email protected]>
---
drivers/vfio/vfio.c | 24 ++++++++++++++++++++++++
include/linux/vfio.h | 3 +++
2 files changed, 27 insertions(+)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 23ba12a..33ae8c5 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1490,6 +1490,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
}
EXPORT_SYMBOL_GPL(vfio_group_put_external_user);
+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+ struct vfio_device *vdev = filep->private_data;
+
+ if (filep->f_op != &vfio_device_fops)
+ return ERR_PTR(-EINVAL);
+
+ vfio_device_get(vdev);
+ return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+ vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+ return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
int vfio_external_user_iommu_id(struct vfio_group *group)
{
return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 683b514..b18c38f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -101,6 +101,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
extern int vfio_external_user_iommu_id(struct vfio_group *group);
extern long vfio_external_check_extension(struct vfio_group *group,
unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);
struct pci_dev;
#ifdef CONFIG_EEH
--
1.9.1
New bus callbacks are introduced; they correspond to the external
functions. To avoid cluttering the main vfio_device_ops
struct, a new vfio_device_external_ops struct is introduced.
Signed-off-by: Eric Auger <[email protected]>
---
v6: creation
---
include/linux/vfio.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index b18c38f..b0ec474 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -19,6 +19,23 @@
#include <uapi/linux/vfio.h>
/**
+ * struct vfio_device_external_ops - VFIO bus driver device callbacks
+ * used as external API
+ * @mask: mask any IRQ defined by triplet
+ * @is_active: returns whether any IRQ defined by triplet is active
+ * @set_automasked: sets the automasked flag of triplet's IRQ
+ */
+struct vfio_device_external_ops {
+ int (*mask)(void *device_data, unsigned index, unsigned start,
+ unsigned count);
+ int (*is_active)(void *device_data, unsigned index, unsigned start,
+ unsigned count);
+ int (*set_automasked)(void *device_data, unsigned index,
+ unsigned start, unsigned count,
+ bool automasked);
+};
+
+/**
* struct vfio_device_ops - VFIO bus driver device callbacks
*
* @open: Called when userspace creates new file descriptor for device
@@ -42,6 +59,7 @@ struct vfio_device_ops {
unsigned long arg);
int (*mmap)(void *device_data, struct vm_area_struct *vma);
void (*request)(void *device_data, unsigned int count);
+ struct vfio_device_external_ops *external_ops;
};
extern int vfio_add_group_dev(struct device *dev,
--
1.9.1
Signed-off-by: Eric Auger <[email protected]>
---
v6: creation
---
drivers/vfio/pci/vfio_pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2f865d07..755897f 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -855,6 +855,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
.write = vfio_pci_write,
.mmap = vfio_pci_mmap,
.request = vfio_pci_request,
+ .external_ops = NULL,
};
static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
--
1.9.1
This patch adds the implementation of the 3 external callbacks of the
vfio_device_external_ops struct, namely mask, is_active and
set_automasked. vfio_device_ops and vfio_device_external_ops are also
set accordingly.
Signed-off-by: Eric Auger <[email protected]>
---
v6: creation
---
drivers/vfio/platform/vfio_platform_common.c | 7 ++++
drivers/vfio/platform/vfio_platform_irq.c | 49 +++++++++++++++++++++++++++
drivers/vfio/platform/vfio_platform_private.h | 11 ++++++
3 files changed, 67 insertions(+)
diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c
index abcff7a..4113d46 100644
--- a/drivers/vfio/platform/vfio_platform_common.c
+++ b/drivers/vfio/platform/vfio_platform_common.c
@@ -471,6 +471,12 @@ static int vfio_platform_mmap(void *device_data, struct vm_area_struct *vma)
return -EINVAL;
}
+static struct vfio_device_external_ops vfio_platform_external_ops = {
+ .mask = vfio_platform_external_mask,
+ .is_active = vfio_platform_external_is_active,
+ .set_automasked = vfio_platform_external_set_automasked,
+};
+
static const struct vfio_device_ops vfio_platform_ops = {
.name = "vfio-platform",
.open = vfio_platform_open,
@@ -479,6 +485,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
.read = vfio_platform_read,
.write = vfio_platform_write,
.mmap = vfio_platform_mmap,
+ .external_ops = &vfio_platform_external_ops
};
int vfio_platform_probe_common(struct vfio_platform_device *vdev,
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 8eb65c1..f6d83ed 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -231,6 +231,55 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
return 0;
}
+int vfio_platform_external_mask(void *device_data, unsigned index,
+ unsigned start, unsigned count)
+{
+ struct vfio_platform_device *vdev = device_data;
+
+ vfio_platform_mask(&vdev->irqs[index]);
+ return 0;
+}
+
+int vfio_platform_external_is_active(void *device_data, unsigned index,
+ unsigned start, unsigned count)
+{
+ unsigned long flags;
+ struct vfio_platform_device *vdev = device_data;
+ struct vfio_platform_irq *irq = &vdev->irqs[index];
+ bool active, masked, outstanding;
+ int ret;
+
+ spin_lock_irqsave(&irq->lock, flags);
+
+ ret = irq_get_irqchip_state(irq->hwirq, IRQCHIP_STATE_ACTIVE, &active);
+ BUG_ON(ret);
+ masked = irq->masked;
+ outstanding = active || masked;
+
+ spin_unlock_irqrestore(&irq->lock, flags);
+ return outstanding;
+}
+
+int vfio_platform_external_set_automasked(void *device_data, unsigned index,
+ unsigned start, unsigned count,
+ bool automasked)
+{
+ unsigned long flags;
+ struct vfio_platform_device *vdev = device_data;
+ struct vfio_platform_irq *irq = &vdev->irqs[index];
+
+ spin_lock_irqsave(&irq->lock, flags);
+ if (automasked) {
+ irq->flags |= VFIO_IRQ_INFO_AUTOMASKED;
+ irq->handler = vfio_automasked_irq_handler;
+ } else {
+ irq->flags &= ~VFIO_IRQ_INFO_AUTOMASKED;
+ irq->handler = vfio_irq_handler;
+ }
+ spin_unlock_irqrestore(&irq->lock, flags);
+ return 0;
+}
+
static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
unsigned index, unsigned start,
unsigned count, uint32_t flags,
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index eb91deb..253caa3 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -83,4 +83,15 @@ extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
unsigned start, unsigned count,
void *data);
+extern int vfio_platform_external_mask(void *device_data, unsigned index,
+ unsigned start, unsigned count);
+extern int vfio_platform_external_is_active(void *device_data,
+ unsigned index, unsigned start,
+ unsigned count);
+extern int vfio_platform_external_set_automasked(void *device_data,
+ unsigned index,
+ unsigned start,
+ unsigned count,
+ bool automasked);
+
#endif /* VFIO_PLATFORM_PRIVATE_H */
--
1.9.1
Introduces 3 new external functions aimed at doing actions
on VFIO devices:
- mask VFIO IRQ
- get the active status of VFIO IRQ (active at interrupt
controller level or masked by the level-sensitive automasking).
- change the automasked property and switch the IRQ handler
(between automasked/ non automasked)
Their implementation is based on bus specific callbacks.
Note there is no way to discriminate between user-space
masking and automasked handler masking. As a consequence, is_active
will also return true in case the IRQ was masked by user space.
Signed-off-by: Eric Auger <[email protected]>
---
v5 -> v6:
- implementation now uses external ops
- prototype changed (index, start, count) and returns int
V4: creation
---
drivers/vfio/vfio.c | 39 +++++++++++++++++++++++++++++++++++++++
include/linux/vfio.h | 16 ++++++++++++++++
2 files changed, 55 insertions(+)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 33ae8c5..a36188e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1526,6 +1526,45 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
}
EXPORT_SYMBOL_GPL(vfio_external_check_extension);
+int vfio_external_mask(struct vfio_device *vdev, unsigned index,
+ unsigned start, unsigned count)
+{
+ if (vdev->ops->external_ops &&
+ vdev->ops->external_ops->mask)
+ return vdev->ops->external_ops->mask(vdev->device_data,
+ index, start, count);
+ else
+ return -ENXIO;
+}
+EXPORT_SYMBOL_GPL(vfio_external_mask);
+
+int vfio_external_is_active(struct vfio_device *vdev, unsigned index,
+ unsigned start, unsigned count)
+{
+ if (vdev->ops->external_ops &&
+ vdev->ops->external_ops->is_active)
+ return vdev->ops->external_ops->is_active(vdev->device_data,
+ index, start, count);
+ else
+ return -ENXIO;
+}
+EXPORT_SYMBOL_GPL(vfio_external_is_active);
+
+int vfio_external_set_automasked(struct vfio_device *vdev,
+ unsigned index, unsigned start,
+ unsigned count, bool automasked)
+{
+ if (vdev->ops->external_ops &&
+ vdev->ops->external_ops->set_automasked)
+ return vdev->ops->external_ops->set_automasked(
+ vdev->device_data,
+ index, start,
+ count, automasked);
+ else
+ return -ENXIO;
+}
+EXPORT_SYMBOL_GPL(vfio_external_set_automasked);
+
/**
* Module/class support
*/
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index b0ec474..a740392 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -123,6 +123,22 @@ extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
extern void vfio_device_put_external_user(struct vfio_device *vdev);
extern struct device *vfio_external_base_device(struct vfio_device *vdev);
+extern int vfio_external_mask(struct vfio_device *vdev, unsigned index,
+ unsigned start, unsigned count);
+/*
+ * returns whether the VFIO IRQ is active:
+ * true if not yet deactivated at interrupt controller level or if
+ * automasked (level sensitive IRQ). Unfortunately there is no way to
+ * discriminate between handler auto-masking and user-space masking
+ */
+extern int vfio_external_is_active(struct vfio_device *vdev,
+ unsigned index, unsigned start,
+ unsigned count);
+
+extern int vfio_external_set_automasked(struct vfio_device *vdev,
+ unsigned index, unsigned start,
+ unsigned count, bool automasked);
+
struct pci_dev;
#ifdef CONFIG_EEH
extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
--
1.9.1
Provide wrapper functions that allow KVM-VFIO device code to
interact with a vfio device:
- kvm_vfio_device_get_external_user gets a handle to a struct
vfio_device from the vfio device file descriptor and increments
its reference counter,
- kvm_vfio_device_put_external_user decrements the reference counter
to a vfio device,
- kvm_vfio_external_base_device returns a handle to the struct device
of the vfio device.
Also kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device helpers
are introduced.
Signed-off-by: Eric Auger <[email protected]>
---
v3 -> v4:
- wrappers are no longer exposed in kvm_host.h and become static
functions in kvm/vfio.c
- added kvm_vfio_get_vfio_device/kvm_vfio_put_vfio_device in that
patch file
v2 -> v3:
- reword the commit message and title
v1 -> v2:
- kvm_vfio_external_get_base_device renamed into
kvm_vfio_external_base_device
- kvm_vfio_external_get_type removed
---
virt/kvm/vfio.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 620e37f..80a45e4 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,80 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
symbol_put(vfio_group_put_external_user);
}
+static struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
+{
+ struct vfio_device *vdev;
+ struct vfio_device *(*fn)(struct file *);
+
+ fn = symbol_get(vfio_device_get_external_user);
+ if (!fn)
+ return ERR_PTR(-EINVAL);
+
+ vdev = fn(filep);
+
+ symbol_put(vfio_device_get_external_user);
+
+ return vdev;
+}
+
+static void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
+{
+ void (*fn)(struct vfio_device *);
+
+ fn = symbol_get(vfio_device_put_external_user);
+ if (!fn)
+ return;
+
+ fn(vdev);
+
+ symbol_put(vfio_device_put_external_user);
+}
+
+static struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
+{
+ struct device *(*fn)(struct vfio_device *);
+ struct device *dev;
+
+ fn = symbol_get(vfio_external_base_device);
+ if (!fn)
+ return NULL;
+
+ dev = fn(vdev);
+
+ symbol_put(vfio_external_base_device);
+
+ return dev;
+}
+
+/**
+ * kvm_vfio_get_vfio_device - returns a handle to a vfio device
+ * @fd: file descriptor of the vfio platform device
+ *
+ * Checks that the fd refers to a valid vfio device and increments its
+ * reference counter.
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+ struct fd f = fdget(fd);
+ struct vfio_device *vdev;
+
+ if (!f.file)
+ return ERR_PTR(-EINVAL);
+ vdev = kvm_vfio_device_get_external_user(f.file);
+ fdput(f);
+ return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device - decrements the reference counter of the
+ * vfio platform device
+ *
+ * @vdev: vfio_device handle to release
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+ kvm_vfio_device_put_external_user(vdev);
+}
+
static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
{
long (*fn)(struct vfio_group *, unsigned long);
--
1.9.1
These three new wrapper functions call their respective VFIO
external counterparts.
Signed-off-by: Eric Auger <[email protected]>
---
- v5 -> v6:
x vfio.h modifications duly moved into "VFIO: platform: add
vfio_external_{mask|is_active|set_automasked}"
x all external functions now take a vfio_device handle
x change the prototype of the above functions (unsigned index,
unsigned start, unsigned count); they also return int
- v4: creation
---
virt/kvm/vfio.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 52 insertions(+)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 80a45e4..f4e86d3 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -134,6 +134,58 @@ static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
kvm_vfio_device_put_external_user(vdev);
}
+int kvm_vfio_external_is_active(struct vfio_device *vdev,
+ unsigned index, unsigned start, unsigned count)
+{
+ int (*fn)(struct vfio_device *, unsigned index, unsigned start,
+ unsigned count);
+ int active;
+
+ fn = symbol_get(vfio_external_is_active);
+ if (!fn)
+ return -ENXIO;
+
+ active = fn(vdev, index, start, count);
+
+ symbol_put(vfio_external_is_active);
+ return active;
+}
+
+int kvm_vfio_external_mask(struct vfio_device *vdev, unsigned index,
+ unsigned start, unsigned count)
+{
+ int ret;
+ int (*fn)(struct vfio_device *, unsigned index, unsigned start,
+ unsigned count);
+
+ fn = symbol_get(vfio_external_mask);
+ if (!fn)
+ return -ENXIO;
+
+ ret = fn(vdev, index, start, count);
+
+ symbol_put(vfio_external_mask);
+ return ret;
+}
+
+int kvm_vfio_external_set_automasked(struct vfio_device *vdev,
+ unsigned index, unsigned start,
+ unsigned count, bool automasked)
+{
+ int ret;
+ int (*fn)(struct vfio_device *, unsigned index, unsigned start,
+ unsigned count, bool automasked);
+
+ fn = symbol_get(vfio_external_set_automasked);
+ if (!fn)
+ return -ENXIO;
+
+ ret = fn(vdev, index, start, count, automasked);
+
+ symbol_put(vfio_external_set_automasked);
+ return ret;
+}
+
static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
{
long (*fn)(struct vfio_group *, unsigned long);
--
1.9.1
The kvm_vcpu_arch pause field is renamed to power_off to prepare
for the introduction of a new pause field.
Signed-off-by: Eric Auger <[email protected]>
v4 -> v5:
- fix compilation issue on arm64 (add power_off field in kvm_host.h)
---
arch/arm/include/asm/kvm_host.h | 4 ++--
arch/arm/kvm/arm.c | 10 +++++-----
arch/arm/kvm/psci.c | 10 +++++-----
arch/arm64/include/asm/kvm_host.h | 4 ++--
4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 902a7d1..79a919c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -129,8 +129,8 @@ struct kvm_vcpu_arch {
* here.
*/
- /* Don't run the guest on this vcpu */
- bool pause;
+ /* vcpu power-off state */
+ bool power_off;
/* IO related fields */
struct kvm_decode mmio_decode;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c3a2dc6..08e12dc 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -458,7 +458,7 @@ static void vcpu_pause(struct kvm_vcpu *vcpu)
{
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
- wait_event_interruptible(*wq, !vcpu->arch.pause);
+ wait_event_interruptible(*wq, !vcpu->arch.power_off);
}
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -508,7 +508,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
update_vttbr(vcpu->kvm);
- if (vcpu->arch.pause)
+ if (vcpu->arch.power_off)
vcpu_pause(vcpu);
kvm_vgic_flush_hwstate(vcpu);
@@ -730,12 +730,12 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
vcpu_reset_hcr(vcpu);
/*
- * Handle the "start in power-off" case by marking the VCPU as paused.
+ * Handle the "start in power-off" case.
*/
if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
- vcpu->arch.pause = true;
+ vcpu->arch.power_off = true;
else
- vcpu->arch.pause = false;
+ vcpu->arch.power_off = false;
return 0;
}
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 02fa8ef..367b131 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -61,7 +61,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
{
- vcpu->arch.pause = true;
+ vcpu->arch.power_off = true;
}
static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -85,7 +85,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
*/
if (!vcpu)
return PSCI_RET_INVALID_PARAMS;
- if (!vcpu->arch.pause) {
+ if (!vcpu->arch.power_off) {
if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
return PSCI_RET_ALREADY_ON;
else
@@ -113,7 +113,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
* the general puspose registers are undefined upon CPU_ON.
*/
*vcpu_reg(vcpu, 0) = context_id;
- vcpu->arch.pause = false;
+ vcpu->arch.power_off = false;
smp_mb(); /* Make sure the above is visible */
wq = kvm_arch_vcpu_wq(vcpu);
@@ -150,7 +150,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
kvm_for_each_vcpu(i, tmp, kvm) {
mpidr = kvm_vcpu_get_mpidr_aff(tmp);
if (((mpidr & target_affinity_mask) == target_affinity) &&
- !tmp->arch.pause) {
+ !tmp->arch.power_off) {
return PSCI_0_2_AFFINITY_LEVEL_ON;
}
}
@@ -173,7 +173,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
* re-initialized.
*/
kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
- tmp->arch.pause = true;
+ tmp->arch.power_off = true;
kvm_vcpu_kick(tmp);
}
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 967fb1c..a713b0c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -122,8 +122,8 @@ struct kvm_vcpu_arch {
* here.
*/
- /* Don't run the guest */
- bool pause;
+ /* vcpu power-off state */
+ bool power_off;
/* IO related fields */
struct kvm_decode mmio_decode;
--
1.9.1
This API allows KVM to:
- force the guest to exit and prevent it from being re-entered
- resume guest execution
Signed-off-by: Eric Auger <[email protected]>
---
include/linux/kvm_host.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae9c720..f4e1829 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1053,6 +1053,18 @@ extern struct kvm_device_ops kvm_xics_ops;
extern struct kvm_device_ops kvm_arm_vgic_v2_ops;
extern struct kvm_device_ops kvm_arm_vgic_v3_ops;
+#ifdef __KVM_HAVE_ARCH_HALT_GUEST
+
+void kvm_arch_halt_guest(struct kvm *kvm);
+void kvm_arch_resume_guest(struct kvm *kvm);
+
+#else
+
static inline void kvm_arch_halt_guest(struct kvm *kvm) {}
static inline void kvm_arch_resume_guest(struct kvm *kvm) {}
+
+#endif
+
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
--
1.9.1
This patch defines __KVM_HAVE_ARCH_HALT_GUEST and implements
kvm_arch_halt_guest and kvm_arch_resume_guest for ARM.
On halt, the guest is forced to exit and prevented from being
re-entered.
Signed-off-by: Eric Auger <[email protected]>
---
v4 -> v5: add arm64 support
- also defines __KVM_HAVE_ARCH_HALT_GUEST for arm64
- add pause field
---
arch/arm/include/asm/kvm_host.h | 4 ++++
arch/arm/kvm/arm.c | 32 +++++++++++++++++++++++++++++---
arch/arm64/include/asm/kvm_host.h | 4 ++++
3 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 79a919c..b829f93 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -28,6 +28,7 @@
#include <kvm/arm_arch_timer.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
+#define __KVM_HAVE_ARCH_HALT_GUEST
#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -132,6 +133,9 @@ struct kvm_vcpu_arch {
/* vcpu power-off state */
bool power_off;
+ /* Don't run the guest */
+ bool pause;
+
/* IO related fields */
struct kvm_decode mmio_decode;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 08e12dc..ebc0b55 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -454,11 +454,36 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
return vgic_initialized(kvm);
}
+void kvm_arch_halt_guest(struct kvm *kvm)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm)
+ vcpu->arch.pause = true;
+ force_vm_exit(cpu_all_mask);
+}
+
+void kvm_arch_resume_guest(struct kvm *kvm)
+{
+ int i;
+ struct kvm_vcpu *vcpu;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
+
+ vcpu->arch.pause = false;
+ wake_up_interruptible(wq);
+ }
+}
+
static void vcpu_pause(struct kvm_vcpu *vcpu)
{
wait_queue_head_t *wq = kvm_arch_vcpu_wq(vcpu);
- wait_event_interruptible(*wq, !vcpu->arch.power_off);
+ wait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
+ (!vcpu->arch.pause)));
}
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -508,7 +533,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
update_vttbr(vcpu->kvm);
- if (vcpu->arch.power_off)
+ if (vcpu->arch.power_off || vcpu->arch.pause)
vcpu_pause(vcpu);
kvm_vgic_flush_hwstate(vcpu);
@@ -526,7 +551,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
run->exit_reason = KVM_EXIT_INTR;
}
- if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
+ if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
+ vcpu->arch.pause) {
kvm_timer_sync_hwstate(vcpu);
local_irq_enable();
kvm_timer_finish_sync(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a713b0c..ffcc698 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -29,6 +29,7 @@
#include <asm/kvm_mmio.h>
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
+#define __KVM_HAVE_ARCH_HALT_GUEST
#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
@@ -125,6 +126,9 @@ struct kvm_vcpu_arch {
/* vcpu power-off state */
bool power_off;
+ /* Don't run the guest */
+ bool pause;
+
/* IO related fields */
struct kvm_decode mmio_decode;
--
1.9.1
This patch introduces a new KVM_DEV_VFIO_DEVICE group.
This is a new control channel that enables KVM to cooperate with
VFIO devices.
The patch introduces 2 attributes for this group:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
respectively unset the feature.
The generic part introduced here interacts with VFIO, genirq and KVM,
while the architecture specific part mostly takes care of the virtual
interrupt controller programming.
The architecture specific implementation is enabled when
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When it is not set, those
functions are empty stubs.
Signed-off-by: Eric Auger <[email protected]>
---
v5 -> v6:
- does not use struct vfio_platform_device handles anymore
- remove DEBUG flags
- rename kvm_vfio_platform_get_irq into kvm_vfio_get_hwirq
- handle returned error values from external functions
v3 -> v4:
- use new kvm_vfio_dev_irq struct
- improve error handling according to Alex comments
- full rework of the generic/arch specific split to accommodate an
unforward that never fails
- kvm_vfio_get_vfio_device and kvm_vfio_put_vfio_device removed from
that patch file and introduced before (since also used by Feng)
- guard kvm_vfio_control_irq_forward call with
__KVM_HAVE_ARCH_KVM_VFIO_FORWARD
v2 -> v3:
- add API comments in kvm_host.h
- improve the commit message
- create a private kvm_vfio_fwd_irq struct
- fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
latter action will be handled in vgic.
- add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
to move platform specific stuff in architecture specific code.
- kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
- increment the ref counter each time we do an IRQ forwarding and decrement
this latter each time one IRQ forward is unset. Simplifies the whole
ref counting.
- simplification of list handling: create, search, removal
v1 -> v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file separated into 2 parts: generic part moved in vfio.c
and ARM specific part(kvm_arch_set_fwd_state)
---
include/linux/kvm_host.h | 47 +++++++
virt/kvm/vfio.c | 310 ++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 354 insertions(+), 3 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f4e1829..8f17f87 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1042,6 +1042,15 @@ struct kvm_device_ops {
unsigned long arg);
};
+/* internal self-contained structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+ struct kvm *kvm; /* VM to inject the GSI into */
+ struct vfio_device *vdev; /* vfio device the IRQ belongs to */
+ __u32 index; /* VFIO device IRQ index */
+ __u32 subindex; /* VFIO device IRQ subindex */
+ __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
void kvm_device_get(struct kvm_device *dev);
void kvm_device_put(struct kvm_device *dev);
struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1065,6 +1074,44 @@ inline void kvm_arch_resume_guest(struct kvm *kvm) {}
#endif
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+/**
+ * kvm_arch_set_forward - Sets forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_set_forward(struct kvm *kvm,
+ unsigned int host_irq, unsigned int guest_irq);
+
+/**
+ * kvm_arch_unset_forward - Unsets forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * @active: returns whether the IRQ is active
+ */
+void kvm_arch_unset_forward(struct kvm *kvm,
+ unsigned int host_irq,
+ unsigned int guest_irq,
+ bool *active);
+
+#else
+static inline int kvm_arch_set_forward(struct kvm *kvm,
+ unsigned int host_irq,
+ unsigned int guest_irq)
+{
+ return -ENOENT;
+}
+static inline void kvm_arch_unset_forward(struct kvm *kvm,
+ unsigned int host_irq,
+ unsigned int guest_irq,
+ bool *active) {}
+#endif
+
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index f4e86d3..cf40a2a 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -19,14 +19,27 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include "vfio.h"
+#include <linux/platform_device.h>
+#include <linux/irq.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+#include <linux/delay.h>
struct kvm_vfio_group {
struct list_head node;
struct vfio_group *vfio_group;
};
+/* private linkable kvm_fwd_irq struct */
+struct kvm_vfio_fwd_irq_node {
+ struct list_head link;
+ struct kvm_fwd_irq fwd_irq;
+};
+
struct kvm_vfio {
struct list_head group_list;
+ /* list of registered VFIO forwarded IRQs */
+ struct list_head fwd_node_list;
struct mutex lock;
bool noncoherent;
};
@@ -328,12 +341,295 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
return -ENXIO;
}
+/**
+ * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ is already
+ * registered in the list of forwarded IRQs
+ *
+ * @kv: handle to the kvm-vfio device
+ * @fwd: handle to the forwarded irq struct
+ * Returns the handle to its node in the kvm-vfio forwarded IRQ list
+ * if found, NULL otherwise.
+ * Must be called with kv->lock held.
+ */
+static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
+ struct kvm_vfio *kv,
+ struct kvm_fwd_irq *fwd)
+{
+ struct kvm_vfio_fwd_irq_node *node;
+
+ list_for_each_entry(node, &kv->fwd_node_list, link) {
+ if ((node->fwd_irq.index == fwd->index) &&
+ (node->fwd_irq.subindex == fwd->subindex) &&
+ (node->fwd_irq.vdev == fwd->vdev))
+ return node;
+ }
+ return NULL;
+}
+
+/**
+ * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
+ * forwarded IRQ
+ *
+ * @kv: handle to the kvm-vfio device
+ * @fwd: handle to the forwarded irq struct
+ * Returns a handle to the new list node on success,
+ * ERR_PTR(-ENOMEM) otherwise.
+ * Must be called with kv->lock held.
+ */
+static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
+ struct kvm_vfio *kv,
+ struct kvm_fwd_irq *fwd)
+{
+ struct kvm_vfio_fwd_irq_node *node;
+
+ node = kmalloc(sizeof(*node), GFP_KERNEL);
+ if (!node)
+ return ERR_PTR(-ENOMEM);
+
+ node->fwd_irq = *fwd;
+
+ list_add(&node->link, &kv->fwd_node_list);
+
+ return node;
+}
+
+/**
+ * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
+ *
+ * @node: handle to the node struct
+ * Must be called with kv->lock held.
+ */
+static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
+{
+ list_del(&node->link);
+ kvm_vfio_put_vfio_device(node->fwd_irq.vdev);
+ kfree(node);
+}
+
+static int kvm_vfio_get_hwirq(struct vfio_device *vdev, int index)
+{
+ struct platform_device *platdev;
+ struct device *dev = kvm_vfio_external_base_device(vdev);
+
+ if (!dev || dev->bus != &platform_bus_type)
+ return -EINVAL;
+
+ platdev = to_platform_device(dev);
+ return platform_get_irq(platdev, index);
+}
+
+/**
+ * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
+ * @kv: handle to the kvm-vfio device
+ * @fd: file descriptor of the vfio device the IRQ belongs to
+ * @fwd: handle to the forwarded irq struct
+ *
+ * Registers the IRQ as forwarded and turns forwarding on. The operation
+ * is only allowed if the IRQ is not in progress (active at interrupt
+ * controller level or auto-masked by the handler). If user space has
+ * masked the IRQ, the operation fails as well.
+ * returns:
+ * -EAGAIN if the IRQ is in progress or VFIO masked
+ * -EEXIST if the IRQ is already registered as forwarded
+ * -ENOMEM on memory shortage
+ */
+static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
+ struct kvm_fwd_irq *fwd)
+{
+ int ret = -EAGAIN;
+ struct kvm_vfio_fwd_irq_node *node;
+ struct vfio_device *vdev = fwd->vdev;
+ int index = fwd->index;
+ int host_irq = kvm_vfio_get_hwirq(fwd->vdev, fwd->index);
+ bool active;
+
+ kvm_arch_halt_guest(fwd->kvm);
+
+ disable_irq(host_irq);
+
+ active = kvm_vfio_external_is_active(vdev, index, 0, 0);
+
+ if (active)
+ goto out;
+
+ node = kvm_vfio_register_fwd_irq(kv, fwd);
+ if (IS_ERR(node)) {
+ ret = PTR_ERR(node);
+ goto out;
+ }
+
+ ret = kvm_vfio_external_set_automasked(vdev, index, 0, 0, false);
+ if (ret)
+ goto out;
+
+ ret = kvm_arch_set_forward(fwd->kvm, host_irq, fwd->gsi);
+
+out:
+ enable_irq(host_irq);
+
+ kvm_arch_resume_guest(fwd->kvm);
+
+ return ret;
+}
+
+/**
+ * kvm_vfio_unset_forward - sets a VFIO device IRQ as non-forwarded
+ * @kv: handle to the kvm-vfio device
+ * @node: handle to the node to unset
+ *
+ * Calls the architecture specific implementation of unset_forward and
+ * unregisters the IRQ from the forwarded IRQ list.
+ */
+static void kvm_vfio_unset_forward(struct kvm_vfio *kv,
+ struct kvm_vfio_fwd_irq_node *node)
+{
+ struct kvm_fwd_irq fwd = node->fwd_irq;
+ int host_irq = kvm_vfio_get_hwirq(fwd.vdev, fwd.index);
+ int index = fwd.index;
+ struct vfio_device *vdev = fwd.vdev;
+ bool active = false;
+
+ kvm_arch_halt_guest(fwd.kvm);
+
+ disable_irq(host_irq);
+
+ kvm_arch_unset_forward(fwd.kvm, host_irq, fwd.gsi, &active);
+
+ if (active)
+ kvm_vfio_external_mask(vdev, index, 0, 0);
+
+ kvm_vfio_external_set_automasked(vdev, index, 0, 0, true);
+ enable_irq(host_irq);
+
+ kvm_arch_resume_guest(fwd.kvm);
+
+ kvm_vfio_unregister_fwd_irq(node);
+}
+
+static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
+ int32_t __user *argp)
+{
+ struct kvm_vfio_dev_irq user_fwd_irq;
+ struct kvm_fwd_irq fwd;
+ struct vfio_device *vdev;
+ struct kvm_vfio *kv = kdev->private;
+ struct kvm_vfio_fwd_irq_node *node;
+ unsigned long minsz;
+ int ret = 0;
+ u32 *gsi = NULL;
+ size_t size;
+
+ minsz = offsetofend(struct kvm_vfio_dev_irq, count);
+
+ if (copy_from_user(&user_fwd_irq, argp, minsz))
+ return -EFAULT;
+
+ if (user_fwd_irq.argsz < minsz)
+ return -EINVAL;
+
+ vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
+ if (IS_ERR(vdev))
+ return PTR_ERR(vdev);
+
+ ret = kvm_vfio_get_hwirq(vdev, user_fwd_irq.index);
+ if (ret < 0)
+ goto error;
+
+ size = sizeof(__u32);
+ if (user_fwd_irq.argsz - minsz < size) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ gsi = memdup_user((void __user *)((unsigned long)argp + minsz), size);
+ if (IS_ERR(gsi)) {
+ ret = PTR_ERR(gsi);
+ gsi = NULL;
+ goto error;
+ }
+
+ fwd.vdev = vdev;
+ fwd.kvm = kdev->kvm;
+ fwd.index = user_fwd_irq.index;
+ fwd.subindex = 0;
+ fwd.gsi = *gsi;
+
+ node = kvm_vfio_find_fwd_irq(kv, &fwd);
+
+ switch (attr) {
+ case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+ if (node) {
+ ret = -EEXIST;
+ goto error;
+ }
+ mutex_lock(&kv->lock);
+ ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
+ mutex_unlock(&kv->lock);
+ break;
+ case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+ if (!node) {
+ ret = -ENOENT;
+ goto error;
+ }
+ mutex_lock(&kv->lock);
+ kvm_vfio_unset_forward(kv, node);
+ mutex_unlock(&kv->lock);
+ kvm_vfio_put_vfio_device(vdev);
+ ret = 0;
+ break;
+ }
+
+error:
+ /* also frees gsi on the early error paths (kfree(NULL) is a no-op) */
+ kfree(gsi);
+ if (ret < 0)
+ kvm_vfio_put_vfio_device(vdev);
+ return ret;
+}
+
+static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
+{
+ int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
+ int ret;
+
+ switch (attr) {
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+ case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+ case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+ ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
+ break;
+#endif
+ default:
+ ret = -ENXIO;
+ }
+ return ret;
+}
+
+/**
+ * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
+ * registered forwarded IRQs and free their list nodes.
+ * @kv: kvm-vfio device
+ *
+ * Loops over all registered device/IRQ combos, restores their
+ * non-forwarded state, empties the list and releases the references.
+ */
+static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
+{
+ struct kvm_vfio_fwd_irq_node *node, *tmp;
+
+ list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
+ kvm_vfio_unset_forward(kv, node);
+ }
+ return 0;
+}
+
static int kvm_vfio_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
case KVM_DEV_VFIO_GROUP:
return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+ case KVM_DEV_VFIO_DEVICE:
+ return kvm_vfio_set_device(dev, attr->attr, attr->addr);
}
return -ENXIO;
@@ -349,10 +645,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
case KVM_DEV_VFIO_GROUP_DEL:
return 0;
}
-
break;
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+ case KVM_DEV_VFIO_DEVICE:
+ switch (attr->attr) {
+ case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+ case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+ return 0;
+ }
+ break;
+#endif
}
-
return -ENXIO;
}
@@ -366,7 +669,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
list_del(&kvg->node);
kfree(kvg);
}
-
+ kvm_vfio_clean_fwd_irq(kv);
kvm_vfio_update_coherency(dev);
kfree(kv);
@@ -398,6 +701,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
return -ENOMEM;
INIT_LIST_HEAD(&kv->group_list);
+ INIT_LIST_HEAD(&kv->fwd_node_list);
mutex_init(&kv->lock);
dev->private = kv;
--
1.9.1
This patch sets __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and implements
kvm_arch_set_forward and kvm_arch_unset_forward for ARM/ARM64.
As a result the KVM-VFIO device now allows to forward/unforward a
VFIO device IRQ on ARM.
kvm_arch_set_forward and kvm_arch_unset_forward mostly take care of
VGIC programming: physical IRQ/guest IRQ mapping, list register cleanup,
VGIC state machine.
Signed-off-by: Eric Auger <[email protected]>
---
v4 -> v5:
- fix arm64 compilation issues, ie. also defines
__KVM_HAVE_ARCH_HALT_GUEST for arm64
v3 -> v4:
- code originally located in kvm_vfio_arm.c
- kvm_arch_vfio_{set|unset}_forward renamed into
kvm_arch_{set|unset}_forward
- split into 2 functions (set/unset) since unset does not fail anymore
- unset can be invoked at whatever time. Extra care is taken to handle
transition in VGIC state machine, LR cleanup, ...
v2 -> v3:
- renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward
- takes a bool arg instead of kvm_fwd_irq_action enum
- removal of KVM_VFIO_IRQ_CLEANUP
- platform device check now happens here
- more precise errors returned
- irq_eoi handled externally to this patch (VGIC)
- correct enable_irq bug done twice
- reword the commit message
- correct check of platform_bus_type
- use raw_spin_lock_irqsave and check the validity of the handler
---
arch/arm/include/asm/kvm_host.h | 1 +
arch/arm64/include/asm/kvm_host.h | 1 +
virt/kvm/arm/vgic.c | 190 ++++++++++++++++++++++++++++++++++++++
3 files changed, 192 insertions(+)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index b829f93..8e3fd7f 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -29,6 +29,7 @@
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
#define __KVM_HAVE_ARCH_HALT_GUEST
+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index ffcc698..9be392a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -30,6 +30,7 @@
#define __KVM_HAVE_ARCH_INTC_INITIALIZED
#define __KVM_HAVE_ARCH_HALT_GUEST
+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index f345c41..c0ac151 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -2133,3 +2133,193 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
{
return 0;
}
+
+/**
+ * kvm_arch_set_forward - Set forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ *
+ * This function is supposed to be called only if the IRQ
+ * is not in progress, i.e. not active at VGIC level and not
+ * currently under injection in KVM.
+ */
+int kvm_arch_set_forward(struct kvm *kvm, unsigned int host_irq,
+ unsigned int guest_irq)
+{
+ irq_hw_number_t gic_irq;
+ struct irq_desc *desc = irq_to_desc(host_irq);
+ struct irq_data *d;
+ unsigned long flags;
+ struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+ int ret = 0;
+ int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+ struct vgic_dist *dist = &kvm->arch.vgic;
+
+ if (!vcpu)
+ return 0;
+
+ spin_lock(&dist->lock);
+
+ raw_spin_lock_irqsave(&desc->lock, flags);
+ d = &desc->irq_data;
+ gic_irq = irqd_to_hwirq(d);
+ irqd_set_irq_forwarded(d);
+ /*
+ * the next physical IRQ will be handled as forwarded
+ * by the host (priority drop only)
+ */
+
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+ /*
+ * need to release the dist spin_lock here since
+ * vgic_map_phys_irq can sleep
+ */
+ spin_unlock(&dist->lock);
+ ret = vgic_map_phys_irq(vcpu, spi_id, (int)gic_irq);
+ /*
+ * next guest_irq injection will be considered as
+ * forwarded and next flush will program LR
+ * without maintenance IRQ but with HW bit set
+ */
+ return ret;
+}
+
+/**
+ * kvm_arch_unset_forward - Unset forwarding for a given IRQ
+ *
+ * @kvm: handle to the VM
+ * @host_irq: physical IRQ number
+ * @guest_irq: virtual IRQ number
+ * @active: returns whether the physical IRQ is active
+ *
+ * This function must be called when the host_irq is disabled
+ * and guest has been exited and prevented from being re-entered.
+ *
+ */
+void kvm_arch_unset_forward(struct kvm *kvm,
+ unsigned int host_irq,
+ unsigned int guest_irq,
+ bool *active)
+{
+ struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0);
+ struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+ struct vgic_dist *dist = &kvm->arch.vgic;
+ int ret, lr;
+ struct vgic_lr vlr;
+ struct irq_desc *desc = irq_to_desc(host_irq);
+ struct irq_data *d;
+ unsigned long flags;
+ irq_hw_number_t gic_irq;
+ int spi_id = guest_irq + VGIC_NR_PRIVATE_IRQS;
+ struct irq_chip *chip;
+ bool queued, needs_deactivate = true;
+
+ spin_lock(&dist->lock);
+
+ irq_get_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, active);
+
+ if (!vcpu)
+ goto out;
+
+ raw_spin_lock_irqsave(&desc->lock, flags);
+ d = irq_desc_get_irq_data(desc);
+ gic_irq = irqd_to_hwirq(d);
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+ ret = vgic_unmap_phys_irq(vcpu, spi_id, gic_irq);
+ /*
+ * subsequent update_irq_pending or flush will handle this
+ * irq as forwarded
+ */
+
+ if (likely(!(*active))) {
+ /*
+ * the IRQ was not active. It may happen that handle_fasteoi_irq
+ * is entered afterwards due to the lazy disable_irq. If this
+ * happens the IRQ will be deactivated and never be injected.
+ * Let's simply prepare the states for a subsequent non-forwarded
+ * injection.
+ */
+ vgic_dist_irq_clear_level(vcpu, spi_id);
+ vgic_dist_irq_clear_pending(vcpu, spi_id);
+ vgic_irq_clear_queued(vcpu, spi_id);
+ needs_deactivate = false;
+ goto out;
+ }
+
+ /* is there any list register with valid state? */
+ lr = vgic_cpu->vgic_irq_lr_map[spi_id];
+ queued = false;
+ if (lr != LR_EMPTY) {
+ vlr = vgic_get_lr(vcpu, lr);
+ if (vlr.state & LR_STATE_MASK)
+ queued = true;
+ }
+
+ if (!queued) {
+ vgic_irq_clear_queued(vcpu, spi_id);
+ if (vgic_dist_irq_is_pending(vcpu, spi_id)) {
+ /*
+ * IRQ is injected but not yet queued. LR will be
+ * written with EOI_INT and process_maintenance will
+ * reset the states: queued, level(resampler). Pending
+ * will be reset on flush.
+ */
+ vgic_dist_irq_set_level(vcpu, spi_id);
+ } else {
+ /*
+ * We are before the injection (update_irq_pending).
+ * This is the most tricky window. Due to the usage of
+ * disable_irq in generic unforward part (mandated by
+ * resamplefd unmask VFIO integration), there is a risk
+ * the fasteoi handler returns without calling the VFIO
+ * handler and deactivates the physical IRQ (lazy
+ * disable). Hence we cannot know whether the IRQ will
+ * ever be injected. The only solution found is to do as
+ * if the IRQ were not active. The active parameter
+ * typically is used by the caller to know whether
+ * it needs to mask the IRQ. If not duly masked,
+ * another physical IRQ may hit again while the previous
+ * virtual IRQ is in progress. update_irq_pending's
+ * validate_injection check will prevent this injection.
+ */
+ vgic_dist_irq_clear_level(vcpu, spi_id);
+ *active = false;
+ }
+ goto out;
+ }
+
+ /*
+ * the virtual IRQ is queued and a valid LR exists, let's patch it for
+ * EOI maintenance
+ */
+ vlr.state |= LR_EOI_INT;
+ vgic_set_lr(vcpu, lr, vlr);
+ /*
+ * we expect a maintenance IRQ which will reset the
+ * queued, pending and level states
+ */
+ vgic_dist_irq_set_level(vcpu, spi_id);
+ vgic_dist_irq_set_pending(vcpu, spi_id);
+ vgic_irq_set_queued(vcpu, spi_id);
+
+out:
+
+ raw_spin_lock_irqsave(&desc->lock, flags);
+ d = irq_desc_get_irq_data(desc);
+ if (needs_deactivate) {
+ chip = irq_data_get_irq_chip(d);
+ chip->irq_eoi(d);
+ }
+ irqd_clr_irq_forwarded(d);
+ /* next occurrence will be deactivated by the host */
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+
+ *active = *active && vcpu;
+
+ spin_unlock(&dist->lock);
+}
--
1.9.1