2014-11-23 18:37:28

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 0/8] KVM-VFIO IRQ forward control

This series proposes an integration of "ARM: Forwarding physical
interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
KVM.

It enables to transform a VFIO platform driver IRQ into a forwarded
IRQ.

When a physical IRQ is forwarded (to a guest), the host does not
deactivates this latter. Completion ownership is transferred to the
guest. When the guest deactivates the associated virtual IRQ,
the interrupt controler automatically completes the physical IRQ.
Obviously this requires some dedicated HW support in the interrupt
controler.

The direct benefit is that, for a level sensitive IRQ, it avoids a
VM exit on forwarded IRQ completion.

When the IRQ is forwarded, the VFIO platform driver does not need to
mask the physical IRQ anymore before signaling the eventfd. Indeed
genirq lowers the running priority, enabling other physical IRQ to hit
except that one.

Besides, the injection still is based on irqfd triggering. The only
impact on irqfd process is resamplefd is not called anymore on
virtual IRQ completion since this latter becomes "transparent".

The current integration is based on an extension of the KVM-VFIO
device, previously used by KVM to interact with VFIO groups. The
patch series now enables KVM to directly interact with a VFIO
platform device. The VFIO external API was extended for that purpose.

Th KVM-VFIO device can get/put the vfio platform device, check its
integrity and type, get the IRQ number associated to an IRQ index.

The IRQ forward programming is architecture specific (virtual interrupt
controller programming basically). However the whole infrastructure is
kept generic.

from a user point of view, the functionality is provided through a
new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and 2 associated
attributes:
- KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
- KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.

The capability can be checked with KVM_HAS_DEVICE_ATTR.

Forwarding must be activated before VFIO signaling mechanism is set
using VFIO_DEVICE_SET_IRQS and unset while the signaling is disabled.

---

This patch series has the following dependencies:
- "ARM: Forwarding physical interrupts to a guest VM"
(http://lwn.net/Articles/603514/)
- [PATCH v9 00/19] VFIO support for platform and AMBA devices on ARM
(http://www.spinics.net/lists/kvm-arm/msg11745.html)
- [PATCH v2 0/6] vfio: type1: support for ARM SMMUS with VFIO_IOMMU_TYPE1
(http://www.spinics.net/lists/kvm-arm/msg11738.html)

Integrated pieces can be found at
ssh://git.linaro.org/people/eric.auger/linux.git
on branch irqfd_integ_v8

This was was tested on Calxeda Midway, assigning the xgmac main IRQ.

v2 -> v3:
- kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP does not
exist anymore)
- a new struct local to vfio.c was introduced to wrap kvm_fw_irq and make it
linkable: kvm_vfio_fwd_irq_node
- kvm_fwd_irq now is self-contained (includes struct vfio_device *)
- a single list of kvm_vfio_fwd_irq_irq_node is used instead of having
a list of devices and a list of forward irq per device. Having 2 lists
brought extra complexity.
- the VFIO device ref counter is incremented each time a new IRQ is forwarded.
It is not attempted anymore to hold a single reference whatever the number
of forwarded IRQs.
- subindex added on top of index to be closer to VFIO API
- platform device check moved in the arm specific implementation
- enable the KVM-VFIO device for arm64
- forwarded state change only can happen while the VFIO IRQ handler is not
set; in other words, when the VFIO IRQ signaling is not set.

v1 -> v2:
- forward control is moved from architecture specific file into generic
vfio.c module.
only kvm_arch_set_fwd_state remains architecture specific
- integrate Kim's patch which enables KVM-VFIO for ARM
- fix vgic state bypass in vgic_queue_hwirq
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
to include/uapi/linux/kvm.h
also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed
- kvm_vfio_external_get_base_device renamed into kvm_vfio_external_base_device
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD


Eric Auger (7):
KVM: arm64: Enable the KVM-VFIO device
VFIO: platform: forwarded state tested when selecting IRQ handler
KVM: kvm-vfio: User API for IRQ forwarding
VFIO: External user API device helpers
KVM: kvm-vfio: wrapper to VFIO external API device helpers
KVM: kvm-vfio: generic forwarding control
KVM: arm: kvm-vfio: forwarding control

Kim Phillips (1):
KVM: arm: Enable the KVM-VFIO device

Documentation/virtual/kvm/devices/vfio.txt | 34 +++-
arch/arm/include/asm/kvm_host.h | 7 +
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/Makefile | 4 +-
arch/arm/kvm/kvm_vfio_arm.c | 101 ++++++++++
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/Makefile | 2 +-
drivers/vfio/platform/vfio_platform_irq.c | 7 +-
drivers/vfio/vfio.c | 24 +++
include/linux/kvm_host.h | 28 +++
include/linux/vfio.h | 3 +
include/uapi/linux/kvm.h | 10 +
virt/kvm/vfio.c | 294 ++++++++++++++++++++++++++++-
13 files changed, 503 insertions(+), 13 deletions(-)
create mode 100644 arch/arm/kvm/kvm_vfio_arm.c

--
1.9.1


2014-11-23 18:37:32

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 1/8] KVM: arm: Enable the KVM-VFIO device

From: Kim Phillips <[email protected]>

Used by KVM-enabled VFIO-based device passthrough support in QEMU.

Signed-off-by: Kim Phillips <[email protected]>
---
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/Makefile | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index e519a40..aace254 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -24,6 +24,7 @@ config KVM
select KVM_MMIO
select KVM_ARM_HOST
depends on ARM_VIRT_EXT && ARM_LPAE
+ select KVM_VFIO
select HAVE_KVM_EVENTFD
---help---
Support hosting virtualized guest machines. You will also
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 859db09..ea1fa76 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -15,7 +15,7 @@ AFLAGS_init.o := -Wa,-march=armv7-a$(plus_virt)
AFLAGS_interrupts.o := -Wa,-march=armv7-a$(plus_virt)

KVM := ../../../virt/kvm
-kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o

obj-y += kvm-arm.o init.o interrupts.o
obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
--
1.9.1

2014-11-23 18:37:37

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 2/8] KVM: arm64: Enable the KVM-VFIO device

Used by KVM-enabled VFIO-based device passthrough support in QEMU.

Signed-off-by: Joel Schopp <[email protected]>
Signed-off-by: Eric Auger <[email protected]>

---

Extracted from [RFC PATCH] arm64: KVM: add irqfd support
http://www.spinics.net/lists/kvm-arm/msg10798.html
---
arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/Makefile | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 09c25c2..2edf926 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -26,6 +26,7 @@ config KVM
select KVM_ARM_HOST
select KVM_ARM_VGIC
select KVM_ARM_TIMER
+ select KVM_VFIO
select HAVE_KVM_EVENTFD
---help---
Support hosting virtualized guest machines.
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 2e6b827..81ed091 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm

obj-$(CONFIG_KVM_ARM_HOST) += kvm.o

-kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o

--
1.9.1

2014-11-23 18:37:45

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 5/8] VFIO: External user API device helpers

The VFIO external user API is enriched with 3 new functions that
allows a kernel user external to VFIO to retrieve some information
from a VFIO device.

- vfio_device_get_external_user enables to get a vfio device from
its fd and increments its reference counter
- vfio_device_put_external_user decrements the reference counter
- vfio_external_base_device returns a handle to the struct device

Signed-off-by: Eric Auger <[email protected]>

---

v2 -> v3:
- reword the commit message

v1 -> v2:

- vfio_external_get_base_device renamed into vfio_external_base_device
- vfio_external_get_type removed
---
drivers/vfio/vfio.c | 24 ++++++++++++++++++++++++
include/linux/vfio.h | 3 +++
2 files changed, 27 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 8e84471..282814e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1401,6 +1401,30 @@ void vfio_group_put_external_user(struct vfio_group *group)
}
EXPORT_SYMBOL_GPL(vfio_group_put_external_user);

+struct vfio_device *vfio_device_get_external_user(struct file *filep)
+{
+ struct vfio_device *vdev = filep->private_data;
+
+ if (filep->f_op != &vfio_device_fops)
+ return ERR_PTR(-EINVAL);
+
+ vfio_device_get(vdev);
+ return vdev;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_external_user);
+
+void vfio_device_put_external_user(struct vfio_device *vdev)
+{
+ vfio_device_put(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_device_put_external_user);
+
+struct device *vfio_external_base_device(struct vfio_device *vdev)
+{
+ return vdev->dev;
+}
+EXPORT_SYMBOL_GPL(vfio_external_base_device);
+
int vfio_external_user_iommu_id(struct vfio_group *group)
{
return iommu_group_id(group->iommu_group);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 2fb2e30..08e33ec 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -99,6 +99,9 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
extern int vfio_external_user_iommu_id(struct vfio_group *group);
extern long vfio_external_check_extension(struct vfio_group *group,
unsigned long arg);
+extern struct vfio_device *vfio_device_get_external_user(struct file *filep);
+extern void vfio_device_put_external_user(struct vfio_device *vdev);
+extern struct device *vfio_external_base_device(struct vfio_device *vdev);

struct pci_dev;
#ifdef CONFIG_EEH
--
1.9.1

2014-11-23 18:37:50

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 6/8] KVM: kvm-vfio: wrapper to VFIO external API device helpers

Provide wrapper functions that allow KVM-VFIO device code to
interact with a vfio device:
- kvm_vfio_device_get_external_user gets a handle to a struct
vfio_device from the vfio device file descriptor and increments
its reference counter,
- kvm_vfio_device_put_external_user decrements the reference counter
to a vfio device,
- kvm_vfio_external_base_device returns a handle to the struct device
of the vfio device.

The KVM-VFIO device uses the VFIO external API device functions.

Signed-off-by: Eric Auger <[email protected]>

---

v2 -> v3:
reword the commit message and title

v1 -> v2:
- kvm_vfio_external_get_base_device renamed into
kvm_vfio_external_base_device
- kvm_vfio_external_get_type removed
---
arch/arm/include/asm/kvm_host.h | 5 +++++
virt/kvm/vfio.c | 45 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 53036e2..bca5b79 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -169,6 +169,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);

+struct vfio_device;
+struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep);
+void kvm_vfio_device_put_external_user(struct vfio_device *vdev);
+struct device *kvm_vfio_external_base_device(struct vfio_device *vdev);
+
/* We do not have shadow page tables, hence the empty hooks */
static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
unsigned long end)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 620e37f..6f0cc34 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -60,6 +60,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
symbol_put(vfio_group_put_external_user);
}

+struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
+{
+ struct vfio_device *vdev;
+ struct vfio_device *(*fn)(struct file *);
+
+ fn = symbol_get(vfio_device_get_external_user);
+ if (!fn)
+ return ERR_PTR(-EINVAL);
+
+ vdev = fn(filep);
+
+ symbol_put(vfio_device_get_external_user);
+
+ return vdev;
+}
+
+void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
+{
+ void (*fn)(struct vfio_device *);
+
+ fn = symbol_get(vfio_device_put_external_user);
+ if (!fn)
+ return;
+
+ fn(vdev);
+
+ symbol_put(vfio_device_put_external_user);
+}
+
+struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
+{
+ struct device *(*fn)(struct vfio_device *);
+ struct device *dev;
+
+ fn = symbol_get(vfio_external_base_device);
+ if (!fn)
+ return NULL;
+
+ dev = fn(vdev);
+
+ symbol_put(vfio_external_base_device);
+
+ return dev;
+}
+
static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
{
long (*fn)(struct vfio_group *, unsigned long);
--
1.9.1

2014-11-23 18:37:55

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 8/8] KVM: arm: kvm-vfio: forwarding control

This patch sets __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and implements
kvm_arch_vfio_set_forward for ARM.

As a result the KVM-VFIO device now allows to forward/unforward a
VFIO device IRQ on ARM.

kvm_arch_vfio_set_forward programs both genirq and the VGIC to control
where the physical IRQ deactivation is initiated.
- forwarded case: deactivation is initiated by the guest; when it
completes the virtual IRQ, the GIC automatically deactivates the
physical IRQ.
- not forwarded case: the physical IRQ deactivation is handled by the host

Signed-off-by: Eric Auger <[email protected]>

---

v2 -> v3:
- renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward
- takes a bool arg instead of kvm_fwd_irq_action enum
- removal of KVM_VFIO_IRQ_CLEANUP
- platform device check now happens here
- more precise errors returned
- irq_eoi handled externally to this patch (VGIC)
- correct enable_irq bug done twice
- reword the commit message
- correct check of platform_bus_type
- use raw_spin_lock_irqsave and check the validity of the handler
---
arch/arm/include/asm/kvm_host.h | 2 +
arch/arm/kvm/Makefile | 2 +-
arch/arm/kvm/kvm_vfio_arm.c | 101 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 104 insertions(+), 1 deletion(-)
create mode 100644 arch/arm/kvm/kvm_vfio_arm.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index bca5b79..447f90c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -27,6 +27,8 @@
#include <asm/fpstate.h>
#include <kvm/arm_arch_timer.h>

+#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+
#if defined(CONFIG_KVM_ARM_MAX_VCPUS)
#define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
#else
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index ea1fa76..26a5a42 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf

obj-y += kvm-arm.o init.o interrupts.o
obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
-obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o kvm_vfio_arm.o
obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
diff --git a/arch/arm/kvm/kvm_vfio_arm.c b/arch/arm/kvm/kvm_vfio_arm.c
new file mode 100644
index 0000000..af2c501
--- /dev/null
+++ b/arch/arm/kvm/kvm_vfio_arm.c
@@ -0,0 +1,101 @@
+/*
+ * Copyright (C) 2014 Linaro Ltd.
+ * Authors: Eric Auger <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/kvm_host.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/vfio.h>
+#include <linux/irq.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm.h>
+#include <linux/irq.h>
+#include <linux/spinlock.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+
+/**
+ * kvm_arch_vfio_set_forward - Change the forward state of an IRQ
+ *
+ * @fwd_irq: handle to the forward irq struct
+ * @forward: target forwarding state
+ *
+ * If forward is true, programs genirq and VGIC so that physical IRQ
+ * deactivation ownership is transferred to the guest (using GIC HW feature).
+ * When forward is false, standard behavior is restored, ie. host
+ * deactivates the physical IRQ.
+ * returns:
+ * -EINVAL if the vfio device is not a platform device
+ * -ENOENT if the irq could not be identified
+ * -EBUSY if physical IRQ is in progress
+ * -ENOENT if the VGIC has a physical/virtual IRQ mapping that is not
+ * consistent with the request.
+ */
+int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
+ bool forward)
+{
+ int hwirq;
+ int ret = -EBUSY;
+ struct irq_desc *desc;
+ struct irq_data *d;
+ struct platform_device *platdev;
+ struct device *dev = kvm_vfio_external_base_device(fwd_irq->vdev);
+ unsigned long flags;
+ /*
+ * We don't have to garantee the vcpu handle is non void since the
+ * vfio device holds a reference to the kvm struct
+ */
+ struct kvm_vcpu *vcpu = kvm_get_vcpu(fwd_irq->kvm, 0);
+
+ if (dev->bus == &platform_bus_type) {
+ platdev = to_platform_device(dev);
+ hwirq = platform_get_irq(platdev, fwd_irq->index);
+ if (hwirq < 0)
+ return -EINVAL;
+ } else
+ return -ENOENT;
+ desc = irq_to_desc(hwirq);
+
+ /*
+ * if VFIO handler is already set, forwarded state cannot be
+ * changed anymore
+ */
+ raw_spin_lock_irqsave(&desc->lock, flags);
+ if (desc->action)
+ goto end;
+
+ d = &desc->irq_data;
+
+ if (forward) {
+ irqd_set_irq_forwarded(d);
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+ ret = vgic_map_phys_irq(vcpu,
+ fwd_irq->gsi + VGIC_NR_PRIVATE_IRQS,
+ hwirq);
+ } else {
+ irqd_clr_irq_forwarded(d);
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+ ret = vgic_unmap_phys_irq(vcpu,
+ fwd_irq->gsi +
+ VGIC_NR_PRIVATE_IRQS,
+ hwirq);
+ }
+ return ret;
+
+end:
+ raw_spin_unlock_irqrestore(&desc->lock, flags);
+ return ret;
+}
--
1.9.1

2014-11-23 18:38:26

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

This patch introduces a new KVM_DEV_VFIO_DEVICE group.

This is a new control channel which enables KVM to cooperate with
viable VFIO devices.

Functions are introduced to check the validity of a VFIO device
file descriptor, increment/decrement the ref counter of the VFIO
device.

The patch introduces 2 attributes for this new device group:
KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
unset respectively unset the feature.

The VFIO device stores a list of registered forwarded IRQs. The reference
counter of the device is incremented each time a new IRQ is forwarded.
Reference counter is decremented when the IRQ forwarding is unset.

The forwarding programmming is architecture specific, implemented in
kvm_arch_set_fwd_state function. Architecture specific implementation is
enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
functions are void.

Signed-off-by: Eric Auger <[email protected]>

---

v2 -> v3:
- add API comments in kvm_host.h
- improve the commit message
- create a private kvm_vfio_fwd_irq struct
- fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
latter action will be handled in vgic.
- add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
to move platform specific stuff in architecture specific code.
- kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
- increment the ref counter each time we do an IRQ forwarding and decrement
this latter each time one IRQ forward is unset. Simplifies the whole
ref counting.
- simplification of list handling: create, search, removal

v1 -> v2:
- __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
- original patch file separated into 2 parts: generic part moved in vfio.c
and ARM specific part(kvm_arch_set_fwd_state)
---
include/linux/kvm_host.h | 28 ++++++
virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 274 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ea53b04..0b9659d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1076,6 +1076,15 @@ struct kvm_device_ops {
unsigned long arg);
};

+/* internal self-contained structure describing a forwarded IRQ */
+struct kvm_fwd_irq {
+ struct kvm *kvm; /* VM to inject the GSI into */
+ struct vfio_device *vdev; /* vfio device the IRQ belongs to */
+ __u32 index; /* VFIO device IRQ index */
+ __u32 subindex; /* VFIO device IRQ subindex */
+ __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
void kvm_device_get(struct kvm_device *dev);
void kvm_device_put(struct kvm_device *dev);
struct kvm_device *kvm_device_from_filp(struct file *filp);
@@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
extern struct kvm_device_ops kvm_mpic_ops;
extern struct kvm_device_ops kvm_xics_ops;

+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+/**
+ * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
+ *
+ * @fwd_irq: handle to the forwarded irq struct
+ * @forward: true means forwarded, false means not forwarded
+ * returns 0 on success, < 0 on failure
+ */
+int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
+ bool forward);
+
+#else
+static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
+ bool forward)
+{
+ return 0;
+}
+#endif
+
#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 6f0cc34..af178bb 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -25,8 +25,16 @@ struct kvm_vfio_group {
struct vfio_group *vfio_group;
};

+/* private linkable kvm_fwd_irq struct */
+struct kvm_vfio_fwd_irq_node {
+ struct list_head link;
+ struct kvm_fwd_irq fwd_irq;
+};
+
struct kvm_vfio {
struct list_head group_list;
+ /* list of registered VFIO forwarded IRQs */
+ struct list_head fwd_node_list;
struct mutex lock;
bool noncoherent;
};
@@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
return -ENXIO;
}

+/**
+ * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
+ *
+ * Checks it is a valid vfio device and increments its reference counter
+ * @fd: file descriptor of the vfio platform device
+ */
+static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
+{
+ struct fd f = fdget(fd);
+ struct vfio_device *vdev;
+
+ if (!f.file)
+ return NULL;
+ vdev = kvm_vfio_device_get_external_user(f.file);
+ fdput(f);
+ return vdev;
+}
+
+/**
+ * kvm_vfio_put_vfio_device: decrements the reference counter of the
+ * vfio platform * device
+ *
+ * @vdev: vfio_device handle to release
+ */
+static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
+{
+ kvm_vfio_device_put_external_user(vdev);
+}
+
+/**
+ * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
+ * registered in the list of forwarded IRQs
+ *
+ * @kv: handle to the kvm-vfio device
+ * @fwd: handle to the forwarded irq struct
+ * In the positive returns the handle to its node in the kvm-vfio
+ * forwarded IRQ list, returns NULL otherwise.
+ * Must be called with kv->lock hold.
+ */
+static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
+ struct kvm_vfio *kv,
+ struct kvm_fwd_irq *fwd)
+{
+ struct kvm_vfio_fwd_irq_node *node;
+
+ list_for_each_entry(node, &kv->fwd_node_list, link) {
+ if ((node->fwd_irq.index == fwd->index) &&
+ (node->fwd_irq.subindex == fwd->subindex) &&
+ (node->fwd_irq.vdev == fwd->vdev))
+ return node;
+ }
+ return NULL;
+}
+/**
+ * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
+ * forwarded IRQ
+ *
+ * @kv: handle to the kvm-vfio device
+ * @fwd: handle to the forwarded irq struct
+ * In case of success returns a handle to the new list node,
+ * NULL otherwise.
+ * Must be called with kv->lock hold.
+ */
+static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
+ struct kvm_vfio *kv,
+ struct kvm_fwd_irq *fwd)
+{
+ struct kvm_vfio_fwd_irq_node *node;
+
+ node = kmalloc(sizeof(*node), GFP_KERNEL);
+ if (!node)
+ return NULL;
+
+ node->fwd_irq = *fwd;
+
+ list_add(&node->link, &kv->fwd_node_list);
+
+ return node;
+}
+
+/**
+ * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
+ *
+ * @node: handle to the node struct
+ * Must be called with kv->lock hold.
+ */
+static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
+{
+ list_del(&node->link);
+ kfree(node);
+}
+
+/**
+ * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
+ * @kv: handle to the kvm-vfio device
+ * @fd: file descriptor of the vfio device the IRQ belongs to
+ * @fwd: handle to the forwarded irq struct
+ *
+ * Registers an IRQ as forwarded and calls the architecture specific
+ * implementation of set_forward. In case of operation failure, the IRQ
+ * is unregistered. In case of success, the vfio device ref counter is
+ * incremented.
+ */
+static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
+ struct kvm_fwd_irq *fwd)
+{
+ int ret;
+ struct kvm_vfio_fwd_irq_node *node =
+ kvm_vfio_find_fwd_irq(kv, fwd);
+
+ if (node)
+ return -EINVAL;
+ node = kvm_vfio_register_fwd_irq(kv, fwd);
+ if (!node)
+ return -ENOMEM;
+ ret = kvm_arch_vfio_set_forward(fwd, true);
+ if (ret < 0) {
+ kvm_vfio_unregister_fwd_irq(node);
+ return ret;
+ }
+ /* increment the ref counter */
+ kvm_vfio_get_vfio_device(fd);
+ return ret;
+}
+
+/**
+ * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
+ * @kv: handle to the kvm-vfio device
+ * @fwd: handle to the forwarded irq struct
+ *
+ * Calls the architecture specific implementation of set_forward and
+ * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
+ * device reference counter.
+ */
+static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
+ struct kvm_fwd_irq *fwd)
+{
+ int ret;
+ struct kvm_vfio_fwd_irq_node *node =
+ kvm_vfio_find_fwd_irq(kv, fwd);
+ if (!node)
+ return -EINVAL;
+ ret = kvm_arch_vfio_set_forward(fwd, false);
+ kvm_vfio_unregister_fwd_irq(node);
+
+ /* decrement the ref counter */
+ kvm_vfio_put_vfio_device(fwd->vdev);
+ return ret;
+}
+
+static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
+ int32_t __user *argp)
+{
+ struct kvm_arch_forwarded_irq user_fwd_irq;
+ struct kvm_fwd_irq fwd;
+ struct vfio_device *vdev;
+ struct kvm_vfio *kv = kdev->private;
+ int ret;
+
+ if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
+ return -EFAULT;
+
+ vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
+ if (IS_ERR(vdev)) {
+ ret = PTR_ERR(vdev);
+ goto out;
+ }
+
+ fwd.vdev = vdev;
+ fwd.kvm = kdev->kvm;
+ fwd.index = user_fwd_irq.index;
+ fwd.subindex = user_fwd_irq.subindex;
+ fwd.gsi = user_fwd_irq.gsi;
+
+ switch (attr) {
+ case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+ mutex_lock(&kv->lock);
+ ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
+ mutex_unlock(&kv->lock);
+ break;
+ case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+ mutex_lock(&kv->lock);
+ ret = kvm_vfio_unset_forward(kv, &fwd);
+ mutex_unlock(&kv->lock);
+ break;
+ }
+out:
+ kvm_vfio_put_vfio_device(vdev);
+ return ret;
+}
+
+static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
+{
+ int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
+ int ret;
+
+ switch (attr) {
+ case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+ case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+ ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
+ break;
+ default:
+ ret = -ENXIO;
+ }
+ return ret;
+}
+
+/**
+ * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
+ * registered forwarded IRQs and free their list nodes.
+ * @kv: kvm-vfio device
+ *
+ * Loop on all registered device/IRQ combos, reset the non forwarded state,
+ * void the lists and release the reference
+ */
+static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
+{
+ struct kvm_vfio_fwd_irq_node *node, *tmp;
+
+ list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
+ kvm_vfio_unset_forward(kv, &node->fwd_irq);
+ }
+ return 0;
+}
+
static int kvm_vfio_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
case KVM_DEV_VFIO_GROUP:
return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+ case KVM_DEV_VFIO_DEVICE:
+ return kvm_vfio_set_device(dev, attr->attr, attr->addr);
}

return -ENXIO;
@@ -268,10 +503,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
case KVM_DEV_VFIO_GROUP_DEL:
return 0;
}
-
break;
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+ case KVM_DEV_VFIO_DEVICE:
+ switch (attr->attr) {
+ case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+ case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+ return 0;
+ }
+ break;
+#endif
}
-
return -ENXIO;
}

@@ -285,7 +527,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
list_del(&kvg->node);
kfree(kvg);
}
-
+ kvm_vfio_clean_fwd_irq(kv);
kvm_vfio_update_coherency(dev);

kfree(kv);
@@ -317,6 +559,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
return -ENOMEM;

INIT_LIST_HEAD(&kv->group_list);
+ INIT_LIST_HEAD(&kv->fwd_node_list);
mutex_init(&kv->lock);

dev->private = kv;
--
1.9.1

2014-11-23 18:39:28

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 4/8] KVM: kvm-vfio: User API for IRQ forwarding

This patch adds and document a new KVM_DEV_VFIO_DEVICE group
and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
to set a VFIO device IRQ as forwarded or not forwarded.
the command takes as argument a handle to a new struct named
kvm_arch_forwarded_irq.

Signed-off-by: Eric Auger <[email protected]>

---

v2 -> v3:
- rework vfio kvm device documentation
- reword commit message and title
- add subindex in kvm_arch_forwarded_irq to be closer to VFIO API
- forwarding state can only be changed with VFIO IRQ signaling is off

v1 -> v2:
- struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
to include/uapi/linux/kvm.h
also irq_index renamed into index and guest_irq renamed into gsi
- ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
---
Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++++++++------
include/uapi/linux/kvm.h | 10 +++++++++
2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
index ef51740..f7aff29 100644
--- a/Documentation/virtual/kvm/devices/vfio.txt
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -4,15 +4,24 @@ VFIO virtual device
Device types supported:
KVM_DEV_TYPE_VFIO

-Only one VFIO instance may be created per VM. The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM. As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence. When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+Only one VFIO instance may be created per VM.
+
+The created device tracks VFIO groups in use by the VM and features
+of those groups important to the correctness and acceleration of
+the VM. As groups are enabled and disabled for use by the VM, KVM
+should be updated about their presence. When registered with KVM,
+a reference to the VFIO-group is held by KVM.
+
+The device also tracks & enable VFIO device forwarded IRQs, if any.
+A physical forwarded IRQ is directly completed by the guest. This
+requires HW support in the interrupt controller which must be able
+to automatically complete the physical IRQ when it detects the guest
+has completed the corresponding virtual IRQ. The modality sometimes
+is named direct EOI.

Groups:
KVM_DEV_VFIO_GROUP
+ KVM_DEV_VFIO_DEVICE

KVM_DEV_VFIO_GROUP attributes:
KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
@@ -20,3 +29,16 @@ KVM_DEV_VFIO_GROUP attributes:

For each, kvm_device_attr.addr points to an int32_t file descriptor
for the VFIO group.
+
+KVM_DEV_VFIO_DEVICE attributes:
+ KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
+ KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
+
+For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct.
+
+The forwarded state can only be changed when the VFIO signaling mechanism
+for this physical IRQ is not set. In other words, forwarding must be
+activated before VFIO_DEVICE_SET_IRQS has been called to trigger the IRQ
+or associate an eventfd to it. Unforwarding can only be called while the
+signaling has been disabled with VFIO_DEVICE_SET_IRQS. If this condition is
+not satisfied, the command returns an -EBUSY.
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6076882..a269a42 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -946,6 +946,9 @@ struct kvm_device_attr {
#define KVM_DEV_VFIO_GROUP 1
#define KVM_DEV_VFIO_GROUP_ADD 1
#define KVM_DEV_VFIO_GROUP_DEL 2
+#define KVM_DEV_VFIO_DEVICE 2
+#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
+#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2

enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20 = 1,
@@ -963,6 +966,13 @@ enum kvm_device_type {
KVM_DEV_TYPE_MAX,
};

+struct kvm_arch_forwarded_irq {
+ __u32 fd; /* file desciptor of the VFIO device */
+ __u32 index; /* VFIO device IRQ index */
+ __u32 subindex; /* VFIO device IRQ subindex */
+ __u32 gsi; /* gsi, ie. virtual IRQ number */
+};
+
/*
* ioctls for VM fds
*/
--
1.9.1

2014-11-23 18:39:51

by Eric Auger

[permalink] [raw]
Subject: [PATCH v3 3/8] VFIO: platform: forwarded state tested when selecting IRQ handler

In case the IRQ is forwarded, the VFIO platform IRQ handler does not
need to disable the IRQ anymore.

When setting the IRQ handler we now also test the forwarded state. In
case the IRQ is forwarded we select the edge handler (no automaske).

Signed-off-by: Eric Auger <[email protected]>

---

v2 -> v3:
- forwarded state was tested in the handler. Now the forwarded state
is tested before setting the handler. This definitively limits
the dynamics of forwarded state changes but I don't think there is
a use case where we need to be able to change the state at any time.
---
drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 08d400e..61a2920 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -230,8 +230,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
{
struct vfio_platform_irq *irq = &vdev->irqs[index];
irq_handler_t handler;
+ struct irq_data *d;
+ bool is_forwarded;

- if (vdev->irqs[index].flags & VFIO_IRQ_INFO_MASKABLE)
+ d = irq_get_irq_data(irq->hwirq);
+ is_forwarded = irqd_irq_forwarded(d);
+
+ if ((vdev->irqs[index].flags & VFIO_IRQ_INFO_MASKABLE) && !is_forwarded)
handler = vfio_maskable_irq_handler;
else
handler = vfio_irq_handler;
--
1.9.1

2014-11-24 08:16:33

by Wu, Feng

[permalink] [raw]
Subject: RE: [PATCH v3 0/8] KVM-VFIO IRQ forward control



> -----Original Message-----
> From: Eric Auger [mailto:[email protected]]
> Sent: Monday, November 24, 2014 2:36 AM
> To: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Wu, Feng
> Subject: [PATCH v3 0/8] KVM-VFIO IRQ forward control
>
> This series proposes an integration of "ARM: Forwarding physical
> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> KVM.
>
> It enables to transform a VFIO platform driver IRQ into a forwarded
> IRQ.
>
> When a physical IRQ is forwarded (to a guest), the host does not
> deactivates this latter. Completion ownership is transferred to the
> guest. When the guest deactivates the associated virtual IRQ,
> the interrupt controler automatically completes the physical IRQ.
> Obviously this requires some dedicated HW support in the interrupt
> controler.
>
> The direct benefit is that, for a level sensitive IRQ, it avoids a
> VM exit on forwarded IRQ completion.
>
> When the IRQ is forwarded, the VFIO platform driver does not need to
> mask the physical IRQ anymore before signaling the eventfd. Indeed
> genirq lowers the running priority, enabling other physical IRQ to hit
> except that one.
>
> Besides, the injection still is based on irqfd triggering. The only
> impact on irqfd process is resamplefd is not called anymore on
> virtual IRQ completion since this latter becomes "transparent".
>
> The current integration is based on an extension of the KVM-VFIO
> device, previously used by KVM to interact with VFIO groups. The
> patch series now enables KVM to directly interact with a VFIO
> platform device. The VFIO external API was extended for that purpose.
>
> Th KVM-VFIO device can get/put the vfio platform device, check its
> integrity and type, get the IRQ number associated to an IRQ index.
>
> The IRQ forward programming is architecture specific (virtual interrupt
> controller programming basically). However the whole infrastructure is
> kept generic.
>
> from a user point of view, the functionality is provided through a
> new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and 2 associated
> attributes:
> - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> - KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>
> The capability can be checked with KVM_HAS_DEVICE_ATTR.
>
> Forwarding must be activated before VFIO signaling mechanism is set
> using VFIO_DEVICE_SET_IRQS and unset while the signaling is disabled.
>
> ---
>
> This patch series has the following dependencies:
> - "ARM: Forwarding physical interrupts to a guest VM"
> (http://lwn.net/Articles/603514/)
> - [PATCH v9 00/19] VFIO support for platform and AMBA devices on ARM
> (http://www.spinics.net/lists/kvm-arm/msg11745.html)
> - [PATCH v2 0/6] vfio: type1: support for ARM SMMUS with
> VFIO_IOMMU_TYPE1
> (http://www.spinics.net/lists/kvm-arm/msg11738.html)
>
> Integrated pieces can be found at
> ssh://git.linaro.org/people/eric.auger/linux.git
> on branch irqfd_integ_v8
>
> This was was tested on Calxeda Midway, assigning the xgmac main IRQ.
>

Hi Eric,

Did you send out the latest QEMU part for this patch set, I notice that v6 of
The QEMU part is sent out, but seems some structure in this new version
has been changed, such as, struct kvm_arch_forwarded_irq (subindex is added
in this version), so a new patchset in QEMU is also needed.

Thanks,
Feng


> v2 -> v3:
> - kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP does
> not
> exist anymore)
> - a new struct local to vfio.c was introduced to wrap kvm_fw_irq and make it
> linkable: kvm_vfio_fwd_irq_node
> - kvm_fwd_irq now is self-contained (includes struct vfio_device *)
> - a single list of kvm_vfio_fwd_irq_irq_node is used instead of having
> a list of devices and a list of forward irq per device. Having 2 lists
> brought extra complexity.
> - the VFIO device ref counter is incremented each time a new IRQ is forwarded.
> It is not attempted anymore to hold a single reference whatever the number
> of forwarded IRQs.
> - subindex added on top of index to be closer to VFIO API
> - platform device check moved in the arm specific implementation
> - enable the KVM-VFIO device for arm64
> - forwarded state change only can happen while the VFIO IRQ handler is not
> set; in other words, when the VFIO IRQ signaling is not set.
>
> v1 -> v2:
> - forward control is moved from architecture specific file into generic
> vfio.c module.
> only kvm_arch_set_fwd_state remains architecture specific
> - integrate Kim's patch which enables KVM-VFIO for ARM
> - fix vgic state bypass in vgic_queue_hwirq
> - struct kvm_arch_forwarded_irq moved from
> arch/arm/include/uapi/asm/kvm.h
> to include/uapi/linux/kvm.h
> also irq_index renamed into index and guest_irq renamed into gsi
> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> - vfio_external_get_base_device renamed into vfio_external_base_device
> - vfio_external_get_type removed
> - kvm_vfio_external_get_base_device renamed into
> kvm_vfio_external_base_device
> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>
>
> Eric Auger (7):
> KVM: arm64: Enable the KVM-VFIO device
> VFIO: platform: forwarded state tested when selecting IRQ handler
> KVM: kvm-vfio: User API for IRQ forwarding
> VFIO: External user API device helpers
> KVM: kvm-vfio: wrapper to VFIO external API device helpers
> KVM: kvm-vfio: generic forwarding control
> KVM: arm: kvm-vfio: forwarding control
>
> Kim Phillips (1):
> KVM: arm: Enable the KVM-VFIO device
>
> Documentation/virtual/kvm/devices/vfio.txt | 34 +++-
> arch/arm/include/asm/kvm_host.h | 7 +
> arch/arm/kvm/Kconfig | 1 +
> arch/arm/kvm/Makefile | 4 +-
> arch/arm/kvm/kvm_vfio_arm.c | 101 ++++++++++
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/Makefile | 2 +-
> drivers/vfio/platform/vfio_platform_irq.c | 7 +-
> drivers/vfio/vfio.c | 24 +++
> include/linux/kvm_host.h | 28 +++
> include/linux/vfio.h | 3 +
> include/uapi/linux/kvm.h | 10 +
> virt/kvm/vfio.c | 294
> ++++++++++++++++++++++++++++-
> 13 files changed, 503 insertions(+), 13 deletions(-)
> create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
>
> --
> 1.9.1

2014-11-24 08:28:48

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 0/8] KVM-VFIO IRQ forward control

On 11/24/2014 09:14 AM, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Eric Auger [mailto:[email protected]]
>> Sent: Monday, November 24, 2014 2:36 AM
>> To: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; Wu, Feng
>> Subject: [PATCH v3 0/8] KVM-VFIO IRQ forward control
>>
>> This series proposes an integration of "ARM: Forwarding physical
>> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
>> KVM.
>>
>> It enables to transform a VFIO platform driver IRQ into a forwarded
>> IRQ.
>>
>> When a physical IRQ is forwarded (to a guest), the host does not
>> deactivates this latter. Completion ownership is transferred to the
>> guest. When the guest deactivates the associated virtual IRQ,
>> the interrupt controler automatically completes the physical IRQ.
>> Obviously this requires some dedicated HW support in the interrupt
>> controler.
>>
>> The direct benefit is that, for a level sensitive IRQ, it avoids a
>> VM exit on forwarded IRQ completion.
>>
>> When the IRQ is forwarded, the VFIO platform driver does not need to
>> mask the physical IRQ anymore before signaling the eventfd. Indeed
>> genirq lowers the running priority, enabling other physical IRQ to hit
>> except that one.
>>
>> Besides, the injection still is based on irqfd triggering. The only
>> impact on irqfd process is resamplefd is not called anymore on
>> virtual IRQ completion since this latter becomes "transparent".
>>
>> The current integration is based on an extension of the KVM-VFIO
>> device, previously used by KVM to interact with VFIO groups. The
>> patch series now enables KVM to directly interact with a VFIO
>> platform device. The VFIO external API was extended for that purpose.
>>
>> Th KVM-VFIO device can get/put the vfio platform device, check its
>> integrity and type, get the IRQ number associated to an IRQ index.
>>
>> The IRQ forward programming is architecture specific (virtual interrupt
>> controller programming basically). However the whole infrastructure is
>> kept generic.
>>
>> from a user point of view, the functionality is provided through a
>> new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and 2 associated
>> attributes:
>> - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
>> - KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>>
>> The capability can be checked with KVM_HAS_DEVICE_ATTR.
>>
>> Forwarding must be activated before VFIO signaling mechanism is set
>> using VFIO_DEVICE_SET_IRQS and unset while the signaling is disabled.
>>
>> ---
>>
>> This patch series has the following dependencies:
>> - "ARM: Forwarding physical interrupts to a guest VM"
>> (http://lwn.net/Articles/603514/)
>> - [PATCH v9 00/19] VFIO support for platform and AMBA devices on ARM
>> (http://www.spinics.net/lists/kvm-arm/msg11745.html)
>> - [PATCH v2 0/6] vfio: type1: support for ARM SMMUS with
>> VFIO_IOMMU_TYPE1
>> (http://www.spinics.net/lists/kvm-arm/msg11738.html)
>>
>> Integrated pieces can be found at
>> ssh://git.linaro.org/people/eric.auger/linux.git
>> on branch irqfd_integ_v8
>>
>> This was was tested on Calxeda Midway, assigning the xgmac main IRQ.
>>
>
> Hi Eric,
>
> Did you send out the latest QEMU part for this patch set, I notice that v6 of
> The QEMU part is sent out, but seems some structure in this new version
> has been changed, such as, struct kvm_arch_forwarded_irq (subindex is added
> in this version), so a new patchset in QEMU is also needed.

Hi Feng,

v7 is available at:
http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03804.html. It
already illustrates KVM-VFIO device usage.

v8 which will indeed integrate subindex addition will be delivered this
week.

Best Regards

Eric
>
> Thanks,
> Feng
>
>
>> v2 -> v3:
>> - kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP does
>> not
>> exist anymore)
>> - a new struct local to vfio.c was introduced to wrap kvm_fw_irq and make it
>> linkable: kvm_vfio_fwd_irq_node
>> - kvm_fwd_irq now is self-contained (includes struct vfio_device *)
>> - a single list of kvm_vfio_fwd_irq_irq_node is used instead of having
>> a list of devices and a list of forward irq per device. Having 2 lists
>> brought extra complexity.
>> - the VFIO device ref counter is incremented each time a new IRQ is forwarded.
>> It is not attempted anymore to hold a single reference whatever the number
>> of forwarded IRQs.
>> - subindex added on top of index to be closer to VFIO API
>> - platform device check moved in the arm specific implementation
>> - enable the KVM-VFIO device for arm64
>> - forwarded state change only can happen while the VFIO IRQ handler is not
>> set; in other words, when the VFIO IRQ signaling is not set.
>>
>> v1 -> v2:
>> - forward control is moved from architecture specific file into generic
>> vfio.c module.
>> only kvm_arch_set_fwd_state remains architecture specific
>> - integrate Kim's patch which enables KVM-VFIO for ARM
>> - fix vgic state bypass in vgic_queue_hwirq
>> - struct kvm_arch_forwarded_irq moved from
>> arch/arm/include/uapi/asm/kvm.h
>> to include/uapi/linux/kvm.h
>> also irq_index renamed into index and guest_irq renamed into gsi
>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
>> - vfio_external_get_base_device renamed into vfio_external_base_device
>> - vfio_external_get_type removed
>> - kvm_vfio_external_get_base_device renamed into
>> kvm_vfio_external_base_device
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>
>>
>> Eric Auger (7):
>> KVM: arm64: Enable the KVM-VFIO device
>> VFIO: platform: forwarded state tested when selecting IRQ handler
>> KVM: kvm-vfio: User API for IRQ forwarding
>> VFIO: External user API device helpers
>> KVM: kvm-vfio: wrapper to VFIO external API device helpers
>> KVM: kvm-vfio: generic forwarding control
>> KVM: arm: kvm-vfio: forwarding control
>>
>> Kim Phillips (1):
>> KVM: arm: Enable the KVM-VFIO device
>>
>> Documentation/virtual/kvm/devices/vfio.txt | 34 +++-
>> arch/arm/include/asm/kvm_host.h | 7 +
>> arch/arm/kvm/Kconfig | 1 +
>> arch/arm/kvm/Makefile | 4 +-
>> arch/arm/kvm/kvm_vfio_arm.c | 101 ++++++++++
>> arch/arm64/kvm/Kconfig | 1 +
>> arch/arm64/kvm/Makefile | 2 +-
>> drivers/vfio/platform/vfio_platform_irq.c | 7 +-
>> drivers/vfio/vfio.c | 24 +++
>> include/linux/kvm_host.h | 28 +++
>> include/linux/vfio.h | 3 +
>> include/uapi/linux/kvm.h | 10 +
>> virt/kvm/vfio.c | 294
>> ++++++++++++++++++++++++++++-
>> 13 files changed, 503 insertions(+), 13 deletions(-)
>> create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
>>
>> --
>> 1.9.1
>

2014-11-24 08:35:04

by Wu, Feng

[permalink] [raw]
Subject: RE: [PATCH v3 0/8] KVM-VFIO IRQ forward control



> -----Original Message-----
> From: Eric Auger [mailto:[email protected]]
> Sent: Monday, November 24, 2014 4:27 PM
> To: Wu, Feng; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH v3 0/8] KVM-VFIO IRQ forward control
>
> On 11/24/2014 09:14 AM, Wu, Feng wrote:
> >
> >
> >> -----Original Message-----
> >> From: Eric Auger [mailto:[email protected]]
> >> Sent: Monday, November 24, 2014 2:36 AM
> >> To: [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected]
> >> Cc: [email protected]; [email protected];
> [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]; Wu, Feng
> >> Subject: [PATCH v3 0/8] KVM-VFIO IRQ forward control
> >>
> >> This series proposes an integration of "ARM: Forwarding physical
> >> interrupts to a guest VM" (http://lwn.net/Articles/603514/) in
> >> KVM.
> >>
> >> It enables to transform a VFIO platform driver IRQ into a forwarded
> >> IRQ.
> >>
> >> When a physical IRQ is forwarded (to a guest), the host does not
> >> deactivates this latter. Completion ownership is transferred to the
> >> guest. When the guest deactivates the associated virtual IRQ,
> >> the interrupt controler automatically completes the physical IRQ.
> >> Obviously this requires some dedicated HW support in the interrupt
> >> controler.
> >>
> >> The direct benefit is that, for a level sensitive IRQ, it avoids a
> >> VM exit on forwarded IRQ completion.
> >>
> >> When the IRQ is forwarded, the VFIO platform driver does not need to
> >> mask the physical IRQ anymore before signaling the eventfd. Indeed
> >> genirq lowers the running priority, enabling other physical IRQ to hit
> >> except that one.
> >>
> >> Besides, the injection still is based on irqfd triggering. The only
> >> impact on irqfd process is resamplefd is not called anymore on
> >> virtual IRQ completion since this latter becomes "transparent".
> >>
> >> The current integration is based on an extension of the KVM-VFIO
> >> device, previously used by KVM to interact with VFIO groups. The
> >> patch series now enables KVM to directly interact with a VFIO
> >> platform device. The VFIO external API was extended for that purpose.
> >>
> >> Th KVM-VFIO device can get/put the vfio platform device, check its
> >> integrity and type, get the IRQ number associated to an IRQ index.
> >>
> >> The IRQ forward programming is architecture specific (virtual interrupt
> >> controller programming basically). However the whole infrastructure is
> >> kept generic.
> >>
> >> from a user point of view, the functionality is provided through a
> >> new KVM-VFIO group named KVM_DEV_VFIO_DEVICE and 2 associated
> >> attributes:
> >> - KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> >> - KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>
> >> The capability can be checked with KVM_HAS_DEVICE_ATTR.
> >>
> >> Forwarding must be activated before VFIO signaling mechanism is set
> >> using VFIO_DEVICE_SET_IRQS and unset while the signaling is disabled.
> >>
> >> ---
> >>
> >> This patch series has the following dependencies:
> >> - "ARM: Forwarding physical interrupts to a guest VM"
> >> (http://lwn.net/Articles/603514/)
> >> - [PATCH v9 00/19] VFIO support for platform and AMBA devices on ARM
> >> (http://www.spinics.net/lists/kvm-arm/msg11745.html)
> >> - [PATCH v2 0/6] vfio: type1: support for ARM SMMUS with
> >> VFIO_IOMMU_TYPE1
> >> (http://www.spinics.net/lists/kvm-arm/msg11738.html)
> >>
> >> Integrated pieces can be found at
> >> ssh://git.linaro.org/people/eric.auger/linux.git
> >> on branch irqfd_integ_v8
> >>
> >> This was was tested on Calxeda Midway, assigning the xgmac main IRQ.
> >>
> >
> > Hi Eric,
> >
> > Did you send out the latest QEMU part for this patch set, I notice that v6 of
> > The QEMU part is sent out, but seems some structure in this new version
> > has been changed, such as, struct kvm_arch_forwarded_irq (subindex is
> added
> > in this version), so a new patchset in QEMU is also needed.
>
> Hi Feng,
>
> v7 is available at:
> http://lists.gnu.org/archive/html/qemu-devel/2014-10/msg03804.html. It
> already illustrates KVM-VFIO device usage.
>
> v8 which will indeed integrate subindex addition will be delivered this
> week.
>
> Best Regards
>
> Eric

Thanks a lot for the information, Eric!

Thanks,
Feng

> >
> > Thanks,
> > Feng
> >
> >
> >> v2 -> v3:
> >> - kvm_fwd_irq_action enum replaced by a bool (KVM_VFIO_IRQ_CLEANUP
> does
> >> not
> >> exist anymore)
> >> - a new struct local to vfio.c was introduced to wrap kvm_fw_irq and make it
> >> linkable: kvm_vfio_fwd_irq_node
> >> - kvm_fwd_irq now is self-contained (includes struct vfio_device *)
> >> - a single list of kvm_vfio_fwd_irq_irq_node is used instead of having
> >> a list of devices and a list of forward irq per device. Having 2 lists
> >> brought extra complexity.
> >> - the VFIO device ref counter is incremented each time a new IRQ is
> forwarded.
> >> It is not attempted anymore to hold a single reference whatever the
> number
> >> of forwarded IRQs.
> >> - subindex added on top of index to be closer to VFIO API
> >> - platform device check moved in the arm specific implementation
> >> - enable the KVM-VFIO device for arm64
> >> - forwarded state change only can happen while the VFIO IRQ handler is not
> >> set; in other words, when the VFIO IRQ signaling is not set.
> >>
> >> v1 -> v2:
> >> - forward control is moved from architecture specific file into generic
> >> vfio.c module.
> >> only kvm_arch_set_fwd_state remains architecture specific
> >> - integrate Kim's patch which enables KVM-VFIO for ARM
> >> - fix vgic state bypass in vgic_queue_hwirq
> >> - struct kvm_arch_forwarded_irq moved from
> >> arch/arm/include/uapi/asm/kvm.h
> >> to include/uapi/linux/kvm.h
> >> also irq_index renamed into index and guest_irq renamed into gsi
> >> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> >> - vfio_external_get_base_device renamed into vfio_external_base_device
> >> - vfio_external_get_type removed
> >> - kvm_vfio_external_get_base_device renamed into
> >> kvm_vfio_external_base_device
> >> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
> >> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>
> >>
> >> Eric Auger (7):
> >> KVM: arm64: Enable the KVM-VFIO device
> >> VFIO: platform: forwarded state tested when selecting IRQ handler
> >> KVM: kvm-vfio: User API for IRQ forwarding
> >> VFIO: External user API device helpers
> >> KVM: kvm-vfio: wrapper to VFIO external API device helpers
> >> KVM: kvm-vfio: generic forwarding control
> >> KVM: arm: kvm-vfio: forwarding control
> >>
> >> Kim Phillips (1):
> >> KVM: arm: Enable the KVM-VFIO device
> >>
> >> Documentation/virtual/kvm/devices/vfio.txt | 34 +++-
> >> arch/arm/include/asm/kvm_host.h | 7 +
> >> arch/arm/kvm/Kconfig | 1 +
> >> arch/arm/kvm/Makefile | 4 +-
> >> arch/arm/kvm/kvm_vfio_arm.c | 101 ++++++++++
> >> arch/arm64/kvm/Kconfig | 1 +
> >> arch/arm64/kvm/Makefile | 2 +-
> >> drivers/vfio/platform/vfio_platform_irq.c | 7 +-
> >> drivers/vfio/vfio.c | 24 +++
> >> include/linux/kvm_host.h | 28 +++
> >> include/linux/vfio.h | 3 +
> >> include/uapi/linux/kvm.h | 10 +
> >> virt/kvm/vfio.c | 294
> >> ++++++++++++++++++++++++++++-
> >> 13 files changed, 503 insertions(+), 13 deletions(-)
> >> create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
> >>
> >> --
> >> 1.9.1
> >

2014-11-24 20:57:01

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM: kvm-vfio: wrapper to VFIO external API device helpers

On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
> Provide wrapper functions that allow KVM-VFIO device code to
> interact with a vfio device:
> - kvm_vfio_device_get_external_user gets a handle to a struct
> vfio_device from the vfio device file descriptor and increments
> its reference counter,
> - kvm_vfio_device_put_external_user decrements the reference counter
> to a vfio device,
> - kvm_vfio_external_base_device returns a handle to the struct device
> of the vfio device.
>
> The KVM-VFIO device uses the VFIO external API device functions.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v2 -> v3:
> reword the commit message and title
>
> v1 -> v2:
> - kvm_vfio_external_get_base_device renamed into
> kvm_vfio_external_base_device
> - kvm_vfio_external_get_type removed
> ---
> arch/arm/include/asm/kvm_host.h | 5 +++++
> virt/kvm/vfio.c | 45 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 53036e2..bca5b79 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -169,6 +169,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
> int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
>
> +struct vfio_device;
> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep);
> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev);
> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev);
> +

This doesn't look right, why doesn't kvm-vfio send the struct dev to the
arch callback? Then the below functions can be static. Nothing outside
of kvm-vfio device should need to do anything with vfio_device
references.

> /* We do not have shadow page tables, hence the empty hooks */
> static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> unsigned long end)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 620e37f..6f0cc34 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -60,6 +60,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
> symbol_put(vfio_group_put_external_user);
> }
>
> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
> +{
> + struct vfio_device *vdev;
> + struct vfio_device *(*fn)(struct file *);
> +
> + fn = symbol_get(vfio_device_get_external_user);
> + if (!fn)
> + return ERR_PTR(-EINVAL);
> +
> + vdev = fn(filep);
> +
> + symbol_put(vfio_device_get_external_user);
> +
> + return vdev;
> +}
> +
> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
> +{
> + void (*fn)(struct vfio_device *);
> +
> + fn = symbol_get(vfio_device_put_external_user);
> + if (!fn)
> + return;
> +
> + fn(vdev);
> +
> + symbol_put(vfio_device_put_external_user);
> +}
> +
> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
> +{
> + struct device *(*fn)(struct vfio_device *);
> + struct device *dev;
> +
> + fn = symbol_get(vfio_external_base_device);
> + if (!fn)
> + return NULL;
> +
> + dev = fn(vdev);
> +
> + symbol_put(vfio_external_base_device);
> +
> + return dev;
> +}
> +
> static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
> {
> long (*fn)(struct vfio_group *, unsigned long);


2014-11-24 20:57:06

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>
> This is a new control channel which enables KVM to cooperate with
> viable VFIO devices.
>
> Functions are introduced to check the validity of a VFIO device
> file descriptor, increment/decrement the ref counter of the VFIO
> device.
>
> The patch introduces 2 attributes for this new device group:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
> unset respectively unset the feature.
>
> The VFIO device stores a list of registered forwarded IRQs. The reference
> counter of the device is incremented each time a new IRQ is forwarded.
> Reference counter is decremented when the IRQ forwarding is unset.
>
> The forwarding programmming is architecture specific, implemented in
> kvm_arch_set_fwd_state function. Architecture specific implementation is
> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
> functions are void.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v2 -> v3:
> - add API comments in kvm_host.h
> - improve the commit message
> - create a private kvm_vfio_fwd_irq struct
> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
> latter action will be handled in vgic.
> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
> to move platform specific stuff in architecture specific code.
> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
> - increment the ref counter each time we do an IRQ forwarding and decrement
> this latter each time one IRQ forward is unset. Simplifies the whole
> ref counting.
> - simplification of list handling: create, search, removal
>
> v1 -> v2:
> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> - original patch file separated into 2 parts: generic part moved in vfio.c
> and ARM specific part(kvm_arch_set_fwd_state)
> ---
> include/linux/kvm_host.h | 28 ++++++
> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 274 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea53b04..0b9659d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
> unsigned long arg);
> };
>
> +/* internal self-contained structure describing a forwarded IRQ */
> +struct kvm_fwd_irq {
> + struct kvm *kvm; /* VM to inject the GSI into */
> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
> + __u32 index; /* VFIO device IRQ index */
> + __u32 subindex; /* VFIO device IRQ subindex */
> + __u32 gsi; /* gsi, ie. virtual IRQ number */
> +};
> +
> void kvm_device_get(struct kvm_device *dev);
> void kvm_device_put(struct kvm_device *dev);
> struct kvm_device *kvm_device_from_filp(struct file *filp);
> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
> extern struct kvm_device_ops kvm_mpic_ops;
> extern struct kvm_device_ops kvm_xics_ops;
>
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +/**
> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
> + *
> + * @fwd_irq: handle to the forwarded irq struct
> + * @forward: true means forwarded, false means not forwarded
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> + bool forward);

We could add a struct device* to the args list or into struct
kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
has no business dealing with references to the vfio_device.

> +
> +#else
> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> + bool forward)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>
> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 6f0cc34..af178bb 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
> struct vfio_group *vfio_group;
> };
>
> +/* private linkable kvm_fwd_irq struct */
> +struct kvm_vfio_fwd_irq_node {
> + struct list_head link;
> + struct kvm_fwd_irq fwd_irq;
> +};
> +
> struct kvm_vfio {
> struct list_head group_list;
> + /* list of registered VFIO forwarded IRQs */
> + struct list_head fwd_node_list;
> struct mutex lock;
> bool noncoherent;
> };
> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> return -ENXIO;
> }
>
> +/**
> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
> + *
> + * Checks it is a valid vfio device and increments its reference counter
> + * @fd: file descriptor of the vfio platform device
> + */
> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> +{
> + struct fd f = fdget(fd);
> + struct vfio_device *vdev;
> +
> + if (!f.file)
> + return NULL;

ERR_PTR(-EINVAL)?

ie. propagate errors from the point where they're encountered so we
don't need to make up an errno later.

> + vdev = kvm_vfio_device_get_external_user(f.file);
> + fdput(f);
> + return vdev;
> +}
> +
> +/**
> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
> + * vfio platform * device
> + *
> + * @vdev: vfio_device handle to release
> + */
> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> +{
> + kvm_vfio_device_put_external_user(vdev);
> +}
> +
> +/**
> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
> + * registered in the list of forwarded IRQs
> + *
> + * @kv: handle to the kvm-vfio device
> + * @fwd: handle to the forwarded irq struct
> + * In the positive returns the handle to its node in the kvm-vfio
> + * forwarded IRQ list, returns NULL otherwise.
> + * Must be called with kv->lock hold.
> + */
> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
> + struct kvm_vfio *kv,
> + struct kvm_fwd_irq *fwd)
> +{
> + struct kvm_vfio_fwd_irq_node *node;
> +
> + list_for_each_entry(node, &kv->fwd_node_list, link) {
> + if ((node->fwd_irq.index == fwd->index) &&
> + (node->fwd_irq.subindex == fwd->subindex) &&
> + (node->fwd_irq.vdev == fwd->vdev))
> + return node;
> + }
> + return NULL;
> +}
> +/**
> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
> + * forwarded IRQ
> + *
> + * @kv: handle to the kvm-vfio device
> + * @fwd: handle to the forwarded irq struct
> + * In case of success returns a handle to the new list node,
> + * NULL otherwise.
> + * Must be called with kv->lock hold.
> + */
> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
> + struct kvm_vfio *kv,
> + struct kvm_fwd_irq *fwd)
> +{
> + struct kvm_vfio_fwd_irq_node *node;
> +
> + node = kmalloc(sizeof(*node), GFP_KERNEL);
> + if (!node)
> + return NULL;

ERR_PTR(-ENOMEM)?

> +
> + node->fwd_irq = *fwd;
> +
> + list_add(&node->link, &kv->fwd_node_list);
> +
> + return node;
> +}
> +
> +/**
> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
> + *
> + * @node: handle to the node struct
> + * Must be called with kv->lock hold.
> + */
> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
> +{
> + list_del(&node->link);
> + kfree(node);
> +}
> +
> +/**
> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
> + * @kv: handle to the kvm-vfio device
> + * @fd: file descriptor of the vfio device the IRQ belongs to
> + * @fwd: handle to the forwarded irq struct
> + *
> + * Registers an IRQ as forwarded and calls the architecture specific
> + * implementation of set_forward. In case of operation failure, the IRQ
> + * is unregistered. In case of success, the vfio device ref counter is
> + * incremented.
> + */
> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
> + struct kvm_fwd_irq *fwd)
> +{
> + int ret;
> + struct kvm_vfio_fwd_irq_node *node =
> + kvm_vfio_find_fwd_irq(kv, fwd);
> +
> + if (node)
> + return -EINVAL;

I assume you're saving -EBUSY for arch code for the case where the IRQ
is already active?

> + node = kvm_vfio_register_fwd_irq(kv, fwd);
> + if (!node)
> + return -ENOMEM;

if (IS_ERR(node))
return PTR_ERR(node);

> + ret = kvm_arch_vfio_set_forward(fwd, true);
> + if (ret < 0) {
> + kvm_vfio_unregister_fwd_irq(node);
> + return ret;
> + }
> + /* increment the ref counter */
> + kvm_vfio_get_vfio_device(fd);

Wouldn't it be easier if the reference counting were coupled with the
register/unregister_fwd_irq? I'd be tempted to pass your user_fwd_irq
to this function instead of a kvm_fwd_irq.

> + return ret;
> +}
> +
> +/**
> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
> + * @kv: handle to the kvm-vfio device
> + * @fwd: handle to the forwarded irq struct
> + *
> + * Calls the architecture specific implementation of set_forward and
> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
> + * device reference counter.
> + */
> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
> + struct kvm_fwd_irq *fwd)
> +{
> + int ret;
> + struct kvm_vfio_fwd_irq_node *node =
> + kvm_vfio_find_fwd_irq(kv, fwd);
> + if (!node)
> + return -EINVAL;
> + ret = kvm_arch_vfio_set_forward(fwd, false);

Whoa, if the unforward fails we continue to undo everything else? How
does userspace cleanup from this? We need a guaranteed shutdown path
for cleanup code, we can never trust userspace to do things in the
correct order. Can we really preclude the user calling unforward with
an active IRQ? Maybe _forward() and _unforward() need to be separate
functions so that 'un' can be made void to indicate it can't fail.

> + kvm_vfio_unregister_fwd_irq(node);
> +
> + /* decrement the ref counter */
> + kvm_vfio_put_vfio_device(fwd->vdev);

Again, seems like the unregister should do this.

> + return ret;
> +}
> +
> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
> + int32_t __user *argp)
> +{
> + struct kvm_arch_forwarded_irq user_fwd_irq;
> + struct kvm_fwd_irq fwd;
> + struct vfio_device *vdev;
> + struct kvm_vfio *kv = kdev->private;
> + int ret;
> +
> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
> + return -EFAULT;
> +
> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
> + if (IS_ERR(vdev)) {
> + ret = PTR_ERR(vdev);
> + goto out;
> + }
> +
> + fwd.vdev = vdev;
> + fwd.kvm = kdev->kvm;
> + fwd.index = user_fwd_irq.index;
> + fwd.subindex = user_fwd_irq.subindex;
> + fwd.gsi = user_fwd_irq.gsi;
> +
> + switch (attr) {
> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> + mutex_lock(&kv->lock);
> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
> + mutex_unlock(&kv->lock);
> + break;
> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> + mutex_lock(&kv->lock);
> + ret = kvm_vfio_unset_forward(kv, &fwd);
> + mutex_unlock(&kv->lock);
> + break;
> + }
> +out:
> + kvm_vfio_put_vfio_device(vdev);

It might add a little extra code, but logically the reference being tied
to the register/unregister feels a bit cleaner than this.

> + return ret;
> +}
> +
> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> +{
> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> + int ret;
> +
> + switch (attr) {
> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
> + break;
> + default:
> + ret = -ENXIO;
> + }
> + return ret;
> +}
> +
> +/**
> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
> + * registered forwarded IRQs and free their list nodes.
> + * @kv: kvm-vfio device
> + *
> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
> + * void the lists and release the reference
> + */
> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
> +{
> + struct kvm_vfio_fwd_irq_node *node, *tmp;
> +
> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
> + }
> + return 0;

This shouldn't be able to fail, make it void.

> +}
> +
> static int kvm_vfio_set_attr(struct kvm_device *dev,
> struct kvm_device_attr *attr)
> {
> switch (attr->group) {
> case KVM_DEV_VFIO_GROUP:
> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> + case KVM_DEV_VFIO_DEVICE:
> + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> }
>
> return -ENXIO;
> @@ -268,10 +503,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> case KVM_DEV_VFIO_GROUP_DEL:
> return 0;
> }
> -
> break;
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> + case KVM_DEV_VFIO_DEVICE:
> + switch (attr->attr) {
> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> + return 0;
> + }
> + break;
> +#endif
> }
> -
> return -ENXIO;
> }
>
> @@ -285,7 +527,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> list_del(&kvg->node);
> kfree(kvg);
> }
> -
> + kvm_vfio_clean_fwd_irq(kv);
> kvm_vfio_update_coherency(dev);
>
> kfree(kv);
> @@ -317,6 +559,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> return -ENOMEM;
>
> INIT_LIST_HEAD(&kv->group_list);
> + INIT_LIST_HEAD(&kv->fwd_node_list);
> mutex_init(&kv->lock);
>
> dev->private = kv;


2014-11-24 20:57:18

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] KVM: arm: kvm-vfio: forwarding control

On Sun, 2014-11-23 at 19:36 +0100, Eric Auger wrote:
> This patch sets __KVM_HAVE_ARCH_KVM_VFIO_FORWARD and implements
> kvm_arch_vfio_set_forward for ARM.
>
> As a result the KVM-VFIO device now allows to forward/unforward a
> VFIO device IRQ on ARM.
>
> kvm_arch_vfio_set_forward programs both genirq and the VGIC to control
> where the physical IRQ deactivation is initiated.
> - forwarded case: deactivation is initiated by the guest; when it
> completes the virtual IRQ, the GIC automatically deactivates the
> physical IRQ.
> - not forwarded case: the physical IRQ deactivation is handled by the host
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v2 -> v3:
> - renaming of kvm_arch_set_fwd_state into kvm_arch_vfio_set_forward
> - takes a bool arg instead of kvm_fwd_irq_action enum
> - removal of KVM_VFIO_IRQ_CLEANUP
> - platform device check now happens here
> - more precise errors returned
> - irq_eoi handled externally to this patch (VGIC)
> - correct enable_irq bug done twice
> - reword the commit message
> - correct check of platform_bus_type
> - use raw_spin_lock_irqsave and check the validity of the handler
> ---
> arch/arm/include/asm/kvm_host.h | 2 +
> arch/arm/kvm/Makefile | 2 +-
> arch/arm/kvm/kvm_vfio_arm.c | 101 ++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 104 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm/kvm/kvm_vfio_arm.c
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index bca5b79..447f90c 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -27,6 +27,8 @@
> #include <asm/fpstate.h>
> #include <kvm/arm_arch_timer.h>
>
> +#define __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +
> #if defined(CONFIG_KVM_ARM_MAX_VCPUS)
> #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS
> #else
> diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
> index ea1fa76..26a5a42 100644
> --- a/arch/arm/kvm/Makefile
> +++ b/arch/arm/kvm/Makefile
> @@ -19,7 +19,7 @@ kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vf
>
> obj-y += kvm-arm.o init.o interrupts.o
> obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o
> -obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
> +obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o kvm_vfio_arm.o
> obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic.o
> obj-$(CONFIG_KVM_ARM_VGIC) += $(KVM)/arm/vgic-v2.o
> obj-$(CONFIG_KVM_ARM_TIMER) += $(KVM)/arm/arch_timer.o
> diff --git a/arch/arm/kvm/kvm_vfio_arm.c b/arch/arm/kvm/kvm_vfio_arm.c
> new file mode 100644
> index 0000000..af2c501
> --- /dev/null
> +++ b/arch/arm/kvm/kvm_vfio_arm.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright (C) 2014 Linaro Ltd.
> + * Authors: Eric Auger <[email protected]>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License, version 2, as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/kvm_host.h>
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/vfio.h>
> +#include <linux/irq.h>
> +#include <asm/kvm_host.h>
> +#include <asm/kvm.h>
> +#include <linux/irq.h>
> +#include <linux/spinlock.h>
> +#include <linux/platform_device.h>
> +#include <linux/interrupt.h>
> +
> +/**
> + * kvm_arch_vfio_set_forward - Change the forward state of an IRQ
> + *
> + * @fwd_irq: handle to the forward irq struct
> + * @forward: target forwarding state
> + *
> + * If forward is true, programs genirq and VGIC so that physical IRQ
> + * deactivation ownership is transferred to the guest (using GIC HW feature).
> + * When forward is false, standard behavior is restored, ie. host
> + * deactivates the physical IRQ.
> + * returns:
> + * -EINVAL if the vfio device is not a platform device
> + * -ENOENT if the irq could not be identified
> + * -EBUSY if physical IRQ is in progress
> + * -ENOENT if the VGIC has a physical/virtual IRQ mapping that is not
> + * consistent with the request.
> + */
> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> + bool forward)
> +{
> + int hwirq;
> + int ret = -EBUSY;
> + struct irq_desc *desc;
> + struct irq_data *d;
> + struct platform_device *platdev;
> + struct device *dev = kvm_vfio_external_base_device(fwd_irq->vdev);

Just pass the dev

> + unsigned long flags;
> + /*
> + * We don't have to garantee the vcpu handle is non void since the

s/garantee/guarantee/

> + * vfio device holds a reference to the kvm struct
> + */
> + struct kvm_vcpu *vcpu = kvm_get_vcpu(fwd_irq->kvm, 0);
> +
> + if (dev->bus == &platform_bus_type) {
> + platdev = to_platform_device(dev);
> + hwirq = platform_get_irq(platdev, fwd_irq->index);
> + if (hwirq < 0)
> + return -EINVAL;
> + } else
> + return -ENOENT;
> + desc = irq_to_desc(hwirq);
> +
> + /*
> + * if VFIO handler is already set, forwarded state cannot be
> + * changed anymore
> + */
> + raw_spin_lock_irqsave(&desc->lock, flags);
> + if (desc->action)
> + goto end;
> +
> + d = &desc->irq_data;
> +
> + if (forward) {
> + irqd_set_irq_forwarded(d);
> + raw_spin_unlock_irqrestore(&desc->lock, flags);
> + ret = vgic_map_phys_irq(vcpu,
> + fwd_irq->gsi + VGIC_NR_PRIVATE_IRQS,
> + hwirq);
> + } else {
> + irqd_clr_irq_forwarded(d);
> + raw_spin_unlock_irqrestore(&desc->lock, flags);
> + ret = vgic_unmap_phys_irq(vcpu,
> + fwd_irq->gsi +
> + VGIC_NR_PRIVATE_IRQS,
> + hwirq);
> + }
> + return ret;
> +
> +end:
> + raw_spin_unlock_irqrestore(&desc->lock, flags);
> + return ret;
> +}


2014-11-25 04:34:43

by Wu, Feng

[permalink] [raw]
Subject: RE: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control



> -----Original Message-----
> From: Eric Auger [mailto:[email protected]]
> Sent: Monday, November 24, 2014 2:36 AM
> To: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Wu, Feng
> Subject: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control
>
> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>
> This is a new control channel which enables KVM to cooperate with
> viable VFIO devices.
>
> Functions are introduced to check the validity of a VFIO device
> file descriptor, increment/decrement the ref counter of the VFIO
> device.
>
> The patch introduces 2 attributes for this new device group:
> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
> unset respectively unset the feature.
>
> The VFIO device stores a list of registered forwarded IRQs. The reference
> counter of the device is incremented each time a new IRQ is forwarded.
> Reference counter is decremented when the IRQ forwarding is unset.
>
> The forwarding programmming is architecture specific, implemented in
> kvm_arch_set_fwd_state function. Architecture specific implementation is
> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not
> set those
> functions are void.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v2 -> v3:
> - add API comments in kvm_host.h
> - improve the commit message
> - create a private kvm_vfio_fwd_irq struct
> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
> latter action will be handled in vgic.
> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
> to move platform specific stuff in architecture specific code.
> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
> - increment the ref counter each time we do an IRQ forwarding and decrement
> this latter each time one IRQ forward is unset. Simplifies the whole
> ref counting.
> - simplification of list handling: create, search, removal
>
> v1 -> v2:
> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> - original patch file separated into 2 parts: generic part moved in vfio.c
> and ARM specific part(kvm_arch_set_fwd_state)
> ---
> include/linux/kvm_host.h | 28 ++++++
> virt/kvm/vfio.c | 249
> ++++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 274 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea53b04..0b9659d 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
> unsigned long arg);
> };
>
> +/* internal self-contained structure describing a forwarded IRQ */
> +struct kvm_fwd_irq {
> + struct kvm *kvm; /* VM to inject the GSI into */
> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
> + __u32 index; /* VFIO device IRQ index */
> + __u32 subindex; /* VFIO device IRQ subindex */
> + __u32 gsi; /* gsi, ie. virtual IRQ number */
> +};
> +
> void kvm_device_get(struct kvm_device *dev);
> void kvm_device_put(struct kvm_device *dev);
> struct kvm_device *kvm_device_from_filp(struct file *filp);
> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
> extern struct kvm_device_ops kvm_mpic_ops;
> extern struct kvm_device_ops kvm_xics_ops;
>
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> +/**
> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
> + *
> + * @fwd_irq: handle to the forwarded irq struct
> + * @forward: true means forwarded, false means not forwarded
> + * returns 0 on success, < 0 on failure
> + */
> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> + bool forward);
> +
> +#else
> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> + bool forward)
> +{
> + return 0;
> +}
> +#endif
> +
> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>
> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 6f0cc34..af178bb 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
> struct vfio_group *vfio_group;
> };
>
> +/* private linkable kvm_fwd_irq struct */
> +struct kvm_vfio_fwd_irq_node {
> + struct list_head link;
> + struct kvm_fwd_irq fwd_irq;
> +};
> +
> struct kvm_vfio {
> struct list_head group_list;
> + /* list of registered VFIO forwarded IRQs */
> + struct list_head fwd_node_list;
> struct mutex lock;
> bool noncoherent;
> };
> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device
> *dev, long attr, u64 arg)
> return -ENXIO;
> }
>
> +/**
> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
> + *
> + * Checks it is a valid vfio device and increments its reference counter
> + * @fd: file descriptor of the vfio platform device
> + */
> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> +{
> + struct fd f = fdget(fd);
> + struct vfio_device *vdev;
> +
> + if (!f.file)
> + return NULL;
> + vdev = kvm_vfio_device_get_external_user(f.file);
> + fdput(f);
> + return vdev;
> +}
> +
> +/**
> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
> + * vfio platform * device
> + *
> + * @vdev: vfio_device handle to release
> + */
> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> +{
> + kvm_vfio_device_put_external_user(vdev);
> +}
> +
> +/**
> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
> + * registered in the list of forwarded IRQs
> + *
> + * @kv: handle to the kvm-vfio device
> + * @fwd: handle to the forwarded irq struct
> + * In the positive returns the handle to its node in the kvm-vfio
> + * forwarded IRQ list, returns NULL otherwise.
> + * Must be called with kv->lock hold.
> + */
> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
> + struct kvm_vfio *kv,
> + struct kvm_fwd_irq *fwd)
> +{
> + struct kvm_vfio_fwd_irq_node *node;
> +
> + list_for_each_entry(node, &kv->fwd_node_list, link) {
> + if ((node->fwd_irq.index == fwd->index) &&
> + (node->fwd_irq.subindex == fwd->subindex) &&
> + (node->fwd_irq.vdev == fwd->vdev))
> + return node;
> + }
> + return NULL;
> +}
> +/**
> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
> + * forwarded IRQ
> + *
> + * @kv: handle to the kvm-vfio device
> + * @fwd: handle to the forwarded irq struct
> + * In case of success returns a handle to the new list node,
> + * NULL otherwise.
> + * Must be called with kv->lock hold.
> + */
> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
> + struct kvm_vfio *kv,
> + struct kvm_fwd_irq *fwd)
> +{
> + struct kvm_vfio_fwd_irq_node *node;
> +
> + node = kmalloc(sizeof(*node), GFP_KERNEL);
> + if (!node)
> + return NULL;
> +
> + node->fwd_irq = *fwd;
> +
> + list_add(&node->link, &kv->fwd_node_list);
> +
> + return node;
> +}
> +
> +/**
> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
> + *
> + * @node: handle to the node struct
> + * Must be called with kv->lock hold.
> + */
> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node
> *node)
> +{
> + list_del(&node->link);
> + kfree(node);
> +}
> +
> +/**
> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
> + * @kv: handle to the kvm-vfio device
> + * @fd: file descriptor of the vfio device the IRQ belongs to
> + * @fwd: handle to the forwarded irq struct
> + *
> + * Registers an IRQ as forwarded and calls the architecture specific
> + * implementation of set_forward. In case of operation failure, the IRQ
> + * is unregistered. In case of success, the vfio device ref counter is
> + * incremented.
> + */
> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
> + struct kvm_fwd_irq *fwd)
> +{
> + int ret;
> + struct kvm_vfio_fwd_irq_node *node =
> + kvm_vfio_find_fwd_irq(kv, fwd);
> +
> + if (node)
> + return -EINVAL;
> + node = kvm_vfio_register_fwd_irq(kv, fwd);
> + if (!node)
> + return -ENOMEM;
> + ret = kvm_arch_vfio_set_forward(fwd, true);
> + if (ret < 0) {
> + kvm_vfio_unregister_fwd_irq(node);
> + return ret;
> + }
> + /* increment the ref counter */
> + kvm_vfio_get_vfio_device(fd);
> + return ret;
> +}
> +
> +/**
> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
> + * @kv: handle to the kvm-vfio device
> + * @fwd: handle to the forwarded irq struct
> + *
> + * Calls the architecture specific implementation of set_forward and
> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
> + * device reference counter.
> + */
> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
> + struct kvm_fwd_irq *fwd)
> +{
> + int ret;
> + struct kvm_vfio_fwd_irq_node *node =
> + kvm_vfio_find_fwd_irq(kv, fwd);
> + if (!node)
> + return -EINVAL;
> + ret = kvm_arch_vfio_set_forward(fwd, false);
> + kvm_vfio_unregister_fwd_irq(node);
> +
> + /* decrement the ref counter */
> + kvm_vfio_put_vfio_device(fwd->vdev);
> + return ret;
> +}
> +
> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
> + int32_t __user *argp)
> +{
> + struct kvm_arch_forwarded_irq user_fwd_irq;
> + struct kvm_fwd_irq fwd;
> + struct vfio_device *vdev;
> + struct kvm_vfio *kv = kdev->private;
> + int ret;
> +
> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
> + return -EFAULT;
> +
> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
> + if (IS_ERR(vdev)) {
> + ret = PTR_ERR(vdev);
> + goto out;
> + }
> +
> + fwd.vdev = vdev;
> + fwd.kvm = kdev->kvm;
> + fwd.index = user_fwd_irq.index;
> + fwd.subindex = user_fwd_irq.subindex;
> + fwd.gsi = user_fwd_irq.gsi;
> +
> + switch (attr) {
> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> + mutex_lock(&kv->lock);
> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
> + mutex_unlock(&kv->lock);
> + break;
> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> + mutex_lock(&kv->lock);
> + ret = kvm_vfio_unset_forward(kv, &fwd);
> + mutex_unlock(&kv->lock);
> + break;
> + }
> +out:
> + kvm_vfio_put_vfio_device(vdev);
> + return ret;
> +}
> +
> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> +{
> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> + int ret;
> +
> + switch (attr) {
> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
> + break;
> + default:
> + ret = -ENXIO;
> + }
> + return ret;
> +}
> +
> +/**
> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
> + * registered forwarded IRQs and free their list nodes.
> + * @kv: kvm-vfio device
> + *
> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
> + * void the lists and release the reference
> + */
> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
> +{
> + struct kvm_vfio_fwd_irq_node *node, *tmp;
> +
> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
> + }
> + return 0;
> +}
> +
> static int kvm_vfio_set_attr(struct kvm_device *dev,
> struct kvm_device_attr *attr)
> {
> switch (attr->group) {
> case KVM_DEV_VFIO_GROUP:
> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> + case KVM_DEV_VFIO_DEVICE:
> + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> }
>
> return -ENXIO;
> @@ -268,10 +503,17 @@ static int kvm_vfio_has_attr(struct kvm_device
> *dev,
> case KVM_DEV_VFIO_GROUP_DEL:
> return 0;
> }
> -
> break;
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> + case KVM_DEV_VFIO_DEVICE:
> + switch (attr->attr) {
> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> + return 0;
> + }
> + break;
> +#endif

Hi Eric,

Can you make the above code like this since group KVM_DEV_VFIO_DEVICE
will be used by posted interrupts as well, and new attributes will be added
in this group.

+ case KVM_DEV_VFIO_DEVICE:
+ switch (attr->attr) {
+#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
+ case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
+ case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
+ return 0;
+#endif
+ }
+ break;

Thanks,
Feng


> }
> -
> return -ENXIO;
> }
>
> @@ -285,7 +527,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> list_del(&kvg->node);
> kfree(kvg);
> }
> -
> + kvm_vfio_clean_fwd_irq(kv);
> kvm_vfio_update_coherency(dev);
>
> kfree(kv);
> @@ -317,6 +559,7 @@ static int kvm_vfio_create(struct kvm_device *dev,
> u32 type)
> return -ENOMEM;
>
> INIT_LIST_HEAD(&kv->group_list);
> + INIT_LIST_HEAD(&kv->fwd_node_list);
> mutex_init(&kv->lock);
>
> dev->private = kv;
> --
> 1.9.1

2014-11-25 13:41:26

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On 11/25/2014 05:33 AM, Wu, Feng wrote:
>
>
>> -----Original Message-----
>> From: Eric Auger [mailto:[email protected]]
>> Sent: Monday, November 24, 2014 2:36 AM
>> To: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]
>> Cc: [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; Wu, Feng
>> Subject: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control
>>
>> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>>
>> This is a new control channel which enables KVM to cooperate with
>> viable VFIO devices.
>>
>> Functions are introduced to check the validity of a VFIO device
>> file descriptor, increment/decrement the ref counter of the VFIO
>> device.
>>
>> The patch introduces 2 attributes for this new device group:
>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
>> unset respectively unset the feature.
>>
>> The VFIO device stores a list of registered forwarded IRQs. The reference
>> counter of the device is incremented each time a new IRQ is forwarded.
>> Reference counter is decremented when the IRQ forwarding is unset.
>>
>> The forwarding programmming is architecture specific, implemented in
>> kvm_arch_set_fwd_state function. Architecture specific implementation is
>> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not
>> set those
>> functions are void.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v2 -> v3:
>> - add API comments in kvm_host.h
>> - improve the commit message
>> - create a private kvm_vfio_fwd_irq struct
>> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
>> latter action will be handled in vgic.
>> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
>> to move platform specific stuff in architecture specific code.
>> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
>> - increment the ref counter each time we do an IRQ forwarding and decrement
>> this latter each time one IRQ forward is unset. Simplifies the whole
>> ref counting.
>> - simplification of list handling: create, search, removal
>>
>> v1 -> v2:
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into
>> __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> - original patch file separated into 2 parts: generic part moved in vfio.c
>> and ARM specific part(kvm_arch_set_fwd_state)
>> ---
>> include/linux/kvm_host.h | 28 ++++++
>> virt/kvm/vfio.c | 249
>> ++++++++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 274 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ea53b04..0b9659d 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
>> unsigned long arg);
>> };
>>
>> +/* internal self-contained structure describing a forwarded IRQ */
>> +struct kvm_fwd_irq {
>> + struct kvm *kvm; /* VM to inject the GSI into */
>> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
>> + __u32 index; /* VFIO device IRQ index */
>> + __u32 subindex; /* VFIO device IRQ subindex */
>> + __u32 gsi; /* gsi, ie. virtual IRQ number */
>> +};
>> +
>> void kvm_device_get(struct kvm_device *dev);
>> void kvm_device_put(struct kvm_device *dev);
>> struct kvm_device *kvm_device_from_filp(struct file *filp);
>> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
>> extern struct kvm_device_ops kvm_mpic_ops;
>> extern struct kvm_device_ops kvm_xics_ops;
>>
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +/**
>> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
>> + *
>> + * @fwd_irq: handle to the forwarded irq struct
>> + * @forward: true means forwarded, false means not forwarded
>> + * returns 0 on success, < 0 on failure
>> + */
>> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>> + bool forward);
>> +
>> +#else
>> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>> + bool forward)
>> +{
>> + return 0;
>> +}
>> +#endif
>> +
>> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>
>> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 6f0cc34..af178bb 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
>> struct vfio_group *vfio_group;
>> };
>>
>> +/* private linkable kvm_fwd_irq struct */
>> +struct kvm_vfio_fwd_irq_node {
>> + struct list_head link;
>> + struct kvm_fwd_irq fwd_irq;
>> +};
>> +
>> struct kvm_vfio {
>> struct list_head group_list;
>> + /* list of registered VFIO forwarded IRQs */
>> + struct list_head fwd_node_list;
>> struct mutex lock;
>> bool noncoherent;
>> };
>> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device
>> *dev, long attr, u64 arg)
>> return -ENXIO;
>> }
>>
>> +/**
>> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
>> + *
>> + * Checks it is a valid vfio device and increments its reference counter
>> + * @fd: file descriptor of the vfio platform device
>> + */
>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>> +{
>> + struct fd f = fdget(fd);
>> + struct vfio_device *vdev;
>> +
>> + if (!f.file)
>> + return NULL;
>> + vdev = kvm_vfio_device_get_external_user(f.file);
>> + fdput(f);
>> + return vdev;
>> +}
>> +
>> +/**
>> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
>> + * vfio platform * device
>> + *
>> + * @vdev: vfio_device handle to release
>> + */
>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>> +{
>> + kvm_vfio_device_put_external_user(vdev);
>> +}
>> +
>> +/**
>> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
>> + * registered in the list of forwarded IRQs
>> + *
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + * In the positive returns the handle to its node in the kvm-vfio
>> + * forwarded IRQ list, returns NULL otherwise.
>> + * Must be called with kv->lock hold.
>> + */
>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
>> + struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node;
>> +
>> + list_for_each_entry(node, &kv->fwd_node_list, link) {
>> + if ((node->fwd_irq.index == fwd->index) &&
>> + (node->fwd_irq.subindex == fwd->subindex) &&
>> + (node->fwd_irq.vdev == fwd->vdev))
>> + return node;
>> + }
>> + return NULL;
>> +}
>> +/**
>> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
>> + * forwarded IRQ
>> + *
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + * In case of success returns a handle to the new list node,
>> + * NULL otherwise.
>> + * Must be called with kv->lock hold.
>> + */
>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
>> + struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node;
>> +
>> + node = kmalloc(sizeof(*node), GFP_KERNEL);
>> + if (!node)
>> + return NULL;
>> +
>> + node->fwd_irq = *fwd;
>> +
>> + list_add(&node->link, &kv->fwd_node_list);
>> +
>> + return node;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
>> + *
>> + * @node: handle to the node struct
>> + * Must be called with kv->lock hold.
>> + */
>> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node
>> *node)
>> +{
>> + list_del(&node->link);
>> + kfree(node);
>> +}
>> +
>> +/**
>> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
>> + * @kv: handle to the kvm-vfio device
>> + * @fd: file descriptor of the vfio device the IRQ belongs to
>> + * @fwd: handle to the forwarded irq struct
>> + *
>> + * Registers an IRQ as forwarded and calls the architecture specific
>> + * implementation of set_forward. In case of operation failure, the IRQ
>> + * is unregistered. In case of success, the vfio device ref counter is
>> + * incremented.
>> + */
>> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + int ret;
>> + struct kvm_vfio_fwd_irq_node *node =
>> + kvm_vfio_find_fwd_irq(kv, fwd);
>> +
>> + if (node)
>> + return -EINVAL;
>> + node = kvm_vfio_register_fwd_irq(kv, fwd);
>> + if (!node)
>> + return -ENOMEM;
>> + ret = kvm_arch_vfio_set_forward(fwd, true);
>> + if (ret < 0) {
>> + kvm_vfio_unregister_fwd_irq(node);
>> + return ret;
>> + }
>> + /* increment the ref counter */
>> + kvm_vfio_get_vfio_device(fd);
>> + return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + *
>> + * Calls the architecture specific implementation of set_forward and
>> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
>> + * device reference counter.
>> + */
>> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + int ret;
>> + struct kvm_vfio_fwd_irq_node *node =
>> + kvm_vfio_find_fwd_irq(kv, fwd);
>> + if (!node)
>> + return -EINVAL;
>> + ret = kvm_arch_vfio_set_forward(fwd, false);
>> + kvm_vfio_unregister_fwd_irq(node);
>> +
>> + /* decrement the ref counter */
>> + kvm_vfio_put_vfio_device(fwd->vdev);
>> + return ret;
>> +}
>> +
>> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
>> + int32_t __user *argp)
>> +{
>> + struct kvm_arch_forwarded_irq user_fwd_irq;
>> + struct kvm_fwd_irq fwd;
>> + struct vfio_device *vdev;
>> + struct kvm_vfio *kv = kdev->private;
>> + int ret;
>> +
>> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
>> + return -EFAULT;
>> +
>> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
>> + if (IS_ERR(vdev)) {
>> + ret = PTR_ERR(vdev);
>> + goto out;
>> + }
>> +
>> + fwd.vdev = vdev;
>> + fwd.kvm = kdev->kvm;
>> + fwd.index = user_fwd_irq.index;
>> + fwd.subindex = user_fwd_irq.subindex;
>> + fwd.gsi = user_fwd_irq.gsi;
>> +
>> + switch (attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + mutex_lock(&kv->lock);
>> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
>> + mutex_unlock(&kv->lock);
>> + break;
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + mutex_lock(&kv->lock);
>> + ret = kvm_vfio_unset_forward(kv, &fwd);
>> + mutex_unlock(&kv->lock);
>> + break;
>> + }
>> +out:
>> + kvm_vfio_put_vfio_device(vdev);
>> + return ret;
>> +}
>> +
>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>> +{
>> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>> + int ret;
>> +
>> + switch (attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
>> + break;
>> + default:
>> + ret = -ENXIO;
>> + }
>> + return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
>> + * registered forwarded IRQs and free their list nodes.
>> + * @kv: kvm-vfio device
>> + *
>> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
>> + * void the lists and release the reference
>> + */
>> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node, *tmp;
>> +
>> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
>> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
>> + }
>> + return 0;
>> +}
>> +
>> static int kvm_vfio_set_attr(struct kvm_device *dev,
>> struct kvm_device_attr *attr)
>> {
>> switch (attr->group) {
>> case KVM_DEV_VFIO_GROUP:
>> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>> + case KVM_DEV_VFIO_DEVICE:
>> + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>> }
>>
>> return -ENXIO;
>> @@ -268,10 +503,17 @@ static int kvm_vfio_has_attr(struct kvm_device
>> *dev,
>> case KVM_DEV_VFIO_GROUP_DEL:
>> return 0;
>> }
>> -
>> break;
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> + case KVM_DEV_VFIO_DEVICE:
>> + switch (attr->attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + return 0;
>> + }
>> + break;
>> +#endif
>
> Hi Eric,
>
> Can you make the above code like this since group KVM_DEV_VFIO_DEVICE
> will be used by posted interrupts as well, and new attributes will be added
> in this group.
>
> + case KVM_DEV_VFIO_DEVICE:
> + switch (attr->attr) {
> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> + return 0;
> +#endif
> + }
> + break;
>
> Thanks,
> Feng
Hi Feng,

yes sure!

Best Regards

Eric
>
>
>> }
>> -
>> return -ENXIO;
>> }
>>
>> @@ -285,7 +527,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>> list_del(&kvg->node);
>> kfree(kvg);
>> }
>> -
>> + kvm_vfio_clean_fwd_irq(kv);
>> kvm_vfio_update_coherency(dev);
>>
>> kfree(kv);
>> @@ -317,6 +559,7 @@ static int kvm_vfio_create(struct kvm_device *dev,
>> u32 type)
>> return -ENOMEM;
>>
>> INIT_LIST_HEAD(&kv->group_list);
>> + INIT_LIST_HEAD(&kv->fwd_node_list);
>> mutex_init(&kv->lock);
>>
>> dev->private = kv;
>> --
>> 1.9.1
>

2014-11-25 18:21:50

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On 11/24/2014 09:56 PM, Alex Williamson wrote:
> On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
>> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>>
>> This is a new control channel which enables KVM to cooperate with
>> viable VFIO devices.
>>
>> Functions are introduced to check the validity of a VFIO device
>> file descriptor, increment/decrement the ref counter of the VFIO
>> device.
>>
>> The patch introduces 2 attributes for this new device group:
>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
>> unset respectively unset the feature.
>>
>> The VFIO device stores a list of registered forwarded IRQs. The reference
>> counter of the device is incremented each time a new IRQ is forwarded.
>> Reference counter is decremented when the IRQ forwarding is unset.
>>
>> The forwarding programmming is architecture specific, implemented in
>> kvm_arch_set_fwd_state function. Architecture specific implementation is
>> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
>> functions are void.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v2 -> v3:
>> - add API comments in kvm_host.h
>> - improve the commit message
>> - create a private kvm_vfio_fwd_irq struct
>> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
>> latter action will be handled in vgic.
>> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
>> to move platform specific stuff in architecture specific code.
>> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
>> - increment the ref counter each time we do an IRQ forwarding and decrement
>> this latter each time one IRQ forward is unset. Simplifies the whole
>> ref counting.
>> - simplification of list handling: create, search, removal
>>
>> v1 -> v2:
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> - original patch file separated into 2 parts: generic part moved in vfio.c
>> and ARM specific part(kvm_arch_set_fwd_state)
>> ---
>> include/linux/kvm_host.h | 28 ++++++
>> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 274 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ea53b04..0b9659d 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
>> unsigned long arg);
>> };
>>
>> +/* internal self-contained structure describing a forwarded IRQ */
>> +struct kvm_fwd_irq {
>> + struct kvm *kvm; /* VM to inject the GSI into */
>> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
>> + __u32 index; /* VFIO device IRQ index */
>> + __u32 subindex; /* VFIO device IRQ subindex */
>> + __u32 gsi; /* gsi, ie. virtual IRQ number */
>> +};
>> +
>> void kvm_device_get(struct kvm_device *dev);
>> void kvm_device_put(struct kvm_device *dev);
>> struct kvm_device *kvm_device_from_filp(struct file *filp);
>> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
>> extern struct kvm_device_ops kvm_mpic_ops;
>> extern struct kvm_device_ops kvm_xics_ops;
>>
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +/**
>> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
>> + *
>> + * @fwd_irq: handle to the forwarded irq struct
>> + * @forward: true means forwarded, false means not forwarded
>> + * returns 0 on success, < 0 on failure
>> + */
>> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>> + bool forward);
>
> We could add a struct device* to the args list or into struct
> kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
> has no business dealing with references to the vfio_device.
Hi Alex,

Currently It can't put struct device* into the kvm_fwd_irq struct since
I need to release the vfio_device with
vfio_device_put_external_user(struct vfio_device *vdev)
typically in kvm_vfio_clean_fwd_irq. So I need to store the pointers to
the vfio_device somewhere.

I see 2 solutions: change the proto of
vfio_device_put_external_user(struct vfio_device *vdev) and pass a
struct device* (??)

or change the proto of kvm_arch_vfio_set_forward into

kvm_arch_vfio_set_forward(struct kvm *kvm, struct device *dev, int
index, [int subindex], int gsi, bool forward) or using index/start/count
but loosing the interest of having a self-contained internal struct.

>
>> +
>> +#else
>> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>> + bool forward)
>> +{
>> + return 0;
>> +}
>> +#endif
>> +
>> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>
>> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 6f0cc34..af178bb 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
>> struct vfio_group *vfio_group;
>> };
>>
>> +/* private linkable kvm_fwd_irq struct */
>> +struct kvm_vfio_fwd_irq_node {
>> + struct list_head link;
>> + struct kvm_fwd_irq fwd_irq;
>> +};
>> +
>> struct kvm_vfio {
>> struct list_head group_list;
>> + /* list of registered VFIO forwarded IRQs */
>> + struct list_head fwd_node_list;
>> struct mutex lock;
>> bool noncoherent;
>> };
>> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>> return -ENXIO;
>> }
>>
>> +/**
>> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
>> + *
>> + * Checks it is a valid vfio device and increments its reference counter
>> + * @fd: file descriptor of the vfio platform device
>> + */
>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>> +{
>> + struct fd f = fdget(fd);
>> + struct vfio_device *vdev;
>> +
>> + if (!f.file)
>> + return NULL;
>
> ERR_PTR(-EINVAL)?
>
> ie. propagate errors from the point where they're encountered so we
> don't need to make up an errno later.
yes thanks
>
>> + vdev = kvm_vfio_device_get_external_user(f.file);
>> + fdput(f);
>> + return vdev;
>> +}
>> +
>> +/**
>> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
>> + * vfio platform * device
>> + *
>> + * @vdev: vfio_device handle to release
>> + */
>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>> +{
>> + kvm_vfio_device_put_external_user(vdev);
>> +}
>> +
>> +/**
>> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
>> + * registered in the list of forwarded IRQs
>> + *
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + * In the positive returns the handle to its node in the kvm-vfio
>> + * forwarded IRQ list, returns NULL otherwise.
>> + * Must be called with kv->lock hold.
>> + */
>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
>> + struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node;
>> +
>> + list_for_each_entry(node, &kv->fwd_node_list, link) {
>> + if ((node->fwd_irq.index == fwd->index) &&
>> + (node->fwd_irq.subindex == fwd->subindex) &&
>> + (node->fwd_irq.vdev == fwd->vdev))
>> + return node;
>> + }
>> + return NULL;
>> +}
>> +/**
>> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
>> + * forwarded IRQ
>> + *
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + * In case of success returns a handle to the new list node,
>> + * NULL otherwise.
>> + * Must be called with kv->lock hold.
>> + */
>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
>> + struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node;
>> +
>> + node = kmalloc(sizeof(*node), GFP_KERNEL);
>> + if (!node)
>> + return NULL;
>
> ERR_PTR(-ENOMEM)?
>
OK
>> +
>> + node->fwd_irq = *fwd;
>> +
>> + list_add(&node->link, &kv->fwd_node_list);
>> +
>> + return node;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
>> + *
>> + * @node: handle to the node struct
>> + * Must be called with kv->lock hold.
>> + */
>> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
>> +{
>> + list_del(&node->link);
>> + kfree(node);
>> +}
>> +
>> +/**
>> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
>> + * @kv: handle to the kvm-vfio device
>> + * @fd: file descriptor of the vfio device the IRQ belongs to
>> + * @fwd: handle to the forwarded irq struct
>> + *
>> + * Registers an IRQ as forwarded and calls the architecture specific
>> + * implementation of set_forward. In case of operation failure, the IRQ
>> + * is unregistered. In case of success, the vfio device ref counter is
>> + * incremented.
>> + */
>> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + int ret;
>> + struct kvm_vfio_fwd_irq_node *node =
>> + kvm_vfio_find_fwd_irq(kv, fwd);
>> +
>> + if (node)
>> + return -EINVAL;
>
> I assume you're saving -EBUSY for arch code for the case where the IRQ
> is already active?
yes. -EBUSY now corresponds to the case where the VFIO signaling is
already setup.
>
>> + node = kvm_vfio_register_fwd_irq(kv, fwd);
>> + if (!node)
>> + return -ENOMEM;
>
> if (IS_ERR(node))
> return PTR_ERR(node);
>
>> + ret = kvm_arch_vfio_set_forward(fwd, true);
>> + if (ret < 0) {
>> + kvm_vfio_unregister_fwd_irq(node);
>> + return ret;
>> + }
>> + /* increment the ref counter */
>> + kvm_vfio_get_vfio_device(fd);
>
> Wouldn't it be easier if the reference counting were coupled with the
> register/unregister_fwd_irq?
ok
I'd be tempted to pass your user_fwd_irq
> to this function instead of a kvm_fwd_irq.
Well in that case I would use kvm_arch_forwarded_irq for both set and
unset. Then comes the problem of kvm_vfio_clean_fwd_irq which
manipulates only internal structs.

>> + return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + *
>> + * Calls the architecture specific implementation of set_forward and
>> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
>> + * device reference counter.
>> + */
>> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + int ret;
>> + struct kvm_vfio_fwd_irq_node *node =
>> + kvm_vfio_find_fwd_irq(kv, fwd);
>> + if (!node)
>> + return -EINVAL;
>> + ret = kvm_arch_vfio_set_forward(fwd, false);
>
> Whoa, if the unforward fails we continue to undo everything else? How
> does userspace cleanup from this? We need a guaranteed shutdown path
> for cleanup code, we can never trust userspace to do things in the
> correct order. Can we really preclude the user calling unforward with
> an active IRQ? Maybe _forward() and _unforward() need to be separate
> functions so that 'un' can be made void to indicate it can't fail.
If I accept an unforward while the VFIO signaling mechanism is set, the
wrong handler will be setup on VFIO driver. So should I put in place a
mechanism that changes the VFIO handler for that irq (causing VFIO
driver free_irq/request_irq), using another external API function?
>
>> + kvm_vfio_unregister_fwd_irq(node);
>> +
>> + /* decrement the ref counter */
>> + kvm_vfio_put_vfio_device(fwd->vdev);
>
> Again, seems like the unregister should do this.
ok
>
>> + return ret;
>> +}
>> +
>> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
>> + int32_t __user *argp)
>> +{
>> + struct kvm_arch_forwarded_irq user_fwd_irq;
>> + struct kvm_fwd_irq fwd;
>> + struct vfio_device *vdev;
>> + struct kvm_vfio *kv = kdev->private;
>> + int ret;
>> +
>> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
>> + return -EFAULT;
>> +
>> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
>> + if (IS_ERR(vdev)) {
>> + ret = PTR_ERR(vdev);
>> + goto out;
>> + }
>> +
>> + fwd.vdev = vdev;
>> + fwd.kvm = kdev->kvm;
>> + fwd.index = user_fwd_irq.index;
>> + fwd.subindex = user_fwd_irq.subindex;
>> + fwd.gsi = user_fwd_irq.gsi;
>> +
>> + switch (attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + mutex_lock(&kv->lock);
>> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
>> + mutex_unlock(&kv->lock);
>> + break;
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + mutex_lock(&kv->lock);
>> + ret = kvm_vfio_unset_forward(kv, &fwd);
>> + mutex_unlock(&kv->lock);
>> + break;
>> + }
>> +out:
>> + kvm_vfio_put_vfio_device(vdev);
>
> It might add a little extra code, but logically the reference being tied
> to the register/unregister feels a bit cleaner than this.
Sorry I do not catch your comment here. Since i called
kvm_vfio_get_vfio_device at the beg of the function, I need to release
at the end of the function, besides the ref counter incr/decr at
registration?
>
>> + return ret;
>> +}
>> +
>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>> +{
>> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>> + int ret;
>> +
>> + switch (attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
>> + break;
>> + default:
>> + ret = -ENXIO;
>> + }
>> + return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
>> + * registered forwarded IRQs and free their list nodes.
>> + * @kv: kvm-vfio device
>> + *
>> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
>> + * void the lists and release the reference
>> + */
>> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node, *tmp;
>> +
>> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
>> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
>> + }
>> + return 0;
>
> This shouldn't be able to fail, make it void.
see above questioning? But you're right, I am too much virtualization
oriented. Currently my cleanup is even done in the virtual interrupt
controller when the VM dies because nobody unsets the VFIO signaling.

Best Regards

Eric
>
>> +}
>> +
>> static int kvm_vfio_set_attr(struct kvm_device *dev,
>> struct kvm_device_attr *attr)
>> {
>> switch (attr->group) {
>> case KVM_DEV_VFIO_GROUP:
>> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>> + case KVM_DEV_VFIO_DEVICE:
>> + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>> }
>>
>> return -ENXIO;
>> @@ -268,10 +503,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>> case KVM_DEV_VFIO_GROUP_DEL:
>> return 0;
>> }
>> -
>> break;
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> + case KVM_DEV_VFIO_DEVICE:
>> + switch (attr->attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + return 0;
>> + }
>> + break;
>> +#endif
>> }
>> -
>> return -ENXIO;
>> }
>>
>> @@ -285,7 +527,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>> list_del(&kvg->node);
>> kfree(kvg);
>> }
>> -
>> + kvm_vfio_clean_fwd_irq(kv);
>> kvm_vfio_update_coherency(dev);
>>
>> kfree(kv);
>> @@ -317,6 +559,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>> return -ENOMEM;
>>
>> INIT_LIST_HEAD(&kv->group_list);
>> + INIT_LIST_HEAD(&kv->fwd_node_list);
>> mutex_init(&kv->lock);
>>
>> dev->private = kv;
>
>
>

2014-11-25 19:01:39

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On Tue, 2014-11-25 at 19:20 +0100, Eric Auger wrote:
> On 11/24/2014 09:56 PM, Alex Williamson wrote:
> > On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
> >> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
> >>
> >> This is a new control channel which enables KVM to cooperate with
> >> viable VFIO devices.
> >>
> >> Functions are introduced to check the validity of a VFIO device
> >> file descriptor, increment/decrement the ref counter of the VFIO
> >> device.
> >>
> >> The patch introduces 2 attributes for this new device group:
> >> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
> >> unset respectively unset the feature.
> >>
> >> The VFIO device stores a list of registered forwarded IRQs. The reference
> >> counter of the device is incremented each time a new IRQ is forwarded.
> >> Reference counter is decremented when the IRQ forwarding is unset.
> >>
> >> The forwarding programmming is architecture specific, implemented in
> >> kvm_arch_set_fwd_state function. Architecture specific implementation is
> >> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
> >> functions are void.
> >>
> >> Signed-off-by: Eric Auger <[email protected]>
> >>
> >> ---
> >>
> >> v2 -> v3:
> >> - add API comments in kvm_host.h
> >> - improve the commit message
> >> - create a private kvm_vfio_fwd_irq struct
> >> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
> >> latter action will be handled in vgic.
> >> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
> >> to move platform specific stuff in architecture specific code.
> >> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
> >> - increment the ref counter each time we do an IRQ forwarding and decrement
> >> this latter each time one IRQ forward is unset. Simplifies the whole
> >> ref counting.
> >> - simplification of list handling: create, search, removal
> >>
> >> v1 -> v2:
> >> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> - original patch file separated into 2 parts: generic part moved in vfio.c
> >> and ARM specific part(kvm_arch_set_fwd_state)
> >> ---
> >> include/linux/kvm_host.h | 28 ++++++
> >> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
> >> 2 files changed, 274 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index ea53b04..0b9659d 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
> >> unsigned long arg);
> >> };
> >>
> >> +/* internal self-contained structure describing a forwarded IRQ */
> >> +struct kvm_fwd_irq {
> >> + struct kvm *kvm; /* VM to inject the GSI into */
> >> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
> >> + __u32 index; /* VFIO device IRQ index */
> >> + __u32 subindex; /* VFIO device IRQ subindex */
> >> + __u32 gsi; /* gsi, ie. virtual IRQ number */
> >> +};
> >> +
> >> void kvm_device_get(struct kvm_device *dev);
> >> void kvm_device_put(struct kvm_device *dev);
> >> struct kvm_device *kvm_device_from_filp(struct file *filp);
> >> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
> >> extern struct kvm_device_ops kvm_mpic_ops;
> >> extern struct kvm_device_ops kvm_xics_ops;
> >>
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +/**
> >> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
> >> + *
> >> + * @fwd_irq: handle to the forwarded irq struct
> >> + * @forward: true means forwarded, false means not forwarded
> >> + * returns 0 on success, < 0 on failure
> >> + */
> >> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> >> + bool forward);
> >
> > We could add a struct device* to the args list or into struct
> > kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
> > has no business dealing with references to the vfio_device.
> Hi Alex,
>
> Currently It can't put struct device* into the kvm_fwd_irq struct since
> I need to release the vfio_device with
> vfio_device_put_external_user(struct vfio_device *vdev)
> typically in kvm_vfio_clean_fwd_irq. So I need to store the pointers to
> the vfio_device somewhere.
>
> I see 2 solutions: change the proto of
> vfio_device_put_external_user(struct vfio_device *vdev) and pass a
> struct device* (??)
>
> or change the proto of kvm_arch_vfio_set_forward into
>
> kvm_arch_vfio_set_forward(struct kvm *kvm, struct device *dev, int
> index, [int subindex], int gsi, bool forward) or using index/start/count
> but loosing the interest of having a self-contained internal struct.

The latter is sort of what I was assuming, I think the interface between
VFIO and KVM-VFIO is good, we just don't need to expose VFIO-isms out to
the arch KVM code. KVM-VFIO should be the barrier layer. In that
spirit, maybe it should be kvm_arch_set_forward() and the KVM-VFIO code
should do the processing of index/subindex sort of like how Feng did for
PCI devices.

> >
> >> +
> >> +#else
> >> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> >> + bool forward)
> >> +{
> >> + return 0;
> >> +}
> >> +#endif
> >> +
> >> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>
> >> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >> index 6f0cc34..af178bb 100644
> >> --- a/virt/kvm/vfio.c
> >> +++ b/virt/kvm/vfio.c
> >> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
> >> struct vfio_group *vfio_group;
> >> };
> >>
> >> +/* private linkable kvm_fwd_irq struct */
> >> +struct kvm_vfio_fwd_irq_node {
> >> + struct list_head link;
> >> + struct kvm_fwd_irq fwd_irq;
> >> +};
> >> +
> >> struct kvm_vfio {
> >> struct list_head group_list;
> >> + /* list of registered VFIO forwarded IRQs */
> >> + struct list_head fwd_node_list;
> >> struct mutex lock;
> >> bool noncoherent;
> >> };
> >> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >> return -ENXIO;
> >> }
> >>
> >> +/**
> >> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
> >> + *
> >> + * Checks it is a valid vfio device and increments its reference counter
> >> + * @fd: file descriptor of the vfio platform device
> >> + */
> >> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >> +{
> >> + struct fd f = fdget(fd);
> >> + struct vfio_device *vdev;
> >> +
> >> + if (!f.file)
> >> + return NULL;
> >
> > ERR_PTR(-EINVAL)?
> >
> > ie. propagate errors from the point where they're encountered so we
> > don't need to make up an errno later.
> yes thanks
> >
> >> + vdev = kvm_vfio_device_get_external_user(f.file);
> >> + fdput(f);
> >> + return vdev;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
> >> + * vfio platform * device
> >> + *
> >> + * @vdev: vfio_device handle to release
> >> + */
> >> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >> +{
> >> + kvm_vfio_device_put_external_user(vdev);
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
> >> + * registered in the list of forwarded IRQs
> >> + *
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fwd: handle to the forwarded irq struct
> >> + * In the positive returns the handle to its node in the kvm-vfio
> >> + * forwarded IRQ list, returns NULL otherwise.
> >> + * Must be called with kv->lock hold.
> >> + */
> >> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
> >> + struct kvm_vfio *kv,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + struct kvm_vfio_fwd_irq_node *node;
> >> +
> >> + list_for_each_entry(node, &kv->fwd_node_list, link) {
> >> + if ((node->fwd_irq.index == fwd->index) &&
> >> + (node->fwd_irq.subindex == fwd->subindex) &&
> >> + (node->fwd_irq.vdev == fwd->vdev))
> >> + return node;
> >> + }
> >> + return NULL;
> >> +}
> >> +/**
> >> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
> >> + * forwarded IRQ
> >> + *
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fwd: handle to the forwarded irq struct
> >> + * In case of success returns a handle to the new list node,
> >> + * NULL otherwise.
> >> + * Must be called with kv->lock hold.
> >> + */
> >> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
> >> + struct kvm_vfio *kv,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + struct kvm_vfio_fwd_irq_node *node;
> >> +
> >> + node = kmalloc(sizeof(*node), GFP_KERNEL);
> >> + if (!node)
> >> + return NULL;
> >
> > ERR_PTR(-ENOMEM)?
> >
> OK
> >> +
> >> + node->fwd_irq = *fwd;
> >> +
> >> + list_add(&node->link, &kv->fwd_node_list);
> >> +
> >> + return node;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
> >> + *
> >> + * @node: handle to the node struct
> >> + * Must be called with kv->lock hold.
> >> + */
> >> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
> >> +{
> >> + list_del(&node->link);
> >> + kfree(node);
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fd: file descriptor of the vfio device the IRQ belongs to
> >> + * @fwd: handle to the forwarded irq struct
> >> + *
> >> + * Registers an IRQ as forwarded and calls the architecture specific
> >> + * implementation of set_forward. In case of operation failure, the IRQ
> >> + * is unregistered. In case of success, the vfio device ref counter is
> >> + * incremented.
> >> + */
> >> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + int ret;
> >> + struct kvm_vfio_fwd_irq_node *node =
> >> + kvm_vfio_find_fwd_irq(kv, fwd);
> >> +
> >> + if (node)
> >> + return -EINVAL;
> >
> > I assume you're saving -EBUSY for arch code for the case where the IRQ
> > is already active?
> yes. -EBUSY now corresponds to the case where the VFIO signaling is
> already setup.
> >
> >> + node = kvm_vfio_register_fwd_irq(kv, fwd);
> >> + if (!node)
> >> + return -ENOMEM;
> >
> > if (IS_ERR(node))
> > return PTR_ERR(node);
> >
> >> + ret = kvm_arch_vfio_set_forward(fwd, true);
> >> + if (ret < 0) {
> >> + kvm_vfio_unregister_fwd_irq(node);
> >> + return ret;
> >> + }
> >> + /* increment the ref counter */
> >> + kvm_vfio_get_vfio_device(fd);
> >
> > Wouldn't it be easier if the reference counting were coupled with the
> > register/unregister_fwd_irq?
> ok
> I'd be tempted to pass your user_fwd_irq
> > to this function instead of a kvm_fwd_irq.
> Well in that case I would use kvm_arch_forwarded_irq for both set and
> unset. Then comes the problem of kvm_vfio_clean_fwd_irq which
> manipulates only internal structs.
>
> >> + return ret;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fwd: handle to the forwarded irq struct
> >> + *
> >> + * Calls the architecture specific implementation of set_forward and
> >> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
> >> + * device reference counter.
> >> + */
> >> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + int ret;
> >> + struct kvm_vfio_fwd_irq_node *node =
> >> + kvm_vfio_find_fwd_irq(kv, fwd);
> >> + if (!node)
> >> + return -EINVAL;
> >> + ret = kvm_arch_vfio_set_forward(fwd, false);
> >
> > Whoa, if the unforward fails we continue to undo everything else? How
> > does userspace cleanup from this? We need a guaranteed shutdown path
> > for cleanup code, we can never trust userspace to do things in the
> > correct order. Can we really preclude the user calling unforward with
> > an active IRQ? Maybe _forward() and _unforward() need to be separate
> > functions so that 'un' can be made void to indicate it can't fail.
> If I accept an unforward while the VFIO signaling mechanism is set, the
> wrong handler will be setup on VFIO driver. So should I put in place a
> mechanism that changes the VFIO handler for that irq (causing VFIO
> driver free_irq/request_irq), using another external API function?

Yep, it seems like we need to re-evaluate whether forwarding can be
changed on a running IRQ. Disallowing it doesn't appear to support KVM
initiated shutdown, only user initiated configuration. So the
vfio-platform interrupt handler probably needs to be bi-modal. Maybe
the forwarding state of the IRQ can use RCU to avoid locks.

> >> + kvm_vfio_unregister_fwd_irq(node);
> >> +
> >> + /* decrement the ref counter */
> >> + kvm_vfio_put_vfio_device(fwd->vdev);
> >
> > Again, seems like the unregister should do this.
> ok
> >
> >> + return ret;
> >> +}
> >> +
> >> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
> >> + int32_t __user *argp)
> >> +{
> >> + struct kvm_arch_forwarded_irq user_fwd_irq;
> >> + struct kvm_fwd_irq fwd;
> >> + struct vfio_device *vdev;
> >> + struct kvm_vfio *kv = kdev->private;
> >> + int ret;
> >> +
> >> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
> >> + return -EFAULT;
> >> +
> >> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
> >> + if (IS_ERR(vdev)) {
> >> + ret = PTR_ERR(vdev);
> >> + goto out;
> >> + }
> >> +
> >> + fwd.vdev = vdev;
> >> + fwd.kvm = kdev->kvm;
> >> + fwd.index = user_fwd_irq.index;
> >> + fwd.subindex = user_fwd_irq.subindex;
> >> + fwd.gsi = user_fwd_irq.gsi;
> >> +
> >> + switch (attr) {
> >> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> + mutex_lock(&kv->lock);
> >> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
> >> + mutex_unlock(&kv->lock);
> >> + break;
> >> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> + mutex_lock(&kv->lock);
> >> + ret = kvm_vfio_unset_forward(kv, &fwd);
> >> + mutex_unlock(&kv->lock);
> >> + break;
> >> + }
> >> +out:
> >> + kvm_vfio_put_vfio_device(vdev);
> >
> > It might add a little extra code, but logically the reference being tied
> > to the register/unregister feels a bit cleaner than this.
> Sorry I do not catch your comment here. Since i called
> kvm_vfio_get_vfio_device at the beg of the function, I need to release
> at the end of the function, besides the ref counter incr/decr at
> registration?

If we do reference counting on register/unregister, I'd think that the
get/put in this function would go away. Having the reference here seems
redundant.

> >
> >> + return ret;
> >> +}
> >> +
> >> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >> +{
> >> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >> + int ret;
> >> +
> >> + switch (attr) {
> >> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
> >> + break;
> >> + default:
> >> + ret = -ENXIO;
> >> + }
> >> + return ret;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
> >> + * registered forwarded IRQs and free their list nodes.
> >> + * @kv: kvm-vfio device
> >> + *
> >> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
> >> + * void the lists and release the reference
> >> + */
> >> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
> >> +{
> >> + struct kvm_vfio_fwd_irq_node *node, *tmp;
> >> +
> >> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
> >> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
> >> + }
> >> + return 0;
> >
> > This shouldn't be able to fail, make it void.
> see above questioning? But you're right, I am too much virtualization
> oriented. Currently my cleanup is even done in the virtual interrupt
> controller when the VM dies because nobody unsets the VFIO signaling.

Yep, being a kernel interface it needs to be hardened against careless
and malicious users. The kernel should return to the correct state if
we kill -9 QEMU at any point. Thanks,

Alex

2014-11-30 12:14:06

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM: arm64: Enable the KVM-VFIO device

On Sun, Nov 23, 2014 at 07:35:54PM +0100, Eric Auger wrote:
> Used by KVM-enabled VFIO-based device passthrough support in QEMU.
>
> Signed-off-by: Joel Schopp <[email protected]>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> Extracted from [RFC PATCH] arm64: KVM: add irqfd support
> http://www.spinics.net/lists/kvm-arm/msg10798.html
> ---
> arch/arm64/kvm/Kconfig | 1 +
> arch/arm64/kvm/Makefile | 2 +-
> 2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 09c25c2..2edf926 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -26,6 +26,7 @@ config KVM
> select KVM_ARM_HOST
> select KVM_ARM_VGIC
> select KVM_ARM_TIMER
> + select KVM_VFIO
> select HAVE_KVM_EVENTFD
> ---help---
> Support hosting virtualized guest machines.
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 2e6b827..81ed091 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm
>
> obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
>
> -kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
> kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>
> --
> 1.9.1
>

Should these patches not be squashed into one?

Also, what do they enable at this point? Should they be queued by the
end of the series instead?

-Christoffer

2014-11-30 12:47:34

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] VFIO: platform: forwarded state tested when selecting IRQ handler

The subject reads strangely, perhaps just:

VFIO: platform: test forward state when selecting IRQ handler

On Sun, Nov 23, 2014 at 07:35:55PM +0100, Eric Auger wrote:
> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
> need to disable the IRQ anymore.
>
> When setting the IRQ handler we now also test the forwarded state. In
> case the IRQ is forwarded we select the edge handler (no automaske).
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v2 -> v3:
> - forwarded state was tested in the handler. Now the forwarded state
> is tested before setting the handler. This definitively limits
> the dynamics of forwarded state changes but I don't think there is
> a use case where we need to be able to change the state at any time.

user space can change this by calling the VFIO_IRQ_SET_ACTION_TRIGGER
whenever it wants, right?

> ---
> drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
> index 08d400e..61a2920 100644
> --- a/drivers/vfio/platform/vfio_platform_irq.c
> +++ b/drivers/vfio/platform/vfio_platform_irq.c
> @@ -230,8 +230,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
> {
> struct vfio_platform_irq *irq = &vdev->irqs[index];
> irq_handler_t handler;
> + struct irq_data *d;
> + bool is_forwarded;
>
> - if (vdev->irqs[index].flags & VFIO_IRQ_INFO_MASKABLE)
> + d = irq_get_irq_data(irq->hwirq);
> + is_forwarded = irqd_irq_forwarded(d);
> +
> + if ((vdev->irqs[index].flags & VFIO_IRQ_INFO_MASKABLE) && !is_forwarded)
> handler = vfio_maskable_irq_handler;
> else
> handler = vfio_irq_handler;
> --
> 1.9.1
>

2014-11-30 12:53:18

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] KVM: kvm-vfio: User API for IRQ forwarding

On Sun, Nov 23, 2014 at 07:35:56PM +0100, Eric Auger wrote:
> This patch adds and document a new KVM_DEV_VFIO_DEVICE group

documents

> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
> to set a VFIO device IRQ as forwarded or not forwarded.
> the command takes as argument a handle to a new struct named
> kvm_arch_forwarded_irq.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v2 -> v3:
> - rework vfio kvm device documentation
> - reword commit message and title
> - add subindex in kvm_arch_forwarded_irq to be closer to VFIO API
> - forwarding state can only be changed with VFIO IRQ signaling is off
>
> v1 -> v2:
> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
> to include/uapi/linux/kvm.h
> also irq_index renamed into index and guest_irq renamed into gsi
> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
> ---
> Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++++++++------
> include/uapi/linux/kvm.h | 10 +++++++++
> 2 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
> index ef51740..f7aff29 100644
> --- a/Documentation/virtual/kvm/devices/vfio.txt
> +++ b/Documentation/virtual/kvm/devices/vfio.txt
> @@ -4,15 +4,24 @@ VFIO virtual device
> Device types supported:
> KVM_DEV_TYPE_VFIO
>
> -Only one VFIO instance may be created per VM. The created device
> -tracks VFIO groups in use by the VM and features of those groups
> -important to the correctness and acceleration of the VM. As groups
> -are enabled and disabled for use by the VM, KVM should be updated
> -about their presence. When registered with KVM, a reference to the
> -VFIO-group is held by KVM.
> +Only one VFIO instance may be created per VM.
> +
> +The created device tracks VFIO groups in use by the VM and features
> +of those groups important to the correctness and acceleration of
> +the VM. As groups are enabled and disabled for use by the VM, KVM
> +should be updated about their presence. When registered with KVM,
> +a reference to the VFIO-group is held by KVM.
> +
> +The device also tracks & enable VFIO device forwarded IRQs, if any.

s/tracks & enable/tracks and enables/

> +A physical forwarded IRQ is directly completed by the guest. This
> +requires HW support in the interrupt controller which must be able
> +to automatically complete the physical IRQ when it detects the guest
> +has completed the corresponding virtual IRQ. The modality sometimes
> +is named direct EOI.

this last sentence is a bit floating out there; perhaps you want to be
more specific and say what it is on ARM.

>
> Groups:
> KVM_DEV_VFIO_GROUP
> + KVM_DEV_VFIO_DEVICE
>
> KVM_DEV_VFIO_GROUP attributes:
> KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
> @@ -20,3 +29,16 @@ KVM_DEV_VFIO_GROUP attributes:
>
> For each, kvm_device_attr.addr points to an int32_t file descriptor
> for the VFIO group.
> +
> +KVM_DEV_VFIO_DEVICE attributes:
> + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
> + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
> +
> +For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct.
> +
> +The forwarded state can only be changed when the VFIO signaling mechanism
> +for this physical IRQ is not set. In other words, forwarding must be
> +activated before VFIO_DEVICE_SET_IRQS has been called to trigger the IRQ
> +or associate an eventfd to it. Unforwarding can only be called while the
> +signaling has been disabled with VFIO_DEVICE_SET_IRQS. If this condition is
> +not satisfied, the command returns an -EBUSY.

s/command/ioctl/

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6076882..a269a42 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -946,6 +946,9 @@ struct kvm_device_attr {
> #define KVM_DEV_VFIO_GROUP 1
> #define KVM_DEV_VFIO_GROUP_ADD 1
> #define KVM_DEV_VFIO_GROUP_DEL 2
> +#define KVM_DEV_VFIO_DEVICE 2
> +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
> +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
>
> enum kvm_device_type {
> KVM_DEV_TYPE_FSL_MPIC_20 = 1,
> @@ -963,6 +966,13 @@ enum kvm_device_type {
> KVM_DEV_TYPE_MAX,
> };
>
> +struct kvm_arch_forwarded_irq {
> + __u32 fd; /* file desciptor of the VFIO device */
> + __u32 index; /* VFIO device IRQ index */
> + __u32 subindex; /* VFIO device IRQ subindex */
> + __u32 gsi; /* gsi, ie. virtual IRQ number */
> +};
> +
> /*
> * ioctls for VM fds
> */
> --
> 1.9.1
>

otherwise this looks reasonable.

2014-11-30 13:01:55

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v3 6/8] KVM: kvm-vfio: wrapper to VFIO external API device helpers

On Sun, Nov 23, 2014 at 07:35:58PM +0100, Eric Auger wrote:
> Provide wrapper functions that allow KVM-VFIO device code to
> interact with a vfio device:
> - kvm_vfio_device_get_external_user gets a handle to a struct
> vfio_device from the vfio device file descriptor and increments
> its reference counter,
> - kvm_vfio_device_put_external_user decrements the reference counter
> to a vfio device,
> - kvm_vfio_external_base_device returns a handle to the struct device
> of the vfio device.
>
> The KVM-VFIO device uses the VFIO external API device functions.
>
> Signed-off-by: Eric Auger <[email protected]>
>
> ---
>
> v2 -> v3:
> reword the commit message and title
>
> v1 -> v2:
> - kvm_vfio_external_get_base_device renamed into
> kvm_vfio_external_base_device
> - kvm_vfio_external_get_type removed
> ---
> arch/arm/include/asm/kvm_host.h | 5 +++++
> virt/kvm/vfio.c | 45 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 53036e2..bca5b79 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -169,6 +169,11 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
> int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
>
> +struct vfio_device;
> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep);
> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev);
> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev);
> +

why do we add this in kvm_host.h ?

> /* We do not have shadow page tables, hence the empty hooks */
> static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> unsigned long end)
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 620e37f..6f0cc34 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -60,6 +60,51 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
> symbol_put(vfio_group_put_external_user);
> }
>
> +struct vfio_device *kvm_vfio_device_get_external_user(struct file *filep)
> +{
> + struct vfio_device *vdev;
> + struct vfio_device *(*fn)(struct file *);
> +
> + fn = symbol_get(vfio_device_get_external_user);
> + if (!fn)
> + return ERR_PTR(-EINVAL);
> +
> + vdev = fn(filep);
> +
> + symbol_put(vfio_device_get_external_user);
> +
> + return vdev;
> +}
> +
> +void kvm_vfio_device_put_external_user(struct vfio_device *vdev)
> +{
> + void (*fn)(struct vfio_device *);
> +
> + fn = symbol_get(vfio_device_put_external_user);
> + if (!fn)
> + return;
> +
> + fn(vdev);
> +
> + symbol_put(vfio_device_put_external_user);
> +}
> +
> +struct device *kvm_vfio_external_base_device(struct vfio_device *vdev)
> +{
> + struct device *(*fn)(struct vfio_device *);
> + struct device *dev;
> +
> + fn = symbol_get(vfio_external_base_device);
> + if (!fn)
> + return NULL;
> +
> + dev = fn(vdev);
> +
> + symbol_put(vfio_external_base_device);
> +
> + return dev;
> +}
> +
> static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
> {
> long (*fn)(struct vfio_group *, unsigned long);
> --
> 1.9.1
>

2014-12-01 14:40:55

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] VFIO: platform: forwarded state tested when selecting IRQ handler

Hi Christoffer,
On 11/30/2014 01:47 PM, Christoffer Dall wrote:
> The subject reads strangely, perhaps just:
>
> VFIO: platform: test forward state when selecting IRQ handler
OK
>
> On Sun, Nov 23, 2014 at 07:35:55PM +0100, Eric Auger wrote:
>> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
>> need to disable the IRQ anymore.
>>
>> When setting the IRQ handler we now also test the forwarded state. In
>> case the IRQ is forwarded we select the edge handler (no automaske).
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v2 -> v3:
>> - forwarded state was tested in the handler. Now the forwarded state
>> is tested before setting the handler. This definitively limits
>> the dynamics of forwarded state changes but I don't think there is
>> a use case where we need to be able to change the state at any time.
>
> user space can change this by calling the VFIO_IRQ_SET_ACTION_TRIGGER
> whenever it wants, right?
yes the user can set/unset the VFIO signaling (and request_irq/free_irq)
using VFIO_IRQ_SET_ACTION_TRIGGER. In this new version I do not allow
changing the forwarded state when the handler is attached (request_irq).

Does it answer your interrogation?

Best Regards

Eric

>
>> ---
>> drivers/vfio/platform/vfio_platform_irq.c | 7 ++++++-
>> 1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
>> index 08d400e..61a2920 100644
>> --- a/drivers/vfio/platform/vfio_platform_irq.c
>> +++ b/drivers/vfio/platform/vfio_platform_irq.c
>> @@ -230,8 +230,13 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
>> {
>> struct vfio_platform_irq *irq = &vdev->irqs[index];
>> irq_handler_t handler;
>> + struct irq_data *d;
>> + bool is_forwarded;
>>
>> - if (vdev->irqs[index].flags & VFIO_IRQ_INFO_MASKABLE)
>> + d = irq_get_irq_data(irq->hwirq);
>> + is_forwarded = irqd_irq_forwarded(d);
>> +
>> + if ((vdev->irqs[index].flags & VFIO_IRQ_INFO_MASKABLE) && !is_forwarded)
>> handler = vfio_maskable_irq_handler;
>> else
>> handler = vfio_irq_handler;
>> --
>> 1.9.1
>>

2014-12-01 14:47:50

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] KVM: kvm-vfio: User API for IRQ forwarding

On 11/30/2014 01:53 PM, Christoffer Dall wrote:
> On Sun, Nov 23, 2014 at 07:35:56PM +0100, Eric Auger wrote:
>> This patch adds and document a new KVM_DEV_VFIO_DEVICE group
>
> documents
OK
>
>> and 2 device attributes: KVM_DEV_VFIO_DEVICE_FORWARD_IRQ,
>> KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ. The purpose is to be able
>> to set a VFIO device IRQ as forwarded or not forwarded.
>> the command takes as argument a handle to a new struct named
>> kvm_arch_forwarded_irq.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v2 -> v3:
>> - rework vfio kvm device documentation
>> - reword commit message and title
>> - add subindex in kvm_arch_forwarded_irq to be closer to VFIO API
>> - forwarding state can only be changed with VFIO IRQ signaling is off
>>
>> v1 -> v2:
>> - struct kvm_arch_forwarded_irq moved from arch/arm/include/uapi/asm/kvm.h
>> to include/uapi/linux/kvm.h
>> also irq_index renamed into index and guest_irq renamed into gsi
>> - ASSIGN/DEASSIGN renamed into FORWARD/UNFORWARD
>> ---
>> Documentation/virtual/kvm/devices/vfio.txt | 34 ++++++++++++++++++++++++------
>> include/uapi/linux/kvm.h | 10 +++++++++
>> 2 files changed, 38 insertions(+), 6 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
>> index ef51740..f7aff29 100644
>> --- a/Documentation/virtual/kvm/devices/vfio.txt
>> +++ b/Documentation/virtual/kvm/devices/vfio.txt
>> @@ -4,15 +4,24 @@ VFIO virtual device
>> Device types supported:
>> KVM_DEV_TYPE_VFIO
>>
>> -Only one VFIO instance may be created per VM. The created device
>> -tracks VFIO groups in use by the VM and features of those groups
>> -important to the correctness and acceleration of the VM. As groups
>> -are enabled and disabled for use by the VM, KVM should be updated
>> -about their presence. When registered with KVM, a reference to the
>> -VFIO-group is held by KVM.
>> +Only one VFIO instance may be created per VM.
>> +
>> +The created device tracks VFIO groups in use by the VM and features
>> +of those groups important to the correctness and acceleration of
>> +the VM. As groups are enabled and disabled for use by the VM, KVM
>> +should be updated about their presence. When registered with KVM,
>> +a reference to the VFIO-group is held by KVM.
>> +
>> +The device also tracks & enable VFIO device forwarded IRQs, if any.
>
> s/tracks & enable/tracks and enables/
ok
>
>> +A physical forwarded IRQ is directly completed by the guest. This
>> +requires HW support in the interrupt controller which must be able
>> +to automatically complete the physical IRQ when it detects the guest
>> +has completed the corresponding virtual IRQ. The modality sometimes
>> +is named direct EOI.
>
> this last sentence is a bit floating out there; perhaps you want to be
> more specific and say what it is on ARM.
Well this is naming some people used at the KVM forum. I thought this
was something known on other archs but I can drop this comment.
>
>>
>> Groups:
>> KVM_DEV_VFIO_GROUP
>> + KVM_DEV_VFIO_DEVICE
>>
>> KVM_DEV_VFIO_GROUP attributes:
>> KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
>> @@ -20,3 +29,16 @@ KVM_DEV_VFIO_GROUP attributes:
>>
>> For each, kvm_device_attr.addr points to an int32_t file descriptor
>> for the VFIO group.
>> +
>> +KVM_DEV_VFIO_DEVICE attributes:
>> + KVM_DEV_VFIO_DEVICE_FORWARD_IRQ: set a VFIO device IRQ as forwarded
>> + KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ: set a VFIO device IRQ as not forwarded
>> +
>> +For each, kvm_device_attr.addr points to a kvm_arch_forwarded_irq struct.
>> +
>> +The forwarded state can only be changed when the VFIO signaling mechanism
>> +for this physical IRQ is not set. In other words, forwarding must be
>> +activated before VFIO_DEVICE_SET_IRQS has been called to trigger the IRQ
>> +or associate an eventfd to it. Unforwarding can only be called while the
>> +signaling has been disabled with VFIO_DEVICE_SET_IRQS. If this condition is
>> +not satisfied, the command returns an -EBUSY.
>
> s/command/ioctl/
ok
>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 6076882..a269a42 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -946,6 +946,9 @@ struct kvm_device_attr {
>> #define KVM_DEV_VFIO_GROUP 1
>> #define KVM_DEV_VFIO_GROUP_ADD 1
>> #define KVM_DEV_VFIO_GROUP_DEL 2
>> +#define KVM_DEV_VFIO_DEVICE 2
>> +#define KVM_DEV_VFIO_DEVICE_FORWARD_IRQ 1
>> +#define KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ 2
>>
>> enum kvm_device_type {
>> KVM_DEV_TYPE_FSL_MPIC_20 = 1,
>> @@ -963,6 +966,13 @@ enum kvm_device_type {
>> KVM_DEV_TYPE_MAX,
>> };
>>
>> +struct kvm_arch_forwarded_irq {
>> + __u32 fd; /* file desciptor of the VFIO device */
>> + __u32 index; /* VFIO device IRQ index */
>> + __u32 subindex; /* VFIO device IRQ subindex */
>> + __u32 gsi; /* gsi, ie. virtual IRQ number */
>> +};
>> +
>> /*
>> * ioctls for VM fds
>> */
>> --
>> 1.9.1
>>
>
> otherwise this looks reasonable.
thanks
Eric
>

2014-12-01 14:57:08

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM: arm64: Enable the KVM-VFIO device

On 11/30/2014 01:14 PM, Christoffer Dall wrote:
> On Sun, Nov 23, 2014 at 07:35:54PM +0100, Eric Auger wrote:
>> Used by KVM-enabled VFIO-based device passthrough support in QEMU.
>>
>> Signed-off-by: Joel Schopp <[email protected]>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> Extracted from [RFC PATCH] arm64: KVM: add irqfd support
>> http://www.spinics.net/lists/kvm-arm/msg10798.html
>> ---
>> arch/arm64/kvm/Kconfig | 1 +
>> arch/arm64/kvm/Makefile | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
>> index 09c25c2..2edf926 100644
>> --- a/arch/arm64/kvm/Kconfig
>> +++ b/arch/arm64/kvm/Kconfig
>> @@ -26,6 +26,7 @@ config KVM
>> select KVM_ARM_HOST
>> select KVM_ARM_VGIC
>> select KVM_ARM_TIMER
>> + select KVM_VFIO
>> select HAVE_KVM_EVENTFD
>> ---help---
>> Support hosting virtualized guest machines.
>> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
>> index 2e6b827..81ed091 100644
>> --- a/arch/arm64/kvm/Makefile
>> +++ b/arch/arm64/kvm/Makefile
>> @@ -11,7 +11,7 @@ ARM=../../../arch/arm/kvm
>>
>> obj-$(CONFIG_KVM_ARM_HOST) += kvm.o
>>
>> -kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o
>> +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
>> kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o
>> kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o
>>
>> --
>> 1.9.1
>>
>
> Should these patches not be squashed into one?
Yes I can do.
>
> Also, what do they enable at this point? Should they be queued by the
> end of the series instead?
Well to me this patch should be moved even outside of this series. The
KVM-VFIO device is loaded when the QEMU VFIO device is instantiated.
This is used to record the VFIO groups in use. In VFIO platform case, if
the KVM-VFIO device does not exist, this is not fatal but we get a
warning in QEMU.

The KVM-VFIO device however is mandatory to enable forwarded irq feature.

Best Regards

Eric
>
> -Christoffer
>

2014-12-01 20:10:19

by Christoffer Dall

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] VFIO: platform: forwarded state tested when selecting IRQ handler

On Mon, Dec 01, 2014 at 03:39:24PM +0100, Eric Auger wrote:
> Hi Christoffer,
> On 11/30/2014 01:47 PM, Christoffer Dall wrote:
> > The subject reads strangely, perhaps just:
> >
> > VFIO: platform: test forward state when selecting IRQ handler
> OK
> >
> > On Sun, Nov 23, 2014 at 07:35:55PM +0100, Eric Auger wrote:
> >> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
> >> need to disable the IRQ anymore.
> >>
> >> When setting the IRQ handler we now also test the forwarded state. In
> >> case the IRQ is forwarded we select the edge handler (no automaske).
> >>
> >> Signed-off-by: Eric Auger <[email protected]>
> >>
> >> ---
> >>
> >> v2 -> v3:
> >> - forwarded state was tested in the handler. Now the forwarded state
> >> is tested before setting the handler. This definitively limits
> >> the dynamics of forwarded state changes but I don't think there is
> >> a use case where we need to be able to change the state at any time.
> >
> > user space can change this by calling the VFIO_IRQ_SET_ACTION_TRIGGER
> > whenever it wants, right?
> yes the user can set/unset the VFIO signaling (and request_irq/free_irq)
> using VFIO_IRQ_SET_ACTION_TRIGGER. In this new version I do not allow
> changing the forwarded state when the handler is attached (request_irq).
>
> Does it answer your interrogation?
>
interrogation? hopefully it wasn't that bad ;)

Yes, that answers my question.

Thanks,
-Christoffer

2014-12-01 21:16:37

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] VFIO: platform: forwarded state tested when selecting IRQ handler

On 12/01/2014 09:10 PM, Christoffer Dall wrote:
> On Mon, Dec 01, 2014 at 03:39:24PM +0100, Eric Auger wrote:
>> Hi Christoffer,
>> On 11/30/2014 01:47 PM, Christoffer Dall wrote:
>>> The subject reads strangely, perhaps just:
>>>
>>> VFIO: platform: test forward state when selecting IRQ handler
>> OK
>>>
>>> On Sun, Nov 23, 2014 at 07:35:55PM +0100, Eric Auger wrote:
>>>> In case the IRQ is forwarded, the VFIO platform IRQ handler does not
>>>> need to disable the IRQ anymore.
>>>>
>>>> When setting the IRQ handler we now also test the forwarded state. In
>>>> case the IRQ is forwarded we select the edge handler (no automaske).
>>>>
>>>> Signed-off-by: Eric Auger <[email protected]>
>>>>
>>>> ---
>>>>
>>>> v2 -> v3:
>>>> - forwarded state was tested in the handler. Now the forwarded state
>>>> is tested before setting the handler. This definitively limits
>>>> the dynamics of forwarded state changes but I don't think there is
>>>> a use case where we need to be able to change the state at any time.
>>>
>>> user space can change this by calling the VFIO_IRQ_SET_ACTION_TRIGGER
>>> whenever it wants, right?
>> yes the user can set/unset the VFIO signaling (and request_irq/free_irq)
>> using VFIO_IRQ_SET_ACTION_TRIGGER. In this new version I do not allow
>> changing the forwarded state when the handler is attached (request_irq).
>>
>> Does it answer your interrogation?
>>
> interrogation? hopefully it wasn't that bad ;)
Oups sorry "faux ami" in french. I wanted to say question but wanted to
say it smarter. another time ;-)
>
> Yes, that answers my question.
>
> Thanks,
> -Christoffer
>

2014-12-08 12:24:15

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On 11/25/2014 08:00 PM, Alex Williamson wrote:
> On Tue, 2014-11-25 at 19:20 +0100, Eric Auger wrote:
>> On 11/24/2014 09:56 PM, Alex Williamson wrote:
>>> On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
>>>> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>>>>
>>>> This is a new control channel which enables KVM to cooperate with
>>>> viable VFIO devices.
>>>>
>>>> Functions are introduced to check the validity of a VFIO device
>>>> file descriptor, increment/decrement the ref counter of the VFIO
>>>> device.
>>>>
>>>> The patch introduces 2 attributes for this new device group:
>>>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>>>> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
>>>> unset respectively unset the feature.
>>>>
>>>> The VFIO device stores a list of registered forwarded IRQs. The reference
>>>> counter of the device is incremented each time a new IRQ is forwarded.
>>>> Reference counter is decremented when the IRQ forwarding is unset.
>>>>
>>>> The forwarding programmming is architecture specific, implemented in
>>>> kvm_arch_set_fwd_state function. Architecture specific implementation is
>>>> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
>>>> functions are void.
>>>>
>>>> Signed-off-by: Eric Auger <[email protected]>
>>>>
>>>> ---
>>>>
>>>> v2 -> v3:
>>>> - add API comments in kvm_host.h
>>>> - improve the commit message
>>>> - create a private kvm_vfio_fwd_irq struct
>>>> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
>>>> latter action will be handled in vgic.
>>>> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
>>>> to move platform specific stuff in architecture specific code.
>>>> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
>>>> - increment the ref counter each time we do an IRQ forwarding and decrement
>>>> this latter each time one IRQ forward is unset. Simplifies the whole
>>>> ref counting.
>>>> - simplification of list handling: create, search, removal
>>>>
>>>> v1 -> v2:
>>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>>> - original patch file separated into 2 parts: generic part moved in vfio.c
>>>> and ARM specific part(kvm_arch_set_fwd_state)
>>>> ---
>>>> include/linux/kvm_host.h | 28 ++++++
>>>> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>> 2 files changed, 274 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>> index ea53b04..0b9659d 100644
>>>> --- a/include/linux/kvm_host.h
>>>> +++ b/include/linux/kvm_host.h
>>>> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
>>>> unsigned long arg);
>>>> };
>>>>
>>>> +/* internal self-contained structure describing a forwarded IRQ */
>>>> +struct kvm_fwd_irq {
>>>> + struct kvm *kvm; /* VM to inject the GSI into */
>>>> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
>>>> + __u32 index; /* VFIO device IRQ index */
>>>> + __u32 subindex; /* VFIO device IRQ subindex */
>>>> + __u32 gsi; /* gsi, ie. virtual IRQ number */
>>>> +};
>>>> +
>>>> void kvm_device_get(struct kvm_device *dev);
>>>> void kvm_device_put(struct kvm_device *dev);
>>>> struct kvm_device *kvm_device_from_filp(struct file *filp);
>>>> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
>>>> extern struct kvm_device_ops kvm_mpic_ops;
>>>> extern struct kvm_device_ops kvm_xics_ops;
>>>>
>>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>>> +/**
>>>> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
>>>> + *
>>>> + * @fwd_irq: handle to the forwarded irq struct
>>>> + * @forward: true means forwarded, false means not forwarded
>>>> + * returns 0 on success, < 0 on failure
>>>> + */
>>>> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>>>> + bool forward);
>>>
>>> We could add a struct device* to the args list or into struct
>>> kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
>>> has no business dealing with references to the vfio_device.
>> Hi Alex,
>>
>> Currently It can't put struct device* into the kvm_fwd_irq struct since
>> I need to release the vfio_device with
>> vfio_device_put_external_user(struct vfio_device *vdev)
>> typically in kvm_vfio_clean_fwd_irq. So I need to store the pointers to
>> the vfio_device somewhere.
>>
>> I see 2 solutions: change the proto of
>> vfio_device_put_external_user(struct vfio_device *vdev) and pass a
>> struct device* (??)
>>
>> or change the proto of kvm_arch_vfio_set_forward into
>>
>> kvm_arch_vfio_set_forward(struct kvm *kvm, struct device *dev, int
>> index, [int subindex], int gsi, bool forward) or using index/start/count
>> but loosing the interest of having a self-contained internal struct.
>
> The latter is sort of what I was assuming, I think the interface between
> VFIO and KVM-VFIO is good, we just don't need to expose VFIO-isms out to
> the arch KVM code. KVM-VFIO should be the barrier layer. In that
> spirit, maybe it should be kvm_arch_set_forward() and the KVM-VFIO code
> should do the processing of index/subindex sort of like how Feng did for
> PCI devices.

Hi Alex,

In Feng's series, host irq is retrieved in the generic part while in
mine it is retrieved in arch specific part, as encouraged at some point.
>From the above comment I understand the right API between generic and
arch specific parts may be <operation>(kvm, host_irq, guest_irq) in
which case I should revert the platform specific IRQ retrieval in the
generic part. Is it the correct understanding?

Thanks

Eric
>
>>>
>>>> +
>>>> +#else
>>>> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>>>> + bool forward)
>>>> +{
>>>> + return 0;
>>>> +}
>>>> +#endif
>>>> +
>>>> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>>>
>>>> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>>>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>>>> index 6f0cc34..af178bb 100644
>>>> --- a/virt/kvm/vfio.c
>>>> +++ b/virt/kvm/vfio.c
>>>> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
>>>> struct vfio_group *vfio_group;
>>>> };
>>>>
>>>> +/* private linkable kvm_fwd_irq struct */
>>>> +struct kvm_vfio_fwd_irq_node {
>>>> + struct list_head link;
>>>> + struct kvm_fwd_irq fwd_irq;
>>>> +};
>>>> +
>>>> struct kvm_vfio {
>>>> struct list_head group_list;
>>>> + /* list of registered VFIO forwarded IRQs */
>>>> + struct list_head fwd_node_list;
>>>> struct mutex lock;
>>>> bool noncoherent;
>>>> };
>>>> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>>> return -ENXIO;
>>>> }
>>>>
>>>> +/**
>>>> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
>>>> + *
>>>> + * Checks it is a valid vfio device and increments its reference counter
>>>> + * @fd: file descriptor of the vfio platform device
>>>> + */
>>>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>>>> +{
>>>> + struct fd f = fdget(fd);
>>>> + struct vfio_device *vdev;
>>>> +
>>>> + if (!f.file)
>>>> + return NULL;
>>>
>>> ERR_PTR(-EINVAL)?
>>>
>>> ie. propagate errors from the point where they're encountered so we
>>> don't need to make up an errno later.
>> yes thanks
>>>
>>>> + vdev = kvm_vfio_device_get_external_user(f.file);
>>>> + fdput(f);
>>>> + return vdev;
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
>>>> + * vfio platform * device
>>>> + *
>>>> + * @vdev: vfio_device handle to release
>>>> + */
>>>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>>>> +{
>>>> + kvm_vfio_device_put_external_user(vdev);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
>>>> + * registered in the list of forwarded IRQs
>>>> + *
>>>> + * @kv: handle to the kvm-vfio device
>>>> + * @fwd: handle to the forwarded irq struct
>>>> + * In the positive returns the handle to its node in the kvm-vfio
>>>> + * forwarded IRQ list, returns NULL otherwise.
>>>> + * Must be called with kv->lock hold.
>>>> + */
>>>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
>>>> + struct kvm_vfio *kv,
>>>> + struct kvm_fwd_irq *fwd)
>>>> +{
>>>> + struct kvm_vfio_fwd_irq_node *node;
>>>> +
>>>> + list_for_each_entry(node, &kv->fwd_node_list, link) {
>>>> + if ((node->fwd_irq.index == fwd->index) &&
>>>> + (node->fwd_irq.subindex == fwd->subindex) &&
>>>> + (node->fwd_irq.vdev == fwd->vdev))
>>>> + return node;
>>>> + }
>>>> + return NULL;
>>>> +}
>>>> +/**
>>>> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
>>>> + * forwarded IRQ
>>>> + *
>>>> + * @kv: handle to the kvm-vfio device
>>>> + * @fwd: handle to the forwarded irq struct
>>>> + * In case of success returns a handle to the new list node,
>>>> + * NULL otherwise.
>>>> + * Must be called with kv->lock hold.
>>>> + */
>>>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
>>>> + struct kvm_vfio *kv,
>>>> + struct kvm_fwd_irq *fwd)
>>>> +{
>>>> + struct kvm_vfio_fwd_irq_node *node;
>>>> +
>>>> + node = kmalloc(sizeof(*node), GFP_KERNEL);
>>>> + if (!node)
>>>> + return NULL;
>>>
>>> ERR_PTR(-ENOMEM)?
>>>
>> OK
>>>> +
>>>> + node->fwd_irq = *fwd;
>>>> +
>>>> + list_add(&node->link, &kv->fwd_node_list);
>>>> +
>>>> + return node;
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
>>>> + *
>>>> + * @node: handle to the node struct
>>>> + * Must be called with kv->lock hold.
>>>> + */
>>>> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
>>>> +{
>>>> + list_del(&node->link);
>>>> + kfree(node);
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
>>>> + * @kv: handle to the kvm-vfio device
>>>> + * @fd: file descriptor of the vfio device the IRQ belongs to
>>>> + * @fwd: handle to the forwarded irq struct
>>>> + *
>>>> + * Registers an IRQ as forwarded and calls the architecture specific
>>>> + * implementation of set_forward. In case of operation failure, the IRQ
>>>> + * is unregistered. In case of success, the vfio device ref counter is
>>>> + * incremented.
>>>> + */
>>>> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
>>>> + struct kvm_fwd_irq *fwd)
>>>> +{
>>>> + int ret;
>>>> + struct kvm_vfio_fwd_irq_node *node =
>>>> + kvm_vfio_find_fwd_irq(kv, fwd);
>>>> +
>>>> + if (node)
>>>> + return -EINVAL;
>>>
>>> I assume you're saving -EBUSY for arch code for the case where the IRQ
>>> is already active?
>> yes. -EBUSY now corresponds to the case where the VFIO signaling is
>> already setup.
>>>
>>>> + node = kvm_vfio_register_fwd_irq(kv, fwd);
>>>> + if (!node)
>>>> + return -ENOMEM;
>>>
>>> if (IS_ERR(node))
>>> return PTR_ERR(node);
>>>
>>>> + ret = kvm_arch_vfio_set_forward(fwd, true);
>>>> + if (ret < 0) {
>>>> + kvm_vfio_unregister_fwd_irq(node);
>>>> + return ret;
>>>> + }
>>>> + /* increment the ref counter */
>>>> + kvm_vfio_get_vfio_device(fd);
>>>
>>> Wouldn't it be easier if the reference counting were coupled with the
>>> register/unregister_fwd_irq?
>> ok
>> I'd be tempted to pass your user_fwd_irq
>>> to this function instead of a kvm_fwd_irq.
>> Well in that case I would use kvm_arch_forwarded_irq for both set and
>> unset. Then comes the problem of kvm_vfio_clean_fwd_irq which
>> manipulates only internal structs.
>>
>>>> + return ret;
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
>>>> + * @kv: handle to the kvm-vfio device
>>>> + * @fwd: handle to the forwarded irq struct
>>>> + *
>>>> + * Calls the architecture specific implementation of set_forward and
>>>> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
>>>> + * device reference counter.
>>>> + */
>>>> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
>>>> + struct kvm_fwd_irq *fwd)
>>>> +{
>>>> + int ret;
>>>> + struct kvm_vfio_fwd_irq_node *node =
>>>> + kvm_vfio_find_fwd_irq(kv, fwd);
>>>> + if (!node)
>>>> + return -EINVAL;
>>>> + ret = kvm_arch_vfio_set_forward(fwd, false);
>>>
>>> Whoa, if the unforward fails we continue to undo everything else? How
>>> does userspace cleanup from this? We need a guaranteed shutdown path
>>> for cleanup code, we can never trust userspace to do things in the
>>> correct order. Can we really preclude the user calling unforward with
>>> an active IRQ? Maybe _forward() and _unforward() need to be separate
>>> functions so that 'un' can be made void to indicate it can't fail.
>> If I accept an unforward while the VFIO signaling mechanism is set, the
>> wrong handler will be setup on VFIO driver. So should I put in place a
>> mechanism that changes the VFIO handler for that irq (causing VFIO
>> driver free_irq/request_irq), using another external API function?
>
> Yep, it seems like we need to re-evaluate whether forwarding can be
> changed on a running IRQ. Disallowing it doesn't appear to support KVM
> initiated shutdown, only user initiated configuration. So the
> vfio-platform interrupt handler probably needs to be bi-modal. Maybe
> the forwarding state of the IRQ can use RCU to avoid locks.
>
>>>> + kvm_vfio_unregister_fwd_irq(node);
>>>> +
>>>> + /* decrement the ref counter */
>>>> + kvm_vfio_put_vfio_device(fwd->vdev);
>>>
>>> Again, seems like the unregister should do this.
>> ok
>>>
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
>>>> + int32_t __user *argp)
>>>> +{
>>>> + struct kvm_arch_forwarded_irq user_fwd_irq;
>>>> + struct kvm_fwd_irq fwd;
>>>> + struct vfio_device *vdev;
>>>> + struct kvm_vfio *kv = kdev->private;
>>>> + int ret;
>>>> +
>>>> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
>>>> + return -EFAULT;
>>>> +
>>>> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
>>>> + if (IS_ERR(vdev)) {
>>>> + ret = PTR_ERR(vdev);
>>>> + goto out;
>>>> + }
>>>> +
>>>> + fwd.vdev = vdev;
>>>> + fwd.kvm = kdev->kvm;
>>>> + fwd.index = user_fwd_irq.index;
>>>> + fwd.subindex = user_fwd_irq.subindex;
>>>> + fwd.gsi = user_fwd_irq.gsi;
>>>> +
>>>> + switch (attr) {
>>>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>>>> + mutex_lock(&kv->lock);
>>>> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
>>>> + mutex_unlock(&kv->lock);
>>>> + break;
>>>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>>>> + mutex_lock(&kv->lock);
>>>> + ret = kvm_vfio_unset_forward(kv, &fwd);
>>>> + mutex_unlock(&kv->lock);
>>>> + break;
>>>> + }
>>>> +out:
>>>> + kvm_vfio_put_vfio_device(vdev);
>>>
>>> It might add a little extra code, but logically the reference being tied
>>> to the register/unregister feels a bit cleaner than this.
>> Sorry I do not catch your comment here. Since i called
>> kvm_vfio_get_vfio_device at the beg of the function, I need to release
>> at the end of the function, besides the ref counter incr/decr at
>> registration?
>
> If we do reference counting on register/unregister, I'd think that the
> get/put in this function would go away. Having the reference here seems
> redundant.
>
>>>
>>>> + return ret;
>>>> +}
>>>> +
>>>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>>>> +{
>>>> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>>>> + int ret;
>>>> +
>>>> + switch (attr) {
>>>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>>>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>>>> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
>>>> + break;
>>>> + default:
>>>> + ret = -ENXIO;
>>>> + }
>>>> + return ret;
>>>> +}
>>>> +
>>>> +/**
>>>> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
>>>> + * registered forwarded IRQs and free their list nodes.
>>>> + * @kv: kvm-vfio device
>>>> + *
>>>> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
>>>> + * void the lists and release the reference
>>>> + */
>>>> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
>>>> +{
>>>> + struct kvm_vfio_fwd_irq_node *node, *tmp;
>>>> +
>>>> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
>>>> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
>>>> + }
>>>> + return 0;
>>>
>>> This shouldn't be able to fail, make it void.
>> see above questioning? But you're right, I am too much virtualization
>> oriented. Currently my cleanup is even done in the virtual interrupt
>> controller when the VM dies because nobody unsets the VFIO signaling.
>
> Yep, being a kernel interface it needs to be hardened against careless
> and malicious users. The kernel should return to the correct state if
> we kill -9 QEMU at any point. Thanks,
>
> Alex
>

2014-12-08 16:55:59

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On Mon, 2014-12-08 at 13:22 +0100, Eric Auger wrote:
> On 11/25/2014 08:00 PM, Alex Williamson wrote:
> > On Tue, 2014-11-25 at 19:20 +0100, Eric Auger wrote:
> >> On 11/24/2014 09:56 PM, Alex Williamson wrote:
> >>> On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
> >>>> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
> >>>>
> >>>> This is a new control channel which enables KVM to cooperate with
> >>>> viable VFIO devices.
> >>>>
> >>>> Functions are introduced to check the validity of a VFIO device
> >>>> file descriptor, increment/decrement the ref counter of the VFIO
> >>>> device.
> >>>>
> >>>> The patch introduces 2 attributes for this new device group:
> >>>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >>>> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
> >>>> unset respectively unset the feature.
> >>>>
> >>>> The VFIO device stores a list of registered forwarded IRQs. The reference
> >>>> counter of the device is incremented each time a new IRQ is forwarded.
> >>>> Reference counter is decremented when the IRQ forwarding is unset.
> >>>>
> >>>> The forwarding programmming is architecture specific, implemented in
> >>>> kvm_arch_set_fwd_state function. Architecture specific implementation is
> >>>> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
> >>>> functions are void.
> >>>>
> >>>> Signed-off-by: Eric Auger <[email protected]>
> >>>>
> >>>> ---
> >>>>
> >>>> v2 -> v3:
> >>>> - add API comments in kvm_host.h
> >>>> - improve the commit message
> >>>> - create a private kvm_vfio_fwd_irq struct
> >>>> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
> >>>> latter action will be handled in vgic.
> >>>> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
> >>>> to move platform specific stuff in architecture specific code.
> >>>> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
> >>>> - increment the ref counter each time we do an IRQ forwarding and decrement
> >>>> this latter each time one IRQ forward is unset. Simplifies the whole
> >>>> ref counting.
> >>>> - simplification of list handling: create, search, removal
> >>>>
> >>>> v1 -> v2:
> >>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>>> - original patch file separated into 2 parts: generic part moved in vfio.c
> >>>> and ARM specific part(kvm_arch_set_fwd_state)
> >>>> ---
> >>>> include/linux/kvm_host.h | 28 ++++++
> >>>> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
> >>>> 2 files changed, 274 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >>>> index ea53b04..0b9659d 100644
> >>>> --- a/include/linux/kvm_host.h
> >>>> +++ b/include/linux/kvm_host.h
> >>>> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
> >>>> unsigned long arg);
> >>>> };
> >>>>
> >>>> +/* internal self-contained structure describing a forwarded IRQ */
> >>>> +struct kvm_fwd_irq {
> >>>> + struct kvm *kvm; /* VM to inject the GSI into */
> >>>> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
> >>>> + __u32 index; /* VFIO device IRQ index */
> >>>> + __u32 subindex; /* VFIO device IRQ subindex */
> >>>> + __u32 gsi; /* gsi, ie. virtual IRQ number */
> >>>> +};
> >>>> +
> >>>> void kvm_device_get(struct kvm_device *dev);
> >>>> void kvm_device_put(struct kvm_device *dev);
> >>>> struct kvm_device *kvm_device_from_filp(struct file *filp);
> >>>> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
> >>>> extern struct kvm_device_ops kvm_mpic_ops;
> >>>> extern struct kvm_device_ops kvm_xics_ops;
> >>>>
> >>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >>>> +/**
> >>>> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
> >>>> + *
> >>>> + * @fwd_irq: handle to the forwarded irq struct
> >>>> + * @forward: true means forwarded, false means not forwarded
> >>>> + * returns 0 on success, < 0 on failure
> >>>> + */
> >>>> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> >>>> + bool forward);
> >>>
> >>> We could add a struct device* to the args list or into struct
> >>> kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
> >>> has no business dealing with references to the vfio_device.
> >> Hi Alex,
> >>
> >> Currently It can't put struct device* into the kvm_fwd_irq struct since
> >> I need to release the vfio_device with
> >> vfio_device_put_external_user(struct vfio_device *vdev)
> >> typically in kvm_vfio_clean_fwd_irq. So I need to store the pointers to
> >> the vfio_device somewhere.
> >>
> >> I see 2 solutions: change the proto of
> >> vfio_device_put_external_user(struct vfio_device *vdev) and pass a
> >> struct device* (??)
> >>
> >> or change the proto of kvm_arch_vfio_set_forward into
> >>
> >> kvm_arch_vfio_set_forward(struct kvm *kvm, struct device *dev, int
> >> index, [int subindex], int gsi, bool forward) or using index/start/count
> >> but loosing the interest of having a self-contained internal struct.
> >
> > The latter is sort of what I was assuming, I think the interface between
> > VFIO and KVM-VFIO is good, we just don't need to expose VFIO-isms out to
> > the arch KVM code. KVM-VFIO should be the barrier layer. In that
> > spirit, maybe it should be kvm_arch_set_forward() and the KVM-VFIO code
> > should do the processing of index/subindex sort of like how Feng did for
> > PCI devices.
>
> Hi Alex,
>
> In Feng's series, host irq is retrieved in the generic part while in
> mine it is retrieved in arch specific part, as encouraged at some point.
> From the above comment I understand the right API between generic and
> arch specific parts may be <operation>(kvm, host_irq, guest_irq) in
> which case I should revert the platform specific IRQ retrieval in the
> generic part. Is it the correct understanding?

Hi Eric,

Sorry if I'm flip-flopping on any of this, but I think you're right that
we want some sort of <operation>(kvm, host_irq, guest_irq) callout in
the kvm direction. The parsing of vfio index/sub-index is vfio specific
and needs to happen in either kvm-vfio or deeper on the vfio side of the
equation. We don't want things like vfio-pci encoding of
index/sub-index leaking out into the non-vfio related parts of the code.

In a perfectly abstracted world, that might mean that in addition to the
vfio external user interface to get the base struct device, we also have
a mechanism that takes a vfio_device, index, and sub-index and returns a
host irq, encapsulating that code in vfio-pci or vfio-platform rather
than having it leak into kvm-vfio. Our layering should be:

vfio bus driver - vfio - kvm-vfio - kvm - arch

And ideally the data goes like this:

host irq --------------------------------->
device -------------------->
guest irq -------->

I'm willing to accept though that kvm-vfio might have everything it
needs to determine host irq and the routing back to the bus driver is
more effort than simply decoding it in kvm-vfio. Thanks,

Alex

> >>>> +
> >>>> +#else
> >>>> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> >>>> + bool forward)
> >>>> +{
> >>>> + return 0;
> >>>> +}
> >>>> +#endif
> >>>> +
> >>>> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>>>
> >>>> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >>>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >>>> index 6f0cc34..af178bb 100644
> >>>> --- a/virt/kvm/vfio.c
> >>>> +++ b/virt/kvm/vfio.c
> >>>> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
> >>>> struct vfio_group *vfio_group;
> >>>> };
> >>>>
> >>>> +/* private linkable kvm_fwd_irq struct */
> >>>> +struct kvm_vfio_fwd_irq_node {
> >>>> + struct list_head link;
> >>>> + struct kvm_fwd_irq fwd_irq;
> >>>> +};
> >>>> +
> >>>> struct kvm_vfio {
> >>>> struct list_head group_list;
> >>>> + /* list of registered VFIO forwarded IRQs */
> >>>> + struct list_head fwd_node_list;
> >>>> struct mutex lock;
> >>>> bool noncoherent;
> >>>> };
> >>>> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >>>> return -ENXIO;
> >>>> }
> >>>>
> >>>> +/**
> >>>> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
> >>>> + *
> >>>> + * Checks it is a valid vfio device and increments its reference counter
> >>>> + * @fd: file descriptor of the vfio platform device
> >>>> + */
> >>>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >>>> +{
> >>>> + struct fd f = fdget(fd);
> >>>> + struct vfio_device *vdev;
> >>>> +
> >>>> + if (!f.file)
> >>>> + return NULL;
> >>>
> >>> ERR_PTR(-EINVAL)?
> >>>
> >>> ie. propagate errors from the point where they're encountered so we
> >>> don't need to make up an errno later.
> >> yes thanks
> >>>
> >>>> + vdev = kvm_vfio_device_get_external_user(f.file);
> >>>> + fdput(f);
> >>>> + return vdev;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
> >>>> + * vfio platform * device
> >>>> + *
> >>>> + * @vdev: vfio_device handle to release
> >>>> + */
> >>>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >>>> +{
> >>>> + kvm_vfio_device_put_external_user(vdev);
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
> >>>> + * registered in the list of forwarded IRQs
> >>>> + *
> >>>> + * @kv: handle to the kvm-vfio device
> >>>> + * @fwd: handle to the forwarded irq struct
> >>>> + * In the positive returns the handle to its node in the kvm-vfio
> >>>> + * forwarded IRQ list, returns NULL otherwise.
> >>>> + * Must be called with kv->lock hold.
> >>>> + */
> >>>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
> >>>> + struct kvm_vfio *kv,
> >>>> + struct kvm_fwd_irq *fwd)
> >>>> +{
> >>>> + struct kvm_vfio_fwd_irq_node *node;
> >>>> +
> >>>> + list_for_each_entry(node, &kv->fwd_node_list, link) {
> >>>> + if ((node->fwd_irq.index == fwd->index) &&
> >>>> + (node->fwd_irq.subindex == fwd->subindex) &&
> >>>> + (node->fwd_irq.vdev == fwd->vdev))
> >>>> + return node;
> >>>> + }
> >>>> + return NULL;
> >>>> +}
> >>>> +/**
> >>>> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
> >>>> + * forwarded IRQ
> >>>> + *
> >>>> + * @kv: handle to the kvm-vfio device
> >>>> + * @fwd: handle to the forwarded irq struct
> >>>> + * In case of success returns a handle to the new list node,
> >>>> + * NULL otherwise.
> >>>> + * Must be called with kv->lock hold.
> >>>> + */
> >>>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
> >>>> + struct kvm_vfio *kv,
> >>>> + struct kvm_fwd_irq *fwd)
> >>>> +{
> >>>> + struct kvm_vfio_fwd_irq_node *node;
> >>>> +
> >>>> + node = kmalloc(sizeof(*node), GFP_KERNEL);
> >>>> + if (!node)
> >>>> + return NULL;
> >>>
> >>> ERR_PTR(-ENOMEM)?
> >>>
> >> OK
> >>>> +
> >>>> + node->fwd_irq = *fwd;
> >>>> +
> >>>> + list_add(&node->link, &kv->fwd_node_list);
> >>>> +
> >>>> + return node;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
> >>>> + *
> >>>> + * @node: handle to the node struct
> >>>> + * Must be called with kv->lock hold.
> >>>> + */
> >>>> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
> >>>> +{
> >>>> + list_del(&node->link);
> >>>> + kfree(node);
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
> >>>> + * @kv: handle to the kvm-vfio device
> >>>> + * @fd: file descriptor of the vfio device the IRQ belongs to
> >>>> + * @fwd: handle to the forwarded irq struct
> >>>> + *
> >>>> + * Registers an IRQ as forwarded and calls the architecture specific
> >>>> + * implementation of set_forward. In case of operation failure, the IRQ
> >>>> + * is unregistered. In case of success, the vfio device ref counter is
> >>>> + * incremented.
> >>>> + */
> >>>> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
> >>>> + struct kvm_fwd_irq *fwd)
> >>>> +{
> >>>> + int ret;
> >>>> + struct kvm_vfio_fwd_irq_node *node =
> >>>> + kvm_vfio_find_fwd_irq(kv, fwd);
> >>>> +
> >>>> + if (node)
> >>>> + return -EINVAL;
> >>>
> >>> I assume you're saving -EBUSY for arch code for the case where the IRQ
> >>> is already active?
> >> yes. -EBUSY now corresponds to the case where the VFIO signaling is
> >> already setup.
> >>>
> >>>> + node = kvm_vfio_register_fwd_irq(kv, fwd);
> >>>> + if (!node)
> >>>> + return -ENOMEM;
> >>>
> >>> if (IS_ERR(node))
> >>> return PTR_ERR(node);
> >>>
> >>>> + ret = kvm_arch_vfio_set_forward(fwd, true);
> >>>> + if (ret < 0) {
> >>>> + kvm_vfio_unregister_fwd_irq(node);
> >>>> + return ret;
> >>>> + }
> >>>> + /* increment the ref counter */
> >>>> + kvm_vfio_get_vfio_device(fd);
> >>>
> >>> Wouldn't it be easier if the reference counting were coupled with the
> >>> register/unregister_fwd_irq?
> >> ok
> >> I'd be tempted to pass your user_fwd_irq
> >>> to this function instead of a kvm_fwd_irq.
> >> Well in that case I would use kvm_arch_forwarded_irq for both set and
> >> unset. Then comes the problem of kvm_vfio_clean_fwd_irq which
> >> manipulates only internal structs.
> >>
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
> >>>> + * @kv: handle to the kvm-vfio device
> >>>> + * @fwd: handle to the forwarded irq struct
> >>>> + *
> >>>> + * Calls the architecture specific implementation of set_forward and
> >>>> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
> >>>> + * device reference counter.
> >>>> + */
> >>>> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
> >>>> + struct kvm_fwd_irq *fwd)
> >>>> +{
> >>>> + int ret;
> >>>> + struct kvm_vfio_fwd_irq_node *node =
> >>>> + kvm_vfio_find_fwd_irq(kv, fwd);
> >>>> + if (!node)
> >>>> + return -EINVAL;
> >>>> + ret = kvm_arch_vfio_set_forward(fwd, false);
> >>>
> >>> Whoa, if the unforward fails we continue to undo everything else? How
> >>> does userspace cleanup from this? We need a guaranteed shutdown path
> >>> for cleanup code, we can never trust userspace to do things in the
> >>> correct order. Can we really preclude the user calling unforward with
> >>> an active IRQ? Maybe _forward() and _unforward() need to be separate
> >>> functions so that 'un' can be made void to indicate it can't fail.
> >> If I accept an unforward while the VFIO signaling mechanism is set, the
> >> wrong handler will be setup on VFIO driver. So should I put in place a
> >> mechanism that changes the VFIO handler for that irq (causing VFIO
> >> driver free_irq/request_irq), using another external API function?
> >
> > Yep, it seems like we need to re-evaluate whether forwarding can be
> > changed on a running IRQ. Disallowing it doesn't appear to support KVM
> > initiated shutdown, only user initiated configuration. So the
> > vfio-platform interrupt handler probably needs to be bi-modal. Maybe
> > the forwarding state of the IRQ can use RCU to avoid locks.
> >
> >>>> + kvm_vfio_unregister_fwd_irq(node);
> >>>> +
> >>>> + /* decrement the ref counter */
> >>>> + kvm_vfio_put_vfio_device(fwd->vdev);
> >>>
> >>> Again, seems like the unregister should do this.
> >> ok
> >>>
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
> >>>> + int32_t __user *argp)
> >>>> +{
> >>>> + struct kvm_arch_forwarded_irq user_fwd_irq;
> >>>> + struct kvm_fwd_irq fwd;
> >>>> + struct vfio_device *vdev;
> >>>> + struct kvm_vfio *kv = kdev->private;
> >>>> + int ret;
> >>>> +
> >>>> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
> >>>> + return -EFAULT;
> >>>> +
> >>>> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
> >>>> + if (IS_ERR(vdev)) {
> >>>> + ret = PTR_ERR(vdev);
> >>>> + goto out;
> >>>> + }
> >>>> +
> >>>> + fwd.vdev = vdev;
> >>>> + fwd.kvm = kdev->kvm;
> >>>> + fwd.index = user_fwd_irq.index;
> >>>> + fwd.subindex = user_fwd_irq.subindex;
> >>>> + fwd.gsi = user_fwd_irq.gsi;
> >>>> +
> >>>> + switch (attr) {
> >>>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >>>> + mutex_lock(&kv->lock);
> >>>> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
> >>>> + mutex_unlock(&kv->lock);
> >>>> + break;
> >>>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >>>> + mutex_lock(&kv->lock);
> >>>> + ret = kvm_vfio_unset_forward(kv, &fwd);
> >>>> + mutex_unlock(&kv->lock);
> >>>> + break;
> >>>> + }
> >>>> +out:
> >>>> + kvm_vfio_put_vfio_device(vdev);
> >>>
> >>> It might add a little extra code, but logically the reference being tied
> >>> to the register/unregister feels a bit cleaner than this.
> >> Sorry I do not catch your comment here. Since i called
> >> kvm_vfio_get_vfio_device at the beg of the function, I need to release
> >> at the end of the function, besides the ref counter incr/decr at
> >> registration?
> >
> > If we do reference counting on register/unregister, I'd think that the
> > get/put in this function would go away. Having the reference here seems
> > redundant.
> >
> >>>
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >>>> +{
> >>>> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >>>> + int ret;
> >>>> +
> >>>> + switch (attr) {
> >>>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >>>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >>>> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
> >>>> + break;
> >>>> + default:
> >>>> + ret = -ENXIO;
> >>>> + }
> >>>> + return ret;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
> >>>> + * registered forwarded IRQs and free their list nodes.
> >>>> + * @kv: kvm-vfio device
> >>>> + *
> >>>> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
> >>>> + * void the lists and release the reference
> >>>> + */
> >>>> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
> >>>> +{
> >>>> + struct kvm_vfio_fwd_irq_node *node, *tmp;
> >>>> +
> >>>> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
> >>>> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
> >>>> + }
> >>>> + return 0;
> >>>
> >>> This shouldn't be able to fail, make it void.
> >> see above questioning? But you're right, I am too much virtualization
> >> oriented. Currently my cleanup is even done in the virtual interrupt
> >> controller when the VM dies because nobody unsets the VFIO signaling.
> >
> > Yep, being a kernel interface it needs to be hardened against careless
> > and malicious users. The kernel should return to the correct state if
> > we kill -9 QEMU at any point. Thanks,
> >
> > Alex
> >
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/


2014-12-08 17:14:30

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On 12/08/2014 05:54 PM, Alex Williamson wrote:
> On Mon, 2014-12-08 at 13:22 +0100, Eric Auger wrote:
>> On 11/25/2014 08:00 PM, Alex Williamson wrote:
>>> On Tue, 2014-11-25 at 19:20 +0100, Eric Auger wrote:
>>>> On 11/24/2014 09:56 PM, Alex Williamson wrote:
>>>>> On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
>>>>>> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>>>>>>
>>>>>> This is a new control channel which enables KVM to cooperate with
>>>>>> viable VFIO devices.
>>>>>>
>>>>>> Functions are introduced to check the validity of a VFIO device
>>>>>> file descriptor, increment/decrement the ref counter of the VFIO
>>>>>> device.
>>>>>>
>>>>>> The patch introduces 2 attributes for this new device group:
>>>>>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>>>>>> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
>>>>>> unset respectively unset the feature.
>>>>>>
>>>>>> The VFIO device stores a list of registered forwarded IRQs. The reference
>>>>>> counter of the device is incremented each time a new IRQ is forwarded.
>>>>>> Reference counter is decremented when the IRQ forwarding is unset.
>>>>>>
>>>>>> The forwarding programmming is architecture specific, implemented in
>>>>>> kvm_arch_set_fwd_state function. Architecture specific implementation is
>>>>>> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
>>>>>> functions are void.
>>>>>>
>>>>>> Signed-off-by: Eric Auger <[email protected]>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v2 -> v3:
>>>>>> - add API comments in kvm_host.h
>>>>>> - improve the commit message
>>>>>> - create a private kvm_vfio_fwd_irq struct
>>>>>> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
>>>>>> latter action will be handled in vgic.
>>>>>> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
>>>>>> to move platform specific stuff in architecture specific code.
>>>>>> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
>>>>>> - increment the ref counter each time we do an IRQ forwarding and decrement
>>>>>> this latter each time one IRQ forward is unset. Simplifies the whole
>>>>>> ref counting.
>>>>>> - simplification of list handling: create, search, removal
>>>>>>
>>>>>> v1 -> v2:
>>>>>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>>>>> - original patch file separated into 2 parts: generic part moved in vfio.c
>>>>>> and ARM specific part(kvm_arch_set_fwd_state)
>>>>>> ---
>>>>>> include/linux/kvm_host.h | 28 ++++++
>>>>>> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>> 2 files changed, 274 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>>>> index ea53b04..0b9659d 100644
>>>>>> --- a/include/linux/kvm_host.h
>>>>>> +++ b/include/linux/kvm_host.h
>>>>>> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
>>>>>> unsigned long arg);
>>>>>> };
>>>>>>
>>>>>> +/* internal self-contained structure describing a forwarded IRQ */
>>>>>> +struct kvm_fwd_irq {
>>>>>> + struct kvm *kvm; /* VM to inject the GSI into */
>>>>>> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
>>>>>> + __u32 index; /* VFIO device IRQ index */
>>>>>> + __u32 subindex; /* VFIO device IRQ subindex */
>>>>>> + __u32 gsi; /* gsi, ie. virtual IRQ number */
>>>>>> +};
>>>>>> +
>>>>>> void kvm_device_get(struct kvm_device *dev);
>>>>>> void kvm_device_put(struct kvm_device *dev);
>>>>>> struct kvm_device *kvm_device_from_filp(struct file *filp);
>>>>>> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
>>>>>> extern struct kvm_device_ops kvm_mpic_ops;
>>>>>> extern struct kvm_device_ops kvm_xics_ops;
>>>>>>
>>>>>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>>>>>> +/**
>>>>>> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
>>>>>> + *
>>>>>> + * @fwd_irq: handle to the forwarded irq struct
>>>>>> + * @forward: true means forwarded, false means not forwarded
>>>>>> + * returns 0 on success, < 0 on failure
>>>>>> + */
>>>>>> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>>>>>> + bool forward);
>>>>>
>>>>> We could add a struct device* to the args list or into struct
>>>>> kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
>>>>> has no business dealing with references to the vfio_device.
>>>> Hi Alex,
>>>>
>>>> Currently It can't put struct device* into the kvm_fwd_irq struct since
>>>> I need to release the vfio_device with
>>>> vfio_device_put_external_user(struct vfio_device *vdev)
>>>> typically in kvm_vfio_clean_fwd_irq. So I need to store the pointers to
>>>> the vfio_device somewhere.
>>>>
>>>> I see 2 solutions: change the proto of
>>>> vfio_device_put_external_user(struct vfio_device *vdev) and pass a
>>>> struct device* (??)
>>>>
>>>> or change the proto of kvm_arch_vfio_set_forward into
>>>>
>>>> kvm_arch_vfio_set_forward(struct kvm *kvm, struct device *dev, int
>>>> index, [int subindex], int gsi, bool forward) or using index/start/count
>>>> but loosing the interest of having a self-contained internal struct.
>>>
>>> The latter is sort of what I was assuming, I think the interface between
>>> VFIO and KVM-VFIO is good, we just don't need to expose VFIO-isms out to
>>> the arch KVM code. KVM-VFIO should be the barrier layer. In that
>>> spirit, maybe it should be kvm_arch_set_forward() and the KVM-VFIO code
>>> should do the processing of index/subindex sort of like how Feng did for
>>> PCI devices.
>>
>> Hi Alex,
>>
>> In Feng's series, host irq is retrieved in the generic part while in
>> mine it is retrieved in arch specific part, as encouraged at some point.
>> From the above comment I understand the right API between generic and
>> arch specific parts may be <operation>(kvm, host_irq, guest_irq) in
>> which case I should revert the platform specific IRQ retrieval in the
>> generic part. Is it the correct understanding?
>
> Hi Eric,
>
> Sorry if I'm flip-flopping on any of this, but I think you're right that
> we want some sort of <operation>(kvm, host_irq, guest_irq) callout in
> the kvm direction. The parsing of vfio index/sub-index is vfio specific
> and needs to happen in either kvm-vfio or deeper on the vfio side of the
> equation. We don't want things like vfio-pci encoding of
> index/sub-index leaking out into the non-vfio related parts of the code.
>
> In a perfectly abstracted world, that might mean that in addition to the
> vfio external user interface to get the base struct device, we also have
> a mechanism that takes a vfio_device, index, and sub-index and returns a
> host irq, encapsulating that code in vfio-pci or vfio-platform rather
> than having it leak into kvm-vfio. Our layering should be:
>
> vfio bus driver - vfio - kvm-vfio - kvm - arch
>
> And ideally the data goes like this:
>
> host irq --------------------------------->
> device -------------------->
> guest irq -------->

Hi Alex,

no problem. thanks for clarifying. Makes sense to me too.

Best Regards

Eric
>
> I'm willing to accept though that kvm-vfio might have everything it
> needs to determine host irq and the routing back to the bus driver is
> more effort than simply decoding it in kvm-vfio. Thanks,
>
> Alex
>
>>>>>> +
>>>>>> +#else
>>>>>> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>>>>>> + bool forward)
>>>>>> +{
>>>>>> + return 0;
>>>>>> +}
>>>>>> +#endif
>>>>>> +
>>>>>> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>>>>>
>>>>>> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>>>>>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>>>>>> index 6f0cc34..af178bb 100644
>>>>>> --- a/virt/kvm/vfio.c
>>>>>> +++ b/virt/kvm/vfio.c
>>>>>> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
>>>>>> struct vfio_group *vfio_group;
>>>>>> };
>>>>>>
>>>>>> +/* private linkable kvm_fwd_irq struct */
>>>>>> +struct kvm_vfio_fwd_irq_node {
>>>>>> + struct list_head link;
>>>>>> + struct kvm_fwd_irq fwd_irq;
>>>>>> +};
>>>>>> +
>>>>>> struct kvm_vfio {
>>>>>> struct list_head group_list;
>>>>>> + /* list of registered VFIO forwarded IRQs */
>>>>>> + struct list_head fwd_node_list;
>>>>>> struct mutex lock;
>>>>>> bool noncoherent;
>>>>>> };
>>>>>> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>>>>> return -ENXIO;
>>>>>> }
>>>>>>
>>>>>> +/**
>>>>>> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
>>>>>> + *
>>>>>> + * Checks it is a valid vfio device and increments its reference counter
>>>>>> + * @fd: file descriptor of the vfio platform device
>>>>>> + */
>>>>>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>>>>>> +{
>>>>>> + struct fd f = fdget(fd);
>>>>>> + struct vfio_device *vdev;
>>>>>> +
>>>>>> + if (!f.file)
>>>>>> + return NULL;
>>>>>
>>>>> ERR_PTR(-EINVAL)?
>>>>>
>>>>> ie. propagate errors from the point where they're encountered so we
>>>>> don't need to make up an errno later.
>>>> yes thanks
>>>>>
>>>>>> + vdev = kvm_vfio_device_get_external_user(f.file);
>>>>>> + fdput(f);
>>>>>> + return vdev;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
>>>>>> + * vfio platform * device
>>>>>> + *
>>>>>> + * @vdev: vfio_device handle to release
>>>>>> + */
>>>>>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>>>>>> +{
>>>>>> + kvm_vfio_device_put_external_user(vdev);
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
>>>>>> + * registered in the list of forwarded IRQs
>>>>>> + *
>>>>>> + * @kv: handle to the kvm-vfio device
>>>>>> + * @fwd: handle to the forwarded irq struct
>>>>>> + * In the positive returns the handle to its node in the kvm-vfio
>>>>>> + * forwarded IRQ list, returns NULL otherwise.
>>>>>> + * Must be called with kv->lock hold.
>>>>>> + */
>>>>>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
>>>>>> + struct kvm_vfio *kv,
>>>>>> + struct kvm_fwd_irq *fwd)
>>>>>> +{
>>>>>> + struct kvm_vfio_fwd_irq_node *node;
>>>>>> +
>>>>>> + list_for_each_entry(node, &kv->fwd_node_list, link) {
>>>>>> + if ((node->fwd_irq.index == fwd->index) &&
>>>>>> + (node->fwd_irq.subindex == fwd->subindex) &&
>>>>>> + (node->fwd_irq.vdev == fwd->vdev))
>>>>>> + return node;
>>>>>> + }
>>>>>> + return NULL;
>>>>>> +}
>>>>>> +/**
>>>>>> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
>>>>>> + * forwarded IRQ
>>>>>> + *
>>>>>> + * @kv: handle to the kvm-vfio device
>>>>>> + * @fwd: handle to the forwarded irq struct
>>>>>> + * In case of success returns a handle to the new list node,
>>>>>> + * NULL otherwise.
>>>>>> + * Must be called with kv->lock hold.
>>>>>> + */
>>>>>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
>>>>>> + struct kvm_vfio *kv,
>>>>>> + struct kvm_fwd_irq *fwd)
>>>>>> +{
>>>>>> + struct kvm_vfio_fwd_irq_node *node;
>>>>>> +
>>>>>> + node = kmalloc(sizeof(*node), GFP_KERNEL);
>>>>>> + if (!node)
>>>>>> + return NULL;
>>>>>
>>>>> ERR_PTR(-ENOMEM)?
>>>>>
>>>> OK
>>>>>> +
>>>>>> + node->fwd_irq = *fwd;
>>>>>> +
>>>>>> + list_add(&node->link, &kv->fwd_node_list);
>>>>>> +
>>>>>> + return node;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
>>>>>> + *
>>>>>> + * @node: handle to the node struct
>>>>>> + * Must be called with kv->lock hold.
>>>>>> + */
>>>>>> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
>>>>>> +{
>>>>>> + list_del(&node->link);
>>>>>> + kfree(node);
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
>>>>>> + * @kv: handle to the kvm-vfio device
>>>>>> + * @fd: file descriptor of the vfio device the IRQ belongs to
>>>>>> + * @fwd: handle to the forwarded irq struct
>>>>>> + *
>>>>>> + * Registers an IRQ as forwarded and calls the architecture specific
>>>>>> + * implementation of set_forward. In case of operation failure, the IRQ
>>>>>> + * is unregistered. In case of success, the vfio device ref counter is
>>>>>> + * incremented.
>>>>>> + */
>>>>>> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
>>>>>> + struct kvm_fwd_irq *fwd)
>>>>>> +{
>>>>>> + int ret;
>>>>>> + struct kvm_vfio_fwd_irq_node *node =
>>>>>> + kvm_vfio_find_fwd_irq(kv, fwd);
>>>>>> +
>>>>>> + if (node)
>>>>>> + return -EINVAL;
>>>>>
>>>>> I assume you're saving -EBUSY for arch code for the case where the IRQ
>>>>> is already active?
>>>> yes. -EBUSY now corresponds to the case where the VFIO signaling is
>>>> already setup.
>>>>>
>>>>>> + node = kvm_vfio_register_fwd_irq(kv, fwd);
>>>>>> + if (!node)
>>>>>> + return -ENOMEM;
>>>>>
>>>>> if (IS_ERR(node))
>>>>> return PTR_ERR(node);
>>>>>
>>>>>> + ret = kvm_arch_vfio_set_forward(fwd, true);
>>>>>> + if (ret < 0) {
>>>>>> + kvm_vfio_unregister_fwd_irq(node);
>>>>>> + return ret;
>>>>>> + }
>>>>>> + /* increment the ref counter */
>>>>>> + kvm_vfio_get_vfio_device(fd);
>>>>>
>>>>> Wouldn't it be easier if the reference counting were coupled with the
>>>>> register/unregister_fwd_irq?
>>>> ok
>>>> I'd be tempted to pass your user_fwd_irq
>>>>> to this function instead of a kvm_fwd_irq.
>>>> Well in that case I would use kvm_arch_forwarded_irq for both set and
>>>> unset. Then comes the problem of kvm_vfio_clean_fwd_irq which
>>>> manipulates only internal structs.
>>>>
>>>>>> + return ret;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
>>>>>> + * @kv: handle to the kvm-vfio device
>>>>>> + * @fwd: handle to the forwarded irq struct
>>>>>> + *
>>>>>> + * Calls the architecture specific implementation of set_forward and
>>>>>> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
>>>>>> + * device reference counter.
>>>>>> + */
>>>>>> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
>>>>>> + struct kvm_fwd_irq *fwd)
>>>>>> +{
>>>>>> + int ret;
>>>>>> + struct kvm_vfio_fwd_irq_node *node =
>>>>>> + kvm_vfio_find_fwd_irq(kv, fwd);
>>>>>> + if (!node)
>>>>>> + return -EINVAL;
>>>>>> + ret = kvm_arch_vfio_set_forward(fwd, false);
>>>>>
>>>>> Whoa, if the unforward fails we continue to undo everything else? How
>>>>> does userspace cleanup from this? We need a guaranteed shutdown path
>>>>> for cleanup code, we can never trust userspace to do things in the
>>>>> correct order. Can we really preclude the user calling unforward with
>>>>> an active IRQ? Maybe _forward() and _unforward() need to be separate
>>>>> functions so that 'un' can be made void to indicate it can't fail.
>>>> If I accept an unforward while the VFIO signaling mechanism is set, the
>>>> wrong handler will be setup on VFIO driver. So should I put in place a
>>>> mechanism that changes the VFIO handler for that irq (causing VFIO
>>>> driver free_irq/request_irq), using another external API function?
>>>
>>> Yep, it seems like we need to re-evaluate whether forwarding can be
>>> changed on a running IRQ. Disallowing it doesn't appear to support KVM
>>> initiated shutdown, only user initiated configuration. So the
>>> vfio-platform interrupt handler probably needs to be bi-modal. Maybe
>>> the forwarding state of the IRQ can use RCU to avoid locks.
>>>
>>>>>> + kvm_vfio_unregister_fwd_irq(node);
>>>>>> +
>>>>>> + /* decrement the ref counter */
>>>>>> + kvm_vfio_put_vfio_device(fwd->vdev);
>>>>>
>>>>> Again, seems like the unregister should do this.
>>>> ok
>>>>>
>>>>>> + return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
>>>>>> + int32_t __user *argp)
>>>>>> +{
>>>>>> + struct kvm_arch_forwarded_irq user_fwd_irq;
>>>>>> + struct kvm_fwd_irq fwd;
>>>>>> + struct vfio_device *vdev;
>>>>>> + struct kvm_vfio *kv = kdev->private;
>>>>>> + int ret;
>>>>>> +
>>>>>> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
>>>>>> + return -EFAULT;
>>>>>> +
>>>>>> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
>>>>>> + if (IS_ERR(vdev)) {
>>>>>> + ret = PTR_ERR(vdev);
>>>>>> + goto out;
>>>>>> + }
>>>>>> +
>>>>>> + fwd.vdev = vdev;
>>>>>> + fwd.kvm = kdev->kvm;
>>>>>> + fwd.index = user_fwd_irq.index;
>>>>>> + fwd.subindex = user_fwd_irq.subindex;
>>>>>> + fwd.gsi = user_fwd_irq.gsi;
>>>>>> +
>>>>>> + switch (attr) {
>>>>>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>>>>>> + mutex_lock(&kv->lock);
>>>>>> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
>>>>>> + mutex_unlock(&kv->lock);
>>>>>> + break;
>>>>>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>>>>>> + mutex_lock(&kv->lock);
>>>>>> + ret = kvm_vfio_unset_forward(kv, &fwd);
>>>>>> + mutex_unlock(&kv->lock);
>>>>>> + break;
>>>>>> + }
>>>>>> +out:
>>>>>> + kvm_vfio_put_vfio_device(vdev);
>>>>>
>>>>> It might add a little extra code, but logically the reference being tied
>>>>> to the register/unregister feels a bit cleaner than this.
>>>> Sorry I do not catch your comment here. Since i called
>>>> kvm_vfio_get_vfio_device at the beg of the function, I need to release
>>>> at the end of the function, besides the ref counter incr/decr at
>>>> registration?
>>>
>>> If we do reference counting on register/unregister, I'd think that the
>>> get/put in this function would go away. Having the reference here seems
>>> redundant.
>>>
>>>>>
>>>>>> + return ret;
>>>>>> +}
>>>>>> +
>>>>>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>>>>>> +{
>>>>>> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>>>>>> + int ret;
>>>>>> +
>>>>>> + switch (attr) {
>>>>>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>>>>>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>>>>>> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
>>>>>> + break;
>>>>>> + default:
>>>>>> + ret = -ENXIO;
>>>>>> + }
>>>>>> + return ret;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
>>>>>> + * registered forwarded IRQs and free their list nodes.
>>>>>> + * @kv: kvm-vfio device
>>>>>> + *
>>>>>> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
>>>>>> + * void the lists and release the reference
>>>>>> + */
>>>>>> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
>>>>>> +{
>>>>>> + struct kvm_vfio_fwd_irq_node *node, *tmp;
>>>>>> +
>>>>>> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
>>>>>> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
>>>>>> + }
>>>>>> + return 0;
>>>>>
>>>>> This shouldn't be able to fail, make it void.
>>>> see above questioning? But you're right, I am too much virtualization
>>>> oriented. Currently my cleanup is even done in the virtual interrupt
>>>> controller when the VM dies because nobody unsets the VFIO signaling.
>>>
>>> Yep, being a kernel interface it needs to be hardened against careless
>>> and malicious users. The kernel should return to the correct state if
>>> we kill -9 QEMU at any point. Thanks,
>>>
>>> Alex
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
>
>

2014-12-09 16:20:54

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On 11/24/2014 09:56 PM, Alex Williamson wrote:
> On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
>> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
>>
>> This is a new control channel which enables KVM to cooperate with
>> viable VFIO devices.
>>
>> Functions are introduced to check the validity of a VFIO device
>> file descriptor, increment/decrement the ref counter of the VFIO
>> device.
>>
>> The patch introduces 2 attributes for this new device group:
>> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
>> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
>> unset respectively unset the feature.
>>
>> The VFIO device stores a list of registered forwarded IRQs. The reference
>> counter of the device is incremented each time a new IRQ is forwarded.
>> Reference counter is decremented when the IRQ forwarding is unset.
>>
>> The forwarding programmming is architecture specific, implemented in
>> kvm_arch_set_fwd_state function. Architecture specific implementation is
>> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
>> functions are void.
>>
>> Signed-off-by: Eric Auger <[email protected]>
>>
>> ---
>>
>> v2 -> v3:
>> - add API comments in kvm_host.h
>> - improve the commit message
>> - create a private kvm_vfio_fwd_irq struct
>> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
>> latter action will be handled in vgic.
>> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
>> to move platform specific stuff in architecture specific code.
>> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
>> - increment the ref counter each time we do an IRQ forwarding and decrement
>> this latter each time one IRQ forward is unset. Simplifies the whole
>> ref counting.
>> - simplification of list handling: create, search, removal
>>
>> v1 -> v2:
>> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> - original patch file separated into 2 parts: generic part moved in vfio.c
>> and ARM specific part(kvm_arch_set_fwd_state)
>> ---
>> include/linux/kvm_host.h | 28 ++++++
>> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 274 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ea53b04..0b9659d 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
>> unsigned long arg);
>> };
>>
>> +/* internal self-contained structure describing a forwarded IRQ */
>> +struct kvm_fwd_irq {
>> + struct kvm *kvm; /* VM to inject the GSI into */
>> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
>> + __u32 index; /* VFIO device IRQ index */
>> + __u32 subindex; /* VFIO device IRQ subindex */
>> + __u32 gsi; /* gsi, ie. virtual IRQ number */
>> +};
>> +
>> void kvm_device_get(struct kvm_device *dev);
>> void kvm_device_put(struct kvm_device *dev);
>> struct kvm_device *kvm_device_from_filp(struct file *filp);
>> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
>> extern struct kvm_device_ops kvm_mpic_ops;
>> extern struct kvm_device_ops kvm_xics_ops;
>>
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> +/**
>> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
>> + *
>> + * @fwd_irq: handle to the forwarded irq struct
>> + * @forward: true means forwarded, false means not forwarded
>> + * returns 0 on success, < 0 on failure
>> + */
>> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>> + bool forward);
>
> We could add a struct device* to the args list or into struct
> kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
> has no business dealing with references to the vfio_device.
>
>> +
>> +#else
>> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
>> + bool forward)
>> +{
>> + return 0;
>> +}
>> +#endif
>> +
>> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
>>
>> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 6f0cc34..af178bb 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
>> struct vfio_group *vfio_group;
>> };
>>
>> +/* private linkable kvm_fwd_irq struct */
>> +struct kvm_vfio_fwd_irq_node {
>> + struct list_head link;
>> + struct kvm_fwd_irq fwd_irq;
>> +};
>> +
>> struct kvm_vfio {
>> struct list_head group_list;
>> + /* list of registered VFIO forwarded IRQs */
>> + struct list_head fwd_node_list;
>> struct mutex lock;
>> bool noncoherent;
>> };
>> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>> return -ENXIO;
>> }
>>
>> +/**
>> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
>> + *
>> + * Checks it is a valid vfio device and increments its reference counter
>> + * @fd: file descriptor of the vfio platform device
>> + */
>> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
>> +{
>> + struct fd f = fdget(fd);
>> + struct vfio_device *vdev;
>> +
>> + if (!f.file)
>> + return NULL;
>
> ERR_PTR(-EINVAL)?
>
> ie. propagate errors from the point where they're encountered so we
> don't need to make up an errno later.
>
>> + vdev = kvm_vfio_device_get_external_user(f.file);
>> + fdput(f);
>> + return vdev;
>> +}
>> +
>> +/**
>> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
>> + * vfio platform * device
>> + *
>> + * @vdev: vfio_device handle to release
>> + */
>> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
>> +{
>> + kvm_vfio_device_put_external_user(vdev);
>> +}
>> +
>> +/**
>> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
>> + * registered in the list of forwarded IRQs
>> + *
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + * In the positive returns the handle to its node in the kvm-vfio
>> + * forwarded IRQ list, returns NULL otherwise.
>> + * Must be called with kv->lock hold.
>> + */
>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
>> + struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node;
>> +
>> + list_for_each_entry(node, &kv->fwd_node_list, link) {
>> + if ((node->fwd_irq.index == fwd->index) &&
>> + (node->fwd_irq.subindex == fwd->subindex) &&
>> + (node->fwd_irq.vdev == fwd->vdev))
>> + return node;
>> + }
>> + return NULL;
>> +}
>> +/**
>> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
>> + * forwarded IRQ
>> + *
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + * In case of success returns a handle to the new list node,
>> + * NULL otherwise.
>> + * Must be called with kv->lock hold.
>> + */
>> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
>> + struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node;
>> +
>> + node = kmalloc(sizeof(*node), GFP_KERNEL);
>> + if (!node)
>> + return NULL;
>
> ERR_PTR(-ENOMEM)?
>
>> +
>> + node->fwd_irq = *fwd;
>> +
>> + list_add(&node->link, &kv->fwd_node_list);
>> +
>> + return node;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
>> + *
>> + * @node: handle to the node struct
>> + * Must be called with kv->lock hold.
>> + */
>> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
>> +{
>> + list_del(&node->link);
>> + kfree(node);
>> +}
>> +
>> +/**
>> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
>> + * @kv: handle to the kvm-vfio device
>> + * @fd: file descriptor of the vfio device the IRQ belongs to
>> + * @fwd: handle to the forwarded irq struct
>> + *
>> + * Registers an IRQ as forwarded and calls the architecture specific
>> + * implementation of set_forward. In case of operation failure, the IRQ
>> + * is unregistered. In case of success, the vfio device ref counter is
>> + * incremented.
>> + */
>> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + int ret;
>> + struct kvm_vfio_fwd_irq_node *node =
>> + kvm_vfio_find_fwd_irq(kv, fwd);
>> +
>> + if (node)
>> + return -EINVAL;
>
> I assume you're saving -EBUSY for arch code for the case where the IRQ
> is already active?
>
>> + node = kvm_vfio_register_fwd_irq(kv, fwd);
>> + if (!node)
>> + return -ENOMEM;
>
> if (IS_ERR(node))
> return PTR_ERR(node);
>
>> + ret = kvm_arch_vfio_set_forward(fwd, true);
>> + if (ret < 0) {
>> + kvm_vfio_unregister_fwd_irq(node);
>> + return ret;
>> + }
>> + /* increment the ref counter */
>> + kvm_vfio_get_vfio_device(fd);
>
> Wouldn't it be easier if the reference counting were coupled with the
> register/unregister_fwd_irq? I'd be tempted to pass your user_fwd_irq
> to this function instead of a kvm_fwd_irq.
Hi Alex,

The usage of gsi flexible array member in kvm_vfio_dev_irq makes
impractical (to me) to use user_fwd_irq beyond
kvm_vfio_control_irq_forward. I would prefer keeping using either the
private kvm_fwd_irq or individual fields - as Feng does too -.

With respect to moving the ref counting in register/unregister_fwd_irq
my main issue is the kvm_vfio_get_vfio_device currently takes an fd and
not a vfio_device. Either I also store the fd in kvm_fwd_irq or I add
another external API that makes possible to increment the reference from
the vfio_device.

There is currently struct vfio_device *vfio_device_get_from_dev(struct
device *dev).

I could add somethink like
static void vfio_device_get_from_dev_external(struct vfio_device *vdev)
{
vfio_device_get(vdev);
}

Does it make sense to you?

Best Regards

Eric



>
>> + return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
>> + * @kv: handle to the kvm-vfio device
>> + * @fwd: handle to the forwarded irq struct
>> + *
>> + * Calls the architecture specific implementation of set_forward and
>> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
>> + * device reference counter.
>> + */
>> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
>> + struct kvm_fwd_irq *fwd)
>> +{
>> + int ret;
>> + struct kvm_vfio_fwd_irq_node *node =
>> + kvm_vfio_find_fwd_irq(kv, fwd);
>> + if (!node)
>> + return -EINVAL;
>> + ret = kvm_arch_vfio_set_forward(fwd, false);
>
> Whoa, if the unforward fails we continue to undo everything else? How
> does userspace cleanup from this? We need a guaranteed shutdown path
> for cleanup code, we can never trust userspace to do things in the
> correct order. Can we really preclude the user calling unforward with
> an active IRQ? Maybe _forward() and _unforward() need to be separate
> functions so that 'un' can be made void to indicate it can't fail.
>
>> + kvm_vfio_unregister_fwd_irq(node);
>> +
>> + /* decrement the ref counter */
>> + kvm_vfio_put_vfio_device(fwd->vdev);
>
> Again, seems like the unregister should do this.
>
>> + return ret;
>> +}
>> +
>> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
>> + int32_t __user *argp)
>> +{
>> + struct kvm_arch_forwarded_irq user_fwd_irq;
>> + struct kvm_fwd_irq fwd;
>> + struct vfio_device *vdev;
>> + struct kvm_vfio *kv = kdev->private;
>> + int ret;
>> +
>> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
>> + return -EFAULT;
>> +
>> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
>> + if (IS_ERR(vdev)) {
>> + ret = PTR_ERR(vdev);
>> + goto out;
>> + }
>> +
>> + fwd.vdev = vdev;
>> + fwd.kvm = kdev->kvm;
>> + fwd.index = user_fwd_irq.index;
>> + fwd.subindex = user_fwd_irq.subindex;
>> + fwd.gsi = user_fwd_irq.gsi;
>> +
>> + switch (attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + mutex_lock(&kv->lock);
>> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
>> + mutex_unlock(&kv->lock);
>> + break;
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + mutex_lock(&kv->lock);
>> + ret = kvm_vfio_unset_forward(kv, &fwd);
>> + mutex_unlock(&kv->lock);
>> + break;
>> + }
>> +out:
>> + kvm_vfio_put_vfio_device(vdev);
>
> It might add a little extra code, but logically the reference being tied
> to the register/unregister feels a bit cleaner than this.
>
>> + return ret;
>> +}
>> +
>> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
>> +{
>> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
>> + int ret;
>> +
>> + switch (attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
>> + break;
>> + default:
>> + ret = -ENXIO;
>> + }
>> + return ret;
>> +}
>> +
>> +/**
>> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
>> + * registered forwarded IRQs and free their list nodes.
>> + * @kv: kvm-vfio device
>> + *
>> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
>> + * void the lists and release the reference
>> + */
>> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
>> +{
>> + struct kvm_vfio_fwd_irq_node *node, *tmp;
>> +
>> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
>> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
>> + }
>> + return 0;
>
> This shouldn't be able to fail, make it void.
>
>> +}
>> +
>> static int kvm_vfio_set_attr(struct kvm_device *dev,
>> struct kvm_device_attr *attr)
>> {
>> switch (attr->group) {
>> case KVM_DEV_VFIO_GROUP:
>> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>> + case KVM_DEV_VFIO_DEVICE:
>> + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
>> }
>>
>> return -ENXIO;
>> @@ -268,10 +503,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>> case KVM_DEV_VFIO_GROUP_DEL:
>> return 0;
>> }
>> -
>> break;
>> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
>> + case KVM_DEV_VFIO_DEVICE:
>> + switch (attr->attr) {
>> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
>> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
>> + return 0;
>> + }
>> + break;
>> +#endif
>> }
>> -
>> return -ENXIO;
>> }
>>
>> @@ -285,7 +527,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
>> list_del(&kvg->node);
>> kfree(kvg);
>> }
>> -
>> + kvm_vfio_clean_fwd_irq(kv);
>> kvm_vfio_update_coherency(dev);
>>
>> kfree(kv);
>> @@ -317,6 +559,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>> return -ENOMEM;
>>
>> INIT_LIST_HEAD(&kv->group_list);
>> + INIT_LIST_HEAD(&kv->fwd_node_list);
>> mutex_init(&kv->lock);
>>
>> dev->private = kv;
>
>
>

2014-12-09 20:51:34

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: kvm-vfio: generic forwarding control

On Tue, 2014-12-09 at 17:19 +0100, Eric Auger wrote:
> On 11/24/2014 09:56 PM, Alex Williamson wrote:
> > On Sun, 2014-11-23 at 19:35 +0100, Eric Auger wrote:
> >> This patch introduces a new KVM_DEV_VFIO_DEVICE group.
> >>
> >> This is a new control channel which enables KVM to cooperate with
> >> viable VFIO devices.
> >>
> >> Functions are introduced to check the validity of a VFIO device
> >> file descriptor, increment/decrement the ref counter of the VFIO
> >> device.
> >>
> >> The patch introduces 2 attributes for this new device group:
> >> KVM_DEV_VFIO_DEVICE_FORWARD_IRQ, KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ.
> >> Their purpose is to turn a VFIO device IRQ into a forwarded IRQ and
> >> unset respectively unset the feature.
> >>
> >> The VFIO device stores a list of registered forwarded IRQs. The reference
> >> counter of the device is incremented each time a new IRQ is forwarded.
> >> Reference counter is decremented when the IRQ forwarding is unset.
> >>
> >> The forwarding programmming is architecture specific, implemented in
> >> kvm_arch_set_fwd_state function. Architecture specific implementation is
> >> enabled when __KVM_HAVE_ARCH_KVM_VFIO_FORWARD is set. When not set those
> >> functions are void.
> >>
> >> Signed-off-by: Eric Auger <[email protected]>
> >>
> >> ---
> >>
> >> v2 -> v3:
> >> - add API comments in kvm_host.h
> >> - improve the commit message
> >> - create a private kvm_vfio_fwd_irq struct
> >> - fwd_irq_action replaced by a bool and removal of VFIO_IRQ_CLEANUP. This
> >> latter action will be handled in vgic.
> >> - add a vfio_device handle argument to kvm_arch_set_fwd_state. The goal is
> >> to move platform specific stuff in architecture specific code.
> >> - kvm_arch_set_fwd_state renamed into kvm_arch_vfio_set_forward
> >> - increment the ref counter each time we do an IRQ forwarding and decrement
> >> this latter each time one IRQ forward is unset. Simplifies the whole
> >> ref counting.
> >> - simplification of list handling: create, search, removal
> >>
> >> v1 -> v2:
> >> - __KVM_HAVE_ARCH_KVM_VFIO renamed into __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> - original patch file separated into 2 parts: generic part moved in vfio.c
> >> and ARM specific part(kvm_arch_set_fwd_state)
> >> ---
> >> include/linux/kvm_host.h | 28 ++++++
> >> virt/kvm/vfio.c | 249 ++++++++++++++++++++++++++++++++++++++++++++++-
> >> 2 files changed, 274 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> >> index ea53b04..0b9659d 100644
> >> --- a/include/linux/kvm_host.h
> >> +++ b/include/linux/kvm_host.h
> >> @@ -1076,6 +1076,15 @@ struct kvm_device_ops {
> >> unsigned long arg);
> >> };
> >>
> >> +/* internal self-contained structure describing a forwarded IRQ */
> >> +struct kvm_fwd_irq {
> >> + struct kvm *kvm; /* VM to inject the GSI into */
> >> + struct vfio_device *vdev; /* vfio device the IRQ belongs to */
> >> + __u32 index; /* VFIO device IRQ index */
> >> + __u32 subindex; /* VFIO device IRQ subindex */
> >> + __u32 gsi; /* gsi, ie. virtual IRQ number */
> >> +};
> >> +
> >> void kvm_device_get(struct kvm_device *dev);
> >> void kvm_device_put(struct kvm_device *dev);
> >> struct kvm_device *kvm_device_from_filp(struct file *filp);
> >> @@ -1085,6 +1094,25 @@ void kvm_unregister_device_ops(u32 type);
> >> extern struct kvm_device_ops kvm_mpic_ops;
> >> extern struct kvm_device_ops kvm_xics_ops;
> >>
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> +/**
> >> + * kvm_arch_vfio_set_forward - changes the forwarded state of an IRQ
> >> + *
> >> + * @fwd_irq: handle to the forwarded irq struct
> >> + * @forward: true means forwarded, false means not forwarded
> >> + * returns 0 on success, < 0 on failure
> >> + */
> >> +int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> >> + bool forward);
> >
> > We could add a struct device* to the args list or into struct
> > kvm_fwd_irq so that arch code doesn't need to touch the vdev. arch code
> > has no business dealing with references to the vfio_device.
> >
> >> +
> >> +#else
> >> +static inline int kvm_arch_vfio_set_forward(struct kvm_fwd_irq *fwd_irq,
> >> + bool forward)
> >> +{
> >> + return 0;
> >> +}
> >> +#endif
> >> +
> >> #ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
> >>
> >> static inline void kvm_vcpu_set_in_spin_loop(struct kvm_vcpu *vcpu, bool val)
> >> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> >> index 6f0cc34..af178bb 100644
> >> --- a/virt/kvm/vfio.c
> >> +++ b/virt/kvm/vfio.c
> >> @@ -25,8 +25,16 @@ struct kvm_vfio_group {
> >> struct vfio_group *vfio_group;
> >> };
> >>
> >> +/* private linkable kvm_fwd_irq struct */
> >> +struct kvm_vfio_fwd_irq_node {
> >> + struct list_head link;
> >> + struct kvm_fwd_irq fwd_irq;
> >> +};
> >> +
> >> struct kvm_vfio {
> >> struct list_head group_list;
> >> + /* list of registered VFIO forwarded IRQs */
> >> + struct list_head fwd_node_list;
> >> struct mutex lock;
> >> bool noncoherent;
> >> };
> >> @@ -247,12 +255,239 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> >> return -ENXIO;
> >> }
> >>
> >> +/**
> >> + * kvm_vfio_get_vfio_device - Returns a handle to a vfio-device
> >> + *
> >> + * Checks it is a valid vfio device and increments its reference counter
> >> + * @fd: file descriptor of the vfio platform device
> >> + */
> >> +static struct vfio_device *kvm_vfio_get_vfio_device(int fd)
> >> +{
> >> + struct fd f = fdget(fd);
> >> + struct vfio_device *vdev;
> >> +
> >> + if (!f.file)
> >> + return NULL;
> >
> > ERR_PTR(-EINVAL)?
> >
> > ie. propagate errors from the point where they're encountered so we
> > don't need to make up an errno later.
> >
> >> + vdev = kvm_vfio_device_get_external_user(f.file);
> >> + fdput(f);
> >> + return vdev;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_put_vfio_device: decrements the reference counter of the
> >> + * vfio platform * device
> >> + *
> >> + * @vdev: vfio_device handle to release
> >> + */
> >> +static void kvm_vfio_put_vfio_device(struct vfio_device *vdev)
> >> +{
> >> + kvm_vfio_device_put_external_user(vdev);
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_find_fwd_irq - checks whether a forwarded IRQ already is
> >> + * registered in the list of forwarded IRQs
> >> + *
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fwd: handle to the forwarded irq struct
> >> + * In the positive returns the handle to its node in the kvm-vfio
> >> + * forwarded IRQ list, returns NULL otherwise.
> >> + * Must be called with kv->lock hold.
> >> + */
> >> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_find_fwd_irq(
> >> + struct kvm_vfio *kv,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + struct kvm_vfio_fwd_irq_node *node;
> >> +
> >> + list_for_each_entry(node, &kv->fwd_node_list, link) {
> >> + if ((node->fwd_irq.index == fwd->index) &&
> >> + (node->fwd_irq.subindex == fwd->subindex) &&
> >> + (node->fwd_irq.vdev == fwd->vdev))
> >> + return node;
> >> + }
> >> + return NULL;
> >> +}
> >> +/**
> >> + * kvm_vfio_register_fwd_irq - Allocates, populates and registers a
> >> + * forwarded IRQ
> >> + *
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fwd: handle to the forwarded irq struct
> >> + * In case of success returns a handle to the new list node,
> >> + * NULL otherwise.
> >> + * Must be called with kv->lock hold.
> >> + */
> >> +static struct kvm_vfio_fwd_irq_node *kvm_vfio_register_fwd_irq(
> >> + struct kvm_vfio *kv,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + struct kvm_vfio_fwd_irq_node *node;
> >> +
> >> + node = kmalloc(sizeof(*node), GFP_KERNEL);
> >> + if (!node)
> >> + return NULL;
> >
> > ERR_PTR(-ENOMEM)?
> >
> >> +
> >> + node->fwd_irq = *fwd;
> >> +
> >> + list_add(&node->link, &kv->fwd_node_list);
> >> +
> >> + return node;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_unregister_fwd_irq - unregisters and frees a forwarded IRQ
> >> + *
> >> + * @node: handle to the node struct
> >> + * Must be called with kv->lock hold.
> >> + */
> >> +static void kvm_vfio_unregister_fwd_irq(struct kvm_vfio_fwd_irq_node *node)
> >> +{
> >> + list_del(&node->link);
> >> + kfree(node);
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_set_forward - turns a VFIO device IRQ into a forwarded IRQ
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fd: file descriptor of the vfio device the IRQ belongs to
> >> + * @fwd: handle to the forwarded irq struct
> >> + *
> >> + * Registers an IRQ as forwarded and calls the architecture specific
> >> + * implementation of set_forward. In case of operation failure, the IRQ
> >> + * is unregistered. In case of success, the vfio device ref counter is
> >> + * incremented.
> >> + */
> >> +static int kvm_vfio_set_forward(struct kvm_vfio *kv, int fd,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + int ret;
> >> + struct kvm_vfio_fwd_irq_node *node =
> >> + kvm_vfio_find_fwd_irq(kv, fwd);
> >> +
> >> + if (node)
> >> + return -EINVAL;
> >
> > I assume you're saving -EBUSY for arch code for the case where the IRQ
> > is already active?
> >
> >> + node = kvm_vfio_register_fwd_irq(kv, fwd);
> >> + if (!node)
> >> + return -ENOMEM;
> >
> > if (IS_ERR(node))
> > return PTR_ERR(node);
> >
> >> + ret = kvm_arch_vfio_set_forward(fwd, true);
> >> + if (ret < 0) {
> >> + kvm_vfio_unregister_fwd_irq(node);
> >> + return ret;
> >> + }
> >> + /* increment the ref counter */
> >> + kvm_vfio_get_vfio_device(fd);
> >
> > Wouldn't it be easier if the reference counting were coupled with the
> > register/unregister_fwd_irq? I'd be tempted to pass your user_fwd_irq
> > to this function instead of a kvm_fwd_irq.
> Hi Alex,
>
> The usage of gsi flexible array member in kvm_vfio_dev_irq makes
> impractical (to me) to use user_fwd_irq beyond
> kvm_vfio_control_irq_forward. I would prefer keeping using either the
> private kvm_fwd_irq or individual fields - as Feng does too -.
>
> With respect to moving the ref counting in register/unregister_fwd_irq
> my main issue is the kvm_vfio_get_vfio_device currently takes an fd and
> not a vfio_device. Either I also store the fd in kvm_fwd_irq or I add
> another external API that makes possible to increment the reference from
> the vfio_device.
>
> There is currently struct vfio_device *vfio_device_get_from_dev(struct
> device *dev).
>
> I could add somethink like
> static void vfio_device_get_from_dev_external(struct vfio_device *vdev)
> {
> vfio_device_get(vdev);
> }
>
> Does it make sense to you?

Honestly, no. It doesn't really matter if we use a structure or
individual fields, but it seems like the problem being solved with an
extra reference interface is self imposed. Do we really need to double
increment the reference or can we just keep better track of the
reference we have? I suspect we can do the latter. Thanks,

Alex

> >> + return ret;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_unset_forward - Sets a VFIO device IRQ as non forwarded
> >> + * @kv: handle to the kvm-vfio device
> >> + * @fwd: handle to the forwarded irq struct
> >> + *
> >> + * Calls the architecture specific implementation of set_forward and
> >> + * unregisters the IRQ from the forwarded IRQ list. Decrements the vfio
> >> + * device reference counter.
> >> + */
> >> +static int kvm_vfio_unset_forward(struct kvm_vfio *kv,
> >> + struct kvm_fwd_irq *fwd)
> >> +{
> >> + int ret;
> >> + struct kvm_vfio_fwd_irq_node *node =
> >> + kvm_vfio_find_fwd_irq(kv, fwd);
> >> + if (!node)
> >> + return -EINVAL;
> >> + ret = kvm_arch_vfio_set_forward(fwd, false);
> >
> > Whoa, if the unforward fails we continue to undo everything else? How
> > does userspace cleanup from this? We need a guaranteed shutdown path
> > for cleanup code, we can never trust userspace to do things in the
> > correct order. Can we really preclude the user calling unforward with
> > an active IRQ? Maybe _forward() and _unforward() need to be separate
> > functions so that 'un' can be made void to indicate it can't fail.
> >
> >> + kvm_vfio_unregister_fwd_irq(node);
> >> +
> >> + /* decrement the ref counter */
> >> + kvm_vfio_put_vfio_device(fwd->vdev);
> >
> > Again, seems like the unregister should do this.
> >
> >> + return ret;
> >> +}
> >> +
> >> +static int kvm_vfio_control_irq_forward(struct kvm_device *kdev, long attr,
> >> + int32_t __user *argp)
> >> +{
> >> + struct kvm_arch_forwarded_irq user_fwd_irq;
> >> + struct kvm_fwd_irq fwd;
> >> + struct vfio_device *vdev;
> >> + struct kvm_vfio *kv = kdev->private;
> >> + int ret;
> >> +
> >> + if (copy_from_user(&user_fwd_irq, argp, sizeof(user_fwd_irq)))
> >> + return -EFAULT;
> >> +
> >> + vdev = kvm_vfio_get_vfio_device(user_fwd_irq.fd);
> >> + if (IS_ERR(vdev)) {
> >> + ret = PTR_ERR(vdev);
> >> + goto out;
> >> + }
> >> +
> >> + fwd.vdev = vdev;
> >> + fwd.kvm = kdev->kvm;
> >> + fwd.index = user_fwd_irq.index;
> >> + fwd.subindex = user_fwd_irq.subindex;
> >> + fwd.gsi = user_fwd_irq.gsi;
> >> +
> >> + switch (attr) {
> >> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> + mutex_lock(&kv->lock);
> >> + ret = kvm_vfio_set_forward(kv, user_fwd_irq.fd, &fwd);
> >> + mutex_unlock(&kv->lock);
> >> + break;
> >> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> + mutex_lock(&kv->lock);
> >> + ret = kvm_vfio_unset_forward(kv, &fwd);
> >> + mutex_unlock(&kv->lock);
> >> + break;
> >> + }
> >> +out:
> >> + kvm_vfio_put_vfio_device(vdev);
> >
> > It might add a little extra code, but logically the reference being tied
> > to the register/unregister feels a bit cleaner than this.
> >
> >> + return ret;
> >> +}
> >> +
> >> +static int kvm_vfio_set_device(struct kvm_device *kdev, long attr, u64 arg)
> >> +{
> >> + int32_t __user *argp = (int32_t __user *)(unsigned long)arg;
> >> + int ret;
> >> +
> >> + switch (attr) {
> >> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> + ret = kvm_vfio_control_irq_forward(kdev, attr, argp);
> >> + break;
> >> + default:
> >> + ret = -ENXIO;
> >> + }
> >> + return ret;
> >> +}
> >> +
> >> +/**
> >> + * kvm_vfio_clean_fwd_irq - Unset forwarding state of all
> >> + * registered forwarded IRQs and free their list nodes.
> >> + * @kv: kvm-vfio device
> >> + *
> >> + * Loop on all registered device/IRQ combos, reset the non forwarded state,
> >> + * void the lists and release the reference
> >> + */
> >> +static int kvm_vfio_clean_fwd_irq(struct kvm_vfio *kv)
> >> +{
> >> + struct kvm_vfio_fwd_irq_node *node, *tmp;
> >> +
> >> + list_for_each_entry_safe(node, tmp, &kv->fwd_node_list, link) {
> >> + kvm_vfio_unset_forward(kv, &node->fwd_irq);
> >> + }
> >> + return 0;
> >
> > This shouldn't be able to fail, make it void.
> >
> >> +}
> >> +
> >> static int kvm_vfio_set_attr(struct kvm_device *dev,
> >> struct kvm_device_attr *attr)
> >> {
> >> switch (attr->group) {
> >> case KVM_DEV_VFIO_GROUP:
> >> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> >> + case KVM_DEV_VFIO_DEVICE:
> >> + return kvm_vfio_set_device(dev, attr->attr, attr->addr);
> >> }
> >>
> >> return -ENXIO;
> >> @@ -268,10 +503,17 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> >> case KVM_DEV_VFIO_GROUP_DEL:
> >> return 0;
> >> }
> >> -
> >> break;
> >> +#ifdef __KVM_HAVE_ARCH_KVM_VFIO_FORWARD
> >> + case KVM_DEV_VFIO_DEVICE:
> >> + switch (attr->attr) {
> >> + case KVM_DEV_VFIO_DEVICE_FORWARD_IRQ:
> >> + case KVM_DEV_VFIO_DEVICE_UNFORWARD_IRQ:
> >> + return 0;
> >> + }
> >> + break;
> >> +#endif
> >> }
> >> -
> >> return -ENXIO;
> >> }
> >>
> >> @@ -285,7 +527,7 @@ static void kvm_vfio_destroy(struct kvm_device *dev)
> >> list_del(&kvg->node);
> >> kfree(kvg);
> >> }
> >> -
> >> + kvm_vfio_clean_fwd_irq(kv);
> >> kvm_vfio_update_coherency(dev);
> >>
> >> kfree(kv);
> >> @@ -317,6 +559,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> >> return -ENOMEM;
> >>
> >> INIT_LIST_HEAD(&kv->group_list);
> >> + INIT_LIST_HEAD(&kv->fwd_node_list);
> >> mutex_init(&kv->lock);
> >>
> >> dev->private = kv;
> >
> >
> >
>