2012-06-27 05:08:56

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 0/6] kvm: level triggered irqfd support

Ok, let's see how this flies. I actually quite like this, so be
gentle tearing it apart ;)

I just couldn't bring myself to contort KVM_IRQFD into something
that either sets up an irqfd or specifies a nearly unrelated EOI
eventfd. The solution I've come up with, that also avoids exposing
irq_source_ids to userspace, is to work through the irqfd. If we
setup a level irqfd, we can optionally associate an eoifd with
the same irq_source_id, by passing the irqfd. To do this, we just
need to create a new reference counted object for the source ID
so we don't run into problems ordering release. This means we
end up with a KVM_EOIFD ioctl that has both general usefulness and
can still tie into an irqfd. In patch 6/6 I also include an
alternate de-assert mechanism via an irqfd with the opposite
polarity. I don't currently use this, but it's pretty trivial
and at least available in archives now.

I don't address whether injecting an edge irqfd really needs an assert
followed by de-assert (I don't know). This new interface really
unties itself from caring. We might be able to consolidate inject
functions at some future point, but it doesn't change how we'd name
flags as it did in the previous version. Thanks,

Alex

---

Alex Williamson (6):
kvm: Level IRQ de-assert for KVM_IRQFD
kvm: KVM_EOIFD, an eventfd for EOIs
kvm: Extend irqfd to support level interrupts
kvm: Sanitize KVM_IRQFD flags
kvm: Add missing KVM_IRQFD API documentation
kvm: Pass kvm_irqfd to functions


Documentation/virtual/kvm/api.txt | 53 ++++++
arch/x86/kvm/x86.c | 2
include/linux/kvm.h | 23 +++
include/linux/kvm_host.h | 17 ++
virt/kvm/eventfd.c | 323 ++++++++++++++++++++++++++++++++++++-
virt/kvm/kvm_main.c | 13 +
6 files changed, 414 insertions(+), 17 deletions(-)


2012-06-27 05:09:12

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

Prune this down to just the struct kvm_irqfd so we can avoid
changing function definition for every flag or field we use.

Signed-off-by: Alex Williamson <[email protected]>
---

include/linux/kvm_host.h | 4 ++--
virt/kvm/eventfd.c | 20 ++++++++++----------
virt/kvm/kvm_main.c | 2 +-
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 27ac8a4..ae3b426 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -824,7 +824,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
#ifdef CONFIG_HAVE_KVM_EVENTFD

void kvm_eventfd_init(struct kvm *kvm);
-int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
+int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
void kvm_irqfd_release(struct kvm *kvm);
void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
@@ -833,7 +833,7 @@ int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);

static inline void kvm_eventfd_init(struct kvm *kvm) {}

-static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
{
return -EINVAL;
}
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index f59c1e8..c307c24 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -198,7 +198,7 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
}

static int
-kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
+kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
{
struct kvm_irq_routing_table *irq_rt;
struct _irqfd *irqfd, *tmp;
@@ -212,12 +212,12 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
return -ENOMEM;

irqfd->kvm = kvm;
- irqfd->gsi = gsi;
+ irqfd->gsi = args->gsi;
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);

- file = eventfd_fget(fd);
+ file = eventfd_fget(args->fd);
if (IS_ERR(file)) {
ret = PTR_ERR(file);
goto fail;
@@ -298,19 +298,19 @@ kvm_eventfd_init(struct kvm *kvm)
* shutdown any irqfd's that match fd+gsi
*/
static int
-kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
+kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
{
struct _irqfd *irqfd, *tmp;
struct eventfd_ctx *eventfd;

- eventfd = eventfd_ctx_fdget(fd);
+ eventfd = eventfd_ctx_fdget(args->fd);
if (IS_ERR(eventfd))
return PTR_ERR(eventfd);

spin_lock_irq(&kvm->irqfds.lock);

list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
- if (irqfd->eventfd == eventfd && irqfd->gsi == gsi) {
+ if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
/*
* This rcu_assign_pointer is needed for when
* another thread calls kvm_irq_routing_update before
@@ -338,12 +338,12 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
}

int
-kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
+kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
{
- if (flags & KVM_IRQFD_FLAG_DEASSIGN)
- return kvm_irqfd_deassign(kvm, fd, gsi);
+ if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
+ return kvm_irqfd_deassign(kvm, args);

- return kvm_irqfd_assign(kvm, fd, gsi);
+ return kvm_irqfd_assign(kvm, args);
}

/*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 02cb440..b4ad14cc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2059,7 +2059,7 @@ static long kvm_vm_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&data, argp, sizeof data))
goto out;
- r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
+ r = kvm_irqfd(kvm, &data);
break;
}
case KVM_IOEVENTFD: {

2012-06-27 05:09:23

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 2/6] kvm: Add missing KVM_IRQFD API documentation

Signed-off-by: Alex Williamson <[email protected]>
---

Documentation/virtual/kvm/api.txt | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 310fe50..ea9edce 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1965,6 +1965,22 @@ return the hash table order in the parameter. (If the guest is using
the virtualized real-mode area (VRMA) facility, the kernel will
re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)

+4.76 KVM_IRQFD
+
+Capability: KVM_CAP_IRQFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_irqfd (in)
+Returns: 0 on success, -1 on error
+
+Allows setting an eventfd to directly trigger a guest interrupt.
+kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
+kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When
+an event is tiggered on the eventfd, an interrupt is injected into
+the guest using the specified gsi pin. The irqfd is removed using
+the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
+and kvm_irqfd.gsi.
+

5. The kvm_run structure
------------------------

2012-06-27 05:09:38

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 3/6] kvm: Sanitize KVM_IRQFD flags

We only know of one so far.

Signed-off-by: Alex Williamson <[email protected]>
---

virt/kvm/eventfd.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index c307c24..7d7e2aa 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -340,6 +340,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
int
kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
{
+ if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
+ return -EINVAL;
+
if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
return kvm_irqfd_deassign(kvm, args);

2012-06-27 05:09:52

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

In order to inject an interrupt from an external source using an
irqfd, we need to allocate a new irq_source_id. This allows us to
assert and (later) de-assert an interrupt line independently from
users of KVM_IRQ_LINE and avoid lost interrupts.

We also add what may appear like a bit of excessive infrastructure
around an object for storing this irq_source_id. However, notice
that we only provide a way to assert the interrupt here. A follow-on
interface will make use of the same irq_source_id to allow de-assert.

Signed-off-by: Alex Williamson <[email protected]>
---

Documentation/virtual/kvm/api.txt | 5 ++
arch/x86/kvm/x86.c | 1
include/linux/kvm.h | 3 +
virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
4 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ea9edce..b216709 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

+With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
+the requested irqfd. This is necessary to share level triggered
+interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
+with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
+KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.

5. The kvm_run structure
------------------------
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a01a424..80bed07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_GET_TSC_KHZ:
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
+ case KVM_CAP_IRQFD_LEVEL:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..b2e6e4f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_PPC_GET_SMMU_INFO 78
#define KVM_CAP_S390_COW 79
#define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_IRQFD_LEVEL 81

#ifdef KVM_CAP_IRQ_ROUTING

@@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
#endif

#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQFD_LEVEL */
+#define KVM_IRQFD_FLAG_LEVEL (1 << 1)

struct kvm_irqfd {
__u32 fd;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..18cc284 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -36,6 +36,64 @@
#include "iodev.h"

/*
+ * An irq_source_id can be created from KVM_IRQFD for level interrupt
+ * injections and shared with other interfaces for EOI or de-assert.
+ * Create an object with reference counting to make it easy to use.
+ */
+struct _irq_source {
+ int id;
+ struct kvm *kvm;
+ struct kref kref;
+};
+
+static void release_irq_source(struct kref *kref)
+{
+ struct _irq_source *source;
+
+ source = container_of(kref, struct _irq_source, kref);
+
+ kvm_free_irq_source_id(source->kvm, source->id);
+ kfree(source);
+}
+
+static void put_irq_source(struct _irq_source *source)
+{
+ if (source)
+ kref_put(&source->kref, release_irq_source);
+}
+
+static struct _irq_source *__attribute__ ((used)) /* white lie for now */
+get_irq_source(struct _irq_source *source)
+{
+ if (source)
+ kref_get(&source->kref);
+
+ return source;
+}
+
+static struct _irq_source *new_irq_source(struct kvm *kvm)
+{
+ struct _irq_source *source;
+ int id;
+
+ source = kzalloc(sizeof(*source), GFP_KERNEL);
+ if (!source)
+ return ERR_PTR(-ENOMEM);
+
+ id = kvm_request_irq_source_id(kvm);
+ if (id < 0) {
+ kfree(source);
+ return ERR_PTR(id);
+ }
+
+ kref_init(&source->kref);
+ source->kvm = kvm;
+ source->id = id;
+
+ return source;
+}
+
+/*
* --------------------------------------------------------------------
* irqfd: Allows an fd to be used to inject an interrupt to the guest
*
@@ -52,6 +110,7 @@ struct _irqfd {
/* Used for level IRQ fast-path */
int gsi;
struct work_struct inject;
+ struct _irq_source *source;
/* Used for setup/shutdown */
struct eventfd_ctx *eventfd;
struct list_head list;
@@ -62,7 +121,7 @@ struct _irqfd {
static struct workqueue_struct *irqfd_cleanup_wq;

static void
-irqfd_inject(struct work_struct *work)
+irqfd_inject_edge(struct work_struct *work)
{
struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
struct kvm *kvm = irqfd->kvm;
@@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
}

+static void
+irqfd_inject_level(struct work_struct *work)
+{
+ struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+
+ kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
+}
+
/*
* Race-free decouple logic (ordering is critical)
*/
@@ -96,6 +163,9 @@ irqfd_shutdown(struct work_struct *work)
* It is now safe to release the object's resources
*/
eventfd_ctx_put(irqfd->eventfd);
+
+ put_irq_source(irqfd->source);
+
kfree(irqfd);
}

@@ -214,7 +284,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
irqfd->kvm = kvm;
irqfd->gsi = args->gsi;
INIT_LIST_HEAD(&irqfd->list);
- INIT_WORK(&irqfd->inject, irqfd_inject);
+
+ if (args->flags & KVM_IRQFD_FLAG_LEVEL) {
+ irqfd->source = new_irq_source(kvm);
+ if (IS_ERR(irqfd->source)) {
+ ret = PTR_ERR(irqfd->source);
+ irqfd->source = NULL;
+ goto fail;
+ }
+
+ INIT_WORK(&irqfd->inject, irqfd_inject_level);
+ } else
+ INIT_WORK(&irqfd->inject, irqfd_inject_edge);
+
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);

file = eventfd_fget(args->fd);
@@ -279,9 +361,10 @@ fail:
if (eventfd && !IS_ERR(eventfd))
eventfd_ctx_put(eventfd);

- if (!IS_ERR(file))
+ if (file && !IS_ERR(file))
fput(file);

+ put_irq_source(irqfd->source);
kfree(irqfd);
return ret;
}
@@ -302,6 +385,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
{
struct _irqfd *irqfd, *tmp;
struct eventfd_ctx *eventfd;
+ bool is_level = (args->flags & KVM_IRQFD_FLAG_LEVEL) != 0;

eventfd = eventfd_ctx_fdget(args->fd);
if (IS_ERR(eventfd))
@@ -310,7 +394,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
spin_lock_irq(&kvm->irqfds.lock);

list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
- if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
+ if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi &&
+ is_level == (irqfd->source != NULL)) {
/*
* This rcu_assign_pointer is needed for when
* another thread calls kvm_irq_routing_update before
@@ -340,7 +425,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
int
kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
{
- if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
+ if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_LEVEL))
return -EINVAL;

if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

2012-06-27 05:10:15

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

This new ioctl enables an eventfd to be triggered when an EOI is
written for a specified irqchip pin. By default this is a simple
notification, but we can also tie the eoifd to a level irqfd, which
enables the irqchip pin to be automatically de-asserted on EOI.
This mode is particularly useful for device-assignment applications
where the unmask and notify triggers a hardware unmask. The default
mode is most applicable to simple notify with no side-effects for
userspace usage, such as Qemu.

Here we make use of the reference counting of the _irq_source
object allowing us to share it with an irqfd and cleanup regardless
of the release order.

Signed-off-by: Alex Williamson <[email protected]>
---

Documentation/virtual/kvm/api.txt | 24 +++++
arch/x86/kvm/x86.c | 1
include/linux/kvm.h | 14 +++
include/linux/kvm_host.h | 13 +++
virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
virt/kvm/kvm_main.c | 11 ++
6 files changed, 250 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b216709..87a2558 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.

+4.77 KVM_EOIFD
+
+Capability: KVM_CAP_EOIFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_eoifd (in)
+Returns: 0 on success, -1 on error
+
+KVM_EOIFD allows userspace to receive EOI notification through an
+eventfd for level triggered irqchip interrupts. Behavior for edge
+triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
+used for notification and kvm_eoifd.gsi specifies the irchip pin,
+similar to KVM_IRQFD. KVM_EOIFD_FLAG_DEASSIGN is used to deassign
+a previously enabled eoifd and should also set fd and gsi to match.
+
+The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
+a level triggered EOI and the kvm_eoifd structure includes
+kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
+with the KVM_IRQFD_FLAG_LEVEL flag. This allows both EOI notification
+through kvm_eoifd.fd as well as automatically de-asserting level
+irqfds on EOI. Both KVM_EOIFD_FLAG_DEASSIGN and
+KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
+initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
+
5. The kvm_run structure
------------------------

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 80bed07..62d6eca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2149,6 +2149,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
case KVM_CAP_IRQFD_LEVEL:
+ case KVM_CAP_EOIFD:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b2e6e4f..7567e7d 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_S390_COW 79
#define KVM_CAP_PPC_ALLOC_HTAB 80
#define KVM_CAP_IRQFD_LEVEL 81
+#define KVM_CAP_EOIFD 82

#ifdef KVM_CAP_IRQ_ROUTING

@@ -694,6 +695,17 @@ struct kvm_irqfd {
__u8 pad[20];
};

+#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
+#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)
+
+struct kvm_eoifd {
+ __u32 fd;
+ __u32 gsi;
+ __u32 flags;
+ __u32 irqfd;
+ __u8 pad[16];
+};
+
struct kvm_clock_data {
__u64 clock;
__u32 flags;
@@ -834,6 +846,8 @@ struct kvm_s390_ucas_mapping {
#define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info)
/* Available with KVM_CAP_PPC_ALLOC_HTAB */
#define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32)
+/* Available with KVM_CAP_EOIFD */
+#define KVM_EOIFD _IOW(KVMIO, 0xa8, struct kvm_eoifd)

/*
* ioctls for vcpu fds
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ae3b426..83472eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,10 @@ struct kvm {
struct list_head items;
} irqfds;
struct list_head ioeventfds;
+ struct {
+ spinlock_t lock;
+ struct list_head items;
+ } eoifds;
#endif
struct kvm_vm_stat stat;
struct kvm_arch arch;
@@ -828,6 +832,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
void kvm_irqfd_release(struct kvm *kvm);
void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args);
+void kvm_eoifd_release(struct kvm *kvm);

#else

@@ -853,6 +859,13 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
return -ENOSYS;
}

+static inline int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
+{
+ return -ENOSYS;
+}
+
+static inline void kvm_eoifd_release(struct kvm *kvm) {}
+
#endif /* CONFIG_HAVE_KVM_EVENTFD */

#ifdef CONFIG_KVM_APIC_ARCHITECTURE
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 18cc284..02ca50f 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -62,8 +62,7 @@ static void put_irq_source(struct _irq_source *source)
kref_put(&source->kref, release_irq_source);
}

-static struct _irq_source *__attribute__ ((used)) /* white lie for now */
-get_irq_source(struct _irq_source *source)
+static struct _irq_source *get_irq_source(struct _irq_source *source)
{
if (source)
kref_get(&source->kref);
@@ -118,6 +117,41 @@ struct _irqfd {
struct work_struct shutdown;
};

+static struct _irq_source *get_irq_source_from_irqfd(struct kvm *kvm, int fd)
+{
+ struct file *file;
+ struct eventfd_ctx *eventfd;
+ struct _irqfd *irqfd;
+ struct _irq_source *source = NULL;
+
+ file = fget(fd);
+ if (!file)
+ return ERR_PTR(-EBADF);
+
+ eventfd = eventfd_ctx_fileget(file);
+ if (IS_ERR(eventfd)) {
+ fput(file);
+ return (struct _irq_source *)eventfd;
+ }
+
+ spin_lock_irq(&kvm->irqfds.lock);
+
+ list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
+ if (irqfd->eventfd != eventfd)
+ continue;
+
+ source = get_irq_source(irqfd->source);
+ break;
+ }
+
+ spin_unlock_irq(&kvm->irqfds.lock);
+
+ eventfd_ctx_put(eventfd);
+ fput(file);
+
+ return source ? source : ERR_PTR(-ENODEV);
+}
+
static struct workqueue_struct *irqfd_cleanup_wq;

static void
@@ -375,6 +409,8 @@ kvm_eventfd_init(struct kvm *kvm)
spin_lock_init(&kvm->irqfds.lock);
INIT_LIST_HEAD(&kvm->irqfds.items);
INIT_LIST_HEAD(&kvm->ioeventfds);
+ spin_lock_init(&kvm->eoifds.lock);
+ INIT_LIST_HEAD(&kvm->eoifds.items);
}

/*
@@ -743,3 +779,152 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)

return kvm_assign_ioeventfd(kvm, args);
}
+
+/*
+ * --------------------------------------------------------------------
+ * eoifd: Translate KVM APIC/IOAPIC EOI into eventfd signal.
+ *
+ * userspace can register GSIs with an eventfd for receiving
+ * notification when an EOI occurs.
+ * --------------------------------------------------------------------
+ */
+
+struct _eoifd {
+ struct kvm *kvm;
+ struct eventfd_ctx *eventfd;
+ struct _irq_source *source;
+ struct kvm_irq_ack_notifier notifier;
+ struct list_head list;
+};
+
+static void eoifd_event(struct kvm_irq_ack_notifier *notifier)
+{
+ struct _eoifd *eoifd;
+
+ eoifd = container_of(notifier, struct _eoifd, notifier);
+
+ if (eoifd->source)
+ kvm_set_irq(eoifd->kvm, eoifd->source->id,
+ eoifd->notifier.gsi, 0);
+
+ eventfd_signal(eoifd->eventfd, 1);
+}
+
+static int kvm_assign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
+{
+ struct eventfd_ctx *eventfd;
+ struct _eoifd *eoifd, *tmp;
+ struct _irq_source *source = NULL;
+
+ if (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) {
+ source = get_irq_source_from_irqfd(kvm, args->irqfd);
+ if (IS_ERR(source))
+ return PTR_ERR(source);
+ }
+
+ eventfd = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(eventfd)) {
+ put_irq_source(source);
+ return PTR_ERR(eventfd);
+ }
+
+ eoifd = kzalloc(sizeof(*eoifd), GFP_KERNEL);
+ if (!eoifd) {
+ put_irq_source(source);
+ eventfd_ctx_put(eventfd);
+ return -ENOMEM;
+ }
+
+ INIT_LIST_HEAD(&eoifd->list);
+ eoifd->kvm = kvm;
+ eoifd->eventfd = eventfd;
+ eoifd->source = source;
+ eoifd->notifier.gsi = args->gsi;
+ eoifd->notifier.irq_acked = eoifd_event;
+
+ spin_lock_irq(&kvm->eoifds.lock);
+
+ list_for_each_entry(tmp, &kvm->eoifds.items, list) {
+ if (eoifd->eventfd != tmp->eventfd)
+ continue;
+
+ put_irq_source(source);
+ eventfd_ctx_put(eventfd);
+ kfree(eoifd);
+ return -EBUSY;
+ }
+
+ list_add_tail(&eoifd->list, &kvm->eoifds.items);
+ kvm_register_irq_ack_notifier(kvm, &eoifd->notifier);
+
+ spin_unlock_irq(&kvm->eoifds.lock);
+
+ return 0;
+}
+
+static void eoifd_deactivate(struct kvm *kvm, struct _eoifd *eoifd)
+{
+ list_del(&eoifd->list);
+ kvm_unregister_irq_ack_notifier(kvm, &eoifd->notifier);
+ put_irq_source(eoifd->source);
+ eventfd_ctx_put(eoifd->eventfd);
+ kfree(eoifd);
+}
+
+void kvm_eoifd_release(struct kvm *kvm)
+{
+ struct _eoifd *eoifd, *tmp;
+
+ spin_lock_irq(&kvm->eoifds.lock);
+
+ list_for_each_entry_safe(eoifd, tmp, &kvm->eoifds.items, list)
+ eoifd_deactivate(kvm, eoifd);
+
+ spin_unlock_irq(&kvm->eoifds.lock);
+}
+
+static int kvm_deassign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
+{
+ struct eventfd_ctx *eventfd;
+ struct _eoifd *eoifd;
+ bool uses_source = (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) != 0;
+ int ret = -ENODEV;
+
+ eventfd = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(eventfd))
+ return PTR_ERR(eventfd);
+
+ spin_lock_irq(&kvm->eoifds.lock);
+
+ list_for_each_entry(eoifd, &kvm->eoifds.items, list) {
+ /*
+ * Matching eventfd is unique since we don't allow dulicates,
+ * the rest is sanitizing the calling parameters.
+ */
+ if (eoifd->eventfd == eventfd &&
+ eoifd->notifier.gsi == args->gsi &&
+ uses_source == (eoifd->source != NULL)) {
+ eoifd_deactivate(kvm, eoifd);
+ ret = 0;
+ break;
+ }
+ }
+
+ spin_unlock_irq(&kvm->eoifds.lock);
+
+ eventfd_ctx_put(eventfd);
+
+ return ret;
+}
+
+int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
+{
+ if (args->flags & ~(KVM_EOIFD_FLAG_DEASSIGN |
+ KVM_EOIFD_FLAG_LEVEL_IRQFD))
+ return -EINVAL;
+
+ if (args->flags & KVM_EOIFD_FLAG_DEASSIGN)
+ return kvm_deassign_eoifd(kvm, args);
+
+ return kvm_assign_eoifd(kvm, args);
+}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b4ad14cc..5b41df1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -620,6 +620,8 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)

kvm_irqfd_release(kvm);

+ kvm_eoifd_release(kvm);
+
kvm_put_kvm(kvm);
return 0;
}
@@ -2093,6 +2095,15 @@ static long kvm_vm_ioctl(struct file *filp,
break;
}
#endif
+ case KVM_EOIFD: {
+ struct kvm_eoifd data;
+
+ r = -EFAULT;
+ if (copy_from_user(&data, argp, sizeof data))
+ goto out;
+ r = kvm_eoifd(kvm, &data);
+ break;
+ }
default:
r = kvm_arch_vm_ioctl(filp, ioctl, arg);
if (r == -ENOTTY)

2012-06-27 05:10:26

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 6/6] kvm: Level IRQ de-assert for KVM_IRQFD

This is an alternate level irqfd de-assert mode that's potentially
useful for emulated drivers. It's included here to show how easy it
is to implement with the new level irqfd and eoifd support. It's
possible this mode might also prove interesting for device-assignment
where we inject via level irqfd, receive an EOI (w/o de-assert), and
use the level de-assert irqfd here.

Signed-off-by: Alex Williamson <[email protected]>
---

Documentation/virtual/kvm/api.txt | 8 ++++++++
include/linux/kvm.h | 6 +++++-
virt/kvm/eventfd.c | 28 ++++++++++++++++++++++++++--
3 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 87a2558..b356937 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1987,6 +1987,14 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.

+The KVM_IRQFD_FLAG_LEVEL_DEASSERT flag creates an irqfd similar to
+KVM_IRQFD_FLAG_LEVEL, except the irqfd de-asserts the irqchip pin
+rather than asserts it. The level irqfd must first be created as
+aboved and passed to this ioctl in kvm_irqfd.irqfd. This ensures
+the de-assert irqfd uses the same IRQ source ID as the assert irqfd.
+This flag should also be specified on de-assign. This feature is
+present when KVM_CAP_IRQFD_LEVEL_DEASSERT is available.
+
4.77 KVM_EOIFD

Capability: KVM_CAP_EOIFD
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 7567e7d..0bbfd47 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -620,6 +620,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_PPC_ALLOC_HTAB 80
#define KVM_CAP_IRQFD_LEVEL 81
#define KVM_CAP_EOIFD 82
+#define KVM_CAP_IRQFD_LEVEL_DEASSERT 83

#ifdef KVM_CAP_IRQ_ROUTING

@@ -687,12 +688,15 @@ struct kvm_xen_hvm_config {
#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
/* Available with KVM_CAP_IRQFD_LEVEL */
#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
+/* Available with KVM_CAP_IRQFD_LEVEL_DEASSERT */
+#define KVM_IRQFD_FLAG_LEVEL_DEASSERT (1 << 2)

struct kvm_irqfd {
__u32 fd;
__u32 gsi;
__u32 flags;
- __u8 pad[20];
+ __u32 irqfd;
+ __u8 pad[16];
};

#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 02ca50f..50ace0e 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -172,6 +172,14 @@ irqfd_inject_level(struct work_struct *work)
kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
}

+static void
+irqfd_inject_level_deassert(struct work_struct *work)
+{
+ struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
+
+ kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 0);
+}
+
/*
* Race-free decouple logic (ordering is critical)
*/
@@ -320,6 +328,9 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
INIT_LIST_HEAD(&irqfd->list);

if (args->flags & KVM_IRQFD_FLAG_LEVEL) {
+ if (args->flags & KVM_IRQFD_FLAG_LEVEL_DEASSERT)
+ return -EINVAL; /* mutually exclusive */
+
irqfd->source = new_irq_source(kvm);
if (IS_ERR(irqfd->source)) {
ret = PTR_ERR(irqfd->source);
@@ -328,6 +339,16 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
}

INIT_WORK(&irqfd->inject, irqfd_inject_level);
+
+ } else if (args->flags & KVM_IRQFD_FLAG_LEVEL_DEASSERT) {
+ irqfd->source = get_irq_source_from_irqfd(kvm, args->irqfd);
+ if (IS_ERR(irqfd->source)) {
+ ret = PTR_ERR(irqfd->source);
+ irqfd->source = NULL;
+ goto fail;
+ }
+
+ INIT_WORK(&irqfd->inject, irqfd_inject_level_deassert);
} else
INIT_WORK(&irqfd->inject, irqfd_inject_edge);

@@ -421,7 +442,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
{
struct _irqfd *irqfd, *tmp;
struct eventfd_ctx *eventfd;
- bool is_level = (args->flags & KVM_IRQFD_FLAG_LEVEL) != 0;
+ bool is_level = (args->flags & (KVM_IRQFD_FLAG_LEVEL |
+ KVM_IRQFD_FLAG_LEVEL_DEASSERT)) != 0;

eventfd = eventfd_ctx_fdget(args->fd);
if (IS_ERR(eventfd))
@@ -461,7 +483,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
int
kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
{
- if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_LEVEL))
+ if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN |
+ KVM_IRQFD_FLAG_LEVEL |
+ KVM_IRQFD_FLAG_LEVEL_DEASSERT))
return -EINVAL;

if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

2012-06-27 09:15:41

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] kvm: level triggered irqfd support

On Tue, Jun 26, 2012 at 11:08:52PM -0600, Alex Williamson wrote:
> I don't address whether injecting an edge irqfd really needs an assert
> followed by de-assert (I don't know).

So I just sent a patch removing that (works fine for me),
and we'll see what others say. If it gets applied
your patch will get smaller.

--
MST

2012-06-27 09:21:53

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] kvm: Sanitize KVM_IRQFD flags

On Tue, Jun 26, 2012 at 11:09:32PM -0600, Alex Williamson wrote:
> We only know of one so far.
>
> Signed-off-by: Alex Williamson <[email protected]>

Ugh. So we have a bug: we should have sanitized the fields.
If there's buggy userspace that only set the low bit
it will break with this change.
Is it too late now? Do we need KVM_IRQFD2 which
sanitized fields properly? Avi?

> ---
>
> virt/kvm/eventfd.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index c307c24..7d7e2aa 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -340,6 +340,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> int
> kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> {
> + if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> + return -EINVAL;
> +
> if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
> return kvm_irqfd_deassign(kvm, args);
>

2012-06-27 09:34:26

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> In order to inject an interrupt from an external source using an
> irqfd, we need to allocate a new irq_source_id. This allows us to
> assert and (later) de-assert an interrupt line independently from
> users of KVM_IRQ_LINE and avoid lost interrupts.
>
> We also add what may appear like a bit of excessive infrastructure
> around an object for storing this irq_source_id. However, notice
> that we only provide a way to assert the interrupt here. A follow-on
> interface will make use of the same irq_source_id to allow de-assert.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
>
> Documentation/virtual/kvm/api.txt | 5 ++
> arch/x86/kvm/x86.c | 1
> include/linux/kvm.h | 3 +
> virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> 4 files changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index ea9edce..b216709 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> and kvm_irqfd.gsi.
>
> +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> +the requested irqfd.

This is the first and last time the term source id appears in this text.
We probably can do without talking about it at all, simply
explain that multiple irqfds mapped to same GSI are
OR-ed together.

> This is necessary to share level triggered
> +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
>
> 5. The kvm_run structure
> ------------------------
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a01a424..80bed07 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_GET_TSC_KHZ:
> case KVM_CAP_PCI_2_3:
> case KVM_CAP_KVMCLOCK_CTRL:
> + case KVM_CAP_IRQFD_LEVEL:
> r = 1;
> break;
> case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 2ce09aa..b2e6e4f 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_PPC_GET_SMMU_INFO 78
> #define KVM_CAP_S390_COW 79
> #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#define KVM_CAP_IRQFD_LEVEL 81
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> #endif
>
> #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> +/* Available with KVM_CAP_IRQFD_LEVEL */
> +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
>
> struct kvm_irqfd {
> __u32 fd;
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 7d7e2aa..18cc284 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -36,6 +36,64 @@
> #include "iodev.h"
>
> /*
> + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> + * injections and shared with other interfaces for EOI or de-assert.
> + * Create an object with reference counting to make it easy to use.
> + */
> +struct _irq_source {
> + int id;
> + struct kvm *kvm;
> + struct kref kref;
> +};
> +
> +static void release_irq_source(struct kref *kref)
> +{
> + struct _irq_source *source;
> +
> + source = container_of(kref, struct _irq_source, kref);
> +
> + kvm_free_irq_source_id(source->kvm, source->id);
> + kfree(source);
> +}
> +

It would be nicer to prefix everything with irqfd_
and generally use prefixes. _irqfd_source irqfd_source_release_ etc.

> +static void put_irq_source(struct _irq_source *source)
> +{
> + if (source)
> + kref_put(&source->kref, release_irq_source);
> +}
> +
> +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> +get_irq_source(struct _irq_source *source)
> +{
> + if (source)
> + kref_get(&source->kref);
> +
> + return source;
> +}
> +
> +static struct _irq_source *new_irq_source(struct kvm *kvm)
> +{
> + struct _irq_source *source;
> + int id;
> +
> + source = kzalloc(sizeof(*source), GFP_KERNEL);
> + if (!source)
> + return ERR_PTR(-ENOMEM);
> +
> + id = kvm_request_irq_source_id(kvm);
> + if (id < 0) {
> + kfree(source);
> + return ERR_PTR(id);
> + }
> +
> + kref_init(&source->kref);
> + source->kvm = kvm;
> + source->id = id;
> +
> + return source;
> +}
> +

s/new/alloc/

> +/*
> * --------------------------------------------------------------------
> * irqfd: Allows an fd to be used to inject an interrupt to the guest
> *
> @@ -52,6 +110,7 @@ struct _irqfd {
> /* Used for level IRQ fast-path */
> int gsi;
> struct work_struct inject;
> + struct _irq_source *source;
> /* Used for setup/shutdown */
> struct eventfd_ctx *eventfd;
> struct list_head list;
> @@ -62,7 +121,7 @@ struct _irqfd {
> static struct workqueue_struct *irqfd_cleanup_wq;
>
> static void
> -irqfd_inject(struct work_struct *work)
> +irqfd_inject_edge(struct work_struct *work)
> {
> struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> struct kvm *kvm = irqfd->kvm;
> @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> }
>
> +static void
> +irqfd_inject_level(struct work_struct *work)
> +{
> + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> +
> + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> +}
> +
> /*
> * Race-free decouple logic (ordering is critical)
> */
> @@ -96,6 +163,9 @@ irqfd_shutdown(struct work_struct *work)
> * It is now safe to release the object's resources
> */
> eventfd_ctx_put(irqfd->eventfd);
> +
> + put_irq_source(irqfd->source);
> +
> kfree(irqfd);
> }
>
> @@ -214,7 +284,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> irqfd->kvm = kvm;
> irqfd->gsi = args->gsi;
> INIT_LIST_HEAD(&irqfd->list);
> - INIT_WORK(&irqfd->inject, irqfd_inject);
> +
> + if (args->flags & KVM_IRQFD_FLAG_LEVEL) {
> + irqfd->source = new_irq_source(kvm);
> + if (IS_ERR(irqfd->source)) {
> + ret = PTR_ERR(irqfd->source);
> + irqfd->source = NULL;
> + goto fail;

A bit cleaner to create a dedicated label which skips
put_irq_source than doing source=NULL tricks.

> + }
> +
> + INIT_WORK(&irqfd->inject, irqfd_inject_level);
> + } else
> + INIT_WORK(&irqfd->inject, irqfd_inject_edge);
> +
> INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
>
> file = eventfd_fget(args->fd);
> @@ -279,9 +361,10 @@ fail:
> if (eventfd && !IS_ERR(eventfd))
> eventfd_ctx_put(eventfd);
>
> - if (!IS_ERR(file))
> + if (file && !IS_ERR(file))
> fput(file);
>
> + put_irq_source(irqfd->source);
> kfree(irqfd);
> return ret;
> }
> @@ -302,6 +385,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> {
> struct _irqfd *irqfd, *tmp;
> struct eventfd_ctx *eventfd;
> + bool is_level = (args->flags & KVM_IRQFD_FLAG_LEVEL) != 0;

!= 0 is not needed here I think.

Also I'm not really sure why are we making userspace
supply KVM_IRQFD_FLAG_LEVEL flag for clear.

>
> eventfd = eventfd_ctx_fdget(args->fd);
> if (IS_ERR(eventfd))
> @@ -310,7 +394,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> spin_lock_irq(&kvm->irqfds.lock);
>
> list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
> - if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
> + if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi &&
> + is_level == (irqfd->source != NULL)) {

Let's rename source to level_source to make usage clear?

> /*
> * This rcu_assign_pointer is needed for when
> * another thread calls kvm_irq_routing_update before
> @@ -340,7 +425,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> int
> kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> {
> - if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> + if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_LEVEL))
> return -EINVAL;
>
> if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

2012-06-27 09:35:00

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Tue, Jun 26, 2012 at 11:09:04PM -0600, Alex Williamson wrote:
> Prune this down to just the struct kvm_irqfd so we can avoid
> changing function definition for every flag or field we use.
>
> Signed-off-by: Alex Williamson <[email protected]>

This is not needed anymore, right? We are not adding
new fields.

> ---
>
> include/linux/kvm_host.h | 4 ++--
> virt/kvm/eventfd.c | 20 ++++++++++----------
> virt/kvm/kvm_main.c | 2 +-
> 3 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 27ac8a4..ae3b426 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -824,7 +824,7 @@ static inline void kvm_free_irq_routing(struct kvm *kvm) {}
> #ifdef CONFIG_HAVE_KVM_EVENTFD
>
> void kvm_eventfd_init(struct kvm *kvm);
> -int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags);
> +int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
> void kvm_irqfd_release(struct kvm *kvm);
> void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
> int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
> @@ -833,7 +833,7 @@ int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
>
> static inline void kvm_eventfd_init(struct kvm *kvm) {}
>
> -static inline int kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
> +static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> {
> return -EINVAL;
> }
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index f59c1e8..c307c24 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -198,7 +198,7 @@ static void irqfd_update(struct kvm *kvm, struct _irqfd *irqfd,
> }
>
> static int
> -kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> +kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> {
> struct kvm_irq_routing_table *irq_rt;
> struct _irqfd *irqfd, *tmp;
> @@ -212,12 +212,12 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
> return -ENOMEM;
>
> irqfd->kvm = kvm;
> - irqfd->gsi = gsi;
> + irqfd->gsi = args->gsi;
> INIT_LIST_HEAD(&irqfd->list);
> INIT_WORK(&irqfd->inject, irqfd_inject);
> INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
>
> - file = eventfd_fget(fd);
> + file = eventfd_fget(args->fd);
> if (IS_ERR(file)) {
> ret = PTR_ERR(file);
> goto fail;
> @@ -298,19 +298,19 @@ kvm_eventfd_init(struct kvm *kvm)
> * shutdown any irqfd's that match fd+gsi
> */
> static int
> -kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
> +kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> {
> struct _irqfd *irqfd, *tmp;
> struct eventfd_ctx *eventfd;
>
> - eventfd = eventfd_ctx_fdget(fd);
> + eventfd = eventfd_ctx_fdget(args->fd);
> if (IS_ERR(eventfd))
> return PTR_ERR(eventfd);
>
> spin_lock_irq(&kvm->irqfds.lock);
>
> list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
> - if (irqfd->eventfd == eventfd && irqfd->gsi == gsi) {
> + if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
> /*
> * This rcu_assign_pointer is needed for when
> * another thread calls kvm_irq_routing_update before
> @@ -338,12 +338,12 @@ kvm_irqfd_deassign(struct kvm *kvm, int fd, int gsi)
> }
>
> int
> -kvm_irqfd(struct kvm *kvm, int fd, int gsi, int flags)
> +kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> {
> - if (flags & KVM_IRQFD_FLAG_DEASSIGN)
> - return kvm_irqfd_deassign(kvm, fd, gsi);
> + if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
> + return kvm_irqfd_deassign(kvm, args);
>
> - return kvm_irqfd_assign(kvm, fd, gsi);
> + return kvm_irqfd_assign(kvm, args);
> }
>
> /*
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 02cb440..b4ad14cc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2059,7 +2059,7 @@ static long kvm_vm_ioctl(struct file *filp,
> r = -EFAULT;
> if (copy_from_user(&data, argp, sizeof data))
> goto out;
> - r = kvm_irqfd(kvm, data.fd, data.gsi, data.flags);
> + r = kvm_irqfd(kvm, &data);
> break;
> }
> case KVM_IOEVENTFD: {

2012-06-27 09:49:15

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> This new ioctl enables an eventfd to be triggered when an EOI is
> written for a specified irqchip pin. By default this is a simple
> notification, but we can also tie the eoifd to a level irqfd, which
> enables the irqchip pin to be automatically de-asserted on EOI.
> This mode is particularly useful for device-assignment applications
> where the unmask and notify triggers a hardware unmask. The default
> mode is most applicable to simple notify with no side-effects for
> userspace usage, such as Qemu.
>
> Here we make use of the reference counting of the _irq_source
> object allowing us to share it with an irqfd and cleanup regardless
> of the release order.
>
> Signed-off-by: Alex Williamson <[email protected]>

Suggest you try running this with various lock checking
enabled. You should see interesting things, at least
one of which I found and point out below :)

> ---
>
> Documentation/virtual/kvm/api.txt | 24 +++++
> arch/x86/kvm/x86.c | 1
> include/linux/kvm.h | 14 +++
> include/linux/kvm_host.h | 13 +++
> virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> virt/kvm/kvm_main.c | 11 ++
> 6 files changed, 250 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index b216709..87a2558 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
>
> +4.77 KVM_EOIFD
> +
> +Capability: KVM_CAP_EOIFD
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_eoifd (in)
> +Returns: 0 on success, -1 on error
> +
> +KVM_EOIFD allows userspace to receive EOI notification through an
> +eventfd for level triggered irqchip interrupts. Behavior for edge
> +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> +used for notification and kvm_eoifd.gsi specifies the irchip pin,
> +similar to KVM_IRQFD. KVM_EOIFD_FLAG_DEASSIGN is used to deassign
> +a previously enabled eoifd and should also set fd and gsi to match.
> +
> +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
> +a level triggered EOI and the kvm_eoifd structure includes
> +kvm_eoifd.irqfd,

Actually it is always level, right? So we can just call it
KVM_EOIFD_FLAG_IRQFD?

> which must be previously configured using KVM_IRQFD
> +with the KVM_IRQFD_FLAG_LEVEL flag. This allows both EOI notification
> +through kvm_eoifd.fd as well as automatically de-asserting level
> +irqfds on EOI. Both KVM_EOIFD_FLAG_DEASSIGN and
> +KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
> +initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.

And do you also need to pass in same irqfd?

> +
> 5. The kvm_run structure
> ------------------------
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 80bed07..62d6eca 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2149,6 +2149,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_PCI_2_3:
> case KVM_CAP_KVMCLOCK_CTRL:
> case KVM_CAP_IRQFD_LEVEL:
> + case KVM_CAP_EOIFD:
> r = 1;
> break;
> case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index b2e6e4f..7567e7d 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_S390_COW 79
> #define KVM_CAP_PPC_ALLOC_HTAB 80
> #define KVM_CAP_IRQFD_LEVEL 81
> +#define KVM_CAP_EOIFD 82
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -694,6 +695,17 @@ struct kvm_irqfd {
> __u8 pad[20];
> };
>
> +#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
> +#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)
> +
> +struct kvm_eoifd {
> + __u32 fd;
> + __u32 gsi;
> + __u32 flags;
> + __u32 irqfd;
> + __u8 pad[16];
> +};
> +
> struct kvm_clock_data {
> __u64 clock;
> __u32 flags;
> @@ -834,6 +846,8 @@ struct kvm_s390_ucas_mapping {
> #define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info)
> /* Available with KVM_CAP_PPC_ALLOC_HTAB */
> #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32)
> +/* Available with KVM_CAP_EOIFD */
> +#define KVM_EOIFD _IOW(KVMIO, 0xa8, struct kvm_eoifd)
>
> /*
> * ioctls for vcpu fds
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ae3b426..83472eb 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -285,6 +285,10 @@ struct kvm {
> struct list_head items;
> } irqfds;
> struct list_head ioeventfds;
> + struct {
> + spinlock_t lock;
> + struct list_head items;
> + } eoifds;
> #endif
> struct kvm_vm_stat stat;
> struct kvm_arch arch;
> @@ -828,6 +832,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
> void kvm_irqfd_release(struct kvm *kvm);
> void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
> int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
> +int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args);
> +void kvm_eoifd_release(struct kvm *kvm);
>
> #else
>
> @@ -853,6 +859,13 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> return -ENOSYS;
> }
>
> +static inline int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + return -ENOSYS;
> +}
> +
> +static inline void kvm_eoifd_release(struct kvm *kvm) {}
> +
> #endif /* CONFIG_HAVE_KVM_EVENTFD */
>
> #ifdef CONFIG_KVM_APIC_ARCHITECTURE
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 18cc284..02ca50f 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -62,8 +62,7 @@ static void put_irq_source(struct _irq_source *source)
> kref_put(&source->kref, release_irq_source);
> }
>
> -static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> -get_irq_source(struct _irq_source *source)
> +static struct _irq_source *get_irq_source(struct _irq_source *source)
> {
> if (source)
> kref_get(&source->kref);
> @@ -118,6 +117,41 @@ struct _irqfd {
> struct work_struct shutdown;
> };
>
> +static struct _irq_source *get_irq_source_from_irqfd(struct kvm *kvm, int fd)
> +{
> + struct file *file;
> + struct eventfd_ctx *eventfd;
> + struct _irqfd *irqfd;
> + struct _irq_source *source = NULL;
> +
> + file = fget(fd);
> + if (!file)
> + return ERR_PTR(-EBADF);
> +
> + eventfd = eventfd_ctx_fileget(file);
> + if (IS_ERR(eventfd)) {
> + fput(file);
> + return (struct _irq_source *)eventfd;
> + }
> +
> + spin_lock_irq(&kvm->irqfds.lock);
> +
> + list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
> + if (irqfd->eventfd != eventfd)
> + continue;
> +
> + source = get_irq_source(irqfd->source);
> + break;
> + }
> +
> + spin_unlock_irq(&kvm->irqfds.lock);
> +
> + eventfd_ctx_put(eventfd);
> + fput(file);
> +
> + return source ? source : ERR_PTR(-ENODEV);
> +}
> +
> static struct workqueue_struct *irqfd_cleanup_wq;
>
> static void
> @@ -375,6 +409,8 @@ kvm_eventfd_init(struct kvm *kvm)
> spin_lock_init(&kvm->irqfds.lock);
> INIT_LIST_HEAD(&kvm->irqfds.items);
> INIT_LIST_HEAD(&kvm->ioeventfds);
> + spin_lock_init(&kvm->eoifds.lock);
> + INIT_LIST_HEAD(&kvm->eoifds.items);
> }
>
> /*
> @@ -743,3 +779,152 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>
> return kvm_assign_ioeventfd(kvm, args);
> }
> +
> +/*
> + * --------------------------------------------------------------------
> + * eoifd: Translate KVM APIC/IOAPIC EOI into eventfd signal.
> + *
> + * userspace can register GSIs with an eventfd for receiving
> + * notification when an EOI occurs.
> + * --------------------------------------------------------------------
> + */
> +
> +struct _eoifd {
> + struct kvm *kvm;
> + struct eventfd_ctx *eventfd;
> + struct _irq_source *source;

Need to document what this field does (ideally other
fields too). I think this is source we should clear?
So call it source_to_clear or something like that?

> + struct kvm_irq_ack_notifier notifier;
> + struct list_head list;
> +};
> +
> +static void eoifd_event(struct kvm_irq_ack_notifier *notifier)
> +{
> + struct _eoifd *eoifd;
> +
> + eoifd = container_of(notifier, struct _eoifd, notifier);
> +
> + if (eoifd->source)
> + kvm_set_irq(eoifd->kvm, eoifd->source->id,
> + eoifd->notifier.gsi, 0);
> +
> + eventfd_signal(eoifd->eventfd, 1);
> +}
> +
> +static int kvm_assign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + struct eventfd_ctx *eventfd;
> + struct _eoifd *eoifd, *tmp;
> + struct _irq_source *source = NULL;
> +
> + if (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) {
> + source = get_irq_source_from_irqfd(kvm, args->irqfd);
> + if (IS_ERR(source))
> + return PTR_ERR(source);
> + }
> +
> + eventfd = eventfd_ctx_fdget(args->fd);
> + if (IS_ERR(eventfd)) {
> + put_irq_source(source);
> + return PTR_ERR(eventfd);
> + }
> +
> + eoifd = kzalloc(sizeof(*eoifd), GFP_KERNEL);
> + if (!eoifd) {
> + put_irq_source(source);
> + eventfd_ctx_put(eventfd);
> + return -ENOMEM;
> + }
> +
> + INIT_LIST_HEAD(&eoifd->list);
> + eoifd->kvm = kvm;
> + eoifd->eventfd = eventfd;
> + eoifd->source = source;
> + eoifd->notifier.gsi = args->gsi;
> + eoifd->notifier.irq_acked = eoifd_event;
> +
> + spin_lock_irq(&kvm->eoifds.lock);
> +
> + list_for_each_entry(tmp, &kvm->eoifds.items, list) {
> + if (eoifd->eventfd != tmp->eventfd)
> + continue;
> +
> + put_irq_source(source);
> + eventfd_ctx_put(eventfd);
> + kfree(eoifd);
> + return -EBUSY;
> + }
> +
> + list_add_tail(&eoifd->list, &kvm->eoifds.items);
> + kvm_register_irq_ack_notifier(kvm, &eoifd->notifier);
> +
> + spin_unlock_irq(&kvm->eoifds.lock);
> +
> + return 0;
> +}
> +
> +static void eoifd_deactivate(struct kvm *kvm, struct _eoifd *eoifd)
> +{
> + list_del(&eoifd->list);
> + kvm_unregister_irq_ack_notifier(kvm, &eoifd->notifier);

This can sleep so eoifd_deactivate can too.

> + put_irq_source(eoifd->source);
> + eventfd_ctx_put(eoifd->eventfd);
> + kfree(eoifd);
> +}
> +
> +void kvm_eoifd_release(struct kvm *kvm)
> +{
> + struct _eoifd *eoifd, *tmp;
> +
> + spin_lock_irq(&kvm->eoifds.lock);
> +
> + list_for_each_entry_safe(eoifd, tmp, &kvm->eoifds.items, list)
> + eoifd_deactivate(kvm, eoifd);

sleeping function called under a spinlock here.

> +
> + spin_unlock_irq(&kvm->eoifds.lock);
> +}
> +
> +static int kvm_deassign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + struct eventfd_ctx *eventfd;
> + struct _eoifd *eoifd;
> + bool uses_source = (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) != 0;
> + int ret = -ENODEV;
> +
> + eventfd = eventfd_ctx_fdget(args->fd);
> + if (IS_ERR(eventfd))
> + return PTR_ERR(eventfd);
> +
> + spin_lock_irq(&kvm->eoifds.lock);
> +
> + list_for_each_entry(eoifd, &kvm->eoifds.items, list) {
> + /*
> + * Matching eventfd is unique since we don't allow dulicates,


typo

> + * the rest is sanitizing the calling parameters.
> + */
> + if (eoifd->eventfd == eventfd &&
> + eoifd->notifier.gsi == args->gsi &&
> + uses_source == (eoifd->source != NULL)) {
> + eoifd_deactivate(kvm, eoifd);
> + ret = 0;
> + break;
> + }
> + }
> +
> + spin_unlock_irq(&kvm->eoifds.lock);
> +
> + eventfd_ctx_put(eventfd);
> +
> + return ret;
> +}
> +
> +int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + if (args->flags & ~(KVM_EOIFD_FLAG_DEASSIGN |
> + KVM_EOIFD_FLAG_LEVEL_IRQFD))
> + return -EINVAL;
> +
> + if (args->flags & KVM_EOIFD_FLAG_DEASSIGN)
> + return kvm_deassign_eoifd(kvm, args);
> +
> + return kvm_assign_eoifd(kvm, args);
> +}
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b4ad14cc..5b41df1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -620,6 +620,8 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>
> kvm_irqfd_release(kvm);
>
> + kvm_eoifd_release(kvm);
> +
> kvm_put_kvm(kvm);
> return 0;
> }
> @@ -2093,6 +2095,15 @@ static long kvm_vm_ioctl(struct file *filp,
> break;
> }
> #endif
> + case KVM_EOIFD: {
> + struct kvm_eoifd data;
> +
> + r = -EFAULT;
> + if (copy_from_user(&data, argp, sizeof data))
> + goto out;
> + r = kvm_eoifd(kvm, &data);
> + break;
> + }
> default:
> r = kvm_arch_vm_ioctl(filp, ioctl, arg);
> if (r == -ENOTTY)

2012-06-27 09:51:39

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> In order to inject an interrupt from an external source using an
> irqfd, we need to allocate a new irq_source_id. This allows us to
> assert and (later) de-assert an interrupt line independently from
> users of KVM_IRQ_LINE and avoid lost interrupts.
>
> We also add what may appear like a bit of excessive infrastructure
> around an object for storing this irq_source_id. However, notice
> that we only provide a way to assert the interrupt here. A follow-on
> interface will make use of the same irq_source_id to allow de-assert.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
>
> Documentation/virtual/kvm/api.txt | 5 ++
> arch/x86/kvm/x86.c | 1
> include/linux/kvm.h | 3 +
> virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> 4 files changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index ea9edce..b216709 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> and kvm_irqfd.gsi.
>
> +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> +the requested irqfd. This is necessary to share level triggered
> +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.

Note that if my patch removing auto-deassert gets accepted,
this is not needed at all: we can just look at the GSI
to see if it's level or edge.

>
> 5. The kvm_run structure
> ------------------------
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a01a424..80bed07 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_GET_TSC_KHZ:
> case KVM_CAP_PCI_2_3:
> case KVM_CAP_KVMCLOCK_CTRL:
> + case KVM_CAP_IRQFD_LEVEL:
> r = 1;
> break;
> case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 2ce09aa..b2e6e4f 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_PPC_GET_SMMU_INFO 78
> #define KVM_CAP_S390_COW 79
> #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#define KVM_CAP_IRQFD_LEVEL 81
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> #endif
>
> #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> +/* Available with KVM_CAP_IRQFD_LEVEL */
> +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
>
> struct kvm_irqfd {
> __u32 fd;
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 7d7e2aa..18cc284 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -36,6 +36,64 @@
> #include "iodev.h"
>
> /*
> + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> + * injections and shared with other interfaces for EOI or de-assert.
> + * Create an object with reference counting to make it easy to use.
> + */
> +struct _irq_source {
> + int id;
> + struct kvm *kvm;
> + struct kref kref;
> +};
> +
> +static void release_irq_source(struct kref *kref)
> +{
> + struct _irq_source *source;
> +
> + source = container_of(kref, struct _irq_source, kref);
> +
> + kvm_free_irq_source_id(source->kvm, source->id);
> + kfree(source);
> +}
> +
> +static void put_irq_source(struct _irq_source *source)
> +{
> + if (source)
> + kref_put(&source->kref, release_irq_source);
> +}
> +
> +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> +get_irq_source(struct _irq_source *source)
> +{
> + if (source)
> + kref_get(&source->kref);
> +
> + return source;
> +}
> +
> +static struct _irq_source *new_irq_source(struct kvm *kvm)
> +{
> + struct _irq_source *source;
> + int id;
> +
> + source = kzalloc(sizeof(*source), GFP_KERNEL);
> + if (!source)
> + return ERR_PTR(-ENOMEM);
> +
> + id = kvm_request_irq_source_id(kvm);
> + if (id < 0) {
> + kfree(source);
> + return ERR_PTR(id);
> + }
> +
> + kref_init(&source->kref);
> + source->kvm = kvm;
> + source->id = id;
> +
> + return source;
> +}
> +
> +/*
> * --------------------------------------------------------------------
> * irqfd: Allows an fd to be used to inject an interrupt to the guest
> *
> @@ -52,6 +110,7 @@ struct _irqfd {
> /* Used for level IRQ fast-path */
> int gsi;
> struct work_struct inject;
> + struct _irq_source *source;
> /* Used for setup/shutdown */
> struct eventfd_ctx *eventfd;
> struct list_head list;
> @@ -62,7 +121,7 @@ struct _irqfd {
> static struct workqueue_struct *irqfd_cleanup_wq;
>
> static void
> -irqfd_inject(struct work_struct *work)
> +irqfd_inject_edge(struct work_struct *work)
> {
> struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> struct kvm *kvm = irqfd->kvm;
> @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> }
>
> +static void
> +irqfd_inject_level(struct work_struct *work)
> +{
> + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> +
> + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> +}
> +
> /*
> * Race-free decouple logic (ordering is critical)
> */
> @@ -96,6 +163,9 @@ irqfd_shutdown(struct work_struct *work)
> * It is now safe to release the object's resources
> */
> eventfd_ctx_put(irqfd->eventfd);
> +
> + put_irq_source(irqfd->source);
> +
> kfree(irqfd);
> }
>
> @@ -214,7 +284,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> irqfd->kvm = kvm;
> irqfd->gsi = args->gsi;
> INIT_LIST_HEAD(&irqfd->list);
> - INIT_WORK(&irqfd->inject, irqfd_inject);
> +
> + if (args->flags & KVM_IRQFD_FLAG_LEVEL) {
> + irqfd->source = new_irq_source(kvm);
> + if (IS_ERR(irqfd->source)) {
> + ret = PTR_ERR(irqfd->source);
> + irqfd->source = NULL;
> + goto fail;
> + }
> +
> + INIT_WORK(&irqfd->inject, irqfd_inject_level);
> + } else
> + INIT_WORK(&irqfd->inject, irqfd_inject_edge);
> +
> INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
>
> file = eventfd_fget(args->fd);
> @@ -279,9 +361,10 @@ fail:
> if (eventfd && !IS_ERR(eventfd))
> eventfd_ctx_put(eventfd);
>
> - if (!IS_ERR(file))
> + if (file && !IS_ERR(file))
> fput(file);
>
> + put_irq_source(irqfd->source);
> kfree(irqfd);
> return ret;
> }
> @@ -302,6 +385,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> {
> struct _irqfd *irqfd, *tmp;
> struct eventfd_ctx *eventfd;
> + bool is_level = (args->flags & KVM_IRQFD_FLAG_LEVEL) != 0;
>
> eventfd = eventfd_ctx_fdget(args->fd);
> if (IS_ERR(eventfd))
> @@ -310,7 +394,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> spin_lock_irq(&kvm->irqfds.lock);
>
> list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
> - if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
> + if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi &&
> + is_level == (irqfd->source != NULL)) {
> /*
> * This rcu_assign_pointer is needed for when
> * another thread calls kvm_irq_routing_update before
> @@ -340,7 +425,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> int
> kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> {
> - if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> + if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_LEVEL))
> return -EINVAL;
>
> if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)

2012-06-27 09:53:46

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 2/6] kvm: Add missing KVM_IRQFD API documentation

On Tue, Jun 26, 2012 at 11:09:17PM -0600, Alex Williamson wrote:
> Signed-off-by: Alex Williamson <[email protected]>

Acked-by: Michael S. Tsirkin <[email protected]>

> ---
>
> Documentation/virtual/kvm/api.txt | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 310fe50..ea9edce 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1965,6 +1965,22 @@ return the hash table order in the parameter. (If the guest is using
> the virtualized real-mode area (VRMA) facility, the kernel will
> re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
>
> +4.76 KVM_IRQFD
> +
> +Capability: KVM_CAP_IRQFD
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_irqfd (in)
> +Returns: 0 on success, -1 on error
> +
> +Allows setting an eventfd to directly trigger a guest interrupt.
> +kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
> +kvm_irqfd.gsi specifies the irqchip pin toggled by this event. When
> +an event is tiggered on the eventfd, an interrupt is injected into
> +the guest using the specified gsi pin. The irqfd is removed using
> +the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> +and kvm_irqfd.gsi.
> +
>
> 5. The kvm_run structure
> ------------------------

2012-06-27 09:58:13

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] kvm: level triggered irqfd support

On Tue, Jun 26, 2012 at 11:08:52PM -0600, Alex Williamson wrote:
> Ok, let's see how this flies. I actually quite like this, so be
> gentle tearing it apart ;)
>
> I just couldn't bring myself to contort KVM_IRQFD into something
> that either sets up an irqfd or specifies a nearly unrelated EOI
> eventfd. The solution I've come up with, that also avoids exposing
> irq_source_ids to userspace, is to work through the irqfd. If we
> setup a level irqfd, we can optionally associate an eoifd with
> the same irq_source_id, by passing the irqfd. To do this, we just
> need to create a new reference counted object for the source ID
> so we don't run into problems ordering release. This means we
> end up with a KVM_EOIFD ioctl that has both general usefulness and
> can still tie into an irqfd.

I like this API.

> In patch 6/6 I also include an alternate de-assert mechanism via an
> irqfd with the opposite polarity. I don't currently use this, but
> it's pretty trivial and at least available in archives now.

The nasty bit about that is that if you assert twice then
deassert once it's not clear what should happen.
But yea, it does not hurt to put them in the archives.

>
> I don't address whether injecting an edge irqfd really needs an assert
> followed by de-assert (I don't know). This new interface really
> unties itself from caring. We might be able to consolidate inject
> functions at some future point, but it doesn't change how we'd name
> flags as it did in the previous version. Thanks,
>
> Alex

2012-06-27 13:58:20

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> This new ioctl enables an eventfd to be triggered when an EOI is
> written for a specified irqchip pin. By default this is a simple
> notification, but we can also tie the eoifd to a level irqfd, which
> enables the irqchip pin to be automatically de-asserted on EOI.
> This mode is particularly useful for device-assignment applications
> where the unmask and notify triggers a hardware unmask. The default
> mode is most applicable to simple notify with no side-effects for
> userspace usage, such as Qemu.
>
> Here we make use of the reference counting of the _irq_source
> object allowing us to share it with an irqfd and cleanup regardless
> of the release order.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
>
> Documentation/virtual/kvm/api.txt | 24 +++++
> arch/x86/kvm/x86.c | 1
> include/linux/kvm.h | 14 +++
> include/linux/kvm_host.h | 13 +++
> virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> virt/kvm/kvm_main.c | 11 ++
> 6 files changed, 250 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index b216709..87a2558 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
>
> +4.77 KVM_EOIFD
> +
> +Capability: KVM_CAP_EOIFD
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_eoifd (in)
> +Returns: 0 on success, -1 on error
> +
> +KVM_EOIFD allows userspace to receive EOI notification through an
> +eventfd for level triggered irqchip interrupts. Behavior for edge
> +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
Lets make it defined. EOI notification can be used by userspace to fix
time drift due to lost interrupts. But than userspace needs to know
which vcpu did EOI.

Also see question below.

> +used for notification and kvm_eoifd.gsi specifies the irchip pin,
> +similar to KVM_IRQFD. KVM_EOIFD_FLAG_DEASSIGN is used to deassign
> +a previously enabled eoifd and should also set fd and gsi to match.
> +
> +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
> +a level triggered EOI and the kvm_eoifd structure includes
> +kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
> +with the KVM_IRQFD_FLAG_LEVEL flag. This allows both EOI notification
> +through kvm_eoifd.fd as well as automatically de-asserting level
> +irqfds on EOI. Both KVM_EOIFD_FLAG_DEASSIGN and
> +KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
> +initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
> +
> 5. The kvm_run structure
> ------------------------
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 80bed07..62d6eca 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2149,6 +2149,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_PCI_2_3:
> case KVM_CAP_KVMCLOCK_CTRL:
> case KVM_CAP_IRQFD_LEVEL:
> + case KVM_CAP_EOIFD:
> r = 1;
> break;
> case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index b2e6e4f..7567e7d 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -619,6 +619,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_S390_COW 79
> #define KVM_CAP_PPC_ALLOC_HTAB 80
> #define KVM_CAP_IRQFD_LEVEL 81
> +#define KVM_CAP_EOIFD 82
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -694,6 +695,17 @@ struct kvm_irqfd {
> __u8 pad[20];
> };
>
> +#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
> +#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)
> +
> +struct kvm_eoifd {
> + __u32 fd;
> + __u32 gsi;
> + __u32 flags;
> + __u32 irqfd;
> + __u8 pad[16];
> +};
> +
> struct kvm_clock_data {
> __u64 clock;
> __u32 flags;
> @@ -834,6 +846,8 @@ struct kvm_s390_ucas_mapping {
> #define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info)
> /* Available with KVM_CAP_PPC_ALLOC_HTAB */
> #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32)
> +/* Available with KVM_CAP_EOIFD */
> +#define KVM_EOIFD _IOW(KVMIO, 0xa8, struct kvm_eoifd)
>
> /*
> * ioctls for vcpu fds
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ae3b426..83472eb 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -285,6 +285,10 @@ struct kvm {
> struct list_head items;
> } irqfds;
> struct list_head ioeventfds;
> + struct {
> + spinlock_t lock;
> + struct list_head items;
> + } eoifds;
> #endif
> struct kvm_vm_stat stat;
> struct kvm_arch arch;
> @@ -828,6 +832,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
> void kvm_irqfd_release(struct kvm *kvm);
> void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
> int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
> +int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args);
> +void kvm_eoifd_release(struct kvm *kvm);
>
> #else
>
> @@ -853,6 +859,13 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
> return -ENOSYS;
> }
>
> +static inline int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + return -ENOSYS;
> +}
> +
> +static inline void kvm_eoifd_release(struct kvm *kvm) {}
> +
> #endif /* CONFIG_HAVE_KVM_EVENTFD */
>
> #ifdef CONFIG_KVM_APIC_ARCHITECTURE
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 18cc284..02ca50f 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -62,8 +62,7 @@ static void put_irq_source(struct _irq_source *source)
> kref_put(&source->kref, release_irq_source);
> }
>
> -static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> -get_irq_source(struct _irq_source *source)
> +static struct _irq_source *get_irq_source(struct _irq_source *source)
> {
> if (source)
> kref_get(&source->kref);
> @@ -118,6 +117,41 @@ struct _irqfd {
> struct work_struct shutdown;
> };
>
> +static struct _irq_source *get_irq_source_from_irqfd(struct kvm *kvm, int fd)
> +{
> + struct file *file;
> + struct eventfd_ctx *eventfd;
> + struct _irqfd *irqfd;
> + struct _irq_source *source = NULL;
> +
> + file = fget(fd);
> + if (!file)
> + return ERR_PTR(-EBADF);
> +
> + eventfd = eventfd_ctx_fileget(file);
> + if (IS_ERR(eventfd)) {
> + fput(file);
> + return (struct _irq_source *)eventfd;
> + }
> +
> + spin_lock_irq(&kvm->irqfds.lock);
> +
> + list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
> + if (irqfd->eventfd != eventfd)
> + continue;
> +
> + source = get_irq_source(irqfd->source);
> + break;
> + }
> +
> + spin_unlock_irq(&kvm->irqfds.lock);
> +
> + eventfd_ctx_put(eventfd);
> + fput(file);
> +
> + return source ? source : ERR_PTR(-ENODEV);
> +}
> +
> static struct workqueue_struct *irqfd_cleanup_wq;
>
> static void
> @@ -375,6 +409,8 @@ kvm_eventfd_init(struct kvm *kvm)
> spin_lock_init(&kvm->irqfds.lock);
> INIT_LIST_HEAD(&kvm->irqfds.items);
> INIT_LIST_HEAD(&kvm->ioeventfds);
> + spin_lock_init(&kvm->eoifds.lock);
> + INIT_LIST_HEAD(&kvm->eoifds.items);
> }
>
> /*
> @@ -743,3 +779,152 @@ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>
> return kvm_assign_ioeventfd(kvm, args);
> }
> +
> +/*
> + * --------------------------------------------------------------------
> + * eoifd: Translate KVM APIC/IOAPIC EOI into eventfd signal.
> + *
> + * userspace can register GSIs with an eventfd for receiving
> + * notification when an EOI occurs.
> + * --------------------------------------------------------------------
> + */
> +
> +struct _eoifd {
> + struct kvm *kvm;
> + struct eventfd_ctx *eventfd;
> + struct _irq_source *source;
> + struct kvm_irq_ack_notifier notifier;
> + struct list_head list;
> +};
> +
> +static void eoifd_event(struct kvm_irq_ack_notifier *notifier)
> +{
> + struct _eoifd *eoifd;
> +
> + eoifd = container_of(notifier, struct _eoifd, notifier);
> +
> + if (eoifd->source)
> + kvm_set_irq(eoifd->kvm, eoifd->source->id,
> + eoifd->notifier.gsi, 0);
> +
> + eventfd_signal(eoifd->eventfd, 1);
> +}
> +
> +static int kvm_assign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + struct eventfd_ctx *eventfd;
> + struct _eoifd *eoifd, *tmp;
> + struct _irq_source *source = NULL;
> +
> + if (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) {
> + source = get_irq_source_from_irqfd(kvm, args->irqfd);
> + if (IS_ERR(source))
> + return PTR_ERR(source);
> + }
> +
Shouldn't you check that eoifd->gsi == irqfd->gsi?

> + eventfd = eventfd_ctx_fdget(args->fd);
> + if (IS_ERR(eventfd)) {
> + put_irq_source(source);
> + return PTR_ERR(eventfd);
> + }
> +
> + eoifd = kzalloc(sizeof(*eoifd), GFP_KERNEL);
> + if (!eoifd) {
> + put_irq_source(source);
> + eventfd_ctx_put(eventfd);
> + return -ENOMEM;
> + }
> +
> + INIT_LIST_HEAD(&eoifd->list);
> + eoifd->kvm = kvm;
> + eoifd->eventfd = eventfd;
> + eoifd->source = source;
> + eoifd->notifier.gsi = args->gsi;
> + eoifd->notifier.irq_acked = eoifd_event;
> +
> + spin_lock_irq(&kvm->eoifds.lock);
> +
> + list_for_each_entry(tmp, &kvm->eoifds.items, list) {
> + if (eoifd->eventfd != tmp->eventfd)
> + continue;
> +
> + put_irq_source(source);
> + eventfd_ctx_put(eventfd);
> + kfree(eoifd);
> + return -EBUSY;
> + }
> +
> + list_add_tail(&eoifd->list, &kvm->eoifds.items);
> + kvm_register_irq_ack_notifier(kvm, &eoifd->notifier);
> +
> + spin_unlock_irq(&kvm->eoifds.lock);
> +
> + return 0;
> +}
> +
> +static void eoifd_deactivate(struct kvm *kvm, struct _eoifd *eoifd)
> +{
> + list_del(&eoifd->list);
> + kvm_unregister_irq_ack_notifier(kvm, &eoifd->notifier);
> + put_irq_source(eoifd->source);
> + eventfd_ctx_put(eoifd->eventfd);
> + kfree(eoifd);
> +}
> +
> +void kvm_eoifd_release(struct kvm *kvm)
> +{
> + struct _eoifd *eoifd, *tmp;
> +
> + spin_lock_irq(&kvm->eoifds.lock);
> +
> + list_for_each_entry_safe(eoifd, tmp, &kvm->eoifds.items, list)
> + eoifd_deactivate(kvm, eoifd);
> +
> + spin_unlock_irq(&kvm->eoifds.lock);
> +}
> +
> +static int kvm_deassign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + struct eventfd_ctx *eventfd;
> + struct _eoifd *eoifd;
> + bool uses_source = (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) != 0;
> + int ret = -ENODEV;
> +
> + eventfd = eventfd_ctx_fdget(args->fd);
> + if (IS_ERR(eventfd))
> + return PTR_ERR(eventfd);
> +
> + spin_lock_irq(&kvm->eoifds.lock);
> +
> + list_for_each_entry(eoifd, &kvm->eoifds.items, list) {
> + /*
> + * Matching eventfd is unique since we don't allow dulicates,
> + * the rest is sanitizing the calling parameters.
> + */
> + if (eoifd->eventfd == eventfd &&
> + eoifd->notifier.gsi == args->gsi &&
> + uses_source == (eoifd->source != NULL)) {
> + eoifd_deactivate(kvm, eoifd);
> + ret = 0;
> + break;
> + }
> + }
> +
> + spin_unlock_irq(&kvm->eoifds.lock);
> +
> + eventfd_ctx_put(eventfd);
> +
> + return ret;
> +}
> +
> +int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> +{
> + if (args->flags & ~(KVM_EOIFD_FLAG_DEASSIGN |
> + KVM_EOIFD_FLAG_LEVEL_IRQFD))
> + return -EINVAL;
> +
> + if (args->flags & KVM_EOIFD_FLAG_DEASSIGN)
> + return kvm_deassign_eoifd(kvm, args);
> +
> + return kvm_assign_eoifd(kvm, args);
> +}
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index b4ad14cc..5b41df1 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -620,6 +620,8 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
>
> kvm_irqfd_release(kvm);
>
> + kvm_eoifd_release(kvm);
> +
> kvm_put_kvm(kvm);
> return 0;
> }
> @@ -2093,6 +2095,15 @@ static long kvm_vm_ioctl(struct file *filp,
> break;
> }
> #endif
> + case KVM_EOIFD: {
> + struct kvm_eoifd data;
> +
> + r = -EFAULT;
> + if (copy_from_user(&data, argp, sizeof data))
> + goto out;
> + r = kvm_eoifd(kvm, &data);
> + break;
> + }
> default:
> r = kvm_arch_vm_ioctl(filp, ioctl, arg);
> if (r == -ENOTTY)
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Gleb.

2012-06-27 14:24:40

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Tue, 26 Jun 2012 23:09:04 -0600
Alex Williamson <[email protected]> wrote:

> Prune this down to just the struct kvm_irqfd so we can avoid
> changing function definition for every flag or field we use.
>
> Signed-off-by: Alex Williamson <[email protected]>

I'm currently trying to find a way to make irqfd workable for s390
which will likely include using a new field in kvm_irqfd, so I'd like
to have this change (and I also think it makes the code nicer to read).
So:

Acked-by: Cornelia Huck <[email protected]>

2012-06-27 14:29:09

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Wed, 2012-06-27 at 16:58 +0300, Gleb Natapov wrote:
> On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > This new ioctl enables an eventfd to be triggered when an EOI is
> > written for a specified irqchip pin. By default this is a simple
> > notification, but we can also tie the eoifd to a level irqfd, which
> > enables the irqchip pin to be automatically de-asserted on EOI.
> > This mode is particularly useful for device-assignment applications
> > where the unmask and notify triggers a hardware unmask. The default
> > mode is most applicable to simple notify with no side-effects for
> > userspace usage, such as Qemu.
> >
> > Here we make use of the reference counting of the _irq_source
> > object allowing us to share it with an irqfd and cleanup regardless
> > of the release order.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> >
> > Documentation/virtual/kvm/api.txt | 24 +++++
> > arch/x86/kvm/x86.c | 1
> > include/linux/kvm.h | 14 +++
> > include/linux/kvm_host.h | 13 +++
> > virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> > virt/kvm/kvm_main.c | 11 ++
> > 6 files changed, 250 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index b216709..87a2558 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> >
> > +4.77 KVM_EOIFD
> > +
> > +Capability: KVM_CAP_EOIFD
> > +Architectures: x86
> > +Type: vm ioctl
> > +Parameters: struct kvm_eoifd (in)
> > +Returns: 0 on success, -1 on error
> > +
> > +KVM_EOIFD allows userspace to receive EOI notification through an
> > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> Lets make it defined. EOI notification can be used by userspace to fix
> time drift due to lost interrupts. But than userspace needs to know
> which vcpu did EOI.

Hmm, do we need an additional flag and field in kvm_eoifd to filter by
vCPU then?

> Also see question below.
>
...
> > +
> > +static int kvm_assign_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
> > +{
> > + struct eventfd_ctx *eventfd;
> > + struct _eoifd *eoifd, *tmp;
> > + struct _irq_source *source = NULL;
> > +
> > + if (args->flags & KVM_EOIFD_FLAG_LEVEL_IRQFD) {
> > + source = get_irq_source_from_irqfd(kvm, args->irqfd);
> > + if (IS_ERR(source))
> > + return PTR_ERR(source);
> > + }
> > +
> Shouldn't you check that eoifd->gsi == irqfd->gsi?

Yes I should. Thanks,

Alex

2012-06-27 14:31:00

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Wed, 2012-06-27 at 12:35 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:09:04PM -0600, Alex Williamson wrote:
> > Prune this down to just the struct kvm_irqfd so we can avoid
> > changing function definition for every flag or field we use.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
>
> This is not needed anymore, right? We are not adding
> new fields.

Right, but I still prefer it. Thanks,

Alex

2012-06-27 14:33:43

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] kvm: level triggered irqfd support

On Wed, 2012-06-27 at 12:58 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:08:52PM -0600, Alex Williamson wrote:
> > Ok, let's see how this flies. I actually quite like this, so be
> > gentle tearing it apart ;)
> >
> > I just couldn't bring myself to contort KVM_IRQFD into something
> > that either sets up an irqfd or specifies a nearly unrelated EOI
> > eventfd. The solution I've come up with, that also avoids exposing
> > irq_source_ids to userspace, is to work through the irqfd. If we
> > setup a level irqfd, we can optionally associate an eoifd with
> > the same irq_source_id, by passing the irqfd. To do this, we just
> > need to create a new reference counted object for the source ID
> > so we don't run into problems ordering release. This means we
> > end up with a KVM_EOIFD ioctl that has both general usefulness and
> > can still tie into an irqfd.
>
> I like this API.

Yay! I'll work on the bugs you've spotted.

> > In patch 6/6 I also include an alternate de-assert mechanism via an
> > irqfd with the opposite polarity. I don't currently use this, but
> > it's pretty trivial and at least available in archives now.
>
> The nasty bit about that is that if you assert twice then
> deassert once it's not clear what should happen.
> But yea, it does not hurt to put them in the archives.

It's no different that KVM_IRQ_LINE in that sense. It's not an
accumulator, it's a set state. Thanks,

Alex

2012-06-27 14:51:39

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Wed, Jun 27, 2012 at 08:29:04AM -0600, Alex Williamson wrote:
> On Wed, 2012-06-27 at 16:58 +0300, Gleb Natapov wrote:
> > On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > > This new ioctl enables an eventfd to be triggered when an EOI is
> > > written for a specified irqchip pin. By default this is a simple
> > > notification, but we can also tie the eoifd to a level irqfd, which
> > > enables the irqchip pin to be automatically de-asserted on EOI.
> > > This mode is particularly useful for device-assignment applications
> > > where the unmask and notify triggers a hardware unmask. The default
> > > mode is most applicable to simple notify with no side-effects for
> > > userspace usage, such as Qemu.
> > >
> > > Here we make use of the reference counting of the _irq_source
> > > object allowing us to share it with an irqfd and cleanup regardless
> > > of the release order.
> > >
> > > Signed-off-by: Alex Williamson <[email protected]>
> > > ---
> > >
> > > Documentation/virtual/kvm/api.txt | 24 +++++
> > > arch/x86/kvm/x86.c | 1
> > > include/linux/kvm.h | 14 +++
> > > include/linux/kvm_host.h | 13 +++
> > > virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> > > virt/kvm/kvm_main.c | 11 ++
> > > 6 files changed, 250 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > index b216709..87a2558 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > >
> > > +4.77 KVM_EOIFD
> > > +
> > > +Capability: KVM_CAP_EOIFD
> > > +Architectures: x86
> > > +Type: vm ioctl
> > > +Parameters: struct kvm_eoifd (in)
> > > +Returns: 0 on success, -1 on error
> > > +
> > > +KVM_EOIFD allows userspace to receive EOI notification through an
> > > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> > Lets make it defined. EOI notification can be used by userspace to fix
> > time drift due to lost interrupts. But than userspace needs to know
> > which vcpu did EOI.
>
> Hmm, do we need an additional flag and field in kvm_eoifd to filter by
> vCPU then?
>
This will be enough for a use case I am aware of. Don't know if this
interface is generic enough for all possible use cases.

--
Gleb.

2012-06-27 15:20:29

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> +static void eoifd_event(struct kvm_irq_ack_notifier *notifier)
> +{
> + struct _eoifd *eoifd;
> +
> + eoifd = container_of(notifier, struct _eoifd, notifier);
> +
> + if (eoifd->source)
> + kvm_set_irq(eoifd->kvm, eoifd->source->id,
> + eoifd->notifier.gsi, 0);

Let's add a comment here explaining why it's a safe thing
to do, which is, userspace will check status and reinject if
necessary.

--
MST

2012-06-27 15:26:26

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> }
>
> +static void
> +irqfd_inject_level(struct work_struct *work)
> +{
> + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> +
> + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> +}
> +
> /*
> * Race-free decouple logic (ordering is critical)
> */


Why is it safe to ignore return value here?
needs a comment.

2012-06-27 20:12:22

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] kvm: Sanitize KVM_IRQFD flags

On Wed, 2012-06-27 at 12:21 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:09:32PM -0600, Alex Williamson wrote:
> > We only know of one so far.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
>
> Ugh. So we have a bug: we should have sanitized the fields.
> If there's buggy userspace that only set the low bit
> it will break with this change.
> Is it too late now? Do we need KVM_IRQFD2 which
> sanitized fields properly? Avi?

If we take that attitude that we haven't sanitized the bits in the past
and therefore all other bits are tainted from future use, we might as
well toss out this ioctl and start over. There's no way to add
anything. :-\

> > ---
> >
> > virt/kvm/eventfd.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index c307c24..7d7e2aa 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -340,6 +340,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> > int
> > kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> > {
> > + if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> > + return -EINVAL;
> > +
> > if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
> > return kvm_irqfd_deassign(kvm, args);
> >


2012-06-27 20:22:04

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] kvm: Sanitize KVM_IRQFD flags

On Wed, Jun 27, 2012 at 02:12:18PM -0600, Alex Williamson wrote:
> On Wed, 2012-06-27 at 12:21 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2012 at 11:09:32PM -0600, Alex Williamson wrote:
> > > We only know of one so far.
> > >
> > > Signed-off-by: Alex Williamson <[email protected]>
> >
> > Ugh. So we have a bug: we should have sanitized the fields.
> > If there's buggy userspace that only set the low bit
> > it will break with this change.
> > Is it too late now? Do we need KVM_IRQFD2 which
> > sanitized fields properly? Avi?
>
> If we take that attitude that we haven't sanitized the bits in the past
> and therefore all other bits are tainted from future use, we might as
> well toss out this ioctl and start over. There's no way to add
> anything. :-\

This is what I'm asking.

> > > ---
> > >
> > > virt/kvm/eventfd.c | 3 +++
> > > 1 file changed, 3 insertions(+)
> > >
> > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > index c307c24..7d7e2aa 100644
> > > --- a/virt/kvm/eventfd.c
> > > +++ b/virt/kvm/eventfd.c
> > > @@ -340,6 +340,9 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> > > int
> > > kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> > > {
> > > + if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> > > + return -EINVAL;
> > > +
> > > if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)
> > > return kvm_irqfd_deassign(kvm, args);
> > >
>
>

2012-06-27 20:59:13

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > In order to inject an interrupt from an external source using an
> > irqfd, we need to allocate a new irq_source_id. This allows us to
> > assert and (later) de-assert an interrupt line independently from
> > users of KVM_IRQ_LINE and avoid lost interrupts.
> >
> > We also add what may appear like a bit of excessive infrastructure
> > around an object for storing this irq_source_id. However, notice
> > that we only provide a way to assert the interrupt here. A follow-on
> > interface will make use of the same irq_source_id to allow de-assert.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> >
> > Documentation/virtual/kvm/api.txt | 5 ++
> > arch/x86/kvm/x86.c | 1
> > include/linux/kvm.h | 3 +
> > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > 4 files changed, 99 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index ea9edce..b216709 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > and kvm_irqfd.gsi.
> >
> > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > +the requested irqfd. This is necessary to share level triggered
> > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
>
> Note that if my patch removing auto-deassert gets accepted,
> this is not needed at all: we can just look at the GSI
> to see if it's level or edge.

I'm not sure this is a good idea. I know from vfio that I'm injecting a
level interrupt regardless of how the guest has the pic/ioapic
programmed at the time I'm calling this ioctl. Peeking across address
spaces to get to the right pin on the right pic/ioapic and see how it's
currently programmed seems fragile. Thanks,

Alex

2012-06-27 21:19:52

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Wed, 2012-06-27 at 12:34 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > In order to inject an interrupt from an external source using an
> > irqfd, we need to allocate a new irq_source_id. This allows us to
> > assert and (later) de-assert an interrupt line independently from
> > users of KVM_IRQ_LINE and avoid lost interrupts.
> >
> > We also add what may appear like a bit of excessive infrastructure
> > around an object for storing this irq_source_id. However, notice
> > that we only provide a way to assert the interrupt here. A follow-on
> > interface will make use of the same irq_source_id to allow de-assert.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> >
> > Documentation/virtual/kvm/api.txt | 5 ++
> > arch/x86/kvm/x86.c | 1
> > include/linux/kvm.h | 3 +
> > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > 4 files changed, 99 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index ea9edce..b216709 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > and kvm_irqfd.gsi.
> >
> > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > +the requested irqfd.
>
> This is the first and last time the term source id appears in this text.
> We probably can do without talking about it at all, simply
> explain that multiple irqfds mapped to same GSI are
> OR-ed together.

Ok

> > This is necessary to share level triggered
> > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> >
> > 5. The kvm_run structure
> > ------------------------
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a01a424..80bed07 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > case KVM_CAP_GET_TSC_KHZ:
> > case KVM_CAP_PCI_2_3:
> > case KVM_CAP_KVMCLOCK_CTRL:
> > + case KVM_CAP_IRQFD_LEVEL:
> > r = 1;
> > break;
> > case KVM_CAP_COALESCED_MMIO:
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 2ce09aa..b2e6e4f 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > #define KVM_CAP_S390_COW 79
> > #define KVM_CAP_PPC_ALLOC_HTAB 80
> > +#define KVM_CAP_IRQFD_LEVEL 81
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
> > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> > #endif
> >
> > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > +/* Available with KVM_CAP_IRQFD_LEVEL */
> > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
> >
> > struct kvm_irqfd {
> > __u32 fd;
> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index 7d7e2aa..18cc284 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -36,6 +36,64 @@
> > #include "iodev.h"
> >
> > /*
> > + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> > + * injections and shared with other interfaces for EOI or de-assert.
> > + * Create an object with reference counting to make it easy to use.
> > + */
> > +struct _irq_source {
> > + int id;
> > + struct kvm *kvm;
> > + struct kref kref;
> > +};
> > +
> > +static void release_irq_source(struct kref *kref)
> > +{
> > + struct _irq_source *source;
> > +
> > + source = container_of(kref, struct _irq_source, kref);
> > +
> > + kvm_free_irq_source_id(source->kvm, source->id);
> > + kfree(source);
> > +}
> > +
>
> It would be nicer to prefix everything with irqfd_
> and generally use prefixes. _irqfd_source irqfd_source_release_ etc.

Hmm, these objects are setup by an irqfd, but I don't think that
warrants naming all the functions after irqfd. I'll make them
_irq_source_foo()

> > +static void put_irq_source(struct _irq_source *source)
> > +{
> > + if (source)
> > + kref_put(&source->kref, release_irq_source);
> > +}
> > +
> > +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> > +get_irq_source(struct _irq_source *source)
> > +{
> > + if (source)
> > + kref_get(&source->kref);
> > +
> > + return source;
> > +}
> > +
> > +static struct _irq_source *new_irq_source(struct kvm *kvm)
> > +{
> > + struct _irq_source *source;
> > + int id;
> > +
> > + source = kzalloc(sizeof(*source), GFP_KERNEL);
> > + if (!source)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + id = kvm_request_irq_source_id(kvm);
> > + if (id < 0) {
> > + kfree(source);
> > + return ERR_PTR(id);
> > + }
> > +
> > + kref_init(&source->kref);
> > + source->kvm = kvm;
> > + source->id = id;
> > +
> > + return source;
> > +}
> > +
>
> s/new/alloc/

Ok

> > +/*
> > * --------------------------------------------------------------------
> > * irqfd: Allows an fd to be used to inject an interrupt to the guest
> > *
> > @@ -52,6 +110,7 @@ struct _irqfd {
> > /* Used for level IRQ fast-path */
> > int gsi;
> > struct work_struct inject;
> > + struct _irq_source *source;
> > /* Used for setup/shutdown */
> > struct eventfd_ctx *eventfd;
> > struct list_head list;
> > @@ -62,7 +121,7 @@ struct _irqfd {
> > static struct workqueue_struct *irqfd_cleanup_wq;
> >
> > static void
> > -irqfd_inject(struct work_struct *work)
> > +irqfd_inject_edge(struct work_struct *work)
> > {
> > struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > struct kvm *kvm = irqfd->kvm;
> > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > }
> >
> > +static void
> > +irqfd_inject_level(struct work_struct *work)
> > +{
> > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > +
> > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > +}
> > +
> > /*
> > * Race-free decouple logic (ordering is critical)
> > */
> > @@ -96,6 +163,9 @@ irqfd_shutdown(struct work_struct *work)
> > * It is now safe to release the object's resources
> > */
> > eventfd_ctx_put(irqfd->eventfd);
> > +
> > + put_irq_source(irqfd->source);
> > +
> > kfree(irqfd);
> > }
> >
> > @@ -214,7 +284,19 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
> > irqfd->kvm = kvm;
> > irqfd->gsi = args->gsi;
> > INIT_LIST_HEAD(&irqfd->list);
> > - INIT_WORK(&irqfd->inject, irqfd_inject);
> > +
> > + if (args->flags & KVM_IRQFD_FLAG_LEVEL) {
> > + irqfd->source = new_irq_source(kvm);
> > + if (IS_ERR(irqfd->source)) {
> > + ret = PTR_ERR(irqfd->source);
> > + irqfd->source = NULL;
> > + goto fail;
>
> A bit cleaner to create a dedicated label which skips
> put_irq_source than doing source=NULL tricks.

Matter of opinion. I think I'll switch to the mechanism used elsewhere
in this function:

source = _irq_source_alloc(kvm);
if (IS_ERR(source)) {
...
}

irqfd->source = source;

> > + }
> > +
> > + INIT_WORK(&irqfd->inject, irqfd_inject_level);
> > + } else
> > + INIT_WORK(&irqfd->inject, irqfd_inject_edge);
> > +
> > INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
> >
> > file = eventfd_fget(args->fd);
> > @@ -279,9 +361,10 @@ fail:
> > if (eventfd && !IS_ERR(eventfd))
> > eventfd_ctx_put(eventfd);
> >
> > - if (!IS_ERR(file))
> > + if (file && !IS_ERR(file))
> > fput(file);
> >
> > + put_irq_source(irqfd->source);
> > kfree(irqfd);
> > return ret;
> > }
> > @@ -302,6 +385,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> > {
> > struct _irqfd *irqfd, *tmp;
> > struct eventfd_ctx *eventfd;
> > + bool is_level = (args->flags & KVM_IRQFD_FLAG_LEVEL) != 0;
>
> != 0 is not needed here I think.

Strictly, no it's not. I just prefer to set bools from bools and I
think Avi has commented on using !! before.

> Also I'm not really sure why are we making userspace
> supply KVM_IRQFD_FLAG_LEVEL flag for clear.

It's mostly a sanity test for userspace. We don't allow duplicate
eventfds, so we really only need args->fd to match. But the existing
code matches args->gsi as well, so I thought I should make an attempt to
follow tradition. It'd be a lot easier to drop the tests below and only
match eventfd. Which do we want?

> >
> > eventfd = eventfd_ctx_fdget(args->fd);
> > if (IS_ERR(eventfd))
> > @@ -310,7 +394,8 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> > spin_lock_irq(&kvm->irqfds.lock);
> >
> > list_for_each_entry_safe(irqfd, tmp, &kvm->irqfds.items, list) {
> > - if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi) {
> > + if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi &&
> > + is_level == (irqfd->source != NULL)) {
>
> Let's rename source to level_source to make usage clear?

has_source is perhaps more versatile. Thanks,

Alex

> > /*
> > * This rcu_assign_pointer is needed for when
> > * another thread calls kvm_irq_routing_update before
> > @@ -340,7 +425,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
> > int
> > kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
> > {
> > - if (args->flags & ~KVM_IRQFD_FLAG_DEASSIGN)
> > + if (args->flags & ~(KVM_IRQFD_FLAG_DEASSIGN | KVM_IRQFD_FLAG_LEVEL))
> > return -EINVAL;
> >
> > if (args->flags & KVM_IRQFD_FLAG_DEASSIGN)


2012-06-27 21:28:23

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, 2012-06-28 at 00:14 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 02:59:09PM -0600, Alex Williamson wrote:
> > On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > In order to inject an interrupt from an external source using an
> > > > irqfd, we need to allocate a new irq_source_id. This allows us to
> > > > assert and (later) de-assert an interrupt line independently from
> > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > >
> > > > We also add what may appear like a bit of excessive infrastructure
> > > > around an object for storing this irq_source_id. However, notice
> > > > that we only provide a way to assert the interrupt here. A follow-on
> > > > interface will make use of the same irq_source_id to allow de-assert.
> > > >
> > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > ---
> > > >
> > > > Documentation/virtual/kvm/api.txt | 5 ++
> > > > arch/x86/kvm/x86.c | 1
> > > > include/linux/kvm.h | 3 +
> > > > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > > > 4 files changed, 99 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > index ea9edce..b216709 100644
> > > > --- a/Documentation/virtual/kvm/api.txt
> > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > and kvm_irqfd.gsi.
> > > >
> > > > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > > > +the requested irqfd. This is necessary to share level triggered
> > > > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > >
> > > Note that if my patch removing auto-deassert gets accepted,
> > > this is not needed at all: we can just look at the GSI
> > > to see if it's level or edge.
> >
> > I'm not sure this is a good idea. I know from vfio that I'm injecting a
> > level interrupt regardless of how the guest has the pic/ioapic
> > programmed at the time I'm calling this ioctl. Peeking across address
> > spaces to get to the right pin on the right pic/ioapic and see how it's
> > currently programmed seems fragile. Thanks,
> >
> > Alex
>
> Fragile? If you set eventfd as LEVEL but GSI is really edge then
> it all explodes, right? So why give users the option to shoot
> themselves in the foot?

If the guest has the ioapic rte set to edge at the time I call KVM_IRQFD
to register my level interrupt then it all explodes, right? I'd rather
let the user shoot themselves than play Russian roulette with the guest.
Am I misunderstanding what you mean by looking that the GSI to see if
it's level or edge?

2012-06-27 22:04:22

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > }
> >
> > +static void
> > +irqfd_inject_level(struct work_struct *work)
> > +{
> > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > +
> > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > +}
> > +
> > /*
> > * Race-free decouple logic (ordering is critical)
> > */
>
>
> Why is it safe to ignore return value here?
> needs a comment.

Well, it seems like you and Gleb came to the conclusion that it's safe,
but I can really follow from the list thread. Can you explain and I'll
add a comment? Thanks,

Alex

2012-06-27 22:08:05

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Wed, Jun 27, 2012 at 02:59:09PM -0600, Alex Williamson wrote:
> On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > In order to inject an interrupt from an external source using an
> > > irqfd, we need to allocate a new irq_source_id. This allows us to
> > > assert and (later) de-assert an interrupt line independently from
> > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > >
> > > We also add what may appear like a bit of excessive infrastructure
> > > around an object for storing this irq_source_id. However, notice
> > > that we only provide a way to assert the interrupt here. A follow-on
> > > interface will make use of the same irq_source_id to allow de-assert.
> > >
> > > Signed-off-by: Alex Williamson <[email protected]>
> > > ---
> > >
> > > Documentation/virtual/kvm/api.txt | 5 ++
> > > arch/x86/kvm/x86.c | 1
> > > include/linux/kvm.h | 3 +
> > > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > > 4 files changed, 99 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > index ea9edce..b216709 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > and kvm_irqfd.gsi.
> > >
> > > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > > +the requested irqfd. This is necessary to share level triggered
> > > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> >
> > Note that if my patch removing auto-deassert gets accepted,
> > this is not needed at all: we can just look at the GSI
> > to see if it's level or edge.
>
> I'm not sure this is a good idea. I know from vfio that I'm injecting a
> level interrupt regardless of how the guest has the pic/ioapic
> programmed at the time I'm calling this ioctl. Peeking across address
> spaces to get to the right pin on the right pic/ioapic and see how it's
> currently programmed seems fragile. Thanks,
>
> Alex

Fragile? If you set eventfd as LEVEL but GSI is really edge then
it all explodes, right? So why give users the option to shoot
themselves in the foot?


--
MST

2012-06-27 22:28:52

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Wed, Jun 27, 2012 at 03:28:19PM -0600, Alex Williamson wrote:
> On Thu, 2012-06-28 at 00:14 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2012 at 02:59:09PM -0600, Alex Williamson wrote:
> > > On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > In order to inject an interrupt from an external source using an
> > > > > irqfd, we need to allocate a new irq_source_id. This allows us to
> > > > > assert and (later) de-assert an interrupt line independently from
> > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > >
> > > > > We also add what may appear like a bit of excessive infrastructure
> > > > > around an object for storing this irq_source_id. However, notice
> > > > > that we only provide a way to assert the interrupt here. A follow-on
> > > > > interface will make use of the same irq_source_id to allow de-assert.
> > > > >
> > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > ---
> > > > >
> > > > > Documentation/virtual/kvm/api.txt | 5 ++
> > > > > arch/x86/kvm/x86.c | 1
> > > > > include/linux/kvm.h | 3 +
> > > > > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > > > > 4 files changed, 99 insertions(+), 5 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > > index ea9edce..b216709 100644
> > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > > and kvm_irqfd.gsi.
> > > > >
> > > > > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > > > > +the requested irqfd. This is necessary to share level triggered
> > > > > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > >
> > > > Note that if my patch removing auto-deassert gets accepted,
> > > > this is not needed at all: we can just look at the GSI
> > > > to see if it's level or edge.
> > >
> > > I'm not sure this is a good idea. I know from vfio that I'm injecting a
> > > level interrupt regardless of how the guest has the pic/ioapic
> > > programmed at the time I'm calling this ioctl. Peeking across address
> > > spaces to get to the right pin on the right pic/ioapic and see how it's
> > > currently programmed seems fragile. Thanks,
> > >
> > > Alex
> >
> > Fragile? If you set eventfd as LEVEL but GSI is really edge then
> > it all explodes, right? So why give users the option to shoot
> > themselves in the foot?
>
> If the guest has the ioapic rte set to edge at the time I call KVM_IRQFD
> to register my level interrupt then it all explodes, right? I'd rather
> let the user shoot themselves than play Russian roulette with the guest.
> Am I misunderstanding what you mean by looking that the GSI to see if
> it's level or edge?

Not sure.
I simply mean this: if eventfd is bound to irqfd, set level from irqfd
and clear from eventfd ack notifier.
There's no need for a special IRQ_LEVEL for this.

--
MST

2012-06-27 22:31:30

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > }
> > >
> > > +static void
> > > +irqfd_inject_level(struct work_struct *work)
> > > +{
> > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > +
> > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > +}
> > > +
> > > /*
> > > * Race-free decouple logic (ordering is critical)
> > > */
> >
> >
> > Why is it safe to ignore return value here?
> > needs a comment.
>
> Well, it seems like you and Gleb came to the conclusion that it's safe,
> but I can really follow from the list thread. Can you explain and I'll
> add a comment? Thanks,
>
> Alex

We merely talked about edge interrupts.

--
MST

2012-06-28 03:52:57

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, 2012-06-28 at 01:28 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 03:28:19PM -0600, Alex Williamson wrote:
> > On Thu, 2012-06-28 at 00:14 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 27, 2012 at 02:59:09PM -0600, Alex Williamson wrote:
> > > > On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > In order to inject an interrupt from an external source using an
> > > > > > irqfd, we need to allocate a new irq_source_id. This allows us to
> > > > > > assert and (later) de-assert an interrupt line independently from
> > > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > > >
> > > > > > We also add what may appear like a bit of excessive infrastructure
> > > > > > around an object for storing this irq_source_id. However, notice
> > > > > > that we only provide a way to assert the interrupt here. A follow-on
> > > > > > interface will make use of the same irq_source_id to allow de-assert.
> > > > > >
> > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > Documentation/virtual/kvm/api.txt | 5 ++
> > > > > > arch/x86/kvm/x86.c | 1
> > > > > > include/linux/kvm.h | 3 +
> > > > > > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > > > > > 4 files changed, 99 insertions(+), 5 deletions(-)
> > > > > >
> > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > > > index ea9edce..b216709 100644
> > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > > > and kvm_irqfd.gsi.
> > > > > >
> > > > > > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > > > > > +the requested irqfd. This is necessary to share level triggered
> > > > > > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > > > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > >
> > > > > Note that if my patch removing auto-deassert gets accepted,
> > > > > this is not needed at all: we can just look at the GSI
> > > > > to see if it's level or edge.
> > > >
> > > > I'm not sure this is a good idea. I know from vfio that I'm injecting a
> > > > level interrupt regardless of how the guest has the pic/ioapic
> > > > programmed at the time I'm calling this ioctl. Peeking across address
> > > > spaces to get to the right pin on the right pic/ioapic and see how it's
> > > > currently programmed seems fragile. Thanks,
> > > >
> > > > Alex
> > >
> > > Fragile? If you set eventfd as LEVEL but GSI is really edge then
> > > it all explodes, right? So why give users the option to shoot
> > > themselves in the foot?
> >
> > If the guest has the ioapic rte set to edge at the time I call KVM_IRQFD
> > to register my level interrupt then it all explodes, right? I'd rather
> > let the user shoot themselves than play Russian roulette with the guest.
> > Am I misunderstanding what you mean by looking that the GSI to see if
> > it's level or edge?
>
> Not sure.
> I simply mean this: if eventfd is bound to irqfd, set level from irqfd
> and clear from eventfd ack notifier.

Are you simply saying assert (kvm_set_irq(,,,1)) from irqfd trigger and
de-assert (kvm_set_irq(,,,0)) from eventfd ack notifier (aka KVM_EOIFD)?

> There's no need for a special IRQ_LEVEL for this.

That ignores the whole problem of when do we need to allocate a new
irq_source_id and when do we inject using KVM_USERSPACE_IRQ_SOURCE_ID.
We've already discussed that a level triggered, externally fired
interrupt must use a separate source ID from Qemu userspace. Therefore
when you say "look at the GSI to see if it's level or edge", I assume
you mean trace the gsi back to the pic/ioapic pin and look at the
trigger mode. That trigger mode is configured by the guest, so that
means that at the point in time when we call KVM_IRQFD we make a
determination based on how the _guest_ has programmed the ioapic. That
may not match the interrupt we expect to inject. On the other hand, the
user calling KVM_IRQFD absolutely knows the type of interrupt provided
by their device. I think we need a flag regardless of whether your
patch is accepted. We may be able to share the inject handler if it is
accepted, but it doesn't change the user API. Thanks,

Alex

2012-06-28 03:55:48

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Wed, 2012-06-27 at 17:51 +0300, Gleb Natapov wrote:
> On Wed, Jun 27, 2012 at 08:29:04AM -0600, Alex Williamson wrote:
> > On Wed, 2012-06-27 at 16:58 +0300, Gleb Natapov wrote:
> > > On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > > > This new ioctl enables an eventfd to be triggered when an EOI is
> > > > written for a specified irqchip pin. By default this is a simple
> > > > notification, but we can also tie the eoifd to a level irqfd, which
> > > > enables the irqchip pin to be automatically de-asserted on EOI.
> > > > This mode is particularly useful for device-assignment applications
> > > > where the unmask and notify triggers a hardware unmask. The default
> > > > mode is most applicable to simple notify with no side-effects for
> > > > userspace usage, such as Qemu.
> > > >
> > > > Here we make use of the reference counting of the _irq_source
> > > > object allowing us to share it with an irqfd and cleanup regardless
> > > > of the release order.
> > > >
> > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > ---
> > > >
> > > > Documentation/virtual/kvm/api.txt | 24 +++++
> > > > arch/x86/kvm/x86.c | 1
> > > > include/linux/kvm.h | 14 +++
> > > > include/linux/kvm_host.h | 13 +++
> > > > virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> > > > virt/kvm/kvm_main.c | 11 ++
> > > > 6 files changed, 250 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > index b216709..87a2558 100644
> > > > --- a/Documentation/virtual/kvm/api.txt
> > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > >
> > > > +4.77 KVM_EOIFD
> > > > +
> > > > +Capability: KVM_CAP_EOIFD
> > > > +Architectures: x86
> > > > +Type: vm ioctl
> > > > +Parameters: struct kvm_eoifd (in)
> > > > +Returns: 0 on success, -1 on error
> > > > +
> > > > +KVM_EOIFD allows userspace to receive EOI notification through an
> > > > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > > > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> > > Lets make it defined. EOI notification can be used by userspace to fix
> > > time drift due to lost interrupts. But than userspace needs to know
> > > which vcpu did EOI.
> >
> > Hmm, do we need an additional flag and field in kvm_eoifd to filter by
> > vCPU then?
> >
> This will be enough for a use case I am aware of. Don't know if this
> interface is generic enough for all possible use cases.

That's generally a hard prediction to make ;) We currently don't pass a
kvm_vcpu anywhere close to the irq ack notifier. The ioapic path could
be relatively trivial, but the pic path is a bit further disconnected.
If we had that plumbing, a KVM_CAP plus vcpu filter flag and specifying
the vcpu using some of the padding space seems like it's sufficient.
I'll drop mention of level-only from the description, but the plumbing
and vcpu filtering can be a follow-on. Thanks,

Alex

2012-06-28 06:34:36

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > }
> > > >
> > > > +static void
> > > > +irqfd_inject_level(struct work_struct *work)
> > > > +{
> > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > +
> > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > +}
> > > > +
> > > > /*
> > > > * Race-free decouple logic (ordering is critical)
> > > > */
> > >
> > >
> > > Why is it safe to ignore return value here?
> > > needs a comment.
> >
> > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > but I can really follow from the list thread. Can you explain and I'll
> > add a comment? Thanks,
> >
> > Alex
>
> We merely talked about edge interrupts.
>
In fact it would have been nice to return -EBUSY when write() to level
irqfd is coalesced.

--
Gleb.

2012-06-28 08:29:28

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Wed, Jun 27, 2012 at 09:52:52PM -0600, Alex Williamson wrote:
> On Thu, 2012-06-28 at 01:28 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2012 at 03:28:19PM -0600, Alex Williamson wrote:
> > > On Thu, 2012-06-28 at 00:14 +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 27, 2012 at 02:59:09PM -0600, Alex Williamson wrote:
> > > > > On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > In order to inject an interrupt from an external source using an
> > > > > > > irqfd, we need to allocate a new irq_source_id. This allows us to
> > > > > > > assert and (later) de-assert an interrupt line independently from
> > > > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > > > >
> > > > > > > We also add what may appear like a bit of excessive infrastructure
> > > > > > > around an object for storing this irq_source_id. However, notice
> > > > > > > that we only provide a way to assert the interrupt here. A follow-on
> > > > > > > interface will make use of the same irq_source_id to allow de-assert.
> > > > > > >
> > > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > > ---
> > > > > > >
> > > > > > > Documentation/virtual/kvm/api.txt | 5 ++
> > > > > > > arch/x86/kvm/x86.c | 1
> > > > > > > include/linux/kvm.h | 3 +
> > > > > > > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > > > > > > 4 files changed, 99 insertions(+), 5 deletions(-)
> > > > > > >
> > > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > > > > index ea9edce..b216709 100644
> > > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > > > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > > > > and kvm_irqfd.gsi.
> > > > > > >
> > > > > > > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > > > > > > +the requested irqfd. This is necessary to share level triggered
> > > > > > > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > > > > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > > >
> > > > > > Note that if my patch removing auto-deassert gets accepted,
> > > > > > this is not needed at all: we can just look at the GSI
> > > > > > to see if it's level or edge.
> > > > >
> > > > > I'm not sure this is a good idea. I know from vfio that I'm injecting a
> > > > > level interrupt regardless of how the guest has the pic/ioapic
> > > > > programmed at the time I'm calling this ioctl. Peeking across address
> > > > > spaces to get to the right pin on the right pic/ioapic and see how it's
> > > > > currently programmed seems fragile. Thanks,
> > > > >
> > > > > Alex
> > > >
> > > > Fragile? If you set eventfd as LEVEL but GSI is really edge then
> > > > it all explodes, right? So why give users the option to shoot
> > > > themselves in the foot?
> > >
> > > If the guest has the ioapic rte set to edge at the time I call KVM_IRQFD
> > > to register my level interrupt then it all explodes, right? I'd rather
> > > let the user shoot themselves than play Russian roulette with the guest.
> > > Am I misunderstanding what you mean by looking that the GSI to see if
> > > it's level or edge?
> >
> > Not sure.
> > I simply mean this: if eventfd is bound to irqfd, set level from irqfd
> > and clear from eventfd ack notifier.
>
> Are you simply saying assert (kvm_set_irq(,,,1)) from irqfd trigger and
> de-assert (kvm_set_irq(,,,0)) from eventfd ack notifier (aka KVM_EOIFD)?

Yes.

> > There's no need for a special IRQ_LEVEL for this.
>
> That ignores the whole problem of when do we need to allocate a new
> irq_source_id and when do we inject using KVM_USERSPACE_IRQ_SOURCE_ID.
> We've already discussed that a level triggered, externally fired
> interrupt must use a separate source ID from Qemu userspace. Therefore
> when you say "look at the GSI to see if it's level or edge", I assume
> you mean trace the gsi back to the pic/ioapic pin and look at the
> trigger mode. That trigger mode is configured by the guest, so that
> means that at the point in time when we call KVM_IRQFD we make a
> determination based on how the _guest_ has programmed the ioapic. That
> may not match the interrupt we expect to inject. On the other hand, the
> user calling KVM_IRQFD absolutely knows the type of interrupt provided
> by their device. I think we need a flag regardless of whether your
> patch is accepted. We may be able to share the inject handler if it is
> accepted, but it doesn't change the user API. Thanks,
>
> Alex

This has merit, I am just looking for a way out
without adding LEVEL flag which seems to duplicate
what guest does, especially now it turns out
we can't add new flags to IRQFD.

How about this: allocate source id when eventfd is mapped?

--
MST

2012-06-28 08:34:39

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > }
> > > > >
> > > > > +static void
> > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > +{
> > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > +
> > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > +}
> > > > > +
> > > > > /*
> > > > > * Race-free decouple logic (ordering is critical)
> > > > > */
> > > >
> > > >
> > > > Why is it safe to ignore return value here?
> > > > needs a comment.
> > >
> > > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > > but I can really follow from the list thread. Can you explain and I'll
> > > add a comment? Thanks,
> > >
> > > Alex
> >
> > We merely talked about edge interrupts.
> >
> In fact it would have been nice to return -EBUSY when write() to level
> irqfd is coalesced.

Possibly nice but not really practical.

> --
> Gleb.

2012-06-28 08:35:48

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, Jun 28, 2012 at 11:34:35AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> > On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > > }
> > > > > >
> > > > > > +static void
> > > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > > +{
> > > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > > +
> > > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > > +}
> > > > > > +
> > > > > > /*
> > > > > > * Race-free decouple logic (ordering is critical)
> > > > > > */
> > > > >
> > > > >
> > > > > Why is it safe to ignore return value here?
> > > > > needs a comment.
> > > >
> > > > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > > > but I can really follow from the list thread. Can you explain and I'll
> > > > add a comment? Thanks,
> > > >
> > > > Alex
> > >
> > > We merely talked about edge interrupts.
> > >
> > In fact it would have been nice to return -EBUSY when write() to level
> > irqfd is coalesced.
>
> Possibly nice but not really practical.
>
What do you mean by that? Impossible to implement or not useful?

--
Gleb.

2012-06-28 08:39:01

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Wed, Jun 27, 2012 at 04:24:30PM +0200, Cornelia Huck wrote:
> On Tue, 26 Jun 2012 23:09:04 -0600
> Alex Williamson <[email protected]> wrote:
>
> > Prune this down to just the struct kvm_irqfd so we can avoid
> > changing function definition for every flag or field we use.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
>
> I'm currently trying to find a way to make irqfd workable for s390
> which will likely include using a new field in kvm_irqfd, so I'd like
> to have this change (and I also think it makes the code nicer to read).
> So:
>
> Acked-by: Cornelia Huck <[email protected]>

Unfortunately it looks like we are not sanitizing kvm_irqfd
at all so we won't be able to use the padding :(
We'll need a new ioctl instead.

--
MST

2012-06-28 08:41:06

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, Jun 28, 2012 at 11:35:41AM +0300, Gleb Natapov wrote:
> On Thu, Jun 28, 2012 at 11:34:35AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > > > }
> > > > > > >
> > > > > > > +static void
> > > > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > > > +{
> > > > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > > > +
> > > > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > > > +}
> > > > > > > +
> > > > > > > /*
> > > > > > > * Race-free decouple logic (ordering is critical)
> > > > > > > */
> > > > > >
> > > > > >
> > > > > > Why is it safe to ignore return value here?
> > > > > > needs a comment.
> > > > >
> > > > > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > > > > but I can really follow from the list thread. Can you explain and I'll
> > > > > add a comment? Thanks,
> > > > >
> > > > > Alex
> > > >
> > > > We merely talked about edge interrupts.
> > > >
> > > In fact it would have been nice to return -EBUSY when write() to level
> > > irqfd is coalesced.
> >
> > Possibly nice but not really practical.
> >
> What do you mean by that? Impossible to implement or not useful?

Impossible to implement and also does not match normal eventfd
semantics.

> --
> Gleb.

2012-06-28 08:42:35

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] kvm: level triggered irqfd support

On Wed, Jun 27, 2012 at 08:33:40AM -0600, Alex Williamson wrote:
> On Wed, 2012-06-27 at 12:58 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2012 at 11:08:52PM -0600, Alex Williamson wrote:
> > > Ok, let's see how this flies. I actually quite like this, so be
> > > gentle tearing it apart ;)
> > >
> > > I just couldn't bring myself to contort KVM_IRQFD into something
> > > that either sets up an irqfd or specifies a nearly unrelated EOI
> > > eventfd. The solution I've come up with, that also avoids exposing
> > > irq_source_ids to userspace, is to work through the irqfd. If we
> > > setup a level irqfd, we can optionally associate an eoifd with
> > > the same irq_source_id, by passing the irqfd. To do this, we just
> > > need to create a new reference counted object for the source ID
> > > so we don't run into problems ordering release. This means we
> > > end up with a KVM_EOIFD ioctl that has both general usefulness and
> > > can still tie into an irqfd.
> >
> > I like this API.
>
> Yay! I'll work on the bugs you've spotted.
>
> > > In patch 6/6 I also include an alternate de-assert mechanism via an
> > > irqfd with the opposite polarity. I don't currently use this, but
> > > it's pretty trivial and at least available in archives now.
> >
> > The nasty bit about that is that if you assert twice then
> > deassert once it's not clear what should happen.
> > But yea, it does not hurt to put them in the archives.
>
> It's no different that KVM_IRQ_LINE in that sense. It's not an
> accumulator, it's a set state. Thanks,
>
> Alex

Yes but eventfd semantics are that of an accumulator so this
does not match what a normal eventfd does very well.
Anyway, as it's not required now we can argue about it
when we get to it.

--
MST

2012-06-28 08:46:18

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, Jun 28, 2012 at 11:41:05AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jun 28, 2012 at 11:35:41AM +0300, Gleb Natapov wrote:
> > On Thu, Jun 28, 2012 at 11:34:35AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> > > > On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > > > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > > > > }
> > > > > > > >
> > > > > > > > +static void
> > > > > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > > > > +{
> > > > > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > > > > +
> > > > > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * Race-free decouple logic (ordering is critical)
> > > > > > > > */
> > > > > > >
> > > > > > >
> > > > > > > Why is it safe to ignore return value here?
> > > > > > > needs a comment.
> > > > > >
> > > > > > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > > > > > but I can really follow from the list thread. Can you explain and I'll
> > > > > > add a comment? Thanks,
> > > > > >
> > > > > > Alex
> > > > >
> > > > > We merely talked about edge interrupts.
> > > > >
> > > > In fact it would have been nice to return -EBUSY when write() to level
> > > > irqfd is coalesced.
> > >
> > > Possibly nice but not really practical.
> > >
> > What do you mean by that? Impossible to implement or not useful?
>
> Impossible to implement and also does not match normal eventfd
> semantics.
>
Hmm, I remember we discussed using irqfd for level triggered interrupt ~2
years ago and came to a conclusion that eventfd is a bad fit for it,
was true than is true now. Not be able to detect coalescing will make
irqfd level interrupts inferior to IRQ_LINE ioctl.

--
Gleb.

2012-06-28 08:48:44

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, Jun 28, 2012 at 11:46:11AM +0300, Gleb Natapov wrote:
> On Thu, Jun 28, 2012 at 11:41:05AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jun 28, 2012 at 11:35:41AM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 28, 2012 at 11:34:35AM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> > > > > On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > > > > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > > > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +static void
> > > > > > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > > > > > +{
> > > > > > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > > > > > +
> > > > > > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * Race-free decouple logic (ordering is critical)
> > > > > > > > > */
> > > > > > > >
> > > > > > > >
> > > > > > > > Why is it safe to ignore return value here?
> > > > > > > > needs a comment.
> > > > > > >
> > > > > > > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > > > > > > but I can really follow from the list thread. Can you explain and I'll
> > > > > > > add a comment? Thanks,
> > > > > > >
> > > > > > > Alex
> > > > > >
> > > > > > We merely talked about edge interrupts.
> > > > > >
> > > > > In fact it would have been nice to return -EBUSY when write() to level
> > > > > irqfd is coalesced.
> > > >
> > > > Possibly nice but not really practical.
> > > >
> > > What do you mean by that? Impossible to implement or not useful?
> >
> > Impossible to implement and also does not match normal eventfd
> > semantics.
> >
> Hmm, I remember we discussed using irqfd for level triggered interrupt ~2
> years ago and came to a conclusion that eventfd is a bad fit for it,
> was true than is true now. Not be able to detect coalescing will make
> irqfd level interrupts inferior to IRQ_LINE ioctl.

This aspect isn't different from MSI though, is it?

> --
> Gleb.

2012-06-28 08:53:58

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, Jun 28, 2012 at 11:48:40AM +0300, Michael S. Tsirkin wrote:
> On Thu, Jun 28, 2012 at 11:46:11AM +0300, Gleb Natapov wrote:
> > On Thu, Jun 28, 2012 at 11:41:05AM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Jun 28, 2012 at 11:35:41AM +0300, Gleb Natapov wrote:
> > > > On Thu, Jun 28, 2012 at 11:34:35AM +0300, Michael S. Tsirkin wrote:
> > > > > On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> > > > > > On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > > > > > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > > > > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > +static void
> > > > > > > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > > > > > > +{
> > > > > > > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > > > > > > +
> > > > > > > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > /*
> > > > > > > > > > * Race-free decouple logic (ordering is critical)
> > > > > > > > > > */
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Why is it safe to ignore return value here?
> > > > > > > > > needs a comment.
> > > > > > > >
> > > > > > > > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > > > > > > > but I can really follow from the list thread. Can you explain and I'll
> > > > > > > > add a comment? Thanks,
> > > > > > > >
> > > > > > > > Alex
> > > > > > >
> > > > > > > We merely talked about edge interrupts.
> > > > > > >
> > > > > > In fact it would have been nice to return -EBUSY when write() to level
> > > > > > irqfd is coalesced.
> > > > >
> > > > > Possibly nice but not really practical.
> > > > >
> > > > What do you mean by that? Impossible to implement or not useful?
> > >
> > > Impossible to implement and also does not match normal eventfd
> > > semantics.
> > >
> > Hmm, I remember we discussed using irqfd for level triggered interrupt ~2
> > years ago and came to a conclusion that eventfd is a bad fit for it,
> > was true than is true now. Not be able to detect coalescing will make
> > irqfd level interrupts inferior to IRQ_LINE ioctl.
>
> This aspect isn't different from MSI though, is it?
>
For MSI delivered through irqfd it isn't, so yes, level triggered are
not more broken in this regards. You can detect coalescing if you
deliver MSI through ioctl.

--
Gleb.

2012-06-28 09:04:01

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Thu, 28 Jun 2012 11:38:57 +0300
"Michael S. Tsirkin" <[email protected]> wrote:

> On Wed, Jun 27, 2012 at 04:24:30PM +0200, Cornelia Huck wrote:
> > On Tue, 26 Jun 2012 23:09:04 -0600
> > Alex Williamson <[email protected]> wrote:
> >
> > > Prune this down to just the struct kvm_irqfd so we can avoid
> > > changing function definition for every flag or field we use.
> > >
> > > Signed-off-by: Alex Williamson <[email protected]>
> >
> > I'm currently trying to find a way to make irqfd workable for s390
> > which will likely include using a new field in kvm_irqfd, so I'd like
> > to have this change (and I also think it makes the code nicer to read).
> > So:
> >
> > Acked-by: Cornelia Huck <[email protected]>
>
> Unfortunately it looks like we are not sanitizing kvm_irqfd
> at all so we won't be able to use the padding :(
> We'll need a new ioctl instead.
>

How about something like this as parameter for the new ioctl?

struct kvm_irqfd2 {
__u32 fd;
__u32 flags; /* for things like deassign */
__u64 type; /* determines the payload */
union {
/* type traditional */
struct {
__u32 gsi;
} trad;
/* type s390 */
struct {
__u32 int_type;
__u32 parm;
__u64 parm64;
} s390;
__u8 pad[20];
};
}

This could be combined with an arch or a per-kvm callback to keep the
generic code clean of architecture dependencies.

Cornelia

2012-06-28 09:34:48

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Thu, Jun 28, 2012 at 11:03:16AM +0200, Cornelia Huck wrote:
> On Thu, 28 Jun 2012 11:38:57 +0300
> "Michael S. Tsirkin" <[email protected]> wrote:
>
> > On Wed, Jun 27, 2012 at 04:24:30PM +0200, Cornelia Huck wrote:
> > > On Tue, 26 Jun 2012 23:09:04 -0600
> > > Alex Williamson <[email protected]> wrote:
> > >
> > > > Prune this down to just the struct kvm_irqfd so we can avoid
> > > > changing function definition for every flag or field we use.
> > > >
> > > > Signed-off-by: Alex Williamson <[email protected]>
> > >
> > > I'm currently trying to find a way to make irqfd workable for s390
> > > which will likely include using a new field in kvm_irqfd, so I'd like
> > > to have this change (and I also think it makes the code nicer to read).
> > > So:
> > >
> > > Acked-by: Cornelia Huck <[email protected]>
> >
> > Unfortunately it looks like we are not sanitizing kvm_irqfd
> > at all so we won't be able to use the padding :(
> > We'll need a new ioctl instead.
> >
>
> How about something like this as parameter for the new ioctl?
>
> struct kvm_irqfd2 {
> __u32 fd;
> __u32 flags; /* for things like deassign */
> __u64 type; /* determines the payload */
> union {
> /* type traditional */
> struct {
> __u32 gsi;
> } trad;
> /* type s390 */
> struct {
> __u32 int_type;
> __u32 parm;
> __u64 parm64;
> } s390;
> __u8 pad[20];
> };
> }
>
> This could be combined with an arch or a per-kvm callback to keep the
> generic code clean of architecture dependencies.
>
> Cornelia

Looks a bit weird - shouldn't all this be part of gsi routing?
But no idea really, I don't see the big picture here.

2012-06-28 12:01:06

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Thu, 28 Jun 2012 12:34:43 +0300
"Michael S. Tsirkin" <[email protected]> wrote:

> On Thu, Jun 28, 2012 at 11:03:16AM +0200, Cornelia Huck wrote:

> > How about something like this as parameter for the new ioctl?
> >
> > struct kvm_irqfd2 {
> > __u32 fd;
> > __u32 flags; /* for things like deassign */
> > __u64 type; /* determines the payload */
> > union {
> > /* type traditional */
> > struct {
> > __u32 gsi;
> > } trad;
> > /* type s390 */
> > struct {
> > __u32 int_type;
> > __u32 parm;
> > __u64 parm64;
> > } s390;
> > __u8 pad[20];
> > };
> > }
> >
> > This could be combined with an arch or a per-kvm callback to keep the
> > generic code clean of architecture dependencies.
> >
> > Cornelia
>
> Looks a bit weird - shouldn't all this be part of gsi routing?
> But no idea really, I don't see the big picture here.
>

Well, on s390 we don't have anything like "gsi routing" (I'm not even
really sure what that is).

My understanding is the following:

- Basically, we want to notify the guest for a virtqueue.
- For this, we want to inject an interrupt for the associated device.
- On x86, this means raising an interrupt on an interrupt line, as
specified by some kind of number.
- On s390, we need some more information to (a) identify the device and
(b) additional information that needs to be transmitted for an
interrupt (device specific). (This is what basically goes into struct
kvm_s390_interrupt, which I reproduced in the s390 part.)

Cornelia

2012-06-28 12:09:50

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Thu, Jun 28, 2012 at 02:00:41PM +0200, Cornelia Huck wrote:
> On Thu, 28 Jun 2012 12:34:43 +0300
> "Michael S. Tsirkin" <[email protected]> wrote:
>
> > On Thu, Jun 28, 2012 at 11:03:16AM +0200, Cornelia Huck wrote:
>
> > > How about something like this as parameter for the new ioctl?
> > >
> > > struct kvm_irqfd2 {
> > > __u32 fd;
> > > __u32 flags; /* for things like deassign */
> > > __u64 type; /* determines the payload */
> > > union {
> > > /* type traditional */
> > > struct {
> > > __u32 gsi;
> > > } trad;
> > > /* type s390 */
> > > struct {
> > > __u32 int_type;
> > > __u32 parm;
> > > __u64 parm64;
> > > } s390;
> > > __u8 pad[20];
> > > };
> > > }
> > >
> > > This could be combined with an arch or a per-kvm callback to keep the
> > > generic code clean of architecture dependencies.
> > >
> > > Cornelia
> >
> > Looks a bit weird - shouldn't all this be part of gsi routing?
> > But no idea really, I don't see the big picture here.
> >
>
> Well, on s390 we don't have anything like "gsi routing" (I'm not even
> really sure what that is).

I really mean kvm_irq_routing. This has options for
irqchip, msi, I guess we can add more.


> My understanding is the following:
>
> - Basically, we want to notify the guest for a virtqueue.
> - For this, we want to inject an interrupt for the associated device.
> - On x86, this means raising an interrupt on an interrupt line, as
> specified by some kind of number.

Not just that: for MSI we pass in data encoding priority
destination etc.

> - On s390, we need some more information to (a) identify the device and
> (b) additional information that needs to be transmitted for an
> interrupt (device specific). (This is what basically goes into struct
> kvm_s390_interrupt, which I reproduced in the s390 part.)
>
> Cornelia

Is this b mostly static or does it change for each interrupt?

2012-06-28 12:36:06

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] kvm: Sanitize KVM_IRQFD flags

On 06/27/2012 12:21 PM, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:09:32PM -0600, Alex Williamson wrote:
>> We only know of one so far.
>>
>> Signed-off-by: Alex Williamson <[email protected]>
>
> Ugh. So we have a bug: we should have sanitized the fields.
> If there's buggy userspace that only set the low bit
> it will break with this change.
> Is it too late now? Do we need KVM_IRQFD2 which
> sanitized fields properly? Avi?

We try and see. Commit this, if somebody complain, revert after
apologizing profusely. If no one notices, we can claim those bits.



--
error compiling committee.c: too many arguments to function

2012-06-28 12:41:39

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On 06/28/2012 12:19 AM, Alex Williamson wrote:
>> > @@ -302,6 +385,7 @@ kvm_irqfd_deassign(struct kvm *kvm, struct kvm_irqfd *args)
>> > {
>> > struct _irqfd *irqfd, *tmp;
>> > struct eventfd_ctx *eventfd;
>> > + bool is_level = (args->flags & KVM_IRQFD_FLAG_LEVEL) != 0;
>>
>> != 0 is not needed here I think.
>
> Strictly, no it's not. I just prefer to set bools from bools and I
> think Avi has commented on using !! before.

I prefer the naked version too, but != 0 is acceptable.
--
error compiling committee.c: too many arguments to function

2012-06-28 12:59:35

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] kvm: Level IRQ de-assert for KVM_IRQFD

On 06/27/2012 08:10 AM, Alex Williamson wrote:
> This is an alternate level irqfd de-assert mode that's potentially
> useful for emulated drivers. It's included here to show how easy it
> is to implement with the new level irqfd and eoifd support. It's
> possible this mode might also prove interesting for device-assignment
> where we inject via level irqfd, receive an EOI (w/o de-assert), and
> use the level de-assert irqfd here.

This use case is racy. The guest driver will have shut down the
interrupt before EOI, but with what you describe, it will fire again
until the eoifd/deassertfd sequence completes.

An emulated device will see the guest driver shutting down the interrupt
so it's not a problem there.

--
error compiling committee.c: too many arguments to function

2012-06-28 13:11:44

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Wed, Jun 27, 2012 at 09:55:44PM -0600, Alex Williamson wrote:
> On Wed, 2012-06-27 at 17:51 +0300, Gleb Natapov wrote:
> > On Wed, Jun 27, 2012 at 08:29:04AM -0600, Alex Williamson wrote:
> > > On Wed, 2012-06-27 at 16:58 +0300, Gleb Natapov wrote:
> > > > On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > > > > This new ioctl enables an eventfd to be triggered when an EOI is
> > > > > written for a specified irqchip pin. By default this is a simple
> > > > > notification, but we can also tie the eoifd to a level irqfd, which
> > > > > enables the irqchip pin to be automatically de-asserted on EOI.
> > > > > This mode is particularly useful for device-assignment applications
> > > > > where the unmask and notify triggers a hardware unmask. The default
> > > > > mode is most applicable to simple notify with no side-effects for
> > > > > userspace usage, such as Qemu.
> > > > >
> > > > > Here we make use of the reference counting of the _irq_source
> > > > > object allowing us to share it with an irqfd and cleanup regardless
> > > > > of the release order.
> > > > >
> > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > ---
> > > > >
> > > > > Documentation/virtual/kvm/api.txt | 24 +++++
> > > > > arch/x86/kvm/x86.c | 1
> > > > > include/linux/kvm.h | 14 +++
> > > > > include/linux/kvm_host.h | 13 +++
> > > > > virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> > > > > virt/kvm/kvm_main.c | 11 ++
> > > > > 6 files changed, 250 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > > index b216709..87a2558 100644
> > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > >
> > > > > +4.77 KVM_EOIFD
> > > > > +
> > > > > +Capability: KVM_CAP_EOIFD
> > > > > +Architectures: x86
> > > > > +Type: vm ioctl
> > > > > +Parameters: struct kvm_eoifd (in)
> > > > > +Returns: 0 on success, -1 on error
> > > > > +
> > > > > +KVM_EOIFD allows userspace to receive EOI notification through an
> > > > > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > > > > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> > > > Lets make it defined. EOI notification can be used by userspace to fix
> > > > time drift due to lost interrupts. But than userspace needs to know
> > > > which vcpu did EOI.
> > >
> > > Hmm, do we need an additional flag and field in kvm_eoifd to filter by
> > > vCPU then?
> > >
> > This will be enough for a use case I am aware of. Don't know if this
> > interface is generic enough for all possible use cases.
>
> That's generally a hard prediction to make ;) We currently don't pass a
> kvm_vcpu anywhere close to the irq ack notifier. The ioapic path could
> be relatively trivial, but the pic path is a bit further disconnected.
> If we had that plumbing, a KVM_CAP plus vcpu filter flag and specifying
> the vcpu using some of the padding space seems like it's sufficient.
> I'll drop mention of level-only from the description, but the plumbing
> and vcpu filtering can be a follow-on. Thanks,
>
> Alex

If we don't implement what's needed for timedrift to be fixed,
then IMO it's better to simply require an IRQFD for EOIFD for now,
and limit this to level. Otherwise when we actually try to implement
we might find issues.

Another reason to explicitly say EOI is not supported for edge is that
EOI might not get invoked at all with PV EOI.


--
MST

2012-06-28 14:08:09

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Thu, Jun 28, 2012 at 04:11:40PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 09:55:44PM -0600, Alex Williamson wrote:
> > On Wed, 2012-06-27 at 17:51 +0300, Gleb Natapov wrote:
> > > On Wed, Jun 27, 2012 at 08:29:04AM -0600, Alex Williamson wrote:
> > > > On Wed, 2012-06-27 at 16:58 +0300, Gleb Natapov wrote:
> > > > > On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > > > > > This new ioctl enables an eventfd to be triggered when an EOI is
> > > > > > written for a specified irqchip pin. By default this is a simple
> > > > > > notification, but we can also tie the eoifd to a level irqfd, which
> > > > > > enables the irqchip pin to be automatically de-asserted on EOI.
> > > > > > This mode is particularly useful for device-assignment applications
> > > > > > where the unmask and notify triggers a hardware unmask. The default
> > > > > > mode is most applicable to simple notify with no side-effects for
> > > > > > userspace usage, such as Qemu.
> > > > > >
> > > > > > Here we make use of the reference counting of the _irq_source
> > > > > > object allowing us to share it with an irqfd and cleanup regardless
> > > > > > of the release order.
> > > > > >
> > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > ---
> > > > > >
> > > > > > Documentation/virtual/kvm/api.txt | 24 +++++
> > > > > > arch/x86/kvm/x86.c | 1
> > > > > > include/linux/kvm.h | 14 +++
> > > > > > include/linux/kvm_host.h | 13 +++
> > > > > > virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> > > > > > virt/kvm/kvm_main.c | 11 ++
> > > > > > 6 files changed, 250 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > > > index b216709..87a2558 100644
> > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > > > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > > > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > > >
> > > > > > +4.77 KVM_EOIFD
> > > > > > +
> > > > > > +Capability: KVM_CAP_EOIFD
> > > > > > +Architectures: x86
> > > > > > +Type: vm ioctl
> > > > > > +Parameters: struct kvm_eoifd (in)
> > > > > > +Returns: 0 on success, -1 on error
> > > > > > +
> > > > > > +KVM_EOIFD allows userspace to receive EOI notification through an
> > > > > > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > > > > > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> > > > > Lets make it defined. EOI notification can be used by userspace to fix
> > > > > time drift due to lost interrupts. But than userspace needs to know
> > > > > which vcpu did EOI.
> > > >
> > > > Hmm, do we need an additional flag and field in kvm_eoifd to filter by
> > > > vCPU then?
> > > >
> > > This will be enough for a use case I am aware of. Don't know if this
> > > interface is generic enough for all possible use cases.
> >
> > That's generally a hard prediction to make ;) We currently don't pass a
> > kvm_vcpu anywhere close to the irq ack notifier. The ioapic path could
> > be relatively trivial, but the pic path is a bit further disconnected.
> > If we had that plumbing, a KVM_CAP plus vcpu filter flag and specifying
> > the vcpu using some of the padding space seems like it's sufficient.
> > I'll drop mention of level-only from the description, but the plumbing
> > and vcpu filtering can be a follow-on. Thanks,
> >
> > Alex
>
> If we don't implement what's needed for timedrift to be fixed,
> then IMO it's better to simply require an IRQFD for EOIFD for now,
> and limit this to level. Otherwise when we actually try to implement
> we might find issues.
>
> Another reason to explicitly say EOI is not supported for edge is that
> EOI might not get invoked at all with PV EOI.
>
Good point, but easily addressable by disabling PV EOI for a GSI that
has EOI notifier registered.

--
Gleb.

2012-06-28 16:51:18

by Cornelia Huck

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Thu, 28 Jun 2012 15:09:49 +0300
"Michael S. Tsirkin" <[email protected]> wrote:

> On Thu, Jun 28, 2012 at 02:00:41PM +0200, Cornelia Huck wrote:
> > On Thu, 28 Jun 2012 12:34:43 +0300
> > "Michael S. Tsirkin" <[email protected]> wrote:
> >
> > > On Thu, Jun 28, 2012 at 11:03:16AM +0200, Cornelia Huck wrote:
> >
> > > > How about something like this as parameter for the new ioctl?
> > > >
> > > > struct kvm_irqfd2 {
> > > > __u32 fd;
> > > > __u32 flags; /* for things like deassign */
> > > > __u64 type; /* determines the payload */
> > > > union {
> > > > /* type traditional */
> > > > struct {
> > > > __u32 gsi;
> > > > } trad;
> > > > /* type s390 */
> > > > struct {
> > > > __u32 int_type;
> > > > __u32 parm;
> > > > __u64 parm64;
> > > > } s390;
> > > > __u8 pad[20];
> > > > };
> > > > }
> > > >
> > > > This could be combined with an arch or a per-kvm callback to keep the
> > > > generic code clean of architecture dependencies.
> > > >
> > > > Cornelia
> > >
> > > Looks a bit weird - shouldn't all this be part of gsi routing?
> > > But no idea really, I don't see the big picture here.
> > >
> >
> > Well, on s390 we don't have anything like "gsi routing" (I'm not even
> > really sure what that is).
>
> I really mean kvm_irq_routing. This has options for
> irqchip, msi, I guess we can add more.

I stared at irq_comm.c for a bit and it seems to fulfill a purpose
similar to arch/s390/kvm/interrupt.c (although it looks more static).
But I don't really see how they could fit together easily.

>
>
> > My understanding is the following:
> >
> > - Basically, we want to notify the guest for a virtqueue.
> > - For this, we want to inject an interrupt for the associated device.
> > - On x86, this means raising an interrupt on an interrupt line, as
> > specified by some kind of number.
>
> Not just that: for MSI we pass in data encoding priority
> destination etc.
>
> > - On s390, we need some more information to (a) identify the device and
> > (b) additional information that needs to be transmitted for an
> > interrupt (device specific). (This is what basically goes into struct
> > kvm_s390_interrupt, which I reproduced in the s390 part.)
> >
> > Cornelia
>
> Is this b mostly static or does it change for each interrupt?

For Linux guests it will be static, although the architecture would
allow for changing (some of) it.

Cornelia

2012-06-28 16:55:29

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Thu, Jun 28, 2012 at 05:08:04PM +0300, Gleb Natapov wrote:
> On Thu, Jun 28, 2012 at 04:11:40PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2012 at 09:55:44PM -0600, Alex Williamson wrote:
> > > On Wed, 2012-06-27 at 17:51 +0300, Gleb Natapov wrote:
> > > > On Wed, Jun 27, 2012 at 08:29:04AM -0600, Alex Williamson wrote:
> > > > > On Wed, 2012-06-27 at 16:58 +0300, Gleb Natapov wrote:
> > > > > > On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > > > > > > This new ioctl enables an eventfd to be triggered when an EOI is
> > > > > > > written for a specified irqchip pin. By default this is a simple
> > > > > > > notification, but we can also tie the eoifd to a level irqfd, which
> > > > > > > enables the irqchip pin to be automatically de-asserted on EOI.
> > > > > > > This mode is particularly useful for device-assignment applications
> > > > > > > where the unmask and notify triggers a hardware unmask. The default
> > > > > > > mode is most applicable to simple notify with no side-effects for
> > > > > > > userspace usage, such as Qemu.
> > > > > > >
> > > > > > > Here we make use of the reference counting of the _irq_source
> > > > > > > object allowing us to share it with an irqfd and cleanup regardless
> > > > > > > of the release order.
> > > > > > >
> > > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > > ---
> > > > > > >
> > > > > > > Documentation/virtual/kvm/api.txt | 24 +++++
> > > > > > > arch/x86/kvm/x86.c | 1
> > > > > > > include/linux/kvm.h | 14 +++
> > > > > > > include/linux/kvm_host.h | 13 +++
> > > > > > > virt/kvm/eventfd.c | 189 +++++++++++++++++++++++++++++++++++++
> > > > > > > virt/kvm/kvm_main.c | 11 ++
> > > > > > > 6 files changed, 250 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > > > > index b216709..87a2558 100644
> > > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > > > > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > > > > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > > > >
> > > > > > > +4.77 KVM_EOIFD
> > > > > > > +
> > > > > > > +Capability: KVM_CAP_EOIFD
> > > > > > > +Architectures: x86
> > > > > > > +Type: vm ioctl
> > > > > > > +Parameters: struct kvm_eoifd (in)
> > > > > > > +Returns: 0 on success, -1 on error
> > > > > > > +
> > > > > > > +KVM_EOIFD allows userspace to receive EOI notification through an
> > > > > > > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > > > > > > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> > > > > > Lets make it defined. EOI notification can be used by userspace to fix
> > > > > > time drift due to lost interrupts. But than userspace needs to know
> > > > > > which vcpu did EOI.
> > > > >
> > > > > Hmm, do we need an additional flag and field in kvm_eoifd to filter by
> > > > > vCPU then?
> > > > >
> > > > This will be enough for a use case I am aware of. Don't know if this
> > > > interface is generic enough for all possible use cases.
> > >
> > > That's generally a hard prediction to make ;) We currently don't pass a
> > > kvm_vcpu anywhere close to the irq ack notifier. The ioapic path could
> > > be relatively trivial, but the pic path is a bit further disconnected.
> > > If we had that plumbing, a KVM_CAP plus vcpu filter flag and specifying
> > > the vcpu using some of the padding space seems like it's sufficient.
> > > I'll drop mention of level-only from the description, but the plumbing
> > > and vcpu filtering can be a follow-on. Thanks,
> > >
> > > Alex
> >
> > If we don't implement what's needed for timedrift to be fixed,
> > then IMO it's better to simply require an IRQFD for EOIFD for now,
> > and limit this to level. Otherwise when we actually try to implement
> > we might find issues.
> >
> > Another reason to explicitly say EOI is not supported for edge is that
> > EOI might not get invoked at all with PV EOI.
> >
> Good point, but easily addressable by disabling PV EOI for a GSI that
> has EOI notifier registered.

This patch doesn't do this though, so it will need a separate
patch and a separate capability.

> --
> Gleb.

2012-06-28 16:56:55

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Thu, Jun 28, 2012 at 06:51:09PM +0200, Cornelia Huck wrote:
> On Thu, 28 Jun 2012 15:09:49 +0300
> "Michael S. Tsirkin" <[email protected]> wrote:
>
> > On Thu, Jun 28, 2012 at 02:00:41PM +0200, Cornelia Huck wrote:
> > > On Thu, 28 Jun 2012 12:34:43 +0300
> > > "Michael S. Tsirkin" <[email protected]> wrote:
> > >
> > > > On Thu, Jun 28, 2012 at 11:03:16AM +0200, Cornelia Huck wrote:
> > >
> > > > > How about something like this as parameter for the new ioctl?
> > > > >
> > > > > struct kvm_irqfd2 {
> > > > > __u32 fd;
> > > > > __u32 flags; /* for things like deassign */
> > > > > __u64 type; /* determines the payload */
> > > > > union {
> > > > > /* type traditional */
> > > > > struct {
> > > > > __u32 gsi;
> > > > > } trad;
> > > > > /* type s390 */
> > > > > struct {
> > > > > __u32 int_type;
> > > > > __u32 parm;
> > > > > __u64 parm64;
> > > > > } s390;
> > > > > __u8 pad[20];
> > > > > };
> > > > > }
> > > > >
> > > > > This could be combined with an arch or a per-kvm callback to keep the
> > > > > generic code clean of architecture dependencies.
> > > > >
> > > > > Cornelia
> > > >
> > > > Looks a bit weird - shouldn't all this be part of gsi routing?
> > > > But no idea really, I don't see the big picture here.
> > > >
> > >
> > > Well, on s390 we don't have anything like "gsi routing" (I'm not even
> > > really sure what that is).
> >
> > I really mean kvm_irq_routing. This has options for
> > irqchip, msi, I guess we can add more.
>
> I stared at irq_comm.c for a bit and it seems to fulfill a purpose
> similar to arch/s390/kvm/interrupt.c (although it looks more static).
> But I don't really see how they could fit together easily.
>
> >
> >
> > > My understanding is the following:
> > >
> > > - Basically, we want to notify the guest for a virtqueue.
> > > - For this, we want to inject an interrupt for the associated device.
> > > - On x86, this means raising an interrupt on an interrupt line, as
> > > specified by some kind of number.
> >
> > Not just that: for MSI we pass in data encoding priority
> > destination etc.
> >
> > > - On s390, we need some more information to (a) identify the device and
> > > (b) additional information that needs to be transmitted for an
> > > interrupt (device specific). (This is what basically goes into struct
> > > kvm_s390_interrupt, which I reproduced in the s390 part.)
> > >
> > > Cornelia
> >
> > Is this b mostly static or does it change for each interrupt?
>
> For Linux guests it will be static, although the architecture would
> allow for changing (some of) it.
>
> Cornelia

So storing this info in routing might be a good fit.

--
MST

2012-06-28 19:29:33

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index b216709..87a2558 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
>
> +4.77 KVM_EOIFD
> +
> +Capability: KVM_CAP_EOIFD
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_eoifd (in)
> +Returns: 0 on success, -1 on error
> +
> +KVM_EOIFD allows userspace to receive EOI notification through an
> +eventfd for level triggered irqchip interrupts. Behavior for edge
> +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> +used for notification and kvm_eoifd.gsi specifies the irchip pin,
> +similar to KVM_IRQFD. KVM_EOIFD_FLAG_DEASSIGN is used to deassign
> +a previously enabled eoifd and should also set fd and gsi to match.
> +
> +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
> +a level triggered EOI and the kvm_eoifd structure includes
> +kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
> +with the KVM_IRQFD_FLAG_LEVEL flag. This allows both EOI notification
> +through kvm_eoifd.fd as well as automatically de-asserting level
> +irqfds on EOI. Both KVM_EOIFD_FLAG_DEASSIGN and
> +KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
> +initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
> +

OK so thinking about this some more, does the below makes sense:
it is enough to have LEVEL either in IRQFD or in EOIFD but not both.
I weakly prefer it in EOIFD: when you bind it to irqfd you
also assign a source id to that irqfd.

We also can make is a bit clearer: rename KVM_EOIFD_FLAG_LEVEL_IRQFD
to KVM_EOIFD_FLAG_LEVEL_CLEAR which makes it explicit that we clear
on EOI and that it is for a level interrupt. The fact that it needs
an irqfd is less important IMHO.
Make this flag mandatory for now, we'll see later how to handle
vcpu filtering and edge.

How does it sound?

--
MST

2012-06-29 15:09:44

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Thu, 2012-06-28 at 22:29 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index b216709..87a2558 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> >
> > +4.77 KVM_EOIFD
> > +
> > +Capability: KVM_CAP_EOIFD
> > +Architectures: x86
> > +Type: vm ioctl
> > +Parameters: struct kvm_eoifd (in)
> > +Returns: 0 on success, -1 on error
> > +
> > +KVM_EOIFD allows userspace to receive EOI notification through an
> > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> > +used for notification and kvm_eoifd.gsi specifies the irchip pin,
> > +similar to KVM_IRQFD. KVM_EOIFD_FLAG_DEASSIGN is used to deassign
> > +a previously enabled eoifd and should also set fd and gsi to match.
> > +
> > +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
> > +a level triggered EOI and the kvm_eoifd structure includes
> > +kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
> > +with the KVM_IRQFD_FLAG_LEVEL flag. This allows both EOI notification
> > +through kvm_eoifd.fd as well as automatically de-asserting level
> > +irqfds on EOI. Both KVM_EOIFD_FLAG_DEASSIGN and
> > +KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
> > +initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
> > +
>
> OK so thinking about this some more, does the below makes sense:
> it is enough to have LEVEL either in IRQFD or in EOIFD but not both.
> I weakly prefer it in EOIFD: when you bind it to irqfd you
> also assign a source id to that irqfd.
>
> We also can make is a bit clearer: rename KVM_EOIFD_FLAG_LEVEL_IRQFD
> to KVM_EOIFD_FLAG_LEVEL_CLEAR which makes it explicit that we clear
> on EOI and that it is for a level interrupt. The fact that it needs
> an irqfd is less important IMHO.
> Make this flag mandatory for now, we'll see later how to handle
> vcpu filtering and edge.
>
> How does it sound?

Honestly, I don't like it at all. We're designing the interface
assuming that we can't modify KVM_IRQFD flags. Let's do as Avi suggests
and test whether KVM_IRQFD is a beyond broken first. Given that we do
make use of 1 bit in the flags for de-assert, I'm optimistic that nobody
is leaving random garbage in the rest of it.

Your idea may work, but I really don't like that KVM_EOIFD can reach
across and change the behavior of KVM_IRQFD. That's not an intuitive
interface. The LEVEL_IRQFD flag is specifying that the struct kvm_eoifd
includes an irqfd for a level triggered interrupt. AFAICT, there's no
reason to specify that except to clear it since the EOI notification is
not per source ID. However, it's still valid to ask for EOI
notification without binding it to a level irqfd, in which case it
really is just EOI notification. Thanks,

Alex

2012-06-29 15:12:15

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] kvm: KVM_EOIFD, an eventfd for EOIs

On Fri, 2012-06-29 at 09:09 -0600, Alex Williamson wrote:
> On Thu, 2012-06-28 at 22:29 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2012 at 11:10:08PM -0600, Alex Williamson wrote:
> > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > index b216709..87a2558 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1987,6 +1987,30 @@ interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > >
> > > +4.77 KVM_EOIFD
> > > +
> > > +Capability: KVM_CAP_EOIFD
> > > +Architectures: x86
> > > +Type: vm ioctl
> > > +Parameters: struct kvm_eoifd (in)
> > > +Returns: 0 on success, -1 on error
> > > +
> > > +KVM_EOIFD allows userspace to receive EOI notification through an
> > > +eventfd for level triggered irqchip interrupts. Behavior for edge
> > > +triggered interrupts is undefined. kvm_eoifd.fd specifies the eventfd
> > > +used for notification and kvm_eoifd.gsi specifies the irchip pin,
> > > +similar to KVM_IRQFD. KVM_EOIFD_FLAG_DEASSIGN is used to deassign
> > > +a previously enabled eoifd and should also set fd and gsi to match.
> > > +
> > > +The KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that the EOI is for
> > > +a level triggered EOI and the kvm_eoifd structure includes
> > > +kvm_eoifd.irqfd, which must be previously configured using KVM_IRQFD
> > > +with the KVM_IRQFD_FLAG_LEVEL flag. This allows both EOI notification
> > > +through kvm_eoifd.fd as well as automatically de-asserting level
> > > +irqfds on EOI. Both KVM_EOIFD_FLAG_DEASSIGN and
> > > +KVM_EOIFD_FLAG_LEVEL_IRQFD should be used to de-assign an eoifd
> > > +initially setup with KVM_EOIFD_FLAG_LEVEL_IRQFD.
> > > +
> >
> > OK so thinking about this some more, does the below makes sense:
> > it is enough to have LEVEL either in IRQFD or in EOIFD but not both.
> > I weakly prefer it in EOIFD: when you bind it to irqfd you
> > also assign a source id to that irqfd.
> >
> > We also can make is a bit clearer: rename KVM_EOIFD_FLAG_LEVEL_IRQFD
> > to KVM_EOIFD_FLAG_LEVEL_CLEAR which makes it explicit that we clear
> > on EOI and that it is for a level interrupt. The fact that it needs
> > an irqfd is less important IMHO.
> > Make this flag mandatory for now, we'll see later how to handle
> > vcpu filtering and edge.
> >
> > How does it sound?
>
> Honestly, I don't like it at all. We're designing the interface
> assuming that we can't modify KVM_IRQFD flags. Let's do as Avi suggests
> and test whether KVM_IRQFD is a beyond broken first. Given that we do
> make use of 1 bit in the flags for de-assert, I'm optimistic that nobody

s/de-assert/de-assign/

> is leaving random garbage in the rest of it.
>
> Your idea may work, but I really don't like that KVM_EOIFD can reach
> across and change the behavior of KVM_IRQFD. That's not an intuitive
> interface. The LEVEL_IRQFD flag is specifying that the struct kvm_eoifd
> includes an irqfd for a level triggered interrupt. AFAICT, there's no
> reason to specify that except to clear it since the EOI notification is
> not per source ID. However, it's still valid to ask for EOI
> notification without binding it to a level irqfd, in which case it
> really is just EOI notification. Thanks,
>
> Alex
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2012-06-29 15:13:36

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, 2012-06-28 at 11:29 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 09:52:52PM -0600, Alex Williamson wrote:
> > On Thu, 2012-06-28 at 01:28 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 27, 2012 at 03:28:19PM -0600, Alex Williamson wrote:
> > > > On Thu, 2012-06-28 at 00:14 +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 27, 2012 at 02:59:09PM -0600, Alex Williamson wrote:
> > > > > > On Wed, 2012-06-27 at 12:51 +0300, Michael S. Tsirkin wrote:
> > > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > > In order to inject an interrupt from an external source using an
> > > > > > > > irqfd, we need to allocate a new irq_source_id. This allows us to
> > > > > > > > assert and (later) de-assert an interrupt line independently from
> > > > > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > > > > >
> > > > > > > > We also add what may appear like a bit of excessive infrastructure
> > > > > > > > around an object for storing this irq_source_id. However, notice
> > > > > > > > that we only provide a way to assert the interrupt here. A follow-on
> > > > > > > > interface will make use of the same irq_source_id to allow de-assert.
> > > > > > > >
> > > > > > > > Signed-off-by: Alex Williamson <[email protected]>
> > > > > > > > ---
> > > > > > > >
> > > > > > > > Documentation/virtual/kvm/api.txt | 5 ++
> > > > > > > > arch/x86/kvm/x86.c | 1
> > > > > > > > include/linux/kvm.h | 3 +
> > > > > > > > virt/kvm/eventfd.c | 95 +++++++++++++++++++++++++++++++++++--
> > > > > > > > 4 files changed, 99 insertions(+), 5 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > > > > > > index ea9edce..b216709 100644
> > > > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > > > @@ -1981,6 +1981,11 @@ the guest using the specified gsi pin. The irqfd is removed using
> > > > > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > > > > > and kvm_irqfd.gsi.
> > > > > > > >
> > > > > > > > +With KVM_IRQFD_FLAG_LEVEL KVM_IRQFD allocates a new IRQ source ID for
> > > > > > > > +the requested irqfd. This is necessary to share level triggered
> > > > > > > > +interrupts with those injected through KVM_IRQ_LINE. IRQFDs created
> > > > > > > > +with KVM_IRQFD_FLAG_LEVEL must also set this flag when de-assiging.
> > > > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > > > >
> > > > > > > Note that if my patch removing auto-deassert gets accepted,
> > > > > > > this is not needed at all: we can just look at the GSI
> > > > > > > to see if it's level or edge.
> > > > > >
> > > > > > I'm not sure this is a good idea. I know from vfio that I'm injecting a
> > > > > > level interrupt regardless of how the guest has the pic/ioapic
> > > > > > programmed at the time I'm calling this ioctl. Peeking across address
> > > > > > spaces to get to the right pin on the right pic/ioapic and see how it's
> > > > > > currently programmed seems fragile. Thanks,
> > > > > >
> > > > > > Alex
> > > > >
> > > > > Fragile? If you set eventfd as LEVEL but GSI is really edge then
> > > > > it all explodes, right? So why give users the option to shoot
> > > > > themselves in the foot?
> > > >
> > > > If the guest has the ioapic rte set to edge at the time I call KVM_IRQFD
> > > > to register my level interrupt then it all explodes, right? I'd rather
> > > > let the user shoot themselves than play Russian roulette with the guest.
> > > > Am I misunderstanding what you mean by looking that the GSI to see if
> > > > it's level or edge?
> > >
> > > Not sure.
> > > I simply mean this: if eventfd is bound to irqfd, set level from irqfd
> > > and clear from eventfd ack notifier.
> >
> > Are you simply saying assert (kvm_set_irq(,,,1)) from irqfd trigger and
> > de-assert (kvm_set_irq(,,,0)) from eventfd ack notifier (aka KVM_EOIFD)?
>
> Yes.
>
> > > There's no need for a special IRQ_LEVEL for this.
> >
> > That ignores the whole problem of when do we need to allocate a new
> > irq_source_id and when do we inject using KVM_USERSPACE_IRQ_SOURCE_ID.
> > We've already discussed that a level triggered, externally fired
> > interrupt must use a separate source ID from Qemu userspace. Therefore
> > when you say "look at the GSI to see if it's level or edge", I assume
> > you mean trace the gsi back to the pic/ioapic pin and look at the
> > trigger mode. That trigger mode is configured by the guest, so that
> > means that at the point in time when we call KVM_IRQFD we make a
> > determination based on how the _guest_ has programmed the ioapic. That
> > may not match the interrupt we expect to inject. On the other hand, the
> > user calling KVM_IRQFD absolutely knows the type of interrupt provided
> > by their device. I think we need a flag regardless of whether your
> > patch is accepted. We may be able to share the inject handler if it is
> > accepted, but it doesn't change the user API. Thanks,
> >
> > Alex
>
> This has merit, I am just looking for a way out
> without adding LEVEL flag which seems to duplicate
> what guest does, especially now it turns out
> we can't add new flags to IRQFD.

This hasn't been decided yet AFAICT.

2012-06-29 15:14:40

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] kvm: Pass kvm_irqfd to functions

On Thu, 2012-06-28 at 11:38 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2012 at 04:24:30PM +0200, Cornelia Huck wrote:
> > On Tue, 26 Jun 2012 23:09:04 -0600
> > Alex Williamson <[email protected]> wrote:
> >
> > > Prune this down to just the struct kvm_irqfd so we can avoid
> > > changing function definition for every flag or field we use.
> > >
> > > Signed-off-by: Alex Williamson <[email protected]>
> >
> > I'm currently trying to find a way to make irqfd workable for s390
> > which will likely include using a new field in kvm_irqfd, so I'd like
> > to have this change (and I also think it makes the code nicer to read).
> > So:
> >
> > Acked-by: Cornelia Huck <[email protected]>
>
> Unfortunately it looks like we are not sanitizing kvm_irqfd
> at all so we won't be able to use the padding :(
> We'll need a new ioctl instead.

I think you're jumping the gun on this decision.


2012-06-29 15:39:56

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 6/6] kvm: Level IRQ de-assert for KVM_IRQFD

On Thu, 2012-06-28 at 15:59 +0300, Avi Kivity wrote:
> On 06/27/2012 08:10 AM, Alex Williamson wrote:
> > This is an alternate level irqfd de-assert mode that's potentially
> > useful for emulated drivers. It's included here to show how easy it
> > is to implement with the new level irqfd and eoifd support. It's
> > possible this mode might also prove interesting for device-assignment
> > where we inject via level irqfd, receive an EOI (w/o de-assert), and
> > use the level de-assert irqfd here.
>
> This use case is racy. The guest driver will have shut down the
> interrupt before EOI, but with what you describe, it will fire again
> until the eoifd/deassertfd sequence completes.

Hmm, that's a good point. We'll continue asserting the interrupt in the
indeterminate gap between eoifd and de-assert-irqfd which could fire
enough times for the guest to disable it. Oh well. Thanks,

Alex

2012-06-29 22:28:31

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/6] kvm: Extend irqfd to support level interrupts

On Thu, 2012-06-28 at 11:46 +0300, Gleb Natapov wrote:
> On Thu, Jun 28, 2012 at 11:41:05AM +0300, Michael S. Tsirkin wrote:
> > On Thu, Jun 28, 2012 at 11:35:41AM +0300, Gleb Natapov wrote:
> > > On Thu, Jun 28, 2012 at 11:34:35AM +0300, Michael S. Tsirkin wrote:
> > > > On Thu, Jun 28, 2012 at 09:34:31AM +0300, Gleb Natapov wrote:
> > > > > On Thu, Jun 28, 2012 at 01:31:29AM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 27, 2012 at 04:04:18PM -0600, Alex Williamson wrote:
> > > > > > > On Wed, 2012-06-27 at 18:26 +0300, Michael S. Tsirkin wrote:
> > > > > > > > On Tue, Jun 26, 2012 at 11:09:46PM -0600, Alex Williamson wrote:
> > > > > > > > > @@ -71,6 +130,14 @@ irqfd_inject(struct work_struct *work)
> > > > > > > > > kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > +static void
> > > > > > > > > +irqfd_inject_level(struct work_struct *work)
> > > > > > > > > +{
> > > > > > > > > + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
> > > > > > > > > +
> > > > > > > > > + kvm_set_irq(irqfd->kvm, irqfd->source->id, irqfd->gsi, 1);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > /*
> > > > > > > > > * Race-free decouple logic (ordering is critical)
> > > > > > > > > */
> > > > > > > >
> > > > > > > >
> > > > > > > > Why is it safe to ignore return value here?
> > > > > > > > needs a comment.
> > > > > > >
> > > > > > > Well, it seems like you and Gleb came to the conclusion that it's safe,
> > > > > > > but I can really follow from the list thread. Can you explain and I'll
> > > > > > > add a comment? Thanks,
> > > > > > >
> > > > > > > Alex
> > > > > >
> > > > > > We merely talked about edge interrupts.
> > > > > >
> > > > > In fact it would have been nice to return -EBUSY when write() to level
> > > > > irqfd is coalesced.
> > > >
> > > > Possibly nice but not really practical.
> > > >
> > > What do you mean by that? Impossible to implement or not useful?
> >
> > Impossible to implement and also does not match normal eventfd
> > semantics.
> >
> Hmm, I remember we discussed using irqfd for level triggered interrupt ~2
> years ago and came to a conclusion that eventfd is a bad fit for it,
> was true than is true now. Not be able to detect coalescing will make
> irqfd level interrupts inferior to IRQ_LINE ioctl.

Why do we care about coalescing? I've been worried we need to re-inject
based on the return value of kvm_set_irq(), but re-reading specs and
code, we always post the interrupt to the irr. For device assignment we
don't really care if kvm_set_irq() managed to actually inject the
interrupt, we're happy as long as it eventually hits the vcpu. Current
device assignment uses kvm_set_irq() without looking for coalescing.
KVM_LINE_STATUS is the only caller that does something with the return
value and neither apic nor ioapic code in qemu do anything with the
value other than update accounting stats. What am I missing that makes
the return value worth knowing? Thanks,

Alex