2023-08-02 05:48:49

by Like Xu

[permalink] [raw]
Subject: [PATCH v2 0/2] KVM: irqbypass: XArray conversion and a deref fix

Hi,

When VMM(s) simultaneously create a large number of irqfds and register
their irqfds in the global consumers list, the global mutex contention
exponentially increases the average wait latency, which is no longer
tolerable on modern systems with a large number of CPU cores.

The patch set is intended to reduce this source of latency by
converting producers/consumers single linked list to XArray.

Please feel free to run more tests and share comments to move forward.

V1 -> V2 Changelog:
- Send the prerequisite fix as a series; (Alex W.)
- Keep producer and consumer connected and tracked; (Alex W.)
V1:
https://lore.kernel.org/kvm/[email protected]/
https://lore.kernel.org/kvm/[email protected]/

Like Xu (2):
KVM: eventfd: Fix NULL deref irqbypass producer
KVM: irqbypass: Convert producers/consumers linked list to
XArray

include/linux/irqbypass.h | 8 +--
virt/lib/irqbypass.c | 127 +++++++++++++++++++-------------------
2 files changed, 63 insertions(+), 72 deletions(-)


base-commit: 5a7591176c47cce363c1eed704241e5d1c42c5a6
--
2.41.0



2023-08-02 06:09:44

by Like Xu

[permalink] [raw]
Subject: [PATCH v2 1/2] KVM: eventfd: Fix NULL deref irqbypass producer

From: Like Xu <[email protected]>

Adding guard logic to make irq_bypass_register/unregister_producer()
looks for the producer entry based on producer pointer itself instead
of pure token matching.

As was attempted commit 4f3dbdf47e15 ("KVM: eventfd: fix NULL deref
irqbypass consumer"), two different producers may occasionally have two
identical eventfd's. In this case, the later producer may unregister
the previous one after the registration fails (since they share the same
token), then NULL deref incurres in the path of deleting producer from
the producers list.

Registration should also fail if a registered producer changes its
token and registers again via the same producer pointer.

Cc: Alex Williamson <[email protected]>
Signed-off-by: Like Xu <[email protected]>
Reviewed-by: Alex Williamson <[email protected]>
---
virt/lib/irqbypass.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index 28fda42e471b..e0aabbbf27ec 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -98,7 +98,7 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
mutex_lock(&lock);

list_for_each_entry(tmp, &producers, node) {
- if (tmp->token == producer->token) {
+ if (tmp->token == producer->token || tmp == producer) {
ret = -EBUSY;
goto out_err;
}
@@ -148,7 +148,7 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
mutex_lock(&lock);

list_for_each_entry(tmp, &producers, node) {
- if (tmp->token != producer->token)
+ if (tmp != producer)
continue;

list_for_each_entry(consumer, &consumers, node) {
--
2.41.0


2023-08-02 07:18:47

by Like Xu

[permalink] [raw]
Subject: [PATCH v2 2/2] KVM: irqbypass: Convert producers/consumers single linked list to XArray

From: Like Xu <[email protected]>

Replace producers/consumers linked list with XArray. There are no changes
in functionality, but lookup performance has been improved.

The producers and consumers in current IRQ bypass manager are stored in
simple linked lists, and a single mutex is held while traversing the lists
and connecting a consumer to a producer (and vice versa). With this design
and implementation, if there are a large number of KVM agents concurrently
creating irqfds and all requesting to register their irqfds in the global
consumers list, the global mutex contention will exponentially increase
the avg wait latency, which is no longer tolerable in modern systems with
a large number of CPU cores. For example:

the wait time latency to acquire the mutex in a stress test where 174000
irqfds were created concurrently on an 2.70GHz ICX w/ 144 cores:

- avg = 117.855314 ms
- min = 20 ns
- max = 11428.340858 ms

To reduce latency introduced by the irq_bypass_register_consumer() in
the above usage scenario, the data structure XArray and its normal API
is applied to track the producers and consumers so that lookups don't
require a linear walk since the "tokens" used to match producers and
consumers are just kernel pointers.

Thanks to the nature of XArray (more memory-efficient, parallelisable
and cache friendly), the latecny is significantly reduced (compared to
list and hlist proposal) under the same environment and testing:

- avg = 314 ns
- min = 124 ns
- max = 47637 ns

In this conversion, the non-NULL opaque token to match between producer
and consumer () is used as the XArray index. The list_for_each_entry() is
replaced by xa_load(), and list_add/del() is replaced by xa_store/erase().
The list_head member for linked list is removed, along with comments.

Cc: Alex Williamson <[email protected]>
Reported-by: Yong He <[email protected]>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379
Suggested-by: Sean Christopherson <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Like Xu <[email protected]>
---
include/linux/irqbypass.h | 8 +--
virt/lib/irqbypass.c | 127 +++++++++++++++++++-------------------
2 files changed, 63 insertions(+), 72 deletions(-)

diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
index 9bdb2a781841..dbcc1b4d0ccf 100644
--- a/include/linux/irqbypass.h
+++ b/include/linux/irqbypass.h
@@ -8,14 +8,12 @@
#ifndef IRQBYPASS_H
#define IRQBYPASS_H

-#include <linux/list.h>
-
struct irq_bypass_consumer;

/*
* Theory of operation
*
- * The IRQ bypass manager is a simple set of lists and callbacks that allows
+ * The IRQ bypass manager is a simple set of xarrays and callbacks that allows
* IRQ producers (ex. physical interrupt sources) to be matched to IRQ
* consumers (ex. virtualization hardware that allows IRQ bypass or offload)
* via a shared token (ex. eventfd_ctx). Producers and consumers register
@@ -30,7 +28,6 @@ struct irq_bypass_consumer;

/**
* struct irq_bypass_producer - IRQ bypass producer definition
- * @node: IRQ bypass manager private list management
* @token: opaque token to match between producer and consumer (non-NULL)
* @irq: Linux IRQ number for the producer device
* @add_consumer: Connect the IRQ producer to an IRQ consumer (optional)
@@ -43,7 +40,6 @@ struct irq_bypass_consumer;
* for a physical device assigned to a VM.
*/
struct irq_bypass_producer {
- struct list_head node;
void *token;
int irq;
int (*add_consumer)(struct irq_bypass_producer *,
@@ -56,7 +52,6 @@ struct irq_bypass_producer {

/**
* struct irq_bypass_consumer - IRQ bypass consumer definition
- * @node: IRQ bypass manager private list management
* @token: opaque token to match between producer and consumer (non-NULL)
* @add_producer: Connect the IRQ consumer to an IRQ producer
* @del_producer: Disconnect the IRQ consumer from an IRQ producer
@@ -69,7 +64,6 @@ struct irq_bypass_producer {
* portions of the interrupt handling to the VM.
*/
struct irq_bypass_consumer {
- struct list_head node;
void *token;
int (*add_producer)(struct irq_bypass_consumer *,
struct irq_bypass_producer *);
diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index e0aabbbf27ec..3f8736951e92 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -15,15 +15,15 @@
*/

#include <linux/irqbypass.h>
-#include <linux/list.h>
#include <linux/module.h>
#include <linux/mutex.h>
+#include <linux/xarray.h>

MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("IRQ bypass manager utility module");

-static LIST_HEAD(producers);
-static LIST_HEAD(consumers);
+static DEFINE_XARRAY(producers);
+static DEFINE_XARRAY(consumers);
static DEFINE_MUTEX(lock);

/* @lock must be held when calling connect */
@@ -78,11 +78,12 @@ static void __disconnect(struct irq_bypass_producer *prod,
* irq_bypass_register_producer - register IRQ bypass producer
* @producer: pointer to producer structure
*
- * Add the provided IRQ producer to the list of producers and connect
- * with any matching token found on the IRQ consumers list.
+ * Add the provided IRQ producer to the xarray of producers and connect
+ * with any matching token found on the IRQ consumers xarray.
*/
int irq_bypass_register_producer(struct irq_bypass_producer *producer)
{
+ unsigned long token = (unsigned long)producer->token;
struct irq_bypass_producer *tmp;
struct irq_bypass_consumer *consumer;
int ret;
@@ -97,24 +98,23 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &producers, node) {
- if (tmp->token == producer->token || tmp == producer) {
- ret = -EBUSY;
+ tmp = xa_load(&producers, token);
+ if (tmp || tmp == producer) {
+ ret = -EBUSY;
+ goto out_err;
+ }
+
+ ret = xa_err(xa_store(&producers, token, producer, GFP_KERNEL));
+ if (ret)
+ goto out_err;
+
+ consumer = xa_load(&consumers, token);
+ if (consumer) {
+ ret = __connect(producer, consumer);
+ if (ret)
goto out_err;
- }
}

- list_for_each_entry(consumer, &consumers, node) {
- if (consumer->token == producer->token) {
- ret = __connect(producer, consumer);
- if (ret)
- goto out_err;
- break;
- }
- }
-
- list_add(&producer->node, &producers);
-
mutex_unlock(&lock);

return 0;
@@ -129,11 +129,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
* irq_bypass_unregister_producer - unregister IRQ bypass producer
* @producer: pointer to producer structure
*
- * Remove a previously registered IRQ producer from the list of producers
+ * Remove a previously registered IRQ producer from the xarray of producers
* and disconnect it from any connected IRQ consumer.
*/
void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
{
+ unsigned long token = (unsigned long)producer->token;
struct irq_bypass_producer *tmp;
struct irq_bypass_consumer *consumer;

@@ -143,24 +144,18 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
might_sleep();

if (!try_module_get(THIS_MODULE))
- return; /* nothing in the list anyway */
+ return; /* nothing in the xarray anyway */

mutex_lock(&lock);

- list_for_each_entry(tmp, &producers, node) {
- if (tmp != producer)
- continue;
+ tmp = xa_load(&producers, token);
+ if (tmp == producer) {
+ consumer = xa_load(&consumers, token);
+ if (consumer)
+ __disconnect(producer, consumer);

- list_for_each_entry(consumer, &consumers, node) {
- if (consumer->token == producer->token) {
- __disconnect(producer, consumer);
- break;
- }
- }
-
- list_del(&producer->node);
+ xa_erase(&producers, token);
module_put(THIS_MODULE);
- break;
}

mutex_unlock(&lock);
@@ -173,11 +168,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
* irq_bypass_register_consumer - register IRQ bypass consumer
* @consumer: pointer to consumer structure
*
- * Add the provided IRQ consumer to the list of consumers and connect
- * with any matching token found on the IRQ producer list.
+ * Add the provided IRQ consumer to the xarray of consumers and connect
+ * with any matching token found on the IRQ producer xarray.
*/
int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
{
+ unsigned long token = (unsigned long)consumer->token;
struct irq_bypass_consumer *tmp;
struct irq_bypass_producer *producer;
int ret;
@@ -193,24 +189,23 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &consumers, node) {
- if (tmp->token == consumer->token || tmp == consumer) {
- ret = -EBUSY;
+ tmp = xa_load(&consumers, token);
+ if (tmp || tmp == consumer) {
+ ret = -EBUSY;
+ goto out_err;
+ }
+
+ ret = xa_err(xa_store(&consumers, token, consumer, GFP_KERNEL));
+ if (ret)
+ goto out_err;
+
+ producer = xa_load(&producers, token);
+ if (producer) {
+ ret = __connect(producer, consumer);
+ if (ret)
goto out_err;
- }
}

- list_for_each_entry(producer, &producers, node) {
- if (producer->token == consumer->token) {
- ret = __connect(producer, consumer);
- if (ret)
- goto out_err;
- break;
- }
- }
-
- list_add(&consumer->node, &consumers);
-
mutex_unlock(&lock);

return 0;
@@ -225,11 +220,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
* irq_bypass_unregister_consumer - unregister IRQ bypass consumer
* @consumer: pointer to consumer structure
*
- * Remove a previously registered IRQ consumer from the list of consumers
+ * Remove a previously registered IRQ consumer from the xarray of consumers
* and disconnect it from any connected IRQ producer.
*/
void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
{
+ unsigned long token = (unsigned long)consumer->token;
struct irq_bypass_consumer *tmp;
struct irq_bypass_producer *producer;

@@ -239,24 +235,18 @@ void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
might_sleep();

if (!try_module_get(THIS_MODULE))
- return; /* nothing in the list anyway */
+ return; /* nothing in the xarray anyway */

mutex_lock(&lock);

- list_for_each_entry(tmp, &consumers, node) {
- if (tmp != consumer)
- continue;
+ tmp = xa_load(&consumers, token);
+ if (tmp == consumer) {
+ producer = xa_load(&producers, token);
+ if (producer)
+ __disconnect(producer, consumer);

- list_for_each_entry(producer, &producers, node) {
- if (producer->token == consumer->token) {
- __disconnect(producer, consumer);
- break;
- }
- }
-
- list_del(&consumer->node);
+ xa_erase(&consumers, token);
module_put(THIS_MODULE);
- break;
}

mutex_unlock(&lock);
@@ -264,3 +254,10 @@ void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
module_put(THIS_MODULE);
}
EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
+
+static void __exit irqbypass_exit(void)
+{
+ xa_destroy(&producers);
+ xa_destroy(&consumers);
+}
+module_exit(irqbypass_exit);
--
2.41.0


2023-08-02 18:54:09

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] KVM: irqbypass: Convert producers/consumers single linked list to XArray

On Wed, 2 Aug 2023 13:17:00 +0800
Like Xu <[email protected]> wrote:

> From: Like Xu <[email protected]>
>
> Replace producers/consumers linked list with XArray. There are no changes
> in functionality, but lookup performance has been improved.
>
> The producers and consumers in current IRQ bypass manager are stored in
> simple linked lists, and a single mutex is held while traversing the lists
> and connecting a consumer to a producer (and vice versa). With this design
> and implementation, if there are a large number of KVM agents concurrently
> creating irqfds and all requesting to register their irqfds in the global
> consumers list, the global mutex contention will exponentially increase
> the avg wait latency, which is no longer tolerable in modern systems with
> a large number of CPU cores. For example:
>
> the wait time latency to acquire the mutex in a stress test where 174000
> irqfds were created concurrently on an 2.70GHz ICX w/ 144 cores:
>
> - avg = 117.855314 ms
> - min = 20 ns
> - max = 11428.340858 ms
>
> To reduce latency introduced by the irq_bypass_register_consumer() in
> the above usage scenario, the data structure XArray and its normal API
> is applied to track the producers and consumers so that lookups don't
> require a linear walk since the "tokens" used to match producers and
> consumers are just kernel pointers.
>
> Thanks to the nature of XArray (more memory-efficient, parallelisable
> and cache friendly), the latecny is significantly reduced (compared to
> list and hlist proposal) under the same environment and testing:
>
> - avg = 314 ns
> - min = 124 ns
> - max = 47637 ns
>
> In this conversion, the non-NULL opaque token to match between producer
> and consumer () is used as the XArray index. The list_for_each_entry() is
> replaced by xa_load(), and list_add/del() is replaced by xa_store/erase().
> The list_head member for linked list is removed, along with comments.
>
> Cc: Alex Williamson <[email protected]>
> Reported-by: Yong He <[email protected]>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379
> Suggested-by: Sean Christopherson <[email protected]>
> Suggested-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Like Xu <[email protected]>
> ---
> include/linux/irqbypass.h | 8 +--
> virt/lib/irqbypass.c | 127 +++++++++++++++++++-------------------
> 2 files changed, 63 insertions(+), 72 deletions(-)
>
> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
> index 9bdb2a781841..dbcc1b4d0ccf 100644
> --- a/include/linux/irqbypass.h
> +++ b/include/linux/irqbypass.h
> @@ -8,14 +8,12 @@
> #ifndef IRQBYPASS_H
> #define IRQBYPASS_H
>
> -#include <linux/list.h>
> -
> struct irq_bypass_consumer;
>
> /*
> * Theory of operation
> *
> - * The IRQ bypass manager is a simple set of lists and callbacks that allows
> + * The IRQ bypass manager is a simple set of xarrays and callbacks that allows
> * IRQ producers (ex. physical interrupt sources) to be matched to IRQ
> * consumers (ex. virtualization hardware that allows IRQ bypass or offload)
> * via a shared token (ex. eventfd_ctx). Producers and consumers register
> @@ -30,7 +28,6 @@ struct irq_bypass_consumer;
>
> /**
> * struct irq_bypass_producer - IRQ bypass producer definition
> - * @node: IRQ bypass manager private list management
> * @token: opaque token to match between producer and consumer (non-NULL)
> * @irq: Linux IRQ number for the producer device
> * @add_consumer: Connect the IRQ producer to an IRQ consumer (optional)
> @@ -43,7 +40,6 @@ struct irq_bypass_consumer;
> * for a physical device assigned to a VM.
> */
> struct irq_bypass_producer {
> - struct list_head node;
> void *token;
> int irq;
> int (*add_consumer)(struct irq_bypass_producer *,
> @@ -56,7 +52,6 @@ struct irq_bypass_producer {
>
> /**
> * struct irq_bypass_consumer - IRQ bypass consumer definition
> - * @node: IRQ bypass manager private list management
> * @token: opaque token to match between producer and consumer (non-NULL)
> * @add_producer: Connect the IRQ consumer to an IRQ producer
> * @del_producer: Disconnect the IRQ consumer from an IRQ producer
> @@ -69,7 +64,6 @@ struct irq_bypass_producer {
> * portions of the interrupt handling to the VM.
> */
> struct irq_bypass_consumer {
> - struct list_head node;
> void *token;
> int (*add_producer)(struct irq_bypass_consumer *,
> struct irq_bypass_producer *);
> diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
> index e0aabbbf27ec..3f8736951e92 100644
> --- a/virt/lib/irqbypass.c
> +++ b/virt/lib/irqbypass.c
> @@ -15,15 +15,15 @@
> */
>
> #include <linux/irqbypass.h>
> -#include <linux/list.h>
> #include <linux/module.h>
> #include <linux/mutex.h>
> +#include <linux/xarray.h>
>
> MODULE_LICENSE("GPL v2");
> MODULE_DESCRIPTION("IRQ bypass manager utility module");
>
> -static LIST_HEAD(producers);
> -static LIST_HEAD(consumers);
> +static DEFINE_XARRAY(producers);
> +static DEFINE_XARRAY(consumers);
> static DEFINE_MUTEX(lock);
>
> /* @lock must be held when calling connect */
> @@ -78,11 +78,12 @@ static void __disconnect(struct irq_bypass_producer *prod,
> * irq_bypass_register_producer - register IRQ bypass producer
> * @producer: pointer to producer structure
> *
> - * Add the provided IRQ producer to the list of producers and connect
> - * with any matching token found on the IRQ consumers list.
> + * Add the provided IRQ producer to the xarray of producers and connect
> + * with any matching token found on the IRQ consumers xarray.
> */
> int irq_bypass_register_producer(struct irq_bypass_producer *producer)
> {
> + unsigned long token = (unsigned long)producer->token;
> struct irq_bypass_producer *tmp;
> struct irq_bypass_consumer *consumer;
> int ret;
> @@ -97,24 +98,23 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
>
> mutex_lock(&lock);
>
> - list_for_each_entry(tmp, &producers, node) {
> - if (tmp->token == producer->token || tmp == producer) {
> - ret = -EBUSY;
> + tmp = xa_load(&producers, token);
> + if (tmp || tmp == producer) {
> + ret = -EBUSY;
> + goto out_err;
> + }
> +
> + ret = xa_err(xa_store(&producers, token, producer, GFP_KERNEL));
> + if (ret)
> + goto out_err;
> +
> + consumer = xa_load(&consumers, token);
> + if (consumer) {
> + ret = __connect(producer, consumer);
> + if (ret)
> goto out_err;

This doesn't match previous behavior, the producer is registered to the
xarray regardless of the result of the connect operation and the caller
cannot distinguish between failures. The module reference is released
regardless of xarray item. Nak.

> - }
> }
>
> - list_for_each_entry(consumer, &consumers, node) {
> - if (consumer->token == producer->token) {
> - ret = __connect(producer, consumer);
> - if (ret)
> - goto out_err;
> - break;
> - }
> - }
> -
> - list_add(&producer->node, &producers);
> -
> mutex_unlock(&lock);
>
> return 0;
> @@ -129,11 +129,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
> * irq_bypass_unregister_producer - unregister IRQ bypass producer
> * @producer: pointer to producer structure
> *
> - * Remove a previously registered IRQ producer from the list of producers
> + * Remove a previously registered IRQ producer from the xarray of producers
> * and disconnect it from any connected IRQ consumer.
> */
> void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
> {
> + unsigned long token = (unsigned long)producer->token;
> struct irq_bypass_producer *tmp;
> struct irq_bypass_consumer *consumer;
>
> @@ -143,24 +144,18 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
> might_sleep();
>
> if (!try_module_get(THIS_MODULE))
> - return; /* nothing in the list anyway */
> + return; /* nothing in the xarray anyway */
>
> mutex_lock(&lock);
>
> - list_for_each_entry(tmp, &producers, node) {
> - if (tmp != producer)
> - continue;
> + tmp = xa_load(&producers, token);
> + if (tmp == producer) {
> + consumer = xa_load(&consumers, token);
> + if (consumer)
> + __disconnect(producer, consumer);
>
> - list_for_each_entry(consumer, &consumers, node) {
> - if (consumer->token == producer->token) {
> - __disconnect(producer, consumer);
> - break;
> - }
> - }
> -
> - list_del(&producer->node);
> + xa_erase(&producers, token);
> module_put(THIS_MODULE);
> - break;
> }
>
> mutex_unlock(&lock);
> @@ -173,11 +168,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
> * irq_bypass_register_consumer - register IRQ bypass consumer
> * @consumer: pointer to consumer structure
> *
> - * Add the provided IRQ consumer to the list of consumers and connect
> - * with any matching token found on the IRQ producer list.
> + * Add the provided IRQ consumer to the xarray of consumers and connect
> + * with any matching token found on the IRQ producer xarray.
> */
> int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
> {
> + unsigned long token = (unsigned long)consumer->token;
> struct irq_bypass_consumer *tmp;
> struct irq_bypass_producer *producer;
> int ret;
> @@ -193,24 +189,23 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
>
> mutex_lock(&lock);
>
> - list_for_each_entry(tmp, &consumers, node) {
> - if (tmp->token == consumer->token || tmp == consumer) {
> - ret = -EBUSY;
> + tmp = xa_load(&consumers, token);
> + if (tmp || tmp == consumer) {
> + ret = -EBUSY;
> + goto out_err;
> + }
> +
> + ret = xa_err(xa_store(&consumers, token, consumer, GFP_KERNEL));
> + if (ret)
> + goto out_err;
> +
> + producer = xa_load(&producers, token);
> + if (producer) {
> + ret = __connect(producer, consumer);
> + if (ret)
> goto out_err;

Same. Thanks,

Alex

> - }
> }
>
> - list_for_each_entry(producer, &producers, node) {
> - if (producer->token == consumer->token) {
> - ret = __connect(producer, consumer);
> - if (ret)
> - goto out_err;
> - break;
> - }
> - }
> -
> - list_add(&consumer->node, &consumers);
> -
> mutex_unlock(&lock);
>
> return 0;
> @@ -225,11 +220,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
> * irq_bypass_unregister_consumer - unregister IRQ bypass consumer
> * @consumer: pointer to consumer structure
> *
> - * Remove a previously registered IRQ consumer from the list of consumers
> + * Remove a previously registered IRQ consumer from the xarray of consumers
> * and disconnect it from any connected IRQ producer.
> */
> void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
> {
> + unsigned long token = (unsigned long)consumer->token;
> struct irq_bypass_consumer *tmp;
> struct irq_bypass_producer *producer;
>
> @@ -239,24 +235,18 @@ void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
> might_sleep();
>
> if (!try_module_get(THIS_MODULE))
> - return; /* nothing in the list anyway */
> + return; /* nothing in the xarray anyway */
>
> mutex_lock(&lock);
>
> - list_for_each_entry(tmp, &consumers, node) {
> - if (tmp != consumer)
> - continue;
> + tmp = xa_load(&consumers, token);
> + if (tmp == consumer) {
> + producer = xa_load(&producers, token);
> + if (producer)
> + __disconnect(producer, consumer);
>
> - list_for_each_entry(producer, &producers, node) {
> - if (producer->token == consumer->token) {
> - __disconnect(producer, consumer);
> - break;
> - }
> - }
> -
> - list_del(&consumer->node);
> + xa_erase(&consumers, token);
> module_put(THIS_MODULE);
> - break;
> }
>
> mutex_unlock(&lock);
> @@ -264,3 +254,10 @@ void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
> module_put(THIS_MODULE);
> }
> EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
> +
> +static void __exit irqbypass_exit(void)
> +{
> + xa_destroy(&producers);
> + xa_destroy(&consumers);
> +}
> +module_exit(irqbypass_exit);


2023-09-25 11:45:02

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] KVM: eventfd: Fix NULL deref irqbypass producer

Hi Alex, what do you think of Sean's proposal diff below ?

By the way, could anyone to accept patch 1/2, thanks.

On 17/8/2023 2:37 am, Sean Christopherson wrote:
> On Wed, Aug 02, 2023, Like Xu wrote:
>> From: Like Xu <[email protected]>
>>
>> Adding guard logic to make irq_bypass_register/unregister_producer()
>> looks for the producer entry based on producer pointer itself instead
>> of pure token matching.
>>
>> As was attempted commit 4f3dbdf47e15 ("KVM: eventfd: fix NULL deref
>> irqbypass consumer"), two different producers may occasionally have two
>> identical eventfd's. In this case, the later producer may unregister
>> the previous one after the registration fails (since they share the same
>> token), then NULL deref incurres in the path of deleting producer from
>> the producers list.
>>
>> Registration should also fail if a registered producer changes its
>> token and registers again via the same producer pointer.
>>
>> Cc: Alex Williamson <[email protected]>
>> Signed-off-by: Like Xu <[email protected]>
>> Reviewed-by: Alex Williamson <[email protected]>
>> ---
>> virt/lib/irqbypass.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
>> index 28fda42e471b..e0aabbbf27ec 100644
>> --- a/virt/lib/irqbypass.c
>> +++ b/virt/lib/irqbypass.c
>> @@ -98,7 +98,7 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
>> mutex_lock(&lock);
>>
>> list_for_each_entry(tmp, &producers, node) {
>> - if (tmp->token == producer->token) {
>> + if (tmp->token == producer->token || tmp == producer) {
>> ret = -EBUSY;
>> goto out_err;
>> }
>> @@ -148,7 +148,7 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
>> mutex_lock(&lock);
>>
>> list_for_each_entry(tmp, &producers, node) {
>> - if (tmp->token != producer->token)
>> + if (tmp != producer)
>
> What are the rules for using these APIs? E.g. is doing unregister without
> first doing a register actually allowed? Ditto for having multiple in-flight
> calls to (un)register the exact same producer or consumer.
>
> E.g. can we do something like the below, and then remove the list iteration to
> find the passed in pointer (which is super odd IMO). Obviously not a blocker
> for this patch, but it seems like we could achieve a simpler and more performant
> implementation if we first sanitize the rules and the usage.
>
> diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
> index 28fda42e471b..be0ba4224a23 100644
> --- a/virt/lib/irqbypass.c
> +++ b/virt/lib/irqbypass.c
> @@ -90,6 +90,9 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
> if (!producer->token)
> return -EINVAL;
>
> + if (WARN_ON_ONCE(producer->node.prev && !list_empty(&producer->node)))
> + return -EINVAL;
> +
> might_sleep();
>
> if (!try_module_get(THIS_MODULE))
> @@ -140,6 +143,9 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
> if (!producer->token)
> return;
>
> + if (WARN_ON_ONCE(!producer->node.prev || list_empty(&producer->node)))
> + return;
> +
> might_sleep();
>
> if (!try_module_get(THIS_MODULE))
>

2023-09-25 22:02:59

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] KVM: irqbypass: Convert producers/consumers single linked list to XArray

On 2023/8/3 02:30, Alex Williamson wrote:
> On Wed, 2 Aug 2023 13:17:00 +0800
> Like Xu <[email protected]> wrote:
>
>> From: Like Xu <[email protected]>
>>
>> Replace producers/consumers linked list with XArray. There are no changes
>> in functionality, but lookup performance has been improved.
>>
>> The producers and consumers in current IRQ bypass manager are stored in
>> simple linked lists, and a single mutex is held while traversing the lists
>> and connecting a consumer to a producer (and vice versa). With this design
>> and implementation, if there are a large number of KVM agents concurrently
>> creating irqfds and all requesting to register their irqfds in the global
>> consumers list, the global mutex contention will exponentially increase
>> the avg wait latency, which is no longer tolerable in modern systems with
>> a large number of CPU cores. For example:
>>
>> the wait time latency to acquire the mutex in a stress test where 174000
>> irqfds were created concurrently on an 2.70GHz ICX w/ 144 cores:
>>
>> - avg = 117.855314 ms
>> - min = 20 ns
>> - max = 11428.340858 ms
>>
>> To reduce latency introduced by the irq_bypass_register_consumer() in
>> the above usage scenario, the data structure XArray and its normal API
>> is applied to track the producers and consumers so that lookups don't
>> require a linear walk since the "tokens" used to match producers and
>> consumers are just kernel pointers.
>>
>> Thanks to the nature of XArray (more memory-efficient, parallelisable
>> and cache friendly), the latecny is significantly reduced (compared to
>> list and hlist proposal) under the same environment and testing:
>>
>> - avg = 314 ns
>> - min = 124 ns
>> - max = 47637 ns
>>
>> In this conversion, the non-NULL opaque token to match between producer
>> and consumer () is used as the XArray index. The list_for_each_entry() is
>> replaced by xa_load(), and list_add/del() is replaced by xa_store/erase().
>> The list_head member for linked list is removed, along with comments.
>>
>> Cc: Alex Williamson <[email protected]>
>> Reported-by: Yong He <[email protected]>
>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379
>> Suggested-by: Sean Christopherson <[email protected]>
>> Suggested-by: Paolo Bonzini <[email protected]>
>> Signed-off-by: Like Xu <[email protected]>
>> ---
>> include/linux/irqbypass.h | 8 +--
>> virt/lib/irqbypass.c | 127 +++++++++++++++++++-------------------
>> 2 files changed, 63 insertions(+), 72 deletions(-)
>>
>> diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
>> index 9bdb2a781841..dbcc1b4d0ccf 100644
>> --- a/include/linux/irqbypass.h
>> +++ b/include/linux/irqbypass.h
>> @@ -8,14 +8,12 @@
>> #ifndef IRQBYPASS_H
>> #define IRQBYPASS_H
>>
>> -#include <linux/list.h>
>> -
>> struct irq_bypass_consumer;
>>
>> /*
>> * Theory of operation
>> *
>> - * The IRQ bypass manager is a simple set of lists and callbacks that allows
>> + * The IRQ bypass manager is a simple set of xarrays and callbacks that allows
>> * IRQ producers (ex. physical interrupt sources) to be matched to IRQ
>> * consumers (ex. virtualization hardware that allows IRQ bypass or offload)
>> * via a shared token (ex. eventfd_ctx). Producers and consumers register
>> @@ -30,7 +28,6 @@ struct irq_bypass_consumer;
>>
>> /**
>> * struct irq_bypass_producer - IRQ bypass producer definition
>> - * @node: IRQ bypass manager private list management
>> * @token: opaque token to match between producer and consumer (non-NULL)
>> * @irq: Linux IRQ number for the producer device
>> * @add_consumer: Connect the IRQ producer to an IRQ consumer (optional)
>> @@ -43,7 +40,6 @@ struct irq_bypass_consumer;
>> * for a physical device assigned to a VM.
>> */
>> struct irq_bypass_producer {
>> - struct list_head node;
>> void *token;
>> int irq;
>> int (*add_consumer)(struct irq_bypass_producer *,
>> @@ -56,7 +52,6 @@ struct irq_bypass_producer {
>>
>> /**
>> * struct irq_bypass_consumer - IRQ bypass consumer definition
>> - * @node: IRQ bypass manager private list management
>> * @token: opaque token to match between producer and consumer (non-NULL)
>> * @add_producer: Connect the IRQ consumer to an IRQ producer
>> * @del_producer: Disconnect the IRQ consumer from an IRQ producer
>> @@ -69,7 +64,6 @@ struct irq_bypass_producer {
>> * portions of the interrupt handling to the VM.
>> */
>> struct irq_bypass_consumer {
>> - struct list_head node;
>> void *token;
>> int (*add_producer)(struct irq_bypass_consumer *,
>> struct irq_bypass_producer *);
>> diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
>> index e0aabbbf27ec..3f8736951e92 100644
>> --- a/virt/lib/irqbypass.c
>> +++ b/virt/lib/irqbypass.c
>> @@ -15,15 +15,15 @@
>> */
>>
>> #include <linux/irqbypass.h>
>> -#include <linux/list.h>
>> #include <linux/module.h>
>> #include <linux/mutex.h>
>> +#include <linux/xarray.h>
>>
>> MODULE_LICENSE("GPL v2");
>> MODULE_DESCRIPTION("IRQ bypass manager utility module");
>>
>> -static LIST_HEAD(producers);
>> -static LIST_HEAD(consumers);
>> +static DEFINE_XARRAY(producers);
>> +static DEFINE_XARRAY(consumers);
>> static DEFINE_MUTEX(lock);
>>
>> /* @lock must be held when calling connect */
>> @@ -78,11 +78,12 @@ static void __disconnect(struct irq_bypass_producer *prod,
>> * irq_bypass_register_producer - register IRQ bypass producer
>> * @producer: pointer to producer structure
>> *
>> - * Add the provided IRQ producer to the list of producers and connect
>> - * with any matching token found on the IRQ consumers list.
>> + * Add the provided IRQ producer to the xarray of producers and connect
>> + * with any matching token found on the IRQ consumers xarray.
>> */
>> int irq_bypass_register_producer(struct irq_bypass_producer *producer)
>> {
>> + unsigned long token = (unsigned long)producer->token;
>> struct irq_bypass_producer *tmp;
>> struct irq_bypass_consumer *consumer;
>> int ret;
>> @@ -97,24 +98,23 @@ int irq_bypass_register_producer(struct irq_bypass_producer *producer)
>>
>> mutex_lock(&lock);
>>
>> - list_for_each_entry(tmp, &producers, node) {
>> - if (tmp->token == producer->token || tmp == producer) {
>> - ret = -EBUSY;
>> + tmp = xa_load(&producers, token);
>> + if (tmp || tmp == producer) {
>> + ret = -EBUSY;
>> + goto out_err;
>> + }
>> +
>> + ret = xa_err(xa_store(&producers, token, producer, GFP_KERNEL));
>> + if (ret)
>> + goto out_err;
>> +
>> + consumer = xa_load(&consumers, token);
>> + if (consumer) {
>> + ret = __connect(producer, consumer);
>> + if (ret)
>> goto out_err;
>
> This doesn't match previous behavior, the producer is registered to the
> xarray regardless of the result of the connect operation and the caller
> cannot distinguish between failures. The module reference is released
> regardless of xarray item. Nak.

Hi Alex,

Thanks for your comments and indeed, the additional error throwing logic
breaks the caller's expectations as you said.

What if we use LIST as a fallback option for XARRAY? Specifically, when
xa_err(xa_store()) is true, then fallback to use LIST to check for
producers/consumers, and in most cases it still takes the XARRAY path:

static DEFINE_XARRAY(xproducers);
...
if (xa_err(xa_store(&xproducers, (unsigned long)producer->token,
producer, GFP_KERNEL)))
list_add(&producer->node, &producers);
...

There will also be a LIST option on the lookup path.

The rough code already works, could we move in this direction (combining
XARRAY with LIST to hidden the memory allocation error from xa_store) ?

>
>> - }
>> }
>>
>> - list_for_each_entry(consumer, &consumers, node) {
>> - if (consumer->token == producer->token) {
>> - ret = __connect(producer, consumer);
>> - if (ret)
>> - goto out_err;
>> - break;
>> - }
>> - }
>> -
>> - list_add(&producer->node, &producers);
>> -
>> mutex_unlock(&lock);
>>
>> return 0;
>> @@ -129,11 +129,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
>> * irq_bypass_unregister_producer - unregister IRQ bypass producer
>> * @producer: pointer to producer structure
>> *
>> - * Remove a previously registered IRQ producer from the list of producers
>> + * Remove a previously registered IRQ producer from the xarray of producers
>> * and disconnect it from any connected IRQ consumer.
>> */
>> void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
>> {
>> + unsigned long token = (unsigned long)producer->token;
>> struct irq_bypass_producer *tmp;
>> struct irq_bypass_consumer *consumer;
>>
>> @@ -143,24 +144,18 @@ void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
>> might_sleep();
>>
>> if (!try_module_get(THIS_MODULE))
>> - return; /* nothing in the list anyway */
>> + return; /* nothing in the xarray anyway */
>>
>> mutex_lock(&lock);
>>
>> - list_for_each_entry(tmp, &producers, node) {
>> - if (tmp != producer)
>> - continue;
>> + tmp = xa_load(&producers, token);
>> + if (tmp == producer) {
>> + consumer = xa_load(&consumers, token);
>> + if (consumer)
>> + __disconnect(producer, consumer);
>>
>> - list_for_each_entry(consumer, &consumers, node) {
>> - if (consumer->token == producer->token) {
>> - __disconnect(producer, consumer);
>> - break;
>> - }
>> - }
>> -
>> - list_del(&producer->node);
>> + xa_erase(&producers, token);
>> module_put(THIS_MODULE);
>> - break;
>> }
>>
>> mutex_unlock(&lock);
>> @@ -173,11 +168,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
>> * irq_bypass_register_consumer - register IRQ bypass consumer
>> * @consumer: pointer to consumer structure
>> *
>> - * Add the provided IRQ consumer to the list of consumers and connect
>> - * with any matching token found on the IRQ producer list.
>> + * Add the provided IRQ consumer to the xarray of consumers and connect
>> + * with any matching token found on the IRQ producer xarray.
>> */
>> int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
>> {
>> + unsigned long token = (unsigned long)consumer->token;
>> struct irq_bypass_consumer *tmp;
>> struct irq_bypass_producer *producer;
>> int ret;
>> @@ -193,24 +189,23 @@ int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
>>
>> mutex_lock(&lock);
>>
>> - list_for_each_entry(tmp, &consumers, node) {
>> - if (tmp->token == consumer->token || tmp == consumer) {
>> - ret = -EBUSY;
>> + tmp = xa_load(&consumers, token);
>> + if (tmp || tmp == consumer) {
>> + ret = -EBUSY;
>> + goto out_err;
>> + }
>> +
>> + ret = xa_err(xa_store(&consumers, token, consumer, GFP_KERNEL));
>> + if (ret)
>> + goto out_err;
>> +
>> + producer = xa_load(&producers, token);
>> + if (producer) {
>> + ret = __connect(producer, consumer);
>> + if (ret)
>> goto out_err;
>
> Same. Thanks,
>
> Alex
>
>> - }
>> }
>>
>> - list_for_each_entry(producer, &producers, node) {
>> - if (producer->token == consumer->token) {
>> - ret = __connect(producer, consumer);
>> - if (ret)
>> - goto out_err;
>> - break;
>> - }
>> - }
>> -
>> - list_add(&consumer->node, &consumers);
>> -
>> mutex_unlock(&lock);
>>
>> return 0;
>> @@ -225,11 +220,12 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
>> * irq_bypass_unregister_consumer - unregister IRQ bypass consumer
>> * @consumer: pointer to consumer structure
>> *
>> - * Remove a previously registered IRQ consumer from the list of consumers
>> + * Remove a previously registered IRQ consumer from the xarray of consumers
>> * and disconnect it from any connected IRQ producer.
>> */
>> void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
>> {
>> + unsigned long token = (unsigned long)consumer->token;
>> struct irq_bypass_consumer *tmp;
>> struct irq_bypass_producer *producer;
>>
>> @@ -239,24 +235,18 @@ void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
>> might_sleep();
>>
>> if (!try_module_get(THIS_MODULE))
>> - return; /* nothing in the list anyway */
>> + return; /* nothing in the xarray anyway */
>>
>> mutex_lock(&lock);
>>
>> - list_for_each_entry(tmp, &consumers, node) {
>> - if (tmp != consumer)
>> - continue;
>> + tmp = xa_load(&consumers, token);
>> + if (tmp == consumer) {
>> + producer = xa_load(&producers, token);
>> + if (producer)
>> + __disconnect(producer, consumer);
>>
>> - list_for_each_entry(producer, &producers, node) {
>> - if (producer->token == consumer->token) {
>> - __disconnect(producer, consumer);
>> - break;
>> - }
>> - }
>> -
>> - list_del(&consumer->node);
>> + xa_erase(&consumers, token);
>> module_put(THIS_MODULE);
>> - break;
>> }
>>
>> mutex_unlock(&lock);
>> @@ -264,3 +254,10 @@ void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
>> module_put(THIS_MODULE);
>> }
>> EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
>> +
>> +static void __exit irqbypass_exit(void)
>> +{
>> + xa_destroy(&producers);
>> + xa_destroy(&consumers);
>> +}
>> +module_exit(irqbypass_exit);
>

2023-09-26 15:29:28

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] KVM: irqbypass: Convert producers/consumers single linked list to XArray

On 25/9/2023 11:24 pm, Like Xu wrote:
>>> @@ -97,24 +98,23 @@ int irq_bypass_register_producer(struct
>>> irq_bypass_producer *producer)
>>>       mutex_lock(&lock);
>>> -    list_for_each_entry(tmp, &producers, node) {
>>> -        if (tmp->token == producer->token || tmp == producer) {
>>> -            ret = -EBUSY;
>>> +    tmp = xa_load(&producers, token);
>>> +    if (tmp || tmp == producer) {
>>> +        ret = -EBUSY;
>>> +        goto out_err;
>>> +    }
>>> +
>>> +    ret = xa_err(xa_store(&producers, token, producer, GFP_KERNEL));
>>> +    if (ret)
>>> +        goto out_err;
>>> +
>>> +    consumer = xa_load(&consumers, token);
>>> +    if (consumer) {
>>> +        ret = __connect(producer, consumer);
>>> +        if (ret)
>>>               goto out_err;
>>
>> This doesn't match previous behavior, the producer is registered to the
>> xarray regardless of the result of the connect operation and the caller
>> cannot distinguish between failures.  The module reference is released
>> regardless of xarray item.  Nak.
>
> Hi Alex,
>
> Thanks for your comments and indeed, the additional error throwing logic
> breaks the caller's expectations as you said.
>
> What if we use LIST as a fallback option for XARRAY? Specifically, when
> xa_err(xa_store()) is true, then fallback to use LIST to check for
> producers/consumers, and in most cases it still takes the XARRAY path:
>
>     static DEFINE_XARRAY(xproducers);
>     ...
>     if (xa_err(xa_store(&xproducers, (unsigned long)producer->token,
>                 producer, GFP_KERNEL)))
>         list_add(&producer->node, &producers);
>     ...
>
> There will also be a LIST option on the lookup path.
>
> The rough code already works, could we move in this direction (combining
> XARRAY with LIST to hidden the memory allocation error from xa_store) ?

For better discussion and further improvement, here's the draft code combining
xarray and list, using both xarray and list to store producers and consumers,
but with xarray preferred for queries:

diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index e0aabbbf27ec..7cc30d699ece 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -18,12 +18,15 @@
#include <linux/list.h>
#include <linux/module.h>
#include <linux/mutex.h>
+#include <linux/xarray.h>

MODULE_LICENSE("GPL v2");
MODULE_DESCRIPTION("IRQ bypass manager utility module");

static LIST_HEAD(producers);
static LIST_HEAD(consumers);
+static DEFINE_XARRAY(xproducers);
+static DEFINE_XARRAY(xconsumers);
static DEFINE_MUTEX(lock);

/* @lock must be held when calling connect */
@@ -74,6 +77,117 @@ static void __disconnect(struct irq_bypass_producer *prod,
prod->start(prod);
}

+#define CHECK_TOKEN BIT_ULL(0)
+#define CHECK_POINTER BIT_ULL(1)
+
+static inline bool
+producer_already_exist(struct irq_bypass_producer *producer, u64 flags)
+{
+ struct irq_bypass_producer *tmp;
+
+ if (((flags & CHECK_POINTER) && xa_load(&xproducers,
+ (unsigned long)producer)) ||
+ ((flags & CHECK_TOKEN) && xa_load(&xproducers,
+ (unsigned long)producer->token)))
+ return true;
+
+ list_for_each_entry(tmp, &producers, node) {
+ if (((flags & CHECK_POINTER) && tmp == producer) ||
+ ((flags & CHECK_TOKEN) && tmp->token == producer->token))
+ return true;
+ }
+
+ return false;
+}
+
+static inline bool
+consumer_already_exist(struct irq_bypass_consumer *consumer, u64 flags)
+{
+ struct irq_bypass_consumer *tmp;
+
+ if (((flags & CHECK_POINTER) && xa_load(&xconsumers,
+ (unsigned long)consumer)) ||
+ ((flags & CHECK_TOKEN) && xa_load(&xconsumers,
+ (unsigned long)consumer->token)))
+ return true;
+
+ list_for_each_entry(tmp, &consumers, node) {
+ if (((flags & CHECK_POINTER) && tmp == consumer) ||
+ ((flags & CHECK_TOKEN) && tmp->token == consumer->token))
+ return true;
+ }
+
+ return false;
+}
+
+static inline struct irq_bypass_producer *get_producer_by_token(void *token)
+{
+ struct irq_bypass_producer *tmp;
+
+ tmp = xa_load(&xproducers, (unsigned long)token);
+ if (tmp)
+ return tmp;
+
+ list_for_each_entry(tmp, &producers, node) {
+ if (tmp->token == token)
+ return tmp;
+ }
+
+ return NULL;
+}
+
+static inline struct irq_bypass_consumer *get_consumer_by_token(void *token)
+{
+ struct irq_bypass_consumer *tmp;
+
+ tmp = xa_load(&xconsumers, (unsigned long)token);
+ if (tmp)
+ return tmp;
+
+ list_for_each_entry(tmp, &consumers, node) {
+ if (tmp->token == token)
+ return tmp;
+ }
+
+ return NULL;
+}
+
+static inline void add_irq_bypass_producer(struct irq_bypass_producer *producer)
+{
+ xa_store(&xproducers, (unsigned long)producer->token,
+ producer, GFP_KERNEL);
+ xa_store(&xproducers, (unsigned long)producer,
+ producer, GFP_KERNEL);
+
+ list_add(&producer->node, &producers);
+}
+
+static inline void del_irq_bypass_producer(struct irq_bypass_producer *producer)
+{
+ xa_erase(&xproducers, (unsigned long)producer->token);
+ xa_erase(&xproducers, (unsigned long)producer);
+
+ list_del(&producer->node);
+}
+
+static inline void add_irq_bypass_consumer(struct irq_bypass_consumer *consumer)
+{
+ xa_store(&xconsumers, (unsigned long)consumer->token,
+ consumer, GFP_KERNEL);
+ xa_store(&xconsumers, (unsigned long)consumer,
+ consumer, GFP_KERNEL);
+
+ list_add(&consumer->node, &consumers);
+}
+
+static inline void del_irq_bypass_consumer(struct irq_bypass_consumer *consumer)
+{
+ xa_erase(&xconsumers, (unsigned long)consumer->token);
+ xa_erase(&xconsumers, (unsigned long)consumer);
+
+ list_del(&consumer->node);
+}
+
/**
* irq_bypass_register_producer - register IRQ bypass producer
* @producer: pointer to producer structure
@@ -83,7 +197,6 @@ static void __disconnect(struct irq_bypass_producer *prod,
*/
int irq_bypass_register_producer(struct irq_bypass_producer *producer)
{
- struct irq_bypass_producer *tmp;
struct irq_bypass_consumer *consumer;
int ret;

@@ -97,23 +210,19 @@ int irq_bypass_register_producer(struct irq_bypass_producer
*producer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &producers, node) {
- if (tmp->token == producer->token || tmp == producer) {
- ret = -EBUSY;
+ if (producer_already_exist(producer, CHECK_TOKEN | CHECK_POINTER)) {
+ ret = -EBUSY;
+ goto out_err;
+ }
+
+ consumer = get_consumer_by_token(producer->token);
+ if (consumer) {
+ ret = __connect(producer, consumer);
+ if (ret)
goto out_err;
- }
}

- list_for_each_entry(consumer, &consumers, node) {
- if (consumer->token == producer->token) {
- ret = __connect(producer, consumer);
- if (ret)
- goto out_err;
- break;
- }
- }
-
- list_add(&producer->node, &producers);
+ add_irq_bypass_producer(producer);

mutex_unlock(&lock);

@@ -134,7 +243,6 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
*/
void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
{
- struct irq_bypass_producer *tmp;
struct irq_bypass_consumer *consumer;

if (!producer->token)
@@ -147,20 +255,13 @@ void irq_bypass_unregister_producer(struct
irq_bypass_producer *producer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &producers, node) {
- if (tmp != producer)
- continue;
+ if (producer_already_exist(producer, CHECK_POINTER)) {
+ consumer = get_consumer_by_token(producer->token);
+ if (consumer)
+ __disconnect(producer, consumer);

- list_for_each_entry(consumer, &consumers, node) {
- if (consumer->token == producer->token) {
- __disconnect(producer, consumer);
- break;
- }
- }
-
- list_del(&producer->node);
+ del_irq_bypass_producer(producer);
module_put(THIS_MODULE);
- break;
}

mutex_unlock(&lock);
@@ -178,7 +279,6 @@ EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
*/
int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
{
- struct irq_bypass_consumer *tmp;
struct irq_bypass_producer *producer;
int ret;

@@ -193,23 +293,19 @@ int irq_bypass_register_consumer(struct
irq_bypass_consumer *consumer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &consumers, node) {
- if (tmp->token == consumer->token || tmp == consumer) {
- ret = -EBUSY;
+ if (consumer_already_exist(consumer, CHECK_TOKEN | CHECK_POINTER)) {
+ ret = -EBUSY;
+ goto out_err;
+ }
+
+ producer = get_producer_by_token(consumer->token);
+ if (producer) {
+ ret = __connect(producer, consumer);
+ if (ret)
goto out_err;
- }
}

- list_for_each_entry(producer, &producers, node) {
- if (producer->token == consumer->token) {
- ret = __connect(producer, consumer);
- if (ret)
- goto out_err;
- break;
- }
- }
-
- list_add(&consumer->node, &consumers);
+ add_irq_bypass_consumer(consumer);

mutex_unlock(&lock);

@@ -230,7 +326,6 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
*/
void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
{
- struct irq_bypass_consumer *tmp;
struct irq_bypass_producer *producer;

if (!consumer->token)
@@ -243,20 +338,13 @@ void irq_bypass_unregister_consumer(struct
irq_bypass_consumer *consumer)

mutex_lock(&lock);

- list_for_each_entry(tmp, &consumers, node) {
- if (tmp != consumer)
- continue;
+ if (consumer_already_exist(consumer, CHECK_POINTER)) {
+ producer = get_producer_by_token(consumer->token);
+ if (producer)
+ __disconnect(producer, consumer);

- list_for_each_entry(producer, &producers, node) {
- if (producer->token == consumer->token) {
- __disconnect(producer, consumer);
- break;
- }
- }
-
- list_del(&consumer->node);
+ del_irq_bypass_consumer(consumer);
module_put(THIS_MODULE);
- break;
}

mutex_unlock(&lock);
@@ -264,3 +352,10 @@ void irq_bypass_unregister_consumer(struct
irq_bypass_consumer *consumer)
module_put(THIS_MODULE);
}
EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
+
+static void __exit irqbypass_exit(void)
+{
+ xa_destroy(&xproducers);
+ xa_destroy(&xconsumers);
+}
+module_exit(irqbypass_exit);
--
2.42.0

2023-10-11 11:26:51

by Like Xu

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] KVM: irqbypass: Convert producers/consumers single linked list to XArray

Hi all, do we have any negative comments on this issue ?
This register path could be a performance bottleneck in serverless scenarios.
Your comments are much appreciated, thanks!

On 26/9/2023 5:18 pm, Like Xu wrote:
> On 25/9/2023 11:24 pm, Like Xu wrote:
>>>> @@ -97,24 +98,23 @@ int irq_bypass_register_producer(struct
>>>> irq_bypass_producer *producer)
>>>>       mutex_lock(&lock);
>>>> -    list_for_each_entry(tmp, &producers, node) {
>>>> -        if (tmp->token == producer->token || tmp == producer) {
>>>> -            ret = -EBUSY;
>>>> +    tmp = xa_load(&producers, token);
>>>> +    if (tmp || tmp == producer) {
>>>> +        ret = -EBUSY;
>>>> +        goto out_err;
>>>> +    }
>>>> +
>>>> +    ret = xa_err(xa_store(&producers, token, producer, GFP_KERNEL));
>>>> +    if (ret)
>>>> +        goto out_err;
>>>> +
>>>> +    consumer = xa_load(&consumers, token);
>>>> +    if (consumer) {
>>>> +        ret = __connect(producer, consumer);
>>>> +        if (ret)
>>>>               goto out_err;
>>>
>>> This doesn't match previous behavior, the producer is registered to the
>>> xarray regardless of the result of the connect operation and the caller
>>> cannot distinguish between failures.  The module reference is released
>>> regardless of xarray item.  Nak.
>>
>> Hi Alex,
>>
>> Thanks for your comments and indeed, the additional error throwing logic
>> breaks the caller's expectations as you said.
>>
>> What if we use LIST as a fallback option for XARRAY? Specifically, when
>> xa_err(xa_store()) is true, then fallback to use LIST to check for
>> producers/consumers, and in most cases it still takes the XARRAY path:
>>
>>      static DEFINE_XARRAY(xproducers);
>>      ...
>>      if (xa_err(xa_store(&xproducers, (unsigned long)producer->token,
>>                  producer, GFP_KERNEL)))
>>          list_add(&producer->node, &producers);
>>      ...
>>
>> There will also be a LIST option on the lookup path.
>>
>> The rough code already works, could we move in this direction (combining
>> XARRAY with LIST to hidden the memory allocation error from xa_store) ?
>
> For better discussion and further improvement, here's the draft code combining
> xarray and list, using both xarray and list to store producers and consumers,
> but with xarray preferred for queries:
>
> diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
> index e0aabbbf27ec..7cc30d699ece 100644
> --- a/virt/lib/irqbypass.c
> +++ b/virt/lib/irqbypass.c
> @@ -18,12 +18,15 @@
>  #include <linux/list.h>
>  #include <linux/module.h>
>  #include <linux/mutex.h>
> +#include <linux/xarray.h>
>
>  MODULE_LICENSE("GPL v2");
>  MODULE_DESCRIPTION("IRQ bypass manager utility module");
>
>  static LIST_HEAD(producers);
>  static LIST_HEAD(consumers);
> +static DEFINE_XARRAY(xproducers);
> +static DEFINE_XARRAY(xconsumers);
>  static DEFINE_MUTEX(lock);
>
>  /* @lock must be held when calling connect */
> @@ -74,6 +77,117 @@ static void __disconnect(struct irq_bypass_producer *prod,
>          prod->start(prod);
>  }
>
> +#define CHECK_TOKEN    BIT_ULL(0)
> +#define CHECK_POINTER    BIT_ULL(1)
> +
> +static inline bool
> +producer_already_exist(struct irq_bypass_producer *producer, u64 flags)
> +{
> +    struct irq_bypass_producer *tmp;
> +
> +    if (((flags & CHECK_POINTER) && xa_load(&xproducers,
> +                        (unsigned long)producer)) ||
> +        ((flags & CHECK_TOKEN) && xa_load(&xproducers,
> +                          (unsigned long)producer->token)))
> +        return true;
> +
> +    list_for_each_entry(tmp, &producers, node) {
> +        if (((flags & CHECK_POINTER) && tmp == producer) ||
> +            ((flags & CHECK_TOKEN) && tmp->token == producer->token))
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +static inline bool
> +consumer_already_exist(struct irq_bypass_consumer *consumer, u64 flags)
> +{
> +    struct irq_bypass_consumer *tmp;
> +
> +    if (((flags & CHECK_POINTER) && xa_load(&xconsumers,
> +                        (unsigned long)consumer)) ||
> +        ((flags & CHECK_TOKEN) && xa_load(&xconsumers,
> +                          (unsigned long)consumer->token)))
> +        return true;
> +
> +    list_for_each_entry(tmp, &consumers, node) {
> +        if (((flags & CHECK_POINTER) && tmp == consumer) ||
> +            ((flags & CHECK_TOKEN) && tmp->token == consumer->token))
> +            return true;
> +    }
> +
> +    return false;
> +}
> +
> +static inline struct irq_bypass_producer *get_producer_by_token(void *token)
> +{
> +    struct irq_bypass_producer *tmp;
> +
> +    tmp = xa_load(&xproducers, (unsigned long)token);
> +    if (tmp)
> +        return tmp;
> +
> +    list_for_each_entry(tmp, &producers, node) {
> +        if (tmp->token == token)
> +            return tmp;
> +    }
> +
> +    return NULL;
> +}
> +
> +static inline struct irq_bypass_consumer *get_consumer_by_token(void *token)
> +{
> +    struct irq_bypass_consumer *tmp;
> +
> +    tmp = xa_load(&xconsumers, (unsigned long)token);
> +    if (tmp)
> +        return tmp;
> +
> +    list_for_each_entry(tmp, &consumers, node) {
> +        if (tmp->token == token)
> +            return tmp;
> +    }
> +
> +    return NULL;
> +}
> +
> +static inline void add_irq_bypass_producer(struct irq_bypass_producer *producer)
> +{
> +    xa_store(&xproducers, (unsigned long)producer->token,
> +         producer, GFP_KERNEL);
> +    xa_store(&xproducers, (unsigned long)producer,
> +         producer, GFP_KERNEL);
> +
> +    list_add(&producer->node, &producers);
> +}
> +
> +static inline void del_irq_bypass_producer(struct irq_bypass_producer *producer)
> +{
> +    xa_erase(&xproducers, (unsigned long)producer->token);
> +    xa_erase(&xproducers, (unsigned long)producer);
> +
> +    list_del(&producer->node);
> +}
> +
> +static inline void add_irq_bypass_consumer(struct irq_bypass_consumer *consumer)
> +{
> +    xa_store(&xconsumers, (unsigned long)consumer->token,
> +         consumer, GFP_KERNEL);
> +    xa_store(&xconsumers, (unsigned long)consumer,
> +         consumer, GFP_KERNEL);
> +
> +    list_add(&consumer->node, &consumers);
> +}
> +
> +static inline void del_irq_bypass_consumer(struct irq_bypass_consumer *consumer)
> +{
> +    xa_erase(&xconsumers, (unsigned long)consumer->token);
> +    xa_erase(&xconsumers, (unsigned long)consumer);
> +
> +    list_del(&consumer->node);
> +}
> +
>  /**
>   * irq_bypass_register_producer - register IRQ bypass producer
>   * @producer: pointer to producer structure
> @@ -83,7 +197,6 @@ static void __disconnect(struct irq_bypass_producer *prod,
>   */
>  int irq_bypass_register_producer(struct irq_bypass_producer *producer)
>  {
> -    struct irq_bypass_producer *tmp;
>      struct irq_bypass_consumer *consumer;
>      int ret;
>
> @@ -97,23 +210,19 @@ int irq_bypass_register_producer(struct irq_bypass_producer
> *producer)
>
>      mutex_lock(&lock);
>
> -    list_for_each_entry(tmp, &producers, node) {
> -        if (tmp->token == producer->token || tmp == producer) {
> -            ret = -EBUSY;
> +    if (producer_already_exist(producer, CHECK_TOKEN | CHECK_POINTER)) {
> +        ret = -EBUSY;
> +        goto out_err;
> +    }
> +
> +    consumer = get_consumer_by_token(producer->token);
> +    if (consumer) {
> +        ret = __connect(producer, consumer);
> +        if (ret)
>              goto out_err;
> -        }
>      }
>
> -    list_for_each_entry(consumer, &consumers, node) {
> -        if (consumer->token == producer->token) {
> -            ret = __connect(producer, consumer);
> -            if (ret)
> -                goto out_err;
> -            break;
> -        }
> -    }
> -
> -    list_add(&producer->node, &producers);
> +    add_irq_bypass_producer(producer);
>
>      mutex_unlock(&lock);
>
> @@ -134,7 +243,6 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_producer);
>   */
>  void irq_bypass_unregister_producer(struct irq_bypass_producer *producer)
>  {
> -    struct irq_bypass_producer *tmp;
>      struct irq_bypass_consumer *consumer;
>
>      if (!producer->token)
> @@ -147,20 +255,13 @@ void irq_bypass_unregister_producer(struct
> irq_bypass_producer *producer)
>
>      mutex_lock(&lock);
>
> -    list_for_each_entry(tmp, &producers, node) {
> -        if (tmp != producer)
> -            continue;
> +    if (producer_already_exist(producer, CHECK_POINTER)) {
> +        consumer = get_consumer_by_token(producer->token);
> +        if (consumer)
> +            __disconnect(producer, consumer);
>
> -        list_for_each_entry(consumer, &consumers, node) {
> -            if (consumer->token == producer->token) {
> -                __disconnect(producer, consumer);
> -                break;
> -            }
> -        }
> -
> -        list_del(&producer->node);
> +        del_irq_bypass_producer(producer);
>          module_put(THIS_MODULE);
> -        break;
>      }
>
>      mutex_unlock(&lock);
> @@ -178,7 +279,6 @@ EXPORT_SYMBOL_GPL(irq_bypass_unregister_producer);
>   */
>  int irq_bypass_register_consumer(struct irq_bypass_consumer *consumer)
>  {
> -    struct irq_bypass_consumer *tmp;
>      struct irq_bypass_producer *producer;
>      int ret;
>
> @@ -193,23 +293,19 @@ int irq_bypass_register_consumer(struct
> irq_bypass_consumer *consumer)
>
>      mutex_lock(&lock);
>
> -    list_for_each_entry(tmp, &consumers, node) {
> -        if (tmp->token == consumer->token || tmp == consumer) {
> -            ret = -EBUSY;
> +    if (consumer_already_exist(consumer, CHECK_TOKEN | CHECK_POINTER)) {
> +        ret = -EBUSY;
> +        goto out_err;
> +    }
> +
> +    producer = get_producer_by_token(consumer->token);
> +    if (producer) {
> +        ret = __connect(producer, consumer);
> +        if (ret)
>              goto out_err;
> -        }
>      }
>
> -    list_for_each_entry(producer, &producers, node) {
> -        if (producer->token == consumer->token) {
> -            ret = __connect(producer, consumer);
> -            if (ret)
> -                goto out_err;
> -            break;
> -        }
> -    }
> -
> -    list_add(&consumer->node, &consumers);
> +    add_irq_bypass_consumer(consumer);
>
>      mutex_unlock(&lock);
>
> @@ -230,7 +326,6 @@ EXPORT_SYMBOL_GPL(irq_bypass_register_consumer);
>   */
>  void irq_bypass_unregister_consumer(struct irq_bypass_consumer *consumer)
>  {
> -    struct irq_bypass_consumer *tmp;
>      struct irq_bypass_producer *producer;
>
>      if (!consumer->token)
> @@ -243,20 +338,13 @@ void irq_bypass_unregister_consumer(struct
> irq_bypass_consumer *consumer)
>
>      mutex_lock(&lock);
>
> -    list_for_each_entry(tmp, &consumers, node) {
> -        if (tmp != consumer)
> -            continue;
> +    if (consumer_already_exist(consumer, CHECK_POINTER)) {
> +        producer = get_producer_by_token(consumer->token);
> +        if (producer)
> +            __disconnect(producer, consumer);
>
> -        list_for_each_entry(producer, &producers, node) {
> -            if (producer->token == consumer->token) {
> -                __disconnect(producer, consumer);
> -                break;
> -            }
> -        }
> -
> -        list_del(&consumer->node);
> +        del_irq_bypass_consumer(consumer);
>          module_put(THIS_MODULE);
> -        break;
>      }
>
>      mutex_unlock(&lock);
> @@ -264,3 +352,10 @@ void irq_bypass_unregister_consumer(struct
> irq_bypass_consumer *consumer)
>      module_put(THIS_MODULE);
>  }
>  EXPORT_SYMBOL_GPL(irq_bypass_unregister_consumer);
> +
> +static void __exit irqbypass_exit(void)
> +{
> +    xa_destroy(&xproducers);
> +    xa_destroy(&xconsumers);
> +}
> +module_exit(irqbypass_exit);