2022-08-24 10:26:35

by Peng Zhang

[permalink] [raw]
Subject: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented

The current algorithm of alloc_iova is to scan all iovas until it finds
a gap that satisfies the condition to allocate. This can be very slow in
some scenarios. We can optimize alloc_iova() from time complexity O(n)
to O(log(n)).

We can make a test like this:
Write a module and initialize iova_domain with 4k granule.
Then using a user-mode program to call the module to allocate iova of
size 1 2^20 times within the allocation limit of 2^20. This is single
threaded and the low 4g space is full after 2^20 allocations.

Finally loop the following three steps:
1. Randomly releases an iova.

2. Allocate an iova of size 1 within the allocation limit of 2^20.

3. Allocate an iova of size 1 within the allocation limit of 2^20.
This will fail and take a very long time, because max32_alloc_size
is reset whenever an iova is released.

The data below is the result of repeating the three steps 1024 times in
a physical machine with a CPU clocked at 2.30GHz

Before improvement:
Tracing 1 functions for "alloc_iova"...
nsecs : count distbution
256 -> 511 : 1594 | |
512 -> 1023 : 1030686 |**************************************|
1024 -> 2047 : 14661 | |
2048 -> 4095 : 1730 | |
4096 -> 8191 : 634 | |
8192 -> 16383 : 20 | |
16384 -> 32767 : 2 | |
32768 -> 65535 : 2 | |
65536 -> 131071 : 3 | |
131072 -> 262143 : 6 | |
262144 -> 524287 : 8 | |
524288 -> 1048575 : 19 | |
1048576 -> 2097151 : 35 | |
2097152 -> 4194303 : 55 | |
4194304 -> 8388607 : 117 | |
8388608 -> 16777215 : 165 | |
16777216 -> 33554431 : 1112 | |
avg = 33867 nsecs, total: 35589643563 nsecs, count: 1050849

With improvement:
Tracing 1 functions for "alloc_iova"...
nsecs : count distribution
512 -> 1023 : 1033561 |****************************************|
1024 -> 2047 : 13631 | |
2048 -> 4095 : 2981 | |
4096 -> 8191 : 448 | |
8192 -> 16383 : 5 | |
16384 -> 32767 : 1 | |
avg = 696 nsecs, total: 732196323 nsecs, count: 1050627

Introduce the improved algorithm:

------------------------------------------------------------------------
| gap1 |iova1| gap2 |iova2| gap3 |iova3| gap4 |iova4| gap5 |anchor|
------------------------------------------------------------------------

let A = allocatable_size
let B = max_allocatable_size
____________
/ iova2 \ B = max( left_child->B,
| A | right_child->B,
\ B / A)
------------
/ \
/ \
____________ ____________
/ iova1 \ / iova4 \
| A | | A |
\ B / \ B /
------------ ------------
/ \
/ \
____________ ____________
/ iova3 \ / anchor \
| A | | A |
\ B / \ B /
------------ ------------

Define the gap of a iova is the gap between the iova and it's previous
iova. Such as the gap of iova3 is gap3.This gap can be used to allocate.

Add three variables to struct iova.
prev_iova:
point to the previous iova, sush as iova3->prev_iova point to
iova2.

allocatable_size:
allocatable_size is the max size can be allocated from a gap.
It is not the length of a gap because the allocated address
may need to be aligned.

max_allocatable_size:
max_allocatable_size is the max allocatable_size of all iova's
gap in the subtree.

max_allocatable_size = max( left_child->max_allocatable_size,
right_child->max_allocatable_size,
allocatable_size)

We can use rbtree_augmented to maintain max_allocatable_size in time
complexity O(log(n)).

In the rbtree, with the max_allocatable_size and allocatable_size,
searching the gap to allocate is fast and the time complexity is
O(log(n)).

Signed-off-by: Peng Zhang <[email protected]>
---
drivers/iommu/iova.c | 265 ++++++++++++++++++++++++++++++++-----------
include/linux/iova.h | 5 +-
2 files changed, 204 insertions(+), 66 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..79625ac82560 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -43,6 +43,56 @@ static struct iova *to_iova(struct rb_node *node)
return rb_entry(node, struct iova, node);
}

+/*
+ * We can't judge whether it can be allocated only by a given interval length
+ * because the address may be aligned.
+ * This function computes the max allocatable size for a given interval.
+ * The time complexity of this function is O(log(n)).
+ */
+static unsigned long __compute_allocatable_size(unsigned long lo,
+ unsigned long hi)
+{
+ unsigned long allocatable_size = 0;
+
+ if (lo == 0)
+ return hi;
+ while (lo < hi) {
+ unsigned long delta = 1UL << __ffs64(lo);
+
+ if (hi - lo <= delta) {
+ allocatable_size = max(allocatable_size, hi - lo);
+ break;
+ }
+ allocatable_size = max(allocatable_size, delta);
+ lo += delta;
+ }
+ return allocatable_size;
+}
+
+static inline unsigned long prev_iova_high(struct iova *iova)
+{
+ return iova->prev_iova ? iova->prev_iova->pfn_hi + 1 : 0;
+}
+
+static inline unsigned long iova_compute_allocatable_size(struct iova *iova)
+{
+ return __compute_allocatable_size(prev_iova_high(iova), iova->pfn_lo);
+}
+
+static inline unsigned long iova_get_allocatable_size(struct iova *iova)
+{
+ return iova->allocatable_size;
+}
+
+RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks, struct iova, node,
+ unsigned long, max_allocatable_size,
+ iova_get_allocatable_size)
+
+static inline void iova_max_allocatable_size_update(struct iova *iova)
+{
+ iova_gap_callbacks_propagate(&iova->node, NULL);
+}
+
void
init_iova_domain(struct iova_domain *iovad, unsigned long granule,
unsigned long start_pfn)
@@ -63,8 +113,16 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
iovad->max32_alloc_size = iovad->dma_32bit_pfn;
iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
+ iovad->anchor.prev_iova = NULL;
+ iovad->anchor.allocatable_size =
+ __compute_allocatable_size(0, IOVA_ANCHOR);
+ iovad->anchor.max_allocatable_size = iovad->anchor.allocatable_size;
+
rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
+
+ if (start_pfn)
+ reserve_iova(iovad, 0, start_pfn - 1);
}
EXPORT_SYMBOL_GPL(init_iova_domain);

@@ -87,7 +145,8 @@ __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
}

static void
-__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
+__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free,
+ struct rb_node *next)
{
struct iova *cached_iova;

@@ -95,51 +154,32 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
if (free == cached_iova ||
(free->pfn_hi < iovad->dma_32bit_pfn &&
free->pfn_lo >= cached_iova->pfn_lo))
- iovad->cached32_node = rb_next(&free->node);
+ iovad->cached32_node = next;

if (free->pfn_lo < iovad->dma_32bit_pfn)
iovad->max32_alloc_size = iovad->dma_32bit_pfn;

cached_iova = to_iova(iovad->cached_node);
if (free->pfn_lo >= cached_iova->pfn_lo)
- iovad->cached_node = rb_next(&free->node);
+ iovad->cached_node = next;
}

-static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned long limit_pfn)
+static struct rb_node *iova_find_limit(struct iova_domain *iovad,
+ unsigned long limit_pfn)
{
- struct rb_node *node, *next;
- /*
- * Ideally what we'd like to judge here is whether limit_pfn is close
- * enough to the highest-allocated IOVA that starting the allocation
- * walk from the anchor node will be quicker than this initial work to
- * find an exact starting point (especially if that ends up being the
- * anchor node anyway). This is an incredibly crude approximation which
- * only really helps the most likely case, but is at least trivially easy.
- */
- if (limit_pfn > iovad->dma_32bit_pfn)
- return &iovad->anchor.node;
-
- node = iovad->rbroot.rb_node;
- while (to_iova(node)->pfn_hi < limit_pfn)
- node = node->rb_right;
-
-search_left:
- while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= limit_pfn)
- node = node->rb_left;
-
- if (!node->rb_left)
- return node;
-
- next = node->rb_left;
- while (next->rb_right) {
- next = next->rb_right;
- if (to_iova(next)->pfn_lo >= limit_pfn) {
- node = next;
- goto search_left;
- }
- }
+ struct rb_node *curr = iovad->rbroot.rb_node;

- return node;
+ while (curr) {
+ struct iova *iova = to_iova(curr);
+
+ if (limit_pfn - 1 > iova->pfn_hi)
+ curr = curr->rb_right;
+ else if (limit_pfn <= prev_iova_high(iova))
+ curr = curr->rb_left;
+ else
+ break;
+ }
+ return curr;
}

/* Insert the iova into domain rbtree by holding writer lock */
@@ -148,6 +188,7 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
struct rb_node *start)
{
struct rb_node **new, *parent = NULL;
+ struct iova *next_iova;

new = (start) ? &start : &(root->rb_node);
/* Figure out where to put new node */
@@ -166,61 +207,143 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
}
}
/* Add new node and rebalance tree. */
+
rb_link_node(&iova->node, parent, new);
- rb_insert_color(&iova->node, root);
+
+ next_iova = to_iova(rb_next(&iova->node));
+ iova->prev_iova = next_iova->prev_iova;
+ next_iova->prev_iova = iova;
+
+ iova->allocatable_size = iova_compute_allocatable_size(iova);
+ next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
+
+ /*
+ * Do't swap the following two lines, because next_iova is the ancestor
+ * of iova and updating iova first is faster.
+ */
+ iova_max_allocatable_size_update(iova);
+ iova_max_allocatable_size_update(next_iova);
+
+ rb_insert_augmented(&iova->node, root, &iova_gap_callbacks);
+}
+
+static inline bool check_interval(unsigned long lo, unsigned long hi,
+ unsigned long limit_pfn, unsigned long size,
+ unsigned long align_mask)
+{
+ hi = min(hi, limit_pfn);
+ if (lo >= hi)
+ return false;
+ if (hi >= size && ((hi - size) & align_mask) >= lo)
+ return true;
+ return false;
}

static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
unsigned long size, unsigned long limit_pfn,
struct iova *new, bool size_aligned)
{
- struct rb_node *curr, *prev;
- struct iova *curr_iova;
unsigned long flags;
- unsigned long new_pfn, retry_pfn;
+ struct rb_node *curr;
+ struct rb_node *parent;
+ struct iova *curr_iova;
unsigned long align_mask = ~0UL;
- unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
+ bool ignore = false;

if (size_aligned)
align_mask <<= fls_long(size - 1);

- /* Walk the tree backwards */
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+
if (limit_pfn <= iovad->dma_32bit_pfn &&
size >= iovad->max32_alloc_size)
goto iova32_full;

curr = __get_cached_rbnode(iovad, limit_pfn);
curr_iova = to_iova(curr);
- retry_pfn = curr_iova->pfn_hi + 1;

-retry:
- do {
- high_pfn = min(high_pfn, curr_iova->pfn_lo);
- new_pfn = (high_pfn - size) & align_mask;
- prev = curr;
- curr = rb_prev(curr);
- curr_iova = to_iova(curr);
- } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
-
- if (high_pfn < size || new_pfn < low_pfn) {
- if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
- high_pfn = limit_pfn;
- low_pfn = retry_pfn;
- curr = iova_find_limit(iovad, limit_pfn);
- curr_iova = to_iova(curr);
- goto retry;
+ if (limit_pfn >= curr_iova->pfn_lo &&
+ curr_iova->allocatable_size >= size)
+ goto found;
+
+ /* If limit_pfn > dma_32bit_pfn, this could be faster. */
+ if (limit_pfn > iovad->dma_32bit_pfn) {
+ curr_iova = to_iova(&iovad->anchor.node);
+
+ while (curr_iova) {
+ if (check_interval(prev_iova_high(curr_iova),
+ curr_iova->pfn_lo, limit_pfn,
+ size, align_mask))
+ goto found;
+ curr_iova = curr_iova->prev_iova;
}
iovad->max32_alloc_size = size;
goto iova32_full;
}

+ curr = iova_find_limit(iovad, limit_pfn);
+ curr_iova = to_iova(curr);
+
+ if (check_interval(prev_iova_high(curr_iova),
+ curr_iova->pfn_lo, limit_pfn,
+ size, align_mask))
+ goto found;
+
+ while (true) {
+ /* Check left subtree */
+ if (!ignore && curr->rb_left) {
+ curr_iova = to_iova(curr->rb_left);
+ if (curr_iova->max_allocatable_size >= size)
+ goto check_subtree;
+ }
+
+ parent = rb_parent(curr);
+ if (parent == NULL)
+ break;
+ /*
+ * If current node is the left child of it's parent,
+ * the parent node and the parent's right sub_tree should not
+ * to be checked because they exceed the limit_pfn.
+ */
+ ignore = parent->rb_left == curr;
+ curr = parent;
+
+ /* Check current node. */
+ if (!ignore) {
+ curr_iova = to_iova(curr);
+ if (curr_iova->allocatable_size >= size)
+ goto found;
+ }
+ }
+ if (limit_pfn >= iovad->dma_32bit_pfn)
+ iovad->max32_alloc_size = size;
+ goto iova32_full;
+
+check_subtree:
+ while (true) {
+ if (curr_iova->allocatable_size >= size)
+ goto found;
+
+ curr = &curr_iova->node;
+ if (curr->rb_right &&
+ to_iova(curr->rb_right)->max_allocatable_size >= size) {
+ curr_iova = to_iova(curr->rb_right);
+ continue;
+ }
+ WARN_ON(curr->rb_left == NULL);
+ curr_iova = to_iova(curr->rb_left);
+ }
+
+found:
/* pfn_lo will point to size aligned address if size_aligned is set */
- new->pfn_lo = new_pfn;
+ new->pfn_lo = (min(curr_iova->pfn_lo, limit_pfn) - size) & align_mask;
new->pfn_hi = new->pfn_lo + size - 1;

- /* If we have 'prev', it's a valid place to start the insertion. */
- iova_insert_rbtree(&iovad->rbroot, new, prev);
+ /*
+ * If we have 'prev' or 'next',
+ * it's a valid place to start the insertion.
+ */
+ iova_insert_rbtree(&iovad->rbroot, new, &curr_iova->node);
__cached_rbnode_insert_update(iovad, new);

spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
@@ -352,9 +475,18 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)

static void remove_iova(struct iova_domain *iovad, struct iova *iova)
{
+ struct rb_node *next;
+ struct iova *next_iova;
assert_spin_locked(&iovad->iova_rbtree_lock);
- __cached_rbnode_delete_update(iovad, iova);
- rb_erase(&iova->node, &iovad->rbroot);
+
+ next = rb_next(&iova->node);
+ __cached_rbnode_delete_update(iovad, iova, next);
+
+ next_iova = to_iova(next);
+ next_iova->prev_iova = iova->prev_iova;
+ next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
+ iova_max_allocatable_size_update(next_iova);
+ rb_erase_augmented(&iova->node, &iovad->rbroot, &iova_gap_callbacks);
}

/**
@@ -554,8 +686,11 @@ static void
__adjust_overlap_range(struct iova *iova,
unsigned long *pfn_lo, unsigned long *pfn_hi)
{
- if (*pfn_lo < iova->pfn_lo)
+ if (*pfn_lo < iova->pfn_lo) {
iova->pfn_lo = *pfn_lo;
+ iova->allocatable_size = iova_compute_allocatable_size(iova);
+ iova_max_allocatable_size_update(iova);
+ }
if (*pfn_hi > iova->pfn_hi)
*pfn_lo = iova->pfn_hi + 1;
}
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..feb8121f104d 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -11,7 +11,7 @@

#include <linux/types.h>
#include <linux/kernel.h>
-#include <linux/rbtree.h>
+#include <linux/rbtree_augmented.h>
#include <linux/dma-mapping.h>

/* iova structure */
@@ -19,6 +19,9 @@ struct iova {
struct rb_node node;
unsigned long pfn_hi; /* Highest allocated pfn */
unsigned long pfn_lo; /* Lowest allocated pfn */
+ struct iova *prev_iova;
+ unsigned long allocatable_size;
+ unsigned long max_allocatable_size;
};


--
2.20.1


2022-08-24 12:35:06

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented

Hi,

在 2022/8/24 17:51, Peng Zhang 写道:
> The current algorithm of alloc_iova is to scan all iovas until it finds
> a gap that satisfies the condition to allocate. This can be very slow in
> some scenarios. We can optimize alloc_iova() from time complexity O(n)
What kind of scenarion for example ?
> to O(log(n)).
>
> We can make a test like this:
> Write a module and initialize iova_domain with 4k granule.
> Then using a user-mode program to call the module to allocate iova of
> size 1 2^20 times within the allocation limit of 2^20. This is single
> threaded and the low 4g space is full after 2^20 allocations.
>
> Finally loop the following three steps:
> 1. Randomly releases an iova.
>
> 2. Allocate an iova of size 1 within the allocation limit of 2^20.
>
> 3. Allocate an iova of size 1 within the allocation limit of 2^20.
> This will fail and take a very long time, because max32_alloc_size
> is reset whenever an iova is released.
>
> The data below is the result of repeating the three steps 1024 times in
> a physical machine with a CPU clocked at 2.30GHz
>
> Before improvement:
> Tracing 1 functions for "alloc_iova"...

Out of curiosity, do you have number about "alloc_iova + free_iova" and

"alloc_iova_fast + free_iova_fast" ?

> nsecs : count distbution

s/distbution/distribution ?


Thanks,

Ethan

> 256 -> 511 : 1594 | |
> 512 -> 1023 : 1030686 |**************************************|
> 1024 -> 2047 : 14661 | |
> 2048 -> 4095 : 1730 | |
> 4096 -> 8191 : 634 | |
> 8192 -> 16383 : 20 | |
> 16384 -> 32767 : 2 | |
> 32768 -> 65535 : 2 | |
> 65536 -> 131071 : 3 | |
> 131072 -> 262143 : 6 | |
> 262144 -> 524287 : 8 | |
> 524288 -> 1048575 : 19 | |
> 1048576 -> 2097151 : 35 | |
> 2097152 -> 4194303 : 55 | |
> 4194304 -> 8388607 : 117 | |
> 8388608 -> 16777215 : 165 | |
> 16777216 -> 33554431 : 1112 | |
> avg = 33867 nsecs, total: 35589643563 nsecs, count: 1050849
>
> With improvement:
> Tracing 1 functions for "alloc_iova"...
> nsecs : count distribution
> 512 -> 1023 : 1033561 |****************************************|
> 1024 -> 2047 : 13631 | |
> 2048 -> 4095 : 2981 | |
> 4096 -> 8191 : 448 | |
> 8192 -> 16383 : 5 | |
> 16384 -> 32767 : 1 | |
> avg = 696 nsecs, total: 732196323 nsecs, count: 1050627
>
> Introduce the improved algorithm:
>
> ------------------------------------------------------------------------
> | gap1 |iova1| gap2 |iova2| gap3 |iova3| gap4 |iova4| gap5 |anchor|
> ------------------------------------------------------------------------
>
> let A = allocatable_size
> let B = max_allocatable_size
> ____________
> / iova2 \ B = max( left_child->B,
> | A | right_child->B,
> \ B / A)
> ------------
> / \
> / \
> ____________ ____________
> / iova1 \ / iova4 \
> | A | | A |
> \ B / \ B /
> ------------ ------------
> / \
> / \
> ____________ ____________
> / iova3 \ / anchor \
> | A | | A |
> \ B / \ B /
> ------------ ------------
>
> Define the gap of a iova is the gap between the iova and it's previous
> iova. Such as the gap of iova3 is gap3.This gap can be used to allocate.
>
> Add three variables to struct iova.
> prev_iova:
> point to the previous iova, sush as iova3->prev_iova point to
> iova2.
>
> allocatable_size:
> allocatable_size is the max size can be allocated from a gap.
> It is not the length of a gap because the allocated address
> may need to be aligned.
>
> max_allocatable_size:
> max_allocatable_size is the max allocatable_size of all iova's
> gap in the subtree.
>
> max_allocatable_size = max( left_child->max_allocatable_size,
> right_child->max_allocatable_size,
> allocatable_size)
>
> We can use rbtree_augmented to maintain max_allocatable_size in time
> complexity O(log(n)).
>
> In the rbtree, with the max_allocatable_size and allocatable_size,
> searching the gap to allocate is fast and the time complexity is
> O(log(n)).
>
> Signed-off-by: Peng Zhang <[email protected]>
> ---
> drivers/iommu/iova.c | 265 ++++++++++++++++++++++++++++++++-----------
> include/linux/iova.h | 5 +-
> 2 files changed, 204 insertions(+), 66 deletions(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index db77aa675145..79625ac82560 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -43,6 +43,56 @@ static struct iova *to_iova(struct rb_node *node)
> return rb_entry(node, struct iova, node);
> }
>
> +/*
> + * We can't judge whether it can be allocated only by a given interval length
> + * because the address may be aligned.
> + * This function computes the max allocatable size for a given interval.
> + * The time complexity of this function is O(log(n)).
> + */
> +static unsigned long __compute_allocatable_size(unsigned long lo,
> + unsigned long hi)
> +{
> + unsigned long allocatable_size = 0;
> +
> + if (lo == 0)
> + return hi;
> + while (lo < hi) {
> + unsigned long delta = 1UL << __ffs64(lo);
> +
> + if (hi - lo <= delta) {
> + allocatable_size = max(allocatable_size, hi - lo);
> + break;
> + }
> + allocatable_size = max(allocatable_size, delta);
> + lo += delta;
> + }
> + return allocatable_size;
> +}
> +
> +static inline unsigned long prev_iova_high(struct iova *iova)
> +{
> + return iova->prev_iova ? iova->prev_iova->pfn_hi + 1 : 0;
> +}
> +
> +static inline unsigned long iova_compute_allocatable_size(struct iova *iova)
> +{
> + return __compute_allocatable_size(prev_iova_high(iova), iova->pfn_lo);
> +}
> +
> +static inline unsigned long iova_get_allocatable_size(struct iova *iova)
> +{
> + return iova->allocatable_size;
> +}
> +
> +RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks, struct iova, node,
> + unsigned long, max_allocatable_size,
> + iova_get_allocatable_size)
> +
> +static inline void iova_max_allocatable_size_update(struct iova *iova)
> +{
> + iova_gap_callbacks_propagate(&iova->node, NULL);
> +}
> +
> void
> init_iova_domain(struct iova_domain *iovad, unsigned long granule,
> unsigned long start_pfn)
> @@ -63,8 +113,16 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
> iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
> iovad->max32_alloc_size = iovad->dma_32bit_pfn;
> iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
> + iovad->anchor.prev_iova = NULL;
> + iovad->anchor.allocatable_size =
> + __compute_allocatable_size(0, IOVA_ANCHOR);
> + iovad->anchor.max_allocatable_size = iovad->anchor.allocatable_size;
> +
> rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
> rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
> +
> + if (start_pfn)
> + reserve_iova(iovad, 0, start_pfn - 1);
> }
> EXPORT_SYMBOL_GPL(init_iova_domain);
>
> @@ -87,7 +145,8 @@ __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
> }
>
> static void
> -__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> +__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free,
> + struct rb_node *next)
> {
> struct iova *cached_iova;
>
> @@ -95,51 +154,32 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> if (free == cached_iova ||
> (free->pfn_hi < iovad->dma_32bit_pfn &&
> free->pfn_lo >= cached_iova->pfn_lo))
> - iovad->cached32_node = rb_next(&free->node);
> + iovad->cached32_node = next;
>
> if (free->pfn_lo < iovad->dma_32bit_pfn)
> iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>
> cached_iova = to_iova(iovad->cached_node);
> if (free->pfn_lo >= cached_iova->pfn_lo)
> - iovad->cached_node = rb_next(&free->node);
> + iovad->cached_node = next;
> }
>
> -static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned long limit_pfn)
> +static struct rb_node *iova_find_limit(struct iova_domain *iovad,
> + unsigned long limit_pfn)
> {
> - struct rb_node *node, *next;
> - /*
> - * Ideally what we'd like to judge here is whether limit_pfn is close
> - * enough to the highest-allocated IOVA that starting the allocation
> - * walk from the anchor node will be quicker than this initial work to
> - * find an exact starting point (especially if that ends up being the
> - * anchor node anyway). This is an incredibly crude approximation which
> - * only really helps the most likely case, but is at least trivially easy.
> - */
> - if (limit_pfn > iovad->dma_32bit_pfn)
> - return &iovad->anchor.node;
> -
> - node = iovad->rbroot.rb_node;
> - while (to_iova(node)->pfn_hi < limit_pfn)
> - node = node->rb_right;
> -
> -search_left:
> - while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= limit_pfn)
> - node = node->rb_left;
> -
> - if (!node->rb_left)
> - return node;
> -
> - next = node->rb_left;
> - while (next->rb_right) {
> - next = next->rb_right;
> - if (to_iova(next)->pfn_lo >= limit_pfn) {
> - node = next;
> - goto search_left;
> - }
> - }
> + struct rb_node *curr = iovad->rbroot.rb_node;
>
> - return node;
> + while (curr) {
> + struct iova *iova = to_iova(curr);
> +
> + if (limit_pfn - 1 > iova->pfn_hi)
> + curr = curr->rb_right;
> + else if (limit_pfn <= prev_iova_high(iova))
> + curr = curr->rb_left;
> + else
> + break;
> + }
> + return curr;
> }
>
> /* Insert the iova into domain rbtree by holding writer lock */
> @@ -148,6 +188,7 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
> struct rb_node *start)
> {
> struct rb_node **new, *parent = NULL;
> + struct iova *next_iova;
>
> new = (start) ? &start : &(root->rb_node);
> /* Figure out where to put new node */
> @@ -166,61 +207,143 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
> }
> }
> /* Add new node and rebalance tree. */
> +
> rb_link_node(&iova->node, parent, new);
> - rb_insert_color(&iova->node, root);
> +
> + next_iova = to_iova(rb_next(&iova->node));
> + iova->prev_iova = next_iova->prev_iova;
> + next_iova->prev_iova = iova;
> +
> + iova->allocatable_size = iova_compute_allocatable_size(iova);
> + next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
> +
> + /*
> + * Do't swap the following two lines, because next_iova is the ancestor
> + * of iova and updating iova first is faster.
> + */
> + iova_max_allocatable_size_update(iova);
> + iova_max_allocatable_size_update(next_iova);
> +
> + rb_insert_augmented(&iova->node, root, &iova_gap_callbacks);
> +}
> +
> +static inline bool check_interval(unsigned long lo, unsigned long hi,
> + unsigned long limit_pfn, unsigned long size,
> + unsigned long align_mask)
> +{
> + hi = min(hi, limit_pfn);
> + if (lo >= hi)
> + return false;
> + if (hi >= size && ((hi - size) & align_mask) >= lo)
> + return true;
> + return false;
> }
>
> static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
> unsigned long size, unsigned long limit_pfn,
> struct iova *new, bool size_aligned)
> {
> - struct rb_node *curr, *prev;
> - struct iova *curr_iova;
> unsigned long flags;
> - unsigned long new_pfn, retry_pfn;
> + struct rb_node *curr;
> + struct rb_node *parent;
> + struct iova *curr_iova;
> unsigned long align_mask = ~0UL;
> - unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
> + bool ignore = false;
>
> if (size_aligned)
> align_mask <<= fls_long(size - 1);
>
> - /* Walk the tree backwards */
> spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> +
> if (limit_pfn <= iovad->dma_32bit_pfn &&
> size >= iovad->max32_alloc_size)
> goto iova32_full;
>
> curr = __get_cached_rbnode(iovad, limit_pfn);
> curr_iova = to_iova(curr);
> - retry_pfn = curr_iova->pfn_hi + 1;
>
> -retry:
> - do {
> - high_pfn = min(high_pfn, curr_iova->pfn_lo);
> - new_pfn = (high_pfn - size) & align_mask;
> - prev = curr;
> - curr = rb_prev(curr);
> - curr_iova = to_iova(curr);
> - } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
> -
> - if (high_pfn < size || new_pfn < low_pfn) {
> - if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
> - high_pfn = limit_pfn;
> - low_pfn = retry_pfn;
> - curr = iova_find_limit(iovad, limit_pfn);
> - curr_iova = to_iova(curr);
> - goto retry;
> + if (limit_pfn >= curr_iova->pfn_lo &&
> + curr_iova->allocatable_size >= size)
> + goto found;
> +
> + /* If limit_pfn > dma_32bit_pfn, this could be faster. */
> + if (limit_pfn > iovad->dma_32bit_pfn) {
> + curr_iova = to_iova(&iovad->anchor.node);
> +
> + while (curr_iova) {
> + if (check_interval(prev_iova_high(curr_iova),
> + curr_iova->pfn_lo, limit_pfn,
> + size, align_mask))
> + goto found;
> + curr_iova = curr_iova->prev_iova;
> }
> iovad->max32_alloc_size = size;
> goto iova32_full;
> }
>
> + curr = iova_find_limit(iovad, limit_pfn);
> + curr_iova = to_iova(curr);
> +
> + if (check_interval(prev_iova_high(curr_iova),
> + curr_iova->pfn_lo, limit_pfn,
> + size, align_mask))
> + goto found;
> +
> + while (true) {
> + /* Check left subtree */
> + if (!ignore && curr->rb_left) {
> + curr_iova = to_iova(curr->rb_left);
> + if (curr_iova->max_allocatable_size >= size)
> + goto check_subtree;
> + }
> +
> + parent = rb_parent(curr);
> + if (parent == NULL)
> + break;
> + /*
> + * If current node is the left child of it's parent,
> + * the parent node and the parent's right sub_tree should not
> + * to be checked because they exceed the limit_pfn.
> + */
> + ignore = parent->rb_left == curr;
> + curr = parent;
> +
> + /* Check current node. */
> + if (!ignore) {
> + curr_iova = to_iova(curr);
> + if (curr_iova->allocatable_size >= size)
> + goto found;
> + }
> + }
> + if (limit_pfn >= iovad->dma_32bit_pfn)
> + iovad->max32_alloc_size = size;
> + goto iova32_full;
> +
> +check_subtree:
> + while (true) {
> + if (curr_iova->allocatable_size >= size)
> + goto found;
> +
> + curr = &curr_iova->node;
> + if (curr->rb_right &&
> + to_iova(curr->rb_right)->max_allocatable_size >= size) {
> + curr_iova = to_iova(curr->rb_right);
> + continue;
> + }
> + WARN_ON(curr->rb_left == NULL);
> + curr_iova = to_iova(curr->rb_left);
> + }
> +
> +found:
> /* pfn_lo will point to size aligned address if size_aligned is set */
> - new->pfn_lo = new_pfn;
> + new->pfn_lo = (min(curr_iova->pfn_lo, limit_pfn) - size) & align_mask;
> new->pfn_hi = new->pfn_lo + size - 1;
>
> - /* If we have 'prev', it's a valid place to start the insertion. */
> - iova_insert_rbtree(&iovad->rbroot, new, prev);
> + /*
> + * If we have 'prev' or 'next',
> + * it's a valid place to start the insertion.
> + */
> + iova_insert_rbtree(&iovad->rbroot, new, &curr_iova->node);
> __cached_rbnode_insert_update(iovad, new);
>
> spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> @@ -352,9 +475,18 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
>
> static void remove_iova(struct iova_domain *iovad, struct iova *iova)
> {
> + struct rb_node *next;
> + struct iova *next_iova;
> assert_spin_locked(&iovad->iova_rbtree_lock);
> - __cached_rbnode_delete_update(iovad, iova);
> - rb_erase(&iova->node, &iovad->rbroot);
> +
> + next = rb_next(&iova->node);
> + __cached_rbnode_delete_update(iovad, iova, next);
> +
> + next_iova = to_iova(next);
> + next_iova->prev_iova = iova->prev_iova;
> + next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
> + iova_max_allocatable_size_update(next_iova);
> + rb_erase_augmented(&iova->node, &iovad->rbroot, &iova_gap_callbacks);
> }
>
> /**
> @@ -554,8 +686,11 @@ static void
> __adjust_overlap_range(struct iova *iova,
> unsigned long *pfn_lo, unsigned long *pfn_hi)
> {
> - if (*pfn_lo < iova->pfn_lo)
> + if (*pfn_lo < iova->pfn_lo) {
> iova->pfn_lo = *pfn_lo;
> + iova->allocatable_size = iova_compute_allocatable_size(iova);
> + iova_max_allocatable_size_update(iova);
> + }
> if (*pfn_hi > iova->pfn_hi)
> *pfn_lo = iova->pfn_hi + 1;
> }
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 320a70e40233..feb8121f104d 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -11,7 +11,7 @@
>
> #include <linux/types.h>
> #include <linux/kernel.h>
> -#include <linux/rbtree.h>
> +#include <linux/rbtree_augmented.h>
> #include <linux/dma-mapping.h>
>
> /* iova structure */
> @@ -19,6 +19,9 @@ struct iova {
> struct rb_node node;
> unsigned long pfn_hi; /* Highest allocated pfn */
> unsigned long pfn_lo; /* Lowest allocated pfn */
> + struct iova *prev_iova;
> + unsigned long allocatable_size;
> + unsigned long max_allocatable_size;
> };
>
>

--
"firm, enduring, strong, and long-lived"

2022-08-25 08:25:02

by Peng Zhang

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented


Hi,

Here is a real example. The version of kernel is 5.4.56.
Occurs when a lot of iova are not released for a long time.

[Wed May 25 05:27:59 2022] watchdog: BUG: soft lockup - CPU#58 stuck for
23s! [ksoftirqd/58:302]
[Wed May 25 05:27:59 2022] Call Trace:
[Wed May 25 05:27:59 2022] alloc_iova+0xf2/0x140
[Wed May 25 05:27:59 2022] alloc_iova_fast+0x56/0x251
[Wed May 25 05:27:59 2022] dma_ops_alloc_iova.isra.27+0x4b/0x70
[Wed May 25 05:27:59 2022] __map_single.isra.28+0x4a/0x1d0
[Wed May 25 05:27:59 2022] mlx5e_sq_xmit+0x98d/0x12b0 [mlx5_core]
[Wed May 25 05:27:59 2022] ? packet_rcv+0x43/0x460
[Wed May 25 05:27:59 2022] ? dev_hard_start_xmit+0x90/0x1e0
[Wed May 25 05:27:59 2022] ? sch_direct_xmit+0x111/0x320
[Wed May 25 05:27:59 2022] ? __qdisc_run+0x143/0x540
[Wed May 25 05:27:59 2022] ? __dev_queue_xmit+0x6c3/0x8e0
[Wed May 25 05:27:59 2022] ? ip_finish_output2+0x2d5/0x580
[Wed May 25 05:27:59 2022] ? __ip_finish_output+0xe9/0x1b0
[Wed May 25 05:27:59 2022] ? ip_output+0x6c/0xe0
[Wed May 25 05:27:59 2022] ? __ip_finish_output+0x1b0/0x1b0
[Wed May 25 05:27:59 2022] ? __ip_queue_xmit+0x15d/0x420
[Wed May 25 05:27:59 2022] ? __tcp_transmit_skb+0x405/0x600
[Wed May 25 05:27:59 2022] ? tcp_delack_timer_handler+0xb7/0x1b0
[Wed May 25 05:27:59 2022] ? tcp_delack_timer+0x8b/0xa0
[Wed May 25 05:27:59 2022] ? tcp_delack_timer_handler+0x1b0/0x1b0
[Wed May 25 05:27:59 2022] ? call_timer_fn+0x2b/0x120
[Wed May 25 05:27:59 2022] ? run_timer_softirq+0x1a6/0x420
[Wed May 25 05:27:59 2022] ? update_load_avg+0x7e/0x640
[Wed May 25 05:27:59 2022] ? update_curr+0xe1/0x1d0
[Wed May 25 05:27:59 2022] ? __switch_to+0x7a/0x3e0
[Wed May 25 05:27:59 2022] ? __do_softirq+0xda/0x2da
[Wed May 25 05:27:59 2022] ? sort_range+0x20/0x20
[Wed May 25 05:27:59 2022] ? run_ksoftirqd+0x26/0x40
[Wed May 25 05:27:59 2022] ? smpboot_thread_fn+0xb8/0x150
[Wed May 25 05:27:59 2022] ? kthread+0x110/0x130
[Wed May 25 05:27:59 2022] ? kthread_park+0x80/0x80
[Wed May 25 05:27:59 2022] ? ret_from_fork+0x1f/0x30

I did some more tests.

The test is single threaded.
Granule is 4k, limit is 2^20.

When the 1/4 address space is occupied by iova,
Repeat the following two steps:

1. Randomly releases an iova.
2. Allocate an iova of size 1 within the allocation limit of 2^20.

Before improvement:
> Tracing 1 functions for "alloc_iova"... Hit Ctrl-C to end.
> ^C
> nsecs : count distribution
> 0 -> 1 : 0 | |
> 2 -> 3 : 0 | |
> 4 -> 7 : 0 | |
> 8 -> 15 : 0 | |
> 16 -> 31 : 0 | |
> 32 -> 63 : 0 | |
> 64 -> 127 : 0 | |
> 128 -> 255 : 0 | |
> 256 -> 511 : 352 | |
> 512 -> 1023 : 258078 |****************************************|
> 1024 -> 2047 : 3612 | |
> 2048 -> 4095 : 426 | |
> 4096 -> 8191 : 183 | |
> 8192 -> 16383 : 6 | |
> 16384 -> 32767 : 5 | |
> 32768 -> 65535 : 9 | |
> 65536 -> 131071 : 18 | |
> 131072 -> 262143 : 28 | |
> 262144 -> 524287 : 74 | |
> 524288 -> 1048575 : 109 | |
> 1048576 -> 2097151 : 170 | |
> 2097152 -> 4194303 : 100 | |
> 4194304 -> 8388607 : 1 | |
>
> avg = 3110 nsecs, total: 818614399 nsecs, count: 263171
>
> Tracing 1 functions for "remove_iova"... Hit Ctrl-C to end.
> ^C
> nsecs : count distribution
> 0 -> 1 : 0 | |
> 2 -> 3 : 0 | |
> 4 -> 7 : 0 | |
> 8 -> 15 : 0 | |
> 16 -> 31 : 0 | |
> 32 -> 63 : 0 | |
> 64 -> 127 : 0 | |
> 128 -> 255 : 0 | |
> 256 -> 511 : 250651 |****************************************|
> 512 -> 1023 : 12405 |* |
> 1024 -> 2047 : 111 | |
> 2048 -> 4095 : 1 | |
>
> avg = 433 nsecs, total: 114136319 nsecs, count: 263168

With improvement:
> Tracing 1 functions for "alloc_iova"... Hit Ctrl-C to end.
> ^C
> nsecs : count distribution
> 0 -> 1 : 0 | |
> 2 -> 3 : 0 | |
> 4 -> 7 : 0 | |
> 8 -> 15 : 0 | |
> 16 -> 31 : 0 | |
> 32 -> 63 : 0 | |
> 64 -> 127 : 0 | |
> 128 -> 255 : 0 | |
> 256 -> 511 : 0 | |
> 512 -> 1023 : 258975 |****************************************|
> 1024 -> 2047 : 3618 | |
> 2048 -> 4095 : 497 | |
> 4096 -> 8191 : 74 | |
> 8192 -> 16383 : 4 | |
> 16384 -> 32767 : 1 | |
>
> avg = 637 nsecs, total: 167854061 nsecs, count: 263169
>
> Tracing 1 functions for "remove_iova"... Hit Ctrl-C to end.
> ^C
> nsecs : count distribution
> 0 -> 1 : 0 | |
> 2 -> 3 : 0 | |
> 4 -> 7 : 0 | |
> 8 -> 15 : 0 | |
> 16 -> 31 : 0 | |
> 32 -> 63 : 0 | |
> 64 -> 127 : 0 | |
> 128 -> 255 : 0 | |
> 256 -> 511 : 221560 |****************************************|
> 512 -> 1023 : 41427 |******* |
> 1024 -> 2047 : 179 | |
> 2048 -> 4095 : 2 | |
>
> avg = 477 nsecs, total: 125540399 nsecs, count: 263168


> s/distbution/distribution ?
Sorry, it's a typo.

I don't have a test program for "alloc_iova_fast + free_iova_fast"
right now.

Thanks,

Peng

2022-08-26 09:13:44

by Ethan Zhao

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented

Peng,

在 2022/8/25 16:10, Peng Zhang 写道:
>
> Hi,
>
> Here is a real example. The version of kernel is 5.4.56.
> Occurs when a lot of iova are not released for a long time.
>
> [Wed May 25 05:27:59 2022] watchdog: BUG: soft lockup - CPU#58 stuck
> for 23s! [ksoftirqd/58:302]
> [Wed May 25 05:27:59 2022] Call Trace:
> [Wed May 25 05:27:59 2022]  alloc_iova+0xf2/0x140
> [Wed May 25 05:27:59 2022]  alloc_iova_fast+0x56/0x251
The rcache doesn't work at all , the worst case.
> [Wed May 25 05:27:59 2022]  dma_ops_alloc_iova.isra.27+0x4b/0x70
> [Wed May 25 05:27:59 2022]  __map_single.isra.28+0x4a/0x1d0
> [Wed May 25 05:27:59 2022]  mlx5e_sq_xmit+0x98d/0x12b0 [mlx5_core]
> [Wed May 25 05:27:59 2022]  ? packet_rcv+0x43/0x460
> [Wed May 25 05:27:59 2022]  ? dev_hard_start_xmit+0x90/0x1e0
> [Wed May 25 05:27:59 2022]  ? sch_direct_xmit+0x111/0x320
> [Wed May 25 05:27:59 2022]  ? __qdisc_run+0x143/0x540
> [Wed May 25 05:27:59 2022]  ? __dev_queue_xmit+0x6c3/0x8e0
> [Wed May 25 05:27:59 2022]  ? ip_finish_output2+0x2d5/0x580
> [Wed May 25 05:27:59 2022]  ? __ip_finish_output+0xe9/0x1b0
> [Wed May 25 05:27:59 2022]  ? ip_output+0x6c/0xe0
> [Wed May 25 05:27:59 2022]  ? __ip_finish_output+0x1b0/0x1b0
> [Wed May 25 05:27:59 2022]  ? __ip_queue_xmit+0x15d/0x420
> [Wed May 25 05:27:59 2022]  ? __tcp_transmit_skb+0x405/0x600
> [Wed May 25 05:27:59 2022]  ? tcp_delack_timer_handler+0xb7/0x1b0
> [Wed May 25 05:27:59 2022]  ? tcp_delack_timer+0x8b/0xa0
> [Wed May 25 05:27:59 2022]  ? tcp_delack_timer_handler+0x1b0/0x1b0
> [Wed May 25 05:27:59 2022]  ? call_timer_fn+0x2b/0x120
> [Wed May 25 05:27:59 2022]  ? run_timer_softirq+0x1a6/0x420
> [Wed May 25 05:27:59 2022]  ? update_load_avg+0x7e/0x640
> [Wed May 25 05:27:59 2022]  ? update_curr+0xe1/0x1d0
> [Wed May 25 05:27:59 2022]  ? __switch_to+0x7a/0x3e0
> [Wed May 25 05:27:59 2022]  ? __do_softirq+0xda/0x2da
> [Wed May 25 05:27:59 2022]  ? sort_range+0x20/0x20
> [Wed May 25 05:27:59 2022]  ? run_ksoftirqd+0x26/0x40
> [Wed May 25 05:27:59 2022]  ? smpboot_thread_fn+0xb8/0x150
> [Wed May 25 05:27:59 2022]  ? kthread+0x110/0x130
> [Wed May 25 05:27:59 2022]  ? kthread_park+0x80/0x80
> [Wed May 25 05:27:59 2022]  ? ret_from_fork+0x1f/0x30
>
> I did some more tests.
>
> The test is single threaded.
> Granule is 4k, limit is 2^20.
>
> When the 1/4 address space is occupied by iova,
> Repeat the following two steps:
>
> 1. Randomly releases an iova.
> 2. Allocate an iova of size 1 within the allocation limit of 2^20.
>
> Before improvement:
>> Tracing 1 functions for "alloc_iova"... Hit Ctrl-C to end.
>> ^C
>>      nsecs               : count     distribution
>>          0 -> 1          : 0 |                                        |
>>          2 -> 3          : 0 |                                        |
>>          4 -> 7          : 0 |                                        |
>>          8 -> 15         : 0 |                                        |
>>         16 -> 31         : 0 |                                        |
>>         32 -> 63         : 0 |                                        |
>>         64 -> 127        : 0 |                                        |
>>        128 -> 255        : 0 |                                        |
>>        256 -> 511        : 352
>> |                                        |
>>        512 -> 1023       : 258078
>> |****************************************|
>>       1024 -> 2047       : 3612
>> |                                        |
>>       2048 -> 4095       : 426
>> |                                        |
>>       4096 -> 8191       : 183
>> |                                        |
>>       8192 -> 16383      : 6 |                                        |
>>      16384 -> 32767      : 5 |                                        |
>>      32768 -> 65535      : 9 |                                        |
>>      65536 -> 131071     : 18 |                                        |
>>     131072 -> 262143     : 28 |                                        |
>>     262144 -> 524287     : 74 |                                        |
>>     524288 -> 1048575    : 109
>> |                                        |
>>    1048576 -> 2097151    : 170
>> |                                        |
>>    2097152 -> 4194303    : 100
>> |                                        |
>>    4194304 -> 8388607    : 1 |                                        |
>>
>> avg = 3110 nsecs, total: 818614399 nsecs, count: 263171
>>
>> Tracing 1 functions for "remove_iova"... Hit Ctrl-C to end.
>> ^C
>>      nsecs               : count     distribution
>>          0 -> 1          : 0 |                                        |
>>          2 -> 3          : 0 |                                        |
>>          4 -> 7          : 0 |                                        |
>>          8 -> 15         : 0 |                                        |
>>         16 -> 31         : 0 |                                        |
>>         32 -> 63         : 0 |                                        |
>>         64 -> 127        : 0 |                                        |
>>        128 -> 255        : 0 |                                        |
>>        256 -> 511        : 250651
>> |****************************************|
>>        512 -> 1023       : 12405
>> |*                                       |
>>       1024 -> 2047       : 111
>> |                                        |
>>       2048 -> 4095       : 1 |                                        |
>>
>> avg = 433 nsecs, total: 114136319 nsecs, count: 263168
>
> With improvement:
>> Tracing 1 functions for "alloc_iova"... Hit Ctrl-C to end.
>> ^C
>>      nsecs               : count     distribution
>>          0 -> 1          : 0 |                                        |
>>          2 -> 3          : 0 |                                        |
>>          4 -> 7          : 0 |                                        |
>>          8 -> 15         : 0 |                                        |
>>         16 -> 31         : 0 |                                        |
>>         32 -> 63         : 0 |                                        |
>>         64 -> 127        : 0 |                                        |
>>        128 -> 255        : 0 |                                        |
>>        256 -> 511        : 0 |                                        |
>>        512 -> 1023       : 258975
>> |****************************************|
>>       1024 -> 2047       : 3618
>> |                                        |
>>       2048 -> 4095       : 497
>> |                                        |
>>       4096 -> 8191       : 74 |                                        |
>>       8192 -> 16383      : 4 |                                        |
>>      16384 -> 32767      : 1 |                                        |
>>
>> avg = 637 nsecs, total: 167854061 nsecs, count: 263169
>>
>> Tracing 1 functions for "remove_iova"... Hit Ctrl-C to end.
>> ^C
>>      nsecs               : count     distribution
>>          0 -> 1          : 0 |                                        |
>>          2 -> 3          : 0 |                                        |
>>          4 -> 7          : 0 |                                        |
>>          8 -> 15         : 0 |                                        |
>>         16 -> 31         : 0 |                                        |
>>         32 -> 63         : 0 |                                        |
>>         64 -> 127        : 0 |                                        |
>>        128 -> 255        : 0 |                                        |
>>        256 -> 511        : 221560
>> |****************************************|
>>        512 -> 1023       : 41427
>> |*******                                 |
>>       1024 -> 2047       : 179
>> |                                        |
>>       2048 -> 4095       : 2 |                                        |
>>
>> avg = 477 nsecs, total: 125540399 nsecs, count: 263168

Though only 3-4 drivers use alloc_iova() directly, in my understanding

your test has simulated the worst case, rcache doesn't work at all,

"alloc_iova" +“remove_iova” number looks great for worst case.


Reviewed-by: Ethan Zhao <[email protected]>



>
>> s/distbution/distribution ?
> Sorry, it's a typo.
>
> I don't have a test program for "alloc_iova_fast + free_iova_fast"
> right now.
>
> Thanks,
>
> Peng

--
"firm, enduring, strong, and long-lived"

2022-08-26 10:57:58

by Peng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented


> Though only 3-4 drivers use alloc_iova() directly, in my understanding
>
> your test has simulated the worst case, rcache doesn't work at all,
>
> "alloc_iova" +“remove_iova” number looks great for worst case.

There is another case, when the size to allocate greater to 2^5, even if
alloc_iova_fast() is used, alloc_iova() will always be called because
the maximum iova size that rcache supports to allocate is 32.
IOVA_RANGE_CACHE_MAX_SIZE specifies the maximum size.

Thanks,

Peng

2022-09-01 11:34:24

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented

On 26/08/2022 11:28, Peng Zhang wrote:
>
>> Though only 3-4 drivers use alloc_iova() directly, in my understanding
>>
>> your test has simulated the worst case, rcache doesn't work at all,
>>
>> "alloc_iova" +“remove_iova” number looks great for worst case.
>
> There is another case, when the size to allocate greater to 2^5, even if
> alloc_iova_fast() is used, alloc_iova() will always be called because
> the maximum iova size that rcache supports to allocate is 32.
> IOVA_RANGE_CACHE_MAX_SIZE specifies the maximum size.
>


If you really have a performance issue with alloc_iova_fast() ->
alloc_iova() then I suggest that you consider trying to use
dma_opt_mapping_size() to teach the DMA engine driver to not create
requests whose overall size exceeds to the rcache limit.

Thanks,
John

2022-09-02 03:47:15

by Peng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented


> If you really have a performance issue with alloc_iova_fast() ->
> alloc_iova() then I suggest that you consider trying to use
> dma_opt_mapping_size() to teach the DMA engine driver to not create
> requests whose overall size exceeds to the rcache limit.

Yes. But I don't think it essentially solves the problem.
A library for users should run stably ant it shouldn't hold
the spinlock for a long time in some cases. It can even be
said to be a bug.

Like this:
[Wed May 25 05:27:59 2022] watchdog: BUG: soft lockup - CPU#58
stuck for 23s!
[Wed May 25 05:27:59 2022] Call Trace:
[Wed May 25 05:27:59 2022] alloc_iova+0xf2/0x140
[Wed May 25 05:27:59 2022] alloc_iova_fast+0x56/0x251

Now we avoid this problem with iommu=pt, but it didn't solve the problem.

Thanks,
Peng

2022-09-21 01:22:44

by wangjie (L)

[permalink] [raw]
Subject: Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented

This patch seems to solve the performance issues i have. Currently my
nic's rx performance is unstable in large-capacity scenarios. I applied
this patch to 5.19 rc4 and tested 8 times rx performance in these
scenes. Here are test results, "before" row is the result of 5.19 rc4.
"after" row means 5.19 rc4 with this patch, the unit is Mbits/s

1 2 3 4 5 6 7 8
before 55430 76701 84194 77560 88292 90106 87770 77273
after 92770 92767 92792 92764 92742 92696 92781 92756

Obviously, after using this patch, the performance is stable.

On 2022/8/24 17:51, Peng Zhang wrote:
> The current algorithm of alloc_iova is to scan all iovas until it finds
> a gap that satisfies the condition to allocate. This can be very slow in
> some scenarios. We can optimize alloc_iova() from time complexity O(n)
> to O(log(n)).
>
> We can make a test like this:
> Write a module and initialize iova_domain with 4k granule.
> Then using a user-mode program to call the module to allocate iova of
> size 1 2^20 times within the allocation limit of 2^20. This is single
> threaded and the low 4g space is full after 2^20 allocations.
>
> Finally loop the following three steps:
> 1. Randomly releases an iova.
>
> 2. Allocate an iova of size 1 within the allocation limit of 2^20.
>
> 3. Allocate an iova of size 1 within the allocation limit of 2^20.
> This will fail and take a very long time, because max32_alloc_size
> is reset whenever an iova is released.
>
> The data below is the result of repeating the three steps 1024 times in
> a physical machine with a CPU clocked at 2.30GHz
>
> Before improvement:
> Tracing 1 functions for "alloc_iova"...
> nsecs : count distbution
> 256 -> 511 : 1594 | |
> 512 -> 1023 : 1030686 |**************************************|
> 1024 -> 2047 : 14661 | |
> 2048 -> 4095 : 1730 | |
> 4096 -> 8191 : 634 | |
> 8192 -> 16383 : 20 | |
> 16384 -> 32767 : 2 | |
> 32768 -> 65535 : 2 | |
> 65536 -> 131071 : 3 | |
> 131072 -> 262143 : 6 | |
> 262144 -> 524287 : 8 | |
> 524288 -> 1048575 : 19 | |
> 1048576 -> 2097151 : 35 | |
> 2097152 -> 4194303 : 55 | |
> 4194304 -> 8388607 : 117 | |
> 8388608 -> 16777215 : 165 | |
> 16777216 -> 33554431 : 1112 | |
> avg = 33867 nsecs, total: 35589643563 nsecs, count: 1050849
>
> With improvement:
> Tracing 1 functions for "alloc_iova"...
> nsecs : count distribution
> 512 -> 1023 : 1033561 |****************************************|
> 1024 -> 2047 : 13631 | |
> 2048 -> 4095 : 2981 | |
> 4096 -> 8191 : 448 | |
> 8192 -> 16383 : 5 | |
> 16384 -> 32767 : 1 | |
> avg = 696 nsecs, total: 732196323 nsecs, count: 1050627
>
> Introduce the improved algorithm:
>
> ------------------------------------------------------------------------
> | gap1 |iova1| gap2 |iova2| gap3 |iova3| gap4 |iova4| gap5 |anchor|
> ------------------------------------------------------------------------
>
> let A = allocatable_size
> let B = max_allocatable_size
> ____________
> / iova2 \ B = max( left_child->B,
> | A | right_child->B,
> \ B / A)
> ------------
> / \
> / \
> ____________ ____________
> / iova1 \ / iova4 \
> | A | | A |
> \ B / \ B /
> ------------ ------------
> / \
> / \
> ____________ ____________
> / iova3 \ / anchor \
> | A | | A |
> \ B / \ B /
> ------------ ------------
>
> Define the gap of a iova is the gap between the iova and it's previous
> iova. Such as the gap of iova3 is gap3.This gap can be used to allocate.
>
> Add three variables to struct iova.
> prev_iova:
> point to the previous iova, sush as iova3->prev_iova point to
> iova2.
>
> allocatable_size:
> allocatable_size is the max size can be allocated from a gap.
> It is not the length of a gap because the allocated address
> may need to be aligned.
>
> max_allocatable_size:
> max_allocatable_size is the max allocatable_size of all iova's
> gap in the subtree.
>
> max_allocatable_size = max( left_child->max_allocatable_size,
> right_child->max_allocatable_size,
> allocatable_size)
>
> We can use rbtree_augmented to maintain max_allocatable_size in time
> complexity O(log(n)).
>
> In the rbtree, with the max_allocatable_size and allocatable_size,
> searching the gap to allocate is fast and the time complexity is
> O(log(n)).
>
> Signed-off-by: Peng Zhang <[email protected]>
> ---
> drivers/iommu/iova.c | 265 ++++++++++++++++++++++++++++++++-----------
> include/linux/iova.h | 5 +-
> 2 files changed, 204 insertions(+), 66 deletions(-)
>
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index db77aa675145..79625ac82560 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -43,6 +43,56 @@ static struct iova *to_iova(struct rb_node *node)
> return rb_entry(node, struct iova, node);
> }
>
> +/*
> + * We can't judge whether it can be allocated only by a given interval length
> + * because the address may be aligned.
> + * This function computes the max allocatable size for a given interval.
> + * The time complexity of this function is O(log(n)).
> + */
> +static unsigned long __compute_allocatable_size(unsigned long lo,
> + unsigned long hi)
> +{
> + unsigned long allocatable_size = 0;
> +
> + if (lo == 0)
> + return hi;
> + while (lo < hi) {
> + unsigned long delta = 1UL << __ffs64(lo);
> +
> + if (hi - lo <= delta) {
> + allocatable_size = max(allocatable_size, hi - lo);
> + break;
> + }
> + allocatable_size = max(allocatable_size, delta);
> + lo += delta;
> + }
> + return allocatable_size;
> +}
> +
> +static inline unsigned long prev_iova_high(struct iova *iova)
> +{
> + return iova->prev_iova ? iova->prev_iova->pfn_hi + 1 : 0;
> +}
> +
> +static inline unsigned long iova_compute_allocatable_size(struct iova *iova)
> +{
> + return __compute_allocatable_size(prev_iova_high(iova), iova->pfn_lo);
> +}
> +
> +static inline unsigned long iova_get_allocatable_size(struct iova *iova)
> +{
> + return iova->allocatable_size;
> +}
> +
> +RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks, struct iova, node,
> + unsigned long, max_allocatable_size,
> + iova_get_allocatable_size)
> +
> +static inline void iova_max_allocatable_size_update(struct iova *iova)
> +{
> + iova_gap_callbacks_propagate(&iova->node, NULL);
> +}
> +
> void
> init_iova_domain(struct iova_domain *iovad, unsigned long granule,
> unsigned long start_pfn)
> @@ -63,8 +113,16 @@ init_iova_domain(struct iova_domain *iovad, unsigned long granule,
> iovad->dma_32bit_pfn = 1UL << (32 - iova_shift(iovad));
> iovad->max32_alloc_size = iovad->dma_32bit_pfn;
> iovad->anchor.pfn_lo = iovad->anchor.pfn_hi = IOVA_ANCHOR;
> + iovad->anchor.prev_iova = NULL;
> + iovad->anchor.allocatable_size =
> + __compute_allocatable_size(0, IOVA_ANCHOR);
> + iovad->anchor.max_allocatable_size = iovad->anchor.allocatable_size;
> +
> rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node);
> rb_insert_color(&iovad->anchor.node, &iovad->rbroot);
> +
> + if (start_pfn)
> + reserve_iova(iovad, 0, start_pfn - 1);
> }
> EXPORT_SYMBOL_GPL(init_iova_domain);
>
> @@ -87,7 +145,8 @@ __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new)
> }
>
> static void
> -__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> +__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free,
> + struct rb_node *next)
> {
> struct iova *cached_iova;
>
> @@ -95,51 +154,32 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
> if (free == cached_iova ||
> (free->pfn_hi < iovad->dma_32bit_pfn &&
> free->pfn_lo >= cached_iova->pfn_lo))
> - iovad->cached32_node = rb_next(&free->node);
> + iovad->cached32_node = next;
>
> if (free->pfn_lo < iovad->dma_32bit_pfn)
> iovad->max32_alloc_size = iovad->dma_32bit_pfn;
>
> cached_iova = to_iova(iovad->cached_node);
> if (free->pfn_lo >= cached_iova->pfn_lo)
> - iovad->cached_node = rb_next(&free->node);
> + iovad->cached_node = next;
> }
>
> -static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned long limit_pfn)
> +static struct rb_node *iova_find_limit(struct iova_domain *iovad,
> + unsigned long limit_pfn)
> {
> - struct rb_node *node, *next;
> - /*
> - * Ideally what we'd like to judge here is whether limit_pfn is close
> - * enough to the highest-allocated IOVA that starting the allocation
> - * walk from the anchor node will be quicker than this initial work to
> - * find an exact starting point (especially if that ends up being the
> - * anchor node anyway). This is an incredibly crude approximation which
> - * only really helps the most likely case, but is at least trivially easy.
> - */
> - if (limit_pfn > iovad->dma_32bit_pfn)
> - return &iovad->anchor.node;
> -
> - node = iovad->rbroot.rb_node;
> - while (to_iova(node)->pfn_hi < limit_pfn)
> - node = node->rb_right;
> -
> -search_left:
> - while (node->rb_left && to_iova(node->rb_left)->pfn_lo >= limit_pfn)
> - node = node->rb_left;
> -
> - if (!node->rb_left)
> - return node;
> -
> - next = node->rb_left;
> - while (next->rb_right) {
> - next = next->rb_right;
> - if (to_iova(next)->pfn_lo >= limit_pfn) {
> - node = next;
> - goto search_left;
> - }
> - }
> + struct rb_node *curr = iovad->rbroot.rb_node;
>
> - return node;
> + while (curr) {
> + struct iova *iova = to_iova(curr);
> +
> + if (limit_pfn - 1 > iova->pfn_hi)
> + curr = curr->rb_right;
> + else if (limit_pfn <= prev_iova_high(iova))
> + curr = curr->rb_left;
> + else
> + break;
> + }
> + return curr;
> }
>
> /* Insert the iova into domain rbtree by holding writer lock */
> @@ -148,6 +188,7 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
> struct rb_node *start)
> {
> struct rb_node **new, *parent = NULL;
> + struct iova *next_iova;
>
> new = (start) ? &start : &(root->rb_node);
> /* Figure out where to put new node */
> @@ -166,61 +207,143 @@ iova_insert_rbtree(struct rb_root *root, struct iova *iova,
> }
> }
> /* Add new node and rebalance tree. */
> +
> rb_link_node(&iova->node, parent, new);
> - rb_insert_color(&iova->node, root);
> +
> + next_iova = to_iova(rb_next(&iova->node));
> + iova->prev_iova = next_iova->prev_iova;
> + next_iova->prev_iova = iova;
> +
> + iova->allocatable_size = iova_compute_allocatable_size(iova);
> + next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
> +
> + /*
> + * Do't swap the following two lines, because next_iova is the ancestor
> + * of iova and updating iova first is faster.
> + */
> + iova_max_allocatable_size_update(iova);
> + iova_max_allocatable_size_update(next_iova);
> +
> + rb_insert_augmented(&iova->node, root, &iova_gap_callbacks);
> +}
> +
> +static inline bool check_interval(unsigned long lo, unsigned long hi,
> + unsigned long limit_pfn, unsigned long size,
> + unsigned long align_mask)
> +{
> + hi = min(hi, limit_pfn);
> + if (lo >= hi)
> + return false;
> + if (hi >= size && ((hi - size) & align_mask) >= lo)
> + return true;
> + return false;
> }
>
> static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
> unsigned long size, unsigned long limit_pfn,
> struct iova *new, bool size_aligned)
> {
> - struct rb_node *curr, *prev;
> - struct iova *curr_iova;
> unsigned long flags;
> - unsigned long new_pfn, retry_pfn;
> + struct rb_node *curr;
> + struct rb_node *parent;
> + struct iova *curr_iova;
> unsigned long align_mask = ~0UL;
> - unsigned long high_pfn = limit_pfn, low_pfn = iovad->start_pfn;
> + bool ignore = false;
>
> if (size_aligned)
> align_mask <<= fls_long(size - 1);
>
> - /* Walk the tree backwards */
> spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> +
> if (limit_pfn <= iovad->dma_32bit_pfn &&
> size >= iovad->max32_alloc_size)
> goto iova32_full;
>
> curr = __get_cached_rbnode(iovad, limit_pfn);
> curr_iova = to_iova(curr);
> - retry_pfn = curr_iova->pfn_hi + 1;
>
> -retry:
> - do {
> - high_pfn = min(high_pfn, curr_iova->pfn_lo);
> - new_pfn = (high_pfn - size) & align_mask;
> - prev = curr;
> - curr = rb_prev(curr);
> - curr_iova = to_iova(curr);
> - } while (curr && new_pfn <= curr_iova->pfn_hi && new_pfn >= low_pfn);
> -
> - if (high_pfn < size || new_pfn < low_pfn) {
> - if (low_pfn == iovad->start_pfn && retry_pfn < limit_pfn) {
> - high_pfn = limit_pfn;
> - low_pfn = retry_pfn;
> - curr = iova_find_limit(iovad, limit_pfn);
> - curr_iova = to_iova(curr);
> - goto retry;
> + if (limit_pfn >= curr_iova->pfn_lo &&
> + curr_iova->allocatable_size >= size)
> + goto found;
> +
> + /* If limit_pfn > dma_32bit_pfn, this could be faster. */
> + if (limit_pfn > iovad->dma_32bit_pfn) {
> + curr_iova = to_iova(&iovad->anchor.node);
> +
> + while (curr_iova) {
> + if (check_interval(prev_iova_high(curr_iova),
> + curr_iova->pfn_lo, limit_pfn,
> + size, align_mask))
> + goto found;
> + curr_iova = curr_iova->prev_iova;
> }
> iovad->max32_alloc_size = size;
> goto iova32_full;
> }
>
> + curr = iova_find_limit(iovad, limit_pfn);
> + curr_iova = to_iova(curr);
> +
> + if (check_interval(prev_iova_high(curr_iova),
> + curr_iova->pfn_lo, limit_pfn,
> + size, align_mask))
> + goto found;
> +
> + while (true) {
> + /* Check left subtree */
> + if (!ignore && curr->rb_left) {
> + curr_iova = to_iova(curr->rb_left);
> + if (curr_iova->max_allocatable_size >= size)
> + goto check_subtree;
> + }
> +
> + parent = rb_parent(curr);
> + if (parent == NULL)
> + break;
> + /*
> + * If current node is the left child of it's parent,
> + * the parent node and the parent's right sub_tree should not
> + * to be checked because they exceed the limit_pfn.
> + */
> + ignore = parent->rb_left == curr;
> + curr = parent;
> +
> + /* Check current node. */
> + if (!ignore) {
> + curr_iova = to_iova(curr);
> + if (curr_iova->allocatable_size >= size)
> + goto found;
> + }
> + }
> + if (limit_pfn >= iovad->dma_32bit_pfn)
> + iovad->max32_alloc_size = size;
> + goto iova32_full;
> +
> +check_subtree:
> + while (true) {
> + if (curr_iova->allocatable_size >= size)
> + goto found;
> +
> + curr = &curr_iova->node;
> + if (curr->rb_right &&
> + to_iova(curr->rb_right)->max_allocatable_size >= size) {
> + curr_iova = to_iova(curr->rb_right);
> + continue;
> + }
> + WARN_ON(curr->rb_left == NULL);
> + curr_iova = to_iova(curr->rb_left);
> + }
> +
> +found:
> /* pfn_lo will point to size aligned address if size_aligned is set */
> - new->pfn_lo = new_pfn;
> + new->pfn_lo = (min(curr_iova->pfn_lo, limit_pfn) - size) & align_mask;
> new->pfn_hi = new->pfn_lo + size - 1;
>
> - /* If we have 'prev', it's a valid place to start the insertion. */
> - iova_insert_rbtree(&iovad->rbroot, new, prev);
> + /*
> + * If we have 'prev' or 'next',
> + * it's a valid place to start the insertion.
> + */
> + iova_insert_rbtree(&iovad->rbroot, new, &curr_iova->node);
> __cached_rbnode_insert_update(iovad, new);
>
> spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> @@ -352,9 +475,18 @@ private_find_iova(struct iova_domain *iovad, unsigned long pfn)
>
> static void remove_iova(struct iova_domain *iovad, struct iova *iova)
> {
> + struct rb_node *next;
> + struct iova *next_iova;
> assert_spin_locked(&iovad->iova_rbtree_lock);
> - __cached_rbnode_delete_update(iovad, iova);
> - rb_erase(&iova->node, &iovad->rbroot);
> +
> + next = rb_next(&iova->node);
> + __cached_rbnode_delete_update(iovad, iova, next);
> +
> + next_iova = to_iova(next);
> + next_iova->prev_iova = iova->prev_iova;
> + next_iova->allocatable_size = iova_compute_allocatable_size(next_iova);
> + iova_max_allocatable_size_update(next_iova);
> + rb_erase_augmented(&iova->node, &iovad->rbroot, &iova_gap_callbacks);
> }
>
> /**
> @@ -554,8 +686,11 @@ static void
> __adjust_overlap_range(struct iova *iova,
> unsigned long *pfn_lo, unsigned long *pfn_hi)
> {
> - if (*pfn_lo < iova->pfn_lo)
> + if (*pfn_lo < iova->pfn_lo) {
> iova->pfn_lo = *pfn_lo;
> + iova->allocatable_size = iova_compute_allocatable_size(iova);
> + iova_max_allocatable_size_update(iova);
> + }
> if (*pfn_hi > iova->pfn_hi)
> *pfn_lo = iova->pfn_hi + 1;
> }
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 320a70e40233..feb8121f104d 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -11,7 +11,7 @@
>
> #include <linux/types.h>
> #include <linux/kernel.h>
> -#include <linux/rbtree.h>
> +#include <linux/rbtree_augmented.h>
> #include <linux/dma-mapping.h>
>
> /* iova structure */
> @@ -19,6 +19,9 @@ struct iova {
> struct rb_node node;
> unsigned long pfn_hi; /* Highest allocated pfn */
> unsigned long pfn_lo; /* Lowest allocated pfn */
> + struct iova *prev_iova;
> + unsigned long allocatable_size;
> + unsigned long max_allocatable_size;
> };
>
>
>

2022-09-21 04:39:56

by Peng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented

> This patch seems to solve the performance issues i have.  Currently my
> nic's rx performance is unstable in large-capacity scenarios. I applied
> this patch to 5.19 rc4 and tested 8 times rx performance in these
> scenes. Here are test results, "before" row is the result of 5.19 rc4.
> "after" row means 5.19 rc4 with this patch, the unit is Mbits/s
>
>     1    2    3    4    5    6    7    8
> before    55430    76701    84194    77560    88292    90106    87770
> 77273
> after    92770    92767    92792    92764    92742    92696    92781
> 92756
>
> Obviously, after using this patch, the performance is stable.

Thank you for your test. Can you add a "Tested-by" token?
I plan to update the commit message in the next version.

2022-09-21 11:35:14

by wangjie (L)

[permalink] [raw]
Subject: Re: [PATCH v2] iommu/iova: Optimize alloc_iova with rbtree_augmented

Tested-by: Jie Wang <[email protected]>

On 2022/9/21 11:55, Peng Zhang wrote:
>> This patch seems to solve the performance issues i have. Currently my
>> nic's rx performance is unstable in large-capacity scenarios. I
>> applied this patch to 5.19 rc4 and tested 8 times rx performance in
>> these scenes. Here are test results, "before" row is the result of
>> 5.19 rc4. "after" row means 5.19 rc4 with this patch, the unit is Mbits/s
>>
>> 1 2 3 4 5 6 7 8
>> before 55430 76701 84194 77560 88292 90106
>> 87770 77273
>> after 92770 92767 92792 92764 92742 92696
>> 92781 92756
>>
>> Obviously, after using this patch, the performance is stable.
>
> Thank you for your test. Can you add a "Tested-by" token?
> I plan to update the commit message in the next version.
> .
>