2023-06-09 12:18:01

by Peng Zhang

Subject: [PATCH v2 0/3] Optimize the fast path of mas_store()

Add fast paths for mas_wr_append() and mas_wr_slot_store() respectively.
The newly added fast path of mas_wr_append() is used in fork() and how
much it benefits fork() depends on how many VMAs are duplicated.

Changes since v1:
- Revise comment and commit log. [3/3]
- Add test for mas_wr_modify() fast path. [1/3]

v1: https://lore.kernel.org/lkml/[email protected]/

Peng Zhang (3):
  maple_tree: add test for mas_wr_modify() fast path
  maple_tree: optimize mas_wr_append(), also improve duplicating VMAs
  maple_tree: add a fast path case in mas_wr_slot_store()

 lib/maple_tree.c      | 69 ++++++++++++++++++++++++++++---------------
 lib/test_maple_tree.c | 65 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 111 insertions(+), 23 deletions(-)

--
2.20.1



2023-06-09 12:18:22

by Peng Zhang

Subject: [PATCH v2 2/3] maple_tree: optimize mas_wr_append(), also improve duplicating VMAs

When the new range is completely covered by the original last range and
touches neither of its boundaries, two new entries can be appended to
the end of the node as a fast path. The original last pivot is updated
as the final step, so the two newly appended entries cannot be accessed
before then, which makes this safe in RCU mode as well.
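
To make the ordering concrete, here is a minimal user-space sketch of the
idea. It is not the kernel code: struct toy_leaf, toy_append_middle() and
toy_lookup() are hypothetical names, the model is single-threaded, and it
has no RCU or memory barriers. It only assumes that slot i holds the entry
for the indices above pivot[i - 1] up to and including pivot[i].

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 16

/* Hypothetical toy leaf, loosely modelled on a maple leaf node. */
struct toy_leaf {
	unsigned long pivot[TOY_SLOTS];	/* inclusive upper bound of each slot */
	void *slot[TOY_SLOTS];
	unsigned char end;		/* offset of the last used slot */
	unsigned long min;		/* lowest index covered by the node */
};

/* Walk the pivots the way a reader would; NULL if index is out of range. */
static void *toy_lookup(const struct toy_leaf *leaf, unsigned long index)
{
	unsigned char i;

	for (i = 0; i <= leaf->end; i++)
		if (index <= leaf->pivot[i])
			return leaf->slot[i];
	return NULL;
}

/*
 * Store entry over [index, last] strictly inside the last range.
 * Returns 0 on success, -1 if this fast path does not apply.
 */
static int toy_append_middle(struct toy_leaf *leaf, unsigned long index,
			     unsigned long last, void *entry)
{
	unsigned char end = leaf->end;
	unsigned long r_min = end ? leaf->pivot[end - 1] + 1 : leaf->min;
	unsigned long r_max = leaf->pivot[end];
	void *content = leaf->slot[end];

	assert(leaf->end < TOY_SLOTS);
	if (index > last || index <= r_min || last >= r_max)
		return -1;	/* touches a boundary: not this case */
	if (end + 2 >= TOY_SLOTS)
		return -1;	/* no room for two more entries */

	/* Fill in the two new entries; pivot[end] still stops every lookup. */
	leaf->pivot[end + 2] = r_max;
	leaf->end = end + 2;
	leaf->slot[end + 2] = content;
	leaf->pivot[end + 1] = last;
	leaf->slot[end + 1] = entry;

	/* Final step: lowering the old last pivot exposes the new entries. */
	leaf->pivot[end] = index - 1;
	return 0;
}
```

For a leaf holding [0, 9] and [10, 99], storing [20, 29] leaves the ranges
[0, 9], [10, 19], [20, 29] and [30, 99], with the old content on both sides
of the new entry.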

This is useful for sequential insertion, which is what we do in
dup_mmap(). Enabling BENCH_FORK in test_maple_tree and running only
bench_forking() gives the following run times:

before:              after:
17,874.83 msec       15,738.38 msec

It shows about a 12% performance improvement for duplicating VMAs.
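
For reference, the percentage follows directly from the two timings. A
small sketch of the arithmetic (improvement_percent() is a hypothetical
helper, not part of the patch or of test_maple_tree):

```c
#include <assert.h>

/* Relative reduction in run time, as a percentage of the "before" time. */
static double improvement_percent(double before_msec, double after_msec)
{
	assert(before_msec > 0.0);
	return (before_msec - after_msec) / before_msec * 100.0;
}
```

Calling improvement_percent(17874.83, 15738.38) gives a value a little
under 12, which is where the "about 12%" comes from.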

Signed-off-by: Peng Zhang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
---
lib/maple_tree.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 5ea211c3f186..a96eb646e839 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4269,10 +4269,10 @@ static inline unsigned char mas_wr_new_end(struct ma_wr_state *wr_mas)
  *
  * Return: True if appended, false otherwise
  */
-static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
+static inline bool mas_wr_append(struct ma_wr_state *wr_mas,
+				 unsigned char new_end)
 {
 	unsigned char end = wr_mas->node_end;
-	unsigned char new_end = end + 1;
 	struct ma_state *mas = wr_mas->mas;
 	unsigned char node_pivots = mt_pivots[wr_mas->type];
 
@@ -4284,16 +4284,27 @@ static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
 		ma_set_meta(wr_mas->node, maple_leaf_64, 0, new_end);
 	}
 
-	if (mas->last == wr_mas->r_max) {
-		/* Append to end of range */
-		rcu_assign_pointer(wr_mas->slots[new_end], wr_mas->entry);
-		wr_mas->pivots[end] = mas->index - 1;
-		mas->offset = new_end;
+	if (new_end == wr_mas->node_end + 1) {
+		if (mas->last == wr_mas->r_max) {
+			/* Append to end of range */
+			rcu_assign_pointer(wr_mas->slots[new_end],
+					   wr_mas->entry);
+			wr_mas->pivots[end] = mas->index - 1;
+			mas->offset = new_end;
+		} else {
+			/* Append to start of range */
+			rcu_assign_pointer(wr_mas->slots[new_end],
+					   wr_mas->content);
+			wr_mas->pivots[end] = mas->last;
+			rcu_assign_pointer(wr_mas->slots[end], wr_mas->entry);
+		}
 	} else {
-		/* Append to start of range */
+		/* Append to the range without touching any boundaries. */
 		rcu_assign_pointer(wr_mas->slots[new_end], wr_mas->content);
-		wr_mas->pivots[end] = mas->last;
-		rcu_assign_pointer(wr_mas->slots[end], wr_mas->entry);
+		wr_mas->pivots[end + 1] = mas->last;
+		rcu_assign_pointer(wr_mas->slots[end + 1], wr_mas->entry);
+		wr_mas->pivots[end] = mas->index - 1;
+		mas->offset = end + 1;
 	}
 
 	if (!wr_mas->content || !wr_mas->entry)
@@ -4340,7 +4351,7 @@ static inline void mas_wr_modify(struct ma_wr_state *wr_mas)
 		goto slow_path;
 
 	/* Attempt to append */
-	if (new_end == wr_mas->node_end + 1 && mas_wr_append(wr_mas))
+	if (mas_wr_append(wr_mas, new_end))
 		return;
 
 	if (new_end == wr_mas->node_end && mas_wr_slot_store(wr_mas))
--
2.20.1


2023-06-09 12:18:39

by Peng Zhang

Subject: [PATCH v2 3/3] maple_tree: add a fast path case in mas_wr_slot_store()

When a range is expanded in both directions and only partially
overwrites the previous and next ranges, the number of entries does not
increase, so we can just update the pivots as a fast path. However,
because this updates two pivots, it may be unsafe in RCU mode (even
though it may pass the tests). For now, enable it only in non-RCU mode.
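
As a rough illustration of this case, here is a minimal user-space sketch.
It is not the kernel code: struct toy_leaf and toy_expand_store() are
hypothetical names, and the in_rcu flag is only a stand-in for the
mt_in_rcu() check. The model assumes slot i holds the entry for the indices
above pivot[i - 1] up to and including pivot[i].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 16

/* Hypothetical toy leaf, loosely modelled on a maple leaf node. */
struct toy_leaf {
	unsigned long pivot[TOY_SLOTS];	/* inclusive upper bound of each slot */
	void *slot[TOY_SLOTS];
	unsigned char end;		/* offset of the last used slot */
	unsigned long min;		/* lowest index covered by the node */
	bool in_rcu;			/* stand-in for mt_in_rcu() */
};

/*
 * Store entry over [index, last], where index falls inside the range at
 * offset and last falls inside the range at offset + 2, touching neither
 * outer boundary. The range at offset + 1 is replaced entirely.
 * Returns 0 on success, -1 if this fast path does not apply.
 */
static int toy_expand_store(struct toy_leaf *leaf, unsigned char offset,
			    unsigned long index, unsigned long last,
			    void *entry)
{
	unsigned long r_min;

	assert(leaf->end < TOY_SLOTS);
	if (leaf->in_rcu)
		return -1;	/* two pivot writes: fall back in RCU mode */
	if (offset + 2 > leaf->end)
		return -1;	/* need a previous, middle and next range */

	r_min = offset ? leaf->pivot[offset - 1] + 1 : leaf->min;
	if (index <= r_min || index > leaf->pivot[offset])
		return -1;	/* index must be strictly inside the previous range */
	if (last <= leaf->pivot[offset + 1] || last >= leaf->pivot[offset + 2])
		return -1;	/* last must be strictly inside the next range */

	/* Same number of entries: one slot and two pivots change. */
	leaf->slot[offset + 1] = entry;
	leaf->pivot[offset] = index - 1;
	leaf->pivot[offset + 1] = last;
	return 0;
}
```

For a leaf holding [0, 9], [10, 19] and [20, 29], storing [5, 24] at offset
0 leaves [0, 4], [5, 24] and [25, 29]. The entry count is unchanged and
only the middle slot gets a new value.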

Signed-off-by: Peng Zhang <[email protected]>
---
lib/maple_tree.c | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index a96eb646e839..d3072858c280 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4167,23 +4167,35 @@ static inline bool mas_wr_slot_store(struct ma_wr_state *wr_mas)
 {
 	struct ma_state *mas = wr_mas->mas;
 	unsigned char offset = mas->offset;
+	void __rcu **slots = wr_mas->slots;
 	bool gap = false;
 
-	if (wr_mas->offset_end - offset != 1)
-		return false;
-
-	gap |= !mt_slot_locked(mas->tree, wr_mas->slots, offset);
-	gap |= !mt_slot_locked(mas->tree, wr_mas->slots, offset + 1);
+	gap |= !mt_slot_locked(mas->tree, slots, offset);
+	gap |= !mt_slot_locked(mas->tree, slots, offset + 1);
 
-	if (mas->index == wr_mas->r_min) {
-		/* Overwriting the range and over a part of the next range. */
-		rcu_assign_pointer(wr_mas->slots[offset], wr_mas->entry);
-		wr_mas->pivots[offset] = mas->last;
-	} else {
-		/* Overwriting a part of the range and over the next range */
-		rcu_assign_pointer(wr_mas->slots[offset + 1], wr_mas->entry);
+	if (wr_mas->offset_end - offset == 1) {
+		if (mas->index == wr_mas->r_min) {
+			/* Overwriting the range and a part of the next one */
+			rcu_assign_pointer(slots[offset], wr_mas->entry);
+			wr_mas->pivots[offset] = mas->last;
+		} else {
+			/* Overwriting a part of the range and the next one */
+			rcu_assign_pointer(slots[offset + 1], wr_mas->entry);
+			wr_mas->pivots[offset] = mas->index - 1;
+			mas->offset++; /* Keep mas accurate. */
+		}
+	} else if (!mt_in_rcu(mas->tree)) {
+		/*
+		 * Expand the range, only partially overwriting the previous and
+		 * next ranges
+		 */
+		gap |= !mt_slot_locked(mas->tree, slots, offset + 2);
+		rcu_assign_pointer(slots[offset + 1], wr_mas->entry);
 		wr_mas->pivots[offset] = mas->index - 1;
+		wr_mas->pivots[offset + 1] = mas->last;
 		mas->offset++; /* Keep mas accurate. */
+	} else {
+		return false;
 	}
 
 	trace_ma_write(__func__, mas, 0, wr_mas->entry);
--
2.20.1


2023-06-13 19:04:26

by Liam R. Howlett

Subject: Re: [PATCH v2 0/3] Optimize the fast path of mas_store()

* Peng Zhang <[email protected]> [230609 08:04]:
> Add fast paths for mas_wr_append() and mas_wr_slot_store() respectively.
> The newly added fast path of mas_wr_append() is used in fork() and how
> much it benefits fork() depends on how many VMAs are duplicated.
>
> Changes since v1:
> - Revise comment and commit log. [3/3]
> - Add test for mas_wr_modify() fast path. [1/3]

Thanks for adding the tests. I'm just trying to figure out how to best
address testing this in RCU mode. And by testing it I mean add tests in
RCU that does this and detect the failure by modifying your code, then
change it back and have it pass the test by falling back to node store.
This would need to change tools/testing/radix-tree/maple.c to update the
testing there.

>
> v1: https://lore.kernel.org/lkml/[email protected]/
>
> Peng Zhang (3):
> maple_tree: add test for mas_wr_modify() fast path
> maple_tree: optimize mas_wr_append(), also improve duplicating VMAs
> maple_tree: add a fast path case in mas_wr_slot_store()
>
> lib/maple_tree.c | 69 ++++++++++++++++++++++++++++---------------
> lib/test_maple_tree.c | 65 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 111 insertions(+), 23 deletions(-)
>
> --
> 2.20.1
>

2023-06-14 11:30:40

by Peng Zhang

Subject: Re: [PATCH v2 0/3] Optimize the fast path of mas_store()



On 2023/6/14 01:23, Liam R. Howlett wrote:
> * Peng Zhang <[email protected]> [230609 08:04]:
>> Add fast paths for mas_wr_append() and mas_wr_slot_store() respectively.
>> The newly added fast path of mas_wr_append() is used in fork() and how
>> much it benefits fork() depends on how many VMAs are duplicated.
>>
>> Changes since v1:
>> - Revise comment and commit log. [3/3]
>> - Add test for mas_wr_modify() fast path. [1/3]
>
> Thanks for adding the tests. I'm just trying to figure out how to best
> address testing this in RCU mode. And by testing it I mean add tests in
> RCU that does this and detect the failure by modifying your code, then
> change it back and have it pass the test by falling back to node store.
> This would need to change tools/testing/radix-tree/maple.c to update the
> testing there.
I see what you mean now. I will try to add a test for RCU mode.
>
>>
>> v1: https://lore.kernel.org/lkml/[email protected]/
>>
>> Peng Zhang (3):
>> maple_tree: add test for mas_wr_modify() fast path
>> maple_tree: optimize mas_wr_append(), also improve duplicating VMAs
>> maple_tree: add a fast path case in mas_wr_slot_store()
>>
>> lib/maple_tree.c | 69 ++++++++++++++++++++++++++++---------------
>> lib/test_maple_tree.c | 65 ++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 111 insertions(+), 23 deletions(-)
>>
>> --
>> 2.20.1
>>
>