2023-03-03 02:16:16

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH 1/2] maple_tree: Fix mas_skip_node() end slot detection

mas_skip_node() is used to move the maple state to the node with a
higher limit. It does this by walking up the tree and increasing the
slot count. Since slot count may not be able to be increased, it may
need to walk up multiple times to find room to walk right to a higher
limit node. The limit of slots that was being used was the node limit
and not the last location of data in the node. This would cause the
maple state to be shifted outside actual data and enter an error state,
thus returning -EBUSY.

The result of the incorrect error state means that mas_awalk() would
return an error instead of finding the allocation space.

The fix is to use mas_data_end() in mas_skip_node() to detect the nodes
data end point and continue walking the tree up until it is safe to move
to a node with a higher limit.

mas_skip_node() may also be passed a maple state in an error state from
mas_anode_descend() when no allocations are available. Return on such
an error state immediately.

Reported-by: Snild Dolkow <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Cc: <[email protected]>
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/maple_tree.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 2be86368237d..2efe854946d6 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5188,34 +5188,29 @@ static inline bool mas_rewind_node(struct ma_state *mas)
*/
static inline bool mas_skip_node(struct ma_state *mas)
{
- unsigned char slot, slot_count;
unsigned long *pivots;
enum maple_type mt;

- mt = mte_node_type(mas->node);
- slot_count = mt_slots[mt] - 1;
+ if (mas_is_err(mas))
+ return false;
+
do {
if (mte_is_root(mas->node)) {
- slot = mas->offset;
- if (slot > slot_count) {
+ if (mas->offset >= mas_data_end(mas)) {
mas_set_err(mas, -EBUSY);
return false;
}
} else {
mas_ascend(mas);
- slot = mas->offset;
- mt = mte_node_type(mas->node);
- slot_count = mt_slots[mt] - 1;
}
- } while (slot > slot_count);
+ } while (mas->offset >= mas_data_end(mas));

- mas->offset = ++slot;
+ mt = mte_node_type(mas->node);
pivots = ma_pivots(mas_mn(mas), mt);
- if (slot > 0)
- mas->min = pivots[slot - 1] + 1;
-
- if (slot <= slot_count)
- mas->max = pivots[slot];
+ mas->min = pivots[mas->offset] + 1;
+ mas->offset++;
+ if (mas->offset < mt_slots[mt])
+ mas->max = pivots[mas->offset];

return true;
}
--
2.39.2



2023-03-03 02:16:19

by Liam R. Howlett

[permalink] [raw]
Subject: [PATCH 2/2] test_maple_tree: Add more testing for mas_empty_area()

Test robust filling of an entire area of the tree, then test one beyond.
This is to test the walking back up the tree at the end of nodes and
error condition.

Test inspired by the reproducer code provided by Snild Dolkow.

Cc: Snild Dolkow <[email protected]>
Link: https://lore.kernel.org/linux-mm/[email protected]/
Cc: <[email protected]>
Fixes: e15e06a83923 ("lib/test_maple_tree: add testing for maple tree")
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/test_maple_tree.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)

diff --git a/lib/test_maple_tree.c b/lib/test_maple_tree.c
index 3d19b1f78d71..fa86874763f0 100644
--- a/lib/test_maple_tree.c
+++ b/lib/test_maple_tree.c
@@ -2670,6 +2670,36 @@ static noinline void check_empty_area_window(struct maple_tree *mt)
rcu_read_unlock();
}

+static noinline void check_empty_area_fill(struct maple_tree *mt)
+{
+ int loop, shift;
+ unsigned long max = 0x25D78000;
+ unsigned long size;
+ MA_STATE(mas, mt, 0, 0);
+
+ mt_set_non_kernel(99999);
+ for (shift = 12; shift <= 16; shift++) {
+ loop = 5000;
+ size = 1 << shift;
+ while (loop--) {
+ mas_lock(&mas);
+ MT_BUG_ON(mt, mas_empty_area(&mas, 0, max, size) != 0);
+ MT_BUG_ON(mt, mas.last != mas.index + size - 1);
+ mas_store_gfp(&mas, &check_empty_area_fill, GFP_KERNEL);
+ mas_unlock(&mas);
+ mas_reset(&mas);
+ }
+ }
+
+ /* No space left. */
+ size = 0x1000;
+ rcu_read_lock();
+ MT_BUG_ON(mt, mas_empty_area(&mas, 0, max, size) != -EBUSY);
+ rcu_read_unlock();
+
+ mt_set_non_kernel(0);
+}
+
static DEFINE_MTREE(tree);
static int maple_tree_seed(void)
{
@@ -2926,6 +2956,11 @@ static int maple_tree_seed(void)
check_empty_area_window(&tree);
mtree_destroy(&tree);

+ mt_init_flags(&tree, MT_FLAGS_ALLOC_RANGE);
+ check_empty_area_fill(&tree);
+ mtree_destroy(&tree);
+
+
#if defined(BENCH)
skip:
#endif
--
2.39.2


2023-03-03 10:25:42

by Snild Dolkow

[permalink] [raw]
Subject: Re: [PATCH 1/2] maple_tree: Fix mas_skip_node() end slot detection

Thanks, this is a significant improvement. Applying it on top of v6.1.12
allows my reproducer to pass most of the time (running as init in qemu).

Unfortunately, it's still failing around 10% of the time:
> $ for x in $(seq 100); do qemu-system-x86_64 -nographic -no-reboot -append 'console=ttyS0 panic=-1' -kernel arch/x86/boot/bzImage -initrd initrd/initrd.gz; done | tee qemu.log
> [...] > $ egrep -o 'Failed|Success' qemu.log | sort | uniq -c
> 11 Failed
> 89 Success

The failures now happen later, around 25 MiB:
> $ grep Failed qemu.log
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=1050 errno=12 total_leaks=29081600 (27 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=332 errno=12 total_leaks=23199744 (22 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=838 errno=12 total_leaks=27344896 (26 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=282 errno=12 total_leaks=22790144 (21 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=695 errno=12 total_leaks=26173440 (24 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=1064 errno=12 total_leaks=29196288 (27 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=608 errno=12 total_leaks=25460736 (24 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=443 errno=12 total_leaks=24109056 (22 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=549 errno=12 total_leaks=24977408 (23 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=630 errno=12 total_leaks=25640960 (24 MiB)
> Failed. m=0xffffffffffffffff size=8192 (1<<13) i=820 errno=12 total_leaks=27197440 (25 MiB)


Just to make sure, I went back to e15e06a8 and ran the same loop.
> $ egrep -o 'Failed|Success' qemu.log | sort | uniq -c
> 100 Success

And with the patches applied on top of master (ee3f96b1):
> $ egrep -o 'Failed|Success' qemu.log | sort | uniq -c
> 10 Failed
> 90 Success

//Snild

2023-03-07 13:08:28

by Peng Zhang

[permalink] [raw]
Subject: Re: [PATCH 1/2] maple_tree: Fix mas_skip_node() end slot detection

Hi, Liam,
> mas_skip_node() is used to move the maple state to the node with a
> higher limit. It does this by walking up the tree and increasing the
> slot count. Since slot count may not be able to be increased, it may
> need to walk up multiple times to find room to walk right to a higher
> limit node. The limit of slots that was being used was the node limit
> and not the last location of data in the node. This would cause the
> maple state to be shifted outside actual data and enter an error state,
> thus returning -EBUSY.
>
> The result of the incorrect error state means that mas_awalk() would
> return an error instead of finding the allocation space.
>
> The fix is to use mas_data_end() in mas_skip_node() to detect the nodes
> data end point and continue walking the tree up until it is safe to move
> to a node with a higher limit.
>
> mas_skip_node() may also be passed a maple state in an error state from
> mas_anode_descend() when no allocations are available. Return on such
> an error state immediately.
>
> Reported-by: Snild Dolkow <[email protected]>
> Link: https://lore.kernel.org/linux-mm/[email protected]/
> Cc: <[email protected]>
> Fixes: 54a611b60590 ("Maple Tree: add new data structure")
> Signed-off-by: Liam R. Howlett <[email protected]>
Reviewed-by: Peng Zhang <[email protected]>
> ---
> lib/maple_tree.c | 25 ++++++++++---------------
> 1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 2be86368237d..2efe854946d6 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -5188,34 +5188,29 @@ static inline bool mas_rewind_node(struct ma_state *mas)
> */
> static inline bool mas_skip_node(struct ma_state *mas)
> {
> - unsigned char slot, slot_count;
> unsigned long *pivots;
> enum maple_type mt;
>
> - mt = mte_node_type(mas->node);
> - slot_count = mt_slots[mt] - 1;
> + if (mas_is_err(mas))
> + return false;
> +
> do {
> if (mte_is_root(mas->node)) {
> - slot = mas->offset;
> - if (slot > slot_count) {
> + if (mas->offset >= mas_data_end(mas)) {
> mas_set_err(mas, -EBUSY);
> return false;
> }
> } else {
> mas_ascend(mas);
> - slot = mas->offset;
> - mt = mte_node_type(mas->node);
> - slot_count = mt_slots[mt] - 1;
> }
> - } while (slot > slot_count);
> + } while (mas->offset >= mas_data_end(mas));
>
> - mas->offset = ++slot;
> + mt = mte_node_type(mas->node);
> pivots = ma_pivots(mas_mn(mas), mt);
> - if (slot > 0)
> - mas->min = pivots[slot - 1] + 1;
> -
> - if (slot <= slot_count)
> - mas->max = pivots[slot];
> + mas->min = pivots[mas->offset] + 1;
> + mas->offset++;
> + if (mas->offset < mt_slots[mt])
> + mas->max = pivots[mas->offset];
There is a bug here, the assignment of mas->min and mas->max is wrong.
The assignment will make them represent the range of a child node, but
it should represent the range of the current node. After mas_ascend()
returns, mas-min and mas->max already represent the range of the current
node, so we should delete these assignments of mas->min and mas->max.
>
> return true;
> }

Sincerely yours,
Peng.

2023-03-07 14:35:56

by Snild Dolkow

[permalink] [raw]
Subject: Re: [PATCH 1/2] maple_tree: Fix mas_skip_node() end slot detection

On 2023-03-07 14:05, Peng Zhang wrote:
> Hi, Liam,
>> -    } while (slot > slot_count);
>> +    } while (mas->offset >= mas_data_end(mas));
>> -    mas->offset = ++slot;
>> +    mt = mte_node_type(mas->node);
>>       pivots = ma_pivots(mas_mn(mas), mt);
>> -    if (slot > 0)
>> -        mas->min = pivots[slot - 1] + 1;
>> -
>> -    if (slot <= slot_count)
>> -        mas->max = pivots[slot];
>> +    mas->min = pivots[mas->offset] + 1;
>> +    mas->offset++;
>> +    if (mas->offset < mt_slots[mt])
>> +        mas->max = pivots[mas->offset];
> There is a bug here, the assignment of mas->min and mas->max is wrong.
> The assignment will make them represent the range of a child node, but
> it should represent the range of the current node. After mas_ascend()
> returns, mas-min and mas->max already represent the range of the current
> node, so we should delete these assignments of mas->min and mas->max.


Thanks for your suggestion, Peng. Applying it literally by removing only
the min/max assignments:

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 6fc1ad42b409..9b6e581cf83f 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -5118,10 +5118,7 @@ static inline bool mas_skip_node

mt = mte_node_type(mas->node);
pivots = ma_pivots(mas_mn(mas), mt);
- mas->min = pivots[mas->offset] + 1;
mas->offset++;
- if (mas->offset < mt_slots[mt])
- mas->max = pivots[mas->offset];

return true;
}


This allowed my test to pass 100/100 runs. Still in qemu with the test
as init, so not really stressed in any way except that specific usecase.

//Snild

2023-03-07 16:11:49

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH 1/2] maple_tree: Fix mas_skip_node() end slot detection

* Snild Dolkow <[email protected]> [230307 09:31]:
> On 2023-03-07 14:05, Peng Zhang wrote:
> > Hi, Liam,
> > > -??? } while (slot > slot_count);
> > > +??? } while (mas->offset >= mas_data_end(mas));
> > > -??? mas->offset = ++slot;
> > > +??? mt = mte_node_type(mas->node);
> > > ????? pivots = ma_pivots(mas_mn(mas), mt);
> > > -??? if (slot > 0)
> > > -??????? mas->min = pivots[slot - 1] + 1;
> > > -
> > > -??? if (slot <= slot_count)
> > > -??????? mas->max = pivots[slot];
> > > +??? mas->min = pivots[mas->offset] + 1;
> > > +??? mas->offset++;
> > > +??? if (mas->offset < mt_slots[mt])
> > > +??????? mas->max = pivots[mas->offset];
> > There is a bug here, the assignment of mas->min and mas->max is wrong.
> > The assignment will make them represent the range of a child node, but
> > it should represent the range of the current node. After mas_ascend()
> > returns, mas-min and mas->max already represent the range of the current
> > node, so we should delete these assignments of mas->min and mas->max.
>
>
> Thanks for your suggestion, Peng. Applying it literally by removing only the
> min/max assignments:
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 6fc1ad42b409..9b6e581cf83f 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -5118,10 +5118,7 @@ static inline bool mas_skip_node
>
> mt = mte_node_type(mas->node);
> pivots = ma_pivots(mas_mn(mas), mt);
> - mas->min = pivots[mas->offset] + 1;
> mas->offset++;
> - if (mas->offset < mt_slots[mt])
> - mas->max = pivots[mas->offset];
>
> return true;
> }
>
>
> This allowed my test to pass 100/100 runs. Still in qemu with the test as
> init, so not really stressed in any way except that specific usecase.
>


Yes, I have a v2 that removes these lines and some extra unused lines.
I've been working on a recreation to add to the testing before I send v2
out. I had 100% success running my testing over the weekend, but I
wanted to have a testcase before sending the change.

Thanks,
Liam

2023-03-07 16:17:30

by Peng Zhang

[permalink] [raw]
Subject: Re: [PATCH 1/2] maple_tree: Fix mas_skip_node() end slot detection

Hi, Snild,

在 2023/3/7 22:30, Snild Dolkow 写道:
> On 2023-03-07 14:05, Peng Zhang wrote:
>> Hi, Liam,
>>> -    } while (slot > slot_count);
>>> +    } while (mas->offset >= mas_data_end(mas));
>>> -    mas->offset = ++slot;
>>> +    mt = mte_node_type(mas->node);
>>>       pivots = ma_pivots(mas_mn(mas), mt);
>>> -    if (slot > 0)
>>> -        mas->min = pivots[slot - 1] + 1;
>>> -
>>> -    if (slot <= slot_count)
>>> -        mas->max = pivots[slot];
>>> +    mas->min = pivots[mas->offset] + 1;
>>> +    mas->offset++;
>>> +    if (mas->offset < mt_slots[mt])
>>> +        mas->max = pivots[mas->offset];
>> There is a bug here, the assignment of mas->min and mas->max is wrong.
>> The assignment will make them represent the range of a child node,
>> but it should represent the range of the current node. After
>> mas_ascend() returns, mas-min and mas->max already represent the
>> range of the current node, so we should delete these assignments of
>> mas->min and mas->max.
>
>
> Thanks for your suggestion, Peng. Applying it literally by removing
> only the min/max assignments:
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 6fc1ad42b409..9b6e581cf83f 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -5118,10 +5118,7 @@ static inline bool mas_skip_node
>
>         mt = mte_node_type(mas->node);
>         pivots = ma_pivots(mas_mn(mas), mt);
> -       mas->min = pivots[mas->offset] + 1;
>         mas->offset++;
> -       if (mas->offset < mt_slots[mt])
> -               mas->max = pivots[mas->offset];
>
>         return true;
>  }
>
>
> This allowed my test to pass 100/100 runs. Still in qemu with the test
> as init, so not really stressed in any way except that specific usecase.
>
> //Snild

Thanks for the test, I'm happy if it happens to fix your problem. So a
patch was made.
This patch needs to be applied after Liam's patch.

Sincerely yours,
Peng.




2023-03-07 16:30:42

by Liam R. Howlett

[permalink] [raw]
Subject: Re: [PATCH 1/2] maple_tree: Fix mas_skip_node() end slot detection

* Peng Zhang <[email protected]> [230307 11:21]:
> Sorry I forgot to post the link, here it is:
> https://lore.kernel.org/lkml/[email protected]/


I will roll this into V2 of my patch set. It is exactly what I have,
but I need to come up with a testcase for this first.