Syzbot reported a BUG_ON in mm/mmap.c which was found to be caused
by an inconsistency between threads walking the VMA maple tree.
The inconsistency is caused by the page fault handler modifying the
maple tree while holding the mmap_lock for read.
This only happens for stack VMAs. We had thought this was safe as it only
modifies a single pivot in the tree. Unfortunately, syzbot constructed
a test case where the stack had no guard page and grew the stack to abut
the next VMA. This causes us to delete the NULL entry between the two
VMAs and rewrite the node.
We considered several options for fixing this, including dropping the
mmap_lock, then reacquiring it for write; and relaxing the definition of
the tree to permit a zero-length NULL entry in the node. We decided the
best option was to backport some of the RCU patches from -next, which
solve the problem by allocating a new node and RCU-freeing the old node.
Since the problem exists in 6.1, we preferred a solution which is
similar to the one we intended to merge next merge window.
These patches have been in -next since next-20230301, and have received
intensive testing in Android as part of the RCU page fault patchset.
They were also sent as part of the "Per-VMA locks" v4 patch series.
Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables
RCU mode for the tree.
Performance v6.3-rc3 vs patched v6.3-rc3:
Running these changes through mmtests showed there was a 15-20%
performance decrease in will-it-scale/brk1-processes. This tests creating
and inserting a single VMA repeatedly through the brk interface and
isn't representative of any real world applications.
Liam R. Howlett (8):
maple_tree: be more cautious about dead nodes
maple_tree: detect dead nodes in mas_start()
maple_tree: fix freeing of nodes in rcu mode
maple_tree: remove extra smp_wmb() from mas_dead_leaves()
maple_tree: fix write memory barrier of nodes once dead for RCU mode
maple_tree: add smp_rmb() to dead node detection
maple_tree: add RCU lock checking to rcu callback functions
mm: enable maple tree RCU mode by default.
include/linux/mm_types.h | 3 +-
kernel/fork.c | 3 +
lib/maple_tree.c | 269 +++++++++++++++++++++----------
mm/mmap.c | 3 +-
tools/testing/radix-tree/maple.c | 16 ++
5 files changed, 207 insertions(+), 87 deletions(-)
--
2.39.2
From: "Liam R. Howlett" <[email protected]>
Dereferencing RCU objects within the RCU callback without the RCU check
has caused lockdep to complain. Fix the RCU dereferencing by using the
RCU callback lock to ensure the operation is safe.
Also stop creating a new lock to use for dereferencing during destruction
of the tree or subtree. Instead, pass through a pointer to the tree that
has the lock that is held for RCU dereferencing checking. It also does
not make sense to use the maple state in the freeing scenario as the tree
walk is a special case where the tree no longer has the normal encodings
and parent pointers.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Cc: [email protected]
Reported-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/maple_tree.c | 188 ++++++++++++++++++++++++-----------------------
1 file changed, 96 insertions(+), 92 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 8ad2d1669fad..2be86368237d 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -824,6 +824,11 @@ static inline void *mt_slot(const struct maple_tree *mt,
return rcu_dereference_check(slots[offset], mt_locked(mt));
}
+static inline void *mt_slot_locked(struct maple_tree *mt, void __rcu **slots,
+ unsigned char offset)
+{
+ return rcu_dereference_protected(slots[offset], mt_locked(mt));
+}
/*
* mas_slot_locked() - Get the slot value when holding the maple tree lock.
* @mas: The maple state
@@ -835,7 +840,7 @@ static inline void *mt_slot(const struct maple_tree *mt,
static inline void *mas_slot_locked(struct ma_state *mas, void __rcu **slots,
unsigned char offset)
{
- return rcu_dereference_protected(slots[offset], mt_locked(mas->tree));
+ return mt_slot_locked(mas->tree, slots, offset);
}
/*
@@ -907,34 +912,35 @@ static inline void ma_set_meta(struct maple_node *mn, enum maple_type mt,
}
/*
- * mas_clear_meta() - clear the metadata information of a node, if it exists
- * @mas: The maple state
+ * mt_clear_meta() - clear the metadata information of a node, if it exists
+ * @mt: The maple tree
* @mn: The maple node
- * @mt: The maple node type
+ * @type: The maple node type
* @offset: The offset of the highest sub-gap in this node.
* @end: The end of the data in this node.
*/
-static inline void mas_clear_meta(struct ma_state *mas, struct maple_node *mn,
- enum maple_type mt)
+static inline void mt_clear_meta(struct maple_tree *mt, struct maple_node *mn,
+ enum maple_type type)
{
struct maple_metadata *meta;
unsigned long *pivots;
void __rcu **slots;
void *next;
- switch (mt) {
+ switch (type) {
case maple_range_64:
pivots = mn->mr64.pivot;
if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
slots = mn->mr64.slot;
- next = mas_slot_locked(mas, slots,
- MAPLE_RANGE64_SLOTS - 1);
- if (unlikely((mte_to_node(next) && mte_node_type(next))))
- return; /* The last slot is a node, no metadata */
+ next = mt_slot_locked(mt, slots,
+ MAPLE_RANGE64_SLOTS - 1);
+ if (unlikely((mte_to_node(next) &&
+ mte_node_type(next))))
+ return; /* no metadata, could be node */
}
fallthrough;
case maple_arange_64:
- meta = ma_meta(mn, mt);
+ meta = ma_meta(mn, type);
break;
default:
return;
@@ -5497,7 +5503,7 @@ static inline int mas_rev_alloc(struct ma_state *mas, unsigned long min,
}
/*
- * mas_dead_leaves() - Mark all leaves of a node as dead.
+ * mte_dead_leaves() - Mark all leaves of a node as dead.
* @mas: The maple state
* @slots: Pointer to the slot array
* @type: The maple node type
@@ -5507,16 +5513,16 @@ static inline int mas_rev_alloc(struct ma_state *mas, unsigned long min,
* Return: The number of leaves marked as dead.
*/
static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
- enum maple_type mt)
+unsigned char mte_dead_leaves(struct maple_enode *enode, struct maple_tree *mt,
+ void __rcu **slots)
{
struct maple_node *node;
enum maple_type type;
void *entry;
int offset;
- for (offset = 0; offset < mt_slots[mt]; offset++) {
- entry = mas_slot_locked(mas, slots, offset);
+ for (offset = 0; offset < mt_slot_count(enode); offset++) {
+ entry = mt_slot(mt, slots, offset);
type = mte_node_type(entry);
node = mte_to_node(entry);
/* Use both node and type to catch LE & BE metadata */
@@ -5531,162 +5537,160 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
return offset;
}
-static void __rcu **mas_dead_walk(struct ma_state *mas, unsigned char offset)
+/**
+ * mte_dead_walk() - Walk down a dead tree to just before the leaves
+ * @enode: The maple encoded node
+ * @offset: The starting offset
+ *
+ * Note: This can only be used from the RCU callback context.
+ */
+static void __rcu **mte_dead_walk(struct maple_enode **enode, unsigned char offset)
{
- struct maple_node *next;
+ struct maple_node *node, *next;
void __rcu **slots = NULL;
- next = mas_mn(mas);
+ next = mte_to_node(*enode);
do {
- mas->node = mt_mk_node(next, next->type);
- slots = ma_slots(next, next->type);
- next = mas_slot_locked(mas, slots, offset);
+ *enode = ma_enode_ptr(next);
+ node = mte_to_node(*enode);
+ slots = ma_slots(node, node->type);
+ next = rcu_dereference_protected(slots[offset],
+ lock_is_held(&rcu_callback_map));
offset = 0;
} while (!ma_is_leaf(next->type));
return slots;
}
+/**
+ * mt_free_walk() - Walk & free a tree in the RCU callback context
+ * @head: The RCU head that's within the node.
+ *
+ * Note: This can only be used from the RCU callback context.
+ */
static void mt_free_walk(struct rcu_head *head)
{
void __rcu **slots;
struct maple_node *node, *start;
- struct maple_tree mt;
+ struct maple_enode *enode;
unsigned char offset;
enum maple_type type;
- MA_STATE(mas, &mt, 0, 0);
node = container_of(head, struct maple_node, rcu);
if (ma_is_leaf(node->type))
goto free_leaf;
- mt_init_flags(&mt, node->ma_flags);
- mas_lock(&mas);
start = node;
- mas.node = mt_mk_node(node, node->type);
- slots = mas_dead_walk(&mas, 0);
- node = mas_mn(&mas);
+ enode = mt_mk_node(node, node->type);
+ slots = mte_dead_walk(&enode, 0);
+ node = mte_to_node(enode);
do {
mt_free_bulk(node->slot_len, slots);
offset = node->parent_slot + 1;
- mas.node = node->piv_parent;
- if (mas_mn(&mas) == node)
- goto start_slots_free;
-
- type = mte_node_type(mas.node);
- slots = ma_slots(mte_to_node(mas.node), type);
- if ((offset < mt_slots[type]) && (slots[offset]))
- slots = mas_dead_walk(&mas, offset);
-
- node = mas_mn(&mas);
+ enode = node->piv_parent;
+ if (mte_to_node(enode) == node)
+ goto free_leaf;
+
+ type = mte_node_type(enode);
+ slots = ma_slots(mte_to_node(enode), type);
+ if ((offset < mt_slots[type]) &&
+ rcu_dereference_protected(slots[offset],
+ lock_is_held(&rcu_callback_map)))
+ slots = mte_dead_walk(&enode, offset);
+ node = mte_to_node(enode);
} while ((node != start) || (node->slot_len < offset));
slots = ma_slots(node, node->type);
mt_free_bulk(node->slot_len, slots);
-start_slots_free:
- mas_unlock(&mas);
free_leaf:
mt_free_rcu(&node->rcu);
}
-static inline void __rcu **mas_destroy_descend(struct ma_state *mas,
- struct maple_enode *prev, unsigned char offset)
+static inline void __rcu **mte_destroy_descend(struct maple_enode **enode,
+ struct maple_tree *mt, struct maple_enode *prev, unsigned char offset)
{
struct maple_node *node;
- struct maple_enode *next = mas->node;
+ struct maple_enode *next = *enode;
void __rcu **slots = NULL;
+ enum maple_type type;
+ unsigned char next_offset = 0;
do {
- mas->node = next;
- node = mas_mn(mas);
- slots = ma_slots(node, mte_node_type(mas->node));
- next = mas_slot_locked(mas, slots, 0);
- if ((mte_dead_node(next))) {
- mte_to_node(next)->type = mte_node_type(next);
- next = mas_slot_locked(mas, slots, 1);
- }
+ *enode = next;
+ node = mte_to_node(*enode);
+ type = mte_node_type(*enode);
+ slots = ma_slots(node, type);
+ next = mt_slot_locked(mt, slots, next_offset);
+ if ((mte_dead_node(next)))
+ next = mt_slot_locked(mt, slots, ++next_offset);
- mte_set_node_dead(mas->node);
- node->type = mte_node_type(mas->node);
- mas_clear_meta(mas, node, node->type);
+ mte_set_node_dead(*enode);
+ node->type = type;
node->piv_parent = prev;
node->parent_slot = offset;
- offset = 0;
- prev = mas->node;
+ offset = next_offset;
+ next_offset = 0;
+ prev = *enode;
} while (!mte_is_leaf(next));
return slots;
}
-static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
+static void mt_destroy_walk(struct maple_enode *enode, struct maple_tree *mt,
bool free)
{
void __rcu **slots;
struct maple_node *node = mte_to_node(enode);
struct maple_enode *start;
- struct maple_tree mt;
-
- MA_STATE(mas, &mt, 0, 0);
- mas.node = enode;
if (mte_is_leaf(enode)) {
node->type = mte_node_type(enode);
goto free_leaf;
}
- ma_flags &= ~MT_FLAGS_LOCK_MASK;
- mt_init_flags(&mt, ma_flags);
- mas_lock(&mas);
-
- mte_to_node(enode)->ma_flags = ma_flags;
start = enode;
- slots = mas_destroy_descend(&mas, start, 0);
- node = mas_mn(&mas);
+ slots = mte_destroy_descend(&enode, mt, start, 0);
+ node = mte_to_node(enode); // Updated in the above call.
do {
enum maple_type type;
unsigned char offset;
struct maple_enode *parent, *tmp;
- node->type = mte_node_type(mas.node);
- node->slot_len = mas_dead_leaves(&mas, slots, node->type);
+ node->slot_len = mte_dead_leaves(enode, mt, slots);
if (free)
mt_free_bulk(node->slot_len, slots);
offset = node->parent_slot + 1;
- mas.node = node->piv_parent;
- if (mas_mn(&mas) == node)
- goto start_slots_free;
+ enode = node->piv_parent;
+ if (mte_to_node(enode) == node)
+ goto free_leaf;
- type = mte_node_type(mas.node);
- slots = ma_slots(mte_to_node(mas.node), type);
+ type = mte_node_type(enode);
+ slots = ma_slots(mte_to_node(enode), type);
if (offset >= mt_slots[type])
goto next;
- tmp = mas_slot_locked(&mas, slots, offset);
+ tmp = mt_slot_locked(mt, slots, offset);
if (mte_node_type(tmp) && mte_to_node(tmp)) {
- parent = mas.node;
- mas.node = tmp;
- slots = mas_destroy_descend(&mas, parent, offset);
+ parent = enode;
+ enode = tmp;
+ slots = mte_destroy_descend(&enode, mt, parent, offset);
}
next:
- node = mas_mn(&mas);
- } while (start != mas.node);
+ node = mte_to_node(enode);
+ } while (start != enode);
- node = mas_mn(&mas);
- node->type = mte_node_type(mas.node);
- node->slot_len = mas_dead_leaves(&mas, slots, node->type);
+ node = mte_to_node(enode);
+ node->slot_len = mte_dead_leaves(enode, mt, slots);
if (free)
mt_free_bulk(node->slot_len, slots);
-start_slots_free:
- mas_unlock(&mas);
-
free_leaf:
if (free)
mt_free_rcu(&node->rcu);
else
- mas_clear_meta(&mas, node, node->type);
+ mt_clear_meta(mt, node, node->type);
}
/*
@@ -5702,10 +5706,10 @@ static inline void mte_destroy_walk(struct maple_enode *enode,
struct maple_node *node = mte_to_node(enode);
if (mt_in_rcu(mt)) {
- mt_destroy_walk(enode, mt->ma_flags, false);
+ mt_destroy_walk(enode, mt, false);
call_rcu(&node->rcu, mt_free_walk);
} else {
- mt_destroy_walk(enode, mt->ma_flags, true);
+ mt_destroy_walk(enode, mt, true);
}
}
--
2.39.2
During the development of the maple tree, the strategy of freeing multiple
nodes changed and, in the process, the pivots were reused to store
pointers to dead nodes. To ensure the readers see accurate pivots, the
writers need to mark the nodes as dead and call smp_wmb() to ensure any
readers can identify the node as dead before using the pivot values.
There were two places where the old method of marking the node as dead
without smp_wmb() were being used, which resulted in RCU readers seeing
the wrong pivot value before seeing the node was dead. Fix this race
condition by using mte_set_node_dead() which has the smp_wmb() call to
ensure the race is closed.
Add a WARN_ON() to the ma_free_rcu() call to ensure all nodes being freed
are marked as dead to ensure there are no other call paths besides the two
updated paths.
This is necessary for the RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <[email protected]>
Signed-off-by: Suren Baghdasaryan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---
lib/maple_tree.c | 7 +++++--
tools/testing/radix-tree/maple.c | 16 ++++++++++++++++
2 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 3d5ab02f981a..6b6eddadd9d2 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -185,7 +185,7 @@ static void mt_free_rcu(struct rcu_head *head)
*/
static void ma_free_rcu(struct maple_node *node)
{
- node->parent = ma_parent_ptr(node);
+ WARN_ON(node->parent != ma_parent_ptr(node));
call_rcu(&node->rcu, mt_free_rcu);
}
@@ -1778,8 +1778,10 @@ static inline void mas_replace(struct ma_state *mas, bool advanced)
rcu_assign_pointer(slots[offset], mas->node);
}
- if (!advanced)
+ if (!advanced) {
+ mte_set_node_dead(old_enode);
mas_free(mas, old_enode);
+ }
}
/*
@@ -4218,6 +4220,7 @@ static inline bool mas_wr_node_store(struct ma_wr_state *wr_mas)
done:
mas_leaf_set_meta(mas, newnode, dst_pivots, maple_leaf_64, new_end);
if (in_rcu) {
+ mte_set_node_dead(mas->node);
mas->node = mt_mk_node(newnode, wr_mas->type);
mas_replace(mas, false);
} else {
diff --git a/tools/testing/radix-tree/maple.c b/tools/testing/radix-tree/maple.c
index 958ee9bdb316..4c89ff333f6f 100644
--- a/tools/testing/radix-tree/maple.c
+++ b/tools/testing/radix-tree/maple.c
@@ -108,6 +108,7 @@ static noinline void check_new_node(struct maple_tree *mt)
MT_BUG_ON(mt, mn->slot[1] != NULL);
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
mas.node = MAS_START;
mas_nomem(&mas, GFP_KERNEL);
@@ -160,6 +161,7 @@ static noinline void check_new_node(struct maple_tree *mt)
MT_BUG_ON(mt, mas_allocated(&mas) != i);
MT_BUG_ON(mt, !mn);
MT_BUG_ON(mt, not_empty(mn));
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
@@ -192,6 +194,7 @@ static noinline void check_new_node(struct maple_tree *mt)
MT_BUG_ON(mt, not_empty(mn));
MT_BUG_ON(mt, mas_allocated(&mas) != i - 1);
MT_BUG_ON(mt, !mn);
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
@@ -210,6 +213,7 @@ static noinline void check_new_node(struct maple_tree *mt)
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, not_empty(mn));
MT_BUG_ON(mt, mas_allocated(&mas) != j - 1);
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -233,6 +237,7 @@ static noinline void check_new_node(struct maple_tree *mt)
MT_BUG_ON(mt, mas_allocated(&mas) != i - j);
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, not_empty(mn));
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
MT_BUG_ON(mt, mas_allocated(&mas) != i - j - 1);
}
@@ -269,6 +274,7 @@ static noinline void check_new_node(struct maple_tree *mt)
mn = mas_pop_node(&mas); /* get the next node. */
MT_BUG_ON(mt, mn == NULL);
MT_BUG_ON(mt, not_empty(mn));
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -294,6 +300,7 @@ static noinline void check_new_node(struct maple_tree *mt)
mn = mas_pop_node(&mas2); /* get the next node. */
MT_BUG_ON(mt, mn == NULL);
MT_BUG_ON(mt, not_empty(mn));
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
MT_BUG_ON(mt, mas_allocated(&mas2) != 0);
@@ -334,10 +341,12 @@ static noinline void check_new_node(struct maple_tree *mt)
MT_BUG_ON(mt, mas_allocated(&mas) != MAPLE_ALLOC_SLOTS + 2);
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, not_empty(mn));
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
for (i = 1; i <= MAPLE_ALLOC_SLOTS + 1; i++) {
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, not_empty(mn));
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
}
MT_BUG_ON(mt, mas_allocated(&mas) != 0);
@@ -375,6 +384,7 @@ static noinline void check_new_node(struct maple_tree *mt)
mas_node_count(&mas, i); /* Request */
mas_nomem(&mas, GFP_KERNEL); /* Fill request */
mn = mas_pop_node(&mas); /* get the next node. */
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
mas_destroy(&mas);
@@ -382,10 +392,13 @@ static noinline void check_new_node(struct maple_tree *mt)
mas_node_count(&mas, i); /* Request */
mas_nomem(&mas, GFP_KERNEL); /* Fill request */
mn = mas_pop_node(&mas); /* get the next node. */
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
mn = mas_pop_node(&mas); /* get the next node. */
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
mn = mas_pop_node(&mas); /* get the next node. */
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
mas_destroy(&mas);
}
@@ -35369,6 +35382,7 @@ static noinline void check_prealloc(struct maple_tree *mt)
MT_BUG_ON(mt, allocated != 1 + height * 3);
mn = mas_pop_node(&mas);
MT_BUG_ON(mt, mas_allocated(&mas) != allocated - 1);
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
MT_BUG_ON(mt, mas_preallocate(&mas, GFP_KERNEL) != 0);
mas_destroy(&mas);
@@ -35386,6 +35400,7 @@ static noinline void check_prealloc(struct maple_tree *mt)
mas_destroy(&mas);
allocated = mas_allocated(&mas);
MT_BUG_ON(mt, allocated != 0);
+ mn->parent = ma_parent_ptr(mn);
ma_free_rcu(mn);
MT_BUG_ON(mt, mas_preallocate(&mas, GFP_KERNEL) != 0);
@@ -35756,6 +35771,7 @@ void farmer_tests(void)
tree.ma_root = mt_mk_node(node, maple_leaf_64);
mt_dump(&tree);
+ node->parent = ma_parent_ptr(node);
ma_free_rcu(node);
/* Check things that will make lockdep angry */
--
2.39.2
From: "Liam R. Howlett" <[email protected]>
The walk to destroy the nodes was not always setting the node type and
would result in a destroy method potentially using the values as nodes.
Avoid this by setting the correct node types. This is necessary for the
RCU mode of the maple tree.
Link: https://lkml.kernel.org/r/[email protected]
Cc: <[email protected]>
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <[email protected]>
---
lib/maple_tree.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 62 insertions(+), 11 deletions(-)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 089cd76ec379..44d6ce30b28e 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -902,6 +902,44 @@ static inline void ma_set_meta(struct maple_node *mn, enum maple_type mt,
meta->end = end;
}
+/*
+ * mas_clear_meta() - clear the metadata information of a node, if it exists
+ * @mas: The maple state
+ * @mn: The maple node
+ * @mt: The maple node type
+ * @offset: The offset of the highest sub-gap in this node.
+ * @end: The end of the data in this node.
+ */
+static inline void mas_clear_meta(struct ma_state *mas, struct maple_node *mn,
+ enum maple_type mt)
+{
+ struct maple_metadata *meta;
+ unsigned long *pivots;
+ void __rcu **slots;
+ void *next;
+
+ switch (mt) {
+ case maple_range_64:
+ pivots = mn->mr64.pivot;
+ if (unlikely(pivots[MAPLE_RANGE64_SLOTS - 2])) {
+ slots = mn->mr64.slot;
+ next = mas_slot_locked(mas, slots,
+ MAPLE_RANGE64_SLOTS - 1);
+ if (unlikely((mte_to_node(next) && mte_node_type(next))))
+ return; /* The last slot is a node, no metadata */
+ }
+ fallthrough;
+ case maple_arange_64:
+ meta = ma_meta(mn, mt);
+ break;
+ default:
+ return;
+ }
+
+ meta->gap = 0;
+ meta->end = 0;
+}
+
/*
* ma_meta_end() - Get the data end of a node from the metadata
* @mn: The maple node
@@ -5455,20 +5493,22 @@ static inline int mas_rev_alloc(struct ma_state *mas, unsigned long min,
* mas_dead_leaves() - Mark all leaves of a node as dead.
* @mas: The maple state
* @slots: Pointer to the slot array
+ * @type: The maple node type
*
* Must hold the write lock.
*
* Return: The number of leaves marked as dead.
*/
static inline
-unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
+unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots,
+ enum maple_type mt)
{
struct maple_node *node;
enum maple_type type;
void *entry;
int offset;
- for (offset = 0; offset < mt_slot_count(mas->node); offset++) {
+ for (offset = 0; offset < mt_slots[mt]; offset++) {
entry = mas_slot_locked(mas, slots, offset);
type = mte_node_type(entry);
node = mte_to_node(entry);
@@ -5487,14 +5527,13 @@ unsigned char mas_dead_leaves(struct ma_state *mas, void __rcu **slots)
static void __rcu **mas_dead_walk(struct ma_state *mas, unsigned char offset)
{
- struct maple_node *node, *next;
+ struct maple_node *next;
void __rcu **slots = NULL;
next = mas_mn(mas);
do {
- mas->node = ma_enode_ptr(next);
- node = mas_mn(mas);
- slots = ma_slots(node, node->type);
+ mas->node = mt_mk_node(next, next->type);
+ slots = ma_slots(next, next->type);
next = mas_slot_locked(mas, slots, offset);
offset = 0;
} while (!ma_is_leaf(next->type));
@@ -5558,11 +5597,14 @@ static inline void __rcu **mas_destroy_descend(struct ma_state *mas,
node = mas_mn(mas);
slots = ma_slots(node, mte_node_type(mas->node));
next = mas_slot_locked(mas, slots, 0);
- if ((mte_dead_node(next)))
+ if ((mte_dead_node(next))) {
+ mte_to_node(next)->type = mte_node_type(next);
next = mas_slot_locked(mas, slots, 1);
+ }
mte_set_node_dead(mas->node);
node->type = mte_node_type(mas->node);
+ mas_clear_meta(mas, node, node->type);
node->piv_parent = prev;
node->parent_slot = offset;
offset = 0;
@@ -5582,13 +5624,18 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
MA_STATE(mas, &mt, 0, 0);
- if (mte_is_leaf(enode))
+ mas.node = enode;
+ if (mte_is_leaf(enode)) {
+ node->type = mte_node_type(enode);
goto free_leaf;
+ }
+ ma_flags &= ~MT_FLAGS_LOCK_MASK;
mt_init_flags(&mt, ma_flags);
mas_lock(&mas);
- mas.node = start = enode;
+ mte_to_node(enode)->ma_flags = ma_flags;
+ start = enode;
slots = mas_destroy_descend(&mas, start, 0);
node = mas_mn(&mas);
do {
@@ -5596,7 +5643,8 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
unsigned char offset;
struct maple_enode *parent, *tmp;
- node->slot_len = mas_dead_leaves(&mas, slots);
+ node->type = mte_node_type(mas.node);
+ node->slot_len = mas_dead_leaves(&mas, slots, node->type);
if (free)
mt_free_bulk(node->slot_len, slots);
offset = node->parent_slot + 1;
@@ -5620,7 +5668,8 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
} while (start != mas.node);
node = mas_mn(&mas);
- node->slot_len = mas_dead_leaves(&mas, slots);
+ node->type = mte_node_type(mas.node);
+ node->slot_len = mas_dead_leaves(&mas, slots, node->type);
if (free)
mt_free_bulk(node->slot_len, slots);
@@ -5630,6 +5679,8 @@ static void mt_destroy_walk(struct maple_enode *enode, unsigned char ma_flags,
free_leaf:
if (free)
mt_free_rcu(&node->rcu);
+ else
+ mas_clear_meta(&mas, node, node->type);
}
/*
--
2.39.2
On Mon, 27 Mar 2023 14:55:24 -0400 "Liam R. Howlett" <[email protected]> wrote:
> These patches have been in -next since next-20230301, and have received
> intensive testing in Android as part of the RCU page fault patchset.
> They were also sent as part of the "Per-VMA locks" v4 patch series.
> Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables
> RCU mode for the tree.
What's happening here? I assume you've decided that the first 8
patches of the "Per-VMA locks v4" series should be fast-tracked into
6.3-rcX and backported? And we retain the rest of that series for
6.4-rc1?
Patch [3/8] hasn't come through to me, to linux-mm or to linux-kernel.
* Andrew Morton <[email protected]> [230327 15:35]:
> On Mon, 27 Mar 2023 14:55:24 -0400 "Liam R. Howlett" <[email protected]> wrote:
>
> > These patches have been in -next since next-20230301, and have received
> > intensive testing in Android as part of the RCU page fault patchset.
> > They were also sent as part of the "Per-VMA locks" v4 patch series.
> > Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables
> > RCU mode for the tree.
>
> What's happening here? I assume you've decided that the first 8
> patches of the "Per-VMA locks v4" series should be fast-tracked into
> 6.3-rcX and backported? And we retain the rest of that series for
> 6.4-rc1?
Yes, they need to be backported and fast tracked to fix the issue syzbot
found.
>
> Patch [3/8] hasn't come through to me, to linux-mm or to linux-kernel.
Should arrive shortly, I received it from one of the ML.
On 3/27/23 21:48, Liam R. Howlett wrote:
> * Andrew Morton <[email protected]> [230327 15:35]:
>> On Mon, 27 Mar 2023 14:55:24 -0400 "Liam R. Howlett" <[email protected]> wrote:
>>
>> > These patches have been in -next since next-20230301, and have received
>> > intensive testing in Android as part of the RCU page fault patchset.
>> > They were also sent as part of the "Per-VMA locks" v4 patch series.
>> > Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables
>> > RCU mode for the tree.
>>
>> What's happening here? I assume you've decided that the first 8
>> patches of the "Per-VMA locks v4" series should be fast-tracked into
>> 6.3-rcX and backported? And we retain the rest of that series for
>> 6.4-rc1?
>
> Yes, they need to be backported and fast tracked to fix the issue syzbot
> found.
Stable usually wants the "mainline first" which means fast tracking first,
then once it's in mainline, they pick it and annotate with mainline commit id.
One question is how Linus would feel about this now for rc5.
Another question is if we should really deviate in the patch 8/8 backport
just because it's not necessary for stable. Generally they would also prefer
not to deviate, unless there's a strong reason.
>>
>> Patch [3/8] hasn't come through to me, to linux-mm or to linux-kernel.
>
> Should arrive shortly, I received it from one of the ML.
>
>
* Vlastimil Babka <[email protected]> [230328 05:11]:
> On 3/27/23 21:48, Liam R. Howlett wrote:
> > * Andrew Morton <[email protected]> [230327 15:35]:
> >> On Mon, 27 Mar 2023 14:55:24 -0400 "Liam R. Howlett" <[email protected]> wrote:
> >>
> >> > These patches have been in -next since next-20230301, and have received
> >> > intensive testing in Android as part of the RCU page fault patchset.
> >> > They were also sent as part of the "Per-VMA locks" v4 patch series.
> >> > Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables
> >> > RCU mode for the tree.
> >>
> >> What's happening here? I assume you've decided that the first 8
> >> patches of the "Per-VMA locks v4" series should be fast-tracked into
> >> 6.3-rcX and backported? And we retain the rest of that series for
> >> 6.4-rc1?
> >
> > Yes, they need to be backported and fast tracked to fix the issue syzbot
> > found.
>
> Stable usually wants the "mainline first" which means fast tracking first,
> then once it's in mainline, they pick it and annotate with mainline commit id.
Right. I meant these patches won't cleanly apply to 6.1/6.2 and will
need more than just a cherry-pick due to the vma iterator changes. I
have those modified patches ready to go as well.
>
> One question is how Linus would feel about this now for rc5.
>
> Another question is if we should really deviate in the patch 8/8 backport
> just because it's not necessary for stable. Generally they would also prefer
> not to deviate, unless there's a strong reason.
Just to clarify, the change is to remove something that isn't necessary
at all.
>
> >>
> >> Patch [3/8] hasn't come through to me, to linux-mm or to linux-kernel.
> >
> > Should arrive shortly, I received it from one of the ML.
> >
> >
>
>
> --
> maple-tree mailing list
> [email protected]
> https://lists.infradead.org/mailman/listinfo/maple-tree
* Liam R. Howlett <[email protected]> [230328 09:02]:
> * Vlastimil Babka <[email protected]> [230328 05:11]:
> > On 3/27/23 21:48, Liam R. Howlett wrote:
> > > * Andrew Morton <[email protected]> [230327 15:35]:
> > >> On Mon, 27 Mar 2023 14:55:24 -0400 "Liam R. Howlett" <[email protected]> wrote:
> > >>
> > >> > These patches have been in -next since next-20230301, and have received
> > >> > intensive testing in Android as part of the RCU page fault patchset.
> > >> > They were also sent as part of the "Per-VMA locks" v4 patch series.
> > >> > Patches 1 to 7 are bug fixes for RCU mode of the tree and patch 8 enables
> > >> > RCU mode for the tree.
> > >>
> > >> What's happening here? I assume you've decided that the first 8
> > >> patches of the "Per-VMA locks v4" series should be fast-tracked into
> > >> 6.3-rcX and backported? And we retain the rest of that series for
> > >> 6.4-rc1?
> > >
> > > Yes, they need to be backported and fast tracked to fix the issue syzbot
> > > found.
> >
> > Stable usually wants the "mainline first" which means fast tracking first,
> > then once it's in mainline, they pick it and annotate with mainline commit id.
>
> Right. I meant these patches won't cleanly apply to 6.1/6.2 and will
> need more than just a cherry-pick due to the vma iterator changes. I
> have those modified patches ready to go as well.
>
> >
> > One question is how Linus would feel about this now for rc5.
> >
> > Another question is if we should really deviate in the patch 8/8 backport
> > just because it's not necessary for stable. Generally they would also prefer
> > not to deviate, unless there's a strong reason.
>
> Just to clarify, the change is to remove something that isn't necessary
> at all.
>
Andrew,
I just wanted to know where we stand with these patches?
I understand that it's late in the cycle, but this is a bug that affects
6.1, 6.2, 6.3-rc5 and can be triggered from userspace.
I'm asking because the LTS 6.1 is starting to be picked up by
distributions, although I don't know the scale of the install, and
getting these upstream will allow for the backported fixes to be picked
up by stable quicker.
Thanks,
Liam
On Mon, 3 Apr 2023 15:44:43 -0400 "Liam R. Howlett" <[email protected]> wrote:
> I just wanted to know where we stand with these patches?
They're in the mm.git hotfixes queue. I'll be asking Linus
to pull them this week.