2023-11-22 23:51:48

by Kent Overstreet

Subject: [PATCH 0/6] lockdep enhancements for bcachefs

This adds:
- lockdep_set_no_check_recursion(), which makes it possible to enable
  lockdep checking for bcachefs six locks

bcachefs has a cycle detector for deadlock avoidance; there is no lock
ordering for lockdep to check w.r.t. other btree node locks. However, we
do want held btree node locks to be tracked, and lock ordering to be
checked w.r.t. other locks.

- lock_class_is_held(), which we use for some extra locking assertions

We want to assert e.g. that we're not blocking on IO with btree node
locks held.

Kent Overstreet (6):
locking/lockdep: lock_class_is_held()
locking/lockdep: lockdep_set_no_check_recursion()
bcachefs: Assert that btree node locks aren't being leaked
bcachefs: Use lock_class_is_held() for btree locking asserts
bcachefs: Check for btree locks held on transaction init
bcachefs: Switch to lockdep_set_no_check_recursion()

fs/bcachefs/btree_gc.c | 3 +++
fs/bcachefs/btree_iter.c | 2 ++
fs/bcachefs/btree_locking.c | 14 ++++++++---
fs/bcachefs/btree_types.h | 1 +
include/linux/lockdep.h | 10 ++++++++
include/linux/lockdep_types.h | 2 +-
kernel/locking/lockdep.c | 46 +++++++++++++++++++++++++++++++++++
7 files changed, 73 insertions(+), 5 deletions(-)

--
2.42.0


2023-11-22 23:51:53

by Kent Overstreet

Subject: [PATCH 5/6] bcachefs: Check for btree locks held on transaction init

From: Kent Overstreet <[email protected]>

Ideally we would disallow multiple btree_trans being initialized within
the same process - and hopefully we will at some point, since the stack
usage is excessive - but for now there are a couple of places where we
do this:

- transaction commit error path -> journal reclaim -> btree key cache
  flush
- move data path -> do_pending_writes -> write path -> bucket
  allocation (in the near future, when bucket allocation switches to
  using a freespace btree)

To avoid deadlock, the first btree_trans must have been unlocked with
bch2_trans_unlock() before the second btree_trans is used - this patch
adds an assertion to bch2_trans_init() that verifies this has been done
when lockdep is enabled.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bcachefs/btree_iter.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/bcachefs/btree_iter.c b/fs/bcachefs/btree_iter.c
index 1d734e297eb4..a52fd206f822 100644
--- a/fs/bcachefs/btree_iter.c
+++ b/fs/bcachefs/btree_iter.c
@@ -2854,6 +2854,8 @@ struct btree_trans *__bch2_trans_get(struct bch_fs *c, unsigned fn_idx)
struct btree_trans *trans;
struct btree_transaction_stats *s;

+ bch2_assert_btree_nodes_not_locked();
+
trans = bch2_trans_alloc(c);

memset(trans, 0, sizeof(*trans));
--
2.42.0

2023-11-22 23:52:20

by Kent Overstreet

Subject: [PATCH 3/6] bcachefs: Assert that btree node locks aren't being leaked

This asserts (when lockdep is enabled) that btree locks aren't held when
exiting a btree_trans.

Signed-off-by: Kent Overstreet <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bcachefs/btree_gc.c | 3 +++
fs/bcachefs/btree_locking.c | 7 +++++++
fs/bcachefs/btree_types.h | 1 +
3 files changed, 11 insertions(+)

diff --git a/fs/bcachefs/btree_gc.c b/fs/bcachefs/btree_gc.c
index 136df94e9f84..7e5d52f8ffd7 100644
--- a/fs/bcachefs/btree_gc.c
+++ b/fs/bcachefs/btree_gc.c
@@ -1087,6 +1087,9 @@ static int bch2_gc_btrees(struct bch_fs *c, bool initial, bool metadata_only)
unsigned i;
int ret = 0;

+ if (initial)
+ trans->is_initial_gc = true;
+
for (i = 0; i < BTREE_ID_NR; i++)
ids[i] = i;
bubble_sort(ids, BTREE_ID_NR, btree_id_gc_phase_cmp);
diff --git a/fs/bcachefs/btree_locking.c b/fs/bcachefs/btree_locking.c
index 6039278121dc..308c891ad3ca 100644
--- a/fs/bcachefs/btree_locking.c
+++ b/fs/bcachefs/btree_locking.c
@@ -751,6 +751,13 @@ void bch2_trans_unlock(struct btree_trans *trans)

trans_for_each_path(trans, path)
__bch2_btree_path_unlock(trans, path);
+
+ /*
+ * bch2_gc_btree_init_recurse() doesn't use btree iterators for walking
+ * btree nodes, it implements its own walking:
+ */
+ if (!trans->is_initial_gc)
+ bch2_assert_btree_nodes_not_locked();
}

void bch2_trans_unlock_long(struct btree_trans *trans)
diff --git a/fs/bcachefs/btree_types.h b/fs/bcachefs/btree_types.h
index ca7526603d06..2326bceb34f8 100644
--- a/fs/bcachefs/btree_types.h
+++ b/fs/bcachefs/btree_types.h
@@ -399,6 +399,7 @@ struct btree_trans {
bool memory_allocation_failure:1;
bool journal_transaction_names:1;
bool journal_replay_not_finished:1;
+ bool is_initial_gc:1;
bool notrace_relock_fail:1;
bool write_locked:1;
enum bch_errcode restarted:16;
--
2.42.0

2023-11-22 23:52:34

by Kent Overstreet

Subject: [PATCH 6/6] bcachefs: Switch to lockdep_set_no_check_recursion()

Signed-off-by: Kent Overstreet <[email protected]>
---
fs/bcachefs/btree_locking.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/bcachefs/btree_locking.c b/fs/bcachefs/btree_locking.c
index ef84b0423deb..59c57c585a4c 100644
--- a/fs/bcachefs/btree_locking.c
+++ b/fs/bcachefs/btree_locking.c
@@ -10,7 +10,9 @@ void bch2_btree_lock_init(struct btree_bkey_cached_common *b,
enum six_lock_init_flags flags)
{
__six_lock_init(&b->lock, "b->c.lock", &bch2_btree_node_lock_key, flags);
- lockdep_set_novalidate_class(&b->lock);
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ lockdep_set_no_check_recursion(&b->lock.dep_map);
+#endif
}

#ifdef CONFIG_LOCKDEP
--
2.42.0

2023-11-22 23:52:41

by Kent Overstreet

Subject: [PATCH 2/6] locking/lockdep: lockdep_set_no_check_recursion()

This adds a method to tell lockdep not to check lock ordering within a
lock class - but to still check lock ordering w.r.t. other lock types.

This is for bcachefs, where for btree node locks we have our own
deadlock avoidance strategy w.r.t. other btree node locks (cycle
detection), but we still want lockdep to check lock ordering w.r.t.
other lock types.

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 6 ++++++
include/linux/lockdep_types.h | 2 +-
kernel/locking/lockdep.c | 26 ++++++++++++++++++++++++++
3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index fc86557d2a21..1bc7eb7cb548 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -700,4 +700,10 @@ lockdep_rcu_suspicious(const char *file, const int line, const char *s)
}
#endif

+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+void lockdep_set_no_check_recursion(struct lockdep_map *map);
+#else
+static inline void lockdep_set_no_check_recursion(struct lockdep_map *map) {}
+#endif
+
#endif /* __LINUX_LOCKDEP_H */
diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index 2ebc323d345a..aa6bddac2a64 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -137,7 +137,7 @@ struct lock_class {
u8 wait_type_inner;
u8 wait_type_outer;
u8 lock_type;
- /* u8 hole; */
+ u8 no_check_recursion;

#ifdef CONFIG_LOCK_STAT
unsigned long contention_point[LOCKSTAT_POINTS];
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 38924d90da85..50a0a1ea960a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3048,6 +3048,9 @@ check_deadlock(struct task_struct *curr, struct held_lock *next)

class = hlock_class(prev);

+ if (class->no_check_recursion)
+ continue;
+
if (class->cmp_fn &&
class->cmp_fn(prev->instance, next->instance) < 0)
continue;
@@ -3113,6 +3116,10 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
return 2;
}

+ if (hlock_class(prev) == hlock_class(next) &&
+ hlock_class(prev)->no_check_recursion)
+ return 2;
+
if (prev->class_idx == next->class_idx) {
struct lock_class *class = hlock_class(prev);

@@ -6732,3 +6739,22 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
warn_rcu_exit(rcu);
}
EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+void lockdep_set_no_check_recursion(struct lockdep_map *lock)
+{
+ struct lock_class *class = lock->class_cache[0];
+ unsigned long flags;
+
+ raw_local_irq_save(flags);
+ lockdep_recursion_inc();
+
+ if (!class)
+ class = register_lock_class(lock, 0, 0);
+ if (class)
+ class->no_check_recursion = true;
+ lockdep_recursion_finish();
+ raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lockdep_set_no_check_recursion);
+#endif
--
2.42.0

2023-11-24 09:58:43

by Peter Zijlstra

Subject: Re: [PATCH 2/6] locking/lockdep: lockdep_set_no_check_recursion()

On Wed, Nov 22, 2023 at 06:51:09PM -0500, Kent Overstreet wrote:
> This adds a method to tell lockdep not to check lock ordering within a
> lock class - but to still check lock ordering w.r.t. other lock types.
>
> This is for bcachefs, where for btree node locks we have our own
> deadlock avoidance strategy w.r.t. other btree node locks (cycle
> detection), but we still want lockdep to check lock ordering w.r.t.
> other lock types.

So earlier you added custom sort order.

Additionally there is the wound-wait mutexes that also have semantics
similar to what you describe.

Explain why you can't use either your own added feature or the existing
infrastructure to solve this?

2023-11-24 23:29:53

by Kent Overstreet

Subject: Re: [PATCH 2/6] locking/lockdep: lockdep_set_no_check_recursion()

On Fri, Nov 24, 2023 at 10:58:04AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 22, 2023 at 06:51:09PM -0500, Kent Overstreet wrote:
> > This adds a method to tell lockdep not to check lock ordering within a
> > lock class - but to still check lock ordering w.r.t. other lock types.
> >
> > This is for bcachefs, where for btree node locks we have our own
> > deadlock avoidance strategy w.r.t. other btree node locks (cycle
> > detection), but we still want lockdep to check lock ordering w.r.t.
> > other lock types.
>
> So earlier you added custom sort order.

That was never merged? I've been meaning to revisit that, though, and
convert some other things to it; the original patchset just converted
bcache, not bcachefs.

But it's not applicable here, bcachefs doesn't have lock ordering, it
has cycle detection.

> Additionally there is the wound-wait mutexes that also have semantics
> similar to what you describe.

Yeah, we talked about that as well. Wait/wound is similar in principle,
but it aborts much more frequently, since it's just comparing transaction
start times and not doing full cycle detection.