2023-01-13 07:24:42

by Boqun Feng

Subject: [PATCH 0/3] Detect SRCU related deadlocks

This is actually a leftover of the recursive read deadlock detection
patchset:

https://lore.kernel.org/lkml/[email protected]/

I have resolved the comments since then and added more test cases, and
hopefully this can fulfill the request from KVM:

https://lore.kernel.org/lkml/[email protected]/

;-)

Patch #3 is still WIP for two reasons:

* It may conflict with Paul's patchset on removing CONFIG_SRCU

* I haven't found a proper way to "reinit" srcu_struct when the
lockdep selftests run: cleanup_srcu_struct() needs a workqueue,
but the tests can run before one is available.

Anyway, these selftests prove the detection actually works. And as
always, feedback and comments are welcome!

Regards,
Boqun

Boqun Feng (3):
locking/lockdep: Introduce lock_sync()
rcu: Equip sleepable RCU with lockdep dependency graph checks
WIP: locking/lockdep: selftests: Add selftests for SRCU

include/linux/lockdep.h | 5 +++
include/linux/srcu.h | 23 +++++++++++--
kernel/locking/lockdep.c | 34 +++++++++++++++++++
kernel/rcu/srcutiny.c | 2 ++
kernel/rcu/srcutree.c | 2 ++
lib/locking-selftest.c | 71 ++++++++++++++++++++++++++++++++++++++++
6 files changed, 135 insertions(+), 2 deletions(-)

--
2.38.1


2023-01-13 07:24:43

by Boqun Feng

Subject: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

Although all flavors of RCU are annotated correctly with lockdep as
recursive read locks, the 'check' parameter of their lock_acquire() is
unset. This means that RCU read locks are not added into the lockdep
dependency graph, so deadlock detection based on the dependency graph
won't catch deadlocks caused by RCU. This is fine for "non-sleepable" RCU
flavors, since wait-context detection and other context-based detection
can catch these deadlocks. However, for sleepable RCU, this is limited.

Actually we can detect the deadlocks caused by SRCU by 1) making
srcu_read_lock() a 'check'ed recursive read lock and 2) making
synchronize_srcu() an empty write lock critical section. Even better,
with the newly introduced lock_sync(), we can avoid false positives
about irq-safe/unsafe. So do it.

Note that NMI-safe SRCU read-side critical sections are currently not
annotated, since a step-by-step approach helps us deal with
false positives. These may be annotated in the future.
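
As an illustration (a sketch only, mirroring the srcu_mutex_ABBA selftest
added in patch #3; the 'm' and 'ss' names are hypothetical), the following
pattern becomes detectable once synchronize_srcu() takes part in the
dependency graph:

	static DEFINE_MUTEX(m);
	DEFINE_STATIC_SRCU(ss);

	/* Task X */
	mutex_lock(&m);
	synchronize_srcu(&ss);	/* waits for all ss readers: dependency m -> ss */
	mutex_unlock(&m);

	/* Task Y */
	int idx = srcu_read_lock(&ss);
	mutex_lock(&m);		/* dependency ss -> m: lockdep sees the cycle */
	mutex_unlock(&m);
	srcu_read_unlock(&ss, idx);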

Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/srcu.h | 23 +++++++++++++++++++++--
kernel/rcu/srcutiny.c | 2 ++
kernel/rcu/srcutree.c | 2 ++
3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 9b9d0bbf1d3c..a1595f8c5155 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
return lock_is_held(&ssp->dep_map);
}

+static inline void srcu_lock_acquire(struct lockdep_map *map)
+{
+ lock_map_acquire_read(map);
+}
+
+static inline void srcu_lock_release(struct lockdep_map *map)
+{
+ lock_map_release(map);
+}
+
+static inline void srcu_lock_sync(struct lockdep_map *map)
+{
+ lock_map_sync(map);
+}
+
#else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */

static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
@@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
return 1;
}

+#define srcu_lock_acquire(m) do { } while (0)
+#define srcu_lock_release(m) do { } while (0)
+#define srcu_lock_sync(m) do { } while (0)
+
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */

#define SRCU_NMI_UNKNOWN 0x0
@@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)

srcu_check_nmi_safety(ssp, false);
retval = __srcu_read_lock(ssp);
- rcu_lock_acquire(&(ssp)->dep_map);
+ srcu_lock_acquire(&(ssp)->dep_map);
return retval;
}

@@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
{
WARN_ON_ONCE(idx & ~0x1);
srcu_check_nmi_safety(ssp, false);
- rcu_lock_release(&(ssp)->dep_map);
+ srcu_lock_release(&(ssp)->dep_map);
__srcu_read_unlock(ssp, idx);
}

diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index b12fb0cec44d..336af24e0fe3 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
{
struct rcu_synchronize rs;

+ srcu_lock_sync(&ssp->dep_map);
+
RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index ca4b5dcec675..408088c73e0e 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
{
struct rcu_synchronize rcu;

+ srcu_lock_sync(&ssp->dep_map);
+
RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
--
2.38.1

2023-01-13 07:24:43

by Boqun Feng

Subject: [PATCH 3/3] WIP: locking/lockdep: selftests: Add selftests for SRCU

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 71 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 8d24279fad05..5fc206a2f9f1 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -60,6 +60,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose);
#define LOCKTYPE_RTMUTEX 0x20
#define LOCKTYPE_LL 0x40
#define LOCKTYPE_SPECIAL 0x80
+#define LOCKTYPE_SRCU 0x100

static struct ww_acquire_ctx t, t2;
static struct ww_mutex o, o2, o3;
@@ -100,6 +101,13 @@ static DEFINE_RT_MUTEX(rtmutex_D);

#endif

+#ifdef CONFIG_SRCU
+static struct lock_class_key srcu_A_key;
+static struct lock_class_key srcu_B_key;
+static struct srcu_struct srcu_A;
+static struct srcu_struct srcu_B;
+#endif
+
/*
* Locks that we initialize dynamically as well so that
* e.g. X1 and X2 becomes two instances of the same class,
@@ -1418,6 +1426,12 @@ static void reset_locks(void)
memset(&ww_lockdep.acquire_key, 0, sizeof(ww_lockdep.acquire_key));
memset(&ww_lockdep.mutex_key, 0, sizeof(ww_lockdep.mutex_key));
local_irq_enable();
+
+#ifdef CONFIG_SRCU
+ __init_srcu_struct(&srcu_A, "srcuA", &srcu_A_key);
+ __init_srcu_struct(&srcu_B, "srcuB", &srcu_B_key);
+#endif
+
}

#undef I
@@ -2360,6 +2374,58 @@ static void ww_tests(void)
pr_cont("\n");
}

+static void srcu_ABBA(void)
+{
+ int ia, ib;
+
+ ia = srcu_read_lock(&srcu_A);
+ synchronize_srcu(&srcu_B);
+ srcu_read_unlock(&srcu_A, ia);
+
+ ib = srcu_read_lock(&srcu_B);
+ synchronize_srcu(&srcu_A);
+ srcu_read_unlock(&srcu_B, ib); // should fail
+}
+
+static void srcu_mutex_ABBA(void)
+{
+ int ia;
+
+ mutex_lock(&mutex_A);
+ synchronize_srcu(&srcu_A);
+ mutex_unlock(&mutex_A);
+
+ ia = srcu_read_lock(&srcu_A);
+ mutex_lock(&mutex_A);
+ mutex_unlock(&mutex_A);
+ srcu_read_unlock(&srcu_A, ia); // should fail
+}
+
+static void srcu_irqsafe(void)
+{
+ int ia;
+
+ HARDIRQ_ENTER();
+ ia = srcu_read_lock(&srcu_A);
+ srcu_read_unlock(&srcu_A, ia);
+ HARDIRQ_EXIT();
+
+ synchronize_srcu(&srcu_A); // should NOT fail
+}
+
+static void srcu_tests(void)
+{
+ printk(" --------------------------------------------------------------------------\n");
+ printk(" | SRCU tests |\n");
+ printk(" ---------------\n");
+ print_testname("ABBA read-sync/read-sync");
+ dotest(srcu_ABBA, FAILURE, LOCKTYPE_SRCU);
+ print_testname("ABBA mutex-sync/read-mutex");
+ dotest(srcu_mutex_ABBA, FAILURE, LOCKTYPE_SRCU);
+ print_testname("Irqsafe synchronize_srcu");
+ dotest(srcu_irqsafe, SUCCESS, LOCKTYPE_SRCU);
+ pr_cont("\n");
+}

/*
* <in hardirq handler>
@@ -2881,6 +2947,10 @@ void locking_selftest(void)
printk(" --------------------------------------------------------------------------\n");

init_shared_classes();
+#ifdef CONFIG_SRCU
+ __init_srcu_struct(&srcu_A, "srcuA", &srcu_A_key);
+ __init_srcu_struct(&srcu_B, "srcuB", &srcu_B_key);
+#endif
lockdep_set_selftest_task(current);

DO_TESTCASE_6R("A-A deadlock", AA);
@@ -2965,6 +3035,7 @@ void locking_selftest(void)
DO_TESTCASE_6x2x2RW("irq read-recursion #3", irq_read_recursion3);

ww_tests();
+ srcu_tests();

force_read_lock_recursive = 0;
/*
--
2.38.1

2023-01-13 07:24:45

by Boqun Feng

Subject: [PATCH 1/3] locking/lockdep: Introduce lock_sync()

Currently, in order to annotate functions like synchronize_srcu() for
lockdep, a trick as follows can be used:

lock_acquire();
lock_release();

, which indicates synchronize_srcu() acts like an empty critical section
that waits for other (read-side) critical sections to finish. This
surely can catch some deadlocks, but as a discussion brought up by Paul
McKenney [1] showed, it could introduce false positives because of
irq-safe/unsafe detection. Extra tricks might help:

local_irq_disable(..);
lock_acquire();
lock_release();
local_irq_enable(...);

But it's better if lockdep provides an annotation for
synchronize_srcu()-like functions, so that people won't need to repeat
the ugly tricks above. Therefore introduce lock_sync(). It's simply a
lock+unlock pair with no irq safe/unsafe deadlock check, since the
to-be-annotated functions don't create real critical sections, and
therefore there is no way that irqs can create extra dependencies.

[1]: https://lore.kernel.org/lkml/20180412021233.ewncg5jjuzjw3x62@tardis/
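
As an illustration only (the 'foo' names below are hypothetical and not
part of this patch), a synchronize_srcu()-like primitive could then be
annotated as:

	static struct lock_class_key foo_sync_key;
	static struct lockdep_map foo_sync_map =
		STATIC_LOCKDEP_MAP_INIT("foo_sync", &foo_sync_key);

	void wait_for_all_foo_readers(void)
	{
		/*
		 * Previously: lock_map_acquire(&foo_sync_map) followed by
		 * lock_map_release(&foo_sync_map), possibly wrapped in
		 * local_irq_disable()/local_irq_enable() to dodge the
		 * irq-safe/unsafe false positives.
		 */
		lock_map_sync(&foo_sync_map);	/* the new annotation */

		/* ... actually wait for the readers ... */
	}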

Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 5 +++++
kernel/locking/lockdep.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1f1099dac3f0..ba09df6a0872 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -268,6 +268,10 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,

extern void lock_release(struct lockdep_map *lock, unsigned long ip);

+extern void lock_sync(struct lockdep_map *lock, unsigned int subclass,
+ int read, int check, struct lockdep_map *nest_lock,
+ unsigned long ip);
+
/* lock_is_held_type() returns */
#define LOCK_STATE_UNKNOWN -1
#define LOCK_STATE_NOT_HELD 0
@@ -555,6 +559,7 @@ do { \
#define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
#define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
#define lock_map_release(l) lock_release(l, _THIS_IP_)
+#define lock_map_sync(l) lock_sync(l, 0, 0, 1, NULL, _THIS_IP_)

#ifdef CONFIG_PROVE_LOCKING
# define might_lock(lock) \
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e3375bc40dad..cffa026a765f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -5692,6 +5692,40 @@ void lock_release(struct lockdep_map *lock, unsigned long ip)
}
EXPORT_SYMBOL_GPL(lock_release);

+/*
+ * lock_sync() - A special annotation for synchronize_{s,}rcu()-like API.
+ *
+ * No actual critical section is created by the APIs annotated with this: these
+ * APIs are used to wait for one or multiple critical sections (on other CPUs
+ * or threads), and it means that calling these APIs inside these critical
+ * sections is a potential deadlock.
+ *
+ * This annotation acts as an acquire+release annotation pair with hardirqs_off
+ * being 1. Since there's no critical section, no interrupt can create extra
+ * dependencies "inside" the annotation, so hardirqs_off == 1 allows us to avoid
+ * false positives.
+ */
+void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
+ int check, struct lockdep_map *nest_lock, unsigned long ip)
+{
+ unsigned long flags;
+
+ if (unlikely(!lockdep_enabled()))
+ return;
+
+ raw_local_irq_save(flags);
+ check_flags(flags);
+
+ lockdep_recursion_inc();
+ __lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0);
+
+ if (__lock_release(lock, ip))
+ check_chain_key(current);
+ lockdep_recursion_finish();
+ raw_local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(lock_sync);
+
noinstr int lock_is_held_type(const struct lockdep_map *lock, int read)
{
unsigned long flags;
--
2.38.1

2023-01-13 11:53:29

by Paul E. McKenney

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Thu, Jan 12, 2023 at 10:59:54PM -0800, Boqun Feng wrote:
> Although all flavors of RCU are annotated correctly with lockdep as
> recursive read locks, their 'check' parameter of lock_acquire() is
> unset. It means that RCU read locks are not added into the lockdep
> dependency graph therefore deadlock detection based on dependency graph
> won't catch deadlock caused by RCU. This is fine for "non-sleepable" RCU
> flavors since wait-context detection and other context based detection
> can catch these deadlocks. However for sleepable RCU, this is limited.
>
> Actually we can detect the deadlocks caused by SRCU by 1) making
> srcu_read_lock() a 'check'ed recursive read lock and 2) making
> synchronize_srcu() a empty write lock critical section. Even better,
> with the newly introduced lock_sync(), we can avoid false positives
> about irq-unsafe/safe. So do it.
>
> Note that NMI safe SRCU read side critical sections are currently not
> annonated, since step-by-step approach can help us deal with
> false-positives. These may be annotated in the future.
>
> Signed-off-by: Boqun Feng <[email protected]>

Nice, thank you!!!

Acked-by: Paul E. McKenney <[email protected]>

Or if you would prefer that I take the series through -rcu, please just
let me know.

Thanx, Paul

> ---
> include/linux/srcu.h | 23 +++++++++++++++++++++--
> kernel/rcu/srcutiny.c | 2 ++
> kernel/rcu/srcutree.c | 2 ++
> 3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 9b9d0bbf1d3c..a1595f8c5155 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> return lock_is_held(&ssp->dep_map);
> }
>
> +static inline void srcu_lock_acquire(struct lockdep_map *map)
> +{
> + lock_map_acquire_read(map);
> +}
> +
> +static inline void srcu_lock_release(struct lockdep_map *map)
> +{
> + lock_map_release(map);
> +}
> +
> +static inline void srcu_lock_sync(struct lockdep_map *map)
> +{
> + lock_map_sync(map);
> +}
> +
> #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> @@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> return 1;
> }
>
> +#define srcu_lock_acquire(m) do { } while (0)
> +#define srcu_lock_release(m) do { } while (0)
> +#define srcu_lock_sync(m) do { } while (0)
> +
> #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
>
> #define SRCU_NMI_UNKNOWN 0x0
> @@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
>
> srcu_check_nmi_safety(ssp, false);
> retval = __srcu_read_lock(ssp);
> - rcu_lock_acquire(&(ssp)->dep_map);
> + srcu_lock_acquire(&(ssp)->dep_map);
> return retval;
> }
>
> @@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
> {
> WARN_ON_ONCE(idx & ~0x1);
> srcu_check_nmi_safety(ssp, false);
> - rcu_lock_release(&(ssp)->dep_map);
> + srcu_lock_release(&(ssp)->dep_map);
> __srcu_read_unlock(ssp, idx);
> }
>
> diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> index b12fb0cec44d..336af24e0fe3 100644
> --- a/kernel/rcu/srcutiny.c
> +++ b/kernel/rcu/srcutiny.c
> @@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
> {
> struct rcu_synchronize rs;
>
> + srcu_lock_sync(&ssp->dep_map);
> +
> RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> lock_is_held(&rcu_bh_lock_map) ||
> lock_is_held(&rcu_lock_map) ||
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index ca4b5dcec675..408088c73e0e 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> {
> struct rcu_synchronize rcu;
>
> + srcu_lock_sync(&ssp->dep_map);
> +
> RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> lock_is_held(&rcu_bh_lock_map) ||
> lock_is_held(&rcu_lock_map) ||
> --
> 2.38.1
>

2023-01-13 13:04:28

by David Woodhouse

Subject: [PATCH 0/3] KVM: Make use of SRCU deadlock detection support


On Thu, 2023-01-12 at 22:59 -0800, Boqun Feng wrote:
> This is actually a leftover of the recursive read deadlock detection
> patchset:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> I resolve comments then and add more test cases, and hopefully this can
> fulfill the request from KVM:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> ;-)

It definitely seems to work; thank you! I can revert some of the recent
fixes from the KVM tree, apply your patches, and then I can trigger the
lockdep warnings. To make it reliably trigger, we need to artificially
call synchronize_srcu(&kvm->srcu) under kvm->lock on KVM init, because
the circumstances under which that happens are a bit esoteric and don't
always happen otherwise, so lockdep wouldn't notice.

> The patch #3 is now WIP for two reasons:
>
> * It may conflicts with Paul's patchset on removing CONFIG_SRCU
>
> * I haven't found a proper way to "reinit" srcu_struct when
> lockdep selftest runs: cleanup_srcu_struct() needs workqueue
> however the tests can run before there is one.

Understood. I think the KVM series which follows can stand alone and go
via the KVM tree separately. As and when your series gets merged, it'll
serve to protect against regressions.

Thanks again!

David Woodhouse (3):
KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule
KVM: selftests: Use enum for test numbers in xen_shinfo_test
KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test

.../testing/selftests/kvm/x86_64/xen_shinfo_test.c | 165 ++++++++++++++-------
virt/kvm/kvm_main.c | 10 ++
2 files changed, 124 insertions(+), 51 deletions(-)


2023-01-13 13:04:59

by David Woodhouse

Subject: [PATCH 1/3] KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule

From: David Woodhouse <[email protected]>

Lockdep is learning to spot deadlocks with sleepable RCU vs. mutexes,
which can occur where one code path calls synchronize_srcu() with a
mutex held, while another code path attempts to obtain the same mutex
from within a read-side critical section.

Since lockdep isn't very good at reading the English prose in
Documentation/virt/kvm/locking.rst, give it a demonstration by calling
synchronize_srcu(&kvm->srcu) while holding kvm->lock in kvm_create_vm().
The cases where this happens naturally are relatively esoteric and may
not happen otherwise.

Signed-off-by: David Woodhouse <[email protected]>
---
virt/kvm/kvm_main.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13e88297f999..285b3c5a6364 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1173,6 +1173,16 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (init_srcu_struct(&kvm->irq_srcu))
goto out_err_no_irq_srcu;

+#ifdef CONFIG_LOCKDEP
+ /*
+ * Ensure lockdep knows that it's not permitted to lock kvm->lock
+ * from a SRCU read section on kvm->srcu.
+ */
+ mutex_lock(&kvm->lock);
+ synchronize_srcu(&kvm->srcu);
+ mutex_unlock(&kvm->lock);
+#endif
+
refcount_set(&kvm->users_count, 1);
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
for (j = 0; j < 2; j++) {
--
2.35.3

2023-01-13 18:16:07

by Boqun Feng

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Fri, Jan 13, 2023 at 09:03:30PM +0800, Hillf Danton wrote:
> On 12 Jan 2023 22:59:54 -0800 Boqun Feng <[email protected]>
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > {
> > struct rcu_synchronize rcu;
> >
> > + srcu_lock_sync(&ssp->dep_map);
> > +
> > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > lock_is_held(&rcu_bh_lock_map) ||
> > lock_is_held(&rcu_lock_map) ||
> > --
> > 2.38.1
>
> The following deadlock is able to escape srcu_lock_sync() because the
> __lock_release folded in sync leaves one lock on the sync side.
>
> cpu9 cpu0
> --- ---
> lock A srcu_lock_acquire(&ssp->dep_map);
> srcu_lock_sync(&ssp->dep_map);
> lock A

But isn't that just the srcu_mutex_ABBA test case in patch #3? My run
of the lockdep selftests shows we can catch it. Is there anything subtle
I'm missing?

Regards,
Boqun

2023-01-13 18:21:06

by Boqun Feng

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 12, 2023 at 10:59:54PM -0800, Boqun Feng wrote:
> > Although all flavors of RCU are annotated correctly with lockdep as
> > recursive read locks, their 'check' parameter of lock_acquire() is
> > unset. It means that RCU read locks are not added into the lockdep
> > dependency graph therefore deadlock detection based on dependency graph
> > won't catch deadlock caused by RCU. This is fine for "non-sleepable" RCU
> > flavors since wait-context detection and other context based detection
> > can catch these deadlocks. However for sleepable RCU, this is limited.
> >
> > Actually we can detect the deadlocks caused by SRCU by 1) making
> > srcu_read_lock() a 'check'ed recursive read lock and 2) making
> > synchronize_srcu() a empty write lock critical section. Even better,
> > with the newly introduced lock_sync(), we can avoid false positives
> > about irq-unsafe/safe. So do it.
> >
> > Note that NMI safe SRCU read side critical sections are currently not
> > annonated, since step-by-step approach can help us deal with
> > false-positives. These may be annotated in the future.
> >
> > Signed-off-by: Boqun Feng <[email protected]>
>
> Nice, thank you!!!
>
> Acked-by: Paul E. McKenney <[email protected]>
>
> Or if you would prefer that I take the series through -rcu, please just
> let me know.
>

I prefer that the first two patches go through your tree, because it
reduces the synchronization among the locking, rcu and KVM trees to the
synchronization between the rcu and KVM trees.

For patch #3, since it's not really ready yet, I don't know, but I guess
when it's finished it should probably go through -rcu as well.

Regards,
Boqun

> Thanx, Paul
>
> > ---
> > include/linux/srcu.h | 23 +++++++++++++++++++++--
> > kernel/rcu/srcutiny.c | 2 ++
> > kernel/rcu/srcutree.c | 2 ++
> > 3 files changed, 25 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > index 9b9d0bbf1d3c..a1595f8c5155 100644
> > --- a/include/linux/srcu.h
> > +++ b/include/linux/srcu.h
> > @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > return lock_is_held(&ssp->dep_map);
> > }
> >
> > +static inline void srcu_lock_acquire(struct lockdep_map *map)
> > +{
> > + lock_map_acquire_read(map);
> > +}
> > +
> > +static inline void srcu_lock_release(struct lockdep_map *map)
> > +{
> > + lock_map_release(map);
> > +}
> > +
> > +static inline void srcu_lock_sync(struct lockdep_map *map)
> > +{
> > + lock_map_sync(map);
> > +}
> > +
> > #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> >
> > static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > @@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > return 1;
> > }
> >
> > +#define srcu_lock_acquire(m) do { } while (0)
> > +#define srcu_lock_release(m) do { } while (0)
> > +#define srcu_lock_sync(m) do { } while (0)
> > +
> > #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> >
> > #define SRCU_NMI_UNKNOWN 0x0
> > @@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
> >
> > srcu_check_nmi_safety(ssp, false);
> > retval = __srcu_read_lock(ssp);
> > - rcu_lock_acquire(&(ssp)->dep_map);
> > + srcu_lock_acquire(&(ssp)->dep_map);
> > return retval;
> > }
> >
> > @@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > {
> > WARN_ON_ONCE(idx & ~0x1);
> > srcu_check_nmi_safety(ssp, false);
> > - rcu_lock_release(&(ssp)->dep_map);
> > + srcu_lock_release(&(ssp)->dep_map);
> > __srcu_read_unlock(ssp, idx);
> > }
> >
> > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > index b12fb0cec44d..336af24e0fe3 100644
> > --- a/kernel/rcu/srcutiny.c
> > +++ b/kernel/rcu/srcutiny.c
> > @@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
> > {
> > struct rcu_synchronize rs;
> >
> > + srcu_lock_sync(&ssp->dep_map);
> > +
> > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > lock_is_held(&rcu_bh_lock_map) ||
> > lock_is_held(&rcu_lock_map) ||
> > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > index ca4b5dcec675..408088c73e0e 100644
> > --- a/kernel/rcu/srcutree.c
> > +++ b/kernel/rcu/srcutree.c
> > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > {
> > struct rcu_synchronize rcu;
> >
> > + srcu_lock_sync(&ssp->dep_map);
> > +
> > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > lock_is_held(&rcu_bh_lock_map) ||
> > lock_is_held(&rcu_lock_map) ||
> > --
> > 2.38.1
> >

2023-01-13 19:27:26

by Paul E. McKenney

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
> On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 12, 2023 at 10:59:54PM -0800, Boqun Feng wrote:
> > > Although all flavors of RCU are annotated correctly with lockdep as
> > > recursive read locks, their 'check' parameter of lock_acquire() is
> > > unset. It means that RCU read locks are not added into the lockdep
> > > dependency graph therefore deadlock detection based on dependency graph
> > > won't catch deadlock caused by RCU. This is fine for "non-sleepable" RCU
> > > flavors since wait-context detection and other context based detection
> > > can catch these deadlocks. However for sleepable RCU, this is limited.
> > >
> > > Actually we can detect the deadlocks caused by SRCU by 1) making
> > > srcu_read_lock() a 'check'ed recursive read lock and 2) making
> > > synchronize_srcu() a empty write lock critical section. Even better,
> > > with the newly introduced lock_sync(), we can avoid false positives
> > > about irq-unsafe/safe. So do it.
> > >
> > > Note that NMI safe SRCU read side critical sections are currently not
> > > annonated, since step-by-step approach can help us deal with
> > > false-positives. These may be annotated in the future.
> > >
> > > Signed-off-by: Boqun Feng <[email protected]>
> >
> > Nice, thank you!!!
> >
> > Acked-by: Paul E. McKenney <[email protected]>
> >
> > Or if you would prefer that I take the series through -rcu, please just
> > let me know.
>
> I prefer that the first two patches go through your tree, because it
> reduces the synchronization among locking, rcu and KVM trees to the
> synchronization betwen rcu and KVM trees.

Very well, I have queued and pushed these with the usual wordsmithing,
thank you!

On the possibility of annotating __srcu_read_lock_nmisafe() and
__srcu_read_unlock_nmisafe(), are those lockdep annotations themselves
NMI-safe?

> For patch #3, since it's not really ready yet, so I don't know, but I
> guess when it's finished, probably better go through -rcu.

Please let me know when it is ready!

Thanx, Paul

> Regards,
> Boqun
>
> > Thanx, Paul
> >
> > > ---
> > > include/linux/srcu.h | 23 +++++++++++++++++++++--
> > > kernel/rcu/srcutiny.c | 2 ++
> > > kernel/rcu/srcutree.c | 2 ++
> > > 3 files changed, 25 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> > > index 9b9d0bbf1d3c..a1595f8c5155 100644
> > > --- a/include/linux/srcu.h
> > > +++ b/include/linux/srcu.h
> > > @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > > return lock_is_held(&ssp->dep_map);
> > > }
> > >
> > > +static inline void srcu_lock_acquire(struct lockdep_map *map)
> > > +{
> > > + lock_map_acquire_read(map);
> > > +}
> > > +
> > > +static inline void srcu_lock_release(struct lockdep_map *map)
> > > +{
> > > + lock_map_release(map);
> > > +}
> > > +
> > > +static inline void srcu_lock_sync(struct lockdep_map *map)
> > > +{
> > > + lock_map_sync(map);
> > > +}
> > > +
> > > #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> > >
> > > static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > > @@ -109,6 +124,10 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> > > return 1;
> > > }
> > >
> > > +#define srcu_lock_acquire(m) do { } while (0)
> > > +#define srcu_lock_release(m) do { } while (0)
> > > +#define srcu_lock_sync(m) do { } while (0)
> > > +
> > > #endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
> > >
> > > #define SRCU_NMI_UNKNOWN 0x0
> > > @@ -182,7 +201,7 @@ static inline int srcu_read_lock(struct srcu_struct *ssp) __acquires(ssp)
> > >
> > > srcu_check_nmi_safety(ssp, false);
> > > retval = __srcu_read_lock(ssp);
> > > - rcu_lock_acquire(&(ssp)->dep_map);
> > > + srcu_lock_acquire(&(ssp)->dep_map);
> > > return retval;
> > > }
> > >
> > > @@ -226,7 +245,7 @@ static inline void srcu_read_unlock(struct srcu_struct *ssp, int idx)
> > > {
> > > WARN_ON_ONCE(idx & ~0x1);
> > > srcu_check_nmi_safety(ssp, false);
> > > - rcu_lock_release(&(ssp)->dep_map);
> > > + srcu_lock_release(&(ssp)->dep_map);
> > > __srcu_read_unlock(ssp, idx);
> > > }
> > >
> > > diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
> > > index b12fb0cec44d..336af24e0fe3 100644
> > > --- a/kernel/rcu/srcutiny.c
> > > +++ b/kernel/rcu/srcutiny.c
> > > @@ -197,6 +197,8 @@ void synchronize_srcu(struct srcu_struct *ssp)
> > > {
> > > struct rcu_synchronize rs;
> > >
> > > + srcu_lock_sync(&ssp->dep_map);
> > > +
> > > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > > lock_is_held(&rcu_bh_lock_map) ||
> > > lock_is_held(&rcu_lock_map) ||
> > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > index ca4b5dcec675..408088c73e0e 100644
> > > --- a/kernel/rcu/srcutree.c
> > > +++ b/kernel/rcu/srcutree.c
> > > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > > {
> > > struct rcu_synchronize rcu;
> > >
> > > + srcu_lock_sync(&ssp->dep_map);
> > > +
> > > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > > lock_is_held(&rcu_bh_lock_map) ||
> > > lock_is_held(&rcu_lock_map) ||
> > > --
> > > 2.38.1
> > >

2023-01-14 00:34:48

by Boqun Feng

Subject: [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock

The lock scenario print has always been a weak spot of lockdep splats.
Improvements could be made if we reworked the dependency search and the
error printing.

However, without touching the graph search, we can improve a little for
the circular deadlock case, since we have the to-be-added lock
dependency and know whether these two locks are read/write/sync.

In order to know whether a held_lock is sync or not, a bit was
"stolen" from ->references, which reduces our limit for same-lock-class
nesting from 2^12 to 2^11; that should still be good enough.

Besides, since we now have a bit in held_lock for sync, we don't need
the "hardirqs_off being 1" trick, and we can also avoid the
__lock_release() by jumping out of __lock_acquire() before the
held_lock is stored.

With these changes, a deadlock case involving a read lock and sync gets
a better print-out, from:

[...] Possible unsafe locking scenario:
[...]
[...]        CPU0                    CPU1
[...]        ----                    ----
[...]   lock(srcuA);
[...]                           lock(srcuB);
[...]                           lock(srcuA);
[...]   lock(srcuB);

to

[...] Possible unsafe locking scenario:
[...]
[...]        CPU0                    CPU1
[...]        ----                    ----
[...]   rlock(srcuA);
[...]                           lock(srcuB);
[...]                           lock(srcuA);
[...]   sync(srcuB);

Signed-off-by: Boqun Feng <[email protected]>
---
include/linux/lockdep.h | 3 ++-
kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
2 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index ba09df6a0872..febd7ecc225c 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -134,7 +134,8 @@ struct held_lock {
unsigned int read:2; /* see lock_acquire() comment */
unsigned int check:1; /* see lock_acquire() comment */
unsigned int hardirqs_off:1;
- unsigned int references:12; /* 32 bits */
+ unsigned int sync:1;
+ unsigned int references:11; /* 32 bits */
unsigned int pin_count;
};

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index cffa026a765f..4031d87f6829 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
struct lock_class *source = hlock_class(src);
struct lock_class *target = hlock_class(tgt);
struct lock_class *parent = prt->class;
+ int src_read = src->read;
+ int tgt_read = tgt->read;

/*
* A direct locking problem where unsafe_class lock is taken
@@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
printk(" Possible unsafe locking scenario:\n\n");
printk(" CPU0 CPU1\n");
printk(" ---- ----\n");
- printk(" lock(");
+ if (tgt_read != 0)
+ printk(" rlock(");
+ else
+ printk(" lock(");
__print_lock_name(target);
printk(KERN_CONT ");\n");
printk(" lock(");
@@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
printk(" lock(");
__print_lock_name(target);
printk(KERN_CONT ");\n");
- printk(" lock(");
+ if (src_read != 0)
+ printk(" rlock(");
+ else if (src->sync)
+ printk(" sync(");
+ else
+ printk(" lock(");
__print_lock_name(source);
printk(KERN_CONT ");\n");
printk("\n *** DEADLOCK ***\n\n");
@@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
return 0;
}
}
- if (!hlock->hardirqs_off) {
+
+ /*
+ * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
+ * creates no critical section and no extra dependency can be introduced
+ * by interrupts
+ */
+ if (!hlock->hardirqs_off && !hlock->sync) {
if (hlock->read) {
if (!mark_lock(curr, hlock,
LOCK_ENABLED_HARDIRQ_READ))
@@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
int trylock, int read, int check, int hardirqs_off,
struct lockdep_map *nest_lock, unsigned long ip,
- int references, int pin_count)
+ int references, int pin_count, int sync)
{
struct task_struct *curr = current;
struct lock_class *class = NULL;
@@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,

class_idx = class - lock_classes;

- if (depth) { /* we're holding locks */
+ if (depth && !sync) {
+ /* we're holding locks and the new held lock is not a sync */
hlock = curr->held_locks + depth - 1;
if (hlock->class_idx == class_idx && nest_lock) {
if (!references)
@@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
hlock->trylock = trylock;
hlock->read = read;
hlock->check = check;
+ hlock->sync = !!sync;
hlock->hardirqs_off = !!hardirqs_off;
hlock->references = references;
#ifdef CONFIG_LOCK_STAT
@@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
if (!validate_chain(curr, hlock, chain_head, chain_key))
return 0;

+ /* For lock_sync(), we are done here since no actual critical section */
+ if (hlock->sync)
+ return 1;
+
curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
@@ -5196,7 +5218,7 @@ static int reacquire_held_locks(struct task_struct *curr, unsigned int depth,
hlock->read, hlock->check,
hlock->hardirqs_off,
hlock->nest_lock, hlock->acquire_ip,
- hlock->references, hlock->pin_count)) {
+ hlock->references, hlock->pin_count, 0)) {
case 0:
return 1;
case 1:
@@ -5666,7 +5688,7 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,

lockdep_recursion_inc();
__lock_acquire(lock, subclass, trylock, read, check,
- irqs_disabled_flags(flags), nest_lock, ip, 0, 0);
+ irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 0);
lockdep_recursion_finish();
raw_local_irq_restore(flags);
}
@@ -5699,11 +5721,6 @@ EXPORT_SYMBOL_GPL(lock_release);
* APIs are used to wait for one or multiple critical sections (on other CPUs
* or threads), and it means that calling these APIs inside these critical
* sections is a potential deadlock.
- *
- * This annotation acts as an acquire+release annotation pair with hardirqs_off
- * being 1. Since there's no critical section, no interrupt can create extra
- * dependencies "inside" the annotation, so hardirqs_off == 1 allows us to avoid
- * false positives.
*/
void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
int check, struct lockdep_map *nest_lock, unsigned long ip)
@@ -5717,10 +5734,9 @@ void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
check_flags(flags);

lockdep_recursion_inc();
- __lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0);
-
- if (__lock_release(lock, ip))
- check_chain_key(current);
+ __lock_acquire(lock, subclass, 0, read, check,
+ irqs_disabled_flags(flags), nest_lock, ip, 0, 0, 1);
+ check_chain_key(current);
lockdep_recursion_finish();
raw_local_irq_restore(flags);
}
--
2.38.1

2023-01-14 00:35:18

by Boqun Feng

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Sat, Jan 14, 2023 at 07:58:09AM +0800, Hillf Danton wrote:
> On 13 Jan 2023 09:58:10 -0800 Boqun Feng <[email protected]>
> > On Fri, Jan 13, 2023 at 09:03:30PM +0800, Hillf Danton wrote:
> > > On 12 Jan 2023 22:59:54 -0800 Boqun Feng <[email protected]>
> > > > --- a/kernel/rcu/srcutree.c
> > > > +++ b/kernel/rcu/srcutree.c
> > > > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > > > {
> > > > struct rcu_synchronize rcu;
> > > >
> > > > + srcu_lock_sync(&ssp->dep_map);
> > > > +
> > > > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > > > lock_is_held(&rcu_bh_lock_map) ||
> > > > lock_is_held(&rcu_lock_map) ||
> > > > --
> > > > 2.38.1
> > >
> > > The following deadlock is able to escape srcu_lock_sync() because the
> > > __lock_release folded in sync leaves one lock on the sync side.
> > >
> > > cpu9 cpu0
> > > --- ---
> > > lock A srcu_lock_acquire(&ssp->dep_map);
> > > srcu_lock_sync(&ssp->dep_map);
> > > lock A
> >
> > But isn't it just the srcu_mutex_ABBA test case in patch #3, and my run
> > of lockdep selftest shows we can catch it. Anything subtle I'm missing?
>
> I am leaning to not call it ABBA deadlock, because B is unlocked.
>
> task X task Y
> --- ---
> lock A
> lock B
> lock B
> unlock B
> wait_for_completion E
>
> lock A
> complete E
>
> And no deadlock should be detected/caught after B goes home.

Your example makes me more confused.. given the case:

	task X				task Y
	---				---
	mutex_lock(A);
					srcu_read_lock(B);
	synchronize_srcu(B);
					mutex_lock(A);

isn't it a deadlock? In your example, which one of A, B or E is the srcu?

Regards,
Boqun

2023-01-14 08:06:12

by Boqun Feng

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Sat, Jan 14, 2023 at 03:18:32PM +0800, Hillf Danton wrote:
> On Fri, 13 Jan 2023 16:17:59 -0800 Boqun Feng <[email protected]>
> > On Sat, Jan 14, 2023 at 07:58:09AM +0800, Hillf Danton wrote:
> > > On 13 Jan 2023 09:58:10 -0800 Boqun Feng <[email protected]>
> > > > On Fri, Jan 13, 2023 at 09:03:30PM +0800, Hillf Danton wrote:
> > > > > On 12 Jan 2023 22:59:54 -0800 Boqun Feng <[email protected]>
> > > > > > --- a/kernel/rcu/srcutree.c
> > > > > > +++ b/kernel/rcu/srcutree.c
> > > > > > @@ -1267,6 +1267,8 @@ static void __synchronize_srcu(struct srcu_struct *ssp, bool do_norm)
> > > > > > {
> > > > > > struct rcu_synchronize rcu;
> > > > > >
> > > > > > + srcu_lock_sync(&ssp->dep_map);
> > > > > > +
> > > > > > RCU_LOCKDEP_WARN(lockdep_is_held(ssp) ||
> > > > > > lock_is_held(&rcu_bh_lock_map) ||
> > > > > > lock_is_held(&rcu_lock_map) ||
> > > > > > --
> > > > > > 2.38.1
> > > > >
> > > > > The following deadlock is able to escape srcu_lock_sync() because the
> > > > > __lock_release folded in sync leaves one lock on the sync side.
> > > > >
> > > > > cpu9 cpu0
> > > > > --- ---
> > > > > lock A srcu_lock_acquire(&ssp->dep_map);
> > > > > srcu_lock_sync(&ssp->dep_map);
> > > > > lock A
> > > >
> > > > But isn't it just the srcu_mutex_ABBA test case in patch #3, and my run
> > > > of lockdep selftest shows we can catch it. Anything subtle I'm missing?
> > >
> > > I am leaning to not call it ABBA deadlock, because B is unlocked.
> > >
> > > task X task Y
> > > --- ---
> > > lock A
> > > lock B
> > > lock B
> > > unlock B
> > > wait_for_completion E
> > >
> > > lock A
> > > complete E
> > >
> > > And no deadlock should be detected/caught after B goes home.
> >
> > Your example makes me more confused.. given the case:
> >
> > task X task Y
> > --- ---
> > mutex_lock(A);
> > srcu_read_lock(B);
> > synchronze_srcu(B);
> > mutex_lock(A);
> >
> > isn't it a deadlock?
>
> Yes and nope, see below.
>
> > If your example, A, B or E which one is srcu?
>
> A and B are mutex, and E is completion in my example to show the failure
> of catching deadlock in case of non-fake lock. Now see srcu after your change.
>
> task X task Y
> --- ---
> mutex_lock(A);
> srcu_read_lock(B);
> srcu_lock_acquire(&B->dep_map);
> a) lock_map_acquire_read(&B->dep_map);
> synchronze_srcu(B);
> __synchronize_srcu(B);
> srcu_lock_sync(&B->dep_map);
> lock_map_sync(&B->dep_map);
> lock_sync(&B->dep_map);
> __lock_acquire(&B->dep_map);

At this time, lockdep adds the dependency A -> B to the dependency graph.

> b) lock_map_acquire_read(&B->dep_map);
> __lock_release(&B->dep_map);
> c) lock_map_acquire_read(&B->dep_map);
> mutex_lock(A);

and here, lockdep will try to add the dependency B -> A to the dependency
graph, and find that A -> B -> A forms a circle (with strong
dependencies), therefore lockdep knows it's a deadlock.

>
> No deadlock could be detected if taskY takes mutexA after taskX releases B,

The timing of when taskX releases B doesn't matter, since lockdep uses the
dependency graph to detect deadlocks rather than doing after-the-fact
detection.

> and how taskY acquires B does not matter as per the a), b) and c) modes in
> the above chart, again because releasing lock can break deadlock in general.

I have test cases showing the above deadlock can be detected, so if you
think there is a deadlock that may dodge my change, feel free to
add a test case in lib/locking-selftest.c ;-)

Regards,
Boqun

2023-01-15 03:01:32

by Boqun Feng

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Sat, Jan 14, 2023 at 06:26:59PM +0800, Hillf Danton wrote:
> On Fri, 13 Jan 2023 23:32:01 -0800 Boqun Feng <[email protected]>
> > On Sat, Jan 14, 2023 at 03:18:32PM +0800, Hillf Danton wrote:
> > >
> > > task X task Y
> > > --- ---
> > > mutex_lock(A);
> > > srcu_read_lock(B);
> > > srcu_lock_acquire(&B->dep_map);
> > > a) lock_map_acquire_read(&B->dep_map);
> > > synchronze_srcu(B);
> > > __synchronize_srcu(B);
> > > srcu_lock_sync(&B->dep_map);
> > > lock_map_sync(&B->dep_map);
> > > lock_sync(&B->dep_map);
> > > __lock_acquire(&B->dep_map);
> >
> > At this time, lockdep add dependency A -> B in the dependency graph.
> >
> > > b) lock_map_acquire_read(&B->dep_map);
> > > __lock_release(&B->dep_map);
> > > c) lock_map_acquire_read(&B->dep_map);
> > > mutex_lock(A);
> >
> > and here, lockdep will try to add dependency B -> A into the dependency
> > graph, and find that A -> B -> A will form a circle (with strong
> > dependency), therefore lockdep knows it's a deadlock.
>
> Is the strong dependency applying to mode c)?
> If yes then deadlock should be also detected in the following locking
> pattern that has no deadlock.
>
> cpu0 cpu1
> --- ---
> mutex_lock A
> mutex_lock B
> mutex_unlock B
> mutex_lock B
> mutex_lock A

Well, of course, this is how lockdep works. Lockdep detects *potential*
deadlocks rather than detecting deadlocks only when they actually happen.
Otherwise lockdep would be useless.

The execution in your example shows a potential deadlock, i.e. one
task acquires A and then B, while the other task acquires B and then
A. A potential deadlock means that, given the right timing, a deadlock
may happen.

Regards,
Boqun

> >
> > >
> > > No deadlock could be detected if taskY takes mutexA after taskX releases B,
> >
> > The timing that taskX releases B doesn't master, since lockdep uses
> > graph to detect deadlocks rather than after-fact detection.
> >
> > > and how taskY acquires B does not matter as per the a), b) and c) modes in
> > > the above chart, again because releasing lock can break deadlock in general.
> >
> > I have test cases showing the above deadlock can be detected, so if you
> > think there is a deadlock that may dodge from my change, feel free to
> > add a test case in lib/locking-selftest.c ;-)
> >
> > Regards,
> > Boqun

2023-01-16 18:19:21

by Paolo Bonzini

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On 1/13/23 20:11, Paul E. McKenney wrote:
> On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
>> On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
>> I prefer that the first two patches go through your tree, because it
>> reduces the synchronization among locking, rcu and KVM trees to the
>> synchronization betwen rcu and KVM trees.
>
> Very well, I have queued and pushed these with the usual wordsmithing,
> thank you!

I'm worried about this case:

      CPU 0                  CPU 1
 --------------------   ------------------
 lock A                  srcu lock B
 srcu lock B             lock A
 srcu unlock B           unlock A
 unlock A                srcu unlock B

While a bit unclean, there is nothing that downright forbids this; as
long as synchronize_srcu does not happen inside lock A, no deadlock can
occur.

However, if srcu is replaced with an rwlock then lockdep should and does
report a deadlock. Boqun, do you get a false positive or do your
patches correctly suppress this?

Paolo

2023-01-16 19:08:08

by Boqun Feng

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Mon, Jan 16, 2023 at 06:36:43PM +0100, Paolo Bonzini wrote:
> On 1/13/23 20:11, Paul E. McKenney wrote:
> > On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
> > > On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> > > I prefer that the first two patches go through your tree, because it
> > > reduces the synchronization among locking, rcu and KVM trees to the
> > > synchronization betwen rcu and KVM trees.
> >
> > Very well, I have queued and pushed these with the usual wordsmithing,
> > thank you!
>
> I'm worried about this case:
>
> CPU 0 CPU 1
> -------------------- ------------------
> lock A srcu lock B
> srcu lock B lock A
> srcu unlock B unlock A
> unlock A srcu unlock B
>
> While a bit unclean, there is nothing that downright forbids this; as long
> as synchronize_srcu does not happen inside lock A, no deadlock can occur.
>

First, even with my change, lockdep won't report this as a deadlock,
because srcu_read_lock() is annotated as a recursive (unfair) read lock
(the "read" parameter for lock_acquire() is 2), and in this case lockdep
knows that it won't cause a deadlock.

For SRCU, the annotation mapping is that 1) srcu_read_lock() is marked
as a recursive read lock and 2) synchronize_srcu() is marked as a write
lock sync; recursive read locks themselves cannot cause deadlocks, and
lockdep is aware of that.

Will add a selftest for this later.
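
Concretely, a minimal sketch of what such a selftest could look like
(reusing the srcu_A/mutex_A selftest locals from patch #3; the function
name is just illustrative, and no lockdep splat is expected):

	static void srcu_mutex_no_deadlock(void)
	{
		int ia;

		/* reader then mutex: no synchronize_srcu() under the mutex */
		ia = srcu_read_lock(&srcu_A);
		mutex_lock(&mutex_A);
		mutex_unlock(&mutex_A);
		srcu_read_unlock(&srcu_A, ia);

		/* mutex then reader */
		mutex_lock(&mutex_A);
		ia = srcu_read_lock(&srcu_A);
		srcu_read_unlock(&srcu_A, ia);
		mutex_unlock(&mutex_A);
	}

It would be wired up in srcu_tests() with something like
dotest(srcu_mutex_no_deadlock, SUCCESS, LOCKTYPE_SRCU).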

> However, if srcu is replaced with an rwlock then lockdep should and does
> report a deadlock. Boqun, do you get a false positive or do your patches

To be more precise, to have a deadlock, the read lock on CPU 0 has to be
a *fair* read lock (i.e. a non-recursive reader, where the "read"
parameter for lock_acquire() is 1).

> correctly suppress this?
>

I'm pretty sure that lockdep handles this ;-)

Regards,
Boqun

> Paolo
>

2023-01-16 19:38:20

by Paul E. McKenney

Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On Mon, Jan 16, 2023 at 09:54:32AM -0800, Boqun Feng wrote:
> On Mon, Jan 16, 2023 at 06:36:43PM +0100, Paolo Bonzini wrote:
> > On 1/13/23 20:11, Paul E. McKenney wrote:
> > > On Fri, Jan 13, 2023 at 10:05:22AM -0800, Boqun Feng wrote:
> > > > On Fri, Jan 13, 2023 at 03:29:49AM -0800, Paul E. McKenney wrote:
> > > > I prefer that the first two patches go through your tree, because it
> > > > reduces the synchronization among locking, rcu and KVM trees to the
> > > > synchronization betwen rcu and KVM trees.
> > >
> > > Very well, I have queued and pushed these with the usual wordsmithing,
> > > thank you!
> >
> > I'm worried about this case:
> >
> > CPU 0 CPU 1
> > -------------------- ------------------
> > lock A srcu lock B
> > srcu lock B lock A
> > srcu unlock B unlock A
> > unlock A srcu unlock B
> >
> > While a bit unclean, there is nothing that downright forbids this; as long
> > as synchronize_srcu does not happen inside lock A, no deadlock can occur.
> >
>
> First, even with my change, lockdep won't report this as a deadlock
> because srcu_read_lock() is annotated as a recursive (unfair) read lock
> (the "read" parameter for lock_acquire() is 2) and in this case lockdep
> knows that it won't cause deadlock.
>
> For SRCU, the annotation mapping that is 1) srcu_read_lock() is marked
> as recursive read lock and 2) synchronize_srcu() is marked as write lock
> sync, recursive read locks themselves cannot cause deadlocks and lockdep
> is aware of it.
>
> Will add a selftest for this later.
>
> > However, if srcu is replaced with an rwlock then lockdep should and does
> > report a deadlock. Boqun, do you get a false positive or do your patches
>
> To be more precise, to have a deadlock, the read lock on CPU 0 has to be
> a *fair* read lock (i.e. non-recursive reader, the "read" parameter for
> lock_acquire is 1)
>
> > correctly suppress this?
>
> I'm pretty sure that lockdep handles this ;-)

And lockdep agrees, refusing to complain about the following:

idx = srcu_read_lock(&srcu1);
mutex_lock(&mut1);
mutex_unlock(&mut1);
srcu_read_unlock(&srcu1, idx);

mutex_lock(&mut1);
idx = srcu_read_lock(&srcu1);
srcu_read_unlock(&srcu1, idx);
mutex_unlock(&mut1);

Thanx, Paul

2023-01-16 22:43:41

by Waiman Long

Subject: Re: [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock

On 1/13/23 18:57, Boqun Feng wrote:
> Lock scenario print is always a weak spot of lockdep splats. Improvement
> can be made if we rework the dependency search and the error printing.
>
> However without touching the graph search, we can improve a little for
> the circular deadlock case, since we have the to-be-added lock
> dependency, and know whether these two locks are read/write/sync.
>
> In order to know whether a held_lock is sync or not, a bit was
> "stolen" from ->references, which reduce our limit for the same lock
> class nesting from 2^12 to 2^11, and it should still be good enough.
>
> Besides, since we now have bit in held_lock for sync, we don't need the
> "hardirqoffs being 1" trick, and also we can avoid the __lock_release()
> if we jump out of __lock_acquire() before the held_lock stored.
>
> With these changes, a deadlock case evolved with read lock and sync gets
> a better print-out from:
>
> [...] Possible unsafe locking scenario:
> [...]
> [...] CPU0 CPU1
> [...] ---- ----
> [...] lock(srcuA);
> [...] lock(srcuB);
> [...] lock(srcuA);
> [...] lock(srcuB);
>
> to
>
> [...] Possible unsafe locking scenario:
> [...]
> [...] CPU0 CPU1
> [...] ---- ----
> [...] rlock(srcuA);
> [...] lock(srcuB);
> [...] lock(srcuA);
> [...] sync(srcuB);
>
> Signed-off-by: Boqun Feng <[email protected]>
> ---
> include/linux/lockdep.h | 3 ++-
> kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
> 2 files changed, 34 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index ba09df6a0872..febd7ecc225c 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -134,7 +134,8 @@ struct held_lock {
> unsigned int read:2; /* see lock_acquire() comment */
> unsigned int check:1; /* see lock_acquire() comment */
> unsigned int hardirqs_off:1;
> - unsigned int references:12; /* 32 bits */
> + unsigned int sync:1;
> + unsigned int references:11; /* 32 bits */
> unsigned int pin_count;
> };
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index cffa026a765f..4031d87f6829 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
> struct lock_class *source = hlock_class(src);
> struct lock_class *target = hlock_class(tgt);
> struct lock_class *parent = prt->class;
> + int src_read = src->read;
> + int tgt_read = tgt->read;
>
> /*
> * A direct locking problem where unsafe_class lock is taken
> @@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
> printk(" Possible unsafe locking scenario:\n\n");
> printk(" CPU0 CPU1\n");
> printk(" ---- ----\n");
> - printk(" lock(");
> + if (tgt_read != 0)
> + printk(" rlock(");
> + else
> + printk(" lock(");
> __print_lock_name(target);
> printk(KERN_CONT ");\n");
> printk(" lock(");
> @@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
> printk(" lock(");
> __print_lock_name(target);
> printk(KERN_CONT ");\n");
> - printk(" lock(");
> + if (src_read != 0)
> + printk(" rlock(");
> + else if (src->sync)
> + printk(" sync(");
> + else
> + printk(" lock(");
> __print_lock_name(source);
> printk(KERN_CONT ");\n");
> printk("\n *** DEADLOCK ***\n\n");

src can be sync() but not the target. Is there a reason why that is the
case?


> @@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
> return 0;
> }
> }
> - if (!hlock->hardirqs_off) {
> +
> + /*
> + * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
> + * creates no critical section and no extra dependency can be introduced
> + * by interrupts
> + */
> + if (!hlock->hardirqs_off && !hlock->sync) {
> if (hlock->read) {
> if (!mark_lock(curr, hlock,
> LOCK_ENABLED_HARDIRQ_READ))
> @@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
> static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> int trylock, int read, int check, int hardirqs_off,
> struct lockdep_map *nest_lock, unsigned long ip,
> - int references, int pin_count)
> + int references, int pin_count, int sync)
> {
> struct task_struct *curr = current;
> struct lock_class *class = NULL;
> @@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>
> class_idx = class - lock_classes;
>
> - if (depth) { /* we're holding locks */
> + if (depth && !sync) {
> + /* we're holding locks and the new held lock is not a sync */
> hlock = curr->held_locks + depth - 1;
> if (hlock->class_idx == class_idx && nest_lock) {
> if (!references)
> @@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> hlock->trylock = trylock;
> hlock->read = read;
> hlock->check = check;
> + hlock->sync = !!sync;
> hlock->hardirqs_off = !!hardirqs_off;
> hlock->references = references;
> #ifdef CONFIG_LOCK_STAT
> @@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> if (!validate_chain(curr, hlock, chain_head, chain_key))
> return 0;
>
> + /* For lock_sync(), we are done here since no actual critical section */
> + if (hlock->sync)
> + return 1;
> +
> curr->curr_chain_key = chain_key;
> curr->lockdep_depth++;
> check_chain_key(curr);

Even with sync, there is still a corresponding lock_acquire() and
lock_release(); you can't exit here without increasing lockdep_depth,
as that can cause an underflow.

Cheers,
Longman

2023-01-16 22:50:02

by Waiman Long

Subject: Re: [PATCH 1/3] locking/lockdep: Introduce lock_sync()

On 1/13/23 01:59, Boqun Feng wrote:
> Currently, in order to annonate functions like synchronize_srcu() for
> lockdep, a trick as follow can be used:
>
> lock_acquire();
> lock_release();
>
> , which indicates synchronize_srcu() acts like an empty critical section
> that waits for other (read-side) critical sections to finish. This
> surely can catch some deadlock, but as discussion brought up by Paul
> Mckenney [1], this could introduce false positives because of
> irq-safe/unsafe detection. Extra tricks might help this:
>
> local_irq_disable(..);
> lock_acquire();
> lock_release();
> local_irq_enable(...);
>
> But it's better if lockdep could provide an annotation for
> synchronize_srcu()-like functions, so that people won't need to repeat
> the ugly tricks above. Therefore introduce lock_sync(). It's simply a
> lock+unlock pair with no irq safe/unsafe deadlock check, since the
> to-be-annotated functions don't create real critical sections, so
> there is no way that irqs can create extra dependencies.
>
> [1]: https://lore.kernel.org/lkml/20180412021233.ewncg5jjuzjw3x62@tardis/
>
> Signed-off-by: Boqun Feng <[email protected]>
> ---
> include/linux/lockdep.h | 5 +++++
> kernel/locking/lockdep.c | 34 ++++++++++++++++++++++++++++++++++
> 2 files changed, 39 insertions(+)
>
> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> index 1f1099dac3f0..ba09df6a0872 100644
> --- a/include/linux/lockdep.h
> +++ b/include/linux/lockdep.h
> @@ -268,6 +268,10 @@ extern void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>
> extern void lock_release(struct lockdep_map *lock, unsigned long ip);
>
> +extern void lock_sync(struct lockdep_map *lock, unsigned int subclass,
> + int read, int check, struct lockdep_map *nest_lock,
> + unsigned long ip);
> +
> /* lock_is_held_type() returns */
> #define LOCK_STATE_UNKNOWN -1
> #define LOCK_STATE_NOT_HELD 0
> @@ -555,6 +559,7 @@ do { \
> #define lock_map_acquire_read(l) lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
> #define lock_map_acquire_tryread(l) lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
> #define lock_map_release(l) lock_release(l, _THIS_IP_)
> +#define lock_map_sync(l) lock_sync(l, 0, 0, 1, NULL, _THIS_IP_)
>
> #ifdef CONFIG_PROVE_LOCKING
> # define might_lock(lock) \
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index e3375bc40dad..cffa026a765f 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -5692,6 +5692,40 @@ void lock_release(struct lockdep_map *lock, unsigned long ip)
> }
> EXPORT_SYMBOL_GPL(lock_release);
>
> +/*
> + * lock_sync() - A special annotation for synchronize_{s,}rcu()-like API.
> + *
> + * No actual critical section is created by the APIs annotated with this: these
> + * APIs are used to wait for one or multiple critical sections (on other CPUs
> + * or threads), and it means that calling these APIs inside these critical
> + * sections is a potential deadlock.
> + *
> + * This annotation acts as an acquire+release annotation pair with hardirqs_off
> + * being 1. Since there's no critical section, no interrupt can create extra
> + * dependencies "inside" the annotation, so hardirqs_off == 1 allows us to avoid
> + * false positives.
> + */
> +void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
> + int check, struct lockdep_map *nest_lock, unsigned long ip)
> +{
> + unsigned long flags;
> +
> + if (unlikely(!lockdep_enabled()))
> + return;
> +
> + raw_local_irq_save(flags);
> + check_flags(flags);
> +
> + lockdep_recursion_inc();
> + __lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0);
> +
> + if (__lock_release(lock, ip))
> + check_chain_key(current);
> + lockdep_recursion_finish();
> + raw_local_irq_restore(flags);
> +}
> +EXPORT_SYMBOL_GPL(lock_sync);
> +
> noinstr int lock_is_held_type(const struct lockdep_map *lock, int read)
> {
> unsigned long flags;

This patch looks good to me.

Acked-by: Waiman Long <[email protected]>

2023-01-16 22:50:07

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock

On Mon, Jan 16, 2023 at 05:21:09PM -0500, Waiman Long wrote:
> On 1/13/23 18:57, Boqun Feng wrote:
> > Lock scenario print is always a weak spot of lockdep splats. Improvement
> > can be made if we rework the dependency search and the error printing.
> >
> > However without touching the graph search, we can improve a little for
> > the circular deadlock case, since we have the to-be-added lock
> > dependency, and know whether these two locks are read/write/sync.
> >
> > In order to know whether a held_lock is sync or not, a bit was
> > "stolen" from ->references, which reduce our limit for the same lock
> > class nesting from 2^12 to 2^11, and it should still be good enough.
> >
> > Besides, since we now have a bit in held_lock for sync, we don't need the
> > "hardirqs_off being 1" trick, and also we can avoid the __lock_release()
> > if we jump out of __lock_acquire() before the held_lock is stored.
> >
> > With these changes, a deadlock case involving a read lock and sync gets
> > a better print-out from:
> >
> > [...] Possible unsafe locking scenario:
> > [...]
> > [...] CPU0 CPU1
> > [...] ---- ----
> > [...] lock(srcuA);
> > [...] lock(srcuB);
> > [...] lock(srcuA);
> > [...] lock(srcuB);
> >
> > to
> >
> > [...] Possible unsafe locking scenario:
> > [...]
> > [...] CPU0 CPU1
> > [...] ---- ----
> > [...] rlock(srcuA);
> > [...] lock(srcuB);
> > [...] lock(srcuA);
> > [...] sync(srcuB);
> >
> > Signed-off-by: Boqun Feng <[email protected]>
> > ---
> > include/linux/lockdep.h | 3 ++-
> > kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
> > 2 files changed, 34 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > index ba09df6a0872..febd7ecc225c 100644
> > --- a/include/linux/lockdep.h
> > +++ b/include/linux/lockdep.h
> > @@ -134,7 +134,8 @@ struct held_lock {
> > unsigned int read:2; /* see lock_acquire() comment */
> > unsigned int check:1; /* see lock_acquire() comment */
> > unsigned int hardirqs_off:1;
> > - unsigned int references:12; /* 32 bits */
> > + unsigned int sync:1;
> > + unsigned int references:11; /* 32 bits */
> > unsigned int pin_count;
> > };
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index cffa026a765f..4031d87f6829 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
> > struct lock_class *source = hlock_class(src);
> > struct lock_class *target = hlock_class(tgt);
> > struct lock_class *parent = prt->class;
> > + int src_read = src->read;
> > + int tgt_read = tgt->read;
> > /*
> > * A direct locking problem where unsafe_class lock is taken
> > @@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
> > printk(" Possible unsafe locking scenario:\n\n");
> > printk(" CPU0 CPU1\n");
> > printk(" ---- ----\n");
> > - printk(" lock(");
> > + if (tgt_read != 0)
> > + printk(" rlock(");
> > + else
> > + printk(" lock(");
> > __print_lock_name(target);
> > printk(KERN_CONT ");\n");
> > printk(" lock(");
> > @@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
> > printk(" lock(");
> > __print_lock_name(target);
> > printk(KERN_CONT ");\n");
> > - printk(" lock(");
> > + if (src_read != 0)
> > + printk(" rlock(");
> > + else if (src->sync)
> > + printk(" sync(");
> > + else
> > + printk(" lock(");
> > __print_lock_name(source);
> > printk(KERN_CONT ");\n");
> > printk("\n *** DEADLOCK ***\n\n");
>
> src can be sync() but not the target. Is there a reason why that is the
> case?
>

The functions annotated by sync() don't create real critical sections,
so no lock dependency can be created from a sync(), for example:

synchronize_srcu(A);
mutex_lock(B);

no dependency from A to B. In the scenario case, if we see a dependency
target -> source, the target cannot be a lock_sync(). I will add better
documentation later.
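
By contrast, a real read-side critical section does create outgoing
dependencies, which is exactly what makes an SRCU deadlock detectable. A
minimal sketch with made-up names (assume srcu_A is a DEFINE_SRCU()'d
srcu_struct and mutex_B an ordinary mutex):

	/* CPU 0 */                          /* CPU 1 */
	idx = srcu_read_lock(&srcu_A);       mutex_lock(&mutex_B);
	mutex_lock(&mutex_B);                synchronize_srcu(&srcu_A);
	...                                  ...
	mutex_unlock(&mutex_B);              mutex_unlock(&mutex_B);
	srcu_read_unlock(&srcu_A, idx);

CPU 0 records the dependency srcu_A -> mutex_B (its reader is a real
critical section), while CPU 1's synchronize_srcu() records
mutex_B -> srcu_A; the cycle is reported, and the sync side only ever
shows up as the source of the new edge, never as the target.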

>
> > @@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
> > return 0;
> > }
> > }
> > - if (!hlock->hardirqs_off) {
> > +
> > + /*
> > + * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
> > + * creates no critical section and no extra dependency can be introduced
> > + * by interrupts
> > + */
> > + if (!hlock->hardirqs_off && !hlock->sync) {
> > if (hlock->read) {
> > if (!mark_lock(curr, hlock,
> > LOCK_ENABLED_HARDIRQ_READ))
> > @@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
> > static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> > int trylock, int read, int check, int hardirqs_off,
> > struct lockdep_map *nest_lock, unsigned long ip,
> > - int references, int pin_count)
> > + int references, int pin_count, int sync)
> > {
> > struct task_struct *curr = current;
> > struct lock_class *class = NULL;
> > @@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> > class_idx = class - lock_classes;
> > - if (depth) { /* we're holding locks */
> > + if (depth && !sync) {
> > + /* we're holding locks and the new held lock is not a sync */
> > hlock = curr->held_locks + depth - 1;
> > if (hlock->class_idx == class_idx && nest_lock) {
> > if (!references)
> > @@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> > hlock->trylock = trylock;
> > hlock->read = read;
> > hlock->check = check;
> > + hlock->sync = !!sync;
> > hlock->hardirqs_off = !!hardirqs_off;
> > hlock->references = references;
> > #ifdef CONFIG_LOCK_STAT
> > @@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> > if (!validate_chain(curr, hlock, chain_head, chain_key))
> > return 0;
> > + /* For lock_sync(), we are done here since no actual critical section */
> > + if (hlock->sync)
> > + return 1;
> > +
> > curr->curr_chain_key = chain_key;
> > curr->lockdep_depth++;
> > check_chain_key(curr);
>
> Even with sync, there is still a corresponding lock_acquire() and
> lock_release(), you can't exit here without increasing lockdep_depth. That
> can cause underflow.
>

I actually remove the __lock_release() in lock_sync() in this patch, so
I think it's OK. But I must admit the whole submission is to give David
something to see whether the output is an improvement, so I probably
should separate the output changes and the lock_sync() internal change into
two patches (and the latter can also be folded into the introduction
patch).
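
Concretely, with that change lock_sync() ends up being just the acquire
side. A sketch of roughly how it looks after this patch (based on the hunks
above, not necessarily the exact final code):

void lock_sync(struct lockdep_map *lock, unsigned subclass, int read,
	       int check, struct lockdep_map *nest_lock, unsigned long ip)
{
	unsigned long flags;

	if (unlikely(!lockdep_enabled()))
		return;

	raw_local_irq_save(flags);
	check_flags(flags);

	lockdep_recursion_inc();
	/*
	 * sync == 1: __lock_acquire() still runs validate_chain(), but it
	 * returns before lockdep_depth is incremented, so nothing is left
	 * on held_locks and there is no __lock_release() to pair with.
	 */
	__lock_acquire(lock, subclass, 0, read, check, 1, nest_lock, ip, 0, 0, 1);
	check_chain_key(current);
	lockdep_recursion_finish();
	raw_local_irq_restore(flags);
}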

Regards,
Boqun

> Cheers,
> Longman
>

2023-01-16 23:20:39

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH 2/3] rcu: Equip sleepable RCU with lockdep dependency graph checks

On 1/13/23 01:59, Boqun Feng wrote:
> Although all flavors of RCU are annotated correctly with lockdep as
> recursive read locks, their 'check' parameter of lock_acquire() is
> unset. It means that RCU read locks are not added into the lockdep
> dependency graph therefore deadlock detection based on dependency graph
> won't catch deadlock caused by RCU. This is fine for "non-sleepable" RCU
> flavors since wait-context detection and other context based detection
> can catch these deadlocks. However for sleepable RCU, this is limited.
>
> Actually we can detect the deadlocks caused by SRCU by 1) making
> srcu_read_lock() a 'check'ed recursive read lock and 2) making
> synchronize_srcu() a empty write lock critical section. Even better,
> with the newly introduced lock_sync(), we can avoid false positives
> about irq-unsafe/safe. So do it.
>
> Note that NMI safe SRCU read side critical sections are currently not
> annonated, since step-by-step approach can help us deal with
> false-positives. These may be annotated in the future.
>
> Signed-off-by: Boqun Feng <[email protected]>
> ---
> include/linux/srcu.h | 23 +++++++++++++++++++++--
> kernel/rcu/srcutiny.c | 2 ++
> kernel/rcu/srcutree.c | 2 ++
> 3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 9b9d0bbf1d3c..a1595f8c5155 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -102,6 +102,21 @@ static inline int srcu_read_lock_held(const struct srcu_struct *ssp)
> return lock_is_held(&ssp->dep_map);
> }
>
> +static inline void srcu_lock_acquire(struct lockdep_map *map)
> +{
> + lock_map_acquire_read(map);
> +}
> +
> +static inline void srcu_lock_release(struct lockdep_map *map)
> +{
> + lock_map_release(map);
> +}
> +
> +static inline void srcu_lock_sync(struct lockdep_map *map)
> +{
> + lock_map_sync(map);
> +}
> +
> #else /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */

It would be nice if a comment were added to document the difference
between the two sets of helpers, rcu_lock_*() and srcu_lock_*(). It is
described in the patch description, but it is not easy to figure that out
just by looking at the source files.
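
Something along these lines next to the helper definitions would already
help (just a sketch of the kind of comment I have in mind, wording to be
adjusted):

/*
 * Annotations for lockdep:
 *
 * srcu_lock_acquire()/srcu_lock_release() wrap lock_map_acquire_read()/
 * lock_map_release(), i.e. SRCU read-side critical sections are 'check'ed
 * recursive read locks that enter the lockdep dependency graph and take
 * part in deadlock detection.  The rcu_lock_acquire()/rcu_lock_release()
 * helpers used by the other RCU flavors leave 'check' unset, so those
 * read-side sections stay out of the graph and rely on wait-context and
 * other context-based checks instead.  srcu_lock_sync() wraps
 * lock_map_sync() and annotates synchronize_srcu() as an empty write-side
 * critical section that waits for the readers.
 */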

Cheers,
Longman

2023-01-17 02:15:21

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH 4/3] locking/lockdep: Improve the deadlock scenario print for sync and read lock

On 1/16/23 17:35, Boqun Feng wrote:
> On Mon, Jan 16, 2023 at 05:21:09PM -0500, Waiman Long wrote:
>> On 1/13/23 18:57, Boqun Feng wrote:
>>> Lock scenario print is always a weak spot of lockdep splats. Improvement
>>> can be made if we rework the dependency search and the error printing.
>>>
>>> However without touching the graph search, we can improve a little for
>>> the circular deadlock case, since we have the to-be-added lock
>>> dependency, and know whether these two locks are read/write/sync.
>>>
>>> In order to know whether a held_lock is sync or not, a bit was
>>> "stolen" from ->references, which reduce our limit for the same lock
>>> class nesting from 2^12 to 2^11, and it should still be good enough.
>>>
>>> Besides, since we now have a bit in held_lock for sync, we don't need the
>>> "hardirqs_off being 1" trick, and also we can avoid the __lock_release()
>>> if we jump out of __lock_acquire() before the held_lock is stored.
>>>
>>> With these changes, a deadlock case involving a read lock and sync gets
>>> a better print-out from:
>>>
>>> [...] Possible unsafe locking scenario:
>>> [...]
>>> [...] CPU0 CPU1
>>> [...] ---- ----
>>> [...] lock(srcuA);
>>> [...] lock(srcuB);
>>> [...] lock(srcuA);
>>> [...] lock(srcuB);
>>>
>>> to
>>>
>>> [...] Possible unsafe locking scenario:
>>> [...]
>>> [...] CPU0 CPU1
>>> [...] ---- ----
>>> [...] rlock(srcuA);
>>> [...] lock(srcuB);
>>> [...] lock(srcuA);
>>> [...] sync(srcuB);
>>>
>>> Signed-off-by: Boqun Feng <[email protected]>
>>> ---
>>> include/linux/lockdep.h | 3 ++-
>>> kernel/locking/lockdep.c | 48 ++++++++++++++++++++++++++--------------
>>> 2 files changed, 34 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
>>> index ba09df6a0872..febd7ecc225c 100644
>>> --- a/include/linux/lockdep.h
>>> +++ b/include/linux/lockdep.h
>>> @@ -134,7 +134,8 @@ struct held_lock {
>>> unsigned int read:2; /* see lock_acquire() comment */
>>> unsigned int check:1; /* see lock_acquire() comment */
>>> unsigned int hardirqs_off:1;
>>> - unsigned int references:12; /* 32 bits */
>>> + unsigned int sync:1;
>>> + unsigned int references:11; /* 32 bits */
>>> unsigned int pin_count;
>>> };
>>> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
>>> index cffa026a765f..4031d87f6829 100644
>>> --- a/kernel/locking/lockdep.c
>>> +++ b/kernel/locking/lockdep.c
>>> @@ -1880,6 +1880,8 @@ print_circular_lock_scenario(struct held_lock *src,
>>> struct lock_class *source = hlock_class(src);
>>> struct lock_class *target = hlock_class(tgt);
>>> struct lock_class *parent = prt->class;
>>> + int src_read = src->read;
>>> + int tgt_read = tgt->read;
>>> /*
>>> * A direct locking problem where unsafe_class lock is taken
>>> @@ -1907,7 +1909,10 @@ print_circular_lock_scenario(struct held_lock *src,
>>> printk(" Possible unsafe locking scenario:\n\n");
>>> printk(" CPU0 CPU1\n");
>>> printk(" ---- ----\n");
>>> - printk(" lock(");
>>> + if (tgt_read != 0)
>>> + printk(" rlock(");
>>> + else
>>> + printk(" lock(");
>>> __print_lock_name(target);
>>> printk(KERN_CONT ");\n");
>>> printk(" lock(");
>>> @@ -1916,7 +1921,12 @@ print_circular_lock_scenario(struct held_lock *src,
>>> printk(" lock(");
>>> __print_lock_name(target);
>>> printk(KERN_CONT ");\n");
>>> - printk(" lock(");
>>> + if (src_read != 0)
>>> + printk(" rlock(");
>>> + else if (src->sync)
>>> + printk(" sync(");
>>> + else
>>> + printk(" lock(");
>>> __print_lock_name(source);
>>> printk(KERN_CONT ");\n");
>>> printk("\n *** DEADLOCK ***\n\n");
>> src can be sync() but not the target. Is there a reason why that is the
>> case?
>>
> The functions annotated by sync() don't create real critical sections,
> so no lock dependency can be created from a sync(), for example:
>
> synchronize_srcu(A);
> mutex_lock(B);
>
> no dependency from A to B. In the scenario case, if we see a dependency
> target -> source, the target cannot be a lock_sync(). I will add better
> documentation later.
Right, the dependency won't happen since you reduce lock_sync() to
mostly doing validate_chain() without actually storing the held_lock in
the lock chain, which I missed in my initial review. Without that, a
dependency could be created if an NMI happened between lock_acquire()
and lock_release() in lock_sync().
>>> @@ -4530,7 +4540,13 @@ mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
>>> return 0;
>>> }
>>> }
>>> - if (!hlock->hardirqs_off) {
>>> +
>>> + /*
>>> + * For lock_sync(), don't mark the ENABLED usage, since lock_sync()
>>> + * creates no critical section and no extra dependency can be introduced
>>> + * by interrupts
>>> + */
>>> + if (!hlock->hardirqs_off && !hlock->sync) {
>>> if (hlock->read) {
>>> if (!mark_lock(curr, hlock,
>>> LOCK_ENABLED_HARDIRQ_READ))
>>> @@ -4909,7 +4925,7 @@ static int __lock_is_held(const struct lockdep_map *lock, int read);
>>> static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>> int trylock, int read, int check, int hardirqs_off,
>>> struct lockdep_map *nest_lock, unsigned long ip,
>>> - int references, int pin_count)
>>> + int references, int pin_count, int sync)
>>> {
>>> struct task_struct *curr = current;
>>> struct lock_class *class = NULL;
>>> @@ -4960,7 +4976,8 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>> class_idx = class - lock_classes;
>>> - if (depth) { /* we're holding locks */
>>> + if (depth && !sync) {
>>> + /* we're holding locks and the new held lock is not a sync */
>>> hlock = curr->held_locks + depth - 1;
>>> if (hlock->class_idx == class_idx && nest_lock) {
>>> if (!references)
>>> @@ -4994,6 +5011,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>> hlock->trylock = trylock;
>>> hlock->read = read;
>>> hlock->check = check;
>>> + hlock->sync = !!sync;
>>> hlock->hardirqs_off = !!hardirqs_off;
>>> hlock->references = references;
>>> #ifdef CONFIG_LOCK_STAT
>>> @@ -5055,6 +5073,10 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
>>> if (!validate_chain(curr, hlock, chain_head, chain_key))
>>> return 0;
>>> + /* For lock_sync(), we are done here since no actual critical section */
>>> + if (hlock->sync)
>>> + return 1;
>>> +
>>> curr->curr_chain_key = chain_key;
>>> curr->lockdep_depth++;
>>> check_chain_key(curr);
>> Even with sync, there is still a corresponding lock_acquire() and
>> lock_release(), you can't exit here without increasing lockdep_depth. That
>> can cause underflow.
>>
> I actually remove the __lock_release() in lock_sync() in this patch, so
> I think it's OK. But I must admit the whole submission is to give David
> something to see whether the output is an improvement, so I probably
> should separate the output changes and the lock_sync() internal change into
> two patches (and the latter can also be folded into the introduction
> patch).

I saw that now. You may not need to separate it into 2 patches since
there is some dependency between the two. You do have to document the 2
different changes in your patch description.

Cheers,
Longman

2023-02-04 02:34:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 0/3] KVM: Make use of SRCU deadlock detection support

On Fri, Jan 13, 2023, David Woodhouse wrote:
> David Woodhouse (3):
> KVM: Show lockdep the kvm->mutex vs. kvm->srcu ordering rule
> KVM: selftests: Use enum for test numbers in xen_shinfo_test
> KVM: selftests: Add EVTCHNOP_send slow path test to xen_shinfo_test
>
> .../testing/selftests/kvm/x86_64/xen_shinfo_test.c | 165 ++++++++++++++-------
> virt/kvm/kvm_main.c | 10 ++
> 2 files changed, 124 insertions(+), 51 deletions(-)

As mentioned in patch three, I'm going to repost the selftests changes on top
of other cleanups, and will plan on applying them next week if all goes well.

Paolo, do you want to grab the KVM change directly?