Hi Peter,
As we talked in IRC, Johannes Berg reported a problem and I could
reproduce by the selftest case in patch #4, there are three issues:
1) printing of backwards dependency path doesn't work as expected,
2) we have unnecessary and incorrect save_trace() call
3) check_irq_usage() may result in a wrong depedency path (when a
real one really exits).
Fix them separately in patch 1~3.
Regards,
Boqun
Boqun Feng (4):
locking/lockdep: Fix the dep path printing for backwards BFS
locking/lockdep: Remove the unnecessary trace saving
lockding/lockdep: Avoid to find wrong lock dep path in
check_irq_usage()
locking/selftests: Add a selftest for check_irq_usage()
kernel/locking/lockdep.c | 123 +++++++++++++++++++++++++++++++++++++--
lib/locking-selftest.c | 65 +++++++++++++++++++++
2 files changed, 182 insertions(+), 6 deletions(-)
--
2.30.2
In print_bad_irq_dependency(), save_trace() is called to set the ->trace
for @prev_root as the current call trace, however @prev_root corresponds
to the the held lock, which may not be acquired in current call trace,
therefore it's wrong to use save_trace() to set ->trace of @prev_root.
Moreover, with our adjustment of printing backwards dependency path, the
->trace of @prev_root is unncessary, so remove it.
Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3b32cd9cdfd0..74d084a398be 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2550,9 +2550,6 @@ print_bad_irq_dependency(struct task_struct *curr,
lockdep_print_held_locks(curr);
pr_warn("\nthe dependencies between %s-irq-safe lock and the holding lock:\n", irqclass);
- prev_root->trace = save_trace();
- if (!prev_root->trace)
- return;
print_shortest_lock_dependencies_backwards(backwards_entry, prev_root);
pr_warn("\nthe dependencies between the lock to be acquired");
--
2.30.2
In the step #3 of check_irq_usage(), we seach backwards to find a lock
whose usage conflicts the usage of @target_entry1 on safe/unsafe.
However, we should only keep the irq-unsafe usage of @target_entry1 into
consideration, because it could be a case where a lock is hardirq-unsafe
but soft-safe, and in check_irq_usage() we find it because its
hardirq-unsafe could result into a hardirq-safe-unsafe deadlock, but
currently since we don't filter out the other usage bits, so we may find
a lock dependency path softirq-unsafe -> softirq-safe, which in fact
doesn't cause a deadlock. And this may cause misleading lockdep splats.
Fix this by only keeping LOCKF_ENABLED_IRQ_ALL bits when we try the
backwards search.
Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 74d084a398be..6ff1e8405a83 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2768,8 +2768,18 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
* Step 3: we found a bad match! Now retrieve a lock from the backward
* list whose usage mask matches the exclusive usage mask from the
* lock found on the forward list.
+ *
+ * Note, we should only keep the LOCKF_ENABLED_IRQ_ALL bits, considering
+ * the follow case:
+ *
+ * When trying to add A -> B to the graph, we find that there is a
+ * hardirq-safe L, that L -> ... -> A, and another hardirq-unsafe M,
+ * that B -> ... -> M. However M is **softirq-safe**, if we use exact
+ * invert bits of M's usage_mask, we will find another lock N that is
+ * **softirq-unsafe** and N -> ... -> A, however N -> .. -> M will not
+ * cause a inversion deadlock.
*/
- backward_mask = original_mask(target_entry1->class->usage_mask);
+ backward_mask = original_mask(target_entry1->class->usage_mask & LOCKF_ENABLED_IRQ_ALL);
ret = find_usage_backwards(&this, backward_mask, &target_entry);
if (bfs_error(ret)) {
--
2.30.2
Hi Boqun,
Great, thanks! I'll ask the folks who could reproduce this issue to do
so as soon as possible.
Not sure if you saw my previous posting:
https://lore.kernel.org/lkml/[email protected]/T/#u
That was with patch 3 of this set already applied.
If I understand correctly, then you're basically saying that if we apply
all the 3 (or 4) patches here, it'll just change the report (that you
can see at the link above) to actually say something that I can
understand to see where the issue is?
Thanks,
johannes
Johannes Berg reported a lockdep problem which could be reproduced by
the special test case introduced in this patch, so add it.
Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 65 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 2d85abac1744..5c50b0910396 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -53,6 +53,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose);
#define LOCKTYPE_WW 0x10
#define LOCKTYPE_RTMUTEX 0x20
#define LOCKTYPE_LL 0x40
+#define LOCKTYPE_SPECIAL 0x80
static struct ww_acquire_ctx t, t2;
static struct ww_mutex o, o2, o3;
@@ -2744,6 +2745,66 @@ static void local_lock_tests(void)
pr_cont("\n");
}
+static void hardirq_deadlock_softirq_not_deadlock(void)
+{
+ /* mutex_A is hardirq-unsafe and softirq-unsafe */
+ /* mutex_A -> lock_C */
+ mutex_lock(&mutex_A);
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+ mutex_unlock(&mutex_A);
+
+ /* lock_A is hardirq-safe */
+ HARDIRQ_ENTER();
+ spin_lock(&lock_A);
+ spin_unlock(&lock_A);
+ HARDIRQ_EXIT();
+
+ /* lock_A -> lock_B */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_A);
+ spin_lock(&lock_B);
+ spin_unlock(&lock_B);
+ spin_unlock(&lock_A);
+ HARDIRQ_ENABLE();
+
+ /* lock_B -> lock_C */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_B);
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ spin_unlock(&lock_B);
+ HARDIRQ_ENABLE();
+
+ /* lock_D is softirq-safe */
+ SOFTIRQ_ENTER();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_EXIT();
+
+ /* And lock_D is hardirq-unsafe */
+ SOFTIRQ_DISABLE();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_ENABLE();
+
+ /*
+ * mutex_A -> lock_C -> lock_D is softirq-unsafe -> softirq-safe, not
+ * deadlock.
+ *
+ * lock_A -> lock_B -> lock_C -> lock_D is hardirq-safe ->
+ * hardirq-unsafe, deadlock.
+ */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+}
+
void locking_selftest(void)
{
/*
@@ -2872,6 +2933,10 @@ void locking_selftest(void)
local_lock_tests();
+ print_testname("hardirq_unsafe_softirq_safe");
+ dotest(hardirq_deadlock_softirq_not_deadlock, FAILURE, LOCKTYPE_SPECIAL);
+ pr_cont("\n");
+
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;
--
2.30.2
On Fri, Jun 18, 2021 at 07:36:26PM +0200, Johannes Berg wrote:
> Hi Boqun,
>
Hello,
> Great, thanks! I'll ask the folks who could reproduce this issue to do
> so as soon as possible.
>
> Not sure if you saw my previous posting:
> https://lore.kernel.org/lkml/[email protected]/T/#u
>
I just replied that thread on the particular deadlock scenario there.
> That was with patch 3 of this set already applied.
>
>
> If I understand correctly, then you're basically saying that if we apply
> all the 3 (or 4) patches here, it'll just change the report (that you
> can see at the link above) to actually say something that I can
> understand to see where the issue is?
>
Sort of, these patches will provide you a correct lock dependency path
on the deadlock possibility along with the correct call stacks. However,
currently I don't think it's easy for lockdep to print the 4-CPU
scenario that I post in the other email, so it will still take some
effort to decode the lockdep, so you will still need some lockdep
knowledge to understand the real issue ;-(
Regards,
Boqun
> Thanks,
> johannes
>
>
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 8946ccc25ed22d957ca7f0b6fac1dcf6d25eaf1f
Gitweb: https://git.kernel.org/tip/8946ccc25ed22d957ca7f0b6fac1dcf6d25eaf1f
Author: Boqun Feng <[email protected]>
AuthorDate: Sat, 19 Jun 2021 01:01:10 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 22 Jun 2021 16:42:07 +02:00
locking/selftests: Add a selftest for check_irq_usage()
Johannes Berg reported a lockdep problem which could be reproduced by
the special test case introduced in this patch, so add it.
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 65 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 65 insertions(+)
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 2d85aba..5c50b09 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -53,6 +53,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose);
#define LOCKTYPE_WW 0x10
#define LOCKTYPE_RTMUTEX 0x20
#define LOCKTYPE_LL 0x40
+#define LOCKTYPE_SPECIAL 0x80
static struct ww_acquire_ctx t, t2;
static struct ww_mutex o, o2, o3;
@@ -2744,6 +2745,66 @@ static void local_lock_tests(void)
pr_cont("\n");
}
+static void hardirq_deadlock_softirq_not_deadlock(void)
+{
+ /* mutex_A is hardirq-unsafe and softirq-unsafe */
+ /* mutex_A -> lock_C */
+ mutex_lock(&mutex_A);
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+ mutex_unlock(&mutex_A);
+
+ /* lock_A is hardirq-safe */
+ HARDIRQ_ENTER();
+ spin_lock(&lock_A);
+ spin_unlock(&lock_A);
+ HARDIRQ_EXIT();
+
+ /* lock_A -> lock_B */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_A);
+ spin_lock(&lock_B);
+ spin_unlock(&lock_B);
+ spin_unlock(&lock_A);
+ HARDIRQ_ENABLE();
+
+ /* lock_B -> lock_C */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_B);
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ spin_unlock(&lock_B);
+ HARDIRQ_ENABLE();
+
+ /* lock_D is softirq-safe */
+ SOFTIRQ_ENTER();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_EXIT();
+
+ /* And lock_D is hardirq-unsafe */
+ SOFTIRQ_DISABLE();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_ENABLE();
+
+ /*
+ * mutex_A -> lock_C -> lock_D is softirq-unsafe -> softirq-safe, not
+ * deadlock.
+ *
+ * lock_A -> lock_B -> lock_C -> lock_D is hardirq-safe ->
+ * hardirq-unsafe, deadlock.
+ */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+}
+
void locking_selftest(void)
{
/*
@@ -2872,6 +2933,10 @@ void locking_selftest(void)
local_lock_tests();
+ print_testname("hardirq_unsafe_softirq_safe");
+ dotest(hardirq_deadlock_softirq_not_deadlock, FAILURE, LOCKTYPE_SPECIAL);
+ pr_cont("\n");
+
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;
The following commit has been merged into the locking/core branch of tip:
Commit-ID: 7b1f8c6179769af6ffa055e1169610b51d71edd5
Gitweb: https://git.kernel.org/tip/7b1f8c6179769af6ffa055e1169610b51d71edd5
Author: Boqun Feng <[email protected]>
AuthorDate: Sat, 19 Jun 2021 01:01:09 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 22 Jun 2021 16:42:07 +02:00
lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()
In the step #3 of check_irq_usage(), we seach backwards to find a lock
whose usage conflicts the usage of @target_entry1 on safe/unsafe.
However, we should only keep the irq-unsafe usage of @target_entry1 into
consideration, because it could be a case where a lock is hardirq-unsafe
but soft-safe, and in check_irq_usage() we find it because its
hardirq-unsafe could result into a hardirq-safe-unsafe deadlock, but
currently since we don't filter out the other usage bits, so we may find
a lock dependency path softirq-unsafe -> softirq-safe, which in fact
doesn't cause a deadlock. And this may cause misleading lockdep splats.
Fix this by only keeping LOCKF_ENABLED_IRQ_ALL bits when we try the
backwards search.
Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 74d084a..6ff1e84 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2768,8 +2768,18 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
* Step 3: we found a bad match! Now retrieve a lock from the backward
* list whose usage mask matches the exclusive usage mask from the
* lock found on the forward list.
+ *
+ * Note, we should only keep the LOCKF_ENABLED_IRQ_ALL bits, considering
+ * the follow case:
+ *
+ * When trying to add A -> B to the graph, we find that there is a
+ * hardirq-safe L, that L -> ... -> A, and another hardirq-unsafe M,
+ * that B -> ... -> M. However M is **softirq-safe**, if we use exact
+ * invert bits of M's usage_mask, we will find another lock N that is
+ * **softirq-unsafe** and N -> ... -> A, however N -> .. -> M will not
+ * cause a inversion deadlock.
*/
- backward_mask = original_mask(target_entry1->class->usage_mask);
+ backward_mask = original_mask(target_entry1->class->usage_mask & LOCKF_ENABLED_IRQ_ALL);
ret = find_usage_backwards(&this, backward_mask, &target_entry);
if (bfs_error(ret)) {
The following commit has been merged into the locking/core branch of tip:
Commit-ID: d4c157c7b1a67a0844a904baaca9a840c196c103
Gitweb: https://git.kernel.org/tip/d4c157c7b1a67a0844a904baaca9a840c196c103
Author: Boqun Feng <[email protected]>
AuthorDate: Sat, 19 Jun 2021 01:01:08 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 22 Jun 2021 16:42:07 +02:00
locking/lockdep: Remove the unnecessary trace saving
In print_bad_irq_dependency(), save_trace() is called to set the ->trace
for @prev_root as the current call trace, however @prev_root corresponds
to the the held lock, which may not be acquired in current call trace,
therefore it's wrong to use save_trace() to set ->trace of @prev_root.
Moreover, with our adjustment of printing backwards dependency path, the
->trace of @prev_root is unncessary, so remove it.
Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3b32cd9..74d084a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2550,9 +2550,6 @@ print_bad_irq_dependency(struct task_struct *curr,
lockdep_print_held_locks(curr);
pr_warn("\nthe dependencies between %s-irq-safe lock and the holding lock:\n", irqclass);
- prev_root->trace = save_trace();
- if (!prev_root->trace)
- return;
print_shortest_lock_dependencies_backwards(backwards_entry, prev_root);
pr_warn("\nthe dependencies between the lock to be acquired");