2021-06-18 19:23:50

by Boqun Feng

[permalink] [raw]
Subject: [PATCH 0/4] Fix issues in check_irq_usage()

Hi Peter,

As we talked in IRC, Johannes Berg reported a problem and I could
reproduce by the selftest case in patch #4, there are three issues:

1) printing of backwards dependency path doesn't work as expected,

2) we have unnecessary and incorrect save_trace() call

3) check_irq_usage() may result in a wrong depedency path (when a
real one really exits).

Fix them separately in patch 1~3.

Regards,
Boqun

Boqun Feng (4):
locking/lockdep: Fix the dep path printing for backwards BFS
locking/lockdep: Remove the unnecessary trace saving
lockding/lockdep: Avoid to find wrong lock dep path in
check_irq_usage()
locking/selftests: Add a selftest for check_irq_usage()

kernel/locking/lockdep.c | 123 +++++++++++++++++++++++++++++++++++++--
lib/locking-selftest.c | 65 +++++++++++++++++++++
2 files changed, 182 insertions(+), 6 deletions(-)

--
2.30.2


2021-06-18 19:24:12

by Boqun Feng

[permalink] [raw]
Subject: [PATCH 2/4] locking/lockdep: Remove the unnecessary trace saving

In print_bad_irq_dependency(), save_trace() is called to set the ->trace
for @prev_root as the current call trace, however @prev_root corresponds
to the the held lock, which may not be acquired in current call trace,
therefore it's wrong to use save_trace() to set ->trace of @prev_root.
Moreover, with our adjustment of printing backwards dependency path, the
->trace of @prev_root is unncessary, so remove it.

Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3b32cd9cdfd0..74d084a398be 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2550,9 +2550,6 @@ print_bad_irq_dependency(struct task_struct *curr,
lockdep_print_held_locks(curr);

pr_warn("\nthe dependencies between %s-irq-safe lock and the holding lock:\n", irqclass);
- prev_root->trace = save_trace();
- if (!prev_root->trace)
- return;
print_shortest_lock_dependencies_backwards(backwards_entry, prev_root);

pr_warn("\nthe dependencies between the lock to be acquired");
--
2.30.2

2021-06-18 19:24:28

by Boqun Feng

[permalink] [raw]
Subject: [PATCH 3/4] lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()

In the step #3 of check_irq_usage(), we seach backwards to find a lock
whose usage conflicts the usage of @target_entry1 on safe/unsafe.
However, we should only keep the irq-unsafe usage of @target_entry1 into
consideration, because it could be a case where a lock is hardirq-unsafe
but soft-safe, and in check_irq_usage() we find it because its
hardirq-unsafe could result into a hardirq-safe-unsafe deadlock, but
currently since we don't filter out the other usage bits, so we may find
a lock dependency path softirq-unsafe -> softirq-safe, which in fact
doesn't cause a deadlock. And this may cause misleading lockdep splats.

Fix this by only keeping LOCKF_ENABLED_IRQ_ALL bits when we try the
backwards search.

Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
---
kernel/locking/lockdep.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 74d084a398be..6ff1e8405a83 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2768,8 +2768,18 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
* Step 3: we found a bad match! Now retrieve a lock from the backward
* list whose usage mask matches the exclusive usage mask from the
* lock found on the forward list.
+ *
+ * Note, we should only keep the LOCKF_ENABLED_IRQ_ALL bits, considering
+ * the follow case:
+ *
+ * When trying to add A -> B to the graph, we find that there is a
+ * hardirq-safe L, that L -> ... -> A, and another hardirq-unsafe M,
+ * that B -> ... -> M. However M is **softirq-safe**, if we use exact
+ * invert bits of M's usage_mask, we will find another lock N that is
+ * **softirq-unsafe** and N -> ... -> A, however N -> .. -> M will not
+ * cause a inversion deadlock.
*/
- backward_mask = original_mask(target_entry1->class->usage_mask);
+ backward_mask = original_mask(target_entry1->class->usage_mask & LOCKF_ENABLED_IRQ_ALL);

ret = find_usage_backwards(&this, backward_mask, &target_entry);
if (bfs_error(ret)) {
--
2.30.2

2021-06-18 19:27:10

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 0/4] Fix issues in check_irq_usage()

Hi Boqun,

Great, thanks! I'll ask the folks who could reproduce this issue to do
so as soon as possible.

Not sure if you saw my previous posting:
https://lore.kernel.org/lkml/[email protected]/T/#u

That was with patch 3 of this set already applied.


If I understand correctly, then you're basically saying that if we apply
all the 3 (or 4) patches here, it'll just change the report (that you
can see at the link above) to actually say something that I can
understand to see where the issue is?

Thanks,
johannes


2021-06-18 22:58:56

by Boqun Feng

[permalink] [raw]
Subject: [PATCH 4/4] locking/selftests: Add a selftest for check_irq_usage()

Johannes Berg reported a lockdep problem which could be reproduced by
the special test case introduced in this patch, so add it.

Signed-off-by: Boqun Feng <[email protected]>
---
lib/locking-selftest.c | 65 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 2d85abac1744..5c50b0910396 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -53,6 +53,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose);
#define LOCKTYPE_WW 0x10
#define LOCKTYPE_RTMUTEX 0x20
#define LOCKTYPE_LL 0x40
+#define LOCKTYPE_SPECIAL 0x80

static struct ww_acquire_ctx t, t2;
static struct ww_mutex o, o2, o3;
@@ -2744,6 +2745,66 @@ static void local_lock_tests(void)
pr_cont("\n");
}

+static void hardirq_deadlock_softirq_not_deadlock(void)
+{
+ /* mutex_A is hardirq-unsafe and softirq-unsafe */
+ /* mutex_A -> lock_C */
+ mutex_lock(&mutex_A);
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+ mutex_unlock(&mutex_A);
+
+ /* lock_A is hardirq-safe */
+ HARDIRQ_ENTER();
+ spin_lock(&lock_A);
+ spin_unlock(&lock_A);
+ HARDIRQ_EXIT();
+
+ /* lock_A -> lock_B */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_A);
+ spin_lock(&lock_B);
+ spin_unlock(&lock_B);
+ spin_unlock(&lock_A);
+ HARDIRQ_ENABLE();
+
+ /* lock_B -> lock_C */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_B);
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ spin_unlock(&lock_B);
+ HARDIRQ_ENABLE();
+
+ /* lock_D is softirq-safe */
+ SOFTIRQ_ENTER();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_EXIT();
+
+ /* And lock_D is hardirq-unsafe */
+ SOFTIRQ_DISABLE();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_ENABLE();
+
+ /*
+ * mutex_A -> lock_C -> lock_D is softirq-unsafe -> softirq-safe, not
+ * deadlock.
+ *
+ * lock_A -> lock_B -> lock_C -> lock_D is hardirq-safe ->
+ * hardirq-unsafe, deadlock.
+ */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+}
+
void locking_selftest(void)
{
/*
@@ -2872,6 +2933,10 @@ void locking_selftest(void)

local_lock_tests();

+ print_testname("hardirq_unsafe_softirq_safe");
+ dotest(hardirq_deadlock_softirq_not_deadlock, FAILURE, LOCKTYPE_SPECIAL);
+ pr_cont("\n");
+
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;
--
2.30.2

2021-06-21 07:30:45

by Boqun Feng

[permalink] [raw]
Subject: Re: [PATCH 0/4] Fix issues in check_irq_usage()

On Fri, Jun 18, 2021 at 07:36:26PM +0200, Johannes Berg wrote:
> Hi Boqun,
>

Hello,

> Great, thanks! I'll ask the folks who could reproduce this issue to do
> so as soon as possible.
>
> Not sure if you saw my previous posting:
> https://lore.kernel.org/lkml/[email protected]/T/#u
>

I just replied that thread on the particular deadlock scenario there.

> That was with patch 3 of this set already applied.
>
>
> If I understand correctly, then you're basically saying that if we apply
> all the 3 (or 4) patches here, it'll just change the report (that you
> can see at the link above) to actually say something that I can
> understand to see where the issue is?
>

Sort of, these patches will provide you a correct lock dependency path
on the deadlock possibility along with the correct call stacks. However,
currently I don't think it's easy for lockdep to print the 4-CPU
scenario that I post in the other email, so it will still take some
effort to decode the lockdep, so you will still need some lockdep
knowledge to understand the real issue ;-(

Regards,
Boqun

> Thanks,
> johannes
>
>

Subject: [tip: locking/core] locking/selftests: Add a selftest for check_irq_usage()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 8946ccc25ed22d957ca7f0b6fac1dcf6d25eaf1f
Gitweb: https://git.kernel.org/tip/8946ccc25ed22d957ca7f0b6fac1dcf6d25eaf1f
Author: Boqun Feng <[email protected]>
AuthorDate: Sat, 19 Jun 2021 01:01:10 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 22 Jun 2021 16:42:07 +02:00

locking/selftests: Add a selftest for check_irq_usage()

Johannes Berg reported a lockdep problem which could be reproduced by
the special test case introduced in this patch, so add it.

Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
lib/locking-selftest.c | 65 +++++++++++++++++++++++++++++++++++++++++-
1 file changed, 65 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 2d85aba..5c50b09 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -53,6 +53,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose);
#define LOCKTYPE_WW 0x10
#define LOCKTYPE_RTMUTEX 0x20
#define LOCKTYPE_LL 0x40
+#define LOCKTYPE_SPECIAL 0x80

static struct ww_acquire_ctx t, t2;
static struct ww_mutex o, o2, o3;
@@ -2744,6 +2745,66 @@ static void local_lock_tests(void)
pr_cont("\n");
}

+static void hardirq_deadlock_softirq_not_deadlock(void)
+{
+ /* mutex_A is hardirq-unsafe and softirq-unsafe */
+ /* mutex_A -> lock_C */
+ mutex_lock(&mutex_A);
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+ mutex_unlock(&mutex_A);
+
+ /* lock_A is hardirq-safe */
+ HARDIRQ_ENTER();
+ spin_lock(&lock_A);
+ spin_unlock(&lock_A);
+ HARDIRQ_EXIT();
+
+ /* lock_A -> lock_B */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_A);
+ spin_lock(&lock_B);
+ spin_unlock(&lock_B);
+ spin_unlock(&lock_A);
+ HARDIRQ_ENABLE();
+
+ /* lock_B -> lock_C */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_B);
+ spin_lock(&lock_C);
+ spin_unlock(&lock_C);
+ spin_unlock(&lock_B);
+ HARDIRQ_ENABLE();
+
+ /* lock_D is softirq-safe */
+ SOFTIRQ_ENTER();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_EXIT();
+
+ /* And lock_D is hardirq-unsafe */
+ SOFTIRQ_DISABLE();
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ SOFTIRQ_ENABLE();
+
+ /*
+ * mutex_A -> lock_C -> lock_D is softirq-unsafe -> softirq-safe, not
+ * deadlock.
+ *
+ * lock_A -> lock_B -> lock_C -> lock_D is hardirq-safe ->
+ * hardirq-unsafe, deadlock.
+ */
+ HARDIRQ_DISABLE();
+ spin_lock(&lock_C);
+ spin_lock(&lock_D);
+ spin_unlock(&lock_D);
+ spin_unlock(&lock_C);
+ HARDIRQ_ENABLE();
+}
+
void locking_selftest(void)
{
/*
@@ -2872,6 +2933,10 @@ void locking_selftest(void)

local_lock_tests();

+ print_testname("hardirq_unsafe_softirq_safe");
+ dotest(hardirq_deadlock_softirq_not_deadlock, FAILURE, LOCKTYPE_SPECIAL);
+ pr_cont("\n");
+
if (unexpected_testcase_failures) {
printk("-----------------------------------------------------------------\n");
debug_locks = 0;

Subject: [tip: locking/core] lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()

The following commit has been merged into the locking/core branch of tip:

Commit-ID: 7b1f8c6179769af6ffa055e1169610b51d71edd5
Gitweb: https://git.kernel.org/tip/7b1f8c6179769af6ffa055e1169610b51d71edd5
Author: Boqun Feng <[email protected]>
AuthorDate: Sat, 19 Jun 2021 01:01:09 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 22 Jun 2021 16:42:07 +02:00

lockding/lockdep: Avoid to find wrong lock dep path in check_irq_usage()

In the step #3 of check_irq_usage(), we seach backwards to find a lock
whose usage conflicts the usage of @target_entry1 on safe/unsafe.
However, we should only keep the irq-unsafe usage of @target_entry1 into
consideration, because it could be a case where a lock is hardirq-unsafe
but soft-safe, and in check_irq_usage() we find it because its
hardirq-unsafe could result into a hardirq-safe-unsafe deadlock, but
currently since we don't filter out the other usage bits, so we may find
a lock dependency path softirq-unsafe -> softirq-safe, which in fact
doesn't cause a deadlock. And this may cause misleading lockdep splats.

Fix this by only keeping LOCKF_ENABLED_IRQ_ALL bits when we try the
backwards search.

Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 74d084a..6ff1e84 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2768,8 +2768,18 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
* Step 3: we found a bad match! Now retrieve a lock from the backward
* list whose usage mask matches the exclusive usage mask from the
* lock found on the forward list.
+ *
+ * Note, we should only keep the LOCKF_ENABLED_IRQ_ALL bits, considering
+ * the follow case:
+ *
+ * When trying to add A -> B to the graph, we find that there is a
+ * hardirq-safe L, that L -> ... -> A, and another hardirq-unsafe M,
+ * that B -> ... -> M. However M is **softirq-safe**, if we use exact
+ * invert bits of M's usage_mask, we will find another lock N that is
+ * **softirq-unsafe** and N -> ... -> A, however N -> .. -> M will not
+ * cause a inversion deadlock.
*/
- backward_mask = original_mask(target_entry1->class->usage_mask);
+ backward_mask = original_mask(target_entry1->class->usage_mask & LOCKF_ENABLED_IRQ_ALL);

ret = find_usage_backwards(&this, backward_mask, &target_entry);
if (bfs_error(ret)) {

Subject: [tip: locking/core] locking/lockdep: Remove the unnecessary trace saving

The following commit has been merged into the locking/core branch of tip:

Commit-ID: d4c157c7b1a67a0844a904baaca9a840c196c103
Gitweb: https://git.kernel.org/tip/d4c157c7b1a67a0844a904baaca9a840c196c103
Author: Boqun Feng <[email protected]>
AuthorDate: Sat, 19 Jun 2021 01:01:08 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 22 Jun 2021 16:42:07 +02:00

locking/lockdep: Remove the unnecessary trace saving

In print_bad_irq_dependency(), save_trace() is called to set the ->trace
for @prev_root as the current call trace, however @prev_root corresponds
to the the held lock, which may not be acquired in current call trace,
therefore it's wrong to use save_trace() to set ->trace of @prev_root.
Moreover, with our adjustment of printing backwards dependency path, the
->trace of @prev_root is unncessary, so remove it.

Reported-by: Johannes Berg <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/locking/lockdep.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3b32cd9..74d084a 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2550,9 +2550,6 @@ print_bad_irq_dependency(struct task_struct *curr,
lockdep_print_held_locks(curr);

pr_warn("\nthe dependencies between %s-irq-safe lock and the holding lock:\n", irqclass);
- prev_root->trace = save_trace();
- if (!prev_root->trace)
- return;
print_shortest_lock_dependencies_backwards(backwards_entry, prev_root);

pr_warn("\nthe dependencies between the lock to be acquired");