2021-03-16 16:45:27

by Waiman Long

Subject: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

It was found that running the ww_mutex_lock-torture test produced the
following lockdep splat almost immediately:

[ 103.892638] ======================================================
[ 103.892639] WARNING: possible circular locking dependency detected
[ 103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S W
[ 103.892643] ------------------------------------------------------
[ 103.892643] lock_torture_wr/3234 is trying to acquire lock:
[ 103.892646] ffffffffc0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x316/0x720 [locktorture]
[ 103.892660]
[ 103.892660] but task is already holding lock:
[ 103.892661] ffffffffc0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x3e2/0x720 [locktorture]
[ 103.892669]
[ 103.892669] which lock already depends on the new lock.
[ 103.892669]
[ 103.892670]
[ 103.892670] the existing dependency chain (in reverse order) is:
[ 103.892671]
[ 103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
[ 103.892675] lock_acquire+0x1c5/0x830
[ 103.892682] __ww_mutex_lock.constprop.15+0x1d1/0x2e50
[ 103.892687] ww_mutex_lock+0x4b/0x180
[ 103.892690] torture_ww_mutex_lock+0x316/0x720 [locktorture]
[ 103.892694] lock_torture_writer+0x142/0x3a0 [locktorture]
[ 103.892698] kthread+0x35f/0x430
[ 103.892701] ret_from_fork+0x1f/0x30
[ 103.892706]
[ 103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
[ 103.892709] lock_acquire+0x1c5/0x830
[ 103.892712] __ww_mutex_lock.constprop.15+0x1d1/0x2e50
[ 103.892715] ww_mutex_lock+0x4b/0x180
[ 103.892717] torture_ww_mutex_lock+0x316/0x720 [locktorture]
[ 103.892721] lock_torture_writer+0x142/0x3a0 [locktorture]
[ 103.892725] kthread+0x35f/0x430
[ 103.892727] ret_from_fork+0x1f/0x30
[ 103.892730]
[ 103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:
[ 103.892733] check_prevs_add+0x3fd/0x2470
[ 103.892736] __lock_acquire+0x2602/0x3100
[ 103.892738] lock_acquire+0x1c5/0x830
[ 103.892740] __ww_mutex_lock.constprop.15+0x1d1/0x2e50
[ 103.892743] ww_mutex_lock+0x4b/0x180
[ 103.892746] torture_ww_mutex_lock+0x316/0x720 [locktorture]
[ 103.892749] lock_torture_writer+0x142/0x3a0 [locktorture]
[ 103.892753] kthread+0x35f/0x430
[ 103.892755] ret_from_fork+0x1f/0x30
[ 103.892757]
[ 103.892757] other info that might help us debug this:
[ 103.892757]
[ 103.892758] Chain exists of:
[ 103.892758] torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> torture_ww_mutex_0.base
[ 103.892758]
[ 103.892763] Possible unsafe locking scenario:
[ 103.892763]
[ 103.892764]        CPU0                    CPU1
[ 103.892765]        ----                    ----
[ 103.892765]   lock(torture_ww_mutex_0.base);
[ 103.892767]                                lock(torture_ww_mutex_1.base);
[ 103.892770]                                lock(torture_ww_mutex_0.base);
[ 103.892772]   lock(torture_ww_mutex_2.base);
[ 103.892774]
[ 103.892774] *** DEADLOCK ***

Since ww_mutex is supposed to be deadlock-proof when used properly, such
a deadlock scenario should not happen. To avoid this false positive
splat, treat ww_mutex_lock() like a trylock.

After applying this patch, the locktorture test can run for a long time
without triggering the circular locking dependency splat.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/locking/mutex.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 622ebdfcd083..bb89393cd3a2 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	}
 
 	preempt_disable();
-	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
+	/*
+	 * Treat as trylock for ww_mutex.
+	 */
+	mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);
 
 	if (__mutex_trylock(lock) ||
 	    mutex_optimistic_spin(lock, ww_ctx, NULL)) {
--
2.18.1


2021-03-17 03:06:11

by Davidlohr Bueso

Subject: Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Tue, 16 Mar 2021, Waiman Long wrote:

>It was found that running the ww_mutex_lock-torture test produced the
>following lockdep splat almost immediately:
>
>[ ... lockdep splat snipped; identical to the one quoted above ... ]
>
>Since ww_mutex is supposed to be deadlock-proof if used properly, such
>deadlock scenario should not happen. To avoid this false positive splat,
>treat ww_mutex_lock() like a trylock().
>
>After applying this patch, the locktorture test can run for a long time
>without triggering the circular locking dependency splat.
>
>Signed-off-by: Waiman Long <[email protected]>

Acked-by: Davidlohr Bueso <[email protected]>

2021-03-17 12:38:21

by tip-bot2 for Waiman Long

Subject: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID: b058f2e4d0a70c060e21ed122b264e9649cad57f
Gitweb: https://git.kernel.org/tip/b058f2e4d0a70c060e21ed122b264e9649cad57f
Author: Waiman Long <[email protected]>
AuthorDate: Tue, 16 Mar 2021 11:31:18 -04:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Wed, 17 Mar 2021 09:56:46 +01:00

locking/ww_mutex: Treat ww_mutex_lock() like a trylock

It was found that running the ww_mutex_lock-torture test produced the
following lockdep splat almost immediately:

[ ... lockdep splat identical to the one quoted above; snipped ... ]

Since ww_mutex is supposed to be deadlock-proof when used properly, such
a deadlock scenario should not happen. To avoid this false positive
splat, treat ww_mutex_lock() like a trylock.

After applying this patch, the locktorture test can run for a long time
without triggering the circular locking dependency splat.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/locking/mutex.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 622ebdf..bb89393 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	}
 
 	preempt_disable();
-	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
+	/*
+	 * Treat as trylock for ww_mutex.
+	 */
+	mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);
 
 	if (__mutex_trylock(lock) ||
 	    mutex_optimistic_spin(lock, ww_ctx, NULL)) {

2021-03-17 13:17:57

by Peter Zijlstra

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Wed, Mar 17, 2021 at 12:38:21PM -0000, tip-bot2 for Waiman Long wrote:
> The following commit has been merged into the locking/urgent branch of tip:
>
> Commit-ID: b058f2e4d0a70c060e21ed122b264e9649cad57f
> Gitweb: https://git.kernel.org/tip/b058f2e4d0a70c060e21ed122b264e9649cad57f
> Author: Waiman Long <[email protected]>
> AuthorDate: Tue, 16 Mar 2021 11:31:18 -04:00
> Committer: Ingo Molnar <[email protected]>
> CommitterDate: Wed, 17 Mar 2021 09:56:46 +01:00
>
> locking/ww_mutex: Treat ww_mutex_lock() like a trylock
>
> It was found that running the ww_mutex_lock-torture test produced the
> following lockdep splat almost immediately:
>
> [ ... lockdep splat snipped; identical to the one quoted above ... ]
>
> Since ww_mutex is supposed to be deadlock-proof if used properly, such
> deadlock scenario should not happen. To avoid this false positive splat,
> treat ww_mutex_lock() like a trylock().
>
> After applying this patch, the locktorture test can run for a long time
> without triggering the circular locking dependency splat.
>
> Signed-off-by: Waiman Long <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> Acked-by Davidlohr Bueso <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> kernel/locking/mutex.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 622ebdf..bb89393 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	}
>  
>  	preempt_disable();
> -	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
> +	/*
> +	 * Treat as trylock for ww_mutex.
> +	 */
> +	mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);

I'm confused... why isn't nest_lock working here?

For ww_mutex, we're supposed to have ctx->dep_map as a nest_lock, and
all lock acquisitions under a nest lock should be fine. Afaict the above
is just plain wrong.

2021-03-17 14:05:20

by Waiman Long

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On 3/17/21 9:31 AM, Peter Zijlstra wrote:
> On Wed, Mar 17, 2021 at 02:12:41PM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 17, 2021 at 12:38:21PM -0000, tip-bot2 for Waiman Long wrote:
>>> + /*
>>> + * Treat as trylock for ww_mutex.
>>> + */
>>> + mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);
>> I'm confused... why isn't nest_lock working here?
>>
>> For ww_mutex, we're supposed to have ctx->dep_map as a nest_lock, and
>> all lock acquisitions under a nest lock should be fine. Afaict the above
>> is just plain wrong.
> To clarify:
>
> 	CPU0				CPU1
> 	----				----
> 	mutex_lock(&A);			ww_mutex_lock(&B, ctx);
> 	ww_mutex_lock(&B, ctx);		mutex_lock(&A);
>
> should still very much be a deadlock, but your 'fix' makes it not report
> that.
>
> Only order within the ww_ctx can be ignored, and that's exactly what
> nest_lock should be doing.
>
I will take a deeper look into why that is the case.

Cheers,
Longman

2021-03-17 17:11:29

by Peter Zijlstra

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Wed, Mar 17, 2021 at 02:12:41PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 17, 2021 at 12:38:21PM -0000, tip-bot2 for Waiman Long wrote:
> > + /*
> > + * Treat as trylock for ww_mutex.
> > + */
> > + mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);
>
> I'm confused... why isn't nest_lock working here?
>
> For ww_mutex, we're supposed to have ctx->dep_map as a nest_lock, and
> all lock acquisitions under a nest lock should be fine. Afaict the above
> is just plain wrong.

To clarify:

	CPU0				CPU1
	----				----
	mutex_lock(&A);			ww_mutex_lock(&B, ctx);
	ww_mutex_lock(&B, ctx);		mutex_lock(&A);

should still very much be a deadlock, but your 'fix' makes it not report
that.

Only order within the ww_ctx can be ignored, and that's exactly what
nest_lock should be doing.

2021-03-17 17:20:09

by Waiman Long

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On 3/17/21 10:03 AM, Waiman Long wrote:
> On 3/17/21 9:31 AM, Peter Zijlstra wrote:
>> On Wed, Mar 17, 2021 at 02:12:41PM +0100, Peter Zijlstra wrote:
>>> On Wed, Mar 17, 2021 at 12:38:21PM -0000, tip-bot2 for Waiman Long
>>> wrote:
>>>> +    /*
>>>> +     * Treat as trylock for ww_mutex.
>>>> +     */
>>>> +    mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx,
>>>> nest_lock, ip);
>>> I'm confused... why isn't nest_lock working here?
>>>
>>> For ww_mutex, we're supposed to have ctx->dep_map as a nest_lock, and
>>> all lock acquisitions under a nest lock should be fine. Afaict the
>>> above
>>> is just plain wrong.
>> To clarify:
>>
>>     mutex_lock(&A);            ww_mutex_lock(&B, ctx);
>>     ww_mutex_lock(&B, ctx);        mutex_lock(&A);
>>
>> should still very much be a deadlock, but your 'fix' makes it not report
>> that.
>>
>> Only order within the ww_ctx can be ignored, and that's exactly what
>> nest_lock should be doing.
>>
> I will take a deeper look into why that is the case.

From reading the source code, the nest_lock check is done in
check_deadlock(), so that path won't complain. However, nest_lock isn't
considered in check_noncircular(), which is what causes the splat to
come out. Maybe we should add a check for nest_lock there. I will fiddle
with the code to see if it can address the issue.

Cheers,
Longman


2021-03-17 17:22:45

by Peter Zijlstra

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Wed, Mar 17, 2021 at 11:35:12AM -0400, Waiman Long wrote:

> From reading the source code, nest_lock check is done in check_deadlock() so
> that it won't complain. However, nest_lock isn't considered in
> check_noncircular() which causes the splat to come out. Maybe we should add
> a check for nest_lock there. I will fiddle with the code to see if it can
> address the issue.

Nah, that's not how it's supposed to work. I think the problem is that
DEFINE_WW_MUTEX is buggered; there's not actually any other user of it
in-tree.

Everybody else (including locking-selftests) seem to be using
ww_mutex_init().

So all locks in a ww_class should have the same lock class, and then
nest_lock will fold them all into a single entry with ->references
incremented. See __lock_acquire().

But from the report:

> [ 103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
> [ 103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
> [ 103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:

that went sideways, they're *not* the same class.

I think you'll find that if you use ww_mutex_init() it'll all work. Let
me go and zap this patch, and then I'll try and figure out why
DEFINE_WW_MUTEX() is buggered.

2021-03-17 17:43:21

by Peter Zijlstra

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Wed, Mar 17, 2021 at 05:48:48PM +0100, Peter Zijlstra wrote:

> I think you'll find that if you use ww_mutex_init() it'll all work. Let
> me go and zap this patch, and then I'll try and figure out why
> DEFINE_WW_MUTEX() is buggered.

Moo, I can't get the compiler to do as I want :/

The below is so close but doesn't actually compile.. Maybe we should
just give up on DEFINE_WW_MUTEX and simply remove it.

---
diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 0cd631a19727..85f50538f26a 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -129,12 +129,15 @@ do { \
# define __DEP_MAP_MUTEX_INITIALIZER(lockname)
#endif

-#define __MUTEX_INITIALIZER(lockname) \
+#define ___MUTEX_INITIALIZER(lockname, depmap) \
{ .owner = ATOMIC_LONG_INIT(0) \
, .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
, .wait_list = LIST_HEAD_INIT(lockname.wait_list) \
__DEBUG_MUTEX_INITIALIZER(lockname) \
- __DEP_MAP_MUTEX_INITIALIZER(lockname) }
+ depmap }
+
+#define __MUTEX_INITIALIZER(lockname) \
+ ___MUTEX_INITIALIZER(lockname, __DEP_MAP_MUTEX_INITIALIZER(lockname))

#define DEFINE_MUTEX(mutexname) \
struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 6ecf2a0220db..c62a030652b4 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -50,9 +50,17 @@ struct ww_acquire_ctx {

#ifdef CONFIG_DEBUG_LOCK_ALLOC
# define __WW_CLASS_MUTEX_INITIALIZER(lockname, class) \
- , .ww_class = class
+ , .ww_class = &(class)
+
+# define __DEP_MAP_WW_MUTEX_INITIALIZER(lockname, class) \
+ , .dep_map = { \
+ .key = &(class).mutex_key, \
+ .name = (class).mutex_name, \
+ .wait_type_inner = LD_WAIT_SLEEP, \
+ }
#else
# define __WW_CLASS_MUTEX_INITIALIZER(lockname, class)
+# define __DEP_MAP_WW_MUTEX_INITIALIZER(lockname, class)
#endif

#define __WW_CLASS_INITIALIZER(ww_class, _is_wait_die) \
@@ -62,7 +70,8 @@ struct ww_acquire_ctx {
, .is_wait_die = _is_wait_die }

#define __WW_MUTEX_INITIALIZER(lockname, class) \
- { .base = __MUTEX_INITIALIZER(lockname.base) \
+ { .base = ___MUTEX_INITIALIZER(lockname.base, \
+ __DEP_MAP_WW_MUTEX_INITIALIZER(lockname.base, class)) \
__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }

#define DEFINE_WD_CLASS(classname) \
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 0ab94e1f1276..e8305233eb0b 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -358,9 +358,9 @@ static struct lock_torture_ops mutex_lock_ops = {

#include <linux/ww_mutex.h>
static DEFINE_WD_CLASS(torture_ww_class);
-static DEFINE_WW_MUTEX(torture_ww_mutex_0, &torture_ww_class);
-static DEFINE_WW_MUTEX(torture_ww_mutex_1, &torture_ww_class);
-static DEFINE_WW_MUTEX(torture_ww_mutex_2, &torture_ww_class);
+static DEFINE_WW_MUTEX(torture_ww_mutex_0, torture_ww_class);
+static DEFINE_WW_MUTEX(torture_ww_mutex_1, torture_ww_class);
+static DEFINE_WW_MUTEX(torture_ww_mutex_2, torture_ww_class);

static int torture_ww_mutex_lock(void)
__acquires(torture_ww_mutex_0)

2021-03-17 17:48:30

by Peter Zijlstra

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Wed, Mar 17, 2021 at 06:40:27PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 17, 2021 at 05:48:48PM +0100, Peter Zijlstra wrote:
>
> > I think you'll find that if you use ww_mutex_init() it'll all work. Let
> > me go and zap this patch, and then I'll try and figure out why
> > DEFINE_WW_MUTEX() is buggered.
>
> Moo, I can't get the compiler to do as I want :/
>
> The below is so close but doesn't actually compile.. Maybe we should
> just give up on DEFINE_WW_MUTEX and simply remove it.
>
> ---
> [ ... mutex.h hunk snipped; quoted in full above ... ]
> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
> index 6ecf2a0220db..c62a030652b4 100644
> --- a/include/linux/ww_mutex.h
> +++ b/include/linux/ww_mutex.h
> @@ -50,9 +50,17 @@ struct ww_acquire_ctx {
>
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
> # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class) \
> - , .ww_class = class
> + , .ww_class = &(class)
> +
> +# define __DEP_MAP_WW_MUTEX_INITIALIZER(lockname, class) \
> + , .dep_map = { \
> + .key = &(class).mutex_key, \
> + .name = (class).mutex_name, \

,name = #class "_mutex", \

and it 'works', but shees!

> + .wait_type_inner = LD_WAIT_SLEEP, \
> + }
> [ ... rest of patch snipped; quoted in full above ... ]

2021-03-17 20:01:52

by Peter Zijlstra

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Wed, Mar 17, 2021 at 02:32:27PM -0400, Waiman Long wrote:
> On 3/17/21 1:45 PM, Peter Zijlstra wrote:

> > > +# define __DEP_MAP_WW_MUTEX_INITIALIZER(lockname, class) \
> > > + , .dep_map = { \
> > > + .key = &(class).mutex_key, \
> > > + .name = (class).mutex_name, \
> > ,name = #class "_mutex", \
> >
> > and it 'works', but shees!
>
> The name string itself may be duplicated for multiple instances of
> DEFINE_WW_MUTEX(). Do you want to keep DEFINE_WW_MUTEX() or just use
> ww_mutex_init() for all?

So linkers can merge literals, but no guarantee. But yeah, lets just
kill the thing, less tricky macro crud to worry about.

> I notice that the problem with DEFINE_WW_MUTEX is that the ww_mutexes
> themselves have a null key instead of the same key from the ww_class as
> with ww_mutex_init().

Correct.

2021-03-17 20:23:36

by Waiman Long

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On 3/17/21 3:58 PM, Peter Zijlstra wrote:
> On Wed, Mar 17, 2021 at 02:32:27PM -0400, Waiman Long wrote:
>> On 3/17/21 1:45 PM, Peter Zijlstra wrote:
>>>> +# define __DEP_MAP_WW_MUTEX_INITIALIZER(lockname, class) \
>>>> + , .dep_map = { \
>>>> + .key = &(class).mutex_key, \
>>>> + .name = (class).mutex_name, \
>>> ,name = #class "_mutex", \
>>>
>>> and it 'works', but shees!
>> The name string itself may be duplicated for multiple instances of
>> DEFINE_WW_MUTEX(). Do you want to keep DEFINE_WW_MUTEX() or just use
>> ww_mutex_init() for all?
> So linkers can merge literals, but no guarantee. But yeah, lets just
> kill the thing, less tricky macro crud to worry about.
>
Good, just to confirm the right way to move forward.

Cheers,
Longman

2021-03-17 21:41:48

by Waiman Long

[permalink] [raw]
Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On 3/17/21 1:40 PM, Peter Zijlstra wrote:
> On Wed, Mar 17, 2021 at 05:48:48PM +0100, Peter Zijlstra wrote:
>
>> I think you'll find that if you use ww_mutex_init() it'll all work. Let
>> me go and zap this patch, and then I'll try and figure out why
>> DEFINE_WW_MUTEX() is buggered.
> Moo, I can't get the compiler to do as I want :/
>
> The below is so close but doesn't actually compile.. Maybe we should
> just give up on DEFINE_WW_MUTEX and simply remove it.
>
> [ ... patch snipped; quoted in full above ... ]
I changed locktorture to use ww_mutex_init() and the lockdep splat was
indeed gone. So zapping DEFINE_WW_MUTEX() and using ww_mutex_init()
everywhere is a possible solution. I will send out a patch to do that.

Thanks,
Longman

2021-03-17 21:42:54

by Waiman Long

Subject: Re: [tip: locking/urgent] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On 3/17/21 1:45 PM, Peter Zijlstra wrote:
> On Wed, Mar 17, 2021 at 06:40:27PM +0100, Peter Zijlstra wrote:
>> On Wed, Mar 17, 2021 at 05:48:48PM +0100, Peter Zijlstra wrote:
>>
>>> I think you'll find that if you use ww_mutex_init() it'll all work. Let
>>> me go and zap this patch, and then I'll try and figure out why
>>> DEFINE_WW_MUTEX() is buggered.
>> Moo, I can't get the compiler to do as I want :/
>>
>> The below is so close but doesn't actually compile.. Maybe we should
>> just give up on DEFINE_WW_MUTEX and simply remove it.
>>
>> ---
>> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
>> index 0cd631a19727..85f50538f26a 100644
>> --- a/include/linux/mutex.h
>> +++ b/include/linux/mutex.h
>> @@ -129,12 +129,15 @@ do { \
>> # define __DEP_MAP_MUTEX_INITIALIZER(lockname)
>> #endif
>>
>> -#define __MUTEX_INITIALIZER(lockname) \
>> +#define ___MUTEX_INITIALIZER(lockname, depmap) \
>> { .owner = ATOMIC_LONG_INIT(0) \
>> , .wait_lock = __SPIN_LOCK_UNLOCKED(lockname.wait_lock) \
>> , .wait_list = LIST_HEAD_INIT(lockname.wait_list) \
>> __DEBUG_MUTEX_INITIALIZER(lockname) \
>> - __DEP_MAP_MUTEX_INITIALIZER(lockname) }
>> + depmap }
>> +
>> +#define __MUTEX_INITIALIZER(lockname) \
>> + ___MUTEX_INITIALIZER(lockname, __DEP_MAP_MUTEX_INITIALIZER(lockname))
>>
>> #define DEFINE_MUTEX(mutexname) \
>> struct mutex mutexname = __MUTEX_INITIALIZER(mutexname)
>> diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
>> index 6ecf2a0220db..c62a030652b4 100644
>> --- a/include/linux/ww_mutex.h
>> +++ b/include/linux/ww_mutex.h
>> @@ -50,9 +50,17 @@ struct ww_acquire_ctx {
>>
>> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>> # define __WW_CLASS_MUTEX_INITIALIZER(lockname, class) \
>> - , .ww_class = class
>> + , .ww_class = &(class)
>> +
>> +# define __DEP_MAP_WW_MUTEX_INITIALIZER(lockname, class) \
>> + , .dep_map = { \
>> + .key = &(class).mutex_key, \
>> + .name = (class).mutex_name, \
> .name = #class "_mutex", \
>
> and it 'works', but sheesh!

The name string itself may be duplicated across multiple instances of
DEFINE_WW_MUTEX(). Do you want to keep DEFINE_WW_MUTEX(), or should we
just use ww_mutex_init() everywhere?

I noticed that the problem with DEFINE_WW_MUTEX() is that the ww_mutexes
themselves have a NULL key instead of sharing the key from the ww_class
as they do with ww_mutex_init().

Cheers,
Longman


2021-03-18 02:26:57

by Boqun Feng

Subject: Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

Hi Waiman,

Just a question out of curiosity: how did this problem stay hidden for
so long? ;-) Because IIUC, both locktorture and ww_mutex_lock have been
around for a while, so why didn't we spot this earlier?

I ask just to make sure we aren't papering over some subtle problem in
lockdep itself.

Regards,
Boqun

On Tue, Mar 16, 2021 at 11:31:18AM -0400, Waiman Long wrote:
> It was found that running the ww_mutex_lock-torture test produced the
> following lockdep splat almost immediately:
>
> [ 103.892638] ======================================================
> [ 103.892639] WARNING: possible circular locking dependency detected
> [ 103.892641] 5.12.0-rc3-debug+ #2 Tainted: G S W
> [ 103.892643] ------------------------------------------------------
> [ 103.892643] lock_torture_wr/3234 is trying to acquire lock:
> [ 103.892646] ffffffffc0b35b10 (torture_ww_mutex_2.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [ 103.892660]
> [ 103.892660] but task is already holding lock:
> [ 103.892661] ffffffffc0b35cd0 (torture_ww_mutex_0.base){+.+.}-{3:3}, at: torture_ww_mutex_lock+0x3e2/0x720 [locktorture]
> [ 103.892669]
> [ 103.892669] which lock already depends on the new lock.
> [ 103.892669]
> [ 103.892670]
> [ 103.892670] the existing dependency chain (in reverse order) is:
> [ 103.892671]
> [ 103.892671] -> #2 (torture_ww_mutex_0.base){+.+.}-{3:3}:
> [ 103.892675] lock_acquire+0x1c5/0x830
> [ 103.892682] __ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [ 103.892687] ww_mutex_lock+0x4b/0x180
> [ 103.892690] torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [ 103.892694] lock_torture_writer+0x142/0x3a0 [locktorture]
> [ 103.892698] kthread+0x35f/0x430
> [ 103.892701] ret_from_fork+0x1f/0x30
> [ 103.892706]
> [ 103.892706] -> #1 (torture_ww_mutex_1.base){+.+.}-{3:3}:
> [ 103.892709] lock_acquire+0x1c5/0x830
> [ 103.892712] __ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [ 103.892715] ww_mutex_lock+0x4b/0x180
> [ 103.892717] torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [ 103.892721] lock_torture_writer+0x142/0x3a0 [locktorture]
> [ 103.892725] kthread+0x35f/0x430
> [ 103.892727] ret_from_fork+0x1f/0x30
> [ 103.892730]
> [ 103.892730] -> #0 (torture_ww_mutex_2.base){+.+.}-{3:3}:
> [ 103.892733] check_prevs_add+0x3fd/0x2470
> [ 103.892736] __lock_acquire+0x2602/0x3100
> [ 103.892738] lock_acquire+0x1c5/0x830
> [ 103.892740] __ww_mutex_lock.constprop.15+0x1d1/0x2e50
> [ 103.892743] ww_mutex_lock+0x4b/0x180
> [ 103.892746] torture_ww_mutex_lock+0x316/0x720 [locktorture]
> [ 103.892749] lock_torture_writer+0x142/0x3a0 [locktorture]
> [ 103.892753] kthread+0x35f/0x430
> [ 103.892755] ret_from_fork+0x1f/0x30
> [ 103.892757]
> [ 103.892757] other info that might help us debug this:
> [ 103.892757]
> [ 103.892758] Chain exists of:
> [ 103.892758] torture_ww_mutex_2.base --> torture_ww_mutex_1.base --> torture_ww_mutex_0.base
> [ 103.892758]
> [ 103.892763] Possible unsafe locking scenario:
> [ 103.892763]
> [ 103.892764] CPU0 CPU1
> [ 103.892765] ---- ----
> [ 103.892765] lock(torture_ww_mutex_0.base);
> [ 103.892767] lock(torture_ww_mutex_1.base);
> [ 103.892770] lock(torture_ww_mutex_0.base);
> [ 103.892772] lock(torture_ww_mutex_2.base);
> [ 103.892774]
> [ 103.892774] *** DEADLOCK ***
>
> Since ww_mutex is supposed to be deadlock-proof if used properly, such
> deadlock scenario should not happen. To avoid this false positive splat,
> treat ww_mutex_lock() like a trylock().
>
> After applying this patch, the locktorture test can run for a long time
> without triggering the circular locking dependency splat.
>
> Signed-off-by: Waiman Long <[email protected]>
> ---
> kernel/locking/mutex.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 622ebdfcd083..bb89393cd3a2 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -946,7 +946,10 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
> }
>
> preempt_disable();
> - mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
> + /*
> + * Treat as trylock for ww_mutex.
> + */
> + mutex_acquire_nest(&lock->dep_map, subclass, !!ww_ctx, nest_lock, ip);
>
> if (__mutex_trylock(lock) ||
> mutex_optimistic_spin(lock, ww_ctx, NULL)) {
> --
> 2.18.1
>

2021-03-18 02:58:18

by Waiman Long

Subject: Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On 3/17/21 10:24 PM, Boqun Feng wrote:
> Hi Waiman,
>
> Just a question out of curiosity: how did this problem stay hidden for
> so long? ;-) Because IIUC, both locktorture and ww_mutex_lock have been
> around for a while, so why didn't we spot this earlier?
>
> I ask just to make sure we aren't papering over some subtle problem in
> lockdep itself.
>
You have to explicitly specify ww_mutex in the locktorture module
parameter to run that test. ww_mutex is usually not the intended target
of testing, as there aren't that many places that use it. Even if
someone runs it, it is probably not on a debug kernel.

Our QA people tried running locktorture on ww_mutex and discovered this.

Cheers,
Longman

2021-03-18 06:40:41

by Boqun Feng

Subject: Re: [PATCH 3/4] locking/ww_mutex: Treat ww_mutex_lock() like a trylock

On Wed, Mar 17, 2021 at 10:54:17PM -0400, Waiman Long wrote:
> On 3/17/21 10:24 PM, Boqun Feng wrote:
> > Hi Waiman,
> >
> > Just a question out of curiosity: how did this problem stay hidden for
> > so long? ;-) Because IIUC, both locktorture and ww_mutex_lock have been
> > around for a while, so why didn't we spot this earlier?
> >
> > I ask just to make sure we aren't papering over some subtle problem in
> > lockdep itself.
> >
> You have to explicitly specify ww_mutex in the locktorture module
> parameter to run that test. ww_mutex is usually not the intended target
> of testing, as there aren't that many places that use it. Even if
> someone runs it, it is probably not on a debug kernel.
>
> Our QA people tried running locktorture on ww_mutex and discovered this.
>

Got it. Thanks ;-)

Regards,
Boqun

> Cheers,
> Longman
>