2023-12-19 16:06:34

by Thomas Hellström

[permalink] [raw]
Subject: [PATCH] locking/ww_mutex: Adjust to lockdep nest_lock requirements

When using mutex_acquire_nest() with a nest_lock, lockdep refcounts the
number of acquired lockdep_maps of mutexes of the same class, and also
keeps a pointer to the first acquired lockdep_map of a class. That pointer
is then used for various comparison-, printing- and checking purposes,
but there is no mechanism to actively ensure that lockdep_map stays in
memory. Instead, a warning is printed if the lockdep_map is freed and
there are still held locks of the same lock class, even if the lockdep_map
itself has been released.

In the context of WW/WD transactions that means that if a user unlocks
and frees a ww_mutex from within an ongoing ww transaction, and that
mutex happens to be the first ww_mutex grabbed in the transaction,
such a warning is printed and there might be a risk of a UAF.

Note that this is only problem when lockdep is enabled and affects only
dereferences of struct lockdep_map.

Adjust to this by adding a fake lockdep_map to the acquired context and
make sure it is the first acquired lockdep map of the associated
ww_mutex class. Then hold it for the duration of the WW/WD transaction.

This has the side effect that trying to lock a ww mutex *without* a
ww_acquire_context but where a such context has been acquire, we'd see
a lockdep splat. The test-ww_mutex.c selftest attempts to do that, so
modify that particular test to not acquire a ww_acquire_context if it
is not going to be used.

Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Christian König <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Thomas Hellström <[email protected]>
---
include/linux/ww_mutex.h | 14 ++++++++++++++
kernel/locking/test-ww_mutex.c | 6 ++++--
2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index bb763085479a..a401a2f31a77 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -65,6 +65,16 @@ struct ww_acquire_ctx {
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map dep_map;
+ /**
+ * @first_lock_dep_map: fake lockdep_map for first locked ww_mutex.
+ *
+ * lockdep requires the lockdep_map for the first locked ww_mutex
+ * in a ww transaction to remain in memory until all ww_mutexes of
+ * the transaction have been unlocked. Ensure this by keeping a
+ * fake locked ww_mutex lockdep map between ww_acquire_init() and
+ * ww_acquire_fini().
+ */
+ struct lockdep_map first_lock_dep_map;
#endif
#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
unsigned int deadlock_inject_interval;
@@ -146,7 +156,10 @@ static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
debug_check_no_locks_freed((void *)ctx, sizeof(*ctx));
lockdep_init_map(&ctx->dep_map, ww_class->acquire_name,
&ww_class->acquire_key, 0);
+ lockdep_init_map(&ctx->first_lock_dep_map, ww_class->mutex_name,
+ &ww_class->mutex_key, 0);
mutex_acquire(&ctx->dep_map, 0, 0, _RET_IP_);
+ mutex_acquire_nest(&ctx->first_lock_dep_map, 0, 0, &ctx->dep_map, _RET_IP_);
#endif
#ifdef CONFIG_DEBUG_WW_MUTEX_SLOWPATH
ctx->deadlock_inject_interval = 1;
@@ -185,6 +198,7 @@ static inline void ww_acquire_done(struct ww_acquire_ctx *ctx)
static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
{
#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ mutex_release(&ctx->first_lock_dep_map, _THIS_IP_);
mutex_release(&ctx->dep_map, _THIS_IP_);
#endif
#ifdef DEBUG_WW_MUTEXES
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 78719e1ef1b1..c688131ed8f4 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -62,7 +62,8 @@ static int __test_mutex(unsigned int flags)
int ret;

ww_mutex_init(&mtx.mutex, &ww_class);
- ww_acquire_init(&ctx, &ww_class);
+ if (flags & TEST_MTX_CTX)
+ ww_acquire_init(&ctx, &ww_class);

INIT_WORK_ONSTACK(&mtx.work, test_mutex_work);
init_completion(&mtx.ready);
@@ -90,7 +91,8 @@ static int __test_mutex(unsigned int flags)
ret = wait_for_completion_timeout(&mtx.done, TIMEOUT);
}
ww_mutex_unlock(&mtx.mutex);
- ww_acquire_fini(&ctx);
+ if (flags & TEST_MTX_CTX)
+ ww_acquire_fini(&ctx);

if (ret) {
pr_err("%s(flags=%x): mutual exclusion failure\n",
--
2.42.0



2023-12-22 09:20:30

by Oliver Sang

[permalink] [raw]
Subject: Re: [PATCH] locking/ww_mutex: Adjust to lockdep nest_lock requirements



Hello,

kernel test robot noticed "WARNING:at_kernel/locking/lockdep.c:#__lock_acquire" on:

commit: 4f37585fa02f3eaebc72efa1260e6742478c00fa ("[PATCH] locking/ww_mutex: Adjust to lockdep nest_lock requirements")
url: https://github.com/intel-lab-lkp/linux/commits/Thomas-Hellstr-m/locking-ww_mutex-Adjust-to-lockdep-nest_lock-requirements/20231220-000906
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git a51749ab34d9e5dec548fe38ede7e01e8bb26454
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH] locking/ww_mutex: Adjust to lockdep nest_lock requirements

in testcase: boot

compiler: gcc-7
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-----------------------------------------------------+------------+------------+
| | a51749ab34 | 4f37585fa0 |
+-----------------------------------------------------+------------+------------+
| WARNING:at_kernel/locking/lockdep.c:#__lock_acquire | 0 | 6 |
| EIP:__lock_acquire | 0 | 6 |
+-----------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 9.777113][ T10] ------------[ cut here ]------------
[ 9.777784][ T10] DEBUG_LOCKS_WARN_ON(hlock->references < references)
[ 9.777795][ T10] WARNING: CPU: 0 PID: 10 at kernel/locking/lockdep.c:5055 __lock_acquire (kernel/locking/lockdep.c:5055 (discriminator 9))
[ 9.778839][ T10] Modules linked in:
[ 9.778839][ T10] CPU: 0 PID: 10 Comm: kworker/u2:0 Not tainted 6.7.0-rc1-00006-g4f37585fa02f #5
[ 9.778839][ T10] Workqueue: test-ww_mutex stress_inorder_work
[ 9.778839][ T10] EIP: __lock_acquire (kernel/locking/lockdep.c:5055 (discriminator 9))
[ 9.778839][ T10] Code: e8 ae 10 2e 00 85 c0 0f 84 0e 0f 00 00 8b 1d 94 9c b5 ce 85 db 0f 85 00 0f 00 00 68 64 16 34 ce 68 fe 17 33 ce e8 09 22 fc ff <0f> 0b 58 5a e9 28 12 00 00 a1 90 ef 24 cf 85 c0 74 07 eb 2a 8d 74
All code
========
0: e8 ae 10 2e 00 callq 0x2e10b3
5: 85 c0 test %eax,%eax
7: 0f 84 0e 0f 00 00 je 0xf1b
d: 8b 1d 94 9c b5 ce mov -0x314a636c(%rip),%ebx # 0xffffffffceb59ca7
13: 85 db test %ebx,%ebx
15: 0f 85 00 0f 00 00 jne 0xf1b
1b: 68 64 16 34 ce pushq $0xffffffffce341664
20: 68 fe 17 33 ce pushq $0xffffffffce3317fe
25: e8 09 22 fc ff callq 0xfffffffffffc2233
2a:* 0f 0b ud2 <-- trapping instruction
2c: 58 pop %rax
2d: 5a pop %rdx
2e: e9 28 12 00 00 jmpq 0x125b
33: a1 90 ef 24 cf 85 c0 movabs 0x774c085cf24ef90,%eax
3a: 74 07
3c: eb 2a jmp 0x68
3e: 8d .byte 0x8d
3f: 74 .byte 0x74

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 58 pop %rax
3: 5a pop %rdx
4: e9 28 12 00 00 jmpq 0x1231
9: a1 90 ef 24 cf 85 c0 movabs 0x774c085cf24ef90,%eax
10: 74 07
12: eb 2a jmp 0x3e
14: 8d .byte 0x8d
15: 74 .byte 0x74
[ 9.778839][ T10] EAX: 00000033 EBX: 00000000 ECX: c0245cb4 EDX: ce9da1ac
[ 9.778839][ T10] ESI: c26cb37c EDI: c02569ac EBP: c0245db0 ESP: c0245d60
[ 9.778839][ T10] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010002
[ 9.778839][ T10] CR0: 80050033 CR2: ffdda000 CR3: 0ec21000 CR4: 00040690
[ 9.778839][ T10] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 9.778839][ T10] DR6: fffe0ff0 DR7: 00000400
[ 9.778839][ T10] Call Trace:
[ 9.778839][ T10] ? show_regs (arch/x86/kernel/dumpstack.c:478)
[ 9.778839][ T10] ? __warn (kernel/panic.c:677)
[ 9.778839][ T10] ? __lock_acquire (kernel/locking/lockdep.c:5055 (discriminator 9))
[ 9.778839][ T10] ? report_bug (lib/bug.c:201 lib/bug.c:219)
[ 9.778839][ T10] ? exc_overflow (arch/x86/kernel/traps.c:250)
[ 9.778839][ T10] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 9.778839][ T10] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 9.778839][ T10] ? handle_exception (arch/x86/entry/entry_32.S:1056)
[ 9.778839][ T10] ? __lock_acquire (kernel/locking/lockdep.c:5047)
[ 9.778839][ T10] ? exc_overflow (arch/x86/kernel/traps.c:250)
[ 9.778839][ T10] ? __lock_acquire (kernel/locking/lockdep.c:5055 (discriminator 9))
[ 9.778839][ T10] ? exc_overflow (arch/x86/kernel/traps.c:250)
[ 9.778839][ T10] ? __lock_acquire (kernel/locking/lockdep.c:5055 (discriminator 9))
[ 9.778839][ T10] ? kvm_sched_clock_read (arch/x86/kernel/kvmclock.c:91)
[ 9.778839][ T10] lock_acquire (kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5755)
[ 9.778839][ T10] ? stress_inorder_work (kernel/locking/test-ww_mutex.c:457)
[ 9.778839][ T10] ? lock_is_held_type (kernel/locking/lockdep.c:5494 kernel/locking/lockdep.c:5824)
[ 9.778839][ T10] __ww_mutex_lock+0xb1/0xd84
[ 9.778839][ T10] ? stress_inorder_work (kernel/locking/test-ww_mutex.c:457)
[ 9.778839][ T10] ww_mutex_lock (kernel/locking/mutex.c:878)
[ 9.778839][ T10] stress_inorder_work (kernel/locking/test-ww_mutex.c:457)
[ 9.778839][ T10] process_scheduled_works (kernel/workqueue.c:2630 include/linux/jump_label.h:207 include/trace/events/workqueue.h:108 kernel/workqueue.c:2635 kernel/workqueue.c:2703)
[ 9.778839][ T10] worker_thread (include/linux/list.h:373 kernel/workqueue.c:841 kernel/workqueue.c:2785)
[ 9.778839][ T10] kthread (kernel/kthread.c:390)
[ 9.778839][ T10] ? rescuer_thread (kernel/workqueue.c:2730)
[ 9.778839][ T10] ? kthread_complete_and_exit (kernel/kthread.c:341)
[ 9.778839][ T10] ret_from_fork (arch/x86/kernel/process.c:153)
[ 9.778839][ T10] ? kthread_complete_and_exit (kernel/kthread.c:341)
[ 9.778839][ T10] ret_from_fork_asm (arch/x86/entry/entry_32.S:741)
[ 9.778839][ T10] entry_INT80_32 (arch/x86/entry/entry_32.S:947)
[ 9.778839][ T10] irq event stamp: 14037
[ 9.778839][ T10] hardirqs last enabled at (14037): irqentry_exit (kernel/entry/common.c:435)
[ 9.778839][ T10] hardirqs last disabled at (14036): common_interrupt (arch/x86/kernel/irq.c:247)
[ 9.778839][ T10] softirqs last enabled at (8978): __do_softirq (arch/x86/include/asm/preempt.h:27 kernel/softirq.c:400 kernel/softirq.c:582)
[ 9.778839][ T10] softirqs last disabled at (8969): do_softirq_own_stack (arch/x86/kernel/irq_32.c:57 arch/x86/kernel/irq_32.c:147)
[ 9.778839][ T10] ---[ end trace 0000000000000000 ]---
[ 9.917906][ T36] torture_spin_lock_write_delay: delay = 25 jiffies.
[ 10.273181][ T36] torture_spin_lock_write_delay: delay = 25 jiffies.
[ 10.621916][ T36] torture_spin_lock_write_delay: delay = 25 jiffies.
[ 10.788923][ T36] torture_spin_lock_write_delay: delay = 25 jiffies.
[ 10.919665][ T36] torture_spin_lock_write_delay: delay = 26 jiffies.
[ 11.085916][ T36] torture_spin_lock_write_delay: delay = 25 jiffies.
[ 11.471931][ T35] torture_spin_lock_write_delay: delay = 25 jiffies.
[ 11.784602][ T1] All ww mutex selftests passed



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231222/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki