2024-05-08 07:54:02

by Ubisectech Sirius

Subject: inconsistent lock state in __mmap_lock_do_trace_released

Hello.
We are the Ubisectech Sirius Team, the vulnerability lab of China ValiantSec. Our team recently discovered an issue in Linux kernel 6.7. A PoC file for the issue is attached to this email.

Stack dump:

================================
WARNING: inconsistent lock state
6.7.0 #2 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
modprobe/17218 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffff88802c6376a0 (lock#13){?.+.}-{2:2}, at: local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
ffff88802c6376a0 (lock#13){?.+.}-{2:2}, at: __mmap_lock_do_trace_released+0x7b/0x740 mm/mmap_lock.c:243
{HARDIRQ-ON-W} state was registered at:
lock_acquire kernel/locking/lockdep.c:5754 [inline]
lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5719
local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
__mmap_lock_do_trace_acquire_returned+0x96/0x740 mm/mmap_lock.c:237
__mmap_lock_trace_acquire_returned include/linux/mmap_lock.h:36 [inline]
mmap_read_trylock include/linux/mmap_lock.h:166 [inline]
get_mmap_lock_carefully mm/memory.c:5372 [inline]
lock_mm_and_find_vma+0xf1/0x5a0 mm/memory.c:5432
do_user_addr_fault+0x390/0x1010 arch/x86/mm/fault.c:1387
handle_page_fault arch/x86/mm/fault.c:1507 [inline]
exc_page_fault+0x99/0x180 arch/x86/mm/fault.c:1563
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
__put_user_8+0x11/0x20 arch/x86/lib/putuser.S:105
clear_rseq_cs kernel/rseq.c:257 [inline]
rseq_ip_fixup kernel/rseq.c:291 [inline]
__rseq_handle_notify_resume+0xd50/0x1000 kernel/rseq.c:329
rseq_handle_notify_resume include/linux/sched.h:2361 [inline]
resume_user_mode_work include/linux/resume_user_mode.h:61 [inline]
exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
exit_to_user_mode_prepare+0x170/0x240 kernel/entry/common.c:204
__syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
syscall_exit_to_user_mode+0x1e/0x60 kernel/entry/common.c:296
do_syscall_64+0x53/0x120 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x6f/0x77
irq event stamp: 696
hardirqs last enabled at (695): [<ffffffff813d875b>] flush_tlb_mm_range+0x26b/0x340 arch/x86/mm/tlb.c:1035
hardirqs last disabled at (696): [<ffffffff8a85e448>] sysvec_irq_work+0x18/0xf0 arch/x86/kernel/irq_work.c:17
softirqs last enabled at (244): [<ffffffff81329389>] local_bh_enable include/linux/bottom_half.h:33 [inline]
softirqs last enabled at (244): [<ffffffff81329389>] fpregs_unlock arch/x86/include/asm/fpu/api.h:80 [inline]
softirqs last enabled at (244): [<ffffffff81329389>] fpu_reset_fpregs arch/x86/kernel/fpu/core.c:733 [inline]
softirqs last enabled at (244): [<ffffffff81329389>] fpu_flush_thread+0x309/0x400 arch/x86/kernel/fpu/core.c:777
softirqs last disabled at (242): [<ffffffff813292ba>] local_bh_disable include/linux/bottom_half.h:20 [inline]
softirqs last disabled at (242): [<ffffffff813292ba>] fpregs_lock arch/x86/include/asm/fpu/api.h:72 [inline]
softirqs last disabled at (242): [<ffffffff813292ba>] fpu_reset_fpregs arch/x86/kernel/fpu/core.c:716 [inline]
softirqs last disabled at (242): [<ffffffff813292ba>] fpu_flush_thread+0x23a/0x400 arch/x86/kernel/fpu/core.c:777

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(lock#13);
<Interrupt>
lock(lock#13);

*** DEADLOCK ***

3 locks held by modprobe/17218:
#0: ffff8880200de658 (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_read include/linux/mm.h:663 [inline]
#0: ffff8880200de658 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x1e1/0x960 mm/memory.c:5501
#1: ffffffff8d3a9f60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:301 [inline]
#1: ffffffff8d3a9f60 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
#1: ffffffff8d3a9f60 (rcu_read_lock){....}-{1:2}, at: __pte_offset_map+0x42/0x570 mm/pgtable-generic.c:285
#2: ffff8880491b0258 (ptlock_ptr(ptdesc)#2){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
#2: ffff8880491b0258 (ptlock_ptr(ptdesc)#2){+.+.}-{2:2}, at: __pte_offset_map_lock+0x10d/0x2f0 mm/pgtable-generic.c:373

stack backtrace:
CPU: 0 PID: 17218 Comm: modprobe Not tainted 6.7.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x150 lib/dump_stack.c:106
print_usage_bug kernel/locking/lockdep.c:3971 [inline]
valid_state kernel/locking/lockdep.c:4013 [inline]
mark_lock_irq kernel/locking/lockdep.c:4216 [inline]
mark_lock+0x99b/0xd60 kernel/locking/lockdep.c:4678
mark_usage kernel/locking/lockdep.c:4564 [inline]
__lock_acquire+0x1339/0x3bb0 kernel/locking/lockdep.c:5091
lock_acquire kernel/locking/lockdep.c:5754 [inline]
lock_acquire+0x1b1/0x530 kernel/locking/lockdep.c:5719
local_lock_acquire include/linux/local_lock_internal.h:29 [inline]
__mmap_lock_do_trace_released+0x93/0x740 mm/mmap_lock.c:243
__mmap_lock_trace_released include/linux/mmap_lock.h:42 [inline]
mmap_read_unlock_non_owner include/linux/mmap_lock.h:178 [inline]
do_mmap_read_unlock+0x4f/0x60 kernel/bpf/task_iter.c:1054
irq_work_single+0x127/0x260 kernel/irq_work.c:221
irq_work_run_list+0x91/0xc0 kernel/irq_work.c:252
irq_work_run+0x58/0xd0 kernel/irq_work.c:261
__sysvec_irq_work+0x82/0x3a0 arch/x86/kernel/irq_work.c:22
sysvec_irq_work+0xcb/0xf0 arch/x86/kernel/irq_work.c:17
</IRQ>
<TASK>
asm_sysvec_irq_work+0x1a/0x20 arch/x86/include/asm/idtentry.h:674
RIP: 0010:put_flush_tlb_info arch/x86/mm/tlb.c:997 [inline]
RIP: 0010:flush_tlb_mm_range+0x175/0x340 arch/x86/mm/tlb.c:1038
Code: 16 38 d0 7c 08 84 d2 0f 85 90 01 00 00 39 0d 12 46 18 0e 0f 87 0d 01 00 00 65 48 8b 05 24 37 c6 7e 48 39 c5 0f 84 8f 00 00 00 <65> ff 0d e4 fe c4 7e bf 01 00 00 00 e8 5a 64 1e 00 65 8b 05 0b 36
RSP: 0000:ffffc90001c17b80 EFLAGS: 00000206
RAX: 00000000000002b7 RBX: 0000000000000000 RCX: 1ffffffff270379e
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
RBP: ffff888020f0df00 R08: 0000000000000001 R09: fffffbfff27031e0
R10: 0000000000000001 R11: 0000000000000000 R12: 00007ffba5129000
R13: 00007ffba5128000 R14: ffff888020f0e7c0 R15: ffff88802c63be00
flush_tlb_page arch/x86/include/asm/tlbflush.h:254 [inline]
ptep_clear_flush+0x13d/0x180 mm/pgtable-generic.c:101
wp_page_copy mm/memory.c:3188 [inline]
do_wp_page+0x11fc/0x3590 mm/memory.c:3511
handle_pte_fault mm/memory.c:5055 [inline]
__handle_mm_fault+0x15d8/0x3c60 mm/memory.c:5180
handle_mm_fault+0x3c2/0xa40 mm/memory.c:5345
do_user_addr_fault+0x2ed/0x1010 arch/x86/mm/fault.c:1364
handle_page_fault arch/x86/mm/fault.c:1507 [inline]
exc_page_fault+0x99/0x180 arch/x86/mm/fault.c:1563
asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:570
RIP: 0033:0x7ffba50fee1c
Code: ec 38 0f 31 48 c1 e2 20 48 09 d0 48 8d 15 74 90 02 00 48 89 05 75 87 02 00 48 8b 05 66 90 02 00 49 89 d4 4c 2b 25 e4 91 02 00 <48> 89 15 d5 9b 02 00 4c 89 25 be 9b 02 00 48 85 c0 74 6f bf ff ff
RSP: 002b:00007ffea1d38f90 EFLAGS: 00010206
RAX: 000000000000000e RBX: 0000000000000000 RCX: 0000000000000000
RDX: 00007ffba5127e78 RSI: 0000000000000000 RDI: 00007ffea1d39000
RBP: 00007ffea1d38ff0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00007ffba50fd000
R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffea1d39000
</TASK>
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 38 d0 cmp %dl,%al
2: 7c 08 jl 0xc
4: 84 d2 test %dl,%dl
6: 0f 85 90 01 00 00 jne 0x19c
c: 39 0d 12 46 18 0e cmp %ecx,0xe184612(%rip) # 0xe184624
12: 0f 87 0d 01 00 00 ja 0x125
18: 65 48 8b 05 24 37 c6 mov %gs:0x7ec63724(%rip),%rax # 0x7ec63744
1f: 7e
20: 48 39 c5 cmp %rax,%rbp
23: 0f 84 8f 00 00 00 je 0xb8
* 29: 65 ff 0d e4 fe c4 7e decl %gs:0x7ec4fee4(%rip) # 0x7ec4ff14 <-- trapping instruction
30: bf 01 00 00 00 mov $0x1,%edi
35: e8 5a 64 1e 00 call 0x1e6494
3a: 65 gs
3b: 8b .byte 0x8b
3c: 05 .byte 0x5
3d: 0b 36 or (%rsi),%esi

Thank you for taking the time to read this email. We look forward to working with you further.

Attachments:
poc.c (9.79 kB)