2022-11-12 14:12:53

by syzbot

Subject: [syzbot] WARNING: locking bug in hugetlb_no_page

Hello,

syzbot found the following issue on:

HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(!test_bit(class_idx, lock_classes_in_use))
WARNING: CPU: 1 PID: 3290 at kernel/locking/lockdep.c:5025 __lock_acquire+0x2758/0x3084
Modules linked in:
CPU: 1 PID: 3290 Comm: syz-executor317 Not tainted 6.1.0-rc4-syzkaller-31872-g1621b6eaebf7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __lock_acquire+0x2758/0x3084
lr : __lock_acquire+0x2754/0x3084 kernel/locking/lockdep.c:5025
sp : ffff800012e3b3e0
x29: ffff800012e3b4c0 x28: 0000000000000001 x27: ffff0000cb891a68
x26: ffff0000cb892450 x25: ffff0000cb892470 x24: ffff0000cb892470
x23: 00000000000000c0 x22: 0000000000000001 x21: 0000000000000000
x20: ffff0000cb891a40 x19: aaaaaa0000fb22ca x18: 0000000000000358
x17: ffff80000c04d83c x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000012 x12: ffff80000d86ff30
x11: ff808000081c06c8 x10: 0000000000000000 x9 : ddc86c2f228f9600
x8 : ddc86c2f228f9600 x7 : 4e5241575f534b43 x6 : ffff80000c01775c
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000000
Call trace:
__lock_acquire+0x2758/0x3084
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5771
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5874
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
irq event stamp: 941
hardirqs last enabled at (941): [<ffff80000c01c86c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (941): [<ffff80000c01c86c>] _raw_spin_unlock_irq+0x3c/0x70 kernel/locking/spinlock.c:202
hardirqs last disabled at (940): [<ffff80000c01c66c>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (940): [<ffff80000c01c66c>] _raw_spin_lock_irq+0x34/0x9c kernel/locking/spinlock.c:170
softirqs last enabled at (744): [<ffff80000801c38c>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (742): [<ffff80000801c358>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
---[ end trace 0000000000000000 ]---
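
A minimal sketch of how the `pstate` annotation in the dump above can be checked by hand, assuming the AArch64 SPSR layout (N, Z, C, V in bits 31 to 28; D, A, I, F in bits 9 to 6; PAN in bit 22; mode in the low four bits). The helper name `decode_pstate` is hypothetical and is not part of any kernel tooling.

```python
# Illustrative decoder for the "pstate:" value printed in an arm64 register dump.
# Bit positions are an assumption taken from the AArch64 SPSR layout.

def decode_pstate(pstate: int) -> dict:
    def render(pairs):
        # Upper case means the bit is set, lower case means it is clear,
        # mirroring the "nZCv daIF" notation used in the log.
        return "".join(
            letter.upper() if pstate & (1 << bit) else letter.lower()
            for letter, bit in pairs
        )

    return {
        "nzcv": render((("n", 31), ("z", 30), ("c", 29), ("v", 28))),
        "daif": render((("d", 9), ("a", 8), ("i", 7), ("f", 6))),
        "pan": bool(pstate & (1 << 22)),
        "mode": pstate & 0xF,  # 0x5 corresponds to EL1h
    }


first = decode_pstate(0x604000C5)   # value from the WARNING report
second = decode_pstate(0x804000C5)  # value from the later paging-request oops
print(first)
print(second)
```

Both values decode with I and F set, so IRQs and FIQs were masked when the registers were captured.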


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2022-11-13 06:04:17

by syzbot

Subject: Re: [syzbot] WARNING: locking bug in hugetlb_no_page

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
BUG: unable to handle kernel paging request in hugetlb_no_page

Unable to handle kernel paging request at virtual address 1fff800003441a18
Mem abort info:
ESR = 0x0000000096000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
[1fff800003441a18] address between user and kernel address ranges
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 4269 Comm: syz-executor.2 Not tainted 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : generic_test_bit include/asm-generic/bitops/generic-non-atomic.h:128 [inline]
pc : __lock_acquire+0x654/0x3084 kernel/locking/lockdep.c:5025
lr : mark_usage kernel/locking/lockdep.c:4555 [inline]
lr : __lock_acquire+0x630/0x3084 kernel/locking/lockdep.c:5009
sp : ffff8000131033d0
x29: ffff8000131034b0 x28: 0000000000000001 x27: ffff0000d2c89a68
x26: ffff0000d2c8a450 x25: ffff0000d2c8a470 x24: ffff0000d2c8a470
x23: 00000000000000c0 x22: 0000000000000001 x21: 0000000000000000
x20: ffff0000d2c89a40 x19: 555554aaabb2c422 x18: 00000000000000c0
x17: ffff80000dcdc198 x16: ffff80000db1a158 x15: ffff0000d2c89a40
x14: 0000000000000018 x13: ffff80000819fba0 x12: 00000000c73c5909
x11: ff808000095f17a4 x10: ffff80000dcdc198 x9 : 1ffffffff5765880
x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff80000801154c
x5 : 4c1501080080ffff x4 : ffff80000801154c x3 : 4c1501080080ffff
x2 : fffffffffffffff8 x1 : ffff80000cc75907 x0 : 0000000000000001
Call trace:
generic_test_bit include/asm-generic/bitops/generic-non-atomic.h:128 [inline]
__lock_acquire+0x654/0x3084 kernel/locking/lockdep.c:5025
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x298/0xe38 mm/hugetlb.c:5772
hugetlb_fault+0x3d0/0xe30 mm/hugetlb.c:5877
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x1bc/0x1f4 arch/arm64/lib/copy_from_user.S:214
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
Code: 350000e8 93407e69 d343fd29 927de529 (f8696949)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
0: 350000e8 cbnz w8, 0x1c
4: 93407e69 sxtw x9, w19
8: d343fd29 lsr x9, x9, #3
c: 927de529 and x9, x9, #0x1ffffffffffffff8
* 10: f8696949 ldr x9, [x10, x9] <-- trapping instruction
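
The faulting address can be cross-checked against the register dump by replaying the disassembled instructions. The sketch below assumes the "best guess" disassembly is accurate and uses the `x19` and `x10` values from the dump above; the function names are hypothetical. The shift-and-mask pair matches the word-offset computation of a bitmap lookup such as the `generic_test_bit` named on the `pc` line.

```python
# Replays the four instructions leading up to the trapping load, using plain
# 64-bit integer arithmetic. Register values are copied from the oops above.

MASK64 = (1 << 64) - 1


def sxtw(value: int) -> int:
    """Sign-extend the low 32 bits of value to 64 bits (arm64 'sxtw')."""
    low = value & 0xFFFFFFFF
    if low & 0x80000000:
        return low | 0xFFFFFFFF00000000
    return low


def trapping_address(x19: int, x10: int):
    x9 = sxtw(x19)                  # sxtw x9, w19
    x9 >>= 3                        # lsr  x9, x9, #3
    x9 &= 0x1FFFFFFFFFFFFFF8        # and  x9, x9, #0x1ffffffffffffff8
    return x9, (x10 + x9) & MASK64  # ldr  x9, [x10, x9]


offset, address = trapping_address(0x555554AAABB2C422, 0xFFFF80000DCDC198)
print(hex(offset))   # 0x1ffffffff5765880
print(hex(address))  # 0x1fff800003441a18
```

The computed offset equals the `x9` value in the dump and the computed address equals the reported fault address. This is consistent with the bit index in `x19` being garbage rather than the bitmap base in `x10` being corrupted.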


Tested on:

commit: 1621b6ea Merge branch 'for-next/fixes' into for-kernelci
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12018ac1880000
kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=12eb8b71880000


2022-11-13 10:58:30

by syzbot

Subject: Re: [syzbot] WARNING: locking bug in hugetlb_no_page

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
WARNING: locking bug in hugetlb_no_page

------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 1 PID: 3786 at kernel/locking/lockdep.c:231 check_wait_context kernel/locking/lockdep.c:4729 [inline]
WARNING: CPU: 1 PID: 3786 at kernel/locking/lockdep.c:231 __lock_acquire+0x2b0/0x3084 kernel/locking/lockdep.c:5005
Modules linked in:
CPU: 1 PID: 3786 Comm: syz-executor.1 Not tainted 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : check_wait_context kernel/locking/lockdep.c:4729 [inline]
pc : __lock_acquire+0x2b0/0x3084 kernel/locking/lockdep.c:5005
lr : hlock_class kernel/locking/lockdep.c:231 [inline]
lr : check_wait_context kernel/locking/lockdep.c:4729 [inline]
lr : __lock_acquire+0x298/0x3084 kernel/locking/lockdep.c:5005
sp : ffff80001301b3e0
x29: ffff80001301b4c0 x28: 0000000000000001 x27: ffff0000cfbbb4a8
x26: ffff0000d342ea78 x25: ffff0000cfbbbeb0 x24: 0000000000000000
x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
x20: 0000000000000001 x19: aaaaaa0001076c5e x18: 00000000000000c0
x17: ffff80000dcdc198 x16: ffff80000db1a158 x15: ffff0000cfbbb480
x14: 0000000000000000 x13: 0000000000000012 x12: ffff80000d86ff30
x11: ff808000081c06c8 x10: ffff80000dcdc198 x9 : 3a226953cce2cb00
x8 : 0000000000000000 x7 : 4e5241575f534b43 x6 : ffff80000c01775c
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000016
Call trace:
check_wait_context kernel/locking/lockdep.c:4729 [inline]
__lock_acquire+0x2b0/0x3084 kernel/locking/lockdep.c:5005
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5779
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5882
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
irq event stamp: 41
hardirqs last enabled at (41): [<ffff80000c01c86c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (41): [<ffff80000c01c86c>] _raw_spin_unlock_irq+0x3c/0x70 kernel/locking/spinlock.c:202
hardirqs last disabled at (40): [<ffff80000c01c66c>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (40): [<ffff80000c01c66c>] _raw_spin_lock_irq+0x34/0x9c kernel/locking/spinlock.c:170
softirqs last enabled at (8): [<ffff80000801c38c>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (6): [<ffff80000801c358>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
---[ end trace 0000000000000000 ]---
BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:597
in_atomic(): 0, irqs_disabled(): 128, non_block: 0, pid: 3786, name: syz-executor.1
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
INFO: lockdep is turned off.
irq event stamp: 41
hardirqs last enabled at (41): [<ffff80000c01c86c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last enabled at (41): [<ffff80000c01c86c>] _raw_spin_unlock_irq+0x3c/0x70 kernel/locking/spinlock.c:202
hardirqs last disabled at (40): [<ffff80000c01c66c>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
hardirqs last disabled at (40): [<ffff80000c01c66c>] _raw_spin_lock_irq+0x34/0x9c kernel/locking/spinlock.c:170
softirqs last enabled at (8): [<ffff80000801c38c>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
softirqs last disabled at (6): [<ffff80000801c358>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
CPU: 1 PID: 3786 Comm: syz-executor.1 Tainted: G W 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
Call trace:
dump_backtrace+0x1c4/0x1f0 arch/arm64/kernel/stacktrace.c:156
show_stack+0x2c/0x54 arch/arm64/kernel/stacktrace.c:163
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x104/0x16c lib/dump_stack.c:106
dump_stack+0x1c/0x58 lib/dump_stack.c:113
__might_resched+0x208/0x218 kernel/sched/core.c:9890
__might_sleep+0x48/0x78 kernel/sched/core.c:9819
do_page_fault+0x214/0x79c arch/arm64/mm/fault.c:597
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
hlock_class kernel/locking/lockdep.c:222 [inline]
check_wait_context kernel/locking/lockdep.c:4730 [inline]
__lock_acquire+0x2d0/0x3084 kernel/locking/lockdep.c:5005
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5779
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5882
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
Unable to handle kernel NULL pointer dereference at virtual address 00000000000000b8
Mem abort info:
ESR = 0x0000000096000006
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x06: level 2 translation fault
Data abort info:
ISV = 0, ISS = 0x00000006
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=000000011345f000
[00000000000000b8] pgd=080000011347a003, p4d=080000011347a003, pud=080000011347e003, pmd=0000000000000000
Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
Modules linked in:
CPU: 1 PID: 3786 Comm: syz-executor.1 Tainted: G W 6.1.0-rc4-syzkaller-00039-g1621b6eaebf7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : check_wait_context kernel/locking/lockdep.c:4729 [inline]
pc : __lock_acquire+0x2d0/0x3084 kernel/locking/lockdep.c:5005
lr : hlock_class kernel/locking/lockdep.c:231 [inline]
lr : check_wait_context kernel/locking/lockdep.c:4729 [inline]
lr : __lock_acquire+0x298/0x3084 kernel/locking/lockdep.c:5005
sp : ffff80001301b3e0
x29: ffff80001301b4c0 x28: 0000000000000001 x27: ffff0000cfbbb4a8
x26: ffff0000d342ea78 x25: ffff0000cfbbbeb0 x24: 0000000000000000
x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
x20: 0000000000000001 x19: aaaaaa0001076c5e x18: 00000000000000c0
x17: ffff80000dcdc198 x16: ffff80000db1a158 x15: ffff0000cfbbb480
x14: 0000000000000000 x13: 0000000000000012 x12: ffff80000d86ff30
x11: ff808000081c06c8 x10: ffff80000dcdc198 x9 : 0000000000050c5e
x8 : 0000000000000000 x7 : 4e5241575f534b43 x6 : ffff80000c01775c
x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000016
Call trace:
hlock_class kernel/locking/lockdep.c:222 [inline]
check_wait_context kernel/locking/lockdep.c:4730 [inline]
__lock_acquire+0x2d0/0x3084 kernel/locking/lockdep.c:5005
reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
__lock_release kernel/locking/lockdep.c:5382 [inline]
lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
__mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5779
hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5882
handle_mm_fault+0x904/0xa48 mm/memory.c:5216
__do_page_fault arch/arm64/mm/fault.c:506 [inline]
do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
__arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
__import_iovec+0x60/0x248 lib/iov_iter.c:1773
import_iovec+0x6c/0x88 lib/iov_iter.c:1838
vfs_writev fs/read_write.c:931 [inline]
do_writev+0xf8/0x234 fs/read_write.c:977
__do_sys_writev fs/read_write.c:1050 [inline]
__se_sys_writev fs/read_write.c:1047 [inline]
__arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
Code: d002da0a 91056210 9106614a b9400329 (3942e114)
---[ end trace 0000000000000000 ]---
----------------
Code disassembly (best guess):
0: d002da0a adrp x10, 0x5b42000
4: 91056210 add x16, x16, #0x158
8: 9106614a add x10, x10, #0x198
c: b9400329 ldr w9, [x25]
* 10: 3942e114 ldrb w20, [x8, #184] <-- trapping instruction
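
The "Mem abort info" and "Data abort info" fields in the oops above are all derived from the single ESR value. The following is a rough sketch of that decoding, assuming the ESR_ELx field layout for data aborts (EC in bits 31 to 26, IL in bit 25, ISS in bits 24 to 0); the function name `decode_data_abort_esr` is hypothetical.

```python
# Illustrative decoder for an arm64 data-abort ESR value. The field positions
# are an assumption based on the ESR_ELx layout; only the fields printed in
# the oops are extracted.

def decode_data_abort_esr(esr: int) -> dict:
    iss = esr & 0x1FFFFFF  # instruction-specific syndrome, bits 24..0
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class
        "il": (esr >> 25) & 1,      # 1 means a 32-bit instruction
        "iss": iss,
        "isv": (iss >> 24) & 1,
        "set": (iss >> 11) & 0x3,
        "fnv": (iss >> 10) & 1,
        "ea": (iss >> 9) & 1,
        "cm": (iss >> 8) & 1,
        "s1ptw": (iss >> 7) & 1,
        "wnr": (iss >> 6) & 1,      # 0 means the access was a read
        "fsc": iss & 0x3F,          # fault status code
    }


fields = decode_data_abort_esr(0x96000006)
print({name: hex(value) for name, value in fields.items()})

# The trapping load is "ldrb w20, [x8, #184]" and x8 is zero in the dump,
# so the effective address is just the immediate offset.
null_deref_address = 0x0 + 184
print(hex(null_deref_address))  # 0xb8
```

The decoded EC of 0x25, FSC of 0x06 and WnR of 0 agree with the values printed in the log, and the offset of 184 bytes from a NULL base agrees with the reported address 00000000000000b8.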


Tested on:

commit: 1621b6ea Merge branch 'for-next/fixes' into for-kernelci
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15b39c85880000
kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=14fbe185880000


2022-11-13 16:02:06

by Dmitry Vyukov

Subject: Re: [syzbot] WARNING: locking bug in hugetlb_no_page

On Sat, 12 Nov 2022 at 15:03, syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
> compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> userspace arch: arm64
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

This may have the same root cause as:

possible deadlock in hugetlb_fault
https://lore.kernel.org/all/CACT4Y+ZWNV6ApzEv0UrsF2T8JWmXez_-H-EGMii-S_2JbXv07Q@mail.gmail.com/

and there is a potential explanation as to what may be the problem.

> ------------[ cut here ]------------
> DEBUG_LOCKS_WARN_ON(!test_bit(class_idx, lock_classes_in_use))
> WARNING: CPU: 1 PID: 3290 at kernel/locking/lockdep.c:5025 __lock_acquire+0x2758/0x3084
> Modules linked in:
> CPU: 1 PID: 3290 Comm: syz-executor317 Not tainted 6.1.0-rc4-syzkaller-31872-g1621b6eaebf7 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/30/2022
> pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : __lock_acquire+0x2758/0x3084
> lr : __lock_acquire+0x2754/0x3084 kernel/locking/lockdep.c:5025
> sp : ffff800012e3b3e0
> x29: ffff800012e3b4c0 x28: 0000000000000001 x27: ffff0000cb891a68
> x26: ffff0000cb892450 x25: ffff0000cb892470 x24: ffff0000cb892470
> x23: 00000000000000c0 x22: 0000000000000001 x21: 0000000000000000
> x20: ffff0000cb891a40 x19: aaaaaa0000fb22ca x18: 0000000000000358
> x17: ffff80000c04d83c x16: 0000000000000000 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000012 x12: ffff80000d86ff30
> x11: ff808000081c06c8 x10: 0000000000000000 x9 : ddc86c2f228f9600
> x8 : ddc86c2f228f9600 x7 : 4e5241575f534b43 x6 : ffff80000c01775c
> x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
> x2 : 0000000000000000 x1 : 0000000100000000 x0 : 0000000000000000
> Call trace:
> __lock_acquire+0x2758/0x3084
> reacquire_held_locks+0x120/0x1c0 kernel/locking/lockdep.c:5193
> __lock_release kernel/locking/lockdep.c:5382 [inline]
> lock_release+0x148/0x2b4 kernel/locking/lockdep.c:5688
> __mutex_unlock_slowpath+0x44/0x1cc kernel/locking/mutex.c:907
> mutex_unlock+0x24/0x30 kernel/locking/mutex.c:543
> hugetlb_no_page+0x284/0xe1c mm/hugetlb.c:5771
> hugetlb_fault+0x3a0/0xdfc mm/hugetlb.c:5874
> handle_mm_fault+0x904/0xa48 mm/memory.c:5216
> __do_page_fault arch/arm64/mm/fault.c:506 [inline]
> do_page_fault+0x428/0x79c arch/arm64/mm/fault.c:606
> do_translation_fault+0x78/0x194 arch/arm64/mm/fault.c:689
> do_mem_abort+0x54/0x130 arch/arm64/mm/fault.c:825
> el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:367
> el1h_64_sync_handler+0x60/0xac arch/arm64/kernel/entry-common.c:427
> el1h_64_sync+0x64/0x68 arch/arm64/kernel/entry.S:579
> __arch_copy_from_user+0x24/0x1f4 arch/arm64/lib/copy_from_user.S:77
> __import_iovec+0x60/0x248 lib/iov_iter.c:1773
> import_iovec+0x6c/0x88 lib/iov_iter.c:1838
> vfs_writev fs/read_write.c:931 [inline]
> do_writev+0xf8/0x234 fs/read_write.c:977
> __do_sys_writev fs/read_write.c:1050 [inline]
> __se_sys_writev fs/read_write.c:1047 [inline]
> __arm64_sys_writev+0x28/0x38 fs/read_write.c:1047
> __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
> invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
> el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
> do_el0_svc+0x48/0x164 arch/arm64/kernel/syscall.c:206
> el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
> el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
> el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584
> irq event stamp: 941
> hardirqs last enabled at (941): [<ffff80000c01c86c>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
> hardirqs last enabled at (941): [<ffff80000c01c86c>] _raw_spin_unlock_irq+0x3c/0x70 kernel/locking/spinlock.c:202
> hardirqs last disabled at (940): [<ffff80000c01c66c>] __raw_spin_lock_irq include/linux/spinlock_api_smp.h:117 [inline]
> hardirqs last disabled at (940): [<ffff80000c01c66c>] _raw_spin_lock_irq+0x34/0x9c kernel/locking/spinlock.c:170
> softirqs last enabled at (744): [<ffff80000801c38c>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
> softirqs last disabled at (742): [<ffff80000801c358>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
> ---[ end trace 0000000000000000 ]---
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches

2022-11-13 19:49:20

by Mike Kravetz

Subject: Re: [syzbot] WARNING: locking bug in hugetlb_no_page

On 11/13/22 16:36, Dmitry Vyukov wrote:
> On Sat, 12 Nov 2022 at 15:03, syzbot
> <[email protected]> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> > git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> > console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> > dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
> > compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> > userspace arch: arm64
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
>
> This may have the same root cause as:
>
> possible deadlock in hugetlb_fault
> https://lore.kernel.org/all/CACT4Y+ZWNV6ApzEv0UrsF2T8JWmXez_-H-EGMii-S_2JbXv07Q@mail.gmail.com/
>
> and there is a potential explanation as to what may be the problem.

Thanks Dmitry!

An issue with this new hugetlb locking was previously reported and I have been
working on a solution. When I look at the reproducer, I see that it is calling
madvise(MADV_DONTNEED). This triggers the other issue and could certainly
cause the issue reported here.

Proposed patches are here and in next-20221111:
https://lore.kernel.org/linux-mm/[email protected]/

I am currently trying to run the reproducer, but it is not reproducing quickly.
Since this is a timing issue, that is expected. It is interesting that this
report was generated on arm64 while I am trying to reproduce on x86, although
the issue is not architecture-specific in any way.

I'll keep looking, but am fairly confident this is the root cause.
--
Mike Kravetz

2022-11-13 20:43:47

by syzbot

Subject: Re: [syzbot] WARNING: locking bug in hugetlb_no_page

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P3507 } 2668 jiffies s: 2069 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit: 1621b6ea Merge branch 'for-next/fixes' into for-kernelci
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=108ca515880000
kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
patch: https://syzkaller.appspot.com/x/patch.diff?x=174f46d1880000


2022-11-14 02:30:35

by Mike Kravetz

Subject: Re: [syzbot] WARNING: locking bug in hugetlb_no_page

On 11/13/22 10:50, Mike Kravetz wrote:
> On 11/13/22 16:36, Dmitry Vyukov wrote:
> > On Sat, 12 Nov 2022 at 15:03, syzbot
> > <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> > > git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
> > > compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> > > userspace arch: arm64
> > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
> > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000
> > >
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: [email protected]
> >
> > This may have the same root cause as:
> >
> > possible deadlock in hugetlb_fault
> > https://lore.kernel.org/all/CACT4Y+ZWNV6ApzEv0UrsF2T8JWmXez_-H-EGMii-S_2JbXv07Q@mail.gmail.com/
> >
> > and there is a potential explanation as to what may be the problem.
>
> Thanks Dmitry!
>
> An issue with this new hugetlb locking was previously reported and I have been
> working on a solution. When I look at the reproducer, I see that it is calling
> madvise(MADV_DONTNEED). This triggers the other issue and could certainly
> cause the issue reported here.
>
> Proposed patches are here and in next-20221111:
> https://lore.kernel.org/linux-mm/[email protected]/
>
> I am currently trying to run the reproducer, but it is not reproducing quickly.
> Since this is a timing issue, that is expected. Interestingly, this
> report was run on arm64 while I am trying to reproduce on x86, although
> the issue is not architecture-specific in any way.

After tweaking my config, I was able to reliably reproduce.

> I'll keep looking, but am fairly confident this is the root cause.

I was also able to verify the series above addresses the issue.

--
Mike Kravetz

2022-11-14 10:32:11

by Dmitry Vyukov

Subject: Re: [syzbot] WARNING: locking bug in hugetlb_no_page

On Mon, 14 Nov 2022 at 03:24, Mike Kravetz <[email protected]> wrote:
>
> On 11/13/22 10:50, Mike Kravetz wrote:
> > On 11/13/22 16:36, Dmitry Vyukov wrote:
> > > On Sat, 12 Nov 2022 at 15:03, syzbot
> > > <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > syzbot found the following issue on:
> > > >
> > > > HEAD commit: 1621b6eaebf7 Merge branch 'for-next/fixes' into for-kernelci
> > > > git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=13bd511e880000
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=606e57fd25c5c6cc
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=d07c65298d2c15eafcb0
> > > > compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
> > > > userspace arch: arm64
> > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13315856880000
> > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=173614d1880000
> > > >
> > > > Downloadable assets:
> > > > disk image: https://storage.googleapis.com/syzbot-assets/82aa7741098d/disk-1621b6ea.raw.xz
> > > > vmlinux: https://storage.googleapis.com/syzbot-assets/f6be08c4e4c2/vmlinux-1621b6ea.xz
> > > > kernel image: https://storage.googleapis.com/syzbot-assets/296b6946258a/Image-1621b6ea.gz.xz
> > > >
> > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > Reported-by: [email protected]
> > >
> > > This may have the same root cause as:
> > >
> > > possible deadlock in hugetlb_fault
> > > https://lore.kernel.org/all/CACT4Y+ZWNV6ApzEv0UrsF2T8JWmXez_-H-EGMii-S_2JbXv07Q@mail.gmail.com/
> > >
> > > and there is a potential explanation as to what may be the problem.
> >
> > Thanks Dmitry!
> >
> > An issue with this new hugetlb locking was previously reported and I have been
> > working on a solution. When I look at the reproducer, I see that it is calling
> > madvise(MADV_DONTNEED). This triggers the other issue and could certainly
> > cause the issue reported here.
> >
> > Proposed patches are here and in next-20221111:
> > https://lore.kernel.org/linux-mm/[email protected]/
> >
> > I am currently trying to run the reproducer, but it is not reproducing quickly.
> > Since this is a timing issue, that is expected. Interestingly, this
> > report was run on arm64 while I am trying to reproduce on x86, although
> > the issue is not architecture-specific in any way.
>
> After tweaking my config, I was able to reliably reproduce.
>
> > I'll keep looking, but am fairly confident this is the root cause.
>
> I was also able to verify the series above addresses the issue.

Let's tell syzbot about the fix so that it reports similar issues in the future:

#syz fix:
hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing