2023-07-21 21:17:20

by syzbot

Subject: [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)

Hello,

syzbot found the following issue on:

HEAD commit: fdf0eaf11452 Linux 6.5-rc2
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=1797783aa80000
kernel config: https://syzkaller.appspot.com/x/.config?x=27e33fd2346a54b
dashboard link: https://syzkaller.appspot.com/bug?extid=607aa822c60b2e75b269
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11322fb6a80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17687f1aa80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/0ac950f24d26/disk-fdf0eaf1.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/666fcbcfa05d/vmlinux-fdf0eaf1.xz
kernel image: https://storage.googleapis.com/syzbot-assets/5bbe73baa630/bzImage-fdf0eaf1.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/85821d156573/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

Kernel panic - not syncing: hung_task: blocked tasks
CPU: 0 PID: 27 Comm: khungtaskd Not tainted 6.5.0-rc2-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/03/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd9/0x1b0 lib/dump_stack.c:106
panic+0x6a4/0x750 kernel/panic.c:340
check_hung_uninterruptible_tasks kernel/hung_task.c:226 [inline]
watchdog+0xcf2/0x11b0 kernel/hung_task.c:379
kthread+0x33a/0x430 kernel/kthread.c:389
ret_from_fork+0x2c/0x70 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:296
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to change the bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2023-07-28 00:03:34

by syzbot

Subject: Re: [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)

syzbot has bisected this issue to:

commit 9c8ad7a2ff0bfe58f019ec0abc1fb965114dde7d
Author: David Howells <[email protected]>
Date: Thu May 16 11:52:27 2019 +0000

uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=169b475ea80000
start commit: fdf0eaf11452 Linux 6.5-rc2
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=159b475ea80000
console output: https://syzkaller.appspot.com/x/log.txt?x=119b475ea80000
kernel config: https://syzkaller.appspot.com/x/.config?x=27e33fd2346a54b
dashboard link: https://syzkaller.appspot.com/bug?extid=607aa822c60b2e75b269
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11322fb6a80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17687f1aa80000

Reported-by: [email protected]
Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

2023-07-28 09:52:26

by David Howells

Subject: Re: [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)

syzbot <[email protected]> wrote:

> Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]")

This would seem unlikely to be the culprit. It just changes the numbering on
the fsconfig-related syscalls.

Running the test program on v6.5-rc3, however, I end up with the test process
stuck in the D state:

INFO: task repro-17687f1aa:5551 blocked for more than 120 seconds.
Not tainted 6.5.0-rc3-build3+ #1448
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:repro-17687f1aa state:D stack:0 pid:5551 ppid:5516 flags:0x00004002
Call Trace:
<TASK>
__schedule+0x4a7/0x4f1
schedule+0x66/0xa1
schedule_timeout+0x9d/0xd7
? __next_timer_interrupt+0xf6/0xf6
gfs2_gl_hash_clear+0xa0/0xdc
? sugov_irq_work+0x15/0x15
gfs2_put_super+0x19f/0x1d3
generic_shutdown_super+0x78/0x187
kill_block_super+0x1c/0x32
deactivate_locked_super+0x2f/0x61
cleanup_mnt+0xab/0xcc
task_work_run+0x6b/0x80
exit_to_user_mode_prepare+0x76/0xfd
syscall_exit_to_user_mode+0x14/0x31
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89aac31dab
RSP: 002b:00007fff43d9b878 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007fff43d9cad8 RCX: 00007f89aac31dab
RDX: 0000000000000000 RSI: 000000000000000a RDI: 00007fff43d9b920
RBP: 00007fff43d9c960 R08: 0000000000000000 R09: 0000000000000073
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007fff43d9cae8 R14: 0000000000417e18 R15: 00007f89aad51000
</TASK>
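The hung-task detector behind both the syzbot panic and the "blocked for more than 120 seconds" message works roughly like this: khungtaskd periodically scans tasks in uninterruptible sleep (D state) and reports any task whose context-switch count has not changed for longer than hung_task_timeout_secs. The syzbot panic message ("hung_task: blocked tasks") indicates that hung_task_panic was enabled there, which turns the report into a panic. What follows is a userspace Python model of that check, not kernel code; Task, check_hung_tasks, and the field names are hypothetical stand-ins for the per-task bookkeeping in kernel/hung_task.c.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    state: str                     # "R" (runnable) or "D" (uninterruptible sleep)
    switch_count: int              # models nvcsw + nivcsw (voluntary + involuntary)
    last_switch_count: int = 0     # switch count seen at the previous scan
    last_switch_time: float = 0.0  # when the switch count last changed


def check_hung_tasks(tasks, now, timeout):
    """Return PIDs of D-state tasks not scheduled for at least `timeout` seconds.

    Hypothetical model: updates each task's bookkeeping the way a periodic
    watchdog scan would.
    """
    hung = []
    for t in tasks:
        if t.state != "D":
            continue  # only uninterruptible sleepers are checked
        if t.switch_count != t.last_switch_count:
            # The task has run since the last scan: restart its clock.
            t.last_switch_count = t.switch_count
            t.last_switch_time = now
            continue
        if now - t.last_switch_time < timeout:
            continue  # not stuck long enough yet
        hung.append(t.pid)
    return hung


if __name__ == "__main__":
    stuck = Task(pid=5551, state="D", switch_count=10)
    busy = Task(pid=42, state="D", switch_count=10)
    print(check_hung_tasks([stuck, busy], now=0, timeout=120))    # → []
    busy.switch_count += 1  # busy gets scheduled between scans
    print(check_hung_tasks([stuck, busy], now=130, timeout=120))  # → [5551]
```

The first scan only records each task's switch count; a task is flagged on a later scan only if it has stayed in D state without being scheduled for the whole timeout.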

David


2023-07-28 12:57:37

by Bob Peterson

Subject: Re: [syzbot] [gfs2?] kernel panic: hung_task: blocked tasks (2)

On 7/28/23 3:20 AM, David Howells wrote:
> syzbot <[email protected]> wrote:
>
>> Fixes: 9c8ad7a2ff0b ("uapi, x86: Fix the syscall numbering of the mount API syscalls [ver #2]")
>
> This would seem unlikely to be the culprit. It just changes the numbering on
> the fsconfig-related syscalls.
>
> Running the test program on v6.5-rc3, however, I end up with the test process
> stuck in the D state:
>
> INFO: task repro-17687f1aa:5551 blocked for more than 120 seconds.
> Not tainted 6.5.0-rc3-build3+ #1448
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:repro-17687f1aa state:D stack:0 pid:5551 ppid:5516 flags:0x00004002
> Call Trace:
> <TASK>
> __schedule+0x4a7/0x4f1
> schedule+0x66/0xa1
> schedule_timeout+0x9d/0xd7
> ? __next_timer_interrupt+0xf6/0xf6
> gfs2_gl_hash_clear+0xa0/0xdc
> ? sugov_irq_work+0x15/0x15
> gfs2_put_super+0x19f/0x1d3
> generic_shutdown_super+0x78/0x187
> kill_block_super+0x1c/0x32
> deactivate_locked_super+0x2f/0x61
> cleanup_mnt+0xab/0xcc
> task_work_run+0x6b/0x80
> exit_to_user_mode_prepare+0x76/0xfd
> syscall_exit_to_user_mode+0x14/0x31
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f89aac31dab
> RSP: 002b:00007fff43d9b878 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 00007fff43d9cad8 RCX: 00007f89aac31dab
> RDX: 0000000000000000 RSI: 000000000000000a RDI: 00007fff43d9b920
> RBP: 00007fff43d9c960 R08: 0000000000000000 R09: 0000000000000073
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 00007fff43d9cae8 R14: 0000000000417e18 R15: 00007f89aad51000
> </TASK>
>
> David
>
Hi David,

This indicates that gfs2 is having trouble resolving and freeing all of its
glocks, which usually means a reference-counting problem or an ail (active
items list) problem during unmount.
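The reference-counting case can be illustrated with a toy model. Each glock stays allocated until its last reference is dropped, so one holder that takes a reference and never releases it keeps that glock alive after unmount has dropped its own references, and anything waiting for all glocks to go away waits indefinitely. This is a simplified Python sketch, not gfs2 code; Glock, Registry, hold(), and put() are hypothetical names that only loosely mirror gfs2_glock_hold() and gfs2_glock_put().

```python
class Registry:
    """Hypothetical stand-in for the per-filesystem set of live glocks."""

    def __init__(self):
        self.live = set()

    def clear(self):
        """Drop the registry's own reference on every glock (as unmount
        would), then return the names of glocks someone else still pins."""
        for gl in list(self.live):
            gl.put()
        return sorted(gl.name for gl in self.live)


class Glock:
    def __init__(self, name, registry):
        self.name = name
        self.ref = 1  # the reference owned by the registry
        self.registry = registry
        registry.live.add(self)

    def hold(self):
        self.ref += 1

    def put(self):
        self.ref -= 1
        if self.ref == 0:
            # Last reference gone: the glock can finally be freed.
            self.registry.live.discard(self)


if __name__ == "__main__":
    reg = Registry()
    inode, rgrp = Glock("inode", reg), Glock("rgrp", reg)
    rgrp.hold()         # a holder that never calls put(): the leak
    print(reg.clear())  # → ['rgrp']
```

Only the leaked glock survives the clear, which is the kind of leftover a glock dump would be expected to show.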

If gfs2_gl_hash_clear gets stuck for a long period of time, it is
supposed to dump the list of glocks that still have not been resolved.
I think that takes 10 minutes or so. Can you post the console messages
that follow? That will help us figure out what's happening. Thanks.
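The behaviour described for gfs2_gl_hash_clear is a bounded wait: sleep until the count of objects awaiting disposal drops to zero, warn periodically, and after an overall deadline dump whatever is left instead of waiting silently forever. The Python sketch below shows that pattern with threading.Condition.wait_for. It is not the kernel implementation: DisposalTracker, wait_then_dump, and the slice_s/deadline_s parameters are hypothetical, and the timings are scaled down from the roughly 10 minutes mentioned above. If the hung-task timeout is shorter than such a deadline and hung_task_panic is set, the watchdog may fire before any dump is printed.

```python
import threading
import time


class DisposalTracker:
    """Counts objects still awaiting disposal; waiters are woken on change."""

    def __init__(self, count):
        self.count = count
        self.cond = threading.Condition()

    def dispose_one(self):
        with self.cond:
            self.count -= 1
            self.cond.notify_all()


def wait_then_dump(tracker, slice_s, deadline_s, log):
    """Wait in `slice_s` chunks for tracker.count to reach zero.

    Returns True once everything is disposed of. Otherwise logs a warning
    after each slice, and once more than `deadline_s` seconds have passed,
    logs a final "dumping" line and returns False.
    """
    start = time.monotonic()
    with tracker.cond:
        while True:
            if tracker.cond.wait_for(lambda: tracker.count == 0, timeout=slice_s):
                return True
            elapsed = time.monotonic() - start
            if elapsed > deadline_s:
                log.append(f"{tracker.count} left after {elapsed:.1f}s: dumping")
                return False
            log.append(f"{tracker.count} left after {elapsed:.1f}s; still waiting")


if __name__ == "__main__":
    log = []
    t = DisposalTracker(count=2)
    threading.Timer(0.05, t.dispose_one).start()  # only one of two ever goes away
    print(wait_then_dump(t, slice_s=0.1, deadline_s=0.25, log=log))  # → False
```

Waiting in slices rather than one long sleep lets the waiter report progress (or the lack of it) while still giving slow disposals time to finish.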

Regards,

Bob Peterson