2023-09-04 08:44:45

by syzbot

Subject: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

Hello,

syzbot found the following issue on:

HEAD commit: b97d64c72259 Merge tag '6.6-rc-smb3-client-fixes-part1' of..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14136d8fa80000
kernel config: https://syzkaller.appspot.com/x/.config?x=958c1fdc38118172
dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/99875b49c50b/disk-b97d64c7.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/5bcacc1a3f5b/vmlinux-b97d64c7.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e2fe9c8de38a/bzImage-b97d64c7.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

INFO: task syz-executor.0:19830 blocked for more than 143 seconds.
Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:26480 pid:19830 ppid:5057 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x1873/0x48f0 kernel/sched/core.c:6695
schedule+0xc3/0x180 kernel/sched/core.c:6771
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6830
__mutex_lock_common+0xe33/0x2530 kernel/locking/mutex.c:679
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:799
__fdget_pos+0x2b0/0x340 fs/file.c:1064
fdget_pos include/linux/file.h:74 [inline]
ksys_write+0x82/0x2c0 fs/read_write.c:628
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f61ad07cae9
RSP: 002b:00007f61abbfe0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007f61ad19c050 RCX: 00007f61ad07cae9
RDX: 0000000000000090 RSI: 0000000020000200 RDI: 0000000000000005
RBP: 00007f61ad0c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f61ad19c050 R15: 00007ffc19e49218
</TASK>
INFO: task syz-executor.0:19835 blocked for more than 143 seconds.
Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:29552 pid:19835 ppid:5057 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x1873/0x48f0 kernel/sched/core.c:6695
schedule+0xc3/0x180 kernel/sched/core.c:6771
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6830
rwsem_down_read_slowpath+0x5f4/0x950 kernel/locking/rwsem.c:1086
__down_read_common kernel/locking/rwsem.c:1250 [inline]
__down_read kernel/locking/rwsem.c:1263 [inline]
down_read+0x9c/0x2f0 kernel/locking/rwsem.c:1522
inode_lock_shared include/linux/fs.h:812 [inline]
lookup_slow+0x45/0x70 fs/namei.c:1709
walk_component+0x2d0/0x400 fs/namei.c:2001
lookup_last fs/namei.c:2458 [inline]
path_lookupat+0x16f/0x450 fs/namei.c:2482
filename_lookup+0x255/0x610 fs/namei.c:2511
user_path_at_empty+0x44/0x180 fs/namei.c:2910
user_path_at include/linux/namei.h:57 [inline]
__do_sys_chdir fs/open.c:551 [inline]
__se_sys_chdir+0xbf/0x220 fs/open.c:545
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f61ad07cae9
RSP: 002b:00007f61a37fd0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000050
RAX: ffffffffffffffda RBX: 00007f61ad19c120 RCX: 00007f61ad07cae9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000040
RBP: 00007f61ad0c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f61ad19c120 R15: 00007ffc19e49218
</TASK>
INFO: task syz-executor.0:19836 blocked for more than 144 seconds.
Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:29104 pid:19836 ppid:5057 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5382 [inline]
__schedule+0x1873/0x48f0 kernel/sched/core.c:6695
schedule+0xc3/0x180 kernel/sched/core.c:6771
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6830
rwsem_down_read_slowpath+0x5f4/0x950 kernel/locking/rwsem.c:1086
__down_read_common kernel/locking/rwsem.c:1250 [inline]
__down_read kernel/locking/rwsem.c:1263 [inline]
down_read+0x9c/0x2f0 kernel/locking/rwsem.c:1522
inode_lock_shared include/linux/fs.h:812 [inline]
open_last_lookups fs/namei.c:3562 [inline]
path_openat+0x7b3/0x3180 fs/namei.c:3793
do_filp_open+0x234/0x490 fs/namei.c:3823
do_sys_openat2+0x13e/0x1d0 fs/open.c:1422
do_sys_open fs/open.c:1437 [inline]
__do_sys_open fs/open.c:1445 [inline]
__se_sys_open fs/open.c:1441 [inline]
__x64_sys_open+0x225/0x270 fs/open.c:1441
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f61ad07cae9
RSP: 002b:00007f61a35dc0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007f61ad19c1f0 RCX: 00007f61ad07cae9
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00000000200001c0
RBP: 00007f61ad0c847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007f61ad19c1f0 R15: 00007ffc19e49218
</TASK>
INFO: lockdep is turned off.
NMI backtrace for cpu 0
CPU: 0 PID: 28 Comm: khungtaskd Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x498/0x4d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x198/0x310 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xdf5/0xe40 kernel/hung_task.c:379
kthread+0x2b8/0x350 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:145
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 21301 Comm: syz-executor.1 Not tainted 6.5.0-syzkaller-08894-gb97d64c72259 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
RIP: 0010:clear_page_erms+0xb/0x10 arch/x86/lib/clear_page_64.S:50
Code: 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 f3 0f 1e fa b9 00 10 00 00 31 c0 <f3> aa c3 66 90 f3 0f 1e fa 48 83 f9 40 73 36 83 f9 08 73 0f 85 c9
RSP: 0018:ffffc900031ef8b8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000004c0
RDX: ffff888042e98000 RSI: 0000000000000001 RDI: ffff888042e98b40
RBP: ffffc900031efb70 R08: ffffea00010ba637 R09: 0000000000000000
R10: ffffed10085d3000 R11: fffff940002174c7 R12: 0000000000000001
R13: 0000000000000001 R14: ffffea00010ba600 R15: 0000000000000000
FS: 00005555556b5480(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f561ff9c018 CR3: 00000000289f1000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
clear_page arch/x86/include/asm/page_64.h:53 [inline]
clear_highpage_kasan_tagged include/linux/highmem.h:248 [inline]
kernel_init_pages mm/page_alloc.c:1071 [inline]
post_alloc_hook+0xf8/0x210 mm/page_alloc.c:1534
prep_new_page mm/page_alloc.c:1543 [inline]
get_page_from_freelist+0x31ec/0x3370 mm/page_alloc.c:3183
__alloc_pages+0x255/0x670 mm/page_alloc.c:4439
vm_area_alloc_pages mm/vmalloc.c:3063 [inline]
__vmalloc_area_node mm/vmalloc.c:3139 [inline]
__vmalloc_node_range+0x9a3/0x1490 mm/vmalloc.c:3320
vmalloc_user+0x74/0x80 mm/vmalloc.c:3474
kcov_ioctl+0x59/0x630 kernel/kcov.c:704
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:871 [inline]
__se_sys_ioctl+0xf8/0x170 fs/ioctl.c:857
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7ff00647c84b
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007ffe6712be40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007ff00647c84b
RDX: 0000000000040000 RSI: ffffffff80086301 RDI: 00000000000000d9
RBP: 00007ff00659c0e8 R08: 00000000000000d8 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe6712c578
R13: 0000000000000003 R14: 00007ff00659c0e8 R15: 0000000000000001
</TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the bug is already fixed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite the bug's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the bug is a duplicate of another bug, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2023-09-04 18:48:01

by Christian Brauner

Subject: Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

> > Which is pretty much the case for all filesystem bug reports from

I think we should at least consider the option of reducing the number of
filesystems syzbot tests. Filesystems that are orphaned or that have no
active maintainers for a long period of time just produce noise.
Slapping tags such as [ntfs3?] or [reiserfs?] really doesn't help to
reduce the noise.

2023-09-05 16:04:45

by Mateusz Guzik

Subject: Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

On 9/3/23, Al Viro <[email protected]> wrote:
> On Sun, Sep 03, 2023 at 08:57:23PM +0200, Mateusz Guzik wrote:
>
>> This does not dump backtraces, just a list of tasks + some stats.
>>
>> The closest to useful here I found are 'w' ("Dumps tasks that are in
>> uninterruptible (blocked) state.") and 'l' ("Shows a stack backtrace
>> for all active CPUs."), both of which can miss the task which matters
>> (e.g., stuck in a very much *interruptible* state with f_pos_lock
>> held).
>>
>> Unless someone can point at a way to get all these stacks, I'm going
>> to hack something up in the upcoming week, if only for immediate
>> syzbot usage.
>
> Huh? Sample of output here:
> 2023-09-03T15:34:36.271833-04:00 duke kernel: [87367.574459] task:ssh-agent state:S stack:0 pid:3949 ppid:3947 flags:0x00000002
> 2023-09-03T15:34:36.284796-04:00 duke kernel: [87367.582848] Call Trace:
> 2023-09-03T15:34:36.284797-04:00 duke kernel: [87367.585306] <TASK>
> 2023-09-03T15:34:36.284797-04:00 duke kernel: [87367.587423] __schedule+0x222/0x630
> 2023-09-03T15:34:36.291459-04:00 duke kernel: [87367.590932] schedule+0x4b/0x90
> 2023-09-03T15:34:36.291460-04:00 duke kernel: [87367.594086] schedule_hrtimeout_range_clock+0xb1/0x110
> 2023-09-03T15:34:36.300477-04:00 duke kernel: [87367.599245] ? __hrtimer_init+0xf0/0xf0
> 2023-09-03T15:34:36.300477-04:00 duke kernel: [87367.603103] do_sys_poll+0x489/0x580
> 2023-09-03T15:34:36.308971-04:00 duke kernel: [87367.606702] ? _raw_spin_unlock_irqrestore+0x9/0x20
> 2023-09-03T15:34:36.308972-04:00 duke kernel: [87367.611598] ? __alloc_pages+0x111/0x1a0
> 2023-09-03T15:34:36.317380-04:00 duke kernel: [87367.615544] ? select_task_rq_fair+0x1c8/0xf70
> 2023-09-03T15:34:36.317381-04:00 duke kernel: [87367.620006] ? _raw_spin_unlock+0x5/0x10
> 2023-09-03T15:34:36.325273-04:00 duke kernel: [87367.623953] ? sched_clock_cpu+0x1c/0xd0
> 2023-09-03T15:34:36.325274-04:00 duke kernel: [87367.627899] ? default_send_IPI_single_phys+0x21/0x30
> 2023-09-03T15:34:36.334812-04:00 duke kernel: [87367.632977] ? ttwu_queue_wakelist+0x109/0x110
> 2023-09-03T15:34:36.334813-04:00 duke kernel: [87367.637439] ? _raw_spin_unlock_irqrestore+0x9/0x20
> 2023-09-03T15:34:36.343753-04:00 duke kernel: [87367.642344] ? try_to_wake_up+0x1eb/0x300
> 2023-09-03T15:34:36.343754-04:00 duke kernel: [87367.646380] ? __pollwait+0x110/0x110
> 2023-09-03T15:34:36.351376-04:00 duke kernel: [87367.650063] ? _raw_spin_unlock+0x5/0x10
> 2023-09-03T15:34:36.351377-04:00 duke kernel: [87367.654001] ? unix_stream_read_generic+0x528/0xa90
> 2023-09-03T15:34:36.361179-04:00 duke kernel: [87367.658906] ? _raw_spin_unlock_irqrestore+0x9/0x20
> 2023-09-03T15:34:36.361180-04:00 duke kernel: [87367.663805] ? _raw_spin_unlock_irqrestore+0x9/0x20
> 2023-09-03T15:34:36.370988-04:00 duke kernel: [87367.668708] ? __inode_wait_for_writeback+0x68/0xc0
> 2023-09-03T15:34:36.370989-04:00 duke kernel: [87367.673614] ? fsnotify_grab_connector+0x49/0x90
> 2023-09-03T15:34:36.380274-04:00 duke kernel: [87367.678258] ? fsnotify_destroy_marks+0x11/0x140
> 2023-09-03T15:34:36.380275-04:00 duke kernel: [87367.682901] ? enqueue_task_fair+0x211/0x5f0
> 2023-09-03T15:34:36.389726-04:00 duke kernel: [87367.687196] ? __rseq_handle_notify_resume+0x2b4/0x3a0
> 2023-09-03T15:34:36.389728-04:00 duke kernel: [87367.692353] ? recalibrate_cpu_khz+0x10/0x10
> 2023-09-03T15:34:36.397884-04:00 duke kernel: [87367.696651] ? ktime_get_ts64+0x47/0xe0
> 2023-09-03T15:34:36.397885-04:00 duke kernel: [87367.700509] __x64_sys_poll+0x93/0x120
> 2023-09-03T15:34:36.405254-04:00 duke kernel: [87367.704282] do_syscall_64+0x42/0x90
> 2023-09-03T15:34:36.405255-04:00 duke kernel: [87367.707880] entry_SYSCALL_64_after_hwframe+0x4b/0xb5
> 2023-09-03T15:34:36.413922-04:00 duke kernel: [87367.712959] RIP: 0033:0x7f451858f000
> 2023-09-03T15:34:36.413923-04:00 duke kernel: [87367.716548] RSP: 002b:00007ffd799cece8 EFLAGS: 00000202 ORIG_RAX: 0000000000000007
> 2023-09-03T15:34:36.428692-04:00 duke kernel: [87367.724154] RAX: ffffffffffffffda RBX: 00000000000001b0 RCX: 00007f451858f000
> 2023-09-03T15:34:36.428692-04:00 duke kernel: [87367.731317] RDX: 0000000000002710 RSI: 0000000000000001 RDI: 00005596fc603190
> 2023-09-03T15:34:36.443022-04:00 duke kernel: [87367.738485] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> 2023-09-03T15:34:36.443023-04:00 duke kernel: [87367.745649] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
> 2023-09-03T15:34:36.457354-04:00 duke kernel: [87367.752818] R13: 0000000000000001 R14: 000000000000000a R15: 00005596fc603190
> 2023-09-03T15:34:36.457355-04:00 duke kernel: [87367.759981] </TASK>
>
> Looks like a stack trace to me; seeing one of the callers of fdget_pos()
> in that would tell you who's currently holding *some* ->f_pos_lock.
>
> That - on 6.1.42, with fairly bland .config (minimal debugging;
> I need that box for fast builds, among other things). Enable
> lockdep and you'll get who's holding which locks in addition
> to those stack traces...
>

That's my bad; the only content which survived for me in the terminal
buffer was a spew from the scheduler and a ps-like task list.
Stack traces are indeed there. Sorry for the noise on this one. I also
verified that the stack dumps are not gated by any defines, btw, so
it's all good on this front.

All these brainos aside, I added Aleksandr Nogikh to To: in my previous
e-mail. From some short poking around I found he was responsive to
queries concerning Linux vs syzbot, and I figured he would be a good
person to ask.

So, Aleksandr, tl;dr: would it be a problem to add the 't' sysrq to
syzbot in order to dump backtraces from all threads? Either 'break t'
over a serial port or 't' written to /proc/sysrq-trigger. Lack of this
info blocks progress on the issue reported here:
https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
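
For reference, the /proc route on a live machine is just:

	echo t > /proc/sysrq-trigger

(the kernel.sysrq sysctl only gates keyboard invocation; writes to
/proc/sysrq-trigger are always allowed for root). One caveat: 't' dumps
every task on the system, so the output may be large enough to overflow
whatever console log buffer syzbot keeps.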

--
Mateusz Guzik <mjguzik gmail.com>

2023-09-05 16:05:55

by Al Viro

Subject: Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

On Mon, Sep 04, 2023 at 08:27:15AM +1000, Dave Chinner wrote:

> It already is (sysrq-t), but I'm not sure that will help - if it is
> a leaked unlock then nothing will show up at all.

Unlikely; grep and you'll see - very few callers, and for all of them
there's an fdput_pos() downstream of any fdget_pos() that had picked
up a non-NULL file reference.

In theory, it's not impossible that something had stripped FDPUT_POS_UNLOCK
from the flags, but that's basically "something might've corrupted the
local variables" scenario. There are 12 functions total where we might
be calling fdget_pos() and all of them are pretty small (1 in alpha
osf_sys.c, 6 in read_write.c and 5 in readdir.c); none of those takes
an address of struct fd, none of them has assignments to it after fdget_pos()
and the only accesses to its members are those to fd.file - all fetches.
Control flow is also easy to check - they are all short.
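
To make that concrete, all the callers have the same shape; here is a
paraphrase of ksys_write() from fs/read_write.c (simplified, details
trimmed):

	ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
	{
		struct fd f = fdget_pos(fd);	/* may take ->f_pos_lock */
		ssize_t ret = -EBADF;

		if (f.file) {
			loff_t pos, *ppos = file_ppos(f.file);

			if (ppos) {
				pos = *ppos;
				ppos = &pos;
			}
			ret = vfs_write(f.file, buf, count, ppos);
			if (ret >= 0 && ppos)
				f.file->f_pos = pos;
			fdput_pos(f);	/* unlocks ->f_pos_lock if taken */
		}
		return ret;
	}

struct fd never escapes, fd.file is only ever read, and fdput_pos() is
unconditionally downstream of a successful fdget_pos().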

IMO it's much more likely that we'll find something like

thread A:
	grabs some fs lock
	gets stuck on something
thread B: write()
	finds file
	grabs ->f_pos_lock
	calls into filesystem
	blocks on fs lock held by A
thread C: read()/write()/lseek() on the same file
	blocks on ->f_pos_lock
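
In userspace terms, threads B and C are nothing more exotic than two
threads sharing one file description. A hypothetical sketch (not a
reproducer; the path and whatever wedges thread A are assumed):

	/* Two threads sharing one struct file, hence one ->f_pos_lock. */
	#include <fcntl.h>
	#include <pthread.h>
	#include <unistd.h>

	static int fd;

	static void *thread_b(void *arg)
	{
		char buf[128] = { 0 };

		/* fdget_pos() takes ->f_pos_lock for a shared file, then
		 * the write calls into the filesystem; if the fs blocks
		 * on a lock held by thread A, ->f_pos_lock stays held
		 * the whole time */
		write(fd, buf, sizeof(buf));
		return NULL;
	}

	static void *thread_c(void *arg)
	{
		/* blocks in __fdget_pos() on ->f_pos_lock until B's
		 * write() returns - the hung task in the report */
		lseek(fd, 0, SEEK_SET);
		return NULL;
	}

	int main(void)
	{
		pthread_t b, c;

		fd = open("file-on-affected-fs", O_RDWR); /* placeholder */
		pthread_create(&b, NULL, thread_b, NULL);
		pthread_create(&c, NULL, thread_c, NULL);
		pthread_join(b, NULL);
		pthread_join(c, NULL);
		return 0;
	}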

2023-09-05 16:19:24

by Christian Brauner

Subject: Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

On Mon, Sep 04, 2023 at 09:09:48AM +1000, Dave Chinner wrote:
> On Mon, Sep 04, 2023 at 12:47:53AM +0200, Mateusz Guzik wrote:
> > On 9/4/23, Dave Chinner <[email protected]> wrote:
> > > On Sun, Sep 03, 2023 at 10:33:57AM +0200, Mateusz Guzik wrote:
> > >> On Sun, Sep 03, 2023 at 03:25:28PM +1000, Dave Chinner wrote:
> > >> > On Sat, Sep 02, 2023 at 09:11:34PM -0700, syzbot wrote:
> > >> > > Hello,
> > >> > >
> > >> > > syzbot found the following issue on:
> > >> > >
> > >> > > HEAD commit: b97d64c72259 Merge tag '6.6-rc-smb3-client-fixes-part1' of..
> > >> > > git tree: upstream
> > >> > > console output: https://syzkaller.appspot.com/x/log.txt?x=14136d8fa80000
> > >> > > kernel config: https://syzkaller.appspot.com/x/.config?x=958c1fdc38118172
> > >> > > dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
> > >> > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > >> > >
> > >> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >> >
> > >> > Been happening for months, apparently, yet for some reason it now
> > >> > thinks a locking hang in __fdget_pos() is an XFS issue?
> > >> >
> > >> > #syz set subsystems: fs
> > >> >
> > >>
> > >> The report does not have info necessary to figure this out -- no
> > >> backtrace for whichever thread which holds f_pos_lock. I clicked on a
> > >> bunch of other reports and it is the same story.
> > >
> > > That's true, but there's nothing that points at XFS in *any* of the
> > > bug reports. Indeed, log from the most recent report doesn't have
> > > any of the output from the time stuff hung. i.e. the log starts
> > > at kernel time 669.487771 seconds, and the hung task report is at:
> > >
> >
> > I did not mean to imply this is an xfs problem.
> >
> > You wrote reports have been coming in for months so it is pretty clear
> > nobody is investigating.
>
> Which is pretty much the case for all filesystem bug reports from
> syzbot except for those reported against XFS. Almost nobody else is

I'll try to address the syzkaller reports that we do get for VFS issues
ASAP, but often they're just not VFS issues or have no reproducer.

2023-11-30 16:58:42

by syzbot

Subject: Re: [syzbot] [fs] INFO: task hung in __fdget_pos (4)

syzbot has found a reproducer for the following issue on:

HEAD commit: 3b47bc037bd4 Merge tag 'pinctrl-v6.7-2' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14e6bdaae80000
kernel config: https://syzkaller.appspot.com/x/.config?x=c2c74446ab4f0028
dashboard link: https://syzkaller.appspot.com/bug?extid=e245f0516ee625aaa412
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11026e54e80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/43de6cde1436/disk-3b47bc03.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/24a2ba210bcf/vmlinux-3b47bc03.xz
kernel image: https://storage.googleapis.com/syzbot-assets/a5d414ced6d5/bzImage-3b47bc03.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/c28d17ae0132/mount_4.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

INFO: task syz-executor.0:24270 blocked for more than 143 seconds.
Not tainted 6.7.0-rc3-syzkaller-00033-g3b47bc037bd4 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:28784 pid:24270 tgid:24259 ppid:5149 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5376 [inline]
__schedule+0x1961/0x4ab0 kernel/sched/core.c:6688
__schedule_loop kernel/sched/core.c:6763 [inline]
schedule+0x149/0x260 kernel/sched/core.c:6778
schedule_preempt_disabled+0x13/0x20 kernel/sched/core.c:6835
__mutex_lock_common kernel/locking/mutex.c:679 [inline]
__mutex_lock+0x6a3/0xd60 kernel/locking/mutex.c:747
__fdget_pos+0x2b0/0x340 fs/file.c:1177
fdget_pos include/linux/file.h:74 [inline]
__do_sys_getdents64 fs/readdir.c:401 [inline]
__se_sys_getdents64+0x1dc/0x4f0 fs/readdir.c:390
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7fd0cec7cae9
RSP: 002b:00007fd0cfa610c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d9
RAX: ffffffffffffffda RBX: 00007fd0ced9c050 RCX: 00007fd0cec7cae9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007fd0cecc847a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000006e R14: 00007fd0ced9c050 R15: 00007ffeeda72018
</TASK>

Showing all locks held in the system:
1 lock held by khungtaskd/29:
#0: ffffffff8d92cf60 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:301 [inline]
#0: ffffffff8d92cf60 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline]
#0: ffffffff8d92cf60 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x55/0x2a0 kernel/locking/lockdep.c:6614
2 locks held by getty/4818:
#0: ffff88814afe20a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc900031332f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x6b4/0x1e10 drivers/tty/n_tty.c:2201
1 lock held by syz-executor.5/5143:
3 locks held by syz-executor.0/24262:
1 lock held by syz-executor.0/24270:
#0: ffff88807c5574c8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x2b0/0x340 fs/file.c:1177
1 lock held by syz-executor.0/24319:
2 locks held by syz-executor.1/32017:
#0: ffff8880b983c358 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:558
#1: ffff8880b9828808 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x441/0x770 kernel/sched/psi.c:988
1 lock held by syz-executor.1/32030:
#0: ffff8880b983c358 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:558
2 locks held by syz-executor.4/32028:
#0: ffff8880b983c358 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:558
#1: ffff8880b9828808 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x441/0x770 kernel/sched/psi.c:988
2 locks held by syz-executor.3/32042:
#0: ffff88801cfea360 (&lo->lo_mutex){+.+.}-{3:3}, at: loop_global_lock_killable drivers/block/loop.c:120 [inline]
#0: ffff88801cfea360 (&lo->lo_mutex){+.+.}-{3:3}, at: loop_configure+0x1f9/0x1250 drivers/block/loop.c:1023
#1: ffffe8ffffd6a108 (&per_cpu_ptr(group->pcpu, cpu)->seq){-.-.}-{0:0}, at: psi_task_switch+0x3a7/0x770 kernel/sched/psi.c:976
4 locks held by syz-executor.0/32045:

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 29 Comm: khungtaskd Not tainted 6.7.0-rc3-syzkaller-00033-g3b47bc037bd4 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
nmi_cpu_backtrace+0x498/0x4d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x198/0x310 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:222 [inline]
watchdog+0xfaf/0xff0 kernel/hung_task.c:379
kthread+0x2d3/0x370 kernel/kthread.c:388
ret_from_fork+0x48/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5138 Comm: syz-executor.1 Not tainted 6.7.0-rc3-syzkaller-00033-g3b47bc037bd4 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/10/2023
RIP: 0010:orc_find arch/x86/kernel/unwind_orc.c:217 [inline]
RIP: 0010:unwind_next_frame+0x271/0x29e0 arch/x86/kernel/unwind_orc.c:494
Code: 50 00 39 eb 0f 86 f7 1d 00 00 89 e8 48 8d 1c 85 08 26 20 90 48 89 d8 48 c1 e8 03 49 be 00 00 00 00 00 fc ff df 42 0f b6 04 30 <84> c0 0f 85 89 20 00 00 44 8b 3b 89 e8 ff c0 48 8d 1c 85 08 26 20
RSP: 0018:ffffc90003f3f700 EFLAGS: 00000a06
RAX: 0000000000000000 RBX: ffffffff90211d88 RCX: 00000000000a6001
RDX: ffff88801eaa3b80 RSI: 0000000000003de0 RDI: 00000000000a6000
RBP: 0000000000003de0 R08: ffffffff813db069 R09: 0000000000000000
R10: ffffc90003f3f840 R11: fffff520007e7f14 R12: ffffc90003f3f840
R13: 00000000000a6001 R14: dffffc0000000000 R15: ffffffff813de03c
FS: 0000555556748480(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f84b6775198 CR3: 00000000200ef000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<NMI>
</NMI>
<TASK>
__unwind_start+0x641/0x7a0 arch/x86/kernel/unwind_orc.c:760
unwind_start arch/x86/include/asm/unwind.h:64 [inline]
arch_stack_walk+0xfd/0x1a0 arch/x86/kernel/stacktrace.c:24
stack_trace_save+0x117/0x1c0 kernel/stacktrace.c:122
kasan_save_stack mm/kasan/common.c:45 [inline]
kasan_set_track+0x4f/0x70 mm/kasan/common.c:52
__kasan_slab_alloc+0x66/0x70 mm/kasan/common.c:328
kasan_slab_alloc include/linux/kasan.h:188 [inline]
slab_post_alloc_hook+0x6c/0x3c0 mm/slab.h:763
slab_alloc_node mm/slub.c:3478 [inline]
slab_alloc mm/slub.c:3486 [inline]
__kmem_cache_alloc_lru mm/slub.c:3493 [inline]
kmem_cache_alloc+0x19e/0x2b0 mm/slub.c:3502
sk_prot_alloc+0x58/0x210 net/core/sock.c:2069
sk_alloc+0x38/0x370 net/core/sock.c:2128
inet_create+0x652/0xe20 net/ipv4/af_inet.c:325
__sock_create+0x48c/0x910 net/socket.c:1569
sock_create net/socket.c:1620 [inline]
__sys_socket_create net/socket.c:1657 [inline]
__sys_socket+0x14f/0x3b0 net/socket.c:1704
__do_sys_socket net/socket.c:1718 [inline]
__se_sys_socket net/socket.c:1716 [inline]
__x64_sys_socket+0x7a/0x90 net/socket.c:1716
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x45/0x110 arch/x86/entry/common.c:82
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f84b667e867
Code: f0 ff ff 77 06 c3 0f 1f 44 00 00 48 c7 c2 b0 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 b8 29 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffed1a3ed28 EFLAGS: 00000206 ORIG_RAX: 0000000000000029
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f84b667e867
RDX: 0000000000000006 RSI: 0000000000000001 RDI: 0000000000000002
RBP: 00007ffed1a3f43c R08: 00007ffed1a3ed1c R09: 00007ffed1a3f127
R10: 00007ffed1a3eda0 R11: 0000000000000206 R12: 0000000000000032
R13: 0000000000165078 R14: 0000000000165035 R15: 0000000000000008
</TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.