2023-01-14 08:14:13

by syzbot

Subject: [syzbot] [ext4?] possible deadlock in ext4_setattr

Hello,

syzbot found the following issue on:

HEAD commit: 358a161a6a9e Merge branch 'for-next/fixes' into for-kernelci
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-kernelci
console output: https://syzkaller.appspot.com/x/log.txt?x=166001ba480000
kernel config: https://syzkaller.appspot.com/x/.config?x=2573056c6a11f00d
dashboard link: https://syzkaller.appspot.com/bug?extid=cbb68193bdb95af4340a
compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2
userspace arch: arm64
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1287585a480000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=167d644a480000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/99d14e0f4c19/disk-358a161a.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/23275b612976/vmlinux-358a161a.xz
kernel image: https://storage.googleapis.com/syzbot-assets/ed79195fac61/Image-358a161a.gz.xz
mounted in repro: https://storage.googleapis.com/syzbot-assets/6e1f0d31d9b5/mount_0.gz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:3789: comm syz-executor184: Allocating blocks 44-48 which overlap fs metadata
EXT4-fs error (device loop0): ext4_xattr_block_set:2175: inode #12: comm syz-executor184: bad block 0
======================================================
WARNING: possible circular locking dependency detected
6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0 Not tainted
------------------------------------------------------
syz-executor184/4449 is trying to acquire lock:
ffff0000cb1ae540 (mapping.invalidate_lock){++++}-{3:3}, at: filemap_invalidate_lock include/linux/fs.h:801 [inline]
ffff0000cb1ae540 (mapping.invalidate_lock){++++}-{3:3}, at: ext4_setattr+0x804/0xb38 fs/ext4/inode.c:5545

but task is already holding lock:
ffff0000cb1ae3a0 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:756 [inline]
ffff0000cb1ae3a0 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: do_truncate+0xf8/0x154 fs/open.c:63

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #3 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}:
down_write+0x5c/0x88 kernel/locking/rwsem.c:1562
inode_lock include/linux/fs.h:756 [inline]
ext4_xattr_inode_create fs/ext4/xattr.c:1457 [inline]
ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1540 [inline]
ext4_xattr_set_entry+0xdf4/0xe40 fs/ext4/xattr.c:1669
ext4_xattr_block_set+0x3d0/0x1404 fs/ext4/xattr.c:1975
ext4_xattr_set_handle+0x724/0x9a0 fs/ext4/xattr.c:2390
ext4_xattr_set+0x104/0x1d4 fs/ext4/xattr.c:2492
ext4_xattr_trusted_set+0x4c/0x64 fs/ext4/xattr_trusted.c:38
__vfs_setxattr+0x290/0x29c fs/xattr.c:202
__vfs_setxattr_noperm+0xcc/0x320 fs/xattr.c:236
__vfs_setxattr_locked+0x16c/0x194 fs/xattr.c:297
vfs_setxattr+0xf4/0x1f4 fs/xattr.c:323
do_setxattr fs/xattr.c:608 [inline]
setxattr fs/xattr.c:631 [inline]
path_setxattr+0x32c/0x424 fs/xattr.c:650
__do_sys_setxattr fs/xattr.c:666 [inline]
__se_sys_setxattr fs/xattr.c:662 [inline]
__arm64_sys_setxattr+0x2c/0x40 fs/xattr.c:662
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x140 arch/arm64/kernel/syscall.c:197
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584

-> #2 (&ei->xattr_sem){++++}-{3:3}:
down_write+0x5c/0x88 kernel/locking/rwsem.c:1562
ext4_write_lock_xattr fs/ext4/xattr.h:155 [inline]
ext4_xattr_set_handle+0xd0/0x9a0 fs/ext4/xattr.c:2305
ext4_initxattrs+0x50/0xac fs/ext4/xattr_security.c:44
security_inode_init_security+0x208/0x278 security/security.c:1147
ext4_init_security+0x44/0x58 fs/ext4/xattr_security.c:58
__ext4_new_inode+0x1a88/0x1c78 fs/ext4/ialloc.c:1324
ext4_create+0x124/0x26c fs/ext4/namei.c:2809
lookup_open fs/namei.c:3413 [inline]
open_last_lookups fs/namei.c:3481 [inline]
path_openat+0x804/0x11f0 fs/namei.c:3711
do_filp_open+0xdc/0x1b8 fs/namei.c:3741
do_sys_openat2+0xb8/0x22c fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_openat fs/open.c:1342 [inline]
__se_sys_openat fs/open.c:1337 [inline]
__arm64_sys_openat+0xb0/0xe0 fs/open.c:1337
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x140 arch/arm64/kernel/syscall.c:197
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584

-> #1 (jbd2_handle){.+.+}-{0:0}:
start_this_handle+0x6e0/0x7fc fs/jbd2/transaction.c:463
jbd2__journal_start+0x148/0x1f0 fs/jbd2/transaction.c:520
__ext4_journal_start_sb+0x158/0x210 fs/ext4/ext4_jbd2.c:111
__ext4_journal_start fs/ext4/ext4_jbd2.h:326 [inline]
ext4_truncate+0x3b4/0x72c fs/ext4/inode.c:4315
ext4_setattr+0x900/0xb38 fs/ext4/inode.c:5622
notify_change+0x738/0x7d0 fs/attr.c:482
do_truncate+0x10c/0x154 fs/open.c:65
do_sys_ftruncate+0x1b4/0x1c4 fs/open.c:193
__do_sys_ftruncate fs/open.c:204 [inline]
__se_sys_ftruncate fs/open.c:202 [inline]
__arm64_sys_ftruncate+0x28/0x38 fs/open.c:202
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x140 arch/arm64/kernel/syscall.c:197
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584

-> #0 (mapping.invalidate_lock){++++}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3097 [inline]
check_prevs_add kernel/locking/lockdep.c:3216 [inline]
validate_chain kernel/locking/lockdep.c:3831 [inline]
__lock_acquire+0x1530/0x3084 kernel/locking/lockdep.c:5055
lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5668
down_write+0x5c/0x88 kernel/locking/rwsem.c:1562
filemap_invalidate_lock include/linux/fs.h:801 [inline]
ext4_setattr+0x804/0xb38 fs/ext4/inode.c:5545
notify_change+0x738/0x7d0 fs/attr.c:482
do_truncate+0x10c/0x154 fs/open.c:65
handle_truncate+0xf4/0x154 fs/namei.c:3216
do_open fs/namei.c:3561 [inline]
path_openat+0xee0/0x11f0 fs/namei.c:3714
do_filp_open+0xdc/0x1b8 fs/namei.c:3741
do_sys_openat2+0xb8/0x22c fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_openat fs/open.c:1342 [inline]
__se_sys_openat fs/open.c:1337 [inline]
__arm64_sys_openat+0xb0/0xe0 fs/open.c:1337
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x140 arch/arm64/kernel/syscall.c:197
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584

other info that might help us debug this:

Chain exists of:
mapping.invalidate_lock --> &ei->xattr_sem --> &ea_inode->i_rwsem#8/1

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ea_inode->i_rwsem#8/1);
                               lock(&ei->xattr_sem);
                               lock(&ea_inode->i_rwsem#8/1);
  lock(mapping.invalidate_lock);

*** DEADLOCK ***

2 locks held by syz-executor184/4449:
#0: ffff0000c6723460 (sb_writers#3){.+.+}-{0:0}, at: mnt_want_write+0x20/0x64 fs/namespace.c:508
#1: ffff0000cb1ae3a0 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:756 [inline]
#1: ffff0000cb1ae3a0 (&ea_inode->i_rwsem#8/1){+.+.}-{3:3}, at: do_truncate+0xf8/0x154 fs/open.c:63

stack backtrace:
CPU: 1 PID: 4449 Comm: syz-executor184 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call trace:
dump_backtrace+0x1c4/0x1f0 arch/arm64/kernel/stacktrace.c:156
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:163
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x104/0x16c lib/dump_stack.c:106
dump_stack+0x1c/0x58 lib/dump_stack.c:113
print_circular_bug+0x2c4/0x2c8 kernel/locking/lockdep.c:2055
check_noncircular+0x14c/0x154 kernel/locking/lockdep.c:2177
check_prev_add kernel/locking/lockdep.c:3097 [inline]
check_prevs_add kernel/locking/lockdep.c:3216 [inline]
validate_chain kernel/locking/lockdep.c:3831 [inline]
__lock_acquire+0x1530/0x3084 kernel/locking/lockdep.c:5055
lock_acquire+0x100/0x1f8 kernel/locking/lockdep.c:5668
down_write+0x5c/0x88 kernel/locking/rwsem.c:1562
filemap_invalidate_lock include/linux/fs.h:801 [inline]
ext4_setattr+0x804/0xb38 fs/ext4/inode.c:5545
notify_change+0x738/0x7d0 fs/attr.c:482
do_truncate+0x10c/0x154 fs/open.c:65
handle_truncate+0xf4/0x154 fs/namei.c:3216
do_open fs/namei.c:3561 [inline]
path_openat+0xee0/0x11f0 fs/namei.c:3714
do_filp_open+0xdc/0x1b8 fs/namei.c:3741
do_sys_openat2+0xb8/0x22c fs/open.c:1310
do_sys_open fs/open.c:1326 [inline]
__do_sys_openat fs/open.c:1342 [inline]
__se_sys_openat fs/open.c:1337 [inline]
__arm64_sys_openat+0xb0/0xe0 fs/open.c:1337
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:52 [inline]
el0_svc_common+0x138/0x220 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x140 arch/arm64/kernel/syscall.c:197
el0_svc+0x58/0x150 arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:584


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2023-05-14 21:13:53

by Michael S. Tsirkin

Subject: Re: [syzbot] [ext4?] possible deadlock in ext4_setattr

On Sun, May 14, 2023 at 12:24:32PM -0700, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit a3c06ae158dd6fa8336157c31d9234689d068d02
> Author: Parav Pandit <[email protected]>
> Date: Tue Jan 5 10:32:03 2021 +0000
>
> vdpa_sim_net: Add support for user supported devices
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=16e372c6280000
> start commit: e922ba281a8d Add linux-next specific files for 20230512
> git tree: linux-next
> final oops: https://syzkaller.appspot.com/x/report.txt?x=15e372c6280000
> console output: https://syzkaller.appspot.com/x/log.txt?x=11e372c6280000
> kernel config: https://syzkaller.appspot.com/x/.config?x=17a4c2d44484b62f
> dashboard link: https://syzkaller.appspot.com/bug?extid=cbb68193bdb95af4340a
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=172a21c6280000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16b67fc6280000
>
> Reported-by: [email protected]
> Fixes: a3c06ae158dd ("vdpa_sim_net: Add support for user supported devices")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

I don't see how this can be related; I don't think the test setup uses
vdpa sim at all.

--
MST


2023-05-15 00:59:25

by Theodore Ts'o

Subject: Re: [syzbot] [ext4?] possible deadlock in ext4_setattr

On Sun, May 14, 2023 at 04:39:36PM -0400, Michael S. Tsirkin wrote:
> On Sun, May 14, 2023 at 12:24:32PM -0700, syzbot wrote:
> > syzbot has bisected this issue to:
> >
> > commit a3c06ae158dd6fa8336157c31d9234689d068d02
> > Author: Parav Pandit <[email protected]>
> > Date: Tue Jan 5 10:32:03 2021 +0000
> >
> > vdpa_sim_net: Add support for user supported devices
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
> I don't see how this can be related; I don't think the test setup uses
> vdpa sim at all.

Yeah, it's totally bogus. You can see it by looking at the bisection
log[1].

[1] https://syzkaller.appspot.com/text?tag=Log&x=16e372c6280000

The initial bisection logs make it clear that it is *trivially* easy
to reproduce, and that the failure signature is:

crashed: possible deadlock in {ext4_xattr_set_handle,ext4_setattr,ext4_xattr_set_handle}

However, somewhere in the bisection, we start seeing this:

run #0: boot failed: WARNING in kvm_wait
run #1: boot failed: WARNING in kvm_wait
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK

This is a completely different failure signature, and the "WARNING in
kvm_wait" should be ignored, and this should be considered a "git
bisect good". However, the syzkaller bisection doesn't understand
this, and so it gets sent down the primrose path. :-(

Unfortunately, there isn't a good way to tell the syzkaller bisection
"go home, you're drunk" and have it try again, perhaps with a
human-supplied discriminator for what should be considered a valid
failure signature.

In the ideal world, a human should be able to give it a starting set
of bisection "good" and "bad" commits, along with a regexp for what a
failure should look like, perhaps with a hint of how many tries it
should use before assuming that a bisection result is "good", and
teach it to treat a commit as "bad" after seeing a single matching
failure signature.

If the above is too much to ask, in the slightly-less-ideal world,
there would be a way to give "#syz test" a count argument, so we could
have syzkaller try N times, for when we need to do a manual bisection
using "#syz test"....

- Ted

2023-05-15 01:19:45

by Theodore Ts'o

Subject: Re: [syzbot] [ext4?] possible deadlock in ext4_setattr

On Sun, May 14, 2023 at 08:53:21PM -0400, Theodore Ts'o wrote:
> However, somewhere in the bisection, we start seeing this:
>
> run #0: boot failed: WARNING in kvm_wait
...
>
> This is a completely different failure signature, and the "WARNING in
> kvm_wait" should be ignored, and this should be considered a "git
> bisect good". However, the syzkaller bisection doesn't understand
> this, and so it gets sent down the primrose path. :-(

Sorry, I spoke too soon. It did get this right; it looks like it
ignores "boot failed: " messages, although it treats each one as one
of the 10 tries for a particular commit.

I don't see anything obviously wrong with the bisect log, although
obviously I'd want to retry some of the "All OK" results to see if
somehow things got confused. In any case, there have been a large
number of times where the bisection results have been less than
correct, and unfortunately, there's not much that can be done other
than just to ignore them. It would be nice for a human being to be
able to mark the bisection as Obviously Wrong(tm), and maybe ask it
to do a slightly different bisection.

Also unfortunate is that I've had more than one case where the problem
doesn't reproduce at all using KVM, but only reproduces using #syz
test. Which means that manual bisection may be the only way, for now,
to work through some of these.

- Ted