2024-01-06 02:48:41

by Hillf Danton

Subject: Re: [syzbot] [ext4?] INFO: task hung in ext4_quota_write

On Mon, 01 Jan 2024 04:06:21 -0800
> HEAD commit: f5837722ffec Merge tag 'mm-hotfixes-stable-2023-12-27-15-0..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13177855e80000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

--- x/include/linux/sched.h
+++ y/include/linux/sched.h
@@ -1544,6 +1544,7 @@ struct task_struct {
 	struct user_event_mm *user_event_mm;
 #endif

+	unsigned long bfl;
 	/*
 	 * New fields for task_struct should be added above here, so that
 	 * they are included in the randomized portion of task_struct.
--- x/include/linux/buffer_head.h
+++ y/include/linux/buffer_head.h
@@ -78,6 +78,7 @@ struct buffer_head {
 	spinlock_t b_uptodate_lock; /* Used by the first bh in a page, to
 				     * serialise IO completion of other
 				     * buffers in the page */
+	struct task_struct *lko;
 };

 /*
@@ -402,6 +403,9 @@ static inline void lock_buffer(struct bu
 	might_sleep();
 	if (!trylock_buffer(bh))
 		__lock_buffer(bh);
+	bh->lko = current;
+	get_task_struct(bh->lko);
+	bh->lko->bfl = (unsigned long) bh;
 }

 static inline void bh_readahead(struct buffer_head *bh, blk_opf_t op_flags)
--- x/fs/ext4/super.c
+++ y/fs/ext4/super.c
@@ -7248,6 +7248,7 @@ static ssize_t ext4_quota_write(struct s
 		brelse(bh);
 		return err;
 	}
+	BUG_ON(current->bfl == (unsigned long) bh);
 	lock_buffer(bh);
 	memcpy(bh->b_data+offset, data, len);
 	flush_dcache_page(bh->b_page);
--- x/fs/buffer.c
+++ y/fs/buffer.c
@@ -77,6 +77,11 @@ void unlock_buffer(struct buffer_head *b
 	clear_bit_unlock(BH_Lock, &bh->b_state);
 	smp_mb__after_atomic();
 	wake_up_bit(&bh->b_state, BH_Lock);
+	if (!bh->lko)
+		return;
+	bh->lko->bfl = 0;
+	put_task_struct(bh->lko);
+	bh->lko = NULL;
 }
 EXPORT_SYMBOL(unlock_buffer);

--


2024-01-09 18:17:21

by syzbot

Subject: Re: [syzbot] [ext4?] INFO: task hung in ext4_quota_write

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
kernel BUG in ext4_quota_write

EXT4-fs error (device loop0) in ext4_process_orphan:347: Corrupt filesystem
EXT4-fs (loop0): 1 truncate cleaned up
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: writeback.
ext4 filesystem being mounted at /root/syzkaller-testdir1916097639/syzkaller.TbSsym/0/file1 supports timestamps until 2038-01-19 (0x7fffffff)
------------[ cut here ]------------
kernel BUG at fs/ext4/super.c:7251!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5480 Comm: syz-executor.0 Not tainted 6.7.0-rc8-syzkaller-00159-ga4ab2706bb12-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:ext4_quota_write+0x6e5/0x6f0 fs/ext4/super.c:7251
Code: f9 ff ff e8 8d 37 39 ff 48 c7 c7 00 16 af 8d 4c 89 e6 48 89 da e8 7b 2e 68 02 e9 38 fa ff ff e8 21 27 c3 08 e8 6c 37 39 ff 90 <0f> 0b 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 53 48 89 fb e8 53 37
RSP: 0018:ffffc9000547ee00 EFLAGS: 00010293
RAX: ffffffff82554284 RBX: ffff8880739ac690 RCX: ffff88801bfd0000
RDX: 0000000000000000 RSI: ffff8880739ac690 RDI: ffff8880739ac690
RBP: ffffc9000547eef0 R08: ffffffff82553f4d R09: 0000000000000001
R10: dffffc0000000000 R11: ffffed100e7358d3 R12: ffff8880739ac690
R13: 0000000000000001 R14: dffffc0000000000 R15: ffff8880739ac690
FS: 00007f079ddca6c0(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055636d85ffc8 CR3: 0000000028711000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
qtree_write_dquot+0x243/0x530 fs/quota/quota_tree.c:431
v2_write_dquot+0x120/0x190 fs/quota/quota_v2.c:358
dquot_commit+0x3c4/0x520 fs/quota/dquot.c:512
ext4_write_dquot+0x1f2/0x2c0 fs/ext4/super.c:6877
mark_dquot_dirty fs/quota/dquot.c:372 [inline]
mark_all_dquot_dirty fs/quota/dquot.c:410 [inline]
dquot_alloc_inode+0x69f/0xb70 fs/quota/dquot.c:1780
ext4_xattr_inode_alloc_quota fs/ext4/xattr.c:932 [inline]
ext4_xattr_set_entry+0xaf3/0x3fc0 fs/ext4/xattr.c:1715
ext4_xattr_block_set+0x73f/0x3680 fs/ext4/xattr.c:1970
ext4_xattr_set_handle+0xcdf/0x1570 fs/ext4/xattr.c:2456
ext4_xattr_set+0x241/0x3d0 fs/ext4/xattr.c:2558
__vfs_setxattr+0x460/0x4a0 fs/xattr.c:201
__vfs_setxattr_noperm+0x12e/0x5e0 fs/xattr.c:235
vfs_setxattr+0x221/0x420 fs/xattr.c:322
do_setxattr fs/xattr.c:630 [inline]
setxattr+0x25d/0x2f0 fs/xattr.c:653
path_setxattr+0x1c0/0x2a0 fs/xattr.c:672
__do_sys_setxattr fs/xattr.c:688 [inline]
__se_sys_setxattr fs/xattr.c:684 [inline]
__x64_sys_setxattr+0xbb/0xd0 fs/xattr.c:684
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
RIP: 0033:0x7f079d07cce9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f079ddca0c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
RAX: ffffffffffffffda RBX: 00007f079d19bf80 RCX: 00007f079d07cce9
RDX: 0000000020000380 RSI: 0000000020000340 RDI: 00000000200002c0
RBP: 00007f079d0c947a R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000ffed R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007f079d19bf80 R15: 00007fff6aba59d8
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:ext4_quota_write+0x6e5/0x6f0 fs/ext4/super.c:7251
Code: f9 ff ff e8 8d 37 39 ff 48 c7 c7 00 16 af 8d 4c 89 e6 48 89 da e8 7b 2e 68 02 e9 38 fa ff ff e8 21 27 c3 08 e8 6c 37 39 ff 90 <0f> 0b 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 53 48 89 fb e8 53 37
RSP: 0018:ffffc9000547ee00 EFLAGS: 00010293
RAX: ffffffff82554284 RBX: ffff8880739ac690 RCX: ffff88801bfd0000
RDX: 0000000000000000 RSI: ffff8880739ac690 RDI: ffff8880739ac690
RBP: ffffc9000547eef0 R08: ffffffff82553f4d R09: 0000000000000001
R10: dffffc0000000000 R11: ffffed100e7358d3 R12: ffff8880739ac690
R13: 0000000000000001 R14: dffffc0000000000 R15: ffff8880739ac690
FS: 00007f079ddca6c0(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055636d85ffc8 CR3: 0000000028711000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


Tested on:

commit: a4ab2706 Merge tag 'firewire-fixes-6.7-final' of git:/..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=15f50a09e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=655f8abe9fe69b3b
dashboard link: https://syzkaller.appspot.com/bug?extid=a43d4f48b8397d0e41a9
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=15123275e80000


2024-01-10 11:13:59

by Hillf Danton

Subject: Re: [syzbot] [ext4?] INFO: task hung in ext4_quota_write

On Tue, 09 Jan 2024 10:17:07 -0800
> Hello,
>
> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> kernel BUG in ext4_quota_write
>
> EXT4-fs error (device loop0) in ext4_process_orphan:347: Corrupt filesystem
> EXT4-fs (loop0): 1 truncate cleaned up
> EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: writeback.
> ext4 filesystem being mounted at /root/syzkaller-testdir1916097639/syzkaller.TbSsym/0/file1 supports timestamps until 2038-01-19 (0x7fffffff)
> ------------[ cut here ]------------
> kernel BUG at fs/ext4/super.c:7251!

Given that the BUG_ON in the tested debug patch fired, could a deadlock be
the reason behind the hang instead of IO in flight? Or is it due to the
corrupted filesystem in the first place?
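
For readers following the reasoning: the debug patch records which task last locked a buffer, so the BUG_ON firing means the task entering ext4_quota_write() already holds the lock on that same buffer. A minimal user-space sketch of that owner-tracking idea follows. The names struct task, struct buf, buf_lock(), buf_unlock() and would_self_deadlock() are hypothetical stand-ins, not kernel API, and the task reference counting done by get_task_struct()/put_task_struct() in the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct with the patch's "bfl" field. */
struct task {
	const void *bfl;	/* buffer this task last locked, or NULL */
};

/* Hypothetical stand-in for buffer_head with the patch's "lko" field. */
struct buf {
	bool locked;		/* plays the role of the BH_Lock bit */
	struct task *lko;	/* lock owner, NULL when unlocked */
};

/* The condition the patch checks with BUG_ON() before lock_buffer(). */
static bool would_self_deadlock(const struct task *t, const struct buf *b)
{
	return t->bfl == b;
}

static void buf_lock(struct task *t, struct buf *b)
{
	assert(!b->locked);	/* the real lock_buffer() would sleep here */
	b->locked = true;
	b->lko = t;		/* remember who holds the lock */
	t->bfl = b;		/* and which buffer the task holds */
}

static void buf_unlock(struct buf *b)
{
	b->locked = false;
	if (!b->lko)
		return;
	b->lko->bfl = NULL;	/* owner no longer holds this buffer */
	b->lko = NULL;
}
```

In this sketch, would_self_deadlock() is false before buf_lock(), true while the same task holds the buffer, and false again after buf_unlock(). A second buf_lock() by the holder would wait on itself, which would look like a hung task rather than slow IO.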

> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 1 PID: 5480 Comm: syz-executor.0 Not tainted 6.7.0-rc8-syzkaller-00159-ga4ab2706bb12-dirty #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
> RIP: 0010:ext4_quota_write+0x6e5/0x6f0 fs/ext4/super.c:7251
> Code: f9 ff ff e8 8d 37 39 ff 48 c7 c7 00 16 af 8d 4c 89 e6 48 89 da e8 7b 2e 68 02 e9 38 fa ff ff e8 21 27 c3 08 e8 6c 37 39 ff 90 <0f> 0b 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 53 48 89 fb e8 53 37
> RSP: 0018:ffffc9000547ee00 EFLAGS: 00010293
> RAX: ffffffff82554284 RBX: ffff8880739ac690 RCX: ffff88801bfd0000
> RDX: 0000000000000000 RSI: ffff8880739ac690 RDI: ffff8880739ac690
> RBP: ffffc9000547eef0 R08: ffffffff82553f4d R09: 0000000000000001
> R10: dffffc0000000000 R11: ffffed100e7358d3 R12: ffff8880739ac690
> R13: 0000000000000001 R14: dffffc0000000000 R15: ffff8880739ac690
> FS: 00007f079ddca6c0(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055636d85ffc8 CR3: 0000000028711000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> qtree_write_dquot+0x243/0x530 fs/quota/quota_tree.c:431
> v2_write_dquot+0x120/0x190 fs/quota/quota_v2.c:358
> dquot_commit+0x3c4/0x520 fs/quota/dquot.c:512
> ext4_write_dquot+0x1f2/0x2c0 fs/ext4/super.c:6877
> mark_dquot_dirty fs/quota/dquot.c:372 [inline]
> mark_all_dquot_dirty fs/quota/dquot.c:410 [inline]
> dquot_alloc_inode+0x69f/0xb70 fs/quota/dquot.c:1780
> ext4_xattr_inode_alloc_quota fs/ext4/xattr.c:932 [inline]
> ext4_xattr_set_entry+0xaf3/0x3fc0 fs/ext4/xattr.c:1715
> ext4_xattr_block_set+0x73f/0x3680 fs/ext4/xattr.c:1970
> ext4_xattr_set_handle+0xcdf/0x1570 fs/ext4/xattr.c:2456
> ext4_xattr_set+0x241/0x3d0 fs/ext4/xattr.c:2558
> __vfs_setxattr+0x460/0x4a0 fs/xattr.c:201
> __vfs_setxattr_noperm+0x12e/0x5e0 fs/xattr.c:235
> vfs_setxattr+0x221/0x420 fs/xattr.c:322
> do_setxattr fs/xattr.c:630 [inline]
> setxattr+0x25d/0x2f0 fs/xattr.c:653
> path_setxattr+0x1c0/0x2a0 fs/xattr.c:672
> __do_sys_setxattr fs/xattr.c:688 [inline]
> __se_sys_setxattr fs/xattr.c:684 [inline]
> __x64_sys_setxattr+0xbb/0xd0 fs/xattr.c:684
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0x45/0x110 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x63/0x6b
>
[...]
>
> Tested on:
>
> commit: a4ab2706 Merge tag 'firewire-fixes-6.7-final' of git:/..
> git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> console output: https://syzkaller.appspot.com/x/log.txt?x=15f50a09e80000
> kernel config: https://syzkaller.appspot.com/x/.config?x=655f8abe9fe69b3b
> dashboard link: https://syzkaller.appspot.com/bug?extid=a43d4f48b8397d0e41a9
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> patch: https://syzkaller.appspot.com/x/patch.diff?x=15123275e80000

2024-01-10 11:40:19

by Jan Kara

Subject: Re: [syzbot] [ext4?] INFO: task hung in ext4_quota_write

On Wed 10-01-24 19:12:59, Hillf Danton wrote:
> On Tue, 09 Jan 2024 10:17:07 -0800
> > Hello,
> >
> > syzbot has tested the proposed patch but the reproducer is still triggering an issue:
> > kernel BUG in ext4_quota_write
> >
> > EXT4-fs error (device loop0) in ext4_process_orphan:347: Corrupt filesystem
> > EXT4-fs (loop0): 1 truncate cleaned up
> > EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: writeback.
> > ext4 filesystem being mounted at /root/syzkaller-testdir1916097639/syzkaller.TbSsym/0/file1 supports timestamps until 2038-01-19 (0x7fffffff)
> > ------------[ cut here ]------------
> > kernel BUG at fs/ext4/super.c:7251!
>
> Given that the BUG_ON in the tested debug patch fired, could a deadlock be
> the reason behind the hang instead of IO in flight? Or is it due to the
> corrupted filesystem in the first place?

Thanks for the investigation! Based on your test results as well as on
results by Edward Adam Davis <[email protected]> I'd say syzbot has created a
cycle in the quota tree or something like that. Sadly the fs image provided
by syzbot is corrupted to the extent that e2fsprogs refuses to touch it, so
I'll have to check manually why the kernel is mounting this image or what's
going on with the reproducer...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
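
To illustrate the "cycle in the quota tree" hypothesis above: if a tree block in a corrupted image refers back to one of its own ancestors, a walk down the tree reaches the same block twice. The sketch below shows one generic way to detect that, by marking the blocks on the current path. It is not the kernel's quota code; struct tree, NBLOCKS, NREFS, the use of 0 as an empty slot, and both function names are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8	/* hypothetical tiny tree file */
#define NREFS   2	/* references per block; real trees hold many more */

/* refs[b][i] is the block that slot i of block b points to; 0 = empty. */
struct tree {
	unsigned int refs[NBLOCKS][NREFS];
};

/* Depth-first walk; on_path[] marks blocks between the root and blk. */
static bool walk_has_cycle(const struct tree *t, unsigned int blk,
			   bool *on_path)
{
	bool found = false;

	if (blk >= NBLOCKS)
		return true;	/* out-of-range reference: treat as corrupt */
	if (on_path[blk])
		return true;	/* block refers back to an ancestor */

	on_path[blk] = true;
	for (size_t i = 0; i < NREFS && !found; i++) {
		unsigned int child = t->refs[blk][i];

		if (child != 0)
			found = walk_has_cycle(t, child, on_path);
	}
	on_path[blk] = false;	/* leaving this block's subtree */
	return found;
}

static bool tree_has_cycle(const struct tree *t, unsigned int root)
{
	bool on_path[NBLOCKS] = { false };

	return walk_has_cycle(t, root, on_path);
}
```

For example, a tree where block 1 points to 2, block 2 to 3, and block 3 back to 1 is reported as cyclic, while one where block 1 points to 2 and 3 and block 2 points to 4 is not. Only references back to an ancestor are flagged; two parents sharing a child would need a separate visited set.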