Hello,
syzbot found the following issue on:
HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=136eb2f6180000
kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/72ab73815344/disk-fe46a7dd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/2d6d6b0d7071/vmlinux-fe46a7dd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/48e275e5478b/bzImage-fe46a7dd.xz
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
======================================================
WARNING: possible circular locking dependency detected
6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted
------------------------------------------------------
syz-executor250/5062 is trying to acquire lock:
ffff888022c36888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
but task is already holding lock:
ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock fs/overlayfs/overlayfs.h:649 [inline]
ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0x26b/0x470 fs/overlayfs/file.c:214
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&ovl_i_lock_key[depth]){+.+.}-{3:3}:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
ovl_inode_lock_interruptible fs/overlayfs/overlayfs.h:654 [inline]
ovl_nlink_start+0xdc/0x390 fs/overlayfs/util.c:1162
ovl_do_remove+0x1fa/0xd90 fs/overlayfs/dir.c:893
vfs_rmdir+0x367/0x4c0 fs/namei.c:4209
do_rmdir+0x3b5/0x580 fs/namei.c:4268
__do_sys_rmdir fs/namei.c:4287 [inline]
__se_sys_rmdir fs/namei.c:4285 [inline]
__x64_sys_rmdir+0x49/0x60 fs/namei.c:4285
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
-> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
down_read+0xb1/0xa40 kernel/locking/rwsem.c:1526
inode_lock_shared include/linux/fs.h:803 [inline]
lookup_slow+0x45/0x70 fs/namei.c:1708
walk_component+0x2e1/0x410 fs/namei.c:2004
lookup_last fs/namei.c:2461 [inline]
path_lookupat+0x16f/0x450 fs/namei.c:2485
filename_lookup+0x256/0x610 fs/namei.c:2514
kern_path+0x35/0x50 fs/namei.c:2622
lookup_bdev+0xc5/0x290 block/bdev.c:1072
resume_store+0x1a0/0x710 kernel/power/hibernate.c:1235
kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2108 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa84/0xcb0 fs/read_write.c:590
ksys_write+0x1a0/0x2c0 fs/read_write.c:643
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
-> #0 (&of->mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
ovl_llseek+0x314/0x470 fs/overlayfs/file.c:218
vfs_llseek fs/read_write.c:289 [inline]
ksys_lseek fs/read_write.c:302 [inline]
__do_sys_lseek fs/read_write.c:313 [inline]
__se_sys_lseek fs/read_write.c:311 [inline]
__x64_sys_lseek+0x153/0x1e0 fs/read_write.c:311
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
other info that might help us debug this:
Chain exists of:
&of->mutex --> &ovl_i_mutex_dir_key[depth] --> &ovl_i_lock_key[depth]
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ovl_i_lock_key[depth]);
lock(&ovl_i_mutex_dir_key[depth]);
lock(&ovl_i_lock_key[depth]);
lock(&of->mutex);
*** DEADLOCK ***
1 lock held by syz-executor250/5062:
#0: ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock fs/overlayfs/overlayfs.h:649 [inline]
#0: ffff88807edeadd8 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0x26b/0x470 fs/overlayfs/file.c:214
stack backtrace:
CPU: 1 PID: 5062 Comm: syz-executor250 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
ovl_llseek+0x314/0x470 fs/overlayfs/file.c:218
vfs_llseek fs/read_write.c:289 [inline]
ksys_lseek fs/read_write.c:302 [inline]
__do_sys_lseek fs/read_write.c:313 [inline]
__se_sys_lseek fs/read_write.c:311 [inline]
__x64_sys_lseek+0x153/0x1e0 fs/read_write.c:311
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7f0e2bdfd219
Code: 48 83 c4 28 c3 e8 67 17 00 00 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd2f80f3f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
RAX: ffffffffffffffda RBX: 00007ffd2f80f400 RCX: 00007f0e2bdfd219
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007ffd2f80f408 R08: 00007f0e2bdca000 R09: 00007f0e2bdca000
R10: 00007f0e2bdca000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd2f80f668 R14: 0000000000000001 R15: 0000000000000001
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.
If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)
If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report
If you want to undo deduplication, reply with:
#syz undup
syzbot has bisected this issue to:
commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be
Author: Valentine Sinitsyn <[email protected]>
Date: Mon Sep 25 08:40:12 2023 +0000
kernfs: sysfs: support custom llseek method for sysfs entries
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000
start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: upstream
final oops: https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000
console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000
kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
Reported-by: [email protected]
Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
On Wed, 03 Apr 2024 11:23:26 -0700
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git fe46a7dd189e
--- x/fs/overlayfs/file.c
+++ y/fs/overlayfs/file.c
@@ -204,6 +204,7 @@ static loff_t ovl_llseek(struct file *fi
if (ret)
return ret;
+ inode_lock(inode);
/*
* Overlay file f_pos is the master copy that is preserved
* through copy up and modified on read/write, but only real
@@ -220,6 +221,7 @@ static loff_t ovl_llseek(struct file *fi
file->f_pos = real.file->f_pos;
ovl_inode_unlock(inode);
+ inode_unlock(inode);
fdput(real);
--
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in kernfs_fop_llseek
======================================================
WARNING: possible circular locking dependency detected
6.8.0-syzkaller-08951-gfe46a7dd189e-dirty #0 Not tainted
------------------------------------------------------
syz-executor.0/5486 is trying to acquire lock:
ffff88807ec0b888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
but task is already holding lock:
ffff888050982238 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock fs/overlayfs/overlayfs.h:649 [inline]
ffff888050982238 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0x28a/0x4a0 fs/overlayfs/file.c:215
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&ovl_i_lock_key[depth]){+.+.}-{3:3}:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
ovl_inode_lock_interruptible fs/overlayfs/overlayfs.h:654 [inline]
ovl_nlink_start+0xdc/0x390 fs/overlayfs/util.c:1162
ovl_do_remove+0x1fa/0xd90 fs/overlayfs/dir.c:893
vfs_rmdir+0x367/0x4c0 fs/namei.c:4209
do_rmdir+0x3b5/0x580 fs/namei.c:4268
__do_sys_rmdir fs/namei.c:4287 [inline]
__se_sys_rmdir fs/namei.c:4285 [inline]
__x64_sys_rmdir+0x49/0x60 fs/namei.c:4285
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
-> #1 (&ovl_i_mutex_dir_key[depth]){++++}-{3:3}:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
down_read+0xb1/0xa40 kernel/locking/rwsem.c:1526
inode_lock_shared include/linux/fs.h:803 [inline]
lookup_slow+0x45/0x70 fs/namei.c:1708
walk_component+0x2e1/0x410 fs/namei.c:2004
lookup_last fs/namei.c:2461 [inline]
path_lookupat+0x16f/0x450 fs/namei.c:2485
filename_lookup+0x256/0x610 fs/namei.c:2514
kern_path+0x35/0x50 fs/namei.c:2622
lookup_bdev+0xc5/0x290 block/bdev.c:1072
resume_store+0x1a0/0x710 kernel/power/hibernate.c:1235
kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
call_write_iter include/linux/fs.h:2108 [inline]
new_sync_write fs/read_write.c:497 [inline]
vfs_write+0xa84/0xcb0 fs/read_write.c:590
ksys_write+0x1a0/0x2c0 fs/read_write.c:643
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
-> #0 (&of->mutex){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
ovl_llseek+0x335/0x4a0 fs/overlayfs/file.c:219
vfs_llseek fs/read_write.c:289 [inline]
ksys_lseek fs/read_write.c:302 [inline]
__do_sys_lseek fs/read_write.c:313 [inline]
__se_sys_lseek fs/read_write.c:311 [inline]
__x64_sys_lseek+0x153/0x1e0 fs/read_write.c:311
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
other info that might help us debug this:
Chain exists of:
&of->mutex --> &ovl_i_mutex_dir_key[depth] --> &ovl_i_lock_key[depth]
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ovl_i_lock_key[depth]);
lock(&ovl_i_mutex_dir_key[depth]);
lock(&ovl_i_lock_key[depth]);
lock(&of->mutex);
*** DEADLOCK ***
3 locks held by syz-executor.0/5486:
#0: ffff8880707c8ac8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x259/0x320 fs/file.c:1191
#1: ffff888050981e80 (&ovl_i_mutex_key[depth]){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:793 [inline]
#1: ffff888050981e80 (&ovl_i_mutex_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0x277/0x4a0 fs/overlayfs/file.c:207
#2: ffff888050982238 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_inode_lock fs/overlayfs/overlayfs.h:649 [inline]
#2: ffff888050982238 (&ovl_i_lock_key[depth]){+.+.}-{3:3}, at: ovl_llseek+0x28a/0x4a0 fs/overlayfs/file.c:215
stack backtrace:
CPU: 0 PID: 5486 Comm: syz-executor.0 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__mutex_lock_common kernel/locking/mutex.c:608 [inline]
__mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
kernfs_fop_llseek+0x7e/0x2a0 fs/kernfs/file.c:867
ovl_llseek+0x335/0x4a0 fs/overlayfs/file.c:219
vfs_llseek fs/read_write.c:289 [inline]
ksys_lseek fs/read_write.c:302 [inline]
__do_sys_lseek fs/read_write.c:313 [inline]
__se_sys_lseek fs/read_write.c:311 [inline]
__x64_sys_lseek+0x153/0x1e0 fs/read_write.c:311
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fe1f567dde9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fe1f64370c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000008
RAX: ffffffffffffffda RBX: 00007fe1f57abf80 RCX: 00007fe1f567dde9
RDX: 0000000000000003 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007fe1f56ca47a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007fe1f57abf80 R15: 00007ffe353f4cc8
</TASK>
Tested on:
commit: fe46a7dd Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=104e8475180000
kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1799f6ad180000
On Wed, 03 Apr 2024 11:23:26 -0700
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git fe46a7dd189e
--- x/fs/overlayfs/util.c
+++ y/fs/overlayfs/util.c
@@ -1159,9 +1159,8 @@ int ovl_nlink_start(struct dentry *dentr
return err;
}
- err = ovl_inode_lock_interruptible(inode);
- if (err)
- return err;
+ if (!mutex_trylock(&OVL_I(inode)->lock))
+ return -EBUSY;
err = ovl_want_write(dentry);
if (err)
--
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-and-tested-by: [email protected]
Tested on:
commit: fe46a7dd Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=15410de3180000
kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=153a4a3d180000
Note: testing is done by a robot and is best-effort only.
On Thu, Apr 4, 2024 at 2:51 AM syzbot
<[email protected]> wrote:
>
> syzbot has bisected this issue to:
>
> commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be
> Author: Valentine Sinitsyn <[email protected]>
> Date: Mon Sep 25 08:40:12 2023 +0000
>
> kernfs: sysfs: support custom llseek method for sysfs entries
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000
> start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> final oops: https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000
> console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000
> kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
> dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
>
> Reported-by: [email protected]
> Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
I think this commit is only the trigger for lockdep warning in this
specific scenario, but the conceptual issue existed before that
for example, with read from sysfs, which also can take of->mutex.
I think (not sure) that the potential deadlock is real, not a false
positive. OTOH, hibernation code may be crawling with potential
and more likely deadlocks...
The conceptual issue (I think) is this:
Overlayfs is a stacked filesystem which regularly calls vfs helpers
such as path lookup on other filesystems.
This specialized behavior is accompanied with a declaration of
s_stack_depth > 0, annotating ovl inode locks per stack depth
(ovl_lockdep_annotate_inode_mutex_key) and restricting the
types of filesystems that are allowed for writable upper layer.
In the lockdep dependency chain, overlayfs inode lock is taken
before kernfs internal of->mutex, where kernfs (sysfs) is the lower
layer of overlayfs, which is sane.
With /sys/power/resume (and probably other files), sysfs also
behaves as a stacking filesystem, calling vfs helpers, such as
lookup_bdev() -> kern_path(), which is a behavior of a stacked
filesystem, without all the precautions that comes with behaving
as a stacked filesystem.
If an overlayfs path is written into /sys/power/resume and that
overlayfs has sysfs as its lower layer, then there may be a small
chance of hitting the ABBA deadlock, but there could very well
be some conditions that prevent it.
It's a shame that converting blockdev path to major:minor is
not always done as a separate step by usersapce, but not much
to do about it now...
I don't think that Christoph's refactoring to blockdev interfaces
have changed anything in this regard, but I wasn't sure so CCed
the developers involved.
A relatively easy way to avert this use case is to define a lookup
flag LOOKUP_NO_STACKED, which does not traverse into any
stacked fs (s_stack_depth > 0) and use this flag for looking up
the hibernation blockdev.
I suppose writing the hibernation blockdev from inside containers
is not a common thing to do, so this mitigation could be good enough.
Thoughts?
Amir.
On Thu, Apr 4, 2024 at 11:21 AM Al Viro <[email protected]> wrote:
>
> On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote:
> > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
> > >
> > > In the lockdep dependency chain, overlayfs inode lock is taken
> > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> > > layer of overlayfs, which is sane.
> > >
> > > With /sys/power/resume (and probably other files), sysfs also
> > > behaves as a stacking filesystem, calling vfs helpers, such as
> > > lookup_bdev() -> kern_path(), which is a behavior of a stacked
> > > filesystem, without all the precautions that comes with behaving
> > > as a stacked filesystem.
> >
> > No. This is far worse than anything stacked filesystems do - it's
> > an arbitrary pathname resolution while holding a lock.
> > It's not local. Just about anything (including automounts, etc.)
> > can be happening there and it pushes the lock in question outside
> > of *ALL* pathwalk-related locks. Pathname doesn't have to
> > resolve to anything on overlayfs - it can just go through
> > a symlink on it, or walk into it and traverse a bunch of ..
> > afterwards, etc.
> >
> > Don't confuse that with stacking - it's not even close.
> > You can't use that anywhere near overlayfs layers.
> >
> > Maybe isolate it into a separate filesystem, to be automounted
> > on /sys/power. And make anyone playing with overlayfs with
> > sysfs as a layer mount the damn thing on top of power/ in your
> > overlayfs. But using that thing as a part of layer is
> > a non-starter.
I don't follow what you are saying.
Which code is in non-starter violation?
kernfs for calling lookup_bdev() with internal of->mutex held?
Overlayfs for allowing sysfs as a lower layer and calling
vfs_llseek(lower_sysfs_file,...) during copy up while ovl inode is held
for legit reasons (e.g. from ovl_rename())?
>
> Incidentally, why do you need to lock overlayfs inode to call
> vfs_llseek() on the underlying file? It might (or might not)
> need to lock the underlying file (for things like ->i_size,
> etc.), but that will be done by ->llseek() instance and it
> would deal with the inode in the layer, not overlayfs one.
We do not (anymore) lock ovl inode in ovl_llseek(), see:
b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek()
but ovl inode is held in operations (e.g. ovl_rename)
which trigger copy up and call vfs_llseek() on the lower file.
>
> Similar question applies to ovl_write_iter() - why do you
> need to hold the overlayfs inode locked during the call of
> backing_file_write_iter()?
>
Not sure. This question I need to defer to Miklos.
I see in several places the pattern:
inode_lock(inode);
/* Update mode */
ovl_copyattr(inode);
ret = file_remove_privs(file);
..
/* Update size */
ovl_file_modified(file);
..
inode_unlock(inode);
so it could be related to atomic remove privs and update mtime,
but possibly we could convert all of those inode_lock() to
ovl_inode_lock() (i.e. internal lock below vfs inode lock).
[...]
> Consider the scenario when unlink() is called on that sucker
> during the write() that triggers that pathwalk. We have
>
> unlink: blocked on overlayfs inode of file, while holding the parent directory.
> write: holding the overlayfs inode of file and trying to resolve a pathname
> that contains .../power/suspend_stats/../../...; blocked on attempt to lock
> parent so we could do a lookup in it.
This specifically cannot happen because sysfs is not allowed as an
upper layer only as a lower layer, so overlayfs itself will not be writing to
/sys/power/resume.
>
> No llseek involved anywhere, kernfs of->mutex held, but not contended,
> deadlock purely on ->i_rwsem of overlayfs inodes.
>
> Holding overlayfs inode locked during the call of lookup_bdev() is really
> no-go.
Yes. I see that, but how can this be resolved?
If the specific file /sys/power/resume will not have FMODE_LSEEK,
then ovl_copy_up_file() will not try to skip_hole on it at all and the
llseek lock dependency will be averted.
The question is whether opt-out of FMODE_LSEEK for /sys/power/resume
will break anything or if we should mark seq files in another way so that
overlayfs would not try to seek_hole on any of them categorically.
Thanks,
Amir.
On Thu, Apr 04, 2024 at 09:21:10AM +0100, Al Viro wrote:
> Similar question applies to ovl_write_iter() - why do you
> need to hold the overlayfs inode locked during the call of
> backing_file_write_iter()?
Consider the scenario when unlink() is called on that sucker
during the write() that triggers that pathwalk. We have
unlink: blocked on overlayfs inode of file, while holding the parent directory.
write: holding the overlayfs inode of file and trying to resolve a pathname
that contains .../power/suspend_stats/../../...; blocked on attempt to lock
parent so we could do a lookup in it.
No llseek involved anywhere, kernfs of->mutex held, but not contended,
deadlock purely on ->i_rwsem of overlayfs inodes.
Holding overlayfs inode locked during the call of lookup_bdev() is really
no-go.
On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
>
> In the lockdep dependency chain, overlayfs inode lock is taken
> before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> layer of overlayfs, which is sane.
>
> With /sys/power/resume (and probably other files), sysfs also
> behaves as a stacking filesystem, calling vfs helpers, such as
> lookup_bdev() -> kern_path(), which is a behavior of a stacked
> filesystem, without all the precautions that comes with behaving
> as a stacked filesystem.
No. This is far worse than anything stacked filesystems do - it's
an arbitrary pathname resolution while holding a lock.
It's not local. Just about anything (including automounts, etc.)
can be happening there and it pushes the lock in question outside
of *ALL* pathwalk-related locks. Pathname doesn't have to
resolve to anything on overlayfs - it can just go through
a symlink on it, or walk into it and traverse a bunch of ..
afterwards, etc.
Don't confuse that with stacking - it's not even close.
You can't use that anywhere near overlayfs layers.
Maybe isolate it into a separate filesystem, to be automounted
on /sys/power. And make anyone playing with overlayfs with
sysfs as a layer mount the damn thing on top of power/ in your
overlayfs. But using that thing as a part of layer is
a non-starter.
On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote:
> On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
> >
> > In the lockdep dependency chain, overlayfs inode lock is taken
> > before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> > layer of overlayfs, which is sane.
> >
> > With /sys/power/resume (and probably other files), sysfs also
> > behaves as a stacking filesystem, calling vfs helpers, such as
> > lookup_bdev() -> kern_path(), which is a behavior of a stacked
> > filesystem, without all the precautions that comes with behaving
> > as a stacked filesystem.
>
> No. This is far worse than anything stacked filesystems do - it's
> an arbitrary pathname resolution while holding a lock.
> It's not local. Just about anything (including automounts, etc.)
> can be happening there and it pushes the lock in question outside
> of *ALL* pathwalk-related locks. Pathname doesn't have to
> resolve to anything on overlayfs - it can just go through
> a symlink on it, or walk into it and traverse a bunch of ..
> afterwards, etc.
>
> Don't confuse that with stacking - it's not even close.
> You can't use that anywhere near overlayfs layers.
>
> Maybe isolate it into a separate filesystem, to be automounted
> on /sys/power. And make anyone playing with overlayfs with
> sysfs as a layer mount the damn thing on top of power/ in your
> overlayfs. But using that thing as a part of layer is
> a non-starter.
Incidentally, why do you need to lock overlayfs inode to call
vfs_llseek() on the underlying file? It might (or might not)
need to lock the underlying file (for things like ->i_size,
etc.), but that will be done by ->llseek() instance and it
would deal with the inode in the layer, not overlayfs one.
Similar question applies to ovl_write_iter() - why do you
need to hold the overlayfs inode locked during the call of
backing_file_write_iter()?
On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> This specifically cannot happen because sysfs is not allowed as an
> upper layer only as a lower layer, so overlayfs itself will not be writing to
> /sys/power/resume.
Then how could you possibly get a deadlock there? What would your minimal
deadlocked set look like?
1. Something is blocked in lookup_bdev() called from resume_store(), called
from sysfs_kf_write(), called from kernfs_write_iter(), which has acquired
->mutex of struct kernfs_open_file that had been allocated by
kernfs_fop_open() back when the file had been opened. Note that each
struct file instance gets a separate struct kernfs_open_file. Since we are
calling ->write_iter(), the file *MUST* have been opened for write.
2. Something is blocked in kernfs_fop_llseek() on the same of->mutex,
i.e. using the same struct file as (1). That something is holding an
overlayfs inode lock, which is what the next thread is blocked on.
+ at least one more thread, to complete the cycle.
Right? How could that possibly happen without overlayfs opening /sys/power/resume
for write? Again, each struct file instance gets a separate of->mutex;
for a deadlock you need a cycle of threads and a cycle of locks, such
that each thread is holding the corresponding lock and is blocked on
attempt to get the lock that comes next in the cyclic order.
If overlayfs never writes to that sucker, it can't participate in that
cycle. Sure, you can get overlayfs llseek grabbing of->mutex of *ANOTHER*
struct file opened for the same sysfs file. Since it's not the same
struct file and since each struct file there gets a separate kernfs_open_file
instance, the mutex won't be the same.
Unless I'm missing something else, that can't deadlock. For a quick and
dirty experiment, try to give of->mutex on r/o opens a class separate from
that on r/w and w/o opens (mutex_init() in kernfs_fop_open()) and see
if lockdep warnings persist.
Something like
if (has_mmap)
mutex_init(&of->mutex);
else if (file->f_mode & FMODE_WRITE)
mutex_init(&of->mutex);
else
mutex_init(&of->mutex);
circa fs/kernfs/file.c:642.
On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> I don't follow what you are saying.
> Which code is in non-starter violation?
> kernfs for calling lookup_bdev() with internal of->mutex held?
That is a huge problem, and has been causing endless annoying lockdep
chains in the block layer for us. If we have some way to kill this
the whole block layer would benefit.
On Fri, Apr 5, 2024 at 1:01 AM Al Viro <[email protected]> wrote:
>
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
>
> > This specifically cannot happen because sysfs is not allowed as an
> > upper layer only as a lower layer, so overlayfs itself will not be writing to
> > /sys/power/resume.
>
> Then how could you possibly get a deadlock there? What would your minimal
> deadlocked set look like?
>
> 1. Something is blocked in lookup_bdev() called from resume_store(), called
> from sysfs_kf_write(), called from kernfs_write_iter(), which has acquired
> ->mutex of struct kernfs_open_file that had been allocated by
> kernfs_fop_open() back when the file had been opened. Note that each
> struct file instance gets a separate struct kernfs_open_file. Since we are
> calling ->write_iter(), the file *MUST* have been opened for write.
>
> 2. Something is blocked in kernfs_fop_llseek() on the same of->mutex,
> i.e. using the same struct file as (1). That something is holding an
> overlayfs inode lock, which is what the next thread is blocked on.
>
> + at least one more thread, to complete the cycle.
>
> Right? How could that possibly happen without overlayfs opening /sys/power/resume
> for write? Again, each struct file instance gets a separate of->mutex;
> for a deadlock you need a cycle of threads and a cycle of locks, such
> that each thread is holding the corresponding lock and is blocked on
> attempt to get the lock that comes next in the cyclic order.
Absolutely right.
I had it in my mind that this was a node lock. Did not look closely.
>
> If overlayfs never writes to that sucker, it can't participate in that
> cycle. Sure, you can get overlayfs llseek grabbing of->mutex of *ANOTHER*
> struct file opened for the same sysfs file. Since it's not the same
> struct file and since each struct file there gets a separate kernfs_open_file
> instance, the mutex won't be the same.
>
> Unless I'm missing something else, that can't deadlock. For a quick and
> dirty experiment, try to give of->mutex on r/o opens a class separate from
> that on r/w and w/o opens (mutex_init() in kernfs_fop_open()) and see
> if lockdep warnings persist.
>
> Something like
>
> if (has_mmap)
> mutex_init(&of->mutex);
> else if (file->f_mode & FMODE_WRITE)
> mutex_init(&of->mutex);
> else
> mutex_init(&of->mutex);
Why a quick experiment?
Why not a permanent kludge?
It is not any better or worse than the already existing has_mmap
subclass annotation. huh?
Thanks,
Amir.
On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> On Thu, Apr 4, 2024 at 11:21 AM Al Viro <[email protected]> wrote:
> >
> > On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote:
> > > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
> > > >
> > > > In the lockdep dependency chain, overlayfs inode lock is taken
> > > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> > > > layer of overlayfs, which is sane.
> > > >
> > > > With /sys/power/resume (and probably other files), sysfs also
> > > > behaves as a stacking filesystem, calling vfs helpers, such as
> > > > lookup_bdev() -> kern_path(), which is a behavior of a stacked
> > > > filesystem, without all the precautions that comes with behaving
> > > > as a stacked filesystem.
> > >
> > > No. This is far worse than anything stacked filesystems do - it's
> > > an arbitrary pathname resolution while holding a lock.
> > > It's not local. Just about anything (including automounts, etc.)
> > > can be happening there and it pushes the lock in question outside
> > > of *ALL* pathwalk-related locks. Pathname doesn't have to
> > > resolve to anything on overlayfs - it can just go through
> > > a symlink on it, or walk into it and traverse a bunch of ..
> > > afterwards, etc.
> > >
> > > Don't confuse that with stacking - it's not even close.
> > > You can't use that anywhere near overlayfs layers.
> > >
> > > Maybe isolate it into a separate filesystem, to be automounted
> > > on /sys/power. And make anyone playing with overlayfs with
> > > sysfs as a layer mount the damn thing on top of power/ in your
> > > overlayfs. But using that thing as a part of layer is
> > > a non-starter.
>
> I don't follow what you are saying.
> Which code is in non-starter violation?
> kernfs for calling lookup_bdev() with internal of->mutex held?
> Overlayfs for allowing sysfs as a lower layer and calling
> vfs_llseek(lower_sysfs_file,...) during copy up while ovl inode is held
> for legit reasons (e.g. from ovl_rename())?
>
> >
> > Incidentally, why do you need to lock overlayfs inode to call
> > vfs_llseek() on the underlying file? It might (or might not)
> > need to lock the underlying file (for things like ->i_size,
> > etc.), but that will be done by ->llseek() instance and it
> > would deal with the inode in the layer, not overlayfs one.
>
> We do not (anymore) lock ovl inode in ovl_llseek(), see:
> b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek()
> but ovl inode is held in operations (e.g. ovl_rename)
> which trigger copy up and call vfs_llseek() on the lower file.
>
> >
> > Similar question applies to ovl_write_iter() - why do you
> > need to hold the overlayfs inode locked during the call of
> > backing_file_write_iter()?
> >
>
> Not sure. This question I need to defer to Miklos.
> I see in several places the pattern:
> inode_lock(inode);
> /* Update mode */
> ovl_copyattr(inode);
> ret = file_remove_privs(file);
> ...
> /* Update size */
> ovl_file_modified(file);
> ...
> inode_unlock(inode);
>
> so it could be related to atomic remove privs and update mtime,
> but possibly we could convert all of those inode_lock() to
> ovl_inode_lock() (i.e. internal lock below vfs inode lock).
>
> [...]
> > Consider the scenario when unlink() is called on that sucker
> > during the write() that triggers that pathwalk. We have
> >
> > unlink: blocked on overlayfs inode of file, while holding the parent directory.
> > write: holding the overlayfs inode of file and trying to resolve a pathname
> > that contains .../power/suspend_stats/../../...; blocked on attempt to lock
> > parent so we could do a lookup in it.
>
> This specifically cannot happen because sysfs is not allowed as an
> upper layer only as a lower layer, so overlayfs itself will not be writing to
> /sys/power/resume.
I don't understand that part. If overlayfs uses /sys/power/ as a lower
layer it can open and write to /sys/power/resume, no?
Honestly, why don't you just block /sys/power from appearing in any
layer in overlayfs? This seems like such a niche use-case that it's so
unlikely that this will be used that I would just try and kill it.
If you do it like Al suggested and switch it to an automount you get
that for free. But I guess you can also just block it without that.
(Frankly, I find it weird that sysfs is allowed as a layer in any case. I
completely forgot about this. Imho, both procfs and sysfs should not be
usable as a lower layer - procfs is, I know - and then only select parts
should be like /sys/fs/cgroup or sm where I can see the container people
somehow using this to mess with the cgroup tree or something.)
>
> >
> > No llseek involved anywhere, kernfs of->mutex held, but not contended,
> > deadlock purely on ->i_rwsem of overlayfs inodes.
> >
> > Holding overlayfs inode locked during the call of lookup_bdev() is really
> > no-go.
>
> Yes. I see that, but how can this be resolved?
>
> If the specific file /sys/power/resume will not have FMODE_LSEEK,
> then ovl_copy_up_file() will not try to skip_hole on it at all and the
> llseek lock dependency will be averted.
>
> The question is whether opt-out of FMODE_LSEEK for /sys/power/resume
> will break anything or if we should mark seq files in another way so that
> overlayfs would not try to seek_hole on any of them categorically.
>
> Thanks,
> Amir.
On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote:
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> > I don't follow what you are saying.
> > Which code is in non-starter violation?
> > kernfs for calling lookup_bdev() with internal of->mutex held?
>
> That is a huge problem, and has been causing endless annoying lockdep
> chains in the block layer for us. If we have some way to kill this
> the whole block layer would benefit.
Why not just try and add a better resume api that forces resume to not
use a path argument neither for resume_file nor for writes to
/sys/power/resume. IOW, extend the protocol what can get written to
/sys/power/resume and then phase out the path argument. It'll take a
while but it's a possibly clean solution.
On Fri, Apr 05, 2024 at 01:19:35PM +0200, Christian Brauner wrote:
> On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote:
> > On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> > > I don't follow what you are saying.
> > > Which code is in non-starter violation?
> > > kernfs for calling lookup_bdev() with internal of->mutex held?
> >
> > That is a huge problem, and has been causing endless annoying lockdep
> > chains in the block layer for us. If we have some way to kill this
> > the whole block layer would benefit.
>
> Why not just try and add a better resume api that forces resume to not
> use a path argument neither for resume_file nor for writes to
> /sys/power/resume. IOW, extend the protocol what can get written to
> /sys/power/resume and then phase out the path argument. It'll take a
> while but it's a possibly clean solution.
In fact, just looking at this code with a naive set of eyes for a second:
* There's early_lookup_bdev() which deals with PARTUUID,
PARTLABEL, raw device number, and lookup based on /dev. No actual path
lookup involved in that.
* So the only interesting case is lookup_bdev() for /sys/power/suspend.
That one takes arbitrary paths. But being realistic for a moment...
How many people will specify a device path that's _not_ some variant
of /dev/...? IOW, how many people will specify a device path that's
not on devtmpfs or a symlink on devtmpfs? Probably almost no one.
Containers come to mind ofc. But they can't mount devtmpfs so usually
what they do is that they create a tmpfs mount at /dev and then
bind-mount device nodes they need into there. But unprivileged
containers cannot use suspend because that requires init_user_ns
capabilities. And privileged containers that are allowed to hibernate
and use custom paths seem extremly unlikely as well.
So really, _naively_ it seems to me that one could factor out the /dev/*
part of the device number parsing logic in early_lookup_bdev() and port
resume_store() to use that first and only if that fails fall back to
full lookup_bdev(). (Possibly combined with some sort of logging that the
user should use /dev/... paths or at least a way to recognize that this
arbitrary path stuff is actually used.)
And citing from a chat with the hibernation maintainer in systemd:
<brauner> So /sys/power/resume does systemd ever write anything other than a /dev/* path in to there?
<maintainer> Hmm? You never do that? It only accepts devno.
So that takes away one of the main users of this api. So I really
suspect that arbitrary device path is unused in practice. Maybe I'm all
wrong though.
On Fri, Apr 5, 2024 at 1:47 PM Christian Brauner <[email protected]> wrote:
>
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> > On Thu, Apr 4, 2024 at 11:21 AM Al Viro <[email protected]> wrote:
> > >
> > > On Thu, Apr 04, 2024 at 09:11:22AM +0100, Al Viro wrote:
> > > > On Thu, Apr 04, 2024 at 09:54:35AM +0300, Amir Goldstein wrote:
> > > > >
> > > > > In the lockdep dependency chain, overlayfs inode lock is taken
> > > > > before kernfs internal of->mutex, where kernfs (sysfs) is the lower
> > > > > layer of overlayfs, which is sane.
> > > > >
> > > > > With /sys/power/resume (and probably other files), sysfs also
> > > > > behaves as a stacking filesystem, calling vfs helpers, such as
> > > > > lookup_bdev() -> kern_path(), which is a behavior of a stacked
> > > > > filesystem, without all the precautions that comes with behaving
> > > > > as a stacked filesystem.
> > > >
> > > > No. This is far worse than anything stacked filesystems do - it's
> > > > an arbitrary pathname resolution while holding a lock.
> > > > It's not local. Just about anything (including automounts, etc.)
> > > > can be happening there and it pushes the lock in question outside
> > > > of *ALL* pathwalk-related locks. Pathname doesn't have to
> > > > resolve to anything on overlayfs - it can just go through
> > > > a symlink on it, or walk into it and traverse a bunch of ..
> > > > afterwards, etc.
> > > >
> > > > Don't confuse that with stacking - it's not even close.
> > > > You can't use that anywhere near overlayfs layers.
> > > >
> > > > Maybe isolate it into a separate filesystem, to be automounted
> > > > on /sys/power. And make anyone playing with overlayfs with
> > > > sysfs as a layer mount the damn thing on top of power/ in your
> > > > overlayfs. But using that thing as a part of layer is
> > > > a non-starter.
> >
> > I don't follow what you are saying.
> > Which code is in non-starter violation?
> > kernfs for calling lookup_bdev() with internal of->mutex held?
> > Overlayfs for allowing sysfs as a lower layer and calling
> > vfs_llseek(lower_sysfs_file,...) during copy up while ovl inode is held
> > for legit reasons (e.g. from ovl_rename())?
> >
> > >
> > > Incidentally, why do you need to lock overlayfs inode to call
> > > vfs_llseek() on the underlying file? It might (or might not)
> > > need to lock the underlying file (for things like ->i_size,
> > > etc.), but that will be done by ->llseek() instance and it
> > > would deal with the inode in the layer, not overlayfs one.
> >
> > We do not (anymore) lock ovl inode in ovl_llseek(), see:
> > b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek()
> > but ovl inode is held in operations (e.g. ovl_rename)
> > which trigger copy up and call vfs_llseek() on the lower file.
> >
> > >
> > > Similar question applies to ovl_write_iter() - why do you
> > > need to hold the overlayfs inode locked during the call of
> > > backing_file_write_iter()?
> > >
> >
> > Not sure. This question I need to defer to Miklos.
> > I see in several places the pattern:
> > inode_lock(inode);
> > /* Update mode */
> > ovl_copyattr(inode);
> > ret = file_remove_privs(file);
> > ...
> > /* Update size */
> > ovl_file_modified(file);
> > ...
> > inode_unlock(inode);
> >
> > so it could be related to atomic remove privs and update mtime,
> > but possibly we could convert all of those inode_lock() to
> > ovl_inode_lock() (i.e. internal lock below vfs inode lock).
> >
> > [...]
> > > Consider the scenario when unlink() is called on that sucker
> > > during the write() that triggers that pathwalk. We have
> > >
> > > unlink: blocked on overlayfs inode of file, while holding the parent directory.
> > > write: holding the overlayfs inode of file and trying to resolve a pathname
> > > that contains .../power/suspend_stats/../../...; blocked on attempt to lock
> > > parent so we could do a lookup in it.
> >
> > This specifically cannot happen because sysfs is not allowed as an
> > upper layer only as a lower layer, so overlayfs itself will not be writing to
> > /sys/power/resume.
>
> I don't understand that part. If overlayfs uses /sys/power/ as a lower
> layer it can open and write to /sys/power/resume, no?
>
> Honestly, why don't you just block /sys/power from appearing in any
> layer in overlayfs? This seems like such a niche use-case that it's so
> unlikely that this will be used that I would just try and kill it.
I do not want to special case /sys/power in overlayfs.
>
> If you do it like Al suggested and switch it to an automount you get
Not important enough IMO to make this change.
> that for free. But I guess you can also just block it without that.
>
> (Frankly, I find it weird that sysfs is allowed as a layer in any case. I
> completely forgot about this. Imho, both procfs and sysfs should not be
> usable as a lower layer - procfs is, I know - and then only select parts
> should be like /sys/fs/cgroup or sm where I can see the container people
> somehow using this to mess with the cgroup tree or something.)
>
I do not know if using sysfs as a lower layer is an important use case,
but I have a feeling that people already may do it, so I cannot regress it
without a good reason.
Al's suggestion to annotate writable kernfs files as a different class from
readonly kernfs files seems fine by me to silence lockdep false positive.
I will try to feed this solution to syzbot.
Thanks,
Amir.
On Fri, Apr 05, 2024 at 03:48:20PM +0200, Christian Brauner wrote:
> * There's early_lookup_bdev() which deals with PARTUUID,
> PARTLABEL, raw device number, and lookup based on /dev. No actual path
> lookup involved in that.
>
> * So the only interesting case is lookup_bdev() for /sys/power/suspend.
> That one takes arbitrary paths. But being realistic for a moment...
> How many people will specify a device path that's _not_ some variant
> of /dev/...? IOW, how many people will specify a device path that's
> not on devtmpfs or a symlink on devtmpfs? Probably almost no one.
That's not the point. The poins is that trying to do the dumb name
to bdev translation in early_lookup_bdev is wrong. Distro had and have
their own numbering schemes, and not using them bypasses access
control. We should never use that at runtime.
> <brauner> So /sys/power/resume does systemd ever write anything other than a /dev/* path in to there?
> <maintainer> Hmm? You never do that? It only accepts devno.
>
> So that takes away one of the main users of this api. So I really
> suspect that arbitrary device path is unused in practice. Maybe I'm all
> wrong though.
I'm all fine with just accepting a devno and no name. But I fear it
will break something as someone added it for whatever use case they had
(and we should not have allowed that back then, but that ship has sailed
unfortunately)
On Thu, Apr 4, 2024 at 9:54 AM Amir Goldstein <[email protected]> wrote:
>
> On Thu, Apr 4, 2024 at 2:51 AM syzbot
> <[email protected]> wrote:
> >
> > syzbot has bisected this issue to:
> >
> > commit 0fedefd4c4e33dd24f726b13b5d7c143e2b483be
> > Author: Valentine Sinitsyn <[email protected]>
> > Date: Mon Sep 25 08:40:12 2023 +0000
> >
> > kernfs: sysfs: support custom llseek method for sysfs entries
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=17cb5e03180000
> > start commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> > git tree: upstream
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=142b5e03180000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=102b5e03180000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17f1d93d180000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
> >
> > Reported-by: [email protected]
> > Fixes: 0fedefd4c4e3 ("kernfs: sysfs: support custom llseek method for sysfs entries")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
>
Let's test Al's solution.
#syz test: https://github.com/amir73il/linux/ vfs-fixes
Thanks,
Amir.
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
viperboard
[ 8.144672][ T1] usbcore: registered new interface driver dln2
[ 8.146814][ T1] usbcore: registered new interface driver pn533_usb
[ 8.154721][ T1] nfcsim 0.2 initialized
[ 8.156856][ T1] usbcore: registered new interface driver port100
[ 8.158745][ T1] usbcore: registered new interface driver nfcmrvl
[ 8.168709][ T1] Loading iSCSI transport class v2.0-870.
[ 8.188275][ T1] virtio_scsi virtio0: 1/0/0 default/read/poll queues
[ 8.202624][ T1] ------------[ cut here ]------------
[ 8.204252][ T1] refcount_t: decrement hit 0; leaking memory.
[ 8.206219][ T1] WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0
[ 8.208638][ T1] Modules linked in:
[ 8.209556][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc1-syzkaller-00001-g70d370568b75 #0
[ 8.214301][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
[ 8.217361][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0
[ 8.218680][ T1] Code: b2 00 00 00 e8 77 99 f2 fc 5b 5d c3 cc cc cc cc e8 6b 99 f2 fc c6 05 11 1e f0 0a 01 90 48 c7 c7 e0 5e 1e 8c e8 b7 35 b5 fc 90 <0f> 0b 90 90 eb d9 e8 4b 99 f2 fc c6 05 ee 1d f0 0a 01 90 48 c7 c7
[ 8.223425][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246
[ 8.225385][ T1] RAX: 1f757896feb95b00 RBX: ffff8881482a2d7c RCX: ffff888016ac8000
[ 8.227629][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 8.229990][ T1] RBP: 0000000000000004 R08: ffffffff8157ffe2 R09: fffffbfff1c396e0
[ 8.232520][ T1] R10: dffffc0000000000 R11: fffffbfff1c396e0 R12: ffffea000502cdc0
[ 8.234601][ T1] R13: ffffea000502cdc8 R14: 1ffffd4000a059b9 R15: 0000000000000000
[ 8.236876][ T1] FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
[ 8.239600][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.240914][ T1] CR2: ffff88823ffff000 CR3: 000000000e132000 CR4: 00000000003506f0
[ 8.243303][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.245750][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8.247945][ T1] Call Trace:
[ 8.248964][ T1] <TASK>
[ 8.250217][ T1] ? __warn+0x163/0x4e0
[ 8.251589][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.253743][ T1] ? report_bug+0x2b3/0x500
[ 8.254770][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.256336][ T1] ? handle_bug+0x3e/0x70
[ 8.257030][ T1] ? exc_invalid_op+0x1a/0x50
[ 8.258667][ T1] ? asm_exc_invalid_op+0x1a/0x20
[ 8.259957][ T1] ? __warn_printk+0x292/0x360
[ 8.261215][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.263208][ T1] ? refcount_warn_saturate+0xf9/0x1d0
[ 8.264293][ T1] __free_pages_ok+0xc54/0xd80
[ 8.265404][ T1] make_alloc_exact+0xa3/0xf0
[ 8.266560][ T1] vring_alloc_queue_split+0x20a/0x600
[ 8.267988][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10
[ 8.269965][ T1] ? vp_find_vqs+0x4c/0x4e0
[ 8.271058][ T1] ? virtscsi_probe+0x3ea/0xf60
[ 8.272057][ T1] ? virtio_dev_probe+0x991/0xaf0
[ 8.273466][ T1] ? really_probe+0x2b8/0xad0
[ 8.275102][ T1] ? driver_probe_device+0x50/0x430
[ 8.276174][ T1] vring_create_virtqueue_split+0xc6/0x310
[ 8.277826][ T1] ? ret_from_fork+0x4b/0x80
[ 8.279140][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10
[ 8.281586][ T1] vring_create_virtqueue+0xca/0x110
[ 8.283311][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.285146][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.286554][ T1] setup_vq+0xe9/0x2d0
[ 8.287916][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.289684][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.291329][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.292741][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.294786][ T1] vp_setup_vq+0xbf/0x330
[ 8.296422][ T1] ? __pfx_vp_config_changed+0x10/0x10
[ 8.297504][ T1] ? ioread16+0x2f/0x90
[ 8.298989][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.300925][ T1] vp_find_vqs_msix+0x8b2/0xc80
[ 8.302291][ T1] vp_find_vqs+0x4c/0x4e0
[ 8.303546][ T1] virtscsi_init+0x8db/0xd00
[ 8.304820][ T1] ? __pfx_virtscsi_init+0x10/0x10
[ 8.305808][ T1] ? __pfx_default_calc_sets+0x10/0x10
[ 8.307015][ T1] ? scsi_host_alloc+0xa57/0xea0
[ 8.308290][ T1] ? vp_get+0xfd/0x140
[ 8.309405][ T1] virtscsi_probe+0x3ea/0xf60
[ 8.310389][ T1] ? __pfx_virtscsi_probe+0x10/0x10
[ 8.311500][ T1] ? kernfs_add_one+0x156/0x8b0
[ 8.312348][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10
[ 8.313977][ T1] ? virtio_features_ok+0x10c/0x270
[ 8.314859][ T1] virtio_dev_probe+0x991/0xaf0
[ 8.315766][ T1] ? __pfx_virtio_dev_probe+0x10/0x10
[ 8.316886][ T1] really_probe+0x2b8/0xad0
[ 8.317711][ T1] __driver_probe_device+0x1a2/0x390
[ 8.318525][ T1] driver_probe_device+0x50/0x430
[ 8.319567][ T1] __driver_attach+0x45f/0x710
[ 8.320525][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.321841][ T1] bus_for_each_dev+0x239/0x2b0
[ 8.322872][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.324235][ T1] ? __pfx_bus_for_each_dev+0x10/0x10
[ 8.326127][ T1] ? do_raw_spin_unlock+0x13c/0x8b0
[ 8.327579][ T1] bus_add_driver+0x347/0x620
[ 8.328989][ T1] driver_register+0x23a/0x320
[ 8.330318][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.332178][ T1] virtio_scsi_init+0x65/0xe0
[ 8.333752][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.334950][ T1] do_one_initcall+0x248/0x880
[ 8.336015][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.337202][ T1] ? __pfx_do_one_initcall+0x10/0x10
[ 8.338254][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 8.339378][ T1] ? __pfx_parse_args+0x10/0x10
[ 8.341464][ T1] ? do_initcalls+0x1c/0x80
[ 8.342223][ T1] ? rcu_is_watching+0x15/0xb0
[ 8.343161][ T1] do_initcall_level+0x157/0x210
[ 8.344252][ T1] do_initcalls+0x3f/0x80
[ 8.345400][ T1] kernel_init_freeable+0x435/0x5d0
[ 8.346571][ T1] ? __pfx_kernel_init_freeable+0x10/0x10
[ 8.349024][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 8.350664][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.352143][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.353430][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.354609][ T1] kernel_init+0x1d/0x2b0
[ 8.355433][ T1] ret_from_fork+0x4b/0x80
[ 8.356203][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.357092][ T1] ret_from_fork_asm+0x1a/0x30
[ 8.358282][ T1] </TASK>
[ 8.359166][ T1] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 8.360941][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc1-syzkaller-00001-g70d370568b75 #0
[ 8.363085][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
[ 8.363411][ T1] Call Trace:
[ 8.363411][ T1] <TASK>
[ 8.363411][ T1] dump_stack_lvl+0x241/0x360
[ 8.363411][ T1] ? __pfx_dump_stack_lvl+0x10/0x10
[ 8.363411][ T1] ? __pfx__printk+0x10/0x10
[ 8.363411][ T1] ? _printk+0xd5/0x120
[ 8.363411][ T1] ? vscnprintf+0x5d/0x90
[ 8.363411][ T1] panic+0x349/0x860
[ 8.363411][ T1] ? __warn+0x172/0x4e0
[ 8.363411][ T1] ? __pfx_panic+0x10/0x10
[ 8.363411][ T1] ? show_trace_log_lvl+0x4e6/0x520
[ 8.363411][ T1] ? ret_from_fork_asm+0x1a/0x30
[ 8.363411][ T1] __warn+0x346/0x4e0
[ 8.363411][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.363411][ T1] report_bug+0x2b3/0x500
[ 8.363411][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.363411][ T1] handle_bug+0x3e/0x70
[ 8.363411][ T1] exc_invalid_op+0x1a/0x50
[ 8.363411][ T1] asm_exc_invalid_op+0x1a/0x20
[ 8.363411][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0
[ 8.363411][ T1] Code: b2 00 00 00 e8 77 99 f2 fc 5b 5d c3 cc cc cc cc e8 6b 99 f2 fc c6 05 11 1e f0 0a 01 90 48 c7 c7 e0 5e 1e 8c e8 b7 35 b5 fc 90 <0f> 0b 90 90 eb d9 e8 4b 99 f2 fc c6 05 ee 1d f0 0a 01 90 48 c7 c7
[ 8.363411][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246
[ 8.363411][ T1] RAX: 1f757896feb95b00 RBX: ffff8881482a2d7c RCX: ffff888016ac8000
[ 8.363411][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 8.363411][ T1] RBP: 0000000000000004 R08: ffffffff8157ffe2 R09: fffffbfff1c396e0
[ 8.363411][ T1] R10: dffffc0000000000 R11: fffffbfff1c396e0 R12: ffffea000502cdc0
[ 8.363411][ T1] R13: ffffea000502cdc8 R14: 1ffffd4000a059b9 R15: 0000000000000000
[ 8.363411][ T1] ? __warn_printk+0x292/0x360
[ 8.363411][ T1] ? refcount_warn_saturate+0xf9/0x1d0
[ 8.363411][ T1] __free_pages_ok+0xc54/0xd80
[ 8.363411][ T1] make_alloc_exact+0xa3/0xf0
[ 8.363411][ T1] vring_alloc_queue_split+0x20a/0x600
[ 8.363411][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10
[ 8.363411][ T1] ? vp_find_vqs+0x4c/0x4e0
[ 8.363411][ T1] ? virtscsi_probe+0x3ea/0xf60
[ 8.363411][ T1] ? virtio_dev_probe+0x991/0xaf0
[ 8.363411][ T1] ? really_probe+0x2b8/0xad0
[ 8.363411][ T1] ? driver_probe_device+0x50/0x430
[ 8.363411][ T1] vring_create_virtqueue_split+0xc6/0x310
[ 8.363411][ T1] ? ret_from_fork+0x4b/0x80
[ 8.363411][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10
[ 8.363411][ T1] vring_create_virtqueue+0xca/0x110
[ 8.363411][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.363411][ T1] setup_vq+0xe9/0x2d0
[ 8.363411][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.363411][ T1] vp_setup_vq+0xbf/0x330
[ 8.363411][ T1] ? __pfx_vp_config_changed+0x10/0x10
[ 8.363411][ T1] ? ioread16+0x2f/0x90
[ 8.363411][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.363411][ T1] vp_find_vqs_msix+0x8b2/0xc80
[ 8.363411][ T1] vp_find_vqs+0x4c/0x4e0
[ 8.363411][ T1] virtscsi_init+0x8db/0xd00
[ 8.363411][ T1] ? __pfx_virtscsi_init+0x10/0x10
[ 8.363411][ T1] ? __pfx_default_calc_sets+0x10/0x10
[ 8.363411][ T1] ? scsi_host_alloc+0xa57/0xea0
[ 8.363411][ T1] ? vp_get+0xfd/0x140
[ 8.363411][ T1] virtscsi_probe+0x3ea/0xf60
[ 8.363411][ T1] ? __pfx_virtscsi_probe+0x10/0x10
[ 8.363411][ T1] ? kernfs_add_one+0x156/0x8b0
[ 8.363411][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10
[ 8.363411][ T1] ? virtio_features_ok+0x10c/0x270
[ 8.363411][ T1] virtio_dev_probe+0x991/0xaf0
[ 8.363411][ T1] ? __pfx_virtio_dev_probe+0x10/0x10
[ 8.363411][ T1] really_probe+0x2b8/0xad0
[ 8.363411][ T1] __driver_probe_device+0x1a2/0x390
[ 8.363411][ T1] driver_probe_device+0x50/0x430
[ 8.363411][ T1] __driver_attach+0x45f/0x710
[ 8.363411][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.363411][ T1] bus_for_each_dev+0x239/0x2b0
[ 8.363411][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.363411][ T1] ? __pfx_bus_for_each_dev+0x10/0x10
[ 8.363411][ T1] ? do_raw_spin_unlock+0x13c/0x8b0
[ 8.363411][ T1] bus_add_driver+0x347/0x620
[ 8.363411][ T1] driver_register+0x23a/0x320
[ 8.363411][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.363411][ T1] virtio_scsi_init+0x65/0xe0
[ 8.363411][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.363411][ T1] do_one_initcall+0x248/0x880
[ 8.363411][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.363411][ T1] ? __pfx_do_one_initcall+0x10/0x10
[ 8.363411][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 8.363411][ T1] ? __pfx_parse_args+0x10/0x10
[ 8.363411][ T1] ? do_initcalls+0x1c/0x80
[ 8.363411][ T1] ? rcu_is_watching+0x15/0xb0
[ 8.363411][ T1] do_initcall_level+0x157/0x210
[ 8.363411][ T1] do_initcalls+0x3f/0x80
[ 8.363411][ T1] kernel_init_freeable+0x435/0x5d0
[ 8.363411][ T1] ? __pfx_kernel_init_freeable+0x10/0x10
[ 8.363411][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.363411][ T1] kernel_init+0x1d/0x2b0
[ 8.363411][ T1] ret_from_fork+0x4b/0x80
[ 8.363411][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.363411][ T1] ret_from_fork_asm+0x1a/0x30
[ 8.363411][ T1] </TASK>
[ 8.363411][ T1] Kernel Offset: disabled
[ 8.363411][ T1] Rebooting in 86400 seconds..
syzkaller build log:
go env (err=<nil>)
GO111MODULE='auto'
GOARCH='amd64'
GOBIN=''
GOCACHE='/syzkaller/.cache/go-build'
GOENV='/syzkaller/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/syzkaller/jobs-2/linux/gopath/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/syzkaller/jobs-2/linux/gopath'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/syzkaller/jobs-2/linux/gopath/src/github.com/google/syzkaller/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2829307758=/tmp/go-build -gno-record-gcc-switches'
git status (err=<nil>)
HEAD detached at 51c4dcff8
nothing to commit, working tree clean
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen
make .descriptions
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
bin/syz-sysgen
touch .descriptions
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-fuzzer github.com/google/syzkaller/syz-fuzzer
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-stress github.com/google/syzkaller/tools/syz-stress
mkdir -p ./bin/linux_amd64
gcc -o ./bin/linux_amd64/syz-executor executor/executor.cc \
-m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -fpermissive -w -DGOOS_linux=1 -DGOARCH_amd64=1 \
-DHOSTGOOS_linux=1 -DGIT_REVISION=\"51c4dcff83b0574620c280cc5130ef59cc4a2e32\"
Error text is too large and was truncated, full error text is at:
https://syzkaller.appspot.com/x/error.txt?x=16ae7ead180000
Tested on:
commit: 70d37056 kernfs: annotate different lockdep class for ..
git tree: https://github.com/amir73il/linux/ vfs-fixes
kernel config: https://syzkaller.appspot.com/x/.config?x=d45c08b2154c215d
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
Note: no patches were applied.
Hello,
On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote:
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> > I don't follow what you are saying.
> > Which code is in non-starter violation?
> > kernfs for calling lookup_bdev() with internal of->mutex held?
>
> That is a huge problem, and has been causing endless annoying lockdep
> chains in the block layer for us. If we have some way to kill this
> the whole block layer would benefit.
of->mutex is mostly there as a convenience to kernfs (here, sysfs) users so
that they don't have to worry about concurrent invocation of the callbacks.
It needs more careful look but on cursory observation, it shouldn't be
difficult to implement a flag or different op type which skips of->mutex if
this causes a lot of pain.
Thanks.
--
tejun
On Fri, Apr 05, 2024 at 08:37:03AM -0700, syzbot wrote:
> Hello,
>
> syzbot tried to test the proposed patch but the build/boot failed:
WTF? The patch is
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e9df2f87072c6..8502ef68459b9 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -636,11 +636,18 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
* each file a separate locking class. Let's differentiate on
* whether the file has mmap or not for now.
*
- * Both paths of the branch look the same. They're supposed to
+ * For similar reasons, writable and readonly files are given different
+ * lockdep key, because the writable file /sys/power/resume may call vfs
+ * lookup helpers for arbitrary paths and readonly files can be read by
+ * overlayfs from vfs helpers when sysfs is a lower layer of overalyfs.
+ *
+ * All three cases look the same. They're supposed to
* look that way and give @of->mutex different static lockdep keys.
*/
if (has_mmap)
mutex_init(&of->mutex);
+ else if (file->f_mode & FMODE_WRITE)
+ mutex_init(&of->mutex);
else
mutex_init(&of->mutex);
How could it possibly trigger boot failure? Test the parent, perhaps?
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
failed to apply patch:
checking file fs/overlayfs/util.c
Hunk #1 FAILED at 639.
1 out of 1 hunk FAILED
Tested on:
commit: fe46a7dd Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler:
patch: https://syzkaller.appspot.com/x/patch.diff?x=1591ee15180000
On Wed, 03 Apr 2024 11:23:26 -0700
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
Test Al's idea.
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git fe46a7dd189e
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -639,6 +639,8 @@ static int kernfs_fop_open(struct inode
*/
if (has_mmap)
mutex_init(&of->mutex);
+ else if (file->f_mode & FMODE_WRITE)
+ mutex_init(&of->mutex);
else
mutex_init(&of->mutex);
--
On Wed, 03 Apr 2024 11:23:26 -0700
> syzbot found the following issue on:
>
> HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c38139180000
Test Al's idea.
#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git fe46a7dd189e
--- x/fs/overlayfs/util.c
+++ y/fs/overlayfs/util.c
@@ -639,6 +639,8 @@ static int kernfs_fop_open(struct inode
*/
if (has_mmap)
mutex_init(&of->mutex);
+ else if (file->f_mode & FMODE_WRITE)
+ mutex_init(&of->mutex);
else
mutex_init(&of->mutex);
--
On Fri, Apr 05, 2024 at 04:02:03PM -0700, syzbot wrote:
> Hello,
>
> syzbot tried to test the proposed patch but the build/boot failed:
>
> failed to apply patch:
> checking file fs/overlayfs/util.c
> Hunk #1 FAILED at 639.
> 1 out of 1 hunk FAILED
Heh...
ed fs/kernfs/file.c <<'EOF'
/mutex_init/a
else if (file->f_mode & FMODE_WRITE)
mutex_init(&of->mutex);
.
w
q
EOF
(and if the bot is willing to apply that kind of patches, we have a security
problem on hands ;-)
Hello,
syzbot has tested the proposed patch and the reproducer did not trigger any issue:
Reported-and-tested-by: [email protected]
Tested on:
commit: fe46a7dd Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=137893e3180000
kernel config: https://syzkaller.appspot.com/x/.config?x=4d90a36f0cab495a
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=111f2c05180000
Note: testing is done by a robot and is best-effort only.
On Sat, Apr 6, 2024 at 7:09 AM Al Viro <[email protected]> wrote:
>
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
>
> > We do not (anymore) lock ovl inode in ovl_llseek(), see:
> > b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek()
> > but ovl inode is held in operations (e.g. ovl_rename)
> > which trigger copy up and call vfs_llseek() on the lower file.
>
> OK, but why do we bother with ovl_inode_lock() there?
> Note that serialization on struct file level is provided
> on syscall level - see call of fdget_pos() in there.
> IOW, which object are you protecting? If it's struct file
> passed your way, you should already have the serialization.
> If it's underlying file on disk, that's up to vfs_llseek().
You're right.
> Exclusion with copyup by a different operation?
Nah, don't see how this is relevant to file->f_pos.
>
> I'm not saying it's wrong - it's just that the thing is
> tricky enough, so some clarification might be a good idea.
I think I just used inode_lock() in
9e46b840c705 ("ovl: support stacked SEEK_HOLE/SEEK_DATA")
as a common coding pattern in overlayfs when protecting the
"master" copy of overlay inode attributes, but it was not needed
for file->f_pos.
Miklos, please ack that I am not missing anything and that
ovl_inode_lock() is indeed redundant in ovl_llseek().
Anyway, this lock is not part of the lockdep issue that started this thread.
Thanks,
Amir.
On Fri, Apr 5, 2024 at 7:23 PM Al Viro <[email protected]> wrote:
>
> On Fri, Apr 05, 2024 at 08:37:03AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot tried to test the proposed patch but the build/boot failed:
>
> WTF? The patch is
>
> diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
> index e9df2f87072c6..8502ef68459b9 100644
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -636,11 +636,18 @@ static int kernfs_fop_open(struct inode *inode, struct file *file)
> * each file a separate locking class. Let's differentiate on
> * whether the file has mmap or not for now.
> *
> - * Both paths of the branch look the same. They're supposed to
> + * For similar reasons, writable and readonly files are given different
> + * lockdep key, because the writable file /sys/power/resume may call vfs
> + * lookup helpers for arbitrary paths and readonly files can be read by
> + * overlayfs from vfs helpers when sysfs is a lower layer of overalyfs.
> + *
> + * All three cases look the same. They're supposed to
> * look that way and give @of->mutex different static lockdep keys.
> */
> if (has_mmap)
> mutex_init(&of->mutex);
> + else if (file->f_mode & FMODE_WRITE)
> + mutex_init(&of->mutex);
> else
> mutex_init(&of->mutex);
>
> How could it possibly trigger boot failure? Test the parent, perhaps?
Let's try again, rebased on current master:
#syz test: https://github.com/amir73il/linux/ vfs-fixes
Thanks,
Amir.
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
viperboard
[ 8.566509][ T1] usbcore: registered new interface driver dln2
[ 8.568801][ T1] usbcore: registered new interface driver pn533_usb
[ 8.579446][ T1] nfcsim 0.2 initialized
[ 8.581035][ T1] usbcore: registered new interface driver port100
[ 8.583546][ T1] usbcore: registered new interface driver nfcmrvl
[ 8.595440][ T1] Loading iSCSI transport class v2.0-870.
[ 8.614503][ T1] virtio_scsi virtio0: 1/0/0 default/read/poll queues
[ 8.624792][ T1] ------------[ cut here ]------------
[ 8.626075][ T1] refcount_t: decrement hit 0; leaking memory.
[ 8.627319][ T1] WARNING: CPU: 0 PID: 1 at lib/refcount.c:31 refcount_warn_saturate+0xfa/0x1d0
[ 8.629053][ T1] Modules linked in:
[ 8.629654][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc2-syzkaller-00388-g3398bf34c993 #0
[ 8.631954][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
[ 8.634054][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0
[ 8.635268][ T1] Code: b2 00 00 00 e8 d7 93 f0 fc 5b 5d c3 cc cc cc cc e8 cb 93 f0 fc c6 05 ae 5c ee 0a 01 90 48 c7 c7 40 80 1e 8c e8 e7 2e b3 fc 90 <0f> 0b 90 90 eb d9 e8 ab 93 f0 fc c6 05 8b 5c ee 0a 01 90 48 c7 c7
[ 8.638923][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246
[ 8.639822][ T1] RAX: 9b6fe4ff1df42a00 RBX: ffff888021006ccc RCX: ffff888016ac0000
[ 8.641085][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 8.642968][ T1] RBP: 0000000000000004 R08: ffffffff8157ffc2 R09: fffffbfff1c39af8
[ 8.645656][ T1] R10: dffffc0000000000 R11: fffffbfff1c39af8 R12: ffffea000502fdc0
[ 8.646945][ T1] R13: ffffea000502fdc8 R14: 1ffffd4000a05fb9 R15: 0000000000000000
[ 8.648121][ T1] FS: 0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
[ 8.650496][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.652023][ T1] CR2: ffff88823ffff000 CR3: 000000000e134000 CR4: 00000000003506f0
[ 8.653935][ T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.655934][ T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8.657851][ T1] Call Trace:
[ 8.658338][ T1] <TASK>
[ 8.658807][ T1] ? __warn+0x163/0x4e0
[ 8.659517][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.660549][ T1] ? report_bug+0x2b3/0x500
[ 8.661574][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.663372][ T1] ? handle_bug+0x3e/0x70
[ 8.664928][ T1] ? exc_invalid_op+0x1a/0x50
[ 8.666599][ T1] ? asm_exc_invalid_op+0x1a/0x20
[ 8.668117][ T1] ? __warn_printk+0x292/0x360
[ 8.670085][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.671620][ T1] ? refcount_warn_saturate+0xf9/0x1d0
[ 8.673520][ T1] __free_pages_ok+0xc54/0xd80
[ 8.675020][ T1] make_alloc_exact+0xa3/0xf0
[ 8.676151][ T1] vring_alloc_queue_split+0x20a/0x600
[ 8.677685][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10
[ 8.678762][ T1] ? vp_find_vqs+0x4c/0x4e0
[ 8.679763][ T1] ? virtscsi_probe+0x3ea/0xf60
[ 8.681408][ T1] ? virtio_dev_probe+0x991/0xaf0
[ 8.683147][ T1] ? really_probe+0x2b8/0xad0
[ 8.684759][ T1] ? driver_probe_device+0x50/0x430
[ 8.685967][ T1] vring_create_virtqueue_split+0xc6/0x310
[ 8.687403][ T1] ? ret_from_fork+0x4b/0x80
[ 8.688676][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10
[ 8.690045][ T1] vring_create_virtqueue+0xca/0x110
[ 8.691014][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.692612][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.693924][ T1] setup_vq+0xe9/0x2d0
[ 8.695355][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.696851][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.699030][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.700639][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.702131][ T1] vp_setup_vq+0xbf/0x330
[ 8.703279][ T1] ? __pfx_vp_config_changed+0x10/0x10
[ 8.704850][ T1] ? ioread16+0x2f/0x90
[ 8.706423][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.707372][ T1] vp_find_vqs_msix+0x8b2/0xc80
[ 8.708975][ T1] vp_find_vqs+0x4c/0x4e0
[ 8.709918][ T1] virtscsi_init+0x8db/0xd00
[ 8.710723][ T1] ? __pfx_virtscsi_init+0x10/0x10
[ 8.711813][ T1] ? __pfx_default_calc_sets+0x10/0x10
[ 8.713608][ T1] ? scsi_host_alloc+0xa57/0xea0
[ 8.714998][ T1] ? vp_get+0xfd/0x140
[ 8.715786][ T1] virtscsi_probe+0x3ea/0xf60
[ 8.716933][ T1] ? __pfx_virtscsi_probe+0x10/0x10
[ 8.718052][ T1] ? kernfs_add_one+0x156/0x8b0
[ 8.719084][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10
[ 8.719983][ T1] ? virtio_features_ok+0x10c/0x270
[ 8.721406][ T1] virtio_dev_probe+0x991/0xaf0
[ 8.722392][ T1] ? __pfx_virtio_dev_probe+0x10/0x10
[ 8.723354][ T1] really_probe+0x2b8/0xad0
[ 8.725227][ T1] __driver_probe_device+0x1a2/0x390
[ 8.726307][ T1] driver_probe_device+0x50/0x430
[ 8.727156][ T1] __driver_attach+0x45f/0x710
[ 8.727968][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.728999][ T1] bus_for_each_dev+0x239/0x2b0
[ 8.730291][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.731396][ T1] ? __pfx_bus_for_each_dev+0x10/0x10
[ 8.732523][ T1] ? do_raw_spin_unlock+0x13c/0x8b0
[ 8.733601][ T1] bus_add_driver+0x347/0x620
[ 8.734733][ T1] driver_register+0x23a/0x320
[ 8.735723][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.736859][ T1] virtio_scsi_init+0x65/0xe0
[ 8.737729][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.739109][ T1] do_one_initcall+0x248/0x880
[ 8.740693][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.742428][ T1] ? __pfx_do_one_initcall+0x10/0x10
[ 8.745322][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 8.746957][ T1] ? __pfx_parse_args+0x10/0x10
[ 8.748064][ T1] ? do_initcalls+0x1c/0x80
[ 8.749434][ T1] ? rcu_is_watching+0x15/0xb0
[ 8.750232][ T1] do_initcall_level+0x157/0x210
[ 8.751547][ T1] do_initcalls+0x3f/0x80
[ 8.752397][ T1] kernel_init_freeable+0x435/0x5d0
[ 8.754091][ T1] ? __pfx_kernel_init_freeable+0x10/0x10
[ 8.755229][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 8.757834][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.759572][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.761202][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.762156][ T1] kernel_init+0x1d/0x2b0
[ 8.763250][ T1] ret_from_fork+0x4b/0x80
[ 8.764029][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.765039][ T1] ret_from_fork_asm+0x1a/0x30
[ 8.766178][ T1] </TASK>
[ 8.766835][ T1] Kernel panic - not syncing: kernel: panic_on_warn set ...
[ 8.768906][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc2-syzkaller-00388-g3398bf34c993 #0
[ 8.770531][ T1] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
[ 8.772129][ T1] Call Trace:
[ 8.772129][ T1] <TASK>
[ 8.772129][ T1] dump_stack_lvl+0x241/0x360
[ 8.772129][ T1] ? __pfx_dump_stack_lvl+0x10/0x10
[ 8.772129][ T1] ? __pfx__printk+0x10/0x10
[ 8.772129][ T1] ? _printk+0xd5/0x120
[ 8.772129][ T1] ? vscnprintf+0x5d/0x90
[ 8.772129][ T1] panic+0x349/0x860
[ 8.772129][ T1] ? __warn+0x172/0x4e0
[ 8.772129][ T1] ? __pfx_panic+0x10/0x10
[ 8.772129][ T1] ? show_trace_log_lvl+0x4e6/0x520
[ 8.781563][ T1] ? ret_from_fork_asm+0x1a/0x30
[ 8.781563][ T1] __warn+0x346/0x4e0
[ 8.781563][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.781563][ T1] report_bug+0x2b3/0x500
[ 8.781563][ T1] ? refcount_warn_saturate+0xfa/0x1d0
[ 8.781563][ T1] handle_bug+0x3e/0x70
[ 8.781563][ T1] exc_invalid_op+0x1a/0x50
[ 8.781563][ T1] asm_exc_invalid_op+0x1a/0x20
[ 8.781563][ T1] RIP: 0010:refcount_warn_saturate+0xfa/0x1d0
[ 8.791455][ T1] Code: b2 00 00 00 e8 d7 93 f0 fc 5b 5d c3 cc cc cc cc e8 cb 93 f0 fc c6 05 ae 5c ee 0a 01 90 48 c7 c7 40 80 1e 8c e8 e7 2e b3 fc 90 <0f> 0b 90 90 eb d9 e8 ab 93 f0 fc c6 05 8b 5c ee 0a 01 90 48 c7 c7
[ 8.791455][ T1] RSP: 0000:ffffc90000066e18 EFLAGS: 00010246
[ 8.791455][ T1] RAX: 9b6fe4ff1df42a00 RBX: ffff888021006ccc RCX: ffff888016ac0000
[ 8.791455][ T1] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 8.791455][ T1] RBP: 0000000000000004 R08: ffffffff8157ffc2 R09: fffffbfff1c39af8
[ 8.801572][ T1] R10: dffffc0000000000 R11: fffffbfff1c39af8 R12: ffffea000502fdc0
[ 8.801572][ T1] R13: ffffea000502fdc8 R14: 1ffffd4000a05fb9 R15: 0000000000000000
[ 8.801572][ T1] ? __warn_printk+0x292/0x360
[ 8.801572][ T1] ? refcount_warn_saturate+0xf9/0x1d0
[ 8.801572][ T1] __free_pages_ok+0xc54/0xd80
[ 8.801572][ T1] make_alloc_exact+0xa3/0xf0
[ 8.801572][ T1] vring_alloc_queue_split+0x20a/0x600
[ 8.801572][ T1] ? __pfx_vring_alloc_queue_split+0x10/0x10
[ 8.811433][ T1] ? vp_find_vqs+0x4c/0x4e0
[ 8.811433][ T1] ? virtscsi_probe+0x3ea/0xf60
[ 8.811433][ T1] ? virtio_dev_probe+0x991/0xaf0
[ 8.811433][ T1] ? really_probe+0x2b8/0xad0
[ 8.811433][ T1] ? driver_probe_device+0x50/0x430
[ 8.811433][ T1] vring_create_virtqueue_split+0xc6/0x310
[ 8.811433][ T1] ? ret_from_fork+0x4b/0x80
[ 8.811433][ T1] ? __pfx_vring_create_virtqueue_split+0x10/0x10
[ 8.811433][ T1] vring_create_virtqueue+0xca/0x110
[ 8.821568][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.821568][ T1] setup_vq+0xe9/0x2d0
[ 8.821568][ T1] ? __pfx_vp_notify+0x10/0x10
[ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.821568][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.821568][ T1] vp_setup_vq+0xbf/0x330
[ 8.831450][ T1] ? __pfx_vp_config_changed+0x10/0x10
[ 8.831450][ T1] ? ioread16+0x2f/0x90
[ 8.831450][ T1] ? __pfx_virtscsi_ctrl_done+0x10/0x10
[ 8.831450][ T1] vp_find_vqs_msix+0x8b2/0xc80
[ 8.831450][ T1] vp_find_vqs+0x4c/0x4e0
[ 8.831450][ T1] virtscsi_init+0x8db/0xd00
[ 8.831450][ T1] ? __pfx_virtscsi_init+0x10/0x10
[ 8.831450][ T1] ? __pfx_default_calc_sets+0x10/0x10
[ 8.831450][ T1] ? scsi_host_alloc+0xa57/0xea0
[ 8.831450][ T1] ? vp_get+0xfd/0x140
[ 8.831450][ T1] virtscsi_probe+0x3ea/0xf60
[ 8.841557][ T1] ? __pfx_virtscsi_probe+0x10/0x10
[ 8.841557][ T1] ? kernfs_add_one+0x156/0x8b0
[ 8.841557][ T1] ? virtio_no_restricted_mem_acc+0x9/0x10
[ 8.841557][ T1] ? virtio_features_ok+0x10c/0x270
[ 8.841557][ T1] virtio_dev_probe+0x991/0xaf0
[ 8.841557][ T1] ? __pfx_virtio_dev_probe+0x10/0x10
[ 8.841557][ T1] really_probe+0x2b8/0xad0
[ 8.841557][ T1] __driver_probe_device+0x1a2/0x390
[ 8.841557][ T1] driver_probe_device+0x50/0x430
[ 8.841557][ T1] __driver_attach+0x45f/0x710
[ 8.851437][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.851437][ T1] bus_for_each_dev+0x239/0x2b0
[ 8.851437][ T1] ? __pfx___driver_attach+0x10/0x10
[ 8.851437][ T1] ? __pfx_bus_for_each_dev+0x10/0x10
[ 8.851437][ T1] ? do_raw_spin_unlock+0x13c/0x8b0
[ 8.851437][ T1] bus_add_driver+0x347/0x620
[ 8.851437][ T1] driver_register+0x23a/0x320
[ 8.851437][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.851437][ T1] virtio_scsi_init+0x65/0xe0
[ 8.851437][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.851437][ T1] do_one_initcall+0x248/0x880
[ 8.861551][ T1] ? __pfx_virtio_scsi_init+0x10/0x10
[ 8.861551][ T1] ? __pfx_do_one_initcall+0x10/0x10
[ 8.861551][ T1] ? lockdep_hardirqs_on_prepare+0x43d/0x780
[ 8.861551][ T1] ? __pfx_parse_args+0x10/0x10
[ 8.861551][ T1] ? do_initcalls+0x1c/0x80
[ 8.861551][ T1] ? rcu_is_watching+0x15/0xb0
[ 8.861551][ T1] do_initcall_level+0x157/0x210
[ 8.861551][ T1] do_initcalls+0x3f/0x80
[ 8.861551][ T1] kernel_init_freeable+0x435/0x5d0
[ 8.871429][ T1] ? __pfx_kernel_init_freeable+0x10/0x10
[ 8.871429][ T1] ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
[ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.871429][ T1] kernel_init+0x1d/0x2b0
[ 8.871429][ T1] ret_from_fork+0x4b/0x80
[ 8.871429][ T1] ? __pfx_kernel_init+0x10/0x10
[ 8.871429][ T1] ret_from_fork_asm+0x1a/0x30
[ 8.871429][ T1] </TASK>
[ 8.871429][ T1] Kernel Offset: disabled
[ 8.871429][ T1] Rebooting in 86400 seconds..
syzkaller build log:
go env (err=<nil>)
GO111MODULE='auto'
GOARCH='amd64'
GOBIN=''
GOCACHE='/syzkaller/.cache/go-build'
GOENV='/syzkaller/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/syzkaller/jobs-2/linux/gopath/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/syzkaller/jobs-2/linux/gopath'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/syzkaller/jobs-2/linux/gopath/src/github.com/google/syzkaller/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3147238964=/tmp/go-build -gno-record-gcc-switches'
git status (err=<nil>)
HEAD detached at 51c4dcff8
nothing to commit, working tree clean
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
go list -f '{{.Stale}}' ./sys/syz-sysgen | grep -q false || go install ./sys/syz-sysgen
make .descriptions
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
Makefile:31: run command via tools/syz-env for best compatibility, see:
Makefile:32: https://github.com/google/syzkaller/blob/master/docs/contributing.md#using-syz-env
bin/syz-sysgen
touch .descriptions
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-fuzzer github.com/google/syzkaller/syz-fuzzer
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-execprog github.com/google/syzkaller/tools/syz-execprog
GOOS=linux GOARCH=amd64 go build "-ldflags=-s -w -X github.com/google/syzkaller/prog.GitRevision=51c4dcff83b0574620c280cc5130ef59cc4a2e32 -X 'github.com/google/syzkaller/prog.gitRevisionDate=20240403-123700'" "-tags=syz_target syz_os_linux syz_arch_amd64 " -o ./bin/linux_amd64/syz-stress github.com/google/syzkaller/tools/syz-stress
mkdir -p ./bin/linux_amd64
gcc -o ./bin/linux_amd64/syz-executor executor/executor.cc \
-m64 -O2 -pthread -Wall -Werror -Wparentheses -Wunused-const-variable -Wframe-larger-than=16384 -Wno-stringop-overflow -Wno-array-bounds -Wno-format-overflow -Wno-unused-but-set-variable -Wno-unused-command-line-argument -static-pie -fpermissive -w -DGOOS_linux=1 -DGOARCH_amd64=1 \
-DHOSTGOOS_linux=1 -DGIT_REVISION=\"51c4dcff83b0574620c280cc5130ef59cc4a2e32\"
Error text is too large and was truncated, full error text is at:
https://syzkaller.appspot.com/x/error.txt?x=16a356f6180000
Tested on:
commit: 3398bf34 kernfs: annotate different lockdep class for ..
git tree: https://github.com/amir73il/linux/ vfs-fixes
kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
Note: no patches were applied.
On Sat, 6 Apr 2024 08:11:30 +0100 Al Viro <[email protected]>
> On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote:
> > commit: 3398bf34 kernfs: annotate different lockdep class for ..
> > git tree: https://github.com/amir73il/linux/ vfs-fixes
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Note: no patches were applied.
>
> How about the same test on 6c6e47d69d821047097909288b6d7f1aafb3b9b1?
>
JFYI it works [1]
[1] https://lore.kernel.org/lkml/[email protected]/
On Fri, Apr 05, 2024 at 08:51:35AM +0200, Christoph Hellwig wrote:
> On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> > I don't follow what you are saying.
> > Which code is in non-starter violation?
> > kernfs for calling lookup_bdev() with internal of->mutex held?
>
> That is a huge problem, and has been causing endless annoying lockdep
> chains in the block layer for us. If we have some way to kill this
> the whole block layer would benefit.
Specifically of->mutex or having pathwalk done from some ->write_iter()?
Incidentally, losetup happily accepts sysfs files as backing store.
I don't think it adds problems we don't have otherwise, but IMO
that's really bogus...
On Thu, Apr 04, 2024 at 12:33:40PM +0300, Amir Goldstein wrote:
> We do not (anymore) lock ovl inode in ovl_llseek(), see:
> b1f9d3858f72 ovl: use ovl_inode_lock in ovl_llseek()
> but ovl inode is held in operations (e.g. ovl_rename)
> which trigger copy up and call vfs_llseek() on the lower file.
OK, but why do we bother with ovl_inode_lock() there?
Note that serialization on struct file level is provided
on syscall level - see call of fdget_pos() in there.
IOW, which object are you protecting? If it's struct file
passed your way, you should already have the serialization.
If it's underlying file on disk, that's up to vfs_llseek().
Exclusion with copyup by a different operation?
I'm not saying it's wrong - it's just that the thing is
tricky enough, so some clarification might be a good idea.
On Sat, Apr 6, 2024 at 10:11 AM Al Viro <[email protected]> wrote:
>
> On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote:
>
> > commit: 3398bf34 kernfs: annotate different lockdep class for ..
> > git tree: https://github.com/amir73il/linux/ vfs-fixes
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
> > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Note: no patches were applied.
>
Looks like it fixes the problem:
https://lore.kernel.org/lkml/[email protected]/
Al,
Are you ok with going with your solution?
Do you want to pick it up through your tree?
Or shall I post it and ask Christian or Greg to pick it up?
Thanks,
Amir.
On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote:
> commit: 3398bf34 kernfs: annotate different lockdep class for ..
> git tree: https://github.com/amir73il/linux/ vfs-fixes
> kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
> dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
>
> Note: no patches were applied.
How about the same test on 6c6e47d69d821047097909288b6d7f1aafb3b9b1?
On Sat, Apr 06, 2024 at 04:23:51PM +0800, Hillf Danton wrote:
> On Sat, 6 Apr 2024 08:11:30 +0100 Al Viro <[email protected]>
> > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote:
> > > commit: 3398bf34 kernfs: annotate different lockdep class for ..
> > > git tree: https://github.com/amir73il/linux/ vfs-fixes
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
c > >
> > > Note: no patches were applied.
> >
> > How about the same test on 6c6e47d69d821047097909288b6d7f1aafb3b9b1?
> >
> JFYI it works [1]
>
> [1] https://lore.kernel.org/lkml/[email protected]/
It works on top of v6.8-8951-gfe46a7dd189e; boot failures on top
of v6.9-rc2-387-g6c6e47d69d82 and on top of v6.9-rc1. See
https://lore.kernel.org/lkml/[email protected]/
and
https://lore.kernel.org/lkml/[email protected]/
resp. Both hit refcount_t underflow in virtio_scsi probing, with
very similar call chains (if not outright identical ones - hadn't
checked in details).
I don't believe that this patch introduces a boot failure, let alone
this one - all of that is likely to be shared with the corresponding
points on mainline.
Might be interesting to try and figure out what that is, but that's
a separate bug.
On Sat, Apr 06, 2024 at 11:57:11AM +0300, Amir Goldstein wrote:
> On Sat, Apr 6, 2024 at 10:11 AM Al Viro <[email protected]> wrote:
> >
> > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote:
> >
> > > commit: 3398bf34 kernfs: annotate different lockdep class for ..
> > > git tree: https://github.com/amir73il/linux/ vfs-fixes
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > >
> > > Note: no patches were applied.
> >
>
> Looks like it fixes the problem:
> https://lore.kernel.org/lkml/[email protected]/
>
> Al,
>
> Are you ok with going with your solution?
> Do you want to pick it up through your tree?
> Or shall I post it and ask Christian or Greg to pick it up?
Umm... I can grab it into #fixes. Would be nice to get the result of
that test both on current mainline and mainline + patch, though - looks
like the same test keeps hitting some unrelated shite in virtio_scsi.
On Sun, Apr 7, 2024 at 3:51 AM Al Viro <[email protected]> wrote:
>
> On Sat, Apr 06, 2024 at 11:57:11AM +0300, Amir Goldstein wrote:
> > On Sat, Apr 6, 2024 at 10:11 AM Al Viro <[email protected]> wrote:
> > >
> > > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote:
> > >
> > > > commit: 3398bf34 kernfs: annotate different lockdep class for ..
> > > > git tree: https://github.com/amir73il/linux/ vfs-fixes
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > >
> > > > Note: no patches were applied.
> > >
> >
> > Looks like it fixes the problem:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Al,
> >
> > Are you ok with going with your solution?
> > Do you want to pick it up through your tree?
> > Or shall I post it and ask Christian or Greg to pick it up?
>
> Umm... I can grab it into #fixes.
Cool.
Also:
#syz dup: possible deadlock in seq_read_iter (3)
https://lore.kernel.org/linux-fsdevel/CAJfpegumD0gDXpZwy53pPu54ifOfW-tTBLniLHep3sW2Z96MjQ@mail.gmail.com/
Thanks,
Amir.
On Sun, Apr 07, 2024 at 01:50:56AM +0100, Al Viro wrote:
> On Sat, Apr 06, 2024 at 11:57:11AM +0300, Amir Goldstein wrote:
> > On Sat, Apr 6, 2024 at 10:11 AM Al Viro <[email protected]> wrote:
> > >
> > > On Sat, Apr 06, 2024 at 12:05:04AM -0700, syzbot wrote:
> > >
> > > > commit: 3398bf34 kernfs: annotate different lockdep class for ..
> > > > git tree: https://github.com/amir73il/linux/ vfs-fixes
> > > > kernel config: https://syzkaller.appspot.com/x/.config?x=c5cda112a8438056
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=9a5b0ced8b1bfb238b56
> > > > compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > >
> > > > Note: no patches were applied.
> > >
> >
> > Looks like it fixes the problem:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > Al,
> >
> > Are you ok with going with your solution?
> > Do you want to pick it up through your tree?
> > Or shall I post it and ask Christian or Greg to pick it up?
>
> Umm... I can grab it into #fixes. Would be nice to get the result of
Fine by me if you have other stuff pending for -rc4 anyway. Otherwise I
can grab it.