2022-10-05 02:12:55

by syzbot

[permalink] [raw]
Subject: [syzbot] general protection fault in kernfs_get_inode

Hello,

syzbot found the following issue on:

HEAD commit: 0326074ff465 Merge tag 'net-next-6.1' of git://git.kernel...
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1067555c880000
kernel config: https://syzkaller.appspot.com/x/.config?x=e1de7ca9efcc028c
dashboard link: https://syzkaller.appspot.com/bug?extid=534ee3d24c37c411f37f
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/43729d6ce2fc/disk-0326074f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/1f76d6f68eb3/vmlinux-0326074f.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 1 PID: 29107 Comm: syz-executor.3 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
RSP: 0018:ffffc9000323fa30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc900069ba000
RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
RBP: ffffc9000323fa50 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 000000000000007c R12: ffff8880205d4000
R13: ffff88802368b000 R14: ffff88801ba6d880 R15: ffff88802378e000
FS: 00007fa73c7aa700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd6239b0000 CR3: 00000000782f6000 CR4: 00000000003526e0
Call Trace:
<TASK>
cgroup_may_write+0x86/0x120 kernel/cgroup/cgroup.c:4937
cgroup_css_set_fork kernel/cgroup/cgroup.c:6237 [inline]
cgroup_can_fork+0x961/0xec0 kernel/cgroup/cgroup.c:6331
copy_process+0x43ed/0x7090 kernel/fork.c:2358
kernel_clone+0xe7/0xab0 kernel/fork.c:2671
__do_sys_clone3+0x1cd/0x2e0 kernel/fork.c:2963
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fa73b68a5a9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa73c7aa038 EFLAGS: 00000246 ORIG_RAX: 00000000000001b3
RAX: ffffffffffffffda RBX: 00007fa73b7abf80 RCX: 00007fa73b68a5a9
RDX: 0000000000000000 RSI: 0000000000000058 RDI: 00007fa73c7aa050
RBP: 00007fa73b6e5580 R08: 0000000000000000 R09: 0000000000000058
R10: 00007fa73c7aa050 R11: 0000000000000246 R12: 0000000000000058
R13: 00007fa73bcdfb1f R14: 00007fa73c7aa300 R15: 0000000000022000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
RSP: 0018:ffffc9000323fa30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffc900069ba000
RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
RBP: ffffc9000323fa50 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 000000000000007c R12: ffff8880205d4000
R13: ffff88802368b000 R14: ffff88801ba6d880 R15: ffff88802378e000
FS: 00007fa73c7aa700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000200e6000 CR3: 00000000782f6000 CR4: 00000000003526f0
----------------
Code disassembly (best guess):
0: 41 56 push %r14
2: 41 55 push %r13
4: 41 54 push %r12
6: 49 89 fc mov %rdi,%r12
9: 53 push %rbx
a: 48 89 f3 mov %rsi,%rbx
d: e8 1a 04 7e ff callq 0xff7e042c
12: 48 8d bb 90 00 00 00 lea 0x90(%rbx),%rdi
19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
20: fc ff df
23: 48 89 fa mov %rdi,%rdx
26: 48 c1 ea 03 shr $0x3,%rdx
* 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
2e: 0f 85 3a 04 00 00 jne 0x46e
34: 48 8b b3 90 00 00 00 mov 0x90(%rbx),%rsi
3b: 4c 89 e7 mov %r12,%rdi
3e: e8 .byte 0xe8
3f: 79 .byte 0x79


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2022-10-05 02:34:16

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode

syzbot has found a reproducer for the following issue on:

HEAD commit: 0326074ff465 Merge tag 'net-next-6.1' of git://git.kernel...
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=149c92cc880000
kernel config: https://syzkaller.appspot.com/x/.config?x=e1de7ca9efcc028c
dashboard link: https://syzkaller.appspot.com/bug?extid=534ee3d24c37c411f37f
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10af2492880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=104874f0880000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/43729d6ce2fc/disk-0326074f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/1f76d6f68eb3/vmlinux-0326074f.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
CPU: 1 PID: 3617 Comm: syz-executor384 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
RSP: 0018:ffffc90003c8fa30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
RBP: ffffc90003c8fa50 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000024 R12: ffff8880224ba000
R13: ffff888075922000 R14: ffff88801ebf0000 R15: ffff8880211ae000
FS: 0000555556907300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000200 CR3: 000000001c179000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
cgroup_may_write+0x86/0x120 kernel/cgroup/cgroup.c:4937
cgroup_css_set_fork kernel/cgroup/cgroup.c:6237 [inline]
cgroup_can_fork+0x961/0xec0 kernel/cgroup/cgroup.c:6331
copy_process+0x43ed/0x7090 kernel/fork.c:2358
kernel_clone+0xe7/0xab0 kernel/fork.c:2671
__do_sys_clone3+0x1cd/0x2e0 kernel/fork.c:2963
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f621d8c3e99
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcaa952ec8 EFLAGS: 00000206 ORIG_RAX: 00000000000001b3
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f621d8c3e99
RDX: 0000000000000000 RSI: 0000000000000058 RDI: 00007ffcaa952f40
RBP: 0000000000000000 R08: 00007ffcaa952d60 R09: 00007ffcaa952ef0
R10: 00000000ffffffff R11: 0000000000000206 R12: 00007ffcaa952eec
R13: 00007ffcaa952f00 R14: 00007ffcaa952f40 R15: 0000000000000000
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
RSP: 0018:ffffc90003c8fa30 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
RBP: ffffc90003c8fa50 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000024 R12: ffff8880224ba000
R13: ffff888075922000 R14: ffff88801ebf0000 R15: ffff8880211ae000
FS: 0000555556907300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffcaa9cb8f0 CR3: 000000001c179000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: 41 56 push %r14
2: 41 55 push %r13
4: 41 54 push %r12
6: 49 89 fc mov %rdi,%r12
9: 53 push %rbx
a: 48 89 f3 mov %rsi,%rbx
d: e8 1a 04 7e ff callq 0xff7e042c
12: 48 8d bb 90 00 00 00 lea 0x90(%rbx),%rdi
19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
20: fc ff df
23: 48 89 fa mov %rdi,%rdx
26: 48 c1 ea 03 shr $0x3,%rdx
* 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
2e: 0f 85 3a 04 00 00 jne 0x46e
34: 48 8b b3 90 00 00 00 mov 0x90(%rbx),%rsi
3b: 4c 89 e7 mov %r12,%rdi
3e: e8 .byte 0xe8
3f: 79 .byte 0x79

2022-10-07 21:48:06

by Tejun Heo

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode

(cc'ing Christian and quoting whole body)

Christan, I can't repro it here but think what we need is sth like the
following. The problem is that cgroup_is_dead() check in the fork path isn't
enough as the perm check depends on cgrp->procs_file being available but
that is cleared while DYING before DEAD. So, depending on the timing, we can
end up trying to deref NULL pointer in may_write.

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8ad2c267ff471..603b7167450a1 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4934,6 +4934,10 @@ static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)

lockdep_assert_held(&cgroup_mutex);

+ /*if @cgrp is being removed, procs_file may already be gone */
+ if (!cgrp->procs_file.kn)
+ return -ENODEV;
+
inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
if (!inode)
return -ENOMEM;


On Tue, Oct 04, 2022 at 07:19:34PM -0700, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 0326074ff465 Merge tag 'net-next-6.1' of git://git.kernel...
> git tree: upstream
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=149c92cc880000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e1de7ca9efcc028c
> dashboard link: https://syzkaller.appspot.com/bug?extid=534ee3d24c37c411f37f
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10af2492880000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=104874f0880000
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/43729d6ce2fc/disk-0326074f.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/1f76d6f68eb3/vmlinux-0326074f.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
> CPU: 1 PID: 3617 Comm: syz-executor384 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
> RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
> RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
> Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
> RSP: 0018:ffffc90003c8fa30 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
> RBP: ffffc90003c8fa50 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000024 R12: ffff8880224ba000
> R13: ffff888075922000 R14: ffff88801ebf0000 R15: ffff8880211ae000
> FS: 0000555556907300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020000200 CR3: 000000001c179000 CR4: 00000000003506e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> cgroup_may_write+0x86/0x120 kernel/cgroup/cgroup.c:4937
> cgroup_css_set_fork kernel/cgroup/cgroup.c:6237 [inline]
> cgroup_can_fork+0x961/0xec0 kernel/cgroup/cgroup.c:6331
> copy_process+0x43ed/0x7090 kernel/fork.c:2358
> kernel_clone+0xe7/0xab0 kernel/fork.c:2671
> __do_sys_clone3+0x1cd/0x2e0 kernel/fork.c:2963
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f621d8c3e99
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffcaa952ec8 EFLAGS: 00000206 ORIG_RAX: 00000000000001b3
> RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f621d8c3e99
> RDX: 0000000000000000 RSI: 0000000000000058 RDI: 00007ffcaa952f40
> RBP: 0000000000000000 R08: 00007ffcaa952d60 R09: 00007ffcaa952ef0
> R10: 00000000ffffffff R11: 0000000000000206 R12: 00007ffcaa952eec
> R13: 00007ffcaa952f00 R14: 00007ffcaa952f40 R15: 0000000000000000
> </TASK>
> Modules linked in:
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
> RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
> Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
> RSP: 0018:ffffc90003c8fa30 EFLAGS: 00010206
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
> RBP: ffffc90003c8fa50 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000024 R12: ffff8880224ba000
> R13: ffff888075922000 R14: ffff88801ebf0000 R15: ffff8880211ae000
> FS: 0000555556907300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ffcaa9cb8f0 CR3: 000000001c179000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> ----------------
> Code disassembly (best guess):
> 0: 41 56 push %r14
> 2: 41 55 push %r13
> 4: 41 54 push %r12
> 6: 49 89 fc mov %rdi,%r12
> 9: 53 push %rbx
> a: 48 89 f3 mov %rsi,%rbx
> d: e8 1a 04 7e ff callq 0xff7e042c
> 12: 48 8d bb 90 00 00 00 lea 0x90(%rbx),%rdi
> 19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> 20: fc ff df
> 23: 48 89 fa mov %rdi,%rdx
> 26: 48 c1 ea 03 shr $0x3,%rdx
> * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
> 2e: 0f 85 3a 04 00 00 jne 0x46e
> 34: 48 8b b3 90 00 00 00 mov 0x90(%rbx),%rsi
> 3b: 4c 89 e7 mov %r12,%rdi
> 3e: e8 .byte 0xe8
> 3f: 79 .byte 0x79
>

--
tejun

2022-10-08 05:56:18

by Christian Brauner

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode

On Fri, Oct 07, 2022 at 11:35:49AM -1000, Tejun Heo wrote:
> (cc'ing Christian and quoting whole body)
>
> Christan, I can't repro it here but think what we need is sth like the
> following. The problem is that cgroup_is_dead() check in the fork path isn't
> enough as the perm check depends on cgrp->procs_file being available but
> that is cleared while DYING before DEAD. So, depending on the timing, we can
> end up trying to deref NULL pointer in may_write.
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 8ad2c267ff471..603b7167450a1 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -4934,6 +4934,10 @@ static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)
>
> lockdep_assert_held(&cgroup_mutex);
>
> + /*if @cgrp is being removed, procs_file may already be gone */
> + if (!cgrp->procs_file.kn)
> + return -ENODEV;
> +
> inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
> if (!inode)
> return -ENOMEM;

Tejun, thanks for the Cc.
Yep, that seems to be the correct analysis.
I assume you're turning this into a proper patch, so:

Reviewed-by: Christian Brauner (Microsoft) <[email protected]>

2022-10-08 06:26:27

by Christian Brauner

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode

On Sat, Oct 08, 2022 at 07:46:12AM +0200, Christian Brauner wrote:
> On Fri, Oct 07, 2022 at 11:35:49AM -1000, Tejun Heo wrote:
> > (cc'ing Christian and quoting whole body)
> >
> > Christan, I can't repro it here but think what we need is sth like the
> > following. The problem is that cgroup_is_dead() check in the fork path isn't
> > enough as the perm check depends on cgrp->procs_file being available but
> > that is cleared while DYING before DEAD. So, depending on the timing, we can
> > end up trying to deref NULL pointer in may_write.
> >
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index 8ad2c267ff471..603b7167450a1 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -4934,6 +4934,10 @@ static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)
> >
> > lockdep_assert_held(&cgroup_mutex);
> >
> > + /*if @cgrp is being removed, procs_file may already be gone */
> > + if (!cgrp->procs_file.kn)
> > + return -ENODEV;
> > +
> > inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
> > if (!inode)
> > return -ENOMEM;
>
> Tejun, thanks for the Cc.
> Yep, that seems to be the correct analysis.
> I assume you're turning this into a proper patch, so:
>
> Reviewed-by: Christian Brauner (Microsoft) <[email protected]>

#syz test: [email protected]:brauner/linux.git kernel.cgroup_may_write.fix

From f2517f35b571ee80ab046f205ef8b3143d039d57 Mon Sep 17 00:00:00 2001
From: Not My Commit <[email protected]>
Date: Sat, 8 Oct 2022 07:44:57 +0200
Subject: [PATCH] NOT A REAL COMMIT

THIS IS JUST FOR SYZBOT.
---
kernel/cgroup/cgroup.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8ad2c267ff47..f8386a066e0e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4934,6 +4934,9 @@ static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)

lockdep_assert_held(&cgroup_mutex);

+ if (!cgrp->procs_file.kn)
+ return -ENODEV;
+
inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
if (!inode)
return -ENOMEM;
--
2.34.1

2022-10-08 11:35:56

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode

Hello,

syzbot tried to test the proposed patch but the build/boot failed:

failed to checkout kernel repo [email protected]:brauner/linux.git/kernel.cgroup_may_write.fix: failed to run ["git" "fetch" "--force" "81e322358ba96b4e95306c1d79b01e0c61095de5" "kernel.cgroup_may_write.fix"]: exit status 128
Host key verification failed.
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.



Tested on:

commit: [unknown
git tree: [email protected]:brauner/linux.git kernel.cgroup_may_write.fix
dashboard link: https://syzkaller.appspot.com/bug?extid=534ee3d24c37c411f37f
compiler:
patch: https://syzkaller.appspot.com/x/patch.diff?x=13ae1c42880000

2022-10-08 18:50:43

by Christian A. Ehrhardt

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode


Hi (from another Christian),

On Fri, Oct 07, 2022 at 11:35:49AM -1000, Tejun Heo wrote:
> (cc'ing Christian and quoting whole body)
>
> Christan, I can't repro it here but think what we need is sth like the
> following. The problem is that cgroup_is_dead() check in the fork path isn't
> enough as the perm check depends on cgrp->procs_file being available but
> that is cleared while DYING before DEAD. So, depending on the timing, we can
> end up trying to deref NULL pointer in may_write.
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 8ad2c267ff471..603b7167450a1 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -4934,6 +4934,10 @@ static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)
>
> lockdep_assert_held(&cgroup_mutex);
>
> + /*if @cgrp is being removed, procs_file may already be gone */
> + if (!cgrp->procs_file.kn)
> + return -ENODEV;
> +
> inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
> if (!inode)
> return -ENOMEM;

I had syzbot hit the same crash here and as can be seen from the
reproducer the root cause is that the target cgroup (specified
via CLONE_INTO_CGROUP) is a version 1 cgroup. These cgroups just
don't initialize ->procs_file.kn.

This is a regression from v6.0 caused by

f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")

Unless we want to revert this patch the correct fix might be
something like this:

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index f487c54a0087..67474b1ae087 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6249,6 +6249,11 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
goto err;
}

+ if (!cgroup_on_dfl(dst_cgrp)) {
+ ret = -EBADF;
+ goto err;
+ }
+
if (cgroup_is_dead(dst_cgrp)) {
ret = -ENODEV;
goto err;


For reference: This is the reproducer produces by syzbot here:
| r0 = fsopen(&(0x7f0000000040)='cpuset\x00', 0x0)
| fsconfig$FSCONFIG_CMD_CREATE(r0, 0x6, 0x0, 0x0, 0x0)
| r1 = fsmount(r0, 0x0, 0x0)
| syz_clone3(&(0x7f00000009c0)={0x244000700, 0x0, 0x0, 0x0, {}, 0x0, 0x0, 0x0, 0x0, 0x0, {r1}}, 0x58)

The crash is 100% reproducible on every single run, i.e. not a
deletion race or similar.

regards Christian

>
>
> On Tue, Oct 04, 2022 at 07:19:34PM -0700, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> >
> > HEAD commit: 0326074ff465 Merge tag 'net-next-6.1' of git://git.kernel...
> > git tree: upstream
> > console+strace: https://syzkaller.appspot.com/x/log.txt?x=149c92cc880000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=e1de7ca9efcc028c
> > dashboard link: https://syzkaller.appspot.com/bug?extid=534ee3d24c37c411f37f
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10af2492880000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=104874f0880000
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/43729d6ce2fc/disk-0326074f.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/1f76d6f68eb3/vmlinux-0326074f.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
> > KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
> > CPU: 1 PID: 3617 Comm: syz-executor384 Not tainted 6.0.0-syzkaller-02734-g0326074ff465 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/22/2022
> > RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
> > RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
> > Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
> > RSP: 0018:ffffc90003c8fa30 EFLAGS: 00010206
> > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
> > RBP: ffffc90003c8fa50 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000024 R12: ffff8880224ba000
> > R13: ffff888075922000 R14: ffff88801ebf0000 R15: ffff8880211ae000
> > FS: 0000555556907300(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000020000200 CR3: 000000001c179000 CR4: 00000000003506e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > cgroup_may_write+0x86/0x120 kernel/cgroup/cgroup.c:4937
> > cgroup_css_set_fork kernel/cgroup/cgroup.c:6237 [inline]
> > cgroup_can_fork+0x961/0xec0 kernel/cgroup/cgroup.c:6331
> > copy_process+0x43ed/0x7090 kernel/fork.c:2358
> > kernel_clone+0xe7/0xab0 kernel/fork.c:2671
> > __do_sys_clone3+0x1cd/0x2e0 kernel/fork.c:2963
> > do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> > do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> > entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > RIP: 0033:0x7f621d8c3e99
> > Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> > RSP: 002b:00007ffcaa952ec8 EFLAGS: 00000206 ORIG_RAX: 00000000000001b3
> > RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f621d8c3e99
> > RDX: 0000000000000000 RSI: 0000000000000058 RDI: 00007ffcaa952f40
> > RBP: 0000000000000000 R08: 00007ffcaa952d60 R09: 00007ffcaa952ef0
> > R10: 00000000ffffffff R11: 0000000000000206 R12: 00007ffcaa952eec
> > R13: 00007ffcaa952f00 R14: 00007ffcaa952f40 R15: 0000000000000000
> > </TASK>
> > Modules linked in:
> > ---[ end trace 0000000000000000 ]---
> > RIP: 0010:kernfs_ino include/linux/kernfs.h:358 [inline]
> > RIP: 0010:kernfs_get_inode+0x2e/0x520 fs/kernfs/inode.c:254
> > Code: 41 56 41 55 41 54 49 89 fc 53 48 89 f3 e8 1a 04 7e ff 48 8d bb 90 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 3a 04 00 00 48 8b b3 90 00 00 00 4c 89 e7 e8 79
> > RSP: 0018:ffffc90003c8fa30 EFLAGS: 00010206
> > RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000012 RSI: ffffffff81fd1156 RDI: 0000000000000090
> > RBP: ffffc90003c8fa50 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000024 R12: ffff8880224ba000
> > R13: ffff888075922000 R14: ffff88801ebf0000 R15: ffff8880211ae000
> > FS: 0000555556907300(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007ffcaa9cb8f0 CR3: 000000001c179000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > ----------------
> > Code disassembly (best guess):
> > 0: 41 56 push %r14
> > 2: 41 55 push %r13
> > 4: 41 54 push %r12
> > 6: 49 89 fc mov %rdi,%r12
> > 9: 53 push %rbx
> > a: 48 89 f3 mov %rsi,%rbx
> > d: e8 1a 04 7e ff callq 0xff7e042c
> > 12: 48 8d bb 90 00 00 00 lea 0x90(%rbx),%rdi
> > 19: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
> > 20: fc ff df
> > 23: 48 89 fa mov %rdi,%rdx
> > 26: 48 c1 ea 03 shr $0x3,%rdx
> > * 2a: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1) <-- trapping instruction
> > 2e: 0f 85 3a 04 00 00 jne 0x46e
> > 34: 48 8b b3 90 00 00 00 mov 0x90(%rbx),%rsi
> > 3b: 4c 89 e7 mov %r12,%rdi
> > 3e: e8 .byte 0xe8
> > 3f: 79 .byte 0x79
> >
>
> --
> tejun

2022-10-09 09:17:06

by Christian Brauner

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode

On Sat, Oct 08, 2022 at 08:29:40PM +0200, Christian A. Ehrhardt wrote:
>
> Hi (from another Christian),
>
> On Fri, Oct 07, 2022 at 11:35:49AM -1000, Tejun Heo wrote:
> > (cc'ing Christian and quoting whole body)
> >
> > Christan, I can't repro it here but think what we need is sth like the
> > following. The problem is that cgroup_is_dead() check in the fork path isn't
> > enough as the perm check depends on cgrp->procs_file being available but
> > that is cleared while DYING before DEAD. So, depending on the timing, we can
> > end up trying to deref NULL pointer in may_write.
> >
> > diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> > index 8ad2c267ff471..603b7167450a1 100644
> > --- a/kernel/cgroup/cgroup.c
> > +++ b/kernel/cgroup/cgroup.c
> > @@ -4934,6 +4934,10 @@ static int cgroup_may_write(const struct cgroup *cgrp, struct super_block *sb)
> >
> > lockdep_assert_held(&cgroup_mutex);
> >
> > + /*if @cgrp is being removed, procs_file may already be gone */
> > + if (!cgrp->procs_file.kn)
> > + return -ENODEV;
> > +
> > inode = kernfs_get_inode(sb, cgrp->procs_file.kn);
> > if (!inode)
> > return -ENOMEM;
>
> I had syzbot hit the same crash here and as can be seen from the
> reproducer the root cause is that the target cgroup (specified
> via CLONE_INTO_CGROUP) is a version 1 cgroup. These cgroups just
> don't initialize ->procs_file.kn.
>
> This is a regression from v6.0 caused by
>
> f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")

Yeah, this patch is wrong in its simple form and definitely breaks CLONE_INTO_CGROUP.
CLONE_INTO_CGROUP can only work with cgroup2 fds. It absolutely cannot
work with cgroup1 fds. The semantics would be terrible as controllers
can be mounted into separate hierarchies.

>
> Unless we want to revert this patch the correct fix might be
> something like this:
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index f487c54a0087..67474b1ae087 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6249,6 +6249,11 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
> goto err;
> }
>
> + if (!cgroup_on_dfl(dst_cgrp)) {
> + ret = -EBADF;
> + goto err;
> + }

That seems like a good enough patch to me.

2022-10-09 13:37:17

by Christian A. Ehrhardt

[permalink] [raw]
Subject: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups


Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
error when used with CLONE_INTO_CGROUP. However, the permission
checks performed during clone assume a Version 2 cgroup.

Restore the error check for V1 cgroups in the clone() path.

Reported-by: [email protected]
Link: https://lore.kernel.org/lkml/[email protected]/
Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
Signed-off-by: Christian A. Ehrhardt <[email protected]>
---
kernel/cgroup/cgroup.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b6e3110b3ea7..f7fc3afa88c1 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6244,6 +6244,11 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
goto err;
}

+ if (!cgroup_on_dfl(dst_cgrp)) {
+ ret = -EBADF;
+ goto err;
+ }
+
if (cgroup_is_dead(dst_cgrp)) {
ret = -ENODEV;
goto err;
--
2.34.1

2022-10-09 17:42:14

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On Sun, Oct 09, 2022 at 03:10:36PM +0200, Christian A. Ehrhardt wrote:
>
> Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
> error when used with CLONE_INTO_CGROUP. However, the permission
> checks performed during clone assume a Version 2 cgroup.
>
> Restore the error check for V1 cgroups in the clone() path.
>
> Reported-by: [email protected]
> Link: https://lore.kernel.org/lkml/[email protected]/
> Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
> Signed-off-by: Christian A. Ehrhardt <[email protected]>
> ---

Thanks for fixing this,
Reviewed-by: Christian Brauner (Microsoft) <[email protected]>

Fwiw, @Tejun, after
f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
that non-me-Christian fixes with this patch cgroup_get_from_file() looks
a bit odd. It could use sm like:

From 0a8a5f7acbec0385ec16cd93a9cff2486fae0ecb Mon Sep 17 00:00:00 2001
From: Christian Brauner <[email protected]>
Date: Sun, 9 Oct 2022 19:33:21 +0200
Subject: [PATCH] cgroup: remove unneeded local variable

Since commit f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
the assigment to @cgrp has become pointless.

Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
---
kernel/cgroup/cgroup.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 8ad2c267ff47..550e7bd96b76 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6160,14 +6160,12 @@ void cgroup_fork(struct task_struct *child)
static struct cgroup *cgroup_get_from_file(struct file *f)
{
struct cgroup_subsys_state *css;
- struct cgroup *cgrp;

css = css_tryget_online_from_dir(f->f_path.dentry, NULL);
if (IS_ERR(css))
return ERR_CAST(css);

- cgrp = css->cgroup;
- return cgrp;
+ return css->cgroup;
}

/**
--
2.34.1

2022-10-09 18:38:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On Sun, Oct 09, 2022 at 07:35:19PM +0200, Christian Brauner wrote:
> On Sun, Oct 09, 2022 at 03:10:36PM +0200, Christian A. Ehrhardt wrote:
> >
> > Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
> > error when used with CLONE_INTO_CGROUP. However, the permission
> > checks performed during clone assume a Version 2 cgroup.
> >
> > Restore the error check for V1 cgroups in the clone() path.
> >
> > Reported-by: [email protected]
> > Link: https://lore.kernel.org/lkml/[email protected]/
> > Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
> > Signed-off-by: Christian A. Ehrhardt <[email protected]>
> > ---
>
> Thanks for fixing this,
> Reviewed-by: Christian Brauner (Microsoft) <[email protected]>

No cc: stable? :(

2022-10-09 18:49:31

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On Sun, Oct 9, 2022 at 6:10 AM Christian A. Ehrhardt <[email protected]> wrote:
>
>
> Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
> error when used with CLONE_INTO_CGROUP. However, the permission
> checks performed during clone assume a Version 2 cgroup.
>
> Restore the error check for V1 cgroups in the clone() path.
>
> Reported-by: [email protected]
> Link: https://lore.kernel.org/lkml/[email protected]/
> Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")

Thanks for fixing this, and sorry if this caused a mess.

cgroup_get_from_file() independently seemed like it can support
cgroup1, I didn't realize that some of the callers depend on the fact
that it only supports cgroup2.

+Andrii Nakryiko +Alexei Starovoitov +Martin KaFai Lau +bpf
I wonder if BPF users have this dependency. Does cgroup_bpf_attach()
also depend on cgroup_get_from_fd() (which calls
cgroup_get_from_file()) eliminating v1 cgroups?

It seems like cgroup storages (and some other places) use cgroup ids.
Collisions can happen in cgroup1 ids so I am assuming we want to add a
check there as well. Perhaps in cgroup_bpf_attach() ?

I can send a patch for this if that's the case.

> Signed-off-by: Christian A. Ehrhardt <[email protected]>
> ---
> kernel/cgroup/cgroup.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index b6e3110b3ea7..f7fc3afa88c1 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -6244,6 +6244,11 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
> goto err;
> }
>
> + if (!cgroup_on_dfl(dst_cgrp)) {
> + ret = -EBADF;
> + goto err;
> + }
> +
> if (cgroup_is_dead(dst_cgrp)) {
> ret = -ENODEV;
> goto err;
> --
> 2.34.1
>

2022-10-10 18:49:16

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

Hello,

On Sun, Oct 09, 2022 at 03:10:36PM +0200, Christian A. Ehrhardt wrote:
>
> Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
> error when used with CLONE_INTO_CGROUP. However, the permission
> checks performed during clone assume a Version 2 cgroup.
>
> Restore the error check for V1 cgroups in the clone() path.
>
> Reported-by: [email protected]
> Link: https://lore.kernel.org/lkml/[email protected]/
> Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
> Signed-off-by: Christian A. Ehrhardt <[email protected]>

This feels too error prone. I'd rather revert the original commit. Yosry,
imma revert f3a2aebdd6. Can you please add a separate function which allows
looking up IDs for cgroup1 hierarchies if absolutely necessary? But,
frankly, given how inherently confusing using IDs for cgroup1 hierarchies is
(fd for cgroup1 identifies both the hierarchy and the cgroup, id is
inherently partial which is super confusing), I'd rather just not do it.

Thanks.

--
tejun

2022-10-10 18:56:13

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On Sun, Oct 09, 2022 at 08:16:29PM +0200, Greg KH wrote:
> On Sun, Oct 09, 2022 at 07:35:19PM +0200, Christian Brauner wrote:
> > On Sun, Oct 09, 2022 at 03:10:36PM +0200, Christian A. Ehrhardt wrote:
> > >
> > > Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
> > > error when used with CLONE_INTO_CGROUP. However, the permission
> > > checks performed during clone assume a Version 2 cgroup.
> > >
> > > Restore the error check for V1 cgroups in the clone() path.
> > >
> > > Reported-by: [email protected]
> > > Link: https://lore.kernel.org/lkml/[email protected]/
> > > Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
> > > Signed-off-by: Christian A. Ehrhardt <[email protected]>
> > > ---
> >
> > Thanks for fixing this,
> > Reviewed-by: Christian Brauner (Microsoft) <[email protected]>
>
> No cc: stable? :(

This got merged this cycle, so no need for -stable.

Thanks.

--
tejun

2022-10-10 19:24:27

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On 10/9/22 11:42 AM, Yosry Ahmed wrote:
> On Sun, Oct 9, 2022 at 6:10 AM Christian A. Ehrhardt <[email protected]> wrote:
>>
>>
>> Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
>> error when used with CLONE_INTO_CGROUP. However, the permission
>> checks performed during clone assume a Version 2 cgroup.
>>
>> Restore the error check for V1 cgroups in the clone() path.
>>
>> Reported-by: [email protected]
>> Link: https://lore.kernel.org/lkml/[email protected]/
>> Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
>
> Thanks for fixing this, and sorry if this caused a mess.
>
> cgroup_get_from_file() independently seemed like it can support
> cgroup1, I didn't realize that some of the callers depend on the fact
> that it only supports cgroup2.
>
> +Andrii Nakryiko +Alexei Starovoitov +Martin KaFai Lau +bpf
> I wonder if BPF users have this dependency. Does cgroup_bpf_attach()
> also depend on cgroup_get_from_fd() (which calls
> cgroup_get_from_file()) eliminating v1 cgroups?

Yes, cgroup_bpf_{prog,link}_attach() depends on cgroup_get_from_fd() only
returning v2 cgroup. Thus, it needs a fix to get back this filtering after
commit f3a2aebdd6.


>
> It seems like cgroup storages (and some other places) use cgroup ids.
> Collisions can happen in cgroup1 ids so I am assuming we want to add a
> check there as well. Perhaps in cgroup_bpf_attach() ?

From a quick look, cgroup storage should be fine. The insertion (where the
cgrp is required for the key purpose) has already passed the attach filtering.

Where 'other places' might have problem with the cgroup id?

>
> I can send a patch for this if that's the case.
>
>> Signed-off-by: Christian A. Ehrhardt <[email protected]>
>> ---
>> kernel/cgroup/cgroup.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>> index b6e3110b3ea7..f7fc3afa88c1 100644
>> --- a/kernel/cgroup/cgroup.c
>> +++ b/kernel/cgroup/cgroup.c
>> @@ -6244,6 +6244,11 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
>> goto err;
>> }
>>
>> + if (!cgroup_on_dfl(dst_cgrp)) {
>> + ret = -EBADF;
>> + goto err;
>> + }
>> +
>> if (cgroup_is_dead(dst_cgrp)) {
>> ret = -ENODEV;
>> goto err;
>> --
>> 2.34.1
>>

2022-10-10 19:24:47

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On Mon, Oct 10, 2022 at 11:44 AM Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Sun, Oct 09, 2022 at 03:10:36PM +0200, Christian A. Ehrhardt wrote:
> >
> > Since commit f3a2aebdd6, Version 1 cgroups no longer cause an
> > error when used with CLONE_INTO_CGROUP. However, the permission
> > checks performed during clone assume a Version 2 cgroup.
> >
> > Restore the error check for V1 cgroups in the clone() path.
> >
> > Reported-by: [email protected]
> > Link: https://lore.kernel.org/lkml/[email protected]/
> > Fixes: f3a2aebdd6 ("cgroup: enable cgroup_get_from_file() on cgroup1")
> > Signed-off-by: Christian A. Ehrhardt <[email protected]>
>
> This feels too error prone. I'd rather revert the original commit. Yosry,
> imma revert f3a2aebdd6. Can you please add a separate function which allows
> looking up IDs for cgroup1 hierarchies if absolutely necessary? But,
> frankly, given how inherently confusing using IDs for cgroup1 hierarchies is
> (fd for cgroup1 identifies both the hierarchy and the cgroup, id is
> inherently partial which is super confusing), I'd rather just not do it.

The purpose of f3a2aebdd6 was to make cgroup_get_from_fd() support
cgroup1, which IIUC makes sense. It was unrelated to IDs.

There are currently two users of
cgroup_get_from_file()/cgroup_get_from_fd() AFAICT, one of which is
the fork code fixed by this commit, the second is BPF cgroup prog
attachment. I can send another patch to add explicit filtering in the
BPF attachment code as well.

Alternatively, we can have separate functions that do the filtering if
needed. For example:
cgroup_get_from_fd() / cgroup_get_from_file() -> support both v1 and v2
cgroup_get_dfl_from_fd() / cgroup_get_dfl_from_file() -> support only v2

We can then use the versions with filtering for all the current users
except cgroup_iter (that needs to support both v1 and v2). WDYT?

>
> Thanks.
>
> --
> tejun

2022-10-10 20:01:40

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

Hello,

On Mon, Oct 10, 2022 at 11:50:50AM -0700, Yosry Ahmed wrote:
> The purpose of f3a2aebdd6 was to make cgroup_get_from_fd() support
> cgroup1, which IIUC makes sense. It was unrelated to IDs.

Ah, right you are. For some reason, I thought this was ID based.

> There are currently two users of
> cgroup_get_from_file()/cgroup_get_from_fd() AFAICT, one of which is
> the fork code fixed by this commit, the second is BPF cgroup prog
> attachment. I can send another patch to add explicit filtering in the
> BPF attachment code as well.
>
> Alternatively, we can have separate functions that do the filtering if
> needed. For example:
> cgroup_get_from_fd() / cgroup_get_from_file() -> support both v1 and v2
> cgroup_get_dfl_from_fd() / cgroup_get_dfl_from_file() -> support only v2
>
> We can then use the versions with filtering for all the current users
> except cgroup_iter (that needs to support both v1 and v2). WDYT?

Yes, but please leave the existing ones v2 only and add new ones which
allows cgroup1 too.

Thanks.

--
tejun

2022-10-10 20:16:09

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On Mon, Oct 10, 2022 at 12:51 PM Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Mon, Oct 10, 2022 at 11:50:50AM -0700, Yosry Ahmed wrote:
> > The purpose of f3a2aebdd6 was to make cgroup_get_from_fd() support
> > cgroup1, which IIUC makes sense. It was unrelated to IDs.
>
> Ah, right you are. For some reason, I thought this was ID based.
>
> > There are currently two users of
> > cgroup_get_from_file()/cgroup_get_from_fd() AFAICT, one of which is
> > the fork code fixed by this commit, the second is BPF cgroup prog
> > attachment. I can send another patch to add explicit filtering in the
> > BPF attachment code as well.
> >
> > Alternatively, we can have separate functions that do the filtering if
> > needed. For example:
> > cgroup_get_from_fd() / cgroup_get_from_file() -> support both v1 and v2
> > cgroup_get_dfl_from_fd() / cgroup_get_dfl_from_file() -> support only v2
> >
> > We can then use the versions with filtering for all the current users
> > except cgroup_iter (that needs to support both v1 and v2). WDYT?
>
> Yes, but please leave the existing ones v2 only and add new ones which
> allows cgroup1 too.

Any suggestions for the new names though? Given that the new ones
would be the ones that will support both versions, I am not sure how
to name them differently in a meaningful way. Maybe something like
cgroup_get_all_from_[fd/file]() ?

IMO, we can rename the current versions to
cgroup_get_dfl_from_[fd/file](), and the new ones (which support both)
as cgroup_get_from_fd/file(). In this case it's obvious that one
version specifically works on "dfl", aka cgroup2.

Alternatively, we can have separate versions for v1,
cgroup1_get_from_[fd/file](), but then callers that want both v1 and
v2 will have to make 2 separate calls unnecessarily.

Thoughts?

>
> Thanks.
>
> --
> tejun

2022-10-10 20:16:28

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

Hello,

On Mon, Oct 10, 2022 at 12:57:34PM -0700, Yosry Ahmed wrote:
> Any suggestions for the new names though? Given that the new ones
> would be the ones that will support both versions, I am not sure how
> to name them differently in a meaningful way. Maybe something like
> cgroup_get_all_from_[fd/file]() ?

idk, cgroup12_get_from_[fd/file]()?

> IMO, we can rename the current versions to
> cgroup_get_dfl_from_[fd/file](), and the new ones (which support both)
> as cgroup_get_from_fd/file(). In this case it's obvious that one
> version specifically works on "dfl", aka cgroup2.

It's kinda confusing because we've been assuming that these lookup functions
are all cgroup2 only.

Thanks.

--
tejun

2022-10-10 21:25:41

by Yosry Ahmed

[permalink] [raw]
Subject: Re: [PATCH] cgroup: Fix crash with CLONE_INTO_CGROUP and v1 cgroups

On Mon, Oct 10, 2022 at 1:07 PM Tejun Heo <[email protected]> wrote:
>
> Hello,
>
> On Mon, Oct 10, 2022 at 12:57:34PM -0700, Yosry Ahmed wrote:
> > Any suggestions for the new names though? Given that the new ones
> > would be the ones that will support both versions, I am not sure how
> > to name them differently in a meaningful way. Maybe something like
> > cgroup_get_all_from_[fd/file]() ?
>
> idk, cgroup12_get_from_[fd/file]()?
>
> > IMO, we can rename the current versions to
> > cgroup_get_dfl_from_[fd/file](), and the new ones (which support both)
> > as cgroup_get_from_fd/file(). In this case it's obvious that one
> > version specifically works on "dfl", aka cgroup2.
>
> It's kinda confusing because we've been assuming that these lookup functions
> are all cgroup2 only.
>
> Thanks.

Got it. Will send a patch shortly. Thanks!

>
> --
> tejun

2022-11-17 08:21:59

by syzbot

[permalink] [raw]
Subject: Re: [syzbot] general protection fault in kernfs_get_inode

syzbot suspects this issue was fixed by commit:

commit c6a7f445a2727a66fe68a7097f42698d8b31ea2c
Author: Yang Shi <[email protected]>
Date: Wed Jul 6 23:59:20 2022 +0000

mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=179cadcd880000
start commit: 55be6084c8e0 Merge tag 'timers-core-2022-10-05' of git://g..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=df75278aabf0681a
dashboard link: https://syzkaller.appspot.com/bug?extid=534ee3d24c37c411f37f
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=150adc52880000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=149d9584880000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: mm: khugepaged: don't carry huge page to the next loop for !CONFIG_NUMA

For information about bisection process see: https://goo.gl/tpsmEJ#bisection