Hello,
syzbot found the following issue on:
HEAD commit: 23d04328444a Merge tag 'for-5.17/parisc-4' of git://git.ke..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1614a812700000
kernel config: https://syzkaller.appspot.com/x/.config?x=f2a8c25b60d49d24
dashboard link: https://syzkaller.appspot.com/bug?extid=5e28cdb7ebd0f2389ca4
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
kmem_cache_destroy 9p-fcall-cache: Slab cache still has objects when called from p9_client_destroy+0x213/0x370 net/9p/client.c:1100
WARNING: CPU: 1 PID: 3701 at mm/slab_common.c:502 kmem_cache_destroy mm/slab_common.c:502 [inline]
WARNING: CPU: 1 PID: 3701 at mm/slab_common.c:502 kmem_cache_destroy+0x13b/0x140 mm/slab_common.c:490
Modules linked in:
CPU: 1 PID: 3701 Comm: syz-executor.3 Not tainted 5.17.0-rc5-syzkaller-00021-g23d04328444a #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:kmem_cache_destroy mm/slab_common.c:502 [inline]
RIP: 0010:kmem_cache_destroy+0x13b/0x140 mm/slab_common.c:490
Code: da a8 0e 48 89 ee e8 44 6e 15 00 eb c1 c3 48 8b 55 58 48 c7 c6 60 cd b6 89 48 c7 c7 30 83 3a 8b 48 8b 4c 24 18 e8 9b 30 60 07 <0f> 0b eb a0 90 41 55 49 89 d5 41 54 49 89 f4 55 48 89 fd 53 48 83
RSP: 0018:ffffc90002767cf0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 1ffff920004ecfa5 RCX: 0000000000000000
RDX: ffff88801e56a280 RSI: ffffffff815f4b38 RDI: fffff520004ecf90
RBP: ffff888020ba8b00 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ef1ce R11: 0000000000000000 R12: 0000000000000001
R13: ffffc90002767d68 R14: dffffc0000000000 R15: 0000000000000000
FS: 00005555561b0400(0000) GS:ffff88802ca00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555556ead708 CR3: 0000000068b97000 CR4: 0000000000150ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
p9_client_destroy+0x213/0x370 net/9p/client.c:1100
v9fs_session_close+0x45/0x2d0 fs/9p/v9fs.c:504
v9fs_kill_super+0x49/0x90 fs/9p/vfs_super.c:226
deactivate_locked_super+0x94/0x160 fs/super.c:332
deactivate_super+0xad/0xd0 fs/super.c:363
cleanup_mnt+0x3a2/0x540 fs/namespace.c:1173
task_work_run+0xdd/0x1a0 kernel/task_work.c:164
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
__syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f5ff63ed4c7
Code: ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff01862e98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f5ff63ed4c7
RDX: 00007fff01862f6c RSI: 000000000000000a RDI: 00007fff01862f60
RBP: 00007fff01862f60 R08: 00000000ffffffff R09: 00007fff01862d30
R10: 00005555561b18b3 R11: 0000000000000246 R12: 00007f5ff64451ea
R13: 00007fff01864020 R14: 00005555561b1810 R15: 00007fff01864060
</TASK>
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot wrote on Sun, Feb 27, 2022 at 04:53:29PM -0800:
> kmem_cache_destroy 9p-fcall-cache: Slab cache still has objects when
> called from p9_client_destroy+0x213/0x370 net/9p/client.c:1100
hmm, there is no previous "Packet with tag %d has still references"
(sic) message, so this is probably because p9_tag_cleanup only relies on
rcu read lock for consistency, so even if the connection has been closed
above (clnt->trans_mode->close) there could have been a request sent
(= tag added) just before that which isn't visible on the destroying
side?
I guess adding an rcu_barrier() is what makes most sense here to protect
this case?
I'll send a patch in the next few days unless it was a stupid idea.
--
Dominique
Sorry, got to resend this in plain text. It doesn't look like it is
getting through to the mailing lists.
On Thu, Mar 24, 2022 at 3:13 PM David Kahurani <[email protected]> wrote:
>
> On Monday, February 28, 2022 at 4:38:57 AM UTC+3 [email protected] wrote:
>>
>> syzbot wrote on Sun, Feb 27, 2022 at 04:53:29PM -0800:
>> > kmem_cache_destroy 9p-fcall-cache: Slab cache still has objects when
>> > called from p9_client_destroy+0x213/0x370 net/9p/client.c:1100
>>
>> hmm, there is no previous "Packet with tag %d has still references"
>> (sic) message, so this is probably because p9_tag_cleanup only relies on
>> rcu read lock for consistency, so even if the connection has been closed
>> above (clnt->trans_mode->close) there could have been a request sent
>> (= tag added) just before that which isn't visible on the destroying
>> side?
>>
>> I guess adding an rcu_barrier() is what makes most sense here to protect
>> this case?
>> I'll send a patch in the next few days unless it was a stupid idea.
>
>
> Looking at this brought me to the same conclusion.
>
> ---------------------
>
> From cd5a11207a140004bf55005fac7f7e4cec2fd075 Mon Sep 17 00:00:00 2001
> From: David Kahurani <[email protected]>
> Date: Thu, 24 Mar 2022 15:00:23 +0300
> Subject: [PATCH] net/9p: Flush any delayed rce free
>
> As is best practice
>
> kmem_cache_destroy 9p-fcall-cache: Slab cache still has objects when called from p9_client_destroy+0x213/0x370 net/9p/client.c:1100
> WARNING: CPU: 1 PID: 3701 at mm/slab_common.c:502 kmem_cache_destroy mm/slab_common.c:502 [inline]
> WARNING: CPU: 1 PID: 3701 at mm/slab_common.c:502 kmem_cache_destroy+0x13b/0x140 mm/slab_common.c:490
> Modules linked in:
> CPU: 1 PID: 3701 Comm: syz-executor.3 Not tainted 5.17.0-rc5-syzkaller-00021-g23d04328444a #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:kmem_cache_destroy mm/slab_common.c:502 [inline]
> RIP: 0010:kmem_cache_destroy+0x13b/0x140 mm/slab_common.c:490
> Code: da a8 0e 48 89 ee e8 44 6e 15 00 eb c1 c3 48 8b 55 58 48 c7 c6 60 cd b6 89 48 c7 c7 30 83 3a 8b 48 8b 4c 24 18 e8 9b 30 60 07 <0f> 0b eb a0 90 41 55 49 89 d5 41 54 49 89 f4 55 48 89 fd 53 48 83
> RSP: 0018:ffffc90002767cf0 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 1ffff920004ecfa5 RCX: 0000000000000000
> RDX: ffff88801e56a280 RSI: ffffffff815f4b38 RDI: fffff520004ecf90
> RBP: ffff888020ba8b00 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff815ef1ce R11: 0000000000000000 R12: 0000000000000001
> R13: ffffc90002767d68 R14: dffffc0000000000 R15: 0000000000000000
> FS: 00005555561b0400(0000) GS:ffff88802ca00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555556ead708 CR3: 0000000068b97000 CR4: 0000000000150ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> p9_client_destroy+0x213/0x370 net/9p/client.c:1100
> v9fs_session_close+0x45/0x2d0 fs/9p/v9fs.c:504
> v9fs_kill_super+0x49/0x90 fs/9p/vfs_super.c:226
> deactivate_locked_super+0x94/0x160 fs/super.c:332
> deactivate_super+0xad/0xd0 fs/super.c:363
> cleanup_mnt+0x3a2/0x540 fs/namespace.c:1173
> task_work_run+0xdd/0x1a0 kernel/task_work.c:164
> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
> exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
> __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
> syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
> do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f5ff63ed4c7
> Code: ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff01862e98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f5ff63ed4c7
> RDX: 00007fff01862f6c RSI: 000000000000000a RDI: 00007fff01862f60
> RBP: 00007fff01862f60 R08: 00000000ffffffff R09: 00007fff01862d30
> R10: 00005555561b18b3 R11: 0000000000000246 R12: 00007f5ff64451ea
> R13: 00007fff01864020 R14: 00005555561b1810 R15: 00007fff01864060
> </TASK>
>
> Signed-off-by: David Kahurani <[email protected]>
> Reported-by: [email protected]
> ---
> net/9p/client.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 8bba0d9cf..67c51913a 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -1097,6 +1097,7 @@ void p9_client_destroy(struct p9_client *clnt)
>
> p9_tag_cleanup(clnt);
>
> + rcu_barrier();
> kmem_cache_destroy(clnt->fcall_cache);
> kfree(clnt);
> }
> --
> 2.25.1
>
>
On Donnerstag, 24. M?rz 2022 13:13:25 CET David Kahurani wrote:
> On Monday, February 28, 2022 at 4:38:57 AM UTC+3 [email protected]
>
> wrote:
> > syzbot wrote on Sun, Feb 27, 2022 at 04:53:29PM -0800:
> > > kmem_cache_destroy 9p-fcall-cache: Slab cache still has objects when
> > > called from p9_client_destroy+0x213/0x370 net/9p/client.c:1100
> >
> > hmm, there is no previous "Packet with tag %d has still references"
> > (sic) message, so this is probably because p9_tag_cleanup only relies on
> > rcu read lock for consistency, so even if the connection has been closed
> > above (clnt->trans_mode->close) there could have been a request sent
> > (= tag added) just before that which isn't visible on the destroying
> > side?
> >
> > I guess adding an rcu_barrier() is what makes most sense here to protect
> > this case?
> > I'll send a patch in the next few days unless it was a stupid idea.
>
> Looking at this brought me to the same conclusion.
>
> ---------------------
>
> From cd5a11207a140004bf55005fac7f7e4cec2fd075 Mon Sep 17 00:00:00 2001
> From: David Kahurani <[email protected]>
> Date: Thu, 24 Mar 2022 15:00:23 +0300
> Subject: [PATCH] net/9p: Flush any delayed rce free
>
> As is best practice
>
> kmem_cache_destroy 9p-fcall-cache: Slab cache still has objects when called
> from p9_client_destroy+0x213/0x370 net/9p/client.c:1100
> WARNING: CPU: 1 PID: 3701 at mm/slab_common.c:502 kmem_cache_destroy
> mm/slab_common.c:502 [inline]
> WARNING: CPU: 1 PID: 3701 at mm/slab_common.c:502
> kmem_cache_destroy+0x13b/0x140 mm/slab_common.c:490
> Modules linked in:
> CPU: 1 PID: 3701 Comm: syz-executor.3 Not tainted
> 5.17.0-rc5-syzkaller-00021-g23d04328444a #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:kmem_cache_destroy mm/slab_common.c:502 [inline]
> RIP: 0010:kmem_cache_destroy+0x13b/0x140 mm/slab_common.c:490
> Code: da a8 0e 48 89 ee e8 44 6e 15 00 eb c1 c3 48 8b 55 58 48 c7 c6 60 cd
> b6 89 48 c7 c7 30 83 3a 8b 48 8b 4c 24 18 e8 9b 30 60 07 <0f> 0b eb a0 90
> 41 55 49 89 d5 41 54 49 89 f4 55 48 89 fd 53 48 83
> RSP: 0018:ffffc90002767cf0 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: 1ffff920004ecfa5 RCX: 0000000000000000
> RDX: ffff88801e56a280 RSI: ffffffff815f4b38 RDI: fffff520004ecf90
> RBP: ffff888020ba8b00 R08: 0000000000000000 R09: 0000000000000000
> R10: ffffffff815ef1ce R11: 0000000000000000 R12: 0000000000000001
> R13: ffffc90002767d68 R14: dffffc0000000000 R15: 0000000000000000
> FS: 00005555561b0400(0000) GS:ffff88802ca00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555556ead708 CR3: 0000000068b97000 CR4: 0000000000150ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> p9_client_destroy+0x213/0x370 net/9p/client.c:1100
> v9fs_session_close+0x45/0x2d0 fs/9p/v9fs.c:504
> v9fs_kill_super+0x49/0x90 fs/9p/vfs_super.c:226
> deactivate_locked_super+0x94/0x160 fs/super.c:332
> deactivate_super+0xad/0xd0 fs/super.c:363
> cleanup_mnt+0x3a2/0x540 fs/namespace.c:1173
> task_work_run+0xdd/0x1a0 kernel/task_work.c:164
> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> exit_to_user_mode_loop kernel/entry/common.c:175 [inline]
> exit_to_user_mode_prepare+0x27e/0x290 kernel/entry/common.c:207
> __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
> syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
> do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f5ff63ed4c7
> Code: ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09
> 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff
> ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff01862e98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f5ff63ed4c7
> RDX: 00007fff01862f6c RSI: 000000000000000a RDI: 00007fff01862f60
> RBP: 00007fff01862f60 R08: 00000000ffffffff R09: 00007fff01862d30
> R10: 00005555561b18b3 R11: 0000000000000246 R12: 00007f5ff64451ea
> R13: 00007fff01864020 R14: 00005555561b1810 R15: 00007fff01864060
> </TASK>
>
> Signed-off-by: David Kahurani <[email protected]>
> Reported-by: [email protected]
I'm not absolutely sure that this will really fix this issue, but it seems to
be a good idea to add a rcu_barrier() call here nevertheless.
Reviewed-by: Christian Schoenebeck <[email protected]>
> ---
> net/9p/client.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 8bba0d9cf..67c51913a 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -1097,6 +1097,7 @@ void p9_client_destroy(struct p9_client *clnt)
>
> p9_tag_cleanup(clnt);
>
> + rcu_barrier();
> kmem_cache_destroy(clnt->fcall_cache);
> kfree(clnt);
> }
Christian Schoenebeck wrote on Sat, Mar 26, 2022 at 12:48:26PM +0100:
> [...]
>
> > Signed-off-by: David Kahurani <[email protected]>
> > Reported-by: [email protected]
Looks good to me - it's pretty much what I'd have done if I hadn't
forgotten!
It doesn't strike me as anything critical and I don't have anything else
for this cycle so I'll just queue it in -next for now, and submit it
at the start of the 5.19 cycle in ~2months.
> I'm not absolutely sure that this will really fix this issue, but it seems to
> be a good idea to add a rcu_barrier() call here nevertheless.
Yeah, I'm not really sure either but this is the only idea I have given
the debug code doesn't list anything left in the cache, and David came
to the same conclusion :/
Can't hurt though, so let's try and see if syzbot complains
again. Thanks for the review!
--
Dominique
On Samstag, 26. M?rz 2022 13:24:10 CET [email protected] wrote:
> Christian Schoenebeck wrote on Sat, Mar 26, 2022 at 12:48:26PM +0100:
> > [...]
> >
> > > Signed-off-by: David Kahurani <[email protected]>
> > > Reported-by: [email protected]
>
> Looks good to me - it's pretty much what I'd have done if I hadn't
> forgotten!
> It doesn't strike me as anything critical and I don't have anything else
> for this cycle so I'll just queue it in -next for now, and submit it
> at the start of the 5.19 cycle in ~2months.
BTW, another issue that I am seeing for a long time affects the fs-cache: When
I use cache=mmap then things seem to be harmless, I periodically see messages
like these, but that's about it:
[90763.435562] FS-Cache: Duplicate cookie detected
[90763.436514] FS-Cache: O-cookie c=00dcb42f [p=00000003 fl=216 nc=0 na=0]
[90763.437795] FS-Cache: O-cookie d=0000000000000000{?} n=0000000000000000
[90763.440096] FS-Cache: O-key=[8] 'a7ab2c0000000000'
[90763.441656] FS-Cache: N-cookie c=00dcb4a7 [p=00000003 fl=2 nc=0 na=1]
[90763.446753] FS-Cache: N-cookie d=000000005b583d5a{9p.inode} n=00000000212184fb
[90763.448196] FS-Cache: N-key=[8] 'a7ab2c0000000000'
The real trouble starts when I use cache=loose though, in this case I get all
sorts of misbehaviours from time to time, especially complaining about invalid
file descriptors.
Any clues?
Best regards,
Christian Schoenebeck
syzbot has found a reproducer for the following issue on:
HEAD commit: cb71b93c2dc3 Add linux-next specific files for 20220628
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=106a4022080000
kernel config: https://syzkaller.appspot.com/x/.config?x=badbc1adb2d582eb
dashboard link: https://syzkaller.appspot.com/bug?extid=5e28cdb7ebd0f2389ca4
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=156f74ee080000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
------------[ cut here ]------------
kmem_cache_destroy 9p-fcall-cache: Slab cache still has objects when called from p9_client_destroy+0x213/0x370 net/9p/client.c:1100
WARNING: CPU: 0 PID: 3687 at mm/slab_common.c:505 kmem_cache_destroy mm/slab_common.c:505 [inline]
WARNING: CPU: 0 PID: 3687 at mm/slab_common.c:505 kmem_cache_destroy+0x138/0x140 mm/slab_common.c:493
Modules linked in:
CPU: 1 PID: 3687 Comm: syz-executor.0 Not tainted 5.19.0-rc4-next-20220628-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
RIP: 0010:kmem_cache_destroy mm/slab_common.c:505 [inline]
RIP: 0010:kmem_cache_destroy+0x138/0x140 mm/slab_common.c:493
Code: 95 18 00 48 89 ef e8 07 96 18 00 eb cc c3 48 8b 55 60 48 c7 c6 80 da d7 89 48 c7 c7 88 e8 61 8b 48 8b 4c 24 18 e8 f2 3a 86 07 <0f> 0b eb ab 0f 1f 40 00 41 56 41 89 d6 41 55 49 89 f5 41 54 49 89
RSP: 0018:ffffc900034efcf0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 1ffff9200069dfa5 RCX: 0000000000000000
RDX: ffff88807513ba80 RSI: ffffffff81610608 RDI: fffff5200069df90
RBP: ffff88801f0cc8c0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000000 R11: 0000000000000001 R12: 0000000000000001
R13: ffffc900034efd68 R14: dffffc0000000000 R15: 0000000000000000
FS: 0000555556019400(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe57b1fe718 CR3: 00000000728bc000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
p9_client_destroy+0x213/0x370 net/9p/client.c:1100
v9fs_session_close+0x45/0x2d0 fs/9p/v9fs.c:504
v9fs_kill_super+0x49/0x90 fs/9p/vfs_super.c:226
deactivate_locked_super+0x94/0x160 fs/super.c:332
deactivate_super+0xad/0xd0 fs/super.c:363
cleanup_mnt+0x3a2/0x540 fs/namespace.c:1186
task_work_run+0xdd/0x1a0 kernel/task_work.c:177
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:169 [inline]
exit_to_user_mode_prepare+0x23c/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fe57ba8a677
Code: ff ff ff f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff19aa4578 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fe57ba8a677
RDX: 00007fff19aa464c RSI: 000000000000000a RDI: 00007fff19aa4640
RBP: 00007fff19aa4640 R08: 00000000ffffffff R09: 00007fff19aa4410
R10: 000055555601a8b3 R11: 0000000000000246 R12: 00007fe57bae22a6
R13: 00007fff19aa5700 R14: 000055555601a810 R15: 00007fff19aa5740
</TASK>
syzbot has bisected this issue to:
commit 7302e91f39a81a9c2efcf4bc5749d18128366945
Author: Marco Elver <[email protected]>
Date: Fri Jan 14 22:03:58 2022 +0000
mm/slab_common: use WARN() if cache still has objects on destroy
bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=142882ce080000
start commit: cb71b93c2dc3 Add linux-next specific files for 20220628
git tree: linux-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=162882ce080000
console output: https://syzkaller.appspot.com/x/log.txt?x=122882ce080000
kernel config: https://syzkaller.appspot.com/x/.config?x=badbc1adb2d582eb
dashboard link: https://syzkaller.appspot.com/bug?extid=5e28cdb7ebd0f2389ca4
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=156f74ee080000
Reported-by: [email protected]
Fixes: 7302e91f39a8 ("mm/slab_common: use WARN() if cache still has objects on destroy")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Hello,
syzbot tried to test the proposed patch but the build/boot failed:
./include/net/9p/client.h:128:13: error: duplicate member 'reqs'
net/9p/client.c:288:19: error: passing argument 1 of 'idr_alloc' from incompatible pointer type [-Werror=incompatible-pointer-types]
net/9p/client.c:291:19: error: passing argument 1 of 'idr_alloc' from incompatible pointer type [-Werror=incompatible-pointer-types]
net/9p/client.c:336:17: error: passing argument 1 of 'idr_find' from incompatible pointer type [-Werror=incompatible-pointer-types]
net/9p/client.c:370:13: error: passing argument 1 of 'idr_remove' from incompatible pointer type [-Werror=incompatible-pointer-types]
net/9p/client.c:407:21: error: passing argument 1 of 'idr_get_next' from incompatible pointer type [-Werror=incompatible-pointer-types]
net/9p/client.c:1024:11: error: passing argument 1 of 'idr_init' from incompatible pointer type [-Werror=incompatible-pointer-types]
Tested on:
commit: cb71b93c Add linux-next specific files for 20220628
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
dashboard link: https://syzkaller.appspot.com/bug?extid=5e28cdb7ebd0f2389ca4
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=10707be6080000
On 7/24/22 15:17, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit 7302e91f39a81a9c2efcf4bc5749d18128366945
> Author: Marco Elver <[email protected]>
> Date: Fri Jan 14 22:03:58 2022 +0000
>
> mm/slab_common: use WARN() if cache still has objects on destroy
Just to state the obvious, bisection pointed to a commit that added the
warning, but the reason for the warning would be that p9 is destroying a
kmem_cache without freeing all the objects there first, and that would be
true even before the commit.
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=142882ce080000
> start commit: cb71b93c2dc3 Add linux-next specific files for 20220628
> git tree: linux-next
> final oops: https://syzkaller.appspot.com/x/report.txt?x=162882ce080000
> console output: https://syzkaller.appspot.com/x/log.txt?x=122882ce080000
> kernel config: https://syzkaller.appspot.com/x/.config?x=badbc1adb2d582eb
> dashboard link: https://syzkaller.appspot.com/bug?extid=5e28cdb7ebd0f2389ca4
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=156f74ee080000
>
> Reported-by: [email protected]
> Fixes: 7302e91f39a8 ("mm/slab_common: use WARN() if cache still has objects on destroy")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: trying to register non-static key in p9_req_free
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 0 PID: 4084 Comm: syz-executor.0 Not tainted 5.19.0-rc4-next-20220628-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
assign_lock_key kernel/locking/lockdep.c:979 [inline]
register_lock_class+0xf30/0x1130 kernel/locking/lockdep.c:1292
__lock_acquire+0x10a/0x5660 kernel/locking/lockdep.c:4932
lock_acquire kernel/locking/lockdep.c:5665 [inline]
lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5630
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x39/0x50 kernel/locking/spinlock.c:162
complete+0x19/0x1f0 kernel/sched/completion.c:32
p9_req_free+0x16d/0x1d0 net/9p/client.c:385
kref_put include/linux/kref.h:65 [inline]
p9_req_put net/9p/client.c:390 [inline]
p9_tag_remove+0xe2/0x120 net/9p/client.c:372
p9_client_version net/9p/client.c:999 [inline]
p9_client_create+0xc5f/0x1100 net/9p/client.c:1062
v9fs_session_init+0x1e2/0x1810 fs/9p/v9fs.c:408
v9fs_mount+0x73/0xa80 fs/9p/vfs_super.c:126
legacy_get_tree+0x105/0x220 fs/fs_context.c:610
vfs_get_tree+0x89/0x2f0 fs/super.c:1501
do_new_mount fs/namespace.c:3040 [inline]
path_mount+0x1320/0x1fa0 fs/namespace.c:3370
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount fs/namespace.c:3568 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fc4c8e89209
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc4c9fd0168 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007fc4c8f9bf60 RCX: 00007fc4c8e89209
RDX: 0000000020000140 RSI: 0000000020000200 RDI: 0000000000000000
RBP: 00007fc4c8ee3161 R08: 0000000020000580 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffdb94b6f3f R14: 00007fc4c9fd0300 R15: 0000000000022000
</TASK>
BUG: unable to handle page fault for address: fffffffffffffff8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD ba8f067 P4D ba8f067 PUD ba91067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 4084 Comm: syz-executor.0 Not tainted 5.19.0-rc4-next-20220628-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
RIP: 0010:swake_up_locked kernel/sched/swait.c:29 [inline]
RIP: 0010:swake_up_locked kernel/sched/swait.c:21 [inline]
RIP: 0010:complete+0x98/0x1f0 kernel/sched/completion.c:36
Code: e0 0f 84 f7 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 5b 48 48 8d 7b f8 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 03 01 00 00 <48> 8b 7b f8 4c 8d 63 08 e8 8b 74 f8 ff 48 89 df e8 e3 79 a3 02 84
RSP: 0018:ffffc900030af978 EFLAGS: 00010046
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff815f0600
RDX: 1fffffffffffffff RSI: 0000000000000004 RDI: fffffffffffffff8
RBP: 0000000000000293 R08: 0000000000000001 R09: 0000000000000003
R10: fffff52000615f1d R11: 3e4b5341542f3c20 R12: ffff88806ef26cb8
R13: ffff88806ef26c78 R14: ffff8880205a6000 R15: ffff88806ef26c48
FS: 00007fc4c9fd0700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffff8 CR3: 0000000078de1000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
p9_req_free+0x16d/0x1d0 net/9p/client.c:385
kref_put include/linux/kref.h:65 [inline]
p9_req_put net/9p/client.c:390 [inline]
p9_tag_remove+0xe2/0x120 net/9p/client.c:372
p9_client_version net/9p/client.c:999 [inline]
p9_client_create+0xc5f/0x1100 net/9p/client.c:1062
v9fs_session_init+0x1e2/0x1810 fs/9p/v9fs.c:408
v9fs_mount+0x73/0xa80 fs/9p/vfs_super.c:126
legacy_get_tree+0x105/0x220 fs/fs_context.c:610
vfs_get_tree+0x89/0x2f0 fs/super.c:1501
do_new_mount fs/namespace.c:3040 [inline]
path_mount+0x1320/0x1fa0 fs/namespace.c:3370
do_mount fs/namespace.c:3383 [inline]
__do_sys_mount fs/namespace.c:3591 [inline]
__se_sys_mount fs/namespace.c:3568 [inline]
__x64_sys_mount+0x27f/0x300 fs/namespace.c:3568
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fc4c8e89209
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fc4c9fd0168 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007fc4c8f9bf60 RCX: 00007fc4c8e89209
RDX: 0000000020000140 RSI: 0000000020000200 RDI: 0000000000000000
RBP: 00007fc4c8ee3161 R08: 0000000020000580 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffdb94b6f3f R14: 00007fc4c9fd0300 R15: 0000000000022000
</TASK>
Modules linked in:
CR2: fffffffffffffff8
---[ end trace 0000000000000000 ]---
RIP: 0010:swake_up_locked kernel/sched/swait.c:29 [inline]
RIP: 0010:swake_up_locked kernel/sched/swait.c:21 [inline]
RIP: 0010:complete+0x98/0x1f0 kernel/sched/completion.c:36
Code: e0 0f 84 f7 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 5b 48 48 8d 7b f8 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 03 01 00 00 <48> 8b 7b f8 4c 8d 63 08 e8 8b 74 f8 ff 48 89 df e8 e3 79 a3 02 84
RSP: 0018:ffffc900030af978 EFLAGS: 00010046
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff815f0600
RDX: 1fffffffffffffff RSI: 0000000000000004 RDI: fffffffffffffff8
RBP: 0000000000000293 R08: 0000000000000001 R09: 0000000000000003
R10: fffff52000615f1d R11: 3e4b5341542f3c20 R12: ffff88806ef26cb8
R13: ffff88806ef26c78 R14: ffff8880205a6000 R15: ffff88806ef26c48
FS: 00007fc4c9fd0700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffff8 CR3: 0000000078de1000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 0f 84 f7 00 00 00 je 0xfd
6: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
d: fc ff df
10: 48 8b 5b 48 mov 0x48(%rbx),%rbx
14: 48 8d 7b f8 lea -0x8(%rbx),%rdi
18: 48 89 fa mov %rdi,%rdx
1b: 48 c1 ea 03 shr $0x3,%rdx
1f: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
23: 0f 85 03 01 00 00 jne 0x12c
* 29: 48 8b 7b f8 mov -0x8(%rbx),%rdi <-- trapping instruction
2d: 4c 8d 63 08 lea 0x8(%rbx),%r12
31: e8 8b 74 f8 ff callq 0xfff874c1
36: 48 89 df mov %rbx,%rdi
39: e8 e3 79 a3 02 callq 0x2a37a21
3e: 84 .byte 0x84
Tested on:
commit: cb71b93c Add linux-next specific files for 20220628
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=121acc4a080000
kernel config: https://syzkaller.appspot.com/x/.config?x=badbc1adb2d582eb
dashboard link: https://syzkaller.appspot.com/bug?extid=5e28cdb7ebd0f2389ca4
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=109ffe72080000
Vlastimil Babka wrote on Mon, Jul 25, 2022 at 12:15:24PM +0200:
> On 7/24/22 15:17, syzbot wrote:
> > syzbot has bisected this issue to:
> >
> > commit 7302e91f39a81a9c2efcf4bc5749d18128366945
> > Author: Marco Elver <[email protected]>
> > Date: Fri Jan 14 22:03:58 2022 +0000
> >
> > mm/slab_common: use WARN() if cache still has objects on destroy
>
> Just to state the obvious, bisection pointed to a commit that added the
> warning, but the reason for the warning would be that p9 is destroying a
> kmem_cache without freeing all the objects there first, and that would be
> true even before the commit.
Probably true from the moment that cache/idr was introduced... I've got
a couple of fixes in next but given syzcaller claims that's the tree it
was produced on I guess there can be more such leaks.
(well, the lines it sent in the backtrace yesterday don't match next,
but I wouldn't count on it)
If someone wants to have a look please feel free, I would bet the
problem is just that p9_fd_close() doesn't call or does something
equivalent to p9_conn_cancel() and there just are some requests that
haven't been sent yet when the mount is closed..
But I don't have/can/want to take the time to check right now as I
consider such a leak harmless enough, someone has to be root or
equivalent to do 9p mounts in most cases.
--
Dominique
Hello,
syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: task hung in p9_client_destroy
INFO: task syz-executor.0:4046 blocked for more than 143 seconds.
Not tainted 5.19.0-rc4-next-20220628-syzkaller-dirty #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.0 state:D stack:24520 pid: 4046 ppid: 1 flags:0x00004006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5184 [inline]
__schedule+0xa09/0x4f10 kernel/sched/core.c:6496
schedule+0xd2/0x1f0 kernel/sched/core.c:6568
schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1911
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common+0x378/0x530 kernel/sched/completion.c:106
p9_client_destroy+0x306/0x3b0 net/9p/client.c:1109
v9fs_session_close+0x45/0x2d0 fs/9p/v9fs.c:504
v9fs_kill_super+0x49/0x90 fs/9p/vfs_super.c:226
deactivate_locked_super+0x94/0x160 fs/super.c:332
deactivate_super+0xad/0xd0 fs/super.c:363
cleanup_mnt+0x3a2/0x540 fs/namespace.c:1186
task_work_run+0xdd/0x1a0 kernel/task_work.c:177
resume_user_mode_work include/linux/resume_user_mode.h:49 [inline]
exit_to_user_mode_loop kernel/entry/common.c:169 [inline]
exit_to_user_mode_prepare+0x23c/0x250 kernel/entry/common.c:201
__syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:294
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f7586c8a677
RSP: 002b:00007fffb29ce3b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7586c8a677
RDX: 00007fffb29ce48a RSI: 000000000000000a RDI: 00007fffb29ce480
RBP: 00007fffb29ce480 R08: 00000000ffffffff R09: 00007fffb29ce250
R10: 00005555571af8b3 R11: 0000000000000246 R12: 00007f7586ce22a6
R13: 00007fffb29cf540 R14: 00005555571af810 R15: 00007fffb29cf580
</TASK>
Showing all locks held in the system:
1 lock held by rcu_tasks_kthre/12:
#0: ffffffff8bd864f0 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x26/0xc70 kernel/rcu/tasks.h:507
1 lock held by rcu_tasks_trace/13:
#0: ffffffff8bd861f0 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x26/0xc70 kernel/rcu/tasks.h:507
1 lock held by khungtaskd/27:
#0: ffffffff8bd87040 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6491
2 locks held by getty/3279:
#0: ffff88814ac14098 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:244
#1: ffffc90002d162f0 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0xe50/0x13c0 drivers/tty/n_tty.c:2177
=============================================
NMI backtrace for cpu 1
CPU: 1 PID: 27 Comm: khungtaskd Not tainted 5.19.0-rc4-next-20220628-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
nmi_trigger_cpumask_backtrace+0x1e6/0x230 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:212 [inline]
watchdog+0xc18/0xf50 kernel/hung_task.c:369
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 46 Comm: kworker/u4:3 Not tainted 5.19.0-rc4-next-20220628-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:preempt_latency_start kernel/sched/core.c:5628 [inline]
RIP: 0010:preempt_latency_start kernel/sched/core.c:5623 [inline]
RIP: 0010:preempt_count_add+0x81/0x140 kernel/sched/core.c:5651
Code: 05 2f b0 7e 25 ff ff ff 7f 39 c3 74 03 5b 5d c3 48 8b 5c 24 10 48 89 df e8 0c 9a 0c 00 85 c0 75 35 65 48 8b 2c 25 80 6f 02 00 <48> 8d bd 48 14 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1
RSP: 0018:ffffc90000b779f0 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff897b1e8a RCX: 1ffffffff21eeb10
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff897b1e8a
RBP: ffff888017483a80 R08: ffffffff812bb538 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f4129efb990 CR3: 000000000ba8e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__mutex_lock_common kernel/locking/mutex.c:602 [inline]
__mutex_lock+0x10a/0x1350 kernel/locking/mutex.c:747
arch_jump_label_transform_queue+0x58/0x100 arch/x86/kernel/jump_label.c:136
__jump_label_update+0x12e/0x400 kernel/jump_label.c:451
jump_label_update+0x32f/0x410 kernel/jump_label.c:830
static_key_disable_cpuslocked+0x152/0x1b0 kernel/jump_label.c:207
static_key_disable+0x16/0x20 kernel/jump_label.c:215
toggle_allocation_gate mm/kfence/core.c:825 [inline]
toggle_allocation_gate+0x183/0x390 mm/kfence/core.c:803
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e9/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302
</TASK>
Tested on:
commit: cb71b93c Add linux-next specific files for 20220628
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=16da19aa080000
kernel config: https://syzkaller.appspot.com/x/.config?x=badbc1adb2d582eb
dashboard link: https://syzkaller.appspot.com/bug?extid=5e28cdb7ebd0f2389ca4
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=147ce022080000
On Mon, 25 Jul 2022 at 13:51, <[email protected]> wrote:
>
> Vlastimil Babka wrote on Mon, Jul 25, 2022 at 12:15:24PM +0200:
> > On 7/24/22 15:17, syzbot wrote:
> > > syzbot has bisected this issue to:
> > >
> > > commit 7302e91f39a81a9c2efcf4bc5749d18128366945
> > > Author: Marco Elver <[email protected]>
> > > Date: Fri Jan 14 22:03:58 2022 +0000
> > >
> > > mm/slab_common: use WARN() if cache still has objects on destroy
> >
> > Just to state the obvious, bisection pointed to a commit that added the
> > warning, but the reason for the warning would be that p9 is destroying a
> > kmem_cache without freeing all the objects there first, and that would be
> > true even before the commit.
>
> Probably true from the moment that cache/idr was introduced... I've got
> a couple of fixes in next but given syzcaller claims that's the tree it
> was produced on I guess there can be more such leaks.
> (well, the lines it sent in the backtrace yesterday don't match next,
> but I wouldn't count on it)
>
> If someone wants to have a look please feel free, I would bet the
> problem is just that p9_fd_close() doesn't call or does something
> equivalent to p9_conn_cancel() and there just are some requests that
> haven't been sent yet when the mount is closed..
> But I don't have/can/want to take the time to check right now as I
> consider such a leak harmless enough, someone has to be root or
> equivalent to do 9p mounts in most cases.
FWIW with KASAN we have allocation stacks for each heap object. So
when KASAN is enabled that warning could list all live object
allocation stacks.
On Montag, 25. Juli 2022 14:45:08 CEST Dmitry Vyukov wrote:
> On Mon, 25 Jul 2022 at 13:51, <[email protected]> wrote:
> > Vlastimil Babka wrote on Mon, Jul 25, 2022 at 12:15:24PM +0200:
> > > On 7/24/22 15:17, syzbot wrote:
> > > > syzbot has bisected this issue to:
> > > >
> > > > commit 7302e91f39a81a9c2efcf4bc5749d18128366945
> > > > Author: Marco Elver <[email protected]>
> > > > Date: Fri Jan 14 22:03:58 2022 +0000
> > > >
> > > > mm/slab_common: use WARN() if cache still has objects on destroy
> > >
> > > Just to state the obvious, bisection pointed to a commit that added the
> > > warning, but the reason for the warning would be that p9 is destroying a
> > > kmem_cache without freeing all the objects there first, and that would
> > > be
> > > true even before the commit.
> >
> > Probably true from the moment that cache/idr was introduced... I've got
> > a couple of fixes in next but given syzcaller claims that's the tree it
> > was produced on I guess there can be more such leaks.
> > (well, the lines it sent in the backtrace yesterday don't match next,
> > but I wouldn't count on it)
> >
> > If someone wants to have a look please feel free, I would bet the
> > problem is just that p9_fd_close() doesn't call or does something
> > equivalent to p9_conn_cancel() and there just are some requests that
> > haven't been sent yet when the mount is closed..
> > But I don't have/can/want to take the time to check right now as I
> > consider such a leak harmless enough, someone has to be root or
> > equivalent to do 9p mounts in most cases.
>
> FWIW with KASAN we have allocation stacks for each heap object. So
> when KASAN is enabled that warning could list all live object
> allocation stacks.
With allocation stack you mean the backtrace/call stack at the point in time
when the memory originally was acquired?
If the answer is yes, then sure, if someone had a chance to post those
backtraces, then that would help us to take a closer look at where this leak
might happen. Otherwise I fear it will end up among those other "lack of
priority" issues.
Best regards,
Christian Schoenebeck
On Tue, 26 Jul 2022 at 14:10, Christian Schoenebeck
<[email protected]> wrote:
>
> On Montag, 25. Juli 2022 14:45:08 CEST Dmitry Vyukov wrote:
> > On Mon, 25 Jul 2022 at 13:51, <[email protected]> wrote:
> > > Vlastimil Babka wrote on Mon, Jul 25, 2022 at 12:15:24PM +0200:
> > > > On 7/24/22 15:17, syzbot wrote:
> > > > > syzbot has bisected this issue to:
> > > > >
> > > > > commit 7302e91f39a81a9c2efcf4bc5749d18128366945
> > > > > Author: Marco Elver <[email protected]>
> > > > > Date: Fri Jan 14 22:03:58 2022 +0000
> > > > >
> > > > > mm/slab_common: use WARN() if cache still has objects on destroy
> > > >
> > > > Just to state the obvious, bisection pointed to a commit that added the
> > > > warning, but the reason for the warning would be that p9 is destroying a
> > > > kmem_cache without freeing all the objects there first, and that would
> > > > be
> > > > true even before the commit.
> > >
> > > Probably true from the moment that cache/idr was introduced... I've got
> > > a couple of fixes in next but given syzcaller claims that's the tree it
> > > was produced on I guess there can be more such leaks.
> > > (well, the lines it sent in the backtrace yesterday don't match next,
> > > but I wouldn't count on it)
> > >
> > > If someone wants to have a look please feel free, I would bet the
> > > problem is just that p9_fd_close() doesn't call or does something
> > > equivalent to p9_conn_cancel() and there just are some requests that
> > > haven't been sent yet when the mount is closed..
> > > But I don't have/can/want to take the time to check right now as I
> > > consider such a leak harmless enough, someone has to be root or
> > > equivalent to do 9p mounts in most cases.
> >
> > FWIW with KASAN we have allocation stacks for each heap object. So
> > when KASAN is enabled that warning could list all live object
> > allocation stacks.
>
> With allocation stack you mean the backtrace/call stack at the point in time
> when the memory originally was acquired?
>
> If the answer is yes, then sure, if someone had a chance to post those
> backtraces, then that would help us to take a closer look at where this leak
> might happen. Otherwise I fear it will end up among those other "lack of
> priority" issues.
Yes, I meant providing allocation stacks for leaked objects.
Filed https://bugzilla.kernel.org/show_bug.cgi?id=216306 for this feature.