2024-03-25 15:29:46

by Borislav Petkov

Subject: Re: Boot failure with kernel BUG at mm/usercopy.c on next-20240325

Adding more people from that

328c801335d5 ("cpumask: create dedicated kmem cache for cpumask var")

in linux-next.

On Mon, Mar 25, 2024 at 01:40:20PM +0100, V, Narasimhan wrote:
> [AMD Official Use Only - General]
>
> Hi,
> There is a boot failure as below.
> On bisecting, the bad commit is found to be 328c801335d5f7edf2a3c9c331ddf8978f21e2a7.
> Boots fine if we revert the above bad commit.

Narasimhan,

please send your .config and reproduction instructions. I'm guessing
you're simply booting it, right?

Leaving in the rest for the newly CCed people.

> kernel BUG at mm/usercopy.c:102!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 15 PID: 567 Comm: systemd-udevd Not tainted 6.9.0-rc1-next-20240325-1711333827684 #1
> Hardware name: AMD Corporation Shale96/Shale96, BIOS RSH100BD 12/11/2023
> RIP: 0010:usercopy_abort+0x72/0x90
> Code: 4f f7 b8 50 48 c7 c2 31 a6 f4 b8 57 48 c7 c7 50 10 fe b8 48 0f 44 d6 48 c7 c6 32 30 f5 b8 4c 89 d1 49 0f 44 f3 e8 5e 2b d1 ff <0f> 0b 49 c7 c1 1c 60 f4 b8 4c 89 cf 4d 89 c8 eb a9 66 66 2e 0f 1f
> RSP: 0018:ff855d5641947e08 EFLAGS: 00010246
> RAX: 0000000000000060 RBX: 0000000000000000 RCX: ff44c5dccd9ff8a8
> RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
> RBP: ff855d5641947e20 R08: 0000000000000060 R09: 6c656e72654b203a
> R10: ffffffffba1edd60 R11: 657275736f707865 R12: 0000000000000008
> R13: ff44c5cd80037800 R14: 0000000000000001 R15: 0000000000000000
> FS: 00007fbc2be258c0(0000) GS:ff44c5dc86f80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fbc2c459230 CR3: 0000000103f54004 CR4: 0000000000771ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? show_regs+0x6d/0x80
> ? die+0x3c/0xa0
> ? do_trap+0xcf/0xf0
> ? do_error_trap+0x75/0xa0
> ? usercopy_abort+0x72/0x90
> ? exc_invalid_op+0x57/0x80
> ? usercopy_abort+0x72/0x90
> ? asm_exc_invalid_op+0x1f/0x30
> ? usercopy_abort+0x72/0x90
> ? usercopy_abort+0x72/0x90
> __check_heap_object+0xd6/0x110
> __check_object_size+0x28a/0x2f0
> ? srso_alias_return_thunk+0x5/0xfbef5
> __x64_sys_sched_getaffinity+0xda/0x120
> do_syscall_64+0x76/0x120
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? do_syscall_64+0x85/0x120
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? syscall_exit_to_user_mode+0x75/0x190
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? do_syscall_64+0x85/0x120
> entry_SYSCALL_64_after_hwframe+0x6c/0x74
> RIP: 0033:0x7fbc2c507d6a
> Code: d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 49 89 f0 be ff ff ff 7f b8 cc 00 00 00 49 39 f0 49 0f 46 f0 0f 05 <48> 3d 00 f0 ff ff 77 2e 41 89 c1 83 f8 ff 74 38 48 98 48 83 ec 08
> RSP: 002b:00007ffceab058d8 EFLAGS: 00000297 ORIG_RAX: 00000000000000cc
> RAX: ffffffffffffffda RBX: 0000000000000010 RCX: 00007fbc2c507d6a
> RDX: 000055b26dfa3040 RSI: 0000000000000008 RDI: 0000000000000000
> RBP: 000055b26dfa3040 R08: 0000000000000008 R09: 00000000ffffffff
> R10: 000055b26dfa3030 R11: 0000000000000297 R12: 0000000000000008
> R13: 000000000000003c R14: 00007ffceab05ac8 R15: 000055b258445078
> </TASK>
> Modules linked in: aesni_intel crypto_simd cryptd
> ---[ end trace 0000000000000000 ]---

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


2024-03-25 18:46:25

by Andrew Morton

Subject: Re: Boot failure with kernel BUG at mm/usercopy.c on next-20240325

On Mon, 25 Mar 2024 13:50:17 +0100 Borislav Petkov wrote:

> Adding more people from that
>
> 328c801335d5 ("cpumask: create dedicated kmem cache for cpumask var")
>
> in linux-next.
>
> On Mon, Mar 25, 2024 at 01:40:20PM +0100, V, Narasimhan wrote:
> > [AMD Official Use Only - General]
> >
> > Hi,
> > There is a boot failure as below.
> > On bisecting, the bad commit is found to be 328c801335d5f7edf2a3c9c331ddf8978f21e2a7.
> > Boots fine if we revert the above bad commit.
>
> Narasimhan,
>
> please send your .config and reproduction instructions. I'm guessing
> you're simply booting it, right?
>
> Leaving in the rest for the newly CCed people.

Thanks, I'll just drop the patch. It didn't receive a very favorable
review reception anyway.


2024-03-25 20:39:14

by Borislav Petkov

Subject: Re: Boot failure with kernel BUG at mm/usercopy.c on next-20240325

On Mon, Mar 25, 2024 at 11:34:33AM -0700, Andrew Morton wrote:
> Thanks, I'll just drop the patch. It didn't receive a very favorable
> review reception anyway.

See here:

https://lore.kernel.org/all/DM4PR12MB5086B9BDBF32D53DF226CBF489362@DM4PR12MB5086.namprd12.prod.outlook.com/

folks still need to learn email. :-)

Anyway, apparently there's some fix there.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-26 02:44:33

by Feng Tang

Subject: Re: Boot failure with kernel BUG at mm/usercopy.c on next-20240325

Adding Vlastimil for the slab-related topic.

On Tue, Mar 26, 2024 at 04:37:14AM +0800, Borislav Petkov wrote:
> On Mon, Mar 25, 2024 at 11:34:33AM -0700, Andrew Morton wrote:
> > Thanks, I'll just drop the patch. It didn't receive a very favorable
> > review reception anyway.
>
> See here:
>
> https://lore.kernel.org/all/DM4PR12MB5086B9BDBF32D53DF226CBF489362@DM4PR12MB5086.namprd12.prod.outlook.com/
>
> folks still need to learn email. :-)
>
> Anyway, apparently there's some fix there.

The original commit 328c801335d5 ("cpumask: create dedicated kmem
cache for cpumask var") has some benefit: there are CPU counts
which are not a power of 2, like 144, 288 etc., where it will save
some memory.
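
As a rough illustration of that saving (not taken from the patch), the
sketch below compares the bytes a cpumask needs with the power-of-two
bucket a generic kmalloc() allocation would land in. The helper names
cpumask_bytes() and pow2_bucket() are hypothetical, a 64-bit kernel
(8-byte long) is assumed, and treating kmalloc buckets as pure powers
of two is a simplification that ignores the intermediate 96- and
192-byte caches.

```c
#include <stddef.h>

/* Assumption: 64-bit kernel, so one long holds 64 CPU bits. */
#define BYTES_PER_LONG 8u
#define BITS_PER_LONG_ASSUMED (8u * BYTES_PER_LONG)

/*
 * Bytes needed for a bitmap covering nr_cpus, rounded up to whole
 * longs (hypothetical helper, modelled on BITS_TO_LONGS-style sizing).
 */
size_t cpumask_bytes(unsigned int nr_cpus)
{
	size_t longs = (nr_cpus + BITS_PER_LONG_ASSUMED - 1) /
		       BITS_PER_LONG_ASSUMED;

	return longs * BYTES_PER_LONG;
}

/*
 * Smallest power-of-two bucket that fits size, starting at 8 bytes.
 * Simplified model of the generic kmalloc size classes.
 */
size_t pow2_bucket(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;

	return bucket;
}
```

Under these assumptions, 144 CPUs need 24 bytes but would occupy a
32-byte bucket, and 288 CPUs need 40 bytes but would occupy a 64-byte
bucket, while a dedicated cache can be sized to the mask itself.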

And 'slabtop' on a qemu VM with 16 CPUs shows that the cpumask cache
is surprisingly non-trivial and has the third largest number of objects:

22350  22350 100%  0.13K  745   30  2980K  kernfs_node_cache
11172  10693  95%  0.19K  266   42  2128K  dentry
10240   8222  80%  0.01K   20  512    80K  cpumask

Andrew, if it is worth merging, you can fold my fix into the patch.

Thanks,
Feng


> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
>

2024-03-26 19:56:39

by Andrew Morton

Subject: Re: Boot failure with kernel BUG at mm/usercopy.c on next-20240325

On Tue, 26 Mar 2024 10:30:10 +0800 Feng Tang wrote:

> Adding Vlastimil for the slab-related topic.
>
> On Tue, Mar 26, 2024 at 04:37:14AM +0800, Borislav Petkov wrote:
> > On Mon, Mar 25, 2024 at 11:34:33AM -0700, Andrew Morton wrote:
> > > Thanks, I'll just drop the patch. It didn't receive a very favorable
> > > review reception anyway.
> >
> > See here:
> >
> > https://lore.kernel.org/all/DM4PR12MB5086B9BDBF32D53DF226CBF489362@DM4PR12MB5086.namprd12.prod.outlook.com/
> >
> > folks still need to learn email. :-)
> >
> > Anyway, apparently there's some fix there.
>
> The original commit 328c801335d5 ("cpumask: create dedicated kmem
> cache for cpumask var") has some benefit: there are CPU counts
> which are not a power of 2, like 144, 288 etc., where it will save
> some memory.
>
> And 'slabtop' on a qemu VM with 16 CPUs shows that the cpumask cache
> is surprisingly non-trivial and has the third largest number of objects:
>
> 22350  22350 100%  0.13K  745   30  2980K  kernfs_node_cache
> 11172  10693  95%  0.19K  266   42  2128K  dentry
> 10240   8222  80%  0.01K   20  512    80K  cpumask
>
> Andrew, if it is worth merging, you can fold my fix into the patch.

I'll await a resend, please.