2020-06-22 05:41:08

by syzbot

Subject: linux-next boot error: WARNING in kmem_cache_free

Hello,

syzbot found the following crash on:

HEAD commit: 5a94f5bc Add linux-next specific files for 20200621
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12a02c76100000
kernel config: https://syzkaller.appspot.com/x/.config?x=e1788c418b2ddc66
dashboard link: https://syzkaller.appspot.com/bug?extid=95bccd805a4aa06a4b0d
compiler: gcc (GCC) 9.0.0 20181231 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/slab.h:232 kmem_cache_free+0x0/0x200 mm/slab.c:2262
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.8.0-rc1-next-20200621-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
panic+0x2e3/0x75c kernel/panic.c:231
__warn.cold+0x2f/0x3a kernel/panic.c:600
report_bug+0x271/0x2f0 lib/bug.c:198
exc_invalid_op+0x1b9/0x370 arch/x86/kernel/traps.c:235
asm_exc_invalid_op+0x12/0x20 arch/x86/include/asm/idtentry.h:563
RIP: 0010:kmem_cache_debug_flags mm/slab.h:232 [inline]
RIP: 0010:cache_from_obj mm/slab.h:459 [inline]
RIP: 0010:kmem_cache_free+0x0/0x200 mm/slab.c:3678
Code: ff 49 c7 84 24 90 00 00 00 00 00 00 00 83 c3 01 39 1d 2c ec fb 08 77 af 5b 5d 41 5c 41 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 <0f> 0b 48 85 ff 0f 84 a9 01 00 00 48 83 3d 15 6b 02 08 00 0f 84 9c
RSP: 0000:ffffffff89a07a58 EFLAGS: 00010293
RAX: ffffffff89a86580 RBX: ffff8880aa01f0e8 RCX: ffffffff81a84573
RDX: 0000000000000000 RSI: ffff8880aa01f480 RDI: ffff8880aa00fe00
RBP: ffff8880aa01f4a8 R08: ffffffff89a86580 R09: fffffbfff1340f3f
R10: 0000000000000003 R11: fffffbfff1340f3e R12: ffff8880aa01f4b0
R13: ffff8880aa01f688 R14: ffff8880aa01f480 R15: ffffc90000000000
adjust_va_to_fit_type mm/vmalloc.c:980 [inline]
__alloc_vmap_area mm/vmalloc.c:1096 [inline]
alloc_vmap_area+0x1494/0x1df0 mm/vmalloc.c:1196
__get_vm_area_node+0x178/0x3b0 mm/vmalloc.c:2060
__vmalloc_node_range+0x12c/0x910 mm/vmalloc.c:2484
__vmalloc_node mm/vmalloc.c:2532 [inline]
__vmalloc_area_node mm/vmalloc.c:2404 [inline]
__vmalloc_node_range+0x76c/0x910 mm/vmalloc.c:2489
__vmalloc_node mm/vmalloc.c:2532 [inline]
__vmalloc+0x69/0x80 mm/vmalloc.c:2546
alloc_large_system_hash+0x1c9/0x2e2 mm/page_alloc.c:8181
inode_init+0xab/0xbc fs/inode.c:2099
vfs_caches_init+0x104/0x11e fs/dcache.c:3231
start_kernel+0x978/0x9fb init/main.c:1025
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2020-06-22 06:31:48

by Qian Cai

Subject: Re: linux-next boot error: WARNING in kmem_cache_free



> On Jun 22, 2020, at 1:37 AM, syzbot <[email protected]> wrote:
>
> WARNING: CPU: 0 PID: 0 at mm/slab.h:232 kmem_cache_free+0x0/0x200 mm/slab.c:2262

Is there any particular reason to use CONFIG_SLAB rather than CONFIG_SLUB?

You are really asking for trouble by testing a code path that almost nobody exercises very well nowadays.

Anyway, there is a patchset in -mm that might well have introduced this regression, and we could go and confirm it, but I kind of don’t want to spend too much time on SLAB, which is supposed to become obsolete eventually.

2020-06-22 06:45:04

by Dmitry Vyukov

Subject: Re: linux-next boot error: WARNING in kmem_cache_free

On Mon, Jun 22, 2020 at 8:29 AM Qian Cai <[email protected]> wrote:
> > On Jun 22, 2020, at 1:37 AM, syzbot <[email protected]> wrote:
> >
> > WARNING: CPU: 0 PID: 0 at mm/slab.h:232 kmem_cache_free+0x0/0x200 mm/slab.c:2262
>
> Is there any particular reason to use CONFIG_SLAB rather than CONFIG_SLUB?

There is a reason, it's still important for us.
But also it's not our strategy to deal with bugs by not testing
configurations and closing eyes on bugs, right? If it's an official
config in the kernel, it needs to be tested. If SLAB is in the state
that we don't care about any bugs in it, then we need to drop it. It
will automatically remove it from all testing systems out there. Or at
least make it "depends on BROKEN" to slowly phase it out during
several releases.
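
A minimal sketch of what the "depends on BROKEN" route could look like in
init/Kconfig, assuming the rest of the SLAB entry stays as it is. This is an
illustration only, not a submitted patch; the elided lines stand for whatever
the entry already contains:

```
# Sketch only: gate SLAB behind BROKEN so it cannot be selected in normal
# configurations while it is being phased out.
config SLAB
	bool "SLAB"
	depends on BROKEN
	# ... remaining lines of the existing entry unchanged ...
```

Because BROKEN cannot be enabled through the normal configuration prompts,
a dependency like this would effectively take the option out of ordinary and
automated builds without deleting the code right away.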


> You are really asking for trouble by testing a code path that almost nobody exercises very well nowadays.
>
> Anyway, there is a patchset in -mm that might well have introduced this regression, and we could go and confirm it, but I kind of don’t want to spend too much time on SLAB, which is supposed to become obsolete eventually.

2020-06-22 07:30:38

by Qian Cai

Subject: Re: linux-next boot error: WARNING in kmem_cache_free



> On Jun 22, 2020, at 2:42 AM, Dmitry Vyukov <[email protected]> wrote:
>
> There is a reason, it's still important for us.
> But also it's not our strategy to deal with bugs by not testing
> configurations and closing eyes on bugs, right? If it's an official
> config in the kernel, it needs to be tested. If SLAB is in the state
> that we don't care about any bugs in it, then we need to drop it. It
> will automatically remove it from all testing systems out there. Or at
> least make it "depends on BROKEN" to slowly phase it out during
> several releases.

Do you mind sharing what your use cases for CONFIG_SLAB are? The only thing that prevents it from being purged early is that it might perform better with certain types of networking workloads, from which syzbot should have nothing to gain.

I am thinking more about the testing coverage that syzbot could gain by testing SLUB instead of SLAB. Also, I have no objection to syzbot testing SLAB, but in my experience you are probably on your own when debugging those test failures further. Until you are able to figure out which buggy patch or patchset introduced the regression, I am afraid not many people will be able to spend much time on SLAB. The developers are already pretty much half-hearted about it, only fixing SLAB here and there without runtime testing it.

2020-06-27 23:12:13

by Eric Biggers

Subject: Re: linux-next boot error: WARNING in kmem_cache_free

[+Cc linux-mm; +Bcc linux-fsdevel]

On Mon, Jun 22, 2020 at 03:28:09AM -0400, Qian Cai wrote:
>
>
> > On Jun 22, 2020, at 2:42 AM, Dmitry Vyukov <[email protected]> wrote:
> >
> > There is a reason, it's still important for us.
> > But also it's not our strategy to deal with bugs by not testing
> > configurations and closing eyes on bugs, right? If it's an official
> > config in the kernel, it needs to be tested. If SLAB is in the state
> > that we don't care about any bugs in it, then we need to drop it. It
> > will automatically remove it from all testing systems out there. Or at
> > least make it "depends on BROKEN" to slowly phase it out during
> > several releases.
>
> Do you mind sharing what your use cases for CONFIG_SLAB are? The only thing that prevents it from being purged early is that it might perform better with certain types of networking workloads, from which syzbot should have nothing to gain.
>
> I am thinking more about the testing coverage that syzbot could gain by testing SLUB instead of SLAB. Also, I have no objection to syzbot testing SLAB, but in my experience you are probably on your own when debugging those test failures further. Until you are able to figure out which buggy patch or patchset introduced the regression, I am afraid not many people will be able to spend much time on SLAB. The developers are already pretty much half-hearted about it, only fixing SLAB here and there without runtime testing it.
>

This bug also got reported 2 days later by the kernel test robot
(https://lore.kernel.org/lkml/20200623090213.GW5535@shao2-debian/).
Then it was fixed by commit 437edcaafbe3, so telling syzbot:

#syz fix: mm, slab/slub: improve error reporting and overhead of cache_from_obj()-fix

If CONFIG_SLAB is no longer useful and supported then it needs to be removed
from the kernel. Otherwise, it needs to be tested just like all other options.

- Eric

2020-06-28 00:51:11

by Qian Cai

Subject: Re: linux-next boot error: WARNING in kmem_cache_free



> On Jun 27, 2020, at 7:10 PM, Eric Biggers <[email protected]> wrote:
>
> This bug also got reported 2 days later by the kernel test robot
> (https://lore.kernel.org/lkml/20200623090213.GW5535@shao2-debian/).
> Then it was fixed by commit 437edcaafbe3, so telling syzbot:
>
> #syz fix: mm, slab/slub: improve error reporting and overhead of cache_from_obj()-fix
>
> If CONFIG_SLAB is no longer useful and supported then it needs to be removed
> from the kernel. Otherwise, it needs to be tested just like all other options.

It is awesome that the kernel test robot was able to bisect it, which is especially useful for testing legacy options like SLAB.