2022-09-13 07:07:55

by Feng Tang

Subject: [PATCH v6 0/4] mm/slub: some debug enhancements for kmalloc

kmalloc's API family is critical for mm, and one of its characteristics
is that it rounds up the request size to a fixed one (mostly power of 2).
When a user requests memory for '2^n + 1' bytes, 2^(n+1) bytes could
actually be allocated, so in the worst case there is around 50% memory
space waste.
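
For illustration only, a minimal sketch of how the rounding shows up
via ksize() (sizes assume the default kmalloc caches):

        /* a 1025-byte request falls into kmalloc-2048 */
        void *buf = kmalloc(1025, GFP_KERNEL);

        if (buf) {
                /* ksize() reports the bucket size, 2048 here */
                pr_info("asked for 1025, got %zu bytes\n", ksize(buf));
                kfree(buf);
        }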

The wastage is not a big issue for requests that get allocated/freed
quickly, but may cause problems with objects that have a longer
lifetime, and there have been OOM panics in some extreme cases.

This patchset (4 patches) tries to:
* Add a debug method to track each kmalloced object's wastage info,
and show the call stack of the original allocation (depends on the
SLAB_STORE_USER flag) (Patch 1)

* Extend the redzone sanity check to the extra kmalloced buffer beyond
what is requested, to better detect illegitimate access to it (depends
on SLAB_STORE_USER & SLAB_RED_ZONE) (Patch 2/3/4, where 2/3 are
preparation patches)

The redzone part has been tested with the code below:

        int shift, size, oob_size;
        void *buf;

        for (shift = 3; shift <= 12; shift++) {
                size = 1 << shift;
                buf = kmalloc(size + 4, GFP_KERNEL);
                /* We have 96 and 192 kmalloc sizes, which are not power of 2 */
                if (size == 64 || size == 128)
                        oob_size = 16;
                else
                        oob_size = size - 4;
                memset(buf + size + 4, 0xee, oob_size);
                kfree(buf);
        }
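
(With the series applied and 'slub_debug=UZ' on the kernel cmdline,
each memset above writes only into the extra space beyond the request,
so every kfree() should trigger a "kmalloc Redzone" report.)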

Please help to review, thanks!

- Feng

---
Changelogs:

since v5:
* Refine code/comments and add more perf info in commit log for
kzalloc change (Hyeonggon Yoo)
* change the kasan param name and refine comments about
kasan+redzone handling (Andrey Konovalov)
* put free pointer in meta data to make redzone check cover all
kmalloc objects (Hyeonggon Yoo)

since v4:
* fix a race issue in v3, by moving kmalloc debug init into
alloc_debug_processing (Hyeonggon Yoo)
* add 'partial_context' for better parameter passing in get_partial()
call chain (Vlastimil Babka)
* update 'slub.rst' for 'alloc_traces' part (Hyeonggon Yoo)
* update code comments for 'orig_size'

since v3:
* rebase against latest post 6.0-rc1 slab tree's 'for-next' branch
* fix a bug reported by 0Day, that kmalloc-redzoned data and kasan's
free meta data overlaps in the same kmalloc object data area

since v2:
* rebase against slab tree's 'for-next' branch
* fix pointer handling (Kefeng Wang)
* move kzalloc zeroing handling change to a separate patch (Vlastimil Babka)
* make 'orig_size' only depend on KMALLOC & STORE_USER flag
bits (Vlastimil Babka)

since v1:
* limit the 'orig_size' to kmalloc objects only, and save
it after track in metadata (Vlastimil Babka)
* fix an offset calculation problem in print_trailer

since RFC:
* fix problems in kmem_cache_alloc_bulk() and records sorting,
improve the print format (Hyeonggon Yoo)
* fix a compiling issue found by 0Day bot
* update the commit log based on info from iova developers

Feng Tang (4):
mm/slub: enable debugging memory wasting of kmalloc
mm/slub: only zero the requested size of buffer for kzalloc
mm: kasan: Add free_meta size info in struct kasan_cache
mm/slub: extend redzone check to extra allocated kmalloc space than
requested

Documentation/mm/slub.rst | 33 +++---
include/linux/kasan.h | 2 +
include/linux/slab.h | 2 +
mm/kasan/common.c | 2 +
mm/slab.c | 7 +-
mm/slab.h | 9 +-
mm/slab_common.c | 4 +
mm/slub.c | 217 ++++++++++++++++++++++++++++++--------
8 files changed, 214 insertions(+), 62 deletions(-)

--
2.34.1


2022-09-13 07:15:55

by Feng Tang

Subject: [PATCH v6 4/4] mm/slub: extend redzone check to extra allocated kmalloc space than requested

kmalloc will round up the request size to a fixed size (mostly power
of 2), so there could be an extra space beyond what is requested, whose
size is the actual buffer size minus the original request size.

To better detect out-of-bound access or abuse of this space, add
a redzone sanity check for it.
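
With the whole series applied, the layout of a kmalloc object under
SLAB_STORE_USER + SLAB_RED_ZONE roughly becomes (sketch):

        | left redzone | data [0, orig_size) | extra-space redzone
          [orig_size, object_size) | right redzone | free pointer |
          2 * struct track | orig_size | kasan meta | padding |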

And in the current kernel, some kmalloc users already know about the
existence of this space and utilize it after calling 'ksize()' to learn
the real size of the allocated buffer. So we skip the sanity check for
objects on which ksize() has been called, treating them as legitimate
users.
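
A sketch of that legitimate pattern (sizes are only for illustration):

        void *buf = kmalloc(70, GFP_KERNEL);    /* falls into kmalloc-96 */

        if (buf) {
                /* returns 96, and marks the object so that the
                 * extra-space redzone check is skipped for it */
                size_t sz = ksize(buf);

                memset(buf, 0, sz);     /* using all 96 bytes is fine */
        }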

In some cases, the free pointer could be saved inside the latter
part of the object data area, which may overlap the redzone part (for
small sizes of kmalloc objects). As suggested by Hyeonggon Yoo,
force the free pointer to be in the meta data area when kmalloc redzone
debug is enabled, to make all kmalloc objects covered by the redzone
check.

Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Feng Tang <[email protected]>
---
mm/slab.h | 4 ++++
mm/slab_common.c | 4 ++++
mm/slub.c | 51 ++++++++++++++++++++++++++++++++++++++++++++----
3 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 3cf5adf63f48..5ca04d9c8bf5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -881,4 +881,8 @@ void __check_heap_object(const void *ptr, unsigned long n,
}
#endif

+#ifdef CONFIG_SLUB_DEBUG
+void skip_orig_size_check(struct kmem_cache *s, const void *object);
+#endif
+
#endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 8e13e3aac53f..5106667d6adb 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1001,6 +1001,10 @@ size_t __ksize(const void *object)
return folio_size(folio);
}

+#ifdef CONFIG_SLUB_DEBUG
+ skip_orig_size_check(folio_slab(folio)->slab_cache, object);
+#endif
+
return slab_ksize(folio_slab(folio)->slab_cache);
}

diff --git a/mm/slub.c b/mm/slub.c
index 6f823e99d8b4..546b30ed5afd 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -812,12 +812,28 @@ static inline void set_orig_size(struct kmem_cache *s,
if (!slub_debug_orig_size(s))
return;

+#ifdef CONFIG_KASAN_GENERIC
+ /*
+ * KASAN could save its free meta data in the object's data area at
+ * offset 0. If that size is larger than 'orig_size', it could
+ * overlap the data redzone (from 'orig_size+1' to 'object_size'),
+ * where the check should be skipped.
+ */
+ if (s->kasan_info.free_meta_size > orig_size)
+ orig_size = s->object_size;
+#endif
+
p += get_info_end(s);
p += sizeof(struct track) * 2;

*(unsigned int *)p = orig_size;
}

+void skip_orig_size_check(struct kmem_cache *s, const void *object)
+{
+ set_orig_size(s, (void *)object, s->object_size);
+}
+
static unsigned int get_orig_size(struct kmem_cache *s, void *object)
{
void *p = kasan_reset_tag(object);
@@ -949,13 +965,27 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab,
static void init_object(struct kmem_cache *s, void *object, u8 val)
{
u8 *p = kasan_reset_tag(object);
+ unsigned int orig_size = s->object_size;

- if (s->flags & SLAB_RED_ZONE)
+ if (s->flags & SLAB_RED_ZONE) {
memset(p - s->red_left_pad, val, s->red_left_pad);

+ if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
+ orig_size = get_orig_size(s, object);
+
+ /*
+ * Redzone the extra space allocated by kmalloc
+ * beyond what was requested.
+ */
+ if (orig_size < s->object_size)
+ memset(p + orig_size, val,
+ s->object_size - orig_size);
+ }
+ }
+
if (s->flags & __OBJECT_POISON) {
- memset(p, POISON_FREE, s->object_size - 1);
- p[s->object_size - 1] = POISON_END;
+ memset(p, POISON_FREE, orig_size - 1);
+ p[orig_size - 1] = POISON_END;
}

if (s->flags & SLAB_RED_ZONE)
@@ -1103,6 +1133,7 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
{
u8 *p = object;
u8 *endobject = object + s->object_size;
+ unsigned int orig_size;

if (s->flags & SLAB_RED_ZONE) {
if (!check_bytes_and_report(s, slab, object, "Left Redzone",
@@ -1112,6 +1143,17 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
if (!check_bytes_and_report(s, slab, object, "Right Redzone",
endobject, val, s->inuse - s->object_size))
return 0;
+
+ if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
+ orig_size = get_orig_size(s, object);
+
+ if (s->object_size > orig_size &&
+ !check_bytes_and_report(s, slab, object,
+ "kmalloc Redzone", p + orig_size,
+ val, s->object_size - orig_size)) {
+ return 0;
+ }
+ }
} else {
if ((s->flags & SLAB_POISON) && s->object_size < s->inuse) {
check_bytes_and_report(s, slab, p, "Alignment padding",
@@ -4187,7 +4229,8 @@ static int calculate_sizes(struct kmem_cache *s)
*/
s->inuse = size;

- if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
+ if (slub_debug_orig_size(s) ||
+ (flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) ||
s->ctor) {
/*
--
2.34.1

2022-09-13 07:19:25

by Feng Tang

Subject: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

kmalloc's API family is critical for mm, and one of its characteristics
is that it rounds up the request size to a fixed one (mostly power of
2). Say when a user requests memory for '2^n + 1' bytes, 2^(n+1) bytes
could actually be allocated, so in the worst case there is around 50%
memory space waste.

The wastage is not a big issue for requests that get allocated/freed
quickly, but may cause problems with objects that have a longer
lifetime.

We've met a kernel boot OOM panic (v5.10), and from the dumped slab
info:

[ 26.062145] kmalloc-2k 814056KB 814056KB

From debugging we found a huge number of 'struct iova_magazine' objects,
whose size is 1032 bytes (1024 + 8), so each allocation wastes 1016
bytes. Though the issue was solved by providing the right (bigger)
amount of RAM, it is still nice to optimize the size (either use a
kmalloc-friendly size or create a dedicated slab for it).
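
For example, a dedicated cache would avoid the 1016-byte per-object
waste; a sketch (assuming 'struct iova_magazine' is visible here):

        static struct kmem_cache *iova_magazine_cache;
        struct iova_magazine *mag;

        iova_magazine_cache = kmem_cache_create("iova_magazine",
                                                sizeof(struct iova_magazine),
                                                0, 0, NULL);
        mag = kmem_cache_alloc(iova_magazine_cache, GFP_KERNEL);
        ...
        kmem_cache_free(iova_magazine_cache, mag);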

And from the lkml archive, there was another crash kernel OOM case [1]
back in 2019, which seems to be related to a similar slab waste
situation, as the log is similar:

[ 4.332648] iommu: Adding device 0000:20:02.0 to group 16
[ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
...
[ 4.857565] kmalloc-2048 59164KB 59164KB

The crash kernel only has 256M of memory, and 59M is pretty big here.
(Note: the related code has been changed and optimised in recent
kernels [2]; these logs are just picked to demo the problem, and a
patch changing its size to 1024 bytes has been merged.)

So add a way to track each kmalloc's memory waste info, and
leverage the existing SLUB debug framework (specifically
SLAB_STORE_USER) to show the call stack of the original allocation,
so that users can evaluate the waste situation, identify hot
spots and optimize accordingly, for better utilization of memory.

The waste info is integrated into existing interface:
'/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of
'kmalloc-4k' after boot is:

126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1
__kmem_cache_alloc_node+0x11f/0x4e0
__kmalloc_node+0x4e/0x140
ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe]
ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe]
ixgbe_probe+0x165f/0x1d20 [ixgbe]
local_pci_probe+0x78/0xc0
work_for_cpu_fn+0x26/0x40
...

which means in 'kmalloc-4k' slab, there are 126 requests of
2240 bytes which got a 4KB space (wasting 1856 bytes each
and 233856 bytes in total), from ixgbe_alloc_q_vector().

And when the system starts some real workload like multiple docker
instances, there could be more severe waste.

[1]. https://lkml.org/lkml/2019/8/12/266
[2]. https://lore.kernel.org/lkml/[email protected]/

[Thanks Hyeonggon for pointing out several bugs about sorting/format]
[Thanks Vlastimil for suggesting way to reduce memory usage of
orig_size and keep it only for kmalloc objects]

Signed-off-by: Feng Tang <[email protected]>
Reviewed-by: Hyeonggon Yoo <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: John Garry <[email protected]>
Cc: Kefeng Wang <[email protected]>
---
Documentation/mm/slub.rst | 33 +++++---
include/linux/slab.h | 2 +
mm/slub.c | 156 ++++++++++++++++++++++++++++----------
3 files changed, 141 insertions(+), 50 deletions(-)

diff --git a/Documentation/mm/slub.rst b/Documentation/mm/slub.rst
index 43063ade737a..4e1578186b4f 100644
--- a/Documentation/mm/slub.rst
+++ b/Documentation/mm/slub.rst
@@ -400,21 +400,30 @@ information:
allocated objects. The output is sorted by frequency of each trace.

Information in the output:
- Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
- pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
+ Number of objects, allocating function, possible memory wastage of
+ kmalloc objects (total/per-object), minimal/average/maximal jiffies
+ since alloc, pid range of the allocating processes, cpu mask of
+ allocating cpus, numa node mask of origins of memory, and stack trace.

Example:::

- 1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
- __slab_alloc+0x6d/0x90
- kmem_cache_alloc_trace+0x2eb/0x300
- populate_error_injection_list+0x97/0x110
- init_error_injection+0x1b/0x71
- do_one_initcall+0x5f/0x2d0
- kernel_init_freeable+0x26f/0x2d7
- kernel_init+0xe/0x118
- ret_from_fork+0x22/0x30
-
+ 338 pci_alloc_dev+0x2c/0xa0 waste=521872/1544 age=290837/291891/293509 pid=1 cpus=106 nodes=0-1
+ __kmem_cache_alloc_node+0x11f/0x4e0
+ kmalloc_trace+0x26/0xa0
+ pci_alloc_dev+0x2c/0xa0
+ pci_scan_single_device+0xd2/0x150
+ pci_scan_slot+0xf7/0x2d0
+ pci_scan_child_bus_extend+0x4e/0x360
+ acpi_pci_root_create+0x32e/0x3b0
+ pci_acpi_scan_root+0x2b9/0x2d0
+ acpi_pci_root_add.cold.11+0x110/0xb0a
+ acpi_bus_attach+0x262/0x3f0
+ device_for_each_child+0xb7/0x110
+ acpi_dev_for_each_child+0x77/0xa0
+ acpi_bus_attach+0x108/0x3f0
+ device_for_each_child+0xb7/0x110
+ acpi_dev_for_each_child+0x77/0xa0
+ acpi_bus_attach+0x108/0x3f0

2. free_traces::

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9b592e611cb1..6dc495f76644 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -29,6 +29,8 @@
#define SLAB_RED_ZONE ((slab_flags_t __force)0x00000400U)
/* DEBUG: Poison objects */
#define SLAB_POISON ((slab_flags_t __force)0x00000800U)
+/* Indicate a kmalloc slab */
+#define SLAB_KMALLOC ((slab_flags_t __force)0x00001000U)
/* Align objs on cache lines */
#define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x00002000U)
/* Use GFP_DMA memory */
diff --git a/mm/slub.c b/mm/slub.c
index fe4fe0e72daf..c8ba16b3a4db 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -194,11 +194,24 @@ DEFINE_STATIC_KEY_FALSE(slub_debug_enabled);
#endif
#endif /* CONFIG_SLUB_DEBUG */

+/* Structure holding parameters for get_partial() call chain */
+struct partial_context {
+ struct slab **slab;
+ gfp_t flags;
+ unsigned int orig_size;
+};
+
static inline bool kmem_cache_debug(struct kmem_cache *s)
{
return kmem_cache_debug_flags(s, SLAB_DEBUG_FLAGS);
}

+static inline bool slub_debug_orig_size(struct kmem_cache *s)
+{
+ return (kmem_cache_debug_flags(s, SLAB_STORE_USER) &&
+ (s->flags & SLAB_KMALLOC));
+}
+
void *fixup_red_left(struct kmem_cache *s, void *p)
{
if (kmem_cache_debug_flags(s, SLAB_RED_ZONE))
@@ -785,6 +798,39 @@ static void print_slab_info(const struct slab *slab)
folio_flags(folio, 0));
}

+/*
+ * kmalloc caches have fixed sizes (mostly power of 2), and the kmalloc()
+ * API family will round up the real request size to these fixed ones, so
+ * there could be an extra area beyond what is requested. Save the original
+ * request size in the meta data area, for better debugging and sanity checks.
+ */
+static inline void set_orig_size(struct kmem_cache *s,
+ void *object, unsigned int orig_size)
+{
+ void *p = kasan_reset_tag(object);
+
+ if (!slub_debug_orig_size(s))
+ return;
+
+ p += get_info_end(s);
+ p += sizeof(struct track) * 2;
+
+ *(unsigned int *)p = orig_size;
+}
+
+static unsigned int get_orig_size(struct kmem_cache *s, void *object)
+{
+ void *p = kasan_reset_tag(object);
+
+ if (!slub_debug_orig_size(s))
+ return s->object_size;
+
+ p += get_info_end(s);
+ p += sizeof(struct track) * 2;
+
+ return *(unsigned int *)p;
+}
+
static void slab_bug(struct kmem_cache *s, char *fmt, ...)
{
struct va_format vaf;
@@ -844,6 +890,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
if (s->flags & SLAB_STORE_USER)
off += 2 * sizeof(struct track);

+ if (slub_debug_orig_size(s))
+ off += sizeof(unsigned int);
+
off += kasan_metadata_size(s);

if (off != size_from_object(s))
@@ -977,7 +1026,8 @@ static int check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
*
* A. Free pointer (if we cannot overwrite object on free)
* B. Tracking data for SLAB_STORE_USER
- * C. Padding to reach required alignment boundary or at minimum
+ * C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
+ * D. Padding to reach required alignment boundary or at minimum
* one word if debugging is on to be able to detect writes
* before the word boundary.
*
@@ -995,10 +1045,14 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
{
unsigned long off = get_info_end(s); /* The end of info */

- if (s->flags & SLAB_STORE_USER)
+ if (s->flags & SLAB_STORE_USER) {
/* We also have user information there */
off += 2 * sizeof(struct track);

+ if (s->flags & SLAB_KMALLOC)
+ off += sizeof(unsigned int);
+ }
+
off += kasan_metadata_size(s);

if (size_from_object(s) == off)
@@ -1293,7 +1347,7 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
}

static noinline int alloc_debug_processing(struct kmem_cache *s,
- struct slab *slab, void *object)
+ struct slab *slab, void *object, int orig_size)
{
if (s->flags & SLAB_CONSISTENCY_CHECKS) {
if (!alloc_consistency_checks(s, slab, object))
@@ -1302,6 +1356,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,

/* Success. Perform special debug activities for allocs */
trace(s, slab, object, 1);
+ set_orig_size(s, object, orig_size);
init_object(s, object, SLUB_RED_ACTIVE);
return 1;

@@ -1570,7 +1625,10 @@ static inline
void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}

static inline int alloc_debug_processing(struct kmem_cache *s,
- struct slab *slab, void *object) { return 0; }
+ struct slab *slab, void *object, int orig_size) { return 0; }
+
+static inline void set_orig_size(struct kmem_cache *s,
+ void *object, unsigned int orig_size) {}

static inline void free_debug_processing(
struct kmem_cache *s, struct slab *slab,
@@ -1999,7 +2057,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
* it to full list if it was the last free object.
*/
static void *alloc_single_from_partial(struct kmem_cache *s,
- struct kmem_cache_node *n, struct slab *slab)
+ struct kmem_cache_node *n, struct slab *slab, int orig_size)
{
void *object;

@@ -2009,7 +2067,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
slab->freelist = get_freepointer(s, object);
slab->inuse++;

- if (!alloc_debug_processing(s, slab, object)) {
+ if (!alloc_debug_processing(s, slab, object, orig_size)) {
remove_partial(n, slab);
return NULL;
}
@@ -2028,7 +2086,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
* and put the slab to the partial (or full) list.
*/
static void *alloc_single_from_new_slab(struct kmem_cache *s,
- struct slab *slab)
+ struct slab *slab, int orig_size)
{
int nid = slab_nid(slab);
struct kmem_cache_node *n = get_node(s, nid);
@@ -2040,7 +2098,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s,
slab->freelist = get_freepointer(s, object);
slab->inuse = 1;

- if (!alloc_debug_processing(s, slab, object))
+ if (!alloc_debug_processing(s, slab, object, orig_size))
/*
* It's not really expected that this would fail on a
* freshly allocated slab, but a concurrent memory
@@ -2118,7 +2176,7 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags);
* Try to allocate a partial slab from a specific node.
*/
static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
- struct slab **ret_slab, gfp_t gfpflags)
+ struct partial_context *pc)
{
struct slab *slab, *slab2;
void *object = NULL;
@@ -2138,11 +2196,12 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) {
void *t;

- if (!pfmemalloc_match(slab, gfpflags))
+ if (!pfmemalloc_match(slab, pc->flags))
continue;

if (kmem_cache_debug(s)) {
- object = alloc_single_from_partial(s, n, slab);
+ object = alloc_single_from_partial(s, n, slab,
+ pc->orig_size);
if (object)
break;
continue;
@@ -2153,7 +2212,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
break;

if (!object) {
- *ret_slab = slab;
+ *pc->slab = slab;
stat(s, ALLOC_FROM_PARTIAL);
object = t;
} else {
@@ -2177,14 +2236,13 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
/*
* Get a slab from somewhere. Search in increasing NUMA distances.
*/
-static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
- struct slab **ret_slab)
+static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
{
#ifdef CONFIG_NUMA
struct zonelist *zonelist;
struct zoneref *z;
struct zone *zone;
- enum zone_type highest_zoneidx = gfp_zone(flags);
+ enum zone_type highest_zoneidx = gfp_zone(pc->flags);
void *object;
unsigned int cpuset_mems_cookie;

@@ -2212,15 +2270,15 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,

do {
cpuset_mems_cookie = read_mems_allowed_begin();
- zonelist = node_zonelist(mempolicy_slab_node(), flags);
+ zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
struct kmem_cache_node *n;

n = get_node(s, zone_to_nid(zone));

- if (n && cpuset_zone_allowed(zone, flags) &&
+ if (n && cpuset_zone_allowed(zone, pc->flags) &&
n->nr_partial > s->min_partial) {
- object = get_partial_node(s, n, ret_slab, flags);
+ object = get_partial_node(s, n, pc);
if (object) {
/*
* Don't check read_mems_allowed_retry()
@@ -2241,8 +2299,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
/*
* Get a partial slab, lock it and return it.
*/
-static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
- struct slab **ret_slab)
+static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc)
{
void *object;
int searchnode = node;
@@ -2250,11 +2307,11 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
if (node == NUMA_NO_NODE)
searchnode = numa_mem_id();

- object = get_partial_node(s, get_node(s, searchnode), ret_slab, flags);
+ object = get_partial_node(s, get_node(s, searchnode), pc);
if (object || node != NUMA_NO_NODE)
return object;

- return get_any_partial(s, flags, ret_slab);
+ return get_any_partial(s, pc);
}

#ifdef CONFIG_PREEMPTION
@@ -2974,11 +3031,12 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
* already disabled (which is the case for bulk allocation).
*/
static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
+ unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
{
void *freelist;
struct slab *slab;
unsigned long flags;
+ struct partial_context pc;

stat(s, ALLOC_SLOWPATH);

@@ -3092,7 +3150,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,

new_objects:

- freelist = get_partial(s, gfpflags, node, &slab);
+ pc.flags = gfpflags;
+ pc.slab = &slab;
+ pc.orig_size = orig_size;
+ freelist = get_partial(s, node, &pc);
if (freelist)
goto check_new_slab;

@@ -3108,7 +3169,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
stat(s, ALLOC_SLAB);

if (kmem_cache_debug(s)) {
- freelist = alloc_single_from_new_slab(s, slab);
+ freelist = alloc_single_from_new_slab(s, slab, orig_size);

if (unlikely(!freelist))
goto new_objects;
@@ -3140,6 +3201,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
*/
if (s->flags & SLAB_STORE_USER)
set_track(s, freelist, TRACK_ALLOC, addr);
+
return freelist;
}

@@ -3182,7 +3244,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
* pointer.
*/
static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
- unsigned long addr, struct kmem_cache_cpu *c)
+ unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
{
void *p;

@@ -3195,7 +3257,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
c = slub_get_cpu_ptr(s->cpu_slab);
#endif

- p = ___slab_alloc(s, gfpflags, node, addr, c);
+ p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
#ifdef CONFIG_PREEMPT_COUNT
slub_put_cpu_ptr(s->cpu_slab);
#endif
@@ -3280,7 +3342,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l

if (!USE_LOCKLESS_FAST_PATH() ||
unlikely(!object || !slab || !node_match(slab, node))) {
- object = __slab_alloc(s, gfpflags, node, addr, c);
+ object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
} else {
void *next_object = get_freepointer_safe(s, object);

@@ -3747,7 +3809,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
* of re-populating per CPU c->freelist
*/
p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
- _RET_IP_, c);
+ _RET_IP_, c, s->object_size);
if (unlikely(!p[i]))
goto error;

@@ -4150,12 +4212,17 @@ static int calculate_sizes(struct kmem_cache *s)
}

#ifdef CONFIG_SLUB_DEBUG
- if (flags & SLAB_STORE_USER)
+ if (flags & SLAB_STORE_USER) {
/*
* Need to store information about allocs and frees after
* the object.
*/
size += 2 * sizeof(struct track);
+
+ /* Save the original kmalloc request size */
+ if (flags & SLAB_KMALLOC)
+ size += sizeof(unsigned int);
+ }
#endif

kasan_cache_create(s, &size, &s->flags);
@@ -4770,7 +4837,7 @@ void __init kmem_cache_init(void)

/* Now we can use the kmem_cache to allocate kmalloc slabs */
setup_kmalloc_cache_index_table();
- create_kmalloc_caches(0);
+ create_kmalloc_caches(SLAB_KMALLOC);

/* Setup random freelists for each cache */
init_freelist_randomization();
@@ -4937,6 +5004,7 @@ struct location {
depot_stack_handle_t handle;
unsigned long count;
unsigned long addr;
+ unsigned long waste;
long long sum_time;
long min_time;
long max_time;
@@ -4983,13 +5051,15 @@ static int alloc_loc_track(struct loc_track *t, unsigned long max, gfp_t flags)
}

static int add_location(struct loc_track *t, struct kmem_cache *s,
- const struct track *track)
+ const struct track *track,
+ unsigned int orig_size)
{
long start, end, pos;
struct location *l;
- unsigned long caddr, chandle;
+ unsigned long caddr, chandle, cwaste;
unsigned long age = jiffies - track->when;
depot_stack_handle_t handle = 0;
+ unsigned int waste = s->object_size - orig_size;

#ifdef CONFIG_STACKDEPOT
handle = READ_ONCE(track->handle);
@@ -5007,11 +5077,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
if (pos == end)
break;

- caddr = t->loc[pos].addr;
- chandle = t->loc[pos].handle;
- if ((track->addr == caddr) && (handle == chandle)) {
+ l = &t->loc[pos];
+ caddr = l->addr;
+ chandle = l->handle;
+ cwaste = l->waste;
+ if ((track->addr == caddr) && (handle == chandle) &&
+ (waste == cwaste)) {

- l = &t->loc[pos];
l->count++;
if (track->when) {
l->sum_time += age;
@@ -5036,6 +5108,9 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
end = pos;
else if (track->addr == caddr && handle < chandle)
end = pos;
+ else if (track->addr == caddr && handle == chandle &&
+ waste < cwaste)
+ end = pos;
else
start = pos;
}
@@ -5059,6 +5134,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
l->min_pid = track->pid;
l->max_pid = track->pid;
l->handle = handle;
+ l->waste = waste;
cpumask_clear(to_cpumask(l->cpus));
cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
nodes_clear(l->nodes);
@@ -5077,7 +5153,7 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,

for_each_object(p, s, addr, slab->objects)
if (!test_bit(__obj_to_index(s, addr, p), obj_map))
- add_location(t, s, get_track(s, p, alloc));
+ add_location(t, s, get_track(s, p, alloc), get_orig_size(s, p));
}
#endif /* CONFIG_DEBUG_FS */
#endif /* CONFIG_SLUB_DEBUG */
@@ -5942,6 +6018,10 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
else
seq_puts(seq, "<not-available>");

+ if (l->waste)
+ seq_printf(seq, " waste=%lu/%lu",
+ l->count * l->waste, l->waste);
+
if (l->sum_time != l->min_time) {
seq_printf(seq, " age=%ld/%llu/%ld",
l->min_time, div_u64(l->sum_time, l->count),
--
2.34.1

2022-09-13 07:20:59

by Feng Tang

Subject: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache

When KASAN is enabled for slab/slub, it may save its free_meta
data in the first part of the slab object's data area on the object's
free path, which works fine.

There is an ongoing effort to extend slub's debug function to
redzone the latter part of the kmalloc object area, and when both
debug features are enabled there is a possible conflict, especially
when the kmalloc object is small, as caught by the 0Day bot [1].

To provide better information to slab/slub, add free_meta's data size
to 'struct kasan_cache', so that its users can take the right action
to avoid the data conflict.
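
The intended consumer is slub in patch 4/4 of this series, roughly:

        /* in slub's set_orig_size(): don't redzone the area that
         * KASAN's free meta data may be written into */
        if (s->kasan_info.free_meta_size > orig_size)
                orig_size = s->object_size;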

[1]. https://lore.kernel.org/lkml/YuYm3dWwpZwH58Hu@xsang-OptiPlex-9020/
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Feng Tang <[email protected]>
Acked-by: Dmitry Vyukov <[email protected]>
---
include/linux/kasan.h | 2 ++
mm/kasan/common.c | 2 ++
2 files changed, 4 insertions(+)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b092277bf48d..49af9513e8ed 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -100,6 +100,8 @@ static inline bool kasan_has_integrated_init(void)
struct kasan_cache {
int alloc_meta_offset;
int free_meta_offset;
+ /* size of free_meta data saved in object's data area */
+ int free_meta_size;
bool is_kmalloc;
};

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 69f583855c8b..0cb867e92524 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -201,6 +201,8 @@ void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
*size = ok_size;
}
+ } else {
+ cache->kasan_info.free_meta_size = sizeof(struct kasan_free_meta);
}

/* Calculate size with optimal redzone. */
--
2.34.1

2022-09-13 07:22:09

by Feng Tang

Subject: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc

kzalloc/kmalloc will round up the request size to a fixed size
(mostly power of 2), so the allocated memory could be more than
requested. Currently the kzalloc family of APIs will zero all of
the allocated memory.

To detect out-of-bound usage of the extra allocated memory, only
zero the requested part, so that a sanity check can be added for
the extra space later.

Performance-wise, the smaller zeroing length also brings shorter
execution time, as shown by test data on various server/desktop
platforms.

For kzalloc users who will call ksize() later and utilize this
extra space, please be aware that the space is not zeroed
anymore.
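
That is, a (hypothetical) pattern like the following changes behavior
with this patch:

        void *buf;
        size_t sz;

        buf = kzalloc(24, GFP_KERNEL);  /* served from kmalloc-32 */
        sz = ksize(buf);                /* 32 */
        /* buf[24..31] is no longer guaranteed to be zero */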

Signed-off-by: Feng Tang <[email protected]>
---
mm/slab.c | 7 ++++---
mm/slab.h | 5 +++--
mm/slub.c | 10 +++++++---
3 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index a5486ff8362a..4594de0e3d6b 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3253,7 +3253,8 @@ slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
init = slab_want_init_on_alloc(flags, cachep);

out:
- slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init);
+ slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init,
+ cachep->object_size);
return objp;
}

@@ -3506,13 +3507,13 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
* Done outside of the IRQ disabled section.
*/
slab_post_alloc_hook(s, objcg, flags, size, p,
- slab_want_init_on_alloc(flags, s));
+ slab_want_init_on_alloc(flags, s), s->object_size);
/* FIXME: Trace call missing. Christoph would like a bulk variant */
return size;
error:
local_irq_enable();
cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
- slab_post_alloc_hook(s, objcg, flags, i, p, false);
+ slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
kmem_cache_free_bulk(s, i, p);
return 0;
}
diff --git a/mm/slab.h b/mm/slab.h
index d0ef9dd44b71..3cf5adf63f48 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -730,7 +730,8 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,

static inline void slab_post_alloc_hook(struct kmem_cache *s,
struct obj_cgroup *objcg, gfp_t flags,
- size_t size, void **p, bool init)
+ size_t size, void **p, bool init,
+ unsigned int orig_size)
{
size_t i;

@@ -746,7 +747,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
for (i = 0; i < size; i++) {
p[i] = kasan_slab_alloc(s, p[i], flags, init);
if (p[i] && init && !kasan_has_integrated_init())
- memset(p[i], 0, s->object_size);
+ memset(p[i], 0, orig_size);
kmemleak_alloc_recursive(p[i], s->object_size, 1,
s->flags, flags);
}
diff --git a/mm/slub.c b/mm/slub.c
index c8ba16b3a4db..6f823e99d8b4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3376,7 +3376,11 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
init = slab_want_init_on_alloc(gfpflags, s);

out:
- slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
+ /*
+ * When init equals 'true', as for the kzalloc() family, only
+ * @orig_size bytes will be zeroed instead of s->object_size
+ */
+ slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size);

return object;
}
@@ -3833,11 +3837,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
* Done outside of the IRQ disabled fastpath loop.
*/
slab_post_alloc_hook(s, objcg, flags, size, p,
- slab_want_init_on_alloc(flags, s));
+ slab_want_init_on_alloc(flags, s), s->object_size);
return i;
error:
slub_put_cpu_ptr(s->cpu_slab);
- slab_post_alloc_hook(s, objcg, flags, i, p, false);
+ slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
kmem_cache_free_bulk(s, i, p);
return 0;
}
--
2.34.1

2022-09-13 09:45:48

by Hyeonggon Yoo

Subject: Re: [PATCH v6 4/4] mm/slub: extend redzone check to extra allocated kmalloc space than requested

On Tue, Sep 13, 2022 at 02:54:23PM +0800, Feng Tang wrote:
> kmalloc will round up the request size to a fixed size (mostly power
> of 2), so there could be an extra space beyond what is requested, whose
> size is the actual buffer size minus the original request size.
>
> To better detect out-of-bound access or abuse of this space, add
> a redzone sanity check for it.
>
> And in the current kernel, some kmalloc users already know about the
> existence of this space and utilize it after calling 'ksize()' to learn
> the real size of the allocated buffer. So we skip the sanity check for
> objects on which ksize() has been called, treating them as legitimate
> users.
>
> In some cases, the free pointer could be saved inside the latter
> part of the object data area, which may overlap the redzone part (for
> small sizes of kmalloc objects). As suggested by Hyeonggon Yoo,
> force the free pointer to be in the meta data area when kmalloc redzone
> debug is enabled, to make all kmalloc objects covered by the redzone
> check.
>
> Suggested-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Feng Tang <[email protected]>
> ---
> mm/slab.h | 4 ++++
> mm/slab_common.c | 4 ++++
> mm/slub.c | 51 ++++++++++++++++++++++++++++++++++++++++++++----
> 3 files changed, 55 insertions(+), 4 deletions(-)
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 3cf5adf63f48..5ca04d9c8bf5 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -881,4 +881,8 @@ void __check_heap_object(const void *ptr, unsigned long n,
> }
> #endif
>
> +#ifdef CONFIG_SLUB_DEBUG
> +void skip_orig_size_check(struct kmem_cache *s, const void *object);
> +#endif
> +
> #endif /* MM_SLAB_H */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 8e13e3aac53f..5106667d6adb 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1001,6 +1001,10 @@ size_t __ksize(const void *object)
> return folio_size(folio);
> }
>
> +#ifdef CONFIG_SLUB_DEBUG
> + skip_orig_size_check(folio_slab(folio)->slab_cache, object);
> +#endif
> +
> return slab_ksize(folio_slab(folio)->slab_cache);
> }
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 6f823e99d8b4..546b30ed5afd 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -812,12 +812,28 @@ static inline void set_orig_size(struct kmem_cache *s,
> if (!slub_debug_orig_size(s))
> return;
>
> +#ifdef CONFIG_KASAN_GENERIC
> + /*
> + * KASAN could save its free meta data in the object's data area at
> + * offset 0. If that size is larger than 'orig_size', it could
> + * overlap the data redzone (from 'orig_size+1' to 'object_size'),
> + * where the check should be skipped.
> + */
> + if (s->kasan_info.free_meta_size > orig_size)
> + orig_size = s->object_size;
> +#endif
> +
> p += get_info_end(s);
> p += sizeof(struct track) * 2;
>
> *(unsigned int *)p = orig_size;
> }
>
> +void skip_orig_size_check(struct kmem_cache *s, const void *object)
> +{
> + set_orig_size(s, (void *)object, s->object_size);
> +}
> +
> static unsigned int get_orig_size(struct kmem_cache *s, void *object)
> {
> void *p = kasan_reset_tag(object);
> @@ -949,13 +965,27 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab,
> static void init_object(struct kmem_cache *s, void *object, u8 val)
> {
> u8 *p = kasan_reset_tag(object);
> + unsigned int orig_size = s->object_size;
>
> - if (s->flags & SLAB_RED_ZONE)
> + if (s->flags & SLAB_RED_ZONE) {
> memset(p - s->red_left_pad, val, s->red_left_pad);
>
> + if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
> + orig_size = get_orig_size(s, object);
> +
> + /*
> + * Redzone the extra space allocated by kmalloc
> + * beyond what was requested.
> + */
> + if (orig_size < s->object_size)
> + memset(p + orig_size, val,
> + s->object_size - orig_size);
> + }
> + }
> +
> if (s->flags & __OBJECT_POISON) {
> - memset(p, POISON_FREE, s->object_size - 1);
> - p[s->object_size - 1] = POISON_END;
> + memset(p, POISON_FREE, orig_size - 1);
> + p[orig_size - 1] = POISON_END;
> }
>
> if (s->flags & SLAB_RED_ZONE)
> @@ -1103,6 +1133,7 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
> {
> u8 *p = object;
> u8 *endobject = object + s->object_size;
> + unsigned int orig_size;
>
> if (s->flags & SLAB_RED_ZONE) {
> if (!check_bytes_and_report(s, slab, object, "Left Redzone",
> @@ -1112,6 +1143,17 @@ static int check_object(struct kmem_cache *s, struct slab *slab,
> if (!check_bytes_and_report(s, slab, object, "Right Redzone",
> endobject, val, s->inuse - s->object_size))
> return 0;
> +
> + if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) {
> + orig_size = get_orig_size(s, object);
> +
> + if (s->object_size > orig_size &&
> + !check_bytes_and_report(s, slab, object,
> + "kmalloc Redzone", p + orig_size,
> + val, s->object_size - orig_size)) {
> + return 0;
> + }
> + }
> } else {
> if ((s->flags & SLAB_POISON) && s->object_size < s->inuse) {
> check_bytes_and_report(s, slab, p, "Alignment padding",
> @@ -4187,7 +4229,8 @@ static int calculate_sizes(struct kmem_cache *s)
> */
> s->inuse = size;
>
> - if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
> + if (slub_debug_orig_size(s) ||
> + (flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) ||
> ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) ||
> s->ctor) {
> /*
> --
> 2.34.1
>

For the slab part:

Looks good to me.
Acked-by: Hyeonggon Yoo <[email protected]>

Thanks!

--
Thanks,
Hyeonggon

2022-09-20 20:22:05

by Andrey Konovalov

Subject: Re: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache

On Tue, Sep 13, 2022 at 8:54 AM Feng Tang <[email protected]> wrote:
>
> When KASAN is enabled for slab/slub, it may save its free_meta
> data in the first part of the slab object's data area on the object's
> free path, which works fine.
>
> There is an ongoing effort to extend slub's debug function to
> redzone the latter part of the kmalloc object area, and when both
> debug features are enabled there is a possible conflict, especially
> when the kmalloc object is small, as caught by the 0Day bot [1].
>
> To provide better information to slab/slub, add free_meta's data size
> to 'struct kasan_cache', so that its users can take the right action
> to avoid the data conflict.
>
> [1]. https://lore.kernel.org/lkml/YuYm3dWwpZwH58Hu@xsang-OptiPlex-9020/
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Feng Tang <[email protected]>
> Acked-by: Dmitry Vyukov <[email protected]>
> ---
> include/linux/kasan.h | 2 ++
> mm/kasan/common.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index b092277bf48d..49af9513e8ed 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -100,6 +100,8 @@ static inline bool kasan_has_integrated_init(void)
> struct kasan_cache {
> int alloc_meta_offset;
> int free_meta_offset;
> + /* size of free_meta data saved in object's data area */
> + int free_meta_size;
> bool is_kmalloc;
> };
>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 69f583855c8b..0cb867e92524 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -201,6 +201,8 @@ void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
> cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
> *size = ok_size;
> }
> + } else {
> + cache->kasan_info.free_meta_size = sizeof(struct kasan_free_meta);

Hi Feng,

I just realized that we already have a function that exposes similar
functionality: kasan_metadata_size. However, that function returns the
size of the metadata that is stored in the redzone.

I think, instead of adding free_meta_size, a better approach would be to:

1. Rename kasan_metadata_size to kasan_metadata_size_in_redzone (or
something like that).
2. Add kasan_metadata_size_in_object with appropriate implementation
and use that in your patches.

This avoids exposing KASAN-internal details, such as what fields the
kasan_cache struct has, to the common code.

Sorry for not realizing this straight away.

(Note that there's an upcoming patch that fixes a bug in
kasan_metadata_size's implementation [1].)

Thanks!

[1] https://lore.kernel.org/linux-mm/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com/



> }
>
> /* Calculate size with optimal redzone. */
> --
> 2.34.1
>

2022-09-21 12:47:25

by Feng Tang

Subject: Re: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache

Hi Andrey,

On Wed, Sep 21, 2022 at 03:20:58AM +0800, Andrey Konovalov wrote:
> On Tue, Sep 13, 2022 at 8:54 AM Feng Tang <[email protected]> wrote:
> >
> > When KASAN is enabled for slab/slub, it may save its free_meta
> > data in the first part of the slab object's data area on the object's
> > free path, which works fine.
> >
> > There is an ongoing effort to extend slub's debug function to
> > redzone the latter part of the kmalloc object area, and when both
> > debug features are enabled there is a possible conflict, especially
> > when the kmalloc object is small, as caught by the 0Day bot [1].
> >
> > To provide better information to slab/slub, add free_meta's data size
> > to 'struct kasan_cache', so that its users can take the right action
> > to avoid the data conflict.
> >
> > [1]. https://lore.kernel.org/lkml/YuYm3dWwpZwH58Hu@xsang-OptiPlex-9020/
> > Reported-by: kernel test robot <[email protected]>
> > Signed-off-by: Feng Tang <[email protected]>
> > Acked-by: Dmitry Vyukov <[email protected]>
> > ---
> > include/linux/kasan.h | 2 ++
> > mm/kasan/common.c | 2 ++
> > 2 files changed, 4 insertions(+)
> >
> > diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> > index b092277bf48d..49af9513e8ed 100644
> > --- a/include/linux/kasan.h
> > +++ b/include/linux/kasan.h
> > @@ -100,6 +100,8 @@ static inline bool kasan_has_integrated_init(void)
> > struct kasan_cache {
> > int alloc_meta_offset;
> > int free_meta_offset;
> > + /* size of free_meta data saved in object's data area */
> > + int free_meta_size;
> > bool is_kmalloc;
> > };
> >
> > diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> > index 69f583855c8b..0cb867e92524 100644
> > --- a/mm/kasan/common.c
> > +++ b/mm/kasan/common.c
> > @@ -201,6 +201,8 @@ void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
> > cache->kasan_info.free_meta_offset = KASAN_NO_FREE_META;
> > *size = ok_size;
> > }
> > + } else {
> > + cache->kasan_info.free_meta_size = sizeof(struct kasan_free_meta);
>
> Hi Feng,
>
> I just realized that we already have a function that exposes a similar
> functionality: kasan_metadata_size. However, this function returns the
> size of metadata that is stored in the redzone.
>
> I think, instead of adding free_meta_size, a better approach would be to:
>
> 1. Rename kasan_metadata_size to kasan_metadata_size_in_redzone (or
> something like that).
> 2. Add kasan_metadata_size_in_object with appropriate implementation
> and use that in your patches.
>
> This allows avoiding exposing KASAN-internal details such as what kind
> of fields the kasan_cache struct has to the common code.

Agree, it's better not to touch the internal fields in slub code.

How about the following patch? It merges the 2 functions, with one flag
indicating whether the meta data is in the redzone or in the object.
(I'm fine with 2 separate functions.)


diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index b092277bf48d..0ad05a34e708 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -150,11 +150,12 @@ static __always_inline void kasan_cache_create_kmalloc(struct kmem_cache *cache)
__kasan_cache_create_kmalloc(cache);
}

-size_t __kasan_metadata_size(struct kmem_cache *cache);
-static __always_inline size_t kasan_metadata_size(struct kmem_cache *cache)
+size_t __kasan_meta_size(struct kmem_cache *cache, bool in_slab_object);
+static __always_inline size_t kasan_meta_size(struct kmem_cache *cache,
+ bool in_slab_object)
{
if (kasan_enabled())
- return __kasan_metadata_size(cache);
+ return __kasan_meta_size(cache, in_slab_object);
return 0;
}

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 69f583855c8b..2a8710461ebb 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -218,14 +218,21 @@ void __kasan_cache_create_kmalloc(struct kmem_cache *cache)
cache->kasan_info.is_kmalloc = true;
}

-size_t __kasan_metadata_size(struct kmem_cache *cache)
+size_t __kasan_meta_size(struct kmem_cache *cache, bool in_slab_object)
{
if (!kasan_stack_collection_enabled())
return 0;
- return (cache->kasan_info.alloc_meta_offset ?
- sizeof(struct kasan_alloc_meta) : 0) +
- (cache->kasan_info.free_meta_offset ?
- sizeof(struct kasan_free_meta) : 0);
+
+ if (in_slab_object)
+ return (cache->kasan_info.free_meta_offset == 0 ?
+ sizeof(struct kasan_free_meta) : 0);
+ else
+ return (cache->kasan_info.alloc_meta_offset ?
+ sizeof(struct kasan_alloc_meta) : 0) +
+ (cache->kasan_info.free_meta_offset ?
+ sizeof(struct kasan_free_meta) : 0);
}

struct kasan_alloc_meta *kasan_get_alloc_meta(struct kmem_cache *cache,

> Sorry for nor realizing this straight away.
>
> (Note that there's an upcoming patch that fixes a bug in
> kasan_metadata_size' implementation [1].)

Thanks for the note, will check this when making the formal patch.

- Feng

> Thanks!
>
> [1] https://lore.kernel.org/linux-mm/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com/

2022-09-23 12:28:56

by Vlastimil Babka

Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On 9/13/22 08:54, Feng Tang wrote:
> kmalloc's API family is critical for mm, and one of its characteristics
> is that it rounds up the request size to a fixed one (mostly power of
> 2). Say when a user requests memory for '2^n + 1' bytes, 2^(n+1) bytes
> could actually be allocated, so in the worst case there is around 50%
> memory space waste.
>
> The wastage is not a big issue for requests that get allocated/freed
> quickly, but may cause problems with objects that have a longer
> lifetime.
>
> We've met a kernel boot OOM panic (v5.10), and from the dumped slab
> info:
>
> [ 26.062145] kmalloc-2k 814056KB 814056KB
>
> From debugging we found a huge number of 'struct iova_magazine' objects,
> whose size is 1032 bytes (1024 + 8), so each allocation wastes 1016
> bytes. Though the issue was solved by providing the right (bigger)
> amount of RAM, it is still nice to optimize the size (either use a
> kmalloc-friendly size or create a dedicated slab for it).
>
> And from the lkml archive, there was another crash kernel OOM case [1]
> back in 2019, which seems to be related to a similar slab waste
> situation, as the log is similar:
>
> [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16
> [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0
> ...
> [ 4.857565] kmalloc-2048 59164KB 59164KB
>
> The crash kernel only has 256M of memory, and 59M is pretty big here.
> (Note: the related code has been changed and optimised in recent
> kernels [2]; these logs are just picked to demo the problem, and a
> patch changing its size to 1024 bytes has been merged.)
>
> So add a way to track each kmalloc's memory waste info, and
> leverage the existing SLUB debug framework (specifically
> SLAB_STORE_USER) to show the call stack of the original allocation,
> so that users can evaluate the waste situation, identify hot
> spots and optimize accordingly, for better utilization of memory.
>
> The waste info is integrated into existing interface:
> '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of
> 'kmalloc-4k' after boot is:
>
> 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1
> __kmem_cache_alloc_node+0x11f/0x4e0
> __kmalloc_node+0x4e/0x140
> ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe]
> ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe]
> ixgbe_probe+0x165f/0x1d20 [ixgbe]
> local_pci_probe+0x78/0xc0
> work_for_cpu_fn+0x26/0x40
> ...
>
> which means in 'kmalloc-4k' slab, there are 126 requests of
> 2240 bytes which got a 4KB space (wasting 1856 bytes each
> and 233856 bytes in total), from ixgbe_alloc_q_vector().
>
> And when the system starts some real workload like multiple docker
> instances, there could be more severe waste.
>
> [1]. https://lkml.org/lkml/2019/8/12/266
> [2]. https://lore.kernel.org/lkml/[email protected]/
>
> [Thanks Hyeonggon for pointing out several bugs about sorting/format]
> [Thanks Vlastimil for suggesting way to reduce memory usage of
> orig_size and keep it only for kmalloc objects]
>
> Signed-off-by: Feng Tang <[email protected]>
> Reviewed-by: Hyeonggon Yoo <[email protected]>
> Cc: Robin Murphy <[email protected]>
> Cc: John Garry <[email protected]>
> Cc: Kefeng Wang <[email protected]>

Thanks.
Given that the merge window is nearing, and the rest of the series a) has
some changes suggested and b) could hopefully be done in a simpler way with
the proposed ksize() cleanup, I am picking just this patch now into slab.git
(and thus -next), with some small modifications:

...

> +
> +static unsigned int get_orig_size(struct kmem_cache *s, void *object)

Made this inline for consistency.

> +{
> + void *p = kasan_reset_tag(object);
> +
> + if (!slub_debug_orig_size(s))
> + return s->object_size;
> +
> + p += get_info_end(s);
> + p += sizeof(struct track) * 2;
> +
> + return *(unsigned int *)p;
> +}
> +
> static void slab_bug(struct kmem_cache *s, char *fmt, ...)
> {
> struct va_format vaf;
> @@ -844,6 +890,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
> if (s->flags & SLAB_STORE_USER)
> off += 2 * sizeof(struct track);
>
> + if (slub_debug_orig_size(s))
> + off += sizeof(unsigned int);
> +
> off += kasan_metadata_size(s);
>
> if (off != size_from_object(s))
> @@ -977,7 +1026,8 @@ static int check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> *
> * A. Free pointer (if we cannot overwrite object on free)
> * B. Tracking data for SLAB_STORE_USER
> - * C. Padding to reach required alignment boundary or at minimum
> + * C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
> + * D. Padding to reach required alignment boundary or at minimum
> * one word if debugging is on to be able to detect writes
> * before the word boundary.
> *
> @@ -995,10 +1045,14 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
> {
> unsigned long off = get_info_end(s); /* The end of info */
>
> - if (s->flags & SLAB_STORE_USER)
> + if (s->flags & SLAB_STORE_USER) {
> /* We also have user information there */
> off += 2 * sizeof(struct track);
>
> + if (s->flags & SLAB_KMALLOC)
> + off += sizeof(unsigned int);
> + }
> +
> off += kasan_metadata_size(s);
>
> if (size_from_object(s) == off)
> @@ -1293,7 +1347,7 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
> }
>
> static noinline int alloc_debug_processing(struct kmem_cache *s,
> - struct slab *slab, void *object)
> + struct slab *slab, void *object, int orig_size)
> {
> if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> if (!alloc_consistency_checks(s, slab, object))
> @@ -1302,6 +1356,7 @@ static noinline int alloc_debug_processing(struct kmem_cache *s,
>
> /* Success. Perform special debug activities for allocs */
> trace(s, slab, object, 1);
> + set_orig_size(s, object, orig_size);
> init_object(s, object, SLUB_RED_ACTIVE);
> return 1;
>
> @@ -1570,7 +1625,10 @@ static inline
> void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
>
> static inline int alloc_debug_processing(struct kmem_cache *s,
> - struct slab *slab, void *object) { return 0; }
> + struct slab *slab, void *object, int orig_size) { return 0; }
> +
> +static inline void set_orig_size(struct kmem_cache *s,
> + void *object, unsigned int orig_size) {}

There's no caller (in this patch alone) for the !SLUB_DEBUG version, so removed.

> static inline void free_debug_processing(
> struct kmem_cache *s, struct slab *slab,
> @@ -1999,7 +2057,7 @@ static inline void remove_partial(struct kmem_cache_node *n,
> * it to full list if it was the last free object.
> */
> static void *alloc_single_from_partial(struct kmem_cache *s,
> - struct kmem_cache_node *n, struct slab *slab)
> + struct kmem_cache_node *n, struct slab *slab, int orig_size)
> {
> void *object;
>
> @@ -2009,7 +2067,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
> slab->freelist = get_freepointer(s, object);
> slab->inuse++;
>
> - if (!alloc_debug_processing(s, slab, object)) {
> + if (!alloc_debug_processing(s, slab, object, orig_size)) {
> remove_partial(n, slab);
> return NULL;
> }
> @@ -2028,7 +2086,7 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
> * and put the slab to the partial (or full) list.
> */
> static void *alloc_single_from_new_slab(struct kmem_cache *s,
> - struct slab *slab)
> + struct slab *slab, int orig_size)
> {
> int nid = slab_nid(slab);
> struct kmem_cache_node *n = get_node(s, nid);
> @@ -2040,7 +2098,7 @@ static void *alloc_single_from_new_slab(struct kmem_cache *s,
> slab->freelist = get_freepointer(s, object);
> slab->inuse = 1;
>
> - if (!alloc_debug_processing(s, slab, object))
> + if (!alloc_debug_processing(s, slab, object, orig_size))
> /*
> * It's not really expected that this would fail on a
> * freshly allocated slab, but a concurrent memory
> @@ -2118,7 +2176,7 @@ static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags);
> * Try to allocate a partial slab from a specific node.
> */
> static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
> - struct slab **ret_slab, gfp_t gfpflags)
> + struct partial_context *pc)
> {
> struct slab *slab, *slab2;
> void *object = NULL;
> @@ -2138,11 +2196,12 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
> list_for_each_entry_safe(slab, slab2, &n->partial, slab_list) {
> void *t;
>
> - if (!pfmemalloc_match(slab, gfpflags))
> + if (!pfmemalloc_match(slab, pc->flags))
> continue;
>
> if (kmem_cache_debug(s)) {
> - object = alloc_single_from_partial(s, n, slab);
> + object = alloc_single_from_partial(s, n, slab,
> + pc->orig_size);
> if (object)
> break;
> continue;
> @@ -2153,7 +2212,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
> break;
>
> if (!object) {
> - *ret_slab = slab;
> + *pc->slab = slab;
> stat(s, ALLOC_FROM_PARTIAL);
> object = t;
> } else {
> @@ -2177,14 +2236,13 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
> /*
> * Get a slab from somewhere. Search in increasing NUMA distances.
> */
> -static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
> - struct slab **ret_slab)
> +static void *get_any_partial(struct kmem_cache *s, struct partial_context *pc)
> {
> #ifdef CONFIG_NUMA
> struct zonelist *zonelist;
> struct zoneref *z;
> struct zone *zone;
> - enum zone_type highest_zoneidx = gfp_zone(flags);
> + enum zone_type highest_zoneidx = gfp_zone(pc->flags);
> void *object;
> unsigned int cpuset_mems_cookie;
>
> @@ -2212,15 +2270,15 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
>
> do {
> cpuset_mems_cookie = read_mems_allowed_begin();
> - zonelist = node_zonelist(mempolicy_slab_node(), flags);
> + zonelist = node_zonelist(mempolicy_slab_node(), pc->flags);
> for_each_zone_zonelist(zone, z, zonelist, highest_zoneidx) {
> struct kmem_cache_node *n;
>
> n = get_node(s, zone_to_nid(zone));
>
> - if (n && cpuset_zone_allowed(zone, flags) &&
> + if (n && cpuset_zone_allowed(zone, pc->flags) &&
> n->nr_partial > s->min_partial) {
> - object = get_partial_node(s, n, ret_slab, flags);
> + object = get_partial_node(s, n, pc);
> if (object) {
> /*
> * Don't check read_mems_allowed_retry()
> @@ -2241,8 +2299,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
> /*
> * Get a partial slab, lock it and return it.
> */
> -static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> - struct slab **ret_slab)
> +static void *get_partial(struct kmem_cache *s, int node, struct partial_context *pc)
> {
> void *object;
> int searchnode = node;
> @@ -2250,11 +2307,11 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> if (node == NUMA_NO_NODE)
> searchnode = numa_mem_id();
>
> - object = get_partial_node(s, get_node(s, searchnode), ret_slab, flags);
> + object = get_partial_node(s, get_node(s, searchnode), pc);
> if (object || node != NUMA_NO_NODE)
> return object;
>
> - return get_any_partial(s, flags, ret_slab);
> + return get_any_partial(s, pc);
> }
>
> #ifdef CONFIG_PREEMPTION
> @@ -2974,11 +3031,12 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
> * already disabled (which is the case for bulk allocation).
> */
> static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> - unsigned long addr, struct kmem_cache_cpu *c)
> + unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
> {
> void *freelist;
> struct slab *slab;
> unsigned long flags;
> + struct partial_context pc;
>
> stat(s, ALLOC_SLOWPATH);
>
> @@ -3092,7 +3150,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>
> new_objects:
>
> - freelist = get_partial(s, gfpflags, node, &slab);
> + pc.flags = gfpflags;
> + pc.slab = &slab;
> + pc.orig_size = orig_size;
> + freelist = get_partial(s, node, &pc);
> if (freelist)
> goto check_new_slab;
>
> @@ -3108,7 +3169,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> stat(s, ALLOC_SLAB);
>
> if (kmem_cache_debug(s)) {
> - freelist = alloc_single_from_new_slab(s, slab);
> + freelist = alloc_single_from_new_slab(s, slab, orig_size);
>
> if (unlikely(!freelist))
> goto new_objects;
> @@ -3140,6 +3201,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> */
> if (s->flags & SLAB_STORE_USER)
> set_track(s, freelist, TRACK_ALLOC, addr);
> +
> return freelist;
> }
>
> @@ -3182,7 +3244,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> * pointer.
> */
> static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> - unsigned long addr, struct kmem_cache_cpu *c)
> + unsigned long addr, struct kmem_cache_cpu *c, unsigned int orig_size)
> {
> void *p;
>
> @@ -3195,7 +3257,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> c = slub_get_cpu_ptr(s->cpu_slab);
> #endif
>
> - p = ___slab_alloc(s, gfpflags, node, addr, c);
> + p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
> #ifdef CONFIG_PREEMPT_COUNT
> slub_put_cpu_ptr(s->cpu_slab);
> #endif
> @@ -3280,7 +3342,7 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
>
> if (!USE_LOCKLESS_FAST_PATH() ||
> unlikely(!object || !slab || !node_match(slab, node))) {
> - object = __slab_alloc(s, gfpflags, node, addr, c);
> + object = __slab_alloc(s, gfpflags, node, addr, c, orig_size);
> } else {
> void *next_object = get_freepointer_safe(s, object);
>
> @@ -3747,7 +3809,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> * of re-populating per CPU c->freelist
> */
> p[i] = ___slab_alloc(s, flags, NUMA_NO_NODE,
> - _RET_IP_, c);
> + _RET_IP_, c, s->object_size);
> if (unlikely(!p[i]))
> goto error;
>
> @@ -4150,12 +4212,17 @@ static int calculate_sizes(struct kmem_cache *s)
> }
>
> #ifdef CONFIG_SLUB_DEBUG
> - if (flags & SLAB_STORE_USER)
> + if (flags & SLAB_STORE_USER) {
> /*
> * Need to store information about allocs and frees after
> * the object.
> */
> size += 2 * sizeof(struct track);
> +
> + /* Save the original kmalloc request size */
> + if (flags & SLAB_KMALLOC)
> + size += sizeof(unsigned int);
> + }
> #endif
>
> kasan_cache_create(s, &size, &s->flags);
> @@ -4770,7 +4837,7 @@ void __init kmem_cache_init(void)
>
> /* Now we can use the kmem_cache to allocate kmalloc slabs */
> setup_kmalloc_cache_index_table();
> - create_kmalloc_caches(0);
> + create_kmalloc_caches(SLAB_KMALLOC);

Instead of this, add the flag in the common creation function, so SLAB kmalloc caches are also marked even if there's no use for it there now.

--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -649,7 +649,8 @@ struct kmem_cache *__init create_kmalloc_cache(const char *name,
if (!s)
panic("Out of memory when creating slab %s\n", name);

- create_boot_cache(s, name, size, flags, useroffset, usersize);
+ create_boot_cache(s, name, size, flags | SLAB_KMALLOC, useroffset,
+ usersize);
kasan_cache_create_kmalloc(s);
list_add(&s->list, &slab_caches);
s->refcount = 1;


> /* Setup random freelists for each cache */
> init_freelist_randomization();
> @@ -4937,6 +5004,7 @@ struct location {
> depot_stack_handle_t handle;
> unsigned long count;
> unsigned long addr;
> + unsigned long waste;
> long long sum_time;
> long min_time;
> long max_time;
> @@ -4983,13 +5051,15 @@ static int alloc_loc_track(struct loc_track *t, unsigned long max, gfp_t flags)
> }
>
> static int add_location(struct loc_track *t, struct kmem_cache *s,
> - const struct track *track)
> + const struct track *track,
> + unsigned int orig_size)
> {
> long start, end, pos;
> struct location *l;
> - unsigned long caddr, chandle;
> + unsigned long caddr, chandle, cwaste;
> unsigned long age = jiffies - track->when;
> depot_stack_handle_t handle = 0;
> + unsigned int waste = s->object_size - orig_size;
>
> #ifdef CONFIG_STACKDEPOT
> handle = READ_ONCE(track->handle);
> @@ -5007,11 +5077,13 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
> if (pos == end)
> break;
>
> - caddr = t->loc[pos].addr;
> - chandle = t->loc[pos].handle;
> - if ((track->addr == caddr) && (handle == chandle)) {
> + l = &t->loc[pos];
> + caddr = l->addr;
> + chandle = l->handle;
> + cwaste = l->waste;
> + if ((track->addr == caddr) && (handle == chandle) &&
> + (waste == cwaste)) {
>
> - l = &t->loc[pos];
> l->count++;
> if (track->when) {
> l->sum_time += age;
> @@ -5036,6 +5108,9 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
> end = pos;
> else if (track->addr == caddr && handle < chandle)
> end = pos;
> + else if (track->addr == caddr && handle == chandle &&
> + waste < cwaste)
> + end = pos;
> else
> start = pos;
> }
> @@ -5059,6 +5134,7 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
> l->min_pid = track->pid;
> l->max_pid = track->pid;
> l->handle = handle;
> + l->waste = waste;
> cpumask_clear(to_cpumask(l->cpus));
> cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
> nodes_clear(l->nodes);
> @@ -5077,7 +5153,7 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
>
> for_each_object(p, s, addr, slab->objects)
> if (!test_bit(__obj_to_index(s, addr, p), obj_map))
> - add_location(t, s, get_track(s, p, alloc));
> + add_location(t, s, get_track(s, p, alloc), get_orig_size(s, p));

I think it makes little sense to report waste in the 'free_traces' file?
So adjusted like this to make sure nothing is reported there:

@@ -5356,13 +5353,16 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
unsigned long *obj_map)
{
void *addr = slab_address(slab);
+ bool is_alloc = (alloc == TRACK_ALLOC);
void *p;

__fill_map(obj_map, s, slab);

for_each_object(p, s, addr, slab->objects)
if (!test_bit(__obj_to_index(s, addr, p), obj_map))
- add_location(t, s, get_track(s, p, alloc), get_orig_size(s, p));
+ add_location(t, s, get_track(s, p, alloc),
+ is_alloc ? get_orig_size(s, p) :
+ s->object_size);


> }
> #endif /* CONFIG_DEBUG_FS */
> #endif /* CONFIG_SLUB_DEBUG */
> @@ -5942,6 +6018,10 @@ static int slab_debugfs_show(struct seq_file *seq, void *v)
> else
> seq_puts(seq, "<not-available>");
>
> + if (l->waste)
> + seq_printf(seq, " waste=%lu/%lu",
> + l->count * l->waste, l->waste);
> +
> if (l->sum_time != l->min_time) {
> seq_printf(seq, " age=%ld/%llu/%ld",
> l->min_time, div_u64(l->sum_time, l->count),

2022-09-24 07:25:13

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Fri, Sep 23, 2022 at 07:43:22PM +0800, Vlastimil Babka wrote:
> On 9/13/22 08:54, Feng Tang wrote:
[...]
> > which means in 'kmalloc-4k' slab, there are 126 requests of
> > 2240 bytes which got a 4KB space (wasting 1856 bytes each
> > and 233856 bytes in total), from ixgbe_alloc_q_vector().
> >
> > And when system starts some real workload like multiple docker
> > instances, there could be more severe waste.
> >
> > [1]. https://lkml.org/lkml/2019/8/12/266
> > [2]. https://lore.kernel.org/lkml/[email protected]/
> >
> > [Thanks Hyeonggon for pointing out several bugs about sorting/format]
> > [Thanks Vlastimil for suggesting way to reduce memory usage of
> > orig_size and keep it only for kmalloc objects]
> >
> > Signed-off-by: Feng Tang <[email protected]>
> > Reviewed-by: Hyeonggon Yoo <[email protected]>
> > Cc: Robin Murphy <[email protected]>
> > Cc: John Garry <[email protected]>
> > Cc: Kefeng Wang <[email protected]>
>
> Thanks.
> Given that the merge window is nearing, and the rest of the series a) has
> some changes suggested and b) could be hopefully done in a simpler way with
> the proposed ksize() cleanup, I am picking just this patch now to slab.git
> (and thus -next), with some small modifications:

OK, and all the cleanups/improvements from you look good to me. Many thanks!

For kasan and ksize() related patches, I'll keep monitoring and working on
them.

- Feng

> ...
>
> > +
> > +static unsigned int get_orig_size(struct kmem_cache *s, void *object)
>
> Made this inline for consistency.
>
> > +{
> > + void *p = kasan_reset_tag(object);
> > +
> > + if (!slub_debug_orig_size(s))
> > + return s->object_size;
> > +
> > + p += get_info_end(s);
> > + p += sizeof(struct track) * 2;
> > +
> > + return *(unsigned int *)p;
> > +}
> > +
> > static void slab_bug(struct kmem_cache *s, char *fmt, ...)
> > {
> > struct va_format vaf;
> > @@ -844,6 +890,9 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
> > if (s->flags & SLAB_STORE_USER)
> > off += 2 * sizeof(struct track);
> >
> > + if (slub_debug_orig_size(s))
> > + off += sizeof(unsigned int);
> > +
> > off += kasan_metadata_size(s);
> >
> > if (off != size_from_object(s))
> > @@ -977,7 +1026,8 @@ static int check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
> > *
> > * A. Free pointer (if we cannot overwrite object on free)
> > * B. Tracking data for SLAB_STORE_USER
> > - * C. Padding to reach required alignment boundary or at minimum
> > + * C. Original request size for kmalloc object (SLAB_STORE_USER enabled)
> > + * D. Padding to reach required alignment boundary or at minimum
> > * one word if debugging is on to be able to detect writes
> > * before the word boundary.
> > *
> > @@ -995,10 +1045,14 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
> > {
> > unsigned long off = get_info_end(s); /* The end of info */
> >
> > - if (s->flags & SLAB_STORE_USER)
> > + if (s->flags & SLAB_STORE_USER) {
> > /* We also have user information there */
> > off += 2 * sizeof(struct track);
> >
> > + if (s->flags & SLAB_KMALLOC)
> > + off += sizeof(unsigned int);
> > + }
> > +
> > off += kasan_metadata_size(s);
> >
> > if (size_from_object(s) == off)
> > @@ -1293,7 +1347,7 @@ static inline int alloc_consistency_checks(struct kmem_cache *s,
> > }
> >
> > static noinline int alloc_debug_processing(struct kmem_cache *s,
> > - struct slab *slab, void *object)
> > + struct slab *slab, void *object, int orig_size)
> > {
> > if (s->flags & SLAB_CONSISTENCY_CHECKS) {
> > if (!alloc_consistency_checks(s, slab, object))
[...]

2022-09-24 18:21:02

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache

On Wed, Sep 21, 2022 at 2:03 PM Feng Tang <[email protected]> wrote:
>
> Agree, it's better not to touch the internal fields in slub code.
>
> How about the following patch? It merges the 2 functions, with one flag
> indicating whether the size is in the meta data or the object. (I'm fine
> with 2 separate functions)

The overall approach sounds good. See some comments below.

> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index b092277bf48d..0ad05a34e708 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -150,11 +150,12 @@ static __always_inline void kasan_cache_create_kmalloc(struct kmem_cache *cache)
> __kasan_cache_create_kmalloc(cache);
> }
>
> -size_t __kasan_metadata_size(struct kmem_cache *cache);
> -static __always_inline size_t kasan_metadata_size(struct kmem_cache *cache)
> +size_t __kasan_meta_size(struct kmem_cache *cache, bool in_slab_object);
> +static __always_inline size_t kasan_meta_size(struct kmem_cache *cache,
> + bool in_slab_object)

I would keep the name as kasan_metadata_size as it's more clear to
external users but rename in_slab_object to in_object to make the
declaration shorter.
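
That is, roughly this shape for the declaration (sketch):

	size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object);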

> {
> if (kasan_enabled())
> - return __kasan_metadata_size(cache);
> + return __kasan_meta_size(cache, in_slab_object);
> return 0;
> }
>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 69f583855c8b..2a8710461ebb 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -218,14 +218,21 @@ void __kasan_cache_create_kmalloc(struct kmem_cache *cache)
> cache->kasan_info.is_kmalloc = true;
> }
>
> -size_t __kasan_metadata_size(struct kmem_cache *cache)
> +size_t __kasan_meta_size(struct kmem_cache *cache, bool in_slab_object)
> {
> if (!kasan_stack_collection_enabled())
> return 0;
> - return (cache->kasan_info.alloc_meta_offset ?
> - sizeof(struct kasan_alloc_meta) : 0) +
> - (cache->kasan_info.free_meta_offset ?
> - sizeof(struct kasan_free_meta) : 0);
> +
> + if (in_slab_object)
> + return (cache->kasan_info.alloc_meta_offset == 0 ?
> + sizeof(struct kasan_alloc_meta) : 0) +
> + (cache->kasan_info.free_meta_offset ?
> + sizeof(struct kasan_free_meta) : 0);
> + else
> + return (cache->kasan_info.alloc_meta_offset == 0 ?
> + sizeof(struct kasan_alloc_meta) : 0) +
> + (cache->kasan_info.free_meta_offset ?
> + sizeof(struct kasan_free_meta) : 0);

Something weird here: both if and else cases are the same.

The change also needs to be rebased onto [1].

Thanks!

[1] https://lore.kernel.org/linux-mm/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com/

2022-09-25 11:37:21

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache

On Sun, Sep 25, 2022 at 02:05:04AM +0800, Andrey Konovalov wrote:
> On Wed, Sep 21, 2022 at 2:03 PM Feng Tang <[email protected]> wrote:
> >
> > Agree, it's better not to touch the internal fields in slub code.
> >
> > How about the following patch? It merges the 2 functions, with one flag
> > indicating whether the size is in the meta data or the object. (I'm fine
> > with 2 separate functions)
>
> The overall approach sounds good. See some comments below.
>
> > diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> > index b092277bf48d..0ad05a34e708 100644
> > --- a/include/linux/kasan.h
> > +++ b/include/linux/kasan.h
> > @@ -150,11 +150,12 @@ static __always_inline void kasan_cache_create_kmalloc(struct kmem_cache *cache)
> > __kasan_cache_create_kmalloc(cache);
> > }
> >
> > -size_t __kasan_metadata_size(struct kmem_cache *cache);
> > -static __always_inline size_t kasan_metadata_size(struct kmem_cache *cache)
> > +size_t __kasan_meta_size(struct kmem_cache *cache, bool in_slab_object);
> > +static __always_inline size_t kasan_meta_size(struct kmem_cache *cache,
> > + bool in_slab_object)
>
> I would keep the name as kasan_metadata_size as it's more clear to
> external users but rename in_slab_object to in_object to make the
> declaration shorter.

Makes sense to me, will do.

[...]

> > + if (in_slab_object)
> > + return (cache->kasan_info.alloc_meta_offset == 0 ?
> > + sizeof(struct kasan_alloc_meta) : 0) +
> > + (cache->kasan_info.free_meta_offset ?
> > + sizeof(struct kasan_free_meta) : 0);
> > + else
> > + return (cache->kasan_info.alloc_meta_offset == 0 ?
> > + sizeof(struct kasan_alloc_meta) : 0) +
> > + (cache->kasan_info.free_meta_offset ?
> > + sizeof(struct kasan_free_meta) : 0);
>
> Something weird here: both if and else cases are the same.

Yes, will fix it.

> The change also needs to be rebased onto [1].
>
> Thanks!
>
> [1] https://lore.kernel.org/linux-mm/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com/

I noticed this has been merged to -mm tree's 'mm-everything' branch,
so the following is the patch against that. Thanks!

One thing I'm not very sure of: when checking the 'in-object' kasan meta
size, I didn't check 'alloc_meta_offset', as from reading the code the
alloc_meta is never put inside the slab object data area.


Thanks,
Feng

---8<---

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index d811b3d7d2a1..96c9d56e5510 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -302,7 +302,7 @@ static inline void kasan_unpoison_task_stack(struct task_struct *task) {}

#ifdef CONFIG_KASAN_GENERIC

-size_t kasan_metadata_size(struct kmem_cache *cache);
+size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object);
slab_flags_t kasan_never_merge(void);
void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
slab_flags_t *flags);
@@ -315,7 +315,8 @@ void kasan_record_aux_stack_noalloc(void *ptr);
#else /* CONFIG_KASAN_GENERIC */

/* Tag-based KASAN modes do not use per-object metadata. */
-static inline size_t kasan_metadata_size(struct kmem_cache *cache)
+static inline size_t kasan_metadata_size(struct kmem_cache *cache,
+ bool in_object)
{
return 0;
}
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index d8b5590f9484..5a806f9b9466 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -450,15 +450,22 @@ void kasan_init_object_meta(struct kmem_cache *cache, const void *object)
__memset(alloc_meta, 0, sizeof(*alloc_meta));
}

-size_t kasan_metadata_size(struct kmem_cache *cache)
+size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object)
{
+ struct kasan_cache *info = &cache->kasan_info;
+
if (!kasan_requires_meta())
return 0;
- return (cache->kasan_info.alloc_meta_offset ?
- sizeof(struct kasan_alloc_meta) : 0) +
- ((cache->kasan_info.free_meta_offset &&
- cache->kasan_info.free_meta_offset != KASAN_NO_FREE_META) ?
- sizeof(struct kasan_free_meta) : 0);
+
+ if (in_object)
+ return (info->free_meta_offset ?
+ 0 : sizeof(struct kasan_free_meta));
+ else
+ return (info->alloc_meta_offset ?
+ sizeof(struct kasan_alloc_meta) : 0) +
+ ((info->free_meta_offset &&
+ info->free_meta_offset != KASAN_NO_FREE_META) ?
+ sizeof(struct kasan_free_meta) : 0);
}

static void __kasan_record_aux_stack(void *addr, bool can_alloc)
diff --git a/mm/slub.c b/mm/slub.c
index ce8310e131b3..a75c21a0da8b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -887,7 +887,7 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
if (s->flags & SLAB_STORE_USER)
off += 2 * sizeof(struct track);

- off += kasan_metadata_size(s);
+ off += kasan_metadata_size(s, false);

if (off != size_from_object(s))
/* Beginning of the filler is the free pointer */
@@ -1042,7 +1042,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
/* We also have user information there */
off += 2 * sizeof(struct track);

- off += kasan_metadata_size(s);
+ off += kasan_metadata_size(s, false);

if (size_from_object(s) == off)
return 1;
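
With this in place, the slub side can tell when kasan's in-object free
meta (stored at offset 0 of the object data area when free_meta_offset
is 0) would overlap the kmalloc redzone. A rough sketch of the intended
use on the slub side (illustrative, not part of this patch):

	/* skip redzoning the unused tail if kasan's free meta overlaps it */
	if (kasan_metadata_size(s, true) > orig_size)
		orig_size = s->object_size;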

2022-09-25 16:41:48

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache

On Sun, Sep 25, 2022 at 1:27 PM Feng Tang <[email protected]> wrote:
>
> > [1] https://lore.kernel.org/linux-mm/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com/
>
> I noticed this has been merged to -mm tree's 'mm-everything' branch,
> so the following is the patch against that. Thanks!
>
> One thing I'm not very sure of: when checking the 'in-object' kasan meta
> size, I didn't check 'alloc_meta_offset', as from reading the code the
> alloc_meta is never put inside the slab object data area.

Yes, this is correct.

> Thanks,
> Feng
>
[...]

Reviewed-by: Andrey Konovalov <[email protected]>

Thanks!

2022-09-26 19:20:42

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc

On Tue, Sep 13, 2022 at 8:54 AM Feng Tang <[email protected]> wrote:
>

Hi Feng,

> kzalloc/kmalloc will round up the request size to a fixed size
> (mostly power of 2), so the allocated memory could be more than
> requested. Currently kzalloc family APIs will zero all the
> allocated memory.
>
> To detect out-of-bound usage of the extra allocated memory, only
> zero the requested part, so that sanity check could be added to
> the extra space later.

I still don't like the idea of only zeroing the requested memory and
not the whole object. Considering potential info-leak vulnerabilities.

Can we only do this when SLAB_DEBUG is enabled?

> Performance wise, smaller zeroing length also brings shorter
> execution time, as shown from test data on various server/desktop
> platforms.
>
> For kzalloc users who will call ksize() later and utilize this
> extra space, please be aware that the space is not zeroed any
> more.

CC Kees

>
> Signed-off-by: Feng Tang <[email protected]>
> ---
> mm/slab.c | 7 ++++---
> mm/slab.h | 5 +++--
> mm/slub.c | 10 +++++++---
> 3 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index a5486ff8362a..4594de0e3d6b 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3253,7 +3253,8 @@ slab_alloc_node(struct kmem_cache *cachep, struct list_lru *lru, gfp_t flags,
> init = slab_want_init_on_alloc(flags, cachep);
>
> out:
> - slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init);
> + slab_post_alloc_hook(cachep, objcg, flags, 1, &objp, init,
> + cachep->object_size);
> return objp;
> }
>
> @@ -3506,13 +3507,13 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> * Done outside of the IRQ disabled section.
> */
> slab_post_alloc_hook(s, objcg, flags, size, p,
> - slab_want_init_on_alloc(flags, s));
> + slab_want_init_on_alloc(flags, s), s->object_size);
> /* FIXME: Trace call missing. Christoph would like a bulk variant */
> return size;
> error:
> local_irq_enable();
> cache_alloc_debugcheck_after_bulk(s, flags, i, p, _RET_IP_);
> - slab_post_alloc_hook(s, objcg, flags, i, p, false);
> + slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
> kmem_cache_free_bulk(s, i, p);
> return 0;
> }
> diff --git a/mm/slab.h b/mm/slab.h
> index d0ef9dd44b71..3cf5adf63f48 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -730,7 +730,8 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
>
> static inline void slab_post_alloc_hook(struct kmem_cache *s,
> struct obj_cgroup *objcg, gfp_t flags,
> - size_t size, void **p, bool init)
> + size_t size, void **p, bool init,
> + unsigned int orig_size)
> {
> size_t i;
>
> @@ -746,7 +747,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
> for (i = 0; i < size; i++) {
> p[i] = kasan_slab_alloc(s, p[i], flags, init);
> if (p[i] && init && !kasan_has_integrated_init())
> - memset(p[i], 0, s->object_size);
> + memset(p[i], 0, orig_size);

Note that when KASAN is enabled and has integrated init, it will
initialize the whole object, which leads to an inconsistency with this
change.

> kmemleak_alloc_recursive(p[i], s->object_size, 1,
> s->flags, flags);
> }
> diff --git a/mm/slub.c b/mm/slub.c
> index c8ba16b3a4db..6f823e99d8b4 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3376,7 +3376,11 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_l
> init = slab_want_init_on_alloc(gfpflags, s);
>
> out:
> - slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init);
> + /*
> + * When init equals 'true', like for kzalloc() family, only
> + * @orig_size bytes will be zeroed instead of s->object_size
> + */
> + slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size);
>
> return object;
> }
> @@ -3833,11 +3837,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> * Done outside of the IRQ disabled fastpath loop.
> */
> slab_post_alloc_hook(s, objcg, flags, size, p,
> - slab_want_init_on_alloc(flags, s));
> + slab_want_init_on_alloc(flags, s), s->object_size);
> return i;
> error:
> slub_put_cpu_ptr(s->cpu_slab);
> - slab_post_alloc_hook(s, objcg, flags, i, p, false);
> + slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size);
> kmem_cache_free_bulk(s, i, p);
> return 0;
> }
> --
> 2.34.1
>

2022-09-26 20:33:14

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc

On Mon, Sep 26, 2022 at 09:11:24PM +0200, Andrey Konovalov wrote:
> On Tue, Sep 13, 2022 at 8:54 AM Feng Tang <[email protected]> wrote:
> >
>
> Hi Feng,
>
> > kzalloc/kmalloc will round up the request size to a fixed size
> > (mostly power of 2), so the allocated memory could be more than
> > requested. Currently kzalloc family APIs will zero all the
> > allocated memory.
> >
> > To detect out-of-bound usage of the extra allocated memory, only
> > zero the requested part, so that sanity check could be added to
> > the extra space later.
>
> I still don't like the idea of only zeroing the requested memory and
> not the whole object. Considering potential info-leak vulnerabilities.

I really really do not like reducing the zeroing size. We're trying to
be proactive against _flaws_, which means that when there's a memory
over-read (or uninitialized use), suddenly the scope of the exposure (or
control) is wider/looser.

Imagine the (unfortunately very common) case of use-after-free attacks,
which leverage type confusion: some object is located in kmalloc-128
because it's 126 bytes. That slot gets freed and reallocated to, say, a
97 byte object going through kzalloc() or zero-on-init. With this patch
the bytes above the 97 don't get zeroed, and the stale data from the
prior 126 byte object stays there happily to be used again later through
a dangling pointer, or whatever. Without the proposed patch, the entire
128 bytes is wiped, which makes stale data re-use more difficult.
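
To make that concrete, here's a tiny userspace simulation of the
kmalloc-128 scenario (illustrative only, obviously not kernel code):

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		unsigned char slot[128];   /* stands in for one kmalloc-128 slot */

		memset(slot, 'S', 126);    /* old 126-byte object, now stale data */
		/* slot is "freed", then reused by a 97-byte kzalloc()-style request */
		memset(slot, 0, 97);       /* zero only the requested 97 bytes */

		/* bytes 97..127 still hold the previous object's contents */
		printf("slot[100] = '%c'\n", slot[100]);   /* prints 'S' */
		return 0;
	}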

> > Performance wise, smaller zeroing length also brings shorter
> > execution time, as shown from test data on various server/desktop
> > platforms.

For these cases, I think a much better solution is to provide those
sensitive allocations their own dedicated kmem_cache.
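
For example, a frequently-allocated 97-byte object could get its own
cache instead of sharing kmalloc-128 (sketch; 'foo_ctx' and the cache
name are made up for illustration):

	#include <linux/slab.h>

	struct foo_ctx { u8 data[97]; };   /* hypothetical sensitive object */

	static struct kmem_cache *foo_cachep;

	static int foo_cache_init(void)
	{
		/* objects now come from a dedicated slab, not kmalloc-128 */
		foo_cachep = kmem_cache_create("foo_ctx", sizeof(struct foo_ctx),
					       0, SLAB_HWCACHE_ALIGN, NULL);
		return foo_cachep ? 0 : -ENOMEM;
	}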

> >
> > For kzalloc users who will call ksize() later and utilize this
> > extra space, please be aware that the space is not zeroed any
> > more.
>
> CC Kees

Thanks! Well, the good news is that the ksize() side-effects are hopefully
going to vanish soon, but my objections about stale memory remain.

-Kees

--
Kees Cook

2022-09-27 01:37:44

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc

On Tue, Sep 27, 2022 at 04:15:02AM +0800, Kees Cook wrote:
> On Mon, Sep 26, 2022 at 09:11:24PM +0200, Andrey Konovalov wrote:
> > On Tue, Sep 13, 2022 at 8:54 AM Feng Tang <[email protected]> wrote:
> > >
> >
> > Hi Feng,
> >
> > > kzalloc/kmalloc will round up the request size to a fixed size
> > > (mostly power of 2), so the allocated memory could be more than
> > > requested. Currently kzalloc family APIs will zero all the
> > > allocated memory.
> > >
> > > To detect out-of-bound usage of the extra allocated memory, only
> > > zero the requested part, so that sanity check could be added to
> > > the extra space later.
> >
> > I still don't like the idea of only zeroing the requested memory and
> > not the whole object. Considering potential info-leak vulnerabilities.
>
> I really really do not like reducing the zeroing size. We're trying to
> be proactive against _flaws_, which means that when there's a memory
> over-read (or uninitialized use), suddenly the scope of the exposure (or
> control) is wider/looser.
>
> Imagine the (unfortunately very common) case of use-after-free attacks,
> which leverage type confusion: some object is located in kmalloc-128
> because it's 126 bytes. That slot gets freed and reallocated to, say, a
> 97 byte object going through kzalloc() or zero-on-init. With this patch
> the bytes above the 97 don't get zeroed, and the stale data from the
> prior 126 byte object stays there happily to be used again later through
> a dangling pointer, or whatever. Without the proposed patch, the entire
> 128 bytes is wiped, which makes stale data re-use more difficult.

Thanks for the detailed explanation; it's a valid concern.

And Andrey's suggestion is a good solution: only reduce the zeroing
size for kmalloc-redzone enabled objects, as the extra space will be
redzoned, and no info will be leaked.

Thanks,
Feng

2022-09-27 02:51:36

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc

On Tue, Sep 27, 2022 at 03:11:24AM +0800, Andrey Konovalov wrote:
> On Tue, Sep 13, 2022 at 8:54 AM Feng Tang <[email protected]> wrote:
> >
>
> Hi Feng,
>
> > kzalloc/kmalloc will round up the request size to a fixed size
> > (mostly power of 2), so the allocated memory could be more than
> > requested. Currently kzalloc family APIs will zero all the
> > allocated memory.
> >
> > To detect out-of-bound usage of the extra allocated memory, only
> > zero the requested part, so that sanity check could be added to
> > the extra space later.
>
> I still don't like the idea of only zeroing the requested memory and
> not the whole object. Considering potential info-leak vulnerabilities.
>
> Can we only do this when SLAB_DEBUG is enabled?

Good point! Will add a slub_debug_orig_size(s) check.
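
Something along these lines in slab_post_alloc_hook() (a sketch of the
direction, assuming the slub_debug_orig_size() helper from patch 1; the
zeroing is only narrowed when the unused tail will be redzoned):

	if (p[i] && init && !kasan_has_integrated_init()) {
		unsigned int zero_size = s->object_size;

		/*
		 * The [orig_size, object_size) tail is covered by the
		 * kmalloc redzone check, so no info can leak from it.
		 */
		if (slub_debug_orig_size(s))
			zero_size = orig_size;

		memset(p[i], 0, zero_size);
	}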

> > Performance wise, smaller zeroing length also brings shorter
> > execution time, as shown from test data on various server/desktop
> > platforms.
> >
> > For kzalloc users who will call ksize() later and utilize this
> > extra space, please be aware that the space is not zeroed any
> > more.
>
> CC Kees

Thanks for adding Kees, who provided review from a security point
of view.

> >
> > Signed-off-by: Feng Tang <[email protected]>
> > ---
> > mm/slab.c | 7 ++++---
> > mm/slab.h | 5 +++--
> > mm/slub.c | 10 +++++++---
[...]
> > @@ -730,7 +730,8 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
> >
> > static inline void slab_post_alloc_hook(struct kmem_cache *s,
> > struct obj_cgroup *objcg, gfp_t flags,
> > - size_t size, void **p, bool init)
> > + size_t size, void **p, bool init,
> > + unsigned int orig_size)
> > {
> > size_t i;
> >
> > @@ -746,7 +747,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
> > for (i = 0; i < size; i++) {
> > p[i] = kasan_slab_alloc(s, p[i], flags, init);
> > if (p[i] && init && !kasan_has_integrated_init())
> > - memset(p[i], 0, s->object_size);
> > + memset(p[i], 0, orig_size);
>
> Note that when KASAN is enabled and has integrated init, it will
> initialize the whole object, which leads to an inconsistency with this
> change.

Do you mean for kzalloc() only, or is there some newly added kasan check?

I'm not familiar with the kasan code; during development, I usually
enabled the KASAN and KFENCE configs and did catch some bugs, while the
0Day bot also reported some. And with the latest v6 patchset, I haven't
seen any kasan/kfence failures.

And for generic slub objects, when slub_debug is enabled, the object
data area could already be modified, as in init_object():

if (s->flags & __OBJECT_POISON) {
memset(p, POISON_FREE, s->object_size - 1);
p[s->object_size - 1] = POISON_END;
}

The slub redzone check actually splits it into 2 regions, [0, orig_size-1]
and [orig_size, object_size-1], and adds different sanity checks to
them.
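
A rough sketch of what the extended check amounts to (illustrative, not
the exact patch 4 code; SLUB_RED_ACTIVE is the regular slub redzone
pattern):

	static int check_kmalloc_redzone(struct kmem_cache *s, void *object)
	{
		unsigned int orig_size = get_orig_size(s, object);
		u8 *p = object;
		unsigned int i;

		/*
		 * [0, orig_size) belongs to the caller, so no pattern is
		 * expected there; [orig_size, object_size) must still hold
		 * the redzone byte.
		 */
		for (i = orig_size; i < s->object_size; i++)
			if (p[i] != SLUB_RED_ACTIVE)
				return 0;	/* out-of-bounds write detected */

		return 1;
	}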

Anyway, I'll go check the latest linux-next tree.

Thanks,
Feng

2022-09-27 03:13:53

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 3/4] mm: kasan: Add free_meta size info in struct kasan_cache

On Mon, Sep 26, 2022 at 12:31:37AM +0800, Andrey Konovalov wrote:
> On Sun, Sep 25, 2022 at 1:27 PM Feng Tang <[email protected]> wrote:
> >
> > > [1] https://lore.kernel.org/linux-mm/c7b316d30d90e5947eb8280f4dc78856a49298cf.1662411799.git.andreyknvl@google.com/
> >
> > I noticed this has been merged to -mm tree's 'mm-everything' branch,
> > so the following is the patch against that. Thanks!
> >
> > One thing I'm not very sure of: when checking the 'in-object' kasan meta
> > size, I didn't check 'alloc_meta_offset', as from reading the code the
> > alloc_meta is never put inside the slab object data area.
>
> Yes, this is correct.
>
> > @@ -1042,7 +1042,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
> > /* We also have user information there */
> > off += 2 * sizeof(struct track);
> >
> > - off += kasan_metadata_size(s);
> > + off += kasan_metadata_size(s, false);
> >
> > if (size_from_object(s) == off)
> > return 1;
>
> Reviewed-by: Andrey Konovalov <[email protected]>

Thank you!

I made a formal patch, which is based on your latest kasan patchset
in -mm tree

Thanks,
Feng

---
From ba4cfd81b86c3339523b467451baa5e87ca1c9f8 Mon Sep 17 00:00:00 2001
From: Feng Tang <[email protected]>
Date: Sun, 25 Sep 2022 15:37:31 +0800
Subject: [PATCH] mm: kasan: Extend kasan_metadata_size() to also cover
in-object size

When kasan is enabled for slab/slub, it may save kasan's free_meta
data in the first part of the slab object data area on the object's
free path, which works fine.

There is an ongoing effort to extend slub's debug function to also
redzone the latter part of the kmalloc object area, and when both
debug features are enabled, there is a possible conflict, especially
when the kmalloc object is small, as caught by the 0Day bot [1].

To solve it, the slub code needs to know the size of kasan's in-object
meta data. There is an existing kasan_metadata_size() which returns
kasan's metadata size inside slub's metadata area, so extend it to
also cover the in-object meta size by adding a boolean flag
'in_object'.

There is no functional change to existing code logic.

[1]. https://lore.kernel.org/lkml/YuYm3dWwpZwH58Hu@xsang-OptiPlex-9020/
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Feng Tang <[email protected]>
Suggested-by: Andrey Konovalov <[email protected]>
Reviewed-by: Andrey Konovalov <[email protected]>
---
include/linux/kasan.h | 5 +++--
mm/kasan/generic.c | 19 +++++++++++++------
mm/slub.c | 4 ++--
3 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index d811b3d7d2a1..96c9d56e5510 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -302,7 +302,7 @@ static inline void kasan_unpoison_task_stack(struct task_struct *task) {}

#ifdef CONFIG_KASAN_GENERIC

-size_t kasan_metadata_size(struct kmem_cache *cache);
+size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object);
slab_flags_t kasan_never_merge(void);
void kasan_cache_create(struct kmem_cache *cache, unsigned int *size,
slab_flags_t *flags);
@@ -315,7 +315,8 @@ void kasan_record_aux_stack_noalloc(void *ptr);
#else /* CONFIG_KASAN_GENERIC */

/* Tag-based KASAN modes do not use per-object metadata. */
-static inline size_t kasan_metadata_size(struct kmem_cache *cache)
+static inline size_t kasan_metadata_size(struct kmem_cache *cache,
+ bool in_object)
{
return 0;
}
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index d8b5590f9484..b076f597a378 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -450,15 +450,22 @@ void kasan_init_object_meta(struct kmem_cache *cache, const void *object)
__memset(alloc_meta, 0, sizeof(*alloc_meta));
}

-size_t kasan_metadata_size(struct kmem_cache *cache)
+size_t kasan_metadata_size(struct kmem_cache *cache, bool in_object)
{
+ struct kasan_cache *info = &cache->kasan_info;
+
if (!kasan_requires_meta())
return 0;
- return (cache->kasan_info.alloc_meta_offset ?
- sizeof(struct kasan_alloc_meta) : 0) +
- ((cache->kasan_info.free_meta_offset &&
- cache->kasan_info.free_meta_offset != KASAN_NO_FREE_META) ?
- sizeof(struct kasan_free_meta) : 0);
+
+ if (in_object)
+ return (info->free_meta_offset ?
+ 0 : sizeof(struct kasan_free_meta));
+ else
+ return (info->alloc_meta_offset ?
+ sizeof(struct kasan_alloc_meta) : 0) +
+ ((info->free_meta_offset &&
+ info->free_meta_offset != KASAN_NO_FREE_META) ?
+ sizeof(struct kasan_free_meta) : 0);
}

static void __kasan_record_aux_stack(void *addr, bool can_alloc)
diff --git a/mm/slub.c b/mm/slub.c
index ce8310e131b3..a75c21a0da8b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -887,7 +887,7 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
if (s->flags & SLAB_STORE_USER)
off += 2 * sizeof(struct track);

- off += kasan_metadata_size(s);
+ off += kasan_metadata_size(s, false);

if (off != size_from_object(s))
/* Beginning of the filler is the free pointer */
@@ -1042,7 +1042,7 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
/* We also have user information there */
off += 2 * sizeof(struct track);

- off += kasan_metadata_size(s);
+ off += kasan_metadata_size(s, false);

if (size_from_object(s) == off)
return 1;
--
2.34.1

2022-10-13 14:14:41

by Andrey Konovalov

[permalink] [raw]
Subject: Re: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc

On Tue, Sep 27, 2022 at 4:42 AM Feng Tang <[email protected]> wrote:
>
> > > @@ -746,7 +747,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
> > > for (i = 0; i < size; i++) {
> > > p[i] = kasan_slab_alloc(s, p[i], flags, init);
> > > if (p[i] && init && !kasan_has_integrated_init())
> > > - memset(p[i], 0, s->object_size);
> > > + memset(p[i], 0, orig_size);
> >
> > Note that when KASAN is enabled and has integrated init, it will
> > initialize the whole object, which leads to an inconsistency with this
> > change.
>
> > Do you mean for kzalloc() only, or is there some newly added kasan check?

Hi Feng,

I mean that when init is true and kasan_has_integrated_init() is true
(with HW_TAGS mode), kasan_slab_alloc() initializes the whole object.
Which is inconsistent with the memset() of only orig_size when
!kasan_has_integrated_init(). But I think this is fine assuming SLAB
poisoning happens later. But please add a comment.

Thanks!

2022-10-14 06:16:54

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 2/4] mm/slub: only zero the requested size of buffer for kzalloc

On Thu, Oct 13, 2022 at 10:00:57PM +0800, Andrey Konovalov wrote:
> On Tue, Sep 27, 2022 at 4:42 AM Feng Tang <[email protected]> wrote:
> >
> > > > @@ -746,7 +747,7 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
> > > > for (i = 0; i < size; i++) {
> > > > p[i] = kasan_slab_alloc(s, p[i], flags, init);
> > > > if (p[i] && init && !kasan_has_integrated_init())
> > > > - memset(p[i], 0, s->object_size);
> > > > + memset(p[i], 0, orig_size);
> > >
> > > Note that when KASAN is enabled and has integrated init, it will
> > > initialize the whole object, which leads to an inconsistency with this
> > > change.
> >
> > > Do you mean for kzalloc() only, or is there some newly added kasan check?
>
> Hi Feng,
>
> I mean that when init is true and kasan_has_integrated_init() is true
> (with HW_TAGS mode), kasan_slab_alloc() initializes the whole object.
> Which is inconsistent with the memset() of only orig_size when
> !kasan_has_integrated_init(). But I think this is fine assuming SLAB
> poisoning happens later. But please add a comment.

I see now. Will add a comment. Thanks!

- Feng

> Thanks!

2022-10-30 19:44:35

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
> kmalloc's API family is critical for mm, with one nature that it will
> round up the request size to a fixed one (mostly power of 2). Say
> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
> could be allocated, so in worst case, there is around 50% memory
> space waste.


I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
a very-nearly-clean kernel v6.1-rc2 boot issue to this commit.
I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.


On UART, No indication of the (once extracted) kernel booting:

transfer started ......................................... transfer ok, time=2.01s
setting up elf image... OK
jumping to kernel code
zimage at: 80BA4100 810D4720
Uncompressing Linux at load address 80001000
Copy device tree to address 80B96EE0
Now, booting the kernel...

Nothing follows

6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
transfer started ......................................... transfer ok, time=2.01s
setting up elf image... OK
jumping to kernel code
zimage at: 80BA4100 810D47A4
Uncompressing Linux at load address 80001000
Copy device tree to address 80B96EE0
Now, booting the kernel...

[ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] VPE topology {2,2} total 4
[ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] HighMem empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
[ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] Writing ErrCtl register=00019146
[ 0.000000] Readback ErrCtl register=00019146
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] rcu: Hierarchical RCU implementation.


boot continues as expected


possibly relevant config options:
grep -E '(SLUB|SLAB)' .config
# SLAB allocator options
# CONFIG_SLAB is not set
CONFIG_SLUB=y
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
# CONFIG_SLUB_STATS is not set
CONFIG_SLUB_CPU_PARTIAL=y
# end of SLAB allocator options
# CONFIG_SLUB_DEBUG is not set


With this commit reverted: cpuinfo and meminfo

system type : MediaTek MT7621 ver:1 eco:3
machine : MikroTik RouterBOARD 760iGS
processor : 0
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VPE : 0
VCED exceptions : not available
VCEI exceptions : not available

processor : 1
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VPE : 1
VCED exceptions : not available
VCEI exceptions : not available

processor : 2
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 1
VPE : 0
VCED exceptions : not available
VCEI exceptions : not available

processor : 3
cpu model : MIPS 1004Kc V2.15
BogoMIPS : 586.13
wait instruction : yes
microsecond timers : yes
tlb_entries : 32
extra interrupt vector : yes
hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
isa : mips1 mips2 mips32r1 mips32r2
ASEs implemented : mips16 dsp mt
Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
shadow register sets : 1
kscratch registers : 0
package : 0
core : 1
VPE : 1
VCED exceptions : not available
VCEI exceptions : not available

MemTotal: 249744 kB
MemFree: 211088 kB
MemAvailable: 187364 kB
Buffers: 0 kB
Cached: 8824 kB
SwapCached: 0 kB
Active: 1104 kB
Inactive: 8860 kB
Active(anon): 1104 kB
Inactive(anon): 8860 kB
Active(file): 0 kB
Inactive(file): 0 kB
Unevictable: 0 kB
Mlocked: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 249744 kB
LowFree: 211088 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 1192 kB
Mapped: 2092 kB
Shmem: 8824 kB
KReclaimable: 1704 kB
Slab: 9372 kB
SReclaimable: 1704 kB
SUnreclaim: 7668 kB
KernelStack: 592 kB
PageTables: 264 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 124872 kB
Committed_AS: 14676 kB
VmallocTotal: 1040376 kB
VmallocUsed: 2652 kB
VmallocChunk: 0 kB
Percpu: 272 kB


Cheers,

--
John Thomson

2022-10-30 21:53:28

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On 10/30/22 20:23, John Thomson wrote:
> On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
>> kmalloc's API family is critical for mm, with one nature that it will
>> round up the request size to a fixed one (mostly power of 2). Say
>> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
>> could be allocated, so in worst case, there is around 50% memory
>> space waste.
>
>
> I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
> a very-nearly-clean kernel v6.1-rc2 boot issue to this commit.
> I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
> and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
> No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
>
>
> On UART, no indication of the (once extracted) kernel booting:
>
> transfer started ......................................... transfer ok, time=2.01s
> setting up elf image... OK
> jumping to kernel code
> zimage at: 80BA4100 810D4720
> Uncompressing Linux at load address 80001000
> Copy device tree to address 80B96EE0
> Now, booting the kernel...

It's weird that the commit would cause no output so early; SLUB code is
run only later.
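
For readers following along, a rough sketch of the boot ordering
behind that remark (simplified from init/main.c):

	/*
	 * start_kernel()
	 *   setup_arch()          <- arch/platform setup runs here
	 *   ...
	 *   mm_init()
	 *     kmem_cache_init()   <- SLUB only usable from here on
	 */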

> Nothing follows
>
> 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
> transfer started ......................................... transfer ok, time=2.01s
> setting up elf image... OK
> jumping to kernel code
> zimage at: 80BA4100 810D47A4
> Uncompressing Linux at load address 80001000
> Copy device tree to address 80B96EE0
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> [ 0.000000] printk: bootconsole [early0] enabled
> [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
> [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
> [ 0.000000] Initrd not found or empty - disabling initrd
> [ 0.000000] VPE topology {2,2} total 4
> [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
> [ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
> [ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
> [ 0.000000] Zone ranges:
> [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
> [ 0.000000] HighMem empty
> [ 0.000000] Movable zone start for each node
> [ 0.000000] Early memory node ranges
> [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
> [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
> [ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
> [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
> [ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
> [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
> [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> [ 0.000000] Writing ErrCtl register=00019146
> [ 0.000000] Readback ErrCtl register=00019146
> [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
> [ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> [ 0.000000] rcu: Hierarchical RCU implementation.
>
>
> boot continues as expected
>
>
> possibly relevant config options:
> grep -E '(SLUB|SLAB)' .config
> # SLAB allocator options
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y
> CONFIG_SLAB_MERGE_DEFAULT=y
> # CONFIG_SLAB_FREELIST_RANDOM is not set
> # CONFIG_SLAB_FREELIST_HARDENED is not set
> # CONFIG_SLUB_STATS is not set
> CONFIG_SLUB_CPU_PARTIAL=y
> # end of SLAB allocator options
> # CONFIG_SLUB_DEBUG is not set

Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
patch/commit touches is not even active.
Could this be some miscompile or code layout change exposing some
different bug, hmm.
Is it any different if you do enable CONFIG_SLUB_DEBUG ?
Or change to CONFIG_SLAB? (that would be really weird if not)

>
> With this commit reverted: cpuinfo and meminfo
>
> system type : MediaTek MT7621 ver:1 eco:3
> machine : MikroTik RouterBOARD 760iGS
> processor : 0
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 0
> VPE : 0
> VCED exceptions : not available
> VCEI exceptions : not available
>
> processor : 1
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 0
> VPE : 1
> VCED exceptions : not available
> VCEI exceptions : not available
>
> processor : 2
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 1
> VPE : 0
> VCED exceptions : not available
> VCEI exceptions : not available
>
> processor : 3
> cpu model : MIPS 1004Kc V2.15
> BogoMIPS : 586.13
> wait instruction : yes
> microsecond timers : yes
> tlb_entries : 32
> extra interrupt vector : yes
> hardware watchpoint : yes, count: 4, address/irw mask: [0x0ffc, 0x0ffc, 0x0ffb, 0x0ffb]
> isa : mips1 mips2 mips32r1 mips32r2
> ASEs implemented : mips16 dsp mt
> Options implemented : tlb 4kex 4k_cache prefetch mcheck ejtag llsc pindexed_dcache userlocal vint perf_cntr_intr_bit cdmm perf
> shadow register sets : 1
> kscratch registers : 0
> package : 0
> core : 1
> VPE : 1
> VCED exceptions : not available
> VCEI exceptions : not available
>
> MemTotal: 249744 kB
> MemFree: 211088 kB
> MemAvailable: 187364 kB
> Buffers: 0 kB
> Cached: 8824 kB
> SwapCached: 0 kB
> Active: 1104 kB
> Inactive: 8860 kB
> Active(anon): 1104 kB
> Inactive(anon): 8860 kB
> Active(file): 0 kB
> Inactive(file): 0 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> HighTotal: 0 kB
> HighFree: 0 kB
> LowTotal: 249744 kB
> LowFree: 211088 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 0 kB
> Writeback: 0 kB
> AnonPages: 1192 kB
> Mapped: 2092 kB
> Shmem: 8824 kB
> KReclaimable: 1704 kB
> Slab: 9372 kB
> SReclaimable: 1704 kB
> SUnreclaim: 7668 kB
> KernelStack: 592 kB
> PageTables: 264 kB
> SecPageTables: 0 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 124872 kB
> Committed_AS: 14676 kB
> VmallocTotal: 1040376 kB
> VmallocUsed: 2652 kB
> VmallocChunk: 0 kB
> Percpu: 272 kB
>
>
> Cheers,
>


2022-10-31 03:53:26

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

Hi John,

Thanks for the bisecting and reporting!

On Mon, Oct 31, 2022 at 05:30:24AM +0800, Vlastimil Babka wrote:
> On 10/30/22 20:23, John Thomson wrote:
> > On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
> >> kmalloc's API family is critical for mm, with one nature that it will
> >> round up the request size to a fixed one (mostly power of 2). Say
> >> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
> >> could be allocated, so in worst case, there is around 50% memory
> >> space waste.
> >
> >
> > I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
> > a very-nearly-clean kernel v6.1-rc2 boot issue to this commit.
> > I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
> > and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
> > No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
> >
> >
> > On UART, no indication of the (once extracted) kernel booting:
> >
> > transfer started ......................................... transfer ok, time=2.01s
> > setting up elf image... OK
> > jumping to kernel code
> > zimage at: 80BA4100 810D4720
> > Uncompressing Linux at load address 80001000
> > Copy device tree to address 80B96EE0
> > Now, booting the kernel...
>
> It's weird that the commit would cause no output so early; SLUB code is
> run only later.

I noticed your cmdline has a console setting; could you enable
earlyprintk in the cmdline, e.g. "earlyprintk=ttyS0,115200", to see
if more messages are printed out.

Also, I want to confirm this is a boot failure and not just a missing
boot message.

> > Nothing follows
> >
> > 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
> > transfer started ......................................... transfer ok, time=2.01s
> > setting up elf image... OK
> > jumping to kernel code
> > zimage at: 80BA4100 810D47A4
> > Uncompressing Linux at load address 80001000
> > Copy device tree to address 80B96EE0
> > Now, booting the kernel...
> >
> > [ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > [ 0.000000] printk: bootconsole [early0] enabled
> > [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
> > [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
> > [ 0.000000] Initrd not found or empty - disabling initrd
> > [ 0.000000] VPE topology {2,2} total 4
> > [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
> > [ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
> > [ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
> > [ 0.000000] Zone ranges:
> > [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
> > [ 0.000000] HighMem empty
> > [ 0.000000] Movable zone start for each node
> > [ 0.000000] Early memory node ranges
> > [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
> > [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
> > [ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
> > [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
> > [ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
> > [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
> > [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> > [ 0.000000] Writing ErrCtl register=00019146
> > [ 0.000000] Readback ErrCtl register=00019146
> > [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > [ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
> > [ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
> > [ 0.000000] rcu: Hierarchical RCU implementation.
> >
> >
> > boot continues as expected
> >
> >
> > possibly relevant config options:
> > grep -E '(SLUB|SLAB)' .config
> > # SLAB allocator options
> > # CONFIG_SLAB is not set
> > CONFIG_SLUB=y
> > CONFIG_SLAB_MERGE_DEFAULT=y
> > # CONFIG_SLAB_FREELIST_RANDOM is not set
> > # CONFIG_SLAB_FREELIST_HARDENED is not set
> > # CONFIG_SLUB_STATS is not set
> > CONFIG_SLUB_CPU_PARTIAL=y
> > # end of SLAB allocator options
> > # CONFIG_SLUB_DEBUG is not set
>
> Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
> patch/commit touches is not even active.
> Could this be some miscompile or code layout change exposing some
> different bug, hmm.
> Is it any different if you do enable CONFIG_SLUB_DEBUG ?
> Or change to CONFIG_SLAB? (that would be really weird if not)

I haven't found any clue from the code either, and I compiled a
kernel with the config above and tested booting on an Alder Lake
desktop and in QEMU; both boot fine.

Could you provide the full kernel config and dmesg (in compressed
format if you think it's too big), so we can check more?

Thanks,
Feng

2022-10-31 10:51:16

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Mon, 31 Oct 2022, at 02:36, Feng Tang wrote:
> Hi John,
>
> Thanks for the bisecting and reporting!
>
> On Mon, Oct 31, 2022 at 05:30:24AM +0800, Vlastimil Babka wrote:
>> On 10/30/22 20:23, John Thomson wrote:
>> > On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
>> >> kmalloc's API family is critical for mm, with one nature that it will
>> >> round up the request size to a fixed one (mostly power of 2). Say
>> >> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
>> >> could be allocated, so in worst case, there is around 50% memory
>> >> space waste.
>> >
>> >
>> > I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
>> > a very-nearly-clean kernel v6.1-rc2 boot issue to this commit.
>> > I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
>> > and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
>> > No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
>> >
>> >
>> > On UART, no indication of the (once extracted) kernel booting:
>> >
>> > transfer started ......................................... transfer ok, time=2.01s
>> > setting up elf image... OK
>> > jumping to kernel code
>> > zimage at: 80BA4100 810D4720
>> > Uncompressing Linux at load address 80001000
>> > Copy device tree to address 80B96EE0
>> > Now, booting the kernel...
>>
>> It's weird that the commit would cause no output so early; SLUB code is
>> run only later.
>
> I noticed your cmdline has a console setting; could you enable
> earlyprintk in the cmdline, e.g. "earlyprintk=ttyS0,115200", to see
> if more messages are printed out.

Still nothing from vmlinux with earlyprintk on UART unless the commit is reverted.

>
> Also, I want to confirm this is a boot failure and not just a missing
> boot message.

Yes, boot failure.
Network comes up automatically on a successful boot; that is not happening when there is no kernel UART output.

>
>> > Nothing follows
>> >
>> > 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") reverted, normal boot:
>> > transfer started ......................................... transfer ok, time=2.01s
>> > setting up elf image... OK
>> > jumping to kernel code
>> > zimage at: 80BA4100 810D47A4
>> > Uncompressing Linux at load address 80001000
>> > Copy device tree to address 80B96EE0
>> > Now, booting the kernel...
>> >
>> > [ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gcc (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37) #0 SMP Fri Oct 28 03:48:10 2022
>> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>> > [ 0.000000] printk: bootconsole [early0] enabled
>> > [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
>> > [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
>> > [ 0.000000] Initrd not found or empty - disabling initrd
>> > [ 0.000000] VPE topology {2,2} total 4
>> > [ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
>> > [ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
>> > [ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
>> > [ 0.000000] Zone ranges:
>> > [ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
>> > [ 0.000000] HighMem empty
>> > [ 0.000000] Movable zone start for each node
>> > [ 0.000000] Early memory node ranges
>> > [ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
>> > [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
>> > [ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
>> > [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
>> > [ 0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
>> > [ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
>> > [ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
>> > [ 0.000000] Writing ErrCtl register=00019146
>> > [ 0.000000] Readback ErrCtl register=00019146
>> > [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
>> > [ 0.000000] Memory: 246220K/262144K available (7455K kernel code, 628K rwdata, 1308K rodata, 3524K init, 245K bss, 15924K reserved, 0K cma-reserved, 0K highmem)
>> > [ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
>> > [ 0.000000] rcu: Hierarchical RCU implementation.
>> >
>> >
>> > boot continues as expected
>> >
>> >
>> > possibly relevant config options:
>> > grep -E '(SLUB|SLAB)' .config
>> > # SLAB allocator options
>> > # CONFIG_SLAB is not set
>> > CONFIG_SLUB=y
>> > CONFIG_SLAB_MERGE_DEFAULT=y
>> > # CONFIG_SLAB_FREELIST_RANDOM is not set
>> > # CONFIG_SLAB_FREELIST_HARDENED is not set
>> > # CONFIG_SLUB_STATS is not set
>> > CONFIG_SLUB_CPU_PARTIAL=y
>> > # end of SLAB allocator options
>> > # CONFIG_SLUB_DEBUG is not set
>>
>> Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
>> patch/commit touches is not even active.
>> Could this be some miscompile or code layout change exposing some
>> different bug, hmm.

Yes, it could be.

>> Is it any different if you do enable CONFIG_SLUB_DEBUG ?

No change

>> Or change to CONFIG_SLAB? (that would be really weird if not)

This boots fine

> I haven't found any clue from the code either, and I compiled a
> kernel with the config above and tested booting on an Alder Lake
> desktop and in QEMU; both boot fine.
>
> Could you provide the full kernel config and dmesg (in compressed
> format if you think it's too big), so we can check more?

Attached

> Thanks,
> Feng

vmlinux is bigger, and entry point is larger (0x8074081c vs 0x807407dc revert vs 0x8073fcbc),
so that may be it? Or not?
revert + SLUB_DEBUG + SLUB_DEBUG_ON is bigger still, but does successfully boot.
vmlinux entry point is 0x8074705c


transfer started ......................................... transfer ok, time=2.01s
setting up elf image... OK
jumping to kernel code
zimage at: 80BA4100 810D6FA0
Uncompressing Linux at load address 80001000
Copy device tree to address 80B9EEE0
Now, booting the kernel...
[ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gc
c (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37)
#0 SMP Fri Oct 28 03:48:10 2022


I will keep looking.

Thank you,
--
John Thomson


Attachments:
mt7621-mm-waste-debug.tar.zstd (29.06 kB)

2022-10-31 12:31:47

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Mon, Oct 31, 2022 at 10:05:58AM +0000, John Thomson wrote:
> On Mon, 31 Oct 2022, at 02:36, Feng Tang wrote:
> >> >
> >> > possibly relevant config options:
> >> > grep -E '(SLUB|SLAB)' .config
> >> > # SLAB allocator options
> >> > # CONFIG_SLAB is not set
> >> > CONFIG_SLUB=y
> >> > CONFIG_SLAB_MERGE_DEFAULT=y
> >> > # CONFIG_SLAB_FREELIST_RANDOM is not set
> >> > # CONFIG_SLAB_FREELIST_HARDENED is not set
> >> > # CONFIG_SLUB_STATS is not set
> >> > CONFIG_SLUB_CPU_PARTIAL=y
> >> > # end of SLAB allocator options
> >> > # CONFIG_SLUB_DEBUG is not set
> >>
> >> Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
> >> patch/commit touches is not even active.
> >> Could this be some miscompile or code layout change exposing some
> >> different bug, hmm.
>
> Yes, it could be.
>
> >> Is it any different if you do enable CONFIG_SLUB_DEBUG ?
>
> No change
>
> >> Or change to CONFIG_SLAB? (that would be really weird if not)
>
> This boots fine
>
> > I haven't found any clue from the code either, and I compiled a
> > kernel with the config above and tested booting on an Alder Lake
> > desktop and in QEMU; both boot fine.
> >
> > Could you provide the full kernel config and dmesg (in compressed
> > format if you think it's too big), so we can check more?
>
> Attached
>
> > Thanks,
> > Feng
>
> vmlinux is bigger, and entry point is larger (0x8074081c vs 0x807407dc revert vs 0x8073fcbc),
> so that may be it? Or not?
> revert + SLUB_DEBUG + SLUB_DEBUG_ON is bigger still, but does successfully boot.
> vmlinux entry point is 0x8074705c

Thanks for prompt info!

As I can't reproduce it locally yet, could you help try 3 tests separately:
* change the O2/O3 compile option to O1
* try the attached 0001 patch (which cuts part of the commit)
* try the attached 0001+0002 patches

Thanks!


- Feng




Attachments:
(No filename) (1.85 kB)
0001-reduced-slub-patch.patch (10.95 kB)
0002-reorder-the-partial_context-initialization.patch (1.02 kB)

2022-10-31 12:58:12

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Mon, Oct 31, 2022 at 10:05:58AM +0000, John Thomson wrote:
> On Mon, 31 Oct 2022, at 02:36, Feng Tang wrote:
> > Hi John,
> >
> > Thanks for the bisecting and reporting!
> >
> > On Mon, Oct 31, 2022 at 05:30:24AM +0800, Vlastimil Babka wrote:
> >> On 10/30/22 20:23, John Thomson wrote:
> >> > On Tue, 13 Sep 2022, at 06:54, Feng Tang wrote:
> >> >> kmalloc's API family is critical for mm, with one nature that it will
> >> >> round up the request size to a fixed one (mostly power of 2). Say
> >> >> when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes
> >> >> could be allocated, so in worst case, there is around 50% memory
> >> >> space waste.
> >> >
> >> >
> >> > I have a ralink mt7621 router running Openwrt, using the mips ZBOOT kernel, and appear to have bisected
> >> > a very-nearly-clean kernel v6.1-rc2 boot issue to this commit.
> >> > I have 3 commits atop 6.1-rc2: fix a ZBOOT compile error, use the Openwrt LZMA options,
> >> > and enable DEBUG_ZBOOT for my platform. I am compiling my kernel within the Openwrt build system.
> >> > No guarantees this is not due to something I am doing wrong, but any insight would be greatly appreciated.
> >> >
> >> >
> >> > On UART, no indication of the (once extracted) kernel booting:
> >> >
> >> > transfer started ......................................... transfer ok, time=2.01s
> >> > setting up elf image... OK
> >> > jumping to kernel code
> >> > zimage at: 80BA4100 810D4720
> >> > Uncompressing Linux at load address 80001000
> >> > Copy device tree to address 80B96EE0
> >> > Now, booting the kernel...
> >>
> >> It's weird that the commit would cause no output so early; SLUB code is
> >> run only later.
> >
> > I noticed your cmdline has a console setting; could you enable
> > earlyprintk in the cmdline, e.g. "earlyprintk=ttyS0,115200", to see
> > if more messages are printed out.
>
> Still nothing from vmlinux with earlyprintk on UART unless the commit is reverted.
>
> >
> > Also, I want to confirm this is a boot failure and not just a missing
> > boot message.
>
> Yes, boot failure.
> Network comes up automatically on a successful boot; that is not happening when there is no kernel UART output.

It is really weird that I see no boot issue on my MIPS emulation with almost the same
config, with a different target - the Malta board that QEMU supports. It just boots fine.

Can you attach a debugger to the board?
(Which I haven't tried; I have only tried it with QEMU.)
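
For anyone reproducing under emulation instead, a sketch of the QEMU
gdbstub route (standard -s/-S usage; the Malta machine matches the
emulation mentioned above, and the gdb binary name is an assumption
about the local toolchain):

	# Boot the kernel frozen, with a gdb server on tcp::1234
	qemu-system-mipsel -M malta -m 256 -nographic \
	        -kernel vmlinux -append "console=ttyS0" -s -S

	# In another terminal: attach, then break in the SLUB slow path
	mipsel-linux-gnu-gdb vmlinux \
	        -ex 'target remote :1234' \
	        -ex 'break ___slab_alloc' \
	        -ex 'continue'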

[...]

> >> >
> >> >
> >> > possibly relevant config options:
> >> > grep -E '(SLUB|SLAB)' .config
> >> > # SLAB allocator options
> >> > # CONFIG_SLAB is not set
> >> > CONFIG_SLUB=y
> >> > CONFIG_SLAB_MERGE_DEFAULT=y
> >> > # CONFIG_SLAB_FREELIST_RANDOM is not set
> >> > # CONFIG_SLAB_FREELIST_HARDENED is not set
> >> > # CONFIG_SLUB_STATS is not set
> >> > CONFIG_SLUB_CPU_PARTIAL=y
> >> > # end of SLAB allocator options
> >> > # CONFIG_SLUB_DEBUG is not set
> >>
> >> Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
> >> patch/commit touches is not even active.
> >> Could this be some miscompile or code layout change exposing some
> >> different bug, hmm.
>
> Yes, it could be.

What happens with clang?

>
> >> Is it any different if you do enable CONFIG_SLUB_DEBUG ?
>
> No change
>
> >> Or change to CONFIG_SLAB? (that would be really weird if not)
>
> This boots fine
>
> > I haven't found any clue from the code either, and I compiled a
> > kernel with the config above and tested booting on an Alder Lake
> > desktop and in QEMU; both boot fine.
> >
> > Could you provide the full kernel config and dmesg (in compressed
> > format if you think it's too big), so we can check more?
>
> Attached
>
> > Thanks,
> > Feng
>
> vmlinux is bigger, and entry point is larger (0x8074081c vs 0x807407dc revert vs 0x8073fcbc),
> so that may be it? Or not?
> revert + SLUB_DEBUG + SLUB_DEBUG_ON is bigger still, but does successfully boot.
> vmlinux entry point is 0x8074705c
>
>
> transfer started ......................................... transfer ok, time=2.01s
> setting up elf image... OK
> jumping to kernel code
> zimage at: 80BA4100 810D6FA0
> Uncompressing Linux at load address 80001000
> Copy device tree to address 80B9EEE0
> Now, booting the kernel...
> [ 0.000000] Linux version 6.1.0-rc2 (john@john) (mipsel-openwrt-linux-musl-gc
> c (OpenWrt GCC 11.3.0 r19724+16-1521d5f453) 11.3.0, GNU ld (GNU Binutils) 2.37)
> #0 SMP Fri Oct 28 03:48:10 2022
>
>
> I will keep looking.
>
> Thank you,
> --
> John Thomson



--
Thanks,
Hyeonggon

2022-11-01 00:38:06

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Mon, 31 Oct 2022, at 11:42, Feng Tang wrote:
> On Mon, Oct 31, 2022 at 10:05:58AM +0000, John Thomson wrote:
>> On Mon, 31 Oct 2022, at 02:36, Feng Tang wrote:
>> >> >
>> >> > possibly relevant config options:
>> >> > grep -E '(SLUB|SLAB)' .config
>> >> > # SLAB allocator options
>> >> > # CONFIG_SLAB is not set
>> >> > CONFIG_SLUB=y
>> >> > CONFIG_SLAB_MERGE_DEFAULT=y
>> >> > # CONFIG_SLAB_FREELIST_RANDOM is not set
>> >> > # CONFIG_SLAB_FREELIST_HARDENED is not set
>> >> > # CONFIG_SLUB_STATS is not set
>> >> > CONFIG_SLUB_CPU_PARTIAL=y
>> >> > # end of SLAB allocator options
>> >> > # CONFIG_SLUB_DEBUG is not set
>> >>
>> >> Also not having CONFIG_SLUB_DEBUG enabled means most of the code the
>> >> patch/commit touches is not even active.
>> >> Could this be some miscompile or code layout change exposing some
>> >> different bug, hmm.
>>
>> Yes, it could be.
>>
>> >> Is it any different if you do enable CONFIG_SLUB_DEBUG ?
>>
>> No change
>>
>> >> Or change to CONFIG_SLAB? (that would be really weird if not)
>>
>> This boots fine
>>
>> > I haven't found any clue from the code either, and I compiled a
>> > kernel with the config above and tested booting on an Alder Lake
>> > desktop and in QEMU; both boot fine.
>> >
>> > Could you provide the full kernel config and dmesg (in compressed
>> > format if you think it's too big), so we can check more?
>>
>> Attached
>>
>> > Thanks,
>> > Feng
>>
>> vmlinux is bigger, and entry point is larger (0x8074081c vs 0x807407dc revert vs 0x8073fcbc),
>> so that may be it? Or not?
>> revert + SLUB_DEBUG + SLUB_DEBUG_ON is bigger still, but does successfully boot.
>> vmlinux entry point is 0x8074705c
>
> Thanks for prompt info!
>
> As I can't reproduce it locally yet, could you help try 3 tests separately:
> * change the O2/O3 compile option to O1
> * try the attached 0001 patch (which cuts part of the commit)
> * try the attached 0001+0002 patches

None of these changed my outcome.

I also tried compiling the same linux tree & config with the Bootlin toolchain
(mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0)
with the same results.
I will look into finding or building a mips clang toolchain.

No JTAG capability to debug, sorry.

I get the same outcome with either the ZBOOT vmlinuz or vmlinux.

Same happening with 6.1-rc3


After some blind poking around, changing how much of the commit affected mm/slub.c,
I may have got lucky. It appears as though this is all I need to boot
(against 6.1-rc3, and with the Bootlin toolchain). Will test my other build system as well.

--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3276,7 +3276,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
c = slub_get_cpu_ptr(s->cpu_slab);
#endif

- p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
+ p = ___slab_alloc(s, gfpflags, node, addr, c, 0);
#ifdef CONFIG_PREEMPT_COUNT
slub_put_cpu_ptr(s->cpu_slab);
#endif


Would like to hear your thoughts, but I will keep digging.

>
> Thanks!
>
>
> - Feng
>
>
>
>
> Attachments:
> * 0001-reduced-slub-patch.patch
> * 0002-reorder-the-partial_context-initialization.patch

--
John Thomson

2022-11-01 03:11:08

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc


On Tue, 1 Nov 2022, at 00:18, John Thomson wrote:
> I may have got lucky. It appears as though this is all I need to boot
> (against 6.1-rc3, and with the Bootlin toolchain). Will test my other
> build system as well.
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3276,7 +3276,7 @@ static void *__slab_alloc(struct kmem_cache *s,
> gfp_t gfpflags, int node,
> c = slub_get_cpu_ptr(s->cpu_slab);
> #endif
>
> - p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
> + p = ___slab_alloc(s, gfpflags, node, addr, c, 0);
> #ifdef CONFIG_PREEMPT_COUNT
> slub_put_cpu_ptr(s->cpu_slab);
> #endif

Tested that with and without SLUB_DEBUG


Testing without SLUB_DEBUG below:
With this change on 6.1-rc3:
diff --git a/mm/slub.c b/mm/slub.c
index 157527d7101b..5fdb7609bb9e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3410,6 +3410,8 @@ static __always_inline
void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
+
+ pr_warn("SLUB: __slab_alloc from slab_alloc s->object_size=%d\n", s->object_size);
void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);


UART & boot stops here:
transfer started ......................................... transfer ok, time=2.00s
setting up elf image... OK
jumping to kernel code
zimage at: 80B842A0 810B4BE4

Uncompressing Linux at load address 80001000

Copy device tree to address 80B80EE0

Now, booting the kernel...

[ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #34 SMP Tue Nov 1 12:33:10 AEST 2022
[ 0.000000] Overriding previously set SMP ops
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
[ 0.000000] Initrd not found or empty - disabling initrd
[ 0.000000] VPE topology {2,2} total 4
[ 0.000000] Primary instruction cache 32kB, VIPT, 4-way, linesize 32 bytes.
[ 0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[ 0.000000] MIPS secondary cache 256kB, 8-way, linesize 32 bytes.
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] HighMem empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000000fffffff]
[ 0.000000] percpu: Embedded 11 pages/cpu s16064 r8192 d20800 u45056
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64960
[ 0.000000] Kernel command line: earlyprintk=ttyS0,115200 console=ttyS0,115200 rootfstype=squashfs,jffs2
[ 0.000000] Unknown kernel command line parameters "earlyprintk=ttyS0,115200", will be passed to user space.
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
[ 0.000000] Writing ErrCtl register=00011146
[ 0.000000] Readback ErrCtl register=00011146
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] Memory: 246284K/262144K available (7417K kernel code, 630K rwdata, 1304K rodata, 3500K init, 245K bss, 15860K reserved, 0K cma-reserved, 0K highmem)
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=32
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=132
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=300
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000] NR_IRQS: 256
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=512
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=512
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=256
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=256
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=256
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=128
[ 0.000000] clocksource: GIC: mask: 0xffffffffffffffff max_cycles: 0xcaf478abb4, max_idle_ns: 440795247997 ns
[ 0.000000] SLUB: __slab_alloc from slab_alloc s->object_size=256
[ 0.000004] sched_clock: 64 bits at 880MHz, resolution 1ns, wraps every 4398046511103ns


With this change, the kernel boots fine:

diff --git a/mm/slub.c b/mm/slub.c
index 157527d7101b..e9677c04d19c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3410,7 +3410,11 @@ static __always_inline
void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
- void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
+
+ /*
+ pr_warn("SLUB: __slab_alloc from slab_alloc s->object_size=%d\n", s->object_size);
+ void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);*/
+ void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, 0);

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);



Cheers,

--
John Thomson

2022-11-01 08:23:05

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

Hi Thomson,

Thanks for testing!

+ MIPS maintainer and mailing list. The original report is here:

https://lore.kernel.org/lkml/[email protected]/

On Tue, Nov 01, 2022 at 12:18:19AM +0000, John Thomson wrote:
> > As I can't reproduce it locally yet, could you help try 3 tests separately:
> > * change the O2/O3 compile option to O1
> > * try the attached 0001 patch (which cuts part of the commit)
> > * try the attached 0001+0002 patches
>
> None of these changed my outcome.

The 0001 patch chops off most of the functional changes, and it still
fails. I'm wondering if it is related to function argument passing;
I noticed this is a 32-bit MIPS platform (from your config).

> I also tried compiling the same linux tree & config with the Bootlin toolchain
> (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0)
> with the same results.
> I will look into finding or building a mips clang toolchain.

Yes, Hyeonggon's suggestion makes sense; a different compiler may
show some difference.

> No JTAG capability to debug, sorry.
>
> I get the same outcome with either the ZBOOT vmlinuz or vmlinux.
>
> Same happening with 6.1-rc3
>
>
> After some blind poking around, changing how much of the commit affected mm/slub.c,
> I may have got lucky. It appears as though this is all I need to boot
> (against 6.1-rc3, and with the Bootlin toolchain). Will test my other build system as well.
>
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3276,7 +3276,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> c = slub_get_cpu_ptr(s->cpu_slab);
> #endif
>
> - p = ___slab_alloc(s, gfpflags, node, addr, c, orig_size);
> + p = ___slab_alloc(s, gfpflags, node, addr, c, 0);

___slab_alloc()'s argument count has been changed from 5 to 6, and
some other functions' from 4 to 5.
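
For reference, a sketch of the prototype change being described
(reconstructed from the hunks quoted in this thread; parameter types
are inferred):

	/* before the commit */
	static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags,
				   int node, unsigned long addr,
				   struct kmem_cache_cpu *c);

	/* after the commit: a sixth 'orig_size' parameter */
	static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags,
				   int node, unsigned long addr,
				   struct kmem_cache_cpu *c,
				   unsigned int orig_size);

This matters on 32-bit MIPS because the o32 ABI passes only the first
four arguments in registers (a0-a3); the fifth and sixth go on the
caller's stack, so the extra parameter changes the stack frame at
every call site.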

Thanks,
Feng


> #ifdef CONFIG_PREEMPT_COUNT
> slub_put_cpu_ptr(s->cpu_slab);
> #endif


2022-11-01 09:50:51

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Tue, Nov 01, 2022 at 09:20:21AM +0000, John Thomson wrote:
> On Tue, 1 Nov 2022, at 07:57, Feng Tang wrote:
> > Hi Thomson,
> >
> > Thanks for testing!
> >
> > + mips maintainer and mail list. The original report is here
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> I am guessing my issue comes from __kmem_cache_alloc_lru accessing s->object_size when (kmem_cache) s is NULL?
> If that is the case, this change is not to blame, it only exposes the issue?
>
> I get the following dmesg (note very early NULL kmem_cache) with the below change atop v6.1-rc3:
>
> transfer started ......................................... transfer ok, time=2.02s
> setting up elf image... OK
> jumping to kernel code
> zimage at: 80B842A0 810B4EFC
>
> Uncompressing Linux at load address 80001000
>
> Copy device tree to address 80B80EE0
>
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #61 SMP Tue Nov 1 18:04:13 AEST 2022
> [ 0.000000] slub: kmem_cache_alloc called with kmem_cache: 0x0
> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache: 0x0
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> [ 0.000000] printk: bootconsole [early0] enabled
> [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
> [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
>
> normal boot
>
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 157527d7101b..10fcdf2520d2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3410,7 +3410,13 @@ static __always_inline
> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags)
> {
> - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> + void *ret;
> + if (IS_ERR_OR_NULL(s)) {
> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, 0);
> + } else {
> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> + }
>
> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
>
> @@ -3419,6 +3425,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>
> void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> {
> + if (IS_ERR_OR_NULL(s))
> + pr_warn("slub: kmem_cache_alloc called with kmem_cache: %pSR\n", s);
> return __kmem_cache_alloc_lru(s, NULL, gfpflags);
> }
> EXPORT_SYMBOL(kmem_cache_alloc);
> @@ -3426,6 +3434,8 @@ EXPORT_SYMBOL(kmem_cache_alloc);
> void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags)
> {
> + if (IS_ERR_OR_NULL(s))
> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
> return __kmem_cache_alloc_lru(s, lru, gfpflags);
> }
> EXPORT_SYMBOL(kmem_cache_alloc_lru);
>
>
> Any hints on where kmem_cache_alloc would be called from this early?
> I will start looking from /init/main.c around pr_notice("%s", linux_banner);

Great. Would you try calling dump_stack() when s == NULL is observed?
That would give more information about who passed s == NULL to these
functions.

--
Thanks,
Hyeonggon

2022-11-01 10:17:25

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Tue, 1 Nov 2022, at 07:57, Feng Tang wrote:
> Hi Thomson,
>
> Thanks for testing!
>
> + mips maintainer and mail list. The original report is here
>
> https://lore.kernel.org/lkml/[email protected]/



I am guessing my issue comes from __kmem_cache_alloc_lru accessing s->object_size when (kmem_cache) s is NULL?
If that is the case, this change is not to blame, it only exposes the issue?
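
(To make that guess concrete - a sketch echoing the unpatched line
from the diff below, with an added comment; the exact fault behaviour
is an assumption:)

	/*
	 * With s == NULL, the s->object_size load below dereferences
	 * a NULL pointer before any check can run; before the commit,
	 * a NULL s was merely passed along at this point. Plausibly
	 * the fault happens before any console is up, matching the
	 * silent hang.
	 */
	void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);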

I get the following dmesg (note very early NULL kmem_cache) with the below change atop v6.1-rc3:

transfer started ......................................... transfer ok, time=2.02s
setting up elf image... OK
jumping to kernel code
zimage at: 80B842A0 810B4EFC

Uncompressing Linux at load address 80001000

Copy device tree to address 80B80EE0

Now, booting the kernel...

[ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #61 SMP Tue Nov 1 18:04:13 AEST 2022
[ 0.000000] slub: kmem_cache_alloc called with kmem_cache: 0x0
[ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache: 0x0
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled
[ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
[ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS

normal boot


diff --git a/mm/slub.c b/mm/slub.c
index 157527d7101b..10fcdf2520d2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3410,7 +3410,13 @@ static __always_inline
void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
- void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
+ void *ret;
+ if (IS_ERR_OR_NULL(s)) {
+ pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
+ ret = slab_alloc(s, lru, gfpflags, _RET_IP_, 0);
+ } else {
+ ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
+ }

trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

@@ -3419,6 +3425,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,

void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
+ if (IS_ERR_OR_NULL(s))
+ pr_warn("slub: kmem_cache_alloc called with kmem_cache: %pSR\n", s);
return __kmem_cache_alloc_lru(s, NULL, gfpflags);
}
EXPORT_SYMBOL(kmem_cache_alloc);
@@ -3426,6 +3434,8 @@ EXPORT_SYMBOL(kmem_cache_alloc);
void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
+ if (IS_ERR_OR_NULL(s))
+ pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
return __kmem_cache_alloc_lru(s, lru, gfpflags);
}
EXPORT_SYMBOL(kmem_cache_alloc_lru);


Any hints on where kmem_cache_alloc would be called from this early?
I will start looking from /init/main.c around pr_notice("%s", linux_banner);

Thank you for your help.

Let me know if you want me to stop replying to this mm/slub debug memory wasting thread,
and take this to a new email?

Cheers,
--
John Thomson

2022-11-01 10:57:25

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Tue, Nov 01, 2022 at 10:33:32AM +0000, John Thomson wrote:
> On Tue, 1 Nov 2022, at 09:31, Hyeonggon Yoo wrote:
> > On Tue, Nov 01, 2022 at 09:20:21AM +0000, John Thomson wrote:
> >> On Tue, 1 Nov 2022, at 07:57, Feng Tang wrote:
> >> > Hi Thomson,
> >> >
> >> > Thanks for testing!
> >> >
> >> > + mips maintainer and mail list. The original report is here
> >> >
> >> > https://lore.kernel.org/lkml/[email protected]/
> >>
> >> I am guessing my issue comes from __kmem_cache_alloc_lru accessing s->object_size when (kmem_cache) s is NULL?
> >> If that is the case, this change is not to blame, it only exposes the issue?
> >>
> >> I get the following dmesg (note very early NULL kmem_cache) with the below change atop v6.1-rc3:
> >>
> >> transfer started ......................................... transfer ok, time=2.02s
> >> setting up elf image... OK
> >> jumping to kernel code
> >> zimage at: 80B842A0 810B4EFC
> >>
> >> Uncompressing Linux at load address 80001000
> >>
> >> Copy device tree to address 80B80EE0
> >>
> >> Now, booting the kernel...
> >>
> >> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #61 SMP Tue Nov 1 18:04:13 AEST 2022
> >> [ 0.000000] slub: kmem_cache_alloc called with kmem_cache: 0x0
> >> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache: 0x0
> >> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> >> [ 0.000000] printk: bootconsole [early0] enabled
> >> [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
> >> [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
> >>
> >> normal boot
> >>
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 157527d7101b..10fcdf2520d2 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -3410,7 +3410,13 @@ static __always_inline
> >> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> >> gfp_t gfpflags)
> >> {
> >> - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> >> + void *ret;
> >> + if (IS_ERR_OR_NULL(s)) {
> >> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
> >> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, 0);
> >> + } else {
> >> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> >> + }
> >>
> >> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
> >>
> >> @@ -3419,6 +3425,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> >>
> >> void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> >> {
> >> + if (IS_ERR_OR_NULL(s))
> >> + pr_warn("slub: kmem_cache_alloc called with kmem_cache: %pSR\n", s);
> >> return __kmem_cache_alloc_lru(s, NULL, gfpflags);
> >> }
> >> EXPORT_SYMBOL(kmem_cache_alloc);
> >> @@ -3426,6 +3434,8 @@ EXPORT_SYMBOL(kmem_cache_alloc);
> >> void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> >> gfp_t gfpflags)
> >> {
> >> + if (IS_ERR_OR_NULL(s))
> >> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
> >> return __kmem_cache_alloc_lru(s, lru, gfpflags);
> >> }
> >> EXPORT_SYMBOL(kmem_cache_alloc_lru);
> >>
> >>
> >> Any hints on where kmem_cache_alloc would be being called from this early?
> >> I will start looking from /init/main.c around pr_notice("%s", linux_banner);
> >
> > Great. Would you try calling dump_stack(); when we observed s == NULL?
> > That would give more information about who passed s == NULL to these
> > functions.
> >
>
> With the dump_stack() in place:
>
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
> [ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
> [ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
> [ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
> [ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
> [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> [ 0.000000] ...
> [ 0.000000] Call Trace:
> [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> [ 0.000000] [<8070cdc0>] dump_stack_lvl+0x60/0x80
> [ 0.000000] [<801c1428>] kmem_cache_alloc+0x5c0/0x740
> [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> [ 0.000000]
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3

setup_arch() is too early to use slab allocators.
I think slab received a NULL pointer because the kmalloc caches are not initialized yet.

It seems arch/mips/ralink/mt7621.c is using slab too early.
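
A rough sketch of the pattern, judging from the stack above (the exact code in arch/mips/ralink/mt7621.c may differ; the field setup is illustrative):

	struct soc_device_attribute *soc_dev_attr;

	/*
	 * Runs from prom_init() -> setup_arch(), i.e. before mm_init():
	 * the kmalloc caches are still NULL here, so kzalloc() ends up
	 * passing a NULL kmem_cache into kmem_cache_alloc().
	 */
	soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
	if (!soc_dev_attr)
		return;

	soc_dev_attr->soc_id = "mt7621";
	soc_device_register(soc_dev_attr);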

>
>
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
> [ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
> [ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
> [ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
> [ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
> [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> [ 0.000000] ...
> [ 0.000000] Call Trace:
> [ 0.000000] show_stack (/mnt/pool_ssd/code/linux/linux-stable-mt7621/./arch/mips/include/asm/stacktrace.h:43 /mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/kernel/traps.c:223)
> [ 0.000000] dump_stack_lvl (/mnt/pool_ssd/code/linux/linux-stable-mt7621/lib/dump_stack.c:107 (discriminator 1))
> [ 0.000000] kmem_cache_alloc (/mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3318 /mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3406 /mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3418 /mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3430)
> [ 0.000000] prom_soc_init (/mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/ralink/mt7621.c:106 /mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/ralink/mt7621.c:177)
> [ 0.000000] prom_init (/mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/ralink/prom.c:64)
> [ 0.000000] setup_arch (/mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/kernel/setup.c:786)
> [ 0.000000] start_kernel (/mnt/pool_ssd/code/linux/linux-stable-mt7621/init/main.c:279 /mnt/pool_ssd/code/linux/linux-stable-mt7621/init/main.c:477 /mnt/pool_ssd/code/linux/linux-stable-mt7621/init/main.c:960)
> [ 0.000000]
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>
>
> I have not found it yet.
>
>
> Cheers,
> --
> John Thomson

--
Thanks,
Hyeonggon

2022-11-01 11:16:58

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Tue, 1 Nov 2022, at 09:31, Hyeonggon Yoo wrote:
> On Tue, Nov 01, 2022 at 09:20:21AM +0000, John Thomson wrote:
>> On Tue, 1 Nov 2022, at 07:57, Feng Tang wrote:
>> > Hi Thomson,
>> >
>> > Thanks for testing!
>> >
>> > + mips maintainer and mail list. The original report is here
>> >
>> > https://lore.kernel.org/lkml/[email protected]/
>>
>> I am guessing my issue comes from __kmem_cache_alloc_lru accessing s->object_size when (kmem_cache) s is NULL?
>> If that is the case, this change is not to blame, it only exposes the issue?
>>
>> I get the following dmesg (note very early NULL kmem_cache) with the below change atop v6.1-rc3:
>>
>> transfer started ......................................... transfer ok, time=2.02s
>> setting up elf image... OK
>> jumping to kernel code
>> zimage at: 80B842A0 810B4EFC
>>
>> Uncompressing Linux at load address 80001000
>>
>> Copy device tree to address 80B80EE0
>>
>> Now, booting the kernel...
>>
>> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #61 SMP Tue Nov 1 18:04:13 AEST 2022
>> [ 0.000000] slub: kmem_cache_alloc called with kmem_cache: 0x0
>> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache: 0x0
>> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>> [ 0.000000] printk: bootconsole [early0] enabled
>> [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
>> [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
>>
>> normal boot
>>
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 157527d7101b..10fcdf2520d2 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3410,7 +3410,13 @@ static __always_inline
>> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>> gfp_t gfpflags)
>> {
>> - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
>> + void *ret;
>> + if (IS_ERR_OR_NULL(s)) {
>> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
>> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, 0);
>> + } else {
>> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
>> + }
>>
>> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
>>
>> @@ -3419,6 +3425,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>>
>> void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
>> {
>> + if (IS_ERR_OR_NULL(s))
>> + pr_warn("slub: kmem_cache_alloc called with kmem_cache: %pSR\n", s);
>> return __kmem_cache_alloc_lru(s, NULL, gfpflags);
>> }
>> EXPORT_SYMBOL(kmem_cache_alloc);
>> @@ -3426,6 +3434,8 @@ EXPORT_SYMBOL(kmem_cache_alloc);
>> void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>> gfp_t gfpflags)
>> {
>> + if (IS_ERR_OR_NULL(s))
>> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
>> return __kmem_cache_alloc_lru(s, lru, gfpflags);
>> }
>> EXPORT_SYMBOL(kmem_cache_alloc_lru);
>>
>>
>> Any hints on where kmem_cache_alloc would be being called from this early?
>> I will start looking from /init/main.c around pr_notice("%s", linux_banner);
>
> Great. Would you try calling dump_stack(); when we observed s == NULL?
> That would give more information about who passed s == NULL to these
> functions.
>

With the dump_stack() in place:

Now, booting the kernel...

[ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
[ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
[ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
[ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
[ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
[ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
[ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
[ 0.000000] ...
[ 0.000000] Call Trace:
[ 0.000000] [<80008260>] show_stack+0x28/0xf0
[ 0.000000] [<8070cdc0>] dump_stack_lvl+0x60/0x80
[ 0.000000] [<801c1428>] kmem_cache_alloc+0x5c0/0x740
[ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
[ 0.000000] [<80928060>] prom_init+0x44/0xf0
[ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
[ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
[ 0.000000]
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3



Now, booting the kernel...

[ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
[ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
[ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
[ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
[ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
[ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
[ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
[ 0.000000] ...
[ 0.000000] Call Trace:
[ 0.000000] show_stack (/mnt/pool_ssd/code/linux/linux-stable-mt7621/./arch/mips/include/asm/stacktrace.h:43 /mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/kernel/traps.c:223)
[ 0.000000] dump_stack_lvl (/mnt/pool_ssd/code/linux/linux-stable-mt7621/lib/dump_stack.c:107 (discriminator 1))
[ 0.000000] kmem_cache_alloc (/mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3318 /mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3406 /mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3418 /mnt/pool_ssd/code/linux/linux-stable-mt7621/mm/slub.c:3430)
[ 0.000000] prom_soc_init (/mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/ralink/mt7621.c:106 /mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/ralink/mt7621.c:177)
[ 0.000000] prom_init (/mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/ralink/prom.c:64)
[ 0.000000] setup_arch (/mnt/pool_ssd/code/linux/linux-stable-mt7621/arch/mips/kernel/setup.c:786)
[ 0.000000] start_kernel (/mnt/pool_ssd/code/linux/linux-stable-mt7621/init/main.c:279 /mnt/pool_ssd/code/linux/linux-stable-mt7621/init/main.c:477 /mnt/pool_ssd/code/linux/linux-stable-mt7621/init/main.c:960)
[ 0.000000]
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3


I have not found it yet.


Cheers,
--
John Thomson

2022-11-01 14:08:36

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Tue, Nov 01, 2022 at 06:42:23PM +0800, Hyeonggon Yoo wrote:
> On Tue, Nov 01, 2022 at 10:33:32AM +0000, John Thomson wrote:
> > On Tue, 1 Nov 2022, at 09:31, Hyeonggon Yoo wrote:
> > > On Tue, Nov 01, 2022 at 09:20:21AM +0000, John Thomson wrote:
> > >> On Tue, 1 Nov 2022, at 07:57, Feng Tang wrote:
> > >> > Hi Thomson,
> > >> >
> > >> > Thanks for testing!
> > >> >
> > >> > + mips maintainer and mail list. The original report is here
> > >> >
> > >> > https://lore.kernel.org/lkml/[email protected]/
> > >>
> > >> I am guessing my issue comes from __kmem_cache_alloc_lru accessing s->object_size when (kmem_cache) s is NULL?
> > >> If that is the case, this change is not to blame, it only exposes the issue?
> > >>
> > >> I get the following dmesg (note very early NULL kmem_cache) with the below change atop v6.1-rc3:
> > >>
> > >> transfer started ......................................... transfer ok, time=2.02s
> > >> setting up elf image... OK
> > >> jumping to kernel code
> > >> zimage at: 80B842A0 810B4EFC
> > >>
> > >> Uncompressing Linux at load address 80001000
> > >>
> > >> Copy device tree to address 80B80EE0
> > >>
> > >> Now, booting the kernel...
> > >>
> > >> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #61 SMP Tue Nov 1 18:04:13 AEST 2022
> > >> [ 0.000000] slub: kmem_cache_alloc called with kmem_cache: 0x0
> > >> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache: 0x0
> > >> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > >> [ 0.000000] printk: bootconsole [early0] enabled
> > >> [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
> > >> [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
> > >>
> > >> normal boot
> > >>
> > >>
> > >> diff --git a/mm/slub.c b/mm/slub.c
> > >> index 157527d7101b..10fcdf2520d2 100644
> > >> --- a/mm/slub.c
> > >> +++ b/mm/slub.c
> > >> @@ -3410,7 +3410,13 @@ static __always_inline
> > >> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> > >> gfp_t gfpflags)
> > >> {
> > >> - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> > >> + void *ret;
> > >> + if (IS_ERR_OR_NULL(s)) {
> > >> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
> > >> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, 0);
> > >> + } else {
> > >> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> > >> + }
> > >>
> > >> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
> > >>
> > >> @@ -3419,6 +3425,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> > >>
> > >> void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> > >> {
> > >> + if (IS_ERR_OR_NULL(s))
> > >> + pr_warn("slub: kmem_cache_alloc called with kmem_cache: %pSR\n", s);
> > >> return __kmem_cache_alloc_lru(s, NULL, gfpflags);
> > >> }
> > >> EXPORT_SYMBOL(kmem_cache_alloc);
> > >> @@ -3426,6 +3434,8 @@ EXPORT_SYMBOL(kmem_cache_alloc);
> > >> void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> > >> gfp_t gfpflags)
> > >> {
> > >> + if (IS_ERR_OR_NULL(s))
> > >> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
> > >> return __kmem_cache_alloc_lru(s, lru, gfpflags);
> > >> }
> > >> EXPORT_SYMBOL(kmem_cache_alloc_lru);
> > >>
> > >>
> > >> Any hints on where kmem_cache_alloc would be being called from this early?
> > >> I will start looking from /init/main.c around pr_notice("%s", linux_banner);
> > >
> > > Great. Would you try calling dump_stack(); when we observed s == NULL?
> > > That would give more information about who passed s == NULL to these
> > > functions.
> > >
> >
> > With the dump_stack() in place:
> >
> > Now, booting the kernel...
> >
> > [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
> > [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
> > [ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
> > [ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
> > [ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
> > [ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
> > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> > [ 0.000000] ...
> > [ 0.000000] Call Trace:
> > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> > [ 0.000000] [<8070cdc0>] dump_stack_lvl+0x60/0x80
> > [ 0.000000] [<801c1428>] kmem_cache_alloc+0x5c0/0x740
> > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > [ 0.000000]
> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>
> setup_arch() is too early to use slab allocators.
> I think slab received NULL pointer because kmalloc is not initialized.
>
> It seems arch/mips/ralink/mt7621.c is using slab too early.

Cool! It is finally root-caused :) Thanks!

The following patch should solve it and emit a warning message, though
I'm not sure if there are other holes.

Thanks,
Feng

---
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 33b1886b06eb..429c21b7ecbc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1043,7 +1043,14 @@ size_t __ksize(const void *object)
#ifdef CONFIG_TRACING
void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
{
- void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
+ void *ret;
+
+ if (unlikely(ZERO_OR_NULL_PTR(s))) {
+ WARN_ON_ONCE(1);
+ return s;
+ }
+
+ ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
size, _RET_IP_);

trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
diff --git a/mm/slub.c b/mm/slub.c
index 157527d7101b..85d24bb6eda7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3410,8 +3410,14 @@ static __always_inline
void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
- void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
+ void *ret;

+ if (unlikely(ZERO_OR_NULL_PTR(s))) {
+ WARN_ON_ONCE(1);
+ return s;
+ }
+
+ ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);

return ret;
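
(For reference, ZERO_OR_NULL_PTR() from include/linux/slab.h covers both NULL and the zero-size sentinel:

	#define ZERO_SIZE_PTR ((void *)16)

	#define ZERO_OR_NULL_PTR(x) ((unsigned long)(x) <= \
					(unsigned long)ZERO_SIZE_PTR)

so the check above also bails out if a ZERO_SIZE_PTR is ever passed in as a cache pointer.)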

2022-11-01 19:45:44

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc



On Tue, 1 Nov 2022, at 13:55, Feng Tang wrote:
> On Tue, Nov 01, 2022 at 06:42:23PM +0800, Hyeonggon Yoo wrote:
>> setup_arch() is too early to use slab allocators.
>> I think slab received NULL pointer because kmalloc is not initialized.
>>
>> It seems arch/mips/ralink/mt7621.c is using slab too early.
>
> Cool! it is finally root caused :) Thanks!
>
> The following patch should solve it and give it a warning message, though
> I'm not sure if there is other holes.
>
> Thanks,
> Feng
>
> ---
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 33b1886b06eb..429c21b7ecbc 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1043,7 +1043,14 @@ size_t __ksize(const void *object)
> #ifdef CONFIG_TRACING
> void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> {
> - void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
> + void *ret;
> +
> + if (unlikely(ZERO_OR_NULL_PTR(s))) {
> + WARN_ON_ONCE(1);
> + return s;
> + }
> +
> + ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
> size, _RET_IP_);
>
> trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> diff --git a/mm/slub.c b/mm/slub.c
> index 157527d7101b..85d24bb6eda7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3410,8 +3410,14 @@ static __always_inline
> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags)
> {
> - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> + void *ret;
>
> + if (unlikely(ZERO_OR_NULL_PTR(s))) {
> + WARN_ON_ONCE(1);
> + return s;
> + }
> +
> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
>
> return ret;

Yes, thank you, that patch atop v6.1-rc3 lets me boot, and shows the warning and stack dump.
Will you submit that, or how do we want to proceed?

transfer started ......................................... transfer ok, time=2.11s
setting up elf image... OK
jumping to kernel code
zimage at: 80B842A0 810B4BC0

Uncompressing Linux at load address 80001000

Copy device tree to address 80B80EE0

Now, booting the kernel...

[ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed Nov 2 05:10:01 AEST 2022
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416 kmem_cache_alloc+0x5a4/0x5e8
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
[ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000 00000000 80889d04 80c90000
[ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000 00000001 80889cb0 00000000
[ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002 00000002 00000001 6d6f4320
[ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328 00000000 00000000 00000000
[ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
[ 0.000000] ...
[ 0.000000] Call Trace:
[ 0.000000] [<80008260>] show_stack+0x28/0xf0
[ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
[ 0.000000] [<8002e184>] __warn+0xc4/0xf8
[ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
[ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
[ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
[ 0.000000] [<80928060>] prom_init+0x44/0xf0
[ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
[ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
[ 0.000000]
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[ 0.000000] printk: bootconsole [early0] enabled

Thank you for working through this with me.
I will try to address the root cause in mt7621.c.
It looks like other arch/** soc_device_register users use postcore_initcall, device_initcall,
or the ARM DT_MACHINE_START .init_machine. A quick hack to use postcore_initcall in mt7621
avoided this NULL kmem_cache pointer being passed to kmem_cache_alloc_lru.
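
The quick hack looked roughly like this (an untested sketch; the function name is mine, and the real attribute setup in mt7621.c is more involved):

	/* defer SoC device registration until initcalls run, when slab is up */
	static int __init mt7621_soc_dev_init(void)
	{
		struct soc_device_attribute *soc_dev_attr;

		soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
		if (!soc_dev_attr)
			return -ENOMEM;

		soc_dev_attr->soc_id = "mt7621";
		soc_device_register(soc_dev_attr);
		return 0;
	}
	postcore_initcall(mt7621_soc_dev_init);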


Thanks,

--
John Thomson

2022-11-02 06:40:52

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Tue, Nov 01, 2022 at 07:39:13PM +0000, John Thomson wrote:
>
>
> On Tue, 1 Nov 2022, at 13:55, Feng Tang wrote:
> > On Tue, Nov 01, 2022 at 06:42:23PM +0800, Hyeonggon Yoo wrote:
> >> setup_arch() is too early to use slab allocators.
> >> I think slab received NULL pointer because kmalloc is not initialized.
> >>
> >> It seems arch/mips/ralink/mt7621.c is using slab too early.
> >
> > Cool! it is finally root caused :) Thanks!
> >
> > The following patch should solve it and give it a warning message, though
> > I'm not sure if there is other holes.
> >
> > Thanks,
> > Feng
> >
> > ---
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 33b1886b06eb..429c21b7ecbc 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1043,7 +1043,14 @@ size_t __ksize(const void *object)
> > #ifdef CONFIG_TRACING
> > void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> > {
> > - void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
> > + void *ret;
> > +
> > + if (unlikely(ZERO_OR_NULL_PTR(s))) {
> > + WARN_ON_ONCE(1);
> > + return s;
> > + }
> > +
> > + ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
> > size, _RET_IP_);
> >
> > trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 157527d7101b..85d24bb6eda7 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3410,8 +3410,14 @@ static __always_inline
> > void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> > gfp_t gfpflags)
> > {
> > - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> > + void *ret;
> >
> > + if (unlikely(ZERO_OR_NULL_PTR(s))) {
> > + WARN_ON_ONCE(1);
> > + return s;
> > + }
> > +
> > + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> > trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
> >
> > return ret;
>
> Yes, thank you, that patch atop v6.1-rc3 lets me boot, and shows the warning and stack dump.
> Will you submit that, or how do we want to proceed?

Thanks for confirming. I wanted to wait for Vlastimil's, Hyeonggon's and
other developers' opinions. And yes, I can also post a more formal one.

> transfer started ......................................... transfer ok, time=2.11s
> setting up elf image... OK
> jumping to kernel code
> zimage at: 80B842A0 810B4BC0
>
> Uncompressing Linux at load address 80001000
>
> Copy device tree to address 80B80EE0
>
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed Nov 2 05:10:01 AEST 2022
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416 kmem_cache_alloc+0x5a4/0x5e8
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000 00000000 80889d04 80c90000
> [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000 00000001 80889cb0 00000000
> [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002 00000002 00000001 6d6f4320
> [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328 00000000 00000000 00000000
> [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> [ 0.000000] ...
> [ 0.000000] Call Trace:
> [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
> [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> [ 0.000000]
> [ 0.000000] ---[ end trace 0000000000000000 ]---
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> [ 0.000000] printk: bootconsole [early0] enabled
>
> Thank you for working through this with me.
> I will try to address the root cause in mt7621.c.
> It looks like other arch/** soc_device_register users use postcore_initcall, device_initcall,
> or the ARM DT_MACHINE_START .init_machine. A quick hack to use postcore_initcall in mt7621
> avoided this zero ptr kmem_cache passed to kmem_cache_alloc_lru.

IIUC, prom_soc_init() is only called once in the kernel, so could
'soc_dev_attr' just be defined as a global data structure instead
of being allocated with kzalloc()? Its size is small, containing only 7 pointers.
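
A rough illustration (the variable name is mine and the field setup is abbreviated):

	/* static storage: nothing to allocate this early */
	static struct soc_device_attribute mt7621_soc_dev_attr;

	/* in prom_soc_init(): */
	mt7621_soc_dev_attr.soc_id = "mt7621";
	soc_device_register(&mt7621_soc_dev_attr);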

Thanks,
Feng

2022-11-02 07:47:39

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Wed, Nov 02, 2022 at 02:08:09PM +0800, Feng Tang wrote:
> On Tue, Nov 01, 2022 at 07:39:13PM +0000, John Thomson wrote:
> >
> >
> > On Tue, 1 Nov 2022, at 13:55, Feng Tang wrote:
> > > On Tue, Nov 01, 2022 at 06:42:23PM +0800, Hyeonggon Yoo wrote:
> > >> setup_arch() is too early to use slab allocators.
> > >> I think slab received NULL pointer because kmalloc is not initialized.
> > >>
> > >> It seems arch/mips/ralink/mt7621.c is using slab too early.
> > >
> > > Cool! it is finally root caused :) Thanks!
> > >
> > > The following patch should solve it and give it a warning message, though
> > > I'm not sure if there is other holes.
> > >
> > > Thanks,
> > > Feng
> > >
> > > ---
> > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > index 33b1886b06eb..429c21b7ecbc 100644
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -1043,7 +1043,14 @@ size_t __ksize(const void *object)
> > > #ifdef CONFIG_TRACING
> > > void *kmalloc_trace(struct kmem_cache *s, gfp_t gfpflags, size_t size)
> > > {
> > > - void *ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
> > > + void *ret;
> > > +
> > > + if (unlikely(ZERO_OR_NULL_PTR(s))) {
> > > + WARN_ON_ONCE(1);
> > > + return s;
> > > + }
> > > +
> > > + ret = __kmem_cache_alloc_node(s, gfpflags, NUMA_NO_NODE,
> > > size, _RET_IP_);
> > >
> > > trace_kmalloc(_RET_IP_, ret, size, s->size, gfpflags, NUMA_NO_NODE);
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 157527d7101b..85d24bb6eda7 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3410,8 +3410,14 @@ static __always_inline
> > > void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> > > gfp_t gfpflags)
> > > {
> > > - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> > > + void *ret;
> > >
> > > + if (unlikely(ZERO_OR_NULL_PTR(s))) {
> > > + WARN_ON_ONCE(1);
> > > + return s;
> > > + }
> > > +

Thank you for the suggestion!

I think the holes are:
kmalloc_node_trace(), kmem_cache_alloc_node(), __do_kmalloc_node()

And I want to suggest:
What about using VM_WARN_ON_ONCE() instead?
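
That is, something like this sketch in the checks above, so the warning compiles away unless CONFIG_DEBUG_VM is set:

	if (unlikely(ZERO_OR_NULL_PTR(s))) {
		/* warns only when CONFIG_DEBUG_VM is set */
		VM_WARN_ON_ONCE(1);
		return s;
	}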

> > > + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> > > trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
> > >
> > > return ret;
> >
> > Yes, thank you, that patch atop v6.1-rc3 lets me boot, and shows the warning and stack dump.
> > Will you submit that, or how do we want to proceed?
>
> Thanks for confirming. I wanted to wait for Vlastimil, Hyeonggon and
> other developer's opinion. And yes, I can also post a more formal one.
>
> > transfer started ......................................... transfer ok, time=2.11s
> > setting up elf image... OK
> > jumping to kernel code
> > zimage at: 80B842A0 810B4BC0
> >
> > Uncompressing Linux at load address 80001000
> >
> > Copy device tree to address 80B80EE0
> >
> > Now, booting the kernel...
> >
> > [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed Nov 2 05:10:01 AEST 2022
> > [ 0.000000] ------------[ cut here ]------------
> > [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416 kmem_cache_alloc+0x5a4/0x5e8
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> > [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000 00000000 80889d04 80c90000
> > [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000 00000001 80889cb0 00000000
> > [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002 00000002 00000001 6d6f4320
> > [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328 00000000 00000000 00000000
> > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> > [ 0.000000] ...
> > [ 0.000000] Call Trace:
> > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> > [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> > [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
> > [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> > [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > [ 0.000000]
> > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > [ 0.000000] printk: bootconsole [early0] enabled
> >
> > Thank you for working through this with me.
> > I will try to address the root cause in mt7621.c.
> > It looks like other arch/** soc_device_register users use postcore_initcall, device_initcall,
> > or the ARM DT_MACHINE_START .init_machine. A quick hack to use postcore_initcall in mt7621
> > avoided this zero ptr kmem_cache passed to kmem_cache_alloc_lru.
>
> If IIUC, the prom_soc_init() is only called once in kernel, can the
> 'soc_dev_attr' just be defined as a global data structure instead
> of calling kzalloc(), as its size is small only containing 7 pointers.

But soc_device_register() also uses kmalloc. I think calling it
after slab initialization would be the best solution - if that is correct.

>
> Thanks,
> Feng
>

--
Thanks,
Hyeonggon

2022-11-02 09:23:37

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On 11/1/22 11:33, John Thomson wrote:
> On Tue, 1 Nov 2022, at 09:31, Hyeonggon Yoo wrote:
>> On Tue, Nov 01, 2022 at 09:20:21AM +0000, John Thomson wrote:
>>> On Tue, 1 Nov 2022, at 07:57, Feng Tang wrote:
>>> > Hi Thomson,
>>> >
>>> > Thanks for testing!
>>> >
>>> > + mips maintainer and mail list. The original report is here
>>> >
>>> > https://lore.kernel.org/lkml/[email protected]/
>>>
>>> I am guessing my issue comes from __kmem_cache_alloc_lru accessing s->object_size when (kmem_cache) s is NULL?
>>> If that is the case, this change is not to blame, it only exposes the issue?
>>>
>>> I get the following dmesg (note very early NULL kmem_cache) with the below change atop v6.1-rc3:
>>>
>>> transfer started ......................................... transfer ok, time=2.02s
>>> setting up elf image... OK
>>> jumping to kernel code
>>> zimage at: 80B842A0 810B4EFC
>>>
>>> Uncompressing Linux at load address 80001000
>>>
>>> Copy device tree to address 80B80EE0
>>>
>>> Now, booting the kernel...
>>>
>>> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #61 SMP Tue Nov 1 18:04:13 AEST 2022
>>> [ 0.000000] slub: kmem_cache_alloc called with kmem_cache: 0x0
>>> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache: 0x0
>>> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>>> [ 0.000000] printk: bootconsole [early0] enabled
>>> [ 0.000000] CPU0 revision is: 0001992f (MIPS 1004Kc)
>>> [ 0.000000] MIPS: machine is MikroTik RouterBOARD 760iGS
>>>
>>> normal boot
>>>
>>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index 157527d7101b..10fcdf2520d2 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -3410,7 +3410,13 @@ static __always_inline
>>> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>>> gfp_t gfpflags)
>>> {
>>> - void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
>>> + void *ret;
>>> + if (IS_ERR_OR_NULL(s)) {
>>> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
>>> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, 0);
>>> + } else {
>>> + ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
>>> + }
>>>
>>> trace_kmem_cache_alloc(_RET_IP_, ret, s, gfpflags, NUMA_NO_NODE);
>>>
>>> @@ -3419,6 +3425,8 @@ void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>>>
>>> void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
>>> {
>>> + if (IS_ERR_OR_NULL(s))
>>> + pr_warn("slub: kmem_cache_alloc called with kmem_cache: %pSR\n", s);
>>> return __kmem_cache_alloc_lru(s, NULL, gfpflags);
>>> }
>>> EXPORT_SYMBOL(kmem_cache_alloc);
>>> @@ -3426,6 +3434,8 @@ EXPORT_SYMBOL(kmem_cache_alloc);
>>> void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
>>> gfp_t gfpflags)
>>> {
>>> + if (IS_ERR_OR_NULL(s))
>>> + pr_warn("slub: __kmem_cache_alloc_lru called with kmem_cache: %pSR\n", s);
>>> return __kmem_cache_alloc_lru(s, lru, gfpflags);
>>> }
>>> EXPORT_SYMBOL(kmem_cache_alloc_lru);
>>>
>>>
>>> Any hints on where kmem_cache_alloc would be being called from this early?
>>> I will start looking from /init/main.c around pr_notice("%s", linux_banner);
>>
>> Great. Would you try calling dump_stack(); when we observed s == NULL?
>> That would give more information about who passed s == NULL to these
>> functions.
>>
>
> With the dump_stack() in place:
>
> Now, booting the kernel...
>
> [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
> [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
> [ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
> [ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
> [ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
> [ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
> [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> [ 0.000000] ...
> [ 0.000000] Call Trace:
> [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> [ 0.000000] [<8070cdc0>] dump_stack_lvl+0x60/0x80
> [ 0.000000] [<801c1428>] kmem_cache_alloc+0x5c0/0x740
> [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> [ 0.000000]
> [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3

The stack means CONFIG_TRACING=n, is that right?

That would mean
prom_soc_init()
soc_dev_init()
kzalloc() -> kmalloc()
kmalloc_trace() // after #else /* CONFIG_TRACING */
kmem_cache_alloc(s, flags);

Looks like this path hits a small bug in the wasting detection patch, as we
throw away the size there.

AFAICS before this patch, we "survive" "kmem_cache *s" being NULL as
slab_pre_alloc_hook() will happen to return NULL and we bail out from
slab_alloc_node(). But this is a side-effect, not an intended protection.
Also the CONFIG_TRACING variant of kmalloc_trace() would have called
trace_kmalloc dereferencing s->size anyway even before this patch.

I don't think we should add WARNs in the slab hot paths just to prevent this
rare error of using slab too early. At most VM_WARN... would be acceptable,
but still not necessary, as crashing immediately from a NULL pointer is
sufficient.

So IMHO mips should fix their soc init, and we should look into the
CONFIG_TRACING=n variant of kmalloc_trace(), to pass orig_size properly.
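
For reference, the CONFIG_TRACING=n variant in include/linux/slab.h currently reads:

#else /* CONFIG_TRACING */
/* Save a function call when CONFIG_TRACING=n */
static __always_inline __alloc_size(3)
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
{
	void *ret = kmem_cache_alloc(s, flags);

	ret = kasan_kmalloc(s, ret, size, flags);
	return ret;
}

note how 'size' never reaches the allocator.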

2022-11-03 06:40:41

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Wed, Nov 02, 2022 at 04:22:37PM +0800, Vlastimil Babka wrote:
> On 11/1/22 11:33, John Thomson wrote:
[...]
> >
> > [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
> > [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
> > [ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
> > [ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
> > [ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
> > [ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
> > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> > [ 0.000000] ...
> > [ 0.000000] Call Trace:
> > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> > [ 0.000000] [<8070cdc0>] dump_stack_lvl+0x60/0x80
> > [ 0.000000] [<801c1428>] kmem_cache_alloc+0x5c0/0x740
> > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > [ 0.000000]
> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>
> The stack means CONFIG_TRACING=n, is that right?

Yes, from the kconfig, CONFIG_TRACING is not set.

> That would mean
> prom_soc_init()
> soc_dev_init()
> kzalloc() -> kmalloc()
> kmalloc_trace() // after #else /* CONFIG_TRACING */
> kmem_cache_alloc(s, flags);
>
> Looks like this path is a small bug in the wasting detection patch, as we
> throw away size there.

Yes, from the code reading and John's log, it is.

One strange thing is, I reset the code to v6.0, and found that
__kmem_cache_alloc_lru() also accesses 's->object_size':

void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
gfp_t gfpflags)
{
void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
...
}

And from John's dump_stack() info, this call is also where the NULL pointer
shows up, which I still can't figure out.

> AFAICS before this patch, we "survive" "kmem_cache *s" being NULL as
> slab_pre_alloc_hook() will happen to return NULL and we bail out from
> slab_alloc_node(). But this is a side-effect, not an intended protection.
> Also the CONFIG_TRACING variant of kmalloc_trace() would have called
> trace_kmalloc dereferencing s->size anyway even before this patch.
>
> I don't think we should add WARNS in the slab hot paths just to prevent this
> rare error of using slab too early. At most VM_WARN... would be acceptable
> but still not necessary as crashing immediately from a NULL pointer is
> sufficient.
>
> So IMHO mips should fix their soc init,

Yes, for the mips fix, John has proposed deferring the call to prom_soc_init(),
which looks reasonable.

> and we should look into the
> CONFIG_TRACING=n variant of kmalloc_trace(), to pass orig_size properly.

You mean checking if the pointer is NULL and bailing out early?

Thanks,
Feng

2022-11-03 07:49:31

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Wed, Nov 02, 2022 at 04:16:06PM +0900, Hyeonggon Yoo wrote:
> On Wed, Nov 02, 2022 at 02:08:09PM +0800, Feng Tang wrote:
[...]
> > > transfer started ......................................... transfer ok, time=2.11s
> > > setting up elf image... OK
> > > jumping to kernel code
> > > zimage at: 80B842A0 810B4BC0
> > >
> > > Uncompressing Linux at load address 80001000
> > >
> > > Copy device tree to address 80B80EE0
> > >
> > > Now, booting the kernel...
> > >
> > > [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed Nov 2 05:10:01 AEST 2022
> > > [ 0.000000] ------------[ cut here ]------------
> > > [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416 kmem_cache_alloc+0x5a4/0x5e8
> > > [ 0.000000] Modules linked in:
> > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> > > [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000 00000000 80889d04 80c90000
> > > [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000 00000001 80889cb0 00000000
> > > [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002 00000002 00000001 6d6f4320
> > > [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328 00000000 00000000 00000000
> > > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> > > [ 0.000000] ...
> > > [ 0.000000] Call Trace:
> > > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> > > [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> > > [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
> > > [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> > > [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> > > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> > > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > > [ 0.000000]
> > > [ 0.000000] ---[ end trace 0000000000000000 ]---
> > > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > > [ 0.000000] printk: bootconsole [early0] enabled
> > >
> > > Thank you for working through this with me.
> > > I will try to address the root cause in mt7621.c.
> > > It looks like other arch/** soc_device_register users use postcore_initcall, device_initcall,
> > > or the ARM DT_MACHINE_START .init_machine. A quick hack to use postcore_initcall in mt7621
> > > avoided this zero ptr kmem_cache passed to kmem_cache_alloc_lru.
> >
> > If IIUC, the prom_soc_init() is only called once in kernel, can the
> > 'soc_dev_attr' just be defined as a global data structure instead
> > of calling kzalloc(), as its size is small only containing 7 pointers.
>
> But soc_device_registers() too uses kmalloc. I think calling it
> after slab initialization will be best solution - if that is correct.

Yes, you are right, there are other kmalloc() calls down the call chain.

Hi John,

Will you verify and submit a patch for your proposal of deferring
the call to prom_soc_init()? Thanks.

- Feng

2022-11-03 08:02:19

by John Thomson

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Thu, 3 Nov 2022, at 07:18, Feng Tang wrote:
> On Wed, Nov 02, 2022 at 04:16:06PM +0900, Hyeonggon Yoo wrote:
>> On Wed, Nov 02, 2022 at 02:08:09PM +0800, Feng Tang wrote:
> [...]
>> > > transfer started ......................................... transfer ok, time=2.11s
>> > > setting up elf image... OK
>> > > jumping to kernel code
>> > > zimage at: 80B842A0 810B4BC0
>> > >
>> > > Uncompressing Linux at load address 80001000
>> > >
>> > > Copy device tree to address 80B80EE0
>> > >
>> > > Now, booting the kernel...
>> > >
>> > > [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed Nov 2 05:10:01 AEST 2022
>> > > [ 0.000000] ------------[ cut here ]------------
>> > > [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416 kmem_cache_alloc+0x5a4/0x5e8
>> > > [ 0.000000] Modules linked in:
>> > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
>> > > [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000 00000000 80889d04 80c90000
>> > > [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000 00000001 80889cb0 00000000
>> > > [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002 00000002 00000001 6d6f4320
>> > > [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328 00000000 00000000 00000000
>> > > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
>> > > [ 0.000000] ...
>> > > [ 0.000000] Call Trace:
>> > > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
>> > > [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
>> > > [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
>> > > [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
>> > > [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
>> > > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
>> > > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
>> > > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
>> > > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
>> > > [ 0.000000]
>> > > [ 0.000000] ---[ end trace 0000000000000000 ]---
>> > > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>> > > [ 0.000000] printk: bootconsole [early0] enabled
>> > >
>> > > Thank you for working through this with me.
>> > > I will try to address the root cause in mt7621.c.
>> > > It looks like other arch/** soc_device_register users use postcore_initcall, device_initcall,
>> > > or the ARM DT_MACHINE_START .init_machine. A quick hack to use postcore_initcall in mt7621
>> > > avoided this zero ptr kmem_cache passed to kmem_cache_alloc_lru.
>> >
>> > If IIUC, the prom_soc_init() is only called once in kernel, can the
>> > 'soc_dev_attr' just be defined as a global data structure instead
>> > of calling kzalloc(), as its size is small only containing 7 pointers.
>>
>> But soc_device_registers() too uses kmalloc. I think calling it
>> after slab initialization will be best solution - if that is correct.
>
> Yes, you are right, there is other kmalloc() down the call chain.
>
> Hi John,
>
> Will you verify and submit a patch for your proposal of deferring
> calling prom_soc_init()? thanks
>
> - Feng

Hi Feng,

My proposed mt7621.c changes are RFC here:
https://lore.kernel.org/lkml/[email protected]/
That series lets me boot the v6.1-rc3 kernel. I have only tried it with my config (as sent earlier); if there are other suspect config settings that I should test, please let me know.
I used device_initcall, but postcore_initcall also works fine.
I rephrased Vlastimil's explanation and used it in the patch 3 description.
I have not referenced a Fixes tag yet (unsure which, if any, I should use).

Cheers,
--
John Thomson

2022-11-03 08:55:36

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Thu, Nov 03, 2022 at 07:45:49AM +0000, John Thomson wrote:
> On Thu, 3 Nov 2022, at 07:18, Feng Tang wrote:
> > On Wed, Nov 02, 2022 at 04:16:06PM +0900, Hyeonggon Yoo wrote:
> >> On Wed, Nov 02, 2022 at 02:08:09PM +0800, Feng Tang wrote:
> > [...]
> >> > > transfer started ......................................... transfer ok, time=2.11s
> >> > > setting up elf image... OK
> >> > > jumping to kernel code
> >> > > zimage at: 80B842A0 810B4BC0
> >> > >
> >> > > Uncompressing Linux at load address 80001000
> >> > >
> >> > > Copy device tree to address 80B80EE0
> >> > >
> >> > > Now, booting the kernel...
> >> > >
> >> > > [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed Nov 2 05:10:01 AEST 2022
> >> > > [ 0.000000] ------------[ cut here ]------------
> >> > > [ 0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416 kmem_cache_alloc+0x5a4/0x5e8
> >> > > [ 0.000000] Modules linked in:
> >> > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> >> > > [ 0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000 00000000 80889d04 80c90000
> >> > > [ 0.000000] 80920000 807bd328 8089d368 80923bd3 00000000 00000001 80889cb0 00000000
> >> > > [ 0.000000] 00000000 00000000 807bd328 8084bcb1 00000002 00000002 00000001 6d6f4320
> >> > > [ 0.000000] 00000000 80c97d3d 80c97d68 fffffffc 807bd328 00000000 00000000 00000000
> >> > > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
> >> > > [ 0.000000] ...
> >> > > [ 0.000000] Call Trace:
> >> > > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
> >> > > [ 0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> >> > > [ 0.000000] [<8002e184>] __warn+0xc4/0xf8
> >> > > [ 0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> >> > > [ 0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> >> > > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> >> > > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
> >> > > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> >> > > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> >> > > [ 0.000000]
> >> > > [ 0.000000] ---[ end trace 0000000000000000 ]---
> >> > > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> >> > > [ 0.000000] printk: bootconsole [early0] enabled
> >> > >
> >> > > Thank you for working through this with me.
> >> > > I will try to address the root cause in mt7621.c.
> >> > > It looks like other arch/** soc_device_register users use postcore_initcall, device_initcall,
> >> > > or the ARM DT_MACHINE_START .init_machine. A quick hack to use postcore_initcall in mt7621
> >> > > avoided this zero ptr kmem_cache passed to kmem_cache_alloc_lru.
> >> >
> >> > If IIUC, the prom_soc_init() is only called once in kernel, can the
> >> > 'soc_dev_attr' just be defined as a global data structure instead
> >> > of calling kzalloc(), as its size is small only containing 7 pointers.
> >>
> >> But soc_device_registers() too uses kmalloc. I think calling it
> >> after slab initialization will be best solution - if that is correct.
> >
> > Yes, you are right, there is other kmalloc() down the call chain.
> >
> > Hi John,
> >
> > Will you verify and submit a patch for your proposal of deferring
> > calling prom_soc_init()? thanks
> >
> > - Feng
>
> Hi Feng,
>
> My proposed mt7621.c changes are RFC here:
> https://lore.kernel.org/lkml/[email protected]/

Great!

> That series lets me boot the v6.1-rc3 kernel. I have only tried it with my config (as sent earlier). If there are other suspect config settings that I should test, please let me know?
> I used device_initcall, but postcore_initcall also works fine.

I'm not sure which initcall level is better, due to my lack of mips
platform knowledge.

> I rephrased Vlastimil's explanation and used it in patch 3 description.
> I have not referenced a Fixes tag yet (unsure which/if any I should use)

With older versions, the kernel boots fine without soc_dev_init() actually
being called, and I don't know whether they also need it to be called.

Thanks,
Feng

> Cheers,
> --
> John Thomson
>

2022-11-03 09:25:12

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On 11/3/22 06:54, Feng Tang wrote:
> On Wed, Nov 02, 2022 at 04:22:37PM +0800, Vlastimil Babka wrote:
>> On 11/1/22 11:33, John Thomson wrote:
> [...]
>> >
>> > [ 0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #62 SMP Tue Nov 1 19:49:52 AEST 2022
>> > [ 0.000000] slub: __kmem_cache_alloc_lru called with kmem_cache ptr: 0x0
>> > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #62
>> > [ 0.000000] Stack : 810fff78 80084d98 80889d00 00000004 00000000 00000000 80889d5c 80c90000
>> > [ 0.000000] 80920000 807bd380 8089d368 80923bd3 00000000 00000001 80889d08 00000000
>> > [ 0.000000] 00000000 00000000 807bd380 8084bd51 00000002 00000002 00000001 6d6f4320
>> > [ 0.000000] 00000000 80c97ce9 80c97d14 fffffffc 807bd380 00000000 00000003 00000dc0
>> > [ 0.000000] 00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
>> > [ 0.000000] ...
>> > [ 0.000000] Call Trace:
>> > [ 0.000000] [<80008260>] show_stack+0x28/0xf0
>> > [ 0.000000] [<8070cdc0>] dump_stack_lvl+0x60/0x80
>> > [ 0.000000] [<801c1428>] kmem_cache_alloc+0x5c0/0x740
>> > [ 0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
>> > [ 0.000000] [<80928060>] prom_init+0x44/0xf0
>> > [ 0.000000] [<80929214>] setup_arch+0x4c/0x6a8
>> > [ 0.000000] [<809257e0>] start_kernel+0x88/0x7c0
>> > [ 0.000000]
>> > [ 0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
>>
>> The stack means CONFIG_TRACING=n, is that right?
>
> Yes, from the kconfig, CONFIG_TRACING is not set.
>
>> That would mean
>> prom_soc_init()
>> soc_dev_init()
>> kzalloc() -> kmalloc()
>> kmalloc_trace() // after #else /* CONFIG_TRACING */
>> kmem_cache_alloc(s, flags);
>>
>> Looks like this path is a small bug in the wasting detection patch, as we
>> throw away size there.
>
> Yes, from the code reading and log from John, it is.
>
> One strange thing is, I reset the code to v6.0, and found that
> __kmem_cache_alloc_lru() also access the 's->object_size'
>
> void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags)
> {
> void *ret = slab_alloc(s, lru, gfpflags, _RET_IP_, s->object_size);
> ...
> }
>
> And from John's dump_stack() info, this call is also where the NULL pointer
> happens, which I still can't figue out.
>
>> AFAICS before this patch, we "survive" "kmem_cache *s" being NULL as
>> slab_pre_alloc_hook() will happen to return NULL and we bail out from
>> slab_alloc_node(). But this is a side-effect, not an intended protection.
>> Also the CONFIG_TRACING variant of kmalloc_trace() would have called
>> trace_kmalloc dereferencing s->size anyway even before this patch.
>>
>> I don't think we should add WARNS in the slab hot paths just to prevent this
>> rare error of using slab too early. At most VM_WARN... would be acceptable
>> but still not necessary as crashing immediately from a NULL pointer is
>> sufficient.
>>
>> So IMHO mips should fix their soc init,
>
> Yes, for the mips fix, John has proposed to defer the calling of prom_soc_init(),
> which looks reasonable.
>
>> and we should look into the
>> CONFIG_TRACING=n variant of kmalloc_trace(), to pass orig_size properly.
>
> You mean check if the pointer is NULL and bail out early.

No I mean here:

#else /* CONFIG_TRACING */
/* Save a function call when CONFIG_TRACING=n */
static __always_inline __alloc_size(3)
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
{
void *ret = kmem_cache_alloc(s, flags);

ret = kasan_kmalloc(s, ret, size, flags);
return ret;
}

we call kmem_cache_alloc() and discard the size parameter, so it will assume
s->object_size (and, as a side effect, crash if s is NULL). We shouldn't
add "s is NULL?" checks, but fix passing the size - probably switch to
__kmem_cache_alloc_node()? - and do the same in the following kmalloc_node_trace().
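
Condensed into a sketch (simplified from the snippets quoted above;
tracepoints and the lru indirection are omitted), the problem path
looks like:

/* Condensed sketch of the CONFIG_TRACING=n path, not verbatim kernel
 * code: 'size' is dropped on the floor here ...
 */
static __always_inline void *kmalloc_trace(struct kmem_cache *s,
					   gfp_t flags, size_t size)
{
	void *ret = kmem_cache_alloc(s, flags);	/* 'size' unused */

	ret = kasan_kmalloc(s, ret, size, flags);
	return ret;
}

void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
	/* ... so the allocator substitutes s->object_size for the
	 * requested size: the kmalloc wastage info is lost, and s is
	 * dereferenced even when a too-early caller passed s == NULL.
	 */
	return slab_alloc(s, NULL, gfpflags, _RET_IP_, s->object_size);
}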


2022-11-03 14:55:04

by Hyeonggon Yoo

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Thu, Nov 03, 2022 at 10:16:12PM +0800, Feng Tang wrote:
> On Thu, Nov 03, 2022 at 09:33:28AM +0100, Vlastimil Babka wrote:
> [...]
> > >> AFAICS before this patch, we "survive" "kmem_cache *s" being NULL as
> > >> slab_pre_alloc_hook() will happen to return NULL and we bail out from
> > >> slab_alloc_node(). But this is a side-effect, not an intended protection.
> > >> Also the CONFIG_TRACING variant of kmalloc_trace() would have called
> > >> trace_kmalloc dereferencing s->size anyway even before this patch.
> > >>
> > >> I don't think we should add WARNS in the slab hot paths just to prevent this
> > >> rare error of using slab too early. At most VM_WARN... would be acceptable
> > >> but still not necessary as crashing immediately from a NULL pointer is
> > >> sufficient.
> > >>
> > >> So IMHO mips should fix their soc init,
> > >
> > > Yes, for the mips fix, John has proposed to defer the calling of prom_soc_init(),
> > > which looks reasonable.
> > >
> > >> and we should look into the
> > >> CONFIG_TRACING=n variant of kmalloc_trace(), to pass orig_size properly.
> > >
> > > You mean check if the pointer is NULL and bail out early.
> >
> > No I mean here:
> >
> > #else /* CONFIG_TRACING */
> > /* Save a function call when CONFIG_TRACING=n */
> > static __always_inline __alloc_size(3)
> > void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
> > {
> > void *ret = kmem_cache_alloc(s, flags);
> >
> > ret = kasan_kmalloc(s, ret, size, flags);
> > return ret;
> > }
> >
> > we call kmem_cache_alloc() and discard the size parameter, so it will assume
> > s->object_size (and as the side-effect, crash if s is NULL). We shouldn't
> > add "s is NULL?" checks, but fix passing the size - probably switch to
> > __kmem_cache_alloc_node()? and in the following kmalloc_node_trace() analogically.
>
> Got it, thanks! I might have missed it during some rebasing for the
> kmalloc wastage debug patch.

That was a good catch, and I missed it too!
But FYI I'm suggesting to drop the CONFIG_TRACING=n variant:

https://lore.kernel.org/linux-mm/[email protected]/T/#m20ecf14390e406247bde0ea9cce368f469c539ed

Any thoughts?
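
Roughly, dropping the variant would leave one always out-of-line
definition in mm/slub.c, sketched below under the assumption that it
can reuse slab_alloc_node() directly (tracepoint details omitted; this
is a sketch, not the actual patch from the link above):

/* Sketch of a single, always out-of-line kmalloc_trace(): 'size' is
 * passed through, so the wastage accounting and the extended redzone
 * checks always see the real request size, even with CONFIG_TRACING=n,
 * at the cost of one extra function call.
 */
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
{
	void *ret = slab_alloc_node(s, NULL, flags, NUMA_NO_NODE,
				    _RET_IP_, size);

	ret = kasan_kmalloc(s, ret, size, flags);
	return ret;
}
EXPORT_SYMBOL(kmalloc_trace);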

>
> How about the following fix?
>
> Thanks,
> Feng
>
> ---
> From 9f9fa9da8946fd44625f873c0f51167357075be1 Mon Sep 17 00:00:00 2001
> From: Feng Tang <[email protected]>
> Date: Thu, 3 Nov 2022 21:32:10 +0800
> Subject: [PATCH] mm/slub: Add missing orig_size parameter for wastage debug
>
> commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of
> kmalloc") was introduced for debugging kmalloc memory wastage,
> and it missed to pass the original request size for kmalloc_trace()
> and kmalloc_node_trace() in CONFIG_TRACING=n path.
>
> Fix it by using __kmem_cache_alloc_node() with correct original
> request size.
>
> Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
> Suggested-by: Vlastimil Babka <[email protected]>
> Signed-off-by: Feng Tang <[email protected]>
> ---
> include/linux/slab.h | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 90877fcde70b..9691afa569e1 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -469,6 +469,9 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignm
> __alloc_size(1);
> void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
> __malloc;
> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node,
> + size_t orig_size, unsigned long caller) __assume_slab_alignment
> + __malloc;
>
> #ifdef CONFIG_TRACING
> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
> @@ -482,7 +485,8 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
> static __always_inline __alloc_size(3)
> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
> {
> - void *ret = kmem_cache_alloc(s, flags);
> + void *ret = __kmem_cache_alloc_node(s, flags, NUMA_NO_NODE,
> + size, _RET_IP_);
>
> ret = kasan_kmalloc(s, ret, size, flags);
> return ret;
> @@ -492,7 +496,8 @@ static __always_inline __alloc_size(4)
> void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
> int node, size_t size)
> {
> - void *ret = kmem_cache_alloc_node(s, gfpflags, node);
> + void *ret = __kmem_cache_alloc_node(s, gfpflags, node,
> + size, _RET_IP_);
>
> ret = kasan_kmalloc(s, ret, size, gfpflags);
> return ret;
> --
> 2.34.1
>
>
>

--
Thanks,
Hyeonggon

2022-11-03 15:32:30

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Thu, Nov 03, 2022 at 09:33:28AM +0100, Vlastimil Babka wrote:
[...]
> >> AFAICS before this patch, we "survive" "kmem_cache *s" being NULL as
> >> slab_pre_alloc_hook() will happen to return NULL and we bail out from
> >> slab_alloc_node(). But this is a side-effect, not an intended protection.
> >> Also the CONFIG_TRACING variant of kmalloc_trace() would have called
> >> trace_kmalloc dereferencing s->size anyway even before this patch.
> >>
> >> I don't think we should add WARNS in the slab hot paths just to prevent this
> >> rare error of using slab too early. At most VM_WARN... would be acceptable
> >> but still not necessary as crashing immediately from a NULL pointer is
> >> sufficient.
> >>
> >> So IMHO mips should fix their soc init,
> >
> > Yes, for the mips fix, John has proposed to defer the calling of prom_soc_init(),
> > which looks reasonable.
> >
> >> and we should look into the
> >> CONFIG_TRACING=n variant of kmalloc_trace(), to pass orig_size properly.
> >
> > You mean check if the pointer is NULL and bail out early.
>
> No I mean here:
>
> #else /* CONFIG_TRACING */
> /* Save a function call when CONFIG_TRACING=n */
> static __always_inline __alloc_size(3)
> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
> {
> void *ret = kmem_cache_alloc(s, flags);
>
> ret = kasan_kmalloc(s, ret, size, flags);
> return ret;
> }
>
> we call kmem_cache_alloc() and discard the size parameter, so it will assume
> s->object_size (and as the side-effect, crash if s is NULL). We shouldn't
> add "s is NULL?" checks, but fix passing the size - probably switch to
> __kmem_cache_alloc_node()? and in the following kmalloc_node_trace() analogically.

Got it, thanks! I might have missed it during one of the rebases of the
kmalloc wastage debug patch.

How about the following fix?

Thanks,
Feng

---
From 9f9fa9da8946fd44625f873c0f51167357075be1 Mon Sep 17 00:00:00 2001
From: Feng Tang <[email protected]>
Date: Thu, 3 Nov 2022 21:32:10 +0800
Subject: [PATCH] mm/slub: Add missing orig_size parameter for wastage debug

commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of
kmalloc") was introduced for debugging kmalloc memory wastage,
and it missed to pass the original request size for kmalloc_trace()
and kmalloc_node_trace() in CONFIG_TRACING=n path.

Fix it by using __kmem_cache_alloc_node() with correct original
request size.

Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
Suggested-by: Vlastimil Babka <[email protected]>
Signed-off-by: Feng Tang <[email protected]>
---
include/linux/slab.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 90877fcde70b..9691afa569e1 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -469,6 +469,9 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignm
__alloc_size(1);
void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
__malloc;
+void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node,
+ size_t orig_size, unsigned long caller) __assume_slab_alignment
+ __malloc;

#ifdef CONFIG_TRACING
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
@@ -482,7 +485,8 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
static __always_inline __alloc_size(3)
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
{
- void *ret = kmem_cache_alloc(s, flags);
+ void *ret = __kmem_cache_alloc_node(s, flags, NUMA_NO_NODE,
+ size, _RET_IP_);

ret = kasan_kmalloc(s, ret, size, flags);
return ret;
@@ -492,7 +496,8 @@ static __always_inline __alloc_size(4)
void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
int node, size_t size)
{
- void *ret = kmem_cache_alloc_node(s, gfpflags, node);
+ void *ret = __kmem_cache_alloc_node(s, gfpflags, node,
+ size, _RET_IP_);

ret = kasan_kmalloc(s, ret, size, gfpflags);
return ret;
--
2.34.1




2022-11-03 17:45:28

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On 11/3/22 15:36, Hyeonggon Yoo wrote:
> On Thu, Nov 03, 2022 at 10:16:12PM +0800, Feng Tang wrote:
>> On Thu, Nov 03, 2022 at 09:33:28AM +0100, Vlastimil Babka wrote:
>> [...]
>> > >> AFAICS before this patch, we "survive" "kmem_cache *s" being NULL as
>> > >> slab_pre_alloc_hook() will happen to return NULL and we bail out from
>> > >> slab_alloc_node(). But this is a side-effect, not an intended protection.
>> > >> Also the CONFIG_TRACING variant of kmalloc_trace() would have called
>> > >> trace_kmalloc dereferencing s->size anyway even before this patch.
>> > >>
>> > >> I don't think we should add WARNS in the slab hot paths just to prevent this
>> > >> rare error of using slab too early. At most VM_WARN... would be acceptable
>> > >> but still not necessary as crashing immediately from a NULL pointer is
>> > >> sufficient.
>> > >>
>> > >> So IMHO mips should fix their soc init,
>> > >
>> > > Yes, for the mips fix, John has proposed to defer the calling of prom_soc_init(),
>> > > which looks reasonable.
>> > >
>> > >> and we should look into the
>> > >> CONFIG_TRACING=n variant of kmalloc_trace(), to pass orig_size properly.
>> > >
>> > > You mean check if the pointer is NULL and bail out early.
>> >
>> > No I mean here:
>> >
>> > #else /* CONFIG_TRACING */
>> > /* Save a function call when CONFIG_TRACING=n */
>> > static __always_inline __alloc_size(3)
>> > void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
>> > {
>> > void *ret = kmem_cache_alloc(s, flags);
>> >
>> > ret = kasan_kmalloc(s, ret, size, flags);
>> > return ret;
>> > }
>> >
>> > we call kmem_cache_alloc() and discard the size parameter, so it will assume
>> > s->object_size (and as the side-effect, crash if s is NULL). We shouldn't
>> > add "s is NULL?" checks, but fix passing the size - probably switch to
>> > __kmem_cache_alloc_node()? and in the following kmalloc_node_trace() analogically.
>>
>> Got it, thanks! I might have missed it during some rebasing for the
>> kmalloc wastage debug patch.
>
> That was good catch and I missed too!
> But FYI I'm suggesting to drop CONFIG_TRACING=n variant:
>
> https://lore.kernel.org/linux-mm/[email protected]/T/#m20ecf14390e406247bde0ea9cce368f469c539ed
>
> Any thoughts?

I'll get to it. Also, I think we were pondering that within your series too,
but I wanted to postpone it in case somebody objects to the extra function
call it creates.
But that would be for 6.2 anyway, while I'll collect the fix here for 6.1.

>>
>> How about the following fix?
>>
>> Thanks,
>> Feng
>>
>> ---
>> From 9f9fa9da8946fd44625f873c0f51167357075be1 Mon Sep 17 00:00:00 2001
>> From: Feng Tang <[email protected]>
>> Date: Thu, 3 Nov 2022 21:32:10 +0800
>> Subject: [PATCH] mm/slub: Add missing orig_size parameter for wastage debug
>>
>> commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of
>> kmalloc") was introduced for debugging kmalloc memory wastage,
>> and it missed to pass the original request size for kmalloc_trace()
>> and kmalloc_node_trace() in CONFIG_TRACING=n path.
>>
>> Fix it by using __kmem_cache_alloc_node() with correct original
>> request size.
>>
>> Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
>> Suggested-by: Vlastimil Babka <[email protected]>
>> Signed-off-by: Feng Tang <[email protected]>
>> ---
>> include/linux/slab.h | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index 90877fcde70b..9691afa569e1 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -469,6 +469,9 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignm
>> __alloc_size(1);
>> void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
>> __malloc;
>> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node,
>> + size_t orig_size, unsigned long caller) __assume_slab_alignment
>> + __malloc;
>>
>> #ifdef CONFIG_TRACING
>> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
>> @@ -482,7 +485,8 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
>> static __always_inline __alloc_size(3)
>> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
>> {
>> - void *ret = kmem_cache_alloc(s, flags);
>> + void *ret = __kmem_cache_alloc_node(s, flags, NUMA_NO_NODE,
>> + size, _RET_IP_);
>>
>> ret = kasan_kmalloc(s, ret, size, flags);
>> return ret;
>> @@ -492,7 +496,8 @@ static __always_inline __alloc_size(4)
>> void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
>> int node, size_t size)
>> {
>> - void *ret = kmem_cache_alloc_node(s, gfpflags, node);
>> + void *ret = __kmem_cache_alloc_node(s, gfpflags, node,
>> + size, _RET_IP_);
>>
>> ret = kasan_kmalloc(s, ret, size, gfpflags);
>> return ret;
>> --
>> 2.34.1
>>
>>
>>
>


2022-11-03 17:54:01

by Vlastimil Babka

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On 11/3/22 17:57, Vlastimil Babka wrote:
> On 11/3/22 15:36, Hyeonggon Yoo wrote:
>> On Thu, Nov 03, 2022 at 10:16:12PM +0800, Feng Tang wrote:
>>> On Thu, Nov 03, 2022 at 09:33:28AM +0100, Vlastimil Babka wrote:
>>> [...]
>>> > >> AFAICS before this patch, we "survive" "kmem_cache *s" being NULL as
>>> > >> slab_pre_alloc_hook() will happen to return NULL and we bail out from
>>> > >> slab_alloc_node(). But this is a side-effect, not an intended protection.
>>> > >> Also the CONFIG_TRACING variant of kmalloc_trace() would have called
>>> > >> trace_kmalloc dereferencing s->size anyway even before this patch.
>>> > >>
>>> > >> I don't think we should add WARNS in the slab hot paths just to prevent this
>>> > >> rare error of using slab too early. At most VM_WARN... would be acceptable
>>> > >> but still not necessary as crashing immediately from a NULL pointer is
>>> > >> sufficient.
>>> > >>
>>> > >> So IMHO mips should fix their soc init,
>>> > >
>>> > > Yes, for the mips fix, John has proposed to defer the calling of prom_soc_init(),
>>> > > which looks reasonable.
>>> > >
>>> > >> and we should look into the
>>> > >> CONFIG_TRACING=n variant of kmalloc_trace(), to pass orig_size properly.
>>> > >
>>> > > You mean check if the pointer is NULL and bail out early.
>>> >
>>> > No I mean here:
>>> >
>>> > #else /* CONFIG_TRACING */
>>> > /* Save a function call when CONFIG_TRACING=n */
>>> > static __always_inline __alloc_size(3)
>>> > void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
>>> > {
>>> > void *ret = kmem_cache_alloc(s, flags);
>>> >
>>> > ret = kasan_kmalloc(s, ret, size, flags);
>>> > return ret;
>>> > }
>>> >
>>> > we call kmem_cache_alloc() and discard the size parameter, so it will assume
>>> > s->object_size (and as the side-effect, crash if s is NULL). We shouldn't
>>> > add "s is NULL?" checks, but fix passing the size - probably switch to
>>> > __kmem_cache_alloc_node()? and in the following kmalloc_node_trace() analogically.
>>>
>>> Got it, thanks! I might have missed it during some rebasing for the
>>> kmalloc wastage debug patch.
>>
>> That was good catch and I missed too!
>> But FYI I'm suggesting to drop CONFIG_TRACING=n variant:
>>
>> https://lore.kernel.org/linux-mm/[email protected]/T/#m20ecf14390e406247bde0ea9cce368f469c539ed
>>
>> Any thoughts?
>
> I'll get to it, also I think we were pondering that within your series too,
> but I wanted to postpone in case somebody objects to the extra function call
> it creates.
> But that would be for 6.2 anyway while I'll collect the fix here for 6.1.

On second thought, the fix makes the inlined kmalloc_trace() expand to a
call with 5 parameters where it previously had 2, which seems to me like a
worse trade-off (code bloat) than the function call. Given the other reasons
to ditch the CONFIG_TRACING=n variant, I'm inclined to just do it right now.
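
To make the trade-off concrete, a simplified side-by-side of the two
shapes (kmalloc_trace_inline is a made-up name for illustration, and
the kasan call is omitted):

/* (a) inline + the fix above: five arguments must be materialized at
 * every kmalloc() call site
 */
static __always_inline void *kmalloc_trace_inline(struct kmem_cache *s,
						  gfp_t flags, size_t size)
{
	return __kmem_cache_alloc_node(s, flags, NUMA_NO_NODE,
				       size, _RET_IP_);
}

/* (b) always out-of-line: each call site passes just the original
 * three arguments; the rest is set up once inside mm/slub.c
 */
void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size);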

>>>
>>> How about the following fix?
>>>
>>> Thanks,
>>> Feng
>>>
>>> ---
>>> From 9f9fa9da8946fd44625f873c0f51167357075be1 Mon Sep 17 00:00:00 2001
>>> From: Feng Tang <[email protected]>
>>> Date: Thu, 3 Nov 2022 21:32:10 +0800
>>> Subject: [PATCH] mm/slub: Add missing orig_size parameter for wastage debug
>>>
>>> commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of
>>> kmalloc") was introduced for debugging kmalloc memory wastage,
>>> and it missed to pass the original request size for kmalloc_trace()
>>> and kmalloc_node_trace() in CONFIG_TRACING=n path.
>>>
>>> Fix it by using __kmem_cache_alloc_node() with correct original
>>> request size.
>>>
>>> Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
>>> Suggested-by: Vlastimil Babka <[email protected]>
>>> Signed-off-by: Feng Tang <[email protected]>
>>> ---
>>> include/linux/slab.h | 9 +++++++--
>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>> index 90877fcde70b..9691afa569e1 100644
>>> --- a/include/linux/slab.h
>>> +++ b/include/linux/slab.h
>>> @@ -469,6 +469,9 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignm
>>> __alloc_size(1);
>>> void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
>>> __malloc;
>>> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node,
>>> + size_t orig_size, unsigned long caller) __assume_slab_alignment
>>> + __malloc;
>>>
>>> #ifdef CONFIG_TRACING
>>> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
>>> @@ -482,7 +485,8 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
>>> static __always_inline __alloc_size(3)
>>> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
>>> {
>>> - void *ret = kmem_cache_alloc(s, flags);
>>> + void *ret = __kmem_cache_alloc_node(s, flags, NUMA_NO_NODE,
>>> + size, _RET_IP_);
>>>
>>> ret = kasan_kmalloc(s, ret, size, flags);
>>> return ret;
>>> @@ -492,7 +496,8 @@ static __always_inline __alloc_size(4)
>>> void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
>>> int node, size_t size)
>>> {
>>> - void *ret = kmem_cache_alloc_node(s, gfpflags, node);
>>> + void *ret = __kmem_cache_alloc_node(s, gfpflags, node,
>>> + size, _RET_IP_);
>>>
>>> ret = kasan_kmalloc(s, ret, size, gfpflags);
>>> return ret;
>>> --
>>> 2.34.1
>>>
>>>
>>>
>>
>


2022-11-04 03:59:04

by Feng Tang

[permalink] [raw]
Subject: Re: [PATCH v6 1/4] mm/slub: enable debugging memory wasting of kmalloc

On Thu, Nov 03, 2022 at 06:35:53PM +0100, Vlastimil Babka wrote:

[...]

> >> But FYI I'm suggesting to drop CONFIG_TRACING=n variant:
> >>
> >> https://lore.kernel.org/linux-mm/[email protected]/T/#m20ecf14390e406247bde0ea9cce368f469c539ed
> >>
> >> Any thoughts?
> >
> > I'll get to it, also I think we were pondering that within your series too,
> > but I wanted to postpone in case somebody objects to the extra function call
> > it creates.
> > But that would be for 6.2 anyway while I'll collect the fix here for 6.1.
>
> On second thought, the fix is making the inlined kmalloc_trace() expand to a
> call that had 2 parameters and now it has 5, which seems to me like a worse
> thing (code bloat) than the function call. With the other reasons to ditch
> the CONFIG_TRACING=n variant I'm inclined to just do it right now.

That's great! It will save a lot of trouble and reduce code complexity.

Btw, the patch below also has a compile issue with some kconfigs
(thanks to 0Day's kbuild bot for the report).

Thanks,
Feng

> >>>
> >>> How about the following fix?
> >>>
> >>> Thanks,
> >>> Feng
> >>>
> >>> ---
> >>> From 9f9fa9da8946fd44625f873c0f51167357075be1 Mon Sep 17 00:00:00 2001
> >>> From: Feng Tang <[email protected]>
> >>> Date: Thu, 3 Nov 2022 21:32:10 +0800
> >>> Subject: [PATCH] mm/slub: Add missing orig_size parameter for wastage debug
> >>>
> >>> commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of
> >>> kmalloc") was introduced for debugging kmalloc memory wastage,
> >>> and it missed to pass the original request size for kmalloc_trace()
> >>> and kmalloc_node_trace() in CONFIG_TRACING=n path.
> >>>
> >>> Fix it by using __kmem_cache_alloc_node() with correct original
> >>> request size.
> >>>
> >>> Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
> >>> Suggested-by: Vlastimil Babka <[email protected]>
> >>> Signed-off-by: Feng Tang <[email protected]>
> >>> ---
> >>> include/linux/slab.h | 9 +++++++--
> >>> 1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/slab.h b/include/linux/slab.h
> >>> index 90877fcde70b..9691afa569e1 100644
> >>> --- a/include/linux/slab.h
> >>> +++ b/include/linux/slab.h
> >>> @@ -469,6 +469,9 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignm
> >>> __alloc_size(1);
> >>> void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment
> >>> __malloc;
> >>> +void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node,
> >>> + size_t orig_size, unsigned long caller) __assume_slab_alignment
> >>> + __malloc;
> >>>
> >>> #ifdef CONFIG_TRACING
> >>> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
> >>> @@ -482,7 +485,8 @@ void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
> >>> static __always_inline __alloc_size(3)
> >>> void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size)
> >>> {
> >>> - void *ret = kmem_cache_alloc(s, flags);
> >>> + void *ret = __kmem_cache_alloc_node(s, flags, NUMA_NO_NODE,
> >>> + size, _RET_IP_);
> >>>
> >>> ret = kasan_kmalloc(s, ret, size, flags);
> >>> return ret;
> >>> @@ -492,7 +496,8 @@ static __always_inline __alloc_size(4)
> >>> void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags,
> >>> int node, size_t size)
> >>> {
> >>> - void *ret = kmem_cache_alloc_node(s, gfpflags, node);
> >>> + void *ret = __kmem_cache_alloc_node(s, gfpflags, node,
> >>> + size, _RET_IP_);
> >>>
> >>> ret = kasan_kmalloc(s, ret, size, gfpflags);
> >>> return ret;
> >>> --
> >>> 2.34.1
> >>>
> >>>
> >>>
> >>
> >
>
>