2014-10-02 05:35:36

by Joonsoo Kim

[permalink] [raw]
Subject: Re: [mm/slab] BUG: unable to handle kernel paging request at 00010023

On Tue, Sep 30, 2014 at 03:56:24PM +0800, Fengguang Wu wrote:
> Hi Joonsoo,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
>
> commit 36fbfebe776eb5871d61e7a755c9feb1c96cc4aa
> Author: Joonsoo Kim <[email protected]>
> AuthorDate: Tue Sep 23 11:52:35 2014 +1000
> Commit: Stephen Rothwell <[email protected]>
> CommitDate: Tue Sep 23 11:52:35 2014 +1000
>
> mm/slab: support slab merge
>
> Slab merge is good feature to reduce fragmentation. If new creating slab
> have similar size and property with exsitent slab, this feature reuse it
> rather than creating new one. As a result, objects are packed into fewer
> slabs so that fragmentation is reduced.
>
> Below is result of my testing.
>
> * After boot, sleep 20; cat /proc/meminfo | grep Slab
>
> <Before>
> Slab: 25136 kB
>
> <After>
> Slab: 24364 kB
>
> We can save 3% memory used by slab.
>
> For supporting this feature in SLAB, we need to implement SLAB specific
> kmem_cache_flag() and __kmem_cache_alias(), because SLUB implements some
> SLUB specific processing related to debug flag and object size change on
> these functions.
>
> Signed-off-by: Joonsoo Kim <[email protected]>
> Cc: Christoph Lameter <[email protected]>
> Cc: Pekka Enberg <[email protected]>
> Cc: David Rientjes <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
>
> +----------------------------------------------+------------+------------+---------------+
> | | 8c087489b8 | 36fbfebe77 | next-20140923 |
> +----------------------------------------------+------------+------------+---------------+
> | boot_successes | 60 | 0 | 1 |
> | boot_failures | 0 | 20 | 314 |
> | BUG:unable_to_handle_kernel | 0 | 20 | 312 |
> | Oops | 0 | 20 | 312 |
> | EIP_is_at_kernfs_link_sibling | 0 | 4 | 14 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 20 | 312 |
> | backtrace:acpi_bus_scan | 0 | 4 | 14 |
> | backtrace:acpi_scan_init | 0 | 20 | 45 |
> | backtrace:acpi_init | 0 | 20 | 45 |
> | backtrace:kernel_init_freeable | 0 | 20 | 312 |
> | EIP_is_at_kernfs_add_one | 0 | 16 | 298 |
> | backtrace:kobject_add_internal | 0 | 16 | 31 |
> | backtrace:kobject_init_and_add | 0 | 16 | 31 |
> | backtrace:acpi_scan_add_handler_with_hotplug | 0 | 16 | 31 |
> | backtrace:acpi_pci_root_init | 0 | 16 | 31 |
> | backtrace:tty_register_driver | 0 | 0 | 106 |
> | backtrace:pty_init | 0 | 0 | 106 |
> | backtrace:acpi_bus_register_driver | 0 | 0 | 1 |
> | backtrace:acpi_button_driver_init | 0 | 0 | 1 |
> | BUG:kernel_boot_crashed | 0 | 0 | 1 |
> | BUG:kernel_test_crashed | 0 | 0 | 1 |
> | backtrace:subsys_system_register | 0 | 0 | 160 |
> | backtrace:container_dev_init | 0 | 0 | 160 |
> | backtrace:driver_init | 0 | 0 | 160 |
> +----------------------------------------------+------------+------------+---------------+
>
> [ 0.463788] ACPI: (supports S0 S5)
> [ 0.464003] ACPI: Using IOAPIC for interrupt routing
> [ 0.464738] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
> [ 0.466034] BUG: unable to handle kernel paging request at 00010023
> [ 0.466989] IP: [<c117dcf9>] kernfs_add_one+0x89/0x130
> [ 0.467812] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
> [ 0.468000] Oops: 0002 [#1] SMP
> [ 0.468000] Modules linked in:
> [ 0.468000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc6-00089-g36fbfeb #1
> [ 0.468000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 0.468000] task: d303ec90 ti: d3040000 task.ti: d3040000
> [ 0.468000] EIP: 0060:[<c117dcf9>] EFLAGS: 00010286 CPU: 0
> [ 0.468000] EIP is at kernfs_add_one+0x89/0x130
> [ 0.468000] EAX: 542572cb EBX: 00010003 ECX: 00000008 EDX: 2c8de598
> [ 0.468000] ESI: d311de10 EDI: d311de70 EBP: d3041dd8 ESP: d3041db4
> [ 0.468000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 0.468000] CR0: 8005003b CR2: 00010023 CR3: 01a8a000 CR4: 000006f0
> [ 0.468000] Stack:
> [ 0.468000] d3006f00 00000202 d311de70 d311de10 d3041dd8 c117dba0 d311de10 c159a5c0
> [ 0.468000] c1862a00 d3041df0 c117f0f2 00000000 c18629f4 d311de70 00000000 d3041e2c
> [ 0.468000] c117f8b5 00001000 00000000 c159a5c0 c18629f4 00000000 00000001 c1862a00
> [ 0.468000] Call Trace:
> [ 0.468000] [<c117dba0>] ? kernfs_new_node+0x30/0x40
> [ 0.468000] [<c117f0f2>] __kernfs_create_file+0x92/0xc0
> [ 0.468000] [<c117f8b5>] sysfs_add_file_mode_ns+0x95/0x190
> [ 0.468000] [<c117f9d7>] sysfs_create_file_ns+0x27/0x40
> [ 0.468000] [<c1252ef6>] kobject_add_internal+0x136/0x2c0
> [ 0.468000] [<c125e360>] ? kvasprintf+0x40/0x50
> [ 0.468000] [<c1252a92>] ? kobject_set_name_vargs+0x42/0x60
> [ 0.468000] [<c12530b5>] kobject_init_and_add+0x35/0x50
> [ 0.468000] [<c12ad04f>] acpi_sysfs_add_hotplug_profile+0x24/0x4a
> [ 0.468000] [<c12a7280>] acpi_scan_add_handler_with_hotplug+0x21/0x28
> [ 0.468000] [<c18df524>] acpi_pci_root_init+0x20/0x22
> [ 0.468000] [<c18df0e1>] acpi_scan_init+0x24/0x16d
> [ 0.468000] [<c18def73>] acpi_init+0x20c/0x224
> [ 0.468000] [<c18ded67>] ? acpi_sleep_init+0xab/0xab
> [ 0.468000] [<c100041e>] do_one_initcall+0x7e/0x1b0
> [ 0.468000] [<c18ded67>] ? acpi_sleep_init+0xab/0xab
> [ 0.468000] [<c18b24ba>] ? repair_env_string+0x12/0x54
> [ 0.468000] [<c18b24a8>] ? initcall_blacklist+0x7c/0x7c
> [ 0.468000] [<c105e100>] ? parse_args+0x160/0x3f0
> [ 0.468000] [<c18b2bd1>] kernel_init_freeable+0xfc/0x179
> [ 0.468000] [<c156782b>] kernel_init+0xb/0xd0
> [ 0.468000] [<c1574601>] ret_from_kernel_thread+0x21/0x30
> [ 0.468000] [<c1567820>] ? rest_init+0xb0/0xb0
> [ 0.468000] Code: 26 00 83 e1 10 75 5b 8b 46 24 e8 b3 ea ff ff 89 46 38 89 f0 e8 d9 f9 ff ff 85 c0 89 c3 75 ca 8b 5f 5c 85 db 74 11 e8 97 90 f1 ff <89> 43 20 89 53 24 89 43 28 89 53 2c b8 c0 2f 85 c1 e8 d1 36 3f
> [ 0.468000] EIP: [<c117dcf9>] kernfs_add_one+0x89/0x130 SS:ESP 0068:d3041db4
> [ 0.468000] CR2: 0000000000010023
> [ 0.468000] ---[ end trace 4fa173691404b63f ]---
> [ 0.468000] Kernel panic - not syncing: Fatal exception
>
> git bisect start 55f21306900abf9f9d2a087a127ff49c6d388ad2 0f33be009b89d2268e94194dc4fd01a7851b6d51 --
> git bisect good 18c13e2d9b75e2760e6520f2fde00401192956f3 # 17:56 20+ 0 Merge remote-tracking branch 'bluetooth/master'
> git bisect good abf79495f38ba66f750566b3f0a8da8dd94b4dc3 # 18:03 20+ 0 Merge remote-tracking branch 'ftrace/for-next'
> git bisect good 0bed22034e26a3c37ee4407fccffa8c095d5e144 # 18:09 20+ 0 Merge remote-tracking branch 'pinctrl/for-next'
> git bisect good 15c9281a15ed7718868d115d4d00619b0b7a2624 # 18:14 20+ 0 Merge remote-tracking branch 'clk/clk-next'
> git bisect good 50939531dea1b913b7fa29f9bbc69feafefd090c # 18:23 20+ 0 Merge branch 'rd-docs/master'
> git bisect good aa881e3c5e87c8aa23519f40554897d56f32b935 # 18:49 20+ 0 Merge remote-tracking branch 'powerpc-mpe/next'
> git bisect bad 81b63d14db32bd7706c955d1e04e65b152b2277a # 18:57 0- 2 Merge branch 'akpm-current/current'
> git bisect bad f313ca82d72066a3c44fd6c66cee57b25de43aa9 # 19:31 0- 1 introduce-dump_vma-fix
> git bisect good 2c5fe9213048c5640b8e46407f5614038c03ad93 # 20:16 20+ 0 mm: fix kmemcheck.c build errors
> git bisect bad 69454f8be7f621ac8c3c6c9763bb70e116988942 # 20:35 0- 18 block_dev: implement readpages() to optimize sequential read
> git bisect bad 66a31d528a1e3d483be2b1c993ec1268412f0074 # 20:40 0- 1 memory-hotplug-add-sysfs-zones_online_to-attribute-fix-2
> git bisect good 5e8acb68610c077b08cb3f16305aa3cc22e5d2a8 # 21:07 20+ 0 kernel/kthread.c: partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations")
> git bisect bad 36fbfebe776eb5871d61e7a755c9feb1c96cc4aa # 22:06 0- 10 mm/slab: support slab merge
> git bisect good 11e57381eced875ef5a6fea4005fdf72b6f68eff # 22:19 20+ 0 mm/slab_common: commonize slab merge logic
> git bisect good 8c087489b8a32b9235f7f9417390c62d93aba522 # 22:24 20+ 0 mm/slab_common: fix build failure if CONFIG_SLUB
> # first bad commit: [36fbfebe776eb5871d61e7a755c9feb1c96cc4aa] mm/slab: support slab merge
> git bisect good 8c087489b8a32b9235f7f9417390c62d93aba522 # 22:26 60+ 0 mm/slab_common: fix build failure if CONFIG_SLUB
> git bisect bad 55f21306900abf9f9d2a087a127ff49c6d388ad2 # 22:26 0- 314 Add linux-next specific files for 20140923
> git bisect good f4cb707e7ad9727a046b463232f2de166e327d3e # 22:32 60+ 0 Merge tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect bad 4d8426f9ac601db2a64fa7be64051d02b9c9fe01 # 22:36 0- 60 Add linux-next specific files for 20140926
>
>
> This script may reproduce the error.
>
> ----------------------------------------------------------------------------
> #!/bin/bash
>
> kernel=$1
>
> kvm=(
> qemu-system-x86_64
> -cpu kvm64
> -enable-kvm
> -kernel $kernel
> -m 320
> -smp 1
> -net nic,vlan=1,model=e1000
> -net user,vlan=1
> -boot order=nc
> -no-reboot
> -watchdog i6300esb
> -rtc base=localtime
> -serial stdio
> -display none
> -monitor null
> )
>
> append=(
> hung_task_panic=1
> earlyprintk=ttyS0,115200
> debug
> apic=debug
> sysrq_always_enabled
> rcupdate.rcu_cpu_stall_timeout=100
> panic=-1
> softlockup_panic=1
> nmi_watchdog=panic
> oops=panic
> load_ramdisk=2
> prompt_ramdisk=0
> console=ttyS0,115200
> console=tty0
> vga=normal
> root=/dev/ram0
> rw
> drbd.minor_count=8
> )
>
> "${kvm[@]}" --append "${append[*]}"
> ----------------------------------------------------------------------------

Thanks Fengguang.
I can reproduce this bug in my system with your configuration.
Below is fix for this bug.

Hello Andrew,

Could you take this patch?

Thanks.

----------->8------------------
>From e9c884c1f807dc9de2f4cc3b2779a014e5ebb7a1 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <[email protected]>
Date: Thu, 2 Oct 2014 12:09:41 +0900
Subject: [PATCH] mm/slab: adjust object_size in order to fix bug in slab
merge

Fengguang reported following bug and his bisect result points
to this patch ('mm/slab: support slab merge') as root cause.

[ 0.466034] BUG: unable to handle kernel paging request at 00010023
[ 0.466989] IP: [<c117dcf9>] kernfs_add_one+0x89/0x130
[ 0.467812] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
[ 0.468000] Oops: 0002 [#1] SMP
[ 0.468000] Modules linked in:
[ 0.468000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0-rc6-00089-g36fbfeb #1
[ 0.468000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 0.468000] task: d303ec90 ti: d3040000 task.ti: d3040000
[ 0.468000] EIP: 0060:[<c117dcf9>] EFLAGS: 00010286 CPU: 0
[ 0.468000] EIP is at kernfs_add_one+0x89/0x130
[ 0.468000] EAX: 542572cb EBX: 00010003 ECX: 00000008 EDX: 2c8de598
[ 0.468000] ESI: d311de10 EDI: d311de70 EBP: d3041dd8 ESP: d3041db4
[ 0.468000] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 0.468000] CR0: 8005003b CR2: 00010023 CR3: 01a8a000 CR4: 000006f0
[ 0.468000] Stack:
[ 0.468000] d3006f00 00000202 d311de70 d311de10 d3041dd8 c117dba0 d311de10 c159a5c0
[ 0.468000] c1862a00 d3041df0 c117f0f2 00000000 c18629f4 d311de70 00000000 d3041e2c
[ 0.468000] c117f8b5 00001000 00000000 c159a5c0 c18629f4 00000000 00000001 c1862a00
[ 0.468000] Call Trace:
[ 0.468000] [<c117dba0>] ? kernfs_new_node+0x30/0x40
[ 0.468000] [<c117f0f2>] __kernfs_create_file+0x92/0xc0
[ 0.468000] [<c117f8b5>] sysfs_add_file_mode_ns+0x95/0x190
[ 0.468000] [<c117f9d7>] sysfs_create_file_ns+0x27/0x40
[ 0.468000] [<c1252ef6>] kobject_add_internal+0x136/0x2c0
[ 0.468000] [<c125e360>] ? kvasprintf+0x40/0x50
[ 0.468000] [<c1252a92>] ? kobject_set_name_vargs+0x42/0x60
[ 0.468000] [<c12530b5>] kobject_init_and_add+0x35/0x50
[ 0.468000] [<c12ad04f>] acpi_sysfs_add_hotplug_profile+0x24/0x4a
[ 0.468000] [<c12a7280>] acpi_scan_add_handler_with_hotplug+0x21/0x28
[ 0.468000] [<c18df524>] acpi_pci_root_init+0x20/0x22
[ 0.468000] [<c18df0e1>] acpi_scan_init+0x24/0x16d
[ 0.468000] [<c18def73>] acpi_init+0x20c/0x224
[ 0.468000] [<c18ded67>] ? acpi_sleep_init+0xab/0xab
[ 0.468000] [<c100041e>] do_one_initcall+0x7e/0x1b0
[ 0.468000] [<c18ded67>] ? acpi_sleep_init+0xab/0xab
[ 0.468000] [<c18b24ba>] ? repair_env_string+0x12/0x54
[ 0.468000] [<c18b24a8>] ? initcall_blacklist+0x7c/0x7c
[ 0.468000] [<c105e100>] ? parse_args+0x160/0x3f0
[ 0.468000] [<c18b2bd1>] kernel_init_freeable+0xfc/0x179
[ 0.468000] [<c156782b>] kernel_init+0xb/0xd0
[ 0.468000] [<c1574601>] ret_from_kernel_thread+0x21/0x30
[ 0.468000] [<c1567820>] ? rest_init+0xb0/0xb0
[snip...]
[ 0.468000] EIP: [<c117dcf9>] kernfs_add_one+0x89/0x130 SS:ESP 0068:d3041db4
[ 0.468000] CR2: 0000000000010023
[ 0.468000] ---[ end trace 4fa173691404b63f ]---
[ 0.468000] Kernel panic - not syncing: Fatal exception

This error is caused by wrongly initialized object due to slab merge.
Size of vm_area_struct is 92 bytes in this configuration, and, for better
alignment, this kmem_cache manage memory in 96 bytes unit. But, maybe for
performance reason, if user requests zeroing for this object, object is
cleared up to 92 bytes.

Meanwhile, size of kernfs_node_cache is 96 bytes so that it can be merged
with kmem_cache for vm_area_struct. In this situation, if user request
zeroing for objects for kernfs_node_cache, object is only cleared up to
92 bytes. So, kernfs_node had odd value on iattr field and this results in
de-referencing wrong address bug.

To fix this problem, object size is adjusted when merging occurs.
After this change, zeroing will be done to complete object so that
de-referencing wrong address can't happen.

Reported-by: Fengguang Wu <[email protected]>
Signed-off-by: Joonsoo Kim <[email protected]>
---
mm/slab.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index c55b1ec..6307131 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2118,9 +2118,15 @@ __kmem_cache_alias(const char *name, size_t size, size_t align,
struct kmem_cache *cachep;

cachep = find_mergeable(size, align, flags, name, ctor);
- if (cachep)
+ if (cachep) {
cachep->refcount++;

+ /*
+ * Adjust the object sizes so that we clear
+ * the complete object on kzalloc.
+ */
+ cachep->object_size = max_t(int, cachep->object_size, size);
+ }
return cachep;
}

--
1.7.9.5