Greetings,
FYI, we noticed the following commit (built with gcc-9):
commit: 620a6dc40754dc218f5b6389b5d335e9a107fd29 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
in testcase: rcutorture
version:
with following parameters:
runtime: 300s
test: cpuhotplug
torture_type: tasks
test-description: rcutorture is a load/unload test of the rcutorture kernel module.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
[ 3.458081] BUG: KASAN: slab-out-of-bounds in build_sched_domains (kbuild/src/consumer/kernel/sched/topology.c:1796 kbuild/src/consumer/kernel/sched/topology.c:1300 kbuild/src/consumer/kernel/sched/topology.c:2039)
[ 3.458081] Read of size 8 at addr ffff8881008efe00 by task swapper/0/1
[ 3.458081]
[ 3.458081] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc2-00012-g620a6dc40754 #4
[ 3.458081] Call Trace:
[ 3.458081] ? dump_stack (kbuild/src/consumer/lib/dump_stack.c:79 kbuild/src/consumer/lib/dump_stack.c:120)
[ 3.458081] ? print_address_description.cold+0xdd/0x4a3
[ 3.458081] ? build_sched_domains (kbuild/src/consumer/kernel/sched/topology.c:1796 kbuild/src/consumer/kernel/sched/topology.c:1300 kbuild/src/consumer/kernel/sched/topology.c:2039)
[ 3.458081] ? kasan_report.cold (kbuild/src/consumer/mm/kasan/report.c:397 kbuild/src/consumer/mm/kasan/report.c:413)
[ 3.458081] ? build_sched_domains (kbuild/src/consumer/kernel/sched/topology.c:1796 kbuild/src/consumer/kernel/sched/topology.c:1300 kbuild/src/consumer/kernel/sched/topology.c:2039)
[ 3.458081] ? __asan_load8 (kbuild/src/consumer/mm/kasan/generic.c:179 kbuild/src/consumer/mm/kasan/generic.c:252)
[ 3.458081] ? build_sched_domains (kbuild/src/consumer/kernel/sched/topology.c:1796 kbuild/src/consumer/kernel/sched/topology.c:1300 kbuild/src/consumer/kernel/sched/topology.c:2039)
[ 3.458081] ? __kasan_kmalloc (kbuild/src/consumer/mm/kasan/common.c:443)
[ 3.458081] ? __kmalloc_node (kbuild/src/consumer/include/linux/kasan.h:215 kbuild/src/consumer/mm/slub.c:4018)
[ 3.458081] ? cpu_attach_domain (kbuild/src/consumer/kernel/sched/topology.c:2027)
[ 3.458081] ? __bitmap_and (kbuild/src/consumer/lib/bitmap.c:248)
[ 3.458081] ? sched_init_domains (kbuild/src/consumer/kernel/sched/topology.c:2194)
[ 3.458081] ? sched_init_smp (kbuild/src/consumer/kernel/sched/core.c:7727)
[ 3.458081] ? kernel_init_freeable (kbuild/src/consumer/init/main.c:1525)
[ 3.458081] ? rest_init (kbuild/src/consumer/init/main.c:1412)
[ 3.458081] ? kernel_init (kbuild/src/consumer/init/main.c:1415)
[ 3.458081] ? ret_from_fork (kbuild/src/consumer/arch/x86/entry/entry_64.S:302)
[ 3.458081]
[ 3.458081] Allocated by task 1:
[ 3.458081] kasan_save_stack (kbuild/src/consumer/mm/kasan/common.c:38)
[ 3.458081] ____kasan_kmalloc+0xb0/0x120
[ 3.458081] __kasan_kmalloc (kbuild/src/consumer/mm/kasan/common.c:443)
[ 3.458081] __kmalloc (kbuild/src/consumer/include/linux/kasan.h:215 kbuild/src/consumer/mm/slub.c:3970)
[ 3.458081] sched_init_numa (kbuild/src/consumer/include/linux/slab.h:557 kbuild/src/consumer/include/linux/slab.h:682 kbuild/src/consumer/kernel/sched/topology.c:1705)
[ 3.458081] sched_init_smp (kbuild/src/consumer/kernel/sched/core.c:7725)
[ 3.458081] kernel_init_freeable (kbuild/src/consumer/init/main.c:1525)
[ 3.458081] kernel_init (kbuild/src/consumer/init/main.c:1415)
[ 3.458081] ret_from_fork (kbuild/src/consumer/arch/x86/entry/entry_64.S:302)
[ 3.458081]
[ 3.458081] The buggy address belongs to the object at ffff8881008efd00
[ 3.458081] which belongs to the cache kmalloc-256 of size 256
[ 3.458081] The buggy address is located 0 bytes to the right of
[ 3.458081] 256-byte region [ffff8881008efd00, ffff8881008efe00)
[ 3.458081] The buggy address belongs to the page:
[ 3.458081] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8881008ef900 pfn:0x1008ee
[ 3.458081] head:(____ptrval____) order:1 compound_mapcount:0
[ 3.458081] flags: 0x200000000010200(slab|head)
[ 3.458081] raw: 0200000000010200 ffff888100040bc8 ffff888100040bc8 ffff8881000431c0
[ 3.458081] raw: ffff8881008ef900 0000000000080005 00000001ffffffff 0000000000000000
[ 3.458081] page dumped because: kasan: bad access detected
[ 3.458081]
[ 3.458081] Memory state around the buggy address:
[ 3.458081] ffff8881008efd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3.458081] ffff8881008efd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 3.458081] >ffff8881008efe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3.458081] ^
[ 3.458081] ffff8881008efe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3.458081] ffff8881008eff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 3.458081] ==================================================================
[ 3.458081] Disabling lock debugging due to kernel taint
[ 3.463182] workqueue: round-robin CPU selection forced, expect performance impact
[ 4.508234] node 0 deferred pages initialised in 1045ms
[ 4.511634] devtmpfs: initialized
[ 4.517369] version magic: 0x4139332a
[ 4.553244] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 4.554121] futex hash table entries: 512 (order: 4, 65536 bytes, linear)
[ 4.555946] prandom: seed boundary self test passed
[ 4.558995] prandom: 100 self tests passed
[ 4.572944] prandom32: self test passed (less than 6 bits correlated)
[ 4.573124] pinctrl core: initialized pinctrl subsystem
[ 4.578139] regulator-dummy: no parameters, enabled
[ 4.583742] NET: Registered protocol family 16
[ 4.599663] thermal_sys: Registered thermal governor 'fair_share'
[ 4.599679] thermal_sys: Registered thermal governor 'bang_bang'
[ 4.600094] thermal_sys: Registered thermal governor 'step_wise'
[ 4.600907] thermal_sys: Registered thermal governor 'user_space'
[ 4.601774] EISA bus registered
[ 4.602593] cpuidle: using governor menu
[ 4.605102] ACPI: bus type PCI registered
[ 4.607132] PCI: Using configuration type 1 for base access
[ 4.749200] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 4.764169] cryptd: max_cpu_qlen set to 1000
[ 4.810234] ACPI: Added _OSI(Module Device)
[ 4.810969] ACPI: Added _OSI(Processor Device)
[ 4.811099] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 4.811861] ACPI: Added _OSI(Processor Aggregator Device)
[ 4.812190] ACPI: Added _OSI(Linux-Dell-Video)
[ 4.812978] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 4.813172] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 4.966673] ACPI: 1 ACPI AML tables successfully acquired and loaded
[ 5.039535] ACPI: Interpreter enabled
[ 5.040652] ACPI: (supports S0 S3 S5)
[ 5.041096] ACPI: Using IOAPIC for interrupt routing
[ 5.042693] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 5.053778] ACPI: Enabled 2 GPEs in block 00 to 0F
[ 5.407710] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[ 5.408213] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI HPX-Type3]
[ 5.411341] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[ 5.442324] PCI host bridge to bus 0000:00
[ 5.443004] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 5.443137] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 5.444101] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[ 5.445143] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
[ 5.446139] pci_bus 0000:00: root bus resource [mem 0x240000000-0x2bfffffff window]
[ 5.447148] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 5.448627] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000
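For anyone reading the KASAN splat above, the arithmetic behind "0 bytes to the right of 256-byte region" can be checked with a short sketch. The addresses and sizes are copied from the report. Treating the object as an array of 8-byte elements is an assumption drawn only from the "Read of size 8" line, and the helper name is hypothetical.

```python
# Values copied from the KASAN report in the log.
OBJ_START = 0xffff8881008efd00   # start of the kmalloc-256 object
OBJ_SIZE = 256                   # bytes, from "256-byte region"
BAD_ADDR = 0xffff8881008efe00    # address of the faulting read
ACCESS_SIZE = 8                  # bytes, from "Read of size 8"
KASAN_GRANULE = 8                # generic KASAN: one shadow byte per 8 bytes
SHADOW_BYTES_PER_ROW = 16        # shadow bytes printed per "Memory state" row


def describe_access(start, size, addr, width):
    """Hypothetical helper: locate an access relative to an object.

    Returns (offset, index, slots, in_bounds), where index and slots
    assume the object is an array of `width`-byte elements.
    """
    offset = addr - start
    index = offset // width
    slots = size // width
    in_bounds = 0 <= offset and offset + width <= size
    return offset, index, slots, in_bounds


offset, index, slots, ok = describe_access(
    OBJ_START, OBJ_SIZE, BAD_ADDR, ACCESS_SIZE)
print(f"offset={offset} index={index} "
      f"valid_indices=0..{slots - 1} in_bounds={ok}")

# Number of "00" shadow rows that cover the object before the redzone.
rows = OBJ_SIZE // KASAN_GRANULE // SHADOW_BYTES_PER_ROW
print(f"shadow rows covering the object: {rows}")
```

Under that assumption, the read lands on element 32 of an object with valid indices 0..31, that is, exactly one element past the end. The two all-zero shadow rows followed by the `fc` (kmalloc redzone) row in the "Memory state" dump are consistent with this.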
To reproduce:
# build kernel
cd linux
cp config-5.11.0-rc2-00012-g620a6dc40754 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached to this email
Thanks,
Oliver Sang
On 01/02/2021 08:49, kernel test robot wrote:
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 620a6dc40754dc218f5b6389b5d335e9a107fd29 ("sched/topology: Make sched_init_numa() use a set for the deduplicating sort")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
>
We also saw an issue with this patch during the sched domain build, which
was fixed by:
https://lkml.kernel.org/r/[email protected]
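
For context, the technique named in the commit title ("use a set for the deduplicating sort") can be sketched in general terms as below. This is only an illustration of a presence-bitmap dedup sort, not the kernel code: the function name, the flattened distance table, and the upper bound of 255 are all assumptions made for the example.

```python
def dedup_sort(values, max_value):
    """Return the distinct values in ascending order.

    Marks each value in a presence table (the "set"), then walks the
    table from low to high, so sorting and deduplication happen in one
    pass over the table.
    """
    seen = bytearray(max_value + 1)   # one flag per possible value
    for v in values:
        if not 0 <= v <= max_value:
            raise ValueError(f"value {v} outside 0..{max_value}")
        seen[v] = 1
    return [v for v, present in enumerate(seen) if present]


# Hypothetical flattened 2-node distance table (illustrative values).
distances = [10, 20, 20, 10]
levels = dedup_sort(distances, 255)
print(levels)
```

The result can be shorter than the input, so any array sized from the number of distinct values has to be indexed with that count as the bound rather than the original table size.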