Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: c5665868183fec689dbab9fb8505188b2c4f0757 ("mm: kmemleak: use the memory pool for early allocations")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: rcuperf
with following parameters:
runtime: 300s
perf_type: srcu
on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
+--------------------------------------------------------------------------------------+------------+------------+
| | 0647398a8c | c566586818 |
+--------------------------------------------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 18 | 18 |
| BUG:kernel_hang_in_boot_stage | 18 | |
| BUG:kernel_hang_in_early-boot_stage,last_printk:Probing_EDD(edd=off_to_disable)...ok | 0 | 18 |
+--------------------------------------------------------------------------------------+------------+------------+
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
early console in setup code
Probing EDD (edd=off to disable)... ok
BUG: kernel hang in early-boot stage, last printk: Probing EDD (edd=off to disable)... ok
Linux version 5.3.0-11792-gc5665868183fec #1
Command line: ip=::::vm-snb-i386-43::dhcp root=/dev/ram0 user=lkp job=/lkp/jobs/scheduled/vm-snb-i386-43/rcuperf-srcu-300s-debian-i386-20191205.cgz-c5665868183fec689dbab9fb8505188b2c4f0757-20200817-4365-xzdqse-11.yaml ARCH=i386 kconfig=i386-randconfig-a013-20200808 branch=linus/master commit=c5665868183fec689dbab9fb8505188b2c4f0757 BOOT_IMAGE=/pkg/linux/i386-randconfig-a013-20200808/gcc-9/c5665868183fec689dbab9fb8505188b2c4f0757/vmlinuz-5.3.0-11792-gc5665868183fec max_uptime=1500 RESULT_ROOT=/result/rcuperf/srcu-300s/vm-snb-i386/debian-i386-20191205.cgz/i386-randconfig-a013-20200808/gcc-9/c5665868183fec689dbab9fb8505188b2c4f0757/8 LKP_SERVER=inn selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw rcuperf.shutdown=0 watchdog_thresh=60
Kboot worker: lkp-worker33
Elapsed time: 780
To reproduce:
# build kernel
cd linux
cp config-5.3.0-11792-gc5665868183fec .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email
Thanks,
Rong Chen
On Tue, Aug 18, 2020 at 08:23:51AM +0800, kernel test robot wrote:
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: c5665868183fec689dbab9fb8505188b2c4f0757 ("mm: kmemleak: use the memory pool for early allocations")
I might see one of those early boot failure before. In my case, the bare-metal
system was reset. Can you try to narrow down to a smaller
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE (assume 0 works if your bisecting was
correct) that works?
On 8/21/20 9:01 AM, Qian Cai wrote:
> On Tue, Aug 18, 2020 at 08:23:51AM +0800, kernel test robot wrote:
>> Greeting,
>>
>> FYI, we noticed the following commit (built with gcc-9):
>>
>> commit: c5665868183fec689dbab9fb8505188b2c4f0757 ("mm: kmemleak: use the memory pool for early allocations")
> I might see one of those early boot failure before. In my case, the bare-metal
> system was reset. Can you try to narrow down to a smaller
> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE (assume 0 works if your bisecting was
> correct) that works?
Hi Qian,
Adding CONFIG_EARLY_PRINTK=y to the kconfig file, and the boot hangs in
the below position:
[ 0.715834] Kernel command line: root=/dev/ram0 hung_task_panic=1
debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1
nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err ignore_loglevel console=tty0
earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw
rcuperf.shutdown=0 watchdog_thresh=60
[ 0.719688] sysrq: sysrq always enabled.
[ 0.801005] Dentry cache hash table entries: 2097152 (order: 12,
16777216 bytes, linear)
[ 0.805588] Inode-cache hash table entries: 1048576 (order: 11,
8388608 bytes, linear)
[ 0.806464] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 1.080978] Memory: 12319196K/12680692K available (10243K kernel
code, 2414K rwdata, 8184K rodata, 856K init, 20772K bss, 361496K
reserved, 0K cma-reserved)
qemu-system-x86_64: terminating on signal 2
The problem disappeared if CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400:
[ 1.064575] Kernel command line: root=/dev/ram0 hung_task_panic=1
debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1
nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
drbd.minor_count=8
systemd.log_level=err ignore_loglevel console=tty0
earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw
rcuperf.shutdown=0 watchdog_thresh=60
[ 1.068883] sysrq: sysrq always enabled.
[ 1.108720] Dentry cache hash table entries: 2097152 (order: 12,
16777216 bytes, linear)
[ 1.148852] Inode-cache hash table entries: 1048576 (order: 11,
8388608 bytes, linear)
[ 1.150271] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 1.835309] Memory: 12325340K/12680692K available (10243K kernel
code, 2414K rwdata, 8192K rodata, 856K init, 14628K bss, 355352K
reserved, 0K cma-reserved)
[ 3.487078] general protection fault: 0000 [#1] DEBUG_PAGEALLOC PTI
[ 3.488185] CPU: 0 PID: 0 Comm: swapper Not tainted
5.3.0-11792-gc5665868183fe #1
[ 3.489581] RIP: 0010:lookup_address_in_pgd+0xd1/0x158
[ 3.490566] Code: 80 e0 01 0f 84 a7 00 00 00 49 89 f0 b8 11 ff ff 01
48 21 f9 49 c1 e8 12 48 c1 e0 27 41 81 e0 f8 0f 00 00 49 01 c0 49 8d 04
08 <48> f7 00 9f ff ff ff 0f 84 34 ff ff ff c7 02 02 00 00 00 48 8b 38
[ 3.494020] RSP: 0000:ffffffff8239e700 EFLAGS: 00010086
[ 3.494959] RAX: ffff110302110d90 RBX: 0000000000000001 RCX:
ffff888302110000
[ 3.496222] RDX: ffffffff8239e72c RSI: ffff88833653b000 RDI:
ffff88833653a000
[ 3.497363] RBP: ffffffff8239e808 R08: ffff888000000d90 R09:
0000000000346000
[ 3.498467] R10: 0000000000000001 R11: 0000000000000000 R12:
0000000000000001
[ 3.499554] R13: 0000000000000000 R14: ffff88833653b000 R15:
0000000000000000
[ 3.500638] FS: 0000000000000000(0000) GS:ffffffff8243d000(0000)
knlGS:0000000000000000
[ 3.501863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.503036] CR2: ffff8883447ff000 CR3: 0000000002420000 CR4:
00000000000006b0
[ 3.504116] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 3.505156] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 3.506225] Call Trace:
[ 3.506597] Modules linked in:
[ 3.507255] random: get_random_bytes called from
init_oops_id+0x1d/0x2c with crng_init=0
[ 3.508734] ---[ end trace 0000000000000000 ]---
Best Regards,
Rong Chen
On Mon, Aug 24, 2020 at 10:47:20AM +0800, Rong Chen wrote:
>
>
> On 8/21/20 9:01 AM, Qian Cai wrote:
> > On Tue, Aug 18, 2020 at 08:23:51AM +0800, kernel test robot wrote:
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with gcc-9):
> > >
> > > commit: c5665868183fec689dbab9fb8505188b2c4f0757 ("mm: kmemleak: use the memory pool for early allocations")
> > I might see one of those early boot failure before. In my case, the bare-metal
> > system was reset. Can you try to narrow down to a smaller
> > CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE (assume 0 works if your bisecting was
> > correct) that works?
>
> Hi Qian,
>
> Adding CONFIG_EARLY_PRINTK=y to the kconfig file, and the boot hangs in the
> below position:
>
> [??? 0.715834] Kernel command line: root=/dev/ram0 hung_task_panic=1 debug
> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
> net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1
> nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
> drbd.minor_count=8
> systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200
> console=ttyS0,115200 vga=normal rw rcuperf.shutdown=0 watchdog_thresh=60
> [??? 0.719688] sysrq: sysrq always enabled.
> [??? 0.801005] Dentry cache hash table entries: 2097152 (order: 12, 16777216
> bytes, linear)
> [??? 0.805588] Inode-cache hash table entries: 1048576 (order: 11, 8388608
> bytes, linear)
> [??? 0.806464] mem auto-init: stack:off, heap alloc:on, heap free:off
> [??? 1.080978] Memory: 12319196K/12680692K available (10243K kernel code,
> 2414K rwdata, 8184K rodata, 856K init, 20772K bss, 361496K reserved, 0K
> cma-reserved)
> qemu-system-x86_64: terminating on signal 2
>
> The problem disappeared if CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400:
Interesting. Can you paste the line as show:
./scripts/faddr2line vmlinux lookup_address_in_pgd+0xd1/0x158
Also, does this happens on the latest mainline or linux-next? Looks like you
were reproducing using v5.3.
>
> [??? 1.064575] Kernel command line: root=/dev/ram0 hung_task_panic=1 debug
> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
> net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1
> nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
> drbd.minor_count=8
> systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200
> console=ttyS0,115200 vga=normal rw rcuperf.shutdown=0 watchdog_thresh=60
> [??? 1.068883] sysrq: sysrq always enabled.
> [??? 1.108720] Dentry cache hash table entries: 2097152 (order: 12, 16777216
> bytes, linear)
> [??? 1.148852] Inode-cache hash table entries: 1048576 (order: 11, 8388608
> bytes, linear)
> [??? 1.150271] mem auto-init: stack:off, heap alloc:on, heap free:off
> [??? 1.835309] Memory: 12325340K/12680692K available (10243K kernel code,
> 2414K rwdata, 8192K rodata, 856K init, 14628K bss, 355352K reserved, 0K
> cma-reserved)
> [??? 3.487078] general protection fault: 0000 [#1] DEBUG_PAGEALLOC PTI
> [??? 3.488185] CPU: 0 PID: 0 Comm: swapper Not tainted
> 5.3.0-11792-gc5665868183fe #1
> [??? 3.489581] RIP: 0010:lookup_address_in_pgd+0xd1/0x158
> [??? 3.490566] Code: 80 e0 01 0f 84 a7 00 00 00 49 89 f0 b8 11 ff ff 01 48
> 21 f9 49 c1 e8 12 48 c1 e0 27 41 81 e0 f8 0f 00 00 49 01 c0 49 8d 04 08 <48>
> f7 00 9f ff ff ff 0f 84 34 ff ff ff c7 02 02 00 00 00 48 8b 38
> [??? 3.494020] RSP: 0000:ffffffff8239e700 EFLAGS: 00010086
> [??? 3.494959] RAX: ffff110302110d90 RBX: 0000000000000001 RCX:
> ffff888302110000
> [??? 3.496222] RDX: ffffffff8239e72c RSI: ffff88833653b000 RDI:
> ffff88833653a000
> [??? 3.497363] RBP: ffffffff8239e808 R08: ffff888000000d90 R09:
> 0000000000346000
> [??? 3.498467] R10: 0000000000000001 R11: 0000000000000000 R12:
> 0000000000000001
> [??? 3.499554] R13: 0000000000000000 R14: ffff88833653b000 R15:
> 0000000000000000
> [??? 3.500638] FS:? 0000000000000000(0000) GS:ffffffff8243d000(0000)
> knlGS:0000000000000000
> [??? 3.501863] CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [??? 3.503036] CR2: ffff8883447ff000 CR3: 0000000002420000 CR4:
> 00000000000006b0
> [??? 3.504116] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [??? 3.505156] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [??? 3.506225] Call Trace:
> [??? 3.506597] Modules linked in:
> [??? 3.507255] random: get_random_bytes called from init_oops_id+0x1d/0x2c
> with crng_init=0
> [??? 3.508734] ---[ end trace 0000000000000000 ]---
>
>
> Best Regards,
> Rong Chen
On 8/24/20 8:29 PM, Qian Cai wrote:
> On Mon, Aug 24, 2020 at 10:47:20AM +0800, Rong Chen wrote:
>>
>> On 8/21/20 9:01 AM, Qian Cai wrote:
>>> On Tue, Aug 18, 2020 at 08:23:51AM +0800, kernel test robot wrote:
>>>> Greeting,
>>>>
>>>> FYI, we noticed the following commit (built with gcc-9):
>>>>
>>>> commit: c5665868183fec689dbab9fb8505188b2c4f0757 ("mm: kmemleak: use the memory pool for early allocations")
>>> I might see one of those early boot failure before. In my case, the bare-metal
>>> system was reset. Can you try to narrow down to a smaller
>>> CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE (assume 0 works if your bisecting was
>>> correct) that works?
>> Hi Qian,
>>
>> Adding CONFIG_EARLY_PRINTK=y to the kconfig file, and the boot hangs in the
>> below position:
>>
>> [ 0.715834] Kernel command line: root=/dev/ram0 hung_task_panic=1 debug
>> apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100
>> net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1
>> nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0
>> drbd.minor_count=8
>> systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200
>> console=ttyS0,115200 vga=normal rw rcuperf.shutdown=0 watchdog_thresh=60
>> [ 0.719688] sysrq: sysrq always enabled.
>> [ 0.801005] Dentry cache hash table entries: 2097152 (order: 12, 16777216
>> bytes, linear)
>> [ 0.805588] Inode-cache hash table entries: 1048576 (order: 11, 8388608
>> bytes, linear)
>> [ 0.806464] mem auto-init: stack:off, heap alloc:on, heap free:off
>> [ 1.080978] Memory: 12319196K/12680692K available (10243K kernel code,
>> 2414K rwdata, 8184K rodata, 856K init, 20772K bss, 361496K reserved, 0K
>> cma-reserved)
>> qemu-system-x86_64: terminating on signal 2
>>
>> The problem disappeared if CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400:
> Interesting. Can you paste the line as show:
>
> ./scripts/faddr2line vmlinux lookup_address_in_pgd+0xd1/0x158
>
> Also, does this happens on the latest mainline or linux-next? Looks like you
> were reproducing using v5.3.
Hi Qian,
I rebuilt the kernel on commit c566586818 but the error changed to "RIP:
0010:clear_page_orig+0x12/0x40",
and the error can be reproduced on parent commit:
[ 0.539811] Memory: 12325340K/12680692K available (10243K kernel
code, 2414K rwdata, 8188K rodata, 856K init, 14628K bss, 355352K
reserved, 0K cma-reserved)
[ 4.133400] BUG: unable to handle page fault for address:
ffff88833653e000
[ 4.134130] #PF: supervisor write access in kernel mode
[ 4.134694] #PF: error_code(0x0002) - not-present page
[ 4.135177] PGD 3800067 P4D 3800067 PUD f000e6f2f000d445 PMD 0
[ 4.135730] Thread overran stack, or stack corrupted
[ 4.136192] Oops: 0002 [#1] DEBUG_PAGEALLOC PTI
[ 4.136609] CPU: 0 PID: 0 Comm: swapper Not tainted
5.3.0-11792-gc5665868183fe #1
[ 4.137300] RIP: 0010:clear_page_orig+0x12/0x40
[ 4.137732] Code: 03 00 00 00 b0 01 5b c3 b9 00 02 00 00 31 c0 f3 48
ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff
c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
[ 4.139453] RSP: 0000:ffffffff8239d8e8 EFLAGS: 00010016
[ 4.139939] RAX: 0000000000000000 RBX: 0000000000000101 RCX:
000000000000003f
[ 4.140602] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
ffff88833653e000
[ 4.141261] RBP: ffffea000cd94f80 R08: ffffffff82427800 R09:
ffffea000cd94f80
[ 4.141956] R10: 0000160000000000 R11: ffff888000000000 R12:
0000000000000000
[ 4.142642] R13: 0000000000000001 R14: 0000000000092000 R15:
0000000000000046
[ 4.143298] FS: 0000000000000000(0000) GS:ffffffff8243d000(0000)
knlGS:0000000000000000
[ 4.144076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.144661] CR2: ffff88833653e000 CR3: 0000000002420000 CR4:
00000000000006b0
[ 4.145382] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 4.146121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 4.146829] Call Trace:
[ 4.147066] Modules linked in:
[ 4.147359] CR2: ffff88833653e000
[ 4.147757] random: get_random_bytes called from
init_oops_id+0x1d/0x2c with crng_init=0
$ ./scripts/faddr2line vmlinux clear_page_orig+0x12/0x40
clear_page_orig+0x12/0x40:
clear_page_orig at arch/x86/lib/clear_page_64.S:31
but I also can reproduced the lookup_address_in_pgd error in v5.9-rc2
with attached config file:
[ 0.382789] Memory: 12313044K/12680692K available (10242K kernel
code, 2658K rwdata, 8916K rodata, 800K init, 24540K bss, 367392K
reserved, 0K cma-reserved)
[ 4.027977] general protection fault, probably for non-canonical
address 0xf0006f7280000d98: 0000 [#1] DEBUG_PAGEALLOC PTI
[ 4.029094] CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc2 #1
[ 4.029741] RIP: 0010:lookup_address_in_pgd+0x7c/0xcc
[ 4.030341] Code: 00 00 48 3d 81 00 00 00 74 6c 4c 89 df e8 9d f2 ff
ff 48 f7 d0 4c 21 d8 a8 01 74 5a 4c 89 d6 4c 89 df e8 fd f5 ff ff 49 89
c0 <48> f7 00 9f ff ff ff 74 93 41 c7 01 02 00 00 00 48 8b 08 48 89 cf
[ 4.032205] RSP: 0000:ffffffff82453a08 EFLAGS: 00010082
[ 4.032716] RAX: f0006f7280000d98 RBX: 0000000000000001 RCX:
f000e6f280000000
[ 4.033569] RDX: ffff888000000000 RSI: ffff888000000d98 RDI:
f000e6f2f000d400
[ 4.034474] RBP: ffffffff82453b28 R08: f0006f7280000d98 R09:
ffffffff82453a48
[ 4.035125] R10: ffff88833664c000 R11: f000e6f2f000d445 R12:
ffff88833664c000
[ 4.035836] R13: 0000000000000001 R14: ffff888000000000 R15:
ffffffff827806b8
[ 4.036575] FS: 0000000000000000(0000) GS:ffffffff82641000(0000)
knlGS:0000000000000000
[ 4.037389] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.037961] CR2: ffff8883447ff000 CR3: 0000000002622000 CR4:
00000000000006b0
[ 4.038677] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 4.039388] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 4.040243] Call Trace:
[ 4.040552] Modules linked in:
[ 4.041033] random: get_random_bytes called from
init_oops_id+0x1d/0x2c with crng_init=0
$ ./scripts/faddr2line vmlinux lookup_address_in_pgd+0x7c/0xcc
lookup_address_in_pgd+0x7c/0xcc:
lookup_address_in_pgd at arch/x86/mm/pat/set_memory.c:604
(inlined by) lookup_address_in_pgd at arch/x86/mm/pat/set_memory.c:575
Best Regards,
Rong Chen
> On Aug 25, 2020, at 8:44 PM, Rong Chen <[email protected]> wrote:
>
> I rebuilt the kernel on commit c566586818 but the error changed to "RIP: 0010:clear_page_orig+0x12/0x40",
> and the error can be reproduced on parent commit:
Catalin, any thought? Sounds like those early kmemleak allocations cause some sort of memory corruption?
>
> [ 0.539811] Memory: 12325340K/12680692K available (10243K kernel code, 2414K rwdata, 8188K rodata, 856K init, 14628K bss, 355352K reserved, 0K cma-reserved)
> [ 4.133400] BUG: unable to handle page fault for address: ffff88833653e000
> [ 4.134130] #PF: supervisor write access in kernel mode
> [ 4.134694] #PF: error_code(0x0002) - not-present page
> [ 4.135177] PGD 3800067 P4D 3800067 PUD f000e6f2f000d445 PMD 0
> [ 4.135730] Thread overran stack, or stack corrupted
> [ 4.136192] Oops: 0002 [#1] DEBUG_PAGEALLOC PTI
> [ 4.136609] CPU: 0 PID: 0 Comm: swapper Not tainted 5.3.0-11792-gc5665868183fe #1
> [ 4.137300] RIP: 0010:clear_page_orig+0x12/0x40
> [ 4.137732] Code: 03 00 00 00 b0 01 5b c3 b9 00 02 00 00 31 c0 f3 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00 00 00 00 ff c9 <48> 89 07 48 89 47 08 48 89 47 10 48 89 47 18 48 89 47 20 48 89 47
> [ 4.139453] RSP: 0000:ffffffff8239d8e8 EFLAGS: 00010016
> [ 4.139939] RAX: 0000000000000000 RBX: 0000000000000101 RCX: 000000000000003f
> [ 4.140602] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88833653e000
> [ 4.141261] RBP: ffffea000cd94f80 R08: ffffffff82427800 R09: ffffea000cd94f80
> [ 4.141956] R10: 0000160000000000 R11: ffff888000000000 R12: 0000000000000000
> [ 4.142642] R13: 0000000000000001 R14: 0000000000092000 R15: 0000000000000046
> [ 4.143298] FS: 0000000000000000(0000) GS:ffffffff8243d000(0000) knlGS:0000000000000000
> [ 4.144076] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4.144661] CR2: ffff88833653e000 CR3: 0000000002420000 CR4: 00000000000006b0
> [ 4.145382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4.146121] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 4.146829] Call Trace:
> [ 4.147066] Modules linked in:
> [ 4.147359] CR2: ffff88833653e000
> [ 4.147757] random: get_random_bytes called from init_oops_id+0x1d/0x2c with crng_init=0
>
> $ ./scripts/faddr2line vmlinux clear_page_orig+0x12/0x40
> clear_page_orig+0x12/0x40:
> clear_page_orig at arch/x86/lib/clear_page_64.S:31
>
>
> but I also can reproduced the lookup_address_in_pgd error in v5.9-rc2 with attached config file:
>
> [ 0.382789] Memory: 12313044K/12680692K available (10242K kernel code, 2658K rwdata, 8916K rodata, 800K init, 24540K bss, 367392K reserved, 0K cma-reserved)
> [ 4.027977] general protection fault, probably for non-canonical address 0xf0006f7280000d98: 0000 [#1] DEBUG_PAGEALLOC PTI
> [ 4.029094] CPU: 0 PID: 0 Comm: swapper Not tainted 5.9.0-rc2 #1
> [ 4.029741] RIP: 0010:lookup_address_in_pgd+0x7c/0xcc
> [ 4.030341] Code: 00 00 48 3d 81 00 00 00 74 6c 4c 89 df e8 9d f2 ff ff 48 f7 d0 4c 21 d8 a8 01 74 5a 4c 89 d6 4c 89 df e8 fd f5 ff ff 49 89 c0 <48> f7 00 9f ff ff ff 74 93 41 c7 01 02 00 00 00 48 8b 08 48 89 cf
> [ 4.032205] RSP: 0000:ffffffff82453a08 EFLAGS: 00010082
> [ 4.032716] RAX: f0006f7280000d98 RBX: 0000000000000001 RCX: f000e6f280000000
> [ 4.033569] RDX: ffff888000000000 RSI: ffff888000000d98 RDI: f000e6f2f000d400
> [ 4.034474] RBP: ffffffff82453b28 R08: f0006f7280000d98 R09: ffffffff82453a48
> [ 4.035125] R10: ffff88833664c000 R11: f000e6f2f000d445 R12: ffff88833664c000
> [ 4.035836] R13: 0000000000000001 R14: ffff888000000000 R15: ffffffff827806b8
> [ 4.036575] FS: 0000000000000000(0000) GS:ffffffff82641000(0000) knlGS:0000000000000000
> [ 4.037389] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4.037961] CR2: ffff8883447ff000 CR3: 0000000002622000 CR4: 00000000000006b0
> [ 4.038677] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 4.039388] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 4.040243] Call Trace:
> [ 4.040552] Modules linked in:
> [ 4.041033] random: get_random_bytes called from init_oops_id+0x1d/0x2c with crng_init=0
>
> $ ./scripts/faddr2line vmlinux lookup_address_in_pgd+0x7c/0xcc
> lookup_address_in_pgd+0x7c/0xcc:
> lookup_address_in_pgd at arch/x86/mm/pat/set_memory.c:604
> (inlined by) lookup_address_in_pgd at arch/x86/mm/pat/set_memory.c:575
On Tue, Aug 25, 2020 at 11:02:40PM -0400, Qian Cai wrote:
> On Aug 25, 2020, at 8:44 PM, Rong Chen <[email protected]> wrote:
> > I rebuilt the kernel on commit c566586818 but the error changed to
> > "RIP: 0010:clear_page_orig+0x12/0x40", and the error can be
> > reproduced on parent commit:
>
> Catalin, any thought? Sounds like those early kmemleak allocations
> cause some sort of memory corruption?
I can't immediately see how but Rong implies that the error also happens
on the parent commit. Does it mean the bisection isn't entirely right?
--
Catalin
On 8/27/20 1:30 AM, Catalin Marinas wrote:
> On Tue, Aug 25, 2020 at 11:02:40PM -0400, Qian Cai wrote:
>> On Aug 25, 2020, at 8:44 PM, Rong Chen <[email protected]> wrote:
>>> I rebuilt the kernel on commit c566586818 but the error changed to
>>> "RIP: 0010:clear_page_orig+0x12/0x40", and the error can be
>>> reproduced on parent commit:
>> Catalin, any thought? Sounds like those early kmemleak allocations
>> cause some sort of memory corruption?
> I can't immediately see how but Rong implies that the error also happens
> on the parent commit. Does it mean the bisection isn't entirely right?
>
Hi Catalin,
The original bisection is for "BUG:kernel_hang_in_early-boot_stage"
which locate to commit c566586818,
and the boot will go on and meet the error "RIP:
0010:clear_page_orig+0x12/0x40" if we set
CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE=400, but the error shouldn't cause
by commit c566586818
because we can reproduce the error on parent commit.
Best Regards,
Rong Chen