2022-05-19 16:04:55

by kernel test robot

Subject: [fork] 753550eb0c: BUG:kernel_NULL_pointer_dereference,address


(Please note that we previously reported
"[fork] 753550eb0c: BUG:KASAN:null-ptr-deref_in_task_nr_scan_windows"
at
https://lists.01.org/hyperkitty/list/[email protected]/thread/5X3LMPC6LEVEMMH7OQUHCFDHDXDR4XOF/
The issue still exists on linux-next/master, so we are
reporting it again FYI.)


Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: 753550eb0ce1fea4b5cbd989f2e06ef80b2feb28 ("fork: Explicitly set PF_KTHREAD")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: stress-ng
version: stress-ng-x86_64-0.11-06_20220518
with the following parameters:

nr_threads: 100%
testtime: 60s
class: cpu
test: str
cpufreq_governor: performance
ucode: 0x42e



on test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, please add the following tag:
Reported-by: kernel test robot <[email protected]>


[ 4.420549][ C0] BUG: kernel NULL pointer dereference, address: 00000000000002d8
[ 4.420552][ C0] #PF: supervisor read access in kernel mode
[ 4.420554][ C0] #PF: error_code(0x0000) - not-present page
[ 4.420557][ C0] PGD 0 P4D 0
[ 4.420563][ C0] Oops: 0000 [#1] SMP PTI
[ 4.420568][ C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.18.0-rc1-00006-g753550eb0ce1 #1
[ 4.420572][ C0] Hardware name: Intel Corporation S2600WP/S2600WP, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
[ 4.420575][ C0] RIP: task_nr_scan_windows+0x5/0x80
[ 4.420588][ C0] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 d8 02 00 00 8b 0d d2 91 73 01 48 8b b7 e0 02 00 00 48 8b
All code
========
0: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
7: 00 00 00
a: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
11: 00 00 00 00
15: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
1c: 00 00 00 00
20: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
25: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
2a:* 48 8b 87 d8 02 00 00 mov 0x2d8(%rdi),%rax <-- trapping instruction
31: 8b 0d d2 91 73 01 mov 0x17391d2(%rip),%ecx # 0x1739209
37: 48 8b b7 e0 02 00 00 mov 0x2e0(%rdi),%rsi
3e: 48 rex.W
3f: 8b .byte 0x8b

Code starting with the faulting instruction
===========================================
0: 48 8b 87 d8 02 00 00 mov 0x2d8(%rdi),%rax
7: 8b 0d d2 91 73 01 mov 0x17391d2(%rip),%ecx # 0x17391df
d: 48 8b b7 e0 02 00 00 mov 0x2e0(%rdi),%rsi
14: 48 rex.W
15: 8b .byte 0x8b
[ 4.420592][ C0] RSP: 0000:ffffc90000003f00 EFLAGS: 00010046
[ 4.420595][ C0] RAX: 0000000000000064 RBX: 00000000000003e8 RCX: 000000000000000a
[ 4.420598][ C0] RDX: 0000000000000000 RSI: ffff88810033a880 RDI: 0000000000000000
[ 4.420601][ C0] RBP: ffff88810033a800 R08: 0000000000000236 R09: ffff88810033a880
[ 4.420603][ C0] R10: 0000000000000236 R11: 0000000000000000 R12: 0000000000000064
[ 4.420606][ C0] R13: ffff88810033a800 R14: ffff888f0282b940 R15: ffff88810033a800
[ 4.420608][ C0] FS: 0000000000000000(0000) GS:ffff888f02800000(0000) knlGS:0000000000000000
[ 4.420611][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.420613][ C0] CR2: 00000000000002d8 CR3: 0000001c3ec0a001 CR4: 00000000001706f0
[ 4.420616][ C0] Call Trace:
[ 4.420622][ C0] <IRQ>
[ 4.420623][ C0] task_scan_start (kernel/sched/fair.c:1132 kernel/sched/fair.c:1138)
[ 4.420629][ C0] task_tick_fair (kernel/sched/fair.c:2933 kernel/sched/fair.c:11216)
[ 4.420635][ C0] scheduler_tick (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 kernel/sched/features.h:99 kernel/sched/core.c:5345)
[ 4.420646][ C0] update_process_times (kernel/time/timer.c:1793)
[ 4.420653][ C0] tick_periodic (kernel/time/tick-common.c:101)
[ 4.420660][ C0] tick_handle_periodic (kernel/time/tick-common.c:120)
[ 4.420665][ C0] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1104)
[ 4.420674][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1097 (discriminator 14))
[ 4.420686][ C0] </IRQ>
[ 4.420688][ C0] <TASK>
[ 4.420689][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:645)
[ 4.420698][ C0] RIP: 0010:console_unlock (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 kernel/printk/printk.c:270 kernel/printk/printk.c:2622 kernel/printk/printk.c:2783)
[ 4.420706][ C0] Code: 0f 85 c5 00 00 00 e8 7a fc ff ff 85 c0 0f 85 8c fd ff ff e9 01 ff ff ff e8 e8 1c 00 00 41 f7 c7 00 02 00 00 0f 85 df 00 00 00 <8b> 05 e5 d9 6e 01 83 f8 ff 0f 85 5b ff ff ff 85 ed 0f 84 a6 fd ff
All code
========
0: 0f 85 c5 00 00 00 jne 0xcb
6: e8 7a fc ff ff callq 0xfffffffffffffc85
b: 85 c0 test %eax,%eax
d: 0f 85 8c fd ff ff jne 0xfffffffffffffd9f
13: e9 01 ff ff ff jmpq 0xffffffffffffff19
18: e8 e8 1c 00 00 callq 0x1d05
1d: 41 f7 c7 00 02 00 00 test $0x200,%r15d
24: 0f 85 df 00 00 00 jne 0x109
2a:* 8b 05 e5 d9 6e 01 mov 0x16ed9e5(%rip),%eax # 0x16eda15 <-- trapping instruction
30: 83 f8 ff cmp $0xffffffff,%eax
33: 0f 85 5b ff ff ff jne 0xffffffffffffff94
39: 85 ed test %ebp,%ebp
3b: 0f .byte 0xf
3c: 84 .byte 0x84
3d: a6 cmpsb %es:(%rdi),%ds:(%rsi)
3e: fd std
3f: ff .byte 0xff

Code starting with the faulting instruction
===========================================
0: 8b 05 e5 d9 6e 01 mov 0x16ed9e5(%rip),%eax # 0x16ed9eb
6: 83 f8 ff cmp $0xffffffff,%eax
9: 0f 85 5b ff ff ff jne 0xffffffffffffff6a
f: 85 ed test %ebp,%ebp
11: 0f .byte 0xf
12: 84 .byte 0x84
13: a6 cmpsb %es:(%rdi),%ds:(%rsi)
14: fd std
15: ff .byte 0xff
[ 4.420709][ C0] RSP: 0000:ffffc900000738a8 EFLAGS: 00000206
[ 4.420712][ C0] RAX: 0000000000000000 RBX: ffff88810033a800 RCX: ffff8880000bf3a0
[ 4.420714][ C0] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff835ac878
[ 4.420716][ C0] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000019
[ 4.420718][ C0] R10: 000000000000075d R11: 000000005f696370 R12: ffffffff835ac878
[ 4.420721][ C0] R13: 000000000000005d R14: 0000000000000000 R15: 0000000000000206
[ 4.420726][ C0] ? console_unlock (arch/x86/include/asm/irqflags.h:137 kernel/printk/printk.c:2778)
[ 4.420731][ C0] vprintk_emit (arch/x86/include/asm/preempt.h:85 kernel/printk/printk.c:2273)
[ 4.420736][ C0] dev_vprintk_emit (drivers/base/core.c:4605)
[ 4.420750][ C0] dev_printk_emit (drivers/base/core.c:4620)
[ 4.420755][ C0] ? vprintk_emit (arch/x86/include/asm/preempt.h:85 kernel/printk/printk.c:2273)
[ 4.420759][ C0] ? __dev_printk (drivers/base/core.c:4627)
[ 4.420764][ C0] _dev_info (drivers/base/core.c:4673)
[ 4.420771][ C0] pci_register_host_bridge.cold (drivers/pci/probe.c:1019)
[ 4.420778][ C0] ? complete_all (kernel/sched/completion.c:64)
[ 4.420786][ C0] pci_create_root_bus (drivers/pci/probe.c:3041)
[ 4.420795][ C0] acpi_pci_root_create (drivers/acpi/pci_root.c:900)
[ 4.420805][ C0] pci_acpi_scan_root (arch/x86/pci/acpi.c:368)
[ 4.420817][ C0] acpi_pci_root_add.cold (drivers/acpi/pci_root.c:602)
[ 4.420826][ C0] acpi_bus_attach (drivers/acpi/scan.c:2191 drivers/acpi/scan.c:2238)
[ 4.420830][ C0] ? __device_attach (drivers/base/dd.c:992)
[ 4.420838][ C0] acpi_bus_attach (drivers/acpi/scan.c:2258 (discriminator 3))
[ 4.420841][ C0] ? __device_attach (drivers/base/dd.c:992)
[ 4.420844][ C0] acpi_bus_attach (drivers/acpi/scan.c:2258 (discriminator 3))
[ 4.420848][ C0] acpi_bus_scan (drivers/acpi/scan.c:2451)
[ 4.420852][ C0] ? acpi_bus_init (drivers/acpi/bus.c:1336)
[ 4.420863][ C0] acpi_scan_init (drivers/acpi/scan.c:2613)
[ 4.420870][ C0] acpi_init (drivers/acpi/bus.c:1362)
[ 4.420875][ C0] do_one_initcall (init/main.c:1298)
[ 4.420885][ C0] do_initcalls (init/main.c:1370 init/main.c:1387)
[ 4.420894][ C0] kernel_init_freeable (init/main.c:1617)
[ 4.420900][ C0] ? rest_init (init/main.c:1494)
[ 4.420906][ C0] kernel_init (init/main.c:1504)
[ 4.420912][ C0] ret_from_fork (arch/x86/entry/entry_64.S:304)
[ 4.420920][ C0] </TASK>
[ 4.420922][ C0] Modules linked in:
[ 4.420926][ C0] CR2: 00000000000002d8
[ 4.420930][ C0] ---[ end trace 0000000000000000 ]---
[ 4.420932][ C0] RIP: task_nr_scan_windows+0x5/0x80
[ 4.420935][ C0] Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 d8 02 00 00 8b 0d d2 91 73 01 48 8b b7 e0 02 00 00 48 8b
All code
========
0: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
7: 00 00 00
a: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
11: 00 00 00 00
15: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
1c: 00 00 00 00
20: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
25: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
2a:* 48 8b 87 d8 02 00 00 mov 0x2d8(%rdi),%rax <-- trapping instruction
31: 8b 0d d2 91 73 01 mov 0x17391d2(%rip),%ecx # 0x1739209
37: 48 8b b7 e0 02 00 00 mov 0x2e0(%rdi),%rsi
3e: 48 rex.W
3f: 8b .byte 0x8b

Code starting with the faulting instruction
===========================================
0: 48 8b 87 d8 02 00 00 mov 0x2d8(%rdi),%rax
7: 8b 0d d2 91 73 01 mov 0x17391d2(%rip),%ecx # 0x17391df
d: 48 8b b7 e0 02 00 00 mov 0x2e0(%rdi),%rsi
14: 48 rex.W
15: 8b .byte 0x8b
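
As a reading aid, the fault address in the Oops can be cross-checked against the register dump: the trapping instruction is "mov 0x2d8(%rdi),%rax", RDI is 0, and CR2 is 0x2d8, so the base pointer held in RDI was NULL and the load hit offset 0x2d8 from it. The short Python sketch below redoes that arithmetic from strings copied out of the log. The helper names (parse_registers, effective_address) are hypothetical and exist only for this illustration, and the operand parser handles only the simple disp(%base) form seen here.

```python
import re

# Values copied verbatim from the Oops above; variable names are illustrative.
REGISTER_LINE = "RDX: 0000000000000000 RSI: ffff88810033a880 RDI: 0000000000000000"
TRAPPING_INSN = "mov 0x2d8(%rdi),%rax"
CR2 = 0x00000000000002D8


def parse_registers(line):
    """Return {name: value} for each 'NAME: <16 hex digits>' pair on a dump line."""
    pairs = re.findall(r"(\w+): ([0-9a-f]{16})", line)
    return {name: int(value, 16) for name, value in pairs}


def effective_address(insn, regs):
    """Compute base + displacement for a simple AT&T operand like 0x2d8(%rdi)."""
    match = re.search(r"(0x[0-9a-f]+)\(%(\w+)\)", insn)
    if match is None:
        raise ValueError("no disp(%base) memory operand found")
    displacement = int(match.group(1), 16)
    base_register = match.group(2).upper()
    return regs[base_register] + displacement


regs = parse_registers(REGISTER_LINE)
addr = effective_address(TRAPPING_INSN, regs)

# The computed address should equal the faulting address the CPU stored in CR2.
print(hex(addr), addr == CR2)
```

The check only confirms that the dump is self-consistent; it does not by itself say which kernel pointer was NULL.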


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (10.64 kB)
config-5.18.0-rc1-00006-g753550eb0ce1 (165.11 kB)
job-script (8.02 kB)
dmesg.xz (10.06 kB)
job.yaml (5.24 kB)