2021-12-16 10:30:14

by kernel test robot

[permalink] [raw]
Subject: [kthread] 40966e316f: WARNING:at_kernel/sched/core.c:#sched_init



Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 40966e316f86b8cfd83abd31ccb4df729309d3e7 ("kthread: Ensure struct kthread is present for all kthreads")
https://git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git signal-for-v5.17

in testcase: trinity
version: trinity-x86_64-608712d8-1_20211207
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 1.977577][ T0] WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:9428 sched_init (kernel/sched/core.c:9428 (discriminator 1))
[ 1.978820][ T0] Modules linked in:
[ 1.979372][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 5.16.0-rc1-00016-g40966e316f86 #1
[ 1.980543][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 1.981796][ T0] RIP: 0010:sched_init (kernel/sched/core.c:9428 (discriminator 1))
[ 1.982511][ T0] Code: 91 e8 9a 76 90 fb 48 c7 c7 a0 69 29 91 65 4c 8b 24 25 00 6d 02 00 4c 89 e6 e8 82 a7 7d fb 4c 89 e7 e8 fa 41 8e fb 84 c0 74 02 <0f> 0b 65 8b 35 75 47 ee 6d 65 48 8b 3c 25 00 6d 02 00 e8 d6 ed ff
All code
========
0: 91 xchg %eax,%ecx
1: e8 9a 76 90 fb callq 0xfffffffffb9076a0
6: 48 c7 c7 a0 69 29 91 mov $0xffffffff912969a0,%rdi
d: 65 4c 8b 24 25 00 6d mov %gs:0x26d00,%r12
14: 02 00
16: 4c 89 e6 mov %r12,%rsi
19: e8 82 a7 7d fb callq 0xfffffffffb7da7a0
1e: 4c 89 e7 mov %r12,%rdi
21: e8 fa 41 8e fb callq 0xfffffffffb8e4220
26: 84 c0 test %al,%al
28: 74 02 je 0x2c
2a:* 0f 0b ud2 <-- trapping instruction
2c: 65 8b 35 75 47 ee 6d mov %gs:0x6dee4775(%rip),%esi # 0x6dee47a8
33: 65 48 8b 3c 25 00 6d mov %gs:0x26d00,%rdi
3a: 02 00
3c: e8 .byte 0xe8
3d: d6 (bad)
3e: ed in (%dx),%eax
3f: ff .byte 0xff

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 65 8b 35 75 47 ee 6d mov %gs:0x6dee4775(%rip),%esi # 0x6dee477e
9: 65 48 8b 3c 25 00 6d mov %gs:0x26d00,%rdi
10: 02 00
12: e8 .byte 0xe8
13: d6 (bad)
14: ed in (%dx),%eax
15: ff .byte 0xff
[ 1.985437][ T0] RSP: 0000:ffffffff90c07ed8 EFLAGS: 00010002
[ 1.986298][ T0] RAX: 0000000000000001 RBX: ffff88839d737a40 RCX: 0000000000000000
[ 1.987402][ T0] RDX: 1ffffffff2184597 RSI: ffffffff9027c180 RDI: ffffffff90c22cb0
[ 1.988535][ T0] RBP: ffffffff90c07f20 R08: ffffffff920edb2b R09: ffffffff8d800107
[ 1.989665][ T0] R10: ffffffff920edb2b R11: ffffffff8d800107 R12: ffffffff90c22280
[ 1.990816][ T0] R13: 0000000000000002 R14: ffff88839d738418 R15: 00000000000f4240
[ 1.991938][ T0] FS: 0000000000000000(0000) GS:ffff88839d600000(0000) knlGS:0000000000000000
[ 1.993136][ T0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.994074][ T0] CR2: ffff88843ffff000 CR3: 0000000375c14000 CR4: 00000000000406b0
[ 1.995185][ T0] Call Trace:
[ 1.995658][ T0] <TASK>
[ 1.996073][ T0] start_kernel (arch/x86/include/asm/irqflags.h:29 arch/x86/include/asm/irqflags.h:70 init/main.c:996)
[ 1.996726][ T0] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:283)
[ 1.997625][ T0] </TASK>
[ 1.998051][ T0] ---[ end trace 9b598a628b1de811 ]---
[ 2.000123][ T0] rcu: Hierarchical RCU implementation.
[ 2.000911][ T0] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=2.
[ 2.001949][ T0] RCU CPU stall warnings timeout set to 100 (rcu_cpu_stall_timeout).
[ 2.003115][ T0] Trampoline variant of Tasks RCU enabled.
[ 2.003927][ T0] Rude variant of Tasks RCU enabled.
[ 2.004696][ T0] Tracing variant of Tasks RCU enabled.
[ 2.005476][ T0] rcu: RCU calculated value of scheduler-enlistment delay is 100 jiffies.
[ 2.006689][ T0] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[ 2.049794][ T0] NR_IRQS: 524544, nr_irqs: 440, preallocated irqs: 16
[ 2.057841][ T0] Console: colour VGA+ 80x25
[ 2.141394][ T0] printk: console [tty0] enabled
[ 2.142437][ T0] printk: console [ttyS0] enabled
[ 2.144033][ T0] printk: bootconsole [earlyser0] disabled
[ 2.145973][ T0] ACPI: Core revision 20210930
[ 2.147506][ T0] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[ 2.149537][ T0] APIC: Switch to symmetric I/O mode setup
[ 2.150977][ T0] x2apic enabled
[ 2.152043][ T0] Switched APIC routing to physical x2apic.
[ 2.153198][ T0] masked ExtINT on CPU#0
[ 2.154883][ T0] ENABLING IO-APIC IRQs
[ 2.155737][ T0] init IO_APIC IRQs
[ 2.156553][ T0] apic 0 pin 0 not connected
[ 2.157517][ T0] IOAPIC[0]: Preconfigured routing entry (0-1 -> IRQ 1 Level:0 ActiveLow:0)
[ 2.159208][ T0] IOAPIC[0]: Preconfigured routing entry (0-2 -> IRQ 0 Level:0 ActiveLow:0)
[ 2.160960][ T0] IOAPIC[0]: Preconfigured routing entry (0-3 -> IRQ 3 Level:0 ActiveLow:0)
[ 2.162643][ T0] IOAPIC[0]: Preconfigured routing entry (0-4 -> IRQ 4 Level:0 ActiveLow:0)
[ 2.164329][ T0] IOAPIC[0]: Preconfigured routing entry (0-5 -> IRQ 5 Level:1 ActiveLow:0)
[ 2.166123][ T0] IOAPIC[0]: Preconfigured routing entry (0-6 -> IRQ 6 Level:0 ActiveLow:0)
[ 2.167935][ T0] IOAPIC[0]: Preconfigured routing entry (0-7 -> IRQ 7 Level:0 ActiveLow:0)
[ 2.169686][ T0] IOAPIC[0]: Preconfigured routing entry (0-8 -> IRQ 8 Level:0 ActiveLow:0)
[ 2.171571][ T0] IOAPIC[0]: Preconfigured routing entry (0-9 -> IRQ 9 Level:1 ActiveLow:0)
[ 2.173392][ T0] IOAPIC[0]: Preconfigured routing entry (0-10 -> IRQ 10 Level:1 ActiveLow:0)
[ 2.175323][ T0] IOAPIC[0]: Preconfigured routing entry (0-11 -> IRQ 11 Level:1 ActiveLow:0)
[ 2.177186][ T0] IOAPIC[0]: Preconfigured routing entry (0-12 -> IRQ 12 Level:0 ActiveLow:0)
[ 2.179103][ T0] IOAPIC[0]: Preconfigured routing entry (0-13 -> IRQ 13 Level:0 ActiveLow:0)
[ 2.180864][ T0] IOAPIC[0]: Preconfigured routing entry (0-14 -> IRQ 14 Level:0 ActiveLow:0)
[ 2.182756][ T0] IOAPIC[0]: Preconfigured routing entry (0-15 -> IRQ 15 Level:0 ActiveLow:0)
[ 2.184563][ T0] apic 0 pin 16 not connected
[ 2.185577][ T0] apic 0 pin 17 not connected
[ 2.186572][ T0] apic 0 pin 18 not connected
[ 2.187580][ T0] apic 0 pin 19 not connected
[ 2.188595][ T0] apic 0 pin 20 not connected
[ 2.189599][ T0] apic 0 pin 21 not connected
[ 2.190565][ T0] apic 0 pin 22 not connected
[ 2.191561][ T0] apic 0 pin 23 not connected
[ 2.192740][ T0] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 2.194099][ T0] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x2113883cfc4, max_idle_ns: 440795203346 ns
[ 2.196354][ T0] Calibrating delay loop (skipped) preset value.. 4589.33 BogoMIPS (lpj=2294668)
[ 2.197346][ T0] pid_max: default: 32768 minimum: 301
[ 2.199643][ T0] LSM: Security Framework initializing
[ 2.200371][ T0] Yama: becoming mindful.
[ 2.201599][ T0] Mount-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[ 2.202407][ T0] Mountpoint-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
Poking KASLR using RDTSC...
[ 2.206585][ T0] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 2.207343][ T0] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
[ 2.208360][ T0] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 2.209355][ T0] Spectre V2 : Mitigation: Full generic retpoline
[ 2.210343][ T0] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 2.211344][ T0] Speculative Store Bypass: Vulnerable
[ 2.212362][ T0] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
[ 2.230374][ T0] Freeing SMP alternatives memory: 40K
[ 2.231962][ T1] smpboot: CPU0: Intel Xeon E312xx (Sandy Bridge) (family: 0x6, model: 0x2a, stepping: 0x1)
[ 2.233793][ T1] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
[ 2.234580][ T1] rcu: Hierarchical SRCU implementation.
[ 2.238888][ T1] NMI watchdog: Perf NMI watchdog permanently disabled
[ 2.239965][ T1] smp: Bringing up secondary CPUs ...
[ 2.242120][ T1] x86: Booting SMP configuration:
[ 2.242355][ T1] .... node #0, CPUs: #1
[ 0.140853][ T0] kvm-clock: cpu 1, msr 377449041, secondary cpu clock
[ 0.140853][ T0] masked ExtINT on CPU#1
[ 0.140853][ T0] smpboot: CPU 1 Converting physical 0 to logical die 1
[ 2.247421][ T17] kvm-guest: stealtime: cpu 1, msr 39d727180
[ 2.248414][ T1] smp: Brought up 1 node, 2 CPUs
[ 2.249352][ T1] smpboot: Max logical packages: 2
[ 2.250010][ T1] smpboot: Total of 2 processors activated (9178.67 BogoMIPS)
[ 2.501557][ T22] node 0 deferred pages initialised in 251ms
[ 2.633580][ T1] allocated 201326592 bytes of page_ext
[ 2.634559][ T1] Node 0, zone DMA: page owner found early allocated 0 pages
[ 2.637514][ T1] Node 0, zone DMA32: page owner found early allocated 0 pages
[ 2.665731][ T1] Node 0, zone Normal: page owner found early allocated 50302 pages
[ 2.666834][ T1] devtmpfs: initialized
[ 2.667853][ T1] x86/mm: Memory block size: 128MB
[ 2.703773][ T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[ 2.704375][ T1] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
[ 2.705771][ T1] pinctrl core: initialized pinctrl subsystem
[ 2.708068][ T1] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[ 2.710285][ T1] audit: initializing netlink subsys (disabled)
[ 2.710675][ T27] audit: type=2000 audit(1639545078.756:1): state=initialized audit_enabled=0 res=1
[ 2.712593][ T1] thermal_sys: Registered thermal governor 'fair_share'
[ 2.712599][ T1] thermal_sys: Registered thermal governor 'bang_bang'
[ 2.713348][ T1] thermal_sys: Registered thermal governor 'step_wise'
[ 2.714348][ T1] thermal_sys: Registered thermal governor 'user_space'
[ 2.715474][ T1] cpuidle: using governor menu
[ 2.718322][ T1] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 2.719335][ T1] PCI: Using configuration type 1 for base access
[ 2.762444][ T1] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
[ 2.764642][ T1] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 2.767543][ T1] cryptd: max_cpu_qlen set to 1000


To reproduce:

# build kernel
cd linux
cp config-5.16.0-rc1-00016-g40966e316f86 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (11.84 kB)
config-5.16.0-rc1-00016-g40966e316f86 (173.53 kB)
job-script (4.52 kB)
dmesg.xz (18.50 kB)
trinity (3.76 kB)
Download all attachments

2021-12-17 17:01:56

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [kthread] 40966e316f: WARNING:at_kernel/sched/core.c:#sched_init

kernel test robot <[email protected]> writes:

> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 40966e316f86b8cfd83abd31ccb4df729309d3e7 ("kthread: Ensure struct kthread is present for all kthreads")
> https://git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git signal-for-v5.17
>
> in testcase: trinity
> version: trinity-x86_64-608712d8-1_20211207
> with following parameters:
>
> runtime: 300s
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> caused below changes (please refer to attached dmesg/kmsg for entire
> log/backtrace):


Ok. That is very weird. I will dig into it.

Silly question is there anything in this testing to cause memory
allocations to fail early in boot?

Eric

2021-12-23 02:48:20

by kernel test robot

[permalink] [raw]
Subject: Re: [kthread] 40966e316f: WARNING:at_kernel/sched/core.c:#sched_init

hi Eric,

On Fri, Dec 17, 2021 at 11:01:41AM -0600, Eric W. Biederman wrote:
> kernel test robot <[email protected]> writes:
>
> > Greeting,
> >
> > FYI, we noticed the following commit (built with gcc-9):
> >
> > commit: 40966e316f86b8cfd83abd31ccb4df729309d3e7 ("kthread: Ensure struct kthread is present for all kthreads")
> > https://git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git signal-for-v5.17
> >
> > in testcase: trinity
> > version: trinity-x86_64-608712d8-1_20211207
> > with following parameters:
> >
> > runtime: 300s
> >
> > test-description: Trinity is a linux system call fuzz tester.
> > test-url: http://codemonkey.org.uk/projects/trinity/
> >
> >
> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> >
> > caused below changes (please refer to attached dmesg/kmsg for entire
> > log/backtrace):
>
>
> Ok. That is very weird. I will dig into it.
>
> Silly question is there anything in this testing to cause memory
> allocations to fail early in boot?

we didn't observe it. actually test could finish for both this
commit (though with reported warning) and parent. the only diff seems
the reported warning.

cead18552660702a 40966e316f86b8cfd83abd31ccb
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:8 100% 8:8 dmesg.RIP:sched_init
:8 100% 8:8 dmesg.WARNING:at_kernel/sched/core.c:#sched_init

but could you check in dmesg we attached in original report for any suspicous
memory issue?

or could you help check the config we used (also attached in original report)
to see if we can make some changes then to expose the possible memory issue,
in case our current config cannot capture? Thanks



>
> Eric

2021-12-23 03:34:51

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [kthread] 40966e316f: WARNING:at_kernel/sched/core.c:#sched_init

Oliver Sang <[email protected]> writes:

> hi Eric,
>
> On Fri, Dec 17, 2021 at 11:01:41AM -0600, Eric W. Biederman wrote:
>> kernel test robot <[email protected]> writes:
>>
>> > Greeting,
>> >
>> > FYI, we noticed the following commit (built with gcc-9):
>> >
>> > commit: 40966e316f86b8cfd83abd31ccb4df729309d3e7 ("kthread: Ensure struct kthread is present for all kthreads")
>> > https://git.kernel.org/cgit/linux/kernel/git/ebiederm/user-namespace.git signal-for-v5.17
>> >
>> > in testcase: trinity
>> > version: trinity-x86_64-608712d8-1_20211207
>> > with following parameters:
>> >
>> > runtime: 300s
>> >
>> > test-description: Trinity is a linux system call fuzz tester.
>> > test-url: http://codemonkey.org.uk/projects/trinity/
>> >
>> >
>> > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>> >
>> > caused below changes (please refer to attached dmesg/kmsg for entire
>> > log/backtrace):
>>
>>
>> Ok. That is very weird. I will dig into it.
>>
>> Silly question is there anything in this testing to cause memory
>> allocations to fail early in boot?
>
> we didn't observe it. actually test could finish for both this
> commit (though with reported warning) and parent. the only diff seems
> the reported warning.
>
> cead18552660702a 40966e316f86b8cfd83abd31ccb
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :8 100% 8:8 dmesg.RIP:sched_init
> :8 100% 8:8 dmesg.WARNING:at_kernel/sched/core.c:#sched_init
>
> but could you check in dmesg we attached in original report for any suspicous
> memory issue?
>
> or could you help check the config we used (also attached in original report)
> to see if we can make some changes then to expose the possible memory issue,
> in case our current config cannot capture? Thanks

This was my bad. I have pushed the fix out now. I had the test
backwards in the WARN_ON.

My apologies for the noise.

Eric