Linux next-20230517 build with clang nightly for i386 boot fails intermittently.
- i386: boot/clang-nightly-lkftconfig - failed
Please find detailed log here,
boot log:
=====
[ 0.000000] Linux version 6.4.0-rc2-next-20230517 (tuxmake@tuxmake)
(Debian clang version 17.0.0
(++20230507093322+728b8a139804-1~exp1~20230507093335.652), Debian LLD
17.0.0) #1 SMP PREEMPT_DYNAMIC @1684291209
<6>[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
<6>[ 0.000000] signal: max sigframe size: 1440
<6>[ 0.000000] BIOS-provided physical RAM map:
<6>[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
<6>[ 0.000000] BIOS-e820: [mem
0x000000000009fc00-0x000000000009ffff] reserved
<6>[ 0.000000] BIOS-e820: [mem
0x00000000000f0000-0x00000000000fffff] reserved
<6>[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdefff] usable
<6>[ 0.000000] BIOS-e820: [mem
0x000000007ffdf000-0x000000007fffffff] reserved
<6>[ 0.000000] BIOS-e820: [mem
0x00000000b0000000-0x00000000bfffffff] reserved
<6>[ 0.000000] BIOS-e820: [mem
0x00000000fed1c000-0x00000000fed1ffff] reserved
<6>[ 0.000000] BIOS-e820: [mem
0x00000000fffc0000-0x00000000ffffffff] reserved
<6>[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
<5>[ 0.000000] random: crng init done
<5>[ 0.000000] Notice: NX (Execute Disable) protection cannot be
enabled: non-PAE kernel!
<7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable
<7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable
<6>[ 0.000000] extended physical RAM map:
<6>[ 0.000000] reserve setup_data: [mem
0x0000000000000000-0x000000000009fbff] usable
<6>[ 0.000000] reserve setup_data: [mem
0x000000000009fc00-0x000000000009ffff] reserved
<6>[ 0.000000] reserve setup_data: [mem
0x00000000000f0000-0x00000000000fffff] reserved
<6>[ 0.000000] reserve setup_data: [mem
0x0000000000100000-0x0000000000cfce1f] usable
<6>[ 0.000000] reserve setup_data: [mem
0x0000000000cfce20-0x0000000000cfce2f] usable
<6>[ 0.000000] reserve setup_data: [mem
0x0000000000cfce30-0x000000007ffdefff] usable
<6>[ 0.000000] reserve setup_data: [mem
0x000000007ffdf000-0x000000007fffffff] reserved
<6>[ 0.000000] reserve setup_data: [mem
0x00000000b0000000-0x00000000bfffffff] reserved
<6>[ 0.000000] reserve setup_data: [mem
0x00000000fed1c000-0x00000000fed1ffff] reserved
<6>[ 0.000000] reserve setup_data: [mem
0x00000000fffc0000-0x00000000ffffffff] reserved
<6>[ 0.000000] reserve setup_data: [mem
0x0000000100000000-0x000000017fffffff] usable
<6>[ 0.000000] SMBIOS 2.8 present.
<6>[ 0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.14.0-2 04/01/2014
<6>[ 0.000000] tsc: Fast TSC calibration using PIT
<6>[ 0.000000] tsc: Detected 3000.002 MHz processor
<7>[ 0.009140] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
<7>[ 0.009369] e820: remove [mem 0x000a0000-0x000fffff] usable
<6>[ 0.009688] last_pfn = 0x7ffdf max_arch_pfn = 0x100000
<6>[ 0.010313] x86/PAT: PAT not supported by the CPU.
<6>[ 0.010476] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
<6>[ 0.028035] found SMP MP-table at [mem 0x000f5ce0-0x000f5cef]
<7>[ 0.029050] initial memory mapped: [mem 0x00000000-0x18ffffff]
<6>[ 0.032814] ACPI: Early table checksum verification disabled
<6>[ 0.033316] ACPI: RSDP 0x00000000000F5B10 000014 (v00 BOCHS )
<6>[ 0.033820] ACPI: RSDT 0x000000007FFE2319 000038 (v01 BOCHS
BXPC 00000001 BXPC 00000001)
<6>[ 0.034477] ACPI: FACP 0x000000007FFE2109 0000F4 (v03 BOCHS
BXPC 00000001 BXPC 00000001)
<6>[ 0.035093] ACPI: DSDT 0x000000007FFE0040 0020C9 (v01 BOCHS
BXPC 00000001 BXPC 00000001)
<6>[ 0.035191] ACPI: FACS 0x000000007FFE0000 000040
<6>[ 0.035258] ACPI: APIC 0x000000007FFE21FD 000080 (v01 BOCHS
BXPC 00000001 BXPC 00000001)
<6>[ 0.035306] ACPI: HPET 0x000000007FFE227D 000038 (v01 BOCHS
BXPC 00000001 BXPC 00000001)
<6>[ 0.035336] ACPI: MCFG 0x000000007FFE22B5 00003C (v01 BOCHS
BXPC 00000001 BXPC 00000001)
<6>[ 0.035365] ACPI: WAET 0x000000007FFE22F1 000028 (v01 BOCHS
BXPC 00000001 BXPC 00000001)
<6>[ 0.035470] ACPI: Reserving FACP table memory at [mem
0x7ffe2109-0x7ffe21fc]
<6>[ 0.035522] ACPI: Reserving DSDT table memory at [mem
0x7ffe0040-0x7ffe2108]
<6>[ 0.035533] ACPI: Reserving FACS table memory at [mem
0x7ffe0000-0x7ffe003f]
<6>[ 0.035542] ACPI: Reserving APIC table memory at [mem
0x7ffe21fd-0x7ffe227c]
<6>[ 0.035550] ACPI: Reserving HPET table memory at [mem
0x7ffe227d-0x7ffe22b4]
<6>[ 0.035558] ACPI: Reserving MCFG table memory at [mem
0x7ffe22b5-0x7ffe22f0]
<6>[ 0.035566] ACPI: Reserving WAET table memory at [mem
0x7ffe22f1-0x7ffe2318]
<5>[ 0.036775] 1167MB HIGHMEM available.
<5>[ 0.036823] 879MB LOWMEM available.
<6>[ 0.036848] mapped low ram: 0 - 36ffe000
<6>[ 0.036864] low ram: 0 - 36ffe000
<6>[ 0.037465] Zone ranges:
<6>[ 0.037490] DMA [mem 0x0000000000001000-0x0000000000ffffff]
<6>[ 0.037576] Normal [mem 0x0000000001000000-0x0000000036ffdfff]
<6>[ 0.037591] HighMem [mem 0x0000000036ffe000-0x000000007ffdefff]
<6>[ 0.037610] Movable zone start for each node
<6>[ 0.037625] Early memory node ranges
<6>[ 0.037668] node 0: [mem 0x0000000000001000-0x000000000009efff]
<6>[ 0.037741] node 0: [mem 0x0000000000100000-0x000000007ffdefff]
<6>[ 0.037912] Initmem setup node 0 [mem
0x0000000000001000-0x000000007ffdefff]
<6>[ 0.039699] On node 0, zone DMA: 1 pages in unavailable ranges
<6>[ 0.040493] On node 0, zone DMA: 97 pages in unavailable ranges
<6>[ 0.068623] Using APIC driver default
<6>[ 0.069096] ACPI: PM-Timer IO Port: 0x608
<6>[ 0.069764] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
<6>[ 0.070159] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
<6>[ 0.070278] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
<6>[ 0.070576] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
<6>[ 0.070611] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
<6>[ 0.070722] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
<6>[ 0.070735] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
<6>[ 0.070968] ACPI: Using ACPI (MADT) for SMP configuration information
<6>[ 0.071240] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
<6>[ 0.072248] PM: hibernation: Registered nosave memory: [mem
0x00000000-0x00000fff]
<6>[ 0.072334] PM: hibernation: Registered nosave memory: [mem
0x0009f000-0x0009ffff]
<6>[ 0.072384] PM: hibernation: Registered nosave memory: [mem
0x000a0000-0x000effff]
<6>[ 0.072394] PM: hibernation: Registered nosave memory: [mem
0x000f0000-0x000fffff]
<6>[ 0.072405] PM: hibernation: Registered nosave memory: [mem
0x00cfc000-0x00cfcfff]
<6>[ 0.072417] PM: hibernation: Registered nosave memory: [mem
0x00cfc000-0x00cfcfff]
<6>[ 0.072636] [mem 0xc0000000-0xfed1bfff] available for PCI devices
<6>[ 0.072683] Booting paravirtualized kernel on bare hardware
<6>[ 0.073201] clocksource: refined-jiffies: mask: 0xffffffff
max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
<6>[ 0.073674] setup_percpu: NR_CPUS:8 nr_cpumask_bits:2
nr_cpu_ids:2 nr_node_ids:1
<6>[ 0.075154] percpu: Embedded 31 pages/cpu s96372 r0 d30604 u126976
<7>[ 0.075652] pcpu-alloc: s96372 r0 d30604 u126976 alloc=31*4096
<7>[ 0.075742] pcpu-alloc: [0] 0 [0] 1
<5>[ 0.077147] Kernel command line: console=ttyS0,115200 rootwait
root=/dev/sda debug verbose console_msg_format=syslog earlycon
<5>[ 0.078730] Unknown kernel command line parameters \"verbose\",
will be passed to user space.
<6>[ 0.080340] Dentry cache hash table entries: 131072 (order: 7,
524288 bytes, linear)
<6>[ 0.081003] Inode-cache hash table entries: 65536 (order: 6,
262144 bytes, linear)
<6>[ 0.082840] Built 1 zonelists, mobility grouping on. Total pages: 521957
<6>[ 0.088082] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
<6>[ 0.088308] Initializing HighMem for node 0 (00036ffe:0007ffdf)
<6>[ 0.454727] Initializing Movable for node 0 (00000000:00000000)
<6>[ 0.459790] Checking if this processor honours the WP bit even
in supervisor mode...Ok.
<6>[ 0.460945] Memory: 2046372K/2096628K available (15688K kernel
code, 2173K rwdata, 5772K rodata, 996K init, 780K bss, 50256K
reserved, 0K cma-reserved, 1195908K highmem)
<6>[ 0.466051] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
<6>[ 0.468048] ftrace: allocating 54241 entries in 106 pages
<6>[ 0.511020] ftrace: allocated 106 pages with 4 groups
<6>[ 0.514727] trace event string verifier disabled
<6>[ 0.519501] Dynamic Preempt: voluntary
<6>[ 0.522734] rcu: Preemptible hierarchical RCU implementation.
<6>[ 0.522756] rcu: RCU event tracing is enabled.
<6>[ 0.522785] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
<6>[ 0.522916] Trampoline variant of Tasks RCU enabled.
<6>[ 0.522927] Rude variant of Tasks RCU enabled.
<6>[ 0.523017] rcu: RCU calculated value of scheduler-enlistment
delay is 100 jiffies.
<6>[ 0.523045] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
<6>[ 0.538503] NR_IRQS: 2304, nr_irqs: 440, preallocated irqs: 16
<6>[ 0.543029] rcu: srcu_init: Setting srcu_struct sizes based on contention.
<6>[ 0.546346] kfence: initialized - using 2097152 bytes for 255
objects at 0x(ptrval)-0x(ptrval)
<6>[ 0.551378] Console: colour VGA+ 80x25
<6>[ 0.553103] printk: console [ttyS0] enabled
<6>[ 0.626471] ACPI: Core revision 20230331
<6>[ 0.637882] APIC: Switch to symmetric I/O mode setup
<6>[ 0.638348] Enabling APIC mode: Flat. Using 1 I/O APICs
<6>[ 0.644118] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
<6>[ 0.650790] clocksource: tsc-early: mask: 0xffffffffffffffff
max_cycles: 0x2b3e46fa93c, max_idle_ns: 440795237232 ns
<6>[ 0.652683] Calibrating delay loop (skipped), value calculated
using timer frequency.. 6000.00 BogoMIPS (lpj=3000002)
<6>[ 0.654599] pid_max: default: 32768 minimum: 301
<6>[ 0.656733] LSM: initializing lsm=capability,selinux,integrity
<6>[ 0.657713] SELinux: Initializing.
<6>[ 0.660732] Mount-cache hash table entries: 2048 (order: 1, 8192
bytes, linear)
<6>[ 0.661233] Mountpoint-cache hash table entries: 2048 (order: 1,
8192 bytes, linear)
<6>[ 0.682197] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
<6>[ 0.682458] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
<6>[ 0.682670] Spectre V1 : Mitigation: usercopy/swapgs barriers
and __user pointer sanitization
<6>[ 0.684632] Spectre V2 : Mitigation: Retpolines
<6>[ 0.684930] Spectre V2 : Spectre v2 / SpectreRSB mitigation:
Filling RSB on context switch
<6>[ 0.685539] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
<4>[ 0.686934] L1TF: Kernel not compiled for PAE. No mitigation for L1TF
<6>[ 0.688097] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
<6>[ 0.688548] MMIO Stale Data: Unknown: No mitigations
<0>[ 0.692458] __common_interrupt: 0.79 No irq handler for vector
<0>[ 0.692458] __common_interrupt: 0.160 No irq handler for vector
<1>[ 0.692458] BUG: unable to handle page fault for address: 000024c0
<1>[ 0.692458] #PF: user read access in kernel mode
<1>[ 0.692458] #PF: error_code(0x0004) - not-present page
<6>[ 0.692458] *pde = 00000000
<4>[ 0.692458] Oops: 0004 [#1] PREEMPT SMP
<4>[ 0.692458] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
6.4.0-rc2-next-20230517 #1
<4>[ 0.692458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS 1.14.0-2 04/01/2014
<4>[ 0.692458] EIP: 0x60
<4>[ 0.692458] Code: Unable to access opcode bytes at 0x36.
<4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0
<4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0
<4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92
<4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0
<4>[ 0.692458] Call Trace:
<4>[ 0.692458] ? vscnprintf+0x12/0x30
<4>[ 0.692458] ? prb_read_valid+0x31/0x50
<4>[ 0.692458] ? console_unlock+0x94/0xc0
<4>[ 0.692458] ? vprintk_emit+0xf4/0x230
<4>[ 0.692458] ? vprintk_emit+0xf4/0x230
<4>[ 0.692458] ? vprintk_default+0x15/0x20
<4>[ 0.692458] ? __is_insn_slot_addr+0x41/0x50
<4>[ 0.692458] ? kernel_text_address+0x7c/0x90
<4>[ 0.692458] ? __kernel_text_address+0x10/0x40
<4>[ 0.692458] ? show_trace_log_lvl.17+0x1f5/0x2f0
<4>[ 0.692458] ? __die_body+0x54/0x90
<4>[ 0.692458] ? __die+0x70/0x80
<4>[ 0.692458] ? page_fault_oops+0x2d3/0x310
<4>[ 0.692458] ? kernelmode_fixup_or_oops+0xa5/0xc0
<4>[ 0.692458] ? __bad_area_nosemaphore+0x59/0x1d0
<4>[ 0.692458] ? bad_area_nosemaphore+0x12/0x20
<4>[ 0.692458] ? do_user_addr_fault+0x156/0x400
<4>[ 0.692458] ? irqentry_enter+0x32/0x38
<4>[ 0.692458] ? trace_hardirqs_off_finish+0x2c/0x70
<4>[ 0.692458] ? exc_page_fault+0x50/0x154
<4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc
<4>[ 0.692458] ? handle_exception+0x133/0x133
<4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc
<4>[ 0.692458] Modules linked in:
<4>[ 0.692458] CR2: 00000000000024c0
<4>[ 0.692458] ---[ end trace 0000000000000000 ]---
<4>[ 0.692458] EIP: 0x60
<4>[ 0.692458] Code: Unable to access opcode bytes at 0x36.
<4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0
<4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0
<4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92
<4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0
<0>[ 0.692458] Kernel panic - not syncing: Attempted to kill the idle task!
<0>[ 0.692458] ---[ end Kernel panic - not syncing: Attempted to
kill the idle task! ]---
poweroff
logs:
===
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230517/testrun/17029757/suite/boot/test/clang-nightly-lkftconfig/log
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2PtyoY5qEJrKsuIBPRVSIGK02m3
Steps to reproduce intermittently,
=============
# To install tuxrun on your system globally:
# sudo pip3 install -U tuxrun==0.42.0
#
# See https://tuxrun.org/ for complete documentation.
tuxrun \
--runtime podman \
--device qemu-i386 \
--kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/bzImage
\
--modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/modules.tar.xz
\
--rootfs https://storage.tuxsuite.com/public/linaro/lkft/oebuilds/2PeQhkn36JqPowQBZge0AFhBav6/images/intel-core2-32/lkft-tux-image-intel-core2-32-20230511145016.rootfs.ext4.gz
\
--parameters SKIPFILE=skipfile-lkft.yaml \
--parameters SHARD_NUMBER=10 \
--parameters SHARD_INDEX=2 \
--image docker.io/lavasoftware/lava-dispatcher:2023.01.0020.gc1598238f \
--tests ltp-syscalls \
--timeouts boot=15 ltp-syscalls=50
--
Linaro LKFT
https://lkft.linaro.org
On Wed, May 17, 2023 at 8:21 AM Naresh Kamboju
<[email protected]> wrote:
>
> Linux next-20230517 build with clang nightly for i386 boot fails intermittently.
Keyword: intermittently. That will make tracking this down fun.
Our CI also hit a boot failure on tip/master with the same splat:
https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4998374271/jobs/8957285746
Though the CI pulled down a SHA
0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
which looks weird on
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
>> Notice: this object is not reachable from any branch.
That this failed in -next and -tip in the same way makes me wonder if
something affecting this is coming in via -tip? Maybe the splat looks
familiar to x86 folks?
I haven't been able to reproduce locally when my machine is relatively
load-less. If I do a kernel build in the background, I was able to
get QEMU to hang, but without any splat. That was using tip/master @
f81d8f759e7f.
Naresh, when you say "intermittent" do you have any data on the
relative frequency of this boot failure? (Also, please make sure to
use [email protected] in the future; we moved mailing lists years
ago).
Looks like our CI report linked above has an additional splat though
via apply_alternatives and optimize_nops.
>> [ 0.166742] Code: Unable to access opcode bytes at 0x36.
Peter, that smells like perhaps either:
commit b6c881b248ef ("x86/alternative: Complicate optimize_nops() some more")
commit 6c480f222128 ("x86/alternative: Rewrite optimize_nops() some")
Looks like BP committed them May 11; maybe just recently they were
merged into tip/master?
> - i386: boot/clang-nightly-lkftconfig - failed
>
> Please find detailed log here,
>
> boot log:
> =====
> [ 0.000000] Linux version 6.4.0-rc2-next-20230517 (tuxmake@tuxmake)
> (Debian clang version 17.0.0
> (++20230507093322+728b8a139804-1~exp1~20230507093335.652), Debian LLD
> 17.0.0) #1 SMP PREEMPT_DYNAMIC @1684291209
> <6>[ 0.000000] x86/fpu: x87 FPU will use FXSAVE
> <6>[ 0.000000] signal: max sigframe size: 1440
> <6>[ 0.000000] BIOS-provided physical RAM map:
> <6>[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
> <6>[ 0.000000] BIOS-e820: [mem
> 0x000000000009fc00-0x000000000009ffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000000f0000-0x00000000000fffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffdefff] usable
> <6>[ 0.000000] BIOS-e820: [mem
> 0x000000007ffdf000-0x000000007fffffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000b0000000-0x00000000bfffffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000fed1c000-0x00000000fed1ffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem
> 0x00000000fffc0000-0x00000000ffffffff] reserved
> <6>[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000017fffffff] usable
> <5>[ 0.000000] random: crng init done
> <5>[ 0.000000] Notice: NX (Execute Disable) protection cannot be
> enabled: non-PAE kernel!
> <7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable
> <7>[ 0.000000] e820: update [mem 0x00cfce20-0x00cfce2f] usable ==> usable
> <6>[ 0.000000] extended physical RAM map:
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000000000-0x000000000009fbff] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x000000000009fc00-0x000000000009ffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000000f0000-0x00000000000fffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000100000-0x0000000000cfce1f] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000cfce20-0x0000000000cfce2f] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000000cfce30-0x000000007ffdefff] usable
> <6>[ 0.000000] reserve setup_data: [mem
> 0x000000007ffdf000-0x000000007fffffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000b0000000-0x00000000bfffffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000fed1c000-0x00000000fed1ffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x00000000fffc0000-0x00000000ffffffff] reserved
> <6>[ 0.000000] reserve setup_data: [mem
> 0x0000000100000000-0x000000017fffffff] usable
> <6>[ 0.000000] SMBIOS 2.8 present.
> <6>[ 0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.14.0-2 04/01/2014
> <6>[ 0.000000] tsc: Fast TSC calibration using PIT
> <6>[ 0.000000] tsc: Detected 3000.002 MHz processor
> <7>[ 0.009140] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
> <7>[ 0.009369] e820: remove [mem 0x000a0000-0x000fffff] usable
> <6>[ 0.009688] last_pfn = 0x7ffdf max_arch_pfn = 0x100000
> <6>[ 0.010313] x86/PAT: PAT not supported by the CPU.
> <6>[ 0.010476] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
> <6>[ 0.028035] found SMP MP-table at [mem 0x000f5ce0-0x000f5cef]
> <7>[ 0.029050] initial memory mapped: [mem 0x00000000-0x18ffffff]
> <6>[ 0.032814] ACPI: Early table checksum verification disabled
> <6>[ 0.033316] ACPI: RSDP 0x00000000000F5B10 000014 (v00 BOCHS )
> <6>[ 0.033820] ACPI: RSDT 0x000000007FFE2319 000038 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.034477] ACPI: FACP 0x000000007FFE2109 0000F4 (v03 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035093] ACPI: DSDT 0x000000007FFE0040 0020C9 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035191] ACPI: FACS 0x000000007FFE0000 000040
> <6>[ 0.035258] ACPI: APIC 0x000000007FFE21FD 000080 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035306] ACPI: HPET 0x000000007FFE227D 000038 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035336] ACPI: MCFG 0x000000007FFE22B5 00003C (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035365] ACPI: WAET 0x000000007FFE22F1 000028 (v01 BOCHS
> BXPC 00000001 BXPC 00000001)
> <6>[ 0.035470] ACPI: Reserving FACP table memory at [mem
> 0x7ffe2109-0x7ffe21fc]
> <6>[ 0.035522] ACPI: Reserving DSDT table memory at [mem
> 0x7ffe0040-0x7ffe2108]
> <6>[ 0.035533] ACPI: Reserving FACS table memory at [mem
> 0x7ffe0000-0x7ffe003f]
> <6>[ 0.035542] ACPI: Reserving APIC table memory at [mem
> 0x7ffe21fd-0x7ffe227c]
> <6>[ 0.035550] ACPI: Reserving HPET table memory at [mem
> 0x7ffe227d-0x7ffe22b4]
> <6>[ 0.035558] ACPI: Reserving MCFG table memory at [mem
> 0x7ffe22b5-0x7ffe22f0]
> <6>[ 0.035566] ACPI: Reserving WAET table memory at [mem
> 0x7ffe22f1-0x7ffe2318]
> <5>[ 0.036775] 1167MB HIGHMEM available.
> <5>[ 0.036823] 879MB LOWMEM available.
> <6>[ 0.036848] mapped low ram: 0 - 36ffe000
> <6>[ 0.036864] low ram: 0 - 36ffe000
> <6>[ 0.037465] Zone ranges:
> <6>[ 0.037490] DMA [mem 0x0000000000001000-0x0000000000ffffff]
> <6>[ 0.037576] Normal [mem 0x0000000001000000-0x0000000036ffdfff]
> <6>[ 0.037591] HighMem [mem 0x0000000036ffe000-0x000000007ffdefff]
> <6>[ 0.037610] Movable zone start for each node
> <6>[ 0.037625] Early memory node ranges
> <6>[ 0.037668] node 0: [mem 0x0000000000001000-0x000000000009efff]
> <6>[ 0.037741] node 0: [mem 0x0000000000100000-0x000000007ffdefff]
> <6>[ 0.037912] Initmem setup node 0 [mem
> 0x0000000000001000-0x000000007ffdefff]
> <6>[ 0.039699] On node 0, zone DMA: 1 pages in unavailable ranges
> <6>[ 0.040493] On node 0, zone DMA: 97 pages in unavailable ranges
> <6>[ 0.068623] Using APIC driver default
> <6>[ 0.069096] ACPI: PM-Timer IO Port: 0x608
> <6>[ 0.069764] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
> <6>[ 0.070159] IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
> <6>[ 0.070278] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> <6>[ 0.070576] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level)
> <6>[ 0.070611] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
> <6>[ 0.070722] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level)
> <6>[ 0.070735] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level)
> <6>[ 0.070968] ACPI: Using ACPI (MADT) for SMP configuration information
> <6>[ 0.071240] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
> <6>[ 0.072248] PM: hibernation: Registered nosave memory: [mem
> 0x00000000-0x00000fff]
> <6>[ 0.072334] PM: hibernation: Registered nosave memory: [mem
> 0x0009f000-0x0009ffff]
> <6>[ 0.072384] PM: hibernation: Registered nosave memory: [mem
> 0x000a0000-0x000effff]
> <6>[ 0.072394] PM: hibernation: Registered nosave memory: [mem
> 0x000f0000-0x000fffff]
> <6>[ 0.072405] PM: hibernation: Registered nosave memory: [mem
> 0x00cfc000-0x00cfcfff]
> <6>[ 0.072417] PM: hibernation: Registered nosave memory: [mem
> 0x00cfc000-0x00cfcfff]
> <6>[ 0.072636] [mem 0xc0000000-0xfed1bfff] available for PCI devices
> <6>[ 0.072683] Booting paravirtualized kernel on bare hardware
> <6>[ 0.073201] clocksource: refined-jiffies: mask: 0xffffffff
> max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
> <6>[ 0.073674] setup_percpu: NR_CPUS:8 nr_cpumask_bits:2
> nr_cpu_ids:2 nr_node_ids:1
> <6>[ 0.075154] percpu: Embedded 31 pages/cpu s96372 r0 d30604 u126976
> <7>[ 0.075652] pcpu-alloc: s96372 r0 d30604 u126976 alloc=31*4096
> <7>[ 0.075742] pcpu-alloc: [0] 0 [0] 1
> <5>[ 0.077147] Kernel command line: console=ttyS0,115200 rootwait
> root=/dev/sda debug verbose console_msg_format=syslog earlycon
> <5>[ 0.078730] Unknown kernel command line parameters \"verbose\",
> will be passed to user space.
> <6>[ 0.080340] Dentry cache hash table entries: 131072 (order: 7,
> 524288 bytes, linear)
> <6>[ 0.081003] Inode-cache hash table entries: 65536 (order: 6,
> 262144 bytes, linear)
> <6>[ 0.082840] Built 1 zonelists, mobility grouping on. Total pages: 521957
> <6>[ 0.088082] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
> <6>[ 0.088308] Initializing HighMem for node 0 (00036ffe:0007ffdf)
> <6>[ 0.454727] Initializing Movable for node 0 (00000000:00000000)
> <6>[ 0.459790] Checking if this processor honours the WP bit even
> in supervisor mode...Ok.
> <6>[ 0.460945] Memory: 2046372K/2096628K available (15688K kernel
> code, 2173K rwdata, 5772K rodata, 996K init, 780K bss, 50256K
> reserved, 0K cma-reserved, 1195908K highmem)
> <6>[ 0.466051] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> <6>[ 0.468048] ftrace: allocating 54241 entries in 106 pages
> <6>[ 0.511020] ftrace: allocated 106 pages with 4 groups
> <6>[ 0.514727] trace event string verifier disabled
> <6>[ 0.519501] Dynamic Preempt: voluntary
> <6>[ 0.522734] rcu: Preemptible hierarchical RCU implementation.
> <6>[ 0.522756] rcu: RCU event tracing is enabled.
> <6>[ 0.522785] rcu: RCU restricting CPUs from NR_CPUS=8 to nr_cpu_ids=2.
> <6>[ 0.522916] Trampoline variant of Tasks RCU enabled.
> <6>[ 0.522927] Rude variant of Tasks RCU enabled.
> <6>[ 0.523017] rcu: RCU calculated value of scheduler-enlistment
> delay is 100 jiffies.
> <6>[ 0.523045] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
> <6>[ 0.538503] NR_IRQS: 2304, nr_irqs: 440, preallocated irqs: 16
> <6>[ 0.543029] rcu: srcu_init: Setting srcu_struct sizes based on contention.
> <6>[ 0.546346] kfence: initialized - using 2097152 bytes for 255
> objects at 0x(ptrval)-0x(ptrval)
> <6>[ 0.551378] Console: colour VGA+ 80x25
> <6>[ 0.553103] printk: console [ttyS0] enabled
> <6>[ 0.626471] ACPI: Core revision 20230331
> <6>[ 0.637882] APIC: Switch to symmetric I/O mode setup
> <6>[ 0.638348] Enabling APIC mode: Flat. Using 1 I/O APICs
> <6>[ 0.644118] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> <6>[ 0.650790] clocksource: tsc-early: mask: 0xffffffffffffffff
> max_cycles: 0x2b3e46fa93c, max_idle_ns: 440795237232 ns
> <6>[ 0.652683] Calibrating delay loop (skipped), value calculated
> using timer frequency.. 6000.00 BogoMIPS (lpj=3000002)
> <6>[ 0.654599] pid_max: default: 32768 minimum: 301
> <6>[ 0.656733] LSM: initializing lsm=capability,selinux,integrity
> <6>[ 0.657713] SELinux: Initializing.
> <6>[ 0.660732] Mount-cache hash table entries: 2048 (order: 1, 8192
> bytes, linear)
> <6>[ 0.661233] Mountpoint-cache hash table entries: 2048 (order: 1,
> 8192 bytes, linear)
> <6>[ 0.682197] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
> <6>[ 0.682458] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
> <6>[ 0.682670] Spectre V1 : Mitigation: usercopy/swapgs barriers
> and __user pointer sanitization
> <6>[ 0.684632] Spectre V2 : Mitigation: Retpolines
> <6>[ 0.684930] Spectre V2 : Spectre v2 / SpectreRSB mitigation:
> Filling RSB on context switch
> <6>[ 0.685539] Spectre V2 : Spectre v2 / SpectreRSB : Filling RSB on VMEXIT
> <4>[ 0.686934] L1TF: Kernel not compiled for PAE. No mitigation for L1TF
> <6>[ 0.688097] MDS: Vulnerable: Clear CPU buffers attempted, no microcode
> <6>[ 0.688548] MMIO Stale Data: Unknown: No mitigations
> <0>[ 0.692458] __common_interrupt: 0.79 No irq handler for vector
> <0>[ 0.692458] __common_interrupt: 0.160 No irq handler for vector
> <1>[ 0.692458] BUG: unable to handle page fault for address: 000024c0
> <1>[ 0.692458] #PF: user read access in kernel mode
> <1>[ 0.692458] #PF: error_code(0x0004) - not-present page
> <6>[ 0.692458] *pde = 00000000
> <4>[ 0.692458] Oops: 0004 [#1] PREEMPT SMP
> <4>[ 0.692458] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
> 6.4.0-rc2-next-20230517 #1
> <4>[ 0.692458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.14.0-2 04/01/2014
> <4>[ 0.692458] EIP: 0x60
> <4>[ 0.692458] Code: Unable to access opcode bytes at 0x36.
> <4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0
> <4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0
> <4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92
> <4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0
> <4>[ 0.692458] Call Trace:
> <4>[ 0.692458] ? vscnprintf+0x12/0x30
> <4>[ 0.692458] ? prb_read_valid+0x31/0x50
> <4>[ 0.692458] ? console_unlock+0x94/0xc0
> <4>[ 0.692458] ? vprintk_emit+0xf4/0x230
> <4>[ 0.692458] ? vprintk_emit+0xf4/0x230
> <4>[ 0.692458] ? vprintk_default+0x15/0x20
> <4>[ 0.692458] ? __is_insn_slot_addr+0x41/0x50
> <4>[ 0.692458] ? kernel_text_address+0x7c/0x90
> <4>[ 0.692458] ? __kernel_text_address+0x10/0x40
> <4>[ 0.692458] ? show_trace_log_lvl.17+0x1f5/0x2f0
> <4>[ 0.692458] ? __die_body+0x54/0x90
> <4>[ 0.692458] ? __die+0x70/0x80
> <4>[ 0.692458] ? page_fault_oops+0x2d3/0x310
> <4>[ 0.692458] ? kernelmode_fixup_or_oops+0xa5/0xc0
> <4>[ 0.692458] ? __bad_area_nosemaphore+0x59/0x1d0
> <4>[ 0.692458] ? bad_area_nosemaphore+0x12/0x20
> <4>[ 0.692458] ? do_user_addr_fault+0x156/0x400
> <4>[ 0.692458] ? irqentry_enter+0x32/0x38
> <4>[ 0.692458] ? trace_hardirqs_off_finish+0x2c/0x70
> <4>[ 0.692458] ? exc_page_fault+0x50/0x154
> <4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc
> <4>[ 0.692458] ? handle_exception+0x133/0x133
> <4>[ 0.692458] ? pvclock_clocksource_read_nowd+0xdc/0xdc
> <4>[ 0.692458] Modules linked in:
> <4>[ 0.692458] CR2: 00000000000024c0
> <4>[ 0.692458] ---[ end trace 0000000000000000 ]---
> <4>[ 0.692458] EIP: 0x60
> <4>[ 0.692458] Code: Unable to access opcode bytes at 0x36.
> <4>[ 0.692458] EAX: 00000060 EBX: 00000000 ECX: b51e0945 EDX: 006afcf0
> <4>[ 0.692458] ESI: 00000060 EDI: 00000020 EBP: d86f7c04 ESP: d86f7ce0
> <4>[ 0.692458] DS: 0000 ES: 0000 FS: 0000 GS: 0000 SS: 7c8c EFLAGS: 00324b92
> <4>[ 0.692458] CR0: 80050033 CR2: 000024c0 CR3: 18a49000 CR4: 000006d0
> <0>[ 0.692458] Kernel panic - not syncing: Attempted to kill the idle task!
> <0>[ 0.692458] ---[ end Kernel panic - not syncing: Attempted to
> kill the idle task! ]---
> poweroff
>
>
> logs:
> ===
> - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230517/testrun/17029757/suite/boot/test/clang-nightly-lkftconfig/log
> - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2PtyoY5qEJrKsuIBPRVSIGK02m3
>
>
> Steps to reproduce intermittently,
> =============
> # To install tuxrun on your system globally:
> # sudo pip3 install -U tuxrun==0.42.0
> #
> # See https://tuxrun.org/ for complete documentation.
>
> tuxrun \
> --runtime podman \
> --device qemu-i386 \
> --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/bzImage
> \
> --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2PtylXBvODt99O8r9YKOEwj0fQc/modules.tar.xz
> \
> --rootfs https://storage.tuxsuite.com/public/linaro/lkft/oebuilds/2PeQhkn36JqPowQBZge0AFhBav6/images/intel-core2-32/lkft-tux-image-intel-core2-32-20230511145016.rootfs.ext4.gz
> \
> --parameters SKIPFILE=skipfile-lkft.yaml \
> --parameters SHARD_NUMBER=10 \
> --parameters SHARD_INDEX=2 \
> --image docker.io/lavasoftware/lava-dispatcher:2023.01.0020.gc1598238f \
> --tests ltp-syscalls \
> --timeouts boot=15 ltp-syscalls=50
>
> --
> Linaro LKFT
> https://lkft.linaro.org
--
Thanks,
~Nick Desaulniers
On Wed, 17 May 2023 at 23:42, Nick Desaulniers <[email protected]> wrote:
>
> On Wed, May 17, 2023 at 8:21 AM Naresh Kamboju
> <[email protected]> wrote:
> >
> > Linux next-20230517 build with clang nightly for i386 boot fails intermittently.
>
> Keyword: intermittently. That will make tracking this down fun.
>
> Our CI also hit a boot failure on tip/master with the same splat:
> https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4998374271/jobs/8957285746
> Though the CI pulled down a SHA
> 0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
> which looks weird on
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
> >> Notice: this object is not reachable from any branch.
>
> That this failed in -next and -tip in the same way makes me wonder if
> something affecting this is coming in via -tip? Maybe the splat looks
> familiar to x86 folks?
>
> I haven't been able to reproduce locally when my machine is relatively
> load-less. If I do a kernel build in the background, I was able to
> get QEMU to hang, but without any splat. That was using tip/master @
> f81d8f759e7f.
>
> Naresh, when you say "intermittent" do you have any data on the
> relative frequency of this boot failure? (Also, please make sure to
> use [email protected] in the future; we moved mailing lists years
> ago).
Noted:
It is reproducible only 3 times out of 100 runs.
>
> Looks like our CI report linked above has an additional splat though
> via apply_alternatives and optimize_nops.
>
> >> [ 0.166742] Code: Unable to access opcode bytes at 0x36.
>
> Peter, that smells like perhaps either:
> commit b6c881b248ef ("x86/alternative: Complicate optimize_nops() some more")
> commit 6c480f222128 ("x86/alternative: Rewrite optimize_nops() some")
>
> Looks like BP committed them May 11; maybe just recently they were
> merged into tip/master?
>
>
> > - i386: boot/clang-nightly-lkftconfig - failed
> >
- Naresh
On Wed, May 17, 2023 at 11:11:51AM -0700, Nick Desaulniers wrote:
> On Wed, May 17, 2023 at 8:21 AM Naresh Kamboju
> <[email protected]> wrote:
> >
> > Linux next-20230517 build with clang nightly for i386 boot fails intermittently.
>
> Keyword: intermittently. That will make tracking this down fun.
>
> Our CI also hit a boot failure on tip/master with the same splat:
> https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4998374271/jobs/8957285746
> Though the CI pulled down a SHA
> 0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
> which looks weird on
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
> >> Notice: this object is not reachable from any branch.
Github isn't willing to show me content unless I log in or somesuch
nonsense.
> That this failed in -next and -tip in the same way makes me wonder if
> something affecting this is coming in via -tip? Maybe the splat looks
> familiar to x86 folks?
>
> I haven't been able to reproduce locally when my machine is relatively
> load-less. If I do a kernel build in the background, I was able to
> get QEMU to hang, but without any splat. That was using tip/master @
> f81d8f759e7f.
>
> Naresh, when you say "intermittent" do you have any data on the
> relative frequency of this boot failure? (Also, please make sure to
> use [email protected] in the future; we moved mailing lists years
> ago).
>
> Looks like our CI report linked above has an additional splat though
> via apply_alternatives and optimize_nops.
>
> >> [ 0.166742] Code: Unable to access opcode bytes at 0x36.
>
> Peter, that smells like perhaps either:
> commit b6c881b248ef ("x86/alternative: Complicate optimize_nops() some more")
> commit 6c480f222128 ("x86/alternative: Rewrite optimize_nops() some")
So I did find me a 'funny' there, but nothing that explains boot fail.
It would think that 'PAUSE' is a 2 byte NOP and replace it with NOP2;
which is not quite the same thing. The below seems to cure that.
Let me continue poking at things...
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 93aa95afd005..bb0a7b03e52f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -159,9 +160,12 @@ void text_poke_early(void *addr, const void *opcode, size_t len);
*/
static bool insn_is_nop(struct insn *insn)
{
- if (insn->opcode.bytes[0] == 0x90)
+ /* Anything NOP, but not REP NOP. */
+ if (insn->opcode.bytes[0] == 0x90 &&
+ (!insn->prefixes.nbytes || insn->prefixes.bytes[0] != 0xF3))
return true;
+ /* NOPL */
if (insn->opcode.bytes[0] == 0x0F && insn->opcode.bytes[1] == 0x1F)
return true;
On Wed, May 24, 2023 at 6:01 AM Peter Zijlstra <[email protected]> wrote:
>
> On Wed, May 17, 2023 at 11:11:51AM -0700, Nick Desaulniers wrote:
> > On Wed, May 17, 2023 at 8:21 AM Naresh Kamboju
> > <[email protected]> wrote:
> > >
> > > Linux next-20230517 build with clang nightly for i386 boot fails intermittently.
> >
> > Keyword: intermittently. That will make tracking this down fun.
> >
> > Our CI also hit a boot failure on tip/master with the same splat:
> > https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/4998374271/jobs/8957285746
> > Though the CI pulled down a SHA
> > 0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
> > which looks weird on
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=0932447780e1f9a43bf68ef7fe3d9b41b46d58fc
> > >> Notice: this object is not reachable from any branch.
>
> Github isn't willing to show me content unless I log in or somesuch
> nonsense.
Ah, sorry about that.
https://paste.debian.net/1281050/ should be the log of ours.
https://storage.tuxsuite.com/public/clangbuiltlinux/continuous-integration2/builds/2QEtkwi60Mn3NLX4U0sDCAH0qqp/bzImage
is the corresponding build artifact.
There's ongoing discussion in #x86 on LinuxNet. I suspect that a few
of Naresh's recent reports are all perhaps one single issue.
Arnd mentioned https://lore.kernel.org/all/CA+G9fYvVZ9WF-2zfrYeo3xnWNra0QGxLzei+b4yANZwEvr5CYw@mail.gmail.com/
which looks similar but is with GCC.
Either way, we're seeing this in mainline.
>
> > That this failed in -next and -tip in the same way makes me wonder if
> > something affecting this is coming in via -tip? Maybe the splat looks
> > familiar to x86 folks?
> >
> > I haven't been able to reproduce locally when my machine is relatively
> > load-less. If I do a kernel build in the background, I was able to
> > get QEMU to hang, but without any splat. That was using tip/master @
> > f81d8f759e7f.
> >
> > Naresh, when you say "intermittent" do you have any data on the
> > relative frequency of this boot failure? (Also, please make sure to
> > use [email protected] in the future; we moved mailing lists years
> > ago).
> >
> > Looks like our CI report linked above has an additional splat though
> > via apply_alternatives and optimize_nops.
> >
> > >> [ 0.166742] Code: Unable to access opcode bytes at 0x36.
> >
> > Peter, that smells like perhaps either:
> > commit b6c881b248ef ("x86/alternative: Complicate optimize_nops() some more")
> > commit 6c480f222128 ("x86/alternative: Rewrite optimize_nops() some")
>
> So I did find me a 'funny' there, but nothing that explains boot fail.
>
> It would think that 'PAUSE' is a 2 byte NOP and replace it with NOP2;
> which is not quite the same thing. The below seems to cure that.
>
> Let me continue poking at things...
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 93aa95afd005..bb0a7b03e52f 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -159,9 +160,12 @@ void text_poke_early(void *addr, const void *opcode, size_t len);
> */
> static bool insn_is_nop(struct insn *insn)
> {
> - if (insn->opcode.bytes[0] == 0x90)
> + /* Anything NOP, but not REP NOP. */
> + if (insn->opcode.bytes[0] == 0x90 &&
> + (!insn->prefixes.nbytes || insn->prefixes.bytes[0] != 0xF3))
> return true;
>
> + /* NOPL */
> if (insn->opcode.bytes[0] == 0x0F && insn->opcode.bytes[1] == 0x1F)
> return true;
>
--
Thanks,
~Nick Desaulniers