2024-03-22 09:06:58

by kernel test robot

Subject: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)



Hello,

kernel test robot noticed "BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)" on:

commit: 48204aba801f1b512b3abed10b8e1a63e03f3dd1 ("x86/sme: Move early SME kernel encryption handling into .head.text")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 741e9d668aa50c91e4f681511ce0e408d55dd7ce]
[test failed on linux-next/master a1e7655b77e3391b58ac28256789ea45b1685abb]

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+--------------------------------------------------------------------------------------+------------+------------+
|                                                                                      | cd0d9d92c8 | 48204aba80 |
+--------------------------------------------------------------------------------------+------------+------------+
| BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#) | 0          | 18         |
+--------------------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]



Decompressing Linux... No EFI environment detected.
Parsing ELF... Performing relocations... done.
Booting the kernel (entry_offset: 0x0000000000000000).
convert early boot stage from reboot-without-warning to failed
BUG: kernel failed in early-boot stage, last printk: Booting the kernel (entry_offset: 0x0000000000000000).
Linux version 6.8.0-rc6-00057-g48204aba801f #1
Command line: ip=::::vm-meta-180::dhcp root=/dev/ram0 RESULT_ROOT=/result/boot/1/vm-snb/quantal-x86_64-core-20190426.cgz/x86_64-rhel-8.3-bpf/gcc-12/48204aba801f1b512b3abed10b8e1a63e03f3dd1/3 BOOT_IMAGE=/pkg/linux/x86_64-rhel-8.3-bpf/gcc-12/48204aba801f1b512b3abed10b8e1a63e03f3dd1/vmlinuz-6.8.0-rc6-00057-g48204aba801f branch=linus/master job=/lkp/jobs/scheduled/vm-meta-180/boot-1-quantal-x86_64-core-20190426.cgz-48204aba801f-20240317-32104-1snnfl0-5.yaml user=lkp ARCH=x86_64 kconfig=x86_64-rhel-8.3-bpf commit=48204aba801f1b512b3abed10b8e1a63e03f3dd1 nmi_watchdog=0 intremap=posted_msi vmalloc=256M initramfs_async=0 page_owner=on max_uptime=600 LKP_SERVER=internal-lkp-server selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw rcuperf.shutdown=0 watchdog_thresh=240 audit=0

Kboot worker: lkp-worker27
Elapsed time: 60


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240322/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



2024-03-24 14:26:13

by Borislav Petkov

Subject: Re: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)

On Fri, Mar 22, 2024 at 05:03:18PM +0800, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)" on:
>
> commit: 48204aba801f1b512b3abed10b8e1a63e03f3dd1 ("x86/sme: Move early SME kernel encryption handling into .head.text")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master 741e9d668aa50c91e4f681511ce0e408d55dd7ce]
> [test failed on linux-next/master a1e7655b77e3391b58ac28256789ea45b1685abb]
>
> in testcase: boot
>
> compiler: gcc-12
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

My guest boots with your .config and SNB as CPU model:

..
[ 0.373770][ T1] smpboot: CPU0: Intel Xeon E312xx (Sandy Bridge) (family: 0x6, model: 0x2a, stepping: 0x1)

Artefacts like:

-initrd initrd-vm-meta-180.cgz

or

RESULT_ROOT=/result/boot/1/vm-snb/quantal-x86_64-core-20190426.cgz/x86_64-rhel-8.3-bpf/gcc-12/48204aba801f1b512b3abed10b8e1a63e03f3dd1/3

I don't have these and don't know how to generate them here, so I can't run
your exact reproducer.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-03-25 16:38:58

by Ard Biesheuvel

Subject: Re: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)

On Sun, 24 Mar 2024 at 16:26, Borislav Petkov <[email protected]> wrote:
> My guest boots with your .config and SNB as CPU model:
>
> ...
> [ 0.373770][ T1] smpboot: CPU0: Intel Xeon E312xx (Sandy Bridge) (family: 0x6, model: 0x2a, stepping: 0x1)
>
> Artefacts like:
>
> -initrd initrd-vm-meta-180.cgz
>
> or
>
> RESULT_ROOT=/result/boot/1/vm-snb/quantal-x86_64-core-20190426.cgz/x86_64-rhel-8.3-bpf/gcc-12/48204aba801f1b512b3abed10b8e1a63e03f3dd1/3
>
> I don't have these and don't know how to generate them here, so I can't run
> your exact reproducer.
>

I ran the reproducer using the instructions, and things seem to work fine.

https://paste.debian.net/1311951/

Could you provide any information regarding the QEMU version and its
BIOS implementation?

2024-03-26 08:31:53

by kernel test robot

Subject: Re: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)

hi, Ard Biesheuvel,

On Mon, Mar 25, 2024 at 04:39:26PM +0200, Ard Biesheuvel wrote:
> I ran the reproducer using the instructions, and things seem to work fine.
>
> https://paste.debian.net/1311951/
>
> Could you provide any information regarding the QEMU version and its
> BIOS implementation?

for QEMU version:

$ qemu-system-x86_64 --version
QEMU emulator version 7.2.9 (Debian 1:7.2+dfsg-7+deb12u5)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers


for BIOS:

We don't specify a BIOS option for QEMU; my understanding is that we just run
with QEMU's default BIOS (SeaBIOS). Extra info on SeaBIOS:

SeaBIOS
QEMU, by default, uses a BIOS called SeaBIOS. It is a pretty good option and
can be used with most bootloaders. Naturally, every guest machine is loaded
with SeaBIOS and you don't have to do anything. However, you might want, or
need, to use UEFI instead.


2024-03-26 08:59:24

by Ard Biesheuvel

Subject: Re: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)

On Tue, 26 Mar 2024 at 10:31, Oliver Sang <[email protected]> wrote:
>
> hi, Ard Biesheuvel,
>
> for QEMU version:
>
> $ qemu-system-x86_64 --version
> QEMU emulator version 7.2.9 (Debian 1:7.2+dfsg-7+deb12u5)
> Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
>

I tested the exact same version.

Does it reproduce with -cpu host instead of -cpu SandyBridge? When
running under KVM, I suspect emulating the actual host uarch rather
than setting a different one is a more reliable strategy. What CPU
type does the host have?

>
> for BIOS:
>
> We don't specify a BIOS option for QEMU; my understanding is that we just run
> with QEMU's default BIOS (SeaBIOS). Extra info on SeaBIOS:
>

Today, legacy BIOS boot is only used by a minority of x86 systems in
the field, so for better coverage, it would make sense to at least
start testing UEFI as well.

On debian, just install the ovmf package, and pass -bios
/usr/share/ovmf/OVMF.fd on the QEMU command line.

And given that you are doing virt based boot testing, another very
important use case is TDX boot (as well as SEV-SNP, but that may be
more difficult for you to organize). But please explore internally at
Intel whether TDX can be added to your test matrix as well.

2024-03-28 05:57:32

by kernel test robot

Subject: Re: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)

hi, Ard Biesheuvel,

On Tue, Mar 26, 2024 at 10:59:04AM +0200, Ard Biesheuvel wrote:
> I tested the exact same version.
>
> Does it reproduce with -cpu host instead of -cpu SandyBridge? When
> running under KVM, I suspect emulating the actual host uarch rather
> than setting a different one is a more reliable strategy. What CPU
> type does the host have?


We have a machine pool containing machines with different CPU models, and we
deploy VMs on them to run various boot/fuzz/functional tests. To avoid subtle
issues, we can't use '-cpu host' directly.


> Today, legacy BIOS boot is only used by a minority of x86 systems in
> the field, so for better coverage, it would make sense to at least
> start testing UEFI as well.
>
> On debian, just install the ovmf package, and pass -bios
> /usr/share/ovmf/OVMF.fd on the QEMU command line.
>
> And given that you are doing virt based boot testing, another very
> important use case is TDX boot (as well as SEV-SNP, but that may be
> more difficult for you to organize). But please explore internally at
> Intel whether TDX can be added to your test matrix as well.

Thanks a lot for the great suggestions! We will investigate them.


Regarding this early-boot failure issue: after more tests, we suspect it may
be related to 3 configs. As shared in [1], they are set as below when the
kernel runs into the early-boot failure:

# CONFIG_INIT_STACK_NONE is not set
CONFIG_INIT_STACK_ALL_ZERO=y
CONFIG_GCC_PLUGIN_STACKLEAK=y


The early-boot failure _disappears_ after making either one of these two
changes:

(1)
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
-# CONFIG_INIT_STACK_NONE is not set
+CONFIG_INIT_STACK_NONE=y
# CONFIG_INIT_STACK_ALL_PATTERN is not set
-CONFIG_INIT_STACK_ALL_ZERO=y
+# CONFIG_INIT_STACK_ALL_ZERO is not set
CONFIG_GCC_PLUGIN_STACKLEAK=y


(2)
CONFIG_INIT_STACK_ALL_ZERO=y
-CONFIG_GCC_PLUGIN_STACKLEAK=y
-# CONFIG_GCC_PLUGIN_STACKLEAK_VERBOSE is not set
-CONFIG_STACKLEAK_TRACK_MIN_SIZE=100
-# CONFIG_STACKLEAK_METRICS is not set
-# CONFIG_STACKLEAK_RUNTIME_DISABLE is not set
+# CONFIG_GCC_PLUGIN_STACKLEAK is not set
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y


[1]
https://download.01.org/0day-ci/archive/20240322/[email protected]/config-6.8.0-rc6-00057-g48204aba801f

2024-03-28 06:54:59

by Ard Biesheuvel

Subject: Re: [linus:master] [x86/sme] 48204aba80: BUG:kernel_failed_in_early-boot_stage,last_printk:Booting_the_kernel(entry_offset:#)

On Thu, 28 Mar 2024 at 07:57, Oliver Sang <[email protected]> wrote:
>
..
> Regarding this early-boot failure issue: after more tests, we suspect it may
> be related to 3 configs. As shared in [1], they are set as below when the
> kernel runs into the early-boot failure:
>
> # CONFIG_INIT_STACK_NONE is not set
> CONFIG_INIT_STACK_ALL_ZERO=y
> CONFIG_GCC_PLUGIN_STACKLEAK=y
>
>
> The early-boot failure _disappears_ after making either one of these two
> changes:
>
> (1)
> CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
> -# CONFIG_INIT_STACK_NONE is not set
> +CONFIG_INIT_STACK_NONE=y
> # CONFIG_INIT_STACK_ALL_PATTERN is not set
> -CONFIG_INIT_STACK_ALL_ZERO=y
> +# CONFIG_INIT_STACK_ALL_ZERO is not set
> CONFIG_GCC_PLUGIN_STACKLEAK=y
>
>
> (2)
> CONFIG_INIT_STACK_ALL_ZERO=y
> -CONFIG_GCC_PLUGIN_STACKLEAK=y
> -# CONFIG_GCC_PLUGIN_STACKLEAK_VERBOSE is not set
> -CONFIG_STACKLEAK_TRACK_MIN_SIZE=100
> -# CONFIG_STACKLEAK_METRICS is not set
> -# CONFIG_STACKLEAK_RUNTIME_DISABLE is not set
> +# CONFIG_GCC_PLUGIN_STACKLEAK is not set
> CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
>

Thanks, this was very useful in narrowing it down. I sent out a fix
for the stackleak plugin.