2022-10-28 14:14:39

by Sudip Mukherjee

Subject: boot failure of linux-next due to 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")

Hi All,

Our qemu boots were failing since next-20221024, and a git bisect of
next-20221028 showed the bad commit as 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")

After reverting the commit I could boot qemu again with next-20221028.

This is my config:

make defconfig
make kvm_guest.config
scripts/config -e KCOV -e KCOV_INSTRUMENT_ALL -e KCOV_ENABLE_COMPARISONS -e DEBUG_FS -e DEBUG_KMEMLEAK -e DEBUG_INFO -e KALLSYMS -e KALLSYMS_ALL -e NAMESPACES -e UTS_NS -e IPC_NS -e PID_NS -e NET_NS -e CGROUP_PIDS -e MEMCG -e USER_NS -e CONFIGFS_FS -e SECURITYFS -e KASAN -e KASAN_INLINE -e FAULT_INJECTION -e FAULT_INJECTION_DEBUG_FS -e FAULT_INJECTION_USERCOPY -e FAILSLAB -e FAIL_PAGE_ALLOC -e FAIL_MAKE_REQUEST -e FAIL_IO_TIMEOUT -e FAIL_FUTEX -e LOCKDEP -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_VM -e REFCOUNT_FULL -e FORTIFY_SOURCE -e HARDENED_USERCOPY -e LOCKUP_DETECTOR -e SOFTLOCKUP_DETECTOR -e HARDLOCKUP_DETECTOR -e BOOTPARAM_HARDLOCKUP_PANIC -e DETECT_HUNG_TASK -e WQ_WATCHDOG -e USB_GADGET -e USB_RAW_GADGET -e TUN -e KCSAN -d RANDOMIZE_BASE -e MAC80211_HWSIM -e IEEE802154 -e MAC802154 -e IEEE802154_DRIVERS -e IEEE802154_HWSIM -e BT -e BT_HCIVHCI
echo "CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=140" >> .config
echo "CONFIG_RCU_CPU_STALL_TIMEOUT=100" >> .config

I will be happy to test any patch or provide any extra log if needed.
Though I am not sure how I will collect extra logs (if needed) as there
was no output from qemu.


--
Regards
Sudip


2022-10-28 15:17:23

by Dave Hansen

Subject: Re: boot failure of linux-next due to 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")

On 10/28/22 06:29, Sudip Mukherjee (Codethink) wrote:
> I will be happy to test any patch or provide any extra log if needed.
> Though I am not sure how I will collect extra logs (if needed) as there
> was no output from qemu.

Could you share your qemu config? The command-line would be fine. Does
it have a serial console set up?

2022-10-28 16:39:44

by Sudip Mukherjee

Subject: Re: boot failure of linux-next due to 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")

On Fri, Oct 28, 2022 at 3:33 PM Dave Hansen wrote:
>
> On 10/28/22 06:29, Sudip Mukherjee (Codethink) wrote:
> > I will be happy to test any patch or provide any extra log if needed.
> > Though I am not sure how I will collect extra logs (if needed) as there
> > was no output from qemu.
>
> Could you share your qemu config? The command-line would be fine. Does
> it have a serial console set up?

qemu-system-x86_64 -m 2048 -smp 2 -chardev
socket,id=SOCKSYZ,server,nowait,host=localhost,port=46514 -mon
chardev=SOCKSYZ,mode=control -display none -serial stdio -no-reboot
-name VM-test -device virtio-rng-pci -enable-kvm -cpu
host,migratable=off -device e1000,netdev=net0 -netdev
user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:28993-:22 -hda
bullseye.img -snapshot -kernel bzImage -append "root=/dev/sda
console=ttyS0 net.ifnames=0 biosdevname=0"

Serial is via stdio. In normal boots I get the boot log in the
terminal, but in this case there was no output at all.


--
Regards
Sudip
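
For a silent hang like the one described above, one thing that might
surface early messages is the x86 `earlyprintk` parameter, which writes to
the serial port before the regular console is registered. The sketch below
only extends the `-append` string from the command line above. The variable
names are hypothetical, and whether any output appears depends on how early
in boot the failure occurs.

```shell
#!/bin/sh
# Kernel command line taken from the qemu invocation in this thread.
base_append="root=/dev/sda console=ttyS0 net.ifnames=0 biosdevname=0"

# Assumption: ttyS0 at 115200 baud matches the "-serial stdio" setup.
# earlyprintk= and ignore_loglevel are documented kernel parameters.
debug_append="$base_append earlyprintk=serial,ttyS0,115200 ignore_loglevel"

# Print the string to pass to qemu as: -append "$debug_append"
printf '%s\n' "$debug_append"
```

The rest of the qemu command line stays unchanged; only the `-append`
argument is replaced.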

2022-10-28 17:29:59

by Kees Cook

Subject: Re: boot failure of linux-next due to 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")

On Fri, Oct 28, 2022 at 02:29:14PM +0100, Sudip Mukherjee (Codethink) wrote:
> Hi All,
>
> Our qemu boots were failing since next-20221024, and a git bisect of
> next-20221028 showed the bad commit as 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")
>
> After reverting the commit I could boot qemu again with next-20221028.
>
> This is my config:
>
> make defconfig
> make kvm_guest.config
> scripts/config -e KCOV -e KCOV_INSTRUMENT_ALL -e KCOV_ENABLE_COMPARISONS -e DEBUG_FS -e DEBUG_KMEMLEAK -e DEBUG_INFO -e KALLSYMS -e KALLSYMS_ALL -e NAMESPACES -e UTS_NS -e IPC_NS -e PID_NS -e NET_NS -e CGROUP_PIDS -e MEMCG -e USER_NS -e CONFIGFS_FS -e SECURITYFS -e KASAN -e KASAN_INLINE -e FAULT_INJECTION -e FAULT_INJECTION_DEBUG_FS -e FAULT_INJECTION_USERCOPY -e FAILSLAB -e FAIL_PAGE_ALLOC -e FAIL_MAKE_REQUEST -e FAIL_IO_TIMEOUT -e FAIL_FUTEX -e LOCKDEP -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_VM -e REFCOUNT_FULL -e FORTIFY_SOURCE -e HARDENED_USERCOPY -e LOCKUP_DETECTOR -e SOFTLOCKUP_DETECTOR -e HARDLOCKUP_DETECTOR -e BOOTPARAM_HARDLOCKUP_PANIC -e DETECT_HUNG_TASK -e WQ_WATCHDOG -e USB_GADGET -e USB_RAW_GADGET -e TUN -e KCSAN -d RANDOMIZE_BASE -e MAC80211_HWSIM -e IEEE802154 -e MAC802154 -e IEEE802154_DRIVERS -e IEEE802154_HWSIM -e BT -e BT_HCIVHCI
> echo "CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=140" >> .config
> echo "CONFIG_RCU_CPU_STALL_TIMEOUT=100" >> .config
>
> I will be happy to test any patch or provide any extra log if needed.
> Though I am not sure how I will collect extra logs (if needed) as there
> was no output from qemu.

I see KASAN in your config, does this fix it?

https://lore.kernel.org/lkml/166693938482.29415.7034851115705424459.tip-bot2@tip-bot2/


--
Kees Cook
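
Since the suggested fix is tied to KASAN, it can help to confirm that the
option actually ended up enabled in the final .config after dependency
resolution. A minimal sketch, assuming a hypothetical helper name
`config_enabled` (not part of the kernel tree):

```shell
# Hypothetical helper: succeeds if CONFIG_<symbol> is set to y in a config file.
# Usage: config_enabled <config-file> <symbol>
config_enabled() {
    # Match the exact line "CONFIG_<symbol>=y"; a missing file counts as "not set".
    grep -q "^CONFIG_$2=y\$" "$1" 2>/dev/null
}
```

For example, `config_enabled .config KASAN && echo "KASAN is enabled"` run
from the top of the build tree reports whether the symbol is set to `y`.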

2022-10-28 20:12:33

by Sudip Mukherjee

Subject: Re: boot failure of linux-next due to 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")

On Fri, Oct 28, 2022 at 5:41 PM Kees Cook wrote:
>
> On Fri, Oct 28, 2022 at 02:29:14PM +0100, Sudip Mukherjee (Codethink) wrote:
> > Hi All,
> >
> > Our qemu boots were failing since next-20221024, and a git bisect of
> > next-20221028 showed the bad commit as 1248fb6a8201 ("x86/mm: Randomize per-cpu entry area")
> >
> > After reverting the commit I could boot qemu again with next-20221028.
> >
> > This is my config:
> >
> > make defconfig
> > make kvm_guest.config
> > scripts/config -e KCOV -e KCOV_INSTRUMENT_ALL -e KCOV_ENABLE_COMPARISONS -e DEBUG_FS -e DEBUG_KMEMLEAK -e DEBUG_INFO -e KALLSYMS -e KALLSYMS_ALL -e NAMESPACES -e UTS_NS -e IPC_NS -e PID_NS -e NET_NS -e CGROUP_PIDS -e MEMCG -e USER_NS -e CONFIGFS_FS -e SECURITYFS -e KASAN -e KASAN_INLINE -e FAULT_INJECTION -e FAULT_INJECTION_DEBUG_FS -e FAULT_INJECTION_USERCOPY -e FAILSLAB -e FAIL_PAGE_ALLOC -e FAIL_MAKE_REQUEST -e FAIL_IO_TIMEOUT -e FAIL_FUTEX -e LOCKDEP -e PROVE_LOCKING -e DEBUG_ATOMIC_SLEEP -e PROVE_RCU -e DEBUG_VM -e REFCOUNT_FULL -e FORTIFY_SOURCE -e HARDENED_USERCOPY -e LOCKUP_DETECTOR -e SOFTLOCKUP_DETECTOR -e HARDLOCKUP_DETECTOR -e BOOTPARAM_HARDLOCKUP_PANIC -e DETECT_HUNG_TASK -e WQ_WATCHDOG -e USB_GADGET -e USB_RAW_GADGET -e TUN -e KCSAN -d RANDOMIZE_BASE -e MAC80211_HWSIM -e IEEE802154 -e MAC802154 -e IEEE802154_DRIVERS -e IEEE802154_HWSIM -e BT -e BT_HCIVHCI
> > echo "CONFIG_DEFAULT_HUNG_TASK_TIMEOUT=140" >> .config
> > echo "CONFIG_RCU_CPU_STALL_TIMEOUT=100" >> .config
> >
> > I will be happy to test any patch or provide any extra log if needed.
> > Though I am not sure how I will collect extra logs (if needed) as there
> > was no output from qemu.
>
> I see KASAN in your config, does this fix it?
>
> https://lore.kernel.org/lkml/166693938482.29415.7034851115705424459.tip-bot2@tip-bot2/

Yes, it does. Thanks.
I can see qemu booting up again. Also, it looks like that's already merged
to x86/mm, so I am not sending a Tested-by in reply to that patch.

--
Regards
Sudip