LinuxLists.cc - Intermittent Qemu boot hang/regression traced back to INT 0x80 changes

2024-04-24 19:07:04

Subject: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes

Richard (via the Yocto auto-builder) reported a sporadic boot hang
(roughly once per several hundred boots) during PCI bus mapping on
v6.6.x, for both x86 and x86_64.

On x86, I isolated it to the INT 0x80 backports added to v6.6.7:

239bff0171a8 x86/tdx: Allow 32-bit emulation by default
22ca647c8f88 x86/entry: Do not allow external 0x80 interrupts
4591766ff655 x86/entry: Convert INT 0x80 emulation to IDTENTRY
34c686e5be2f x86/coco: Disable 32-bit emulation by default on TDX and SEV
f259af26ee04 x86: Introduce ia32_enabled()

The ia32_enabled() commit is a trivial compile dependency and the Yocto
use case doesn't even compile arch/x86/coco/tdx/tdx.c - leaving just the
middle three commits. I didn't try to bisect within those, since it
seemed relatively clear to me they were meant to be taken as a group.

To confirm my diagnosis, I reverted this group of changes on a v6.6.7
baseline, and the sporadic PCI-hang went away.

I then went to mainline and tested where it was added:

commit f35e46631b28a63ca3887d7afef1a65a5544da52
Author: Linus Torvalds <[email protected]>
Date: Thu Dec 7 11:56:34 2023 -0800

Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Took about 400 runs, but the PCI-hang eventually showed up.

Of course, the BHI changes touch a lot of the same files, and I was
wondering if the issue would remain. Tested v6.6.27 (has BHI backports)
and it still happens. The INT80 changes can no longer be easily
reverted there, though, since they are buried under the BHI changes.

I then took v6.9-rc5 and let it run overnight (700 boots) and I "caught"
three instances of the PCI-hang.

Finally I took linux-next from today (next-20240424) and confirmed a
PCI-hang within 50 boots. I can't explain the variability other than it
being a shared machine where I ran the tests.

Not sure what to do next. Figured step #1 was to report it, at least.
A whole bunch of extra details are in the Yocto case:

https://bugzilla.yoctoproject.org/show_bug.cgi?id=15463

..including the v6.9-rc5 .config and the full qemu arg list.

Paul.
--

Linux version 6.9.0-rc5-next-20240424-yocto-standard (oe-user@oe-host) (i686-poky-linux-gcc (GCC) 13.2.0, GNU ld (GNU Binutils) 2.42.0.20240216) #1 SMP PREEMPT_DYNAMIC Wed Apr 24 10:57:01 UTC 2024
BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[...]
acpi PNP0A08:00: _OSC: platform does not support [LTR]
acpi PNP0A08:00: _OSC: OS now controls [PME PCIeCapability]
acpi resource window ([0x100000000-0x8ffffffff] ignored, not CPU addressable)
PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
pci_bus 0000:00: root bus resource [mem 0x10000000-0xafffffff window]
pci_bus 0000:00: root bus resource [mem 0xc0000000-0xfebfffff window]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:00.0: [8086:29c0] type 00 class 0x060000 conventional PCI endpoint
pci 0000:00:01.0: [1234:1111] type 00 class 0x030000 conventional PCI endpoint
pci 0000:00:01.0: BAR 0 [mem 0xfd000000-0xfdffffff pref]
pci 0000:00:01.0: BAR 2 [mem 0xfebd0000-0xfebd0fff]
pci 0000:00:01.0: ROM [mem 0xfebc0000-0xfebcffff pref]
pci 0000:00:01.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
pci 0000:00:02.0: [1af4:1000] type 00 class 0x020000 conventional PCI endpoint
pci 0000:00:02.0: BAR 0 [io 0xc040-0xc05f]
pci 0000:00:02.0: BAR 1 [mem 0xfebd1000-0xfebd1fff]
pci 0000:00:02.0: BAR 4 [mem 0xfe000000-0xfe003fff 64bit pref]
pci 0000:00:02.0: ROM [mem 0xfeb80000-0xfebbffff pref]
pci 0000:00:03.0: [1af4:1005] type 00 class 0x00ff00 conventional PCI endpoint
pci 0000:00:03.0: BAR 0 [io 0xc060-0xc07f]
pci 0000:00:03.0: BAR 1 [mem 0xfebd2000-0xfebd2fff]
pci 0000:00:03.0: BAR 4 [mem 0xfe004000-0xfe007fff 64bit pref]
pci 0000:00:1d.0: [8086:2934] type 00 class 0x0c0300 conventional PCI endpoint
pci 0000:00:1d.0: BAR 4 [io 0xc080-0xc09f]
pci 0000:00:1d.1: [8086:2935] type 00 class 0x0c0300 conventional PCI endpoint
pci 0000:00:1d.1: BAR 4 [io 0xc0a0-0xc0bf]
pci 0000:00:1d.2: [8086:2936] type 00 class 0x0c0300 conventional PCI endpoint
<hang - not always exactly here, but always in this block of PCI printk>
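[Editor's sketch] When a test harness runs hundreds of boots, it helps to separate "hung in the PCI block" from other failures automatically. The following is a minimal Python sketch of such a check, not the script used in this report. The `done_marker` string (for example a login prompt) and the list of line prefixes are assumptions chosen for illustration.

```python
# Prefixes of the console lines seen in the PCI block of the log above.
PCI_PREFIXES = ("pci ", "pci_bus ", "PCI host bridge")


def last_line(log_text: str) -> str:
    """Return the last non-empty console line, stripped of whitespace."""
    lines = [ln.strip() for ln in log_text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def hung_in_pci_block(log_text: str, done_marker: str) -> bool:
    """Classify a captured console log from a boot that timed out.

    done_marker is a hypothetical string that only appears once the boot
    has finished (an assumption; pick whatever the image prints).
    """
    if done_marker in log_text:
        return False  # boot completed, so this was not a PCI-block hang
    return last_line(log_text).startswith(PCI_PREFIXES)


sample = """PCI host bridge to bus 0000:00
pci 0000:00:1d.1: BAR 4 [io 0xc0a0-0xc0bf]
pci 0000:00:1d.2: [8086:2936] type 00 class 0x0c0300 conventional PCI endpoint
"""
print(hung_in_pci_block(sample, done_marker="login:"))
```

Because the hang is "not always exactly here", the check looks only at which block the last line belongs to rather than at a specific device.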

2024-04-24 19:52:29

by Borislav Petkov

Subject: Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes

On Wed, Apr 24, 2024 at 02:58:06PM -0400, Paul Gortmaker wrote:
..
> pci 0000:00:1d.0: [8086:2934] type 00 class 0x0c0300 conventional PCI endpoint
> pci 0000:00:1d.0: BAR 4 [io 0xc080-0xc09f]
> pci 0000:00:1d.1: [8086:2935] type 00 class 0x0c0300 conventional PCI endpoint
> pci 0000:00:1d.1: BAR 4 [io 0xc0a0-0xc0bf]
> pci 0000:00:1d.2: [8086:2936] type 00 class 0x0c0300 conventional PCI endpoint
> <hang - not always exactly here, but always in this block of PCI printk>

How would those commits have anything to do with such an early hang?!

Nothing that early is issuing INT80 32-bit syscalls, is it?

Btw, can you checkout the Linus tree at...

f35e46631b28 Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
f4116bfc4462 x86/tdx: Allow 32-bit emulation by default

<-- here and test that commit as the top one?

55617fb991df x86/entry: Do not allow external 0x80 interrupts

which reminds me - that hang could be actually that guest kernel
panicking but the panic not coming out to the console.

When it hangs, can you connect with gdb to qemu and dump stack and
registers?

Make sure you have DEBUG_INFO enabled in the guest kernel.

Is this even a guest?

I know you had guests last time you reported the alternatives issue.

Right, and then test the tree checked out at this commit:

be5341eb0d43 x86/entry: Convert INT 0x80 emulation to IDTENTRY

The others should be unrelated...

b82a8dbd3d2f x86/coco: Disable 32-bit emulation by default on TDX and SEV

Hmm.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2024-04-24 20:26:47

by Dave Hansen

Subject: Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes

On 4/24/24 11:58, Paul Gortmaker wrote:
> pci 0000:00:1d.0: [8086:2934] type 00 class 0x0c0300 conventional PCI endpoint
> pci 0000:00:1d.0: BAR 4 [io 0xc080-0xc09f]
> pci 0000:00:1d.1: [8086:2935] type 00 class 0x0c0300 conventional PCI endpoint
> pci 0000:00:1d.1: BAR 4 [io 0xc0a0-0xc0bf]
> pci 0000:00:1d.2: [8086:2936] type 00 class 0x0c0300 conventional PCI endpoint
> <hang - not always exactly here, but always in this block of PCI printk>

Any chance you can figure out what the virtual CPU is doing while it's
hung? Maybe run these a couple of times on the qemu monitor?

info registers
info irq
info lapic

to see if it's taking a bunch of interrupts and whether RIP is still
moving around. A couple of samples of RIP (matched back to the kernel
code via vmlinux) can go a long way to figuring out why it's hung.
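[Editor's sketch] The sampling idea above can be partly automated: paste several `info registers` dumps into a file and check whether the program counter changes between samples. This is a minimal Python sketch, assuming the monitor prints the program counter as `EIP=<hex>` for a 32-bit guest or `RIP=<hex>` for a 64-bit one. The sample text below is made up for illustration and is not output from the hung guest.

```python
import re

# Assumption: the program counter appears as "EIP=<hex>" or "RIP=<hex>"
# in each `info registers` dump pasted from the QEMU monitor.
PC_RE = re.compile(r"\b[ER]IP=([0-9a-fA-F]+)")


def extract_pcs(monitor_text: str) -> list[int]:
    """Collect every program-counter value found in the pasted dumps."""
    return [int(value, 16) for value in PC_RE.findall(monitor_text)]


def looks_stuck(pcs: list[int]) -> bool:
    """True when there are at least two samples and all are identical."""
    return len(pcs) >= 2 and len(set(pcs)) == 1


# Hypothetical dumps, for illustration only.
dumps = """
EIP=c1002345 EFL=00000246
EIP=c1002345 EFL=00000246
EIP=c10023a0 EFL=00000246
"""
pcs = extract_pcs(dumps)
print([hex(pc) for pc in pcs], "stuck" if looks_stuck(pcs) else "moving")
```

Each distinct address can then be matched against the guest's vmlinux, for instance with binutils `addr2line -e vmlinux <address>`, provided the kernel was built with debug info.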

I take it that this is before sysrq is working? If it isn't too early:

sendkey alt-sysrq-t

is always handy.

Otherwise, I'm a bit stumped. This code shouldn't even be called before
userspace starts up. Heck you don't even have CONFIG_IA32_EMULATION on
in your .config.

2024-04-24 20:41:41

by Dave Hansen

Subject: Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes

One other thing... Is this using QEMU's TCG code generator? We have
seen some weird bugs with it in the past. Or are you running under KVM?

Your qemu command-line made me think it's TCG and not KVM.

2024-04-25 18:44:31

by Dave Hansen

Subject: Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes

FWIW, I'm running something as close to your qemu command-line as I can
get with a 6.9-rc5 kernel and your .config. I'm over a thousand boots
in with no hangs yet. This was just with my existing qemu.

2024-04-26 12:31:47

by Paul Gortmaker

Subject: Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes

[Apologies for repeated info; last mail didn't make it to the list]

[Re: Intermittent Qemu boot hang/regression traced back to INT 0x80 changes] On 24/04/2024 (Wed 21:51) Borislav Petkov wrote:

> On Wed, Apr 24, 2024 at 02:58:06PM -0400, Paul Gortmaker wrote:
> ...
> > pci 0000:00:1d.0: [8086:2934] type 00 class 0x0c0300 conventional PCI endpoint
> > pci 0000:00:1d.0: BAR 4 [io 0xc080-0xc09f]
> > pci 0000:00:1d.1: [8086:2935] type 00 class 0x0c0300 conventional PCI endpoint
> > pci 0000:00:1d.1: BAR 4 [io 0xc0a0-0xc0bf]
> > pci 0000:00:1d.2: [8086:2936] type 00 class 0x0c0300 conventional PCI endpoint
> > <hang - not always exactly here, but always in this block of PCI printk>
>
> How would those commits have anything to do with such an early hang?!
>
> Nothing that early is issuing INT80 32-bit syscalls, is it?
>
> Btw, can you checkout the Linus tree at...
>
> f35e46631b28 Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> f4116bfc4462 x86/tdx: Allow 32-bit emulation by default
>
>
> <-- here and test that commit as the top one?
>
> 55617fb991df x86/entry: Do not allow external 0x80 interrupts

They both show the issue, but that really doesn't matter now. When you
guys pointed out it really didn't make sense, I did what I should have
done before - tested the crap out of ^1, the trunk just before the
INT80 merge:

commit f35e46631b28a63ca3887d7afef1a65a5544da52
Merge: 55b224d90d44 f4116bfc4462
^^^^^^^^^^^^
Author: Linus Torvalds <[email protected]>
Date: Thu Dec 7 11:56:34 2023 -0800

Merge tag 'x86-int80-20231207' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

..which would be 55b224d90d44 (parisc merge). So I left that running
for nearly 24h (almost 2000 runs), and got 8 PCI-hang instances. :(
Which means the hang shows up in a tree where INT80 isn't even there yet.
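[Editor's sketch] The `^1` notation refers to the first parent of a merge commit, which is the first hash on the `Merge:` line and is the trunk as it was just before the merge. A small Python sketch that pulls the parents out of the header quoted above (the helper name `merge_parents` is made up for illustration):

```python
def merge_parents(log_header: str) -> list[str]:
    """Return the parent hashes listed on the 'Merge:' line of `git log` output."""
    for line in log_header.splitlines():
        if line.startswith("Merge:"):
            return line.split()[1:]
    return []  # not a merge commit, or no header given


header = """commit f35e46631b28a63ca3887d7afef1a65a5544da52
Merge: 55b224d90d44 f4116bfc4462
"""
parents = merge_parents(header)
print("first parent (^1), trunk before the merge:", parents[0])
print("second parent (^2), tip of the merged branch:", parents[1])
```

Testing the first parent therefore exercises a tree that contains none of the commits brought in by the merge.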

So I owe you guys an apology for pointing the finger at INT80. I still
don't understand how the pseudo bisect on v6.6-stable seemed so
"concrete". The v6.6.6 worked "fine" (it seemed) and v6.6.7 died fairly
quickly. The revert of INT80 on v6.6.7 seemed to "fix" it - but if so,
it was only because it perturbed something else.

I already knew my "good" bisect points were not "proven" good, but only
statistically "good". Seems I need to revisit some of those "good" data
points (both on v6.6-stable and on mainline) and test longer.
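[Editor's sketch] How long is "longer"? As a rough guide, if each boot is treated as an independent trial with a fixed hang probability (an assumption, and the shared test machine may well violate it), the chance of a bad kernel surviving N boots without a hang is (1 - p)^N. The Python sketch below uses the rate from the run above, 8 hangs in roughly 2000 boots, purely as an illustrative estimate.

```python
import math


def prob_no_hang(p_hang: float, boots: int) -> float:
    """Chance that `boots` independent boots all succeed on a bad kernel."""
    return (1.0 - p_hang) ** boots


def boots_needed(p_hang: float, confidence: float = 0.95) -> int:
    """Clean boots needed before a bad kernel would have hung at least
    once with probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hang))


# Rate taken from the report above: 8 hangs in roughly 2000 boots.
p = 8 / 2000
print(f"P(no hang in 400 boots on a bad kernel) = {prob_no_hang(p, 400):.2f}")
print(f"clean boots for 95% confidence = {boots_needed(p)}")
```

Under these assumptions, a few hundred clean boots leave a sizeable chance that a "good" point is actually bad, which is consistent with the misleading bisect result.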

>
> which reminds me - that hang could be actually that guest kernel
> panicking but the panic not coming out to the console.
>
> When it hangs, can you connect with gdb to qemu and dump stack and
> registers?
>
> Make sure you have DEBUG_INFO enabled in the guest kernel.

I want to try some of these things, but I also don't want to
accidentally lose the reproducer I have. Maybe I'll see if I can
reproduce it at home, since I'll lose use of the current box in a week
anyway...

Again, sorry for the false positive. I let the v6.6-stable testing bias
my mainline conclusions to where I didn't test underneath INT80. I'll
follow up with more details once (if?) I manage to properly sort this.

Paul.
--

>
> Is this even a guest?
>
> I know you had guests last time you reported the alternatives issue.
>
> Right, and then test the tree checked out at this commit:
>
> be5341eb0d43 x86/entry: Convert INT 0x80 emulation to IDTENTRY
>
> The others should be unrelated...
>
> b82a8dbd3d2f x86/coco: Disable 32-bit emulation by default on TDX and SEV
>
> Hmm.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette