2022-12-28 12:15:59

by Bjorn Helgaas

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

[+cc linux-pci, linux-kernel]

On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216859

> Summary: PCI bridge to bus boot hang at enumeration
> Kernel Version: 6.1-rc1
> ...

> With Kernel 6.1-rc1 the enumeration process stopped working for me,
> see attachments.
>
> The enumeration works fine with Kernel 6.0 and below.
>
> Same problem still exists with v6.1 and v6.2-rc1.

Thank you very much for your report, Zeno!

v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
"ignore_loglevel initcall_debug" kernel parameters and taking a photo
when it hangs?

How did you conclude that the hang is related to a PCI bridge? I see
recent PCI messages in the photo, but it looks like the last message
is from NFS, so I'm wondering if I'm missing some context. The v6.0
dmesg shows several other ntfs, fuse, JFS, etc. messages before more
PCI-related things. Anyway, the "initcall_debug" might help us narrow
it down a bit.

Bjorn


2022-12-28 17:57:14

by Zeno Davatz

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

Dear Bjorn

On Wed, Dec 28, 2022 at 1:02 PM Bjorn Helgaas <[email protected]> wrote:
>
> [+cc linux-pci, linux-kernel]
>
> On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216859
>
> > Summary: PCI bridge to bus boot hang at enumeration
> > Kernel Version: 6.1-rc1
> > ...
>
> > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > see attachments.
> >
> > The enumeration works fine with Kernel 6.0 and below.
> >
> > Same problem still exists with v6.1 and v6.2-rc1.
>
> Thank you very much for your report, Zeno!
>
> v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> when it hangs?

I will try this after January 7th, 2023.

> How did you conclude that the hang is related to a PCI bridge? I see
> recent PCI messages in the photo, but it looks like the last message
> is from NFS, so I'm wondering if I'm missing some context. The v6.0
> dmesg shows several other ntfs, fuse, JFS, etc. messages before more
> PCI-related things. Anyway, the "initcall_debug" might help us narrow
> it down a bit.

I did not really conclude that. I just saw "PCI" as one of the last
messages printed before the boot process stopped.

Best
Zeno

2022-12-28 18:53:33

by Bjorn Helgaas

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

On Wed, Dec 28, 2022 at 06:42:38PM +0100, Zeno Davatz wrote:
> Dear Bjorn
>
> On Wed, Dec 28, 2022 at 1:02 PM Bjorn Helgaas <[email protected]> wrote:
> >
> > [+cc linux-pci, linux-kernel]
> >
> > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> >
> > > Summary: PCI bridge to bus boot hang at enumeration
> > > Kernel Version: 6.1-rc1
> > > ...
> >
> > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > see attachments.
> > >
> > > The enumeration works fine with Kernel 6.0 and below.
> > >
> > > Same problem still exists with v6.1 and v6.2-rc1.
> >
> > Thank you very much for your report, Zeno!
> >
> > v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> > "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> > when it hangs?
>
> I will try this after January 7th, 2023.

Sounds good, thanks!

> > How did you conclude that the hang is related to a PCI bridge? I see
> > recent PCI messages in the photo, but it looks like the last message
> > is from NFS, so I'm wondering if I'm missing some context. The v6.0
> > dmesg shows several other ntfs, fuse, JFS, etc. messages before more
> > PCI-related things. Anyway, the "initcall_debug" might help us narrow
> > it down a bit.
>
> I did not really conclude that. I just saw "PCI" as one of the last
> messages printed before the boot process stopped.

OK. We'll figure it out!

Bjorn

2022-12-30 19:25:06

by Bjorn Helgaas

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

[+cc Bruno, to include you here as well as the bugzilla]

On Wed, Dec 28, 2022 at 12:42:34PM -0600, Bjorn Helgaas wrote:
> On Wed, Dec 28, 2022 at 06:42:38PM +0100, Zeno Davatz wrote:
> > On Wed, Dec 28, 2022 at 1:02 PM Bjorn Helgaas <[email protected]> wrote:
> > > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> > >
> > > > Summary: PCI bridge to bus boot hang at enumeration
> > > > Kernel Version: 6.1-rc1
> > > > ...
> > >
> > > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > > see attachments.
> > > >
> > > > The enumeration works fine with Kernel 6.0 and below.
> > > >
> > > > Same problem still exists with v6.1 and v6.2-rc1.
> > >
> > > Thank you very much for your report, Zeno!
> > >
> > > v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> > > "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> > > when it hangs?
> >
> > I will try this after January 7th, 2023.
>
> Sounds good, thanks!
>
> > > How did you conclude that the hang is related to a PCI bridge? I see
> > > recent PCI messages in the photo, but it looks like the last message
> > > is from NFS, so I'm wondering if I'm missing some context. The v6.0
> > > dmesg shows several other ntfs, fuse, JFS, etc. messages before more
> > > PCI-related things. Anyway, the "initcall_debug" might help us narrow
> > > it down a bit.
> >
> > I did not really conclude that. I just saw "PCI" as one of the last
> > messages printed before the boot process stopped.
>
> OK. We'll figure it out!
>
> Bjorn

2023-01-06 16:56:40

by Zeno Davatz

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

Dear Bjorn

Happy New Year!

On Fri, Dec 30, 2022 at 7:50 PM Bjorn Helgaas <[email protected]> wrote:
>
> [+cc Bruno, to include you here as well as the bugzilla]
>
> On Wed, Dec 28, 2022 at 12:42:34PM -0600, Bjorn Helgaas wrote:
> > On Wed, Dec 28, 2022 at 06:42:38PM +0100, Zeno Davatz wrote:
> > > On Wed, Dec 28, 2022 at 1:02 PM Bjorn Helgaas <[email protected]> wrote:
> > > > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> > > >
> > > > > Summary: PCI bridge to bus boot hang at enumeration
> > > > > Kernel Version: 6.1-rc1
> > > > > ...
> > > >
> > > > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > > > see attachments.
> > > > >
> > > > > The enumeration works fine with Kernel 6.0 and below.
> > > > >
> > > > > Same problem still exists with v6.1 and v6.2-rc1.
> > > >
> > > > Thank you very much for your report, Zeno!
> > > >
> > > > v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> > > > "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> > > > when it hangs?
> > >
> > > I will try this after January 7th, 2023.

I updated the issue:

https://bugzilla.kernel.org/show_bug.cgi?id=216859

I booted with the option: "ignore_loglevel initcall_debug"

Best
Zeno

2023-01-12 21:28:23

by Bjorn Helgaas

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

[+cc sound folks]

On Wed, Dec 28, 2022 at 06:02:48AM -0600, Bjorn Helgaas wrote:
> On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=216859
>
> > Summary: PCI bridge to bus boot hang at enumeration
> > Kernel Version: 6.1-rc1
> > ...
>
> > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > see attachments.
> >
> > The enumeration works fine with Kernel 6.0 and below.
> >
> > Same problem still exists with v6.1 and v6.2-rc1.
>
> Thank you very much for your report, Zeno!
>
> v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> when it hangs?
>
> How did you conclude that the hang is related to a PCI bridge? I see
> recent PCI messages in the photo, but it looks like the last message
> is from NFS, so I'm wondering if I'm missing some context. The v6.0
> dmesg shows several other ntfs, fuse, JFS, etc. messages before more
> PCI-related things. Anyway, the "initcall_debug" might help us narrow
> it down a bit.

Thanks very much for the bisection (complete log at [1])!

The bisection claims the first bad commit is:

833477fce7a1 ("Merge tag 'sound-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound")

with parents:

7e6739b9336e ("Merge tag 'drm-next-2022-10-05' of git://anongit.freedesktop.org/drm/drm")
86a4d29e7554 ("Merge tag 'asoc-v6.1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus")

Both 7e6739b9336e and 86a4d29e7554 tested "good" during the bisection.

There is a minor conflict when merging 86a4d29e7554 into the upstream,
but I can't imagine that being resolved incorrectly.

Would you mind turning off CONFIG_SOUND in your .config and testing
833477fce7a1 again? I'm a little skeptical that the hang would be
sound-related, but I guess it's a place to start.

Bjorn

[1] https://bugzilla.kernel.org/show_bug.cgi?id=216859#c35

2023-01-13 10:26:34

by Zeno Davatz

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

Dear Bjorn

On Thu, Jan 12, 2023 at 9:08 PM Bjorn Helgaas <[email protected]> wrote:
>
> [+cc sound folks]
>
> On Wed, Dec 28, 2022 at 06:02:48AM -0600, Bjorn Helgaas wrote:
> > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> >
> > > Summary: PCI bridge to bus boot hang at enumeration
> > > Kernel Version: 6.1-rc1
> > > ...
> >
> > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > see attachments.
> > >
> > > The enumeration works fine with Kernel 6.0 and below.
> > >
> > > Same problem still exists with v6.1 and v6.2-rc1.
> >
> > Thank you very much for your report, Zeno!
> >
> > v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> > "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> > when it hangs?
> >
> > How did you conclude that the hang is related to a PCI bridge? I see
> > recent PCI messages in the photo, but it looks like the last message
> > is from NFS, so I'm wondering if I'm missing some context. The v6.0
> > dmesg shows several other ntfs, fuse, JFS, etc. messages before more
> > PCI-related things. Anyway, the "initcall_debug" might help us narrow
> > it down a bit.
>
> Thanks very much for the bisection (complete log at [1])!
>
> The bisection claims the first bad commit is:
>
> 833477fce7a1 ("Merge tag 'sound-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound")
>
> with parents:
>
> 7e6739b9336e ("Merge tag 'drm-next-2022-10-05' of git://anongit.freedesktop.org/drm/drm")
> 86a4d29e7554 ("Merge tag 'asoc-v6.1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus")
>
> Both 7e6739b9336e and 86a4d29e7554 tested "good" during the bisection.
>
> There is a minor conflict when merging 86a4d29e7554 into the upstream,
> but I can't imagine that being resolved incorrectly.
>
> Would you mind turning off CONFIG_SOUND in your .config and testing
> 833477fce7a1 again? I'm a little skeptical that the hang would be
> sound-related, but I guess it's a place to start.
>
> Bjorn
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=216859#c35

Booting commit 833477fce7a1 with sound disabled did not help. Same hang.

Best
Zeno

2023-01-19 00:38:37

by Bjorn Helgaas

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

[+cc Krzysztof]

On Fri, Jan 06, 2023 at 05:42:33PM +0100, Zeno Davatz wrote:
> On Fri, Dec 30, 2022 at 7:50 PM Bjorn Helgaas <[email protected]> wrote:
> > On Wed, Dec 28, 2022 at 12:42:34PM -0600, Bjorn Helgaas wrote:
> > > On Wed, Dec 28, 2022 at 06:42:38PM +0100, Zeno Davatz wrote:
> > > > On Wed, Dec 28, 2022 at 1:02 PM Bjorn Helgaas <[email protected]> wrote:
> > > > > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> > > > >
> > > > > > Summary: PCI bridge to bus boot hang at enumeration
> > > > > > Kernel Version: 6.1-rc1
> > > > > > ...
> > > > >
> > > > > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > > > > see attachments.
> > > > > >
> > > > > > The enumeration works fine with Kernel 6.0 and below.
> > > > > >
> > > > > > Same problem still exists with v6.1 and v6.2-rc1.
> > > > >
> > > > > Thank you very much for your report, Zeno!
> > > > >
> > > > > v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> > > > > "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> > > > > when it hangs?
> > > >
> > > > I will try this after January 7th, 2023.
>
> I updated the issue:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216859
>
> I booted with the option: "ignore_loglevel initcall_debug"

Thanks! There's so much PCIe output in that picture that we can't see
any of the initcall logging. Can you capture another movie, but use
kernel parameters like "ignore_loglevel initcall_debug boot_delay=100"
to slow things down? The full-speed boot is too fast for the camera
to capture all the output. You can do this on any convenient kernel
that hangs.

There might be more we can do with the bisection, too, but I don't
have any suggestions for that yet. Maybe Krzysztof does.

Bjorn

2023-01-20 05:55:05

by Bjorn Helgaas

Subject: Re: [Bug 216859] New: PCI bridge to bus boot hang at enumeration

[+cc [email protected] to avoid spamassassin]

On Wed, Jan 18, 2023 at 06:04:58PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 06, 2023 at 05:42:33PM +0100, Zeno Davatz wrote:
> > On Fri, Dec 30, 2022 at 7:50 PM Bjorn Helgaas <[email protected]> wrote:
> > > On Wed, Dec 28, 2022 at 12:42:34PM -0600, Bjorn Helgaas wrote:
> > > > On Wed, Dec 28, 2022 at 06:42:38PM +0100, Zeno Davatz wrote:
> > > > > On Wed, Dec 28, 2022 at 1:02 PM Bjorn Helgaas <[email protected]> wrote:
> > > > > > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> > > > > >
> > > > > > > Summary: PCI bridge to bus boot hang at enumeration
> > > > > > > Kernel Version: 6.1-rc1
> > > > > > > ...
> > > > > >
> > > > > > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > > > > > see attachments.
> > > > > > >
> > > > > > > The enumeration works fine with Kernel 6.0 and below.
> > > > > > >
> > > > > > > Same problem still exists with v6.1 and v6.2-rc1.
> > > > > >
> > > > > > Thank you very much for your report, Zeno!
> > > > > >
> > > > > > v6.0 works, v6.1-rc1 fails. Would you mind booting v6.1-rc1 with the
> > > > > > "ignore_loglevel initcall_debug" kernel parameters and taking a photo
> > > > > > when it hangs?
> > > > >
> > > > > I will try this after January 7th, 2023.
> >
> > I updated the issue:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> >
> > I booted with the option: "ignore_loglevel initcall_debug"
>
> Thanks! There's so much PCIe output in that picture that we can't see
> any of the initcall logging. Can you capture another movie, but use
> kernel parameters like "ignore_loglevel initcall_debug boot_delay=100"
> to slow things down? The full-speed boot is too fast for the camera
> to capture all the output. You can do this on any convenient kernel
> that hangs.

Thanks for the new movie! The last initcalls I see before the hang
are:

init_mqueue_fs
key_proc_init
jent_mod_init

We must have returned from jent_mod_init() because I think the "saving
config space" messages we see at the hang are from
pcie_portdrv_init().
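With "initcall_debug", each initcall logs a "calling ..." line when it starts and an "initcall ... returned ..." line when it finishes, so the initcall still running when output stops is one with a "calling" line but no matching "returned" line. The following is a minimal Python sketch of that bookkeeping, assuming the console output has been transcribed into text lines (from a photo, serial console, or netconsole); the helper name `pending_initcalls` is hypothetical, and initcalls that run asynchronously could make the result ambiguous.

```python
import re
from typing import Dict, Iterable, List

# Entry line, e.g. "calling  pcie_portdrv_init+0x0/0x43 @ 1"
# (an optional "[    1.234567] " timestamp prefix is tolerated by re.search)
CALL_RE = re.compile(r"\bcalling\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Exit line, e.g. "initcall sg_pool_init+0x0/0xb4 returned 0 after 462 usecs"
RET_RE = re.compile(r"\binitcall\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+returned")


def pending_initcalls(lines: Iterable[str]) -> List[str]:
    """Return initcalls that were entered but never reported returning, in call order."""
    pending: Dict[str, None] = {}  # dicts keep insertion order
    for line in lines:
        m = RET_RE.search(line)
        if m:
            # This initcall finished, so it cannot be the one that hung.
            pending.pop(m.group(1), None)
            continue
        m = CALL_RE.search(line)
        if m:
            pending[m.group(1)] = None
    return list(pending)


if __name__ == "__main__":
    excerpt = [
        "calling  sg_pool_init+0x0/0xb4 @ 1",
        "initcall sg_pool_init+0x0/0xb4 returned 0 after 462 usecs",
        "calling  pcie_portdrv_init+0x0/0x43 @ 1",
        "pcieport 0000:00:1c.0: vgaarb: pci_notify",
    ]
    print(pending_initcalls(excerpt))  # ['pcie_portdrv_init']
```

On a hung boot, the last entry of the returned list is the most likely candidate for the initcall that never completed.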

I built 833477fce7a1 ("Merge tag 'sound-6.1-rc1' of
git://git.kernel.org/pub/scl) with your .config and when I boot it on
qemu, I see this:

calling jent_mod_init+0x0/0x32 @ 1
initcall jent_mod_init+0x0/0x32 returned 0 after 27185 usecs
calling af_alg_init+0x0/0x45 @ 1
NET: Registered PF_ALG protocol family
...
calling sg_pool_init+0x0/0xb4 @ 1
initcall sg_pool_init+0x0/0xb4 returned 0 after 462 usecs
calling pcie_portdrv_init+0x0/0x43 @ 1
pcieport 0000:00:1c.0: vgaarb: pci_notify
pcieport 0000:00:1c.0: runtime IRQ mapping not provided by arch
pcieport 0000:00:1c.0: enabling bus mastering
pcieport 0000:00:1c.0: PME: Signaling with IRQ 24
pcieport 0000:00:1c.0: AER: enabled with IRQ 24
pcieport 0000:00:1c.0: saving config space at offset 0x0 (reading 0x34208086)
pcieport 0000:00:1c.0: saving config space at offset 0x4 (reading 0x100507)
pcieport 0000:00:1c.0: saving config space at offset 0x8 (reading 0x6040002)
...
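The "saving config space" lines print raw 32-bit values read from the standard PCI configuration header. A minimal Python sketch (the helper name `decode_header_dword` is hypothetical) that splits the first three dwords into their fields according to the standard header layout: offset 0x0 holds Vendor ID and Device ID, 0x4 holds Command and Status, and 0x8 holds Revision ID, Programming Interface, Subclass and Class Code.

```python
from typing import Dict


def decode_header_dword(offset: int, value: int) -> Dict[str, int]:
    """Split one 32-bit config-space read into standard header fields.

    Only offsets 0x0, 0x4 and 0x8 are handled in this sketch.
    """
    if offset == 0x0:
        # Low 16 bits: Vendor ID, high 16 bits: Device ID
        return {"vendor_id": value & 0xFFFF, "device_id": (value >> 16) & 0xFFFF}
    if offset == 0x4:
        # Low 16 bits: Command register, high 16 bits: Status register
        return {"command": value & 0xFFFF, "status": (value >> 16) & 0xFFFF}
    if offset == 0x8:
        # Bytes 0..3: Revision ID, Prog IF, Subclass, Class Code
        return {
            "revision_id": value & 0xFF,
            "prog_if": (value >> 8) & 0xFF,
            "subclass": (value >> 16) & 0xFF,
            "class_code": (value >> 24) & 0xFF,
        }
    raise ValueError(f"offset {offset:#x} is not handled in this sketch")


if __name__ == "__main__":
    # Values taken from the 0000:00:1c.0 lines in the log above
    for off, val in ((0x0, 0x34208086), (0x4, 0x100507), (0x8, 0x6040002)):
        print(hex(off), {k: hex(v) for k, v in decode_header_dword(off, val).items()})
```

Decoding the logged values gives vendor 0x8086 (Intel) and device 0x3420 at offset 0x0, and class 0x06 / subclass 0x04 at offset 0x8, which is a PCI-to-PCI bridge. That is consistent with 0000:00:1c.0 being claimed by the pcieport driver.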

Would you mind trying again with "boot_delay=1000 pcie_ports=compat"?

"boot_delay=1000" should slow it down more (all the action is in the
last 3 seconds and it's still hard to see) and "pcie_ports=compat"
should turn off the PCIe port driver.

Bjorn

2023-01-26 12:11:34

by Bjorn Helgaas

Subject: Re: [REGRESSION] [Bug 216859] New: PCI bridge to bus boot hang at enumeration

[+cc folks from 145eed48de27 and framebuffer folks, regression list]

On Thu, Jan 12, 2023 at 02:08:19PM -0600, Bjorn Helgaas wrote:
> On Wed, Dec 28, 2022 at 06:02:48AM -0600, Bjorn Helgaas wrote:
> > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> >
> > > Summary: PCI bridge to bus boot hang at enumeration
> > > Kernel Version: 6.1-rc1
> > > ...
> >
> > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > see attachments.
> > >
> > > The enumeration works fine with Kernel 6.0 and below.
> > >
> > > Same problem still exists with v6.1 and v6.2-rc1.

This is a regression between v6.0 and v6.1-rc1. Console output during
boot freezes after nvidiafb deactivates the VGA console.

It was a lot of work for Zeno, but we finally isolated this console
hang to 145eed48de27 ("fbdev: Remove conflicting devices on PCI bus").

The system actually does continue to boot and is accessible via ssh,
but the console appears hung, at least for output. More details in
the bugzilla starting at
https://bugzilla.kernel.org/show_bug.cgi?id=216859#c47 .

Bjorn

2023-02-01 22:30:40

by Bjorn Helgaas

Subject: Re: [REGRESSION] [Bug 216859] New: PCI bridge to bus boot hang at enumeration

[+cc Geert]

On Thu, Jan 26, 2023 at 06:11:24AM -0600, Bjorn Helgaas wrote:
> On Thu, Jan 12, 2023 at 02:08:19PM -0600, Bjorn Helgaas wrote:
> > On Wed, Dec 28, 2022 at 06:02:48AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Dec 28, 2022 at 08:37:52AM +0000, [email protected] wrote:
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=216859
> > >
> > > > Summary: PCI bridge to bus boot hang at enumeration
> > > > Kernel Version: 6.1-rc1
> > > > ...
> > >
> > > > With Kernel 6.1-rc1 the enumeration process stopped working for me,
> > > > see attachments.
> > > >
> > > > The enumeration works fine with Kernel 6.0 and below.
> > > >
> > > > Same problem still exists with v6.1 and v6.2-rc1.
>
> This is a regression between v6.0 and v6.1-rc1. Console output during
> boot freezes after nvidiafb deactivates the VGA console.
>
> It was a lot of work for Zeno, but we finally isolated this console
> hang to 145eed48de27 ("fbdev: Remove conflicting devices on PCI bus").
>
> The system actually does continue to boot and is accessible via ssh,
> but the console appears hung, at least for output. More details in
> the bugzilla starting at
> https://bugzilla.kernel.org/show_bug.cgi?id=216859#c47 .

145eed48de27 ("fbdev: Remove conflicting devices on PCI bus") doesn't
say what the benefit is, or what would break if we reverted it.

Does anybody have any clues? It would be nice to resolve this
regression before v6.2, which will probably be released 2/12 or 2/19.

Bjorn