2017-12-11 15:31:30

by Chris Clayton

Subject: Oops on 4.15-rc[123] on shutdown/reboot

I've been getting an oops when shutting down my laptop (with /sbin/halt) or rebooting it (/sbin/reboot or
/usr/sbin/kexec). Unfortunately, I can't provide the backtrace because it is on the screen for only a moment before the
system shuts down/reboots.

I have, however, bisected it, and the outcome is:

cc27b735ad3a75574a6ab1a66ed6b09385e77e5e is the first bad commit
commit cc27b735ad3a75574a6ab1a66ed6b09385e77e5e
Author: Sinan Kaya <[email protected]>
Date: Wed Oct 25 15:01:02 2017 -0400

PCI/portdrv: Turn off PCIe services during shutdown

Some of the PCIe services such as AER are being left enabled during
shutdown. This might cause spurious AER errors while SOC is being powered
down.

Clean up the PCIe services gracefully during shutdown to clear these false
positives.

Signed-off-by: Sinan Kaya <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>

:040000 040000 5a827d6956c581344a0bf392e30155c337673c1d 76c6a39b53604a0a0a370383c3503f80aa7cbc1e M drivers

I'm confident that this is the correct outcome because a kernel built with the preceding commit
(6018182d3158505f11103adaee8ffb53424df986) does not oops. Nor does -rc3 with the patch reversed.

I'm more than happy to provide additional diagnostics and test proposed fixes. As a starter for ten, I've attached the
output from 'lspci -v'. If, however, you need to see the backtrace, I'll need some advice on how to capture that.

Chris


Attachments:
lspci-v.txt (10.40 kB)

2017-12-11 16:30:22

by Sinan Kaya

Subject: Re: Oops on 4.15-rc[123] on shutdown/reboot

Hi Chris,

>
> I'm more than happy to provide additional diagnostics and test proposed fixes. As a starter for ten, I've attached the
> output from 'lspci -v'. If, however, you need to see the backtrace, I'll need some advice on how to capture that.
>

Can you open a bugzilla and also share the boot log?

There must be something unique about your system.

Sinan

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

2017-12-11 17:06:51

by Chris Clayton

Subject: Re: Oops on 4.15-rc[123] on shutdown/reboot



On 11/12/17 16:29, Sinan Kaya wrote:
> Hi Chris,
>
>>
>> I'm more than happy to provide additional diagnostics and test proposed fixes. As a starter for ten, I've attached the
>> output from 'lspci -v'. If, however, you need to see the backtrace, I'll need some advice on how to capture that.
>>
>
> Can you open a bugzilla and also share the boot log?
>

Here's the output of dmesg for 4.15.0-rc3. I'll open a bugzilla later and add this and the lspci output that I sent with
my original report.

> There must be something unique about your system.
>
> Sinan
>


Attachments:
dmesg-4.15.0-rc3.log (45.18 kB)

2017-12-11 17:17:54

by Bjorn Helgaas

Subject: Re: Oops on 4.15-rc[123] on shutdown/reboot

[+cc linux-pci]

On Mon, Dec 11, 2017 at 11:29:50AM -0500, Sinan Kaya wrote:
> Hi Chris,
>
> >
> > I'm more than happy to provide additional diagnostics and test proposed fixes. As a starter for ten, I've attached the
> > output from 'lspci -v'. If, however, you need to see the backtrace, I'll need some advice on how to capture that.
> >
>
> Can you open a bugzilla and also share the boot log?
>
> There must be something unique about your system.

Can you attach "lspci -vv" output (as root) to the bugzilla, too?

2017-12-11 17:24:47

by Sinan Kaya

Subject: Re: Oops on 4.15-rc[123] on shutdown/reboot

On 12/11/2017 12:06 PM, Chris Clayton wrote:
> Here's the output of dmesg for 4.15.0-rc3. I'll open a bugzilla later and add this and the lspci output that I sent with
> my original report.

This was helpful. I don't see any AER/DPC in your log. It looks like the only PCIe
portdrv service you have is PME.

Can we do a quick hack and return immediately from

static int pcie_pme_probe(struct pcie_device *srv)

by putting return 0; at the top.

Same thing in

static void pcie_pme_remove(struct pcie_device *srv)

just place a return at the top.

I'm hoping your problem will go away after this. Then, we can start peeling the onion.
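
For reference, the hack described above might look roughly like the sketch below. It is illustrative only: the struct here is a hypothetical stand-in so the fragment compiles outside the kernel tree, and in the real drivers/pci/pcie/pme.c the existing function bodies would simply stay in place, unreachable, after the added return statements.

```c
/*
 * Hypothetical stand-in for the kernel's struct pcie_device, so that this
 * sketch compiles on its own. The real definition lives in the kernel tree.
 */
struct pcie_device {
	int irq;
};

/* Debugging hack: return success before any PME setup is done. */
static int pcie_pme_probe(struct pcie_device *srv)
{
	(void)srv;
	return 0;	/* added at the top; the original body would follow */
}

/* Debugging hack: skip all PME teardown. */
static void pcie_pme_remove(struct pcie_device *srv)
{
	(void)srv;
	return;		/* added at the top; the original body would follow */
}
```

With both functions short-circuited, the PME service is neither set up nor torn down, which is what makes this a quick way to rule it in or out as the source of the oops.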

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

2017-12-11 19:31:43

by Chris Clayton

Subject: Re: Oops on 4.15-rc[123] on shutdown/reboot



On 11/12/17 17:24, Sinan Kaya wrote:
> On 12/11/2017 12:06 PM, Chris Clayton wrote:
>> Here's the output of dmesg for 4.15.0-rc3. I'll open a bugzilla later and add this and the lspci output that I sent with
>> my original report.
>
> This was helpful. I don't see any AER/DPC in your log. It looks like the only PCIe
> portdrv service you have is PME.
>
> Can we do a quick hack and return immediately from
>
> static int pcie_pme_probe(struct pcie_device *srv)
>
> by putting return 0; at the top.
>
> Same thing in
>
> static void pcie_pme_remove(struct pcie_device *srv)
>
> just place a return at the top.
>

I made those changes (to drivers/pci/pcie/pme.c) and built and installed the kernel. Sorry, but I still get the oops
when I reboot.

> I'm hoping your problem will go away after this. Then, we can start peeling the onion.
>

2017-12-11 20:04:39

by Chris Clayton

Subject: Re: Oops on 4.15-rc[123] on shutdown/reboot

On 11/12/17 17:17, Bjorn Helgaas wrote:
> [+cc linux-pci]
>
> On Mon, Dec 11, 2017 at 11:29:50AM -0500, Sinan Kaya wrote:
>> Hi Chris,
>>
>>>
>>> I'm more than happy to provide additional diagnostics and test proposed fixes. As a starter for ten, I've attached the
>>> output from 'lspci -v'. If, however, you need to see the backtrace, I'll need some advice on how to capture that.
>>>
>>
>> Can you open a bugzilla and also share the boot log?
>>
>> There must be something unique about your system.
>
> Can you attach "lspci -vv" output (as root) to the bugzilla, too?
>

I've opened the bugzilla report (Bug 198141) and attached the dmesg and lspci -vv outputs to it.