2018-05-16 21:33:51

by Alexandru Gagniuc

Subject: [PATCH] PCI: DPC: Clear AER status bits before disabling port containment

AER status bits are sticky, and they survive system resets. Downstream
devices are usually taken care of after re-enumerating the downstream
busses, as the AER bits are cleared during probe().

However, nothing clears the bits of the port which contained the
error. These sticky bits may lead some BIOSes to think that something
bad happened, and print ominous messages on next boot. To prevent this,
tidy up the AER status bits before releasing containment.

Signed-off-by: Alexandru Gagniuc <[email protected]>
---
drivers/pci/pcie/dpc.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 8c57d607e603..bf82d6936556 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -112,6 +112,10 @@ static void dpc_work(struct work_struct *work)
dpc->rp_pio_status = 0;
}

+ /* DPC event made a mess of our AER status bits. Clean them up. */
+ pci_cleanup_aer_error_status_regs(pdev);
+ /* TODO: Should we also use aer_print_error to log the event? */
+
pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);

--
2.14.3
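To illustrate the clean-up the patch proposes, here is a minimal user-space sketch, not kernel code. It assumes the AER and DPC status bits behave as write-1-to-clear, and it models a port's config space with a hypothetical `struct mock_port`. The helpers `mock_cleanup_aer_status()` and `mock_release_containment()` are also hypothetical. The register offsets and bit values are the ones in Linux's include/uapi/linux/pci_regs.h. The read-and-write-back pattern is how pci_cleanup_aer_error_status_regs() is understood to clear these registers, but this is not that function's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets and bits as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_STATUS          0x04  /* within the AER capability */
#define PCI_ERR_COR_STATUS            0x10
#define PCI_ERR_ROOT_STATUS           0x30
#define PCI_EXP_DPC_STATUS_TRIGGER    0x0001
#define PCI_EXP_DPC_STATUS_INTERRUPT  0x0008

/* Hypothetical: enough dwords to cover the AER offsets used here. */
#define AER_DWORDS (0x38 / 4)

/* Hypothetical in-memory stand-in for a port's config space. */
struct mock_port {
	bool is_root_port;
	uint32_t aer[AER_DWORDS];   /* AER capability, indexed by dword */
	uint16_t dpc_status;        /* DPC Status register */
};

static uint32_t aer_read(const struct mock_port *p, int off)
{
	assert(off % 4 == 0 && off / 4 < AER_DWORDS);
	return p->aer[off / 4];
}

/* RW1C semantics: each 1 bit written clears that status bit,
 * 0 bits leave the register untouched. */
static void aer_write_rw1c(struct mock_port *p, int off, uint32_t val)
{
	assert(off % 4 == 0 && off / 4 < AER_DWORDS);
	p->aer[off / 4] &= ~val;
}

/* Read each status register and write the value back, which clears
 * exactly the bits that were set. Assumption: Root Error Status is
 * only present (and only cleared) on root ports. */
static void mock_cleanup_aer_status(struct mock_port *p)
{
	if (p->is_root_port)
		aer_write_rw1c(p, PCI_ERR_ROOT_STATUS,
			       aer_read(p, PCI_ERR_ROOT_STATUS));
	aer_write_rw1c(p, PCI_ERR_COR_STATUS, aer_read(p, PCI_ERR_COR_STATUS));
	aer_write_rw1c(p, PCI_ERR_UNCOR_STATUS, aer_read(p, PCI_ERR_UNCOR_STATUS));
}

/* Ordering used in the patch: tidy the AER status first, then clear
 * the DPC Trigger and Interrupt status bits to release containment. */
static void mock_release_containment(struct mock_port *p)
{
	mock_cleanup_aer_status(p);
	p->dpc_status &= (uint16_t)~(PCI_EXP_DPC_STATUS_TRIGGER |
				     PCI_EXP_DPC_STATUS_INTERRUPT);
}
```

Because the value written back has a 1 only where a status bit was already set, the write clears exactly those bits and leaves everything else alone. Calling mock_release_containment() on a port with leftover error bits zeroes its AER status registers before the DPC Trigger and Interrupt bits are cleared, mirroring where the patch places the call relative to pci_write_config_word().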



2018-05-16 22:45:02

by Sinan Kaya

Subject: Re: [PATCH] PCI: DPC: Clear AER status bits before disabling port containment

On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
> AER status bits are sticky, and they survive system resets. Downstream
> devices are usually taken care of after re-enumerating the downstream
> busses, as the AER bits are cleared during probe().
>
> However, nothing clears the bits of the port which contained the
> error. These sticky bits may lead some BIOSes to think that something
> bad happened, and print ominous messages on next boot. To prevent this,
> tidy up the AER status bits before releasing containment.
>
> Signed-off-by: Alexandru Gagniuc <[email protected]>
> ---
> drivers/pci/pcie/dpc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 8c57d607e603..bf82d6936556 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -112,6 +112,10 @@ static void dpc_work(struct work_struct *work)
> dpc->rp_pio_status = 0;
> }
>
> + /* DPC event made a mess of our AER status bits. Clean them up. */
> + pci_cleanup_aer_error_status_regs(pdev);
> + /* TODO: Should we also use aer_print_error to log the event? */
> +
> pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
>
>

I think Keith has a patch to fix this. It was under review at some point.

--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

2018-05-16 23:11:19

by Keith Busch

Subject: Re: [PATCH] PCI: DPC: Clear AER status bits before disabling port containment

On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
> On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
> > [...]
>
> I think Keith has a patch to fix this. It was under review at some point.

Right, I do intend to follow up on this, but I've had some trouble
finding time the last few weeks. Sorry about that, things will clear up
for me shortly.

2018-06-19 21:59:01

by Bjorn Helgaas

Subject: Re: [PATCH] PCI: DPC: Clear AER status bits before disabling port containment

On Wed, May 16, 2018 at 05:12:21PM -0600, Keith Busch wrote:
> On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
> > On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
> > > [...]
> >
> > I think Keith has a patch to fix this. It was under review at some point.
>
> Right, I do intend to follow up on this, but I've had some trouble
> finding time the last few weeks. Sorry about that, things will clear up
> for me shortly.

I'll drop this (Alexandru's) patch for now, waiting for your update, Keith.

2018-06-26 20:52:09

by Alexandru Gagniuc

Subject: Re: [PATCH] PCI: DPC: Clear AER status bits before disabling port containment



On 06/19/2018 04:57 PM, Bjorn Helgaas wrote:
> On Wed, May 16, 2018 at 05:12:21PM -0600, Keith Busch wrote:
>> On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
>>> On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
>>>> [...]
>>>
>>> I think Keith has a patch to fix this. It was under review at some point.
>>
>> Right, I do intend to follow up on this, but I've had some trouble
>> finding time the last few weeks. Sorry about that, things will clear up
>> for me shortly.
>
> I'll drop this (Alexandru's) patch for now, waiting for your update, Keith.

I wonder whether clearing the AER status bits really needs to wait for
the refactoring of other parts of DPC handling, or whether the two
changes are independent?

Alex