2020-11-30 21:30:52

by Florian Fainelli

Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver



On 11/30/2020 1:11 PM, Jim Quinlan wrote:
> Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> by default Broadcom's STB PCIe controller effects an abort. This simple
> handler determines if the PCIe controller was the cause of the abort and if
> so, prints out diagnostic info.
>
> Example output:
> brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0
>
> Signed-off-by: Jim Quinlan <[email protected]>

Acked-by: Florian Fainelli <[email protected]>
--
Florian

2020-12-01 18:09:28

by Bjorn Helgaas

Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver

On Mon, Nov 30, 2020 at 04:11:42PM -0500, Jim Quinlan wrote:
> Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> by default Broadcom's STB PCIe controller effects an abort. This simple
> handler determines if the PCIe controller was the cause of the abort and if
> so, prints out diagnostic info.

What happens during enumeration? pci_bus_generic_read_dev_vendor_id()
assumes a read of Vendor ID returns 0xffffffff if the device doesn't
exist.

I assume this case doesn't cause the abort you're referring to here,
or nothing would work. I think this enumeration case results in PCIe
Unsupported Request errors (PCIe r5.0, sec 2.3.2 implementation note).

Bjorn
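
A minimal userspace sketch of the kind of check the enumeration path relies on may help here. It assumes the host bridge synthesizes all-ones data for a config read that terminates as an Unsupported Request. The name vendor_dword_looks_present() is hypothetical and is not a kernel function; the real logic lives in pci_bus_generic_read_dev_vendor_id(), which also deals with Configuration Request Retry Status, omitted here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): given the 32-bit value read
 * from the Vendor ID / Device ID dword at config offset 0, decide
 * whether it looks like a real device answered.
 */
static bool vendor_dword_looks_present(uint32_t l)
{
	/* All-ones: the usual result when no device responds. */
	if (l == 0xffffffffu)
		return false;

	/* Zero or half-valid values are also treated as "no device". */
	if (l == 0x00000000u || l == 0x0000ffffu || l == 0xffff0000u)
		return false;

	return true;
}
```

If the controller raised an abort on this read instead of returning data, a check like this would never run, which is why the enumeration case matters.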

2020-12-01 20:18:27

by Jim Quinlan

Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver

On Tue, Dec 1, 2020 at 1:05 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Mon, Nov 30, 2020 at 04:11:42PM -0500, Jim Quinlan wrote:
> > Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> > by default Broadcom's STB PCIe controller effects an abort. This simple
> > handler determines if the PCIe controller was the cause of the abort and if
> > so, prints out diagnostic info.
>
> What happens during enumeration? pci_bus_generic_read_dev_vendor_id()
> assumes a read of Vendor ID returns 0xffffffff if the device doesn't
> exist.
>
> I assume this case doesn't cause the abort you're referring to here,
> or nothing would work. I think this enumeration case results in PCIe
> Unsupported Request errors (PCIe r5.0, sec 2.3.2 implementation note).

Hi Bjorn,

Yes, our controller makes a special case to allow for config-space
accesses to the dev_id and vendor_id registers, even if the device is
missing. That being said, it will abort on any access if the link is
down.

However, the 7216-type SOCs bring PCIe error-reporting HW but also
have a mode where 0xffffffff is returned on improper accesses, just
like many other controllers. We are debating whether we should turn
this on by default.

Regards,
Jim Quinlan
Broadcom STB
>
> Bjorn



2021-01-06 19:21:52

by Bjorn Helgaas

Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver

On Mon, Nov 30, 2020 at 04:11:42PM -0500, Jim Quinlan wrote:
> Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> by default Broadcom's STB PCIe controller effects an abort. This simple
> handler determines if the PCIe controller was the cause of the abort and if
> so, prints out diagnostic info.
>
> Example output:
> brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0

What does this mean for all the other PCI core code that expects
0xffffffff data returns? Does it work? Does it break differently on
STB than on other platforms?

> +/*
> + * Dump out pcie errors on die or panic.

s/pcie/PCIe/
This could be a single-line comment.

> + */

2021-01-06 19:59:31

by Jim Quinlan

Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver

On Wed, Jan 6, 2021 at 2:42 PM Jim Quinlan <[email protected]> wrote:
>
> ---------- Forwarded message ---------
> From: Bjorn Helgaas <[email protected]>
> Date: Wed, Jan 6, 2021 at 2:19 PM
> Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver
> To: Jim Quinlan <[email protected]>
> Cc: <[email protected]>, Nicolas Saenz Julienne
> <[email protected]>, <[email protected]>,
> <[email protected]>, Lorenzo Pieralisi
> <[email protected]>, Rob Herring <[email protected]>, Bjorn
> Helgaas <[email protected]>, Florian Fainelli
> <[email protected]>, moderated list:BROADCOM BCM2711/BCM2835 ARM
> ARCHITECTURE <[email protected]>, moderated
> list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE
> <[email protected]>, open list
> <[email protected]>
>
>
> On Mon, Nov 30, 2020 at 04:11:42PM -0500, Jim Quinlan wrote:
> > Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> > by default Broadcom's STB PCIe controller effects an abort. This simple
> > handler determines if the PCIe controller was the cause of the abort and if
> > so, prints out diagnostic info.
> >
> > Example output:
> > brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> > brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0
>
> What does this mean for all the other PCI core code that expects
> 0xffffffff data returns? Does it work? Does it break differently on
> STB than on other platforms?

Hi Bjorn,

Our PCIe HW causes a CPU abort when this happens. Occasionally a
customer will have a fault handler try to fix up the abort and
continue on, but we recommend solving the root problem. This commit
just gives us a chance to glean info about the problem. Our newer
SOCs have a mode that doesn't abort and instead returns 0xffffffff.

BTW, can you point me to example files where "PCI core code that
expects 0xffffffff data returns" [on bad accesses]?

Regards,
Jim Quinlan
Broadcom STB

>
> > +/*
> > + * Dump out pcie errors on die or panic.
>
> s/pcie/PCIe/
> This could be a single-line comment.
>
> > + */
>

2021-01-06 23:13:48

by Bjorn Helgaas

Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver

On Wed, Jan 06, 2021 at 02:57:19PM -0500, Jim Quinlan wrote:
> On Wed, Jan 6, 2021 at 2:42 PM Jim Quinlan <[email protected]> wrote:
> >
> > ---------- Forwarded message ---------
> > From: Bjorn Helgaas <[email protected]>
> > Date: Wed, Jan 6, 2021 at 2:19 PM
> > Subject: Re: [PATCH v2 5/6] PCI: brcmstb: Add panic/die handler to RC driver
> > To: Jim Quinlan <[email protected]>
> > Cc: <[email protected]>, Nicolas Saenz Julienne
> > <[email protected]>, <[email protected]>,
> > <[email protected]>, Lorenzo Pieralisi
> > <[email protected]>, Rob Herring <[email protected]>, Bjorn
> > Helgaas <[email protected]>, Florian Fainelli
> > <[email protected]>, moderated list:BROADCOM BCM2711/BCM2835 ARM
> > ARCHITECTURE <[email protected]>, moderated
> > list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE
> > <[email protected]>, open list
> > <[email protected]>
> >
> >
> > On Mon, Nov 30, 2020 at 04:11:42PM -0500, Jim Quinlan wrote:
> > > Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> > > by default Broadcom's STB PCIe controller effects an abort. This simple
> > > handler determines if the PCIe controller was the cause of the abort and if
> > > so, prints out diagnostic info.
> > >
> > > Example output:
> > > brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> > > brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0
> >
> > What does this mean for all the other PCI core code that expects
> > 0xffffffff data returns? Does it work? Does it break differently on
> > STB than on other platforms?
> Hi Bjorn,
>
> Our PCIe HW causes a CPU abort when this happens. Occasionally a
> customer will have a fault handler try to fix up the abort and
> continue on, but we recommend solving the root problem. This commit
> just gives us a chance to glean info about the problem. Our newer
> SOCs have a mode that doesn't abort and instead returns 0xffffffff.
>
> BTW, can you point me to example files where "PCI core code that
> expects 0xffffffff data returns" [on bad accesses]?

The most important case is during enumeration. A config read to a
device that doesn't exist normally terminates as an Unsupported
Request, and pci_bus_generic_read_dev_vendor_id() depends on reading
0xffffffff in that case. I assume this particular case does work that
way for brcm-pcie, because I assume enumeration does work.

pci_cfg_space_size_ext() is similar. I assume this also works for
brcm-pcie for the same reason.

pci_raw_set_power_state() looks for ~0, which it may see if it does a
config read to a device in D3cold. pci_dev_wait(), dpc_irq(),
pcie_pme_work_fn(), pcie_pme_irq() are all similar.

Yes, this is ugly and we should check for these more consistently.

The above are all for config reads. The PCI core doesn't do MMIO
accesses except for a few cases like MSI-X. But drivers do, and if
they check for PCIe errors on MMIO reads, they do it by looking for
0xffffffff, e.g., pci_mmio_enabled() (in hfi1),
qib_pci_mmio_enabled(), bnx2x_get_hwinfo(), etc.

Bjorn
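
The driver-side pattern described above, treating an all-ones MMIO read as a possible PCIe error, can be sketched in userspace roughly as follows. Everything here is a stand-in for illustration: struct fake_dev, fake_readl() and read_status() are hypothetical names, and the "device" is just an array in memory that returns 0xffffffff when its simulated link is down.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device behind a PCIe link. */
struct fake_dev {
	bool link_up;
	uint32_t regs[4];
};

/*
 * Simulated MMIO read: when the link is down, return all-ones data,
 * as many host controllers do for a failed read.
 */
static uint32_t fake_readl(const struct fake_dev *d, unsigned int idx)
{
	return d->link_up ? d->regs[idx] : 0xffffffffu;
}

/*
 * Driver-style read of register 0. Returns 0 and stores the value on
 * success, or -1 if the data looks like a PCIe error response.
 * Assumes register 0 never legitimately reads as 0xffffffff.
 */
static int read_status(const struct fake_dev *d, uint32_t *out)
{
	uint32_t v = fake_readl(d, 0);

	if (v == 0xffffffffu)
		return -1;

	*out = v;
	return 0;
}
```

On a controller that aborts rather than returning 0xffffffff, a check like the one in read_status() would not get the chance to run, which is the concern raised in this thread.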