This patch series adds quirks to work around a few issues related to
certain revisions of Broadcom PAXC based PCIe root complexes
This patch series is based off v4.17-rc5 and is available on GIHUB:
repo: https://github.com/Broadcom/arm64-linux.git
branch: sr-paxc-quirk-v1
Ray Jui (3):
PCI: iproc: Activate PAXC bridge quirk for more devices
PCI: iproc: Fix up corrupted PAXC root complex config registers
PCI: iproc: Disable MSI parsing in certain PAXC blocks
drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++-
drivers/pci/quirks.c | 98 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 130 insertions(+), 2 deletions(-)
--
2.1.4
The internal MSI parsing logic in certain revisions of PAXC root
complexes does not work properly and can casue corruptions on the
writes. They need to be disabled
Signed-off-by: Ray Jui <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
---
drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
1 file changed, 32 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
index 3c76c5f..b906d80 100644
--- a/drivers/pci/host/pcie-iproc.c
+++ b/drivers/pci/host/pcie-iproc.c
@@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct iproc_pcie *pcie, u64 msi_addr)
return ret;
}
-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 msi_addr)
+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64 msi_addr,
+ bool enable)
{
u32 val;
+ if (!enable) {
+ /*
+ * Disable PAXC MSI steering. All write transfers will be
+ * treated as non-MSI transfers
+ */
+ val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
+ val &= ~MSI_ENABLE_CFG;
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
+ return;
+ }
+
/*
* Program bits [43:13] of address of GITS_TRANSLATER register into
* bits [30:0] of the MSI base address register. In fact, in all iProc
@@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie *pcie,
return ret;
break;
case IPROC_PCIE_PAXC_V2:
- iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
+ iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
break;
default:
return -EINVAL;
@@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
}
EXPORT_SYMBOL(iproc_pcie_remove);
+/*
+ * The MSI parsing logic in certain revisions of Broadcom PAXC based root
+ * complex does not work and needs to be disabled
+ */
+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
+{
+ struct iproc_pcie *pcie = iproc_data(pdev->bus);
+
+ if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
+ iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+ quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+ quirk_paxc_disable_msi_parsing);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+ quirk_paxc_disable_msi_parsing);
+
MODULE_AUTHOR("Ray Jui <[email protected]>");
MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
MODULE_LICENSE("GPL v2");
--
2.1.4
Activate PAXC bridge quirk for more PAXC based PCIe root complex with
the following PCIe device ID:
0xd750, 0xd802, 0xd804
Signed-off-by: Ray Jui <[email protected]>
---
drivers/pci/quirks.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 2990ad1..47dfea0 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2195,6 +2195,9 @@ static void quirk_paxc_bridge(struct pci_dev *pdev)
}
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd, quirk_paxc_bridge);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0, quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, quirk_paxc_bridge);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, quirk_paxc_bridge);
#endif
/* Originally in EDAC sources for i82875P:
--
2.1.4
On certain versions of Broadcom PAXC based root complexes, certain
regions of the configuration space are corrupted. As a result, it
prevents the Linux PCIe stack from traversing the linked list of the
capability registers completely and therefore the root complex is
not advertised as "PCIe capable". This prevents the correct PCIe RID
from being parsed in the kernel PCIe stack. A correct RID is required
for mapping to a stream ID from the SMMU or the device ID from the
GICv3 ITS
This patch fixes up the issue by manually populating the related
PCIe capabilities based on readings from the PCIe capability structure
Signed-off-by: Ray Jui <[email protected]>
Reviewed-by: Anup Patel <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
---
drivers/pci/quirks.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 95 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 47dfea0..0cdbd0a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2198,6 +2198,101 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0, quirk_paxc_bridge);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, quirk_paxc_bridge);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, quirk_paxc_bridge);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, quirk_paxc_bridge);
+
+/*
+ * The PCI capabilities list for certain revisions of Broadcom PAXC root
+ * complexes is incorrectly terminated due to corrupted configuration space
+ * registers in the range of 0x50 - 0x5f
+ *
+ * As a result, the capability list becomes broken and prevent standard PCI
+ * stack from being able to traverse to the PCIe capability structure
+ */
+static void quirk_paxc_pcie_capability(struct pci_dev *pdev)
+{
+ int pos, i = 0;
+ u8 next_cap;
+ u16 reg16, *cap;
+ struct pci_cap_saved_state *state;
+
+ /* bail out if PCIe capability can be found */
+ if (pdev->pcie_cap || pci_find_capability(pdev, PCI_CAP_ID_EXP))
+ return;
+
+ /* locate the power management capability */
+ pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
+ if (!pos)
+ return;
+
+ /* bail out if the next capability pointer is not 0x50/0x58 */
+ pci_read_config_byte(pdev, pos + 1, &next_cap);
+ if (next_cap != 0x50 && next_cap != 0x58)
+ return;
+
+ /* bail out if we do not terminate at 0x50/0x58 */
+ pos = next_cap;
+ pci_read_config_byte(pdev, pos + 1, &next_cap);
+ if (next_cap != 0x00)
+ return;
+
+ /*
+ * On these buggy HW, PCIe capability structure is expected to be at
+ * 0xac and should terminate the list
+ *
+ * Borrow the similar logic from theIntel DH895xCC VFs fixup to save
+ * the PCIe capability list
+ */
+ pos = 0xac;
+ pci_read_config_word(pdev, pos, ®16);
+ if (reg16 == (0x0000 | PCI_CAP_ID_EXP)) {
+ u32 status;
+
+#ifndef PCI_EXP_SAVE_REGS
+#define PCI_EXP_SAVE_REGS 7
+#endif
+ int size = PCI_EXP_SAVE_REGS * sizeof(u16);
+
+ pdev->pcie_cap = pos;
+ pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, ®16);
+ pdev->pcie_flags_reg = reg16;
+ pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, ®16);
+ pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
+
+ pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
+ if (pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, &status) !=
+ PCIBIOS_SUCCESSFUL || (status == 0xffffffff))
+ pdev->cfg_size = PCI_CFG_SPACE_SIZE;
+
+ if (pci_find_saved_cap(pdev, PCI_CAP_ID_EXP))
+ return;
+
+ state = kzalloc(sizeof(*state) + size, GFP_KERNEL);
+ if (!state)
+ return;
+
+ state->cap.cap_nr = PCI_CAP_ID_EXP;
+ state->cap.cap_extended = 0;
+ state->cap.size = size;
+ cap = (u16 *)&state->cap.data[0];
+ pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap[i++]);
+ pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &cap[i++]);
+ pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &cap[i++]);
+ pcie_capability_read_word(pdev, PCI_EXP_RTCTL, &cap[i++]);
+ pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, &cap[i++]);
+ pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &cap[i++]);
+ pcie_capability_read_word(pdev, PCI_EXP_SLTCTL2, &cap[i++]);
+ hlist_add_head(&state->next, &pdev->saved_cap_space);
+ }
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_NX2_57810,
+ quirk_paxc_pcie_capability);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd,
+ quirk_paxc_pcie_capability);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
+ quirk_paxc_pcie_capability);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
+ quirk_paxc_pcie_capability);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
+ quirk_paxc_pcie_capability);
#endif
/* Originally in EDAC sources for i82875P:
--
2.1.4
Hi Bjorn/Lorenzo,
My bad for having this "INTERNAL" tag in the subject. Please let me know
if you need me to send out another set with that removed.
Thanks,
Ray
On 5/17/2018 10:21 AM, Ray Jui wrote:
> This patch series adds quirks to work around a few issues related to
> certain revisions of Broadcom PAXC based PCIe root complexes
>
> This patch series is based off v4.17-rc5 and is available on GIHUB:
> repo: https://github.com/Broadcom/arm64-linux.git
> branch: sr-paxc-quirk-v1
>
> Ray Jui (3):
> PCI: iproc: Activate PAXC bridge quirk for more devices
> PCI: iproc: Fix up corrupted PAXC root complex config registers
> PCI: iproc: Disable MSI parsing in certain PAXC blocks
>
> drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++-
> drivers/pci/quirks.c | 98 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 130 insertions(+), 2 deletions(-)
>
On 2018-05-17 22:51, Ray Jui wrote:
> On certain versions of Broadcom PAXC based root complexes, certain
> regions of the configuration space are corrupted. As a result, it
> prevents the Linux PCIe stack from traversing the linked list of the
> capability registers completely and therefore the root complex is
> not advertised as "PCIe capable". This prevents the correct PCIe RID
> from being parsed in the kernel PCIe stack. A correct RID is required
> for mapping to a stream ID from the SMMU or the device ID from the
> GICv3 ITS
>
> This patch fixes up the issue by manually populating the related
> PCIe capabilities based on readings from the PCIe capability structure
>
> Signed-off-by: Ray Jui <[email protected]>
> Reviewed-by: Anup Patel <[email protected]>
> Reviewed-by: Scott Branden <[email protected]>
> ---
> drivers/pci/quirks.c | 95
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 95 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 47dfea0..0cdbd0a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2198,6 +2198,101 @@
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
> quirk_paxc_bridge);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750,
> quirk_paxc_bridge);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
> quirk_paxc_bridge);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
> quirk_paxc_bridge);
> +
> +/*
> + * The PCI capabilities list for certain revisions of Broadcom PAXC
> root
> + * complexes is incorrectly terminated due to corrupted configuration
> space
> + * registers in the range of 0x50 - 0x5f
> + *
> + * As a result, the capability list becomes broken and prevent
> standard PCI
> + * stack from being able to traverse to the PCIe capability structure
> + */
> +static void quirk_paxc_pcie_capability(struct pci_dev *pdev)
> +{
> + int pos, i = 0;
> + u8 next_cap;
> + u16 reg16, *cap;
> + struct pci_cap_saved_state *state;
> +
> + /* bail out if PCIe capability can be found */
> + if (pdev->pcie_cap || pci_find_capability(pdev, PCI_CAP_ID_EXP))
> + return;
> +
> + /* locate the power management capability */
> + pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
> + if (!pos)
> + return;
> +
> + /* bail out if the next capability pointer is not 0x50/0x58 */
> + pci_read_config_byte(pdev, pos + 1, &next_cap);
> + if (next_cap != 0x50 && next_cap != 0x58)
> + return;
> +
> + /* bail out if we do not terminate at 0x50/0x58 */
> + pos = next_cap;
> + pci_read_config_byte(pdev, pos + 1, &next_cap);
> + if (next_cap != 0x00)
> + return;
> +
> + /*
> + * On these buggy HW, PCIe capability structure is expected to be at
> + * 0xac and should terminate the list
> + *
> + * Borrow the similar logic from theIntel DH895xCC VFs fixup to save
:%s /theIntel /Intel
> + * the PCIe capability list
> + */
> + pos = 0xac;
> + pci_read_config_word(pdev, pos, ®16);
> + if (reg16 == (0x0000 | PCI_CAP_ID_EXP)) {
> + u32 status;
> +
> +#ifndef PCI_EXP_SAVE_REGS
> +#define PCI_EXP_SAVE_REGS 7
> +#endif
> + int size = PCI_EXP_SAVE_REGS * sizeof(u16);
> +
> + pdev->pcie_cap = pos;
> + pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, ®16);
> + pdev->pcie_flags_reg = reg16;
> + pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, ®16);
> + pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
> +
> + pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> + if (pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, &status) !=
> + PCIBIOS_SUCCESSFUL || (status == 0xffffffff))
> + pdev->cfg_size = PCI_CFG_SPACE_SIZE;
> +
> + if (pci_find_saved_cap(pdev, PCI_CAP_ID_EXP))
> + return;
> +
> + state = kzalloc(sizeof(*state) + size, GFP_KERNEL);
> + if (!state)
> + return;
> +
> + state->cap.cap_nr = PCI_CAP_ID_EXP;
> + state->cap.cap_extended = 0;
> + state->cap.size = size;
> + cap = (u16 *)&state->cap.data[0];
> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_RTCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL2, &cap[i++]);
> + hlist_add_head(&state->next, &pdev->saved_cap_space);
> + }
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM,
> PCI_DEVICE_ID_NX2_57810,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
> + quirk_paxc_pcie_capability);
> #endif
>
> /* Originally in EDAC sources for i82875P:
On 2018-05-17 22:51, Ray Jui wrote:
> The internal MSI parsing logic in certain revisions of PAXC root
> complexes does not work properly and can casue corruptions on the
> writes. They need to be disabled
>
> Signed-off-by: Ray Jui <[email protected]>
> Reviewed-by: Scott Branden <[email protected]>
> ---
> drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
> 1 file changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/host/pcie-iproc.c
> b/drivers/pci/host/pcie-iproc.c
> index 3c76c5f..b906d80 100644
> --- a/drivers/pci/host/pcie-iproc.c
> +++ b/drivers/pci/host/pcie-iproc.c
> @@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
> iproc_pcie *pcie, u64 msi_addr)
> return ret;
> }
>
> -static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
> msi_addr)
> +static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
> msi_addr,
> + bool enable)
> {
> u32 val;
>
> + if (!enable) {
> + /*
> + * Disable PAXC MSI steering. All write transfers will be
> + * treated as non-MSI transfers
> + */
> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
> + val &= ~MSI_ENABLE_CFG;
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
> + return;
> + }
> +
> /*
> * Program bits [43:13] of address of GITS_TRANSLATER register into
> * bits [30:0] of the MSI base address register. In fact, in all
> iProc
> @@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie
> *pcie,
> return ret;
> break;
> case IPROC_PCIE_PAXC_V2:
> - iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
> + iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with
'false' ?
I see its getting called only from one place in current code
iproc_pcie_msi_steer().
> break;
> default:
> return -EINVAL;
> @@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
> }
> EXPORT_SYMBOL(iproc_pcie_remove);
>
> +/*
> + * The MSI parsing logic in certain revisions of Broadcom PAXC based
> root
> + * complex does not work and needs to be disabled
> + */
> +static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
> +{
> + struct iproc_pcie *pcie = iproc_data(pdev->bus);
> +
> + if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
> + iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
> + quirk_paxc_disable_msi_parsing);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
> + quirk_paxc_disable_msi_parsing);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
> + quirk_paxc_disable_msi_parsing);
> +
> MODULE_AUTHOR("Ray Jui <[email protected]>");
> MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
> MODULE_LICENSE("GPL v2");
On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
> On 2018-05-17 22:51, Ray Jui wrote:
> >The internal MSI parsing logic in certain revisions of PAXC root
> >complexes does not work properly and can casue corruptions on the
> >writes. They need to be disabled
> >
> >Signed-off-by: Ray Jui <[email protected]>
> >Reviewed-by: Scott Branden <[email protected]>
> >---
> > drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
> > 1 file changed, 32 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> >index 3c76c5f..b906d80 100644
> >--- a/drivers/pci/host/pcie-iproc.c
> >+++ b/drivers/pci/host/pcie-iproc.c
> >@@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
> >iproc_pcie *pcie, u64 msi_addr)
> > return ret;
> > }
> >
> >-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
> >msi_addr)
> >+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
> >msi_addr,
> >+ bool enable)
> > {
> > u32 val;
> >
> >+ if (!enable) {
> >+ /*
> >+ * Disable PAXC MSI steering. All write transfers will be
> >+ * treated as non-MSI transfers
> >+ */
> >+ val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
> >+ val &= ~MSI_ENABLE_CFG;
> >+ iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
> >+ return;
> >+ }
> >+
> > /*
> > * Program bits [43:13] of address of GITS_TRANSLATER register into
> > * bits [30:0] of the MSI base address register. In fact, in all iProc
> >@@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie
> >*pcie,
> > return ret;
> > break;
> > case IPROC_PCIE_PAXC_V2:
> >- iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
> >+ iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
>
> Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 'false' ?
> I see its getting called only from one place in current code
> iproc_pcie_msi_steer().
It is called below with the false field to disable MSIs. That's probably
one reason more to create a function to enable/disable MSIs instead of
adding a parameter to iproc_pcie_paxc_v2_msi_steer().
Which brings me to the question: what happens to those MSIs writes
when MSIs are disabled according to this patch ? Are they terminated
at the root complex ?
Lorenzo
> > break;
> > default:
> > return -EINVAL;
> >@@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
> > }
> > EXPORT_SYMBOL(iproc_pcie_remove);
> >
> >+/*
> >+ * The MSI parsing logic in certain revisions of Broadcom PAXC based root
> >+ * complex does not work and needs to be disabled
> >+ */
> >+static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
> >+{
> >+ struct iproc_pcie *pcie = iproc_data(pdev->bus);
> >+
> >+ if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
> >+ iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
> >+}
> >+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
> >+ quirk_paxc_disable_msi_parsing);
> >+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
> >+ quirk_paxc_disable_msi_parsing);
> >+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
> >+ quirk_paxc_disable_msi_parsing);
> >+
> > MODULE_AUTHOR("Ray Jui <[email protected]>");
> > MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
> > MODULE_LICENSE("GPL v2");
Hi Lorenzo,
On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:
> On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
>> On 2018-05-17 22:51, Ray Jui wrote:
>>> The internal MSI parsing logic in certain revisions of PAXC root
>>> complexes does not work properly and can casue corruptions on the
>>> writes. They need to be disabled
>>>
>>> Signed-off-by: Ray Jui <[email protected]>
>>> Reviewed-by: Scott Branden <[email protected]>
>>> ---
>>> drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>>> index 3c76c5f..b906d80 100644
>>> --- a/drivers/pci/host/pcie-iproc.c
>>> +++ b/drivers/pci/host/pcie-iproc.c
>>> @@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
>>> iproc_pcie *pcie, u64 msi_addr)
>>> return ret;
>>> }
>>>
>>> -static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
>>> msi_addr)
>>> +static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
>>> msi_addr,
>>> + bool enable)
>>> {
>>> u32 val;
>>>
>>> + if (!enable) {
>>> + /*
>>> + * Disable PAXC MSI steering. All write transfers will be
>>> + * treated as non-MSI transfers
>>> + */
>>> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
>>> + val &= ~MSI_ENABLE_CFG;
>>> + iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
>>> + return;
>>> + }
>>> +
>>> /*
>>> * Program bits [43:13] of address of GITS_TRANSLATER register into
>>> * bits [30:0] of the MSI base address register. In fact, in all iProc
>>> @@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie
>>> *pcie,
>>> return ret;
>>> break;
>>> case IPROC_PCIE_PAXC_V2:
>>> - iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
>>> + iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
>>
>> Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 'false' ?
>> I see its getting called only from one place in current code
>> iproc_pcie_msi_steer().
>
> It is called below with the false field to disable MSIs. That's probably
> one reason more to create a function to enable/disable MSIs instead of
> adding a parameter to iproc_pcie_paxc_v2_msi_steer().
Correct. Thanks for helping to explain.
>
> Which brings me to the question: what happens to those MSIs writes
> when MSIs are disabled according to this patch ? Are they terminated
> at the root complex ?
Note the PAXC block parses MSI writes from our internally connected
endpoints (i.e., an embedded network processor). The PAXC block examines
these MSI writes and modifies the memory attributes (to DEVICE) of these
data and then send them out to the AXI bus. These MSI writes will then
be forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further
processing. This is saying, PAXC itself does not process these MSI
writes. They are processed by the GIC and associated interrupt will be
generated form the GIC. PAXC's job is to parse and tag them properly so
these MSI writes can reach the GIC, and at the same, reach the GIC at
the right order.
On some of these problematic PAXCs, we are being forced to disable this
PAXC internal parsing logic. In this case, we set up static mapping with
the IOMMU to modify the memory attributes of these MSI writes. These MSI
writes are always designated towards a specific memory address (e.g., on
the GICv3 based system, it's the address of the translator register),
and that's why static mapping table can be set up to work around this.
>
> Lorenzo
>
>>> break;
>>> default:
>>> return -EINVAL;
>>> @@ -1427,6 +1439,24 @@ int iproc_pcie_remove(struct iproc_pcie *pcie)
>>> }
>>> EXPORT_SYMBOL(iproc_pcie_remove);
>>>
>>> +/*
>>> + * The MSI parsing logic in certain revisions of Broadcom PAXC based root
>>> + * complex does not work and needs to be disabled
>>> + */
>>> +static void quirk_paxc_disable_msi_parsing(struct pci_dev *pdev)
>>> +{
>>> + struct iproc_pcie *pcie = iproc_data(pdev->bus);
>>> +
>>> + if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE)
>>> + iproc_pcie_paxc_v2_msi_steer(pcie, 0, false);
>>> +}
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
>>> + quirk_paxc_disable_msi_parsing);
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
>>> + quirk_paxc_disable_msi_parsing);
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
>>> + quirk_paxc_disable_msi_parsing);
>>> +
>>> MODULE_AUTHOR("Ray Jui <[email protected]>");
>>> MODULE_DESCRIPTION("Broadcom iPROC PCIe common driver");
>>> MODULE_LICENSE("GPL v2");
[+Robin]
On Fri, May 18, 2018 at 12:48:28PM -0700, Ray Jui wrote:
> Hi Lorenzo,
>
> On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:
> >On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
> >>On 2018-05-17 22:51, Ray Jui wrote:
> >>>The internal MSI parsing logic in certain revisions of PAXC root
> >>>complexes does not work properly and can casue corruptions on the
> >>>writes. They need to be disabled
> >>>
> >>>Signed-off-by: Ray Jui <[email protected]>
> >>>Reviewed-by: Scott Branden <[email protected]>
> >>>---
> >>>drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
> >>>1 file changed, 32 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> >>>index 3c76c5f..b906d80 100644
> >>>--- a/drivers/pci/host/pcie-iproc.c
> >>>+++ b/drivers/pci/host/pcie-iproc.c
> >>>@@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
> >>>iproc_pcie *pcie, u64 msi_addr)
> >>> return ret;
> >>>}
> >>>
> >>>-static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
> >>>msi_addr)
> >>>+static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
> >>>msi_addr,
> >>>+ bool enable)
> >>>{
> >>> u32 val;
> >>>
> >>>+ if (!enable) {
> >>>+ /*
> >>>+ * Disable PAXC MSI steering. All write transfers will be
> >>>+ * treated as non-MSI transfers
> >>>+ */
> >>>+ val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
> >>>+ val &= ~MSI_ENABLE_CFG;
> >>>+ iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
> >>>+ return;
> >>>+ }
> >>>+
> >>> /*
> >>> * Program bits [43:13] of address of GITS_TRANSLATER register into
> >>> * bits [30:0] of the MSI base address register. In fact, in all iProc
> >>>@@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie
> >>>*pcie,
> >>> return ret;
> >>> break;
> >>> case IPROC_PCIE_PAXC_V2:
> >>>- iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
> >>>+ iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
> >>
> >>Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 'false' ?
> >>I see its getting called only from one place in current code
> >>iproc_pcie_msi_steer().
> >
> >It is called below with the false field to disable MSIs. That's probably
> >one reason more to create a function to enable/disable MSIs instead of
> >adding a parameter to iproc_pcie_paxc_v2_msi_steer().
>
> Correct. Thanks for helping to explain.
>
> >
> >Which brings me to the question: what happens to those MSIs writes
> >when MSIs are disabled according to this patch ? Are they terminated
> >at the root complex ?
>
> Note the PAXC block parses MSI writes from our internally connected
> endpoints (i.e., an embedded network processor). The PAXC block examines
> these MSI writes and modifies the memory attributes (to DEVICE) of these
> data and then send them out to the AXI bus. These MSI writes will then be
> forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further
> processing. This is saying, PAXC itself does not process these MSI writes.
> They are processed by the GIC and associated interrupt will be generated
> form the GIC. PAXC's job is to parse and tag them properly so these MSI
> writes can reach the GIC, and at the same, reach the GIC at the right order.
>
> On some of these problematic PAXCs, we are being forced to disable this PAXC
> internal parsing logic. In this case, we set up static mapping with the
> IOMMU to modify the memory attributes of these MSI writes. These MSI writes
> are always designated towards a specific memory address (e.g., on the GICv3
> based system, it's the address of the translator register), and that's why
> static mapping table can be set up to work around this.
Which means, I presume, that there must be some code that somehow
somewhere in the kernel sets-up those mappings and it has to be related
to this patch, in which case I wonder why we enable the PAXC steering at
all given that this (hack) can be done through the IOMMU (and I assume
that on all PAXC platforms that do need MSIs there is an IOMMU IP
upstream the root bridge, otherwise I have no idea what will happen to
those MSI writes).
Lorenzo
On 21/05/18 14:33, Lorenzo Pieralisi wrote:
> [+Robin]
>
> On Fri, May 18, 2018 at 12:48:28PM -0700, Ray Jui wrote:
>> Hi Lorenzo,
>>
>> On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:
>>> On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
>>>> On 2018-05-17 22:51, Ray Jui wrote:
>>>>> The internal MSI parsing logic in certain revisions of PAXC root
>>>>> complexes does not work properly and can casue corruptions on the
>>>>> writes. They need to be disabled
>>>>>
>>>>> Signed-off-by: Ray Jui <[email protected]>
>>>>> Reviewed-by: Scott Branden <[email protected]>
>>>>> ---
>>>>> drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
>>>>> index 3c76c5f..b906d80 100644
>>>>> --- a/drivers/pci/host/pcie-iproc.c
>>>>> +++ b/drivers/pci/host/pcie-iproc.c
>>>>> @@ -1144,10 +1144,22 @@ static int iproc_pcie_paxb_v2_msi_steer(struct
>>>>> iproc_pcie *pcie, u64 msi_addr)
>>>>> return ret;
>>>>> }
>>>>>
>>>>> -static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
>>>>> msi_addr)
>>>>> +static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie, u64
>>>>> msi_addr,
>>>>> + bool enable)
>>>>> {
>>>>> u32 val;
>>>>>
>>>>> + if (!enable) {
>>>>> + /*
>>>>> + * Disable PAXC MSI steering. All write transfers will be
>>>>> + * treated as non-MSI transfers
>>>>> + */
>>>>> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
>>>>> + val &= ~MSI_ENABLE_CFG;
>>>>> + iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
>>>>> + return;
>>>>> + }
>>>>> +
>>>>> /*
>>>>> * Program bits [43:13] of address of GITS_TRANSLATER register into
>>>>> * bits [30:0] of the MSI base address register. In fact, in all iProc
>>>>> @@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct iproc_pcie
>>>>> *pcie,
>>>>> return ret;
>>>>> break;
>>>>> case IPROC_PCIE_PAXC_V2:
>>>>> - iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
>>>>> + iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
>>>>
>>>> Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with 'false' ?
>>>> I see its getting called only from one place in current code
>>>> iproc_pcie_msi_steer().
>>>
>>> It is called below with the false field to disable MSIs. That's probably
>>> one reason more to create a function to enable/disable MSIs instead of
>>> adding a parameter to iproc_pcie_paxc_v2_msi_steer().
>>
>> Correct. Thanks for helping to explain.
>>
>>>
>>> Which brings me to the question: what happens to those MSIs writes
>>> when MSIs are disabled according to this patch ? Are they terminated
>>> at the root complex ?
>>
>> Note the PAXC block parses MSI writes from our internally connected
>> endpoints (i.e., an embedded network processor). The PAXC block examines
>> these MSI writes and modifies the memory attributes (to DEVICE) of these
>> data and then send them out to the AXI bus. These MSI writes will then be
>> forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further
>> processing. This is saying, PAXC itself does not process these MSI writes.
>> They are processed by the GIC and associated interrupt will be generated
>> form the GIC. PAXC's job is to parse and tag them properly so these MSI
>> writes can reach the GIC, and at the same, reach the GIC at the right order.
>>
>> On some of these problematic PAXCs, we are being forced to disable this PAXC
>> internal parsing logic. In this case, we set up static mapping with the
>> IOMMU to modify the memory attributes of these MSI writes. These MSI writes
>> are always designated towards a specific memory address (e.g., on the GICv3
>> based system, it's the address of the translator register), and that's why
>> static mapping table can be set up to work around this.
>
> Which means, I presume, that there must be some code that somehow
> somewhere in the kernel sets-up those mappings and it has to be related
> to this patch, in which case I wonder why we enable the PAXC steering at
> all given that this (hack) can be done through the IOMMU (and I assume
> that on all PAXC platforms that do need MSIs there is an IOMMU IP
> upstream the root bridge, otherwise I have no idea what will happen to
> those MSI writes).
If it's only rewriting memory attributes (FWIW it sounds like this thing
is comparable to the AXI translation table of the PLDA root complex we
have in Juno), then if the IOMMU is controlled by Linux the PAXC
shouldn't be needed anyway. In that situation the GICv2m/ITS doorbell
will be already mapped for the endpoint as device memory (and usually at
some arbitrary address that the PAXC won't recognise anyway) - see
iommu_dma_get_msi_page().
If Linux *doesn't* know about the IOMMU, then the firmware should be
free to set it up with a static 1:1 mapping of the PA space and leave it
that way, provided Linux can't inadvertently stomp on those page tables
later.
Robin.
Hi Robin/Lorenzo,
On 5/21/2018 7:26 AM, Robin Murphy wrote:
> On 21/05/18 14:33, Lorenzo Pieralisi wrote:
>> [+Robin]
>>
>> On Fri, May 18, 2018 at 12:48:28PM -0700, Ray Jui wrote:
>>> Hi Lorenzo,
>>>
>>> On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:
>>>> On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
>>>>> On 2018-05-17 22:51, Ray Jui wrote:
>>>>>> The internal MSI parsing logic in certain revisions of PAXC root
>>>>>> complexes does not work properly and can casue corruptions on the
>>>>>> writes. They need to be disabled
>>>>>>
>>>>>> Signed-off-by: Ray Jui <[email protected]>
>>>>>> Reviewed-by: Scott Branden <[email protected]>
>>>>>> ---
>>>>>> drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
>>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/host/pcie-iproc.c
>>>>>> b/drivers/pci/host/pcie-iproc.c
>>>>>> index 3c76c5f..b906d80 100644
>>>>>> --- a/drivers/pci/host/pcie-iproc.c
>>>>>> +++ b/drivers/pci/host/pcie-iproc.c
>>>>>> @@ -1144,10 +1144,22 @@ static int
>>>>>> iproc_pcie_paxb_v2_msi_steer(struct
>>>>>> iproc_pcie *pcie, u64 msi_addr)
>>>>>> return ret;
>>>>>> }
>>>>>>
>>>>>> -static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie,
>>>>>> u64
>>>>>> msi_addr)
>>>>>> +static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie *pcie,
>>>>>> u64
>>>>>> msi_addr,
>>>>>> + bool enable)
>>>>>> {
>>>>>> u32 val;
>>>>>>
>>>>>> + if (!enable) {
>>>>>> + /*
>>>>>> + * Disable PAXC MSI steering. All write transfers will be
>>>>>> + * treated as non-MSI transfers
>>>>>> + */
>>>>>> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
>>>>>> + val &= ~MSI_ENABLE_CFG;
>>>>>> + iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
>>>>>> + return;
>>>>>> + }
>>>>>> +
>>>>>> /*
>>>>>> * Program bits [43:13] of address of GITS_TRANSLATER register
>>>>>> into
>>>>>> * bits [30:0] of the MSI base address register. In fact, in
>>>>>> all iProc
>>>>>> @@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct
>>>>>> iproc_pcie
>>>>>> *pcie,
>>>>>> return ret;
>>>>>> break;
>>>>>> case IPROC_PCIE_PAXC_V2:
>>>>>> - iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
>>>>>> + iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
>>>>>
>>>>> Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with
>>>>> 'false' ?
>>>>> I see its getting called only from one place in current code
>>>>> iproc_pcie_msi_steer().
>>>>
>>>> It is called below with the false field to disable MSIs. That's
>>>> probably
>>>> one reason more to create a function to enable/disable MSIs instead of
>>>> adding a parameter to iproc_pcie_paxc_v2_msi_steer().
>>>
>>> Correct. Thanks for helping to explain.
>>>
>>>>
>>>> Which brings me to the question: what happens to those MSIs writes
>>>> when MSIs are disabled according to this patch ? Are they terminated
>>>> at the root complex ?
>>>
>>> Note the PAXC block parses MSI writes from our internally connected
>>> endpoints (i.e., an embedded network processor). The PAXC block examines
>>> these MSI writes and modifies the memory attributes (to DEVICE) of these
>>> data and then send them out to the AXI bus. These MSI writes will
>>> then be
>>> forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further
>>> processing. This is saying, PAXC itself does not process these MSI
>>> writes.
>>> They are processed by the GIC and associated interrupt will be generated
>>> form the GIC. PAXC's job is to parse and tag them properly so these MSI
>>> writes can reach the GIC, and at the same, reach the GIC at the right
>>> order.
>>>
>>> On some of these problematic PAXCs, we are being forced to disable
>>> this PAXC
>>> internal parsing logic. In this case, we set up static mapping with the
>>> IOMMU to modify the memory attributes of these MSI writes. These MSI
>>> writes
>>> are always designated towards a specific memory address (e.g., on the
>>> GICv3
>>> based system, it's the address of the translator register), and
>>> that's why
>>> static mapping table can be set up to work around this.
>>
>> Which means, I presume, that there must be some code that somehow
>> somewhere in the kernel sets-up those mappings and it has to be related
>> to this patch, in which case I wonder why we enable the PAXC steering at
>> all given that this (hack) can be done through the IOMMU (and I assume
>> that on all PAXC platforms that do need MSIs there is an IOMMU IP
>> upstream the root bridge, otherwise I have no idea what will happen to
>> those MSI writes).
>
> If it's only rewriting memory attributes (FWIW it sounds like this thing
> is comparable to the AXI translation table of the PLDA root complex we
> have in Juno), then if the IOMMU is controlled by Linux the PAXC
> shouldn't be needed anyway. In that situation the GICv2m/ITS doorbell
> will be already mapped for the endpoint as device memory (and usually at
> some arbitrary address that the PAXC won't recognise anyway) - see
> iommu_dma_get_msi_page().
>
> If Linux *doesn't* know about the IOMMU, then the firmware should be
> free to set it up with a static 1:1 mapping of the PA space and leave it
> that way, provided Linux can't inadvertently stomp on those page tables
> later.
Right, in our case, we had our ATF bootloader set up 1:1 mapping for
this. In this case, PAXC internal MSI parsing is completely disabled
(which is what this patch does for those PAXC blocks that have this issue).
>
> Robin.
On 5/22/2018 9:48 AM, Ray Jui wrote:
> Hi Robin/Lorenzo,
>
> On 5/21/2018 7:26 AM, Robin Murphy wrote:
>> On 21/05/18 14:33, Lorenzo Pieralisi wrote:
>>> [+Robin]
>>>
>>> On Fri, May 18, 2018 at 12:48:28PM -0700, Ray Jui wrote:
>>>> Hi Lorenzo,
>>>>
>>>> On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:
>>>>> On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
>>>>>> On 2018-05-17 22:51, Ray Jui wrote:
>>>>>>> The internal MSI parsing logic in certain revisions of PAXC root
>>>>>>> complexes does not work properly and can casue corruptions on the
>>>>>>> writes. They need to be disabled
>>>>>>>
>>>>>>> Signed-off-by: Ray Jui <[email protected]>
>>>>>>> Reviewed-by: Scott Branden <[email protected]>
>>>>>>> ---
>>>>>>> drivers/pci/host/pcie-iproc.c | 34
>>>>>>> ++++++++++++++++++++++++++++++++--
>>>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/pci/host/pcie-iproc.c
>>>>>>> b/drivers/pci/host/pcie-iproc.c
>>>>>>> index 3c76c5f..b906d80 100644
>>>>>>> --- a/drivers/pci/host/pcie-iproc.c
>>>>>>> +++ b/drivers/pci/host/pcie-iproc.c
>>>>>>> @@ -1144,10 +1144,22 @@ static int
>>>>>>> iproc_pcie_paxb_v2_msi_steer(struct
>>>>>>> iproc_pcie *pcie, u64 msi_addr)
>>>>>>> return ret;
>>>>>>> }
>>>>>>>
>>>>>>> -static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie
>>>>>>> *pcie, u64
>>>>>>> msi_addr)
>>>>>>> +static void iproc_pcie_paxc_v2_msi_steer(struct iproc_pcie
>>>>>>> *pcie, u64
>>>>>>> msi_addr,
>>>>>>> + bool enable)
>>>>>>> {
>>>>>>> u32 val;
>>>>>>>
>>>>>>> + if (!enable) {
>>>>>>> + /*
>>>>>>> + * Disable PAXC MSI steering. All write transfers will be
>>>>>>> + * treated as non-MSI transfers
>>>>>>> + */
>>>>>>> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
>>>>>>> + val &= ~MSI_ENABLE_CFG;
>>>>>>> + iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
>>>>>>> + return;
>>>>>>> + }
>>>>>>> +
>>>>>>> /*
>>>>>>> * Program bits [43:13] of address of GITS_TRANSLATER
>>>>>>> register into
>>>>>>> * bits [30:0] of the MSI base address register. In fact, in
>>>>>>> all iProc
>>>>>>> @@ -1201,7 +1213,7 @@ static int iproc_pcie_msi_steer(struct
>>>>>>> iproc_pcie
>>>>>>> *pcie,
>>>>>>> return ret;
>>>>>>> break;
>>>>>>> case IPROC_PCIE_PAXC_V2:
>>>>>>> - iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
>>>>>>> + iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
>>>>>>
>>>>>> Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere else with
>>>>>> 'false' ?
>>>>>> I see its getting called only from one place in current code
>>>>>> iproc_pcie_msi_steer().
>>>>>
>>>>> It is called below with the false field to disable MSIs. That's
>>>>> probably
>>>>> one reason more to create a function to enable/disable MSIs instead of
>>>>> adding a parameter to iproc_pcie_paxc_v2_msi_steer().
>>>>
>>>> Correct. Thanks for helping to explain.
>>>>
>>>>>
>>>>> Which brings me to the question: what happens to those MSIs writes
>>>>> when MSIs are disabled according to this patch ? Are they terminated
>>>>> at the root complex ?
>>>>
>>>> Note the PAXC block parses MSI writes from our internally connected
>>>> endpoints (i.e., an embedded network processor). The PAXC block
>>>> examines
>>>> these MSI writes and modifies the memory attributes (to DEVICE) of
>>>> these
>>>> data and then send them out to the AXI bus. These MSI writes will
>>>> then be
>>>> forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further
>>>> processing. This is saying, PAXC itself does not process these MSI
>>>> writes.
>>>> They are processed by the GIC and associated interrupt will be
>>>> generated
>>>> form the GIC. PAXC's job is to parse and tag them properly so these MSI
>>>> writes can reach the GIC, and at the same, reach the GIC at the
>>>> right order.
>>>>
>>>> On some of these problematic PAXCs, we are being forced to disable
>>>> this PAXC
>>>> internal parsing logic. In this case, we set up static mapping with the
>>>> IOMMU to modify the memory attributes of these MSI writes. These MSI
>>>> writes
>>>> are always designated towards a specific memory address (e.g., on
>>>> the GICv3
>>>> based system, it's the address of the translator register), and
>>>> that's why
>>>> static mapping table can be set up to work around this.
>>>
>>> Which means, I presume, that there must be some code that somehow
>>> somewhere in the kernel sets-up those mappings and it has to be related
>>> to this patch, in which case I wonder why we enable the PAXC steering at
>>> all given that this (hack) can be done through the IOMMU (and I assume
>>> that on all PAXC platforms that do need MSIs there is an IOMMU IP
>>> upstream the root bridge, otherwise I have no idea what will happen to
>>> those MSI writes).
>>
>> If it's only rewriting memory attributes (FWIW it sounds like this
>> thing is comparable to the AXI translation table of the PLDA root
>> complex we have in Juno), then if the IOMMU is controlled by Linux the
>> PAXC shouldn't be needed anyway. In that situation the GICv2m/ITS
>> doorbell will be already mapped for the endpoint as device memory (and
>> usually at some arbitrary address that the PAXC won't recognise
>> anyway) - see iommu_dma_get_msi_page().
>>
>> If Linux *doesn't* know about the IOMMU, then the firmware should be
>> free to set it up with a static 1:1 mapping of the PA space and leave
>> it that way, provided Linux can't inadvertently stomp on those page
>> tables later.
>
> Right, in our case, we had our ATF bootloader set up 1:1 mapping for
> this. In this case, PAXC internal MSI parsing is completely disabled
> (which is what this patch does for those PAXC blocks that have this issue).
>
And forgot to mention, the logic in our bootloader to set up 1:1 mapping
to work around this issue is chip dependent and only activated for
certain revisions of PAXC that has this issue.
>>
>> Robin.
On Thu, May 17, 2018 at 10:21:31AM -0700, Ray Jui wrote:
> On certain versions of Broadcom PAXC based root complexes, certain
> regions of the configuration space are corrupted. As a result, it
> prevents the Linux PCIe stack from traversing the linked list of the
> capability registers completely and therefore the root complex is
> not advertised as "PCIe capable". This prevents the correct PCIe RID
> from being parsed in the kernel PCIe stack. A correct RID is required
> for mapping to a stream ID from the SMMU or the device ID from the
> GICv3 ITS
>
> This patch fixes up the issue by manually populating the related
> PCIe capabilities based on readings from the PCIe capability structure
>
> Signed-off-by: Ray Jui <[email protected]>
> Reviewed-by: Anup Patel <[email protected]>
> Reviewed-by: Scott Branden <[email protected]>
> ---
> drivers/pci/quirks.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 95 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 47dfea0..0cdbd0a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -2198,6 +2198,101 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0, quirk_paxc_bridge);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, quirk_paxc_bridge);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, quirk_paxc_bridge);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, quirk_paxc_bridge);
> +
> +/*
> + * The PCI capabilities list for certain revisions of Broadcom PAXC root
> + * complexes is incorrectly terminated due to corrupted configuration space
> + * registers in the range of 0x50 - 0x5f
> + *
> + * As a result, the capability list becomes broken and prevent standard PCI
> + * stack from being able to traverse to the PCIe capability structure
> + */
> +static void quirk_paxc_pcie_capability(struct pci_dev *pdev)
> +{
> + int pos, i = 0;
> + u8 next_cap;
> + u16 reg16, *cap;
> + struct pci_cap_saved_state *state;
> +
> + /* bail out if PCIe capability can be found */
> + if (pdev->pcie_cap || pci_find_capability(pdev, PCI_CAP_ID_EXP))
> + return;
> +
> + /* locate the power management capability */
> + pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
> + if (!pos)
> + return;
> +
> + /* bail out if the next capability pointer is not 0x50/0x58 */
> + pci_read_config_byte(pdev, pos + 1, &next_cap);
> + if (next_cap != 0x50 && next_cap != 0x58)
> + return;
> +
> + /* bail out if we do not terminate at 0x50/0x58 */
> + pos = next_cap;
> + pci_read_config_byte(pdev, pos + 1, &next_cap);
> + if (next_cap != 0x00)
> + return;
> +
> + /*
> + * On these buggy HW, PCIe capability structure is expected to be at
> + * 0xac and should terminate the list
> + *
> + * Borrow the similar logic from theIntel DH895xCC VFs fixup to save
> + * the PCIe capability list
> + */
> + pos = 0xac;
> + pci_read_config_word(pdev, pos, ®16);
> + if (reg16 == (0x0000 | PCI_CAP_ID_EXP)) {
> + u32 status;
> +
> +#ifndef PCI_EXP_SAVE_REGS
> +#define PCI_EXP_SAVE_REGS 7
> +#endif
> + int size = PCI_EXP_SAVE_REGS * sizeof(u16);
> +
> + pdev->pcie_cap = pos;
> + pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, ®16);
> + pdev->pcie_flags_reg = reg16;
> + pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, ®16);
> + pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
Is there any way you can fix this in iproc_pcie_config_read() instead,
by making it notice when we're reading a corrupted part of config
space, and then returning the correct data instead? Is it just the
next capability pointer that's corrupted?
If you could fix it in the config accessor, lspci would automatically
show all the correct data (I think lspci will still show the wrong
data with this patch).
The quirk seems like a maintenance issue because anything that calls
pci_find_capability(pdev, PCI_CAP_ID_EXP)
will get the wrong answer.
> +
> + pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
> + if (pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, &status) !=
> + PCIBIOS_SUCCESSFUL || (status == 0xffffffff))
> + pdev->cfg_size = PCI_CFG_SPACE_SIZE;
> +
> + if (pci_find_saved_cap(pdev, PCI_CAP_ID_EXP))
> + return;
> +
> + state = kzalloc(sizeof(*state) + size, GFP_KERNEL);
> + if (!state)
> + return;
> +
> + state->cap.cap_nr = PCI_CAP_ID_EXP;
> + state->cap.cap_extended = 0;
> + state->cap.size = size;
> + cap = (u16 *)&state->cap.data[0];
> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_RTCTL, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &cap[i++]);
> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL2, &cap[i++]);
> + hlist_add_head(&state->next, &pdev->saved_cap_space);
> + }
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_NX2_57810,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
> + quirk_paxc_pcie_capability);
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
> + quirk_paxc_pcie_capability);
> #endif
>
> /* Originally in EDAC sources for i82875P:
> --
> 2.1.4
>
Hi Bjorn,
On 5/30/2018 10:27 AM, Bjorn Helgaas wrote:
> On Thu, May 17, 2018 at 10:21:31AM -0700, Ray Jui wrote:
>> On certain versions of Broadcom PAXC based root complexes, certain
>> regions of the configuration space are corrupted. As a result, it
>> prevents the Linux PCIe stack from traversing the linked list of the
>> capability registers completely and therefore the root complex is
>> not advertised as "PCIe capable". This prevents the correct PCIe RID
>> from being parsed in the kernel PCIe stack. A correct RID is required
>> for mapping to a stream ID from the SMMU or the device ID from the
>> GICv3 ITS
>>
>> This patch fixes up the issue by manually populating the related
>> PCIe capabilities based on readings from the PCIe capability structure
>>
>> Signed-off-by: Ray Jui <[email protected]>
>> Reviewed-by: Anup Patel <[email protected]>
>> Reviewed-by: Scott Branden <[email protected]>
>> ---
>> drivers/pci/quirks.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 95 insertions(+)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index 47dfea0..0cdbd0a 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -2198,6 +2198,101 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0, quirk_paxc_bridge);
>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750, quirk_paxc_bridge);
>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802, quirk_paxc_bridge);
>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804, quirk_paxc_bridge);
>> +
>> +/*
>> + * The PCI capabilities list for certain revisions of Broadcom PAXC root
>> + * complexes is incorrectly terminated due to corrupted configuration space
>> + * registers in the range of 0x50 - 0x5f
>> + *
>> + * As a result, the capability list becomes broken and prevent standard PCI
>> + * stack from being able to traverse to the PCIe capability structure
>> + */
>> +static void quirk_paxc_pcie_capability(struct pci_dev *pdev)
>> +{
>> + int pos, i = 0;
>> + u8 next_cap;
>> + u16 reg16, *cap;
>> + struct pci_cap_saved_state *state;
>> +
>> + /* bail out if PCIe capability can be found */
>> + if (pdev->pcie_cap || pci_find_capability(pdev, PCI_CAP_ID_EXP))
>> + return;
>> +
>> + /* locate the power management capability */
>> + pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
>> + if (!pos)
>> + return;
>> +
>> + /* bail out if the next capability pointer is not 0x50/0x58 */
>> + pci_read_config_byte(pdev, pos + 1, &next_cap);
>> + if (next_cap != 0x50 && next_cap != 0x58)
>> + return;
>> +
>> + /* bail out if we do not terminate at 0x50/0x58 */
>> + pos = next_cap;
>> + pci_read_config_byte(pdev, pos + 1, &next_cap);
>> + if (next_cap != 0x00)
>> + return;
>> +
>> + /*
>> + * On these buggy HW, PCIe capability structure is expected to be at
>> + * 0xac and should terminate the list
>> + *
>> + * Borrow the similar logic from theIntel DH895xCC VFs fixup to save
>> + * the PCIe capability list
>> + */
>> + pos = 0xac;
>> + pci_read_config_word(pdev, pos, ®16);
>> + if (reg16 == (0x0000 | PCI_CAP_ID_EXP)) {
>> + u32 status;
>> +
>> +#ifndef PCI_EXP_SAVE_REGS
>> +#define PCI_EXP_SAVE_REGS 7
>> +#endif
>> + int size = PCI_EXP_SAVE_REGS * sizeof(u16);
>> +
>> + pdev->pcie_cap = pos;
>> + pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, ®16);
>> + pdev->pcie_flags_reg = reg16;
>> + pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, ®16);
>> + pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
>
> Is there any way you can fix this in iproc_pcie_config_read() instead,
> by making it notice when we're reading a corrupted part of config
> space, and then returning the correct data instead? Is it just the
> next capability pointer that's corrupted?
Let me look into that and I'll get back.
Thanks,
Ray
>
> If you could fix it in the config accessor, lspci would automatically
> show all the correct data (I think lspci will still show the wrong
> data with this patch).
>
> The quirk seems like a maintenance issue because anything that calls
>
> pci_find_capability(pdev, PCI_CAP_ID_EXP)
>
> will get the wrong answer.
>
>> +
>> + pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
>> + if (pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, &status) !=
>> + PCIBIOS_SUCCESSFUL || (status == 0xffffffff))
>> + pdev->cfg_size = PCI_CFG_SPACE_SIZE;
>> +
>> + if (pci_find_saved_cap(pdev, PCI_CAP_ID_EXP))
>> + return;
>> +
>> + state = kzalloc(sizeof(*state) + size, GFP_KERNEL);
>> + if (!state)
>> + return;
>> +
>> + state->cap.cap_nr = PCI_CAP_ID_EXP;
>> + state->cap.cap_extended = 0;
>> + state->cap.size = size;
>> + cap = (u16 *)&state->cap.data[0];
>> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap[i++]);
>> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &cap[i++]);
>> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &cap[i++]);
>> + pcie_capability_read_word(pdev, PCI_EXP_RTCTL, &cap[i++]);
>> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, &cap[i++]);
>> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &cap[i++]);
>> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL2, &cap[i++]);
>> + hlist_add_head(&state->next, &pdev->saved_cap_space);
>> + }
>> +}
>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, PCI_DEVICE_ID_NX2_57810,
>> + quirk_paxc_pcie_capability);
>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd,
>> + quirk_paxc_pcie_capability);
>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
>> + quirk_paxc_pcie_capability);
>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
>> + quirk_paxc_pcie_capability);
>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
>> + quirk_paxc_pcie_capability);
>> #endif
>>
>> /* Originally in EDAC sources for i82875P:
>> --
>> 2.1.4
>>
Hi Bjorn,
On 5/30/2018 10:43 AM, Ray Jui wrote:
> Hi Bjorn,
>
> On 5/30/2018 10:27 AM, Bjorn Helgaas wrote:
>> On Thu, May 17, 2018 at 10:21:31AM -0700, Ray Jui wrote:
>>> On certain versions of Broadcom PAXC based root complexes, certain
>>> regions of the configuration space are corrupted. As a result, it
>>> prevents the Linux PCIe stack from traversing the linked list of the
>>> capability registers completely and therefore the root complex is
>>> not advertised as "PCIe capable". This prevents the correct PCIe RID
>>> from being parsed in the kernel PCIe stack. A correct RID is required
>>> for mapping to a stream ID from the SMMU or the device ID from the
>>> GICv3 ITS
>>>
>>> This patch fixes up the issue by manually populating the related
>>> PCIe capabilities based on readings from the PCIe capability structure
>>>
>>> Signed-off-by: Ray Jui <[email protected]>
>>> Reviewed-by: Anup Patel <[email protected]>
>>> Reviewed-by: Scott Branden <[email protected]>
>>> ---
>>> drivers/pci/quirks.c | 95
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 95 insertions(+)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index 47dfea0..0cdbd0a 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -2198,6 +2198,101 @@
>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
>>> quirk_paxc_bridge);
>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd750,
>>> quirk_paxc_bridge);
>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
>>> quirk_paxc_bridge);
>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
>>> quirk_paxc_bridge);
>>> +
>>> +/*
>>> + * The PCI capabilities list for certain revisions of Broadcom PAXC
>>> root
>>> + * complexes is incorrectly terminated due to corrupted
>>> configuration space
>>> + * registers in the range of 0x50 - 0x5f
>>> + *
>>> + * As a result, the capability list becomes broken and prevent
>>> standard PCI
>>> + * stack from being able to traverse to the PCIe capability structure
>>> + */
>>> +static void quirk_paxc_pcie_capability(struct pci_dev *pdev)
>>> +{
>>> + int pos, i = 0;
>>> + u8 next_cap;
>>> + u16 reg16, *cap;
>>> + struct pci_cap_saved_state *state;
>>> +
>>> + /* bail out if PCIe capability can be found */
>>> + if (pdev->pcie_cap || pci_find_capability(pdev, PCI_CAP_ID_EXP))
>>> + return;
>>> +
>>> + /* locate the power management capability */
>>> + pos = pci_find_capability(pdev, PCI_CAP_ID_PM);
>>> + if (!pos)
>>> + return;
>>> +
>>> + /* bail out if the next capability pointer is not 0x50/0x58 */
>>> + pci_read_config_byte(pdev, pos + 1, &next_cap);
>>> + if (next_cap != 0x50 && next_cap != 0x58)
>>> + return;
>>> +
>>> + /* bail out if we do not terminate at 0x50/0x58 */
>>> + pos = next_cap;
>>> + pci_read_config_byte(pdev, pos + 1, &next_cap);
>>> + if (next_cap != 0x00)
>>> + return;
>>> +
>>> + /*
>>> + * On these buggy HW, PCIe capability structure is expected to
>>> be at
>>> + * 0xac and should terminate the list
>>> + *
>>> + * Borrow the similar logic from theIntel DH895xCC VFs fixup to
>>> save
>>> + * the PCIe capability list
>>> + */
>>> + pos = 0xac;
>>> + pci_read_config_word(pdev, pos, ®16);
>>> + if (reg16 == (0x0000 | PCI_CAP_ID_EXP)) {
>>> + u32 status;
>>> +
>>> +#ifndef PCI_EXP_SAVE_REGS
>>> +#define PCI_EXP_SAVE_REGS 7
>>> +#endif
>>> + int size = PCI_EXP_SAVE_REGS * sizeof(u16);
>>> +
>>> + pdev->pcie_cap = pos;
>>> + pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, ®16);
>>> + pdev->pcie_flags_reg = reg16;
>>> + pci_read_config_word(pdev, pos + PCI_EXP_DEVCAP, ®16);
>>> + pdev->pcie_mpss = reg16 & PCI_EXP_DEVCAP_PAYLOAD;
>>
>> Is there any way you can fix this in iproc_pcie_config_read() instead,
>> by making it notice when we're reading a corrupted part of config
>> space, and then returning the correct data instead? Is it just the
>> next capability pointer that's corrupted?
>
> Let me look into that and I'll get back.
>
> Thanks,
>
> Ray
>
>>
>> If you could fix it in the config accessor, lspci would automatically
>> show all the correct data (I think lspci will still show the wrong
>> data with this patch).
>>
I managed to get this fixed within iproc_pcie_config_read(). By doing it
this way, as you suggested, now lspci and dump of
/sys/bus/pci/devices/xxx:yyy:zzz.w all show correct data.
I'll send out v2 of the patch series with the fix.
Thanks!
Ray
>> The quirk seems like a maintenance issue because anything that calls
>>
>> pci_find_capability(pdev, PCI_CAP_ID_EXP)
>>
>> will get the wrong answer.
>>
>>> +
>>> + pdev->cfg_size = PCI_CFG_SPACE_EXP_SIZE;
>>> + if (pci_read_config_dword(pdev, PCI_CFG_SPACE_SIZE, &status) !=
>>> + PCIBIOS_SUCCESSFUL || (status == 0xffffffff))
>>> + pdev->cfg_size = PCI_CFG_SPACE_SIZE;
>>> +
>>> + if (pci_find_saved_cap(pdev, PCI_CAP_ID_EXP))
>>> + return;
>>> +
>>> + state = kzalloc(sizeof(*state) + size, GFP_KERNEL);
>>> + if (!state)
>>> + return;
>>> +
>>> + state->cap.cap_nr = PCI_CAP_ID_EXP;
>>> + state->cap.cap_extended = 0;
>>> + state->cap.size = size;
>>> + cap = (u16 *)&state->cap.data[0];
>>> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL, &cap[i++]);
>>> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &cap[i++]);
>>> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL, &cap[i++]);
>>> + pcie_capability_read_word(pdev, PCI_EXP_RTCTL, &cap[i++]);
>>> + pcie_capability_read_word(pdev, PCI_EXP_DEVCTL2, &cap[i++]);
>>> + pcie_capability_read_word(pdev, PCI_EXP_LNKCTL2, &cap[i++]);
>>> + pcie_capability_read_word(pdev, PCI_EXP_SLTCTL2, &cap[i++]);
>>> + hlist_add_head(&state->next, &pdev->saved_cap_space);
>>> + }
>>> +}
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM,
>>> PCI_DEVICE_ID_NX2_57810,
>>> + quirk_paxc_pcie_capability);
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16cd,
>>> + quirk_paxc_pcie_capability);
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0x16f0,
>>> + quirk_paxc_pcie_capability);
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd802,
>>> + quirk_paxc_pcie_capability);
>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_BROADCOM, 0xd804,
>>> + quirk_paxc_pcie_capability);
>>> #endif
>>> /* Originally in EDAC sources for i82875P:
>>> --
>>> 2.1.4
>>>
On Tue, May 22, 2018 at 09:48:55AM -0700, Ray Jui wrote:
> Hi Robin/Lorenzo,
>
> On 5/21/2018 7:26 AM, Robin Murphy wrote:
> >On 21/05/18 14:33, Lorenzo Pieralisi wrote:
> >>[+Robin]
> >>
> >>On Fri, May 18, 2018 at 12:48:28PM -0700, Ray Jui wrote:
> >>>Hi Lorenzo,
> >>>
> >>>On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:
> >>>>On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
> >>>>>On 2018-05-17 22:51, Ray Jui wrote:
> >>>>>>The internal MSI parsing logic in certain revisions of PAXC root
> >>>>>>complexes does not work properly and can casue corruptions on the
> >>>>>>writes. They need to be disabled
> >>>>>>
> >>>>>>Signed-off-by: Ray Jui <[email protected]>
> >>>>>>Reviewed-by: Scott Branden <[email protected]>
> >>>>>>---
> >>>>>>drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
> >>>>>>1 file changed, 32 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>>diff --git a/drivers/pci/host/pcie-iproc.c
> >>>>>>b/drivers/pci/host/pcie-iproc.c
> >>>>>>index 3c76c5f..b906d80 100644
> >>>>>>--- a/drivers/pci/host/pcie-iproc.c
> >>>>>>+++ b/drivers/pci/host/pcie-iproc.c
> >>>>>>@@ -1144,10 +1144,22 @@ static int
> >>>>>>iproc_pcie_paxb_v2_msi_steer(struct
> >>>>>>iproc_pcie *pcie, u64 msi_addr)
> >>>>>>????return ret;
> >>>>>>}
> >>>>>>
> >>>>>>-static void iproc_pcie_paxc_v2_msi_steer(struct
> >>>>>>iproc_pcie *pcie, u64
> >>>>>>msi_addr)
> >>>>>>+static void iproc_pcie_paxc_v2_msi_steer(struct
> >>>>>>iproc_pcie *pcie, u64
> >>>>>>msi_addr,
> >>>>>>+???????????????????? bool enable)
> >>>>>>{
> >>>>>>????u32 val;
> >>>>>>
> >>>>>>+??? if (!enable) {
> >>>>>>+??????? /*
> >>>>>>+???????? * Disable PAXC MSI steering. All write transfers will be
> >>>>>>+???????? * treated as non-MSI transfers
> >>>>>>+???????? */
> >>>>>>+??????? val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
> >>>>>>+??????? val &= ~MSI_ENABLE_CFG;
> >>>>>>+??????? iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
> >>>>>>+??????? return;
> >>>>>>+??? }
> >>>>>>+
> >>>>>>????/*
> >>>>>>???? * Program bits [43:13] of address of
> >>>>>>GITS_TRANSLATER register into
> >>>>>>???? * bits [30:0] of the MSI base address register.? In
> >>>>>>fact, in all iProc
> >>>>>>@@ -1201,7 +1213,7 @@ static int
> >>>>>>iproc_pcie_msi_steer(struct iproc_pcie
> >>>>>>*pcie,
> >>>>>>??????????? return ret;
> >>>>>>??????? break;
> >>>>>>????case IPROC_PCIE_PAXC_V2:
> >>>>>>-??????? iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
> >>>>>>+??????? iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
> >>>>>
> >>>>>Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere
> >>>>>else with 'false' ?
> >>>>>I see its getting called only from one place in current code
> >>>>>iproc_pcie_msi_steer().
> >>>>
> >>>>It is called below with the false field to disable MSIs.
> >>>>That's probably
> >>>>one reason more to create a function to enable/disable MSIs instead of
> >>>>adding a parameter to iproc_pcie_paxc_v2_msi_steer().
> >>>
> >>>Correct. Thanks for helping to explain.
> >>>
> >>>>
> >>>>Which brings me to the question: what happens to those MSIs writes
> >>>>when MSIs are disabled according to this patch ? Are they terminated
> >>>>at the root complex ?
> >>>
> >>>Note the PAXC block parses MSI writes from our internally connected
> >>>endpoints (i.e., an embedded network processor). The PAXC block examines
> >>>these MSI writes and modifies the memory attributes (to DEVICE) of these
> >>>data and then send them out to the AXI bus. These MSI writes
> >>>will then be
> >>>forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further
> >>>processing. This is saying, PAXC itself does not process these
> >>>MSI writes.
> >>>They are processed by the GIC and associated interrupt will be generated
> >>>form the GIC. PAXC's job is to parse and tag them properly so these MSI
> >>>writes can reach the GIC, and at the same, reach the GIC at
> >>>the right order.
> >>>
> >>>On some of these problematic PAXCs, we are being forced to
> >>>disable this PAXC
> >>>internal parsing logic. In this case, we set up static mapping with the
> >>>IOMMU to modify the memory attributes of these MSI writes.
> >>>These MSI writes
> >>>are always designated towards a specific memory address (e.g.,
> >>>on the GICv3
> >>>based system, it's the address of the translator register),
> >>>and that's why
> >>>static mapping table can be set up to work around this.
> >>
> >>Which means, I presume, that there must be some code that somehow
> >>somewhere in the kernel sets-up those mappings and it has to be related
> >>to this patch, in which case I wonder why we enable the PAXC steering at
> >>all given that this (hack) can be done through the IOMMU (and I assume
> >>that on all PAXC platforms that do need MSIs there is an IOMMU IP
> >>upstream the root bridge, otherwise I have no idea what will happen to
> >>those MSI writes).
> >
> >If it's only rewriting memory attributes (FWIW it sounds like this
> >thing is comparable to the AXI translation table of the PLDA root
> >complex we have in Juno), then if the IOMMU is controlled by Linux
> >the PAXC shouldn't be needed anyway. In that situation the
> >GICv2m/ITS doorbell will be already mapped for the endpoint as
> >device memory (and usually at some arbitrary address that the PAXC
> >won't recognise anyway) - see iommu_dma_get_msi_page().
> >
> >If Linux *doesn't* know about the IOMMU, then the firmware should
> >be free to set it up with a static 1:1 mapping of the PA space and
> >leave it that way, provided Linux can't inadvertently stomp on
> >those page tables later.
>
> Right, in our case, we had our ATF bootloader set up 1:1 mapping for
> this. In this case, PAXC internal MSI parsing is completely disabled
> (which is what this patch does for those PAXC blocks that have this
> issue).
Ok so to sum it up, why does the PCI host bridge code has to deal with
this steering at all given the discussion above ? Why can't we disable
PAXC steering completely ?
Lorenzo
Hi Lorenzo,
On 7/4/2018 8:13 AM, Lorenzo Pieralisi wrote:
> On Tue, May 22, 2018 at 09:48:55AM -0700, Ray Jui wrote:
>> Hi Robin/Lorenzo,
>>
>> On 5/21/2018 7:26 AM, Robin Murphy wrote:
>>> On 21/05/18 14:33, Lorenzo Pieralisi wrote:
>>>> [+Robin]
>>>>
>>>> On Fri, May 18, 2018 at 12:48:28PM -0700, Ray Jui wrote:
>>>>> Hi Lorenzo,
>>>>>
>>>>> On 5/18/2018 6:56 AM, Lorenzo Pieralisi wrote:
>>>>>> On Fri, May 18, 2018 at 02:53:41PM +0530, [email protected] wrote:
>>>>>>> On 2018-05-17 22:51, Ray Jui wrote:
>>>>>>>> The internal MSI parsing logic in certain revisions of PAXC root
>>>>>>>> complexes does not work properly and can casue corruptions on the
>>>>>>>> writes. They need to be disabled
>>>>>>>>
>>>>>>>> Signed-off-by: Ray Jui <[email protected]>
>>>>>>>> Reviewed-by: Scott Branden <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/pci/host/pcie-iproc.c | 34 ++++++++++++++++++++++++++++++++--
>>>>>>>> 1 file changed, 32 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/pci/host/pcie-iproc.c
>>>>>>>> b/drivers/pci/host/pcie-iproc.c
>>>>>>>> index 3c76c5f..b906d80 100644
>>>>>>>> --- a/drivers/pci/host/pcie-iproc.c
>>>>>>>> +++ b/drivers/pci/host/pcie-iproc.c
>>>>>>>> @@ -1144,10 +1144,22 @@ static int
>>>>>>>> iproc_pcie_paxb_v2_msi_steer(struct
>>>>>>>> iproc_pcie *pcie, u64 msi_addr)
>>>>>>>> return ret;
>>>>>>>> }
>>>>>>>>
>>>>>>>> -static void iproc_pcie_paxc_v2_msi_steer(struct
>>>>>>>> iproc_pcie *pcie, u64
>>>>>>>> msi_addr)
>>>>>>>> +static void iproc_pcie_paxc_v2_msi_steer(struct
>>>>>>>> iproc_pcie *pcie, u64
>>>>>>>> msi_addr,
>>>>>>>> + bool enable)
>>>>>>>> {
>>>>>>>> u32 val;
>>>>>>>>
>>>>>>>> + if (!enable) {
>>>>>>>> + /*
>>>>>>>> + * Disable PAXC MSI steering. All write transfers will be
>>>>>>>> + * treated as non-MSI transfers
>>>>>>>> + */
>>>>>>>> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_MSI_EN_CFG);
>>>>>>>> + val &= ~MSI_ENABLE_CFG;
>>>>>>>> + iproc_pcie_write_reg(pcie, IPROC_PCIE_MSI_EN_CFG, val);
>>>>>>>> + return;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> /*
>>>>>>>> * Program bits [43:13] of address of
>>>>>>>> GITS_TRANSLATER register into
>>>>>>>> * bits [30:0] of the MSI base address register. In
>>>>>>>> fact, in all iProc
>>>>>>>> @@ -1201,7 +1213,7 @@ static int
>>>>>>>> iproc_pcie_msi_steer(struct iproc_pcie
>>>>>>>> *pcie,
>>>>>>>> return ret;
>>>>>>>> break;
>>>>>>>> case IPROC_PCIE_PAXC_V2:
>>>>>>>> - iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr);
>>>>>>>> + iproc_pcie_paxc_v2_msi_steer(pcie, msi_addr, true);
>>>>>>>
>>>>>>> Are you calling iproc_pcie_paxc_v2_msi_steer() anywhere
>>>>>>> else with 'false' ?
>>>>>>> I see its getting called only from one place in current code
>>>>>>> iproc_pcie_msi_steer().
>>>>>>
>>>>>> It is called below with the false field to disable MSIs.
>>>>>> That's probably
>>>>>> one reason more to create a function to enable/disable MSIs instead of
>>>>>> adding a parameter to iproc_pcie_paxc_v2_msi_steer().
>>>>>
>>>>> Correct. Thanks for helping to explain.
>>>>>
>>>>>>
>>>>>> Which brings me to the question: what happens to those MSIs writes
>>>>>> when MSIs are disabled according to this patch ? Are they terminated
>>>>>> at the root complex ?
>>>>>
>>>>> Note the PAXC block parses MSI writes from our internally connected
>>>>> endpoints (i.e., an embedded network processor). The PAXC block examines
>>>>> these MSI writes and modifies the memory attributes (to DEVICE) of these
>>>>> data and then send them out to the AXI bus. These MSI writes
>>>>> will then be
>>>>> forwarded to the GIC (e.g., GICv2m, GICv3-ITS from ARM) for further
>>>>> processing. This is saying, PAXC itself does not process these
>>>>> MSI writes.
>>>>> They are processed by the GIC and associated interrupt will be generated
>>>>> form the GIC. PAXC's job is to parse and tag them properly so these MSI
>>>>> writes can reach the GIC, and at the same, reach the GIC at
>>>>> the right order.
>>>>>
>>>>> On some of these problematic PAXCs, we are being forced to
>>>>> disable this PAXC
>>>>> internal parsing logic. In this case, we set up static mapping with the
>>>>> IOMMU to modify the memory attributes of these MSI writes.
>>>>> These MSI writes
>>>>> are always designated towards a specific memory address (e.g.,
>>>>> on the GICv3
>>>>> based system, it's the address of the translator register),
>>>>> and that's why
>>>>> static mapping table can be set up to work around this.
>>>>
>>>> Which means, I presume, that there must be some code that somehow
>>>> somewhere in the kernel sets-up those mappings and it has to be related
>>>> to this patch, in which case I wonder why we enable the PAXC steering at
>>>> all given that this (hack) can be done through the IOMMU (and I assume
>>>> that on all PAXC platforms that do need MSIs there is an IOMMU IP
>>>> upstream the root bridge, otherwise I have no idea what will happen to
>>>> those MSI writes).
>>>
>>> If it's only rewriting memory attributes (FWIW it sounds like this
>>> thing is comparable to the AXI translation table of the PLDA root
>>> complex we have in Juno), then if the IOMMU is controlled by Linux
>>> the PAXC shouldn't be needed anyway. In that situation the
>>> GICv2m/ITS doorbell will be already mapped for the endpoint as
>>> device memory (and usually at some arbitrary address that the PAXC
>>> won't recognise anyway) - see iommu_dma_get_msi_page().
>>>
>>> If Linux *doesn't* know about the IOMMU, then the firmware should
>>> be free to set it up with a static 1:1 mapping of the PA space and
>>> leave it that way, provided Linux can't inadvertently stomp on
>>> those page tables later.
>>
>> Right, in our case, we had our ATF bootloader set up 1:1 mapping for
>> this. In this case, PAXC internal MSI parsing is completely disabled
>> (which is what this patch does for those PAXC blocks that have this
>> issue).
>
> Ok so to sum it up, why does the PCI host bridge code has to deal with
> this steering at all given the discussion above ? Why can't we disable
> PAXC steering completely ?
>
> Lorenzo
>
Sorry I probably did not give enough information in our previous discussion.
Per recommendation from our ASIC team, we are advised to enable MSI
steering in the PAXC block in the latest revision of our SoC (and
therefore there's this patch to enable/disable MSI steering based on the
root complex device ID).
Note PAXC is an highly custom, emulated root complex hard-wire connected
internally to our embedded network processor (that acts as EP). Some
special tagging in done in the firmware for the network processor to
distinguish between data packets, completion, and MSI packets
(data/completion in the perspective of our network processor), and
inbound packets are re-ordered within PAXC to 1) ensure that they are in
the correct order (e.g., completion and MSI needs to arrive after data),
and 2) achieve extremely high performance (i.e., pure strictly ordered
packets will actually result very poor performance). I don't know more
details but the above is the high-level summary that explains why we
need to enable PAXC based steering in the latest revision of our SoC.
Thanks,
Ray