2019-01-18 04:25:00

by Srinath Mannam

[permalink] [raw]
Subject: [PATCH 0/3] Add IPROC PCIe new features

Add changes related to IPROC PCIe RC IP new features.

This patch set is based on Linux-5.0-rc2.

Srinath Mannam (3):
PCI: iproc: Add feature to set order mode
PCI: iproc: CRS state check in config request
PCI: iproc: Add PCIe 32bit outbound memory configuration

drivers/pci/controller/pcie-iproc.c | 172 +++++++++++++++++++++++++++++++++++-
drivers/pci/controller/pcie-iproc.h | 16 ++++
2 files changed, 184 insertions(+), 4 deletions(-)

--
2.7.4



2019-01-18 04:25:07

by Srinath Mannam

[permalink] [raw]
Subject: [PATCH 1/3] PCI: iproc: Add feature to set order mode

Order mode in RX header of incoming pcie packets can be override to
strict or loose order based on requirement.
Sysfs entry is provided to set dynamic and default order modes of upstream
traffic.

To improve performance in few endpoints we required to modify the
ordering attributes. Using this feature we can override order modes of RX
packets at IPROC RC.

Ex:
In PCIe based NVMe SSD endpoints data read/writes from disk is using
Queue pairs (submit/completion). After completion of block read/write,
EP writes completion command to completion queue to notify RC.
So that all data transfers before completion command write are not
required to strict order except completion command. It means previous all
packets before completion command write to RC should be written to memory
and acknowledged.

Signed-off-by: Srinath Mannam <[email protected]>
Reviewed-by: Ray Jui <[email protected]>
---
drivers/pci/controller/pcie-iproc.c | 128 ++++++++++++++++++++++++++++++++++++
drivers/pci/controller/pcie-iproc.h | 16 +++++
2 files changed, 144 insertions(+)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index c20fd6b..13ce80f 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -57,6 +57,9 @@
#define PCIE_DL_ACTIVE_SHIFT 2
#define PCIE_DL_ACTIVE BIT(PCIE_DL_ACTIVE_SHIFT)

+#define MPS_MRRS_CFG_MPS_SHIFT 0
+#define MPS_MRRS_CFG_MRRS_SHIFT 16
+
#define APB_ERR_EN_SHIFT 0
#define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)

@@ -91,6 +94,14 @@

#define IPROC_PCIE_REG_INVALID 0xffff

+#define RO_FIELD(window) BIT((window) << 1)
+#define RO_VALUE(window) BIT(((window) << 1) + 1)
+/* All Windows are allowed */
+#define RO_ALL_WINDOW 0x33333333
+/* Wait on All Windows */
+#define RO_FIELD_ALL_WINDOW 0x11111111
+#define DYNAMIC_ORDER_MODE 0x5
+
/**
* iProc PCIe outbound mapping controller specific parameters
*
@@ -295,6 +306,15 @@ enum iproc_pcie_reg {
/* enable APB error for unsupported requests */
IPROC_PCIE_APB_ERR_EN,

+ /* Ordering Mode configuration registers */
+ IPROC_PCIE_ORDERING_CFG,
+ IPROC_PCIE_MPS_MRRS_CFG,
+ IPROC_PCIE_IMAP0_RO_CONTROL,
+ IPROC_PCIE_IMAP1_RO_CONTROL,
+ IPROC_PCIE_IMAP2_RO_CONTROL,
+ IPROC_PCIE_IMAP3_RO_CONTROL,
+ IPROC_PCIE_IMAP4_RO_CONTROL,
+
/* total number of core registers */
IPROC_PCIE_MAX_NUM_REG,
};
@@ -352,6 +372,13 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
[IPROC_PCIE_IMAP4] = 0xe70,
[IPROC_PCIE_LINK_STATUS] = 0xf0c,
[IPROC_PCIE_APB_ERR_EN] = 0xf40,
+ [IPROC_PCIE_ORDERING_CFG] = 0x2000,
+ [IPROC_PCIE_MPS_MRRS_CFG] = 0x2008,
+ [IPROC_PCIE_IMAP0_RO_CONTROL] = 0x201c,
+ [IPROC_PCIE_IMAP1_RO_CONTROL] = 0x2020,
+ [IPROC_PCIE_IMAP2_RO_CONTROL] = 0x2024,
+ [IPROC_PCIE_IMAP3_RO_CONTROL] = 0x2028,
+ [IPROC_PCIE_IMAP4_RO_CONTROL] = 0x202c,
};

/* iProc PCIe PAXC v1 registers */
@@ -1401,6 +1428,97 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie)
return 0;
}

+static
+ssize_t order_mode_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buff)
+{
+ struct pci_host_bridge *host = to_pci_host_bridge(dev);
+ struct iproc_pcie *pcie = pci_host_bridge_priv(host);
+
+ return sprintf(buff, "Current PAXB order configuration %d\n",
+ pcie->order_cfg);
+}
+
+static void pcie_iproc_set_dynamic_order(struct iproc_pcie *pcie)
+{
+ u32 val = 0, mps;
+
+ /* Set all IMAPs to relaxed order in dynamic order mode */
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG,
+ DYNAMIC_ORDER_MODE);
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
+ RO_ALL_WINDOW);
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP1_RO_CONTROL,
+ RO_ALL_WINDOW);
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL,
+ RO_ALL_WINDOW);
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP3_RO_CONTROL,
+ RO_ALL_WINDOW);
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP4_RO_CONTROL,
+ RO_ALL_WINDOW);
+
+ /* PAXB MPS/MRRS settings configuration */
+ iproc_pci_raw_config_read32(pcie, 0, IPROC_PCI_EXP_CAP + PCI_EXP_DEVCTL,
+ 2, &mps);
+ mps = (mps & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
+ /* set MRRS to match system MPS */
+ val |= (mps << MPS_MRRS_CFG_MRRS_SHIFT);
+ /* set MPS to 4096 bytes */
+ val |= (0x5 << MPS_MRRS_CFG_MPS_SHIFT);
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_MPS_MRRS_CFG, val);
+}
+
+static
+ssize_t order_mode_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct pci_host_bridge *host = to_pci_host_bridge(dev);
+ struct iproc_pcie *pcie = pci_host_bridge_priv(host);
+ unsigned long val, regval;
+
+ if (kstrtoul(buf, 0, &val) < 0)
+ return -EINVAL;
+
+ if (val > PAXB_ORDER_DEV_MEM_ONLY) {
+ dev_err(dev, "Invalid Value passed %lu\n", val);
+ dev_err(dev, "0: Everything in strict order\n");
+ dev_err(dev, "1: Only IMAP2 in strict order\n");
+ dev_err(dev, "2: Only device memory in strict order (MSI/MSIX)\n");
+ return -EINVAL;
+ }
+
+ if (val == pcie->order_cfg)
+ return count;
+
+ switch (val) {
+ case PAXB_ORDER_IMAP2_ONLY:
+ pcie_iproc_set_dynamic_order(pcie);
+ regval = RO_ALL_WINDOW;
+ regval &= ~(RO_VALUE(0));
+ /* Set IMAP2 to strict order */
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL, regval);
+ dev_info(dev, "RO_IMAP2 set to %#lx\n", regval);
+ break;
+ case PAXB_ORDER_DEV_MEM_ONLY:
+ pcie_iproc_set_dynamic_order(pcie);
+ /* Set IMAP0 to strict order */
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
+ RO_FIELD_ALL_WINDOW);
+ dev_info(dev, "RO_IMAP0 set to %#x\n", RO_FIELD_ALL_WINDOW);
+ break;
+ default:
+ iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG, 0);
+ break;
+ }
+ pcie->order_cfg = val;
+ return count;
+}
+
+static DEVICE_ATTR_RW(order_mode);
+
int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
{
struct device *dev;
@@ -1484,6 +1602,11 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)

pci_bus_add_devices(host->bus);

+ if (pcie->type == IPROC_PCIE_PAXB_V2) {
+ ret = device_create_file(&host->dev, &dev_attr_order_mode);
+ if (ret < 0)
+ goto err_power_off_phy;
+ }
return 0;

err_power_off_phy:
@@ -1496,6 +1619,11 @@ EXPORT_SYMBOL(iproc_pcie_setup);

int iproc_pcie_remove(struct iproc_pcie *pcie)
{
+ struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
+
+ if (pcie->type == IPROC_PCIE_PAXB_V2)
+ device_remove_file(&host->dev, &dev_attr_order_mode);
+
pci_stop_root_bus(pcie->root_bus);
pci_remove_root_bus(pcie->root_bus);

diff --git a/drivers/pci/controller/pcie-iproc.h b/drivers/pci/controller/pcie-iproc.h
index 4f03ea5..6e100cd 100644
--- a/drivers/pci/controller/pcie-iproc.h
+++ b/drivers/pci/controller/pcie-iproc.h
@@ -24,6 +24,18 @@ enum iproc_pcie_type {
};

/**
+ * PAXB Dynamic ordering modes
+ * PAXB_ORDER_EVERYTHING: Everything in strict order
+ * PAXB_ORDER_IMAP2_ONLY: Only IMAP2 memory window strict order
+ * PAXB_ORDER_DEV_MEM_ONLY: Only device memory (MSI/MSIX) is strict order
+ */
+enum paxb_order_cfg {
+ PAXB_ORDER_EVERYTHING,
+ PAXB_ORDER_IMAP2_ONLY,
+ PAXB_ORDER_DEV_MEM_ONLY,
+};
+
+/**
* iProc PCIe outbound mapping
* @axi_offset: offset from the AXI address to the internal address used by
* the iProc PCIe core
@@ -74,6 +86,8 @@ struct iproc_msi;
* @ib: inbound mapping related parameters
* @ib_map: outbound mapping region related parameters
*
+ * @order_cfg: indicates current value of the order mode.
+ *
* @need_msi_steer: indicates additional configuration of the iProc PCIe
* controller is required to steer MSI writes to external interrupt controller
* @msi: MSI data
@@ -102,6 +116,8 @@ struct iproc_pcie {
struct iproc_pcie_ib ib;
const struct iproc_pcie_ib_map *ib_map;

+ enum paxb_order_cfg order_cfg;
+
bool need_msi_steer;
struct iproc_msi *msi;
};
--
2.7.4


2019-01-18 04:25:13

by Srinath Mannam

[permalink] [raw]
Subject: [PATCH 3/3] PCI: iproc: Add PCIe 32bit outbound memory configuration

IPROC PCIe RC supports outbound memory mapping with SOC address inside
4GB address boundary, which is 32 bit AXI address.
This patch add the support.

Signed-off-by: Srinath Mannam <[email protected]>
Signed-off-by: Abhishek Shah <[email protected]>
Signed-off-by: Ray Jui <[email protected]>
Reviewed-by: Scott Branden <[email protected]>
Reviewed-by: Vikram Prakash <[email protected]>
---
drivers/pci/controller/pcie-iproc.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index ee89d56..4d0e63b 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -982,8 +982,25 @@ static int iproc_pcie_setup_ob(struct iproc_pcie *pcie, u64 axi_addr,
resource_size_t window_size =
ob_map->window_sizes[size_idx] * SZ_1M;

- if (size < window_size)
- continue;
+ /*
+ * Keep iterating until we reach the last window and
+ * with the minimal window size at index zero. In this
+ * case, we take a compromise by mapping it using the
+ * minimum window size that can be supported
+ */
+ if (size < window_size) {
+ if (size_idx > 0 || window_idx > 0)
+ continue;
+
+ /*
+ * For the corner case of reaching the minimal
+ * window size that can be supported on the
+ * last window
+ */
+ axi_addr = ALIGN_DOWN(axi_addr, window_size);
+ pci_addr = ALIGN_DOWN(pci_addr, window_size);
+ size = window_size;
+ }

if (!IS_ALIGNED(axi_addr, window_size) ||
!IS_ALIGNED(pci_addr, window_size)) {
--
2.7.4


2019-01-18 04:25:22

by Srinath Mannam

[permalink] [raw]
Subject: [PATCH 2/3] PCI: iproc: CRS state check in config request

In the current implementation, config read of 0xffff0001 data
is assumed as CRS completion. but sometimes 0xffff0001 can be
a valid data.
IPROC PCIe RC has a register to show config request status flags
like SC, UR, CRS and CA.
So that extra check is added in the code to confirm the CRS
state using this register before reissue config request.

Signed-off-by: Srinath Mannam <[email protected]>
Reviewed-by: Ray Jui <[email protected]>
---
drivers/pci/controller/pcie-iproc.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
index 13ce80f..ee89d56 100644
--- a/drivers/pci/controller/pcie-iproc.c
+++ b/drivers/pci/controller/pcie-iproc.c
@@ -63,6 +63,10 @@
#define APB_ERR_EN_SHIFT 0
#define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)

+#define CFG_RD_SUCCESS 0
+#define CFG_RD_UR 1
+#define CFG_RD_CRS 2
+#define CFG_RD_CA 3
#define CFG_RETRY_STATUS 0xffff0001
#define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milliseconds */

@@ -300,6 +304,9 @@ enum iproc_pcie_reg {
IPROC_PCIE_IARR4,
IPROC_PCIE_IMAP4,

+ /* config read status */
+ IPROC_PCIE_CFG_RD_STATUS,
+
/* link status */
IPROC_PCIE_LINK_STATUS,

@@ -370,6 +377,7 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
[IPROC_PCIE_IMAP3] = 0xe08,
[IPROC_PCIE_IARR4] = 0xe68,
[IPROC_PCIE_IMAP4] = 0xe70,
+ [IPROC_PCIE_CFG_RD_STATUS] = 0xee0,
[IPROC_PCIE_LINK_STATUS] = 0xf0c,
[IPROC_PCIE_APB_ERR_EN] = 0xf40,
[IPROC_PCIE_ORDERING_CFG] = 0x2000,
@@ -501,10 +509,12 @@ static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
return (pcie->base + offset);
}

-static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
+static unsigned int iproc_pcie_cfg_retry(struct iproc_pcie *pcie,
+ void __iomem *cfg_data_p)
{
int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
unsigned int data;
+ u32 status;

/*
* As per PCIe spec r3.1, sec 2.3.2, CRS Software Visibility only
@@ -525,6 +535,15 @@ static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
*/
data = readl(cfg_data_p);
while (data == CFG_RETRY_STATUS && timeout--) {
+ /*
+ * CRS state is set in CFG_RD status register
+ * This will handle the case where CFG_RETRY_STATUS is
+ * valid config data.
+ */
+ status = iproc_pcie_read_reg(pcie, IPROC_PCIE_CFG_RD_STATUS);
+ if (status != CFG_RD_CRS)
+ return data;
+
udelay(1);
data = readl(cfg_data_p);
}
@@ -603,7 +622,7 @@ static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
if (!cfg_data_p)
return PCIBIOS_DEVICE_NOT_FOUND;

- data = iproc_pcie_cfg_retry(cfg_data_p);
+ data = iproc_pcie_cfg_retry(pcie, cfg_data_p);

*val = data;
if (size <= 2)
--
2.7.4


2019-01-18 14:43:13

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 0/3] Add IPROC PCIe new features

On Fri, Jan 18, 2019 at 09:53:20AM +0530, Srinath Mannam wrote:
> Add changes related to IPROC PCIe RC IP new features.
>
> This patch set is based on Linux-5.0-rc2.
>
> Srinath Mannam (3):
> PCI: iproc: Add feature to set order mode

Since this apparently refers to a PCIe feature, the subject should use the
exact terminology used in the PCIe spec.

> PCI: iproc: CRS state check in config request

This subject should start with a verb, not "CRS".

> PCI: iproc: Add PCIe 32bit outbound memory configuration
>
> drivers/pci/controller/pcie-iproc.c | 172 +++++++++++++++++++++++++++++++++++-
> drivers/pci/controller/pcie-iproc.h | 16 ++++
> 2 files changed, 184 insertions(+), 4 deletions(-)
>
> --
> 2.7.4
>

2019-01-18 15:10:02

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 1/3] PCI: iproc: Add feature to set order mode

On Fri, Jan 18, 2019 at 09:53:21AM +0530, Srinath Mannam wrote:
> Order mode in RX header of incoming pcie packets can be override to
> strict or loose order based on requirement.
> Sysfs entry is provided to set dynamic and default order modes of upstream
> traffic.

s/pcie/PCIe/

If this is two paragraphs, insert a blank line between. If it's one
paragraph, reflow it so it reads like one paragraph.

Are you talking about the Relaxed Ordering bit in the TLP Attributes
field? If so, please use the exact language used in the spec and
include a reference, e.g., to PCIe r4.0, sec 2.2.6.4, 2.4, etc.

I'm hesitant about a new sysfs file for this. That automatically
creates a user-space dependency on this iProc feature. Who would use
this file?

> To improve performance in few endpoints we required to modify the
> ordering attributes. Using this feature we can override order modes of RX
> packets at IPROC RC.
>
> Ex:
> In PCIe based NVMe SSD endpoints data read/writes from disk is using
> Queue pairs (submit/completion). After completion of block read/write,
> EP writes completion command to completion queue to notify RC.
> So that all data transfers before completion command write are not
> required to strict order except completion command. It means previous all
> packets before completion command write to RC should be written to memory
> and acknowledged.

IIUC, if Enable Relaxed Ordering in the endpoint's Device Control
register is set, the device is permitted to set the Relaxed Ordering
bit in TLPs it initiates. So I would think that if we set Enable
Relaxed Ordering correctly, the NVMe endpoint should be able to
use Relaxed Ordering for the data transfers and strict ordering (the
default) for the completion command. What am I missing?

This sysfs file apparently affects the Root Port/Root Complex
behavior, not the Endpoint's behavior. Does that mean iProc ignores
the Relaxed Ordering bit by default, and you're providing a knob that
makes it pay attention to it? If that's the case, why wouldn't you
enable iProc support for Relaxed Ordering always, by default?

> Signed-off-by: Srinath Mannam <[email protected]>
> Reviewed-by: Ray Jui <[email protected]>
> ---
> drivers/pci/controller/pcie-iproc.c | 128 ++++++++++++++++++++++++++++++++++++
> drivers/pci/controller/pcie-iproc.h | 16 +++++
> 2 files changed, 144 insertions(+)
>
> diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
> index c20fd6b..13ce80f 100644
> --- a/drivers/pci/controller/pcie-iproc.c
> +++ b/drivers/pci/controller/pcie-iproc.c
> @@ -57,6 +57,9 @@
> #define PCIE_DL_ACTIVE_SHIFT 2
> #define PCIE_DL_ACTIVE BIT(PCIE_DL_ACTIVE_SHIFT)
>
> +#define MPS_MRRS_CFG_MPS_SHIFT 0
> +#define MPS_MRRS_CFG_MRRS_SHIFT 16
> +
> #define APB_ERR_EN_SHIFT 0
> #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
>
> @@ -91,6 +94,14 @@
>
> #define IPROC_PCIE_REG_INVALID 0xffff
>
> +#define RO_FIELD(window) BIT((window) << 1)
> +#define RO_VALUE(window) BIT(((window) << 1) + 1)
> +/* All Windows are allowed */
> +#define RO_ALL_WINDOW 0x33333333
> +/* Wait on All Windows */
> +#define RO_FIELD_ALL_WINDOW 0x11111111
> +#define DYNAMIC_ORDER_MODE 0x5
> +
> /**
> * iProc PCIe outbound mapping controller specific parameters
> *
> @@ -295,6 +306,15 @@ enum iproc_pcie_reg {
> /* enable APB error for unsupported requests */
> IPROC_PCIE_APB_ERR_EN,
>
> + /* Ordering Mode configuration registers */
> + IPROC_PCIE_ORDERING_CFG,
> + IPROC_PCIE_MPS_MRRS_CFG,
> + IPROC_PCIE_IMAP0_RO_CONTROL,
> + IPROC_PCIE_IMAP1_RO_CONTROL,
> + IPROC_PCIE_IMAP2_RO_CONTROL,
> + IPROC_PCIE_IMAP3_RO_CONTROL,
> + IPROC_PCIE_IMAP4_RO_CONTROL,
> +
> /* total number of core registers */
> IPROC_PCIE_MAX_NUM_REG,
> };
> @@ -352,6 +372,13 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
> [IPROC_PCIE_IMAP4] = 0xe70,
> [IPROC_PCIE_LINK_STATUS] = 0xf0c,
> [IPROC_PCIE_APB_ERR_EN] = 0xf40,
> + [IPROC_PCIE_ORDERING_CFG] = 0x2000,
> + [IPROC_PCIE_MPS_MRRS_CFG] = 0x2008,
> + [IPROC_PCIE_IMAP0_RO_CONTROL] = 0x201c,
> + [IPROC_PCIE_IMAP1_RO_CONTROL] = 0x2020,
> + [IPROC_PCIE_IMAP2_RO_CONTROL] = 0x2024,
> + [IPROC_PCIE_IMAP3_RO_CONTROL] = 0x2028,
> + [IPROC_PCIE_IMAP4_RO_CONTROL] = 0x202c,
> };
>
> /* iProc PCIe PAXC v1 registers */
> @@ -1401,6 +1428,97 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie)
> return 0;
> }
>
> +static
> +ssize_t order_mode_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buff)

Please format the same as the rest of the file, i.e.,

static ssize_t order_mode_show(struct device *dev, ...

> +{
> + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> +
> + return sprintf(buff, "Current PAXB order configuration %d\n",
> + pcie->order_cfg);
> +}
> +
> +static void pcie_iproc_set_dynamic_order(struct iproc_pcie *pcie)
> +{
> + u32 val = 0, mps;
> +
> + /* Set all IMAPs to relaxed order in dynamic order mode */
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG,
> + DYNAMIC_ORDER_MODE);
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> + RO_ALL_WINDOW);
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP1_RO_CONTROL,
> + RO_ALL_WINDOW);
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL,
> + RO_ALL_WINDOW);
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP3_RO_CONTROL,
> + RO_ALL_WINDOW);
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP4_RO_CONTROL,
> + RO_ALL_WINDOW);
> +
> + /* PAXB MPS/MRRS settings configuration */
> + iproc_pci_raw_config_read32(pcie, 0, IPROC_PCI_EXP_CAP + PCI_EXP_DEVCTL,
> + 2, &mps);
> + mps = (mps & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
> + /* set MRRS to match system MPS */
> + val |= (mps << MPS_MRRS_CFG_MRRS_SHIFT);
> + /* set MPS to 4096 bytes */
> + val |= (0x5 << MPS_MRRS_CFG_MPS_SHIFT);
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_MPS_MRRS_CFG, val);
> +}
> +
> +static
> +ssize_t order_mode_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf,
> + size_t count)

Fix formatting as above.

> +{
> + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> + unsigned long val, regval;
> +
> + if (kstrtoul(buf, 0, &val) < 0)
> + return -EINVAL;
> +
> + if (val > PAXB_ORDER_DEV_MEM_ONLY) {
> + dev_err(dev, "Invalid Value passed %lu\n", val);
> + dev_err(dev, "0: Everything in strict order\n");
> + dev_err(dev, "1: Only IMAP2 in strict order\n");
> + dev_err(dev, "2: Only device memory in strict order (MSI/MSIX)\n");
> + return -EINVAL;
> + }
> +
> + if (val == pcie->order_cfg)
> + return count;
> +
> + switch (val) {
> + case PAXB_ORDER_IMAP2_ONLY:
> + pcie_iproc_set_dynamic_order(pcie);
> + regval = RO_ALL_WINDOW;
> + regval &= ~(RO_VALUE(0));
> + /* Set IMAP2 to strict order */
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL, regval);
> + dev_info(dev, "RO_IMAP2 set to %#lx\n", regval);
> + break;
> + case PAXB_ORDER_DEV_MEM_ONLY:
> + pcie_iproc_set_dynamic_order(pcie);
> + /* Set IMAP0 to strict order */
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> + RO_FIELD_ALL_WINDOW);
> + dev_info(dev, "RO_IMAP0 set to %#x\n", RO_FIELD_ALL_WINDOW);
> + break;
> + default:
> + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG, 0);
> + break;
> + }
> + pcie->order_cfg = val;
> + return count;
> +}
> +
> +static DEVICE_ATTR_RW(order_mode);
> +
> int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
> {
> struct device *dev;
> @@ -1484,6 +1602,11 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
>
> pci_bus_add_devices(host->bus);
>
> + if (pcie->type == IPROC_PCIE_PAXB_V2) {
> + ret = device_create_file(&host->dev, &dev_attr_order_mode);
> + if (ret < 0)
> + goto err_power_off_phy;
> + }
> return 0;
>
> err_power_off_phy:
> @@ -1496,6 +1619,11 @@ EXPORT_SYMBOL(iproc_pcie_setup);
>
> int iproc_pcie_remove(struct iproc_pcie *pcie)
> {
> + struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
> +
> + if (pcie->type == IPROC_PCIE_PAXB_V2)
> + device_remove_file(&host->dev, &dev_attr_order_mode);
> +
> pci_stop_root_bus(pcie->root_bus);
> pci_remove_root_bus(pcie->root_bus);
>
> diff --git a/drivers/pci/controller/pcie-iproc.h b/drivers/pci/controller/pcie-iproc.h
> index 4f03ea5..6e100cd 100644
> --- a/drivers/pci/controller/pcie-iproc.h
> +++ b/drivers/pci/controller/pcie-iproc.h
> @@ -24,6 +24,18 @@ enum iproc_pcie_type {
> };
>
> /**
> + * PAXB Dynamic ordering modes
> + * PAXB_ORDER_EVERYTHING: Everything in strict order
> + * PAXB_ORDER_IMAP2_ONLY: Only IMAP2 memory window strict order
> + * PAXB_ORDER_DEV_MEM_ONLY: Only device memory (MSI/MSIX) is strict order
> + */
> +enum paxb_order_cfg {
> + PAXB_ORDER_EVERYTHING,
> + PAXB_ORDER_IMAP2_ONLY,
> + PAXB_ORDER_DEV_MEM_ONLY,
> +};
> +
> +/**
> * iProc PCIe outbound mapping
> * @axi_offset: offset from the AXI address to the internal address used by
> * the iProc PCIe core
> @@ -74,6 +86,8 @@ struct iproc_msi;
> * @ib: inbound mapping related parameters
> * @ib_map: outbound mapping region related parameters
> *
> + * @order_cfg: indicates current value of the order mode.
> + *
> * @need_msi_steer: indicates additional configuration of the iProc PCIe
> * controller is required to steer MSI writes to external interrupt controller
> * @msi: MSI data
> @@ -102,6 +116,8 @@ struct iproc_pcie {
> struct iproc_pcie_ib ib;
> const struct iproc_pcie_ib_map *ib_map;
>
> + enum paxb_order_cfg order_cfg;
> +
> bool need_msi_steer;
> struct iproc_msi *msi;
> };
> --
> 2.7.4
>

2019-01-18 15:11:29

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 2/3] PCI: iproc: CRS state check in config request

On Fri, Jan 18, 2019 at 09:53:22AM +0530, Srinath Mannam wrote:
> In the current implementation, config read of 0xffff0001 data
> is assumed as CRS completion. but sometimes 0xffff0001 can be
> a valid data.
> IPROC PCIe RC has a register to show config request status flags
> like SC, UR, CRS and CA.
> So that extra check is added in the code to confirm the CRS
> state using this register before reissue config request.

s/. but/. But/ (Sentences start with a capital letter.)

Please wrap this text correctly. If it's a single paragraph, wrap it so
the lines are filled. It *looks* like it's intended to be separate
paragraphs; they should be separated by blank lines.

> Signed-off-by: Srinath Mannam <[email protected]>
> Reviewed-by: Ray Jui <[email protected]>
> ---
> drivers/pci/controller/pcie-iproc.c | 23 +++++++++++++++++++++--
> 1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
> index 13ce80f..ee89d56 100644
> --- a/drivers/pci/controller/pcie-iproc.c
> +++ b/drivers/pci/controller/pcie-iproc.c
> @@ -63,6 +63,10 @@
> #define APB_ERR_EN_SHIFT 0
> #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
>
> +#define CFG_RD_SUCCESS 0
> +#define CFG_RD_UR 1
> +#define CFG_RD_CRS 2
> +#define CFG_RD_CA 3
> #define CFG_RETRY_STATUS 0xffff0001
> #define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milliseconds */
>
> @@ -300,6 +304,9 @@ enum iproc_pcie_reg {
> IPROC_PCIE_IARR4,
> IPROC_PCIE_IMAP4,
>
> + /* config read status */
> + IPROC_PCIE_CFG_RD_STATUS,
> +
> /* link status */
> IPROC_PCIE_LINK_STATUS,
>
> @@ -370,6 +377,7 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
> [IPROC_PCIE_IMAP3] = 0xe08,
> [IPROC_PCIE_IARR4] = 0xe68,
> [IPROC_PCIE_IMAP4] = 0xe70,
> + [IPROC_PCIE_CFG_RD_STATUS] = 0xee0,
> [IPROC_PCIE_LINK_STATUS] = 0xf0c,
> [IPROC_PCIE_APB_ERR_EN] = 0xf40,
> [IPROC_PCIE_ORDERING_CFG] = 0x2000,
> @@ -501,10 +509,12 @@ static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
> return (pcie->base + offset);
> }
>
> -static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
> +static unsigned int iproc_pcie_cfg_retry(struct iproc_pcie *pcie,
> + void __iomem *cfg_data_p)
> {
> int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
> unsigned int data;
> + u32 status;
>
> /*
> * As per PCIe spec r3.1, sec 2.3.2, CRS Software Visibility only
> @@ -525,6 +535,15 @@ static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
> */
> data = readl(cfg_data_p);
> while (data == CFG_RETRY_STATUS && timeout--) {
> + /*
> + * CRS state is set in CFG_RD status register
> + * This will handle the case where CFG_RETRY_STATUS is
> + * valid config data.
> + */
> + status = iproc_pcie_read_reg(pcie, IPROC_PCIE_CFG_RD_STATUS);
> + if (status != CFG_RD_CRS)
> + return data;
> +
> udelay(1);
> data = readl(cfg_data_p);
> }
> @@ -603,7 +622,7 @@ static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
> if (!cfg_data_p)
> return PCIBIOS_DEVICE_NOT_FOUND;
>
> - data = iproc_pcie_cfg_retry(cfg_data_p);
> + data = iproc_pcie_cfg_retry(pcie, cfg_data_p);
>
> *val = data;
> if (size <= 2)
> --
> 2.7.4
>

2019-01-24 05:37:30

by Srinath Mannam

[permalink] [raw]
Subject: Re: [PATCH 0/3] Add IPROC PCIe new features

Hi Bjorn,

Thank you for the review. I will address your comments in the next patchset.

Regards,
Srinath.

On Fri, Jan 18, 2019 at 8:11 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Fri, Jan 18, 2019 at 09:53:20AM +0530, Srinath Mannam wrote:
> > Add changes related to IPROC PCIe RC IP new features.
> >
> > This patch set is based on Linux-5.0-rc2.
> >
> > Srinath Mannam (3):
> > PCI: iproc: Add feature to set order mode
>
> Since this apparently refers to a PCIe feature, the subject should use the
> exact terminology used in the PCIe spec.
>
> > PCI: iproc: CRS state check in config request
>
> This subject should start with a verb, not "CRS".
>
> > PCI: iproc: Add PCIe 32bit outbound memory configuration
> >
> > drivers/pci/controller/pcie-iproc.c | 172 +++++++++++++++++++++++++++++++++++-
> > drivers/pci/controller/pcie-iproc.h | 16 ++++
> > 2 files changed, 184 insertions(+), 4 deletions(-)
> >
> > --
> > 2.7.4
> >

2019-01-24 08:41:23

by Srinath Mannam

[permalink] [raw]
Subject: Re: [PATCH 1/3] PCI: iproc: Add feature to set order mode

Hi Bjorn,

Thanks for review, please see my comments below inline.

On Fri, Jan 18, 2019 at 8:37 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Fri, Jan 18, 2019 at 09:53:21AM +0530, Srinath Mannam wrote:
> > Order mode in RX header of incoming pcie packets can be override to
> > strict or loose order based on requirement.
> > Sysfs entry is provided to set dynamic and default order modes of upstream
> > traffic.
>
> s/pcie/PCIe/
will change. Thanks.
>
> If this is two paragraphs, insert a blank line between. If it's one
> paragraph, reflow it so it reads like one paragraph.
>
will change. Thanks.
> Are you talking about the Relaxed Ordering bit in the TLP Attributes
> field? If so, please use the exact language used in the spec and
> include a reference, e.g., to PCIe r4.0, sec 2.2.6.4, 2.4, etc.
>
Yes Relax ordering bit in TLP. I will use spec reference. Thanks.
> I'm hesitant about a new sysfs file for this. That automatically
> creates a user-space dependency on this iProc feature. Who would use
> this file?
>
This is the specific feature given in iProc, used to improve
performance for the endpoints which does not support
ordering configuration at its end.
This is the reason we used sysfs file, which allows user to change
ordering based on endpoint used and requirement.
we are using these sysfs files to configure ordering to improve
performance in NVMe endpoints with SPDK/DPDK drivers.
If we enable this default in kernel, then it is applicable to all
endpoints connected. But it may not required for all endpoints.

> > To improve performance in few endpoints we required to modify the
> > ordering attributes. Using this feature we can override order modes of RX
> > packets at IPROC RC.
> >
> > Ex:
> > In PCIe based NVMe SSD endpoints data read/writes from disk is using
> > Queue pairs (submit/completion). After completion of block read/write,
> > EP writes completion command to completion queue to notify RC.
> > So that all data transfers before completion command write are not
> > required to strict order except completion command. It means previous all
> > packets before completion command write to RC should be written to memory
> > and acknowledged.
>
> IIUC, if Enable Relaxed Ordering in the endpoint's Device Control
> register is set, the device is permitted to set the Relaxed Ordering
> bit in TLPs it initiates. So I would think that if we set Enable
> Relaxed Ordering correctly, the NVMe endpoint should be able to
> use Relaxed Ordering for the data transfers and strict ordering (the
> default) for the completion command. What am I missing?
>
As per NVMe spec Enable Relaxed ordering is implementation specific all cards
may not support.

> This sysfs file apparently affects the Root Port/Root Complex
> behavior, not the Endpoint's behavior. Does that mean iProc ignores
> the Relaxed Ordering bit by default, and you're providing a knob that
With sysfs file setting, ordering attributes of all TLPs directed to
specific memory window
can be override iProc layer based on settings.
TLPs to one memory window can keep as strict order and other memory windows can
set to strict order.
Using sysfs file ordering settings can configure and revert back to default.
> makes it pay attention to it? If that's the case, why wouldn't you
> enable iProc support for Relaxed Ordering always, by default?
>
Option is given to user, because few endpoints may support ordering
configuration internally.
few EPs doesn't support. Based on requirement user can configure
ordering settings.

> > Signed-off-by: Srinath Mannam <[email protected]>
> > Reviewed-by: Ray Jui <[email protected]>
> > ---
> > drivers/pci/controller/pcie-iproc.c | 128 ++++++++++++++++++++++++++++++++++++
> > drivers/pci/controller/pcie-iproc.h | 16 +++++
> > 2 files changed, 144 insertions(+)
> >
> > diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
> > index c20fd6b..13ce80f 100644
> > --- a/drivers/pci/controller/pcie-iproc.c
> > +++ b/drivers/pci/controller/pcie-iproc.c
> > @@ -57,6 +57,9 @@
> > #define PCIE_DL_ACTIVE_SHIFT 2
> > #define PCIE_DL_ACTIVE BIT(PCIE_DL_ACTIVE_SHIFT)
> >
> > +#define MPS_MRRS_CFG_MPS_SHIFT 0
> > +#define MPS_MRRS_CFG_MRRS_SHIFT 16
> > +
> > #define APB_ERR_EN_SHIFT 0
> > #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
> >
> > @@ -91,6 +94,14 @@
> >
> > #define IPROC_PCIE_REG_INVALID 0xffff
> >
> > +#define RO_FIELD(window) BIT((window) << 1)
> > +#define RO_VALUE(window) BIT(((window) << 1) + 1)
> > +/* All Windows are allowed */
> > +#define RO_ALL_WINDOW 0x33333333
> > +/* Wait on All Windows */
> > +#define RO_FIELD_ALL_WINDOW 0x11111111
> > +#define DYNAMIC_ORDER_MODE 0x5
> > +
> > /**
> > * iProc PCIe outbound mapping controller specific parameters
> > *
> > @@ -295,6 +306,15 @@ enum iproc_pcie_reg {
> > /* enable APB error for unsupported requests */
> > IPROC_PCIE_APB_ERR_EN,
> >
> > + /* Ordering Mode configuration registers */
> > + IPROC_PCIE_ORDERING_CFG,
> > + IPROC_PCIE_MPS_MRRS_CFG,
> > + IPROC_PCIE_IMAP0_RO_CONTROL,
> > + IPROC_PCIE_IMAP1_RO_CONTROL,
> > + IPROC_PCIE_IMAP2_RO_CONTROL,
> > + IPROC_PCIE_IMAP3_RO_CONTROL,
> > + IPROC_PCIE_IMAP4_RO_CONTROL,
> > +
> > /* total number of core registers */
> > IPROC_PCIE_MAX_NUM_REG,
> > };
> > @@ -352,6 +372,13 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
> > [IPROC_PCIE_IMAP4] = 0xe70,
> > [IPROC_PCIE_LINK_STATUS] = 0xf0c,
> > [IPROC_PCIE_APB_ERR_EN] = 0xf40,
> > + [IPROC_PCIE_ORDERING_CFG] = 0x2000,
> > + [IPROC_PCIE_MPS_MRRS_CFG] = 0x2008,
> > + [IPROC_PCIE_IMAP0_RO_CONTROL] = 0x201c,
> > + [IPROC_PCIE_IMAP1_RO_CONTROL] = 0x2020,
> > + [IPROC_PCIE_IMAP2_RO_CONTROL] = 0x2024,
> > + [IPROC_PCIE_IMAP3_RO_CONTROL] = 0x2028,
> > + [IPROC_PCIE_IMAP4_RO_CONTROL] = 0x202c,
> > };
> >
> > /* iProc PCIe PAXC v1 registers */
> > @@ -1401,6 +1428,97 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie)
> > return 0;
> > }
> >
> > +static
> > +ssize_t order_mode_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buff)
>
> Please format the same as the rest of the file, i.e.,
>
> static ssize_t order_mode_show(struct device *dev, ...
>
> > +{
> > + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> > + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> > +
> > + return sprintf(buff, "Current PAXB order configuration %d\n",
> > + pcie->order_cfg);
> > +}
> > +
> > +static void pcie_iproc_set_dynamic_order(struct iproc_pcie *pcie)
> > +{
> > + u32 val = 0, mps;
> > +
> > + /* Set all IMAPs to relaxed order in dynamic order mode */
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG,
> > + DYNAMIC_ORDER_MODE);
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> > + RO_ALL_WINDOW);
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP1_RO_CONTROL,
> > + RO_ALL_WINDOW);
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL,
> > + RO_ALL_WINDOW);
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP3_RO_CONTROL,
> > + RO_ALL_WINDOW);
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP4_RO_CONTROL,
> > + RO_ALL_WINDOW);
> > +
> > + /* PAXB MPS/MRRS settings configuration */
> > + iproc_pci_raw_config_read32(pcie, 0, IPROC_PCI_EXP_CAP + PCI_EXP_DEVCTL,
> > + 2, &mps);
> > + mps = (mps & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
> > + /* set MRRS to match system MPS */
> > + val |= (mps << MPS_MRRS_CFG_MRRS_SHIFT);
> > + /* set MPS to 4096 bytes */
> > + val |= (0x5 << MPS_MRRS_CFG_MPS_SHIFT);
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_MPS_MRRS_CFG, val);
> > +}
> > +
> > +static
> > +ssize_t order_mode_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf,
> > + size_t count)
>
> Fix formatting as above.
>
> > +{
> > + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> > + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> > + unsigned long val, regval;
> > +
> > + if (kstrtoul(buf, 0, &val) < 0)
> > + return -EINVAL;
> > +
> > + if (val > PAXB_ORDER_DEV_MEM_ONLY) {
> > + dev_err(dev, "Invalid Value passed %lu\n", val);
> > + dev_err(dev, "0: Everything in strict order\n");
> > + dev_err(dev, "1: Only IMAP2 in strict order\n");
> > + dev_err(dev, "2: Only device memory in strict order (MSI/MSIX)\n");
> > + return -EINVAL;
> > + }
> > +
> > + if (val == pcie->order_cfg)
> > + return count;
> > +
> > + switch (val) {
> > + case PAXB_ORDER_IMAP2_ONLY:
> > + pcie_iproc_set_dynamic_order(pcie);
> > + regval = RO_ALL_WINDOW;
> > + regval &= ~(RO_VALUE(0));
> > + /* Set IMAP2 to strict order */
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL, regval);
> > + dev_info(dev, "RO_IMAP2 set to %#lx\n", regval);
> > + break;
> > + case PAXB_ORDER_DEV_MEM_ONLY:
> > + pcie_iproc_set_dynamic_order(pcie);
> > + /* Set IMAP0 to strict order */
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> > + RO_FIELD_ALL_WINDOW);
> > + dev_info(dev, "RO_IMAP0 set to %#x\n", RO_FIELD_ALL_WINDOW);
> > + break;
> > + default:
> > + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG, 0);
> > + break;
> > + }
> > + pcie->order_cfg = val;
> > + return count;
> > +}
> > +
> > +static DEVICE_ATTR_RW(order_mode);
> > +
> > int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
> > {
> > struct device *dev;
> > @@ -1484,6 +1602,11 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
> >
> > pci_bus_add_devices(host->bus);
> >
> > + if (pcie->type == IPROC_PCIE_PAXB_V2) {
> > + ret = device_create_file(&host->dev, &dev_attr_order_mode);
> > + if (ret < 0)
> > + goto err_power_off_phy;
> > + }
> > return 0;
> >
> > err_power_off_phy:
> > @@ -1496,6 +1619,11 @@ EXPORT_SYMBOL(iproc_pcie_setup);
> >
> > int iproc_pcie_remove(struct iproc_pcie *pcie)
> > {
> > + struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
> > +
> > + if (pcie->type == IPROC_PCIE_PAXB_V2)
> > + device_remove_file(&host->dev, &dev_attr_order_mode);
> > +
> > pci_stop_root_bus(pcie->root_bus);
> > pci_remove_root_bus(pcie->root_bus);
> >
> > diff --git a/drivers/pci/controller/pcie-iproc.h b/drivers/pci/controller/pcie-iproc.h
> > index 4f03ea5..6e100cd 100644
> > --- a/drivers/pci/controller/pcie-iproc.h
> > +++ b/drivers/pci/controller/pcie-iproc.h
> > @@ -24,6 +24,18 @@ enum iproc_pcie_type {
> > };
> >
> > /**
> > + * PAXB Dynamic ordering modes
> > + * PAXB_ORDER_EVERYTHING: Everything in strict order
> > + * PAXB_ORDER_IMAP2_ONLY: Only IMAP2 memory window strict order
> > + * PAXB_ORDER_DEV_MEM_ONLY: Only device memory (MSI/MSIX) is strict order
> > + */
> > +enum paxb_order_cfg {
> > + PAXB_ORDER_EVERYTHING,
> > + PAXB_ORDER_IMAP2_ONLY,
> > + PAXB_ORDER_DEV_MEM_ONLY,
> > +};
> > +
> > +/**
> > * iProc PCIe outbound mapping
> > * @axi_offset: offset from the AXI address to the internal address used by
> > * the iProc PCIe core
> > @@ -74,6 +86,8 @@ struct iproc_msi;
> > * @ib: inbound mapping related parameters
> > * @ib_map: outbound mapping region related parameters
> > *
> > + * @order_cfg: indicates current value of the order mode.
> > + *
> > * @need_msi_steer: indicates additional configuration of the iProc PCIe
> > * controller is required to steer MSI writes to external interrupt controller
> > * @msi: MSI data
> > @@ -102,6 +116,8 @@ struct iproc_pcie {
> > struct iproc_pcie_ib ib;
> > const struct iproc_pcie_ib_map *ib_map;
> >
> > + enum paxb_order_cfg order_cfg;
> > +
> > bool need_msi_steer;
> > struct iproc_msi *msi;
> > };
> > --
> > 2.7.4
> >

2019-01-24 08:44:52

by Srinath Mannam

[permalink] [raw]
Subject: Re: [PATCH 2/3] PCI: iproc: CRS state check in config request

Hi Bjorn,

Thanks for review, please see my comments below inline.

On Fri, Jan 18, 2019 at 8:38 PM Bjorn Helgaas <[email protected]> wrote:
>
> On Fri, Jan 18, 2019 at 09:53:22AM +0530, Srinath Mannam wrote:
> > In the current implementation, config read of 0xffff0001 data
> > is assumed as CRS completion. but sometimes 0xffff0001 can be
> > a valid data.
> > IPROC PCIe RC has a register to show config request status flags
> > like SC, UR, CRS and CA.
> > So that extra check is added in the code to confirm the CRS
> > state using this register before reissue config request.
>
> s/. but/. But/ (Sentences start with a capital letter.)
will change. Thanks.
>
> Please wrap this text correctly. If it's a single paragraph, wrap it so
> the lines are filled. It *looks* like it's intended to be separate
> paragraphs; they should be separated by blank lines.
will change. Thanks.
>
> > Signed-off-by: Srinath Mannam <[email protected]>
> > Reviewed-by: Ray Jui <[email protected]>
> > ---
> > drivers/pci/controller/pcie-iproc.c | 23 +++++++++++++++++++++--
> > 1 file changed, 21 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
> > index 13ce80f..ee89d56 100644
> > --- a/drivers/pci/controller/pcie-iproc.c
> > +++ b/drivers/pci/controller/pcie-iproc.c
> > @@ -63,6 +63,10 @@
> > #define APB_ERR_EN_SHIFT 0
> > #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
> >
> > +#define CFG_RD_SUCCESS 0
> > +#define CFG_RD_UR 1
> > +#define CFG_RD_CRS 2
> > +#define CFG_RD_CA 3
> > #define CFG_RETRY_STATUS 0xffff0001
> > #define CFG_RETRY_STATUS_TIMEOUT_US 500000 /* 500 milliseconds */
> >
> > @@ -300,6 +304,9 @@ enum iproc_pcie_reg {
> > IPROC_PCIE_IARR4,
> > IPROC_PCIE_IMAP4,
> >
> > + /* config read status */
> > + IPROC_PCIE_CFG_RD_STATUS,
> > +
> > /* link status */
> > IPROC_PCIE_LINK_STATUS,
> >
> > @@ -370,6 +377,7 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
> > [IPROC_PCIE_IMAP3] = 0xe08,
> > [IPROC_PCIE_IARR4] = 0xe68,
> > [IPROC_PCIE_IMAP4] = 0xe70,
> > + [IPROC_PCIE_CFG_RD_STATUS] = 0xee0,
> > [IPROC_PCIE_LINK_STATUS] = 0xf0c,
> > [IPROC_PCIE_APB_ERR_EN] = 0xf40,
> > [IPROC_PCIE_ORDERING_CFG] = 0x2000,
> > @@ -501,10 +509,12 @@ static void __iomem *iproc_pcie_map_ep_cfg_reg(struct iproc_pcie *pcie,
> > return (pcie->base + offset);
> > }
> >
> > -static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
> > +static unsigned int iproc_pcie_cfg_retry(struct iproc_pcie *pcie,
> > + void __iomem *cfg_data_p)
> > {
> > int timeout = CFG_RETRY_STATUS_TIMEOUT_US;
> > unsigned int data;
> > + u32 status;
> >
> > /*
> > * As per PCIe spec r3.1, sec 2.3.2, CRS Software Visibility only
> > @@ -525,6 +535,15 @@ static unsigned int iproc_pcie_cfg_retry(void __iomem *cfg_data_p)
> > */
> > data = readl(cfg_data_p);
> > while (data == CFG_RETRY_STATUS && timeout--) {
> > + /*
> > + * CRS state is set in CFG_RD status register
> > + * This will handle the case where CFG_RETRY_STATUS is
> > + * valid config data.
> > + */
> > + status = iproc_pcie_read_reg(pcie, IPROC_PCIE_CFG_RD_STATUS);
> > + if (status != CFG_RD_CRS)
> > + return data;
> > +
> > udelay(1);
> > data = readl(cfg_data_p);
> > }
> > @@ -603,7 +622,7 @@ static int iproc_pcie_config_read(struct pci_bus *bus, unsigned int devfn,
> > if (!cfg_data_p)
> > return PCIBIOS_DEVICE_NOT_FOUND;
> >
> > - data = iproc_pcie_cfg_retry(cfg_data_p);
> > + data = iproc_pcie_cfg_retry(pcie, cfg_data_p);
> >
> > *val = data;
> > if (size <= 2)
> > --
> > 2.7.4
> >

2019-01-24 19:32:53

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 1/3] PCI: iproc: Add feature to set order mode

On Thu, Jan 24, 2019 at 02:10:18PM +0530, Srinath Mannam wrote:
> On Fri, Jan 18, 2019 at 8:37 PM Bjorn Helgaas <[email protected]> wrote:
> > On Fri, Jan 18, 2019 at 09:53:21AM +0530, Srinath Mannam wrote:
> > > Order mode in RX header of incoming pcie packets can be override to
> > > strict or loose order based on requirement.
> > > Sysfs entry is provided to set dynamic and default order modes of upstream
> > > traffic.
> > ...
> > Are you talking about the Relaxed Ordering bit in the TLP Attributes
> > field? If so, please use the exact language used in the spec and
> > include a reference, e.g., to PCIe r4.0, sec 2.2.6.4, 2.4, etc.
> >
> Yes Relax ordering bit in TLP. I will use spec reference. Thanks.
> > I'm hesitant about a new sysfs file for this. That automatically
> > creates a user-space dependency on this iProc feature. Who would use
> > this file?
> >
> This is the specific feature given in iProc, used to improve
> performance for the endpoints which does not support
> ordering configuration at its end.
> This is the reason we used sysfs file, which allows user to change
> ordering based on endpoint used and requirement.
> we are using these sysfs files to configure ordering to improve
> performance in NVMe endpoints with SPDK/DPDK drivers.
> If we enable this default in kernel, then it is applicable to all
> endpoints connected. But it may not required for all endpoints.

Normally, relaxed ordering is used only when an endpoint sets the
"Relaxed Ordering" attribute bit in a TLP. The endpoint is only
allowed to do that if relaxed ordering is enabled in the Device
Control register.

If I understand correctly, this sysfs knob would let you configure the
iProc RC so it handles *all* TLPs (or just those in certain address
ranges) with relaxed ordering, regardless of whether the endpoint has
relaxed ordering, or even whether it supports relaxed ordering at all.

My gut feeling is that this is a messy hack, and if you want the
performance benefits of relaxed ordering, you should just choose an
NVMe endpoint that supports it correctly.

I assume the iProc RC does actually have the smarts to pay attention
to the Relaxed Ordering bit in TLPs if the endpoint sets it?

> > > To improve performance in few endpoints we required to modify the
> > > ordering attributes. Using this feature we can override order modes of RX
> > > packets at IPROC RC.
> > >
> > > Ex:
> > > In PCIe based NVMe SSD endpoints data read/writes from disk is using
> > > Queue pairs (submit/completion). After completion of block read/write,
> > > EP writes completion command to completion queue to notify RC.
> > > So that all data transfers before completion command write are not
> > > required to strict order except completion command. It means previous all
> > > packets before completion command write to RC should be written to memory
> > > and acknowledged.
> >
> > IIUC, if Enable Relaxed Ordering in the endpoint's Device Control
> > register is set, the device is permitted to set the Relaxed Ordering
> > bit in TLPs it initiates. So I would think that if we set Enable
> > Relaxed Ordering correctly, the NVMe endpoint should be able to
> > use Relaxed Ordering for the data transfers and strict ordering (the
> > default) for the completion command. What am I missing?
> >
> As per NVMe spec Enable Relaxed ordering is implementation specific all cards
> may not support.

Relaxed ordering support is optional for *every* PCIe endpoint, not
just NVMe. If an endpoint doesn't support relaxed ordering at all, it
should hardwire the Enable Relaxed Ordering bit to zero.

> > This sysfs file apparently affects the Root Port/Root Complex
> > behavior, not the Endpoint's behavior. Does that mean iProc ignores
> > the Relaxed Ordering bit by default, and you're providing a knob that
> >
> With sysfs file setting, ordering attributes of all TLPs directed to
> specific memory window can be override iProc layer based on
> settings. TLPs to one memory window can keep as strict order and
> other memory windows can set to strict order.
>
> Using sysfs file ordering settings can configure and revert back to default.
>
> > makes it pay attention to it? If that's the case, why wouldn't you
> > enable iProc support for Relaxed Ordering always, by default?
> >
> Option is given to user, because few endpoints may support ordering
> configuration internally.
> few EPs doesn't support. Based on requirement user can configure
> ordering settings.
>
> > > Signed-off-by: Srinath Mannam <[email protected]>
> > > Reviewed-by: Ray Jui <[email protected]>
> > > ---
> > > drivers/pci/controller/pcie-iproc.c | 128 ++++++++++++++++++++++++++++++++++++
> > > drivers/pci/controller/pcie-iproc.h | 16 +++++
> > > 2 files changed, 144 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
> > > index c20fd6b..13ce80f 100644
> > > --- a/drivers/pci/controller/pcie-iproc.c
> > > +++ b/drivers/pci/controller/pcie-iproc.c
> > > @@ -57,6 +57,9 @@
> > > #define PCIE_DL_ACTIVE_SHIFT 2
> > > #define PCIE_DL_ACTIVE BIT(PCIE_DL_ACTIVE_SHIFT)
> > >
> > > +#define MPS_MRRS_CFG_MPS_SHIFT 0
> > > +#define MPS_MRRS_CFG_MRRS_SHIFT 16
> > > +
> > > #define APB_ERR_EN_SHIFT 0
> > > #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
> > >
> > > @@ -91,6 +94,14 @@
> > >
> > > #define IPROC_PCIE_REG_INVALID 0xffff
> > >
> > > +#define RO_FIELD(window) BIT((window) << 1)
> > > +#define RO_VALUE(window) BIT(((window) << 1) + 1)
> > > +/* All Windows are allowed */
> > > +#define RO_ALL_WINDOW 0x33333333
> > > +/* Wait on All Windows */
> > > +#define RO_FIELD_ALL_WINDOW 0x11111111
> > > +#define DYNAMIC_ORDER_MODE 0x5
> > > +
> > > /**
> > > * iProc PCIe outbound mapping controller specific parameters
> > > *
> > > @@ -295,6 +306,15 @@ enum iproc_pcie_reg {
> > > /* enable APB error for unsupported requests */
> > > IPROC_PCIE_APB_ERR_EN,
> > >
> > > + /* Ordering Mode configuration registers */
> > > + IPROC_PCIE_ORDERING_CFG,
> > > + IPROC_PCIE_MPS_MRRS_CFG,
> > > + IPROC_PCIE_IMAP0_RO_CONTROL,
> > > + IPROC_PCIE_IMAP1_RO_CONTROL,
> > > + IPROC_PCIE_IMAP2_RO_CONTROL,
> > > + IPROC_PCIE_IMAP3_RO_CONTROL,
> > > + IPROC_PCIE_IMAP4_RO_CONTROL,
> > > +
> > > /* total number of core registers */
> > > IPROC_PCIE_MAX_NUM_REG,
> > > };
> > > @@ -352,6 +372,13 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
> > > [IPROC_PCIE_IMAP4] = 0xe70,
> > > [IPROC_PCIE_LINK_STATUS] = 0xf0c,
> > > [IPROC_PCIE_APB_ERR_EN] = 0xf40,
> > > + [IPROC_PCIE_ORDERING_CFG] = 0x2000,
> > > + [IPROC_PCIE_MPS_MRRS_CFG] = 0x2008,
> > > + [IPROC_PCIE_IMAP0_RO_CONTROL] = 0x201c,
> > > + [IPROC_PCIE_IMAP1_RO_CONTROL] = 0x2020,
> > > + [IPROC_PCIE_IMAP2_RO_CONTROL] = 0x2024,
> > > + [IPROC_PCIE_IMAP3_RO_CONTROL] = 0x2028,
> > > + [IPROC_PCIE_IMAP4_RO_CONTROL] = 0x202c,
> > > };
> > >
> > > /* iProc PCIe PAXC v1 registers */
> > > @@ -1401,6 +1428,97 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie)
> > > return 0;
> > > }
> > >
> > > +static
> > > +ssize_t order_mode_show(struct device *dev,
> > > + struct device_attribute *attr,
> > > + char *buff)
> >
> > Please format the same as the rest of the file, i.e.,
> >
> > static ssize_t order_mode_show(struct device *dev, ...
> >
> > > +{
> > > + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> > > + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> > > +
> > > + return sprintf(buff, "Current PAXB order configuration %d\n",
> > > + pcie->order_cfg);
> > > +}
> > > +
> > > +static void pcie_iproc_set_dynamic_order(struct iproc_pcie *pcie)
> > > +{
> > > + u32 val = 0, mps;
> > > +
> > > + /* Set all IMAPs to relaxed order in dynamic order mode */
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG,
> > > + DYNAMIC_ORDER_MODE);
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> > > + RO_ALL_WINDOW);
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP1_RO_CONTROL,
> > > + RO_ALL_WINDOW);
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL,
> > > + RO_ALL_WINDOW);
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP3_RO_CONTROL,
> > > + RO_ALL_WINDOW);
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP4_RO_CONTROL,
> > > + RO_ALL_WINDOW);
> > > +
> > > + /* PAXB MPS/MRRS settings configuration */
> > > + iproc_pci_raw_config_read32(pcie, 0, IPROC_PCI_EXP_CAP + PCI_EXP_DEVCTL,
> > > + 2, &mps);
> > > + mps = (mps & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
> > > + /* set MRRS to match system MPS */
> > > + val |= (mps << MPS_MRRS_CFG_MRRS_SHIFT);
> > > + /* set MPS to 4096 bytes */
> > > + val |= (0x5 << MPS_MRRS_CFG_MPS_SHIFT);
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_MPS_MRRS_CFG, val);
> > > +}
> > > +
> > > +static
> > > +ssize_t order_mode_store(struct device *dev,
> > > + struct device_attribute *attr,
> > > + const char *buf,
> > > + size_t count)
> >
> > Fix formatting as above.
> >
> > > +{
> > > + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> > > + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> > > + unsigned long val, regval;
> > > +
> > > + if (kstrtoul(buf, 0, &val) < 0)
> > > + return -EINVAL;
> > > +
> > > + if (val > PAXB_ORDER_DEV_MEM_ONLY) {
> > > + dev_err(dev, "Invalid Value passed %lu\n", val);
> > > + dev_err(dev, "0: Everything in strict order\n");
> > > + dev_err(dev, "1: Only IMAP2 in strict order\n");
> > > + dev_err(dev, "2: Only device memory in strict order (MSI/MSIX)\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if (val == pcie->order_cfg)
> > > + return count;
> > > +
> > > + switch (val) {
> > > + case PAXB_ORDER_IMAP2_ONLY:
> > > + pcie_iproc_set_dynamic_order(pcie);
> > > + regval = RO_ALL_WINDOW;
> > > + regval &= ~(RO_VALUE(0));
> > > + /* Set IMAP2 to strict order */
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL, regval);
> > > + dev_info(dev, "RO_IMAP2 set to %#lx\n", regval);
> > > + break;
> > > + case PAXB_ORDER_DEV_MEM_ONLY:
> > > + pcie_iproc_set_dynamic_order(pcie);
> > > + /* Set IMAP0 to strict order */
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> > > + RO_FIELD_ALL_WINDOW);
> > > + dev_info(dev, "RO_IMAP0 set to %#x\n", RO_FIELD_ALL_WINDOW);
> > > + break;
> > > + default:
> > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG, 0);
> > > + break;
> > > + }
> > > + pcie->order_cfg = val;
> > > + return count;
> > > +}
> > > +
> > > +static DEVICE_ATTR_RW(order_mode);
> > > +
> > > int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
> > > {
> > > struct device *dev;
> > > @@ -1484,6 +1602,11 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
> > >
> > > pci_bus_add_devices(host->bus);
> > >
> > > + if (pcie->type == IPROC_PCIE_PAXB_V2) {
> > > + ret = device_create_file(&host->dev, &dev_attr_order_mode);
> > > + if (ret < 0)
> > > + goto err_power_off_phy;
> > > + }
> > > return 0;
> > >
> > > err_power_off_phy:
> > > @@ -1496,6 +1619,11 @@ EXPORT_SYMBOL(iproc_pcie_setup);
> > >
> > > int iproc_pcie_remove(struct iproc_pcie *pcie)
> > > {
> > > + struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
> > > +
> > > + if (pcie->type == IPROC_PCIE_PAXB_V2)
> > > + device_remove_file(&host->dev, &dev_attr_order_mode);
> > > +
> > > pci_stop_root_bus(pcie->root_bus);
> > > pci_remove_root_bus(pcie->root_bus);
> > >
> > > diff --git a/drivers/pci/controller/pcie-iproc.h b/drivers/pci/controller/pcie-iproc.h
> > > index 4f03ea5..6e100cd 100644
> > > --- a/drivers/pci/controller/pcie-iproc.h
> > > +++ b/drivers/pci/controller/pcie-iproc.h
> > > @@ -24,6 +24,18 @@ enum iproc_pcie_type {
> > > };
> > >
> > > /**
> > > + * PAXB Dynamic ordering modes
> > > + * PAXB_ORDER_EVERYTHING: Everything in strict order
> > > + * PAXB_ORDER_IMAP2_ONLY: Only IMAP2 memory window strict order
> > > + * PAXB_ORDER_DEV_MEM_ONLY: Only device memory (MSI/MSIX) is strict order
> > > + */
> > > +enum paxb_order_cfg {
> > > + PAXB_ORDER_EVERYTHING,
> > > + PAXB_ORDER_IMAP2_ONLY,
> > > + PAXB_ORDER_DEV_MEM_ONLY,
> > > +};
> > > +
> > > +/**
> > > * iProc PCIe outbound mapping
> > > * @axi_offset: offset from the AXI address to the internal address used by
> > > * the iProc PCIe core
> > > @@ -74,6 +86,8 @@ struct iproc_msi;
> > > * @ib: inbound mapping related parameters
> > > * @ib_map: outbound mapping region related parameters
> > > *
> > > + * @order_cfg: indicates current value of the order mode.
> > > + *
> > > * @need_msi_steer: indicates additional configuration of the iProc PCIe
> > > * controller is required to steer MSI writes to external interrupt controller
> > > * @msi: MSI data
> > > @@ -102,6 +116,8 @@ struct iproc_pcie {
> > > struct iproc_pcie_ib ib;
> > > const struct iproc_pcie_ib_map *ib_map;
> > >
> > > + enum paxb_order_cfg order_cfg;
> > > +
> > > bool need_msi_steer;
> > > struct iproc_msi *msi;
> > > };
> > > --
> > > 2.7.4
> > >

2019-01-25 09:44:43

by Srinath Mannam

[permalink] [raw]
Subject: Re: [PATCH 1/3] PCI: iproc: Add feature to set order mode

Hi Bjorn,

Thank you, Please see my comments below inline.

On Fri, Jan 25, 2019 at 1:01 AM Bjorn Helgaas <[email protected]> wrote:
>
> On Thu, Jan 24, 2019 at 02:10:18PM +0530, Srinath Mannam wrote:
> > On Fri, Jan 18, 2019 at 8:37 PM Bjorn Helgaas <[email protected]> wrote:
> > > On Fri, Jan 18, 2019 at 09:53:21AM +0530, Srinath Mannam wrote:
> > > > Order mode in RX header of incoming pcie packets can be override to
> > > > strict or loose order based on requirement.
> > > > Sysfs entry is provided to set dynamic and default order modes of upstream
> > > > traffic.
> > > ...
> > > Are you talking about the Relaxed Ordering bit in the TLP Attributes
> > > field? If so, please use the exact language used in the spec and
> > > include a reference, e.g., to PCIe r4.0, sec 2.2.6.4, 2.4, etc.
> > >
> > Yes Relax ordering bit in TLP. I will use spec reference. Thanks.
> > > I'm hesitant about a new sysfs file for this. That automatically
> > > creates a user-space dependency on this iProc feature. Who would use
> > > this file?
> > >
> > This is the specific feature given in iProc, used to improve
> > performance for the endpoints which does not support
> > ordering configuration at its end.
> > This is the reason we used sysfs file, which allows user to change
> > ordering based on endpoint used and requirement.
> > we are using these sysfs files to configure ordering to improve
> > performance in NVMe endpoints with SPDK/DPDK drivers.
> > If we enable this default in kernel, then it is applicable to all
> > endpoints connected. But it may not required for all endpoints.
>
> Normally, relaxed ordering is used only when an endpoint sets the
> "Relaxed Ordering" attribute bit in a TLP. The endpoint is only
> allowed to do that if relaxed ordering is enabled in the Device
> Control register.
Yes, endpoint has to set RO attribute in up-streaming TLPs.
But in most of NVMe SSDs even Relax ordering bit in device controller
register is set,
still it send TLPs with RO attribute disable (strict order).
>
> If I understand correctly, this sysfs knob would let you configure the
> iProc RC so it handles *all* TLPs (or just those in certain address
> ranges) with relaxed ordering, regardless of whether the endpoint has
> relaxed ordering, or even whether it supports relaxed ordering at all.
Yes, iProc can enable/disable RO attribute in given address range regardless of
endpoint settings.
>
> My gut feeling is that this is a messy hack, and if you want the
> performance benefits of relaxed ordering, you should just choose an
> NVMe endpoint that supports it correctly.
Most of the NVMe endpoints available in market are sending TLPs in strict order
even Relax ordering bit in device controller register is set.
This is the reason to add this dynamic ordering feature in iProc IP.
>
> I assume the iProc RC does actually have the smarts to pay attention
> to the Relaxed Ordering bit in TLPs if the endpoint sets it?
Yes, iProc IP scan all TLPs and modify RO attributes as requested.
if suppose endpoint set RO attribute in TLP, but user need it as
strict order then iProc disable it.
>
> > > > To improve performance in few endpoints we required to modify the
> > > > ordering attributes. Using this feature we can override order modes of RX
> > > > packets at IPROC RC.
> > > >
> > > > Ex:
> > > > In PCIe based NVMe SSD endpoints data read/writes from disk is using
> > > > Queue pairs (submit/completion). After completion of block read/write,
> > > > EP writes completion command to completion queue to notify RC.
> > > > So that all data transfers before completion command write are not
> > > > required to strict order except completion command. It means previous all
> > > > packets before completion command write to RC should be written to memory
> > > > and acknowledged.
> > >
> > > IIUC, if Enable Relaxed Ordering in the endpoint's Device Control
> > > register is set, the device is permitted to set the Relaxed Ordering
> > > bit in TLPs it initiates. So I would think that if we set Enable
> > > Relaxed Ordering correctly, the NVMe endpoint should be able to
> > > use Relaxed Ordering for the data transfers and strict ordering (the
> > > default) for the completion command. What am I missing?
> > >
> > As per NVMe spec Enable Relaxed ordering is implementation specific all cards
> > may not support.
>
> Relaxed ordering support is optional for *every* PCIe endpoint, not
> just NVMe. If an endpoint doesn't support relaxed ordering at all, it
> should hardwire the Enable Relaxed Ordering bit to zero.
>
> > > This sysfs file apparently affects the Root Port/Root Complex
> > > behavior, not the Endpoint's behavior. Does that mean iProc ignores
> > > the Relaxed Ordering bit by default, and you're providing a knob that
> > >
> > With sysfs file setting, ordering attributes of all TLPs directed to
> > specific memory window can be override iProc layer based on
> > settings. TLPs to one memory window can keep as strict order and
> > other memory windows can set to strict order.
> >
> > Using sysfs file ordering settings can configure and revert back to default.
> >
> > > makes it pay attention to it? If that's the case, why wouldn't you
> > > enable iProc support for Relaxed Ordering always, by default?
> > >
> > Option is given to user, because few endpoints may support ordering
> > configuration internally.
> > few EPs doesn't support. Based on requirement user can configure
> > ordering settings.
> >
> > > > Signed-off-by: Srinath Mannam <[email protected]>
> > > > Reviewed-by: Ray Jui <[email protected]>
> > > > ---
> > > > drivers/pci/controller/pcie-iproc.c | 128 ++++++++++++++++++++++++++++++++++++
> > > > drivers/pci/controller/pcie-iproc.h | 16 +++++
> > > > 2 files changed, 144 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/controller/pcie-iproc.c b/drivers/pci/controller/pcie-iproc.c
> > > > index c20fd6b..13ce80f 100644
> > > > --- a/drivers/pci/controller/pcie-iproc.c
> > > > +++ b/drivers/pci/controller/pcie-iproc.c
> > > > @@ -57,6 +57,9 @@
> > > > #define PCIE_DL_ACTIVE_SHIFT 2
> > > > #define PCIE_DL_ACTIVE BIT(PCIE_DL_ACTIVE_SHIFT)
> > > >
> > > > +#define MPS_MRRS_CFG_MPS_SHIFT 0
> > > > +#define MPS_MRRS_CFG_MRRS_SHIFT 16
> > > > +
> > > > #define APB_ERR_EN_SHIFT 0
> > > > #define APB_ERR_EN BIT(APB_ERR_EN_SHIFT)
> > > >
> > > > @@ -91,6 +94,14 @@
> > > >
> > > > #define IPROC_PCIE_REG_INVALID 0xffff
> > > >
> > > > +#define RO_FIELD(window) BIT((window) << 1)
> > > > +#define RO_VALUE(window) BIT(((window) << 1) + 1)
> > > > +/* All Windows are allowed */
> > > > +#define RO_ALL_WINDOW 0x33333333
> > > > +/* Wait on All Windows */
> > > > +#define RO_FIELD_ALL_WINDOW 0x11111111
> > > > +#define DYNAMIC_ORDER_MODE 0x5
> > > > +
> > > > /**
> > > > * iProc PCIe outbound mapping controller specific parameters
> > > > *
> > > > @@ -295,6 +306,15 @@ enum iproc_pcie_reg {
> > > > /* enable APB error for unsupported requests */
> > > > IPROC_PCIE_APB_ERR_EN,
> > > >
> > > > + /* Ordering Mode configuration registers */
> > > > + IPROC_PCIE_ORDERING_CFG,
> > > > + IPROC_PCIE_MPS_MRRS_CFG,
> > > > + IPROC_PCIE_IMAP0_RO_CONTROL,
> > > > + IPROC_PCIE_IMAP1_RO_CONTROL,
> > > > + IPROC_PCIE_IMAP2_RO_CONTROL,
> > > > + IPROC_PCIE_IMAP3_RO_CONTROL,
> > > > + IPROC_PCIE_IMAP4_RO_CONTROL,
> > > > +
> > > > /* total number of core registers */
> > > > IPROC_PCIE_MAX_NUM_REG,
> > > > };
> > > > @@ -352,6 +372,13 @@ static const u16 iproc_pcie_reg_paxb_v2[] = {
> > > > [IPROC_PCIE_IMAP4] = 0xe70,
> > > > [IPROC_PCIE_LINK_STATUS] = 0xf0c,
> > > > [IPROC_PCIE_APB_ERR_EN] = 0xf40,
> > > > + [IPROC_PCIE_ORDERING_CFG] = 0x2000,
> > > > + [IPROC_PCIE_MPS_MRRS_CFG] = 0x2008,
> > > > + [IPROC_PCIE_IMAP0_RO_CONTROL] = 0x201c,
> > > > + [IPROC_PCIE_IMAP1_RO_CONTROL] = 0x2020,
> > > > + [IPROC_PCIE_IMAP2_RO_CONTROL] = 0x2024,
> > > > + [IPROC_PCIE_IMAP3_RO_CONTROL] = 0x2028,
> > > > + [IPROC_PCIE_IMAP4_RO_CONTROL] = 0x202c,
> > > > };
> > > >
> > > > /* iProc PCIe PAXC v1 registers */
> > > > @@ -1401,6 +1428,97 @@ static int iproc_pcie_rev_init(struct iproc_pcie *pcie)
> > > > return 0;
> > > > }
> > > >
> > > > +static
> > > > +ssize_t order_mode_show(struct device *dev,
> > > > + struct device_attribute *attr,
> > > > + char *buff)
> > >
> > > Please format the same as the rest of the file, i.e.,
> > >
> > > static ssize_t order_mode_show(struct device *dev, ...
> > >
> > > > +{
> > > > + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> > > > + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> > > > +
> > > > + return sprintf(buff, "Current PAXB order configuration %d\n",
> > > > + pcie->order_cfg);
> > > > +}
> > > > +
> > > > +static void pcie_iproc_set_dynamic_order(struct iproc_pcie *pcie)
> > > > +{
> > > > + u32 val = 0, mps;
> > > > +
> > > > + /* Set all IMAPs to relaxed order in dynamic order mode */
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG,
> > > > + DYNAMIC_ORDER_MODE);
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> > > > + RO_ALL_WINDOW);
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP1_RO_CONTROL,
> > > > + RO_ALL_WINDOW);
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL,
> > > > + RO_ALL_WINDOW);
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP3_RO_CONTROL,
> > > > + RO_ALL_WINDOW);
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP4_RO_CONTROL,
> > > > + RO_ALL_WINDOW);
> > > > +
> > > > + /* PAXB MPS/MRRS settings configuration */
> > > > + iproc_pci_raw_config_read32(pcie, 0, IPROC_PCI_EXP_CAP + PCI_EXP_DEVCTL,
> > > > + 2, &mps);
> > > > + mps = (mps & PCI_EXP_DEVCTL_PAYLOAD) >> 5;
> > > > + /* set MRRS to match system MPS */
> > > > + val |= (mps << MPS_MRRS_CFG_MRRS_SHIFT);
> > > > + /* set MPS to 4096 bytes */
> > > > + val |= (0x5 << MPS_MRRS_CFG_MPS_SHIFT);
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_MPS_MRRS_CFG, val);
> > > > +}
> > > > +
> > > > +static
> > > > +ssize_t order_mode_store(struct device *dev,
> > > > + struct device_attribute *attr,
> > > > + const char *buf,
> > > > + size_t count)
> > >
> > > Fix formatting as above.
> > >
> > > > +{
> > > > + struct pci_host_bridge *host = to_pci_host_bridge(dev);
> > > > + struct iproc_pcie *pcie = pci_host_bridge_priv(host);
> > > > + unsigned long val, regval;
> > > > +
> > > > + if (kstrtoul(buf, 0, &val) < 0)
> > > > + return -EINVAL;
> > > > +
> > > > + if (val > PAXB_ORDER_DEV_MEM_ONLY) {
> > > > + dev_err(dev, "Invalid Value passed %lu\n", val);
> > > > + dev_err(dev, "0: Everything in strict order\n");
> > > > + dev_err(dev, "1: Only IMAP2 in strict order\n");
> > > > + dev_err(dev, "2: Only device memory in strict order (MSI/MSIX)\n");
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + if (val == pcie->order_cfg)
> > > > + return count;
> > > > +
> > > > + switch (val) {
> > > > + case PAXB_ORDER_IMAP2_ONLY:
> > > > + pcie_iproc_set_dynamic_order(pcie);
> > > > + regval = RO_ALL_WINDOW;
> > > > + regval &= ~(RO_VALUE(0));
> > > > + /* Set IMAP2 to strict order */
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP2_RO_CONTROL, regval);
> > > > + dev_info(dev, "RO_IMAP2 set to %#lx\n", regval);
> > > > + break;
> > > > + case PAXB_ORDER_DEV_MEM_ONLY:
> > > > + pcie_iproc_set_dynamic_order(pcie);
> > > > + /* Set IMAP0 to strict order */
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_IMAP0_RO_CONTROL,
> > > > + RO_FIELD_ALL_WINDOW);
> > > > + dev_info(dev, "RO_IMAP0 set to %#x\n", RO_FIELD_ALL_WINDOW);
> > > > + break;
> > > > + default:
> > > > + iproc_pcie_write_reg(pcie, IPROC_PCIE_ORDERING_CFG, 0);
> > > > + break;
> > > > + }
> > > > + pcie->order_cfg = val;
> > > > + return count;
> > > > +}
> > > > +
> > > > +static DEVICE_ATTR_RW(order_mode);
> > > > +
> > > > int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
> > > > {
> > > > struct device *dev;
> > > > @@ -1484,6 +1602,11 @@ int iproc_pcie_setup(struct iproc_pcie *pcie, struct list_head *res)
> > > >
> > > > pci_bus_add_devices(host->bus);
> > > >
> > > > + if (pcie->type == IPROC_PCIE_PAXB_V2) {
> > > > + ret = device_create_file(&host->dev, &dev_attr_order_mode);
> > > > + if (ret < 0)
> > > > + goto err_power_off_phy;
> > > > + }
> > > > return 0;
> > > >
> > > > err_power_off_phy:
> > > > @@ -1496,6 +1619,11 @@ EXPORT_SYMBOL(iproc_pcie_setup);
> > > >
> > > > int iproc_pcie_remove(struct iproc_pcie *pcie)
> > > > {
> > > > + struct pci_host_bridge *host = pci_host_bridge_from_priv(pcie);
> > > > +
> > > > + if (pcie->type == IPROC_PCIE_PAXB_V2)
> > > > + device_remove_file(&host->dev, &dev_attr_order_mode);
> > > > +
> > > > pci_stop_root_bus(pcie->root_bus);
> > > > pci_remove_root_bus(pcie->root_bus);
> > > >
> > > > diff --git a/drivers/pci/controller/pcie-iproc.h b/drivers/pci/controller/pcie-iproc.h
> > > > index 4f03ea5..6e100cd 100644
> > > > --- a/drivers/pci/controller/pcie-iproc.h
> > > > +++ b/drivers/pci/controller/pcie-iproc.h
> > > > @@ -24,6 +24,18 @@ enum iproc_pcie_type {
> > > > };
> > > >
> > > > /**
> > > > + * PAXB Dynamic ordering modes
> > > > + * PAXB_ORDER_EVERYTHING: Everything in strict order
> > > > + * PAXB_ORDER_IMAP2_ONLY: Only IMAP2 memory window strict order
> > > > + * PAXB_ORDER_DEV_MEM_ONLY: Only device memory (MSI/MSIX) is strict order
> > > > + */
> > > > +enum paxb_order_cfg {
> > > > + PAXB_ORDER_EVERYTHING,
> > > > + PAXB_ORDER_IMAP2_ONLY,
> > > > + PAXB_ORDER_DEV_MEM_ONLY,
> > > > +};
> > > > +
> > > > +/**
> > > > * iProc PCIe outbound mapping
> > > > * @axi_offset: offset from the AXI address to the internal address used by
> > > > * the iProc PCIe core
> > > > @@ -74,6 +86,8 @@ struct iproc_msi;
> > > > * @ib: inbound mapping related parameters
> > > > * @ib_map: outbound mapping region related parameters
> > > > *
> > > > + * @order_cfg: indicates current value of the order mode.
> > > > + *
> > > > * @need_msi_steer: indicates additional configuration of the iProc PCIe
> > > > * controller is required to steer MSI writes to external interrupt controller
> > > > * @msi: MSI data
> > > > @@ -102,6 +116,8 @@ struct iproc_pcie {
> > > > struct iproc_pcie_ib ib;
> > > > const struct iproc_pcie_ib_map *ib_map;
> > > >
> > > > + enum paxb_order_cfg order_cfg;
> > > > +
> > > > bool need_msi_steer;
> > > > struct iproc_msi *msi;
> > > > };
> > > > --
> > > > 2.7.4
> > > >

2019-01-25 15:07:06

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 1/3] PCI: iproc: Add feature to set order mode

On Fri, Jan 25, 2019 at 03:13:38PM +0530, Srinath Mannam wrote:
> On Fri, Jan 25, 2019 at 1:01 AM Bjorn Helgaas <[email protected]> wrote:
> > On Thu, Jan 24, 2019 at 02:10:18PM +0530, Srinath Mannam wrote:
> > > On Fri, Jan 18, 2019 at 8:37 PM Bjorn Helgaas <[email protected]> wrote:
> > > > On Fri, Jan 18, 2019 at 09:53:21AM +0530, Srinath Mannam wrote:
> > > > > Order mode in RX header of incoming pcie packets can be override to
> > > > > strict or loose order based on requirement.
> > > > > Sysfs entry is provided to set dynamic and default order modes of upstream
> > > > > traffic.
> > > > ...
> > > > Are you talking about the Relaxed Ordering bit in the TLP Attributes
> > > > field? If so, please use the exact language used in the spec and
> > > > include a reference, e.g., to PCIe r4.0, sec 2.2.6.4, 2.4, etc.
> > > >
> > > Yes Relax ordering bit in TLP. I will use spec reference. Thanks.
> > > > I'm hesitant about a new sysfs file for this. That automatically
> > > > creates a user-space dependency on this iProc feature. Who would use
> > > > this file?
> > > >
> > > This is the specific feature given in iProc, used to improve
> > > performance for the endpoints which does not support
> > > ordering configuration at its end.
> > > This is the reason we used sysfs file, which allows user to change
> > > ordering based on endpoint used and requirement.
> > > we are using these sysfs files to configure ordering to improve
> > > performance in NVMe endpoints with SPDK/DPDK drivers.
> > > If we enable this default in kernel, then it is applicable to all
> > > endpoints connected. But it may not required for all endpoints.
> >
> > Normally, relaxed ordering is used only when an endpoint sets the
> > "Relaxed Ordering" attribute bit in a TLP. The endpoint is only
> > allowed to do that if relaxed ordering is enabled in the Device
> > Control register.
> Yes, endpoint has to set RO attribute in up-streaming TLPs.
> But in most of NVMe SSDs even Relax ordering bit in device controller
> register is set,
> still it send TLPs with RO attribute disable (strict order).
> >
> > If I understand correctly, this sysfs knob would let you configure the
> > iProc RC so it handles *all* TLPs (or just those in certain address
> > ranges) with relaxed ordering, regardless of whether the endpoint has
> > relaxed ordering, or even whether it supports relaxed ordering at all.
> Yes, iProc can enable/disable RO attribute in given address range regardless of
> endpoint settings.
> >
> > My gut feeling is that this is a messy hack, and if you want the
> > performance benefits of relaxed ordering, you should just choose an
> > NVMe endpoint that supports it correctly.
>
> Most of the NVMe endpoints available in market are sending TLPs in
> strict order even Relax ordering bit in device controller register
> is set. This is the reason to add this dynamic ordering feature in
> iProc IP.

I'm not very sympathetic to this argument about NVMe endpoints not
supporting Relaxed Ordering. Relaxed Ordering has been in the PCIe
spec since the very beginning in 2002 (see PCIe r1.0, sec 2.4.3). Why
should we jump through hoops to tack this on after the fact if the
endpoint designer couldn't be bothered?

This sysfs knob is problematic because it's not safe and it can't
be supported across architectures.

It's not safe because only the endpoint and its driver know the
ordering requirements. We *hope* the user only enables relaxed
ordering when it is safe. Even when it's not safe, it makes things
faster and works most of the time, so it's pretty tempting. But
eventually it will lead to data corruption.

We try to avoid sysfs things that are architecture-specific. This
particular file would only exist on iProc, and then any user-space
software that uses it would only work on iProc.

Bjorn