2017-06-07 09:16:56

by Ding Tianhong

[permalink] [raw]
Subject: [PATCH v3 0/3] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Some devices have problems with Transaction Layer Packets with the Relaxed
Ordering Attribute set. This patch set adds a new PCIe Device Flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING, a set of PCI Quirks to catch some known
devices with Relaxed Ordering issues, and a use of this new flag by the
cxgb4 driver to avoid using Relaxed Ordering with problematic Root Complex
Ports.

It's been years since I've submitted kernel.org patches, I appolgise for the
almost certain submission errors.

v2: Alexander point out that the v1 was only a part of the whole solution,
some platform which has some issues could use the new flag to indicate
that it is not safe to enable relaxed ordering attribute, then we need
to clear the relaxed ordering enable bits in the PCI configuration when
initializing the device. So add a new second patch to modify the PCI
initialization code to clear the relaxed ordering enable bit in the
event that the root complex doesn't want relaxed ordering enabled.

The third patch was base on the v1's second patch and only be changed
to query the relaxed ordering enable bit in the PCI configuration space
to allow the Chelsio NIC to send TLPs with the relaxed ordering attributes
set.

This version didn't plan to drop the defines for Intel Drivers to use the
new checking way to enable relaxed ordering because it is not the hardest
part of the moment, we could fix it in next patchset when this patches
reach the goal.

v3: Redesigned the logic for pci_configure_relaxed_ordering when configuration,
If a PCIe device didn't enable the relaxed ordering attribute default,
we should not do anything in the PCIe configuration, otherwise we
should check if any of the devices above us do not support relaxed
ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
the result if we get a return that indicate that the relaxed ordering
is not supported we should update our device to disable relaxed ordering
in configuration space. If the device above us doesn't exist or isn't
the PCIe device, we shouldn't do anything and skip updating relaxed ordering
because we are probably running in a guest.

Casey Leedom (2):
PCI: Add new PCIe Fabric End Node flag,
PCI_DEV_FLAGS_NO_RELAXED_ORDERING
net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

Ding Tianhong (1):
PCI: Enable PCIe Relaxed Ordering if supported

drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 ++++++++++
drivers/net/ethernet/chelsio/cxgb4/sge.c | 5 +--
drivers/pci/pci.c | 29 +++++++++++++++++
drivers/pci/probe.c | 43 +++++++++++++++++++++++++
drivers/pci/quirks.c | 38 ++++++++++++++++++++++
include/linux/pci.h | 4 +++
7 files changed, 135 insertions(+), 2 deletions(-)

--
1.9.0



2017-06-07 09:16:50

by Ding Tianhong

[permalink] [raw]
Subject: [PATCH v3 2/3] PCI: Enable PCIe Relaxed Ordering if supported

The PCIe Device Control Register use the bit 4 to indicate that
whether the device is permitted to enable relaxed ordering or not.
But relaxed ordering is not safe for some platform which could only
use strong write ordering, so devices are allowed (but not required)
to enable relaxed ordering bit by default.

If a PCIe device didn't enable the relaxed ordering attribute default,
we should not do anything in the PCIe configuration, otherwise we
should check if any of the devices above us do not support relaxed
ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
the result if we get a return that indicate that the relaxed ordering
is not supported we should update our device to disable relaxed ordering
in configuration space. If the device above us doesn't exist or isn't
the PCIe device, we shouldn't do anything and skip updating relaxed ordering
because we are probably running in a guest.

Signed-off-by: Ding Tianhong <[email protected]>
---
drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++
drivers/pci/probe.c | 43 +++++++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 2 ++
3 files changed, 74 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b01bd5b..3d42b38 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4878,6 +4878,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
EXPORT_SYMBOL(pcie_set_mps);

/**
+ * pcie_clear_relaxed_ordering - clear PCI Express relexed ordering bit
+ * @dev: PCI device to query
+ *
+ * If possible clear relaxed ordering
+ */
+int pcie_clear_relaxed_ordering(struct pci_dev *dev)
+{
+ return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
+ PCI_EXP_DEVCTL_RELAX_EN);
+}
+EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
+
+/**
+ * pcie_get_relaxed_ordering - check PCI Express relexed ordering bit
+ * @dev: PCI device to query
+ *
+ * Returns true if relaxed ordering is been set
+ */
+int pcie_get_relaxed_ordering(struct pci_dev *dev)
+{
+ u16 v;
+
+ pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
+
+ return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
+}
+EXPORT_SYMBOL(pcie_get_relaxed_ordering);
+
+/**
* pcie_get_minimum_link - determine minimum link settings of a PCI device
* @dev: PCI device to query
* @speed: storage for minimum speed
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 19c8950..0c94c80 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1701,6 +1701,48 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
PCI_EXP_DEVCTL_EXT_TAG);
}

+/**
+ * pci_dev_disable_relaxed_ordering - check if the PCI device
+ * should disable the relaxed ordering attribute.
+ * @dev: PCI device
+ *
+ * Return true if any of the PCI devices above us do not support
+ * relaxed ordering.
+ */
+static int pci_dev_disable_relaxed_ordering(struct pci_dev *dev)
+{
+ int ro_disabled = 0;
+
+ while(dev) {
+ if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) {
+ ro_disabled = 1;
+ break;
+ }
+ dev = dev->bus->self;
+ }
+
+ return ro_disabled;
+}
+
+static void pci_configure_relaxed_ordering(struct pci_dev *dev)
+{
+ struct pci_dev *bridge = pci_upstream_bridge(dev);
+ int origin_ero;
+
+ if (!pci_is_pcie(dev) || !bridge || !pci_is_pcie(bridge))
+ return;
+
+ origin_ero = pcie_get_relaxed_ordering(dev);
+ /* If the releaxed ordering enable bit is not set, do nothing. */
+ if (!origin_ero)
+ return;
+
+ if (pci_dev_disable_relaxed_ordering(dev)) {
+ pcie_clear_relaxed_ordering(dev);
+ dev_info(&dev->dev, "Disable Relaxed Ordering\n");
+ }
+}
+
static void pci_configure_device(struct pci_dev *dev)
{
struct hotplug_params hpp;
@@ -1708,6 +1750,7 @@ static void pci_configure_device(struct pci_dev *dev)

pci_configure_mps(dev);
pci_configure_extended_tags(dev);
+ pci_configure_relaxed_ordering(dev);

memset(&hpp, 0, sizeof(hpp));
ret = pci_get_hp_params(dev, &hpp);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index e1e8428..299d2f3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1105,6 +1105,8 @@ int __pci_enable_wake(struct pci_dev *dev, pci_power_t state,
void pci_pme_wakeup_bus(struct pci_bus *bus);
void pci_d3cold_enable(struct pci_dev *dev);
void pci_d3cold_disable(struct pci_dev *dev);
+int pcie_clear_relaxed_ordering(struct pci_dev *dev);
+int pcie_get_relaxed_ordering(struct pci_dev *dev);

static inline int pci_enable_wake(struct pci_dev *dev, pci_power_t state,
bool enable)
--
1.9.0


2017-06-07 09:16:58

by Ding Tianhong

[permalink] [raw]
Subject: [PATCH v3 1/3] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING

From: Casey Leedom <[email protected]>

The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING indicates that the Relaxed
Ordering Attribute should not be used on Transaction Layer Packets destined
for the PCIe End Node so flagged. Initially flagged this way are Intel
E5-26xx Root Complex Ports which suffer from a Flow Control Credit
Performance Problem and AMD A1100 ARM ("SEATTLE") Root Complex Ports which
don't obey PCIe 3.0 ordering rules which can lead to Data Corruption.

Signed-off-by: Casey Leedom <[email protected]>
Signed-off-by: Ding Tianhong <[email protected]>
---
drivers/pci/quirks.c | 38 ++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 2 ++
2 files changed, 40 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 085fb78..58bdd23 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3999,6 +3999,44 @@ static void quirk_tw686x_class(struct pci_dev *pdev)
quirk_tw686x_class);

/*
+ * Some devices have problems with Transaction Layer Packets with the Relaxed
+ * Ordering Attribute set. Such devices should mark themselves and other
+ * Device Drivers should check before sending TLPs with RO set.
+ */
+static void quirk_relaxedordering_disable(struct pci_dev *dev)
+{
+ dev->dev_flags |= PCI_DEV_FLAGS_NO_RELAXED_ORDERING;
+}
+
+/*
+ * Intel E5-26xx Root Complex has a Flow Control Credit issue which can
+ * cause performance problems with Upstream Transaction Layer Packets with
+ * Relaxed Ordering set.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f02, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f04, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, 0x6f08, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+
+/*
+ * The AMD ARM A1100 (AKA "SEATTLE") SoC has a bug in its PCIe Root Complex
+ * where Upstream Transaction Layer Packets with the Relaxed Ordering
+ * Attribute clear are allowed to bypass earlier TLPs with Relaxed Ordering
+ * set. This is a violation of the PCIe 3.0 Transaction Ordering Rules
+ * outlined in Section 2.4.1 (PCI Express(r) Base Specification Revision 3.0
+ * November 10, 2010). As a result, on this platform we can't use Relaxed
+ * Ordering for Upstream TLPs.
+ */
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a00, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a01, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_AMD, 0x1a02, PCI_CLASS_NOT_DEFINED, 8,
+ quirk_relaxedordering_disable);
+
+/*
* Per PCIe r3.0, sec 2.2.9, "Completion headers must supply the same
* values for the Attribute as were supplied in the header of the
* corresponding Request, except as explicitly allowed when IDO is used."
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 33c2b0b..e1e8428 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -183,6 +183,8 @@ enum pci_dev_flags {
PCI_DEV_FLAGS_BRIDGE_XLATE_ROOT = (__force pci_dev_flags_t) (1 << 9),
/* Do not use FLR even if device advertises PCI_AF_CAP */
PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
+ /* Don't use Relaxed Ordering for TLPs directed at this device */
+ PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
};

enum pci_irq_reroute_variant {
--
1.9.0


2017-06-07 09:17:21

by Ding Tianhong

[permalink] [raw]
Subject: [PATCH v3 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

From: Casey Leedom <[email protected]>

cxgb4 Ethernet driver now queries Root Complex Port to determine if it can
send TLPs to it with the Relaxed Ordering Attribute set.

Signed-off-by: Casey Leedom <[email protected]>
Signed-off-by: Ding Tianhong <[email protected]>
---
drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 +
drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++++++++
drivers/net/ethernet/chelsio/cxgb4/sge.c | 5 +++--
3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index e88c180..478f25a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -521,6 +521,7 @@ enum { /* adapter flags */
USING_SOFT_PARAMS = (1 << 6),
MASTER_PF = (1 << 7),
FW_OFLD_CONN = (1 << 9),
+ ROOT_NO_RELAXED_ORDERING = (1 << 10),
};

enum {
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 38a5c67..fbfe341 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4628,6 +4628,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
#ifdef CONFIG_PCI_IOV
u32 v, port_vec;
#endif
+ struct pci_dev *root;

printk_once(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);

@@ -4726,6 +4727,22 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
adapter->msg_enable = DFLT_MSG_ENABLE;
memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));

+ /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
+ * Ingress Packet Data to Free List Buffers in order to allow for
+ * chipset performance optimizations between the Root Complex and
+ * Memory Controllers. (Messages to the associated Ingress Queue
+ * notifying new Packet Placement in the Free Lists Buffers will be
+ * send without the Relaxed Ordering Attribute thus guaranteing that
+ * all preceding PCIe Transaction Layer Packets will be processed
+ * first.) But some Root Complexes have various issues with Upstream
+ * Transaction Layer Packets with the Relaxed Ordering Attribute set.
+ * So we check our Root Complex to see if it's flaged with advice
+ * against using Relaxed Ordering.
+ */
+ root = pci_find_pcie_root_port(adapter->pdev);
+ if (pcie_get_relaxed_ordering(root))
+ adapter->flags |= ROOT_NO_RELAXED_ORDERING;
+
spin_lock_init(&adapter->stats_lock);
spin_lock_init(&adapter->tid_release_lock);
spin_lock_init(&adapter->win0_lock);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index f05f0d4..ac229a3 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2571,6 +2571,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
struct fw_iq_cmd c;
struct sge *s = &adap->sge;
struct port_info *pi = netdev_priv(dev);
+ int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);

/* Size needs to be multiple of 16, including status entry. */
iq->size = roundup(iq->size, 16);
@@ -2624,8 +2625,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,

flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
- FW_IQ_CMD_FL0FETCHRO_F |
- FW_IQ_CMD_FL0DATARO_F |
+ FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
+ FW_IQ_CMD_FL0DATARO_V(relaxed) |
FW_IQ_CMD_FL0PADEN_F);
if (cong >= 0)
c.iqns_to_fl0congen |=
--
1.9.0


2017-06-07 17:56:50

by John Garry

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] PCI: Enable PCIe Relaxed Ordering if supported

On 07/06/2017 10:16, Ding Tianhong wrote:

Hi Ding,

A few general style comments:

> The PCIe Device Control Register use the bit 4 to indicate that
> whether the device is permitted to enable relaxed ordering or not.
> But relaxed ordering is not safe for some platform which could only
> use strong write ordering, so devices are allowed (but not required)
> to enable relaxed ordering bit by default.
>
> If a PCIe device didn't enable the relaxed ordering attribute default,
> we should not do anything in the PCIe configuration, otherwise we
> should check if any of the devices above us do not support relaxed
> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
> the result if we get a return that indicate that the relaxed ordering
> is not supported we should update our device to disable relaxed ordering
> in configuration space. If the device above us doesn't exist or isn't
> the PCIe device, we shouldn't do anything and skip updating relaxed ordering
> because we are probably running in a guest.

A guest machine/environment

>
> Signed-off-by: Ding Tianhong <[email protected]>
> ---
> drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++
> drivers/pci/probe.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 2 ++
> 3 files changed, 74 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b01bd5b..3d42b38 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4878,6 +4878,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
> EXPORT_SYMBOL(pcie_set_mps);
>
> /**
> + * pcie_clear_relaxed_ordering - clear PCI Express relexed ordering bit
> + * @dev: PCI device to query
> + *
> + * If possible clear relaxed ordering
> + */
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
> +{
> + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
> + PCI_EXP_DEVCTL_RELAX_EN);
> +}
> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
> +
> +/**
> + * pcie_get_relaxed_ordering - check PCI Express relexed ordering bit

s/relexed/relaxed/

Check what on relaxed ordering bit?

And the function name is inconsistent with this discription.

> + * @dev: PCI device to query
> + *
> + * Returns true if relaxed ordering is been set

If you want to return true/false, then use !!, below in the function

> + */
> +int pcie_get_relaxed_ordering(struct pci_dev *dev)
> +{
> + u16 v;
> +
> + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
> +
> + return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
> +}
> +EXPORT_SYMBOL(pcie_get_relaxed_ordering);
> +
> +/**
> * pcie_get_minimum_link - determine minimum link settings of a PCI device
> * @dev: PCI device to query
> * @speed: storage for minimum speed
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 19c8950..0c94c80 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1701,6 +1701,48 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
> PCI_EXP_DEVCTL_EXT_TAG);
> }
>
> +/**
> + * pci_dev_disable_relaxed_ordering - check if the PCI device
> + * should disable the relaxed ordering attribute.

I think that we need a more accurate description. I know some people
think a function which just "checks" is vague.

> + * @dev: PCI device
> + *
> + * Return true if any of the PCI devices above us do not support
> + * relaxed ordering.
> + */
> +static int pci_dev_disable_relaxed_ordering(struct pci_dev *dev)

The function name implies an action - disabling - but this function does
nothing except return a value

> +{
> + int ro_disabled = 0;
> +
> + while(dev) {

Did you run checkpatch?

> + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) {
> + ro_disabled = 1;

just return true, and return false at the bottom, so you can do away
with ro_disabled (which is not a bool)

> + break;
> + }
> + dev = dev->bus->self;
> + }
> +
> + return ro_disabled;
> +}
> +
> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
> +{
> + struct pci_dev *bridge = pci_upstream_bridge(dev);
> + int origin_ero;

You don't need this variable

> +
> + if (!pci_is_pcie(dev) || !bridge || !pci_is_pcie(bridge))
> + return;
> +
> + origin_ero = pcie_get_relaxed_ordering(dev);
> + /* If the releaxed ordering enable bit is not set, do nothing. */
> + if (!origin_ero)
> + return;
> +
> + if (pci_dev_disable_relaxed_ordering(dev)) {
> + pcie_clear_relaxed_ordering(dev);
> + dev_info(&dev->dev, "Disable Relaxed Ordering\n");
> + }
> +}
> +
> static void pci_configure_device(struct pci_dev *dev)
> {
> struct hotplug_params hpp;
> @@ -1708,6 +1750,7 @@ static void pci_configure_device(struct pci_dev *dev)
>
> pci_configure_mps(dev);
> pci_configure_extended_tags(dev);
> + pci_configure_relaxed_ordering(dev);
>
> memset(&hpp, 0, sizeof(hpp));
> ret = pci_get_hp_params(dev, &hpp);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index e1e8428..299d2f3 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1105,6 +1105,8 @@ int __pci_enable_wake(struct pci_dev *dev, pci_power_t state,
> void pci_pme_wakeup_bus(struct pci_bus *bus);
> void pci_d3cold_enable(struct pci_dev *dev);
> void pci_d3cold_disable(struct pci_dev *dev);
> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
> +int pcie_get_relaxed_ordering(struct pci_dev *dev);
>
> static inline int pci_enable_wake(struct pci_dev *dev, pci_power_t state,
> bool enable)
>


2017-06-07 23:24:46

by Alexander Duyck

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag

On Wed, Jun 7, 2017 at 2:16 AM, Ding Tianhong <[email protected]> wrote:
> From: Casey Leedom <[email protected]>
>
> cxgb4 Ethernet driver now queries Root Complex Port to determine if it can
> send TLPs to it with the Relaxed Ordering Attribute set.
>
> Signed-off-by: Casey Leedom <[email protected]>
> Signed-off-by: Ding Tianhong <[email protected]>

So I am pretty sure this patch doesn't work with patch 2. We need to
update it so that it doesn't check the root complex but instead checks
itself to see if it is allowed to use relaxed ordering.

What we need here is the ability to detect if relaxed ordering is
disabled, and if so take the steps needed to enable peer to peer
relaxed ordering without enabling relaxed ordering to the root
complex. Do I have that right Casey?

> ---
> drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 +
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++++++++
> drivers/net/ethernet/chelsio/cxgb4/sge.c | 5 +++--
> 3 files changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> index e88c180..478f25a 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
> @@ -521,6 +521,7 @@ enum { /* adapter flags */
> USING_SOFT_PARAMS = (1 << 6),
> MASTER_PF = (1 << 7),
> FW_OFLD_CONN = (1 << 9),
> + ROOT_NO_RELAXED_ORDERING = (1 << 10),
> };
>
> enum {
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> index 38a5c67..fbfe341 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
> @@ -4628,6 +4628,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> #ifdef CONFIG_PCI_IOV
> u32 v, port_vec;
> #endif
> + struct pci_dev *root;
>
> printk_once(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);
>
> @@ -4726,6 +4727,22 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> adapter->msg_enable = DFLT_MSG_ENABLE;
> memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
>
> + /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
> + * Ingress Packet Data to Free List Buffers in order to allow for
> + * chipset performance optimizations between the Root Complex and
> + * Memory Controllers. (Messages to the associated Ingress Queue
> + * notifying new Packet Placement in the Free Lists Buffers will be
> + * send without the Relaxed Ordering Attribute thus guaranteing that
> + * all preceding PCIe Transaction Layer Packets will be processed
> + * first.) But some Root Complexes have various issues with Upstream
> + * Transaction Layer Packets with the Relaxed Ordering Attribute set.
> + * So we check our Root Complex to see if it's flaged with advice
> + * against using Relaxed Ordering.
> + */
> + root = pci_find_pcie_root_port(adapter->pdev);
> + if (pcie_get_relaxed_ordering(root))
> + adapter->flags |= ROOT_NO_RELAXED_ORDERING;
> +
> spin_lock_init(&adapter->stats_lock);
> spin_lock_init(&adapter->tid_release_lock);
> spin_lock_init(&adapter->win0_lock);
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
> index f05f0d4..ac229a3 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
> @@ -2571,6 +2571,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
> struct fw_iq_cmd c;
> struct sge *s = &adap->sge;
> struct port_info *pi = netdev_priv(dev);
> + int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
>
> /* Size needs to be multiple of 16, including status entry. */
> iq->size = roundup(iq->size, 16);
> @@ -2624,8 +2625,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
>
> flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
> c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
> - FW_IQ_CMD_FL0FETCHRO_F |
> - FW_IQ_CMD_FL0DATARO_F |
> + FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
> + FW_IQ_CMD_FL0DATARO_V(relaxed) |
> FW_IQ_CMD_FL0PADEN_F);
> if (cong >= 0)
> c.iqns_to_fl0congen |=
> --
> 1.9.0
>
>

2017-06-09 09:14:50

by Ding Tianhong

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] PCI: Enable PCIe Relaxed Ordering if supported


Hi John:

Thanks for the reviewing, I will fix it in next version.

Ding

On 2017/6/8 1:55, John Garry wrote:
> On 07/06/2017 10:16, Ding Tianhong wrote:
>
> Hi Ding,
>
> A few general style comments:
>
>> The PCIe Device Control Register use the bit 4 to indicate that
>> whether the device is permitted to enable relaxed ordering or not.
>> But relaxed ordering is not safe for some platform which could only
>> use strong write ordering, so devices are allowed (but not required)
>> to enable relaxed ordering bit by default.
>>
>> If a PCIe device didn't enable the relaxed ordering attribute default,
>> we should not do anything in the PCIe configuration, otherwise we
>> should check if any of the devices above us do not support relaxed
>> ordering by the PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag, then base on
>> the result if we get a return that indicate that the relaxed ordering
>> is not supported we should update our device to disable relaxed ordering
>> in configuration space. If the device above us doesn't exist or isn't
>> the PCIe device, we shouldn't do anything and skip updating relaxed ordering
>> because we are probably running in a guest.
>
> A guest machine/environment
>
>>
>> Signed-off-by: Ding Tianhong <[email protected]>
>> ---
>> drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++
>> drivers/pci/probe.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>> include/linux/pci.h | 2 ++
>> 3 files changed, 74 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index b01bd5b..3d42b38 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -4878,6 +4878,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
>> EXPORT_SYMBOL(pcie_set_mps);
>>
>> /**
>> + * pcie_clear_relaxed_ordering - clear PCI Express relexed ordering bit
>> + * @dev: PCI device to query
>> + *
>> + * If possible clear relaxed ordering
>> + */
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev)
>> +{
>> + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL,
>> + PCI_EXP_DEVCTL_RELAX_EN);
>> +}
>> +EXPORT_SYMBOL(pcie_clear_relaxed_ordering);
>> +
>> +/**
>> + * pcie_get_relaxed_ordering - check PCI Express relexed ordering bit
>
> s/relexed/relaxed/
>
> Check what on relaxed ordering bit?
>
> And the function name is inconsistent with this discription.
>
>> + * @dev: PCI device to query
>> + *
>> + * Returns true if relaxed ordering is been set
>
> If you want to return true/false, then use !!, below in the function
>
>> + */
>> +int pcie_get_relaxed_ordering(struct pci_dev *dev)
>> +{
>> + u16 v;
>> +
>> + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v);
>> +
>> + return (v & PCI_EXP_DEVCTL_RELAX_EN) >> 4;
>> +}
>> +EXPORT_SYMBOL(pcie_get_relaxed_ordering);
>> +
>> +/**
>> * pcie_get_minimum_link - determine minimum link settings of a PCI device
>> * @dev: PCI device to query
>> * @speed: storage for minimum speed
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 19c8950..0c94c80 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -1701,6 +1701,48 @@ static void pci_configure_extended_tags(struct pci_dev *dev)
>> PCI_EXP_DEVCTL_EXT_TAG);
>> }
>>
>> +/**
>> + * pci_dev_disable_relaxed_ordering - check if the PCI device
>> + * should disable the relaxed ordering attribute.
>
> I think that we need a more accurate description. I know some people think a function which just "checks" is vague.
>
>> + * @dev: PCI device
>> + *
>> + * Return true if any of the PCI devices above us do not support
>> + * relaxed ordering.
>> + */
>> +static int pci_dev_disable_relaxed_ordering(struct pci_dev *dev)
>
> The function name implies an action - disabling - but this function does nothing except return a value
>
>> +{
>> + int ro_disabled = 0;
>> +
>> + while(dev) {
>
> Did you run checkpatch?
>
>> + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) {
>> + ro_disabled = 1;
>
> just return true, and return false at the bottom, so you can do away with ro_disabled (which is not a bool)
>
>> + break;
>> + }
>> + dev = dev->bus->self;
>> + }
>> +
>> + return ro_disabled;
>> +}
>> +
>> +static void pci_configure_relaxed_ordering(struct pci_dev *dev)
>> +{
>> + struct pci_dev *bridge = pci_upstream_bridge(dev);
>> + int origin_ero;
>
> You don't need this variable
>
>> +
>> + if (!pci_is_pcie(dev) || !bridge || !pci_is_pcie(bridge))
>> + return;
>> +
>> + origin_ero = pcie_get_relaxed_ordering(dev);
>> + /* If the releaxed ordering enable bit is not set, do nothing. */
>> + if (!origin_ero)
>> + return;
>> +
>> + if (pci_dev_disable_relaxed_ordering(dev)) {
>> + pcie_clear_relaxed_ordering(dev);
>> + dev_info(&dev->dev, "Disable Relaxed Ordering\n");
>> + }
>> +}
>> +
>> static void pci_configure_device(struct pci_dev *dev)
>> {
>> struct hotplug_params hpp;
>> @@ -1708,6 +1750,7 @@ static void pci_configure_device(struct pci_dev *dev)
>>
>> pci_configure_mps(dev);
>> pci_configure_extended_tags(dev);
>> + pci_configure_relaxed_ordering(dev);
>>
>> memset(&hpp, 0, sizeof(hpp));
>> ret = pci_get_hp_params(dev, &hpp);
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index e1e8428..299d2f3 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -1105,6 +1105,8 @@ int __pci_enable_wake(struct pci_dev *dev, pci_power_t state,
>> void pci_pme_wakeup_bus(struct pci_bus *bus);
>> void pci_d3cold_enable(struct pci_dev *dev);
>> void pci_d3cold_disable(struct pci_dev *dev);
>> +int pcie_clear_relaxed_ordering(struct pci_dev *dev);
>> +int pcie_get_relaxed_ordering(struct pci_dev *dev);
>>
>> static inline int pci_enable_wake(struct pci_dev *dev, pci_power_t state,
>> bool enable)
>>
>
>
>
> .
>

2017-06-12 06:55:56

by Ding Tianhong

[permalink] [raw]
Subject: Re: [PATCH v3 3/3] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag



On 2017/6/8 7:24, Alexander Duyck wrote:
> On Wed, Jun 7, 2017 at 2:16 AM, Ding Tianhong <[email protected]> wrote:
>> From: Casey Leedom <[email protected]>
>>
>> cxgb4 Ethernet driver now queries Root Complex Port to determine if it can
>> send TLPs to it with the Relaxed Ordering Attribute set.
>>
>> Signed-off-by: Casey Leedom <[email protected]>
>> Signed-off-by: Ding Tianhong <[email protected]>
>
> So I am pretty sure this patch doesn't work with patch 2. We need to
> update it so that it doesn't check the root complex but instead checks
> itself to see if it is allowed to use relaxed ordering.
>

Right, we should check the End Point PCIe device configuration space, not RC.

> What we need here is the ability to detect if relaxed ordering is
> disabled, and if so take the steps needed to enable peer to peer
> relaxed ordering without enabling relaxed ordering to the root
> complex. Do I have that right Casey?
>

I am not very clear to this driver about how to enable peer to peer
relaxed ordering without enabling relaxed ordering to the RC, need
some help from Casey, so I will still focus on this patch and only
fix the peer to RC relaxed ordering problem, I hope Casey could send
another patch to fix it later.

Thanks
Ding

>> ---
>> drivers/net/ethernet/chelsio/cxgb4/cxgb4.h | 1 +
>> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 17 +++++++++++++++++
>> drivers/net/ethernet/chelsio/cxgb4/sge.c | 5 +++--
>> 3 files changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
>> index e88c180..478f25a 100644
>> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
>> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
>> @@ -521,6 +521,7 @@ enum { /* adapter flags */
>> USING_SOFT_PARAMS = (1 << 6),
>> MASTER_PF = (1 << 7),
>> FW_OFLD_CONN = (1 << 9),
>> + ROOT_NO_RELAXED_ORDERING = (1 << 10),
>> };
>>
>> enum {
>> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>> index 38a5c67..fbfe341 100644
>> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
>> @@ -4628,6 +4628,7 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>> #ifdef CONFIG_PCI_IOV
>> u32 v, port_vec;
>> #endif
>> + struct pci_dev *root;
>>
>> printk_once(KERN_INFO "%s - version %s\n", DRV_DESC, DRV_VERSION);
>>
>> @@ -4726,6 +4727,22 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>> adapter->msg_enable = DFLT_MSG_ENABLE;
>> memset(adapter->chan_map, 0xff, sizeof(adapter->chan_map));
>>
>> + /* If possible, we use PCIe Relaxed Ordering Attribute to deliver
>> + * Ingress Packet Data to Free List Buffers in order to allow for
>> + * chipset performance optimizations between the Root Complex and
>> + * Memory Controllers. (Messages to the associated Ingress Queue
>> + * notifying new Packet Placement in the Free Lists Buffers will be
>> + * send without the Relaxed Ordering Attribute thus guaranteing that
>> + * all preceding PCIe Transaction Layer Packets will be processed
>> + * first.) But some Root Complexes have various issues with Upstream
>> + * Transaction Layer Packets with the Relaxed Ordering Attribute set.
>> + * So we check our Root Complex to see if it's flaged with advice
>> + * against using Relaxed Ordering.
>> + */
>> + root = pci_find_pcie_root_port(adapter->pdev);
>> + if (pcie_get_relaxed_ordering(root))
>> + adapter->flags |= ROOT_NO_RELAXED_ORDERING;
>> +
>> spin_lock_init(&adapter->stats_lock);
>> spin_lock_init(&adapter->tid_release_lock);
>> spin_lock_init(&adapter->win0_lock);
>> diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
>> index f05f0d4..ac229a3 100644
>> --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
>> +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
>> @@ -2571,6 +2571,7 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
>> struct fw_iq_cmd c;
>> struct sge *s = &adap->sge;
>> struct port_info *pi = netdev_priv(dev);
>> + int relaxed = !(adap->flags & ROOT_NO_RELAXED_ORDERING);
>>
>> /* Size needs to be multiple of 16, including status entry. */
>> iq->size = roundup(iq->size, 16);
>> @@ -2624,8 +2625,8 @@ int t4_sge_alloc_rxq(struct adapter *adap, struct sge_rspq *iq, bool fwevtq,
>>
>> flsz = fl->size / 8 + s->stat_len / sizeof(struct tx_desc);
>> c.iqns_to_fl0congen |= htonl(FW_IQ_CMD_FL0PACKEN_F |
>> - FW_IQ_CMD_FL0FETCHRO_F |
>> - FW_IQ_CMD_FL0DATARO_F |
>> + FW_IQ_CMD_FL0FETCHRO_V(relaxed) |
>> + FW_IQ_CMD_FL0DATARO_V(relaxed) |
>> FW_IQ_CMD_FL0PADEN_F);
>> if (cong >= 0)
>> c.iqns_to_fl0congen |=
>> --
>> 1.9.0
>>
>>
>
> .
>