2015-07-29 22:19:36

by Keith Busch

[permalink] [raw]
Subject: [PATCH 0/3] PCI-e Max Payload Size configuration

This patch series removes arch, driver, and hot-plug specific calls to
configure a PCI-e device's MPS setting. The device is instead configured
in the generic pci-core layer when the device is initially added, and
before it is bound to a driver.

The default policy is also changed from "do nothing" to update the
down stream port to match the upstream port if it is capable.

Dave Jiang (2):
QIB: Removing usage of pcie_set_mps()
PCIE: Remove symbol export for pcie_set_mps()

Keith Busch (1):
pci: Default MPS tuning to match upstream port

arch/arm/kernel/bios32.c | 12 ------------
arch/powerpc/kernel/pci-common.c | 7 -------
arch/tile/kernel/pci_gx.c | 4 ----
arch/x86/pci/acpi.c | 9 ---------
drivers/infiniband/hw/qib/qib_pcie.c | 27 +--------------------------
drivers/pci/bus.c | 4 ++++
drivers/pci/hotplug/acpiphp_glue.c | 1 -
drivers/pci/hotplug/pciehp_pci.c | 1 -
drivers/pci/hotplug/shpchp_pci.c | 1 -
drivers/pci/pci.c | 1 -
drivers/pci/probe.c | 22 ++++++++++++++++------
include/linux/pci.h | 2 --
12 files changed, 21 insertions(+), 70 deletions(-)

--
1.7.10.4


2015-07-29 22:20:45

by Keith Busch

[permalink] [raw]
Subject: [PATCH] pci: Default MPS tuning, match upstream port

A hot plugged PCI-e device max payload size (MPS) defaults to 0 for
128bytes. The device is not usable if the upstream port is configured
to a higher setting.

Bus configuration was previously done by arch specific and hot plug code
after the root port or bridge was scanned, and default behavior logged a
warning without changing the setting if there was a problem. This patch
unifies tuning with a default policy that affects only misconfigured
downstream devices, and preserves existing end result if a non-default
bus tuning is used (ex: pci=pcie_bus_safe).

The new pcie tuning will check the device's MPS against the parent bridge
when it is initially added to the pci subsystem, prior to attaching
to a driver. If MPS is mismatched, the downstream port is set to the
parent bridge's if capable. This is safe to change here since the device
being configured is not bound to a driver and does not affect previously
configured devices in the domain hierarchy.

A device incapable of matching the upstream bridge will log a
warning message and not bind to a driver. This is the safe option since
proper MPS configuration must consider the entire hierarchy between
communicating end points, and we can't safely modify a root port's
subtree MPS settings while it is in use.

Since the bus is configured everytime a bridge is scanned, this
potentially creates unnecessary re-walking of an already configured
sub-tree, but the code only executes during root port initialization,
hot adding a switch, or explicit request to rescan.

Signed-off-by: Keith Busch <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Austin Bolen <[email protected]>
Cc: Myron Stowe <[email protected]>
Cc: Jon Mason <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
---
arch/arm/kernel/bios32.c | 12 ------------
arch/powerpc/kernel/pci-common.c | 7 -------
arch/tile/kernel/pci_gx.c | 4 ----
arch/x86/pci/acpi.c | 9 ---------
drivers/pci/bus.c | 4 ++++
drivers/pci/hotplug/acpiphp_glue.c | 1 -
drivers/pci/hotplug/pciehp_pci.c | 1 -
drivers/pci/hotplug/shpchp_pci.c | 1 -
drivers/pci/probe.c | 27 ++++++++++++++++++++++-----
include/linux/pci.h | 2 --
10 files changed, 26 insertions(+), 42 deletions(-)

diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
index fcbbbb1..4fff58e 100644
--- a/arch/arm/kernel/bios32.c
+++ b/arch/arm/kernel/bios32.c
@@ -537,18 +537,6 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
*/
pci_bus_add_devices(bus);
}
-
- list_for_each_entry(sys, &head, node) {
- struct pci_bus *bus = sys->bus;
-
- /* Configure PCI Express settings */
- if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
- struct pci_bus *child;
-
- list_for_each_entry(child, &bus->children, node)
- pcie_bus_configure_settings(child);
- }
- }
}

#ifndef CONFIG_PCI_HOST_ITE8152
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index b9de34d..7f27ffe 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1682,13 +1682,6 @@ void pcibios_scan_phb(struct pci_controller *hose)
*/
if (ppc_md.pcibios_fixup_phb)
ppc_md.pcibios_fixup_phb(hose);
-
- /* Configure PCI Express settings */
- if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
- struct pci_bus *child;
- list_for_each_entry(child, &bus->children, node)
- pcie_bus_configure_settings(child);
- }
}
EXPORT_SYMBOL_GPL(pcibios_scan_phb);

diff --git a/arch/tile/kernel/pci_gx.c b/arch/tile/kernel/pci_gx.c
index b1df847..67492fb 100644
--- a/arch/tile/kernel/pci_gx.c
+++ b/arch/tile/kernel/pci_gx.c
@@ -599,10 +599,6 @@ static void fixup_read_and_payload_sizes(struct pci_controller *controller)
__gxio_mmio_write32(trio_context->mmio_base_mac + reg_offset,
rc_dev_cap.word);

- /* Configure PCI Express MPS setting. */
- list_for_each_entry(child, &root_bus->children, node)
- pcie_bus_configure_settings(child);
-
/*
* Set the mac_config register in trio based on the MPS/MRS of the link.
*/
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index ff99117..ab5977a 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -478,15 +478,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
}
}

- /* After the PCI-E bus has been walked and all devices discovered,
- * configure any settings of the fabric that might be necessary.
- */
- if (bus) {
- struct pci_bus *child;
- list_for_each_entry(child, &bus->children, node)
- pcie_bus_configure_settings(child);
- }
-
if (bus && node != NUMA_NO_NODE)
dev_printk(KERN_DEBUG, &bus->dev, "on NUMA node %d\n", node);

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 6fbd3f2..8f8428a 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -277,6 +277,10 @@ void pci_bus_add_device(struct pci_dev *dev)
{
int retval;

+ if (dev->bus->self &&
+ pcie_get_mps(dev) != pcie_get_mps(dev->bus->self))
+ return;
+
/*
* Can not put in pci_device_add yet because resources
* are not assigned yet for some devices.
diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
index ff53856..cd98649 100644
--- a/drivers/pci/hotplug/acpiphp_glue.c
+++ b/drivers/pci/hotplug/acpiphp_glue.c
@@ -509,7 +509,6 @@ static void enable_slot(struct acpiphp_slot *slot)
__pci_bus_assign_resources(bus, &add_list, NULL);

acpiphp_sanitize_bus(bus);
- pcie_bus_configure_settings(bus);
acpiphp_set_acpi_region(slot);

list_for_each_entry(dev, &bus->devices, bus_list) {
diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index 9e69403..93bc7f6 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -65,7 +65,6 @@ int pciehp_configure_device(struct slot *p_slot)
pci_hp_add_bridge(dev);

pci_assign_unassigned_bridge_resources(bridge);
- pcie_bus_configure_settings(parent);
pci_bus_add_devices(parent);

out:
diff --git a/drivers/pci/hotplug/shpchp_pci.c b/drivers/pci/hotplug/shpchp_pci.c
index f8cd3a2..ca3dc3c 100644
--- a/drivers/pci/hotplug/shpchp_pci.c
+++ b/drivers/pci/hotplug/shpchp_pci.c
@@ -69,7 +69,6 @@ int shpchp_configure_device(struct slot *p_slot)
}

pci_assign_unassigned_bridge_resources(bridge);
- pcie_bus_configure_settings(parent);
pci_bus_add_devices(parent);

out:
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index cefd636..b469298 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -37,6 +37,9 @@ struct pci_domain_busn_res {
int domain_nr;
};

+static void pcie_bus_detect_mps(struct pci_dev *dev);
+static void pcie_bus_configure_settings(struct pci_bus *bus);
+
static struct resource *get_pci_domain_busn_res(int domain_nr)
{
struct pci_domain_busn_res *r;
@@ -929,6 +932,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
(is_cardbus ? "PCI CardBus %04x:%02x" : "PCI Bus %04x:%02x"),
pci_domain_nr(bus), child->number);

+ pcie_bus_configure_settings(bus);
+
/* Has only triggered on CardBus, fixup is in yenta_socket */
while (bus->parent) {
if ((child->busn_res.end > bus->busn_res.end) ||
@@ -1589,6 +1594,9 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
dev->match_driver = false;
ret = device_add(&dev->dev);
WARN_ON(ret < 0);
+
+ if (pci_is_pcie(dev))
+ pcie_bus_detect_mps(dev);
}

struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
@@ -1802,9 +1810,17 @@ static void pcie_bus_detect_mps(struct pci_dev *dev)
mps = pcie_get_mps(dev);
p_mps = pcie_get_mps(bridge);

- if (mps != p_mps)
- dev_warn(&dev->dev, "Max Payload Size %d, but upstream %s set to %d; if necessary, use \"pci=pcie_bus_safe\" and report a bug\n",
- mps, pci_name(bridge), p_mps);
+ if (mps != p_mps) {
+ if (128 << dev->pcie_mpss < p_mps) {
+ dev_warn(&dev->dev,
+ "Max Payload Size Supported %d, but upstream %s set to %d. If problems are experienced, try running with pci=pcie_bus_safe\n",
+ 128 << dev->pcie_mpss, pci_name(bridge), p_mps);
+ return;
+ }
+ pcie_write_mps(dev, p_mps);
+ dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
+ pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
+ }
}

static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
@@ -1836,7 +1852,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
* parents then children fashion. If this changes, then this code will not
* work as designed.
*/
-void pcie_bus_configure_settings(struct pci_bus *bus)
+static void pcie_bus_configure_settings(struct pci_bus *bus)
{
u8 smpss = 0;

@@ -1863,7 +1879,6 @@ void pcie_bus_configure_settings(struct pci_bus *bus)
pcie_bus_configure_set(bus->self, &smpss);
pci_walk_bus(bus, pcie_bus_configure_set, &smpss);
}
-EXPORT_SYMBOL_GPL(pcie_bus_configure_settings);

unsigned int pci_scan_child_bus(struct pci_bus *bus)
{
@@ -1895,6 +1910,8 @@ unsigned int pci_scan_child_bus(struct pci_bus *bus)
max = pci_scan_bridge(bus, dev, max, pass);
}

+ pcie_bus_configure_settings(bus);
+
/*
* We've scanned the bus and so we know all about what's on
* the other side of any bridges that may be on this bus plus
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8a0321a..e1df5f9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -735,8 +735,6 @@ struct pci_driver {
/* these external functions are only available when PCI support is enabled */
#ifdef CONFIG_PCI

-void pcie_bus_configure_settings(struct pci_bus *bus);
-
enum pcie_bus_config_types {
PCIE_BUS_TUNE_OFF,
PCIE_BUS_SAFE,
--
1.7.10.4

2015-07-29 22:19:38

by Keith Busch

[permalink] [raw]
Subject: [PATCH 2/3] QIB: Removing usage of pcie_set_mps()

From: Dave Jiang <[email protected]>

This is in perperation of un-exporting the pcie_set_mps() function
symbol. A driver should not be changing the MPS as that is the
responsibility of the PCI subsystem.

Signed-off-by: Dave Jiang <[email protected]>
---
drivers/infiniband/hw/qib/qib_pcie.c | 27 +--------------------------
1 file changed, 1 insertion(+), 26 deletions(-)

diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
index 4758a38..b8a2dcd 100644
--- a/drivers/infiniband/hw/qib/qib_pcie.c
+++ b/drivers/infiniband/hw/qib/qib_pcie.c
@@ -557,12 +557,11 @@ static void qib_tune_pcie_coalesce(struct qib_devdata *dd)
*/
static int qib_pcie_caps;
module_param_named(pcie_caps, qib_pcie_caps, int, S_IRUGO);
-MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: Payload (0..3), ReadReq (4..7)");
+MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: ReadReq (4..7)");

static void qib_tune_pcie_caps(struct qib_devdata *dd)
{
struct pci_dev *parent;
- u16 rc_mpss, rc_mps, ep_mpss, ep_mps;
u16 rc_mrrs, ep_mrrs, max_mrrs;

/* Find out supported and configured values for parent (root) */
@@ -575,30 +574,6 @@ static void qib_tune_pcie_caps(struct qib_devdata *dd)
if (!pci_is_pcie(parent) || !pci_is_pcie(dd->pcidev))
return;

- rc_mpss = parent->pcie_mpss;
- rc_mps = ffs(pcie_get_mps(parent)) - 8;
- /* Find out supported and configured values for endpoint (us) */
- ep_mpss = dd->pcidev->pcie_mpss;
- ep_mps = ffs(pcie_get_mps(dd->pcidev)) - 8;
-
- /* Find max payload supported by root, endpoint */
- if (rc_mpss > ep_mpss)
- rc_mpss = ep_mpss;
-
- /* If Supported greater than limit in module param, limit it */
- if (rc_mpss > (qib_pcie_caps & 7))
- rc_mpss = qib_pcie_caps & 7;
- /* If less than (allowed, supported), bump root payload */
- if (rc_mpss > rc_mps) {
- rc_mps = rc_mpss;
- pcie_set_mps(parent, 128 << rc_mps);
- }
- /* If less than (allowed, supported), bump endpoint payload */
- if (rc_mpss > ep_mps) {
- ep_mps = rc_mpss;
- pcie_set_mps(dd->pcidev, 128 << ep_mps);
- }
-
/*
* Now the Read Request size.
* No field for max supported, but PCIe spec limits it to 4096,
--
1.7.10.4

2015-07-29 22:19:37

by Keith Busch

[permalink] [raw]
Subject: [PATCH 3/3] PCIE: Remove symbol export for pcie_set_mps()

From: Dave Jiang <[email protected]>

The setting of PCIe MPS should be left to the PCI subsystem and not
the driver. An ill configured MPS by the driver could cause the device
to not function or unstablize the entire system. Removing the exported
symbol.

Signed-off-by: Dave Jiang <[email protected]>
---
drivers/pci/pci.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 0008c95..92349ee 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4121,7 +4121,6 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
return pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
PCI_EXP_DEVCTL_PAYLOAD, v);
}
-EXPORT_SYMBOL(pcie_set_mps);

/**
* pcie_get_minimum_link - determine minimum link settings of a PCI device
--
1.7.10.4

2015-08-17 22:28:53

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH] pci: Default MPS tuning, match upstream port

On Wed, Jul 29, 2015 at 04:18:53PM -0600, Keith Busch wrote:
> A hot plugged PCI-e device max payload size (MPS) defaults to 0 for
> 128bytes. The device is not usable if the upstream port is configured
> to a higher setting.
>
> Bus configuration was previously done by arch specific and hot plug code
> after the root port or bridge was scanned, and default behavior logged a
> warning without changing the setting if there was a problem. This patch
> unifies tuning with a default policy that affects only misconfigured
> downstream devices, and preserves existing end result if a non-default
> bus tuning is used (ex: pci=pcie_bus_safe).
>
> The new pcie tuning will check the device's MPS against the parent bridge
> when it is initially added to the pci subsystem, prior to attaching
> to a driver. If MPS is mismatched, the downstream port is set to the
> parent bridge's if capable.

This is a little confusing (at least, *I'm* confused). You're talking
about setting the MPS configuration of the "downstream port," but I
think you are talking about either a Switch Upstream Port or an
Endpoint. Such a port is indeed *downstream* from the parent bridge,
but referring to it as a "downstream port" risks confusing it with the
parent bridge, which is either a Switch Downstream Port or a Root
Port.

> This is safe to change here since the device
> being configured is not bound to a driver and does not affect previously
> configured devices in the domain hierarchy.
>
> A device incapable of matching the upstream bridge will log a
> warning message and not bind to a driver. This is the safe option since
> proper MPS configuration must consider the entire hierarchy between
> communicating end points, and we can't safely modify a root port's
> subtree MPS settings while it is in use.
>
> Since the bus is configured everytime a bridge is scanned, this
> potentially creates unnecessary re-walking of an already configured
> sub-tree, but the code only executes during root port initialization,
> hot adding a switch, or explicit request to rescan.
>
> Signed-off-by: Keith Busch <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Cc: Austin Bolen <[email protected]>
> Cc: Myron Stowe <[email protected]>
> Cc: Jon Mason <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> ---
> arch/arm/kernel/bios32.c | 12 ------------
> arch/powerpc/kernel/pci-common.c | 7 -------
> arch/tile/kernel/pci_gx.c | 4 ----
> arch/x86/pci/acpi.c | 9 ---------
> drivers/pci/bus.c | 4 ++++
> drivers/pci/hotplug/acpiphp_glue.c | 1 -
> drivers/pci/hotplug/pciehp_pci.c | 1 -
> drivers/pci/hotplug/shpchp_pci.c | 1 -
> drivers/pci/probe.c | 27 ++++++++++++++++++++++-----
> include/linux/pci.h | 2 --
> 10 files changed, 26 insertions(+), 42 deletions(-)
>
> diff --git a/arch/arm/kernel/bios32.c b/arch/arm/kernel/bios32.c
> index fcbbbb1..4fff58e 100644
> --- a/arch/arm/kernel/bios32.c
> +++ b/arch/arm/kernel/bios32.c
> @@ -537,18 +537,6 @@ void pci_common_init_dev(struct device *parent, struct hw_pci *hw)
> */
> pci_bus_add_devices(bus);
> }
> -
> - list_for_each_entry(sys, &head, node) {
> - struct pci_bus *bus = sys->bus;
> -
> - /* Configure PCI Express settings */
> - if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
> - struct pci_bus *child;
> -
> - list_for_each_entry(child, &bus->children, node)
> - pcie_bus_configure_settings(child);
> - }
> - }

I think nothing terrible happens if we call
pcie_bus_configure_settings() more than once (we might get some
duplicate messages, but I don't consider that terrible). If that's
true, maybe we could split the removal of these calls into a separate
patch. It doesn't need to be a separate patch for every arch, but I
think getting the "redundant code removal" out of this patch will make
the interesting changes more obvious.

> }
>
> #ifndef CONFIG_PCI_HOST_ITE8152
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index b9de34d..7f27ffe 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -1682,13 +1682,6 @@ void pcibios_scan_phb(struct pci_controller *hose)
> */
> if (ppc_md.pcibios_fixup_phb)
> ppc_md.pcibios_fixup_phb(hose);
> -
> - /* Configure PCI Express settings */
> - if (bus && !pci_has_flag(PCI_PROBE_ONLY)) {
> - struct pci_bus *child;
> - list_for_each_entry(child, &bus->children, node)
> - pcie_bus_configure_settings(child);
> - }
> }
> EXPORT_SYMBOL_GPL(pcibios_scan_phb);
>
> diff --git a/arch/tile/kernel/pci_gx.c b/arch/tile/kernel/pci_gx.c
> index b1df847..67492fb 100644
> --- a/arch/tile/kernel/pci_gx.c
> +++ b/arch/tile/kernel/pci_gx.c
> @@ -599,10 +599,6 @@ static void fixup_read_and_payload_sizes(struct pci_controller *controller)
> __gxio_mmio_write32(trio_context->mmio_base_mac + reg_offset,
> rc_dev_cap.word);
>
> - /* Configure PCI Express MPS setting. */
> - list_for_each_entry(child, &root_bus->children, node)
> - pcie_bus_configure_settings(child);
> -
> /*
> * Set the mac_config register in trio based on the MPS/MRS of the link.
> */
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index ff99117..ab5977a 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -478,15 +478,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
> }
> }
>
> - /* After the PCI-E bus has been walked and all devices discovered,
> - * configure any settings of the fabric that might be necessary.
> - */
> - if (bus) {
> - struct pci_bus *child;
> - list_for_each_entry(child, &bus->children, node)
> - pcie_bus_configure_settings(child);
> - }
> -
> if (bus && node != NUMA_NO_NODE)
> dev_printk(KERN_DEBUG, &bus->dev, "on NUMA node %d\n", node);
>
> diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
> index 6fbd3f2..8f8428a 100644
> --- a/drivers/pci/bus.c
> +++ b/drivers/pci/bus.c
> @@ -277,6 +277,10 @@ void pci_bus_add_device(struct pci_dev *dev)
> {
> int retval;
>
> + if (dev->bus->self &&
> + pcie_get_mps(dev) != pcie_get_mps(dev->bus->self))
> + return;

This part looks like it could be in its own separate patch that
basically enforces the "if we can't safely configure this device's MPS
setting, prevent drivers from using the device" idea. What happens in
this case? Does the device still appear in sysfs and lspci, even if
we can't configure its MPS safely? This seems like good place for a
dev_warn().

> /*
> * Can not put in pci_device_add yet because resources
> * are not assigned yet for some devices.
> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c
> index ff53856..cd98649 100644
> --- a/drivers/pci/hotplug/acpiphp_glue.c
> +++ b/drivers/pci/hotplug/acpiphp_glue.c
> @@ -509,7 +509,6 @@ static void enable_slot(struct acpiphp_slot *slot)
> __pci_bus_assign_resources(bus, &add_list, NULL);
>
> acpiphp_sanitize_bus(bus);
> - pcie_bus_configure_settings(bus);
> acpiphp_set_acpi_region(slot);
>
> list_for_each_entry(dev, &bus->devices, bus_list) {
> diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
> index 9e69403..93bc7f6 100644
> --- a/drivers/pci/hotplug/pciehp_pci.c
> +++ b/drivers/pci/hotplug/pciehp_pci.c
> @@ -65,7 +65,6 @@ int pciehp_configure_device(struct slot *p_slot)
> pci_hp_add_bridge(dev);
>
> pci_assign_unassigned_bridge_resources(bridge);
> - pcie_bus_configure_settings(parent);
> pci_bus_add_devices(parent);
>
> out:
> diff --git a/drivers/pci/hotplug/shpchp_pci.c b/drivers/pci/hotplug/shpchp_pci.c
> index f8cd3a2..ca3dc3c 100644
> --- a/drivers/pci/hotplug/shpchp_pci.c
> +++ b/drivers/pci/hotplug/shpchp_pci.c
> @@ -69,7 +69,6 @@ int shpchp_configure_device(struct slot *p_slot)
> }
>
> pci_assign_unassigned_bridge_resources(bridge);
> - pcie_bus_configure_settings(parent);
> pci_bus_add_devices(parent);
>
> out:
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index cefd636..b469298 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -37,6 +37,9 @@ struct pci_domain_busn_res {
> int domain_nr;
> };
>
> +static void pcie_bus_detect_mps(struct pci_dev *dev);
> +static void pcie_bus_configure_settings(struct pci_bus *bus);
> +
> static struct resource *get_pci_domain_busn_res(int domain_nr)
> {
> struct pci_domain_busn_res *r;
> @@ -929,6 +932,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass)
> (is_cardbus ? "PCI CardBus %04x:%02x" : "PCI Bus %04x:%02x"),
> pci_domain_nr(bus), child->number);
>
> + pcie_bus_configure_settings(bus);
> +
> /* Has only triggered on CardBus, fixup is in yenta_socket */
> while (bus->parent) {
> if ((child->busn_res.end > bus->busn_res.end) ||
> @@ -1589,6 +1594,9 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
> dev->match_driver = false;
> ret = device_add(&dev->dev);
> WARN_ON(ret < 0);
> +
> + if (pci_is_pcie(dev))
> + pcie_bus_detect_mps(dev);
> }
>
> struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> @@ -1802,9 +1810,17 @@ static void pcie_bus_detect_mps(struct pci_dev *dev)
> mps = pcie_get_mps(dev);
> p_mps = pcie_get_mps(bridge);
>
> - if (mps != p_mps)
> - dev_warn(&dev->dev, "Max Payload Size %d, but upstream %s set to %d; if necessary, use \"pci=pcie_bus_safe\" and report a bug\n",
> - mps, pci_name(bridge), p_mps);
> + if (mps != p_mps) {
> + if (128 << dev->pcie_mpss < p_mps) {
> + dev_warn(&dev->dev,
> + "Max Payload Size Supported %d, but upstream %s set to %d. If problems are experienced, try running with pci=pcie_bus_safe\n",
> + 128 << dev->pcie_mpss, pci_name(bridge), p_mps);
> + return;

Isn't this the same case where the above change to
pci_bus_add_device() will now decline to add the device? So I think
there are really two policy changes:

1) If an device can be configured to match the upstream bridge's MPS
setting, do it, and

2) Don't add a device when its MPS setting doesn't match the
upstream bridge

I'd like those to be separate patches.

> + }
> + pcie_write_mps(dev, p_mps);
> + dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
> + pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
> + }

I assume this hunk is related to the policy change (from "do nothing"
to "update the downstream port"). Can you split that policy change
into its own separate minimal patch? Yes, I'm looking for minimal and
specific bisect targets :)

I think this will be clearer if you write it as:

if (mps == p_mps)
return;

if (128 << dev->pcie_mpss < p_mps) {
dev_warn(...);
return;
}

pcie_write_mps(...);
dev_info(...);

Now that this actively updates the MPS setting, it's starting to grow
beyond the original "pcie_bus_detect_mps()" function name.

> }
>
> static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
> @@ -1836,7 +1852,7 @@ static int pcie_bus_configure_set(struct pci_dev *dev, void *data)
> * parents then children fashion. If this changes, then this code will not
> * work as designed.
> */
> -void pcie_bus_configure_settings(struct pci_bus *bus)
> +static void pcie_bus_configure_settings(struct pci_bus *bus)
> {
> u8 smpss = 0;
>
> @@ -1863,7 +1879,6 @@ void pcie_bus_configure_settings(struct pci_bus *bus)
> pcie_bus_configure_set(bus->self, &smpss);
> pci_walk_bus(bus, pcie_bus_configure_set, &smpss);
> }
> -EXPORT_SYMBOL_GPL(pcie_bus_configure_settings);
>
> unsigned int pci_scan_child_bus(struct pci_bus *bus)
> {
> @@ -1895,6 +1910,8 @@ unsigned int pci_scan_child_bus(struct pci_bus *bus)
> max = pci_scan_bridge(bus, dev, max, pass);
> }
>
> + pcie_bus_configure_settings(bus);
> +
> /*
> * We've scanned the bus and so we know all about what's on
> * the other side of any bridges that may be on this bus plus
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 8a0321a..e1df5f9 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -735,8 +735,6 @@ struct pci_driver {
> /* these external functions are only available when PCI support is enabled */
> #ifdef CONFIG_PCI
>
> -void pcie_bus_configure_settings(struct pci_bus *bus);
> -
> enum pcie_bus_config_types {
> PCIE_BUS_TUNE_OFF,
> PCIE_BUS_SAFE,
> --
> 1.7.10.4
>

2015-08-17 22:30:45

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 2/3] QIB: Removing usage of pcie_set_mps()

[+cc Mike, linux-rdma]

On Wed, Jul 29, 2015 at 04:18:54PM -0600, Keith Busch wrote:
> From: Dave Jiang <[email protected]>
>
> This is in perperation of un-exporting the pcie_set_mps() function
> symbol. A driver should not be changing the MPS as that is the
> responsibility of the PCI subsystem.

Please explain the implications of removing this code. Does this affect
performance of the device? If so, how do we get that performance back?

I also cc'd the QIB maintainers for you:

QIB DRIVER
M: Mike Marciniszyn <[email protected]>
L: [email protected]
F: drivers/infiniband/hw/qib/

> Signed-off-by: Dave Jiang <[email protected]>
> ---
> drivers/infiniband/hw/qib/qib_pcie.c | 27 +--------------------------
> 1 file changed, 1 insertion(+), 26 deletions(-)
>
> diff --git a/drivers/infiniband/hw/qib/qib_pcie.c b/drivers/infiniband/hw/qib/qib_pcie.c
> index 4758a38..b8a2dcd 100644
> --- a/drivers/infiniband/hw/qib/qib_pcie.c
> +++ b/drivers/infiniband/hw/qib/qib_pcie.c
> @@ -557,12 +557,11 @@ static void qib_tune_pcie_coalesce(struct qib_devdata *dd)
> */
> static int qib_pcie_caps;
> module_param_named(pcie_caps, qib_pcie_caps, int, S_IRUGO);
> -MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: Payload (0..3), ReadReq (4..7)");
> +MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: ReadReq (4..7)");
>
> static void qib_tune_pcie_caps(struct qib_devdata *dd)
> {
> struct pci_dev *parent;
> - u16 rc_mpss, rc_mps, ep_mpss, ep_mps;
> u16 rc_mrrs, ep_mrrs, max_mrrs;
>
> /* Find out supported and configured values for parent (root) */
> @@ -575,30 +574,6 @@ static void qib_tune_pcie_caps(struct qib_devdata *dd)
> if (!pci_is_pcie(parent) || !pci_is_pcie(dd->pcidev))
> return;
>
> - rc_mpss = parent->pcie_mpss;
> - rc_mps = ffs(pcie_get_mps(parent)) - 8;
> - /* Find out supported and configured values for endpoint (us) */
> - ep_mpss = dd->pcidev->pcie_mpss;
> - ep_mps = ffs(pcie_get_mps(dd->pcidev)) - 8;
> -
> - /* Find max payload supported by root, endpoint */
> - if (rc_mpss > ep_mpss)
> - rc_mpss = ep_mpss;
> -
> - /* If Supported greater than limit in module param, limit it */
> - if (rc_mpss > (qib_pcie_caps & 7))
> - rc_mpss = qib_pcie_caps & 7;
> - /* If less than (allowed, supported), bump root payload */
> - if (rc_mpss > rc_mps) {
> - rc_mps = rc_mpss;
> - pcie_set_mps(parent, 128 << rc_mps);
> - }
> - /* If less than (allowed, supported), bump endpoint payload */
> - if (rc_mpss > ep_mps) {
> - ep_mps = rc_mpss;
> - pcie_set_mps(dd->pcidev, 128 << ep_mps);
> - }
> -
> /*
> * Now the Read Request size.
> * No field for max supported, but PCIe spec limits it to 4096,
> --
> 1.7.10.4
>

2015-08-17 22:31:44

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 3/3] PCIE: Remove symbol export for pcie_set_mps()

On Wed, Jul 29, 2015 at 04:18:55PM -0600, Keith Busch wrote:
> From: Dave Jiang <[email protected]>
>
> The setting of PCIe MPS should be left to the PCI subsystem and not
> the driver. An ill configured MPS by the driver could cause the device
> to not function or unstablize the entire system. Removing the exported
> symbol.
>
> Signed-off-by: Dave Jiang <[email protected]>
> ---
> drivers/pci/pci.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 0008c95..92349ee 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4121,7 +4121,6 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
> return pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
> PCI_EXP_DEVCTL_PAYLOAD, v);
> }
> -EXPORT_SYMBOL(pcie_set_mps);

I think the pcie_set_mps() declaration could be moved from
include/linux/pci.h to drivers/pci/pci.h, couldn't it?

2015-08-17 22:50:19

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH 2/3] QIB: Removing usage of pcie_set_mps()

On Mon, 2015-08-17 at 17:30 -0500, Bjorn Helgaas wrote:
> [+cc Mike, linux-rdma]
>
> On Wed, Jul 29, 2015 at 04:18:54PM -0600, Keith Busch wrote:
> > From: Dave Jiang <[email protected]>
> >
> > This is in perperation of un-exporting the pcie_set_mps() function
> > symbol. A driver should not be changing the MPS as that is the
> > responsibility of the PCI subsystem.
>
> Please explain the implications of removing this code. Does this
> affect
> performance of the device? If so, how do we get that performance
> back?

Honestly I don't know. But at the same time I think the driver
shouldn't be touching the MPS at all. Shouldn't that be left to the
PCIe subsystem and rely on the PCIe subsystem to set this to a sane
value?

>
> I also cc'd the QIB maintainers for you:
>
> QIB DRIVER
> M: Mike Marciniszyn <[email protected]>
> L: [email protected]
> F: drivers/infiniband/hw/qib/
>
> > Signed-off-by: Dave Jiang <[email protected]>
> > ---
> > drivers/infiniband/hw/qib/qib_pcie.c | 27 +---------------------
> > -----
> > 1 file changed, 1 insertion(+), 26 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/qib/qib_pcie.c
> > b/drivers/infiniband/hw/qib/qib_pcie.c
> > index 4758a38..b8a2dcd 100644
> > --- a/drivers/infiniband/hw/qib/qib_pcie.c
> > +++ b/drivers/infiniband/hw/qib/qib_pcie.c
> > @@ -557,12 +557,11 @@ static void qib_tune_pcie_coalesce(struct
> > qib_devdata *dd)
> > */
> > static int qib_pcie_caps;
> > module_param_named(pcie_caps, qib_pcie_caps, int, S_IRUGO);
> > -MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: Payload (0..3),
> > ReadReq (4..7)");
> > +MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: ReadReq (4..7)");
> >
> > static void qib_tune_pcie_caps(struct qib_devdata *dd)
> > {
> > struct pci_dev *parent;
> > - u16 rc_mpss, rc_mps, ep_mpss, ep_mps;
> > u16 rc_mrrs, ep_mrrs, max_mrrs;
> >
> > /* Find out supported and configured values for parent
> > (root) */
> > @@ -575,30 +574,6 @@ static void qib_tune_pcie_caps(struct
> > qib_devdata *dd)
> > if (!pci_is_pcie(parent) || !pci_is_pcie(dd->pcidev))
> > return;
> >
> > - rc_mpss = parent->pcie_mpss;
> > - rc_mps = ffs(pcie_get_mps(parent)) - 8;
> > - /* Find out supported and configured values for endpoint
> > (us) */
> > - ep_mpss = dd->pcidev->pcie_mpss;
> > - ep_mps = ffs(pcie_get_mps(dd->pcidev)) - 8;
> > -
> > - /* Find max payload supported by root, endpoint */
> > - if (rc_mpss > ep_mpss)
> > - rc_mpss = ep_mpss;
> > -
> > - /* If Supported greater than limit in module param, limit
> > it */
> > - if (rc_mpss > (qib_pcie_caps & 7))
> > - rc_mpss = qib_pcie_caps & 7;
> > - /* If less than (allowed, supported), bump root payload */
> > - if (rc_mpss > rc_mps) {
> > - rc_mps = rc_mpss;
> > - pcie_set_mps(parent, 128 << rc_mps);
> > - }
> > - /* If less than (allowed, supported), bump endpoint
> > payload */
> > - if (rc_mpss > ep_mps) {
> > - ep_mps = rc_mpss;
> > - pcie_set_mps(dd->pcidev, 128 << ep_mps);
> > - }
> > -
> > /*
> > * Now the Read Request size.
> > * No field for max supported, but PCIe spec limits it to
> > 4096,????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-08-17 23:04:01

by Keith Busch

[permalink] [raw]
Subject: Re: [PATCH] pci: Default MPS tuning, match upstream port

On Mon, 17 Aug 2015, Bjorn Helgaas wrote:
> On Wed, Jul 29, 2015 at 04:18:53PM -0600, Keith Busch wrote:
>> The new pcie tuning will check the device's MPS against the parent bridge
>> when it is initially added to the pci subsystem, prior to attaching
>> to a driver. If MPS is mismatched, the downstream port is set to the
>> parent bridge's if capable.
>
> This is a little confusing (at least, *I'm* confused). You're talking
> about setting the MPS configuration of the "downstream port," but I
> think you are talking about either a Switch Upstream Port or an
> Endpoint. Such a port is indeed *downstream* from the parent bridge,
> but referring to it as a "downstream port" risks confusing it with the
> parent bridge, which is either a Switch Downstream Port or a Root
> Port.

Yes, the wording is confusing, yet you've managed to describe my
intention, though. :)

>> {
>> int retval;
>>
>> + if (dev->bus->self &&
>> + pcie_get_mps(dev) != pcie_get_mps(dev->bus->self))
>> + return;
>
> This part looks like it could be in its own separate patch that
> basically enforces the "if we can't safely configure this device's MPS
> setting, prevent drivers from using the device" idea. What happens in
> this case? Does the device still appear in sysfs and lspci, even if
> we can't configure its MPS safely? This seems like good place for a
> dev_warn().

It will remain in the pci topology and all the associated sysfs
artifacts and lspic capabilities will be there. You could retune the
whole topology from user space with "setpci" and do a rescan if you were
so inclined, but that could cause problems if transactions are in-flight
while re-tuning.

A dev_warn will be produced when the device is initially scanned as you
noted below, but another at this point might be useful to
make it clear a driver will not be bound.

The above is broken, though, for PCI device's that are not PCI-e, so
need to fix that at the very least if we will enforce this.

>> - if (mps != p_mps)
>> - dev_warn(&dev->dev, "Max Payload Size %d, but upstream %s set to %d; if necessary, use \"pci=pcie_bus_safe\" and report a bug\n",
>> - mps, pci_name(bridge), p_mps);
>> + if (mps != p_mps) {
>> + if (128 << dev->pcie_mpss < p_mps) {
>> + dev_warn(&dev->dev,
>> + "Max Payload Size Supported %d, but upstream %s set to %d. If problems are experienced, try running with pci=pcie_bus_safe\n",
>> + 128 << dev->pcie_mpss, pci_name(bridge), p_mps);
>> + return;
>
> Isn't this the same case where the above change to
> pci_bus_add_device() will now decline to add the device? So I think
> there are really two policy changes:
>
> 1) If an device can be configured to match the upstream bridge's MPS
> setting, do it, and
>
> 2) Don't add a device when its MPS setting doesn't match the
> upstream bridge
>
> I'd like those to be separate patches.

Ok.

>> + }
>> + pcie_write_mps(dev, p_mps);
>> + dev_info(&dev->dev, "Max Payload Size set to %4d/%4d (was %4d)\n",
>> + pcie_get_mps(dev), 128 << dev->pcie_mpss, mps);
>> + }
>
> I assume this hunk is related to the policy change (from "do nothing"
> to "update the downstream port"). Can you split that policy change
> into its own separate minimal patch? Yes, I'm looking for minimal and
> specific bisect targets :)
>
> I think this will be clearer if you write it as:
>
> if (mps == p_mps)
> return;
>
> if (128 << dev->pcie_mpss < p_mps) {
> dev_warn(...);
> return;
> }
>
> pcie_write_mps(...);
> dev_info(...);
>
> Now that this actively updates the MPS setting, it's starting to grow
> beyond the original "pcie_bus_detect_mps()" function name.

Agreed that's a cleaner flow.


Thanks for all the feedback. Will work on a revision this week.

2015-08-17 23:32:45

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH 3/3] PCIE: Remove symbol export for pcie_set_mps()

On Mon, 2015-08-17 at 17:31 -0500, Bjorn Helgaas wrote:
> On Wed, Jul 29, 2015 at 04:18:55PM -0600, Keith Busch wrote:
> > From: Dave Jiang <[email protected]>
> >
> > The setting of PCIe MPS should be left to the PCI subsystem and not
> > the driver. An ill configured MPS by the driver could cause the
> > device
> > to not function or unstablize the entire system. Removing the
> > exported
> > symbol.
> >
> > Signed-off-by: Dave Jiang <[email protected]>
> > ---
> > drivers/pci/pci.c | 1 -
> > 1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 0008c95..92349ee 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4121,7 +4121,6 @@ int pcie_set_mps(struct pci_dev *dev, int
> > mps)
> > return pcie_capability_clear_and_set_word(dev,
> > PCI_EXP_DEVCTL,
> >
> > PCI_EXP_DEVCTL_PAYLOAD, v);
> > }
> > -EXPORT_SYMBOL(pcie_set_mps);
>
> I think the pcie_set_mps() declaration could be moved from
> include/linux/pci.h to drivers/pci/pci.h, couldn't it?

Ok.????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2015-08-18 00:06:33

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 2/3] QIB: Removing usage of pcie_set_mps()

On Mon, Aug 17, 2015 at 5:50 PM, Jiang, Dave <[email protected]> wrote:
> On Mon, 2015-08-17 at 17:30 -0500, Bjorn Helgaas wrote:
>> [+cc Mike, linux-rdma]
>>
>> On Wed, Jul 29, 2015 at 04:18:54PM -0600, Keith Busch wrote:
>> > From: Dave Jiang <[email protected]>
>> >
>> > This is in perperation of un-exporting the pcie_set_mps() function
>> > symbol. A driver should not be changing the MPS as that is the
>> > responsibility of the PCI subsystem.
>>
>> Please explain the implications of removing this code. Does this
>> affect
>> performance of the device? If so, how do we get that performance
>> back?
>
> Honestly I don't know. But at the same time I think the driver
> shouldn't be touching the MPS at all. Shouldn't that be left to the
> PCIe subsystem and rely on the PCIe subsystem to set this to a sane
> value?

Yes, I think in principle the PCI core should own this, but I also
don't want to introduce a performance regression, so I think we need
to understand whether there's a problem, and if there is, fix it.

>> I also cc'd the QIB maintainers for you:
>>
>> QIB DRIVER
>> M: Mike Marciniszyn <[email protected]>
>> L: [email protected]
>> F: drivers/infiniband/hw/qib/
>>
>> > Signed-off-by: Dave Jiang <[email protected]>
>> > ---
>> > drivers/infiniband/hw/qib/qib_pcie.c | 27 +---------------------
>> > -----
>> > 1 file changed, 1 insertion(+), 26 deletions(-)
>> >
>> > diff --git a/drivers/infiniband/hw/qib/qib_pcie.c
>> > b/drivers/infiniband/hw/qib/qib_pcie.c
>> > index 4758a38..b8a2dcd 100644
>> > --- a/drivers/infiniband/hw/qib/qib_pcie.c
>> > +++ b/drivers/infiniband/hw/qib/qib_pcie.c
>> > @@ -557,12 +557,11 @@ static void qib_tune_pcie_coalesce(struct
>> > qib_devdata *dd)
>> > */
>> > static int qib_pcie_caps;
>> > module_param_named(pcie_caps, qib_pcie_caps, int, S_IRUGO);
>> > -MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: Payload (0..3),
>> > ReadReq (4..7)");
>> > +MODULE_PARM_DESC(pcie_caps, "Max PCIe tuning: ReadReq (4..7)");
>> >
>> > static void qib_tune_pcie_caps(struct qib_devdata *dd)
>> > {
>> > struct pci_dev *parent;
>> > - u16 rc_mpss, rc_mps, ep_mpss, ep_mps;
>> > u16 rc_mrrs, ep_mrrs, max_mrrs;
>> >
>> > /* Find out supported and configured values for parent
>> > (root) */
>> > @@ -575,30 +574,6 @@ static void qib_tune_pcie_caps(struct
>> > qib_devdata *dd)
>> > if (!pci_is_pcie(parent) || !pci_is_pcie(dd->pcidev))
>> > return;
>> >
>> > - rc_mpss = parent->pcie_mpss;
>> > - rc_mps = ffs(pcie_get_mps(parent)) - 8;
>> > - /* Find out supported and configured values for endpoint
>> > (us) */
>> > - ep_mpss = dd->pcidev->pcie_mpss;
>> > - ep_mps = ffs(pcie_get_mps(dd->pcidev)) - 8;
>> > -
>> > - /* Find max payload supported by root, endpoint */
>> > - if (rc_mpss > ep_mpss)
>> > - rc_mpss = ep_mpss;
>> > -
>> > - /* If Supported greater than limit in module param, limit
>> > it */
>> > - if (rc_mpss > (qib_pcie_caps & 7))
>> > - rc_mpss = qib_pcie_caps & 7;
>> > - /* If less than (allowed, supported), bump root payload */
>> > - if (rc_mpss > rc_mps) {
>> > - rc_mps = rc_mpss;
>> > - pcie_set_mps(parent, 128 << rc_mps);
>> > - }
>> > - /* If less than (allowed, supported), bump endpoint
>> > payload */
>> > - if (rc_mpss > ep_mps) {
>> > - ep_mps = rc_mpss;
>> > - pcie_set_mps(dd->pcidev, 128 << ep_mps);
>> > - }
>> > -
>> > /*
>> > * Now the Read Request size.
>> > * No field for max supported, but PCIe spec limits it to
>> > 4096,

2015-08-18 01:49:12

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 2/3] QIB: Removing usage of pcie_set_mps()

On Mon, Aug 17, 2015 at 07:06:10PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 17, 2015 at 5:50 PM, Jiang, Dave <[email protected]> wrote:
> > On Mon, 2015-08-17 at 17:30 -0500, Bjorn Helgaas wrote:
> >> [+cc Mike, linux-rdma]
> >>
> >> On Wed, Jul 29, 2015 at 04:18:54PM -0600, Keith Busch wrote:
> >> > From: Dave Jiang <[email protected]>
> >> >
> >> > This is in perperation of un-exporting the pcie_set_mps() function
> >> > symbol. A driver should not be changing the MPS as that is the
> >> > responsibility of the PCI subsystem.
> >>
> >> Please explain the implications of removing this code. Does this
> >> affect
> >> performance of the device? If so, how do we get that performance
> >> back?
> >
> > Honestly I don't know. But at the same time I think the driver
> > shouldn't be touching the MPS at all. Shouldn't that be left to the
> > PCIe subsystem and rely on the PCIe subsystem to set this to a sane
> > value?
>
> Yes, I think in principle the PCI core should own this, but I also
> don't want to introduce a performance regression, so I think we need
> to understand whether there's a problem, and if there is, fix it.

Making sure Mike is CC'd directly..

Mike: I see this has been cut and pasted to HFI1 too, I would be
disappointed if HFI needs it as well. :(

FWIW, I totally agree with the above. MPS/MRS and related have to do
with root port capability, switches in the path and any platform
bugs..

I'm not sure why a driver would ever want to mess with this, and since
this code doesn't walk the bus toward the root port, it is technically
wrong, right? I also find it strange that qib_pcie_caps defaults to
zero which means the 'tuning' reduces the payload size to the minimums
(??)

> >> > /*
> >> > * Now the Read Request size.
> >> > * No field for max supported, but PCIe spec limits it to
> >> > 4096,

.. it has been a bit since I looked at this, but IIRC, MRRS is also
something that should not be touched by a driver?

Jason