2012-10-26 06:59:28

by Cyberman Wu

Subject: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

After upgrading to MDE 4.1.0 from Tilera, we ran into a problem: only one
of our HighPoint 2680 cards works. I've tried to fix it, but since I work
mostly in user space, I'm not sure my fix is sufficient. Tilera's FAE said
that the engineer who added PCIe I/O space support is on vacation, so I
can't get help from him right now. I hope somebody here can help.


Problem we encountered:

pci 0000:00:00.0: BAR 8: assigned [mem 0x100c0000000-0x100c00fffff]
pci 0000:00:00.0: BAR 9: assigned [mem 0x100c0100000-0x100c01fffff pref]
pci 0000:00:00.0: BAR 7: assigned [io 0x0000-0x0fff]
pci 0000:01:00.0: BAR 6: assigned [mem 0x100c0100000-0x100c013ffff pref]
pci 0000:01:00.0: BAR 6: set to [mem 0x100c0100000-0x100c013ffff pref] (PCI address [0xc0100000-0xc013ffff])
pci 0000:01:00.0: BAR 4: assigned [mem 0x100c0000000-0x100c000ffff 64bit]
pci 0000:01:00.0: BAR 4: set to [mem 0x100c0000000-0x100c000ffff 64bit] (PCI address [0xc0000000-0xc000ffff])
pci 0000:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
pci 0000:01:00.0: BAR 2: set to [io 0x0000-0x007f] (PCI address [0x0-0x7f])
pci 0000:00:00.0: PCI bridge to [bus 01-01]
pci 0000:00:00.0: bridge window [io 0x0000-0x0fff]
pci 0000:00:00.0: bridge window [mem 0x100c0000000-0x100c00fffff]
pci 0000:00:00.0: bridge window [mem 0x100c0100000-0x100c01fffff pref]
pci 0001:00:00.0: BAR 8: assigned [mem 0x101c0000000-0x101c00fffff]
pci 0001:00:00.0: BAR 9: assigned [mem 0x101c0100000-0x101c01fffff pref]
pci 0001:00:00.0: BAR 7: assigned [io 0x0000-0x0fff]
pci 0001:01:00.0: BAR 6: assigned [mem 0x101c0100000-0x101c013ffff pref]
pci 0001:01:00.0: BAR 6: set to [mem 0x101c0100000-0x101c013ffff pref] (PCI address [0xc0100000-0xc013ffff])
pci 0001:01:00.0: BAR 4: assigned [mem 0x101c0000000-0x101c000ffff 64bit]
pci 0001:01:00.0: BAR 4: set to [mem 0x101c0000000-0x101c000ffff 64bit] (PCI address [0xc0000000-0xc000ffff])
pci 0001:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
pci 0001:01:00.0: BAR 2: set to [io 0x0000-0x007f] (PCI address [0x0-0x7f])
pci 0001:00:00.0: PCI bridge to [bus 01-01]
pci 0001:00:00.0: bridge window [io 0x0000-0x0fff]
pci 0001:00:00.0: bridge window [mem 0x101c0000000-0x101c00fffff]
pci 0001:00:00.0: bridge window [mem 0x101c0100000-0x101c01fffff pref]
pci 0000:00:00.0: enabling device (0006 -> 0007)
pci 0001:00:00.0: enabling device (0006 -> 0007)
pci_bus 0000:00: resource 0 [io 0x0000-0xffffffff]
pci_bus 0000:00: resource 1 [mem 0x100c0000000-0x100ffffffff]
pci_bus 0000:01: resource 0 [io 0x0000-0x0fff]
pci_bus 0000:01: resource 1 [mem 0x100c0000000-0x100c00fffff]
pci_bus 0000:01: resource 2 [mem 0x100c0100000-0x100c01fffff pref]
pci_bus 0001:00: resource 0 [io 0x0000-0xffffffff]
pci_bus 0001:00: resource 1 [mem 0x101c0000000-0x101ffffffff]
pci_bus 0001:01: resource 0 [io 0x0000-0x0fff]
pci_bus 0001:01: resource 1 [mem 0x101c0000000-0x101c00fffff]
pci_bus 0001:01: resource 2 [mem 0x101c0100000-0x101c01fffff pref]
......
mvsas 0000:01:00.0: mvsas: driver version 0.8.2
mvsas 0000:01:00.0: enabling device (0000 -> 0003)
mvsas 0000:01:00.0: enabling bus mastering
mvsas 0000:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
mvsas 0000:01:00.0: Phy3 : No sig fis
scsi0 : mvsas
......
mvsas 0001:01:00.0: mvsas: driver version 0.8.2
mvsas 0001:01:00.0: enabling device (0000 -> 0003)
mvsas 0001:01:00.0: enabling bus mastering
mvsas 0001:01:00.0: BAR 2: can't reserve [io 0x0000-0x007f]
mvsas: probe of 0001:01:00.0 failed with error -16


My modification:

--- /opt/tilera/TileraMDE-4.1.0.148119/tilegx/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-22 14:56:59.783096378 +0800
+++ Tilera_src/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-26 13:55:02.731947886 +0800
@@ -368,6 +368,10 @@
int num_trio_shims = 0;
int ctl_index = 0;
int i, j;
+ // Modified by Cyberman Wu on Oct 25th, 2012.
+ resource_size_t io_mem_start;
+ resource_size_t io_mem_end;
+ resource_size_t io_mem_size;

if (!pci_probe) {
pr_info("PCI: disabled by boot argument\n");
@@ -457,6 +461,18 @@
}

out:
+ // Use IO memory space 0~0xffffffff for every controller will
+ // cause device on controller other than the first failed to
+ // load driver if it using IO regions.
+ // Is reserve the first 4K IO address space OK? Tilera use
+ // IO space address begin from 0, but some drivers in Linux
+ // recognize 0 address a error, say, mvsas, so for compatiblity
+ // reserve some address from 0 should be better?
+ // Modified by Cyberman Wu on Oct 25th, 2012.
+ io_mem_start = 4096;
+ io_mem_end = (resource_size_t)IO_SPACE_LIMIT + 1;
+ io_mem_size = (io_mem_end - io_mem_start) / num_rc_controllers;
+ io_mem_size &= ~3;
/*
* Configure each PCIe RC port.
*/
@@ -470,8 +486,9 @@
controller->index = i;
controller->ops = &tile_cfg_ops;

- controller->io_space.start = 0;
- controller->io_space.end = IO_SPACE_LIMIT;
+ // Modified by Cyberman Wu on Oct 25th, 2012.
+ controller->io_space.start = io_mem_start + (i * io_mem_size);
+ controller->io_space.end = controller->io_space.start + io_mem_size - 1;
controller->io_space.flags = IORESOURCE_IO;
snprintf(controller->io_space_name,
sizeof(controller->io_space_name),


Please note that we're using MDE-4.1.0, which is based on kernel 3.0.38,
patched and renumbered to 2.6.40.38.
I've checked the source under arch/tile in kernel 3.6.3, and PCIe I/O
space support is still not there. Below is the diff of
arch/tile/kernel/pci_gx.c between kernel 3.6.3 and MDE-4.1.0:

--- .cache/.fr-9Oo37J/linux-3.6.3/arch/tile/kernel/pci_gx.c 2012-10-22 00:32:56.000000000 +0800
+++ /opt/tilera/TileraMDE-4.1.0.148119/tilegx/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-22 14:56:59.783096378 +0800
@@ -69,19 +69,18 @@
* a HW PCIe link-training bug. The exact delay is specified with
* a kernel boot argument in the form of "pcie_rc_delay=T,P,S",
* where T is the TRIO instance number, P is the port number and S is
- * the delay in seconds. If the delay is not provided, the value
- * will be DEFAULT_RC_DELAY.
+ * the delay in seconds. If the argument is specified, but the delay is
+ * not provided, the value will be DEFAULT_RC_DELAY.
*/
static int __devinitdata rc_delay[TILEGX_NUM_TRIO][TILEGX_TRIO_PCIES];

/* Default number of seconds that the PCIe RC port probe can be delayed. */
#define DEFAULT_RC_DELAY 10

-/* Max number of seconds that the PCIe RC port probe can be delayed. */
-#define MAX_RC_DELAY 20
-
+#if !defined(GX_FPGA)
/* Array of the PCIe ports configuration info obtained from the BIB. */
struct pcie_port_property pcie_ports[TILEGX_NUM_TRIO][TILEGX_TRIO_PCIES];
+#endif

/* All drivers share the TRIO contexts defined here. */
gxio_trio_context_t trio_contexts[TILEGX_NUM_TRIO];
@@ -97,6 +96,41 @@
static struct cpumask intr_cpus_map;

/*
+ * Convert a resource to a PCI device bus address or bus window.
+ */
+void __devinit
+pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
+ struct resource *res)
+{
+ struct pci_controller *controller =
+ (struct pci_controller *)dev->sysdata;
+ unsigned long offset = 0;
+
+ if (res->flags & IORESOURCE_MEM)
+ offset = controller->mem_offset;
+
+ region->start = res->start - offset;
+ region->end = res->end - offset;
+}
+EXPORT_SYMBOL(pcibios_resource_to_bus);
+
+void __devinit
+pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
+ struct pci_bus_region *region)
+{
+ struct pci_controller *controller =
+ (struct pci_controller *)dev->sysdata;
+ unsigned long offset = 0;
+
+ if (res->flags & IORESOURCE_MEM)
+ offset = controller->mem_offset;
+
+ res->start = region->start + offset;
+ res->end = region->end + offset;
+}
+EXPORT_SYMBOL(pcibios_bus_to_resource);
+
+/*
* We don't need to worry about the alignment of resources.
*/
resource_size_t pcibios_align_resource(void *data, const struct resource *res,
@@ -274,6 +308,10 @@

cpumask_copy(&intr_cpus_map, cpu_online_mask);

+#ifdef CONFIG_DATAPLANE
+ /* Remove dataplane cpus. */
+ cpumask_andnot(&intr_cpus_map, &intr_cpus_map, &dataplane_map);
+#endif

for (i = 0; i < 4; i++) {
gxio_trio_context_t *context = controller->trio;
@@ -325,7 +363,7 @@
*
* Returns the number of controllers discovered.
*/
-int __init tile_pci_init(void)
+int __devinit tile_pci_init(void)
{
int num_trio_shims = 0;
int ctl_index = 0;
@@ -359,6 +397,7 @@
* We look at the Board Information Block first and then see if there
* are any overriding configuration by the HW strapping pin.
*/
+#if !defined(GX_FPGA)
for (i = 0; i < TILEGX_NUM_TRIO; i++) {
gxio_trio_context_t *context = &trio_contexts[i];
int ret;
@@ -386,6 +425,13 @@
}
}
}
+#else
+ /*
+ * For now, just assume that there is a single RC port on trio/0.
+ */
+ num_rc_controllers = 1;
+ pcie_rc[0][2] = 1;
+#endif

/*
* Return if no PCIe ports are configured to operate in RC mode.
@@ -424,13 +470,20 @@
controller->index = i;
controller->ops = &tile_cfg_ops;

+ controller->io_space.start = 0;
+ controller->io_space.end = IO_SPACE_LIMIT;
+ controller->io_space.flags = IORESOURCE_IO;
+ snprintf(controller->io_space_name,
+ sizeof(controller->io_space_name),
+ "PCI I/O domain %d", i);
+ controller->io_space.name = controller->io_space_name;
+
/*
* The PCI memory resource is located above the PA space.
* For every host bridge, the BAR window or the MMIO aperture
* is in range [3GB, 4GB - 1] of a 4GB space beyond the
* PA space.
*/
-
controller->mem_offset = TILE_PCI_MEM_START +
(i * TILE_PCI_BAR_WINDOW_TOP);
controller->mem_space.start = controller->mem_offset +
@@ -451,7 +504,7 @@
* (pin - 1) converts from the PCI standard's [1:4] convention to
* a normal [0:3] range.
*/
-static int tile_map_irq(const struct pci_dev *dev, u8 device, u8 pin)
+static int tile_map_irq(struct pci_dev *dev, u8 device, u8 pin)
{
struct pci_controller *controller =
(struct pci_controller *)dev->sysdata;
@@ -463,11 +516,12 @@
controller)
{
gxio_trio_context_t *trio_context = controller->trio;
- struct pci_bus *root_bus = controller->root_bus;
TRIO_PCIE_RC_DEVICE_CONTROL_t dev_control;
TRIO_PCIE_RC_DEVICE_CAP_t rc_dev_cap;
+ unsigned int smallest_max_payload;
+ struct pci_dev *dev = NULL;
unsigned int reg_offset;
- struct pci_bus *child;
+ u16 new_values;
int mac;
int err;

@@ -508,33 +562,59 @@
__gxio_mmio_write32(trio_context->mmio_base_mac + reg_offset,
rc_dev_cap.word);

- /* Configure PCI Express MPS setting. */
- list_for_each_entry(child, &root_bus->children, node) {
- struct pci_dev *self = child->self;
- if (!self)
+ smallest_max_payload = rc_dev_cap.mps_sup;
+
+ /* Scan for the smallest maximum payload size. */
+ while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
+ int pcie_caps_offset;
+ u32 devcap;
+ int max_payload;
+
+ /* Skip device that is not in this PCIe domain. */
+ if ((struct pci_controller *)dev->sysdata != controller)
continue;

- pcie_bus_configure_settings(child, self->pcie_mpss);
+ pcie_caps_offset = pci_find_capability(dev, PCI_CAP_ID_EXP);
+ if (pcie_caps_offset == 0)
+ continue;
+
+ pci_read_config_dword(dev, pcie_caps_offset + PCI_EXP_DEVCAP,
+ &devcap);
+ max_payload = devcap & PCI_EXP_DEVCAP_PAYLOAD;
+ if (max_payload < smallest_max_payload)
+ smallest_max_payload = max_payload;
+ }
+
+ /* Now, set the max_payload_size for all devices to that value. */
+ new_values = smallest_max_payload << 5;
+ while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
+ int pcie_caps_offset;
+ u16 devctl;
+
+ /* Skip device that is not in this PCIe domain. */
+ if ((struct pci_controller *)dev->sysdata != controller)
+ continue;
+
+ pcie_caps_offset = pci_find_capability(dev, PCI_CAP_ID_EXP);
+ if (pcie_caps_offset == 0)
+ continue;
+
+ pci_read_config_word(dev, pcie_caps_offset + PCI_EXP_DEVCTL,
+ &devctl);
+ devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
+ devctl |= new_values;
+ pci_write_config_word(dev, pcie_caps_offset + PCI_EXP_DEVCTL,
+ devctl);
}

/*
* Set the mac_config register in trio based on the MPS/MRS of the link.
*/
- reg_offset =
- (TRIO_PCIE_RC_DEVICE_CONTROL <<
- TRIO_CFG_REGION_ADDR__REG_SHIFT) |
- (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_STANDARD <<
- TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
- (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
-
- dev_control.word = __gxio_mmio_read32(trio_context->mmio_base_mac +
- reg_offset);
-
err = gxio_trio_set_mps_mrs(trio_context,
- dev_control.max_payload_size,
+ smallest_max_payload,
dev_control.max_read_req_sz,
mac);
- if (err < 0) {
+ if (err < 0) {
pr_err("PCI: PCIE_CONFIGURE_MAC_MPS_MRS failure, "
"MAC %d on TRIO %d\n",
mac, controller->trio_index);
@@ -571,14 +651,9 @@
if (!isdigit(*str))
return -EINVAL;
delay = simple_strtoul(str, (char **)&str, 10);
- if (delay > MAX_RC_DELAY)
- return -EINVAL;
}

rc_delay[trio_index][mac] = delay ? : DEFAULT_RC_DELAY;
- pr_info("Delaying PCIe RC link training for %u sec"
- " on MAC %lu on TRIO %lu\n", rc_delay[trio_index][mac],
- mac, trio_index);
return 0;
}
early_param("pcie_rc_delay", setup_pcie_rc_delay);
@@ -586,18 +661,14 @@
/*
* PCI initialization entry point, called by subsys_initcall.
*/
-int __init pcibios_init(void)
+int __devinit pcibios_init(void)
{
resource_size_t offset;
- LIST_HEAD(resources);
int next_busno;
int i;

tile_pci_init();

- if (num_rc_controllers == 0 && num_ep_controllers == 0)
- return 0;
-
/*
* We loop over all the TRIO shims and set up the MMIO mappings.
*/
@@ -623,6 +694,9 @@
}
}

+ if (num_rc_controllers == 0 && num_ep_controllers == 0)
+ return 0;
+
/*
* Delay a bit in case devices aren't ready. Some devices are
* known to require at least 20ms here, but we use a more
@@ -684,15 +758,36 @@
}

/*
- * Delay the RC link training if needed.
+ * Delay the bus probe if needed.
*/
- if (rc_delay[trio_index][mac])
+ if (rc_delay[trio_index][mac]) {
+ pr_info("Delaying PCIe RC link training for %d sec"
+ " on MAC %d on TRIO %d\n",
+ rc_delay[trio_index][mac], mac,
+ trio_index);
msleep(rc_delay[trio_index][mac] * 1000);
+ }

- ret = gxio_trio_force_rc_link_up(trio_context, mac);
- if (ret < 0)
- pr_err("PCI: PCIE_FORCE_LINK_UP failure, "
- "MAC %d on TRIO %d\n", mac, trio_index);
+ /*
+ * Check for PCIe link-up status to decide if we need
+ * to force the link to come up.
+ */
+ reg_offset =
+ (TRIO_PCIE_INTFC_PORT_STATUS <<
+ TRIO_CFG_REGION_ADDR__REG_SHIFT) |
+ (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_INTERFACE <<
+ TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
+ (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
+
+ port_status.word =
+ __gxio_mmio_read(trio_context->mmio_base_mac +
+ reg_offset);
+ if (!port_status.dl_up) {
+ ret = gxio_trio_force_rc_link_up(trio_context, mac);
+ if (ret < 0)
+ pr_err("PCI: PCIE_FORCE_LINK_UP failure, "
+ "MAC %d on TRIO %d\n", mac, trio_index);
+ }

pr_info("PCI: Found PCI controller #%d on TRIO %d MAC %d\n", i,
trio_index, controller->mac);
@@ -704,22 +799,20 @@
msleep(1000);

/*
- * Check for PCIe link-up status.
+ * Check for PCIe link-up status again.
*/
-
- reg_offset =
- (TRIO_PCIE_INTFC_PORT_STATUS <<
- TRIO_CFG_REGION_ADDR__REG_SHIFT) |
- (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_INTERFACE <<
- TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
- (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
-
port_status.word =
__gxio_mmio_read(trio_context->mmio_base_mac +
reg_offset);
if (!port_status.dl_up) {
- pr_err("PCI: link is down, MAC %d on TRIO %d\n",
- mac, trio_index);
+ if (pcie_ports[trio_index][mac].removable) {
+ pr_info("PCI: link is down, MAC %d on TRIO %d",
+ mac, trio_index);
+ pr_info("This is expected if no PCIe card"
+ " is connected to this link");
+ } else
+ pr_err("PCI: link is down, MAC %d on TRIO %d",
+ mac, trio_index);
continue;
}

@@ -842,19 +935,22 @@
}

/*
- * The PCI memory resource is located above the PA space.
- * The memory range for the PCI root bus should not overlap
- * with the physical RAM
+ * This comes from the generic Linux PCI driver.
+ *
+ * It reads the PCI tree for this bus into the Linux
+ * data structures.
+ *
+ * This is inlined in linux/pci.h and calls into
+ * pci_scan_bus_parented() in probe.c.
*/
- pci_add_resource_offset(&resources, &controller->mem_space,
- controller->mem_offset);
-
- controller->first_busno = next_busno;
- bus = pci_scan_root_bus(NULL, next_busno, controller->ops,
- controller, &resources);
+ controller->first_busno= next_busno;
+ bus = pci_scan_bus(next_busno, controller->ops, controller);
controller->root_bus = bus;
- next_busno = bus->busn_res.end + 1;
-
+#if 0
+ next_busno = bus->subordinate + 1;
+#else
+ next_busno = 0;
+#endif
}

/* Do machine dependent PCI interrupt routing */
@@ -951,6 +1047,37 @@
}

/*
+ * Alloc a PIO region for PCI I/O space access for each RC port.
+ */
+ ret = gxio_trio_alloc_pio_regions(trio_context, 1, 0, 0);
+ if (ret < 0) {
+ pr_err("PCI: I/O PIO alloc failure on TRIO %d mac %d, "
+ "give up\n", controller->trio_index,
+ controller->mac);
+
+ continue;
+ }
+
+ controller->pio_io_index = ret;
+
+ /*
+ * For PIO IO, the bus_address_hi parameter is hard-coded 0
+ * because PCI I/O address space is 32-bit.
+ */
+ ret = gxio_trio_init_pio_region_aux(trio_context,
+ controller->pio_io_index,
+ controller->mac,
+ 0,
+ HV_TRIO_PIO_FLAG_IO_SPACE);
+ if (ret < 0) {
+ pr_err("PCI: I/O PIO init failure on TRIO %d mac %d, "
+ "give up\n", controller->trio_index,
+ controller->mac);
+
+ continue;
+ }
+
+ /*
* Configure a Mem-Map region for each memory controller so
* that Linux can map all of its PA space to the PCI bus.
* Use the IOMMU to handle hash-for-home memory.
@@ -1015,9 +1142,22 @@
}
subsys_initcall(pcibios_init);

-/* Note: to be deleted after Linux 3.6 merge. */
+/*
+ * PCI scan code calls the arch specific pcibios_fixup_bus() each time it scans
+ * a new bridge. Called after each bus is probed, but before its children are
+ * examined.
+ */
void __devinit pcibios_fixup_bus(struct pci_bus *bus)
{
+ struct pci_dev *dev = bus->self;
+
+ if (!dev) {
+ struct pci_controller *controller = bus->sysdata;
+
+ /* This is the root bus. */
+ bus->resource[0] = &controller->io_space;
+ bus->resource[1] = &controller->mem_space;
+ }
}

/*
@@ -1043,8 +1183,7 @@

/*
* Enable memory address decoding, as appropriate, for the
- * device described by the 'dev' struct. The I/O decoding
- * is disabled, though the TILE-Gx supports I/O addressing.
+ * device described by the 'dev' struct.
*
* This is called from the generic PCI layer, and can be called
* for bridges or endpoints.
@@ -1126,10 +1265,95 @@
* We need to keep the PCI bus address's in-page offset in the VA.
*/
return iorpc_ioremap(trio_fd, offset, size) +
- (phys_addr & (PAGE_SIZE - 1));
+ (start & (PAGE_SIZE - 1));
}
EXPORT_SYMBOL(ioremap);

+/* Map a PCI I/O address into VA space. */
+void __iomem *ioport_map(unsigned long port, unsigned int size)
+{
+ struct pci_controller *controller = NULL;
+ resource_size_t bar_start;
+ resource_size_t bar_end;
+ resource_size_t offset;
+ resource_size_t start;
+ resource_size_t end;
+ int trio_fd;
+ int i;
+
+ start = port;
+ end = port + size - 1;
+
+ /*
+ * In the following, each PCI controller's mem_resources[0]
+ * represents its PCI I/O resource. By searching port in each
+ * controller's mem_resources[0], we can determine the controller
+ * that should accept the PCI I/O access.
+ */
+
+ for (i = 0; i < num_rc_controllers; i++) {
+ /*
+ * Skip controllers that are not properly initialized or
+ * have down links.
+ */
+ if (pci_controllers[i].root_bus == NULL)
+ continue;
+
+ bar_start = pci_controllers[i].mem_resources[0].start;
+ bar_end = pci_controllers[i].mem_resources[0].end;
+
+ if ((start >= bar_start) && (end <= bar_end)) {
+
+ controller = &pci_controllers[i];
+
+ goto got_it;
+ }
+ }
+
+ if (controller == NULL)
+ return NULL;
+
+got_it:
+ trio_fd = controller->trio->fd;
+
+ offset = HV_TRIO_PIO_OFFSET(controller->pio_io_index) + port;
+
+ /*
+ * We need to keep the PCI bus address's in-page offset in the VA.
+ */
+ return iorpc_ioremap(trio_fd, offset, size) + (port & (PAGE_SIZE - 1));
+}
+EXPORT_SYMBOL(ioport_map);
+
+void ioport_unmap(void __iomem *addr)
+{
+ iounmap(addr);
+}
+EXPORT_SYMBOL(ioport_unmap);
+
+/*
+ * Create a virtual mapping cookie for a PCI BAR (memory or IO).
+ */
+void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max)
+{
+ resource_size_t start = pci_resource_start(dev, bar);
+ resource_size_t len = pci_resource_len(dev, bar);
+ unsigned long flags = pci_resource_flags(dev, bar);
+
+ if (!len)
+ return NULL;
+ if (max && len > max)
+ len = max;
+ if (flags & IORESOURCE_IO)
+ return ioport_map(start, len);
+ if (flags & IORESOURCE_MEM)
+ return ioremap(start, len);
+
+ pr_err("PCI: Trying to map invalid resource %#lx\n", flags);
+ return NULL;
+}
+EXPORT_SYMBOL(pci_iomap);
+
void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
{
iounmap(addr);
@@ -1478,32 +1702,55 @@
trio_context = controller->trio;

/*
- * Allocate the Mem-Map that will accept the MSI write and
- * trigger the TILE-side interrupts.
- */
- mem_map = gxio_trio_alloc_memory_maps(trio_context, 1, 0, 0);
- if (mem_map < 0) {
- dev_printk(KERN_INFO, &pdev->dev,
- "%s Mem-Map alloc failure. "
- "Failed to initialize MSI interrupts. "
- "Falling back to legacy interrupts.\n",
- desc->msi_attrib.is_msix ? "MSI-X" : "MSI");
+ * Allocate a scatter-queue that will accept the MSI write and
+ * trigger the TILE-side interrupts. We use the scatter-queue regions
+ * before the mem map regions, because the latter are needed by more
+ * applications.
+ */
+ mem_map = gxio_trio_alloc_scatter_queues(trio_context, 1, 0, 0);
+ if (mem_map >= 0) {
+ TRIO_MAP_SQ_DOORBELL_FMT_t doorbell_template = {{
+ .pop = 0,
+ .doorbell = 1,
+ }};
+
+ mem_map += TRIO_NUM_MAP_MEM_REGIONS;
+ mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
+ mem_map * MEM_MAP_INTR_REGION_SIZE;
+ mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
+
+ msi_addr = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 8;
+ msg.data = (unsigned int)doorbell_template.word;
+ } else {
+ /* SQ regions are out, allocate from map mem regions. */
+ mem_map = gxio_trio_alloc_memory_maps(trio_context, 1, 0, 0);
+ if (mem_map < 0) {
+ dev_printk(KERN_INFO, &pdev->dev,
+ "%s Mem-Map alloc failure. "
+ "Failed to initialize MSI interrupts. "
+ "Falling back to legacy interrupts.\n",
+ desc->msi_attrib.is_msix ? "MSI-X" : "MSI");
+ ret = -ENOMEM;
+ goto msi_mem_map_alloc_failure;
+ }

- ret = -ENOMEM;
- goto msi_mem_map_alloc_failure;
+ mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
+ mem_map * MEM_MAP_INTR_REGION_SIZE;
+ mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
+
+ msi_addr = mem_map_base + TRIO_MAP_MEM_REG_INT3 -
+ TRIO_MAP_MEM_REG_INT0;
+
+ msg.data = mem_map;
}

/* We try to distribute different IRQs to different tiles. */
cpu = tile_irq_cpu(irq);

/*
- * Now call up to the HV to configure the Mem-Map interrupt and
+ * Now call up to the HV to configure the MSI interrupt and
* set up the IPI binding.
*/
- mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
- mem_map * MEM_MAP_INTR_REGION_SIZE;
- mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
-
ret = gxio_trio_config_msi_intr(trio_context, cpu_x(cpu), cpu_y(cpu),
KERNEL_PL, irq, controller->mac,
mem_map, mem_map_base, mem_map_limit,
@@ -1516,13 +1763,9 @@

irq_set_msi_desc(irq, desc);

- msi_addr = mem_map_base + TRIO_MAP_MEM_REG_INT3 - TRIO_MAP_MEM_REG_INT0;
-
msg.address_hi = msi_addr >> 32;
msg.address_lo = msi_addr & 0xffffffff;

- msg.data = mem_map;
-
write_msi_msg(irq, &msg);
irq_set_chip_and_handler(irq, &tilegx_msi_chip, handle_level_irq);
irq_set_handler_data(irq, controller);


What we got after my fix:

pci 0000:00:00.0: BAR 8: assigned [mem 0x100c0000000-0x100c00fffff]
pci 0000:00:00.0: BAR 9: assigned [mem 0x100c0100000-0x100c01fffff pref]
pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff]
pci 0000:01:00.0: BAR 6: assigned [mem 0x100c0100000-0x100c013ffff pref]
pci 0000:01:00.0: BAR 6: set to [mem 0x100c0100000-0x100c013ffff pref] (PCI address [0xc0100000-0xc013ffff])
pci 0000:01:00.0: BAR 4: assigned [mem 0x100c0000000-0x100c000ffff 64bit]
pci 0000:01:00.0: BAR 4: set to [mem 0x100c0000000-0x100c000ffff 64bit] (PCI address [0xc0000000-0xc000ffff])
pci 0000:01:00.0: BAR 2: assigned [io 0x1000-0x107f]
pci 0000:01:00.0: BAR 2: set to [io 0x1000-0x107f] (PCI address [0x1000-0x107f])
pci 0000:00:00.0: PCI bridge to [bus 01-01]
pci 0000:00:00.0: bridge window [io 0x1000-0x1fff]
pci 0000:00:00.0: bridge window [mem 0x100c0000000-0x100c00fffff]
pci 0000:00:00.0: bridge window [mem 0x100c0100000-0x100c01fffff pref]
pci 0001:00:00.0: BAR 8: assigned [mem 0x101c0000000-0x101c00fffff]
pci 0001:00:00.0: BAR 9: assigned [mem 0x101c0100000-0x101c01fffff pref]
pci 0001:00:00.0: BAR 7: assigned [io 0x80001000-0x80001fff]
pci 0001:01:00.0: BAR 6: assigned [mem 0x101c0100000-0x101c013ffff pref]
pci 0001:01:00.0: BAR 6: set to [mem 0x101c0100000-0x101c013ffff pref] (PCI address [0xc0100000-0xc013ffff])
pci 0001:01:00.0: BAR 4: assigned [mem 0x101c0000000-0x101c000ffff 64bit]
pci 0001:01:00.0: BAR 4: set to [mem 0x101c0000000-0x101c000ffff 64bit] (PCI address [0xc0000000-0xc000ffff])
pci 0001:01:00.0: BAR 2: assigned [io 0x80001000-0x8000107f]
pci 0001:01:00.0: BAR 2: set to [io 0x80001000-0x8000107f] (PCI address [0x80001000-0x8000107f])
pci 0001:00:00.0: PCI bridge to [bus 01-01]
pci 0001:00:00.0: bridge window [io 0x80001000-0x80001fff]
pci 0001:00:00.0: bridge window [mem 0x101c0000000-0x101c00fffff]
pci 0001:00:00.0: bridge window [mem 0x101c0100000-0x101c01fffff pref]
pci 0000:00:00.0: enabling device (0006 -> 0007)
pci 0001:00:00.0: enabling device (0006 -> 0007)
pci_bus 0000:00: resource 0 [io 0x1000-0x800007ff]
pci_bus 0000:00: resource 1 [mem 0x100c0000000-0x100ffffffff]
pci_bus 0000:01: resource 0 [io 0x1000-0x1fff]
pci_bus 0000:01: resource 1 [mem 0x100c0000000-0x100c00fffff]
pci_bus 0000:01: resource 2 [mem 0x100c0100000-0x100c01fffff pref]
pci_bus 0001:00: resource 0 [io 0x80000800-0xffffffff]
pci_bus 0001:00: resource 1 [mem 0x101c0000000-0x101ffffffff]
pci_bus 0001:01: resource 0 [io 0x80001000-0x80001fff]
pci_bus 0001:01: resource 1 [mem 0x101c0000000-0x101c00fffff]
pci_bus 0001:01: resource 2 [mem 0x101c0100000-0x101c01fffff pref]
......
mvsas 0000:01:00.0: mvsas: driver version 0.8.2
mvsas 0000:01:00.0: enabling device (0000 -> 0003)
mvsas 0000:01:00.0: enabling bus mastering
mvsas 0000:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
scsi0 : mvsas
......
mvsas 0001:01:00.0: mvsas: driver version 0.8.2
mvsas 0001:01:00.0: enabling device (0000 -> 0003)
mvsas 0001:01:00.0: enabling bus mastering
mvsas 0001:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
scsi1 : mvsas


It works now, but I really need someone to confirm whether my
modification is sufficient and whether there are other potential problems.
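
For anyone who wants to check the arithmetic without booting a board, here is
a minimal user-space sketch of the split my patch performs. It is not kernel
code: the struct io_window type and the io_window_for() helper are names made
up for this illustration, and IO_SPACE_LIMIT is assumed to be 0xffffffff
because that is what the root bus lines in the first log suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: IO_SPACE_LIMIT is 0xffffffff, as the "before" log suggests. */
#define IO_SPACE_LIMIT 0xffffffffULL

/* Hypothetical helper type: one controller's I/O window, end inclusive. */
struct io_window {
	uint64_t start;
	uint64_t end;
};

/*
 * Same arithmetic as the patch: skip the first 4 KiB of I/O space,
 * then split the remainder evenly among the RC controllers.
 */
static struct io_window io_window_for(int index, int num_controllers)
{
	uint64_t io_start = 4096;
	uint64_t io_end = IO_SPACE_LIMIT + 1;
	uint64_t io_size;
	struct io_window w;

	assert(num_controllers > 0);
	assert(index >= 0 && index < num_controllers);

	io_size = (io_end - io_start) / (uint64_t)num_controllers;
	io_size &= ~3ULL;	/* keep the per-controller size 4-byte aligned */

	w.start = io_start + (uint64_t)index * io_size;
	w.end = w.start + io_size - 1;
	return w;
}
```

With two controllers this gives 0x1000-0x800007ff for controller 0 and
0x80000800-0xffffffff for controller 1, which matches the two
"pci_bus ... resource 0" lines above. Note that the second window starts at
0x80000800, which is not 4 KiB aligned; the log shows the bridge window for
domain 0001 being placed at 0x80001000 instead. Rounding the per-controller
size down to a multiple of 4096 rather than 4 might be cleaner, but that is
only a guess on my part.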



Best regards.

--
Cyberman Wu


2012-10-26 08:03:37

by Bjorn Helgaas

Subject: Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

[+cc Chris, also a few comments below]

On Fri, Oct 26, 2012 at 12:59 AM, Cyberman Wu <[email protected]> wrote:
> After we upgrade to MDE 4.1.0 from Tilera, we encounter a problem that
> only on HighPoint 2680 card works, I've
> tried to fix it, but since most time I'm working in user space, I'm
> not sure my fix is enough. Their FAE said that
> the guy who add PCIe I/O space support is on vacation and I can't get
> help from him now, I hope maybe there
> will have somebody can help.
>
>
> Problem we encountered:
>
> pci 0000:00:00.0: BAR 8: assigned [mem 0x100c0000000-0x100c00fffff]
> pci 0000:00:00.0: BAR 9: assigned [mem 0x100c0100000-0x100c01fffff pref]
> pci 0000:00:00.0: BAR 7: assigned [io 0x0000-0x0fff]
> pci 0000:01:00.0: BAR 6: assigned [mem 0x100c0100000-0x100c013ffff pref]
> pci 0000:01:00.0: BAR 6: set to [mem 0x100c0100000-0x100c013ffff pref]
> (PCI address [0xc0100000-0xc013ffff])
> pci 0000:01:00.0: BAR 4: assigned [mem 0x100c0000000-0x100c000ffff 64bit]
> pci 0000:01:00.0: BAR 4: set to [mem 0x100c0000000-0x100c000ffff
> 64bit] (PCI address [0xc0000000-0xc000ffff])
> pci 0000:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
> pci 0000:01:00.0: BAR 2: set to [io 0x0000-0x007f] (PCI address [0x0-0x7f])
> pci 0000:00:00.0: PCI bridge to [bus 01-01]
> pci 0000:00:00.0: bridge window [io 0x0000-0x0fff]
> pci 0000:00:00.0: bridge window [mem 0x100c0000000-0x100c00fffff]
> pci 0000:00:00.0: bridge window [mem 0x100c0100000-0x100c01fffff pref]
> pci 0001:00:00.0: BAR 8: assigned [mem 0x101c0000000-0x101c00fffff]
> pci 0001:00:00.0: BAR 9: assigned [mem 0x101c0100000-0x101c01fffff pref]
> pci 0001:00:00.0: BAR 7: assigned [io 0x0000-0x0fff]
> pci 0001:01:00.0: BAR 6: assigned [mem 0x101c0100000-0x101c013ffff pref]
> pci 0001:01:00.0: BAR 6: set to [mem 0x101c0100000-0x101c013ffff pref]
> (PCI address [0xc0100000-0xc013ffff])
> pci 0001:01:00.0: BAR 4: assigned [mem 0x101c0000000-0x101c000ffff 64bit]
> pci 0001:01:00.0: BAR 4: set to [mem 0x101c0000000-0x101c000ffff
> 64bit] (PCI address [0xc0000000-0xc000ffff])
> pci 0001:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
> pci 0001:01:00.0: BAR 2: set to [io 0x0000-0x007f] (PCI address [0x0-0x7f])
> pci 0001:00:00.0: PCI bridge to [bus 01-01]
> pci 0001:00:00.0: bridge window [io 0x0000-0x0fff]
> pci 0001:00:00.0: bridge window [mem 0x101c0000000-0x101c00fffff]
> pci 0001:00:00.0: bridge window [mem 0x101c0100000-0x101c01fffff pref]
> pci 0000:00:00.0: enabling device (0006 -> 0007)
> pci 0001:00:00.0: enabling device (0006 -> 0007)
> pci_bus 0000:00: resource 0 [io 0x0000-0xffffffff]
> pci_bus 0000:00: resource 1 [mem 0x100c0000000-0x100ffffffff]
> pci_bus 0000:01: resource 0 [io 0x0000-0x0fff]
> pci_bus 0000:01: resource 1 [mem 0x100c0000000-0x100c00fffff]
> pci_bus 0000:01: resource 2 [mem 0x100c0100000-0x100c01fffff pref]
> pci_bus 0001:00: resource 0 [io 0x0000-0xffffffff]
> pci_bus 0001:00: resource 1 [mem 0x101c0000000-0x101ffffffff]
> pci_bus 0001:01: resource 0 [io 0x0000-0x0fff]
> pci_bus 0001:01: resource 1 [mem 0x101c0000000-0x101c00fffff]
> pci_bus 0001:01: resource 2 [mem 0x101c0100000-0x101c01fffff pref]
> ......
> mvsas 0000:01:00.0: mvsas: driver version 0.8.2
> mvsas 0000:01:00.0: enabling device (0000 -> 0003)
> mvsas 0000:01:00.0: enabling bus mastering
> mvsas 0000:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
> mvsas 0000:01:00.0: Phy3 : No sig fis
> scsi0 : mvsas
> ......
> mvsas 0001:01:00.0: mvsas: driver version 0.8.2
> mvsas 0001:01:00.0: enabling device (0000 -> 0003)
> mvsas 0001:01:00.0: enabling bus mastering
> mvsas 0001:01:00.0: BAR 2: can't reserve [io 0x0000-0x007f]
> mvsas: probe of 0001:01:00.0 failed with error -16
>
>
> My modification:
>
> --- /opt/tilera/TileraMDE-4.1.0.148119/tilegx/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-22
> 14:56:59.783096378 +0800
> +++ Tilera_src/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-26
> 13:55:02.731947886 +0800
> @@ -368,6 +368,10 @@
> int num_trio_shims = 0;
> int ctl_index = 0;
> int i, j;
> + // Modified by Cyberman Wu on Oct 25th, 2012.
> + resource_size_t io_mem_start;
> + resource_size_t io_mem_end;
> + resource_size_t io_mem_size;
>
> if (!pci_probe) {
> pr_info("PCI: disabled by boot argument\n");
> @@ -457,6 +461,18 @@
> }
>
> out:
> + // Use IO memory space 0~0xffffffff for every controller will
> + // cause device on controller other than the first failed to
> + // load driver if it using IO regions.
> + // Is reserve the first 4K IO address space OK? Tilera use
> + // IO space address begin from 0, but some drivers in Linux
> + // recognize 0 address a error, say, mvsas, so for compatiblity
> + // reserve some address from 0 should be better?

It's not that mvsas thinks I/O address 0 is invalid, it's just that we
already assigned [io 0x0000-0x007f] to the device at 0000:01:00.0:

pci 0000:01:00.0: BAR 2: set to [io 0x0000-0x007f]

so that range can't also be assigned to 0001:01:00.0.
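
Roughly, the conflict looks like the user-space sketch below. This is only an
illustration, not the kernel's resource code: claim_io_range() and the fixed
array are hypothetical stand-ins for a single shared I/O port resource tree.
The point is that both PCI domains claim from the same namespace, so the
second claim of an overlapping range fails with -EBUSY, which is the
"error -16" in your log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one claimed I/O port range, end inclusive. */
struct io_range {
	uint64_t start;
	uint64_t end;
};

#define MAX_RANGES 16

/* One table shared by every PCI domain, like a single global resource tree. */
static struct io_range claimed[MAX_RANGES];
static size_t num_claimed;

/*
 * Returns 0 on success, -EBUSY if the range overlaps an earlier claim,
 * or -ENOMEM if the table is full.
 */
static int claim_io_range(uint64_t start, uint64_t end)
{
	size_t i;

	assert(start <= end);

	for (i = 0; i < num_claimed; i++) {
		/* Two inclusive ranges overlap if each starts before the other ends. */
		if (start <= claimed[i].end && claimed[i].start <= end)
			return -EBUSY;
	}

	if (num_claimed == MAX_RANGES)
		return -ENOMEM;

	claimed[num_claimed].start = start;
	claimed[num_claimed].end = end;
	num_claimed++;
	return 0;
}
```

Claiming 0x0000-0x007f for 0000:01:00.0 succeeds, a second claim of the same
range for 0001:01:00.0 returns -EBUSY, and a claim of a disjoint range such as
0x80001000-0x8000107f succeeds.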

> + // Modified by Cyberman Wu on Oct 25th, 2012.
> + io_mem_start = 4096;
> + io_mem_end = (resource_size_t)IO_SPACE_LIMIT + 1;
> + io_mem_size = (io_mem_end - io_mem_start) / num_rc_controllers;
> + io_mem_size &= ~3;
> /*
> * Configure each PCIe RC port.
> */
> @@ -470,8 +486,9 @@
> controller->index = i;
> controller->ops = &tile_cfg_ops;
>
> - controller->io_space.start = 0;
> - controller->io_space.end = IO_SPACE_LIMIT;
> + // Modified by Cyberman Wu on Oct 25th, 2012.
> + controller->io_space.start = io_mem_start + (i * io_mem_size);
> + controller->io_space.end = controller->io_space.start + io_mem_size - 1;
> controller->io_space.flags = IORESOURCE_IO;
> snprintf(controller->io_space_name,
> sizeof(controller->io_space_name),
>
>
> Please note that we're using MDE-4.1.0, which use kernel 3.0.38, patch
> it and reversion it
> to 2.6.40.38.
> I've checked source code under arch/tile of kernel 3.6.3 and PCIe I/O
> space support is still
> not here. Below is diff of arch/tile/pci_gx.c between kernel 3.6.3 and
> MDE-4.1.0:

Per http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/01176.html,
Chris considered adding I/O space support and decided against it at
that time, partly because it would use up a TRIO PIO region.

I don't know his current thoughts. Possibly it could be done under a
config option or something.

But of course, you'd have to do it by adding I/O space support to the
current 3.6 kernel *without* reverting all the other changes that have
been made since 2.6.40.

> --- .cache/.fr-9Oo37J/linux-3.6.3/arch/tile/kernel/pci_gx.c 2012-10-22
> 00:32:56.000000000 +0800
> +++ /opt/tilera/TileraMDE-4.1.0.148119/tilegx/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-22
> 14:56:59.783096378 +0800
> @@ -69,19 +69,18 @@
> * a HW PCIe link-training bug. The exact delay is specified with
> * a kernel boot argument in the form of "pcie_rc_delay=T,P,S",
> * where T is the TRIO instance number, P is the port number and S is
> - * the delay in seconds. If the delay is not provided, the value
> - * will be DEFAULT_RC_DELAY.
> + * the delay in seconds. If the argument is specified, but the delay is
> + * not provided, the value will be DEFAULT_RC_DELAY.
> */
> static int __devinitdata rc_delay[TILEGX_NUM_TRIO][TILEGX_TRIO_PCIES];
>
> /* Default number of seconds that the PCIe RC port probe can be delayed. */
> #define DEFAULT_RC_DELAY 10
>
> -/* Max number of seconds that the PCIe RC port probe can be delayed. */
> -#define MAX_RC_DELAY 20
> -
> +#if !defined(GX_FPGA)
> /* Array of the PCIe ports configuration info obtained from the BIB. */
> struct pcie_port_property pcie_ports[TILEGX_NUM_TRIO][TILEGX_TRIO_PCIES];
> +#endif
>
> /* All drivers share the TRIO contexts defined here. */
> gxio_trio_context_t trio_contexts[TILEGX_NUM_TRIO];
> @@ -97,6 +96,41 @@
> static struct cpumask intr_cpus_map;
>
> /*
> + * Convert a resource to a PCI device bus address or bus window.
> + */
> +void __devinit
> +pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
> + struct resource *res)
> +{
> + struct pci_controller *controller =
> + (struct pci_controller *)dev->sysdata;
> + unsigned long offset = 0;
> +
> + if (res->flags & IORESOURCE_MEM)
> + offset = controller->mem_offset;
> +
> + region->start = res->start - offset;
> + region->end = res->end - offset;
> +}
> +EXPORT_SYMBOL(pcibios_resource_to_bus);
> +
> +void __devinit
> +pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
> + struct pci_bus_region *region)
> +{
> + struct pci_controller *controller =
> + (struct pci_controller *)dev->sysdata;
> + unsigned long offset = 0;
> +
> + if (res->flags & IORESOURCE_MEM)
> + offset = controller->mem_offset;
> +
> + res->start = region->start + offset;
> + res->end = region->end + offset;
> +}
> +EXPORT_SYMBOL(pcibios_bus_to_resource);
> +
> +/*
> * We don't need to worry about the alignment of resources.
> */
> resource_size_t pcibios_align_resource(void *data, const struct resource *res,
> @@ -274,6 +308,10 @@
>
> cpumask_copy(&intr_cpus_map, cpu_online_mask);
>
> +#ifdef CONFIG_DATAPLANE
> + /* Remove dataplane cpus. */
> + cpumask_andnot(&intr_cpus_map, &intr_cpus_map, &dataplane_map);
> +#endif
>
> for (i = 0; i < 4; i++) {
> gxio_trio_context_t *context = controller->trio;
> @@ -325,7 +363,7 @@
> *
> * Returns the number of controllers discovered.
> */
> -int __init tile_pci_init(void)
> +int __devinit tile_pci_init(void)
> {
> int num_trio_shims = 0;
> int ctl_index = 0;
> @@ -359,6 +397,7 @@
> * We look at the Board Information Block first and then see if there
> * are any overriding configuration by the HW strapping pin.
> */
> +#if !defined(GX_FPGA)
> for (i = 0; i < TILEGX_NUM_TRIO; i++) {
> gxio_trio_context_t *context = &trio_contexts[i];
> int ret;
> @@ -386,6 +425,13 @@
> }
> }
> }
> +#else
> + /*
> + * For now, just assume that there is a single RC port on trio/0.
> + */
> + num_rc_controllers = 1;
> + pcie_rc[0][2] = 1;
> +#endif
>
> /*
> * Return if no PCIe ports are configured to operate in RC mode.
> @@ -424,13 +470,20 @@
> controller->index = i;
> controller->ops = &tile_cfg_ops;
>
> + controller->io_space.start = 0;
> + controller->io_space.end = IO_SPACE_LIMIT;
> + controller->io_space.flags = IORESOURCE_IO;
> + snprintf(controller->io_space_name,
> + sizeof(controller->io_space_name),
> + "PCI I/O domain %d", i);
> + controller->io_space.name = controller->io_space_name;
> +
> /*
> * The PCI memory resource is located above the PA space.
> * For every host bridge, the BAR window or the MMIO aperture
> * is in range [3GB, 4GB - 1] of a 4GB space beyond the
> * PA space.
> */
> -
> controller->mem_offset = TILE_PCI_MEM_START +
> (i * TILE_PCI_BAR_WINDOW_TOP);
> controller->mem_space.start = controller->mem_offset +
> @@ -451,7 +504,7 @@
> * (pin - 1) converts from the PCI standard's [1:4] convention to
> * a normal [0:3] range.
> */
> -static int tile_map_irq(const struct pci_dev *dev, u8 device, u8 pin)
> +static int tile_map_irq(struct pci_dev *dev, u8 device, u8 pin)
> {
> struct pci_controller *controller =
> (struct pci_controller *)dev->sysdata;
> @@ -463,11 +516,12 @@
> controller)
> {
> gxio_trio_context_t *trio_context = controller->trio;
> - struct pci_bus *root_bus = controller->root_bus;
> TRIO_PCIE_RC_DEVICE_CONTROL_t dev_control;
> TRIO_PCIE_RC_DEVICE_CAP_t rc_dev_cap;
> + unsigned int smallest_max_payload;
> + struct pci_dev *dev = NULL;
> unsigned int reg_offset;
> - struct pci_bus *child;
> + u16 new_values;
> int mac;
> int err;
>
> @@ -508,33 +562,59 @@
> __gxio_mmio_write32(trio_context->mmio_base_mac + reg_offset,
> rc_dev_cap.word);
>
> - /* Configure PCI Express MPS setting. */
> - list_for_each_entry(child, &root_bus->children, node) {
> - struct pci_dev *self = child->self;
> - if (!self)
> + smallest_max_payload = rc_dev_cap.mps_sup;
> +
> + /* Scan for the smallest maximum payload size. */
> + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
> + int pcie_caps_offset;
> + u32 devcap;
> + int max_payload;
> +
> + /* Skip device that is not in this PCIe domain. */
> + if ((struct pci_controller *)dev->sysdata != controller)
> continue;
>
> - pcie_bus_configure_settings(child, self->pcie_mpss);
> + pcie_caps_offset = pci_find_capability(dev, PCI_CAP_ID_EXP);
> + if (pcie_caps_offset == 0)
> + continue;
> +
> + pci_read_config_dword(dev, pcie_caps_offset + PCI_EXP_DEVCAP,
> + &devcap);
> + max_payload = devcap & PCI_EXP_DEVCAP_PAYLOAD;
> + if (max_payload < smallest_max_payload)
> + smallest_max_payload = max_payload;
> + }
> +
> + /* Now, set the max_payload_size for all devices to that value. */
> + new_values = smallest_max_payload << 5;
> + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
> + int pcie_caps_offset;
> + u16 devctl;
> +
> + /* Skip device that is not in this PCIe domain. */
> + if ((struct pci_controller *)dev->sysdata != controller)
> + continue;
> +
> + pcie_caps_offset = pci_find_capability(dev, PCI_CAP_ID_EXP);
> + if (pcie_caps_offset == 0)
> + continue;
> +
> + pci_read_config_word(dev, pcie_caps_offset + PCI_EXP_DEVCTL,
> + &devctl);
> + devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
> + devctl |= new_values;
> + pci_write_config_word(dev, pcie_caps_offset + PCI_EXP_DEVCTL,
> + devctl);
> }
>
> /*
> * Set the mac_config register in trio based on the MPS/MRS of the link.
> */
> - reg_offset =
> - (TRIO_PCIE_RC_DEVICE_CONTROL <<
> - TRIO_CFG_REGION_ADDR__REG_SHIFT) |
> - (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_STANDARD <<
> - TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
> - (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
> -
> - dev_control.word = __gxio_mmio_read32(trio_context->mmio_base_mac +
> - reg_offset);
> -
> err = gxio_trio_set_mps_mrs(trio_context,
> - dev_control.max_payload_size,
> + smallest_max_payload,
> dev_control.max_read_req_sz,
> mac);
> - if (err < 0) {
> + if (err < 0) {
> pr_err("PCI: PCIE_CONFIGURE_MAC_MPS_MRS failure, "
> "MAC %d on TRIO %d\n",
> mac, controller->trio_index);
> @@ -571,14 +651,9 @@
> if (!isdigit(*str))
> return -EINVAL;
> delay = simple_strtoul(str, (char **)&str, 10);
> - if (delay > MAX_RC_DELAY)
> - return -EINVAL;
> }
>
> rc_delay[trio_index][mac] = delay ? : DEFAULT_RC_DELAY;
> - pr_info("Delaying PCIe RC link training for %u sec"
> - " on MAC %lu on TRIO %lu\n", rc_delay[trio_index][mac],
> - mac, trio_index);
> return 0;
> }
> early_param("pcie_rc_delay", setup_pcie_rc_delay);
> @@ -586,18 +661,14 @@
> /*
> * PCI initialization entry point, called by subsys_initcall.
> */
> -int __init pcibios_init(void)
> +int __devinit pcibios_init(void)
> {
> resource_size_t offset;
> - LIST_HEAD(resources);
> int next_busno;
> int i;
>
> tile_pci_init();
>
> - if (num_rc_controllers == 0 && num_ep_controllers == 0)
> - return 0;
> -
> /*
> * We loop over all the TRIO shims and set up the MMIO mappings.
> */
> @@ -623,6 +694,9 @@
> }
> }
>
> + if (num_rc_controllers == 0 && num_ep_controllers == 0)
> + return 0;
> +
> /*
> * Delay a bit in case devices aren't ready. Some devices are
> * known to require at least 20ms here, but we use a more
> @@ -684,15 +758,36 @@
> }
>
> /*
> - * Delay the RC link training if needed.
> + * Delay the bus probe if needed.
> */
> - if (rc_delay[trio_index][mac])
> + if (rc_delay[trio_index][mac]) {
> + pr_info("Delaying PCIe RC link training for %d sec"
> + " on MAC %d on TRIO %d\n",
> + rc_delay[trio_index][mac], mac,
> + trio_index);
> msleep(rc_delay[trio_index][mac] * 1000);
> + }
>
> - ret = gxio_trio_force_rc_link_up(trio_context, mac);
> - if (ret < 0)
> - pr_err("PCI: PCIE_FORCE_LINK_UP failure, "
> - "MAC %d on TRIO %d\n", mac, trio_index);
> + /*
> + * Check for PCIe link-up status to decide if we need
> + * to force the link to come up.
> + */
> + reg_offset =
> + (TRIO_PCIE_INTFC_PORT_STATUS <<
> + TRIO_CFG_REGION_ADDR__REG_SHIFT) |
> + (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_INTERFACE <<
> + TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
> + (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
> +
> + port_status.word =
> + __gxio_mmio_read(trio_context->mmio_base_mac +
> + reg_offset);
> + if (!port_status.dl_up) {
> + ret = gxio_trio_force_rc_link_up(trio_context, mac);
> + if (ret < 0)
> + pr_err("PCI: PCIE_FORCE_LINK_UP failure, "
> + "MAC %d on TRIO %d\n", mac, trio_index);
> + }
>
> pr_info("PCI: Found PCI controller #%d on TRIO %d MAC %d\n", i,
> trio_index, controller->mac);
> @@ -704,22 +799,20 @@
> msleep(1000);
>
> /*
> - * Check for PCIe link-up status.
> + * Check for PCIe link-up status again.
> */
> -
> - reg_offset =
> - (TRIO_PCIE_INTFC_PORT_STATUS <<
> - TRIO_CFG_REGION_ADDR__REG_SHIFT) |
> - (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_INTERFACE <<
> - TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
> - (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
> -
> port_status.word =
> __gxio_mmio_read(trio_context->mmio_base_mac +
> reg_offset);
> if (!port_status.dl_up) {
> - pr_err("PCI: link is down, MAC %d on TRIO %d\n",
> - mac, trio_index);
> + if (pcie_ports[trio_index][mac].removable) {
> + pr_info("PCI: link is down, MAC %d on TRIO %d",
> + mac, trio_index);
> + pr_info("This is expected if no PCIe card"
> + " is connected to this link");
> + } else
> + pr_err("PCI: link is down, MAC %d on TRIO %d",
> + mac, trio_index);
> continue;
> }
>
> @@ -842,19 +935,22 @@
> }
>
> /*
> - * The PCI memory resource is located above the PA space.
> - * The memory range for the PCI root bus should not overlap
> - * with the physical RAM
> + * This comes from the generic Linux PCI driver.
> + *
> + * It reads the PCI tree for this bus into the Linux
> + * data structures.
> + *
> + * This is inlined in linux/pci.h and calls into
> + * pci_scan_bus_parented() in probe.c.
> */
> - pci_add_resource_offset(&resources, &controller->mem_space,
> - controller->mem_offset);
> -
> - controller->first_busno = next_busno;
> - bus = pci_scan_root_bus(NULL, next_busno, controller->ops,
> - controller, &resources);
> + controller->first_busno= next_busno;
> + bus = pci_scan_bus(next_busno, controller->ops, controller);
> controller->root_bus = bus;
> - next_busno = bus->busn_res.end + 1;
> -
> +#if 0
> + next_busno = bus->subordinate + 1;
> +#else
> + next_busno = 0;
> +#endif
> }
>
> /* Do machine dependent PCI interrupt routing */
> @@ -951,6 +1047,37 @@
> }
>
> /*
> + * Alloc a PIO region for PCI I/O space access for each RC port.
> + */
> + ret = gxio_trio_alloc_pio_regions(trio_context, 1, 0, 0);
> + if (ret < 0) {
> + pr_err("PCI: I/O PIO alloc failure on TRIO %d mac %d, "
> + "give up\n", controller->trio_index,
> + controller->mac);
> +
> + continue;
> + }
> +
> + controller->pio_io_index = ret;
> +
> + /*
> + * For PIO IO, the bus_address_hi parameter is hard-coded 0
> + * because PCI I/O address space is 32-bit.
> + */
> + ret = gxio_trio_init_pio_region_aux(trio_context,
> + controller->pio_io_index,
> + controller->mac,
> + 0,
> + HV_TRIO_PIO_FLAG_IO_SPACE);
> + if (ret < 0) {
> + pr_err("PCI: I/O PIO init failure on TRIO %d mac %d, "
> + "give up\n", controller->trio_index,
> + controller->mac);
> +
> + continue;
> + }
> +
> + /*
> * Configure a Mem-Map region for each memory controller so
> * that Linux can map all of its PA space to the PCI bus.
> * Use the IOMMU to handle hash-for-home memory.
> @@ -1015,9 +1142,22 @@
> }
> subsys_initcall(pcibios_init);
>
> -/* Note: to be deleted after Linux 3.6 merge. */
> +/*
> + * PCI scan code calls the arch specific pcibios_fixup_bus() each time it scans
> + * a new bridge. Called after each bus is probed, but before its children are
> + * examined.
> + */
> void __devinit pcibios_fixup_bus(struct pci_bus *bus)
> {
> + struct pci_dev *dev = bus->self;
> +
> + if (!dev) {
> + struct pci_controller *controller = bus->sysdata;
> +
> + /* This is the root bus. */
> + bus->resource[0] = &controller->io_space;
> + bus->resource[1] = &controller->mem_space;
> + }
> }
>
> /*
> @@ -1043,8 +1183,7 @@
>
> /*
> * Enable memory address decoding, as appropriate, for the
> - * device described by the 'dev' struct. The I/O decoding
> - * is disabled, though the TILE-Gx supports I/O addressing.
> + * device described by the 'dev' struct.
> *
> * This is called from the generic PCI layer, and can be called
> * for bridges or endpoints.
> @@ -1126,10 +1265,95 @@
> * We need to keep the PCI bus address's in-page offset in the VA.
> */
> return iorpc_ioremap(trio_fd, offset, size) +
> - (phys_addr & (PAGE_SIZE - 1));
> + (start & (PAGE_SIZE - 1));
> }
> EXPORT_SYMBOL(ioremap);
>
> +/* Map a PCI I/O address into VA space. */
> +void __iomem *ioport_map(unsigned long port, unsigned int size)
> +{
> + struct pci_controller *controller = NULL;
> + resource_size_t bar_start;
> + resource_size_t bar_end;
> + resource_size_t offset;
> + resource_size_t start;
> + resource_size_t end;
> + int trio_fd;
> + int i;
> +
> + start = port;
> + end = port + size - 1;
> +
> + /*
> + * In the following, each PCI controller's mem_resources[0]
> + * represents its PCI I/O resource. By searching port in each
> + * controller's mem_resources[0], we can determine the controller
> + * that should accept the PCI I/O access.
> + */
> +
> + for (i = 0; i < num_rc_controllers; i++) {
> + /*
> + * Skip controllers that are not properly initialized or
> + * have down links.
> + */
> + if (pci_controllers[i].root_bus == NULL)
> + continue;
> +
> + bar_start = pci_controllers[i].mem_resources[0].start;
> + bar_end = pci_controllers[i].mem_resources[0].end;
> +
> + if ((start >= bar_start) && (end <= bar_end)) {
> +
> + controller = &pci_controllers[i];
> +
> + goto got_it;
> + }
> + }
> +
> + if (controller == NULL)
> + return NULL;
> +
> +got_it:
> + trio_fd = controller->trio->fd;
> +
> + offset = HV_TRIO_PIO_OFFSET(controller->pio_io_index) + port;
> +
> + /*
> + * We need to keep the PCI bus address's in-page offset in the VA.
> + */
> + return iorpc_ioremap(trio_fd, offset, size) + (port & (PAGE_SIZE - 1));
> +}
> +EXPORT_SYMBOL(ioport_map);
> +
> +void ioport_unmap(void __iomem *addr)
> +{
> + iounmap(addr);
> +}
> +EXPORT_SYMBOL(ioport_unmap);
> +
> +/*
> + * Create a virtual mapping cookie for a PCI BAR (memory or IO).
> + */
> +void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max)
> +{
> + resource_size_t start = pci_resource_start(dev, bar);
> + resource_size_t len = pci_resource_len(dev, bar);
> + unsigned long flags = pci_resource_flags(dev, bar);
> +
> + if (!len)
> + return NULL;
> + if (max && len > max)
> + len = max;
> + if (flags & IORESOURCE_IO)
> + return ioport_map(start, len);
> + if (flags & IORESOURCE_MEM)
> + return ioremap(start, len);
> +
> + pr_err("PCI: Trying to map invalid resource %#lx\n", flags);
> + return NULL;
> +}
> +EXPORT_SYMBOL(pci_iomap);
> +
> void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
> {
> iounmap(addr);
> @@ -1478,32 +1702,55 @@
> trio_context = controller->trio;
>
> /*
> - * Allocate the Mem-Map that will accept the MSI write and
> - * trigger the TILE-side interrupts.
> - */
> - mem_map = gxio_trio_alloc_memory_maps(trio_context, 1, 0, 0);
> - if (mem_map < 0) {
> - dev_printk(KERN_INFO, &pdev->dev,
> - "%s Mem-Map alloc failure. "
> - "Failed to initialize MSI interrupts. "
> - "Falling back to legacy interrupts.\n",
> - desc->msi_attrib.is_msix ? "MSI-X" : "MSI");
> + * Allocate a scatter-queue that will accept the MSI write and
> + * trigger the TILE-side interrupts. We use the scatter-queue regions
> + * before the mem map regions, because the latter are needed by more
> + * applications.
> + */
> + mem_map = gxio_trio_alloc_scatter_queues(trio_context, 1, 0, 0);
> + if (mem_map >= 0) {
> + TRIO_MAP_SQ_DOORBELL_FMT_t doorbell_template = {{
> + .pop = 0,
> + .doorbell = 1,
> + }};
> +
> + mem_map += TRIO_NUM_MAP_MEM_REGIONS;
> + mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
> + mem_map * MEM_MAP_INTR_REGION_SIZE;
> + mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
> +
> + msi_addr = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 8;
> + msg.data = (unsigned int)doorbell_template.word;
> + } else {
> + /* SQ regions are out, allocate from map mem regions. */
> + mem_map = gxio_trio_alloc_memory_maps(trio_context, 1, 0, 0);
> + if (mem_map < 0) {
> + dev_printk(KERN_INFO, &pdev->dev,
> + "%s Mem-Map alloc failure. "
> + "Failed to initialize MSI interrupts. "
> + "Falling back to legacy interrupts.\n",
> + desc->msi_attrib.is_msix ? "MSI-X" : "MSI");
> + ret = -ENOMEM;
> + goto msi_mem_map_alloc_failure;
> + }
>
> - ret = -ENOMEM;
> - goto msi_mem_map_alloc_failure;
> + mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
> + mem_map * MEM_MAP_INTR_REGION_SIZE;
> + mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
> +
> + msi_addr = mem_map_base + TRIO_MAP_MEM_REG_INT3 -
> + TRIO_MAP_MEM_REG_INT0;
> +
> + msg.data = mem_map;
> }
>
> /* We try to distribute different IRQs to different tiles. */
> cpu = tile_irq_cpu(irq);
>
> /*
> - * Now call up to the HV to configure the Mem-Map interrupt and
> + * Now call up to the HV to configure the MSI interrupt and
> * set up the IPI binding.
> */
> - mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
> - mem_map * MEM_MAP_INTR_REGION_SIZE;
> - mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
> -
> ret = gxio_trio_config_msi_intr(trio_context, cpu_x(cpu), cpu_y(cpu),
> KERNEL_PL, irq, controller->mac,
> mem_map, mem_map_base, mem_map_limit,
> @@ -1516,13 +1763,9 @@
>
> irq_set_msi_desc(irq, desc);
>
> - msi_addr = mem_map_base + TRIO_MAP_MEM_REG_INT3 - TRIO_MAP_MEM_REG_INT0;
> -
> msg.address_hi = msi_addr >> 32;
> msg.address_lo = msi_addr & 0xffffffff;
>
> - msg.data = mem_map;
> -
> write_msi_msg(irq, &msg);
> irq_set_chip_and_handler(irq, &tilegx_msi_chip, handle_level_irq);
> irq_set_handler_data(irq, controller);
>
>
> What we got after my fix:
>
> pci 0000:00:00.0: BAR 8: assigned [mem 0x100c0000000-0x100c00fffff]
> pci 0000:00:00.0: BAR 9: assigned [mem 0x100c0100000-0x100c01fffff pref]
> pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff]
> pci 0000:01:00.0: BAR 6: assigned [mem 0x100c0100000-0x100c013ffff pref]
> pci 0000:01:00.0: BAR 6: set to [mem 0x100c0100000-0x100c013ffff pref]
> (PCI address [0xc0100000-0xc013ffff])
> pci 0000:01:00.0: BAR 4: assigned [mem 0x100c0000000-0x100c000ffff
> 64bit]
> pci 0000:01:00.0: BAR 4: set to [mem 0x100c0000000-0x100c000ffff
> 64bit] (PCI address [0xc0000000-0xc000ffff])
> pci 0000:01:00.0: BAR 2: assigned [io 0x1000-0x107f]
> pci 0000:01:00.0: BAR 2: set to [io 0x1000-0x107f] (PCI address
> [0x1000-0x107f])
> pci 0000:00:00.0: PCI bridge to [bus 01-01]
> pci 0000:00:00.0: bridge window [io 0x1000-0x1fff]
> pci 0000:00:00.0: bridge window [mem 0x100c0000000-0x100c00fffff]
> pci 0000:00:00.0: bridge window [mem 0x100c0100000-0x100c01fffff
> pref]
> pci 0001:00:00.0: BAR 8: assigned [mem 0x101c0000000-0x101c00fffff]
> pci 0001:00:00.0: BAR 9: assigned [mem 0x101c0100000-0x101c01fffff
> pref]
> pci 0001:00:00.0: BAR 7: assigned [io 0x80001000-0x80001fff]
> pci 0001:01:00.0: BAR 6: assigned [mem 0x101c0100000-0x101c013ffff
> pref]
> pci 0001:01:00.0: BAR 6: set to [mem 0x101c0100000-0x101c013ffff pref]
> (PCI address [0xc0100000-0xc013ffff])
> pci 0001:01:00.0: BAR 4: assigned [mem 0x101c0000000-0x101c000ffff
> 64bit]
> pci 0001:01:00.0: BAR 4: set to [mem 0x101c0000000-0x101c000ffff
> 64bit] (PCI address [0xc0000000-0xc000ffff])
> pci 0001:01:00.0: BAR 2: assigned [io 0x80001000-0x8000107f]
> pci 0001:01:00.0: BAR 2: set to [io 0x80001000-0x8000107f] (PCI
> address [0x80001000-0x8000107f])
> pci 0001:00:00.0: PCI bridge to [bus 01-01]
> pci 0001:00:00.0: bridge window [io 0x80001000-0x80001fff]
> pci 0001:00:00.0: bridge window [mem 0x101c0000000-0x101c00fffff]
> pci 0001:00:00.0: bridge window [mem 0x101c0100000-0x101c01fffff
> pref]
> pci 0000:00:00.0: enabling device (0006 -> 0007)
> pci 0001:00:00.0: enabling device (0006 -> 0007)
> pci_bus 0000:00: resource 0 [io 0x1000-0x800007ff]
> pci_bus 0000:00: resource 1 [mem 0x100c0000000-0x100ffffffff]
> pci_bus 0000:01: resource 0 [io 0x1000-0x1fff]
> pci_bus 0000:01: resource 1 [mem 0x100c0000000-0x100c00fffff]
> pci_bus 0000:01: resource 2 [mem 0x100c0100000-0x100c01fffff pref]
> pci_bus 0001:00: resource 0 [io 0x80000800-0xffffffff]
> pci_bus 0001:00: resource 1 [mem 0x101c0000000-0x101ffffffff]
> pci_bus 0001:01: resource 0 [io 0x80001000-0x80001fff]
> pci_bus 0001:01: resource 1 [mem 0x101c0000000-0x101c00fffff]
> pci_bus 0001:01: resource 2 [mem 0x101c0100000-0x101c01fffff pref]
> ......
> mvsas 0000:01:00.0: mvsas: driver version 0.8.2
> mvsas 0000:01:00.0: enabling device (0000 -> 0003)
> mvsas 0000:01:00.0: enabling bus mastering
> mvsas 0000:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
> scsi0 : mvsas
> ......
> mvsas 0001:01:00.0: mvsas: driver version 0.8.2
> mvsas 0001:01:00.0: enabling device (0000 -> 0003)
> mvsas 0001:01:00.0: enabling bus mastering
> mvsas 0001:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
> scsi1 : mvsas
>
>
> It works now. But I really need some one to confirm whether my
> modification is enough or not,
> if there have other potential problems.
>
>
>
> Best regards.
>
> --
> Cyberman Wu
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2012-10-26 09:01:19

by Cyberman Wu

[permalink] [raw]
Subject: Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

We're not using 3.6.x; what we're using is the kernel from Tilera's
MDE-4.1.0, which patches 3.0.38.
According to its release notes, PCIe I/O space is already supported.
I provided the diff of pci_gx.c between 3.6.3 and MDE-4.1.0 only as a
hint, since I don't know whether attachments are allowed on the mailing
list, and their patch against 3.0.38 is bigger than 7MB.

As for mvsas, it does seem to treat I/O address 0 as invalid. Here is
some code from drivers/scsi/mvsas/mv_init.c:
int mvs_ioremap(struct mvs_info *mvi, int bar, int bar_ex)
{
	unsigned long res_start, res_len, res_flag, res_flag_ex = 0;
	struct pci_dev *pdev = mvi->pdev;
	if (bar_ex != -1) {
		/*
		 * ioremap main and peripheral registers
		 */
		res_start = pci_resource_start(pdev, bar_ex);
		res_len = pci_resource_len(pdev, bar_ex);
		if (!res_start || !res_len)
			goto err_out;

		res_flag_ex = pci_resource_flags(pdev, bar_ex);
		if (res_flag_ex & IORESOURCE_MEM) {
			if (res_flag_ex & IORESOURCE_CACHEABLE)
				mvi->regs_ex = ioremap(res_start, res_len);
			else
				mvi->regs_ex = ioremap_nocache(res_start,
						res_len);
		} else
			mvi->regs_ex = (void *)res_start;
		if (!mvi->regs_ex)
			goto err_out;
	}

	res_start = pci_resource_start(pdev, bar);
	res_len = pci_resource_len(pdev, bar);
	if (!res_start || !res_len)
		goto err_out;

	res_flag = pci_resource_flags(pdev, bar);
	if (res_flag & IORESOURCE_CACHEABLE)
		mvi->regs = ioremap(res_start, res_len);
	else
		mvi->regs = ioremap_nocache(res_start, res_len);

	if (!mvi->regs) {
		if (mvi->regs_ex && (res_flag_ex & IORESOURCE_MEM))
			iounmap(mvi->regs_ex);
		mvi->regs_ex = NULL;
		goto err_out;
	}

	return 0;
err_out:
	return -1;
}

For 64xx, bar_ex is an I/O space BAR, and the check
	res_start = pci_resource_start(pdev, bar_ex);
	res_len = pci_resource_len(pdev, bar_ex);
	if (!res_start || !res_len)
		goto err_out;
makes the driver fail to load when that BAR is assigned I/O address 0,
because res_start is then 0.
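
The effect of that check can be shown in isolation with a minimal
sketch. The helper name bar_passes_mvsas_check is hypothetical and only
mirrors the "!res_start || !res_len" test from the excerpt above; it is
not a function in the driver:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true when a BAR would get past the
 * "if (!res_start || !res_len) goto err_out;" test in mvs_ioremap().
 */
static bool bar_passes_mvsas_check(unsigned long res_start,
				   unsigned long res_len)
{
	/* A start of 0 is rejected even if the length is non-zero. */
	return res_start != 0 && res_len != 0;
}
```

With the BAR 2 values from the logs, a start of 0x0 with length 0x80 is
rejected, while a start of 0x1000 with the same length is accepted.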

When we were using MDE-4.0.0, which doesn't support I/O space, I just
bypassed these checks, since after investigating all of the mvsas code
it seems that the I/O space mapped by BAR 2 is not really used.

When the same card is inserted into an x86 platform, the allocated I/O
space does not start from 0, so it works fine.
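
For reference, the split that my patch (quoted below) computes can be
reproduced in user space. This is only a sketch of the arithmetic: it
assumes IO_SPACE_LIMIT is 0xffffffff, as the "resource 0 [io
0x0000-0xffffffff]" log lines suggest, and the names io_window and
io_window_for are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: taken from the "[io 0x0000-0xffffffff]" log lines. */
#define IO_SPACE_LIMIT_ASSUMED 0xffffffffULL

struct io_window {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Same arithmetic as the patch, for controller i out of n. */
static struct io_window io_window_for(unsigned int i, unsigned int n)
{
	const uint64_t io_mem_start = 4096;	/* skip the first 4K */
	const uint64_t io_mem_end = IO_SPACE_LIMIT_ASSUMED + 1;
	uint64_t io_mem_size;
	struct io_window w;

	assert(n > 0 && i < n);
	io_mem_size = (io_mem_end - io_mem_start) / n;
	io_mem_size &= ~(uint64_t)3;	/* round down to a multiple of 4 */
	w.start = io_mem_start + (uint64_t)i * io_mem_size;
	w.end = w.start + io_mem_size - 1;
	return w;
}
```

With two RC controllers this gives 0x1000-0x800007ff for the first and
0x80000800-0xffffffff for the second, which matches the "pci_bus ...
resource 0" lines in the log after the fix. Note that the second window
does not start on a 4K boundary; in the log the bridge window for that
domain was placed at 0x80001000.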


On Fri, Oct 26, 2012 at 4:03 PM, Bjorn Helgaas <[email protected]> wrote:
> [+cc Chris, also a few comments below]
>
> On Fri, Oct 26, 2012 at 12:59 AM, Cyberman Wu <[email protected]> wrote:
>> After we upgrade to MDE 4.1.0 from Tilera, we encounter a problem that
>> only on HighPoint 2680 card works, I've
>> tried to fix it, but since most time I'm working in user space, I'm
>> not sure my fix is enough. Their FAE said that
>> the guy who add PCIe I/O space support is on vacation and I can't get
>> help from him now, I hope maybe there
>> will have somebody can help.
>>
>>
>> Problem we encountered:
>>
>> pci 0000:00:00.0: BAR 8: assigned [mem 0x100c0000000-0x100c00fffff]
>> pci 0000:00:00.0: BAR 9: assigned [mem 0x100c0100000-0x100c01fffff pref]
>> pci 0000:00:00.0: BAR 7: assigned [io 0x0000-0x0fff]
>> pci 0000:01:00.0: BAR 6: assigned [mem 0x100c0100000-0x100c013ffff pref]
>> pci 0000:01:00.0: BAR 6: set to [mem 0x100c0100000-0x100c013ffff pref]
>> (PCI address [0xc0100000-0xc013ffff])
>> pci 0000:01:00.0: BAR 4: assigned [mem 0x100c0000000-0x100c000ffff 64bit]
>> pci 0000:01:00.0: BAR 4: set to [mem 0x100c0000000-0x100c000ffff
>> 64bit] (PCI address [0xc0000000-0xc000ffff])
>> pci 0000:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
>> pci 0000:01:00.0: BAR 2: set to [io 0x0000-0x007f] (PCI address [0x0-0x7f])
>> pci 0000:00:00.0: PCI bridge to [bus 01-01]
>> pci 0000:00:00.0: bridge window [io 0x0000-0x0fff]
>> pci 0000:00:00.0: bridge window [mem 0x100c0000000-0x100c00fffff]
>> pci 0000:00:00.0: bridge window [mem 0x100c0100000-0x100c01fffff pref]
>> pci 0001:00:00.0: BAR 8: assigned [mem 0x101c0000000-0x101c00fffff]
>> pci 0001:00:00.0: BAR 9: assigned [mem 0x101c0100000-0x101c01fffff pref]
>> pci 0001:00:00.0: BAR 7: assigned [io 0x0000-0x0fff]
>> pci 0001:01:00.0: BAR 6: assigned [mem 0x101c0100000-0x101c013ffff pref]
>> pci 0001:01:00.0: BAR 6: set to [mem 0x101c0100000-0x101c013ffff pref]
>> (PCI address [0xc0100000-0xc013ffff])
>> pci 0001:01:00.0: BAR 4: assigned [mem 0x101c0000000-0x101c000ffff 64bit]
>> pci 0001:01:00.0: BAR 4: set to [mem 0x101c0000000-0x101c000ffff
>> 64bit] (PCI address [0xc0000000-0xc000ffff])
>> pci 0001:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
>> pci 0001:01:00.0: BAR 2: set to [io 0x0000-0x007f] (PCI address [0x0-0x7f])
>> pci 0001:00:00.0: PCI bridge to [bus 01-01]
>> pci 0001:00:00.0: bridge window [io 0x0000-0x0fff]
>> pci 0001:00:00.0: bridge window [mem 0x101c0000000-0x101c00fffff]
>> pci 0001:00:00.0: bridge window [mem 0x101c0100000-0x101c01fffff pref]
>> pci 0000:00:00.0: enabling device (0006 -> 0007)
>> pci 0001:00:00.0: enabling device (0006 -> 0007)
>> pci_bus 0000:00: resource 0 [io 0x0000-0xffffffff]
>> pci_bus 0000:00: resource 1 [mem 0x100c0000000-0x100ffffffff]
>> pci_bus 0000:01: resource 0 [io 0x0000-0x0fff]
>> pci_bus 0000:01: resource 1 [mem 0x100c0000000-0x100c00fffff]
>> pci_bus 0000:01: resource 2 [mem 0x100c0100000-0x100c01fffff pref]
>> pci_bus 0001:00: resource 0 [io 0x0000-0xffffffff]
>> pci_bus 0001:00: resource 1 [mem 0x101c0000000-0x101ffffffff]
>> pci_bus 0001:01: resource 0 [io 0x0000-0x0fff]
>> pci_bus 0001:01: resource 1 [mem 0x101c0000000-0x101c00fffff]
>> pci_bus 0001:01: resource 2 [mem 0x101c0100000-0x101c01fffff pref]
>> ......
>> mvsas 0000:01:00.0: mvsas: driver version 0.8.2
>> mvsas 0000:01:00.0: enabling device (0000 -> 0003)
>> mvsas 0000:01:00.0: enabling bus mastering
>> mvsas 0000:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
>> mvsas 0000:01:00.0: Phy3 : No sig fis
>> scsi0 : mvsas
>> ......
>> mvsas 0001:01:00.0: mvsas: driver version 0.8.2
>> mvsas 0001:01:00.0: enabling device (0000 -> 0003)
>> mvsas 0001:01:00.0: enabling bus mastering
>> mvsas 0001:01:00.0: BAR 2: can't reserve [io 0x0000-0x007f]
>> mvsas: probe of 0001:01:00.0 failed with error -16
>>
>>
>> My modification:
>>
>> --- /opt/tilera/TileraMDE-4.1.0.148119/tilegx/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-22
>> 14:56:59.783096378 +0800
>> +++ Tilera_src/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-26
>> 13:55:02.731947886 +0800
>> @@ -368,6 +368,10 @@
>> int num_trio_shims = 0;
>> int ctl_index = 0;
>> int i, j;
>> + // Modified by Cyberman Wu on Oct 25th, 2012.
>> + resource_size_t io_mem_start;
>> + resource_size_t io_mem_end;
>> + resource_size_t io_mem_size;
>>
>> if (!pci_probe) {
>> pr_info("PCI: disabled by boot argument\n");
>> @@ -457,6 +461,18 @@
>> }
>>
>> out:
>> + // Use IO memory space 0~0xffffffff for every controller will
>> + // cause device on controller other than the first failed to
>> + // load driver if it using IO regions.
>> + // Is reserve the first 4K IO address space OK? Tilera use
>> + // IO space address begin from 0, but some drivers in Linux
>> + // recognize 0 address a error, say, mvsas, so for compatiblity
>> + // reserve some address from 0 should be better?
>
> It's not that mvsas thinks I/O address 0 is invalid, it's just that we
> already assigned [io 0x0000-0x007f] to the device at 0000:01:00.0:
>
> pci 0000:01:00.0: BAR 2: set to [io 0x0000-0x007f]
>
> so that range can't also be assigned to 0001:01:00.0.
>
>> + // Modified by Cyberman Wu on Oct 25th, 2012.
>> + io_mem_start = 4096;
>> + io_mem_end = (resource_size_t)IO_SPACE_LIMIT + 1;
>> + io_mem_size = (io_mem_end - io_mem_start) / num_rc_controllers;
>> + io_mem_size &= ~3;
>> /*
>> * Configure each PCIe RC port.
>> */
>> @@ -470,8 +486,9 @@
>> controller->index = i;
>> controller->ops = &tile_cfg_ops;
>>
>> - controller->io_space.start = 0;
>> - controller->io_space.end = IO_SPACE_LIMIT;
>> + // Modified by Cyberman Wu on Oct 25th, 2012.
>> + controller->io_space.start = io_mem_start + (i * io_mem_size);
>> + controller->io_space.end = controller->io_space.start + io_mem_size - 1;
>> controller->io_space.flags = IORESOURCE_IO;
>> snprintf(controller->io_space_name,
>> sizeof(controller->io_space_name),
>>
>>
>> Please note that we're using MDE-4.1.0, which takes kernel 3.0.38,
>> patches it, and re-versions it to 2.6.40.38.
>> I've checked the source code under arch/tile in kernel 3.6.3, and PCIe
>> I/O space support is still not there. Below is the diff of
>> arch/tile/kernel/pci_gx.c between kernel 3.6.3 and MDE-4.1.0:
>
> Per http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/01176.html,
> Chris considered adding I/O space support and decided against it at
> that time, partly because it would use up a TRIO PIO region.
>
> I don't know his current thoughts. Possibly it could be done under a
> config option or something.
>
> But of course, you'd have to do it by adding I/O space support to the
> current 3.6 kernel *without* reverting all the other changes that have
> been made since 2.6.40.
>
>> --- .cache/.fr-9Oo37J/linux-3.6.3/arch/tile/kernel/pci_gx.c 2012-10-22
>> 00:32:56.000000000 +0800
>> +++ /opt/tilera/TileraMDE-4.1.0.148119/tilegx/src/linux-2.6.40.38/arch/tile/kernel/pci_gx.c 2012-10-22
>> 14:56:59.783096378 +0800
>> @@ -69,19 +69,18 @@
>> * a HW PCIe link-training bug. The exact delay is specified with
>> * a kernel boot argument in the form of "pcie_rc_delay=T,P,S",
>> * where T is the TRIO instance number, P is the port number and S is
>> - * the delay in seconds. If the delay is not provided, the value
>> - * will be DEFAULT_RC_DELAY.
>> + * the delay in seconds. If the argument is specified, but the delay is
>> + * not provided, the value will be DEFAULT_RC_DELAY.
>> */
>> static int __devinitdata rc_delay[TILEGX_NUM_TRIO][TILEGX_TRIO_PCIES];
>>
>> /* Default number of seconds that the PCIe RC port probe can be delayed. */
>> #define DEFAULT_RC_DELAY 10
>>
>> -/* Max number of seconds that the PCIe RC port probe can be delayed. */
>> -#define MAX_RC_DELAY 20
>> -
>> +#if !defined(GX_FPGA)
>> /* Array of the PCIe ports configuration info obtained from the BIB. */
>> struct pcie_port_property pcie_ports[TILEGX_NUM_TRIO][TILEGX_TRIO_PCIES];
>> +#endif
>>
>> /* All drivers share the TRIO contexts defined here. */
>> gxio_trio_context_t trio_contexts[TILEGX_NUM_TRIO];
>> @@ -97,6 +96,41 @@
>> static struct cpumask intr_cpus_map;
>>
>> /*
>> + * Convert a resource to a PCI device bus address or bus window.
>> + */
>> +void __devinit
>> +pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
>> + struct resource *res)
>> +{
>> + struct pci_controller *controller =
>> + (struct pci_controller *)dev->sysdata;
>> + unsigned long offset = 0;
>> +
>> + if (res->flags & IORESOURCE_MEM)
>> + offset = controller->mem_offset;
>> +
>> + region->start = res->start - offset;
>> + region->end = res->end - offset;
>> +}
>> +EXPORT_SYMBOL(pcibios_resource_to_bus);
>> +
>> +void __devinit
>> +pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
>> + struct pci_bus_region *region)
>> +{
>> + struct pci_controller *controller =
>> + (struct pci_controller *)dev->sysdata;
>> + unsigned long offset = 0;
>> +
>> + if (res->flags & IORESOURCE_MEM)
>> + offset = controller->mem_offset;
>> +
>> + res->start = region->start + offset;
>> + res->end = region->end + offset;
>> +}
>> +EXPORT_SYMBOL(pcibios_bus_to_resource);
>> +
>> +/*
>> * We don't need to worry about the alignment of resources.
>> */
>> resource_size_t pcibios_align_resource(void *data, const struct resource *res,
>> @@ -274,6 +308,10 @@
>>
>> cpumask_copy(&intr_cpus_map, cpu_online_mask);
>>
>> +#ifdef CONFIG_DATAPLANE
>> + /* Remove dataplane cpus. */
>> + cpumask_andnot(&intr_cpus_map, &intr_cpus_map, &dataplane_map);
>> +#endif
>>
>> for (i = 0; i < 4; i++) {
>> gxio_trio_context_t *context = controller->trio;
>> @@ -325,7 +363,7 @@
>> *
>> * Returns the number of controllers discovered.
>> */
>> -int __init tile_pci_init(void)
>> +int __devinit tile_pci_init(void)
>> {
>> int num_trio_shims = 0;
>> int ctl_index = 0;
>> @@ -359,6 +397,7 @@
>> * We look at the Board Information Block first and then see if there
>> * are any overriding configuration by the HW strapping pin.
>> */
>> +#if !defined(GX_FPGA)
>> for (i = 0; i < TILEGX_NUM_TRIO; i++) {
>> gxio_trio_context_t *context = &trio_contexts[i];
>> int ret;
>> @@ -386,6 +425,13 @@
>> }
>> }
>> }
>> +#else
>> + /*
>> + * For now, just assume that there is a single RC port on trio/0.
>> + */
>> + num_rc_controllers = 1;
>> + pcie_rc[0][2] = 1;
>> +#endif
>>
>> /*
>> * Return if no PCIe ports are configured to operate in RC mode.
>> @@ -424,13 +470,20 @@
>> controller->index = i;
>> controller->ops = &tile_cfg_ops;
>>
>> + controller->io_space.start = 0;
>> + controller->io_space.end = IO_SPACE_LIMIT;
>> + controller->io_space.flags = IORESOURCE_IO;
>> + snprintf(controller->io_space_name,
>> + sizeof(controller->io_space_name),
>> + "PCI I/O domain %d", i);
>> + controller->io_space.name = controller->io_space_name;
>> +
>> /*
>> * The PCI memory resource is located above the PA space.
>> * For every host bridge, the BAR window or the MMIO aperture
>> * is in range [3GB, 4GB - 1] of a 4GB space beyond the
>> * PA space.
>> */
>> -
>> controller->mem_offset = TILE_PCI_MEM_START +
>> (i * TILE_PCI_BAR_WINDOW_TOP);
>> controller->mem_space.start = controller->mem_offset +
>> @@ -451,7 +504,7 @@
>> * (pin - 1) converts from the PCI standard's [1:4] convention to
>> * a normal [0:3] range.
>> */
>> -static int tile_map_irq(const struct pci_dev *dev, u8 device, u8 pin)
>> +static int tile_map_irq(struct pci_dev *dev, u8 device, u8 pin)
>> {
>> struct pci_controller *controller =
>> (struct pci_controller *)dev->sysdata;
>> @@ -463,11 +516,12 @@
>> controller)
>> {
>> gxio_trio_context_t *trio_context = controller->trio;
>> - struct pci_bus *root_bus = controller->root_bus;
>> TRIO_PCIE_RC_DEVICE_CONTROL_t dev_control;
>> TRIO_PCIE_RC_DEVICE_CAP_t rc_dev_cap;
>> + unsigned int smallest_max_payload;
>> + struct pci_dev *dev = NULL;
>> unsigned int reg_offset;
>> - struct pci_bus *child;
>> + u16 new_values;
>> int mac;
>> int err;
>>
>> @@ -508,33 +562,59 @@
>> __gxio_mmio_write32(trio_context->mmio_base_mac + reg_offset,
>> rc_dev_cap.word);
>>
>> - /* Configure PCI Express MPS setting. */
>> - list_for_each_entry(child, &root_bus->children, node) {
>> - struct pci_dev *self = child->self;
>> - if (!self)
>> + smallest_max_payload = rc_dev_cap.mps_sup;
>> +
>> + /* Scan for the smallest maximum payload size. */
>> + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
>> + int pcie_caps_offset;
>> + u32 devcap;
>> + int max_payload;
>> +
>> + /* Skip device that is not in this PCIe domain. */
>> + if ((struct pci_controller *)dev->sysdata != controller)
>> continue;
>>
>> - pcie_bus_configure_settings(child, self->pcie_mpss);
>> + pcie_caps_offset = pci_find_capability(dev, PCI_CAP_ID_EXP);
>> + if (pcie_caps_offset == 0)
>> + continue;
>> +
>> + pci_read_config_dword(dev, pcie_caps_offset + PCI_EXP_DEVCAP,
>> + &devcap);
>> + max_payload = devcap & PCI_EXP_DEVCAP_PAYLOAD;
>> + if (max_payload < smallest_max_payload)
>> + smallest_max_payload = max_payload;
>> + }
>> +
>> + /* Now, set the max_payload_size for all devices to that value. */
>> + new_values = smallest_max_payload << 5;
>> + while ((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
>> + int pcie_caps_offset;
>> + u16 devctl;
>> +
>> + /* Skip device that is not in this PCIe domain. */
>> + if ((struct pci_controller *)dev->sysdata != controller)
>> + continue;
>> +
>> + pcie_caps_offset = pci_find_capability(dev, PCI_CAP_ID_EXP);
>> + if (pcie_caps_offset == 0)
>> + continue;
>> +
>> + pci_read_config_word(dev, pcie_caps_offset + PCI_EXP_DEVCTL,
>> + &devctl);
>> + devctl &= ~PCI_EXP_DEVCTL_PAYLOAD;
>> + devctl |= new_values;
>> + pci_write_config_word(dev, pcie_caps_offset + PCI_EXP_DEVCTL,
>> + devctl);
>> }
>>
>> /*
>> * Set the mac_config register in trio based on the MPS/MRS of the link.
>> */
>> - reg_offset =
>> - (TRIO_PCIE_RC_DEVICE_CONTROL <<
>> - TRIO_CFG_REGION_ADDR__REG_SHIFT) |
>> - (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_STANDARD <<
>> - TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
>> - (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
>> -
>> - dev_control.word = __gxio_mmio_read32(trio_context->mmio_base_mac +
>> - reg_offset);
>> -
>> err = gxio_trio_set_mps_mrs(trio_context,
>> - dev_control.max_payload_size,
>> + smallest_max_payload,
>> dev_control.max_read_req_sz,
>> mac);
>> - if (err < 0) {
>> + if (err < 0) {
>> pr_err("PCI: PCIE_CONFIGURE_MAC_MPS_MRS failure, "
>> "MAC %d on TRIO %d\n",
>> mac, controller->trio_index);
>> @@ -571,14 +651,9 @@
>> if (!isdigit(*str))
>> return -EINVAL;
>> delay = simple_strtoul(str, (char **)&str, 10);
>> - if (delay > MAX_RC_DELAY)
>> - return -EINVAL;
>> }
>>
>> rc_delay[trio_index][mac] = delay ? : DEFAULT_RC_DELAY;
>> - pr_info("Delaying PCIe RC link training for %u sec"
>> - " on MAC %lu on TRIO %lu\n", rc_delay[trio_index][mac],
>> - mac, trio_index);
>> return 0;
>> }
>> early_param("pcie_rc_delay", setup_pcie_rc_delay);
>> @@ -586,18 +661,14 @@
>> /*
>> * PCI initialization entry point, called by subsys_initcall.
>> */
>> -int __init pcibios_init(void)
>> +int __devinit pcibios_init(void)
>> {
>> resource_size_t offset;
>> - LIST_HEAD(resources);
>> int next_busno;
>> int i;
>>
>> tile_pci_init();
>>
>> - if (num_rc_controllers == 0 && num_ep_controllers == 0)
>> - return 0;
>> -
>> /*
>> * We loop over all the TRIO shims and set up the MMIO mappings.
>> */
>> @@ -623,6 +694,9 @@
>> }
>> }
>>
>> + if (num_rc_controllers == 0 && num_ep_controllers == 0)
>> + return 0;
>> +
>> /*
>> * Delay a bit in case devices aren't ready. Some devices are
>> * known to require at least 20ms here, but we use a more
>> @@ -684,15 +758,36 @@
>> }
>>
>> /*
>> - * Delay the RC link training if needed.
>> + * Delay the bus probe if needed.
>> */
>> - if (rc_delay[trio_index][mac])
>> + if (rc_delay[trio_index][mac]) {
>> + pr_info("Delaying PCIe RC link training for %d sec"
>> + " on MAC %d on TRIO %d\n",
>> + rc_delay[trio_index][mac], mac,
>> + trio_index);
>> msleep(rc_delay[trio_index][mac] * 1000);
>> + }
>>
>> - ret = gxio_trio_force_rc_link_up(trio_context, mac);
>> - if (ret < 0)
>> - pr_err("PCI: PCIE_FORCE_LINK_UP failure, "
>> - "MAC %d on TRIO %d\n", mac, trio_index);
>> + /*
>> + * Check for PCIe link-up status to decide if we need
>> + * to force the link to come up.
>> + */
>> + reg_offset =
>> + (TRIO_PCIE_INTFC_PORT_STATUS <<
>> + TRIO_CFG_REGION_ADDR__REG_SHIFT) |
>> + (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_INTERFACE <<
>> + TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
>> + (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
>> +
>> + port_status.word =
>> + __gxio_mmio_read(trio_context->mmio_base_mac +
>> + reg_offset);
>> + if (!port_status.dl_up) {
>> + ret = gxio_trio_force_rc_link_up(trio_context, mac);
>> + if (ret < 0)
>> + pr_err("PCI: PCIE_FORCE_LINK_UP failure, "
>> + "MAC %d on TRIO %d\n", mac, trio_index);
>> + }
>>
>> pr_info("PCI: Found PCI controller #%d on TRIO %d MAC %d\n", i,
>> trio_index, controller->mac);
>> @@ -704,22 +799,20 @@
>> msleep(1000);
>>
>> /*
>> - * Check for PCIe link-up status.
>> + * Check for PCIe link-up status again.
>> */
>> -
>> - reg_offset =
>> - (TRIO_PCIE_INTFC_PORT_STATUS <<
>> - TRIO_CFG_REGION_ADDR__REG_SHIFT) |
>> - (TRIO_CFG_REGION_ADDR__INTFC_VAL_MAC_INTERFACE <<
>> - TRIO_CFG_REGION_ADDR__INTFC_SHIFT ) |
>> - (mac << TRIO_CFG_REGION_ADDR__MAC_SEL_SHIFT);
>> -
>> port_status.word =
>> __gxio_mmio_read(trio_context->mmio_base_mac +
>> reg_offset);
>> if (!port_status.dl_up) {
>> - pr_err("PCI: link is down, MAC %d on TRIO %d\n",
>> - mac, trio_index);
>> + if (pcie_ports[trio_index][mac].removable) {
>> + pr_info("PCI: link is down, MAC %d on TRIO %d",
>> + mac, trio_index);
>> + pr_info("This is expected if no PCIe card"
>> + " is connected to this link");
>> + } else
>> + pr_err("PCI: link is down, MAC %d on TRIO %d",
>> + mac, trio_index);
>> continue;
>> }
>>
>> @@ -842,19 +935,22 @@
>> }
>>
>> /*
>> - * The PCI memory resource is located above the PA space.
>> - * The memory range for the PCI root bus should not overlap
>> - * with the physical RAM
>> + * This comes from the generic Linux PCI driver.
>> + *
>> + * It reads the PCI tree for this bus into the Linux
>> + * data structures.
>> + *
>> + * This is inlined in linux/pci.h and calls into
>> + * pci_scan_bus_parented() in probe.c.
>> */
>> - pci_add_resource_offset(&resources, &controller->mem_space,
>> - controller->mem_offset);
>> -
>> - controller->first_busno = next_busno;
>> - bus = pci_scan_root_bus(NULL, next_busno, controller->ops,
>> - controller, &resources);
>> + controller->first_busno= next_busno;
>> + bus = pci_scan_bus(next_busno, controller->ops, controller);
>> controller->root_bus = bus;
>> - next_busno = bus->busn_res.end + 1;
>> -
>> +#if 0
>> + next_busno = bus->subordinate + 1;
>> +#else
>> + next_busno = 0;
>> +#endif
>> }
>>
>> /* Do machine dependent PCI interrupt routing */
>> @@ -951,6 +1047,37 @@
>> }
>>
>> /*
>> + * Alloc a PIO region for PCI I/O space access for each RC port.
>> + */
>> + ret = gxio_trio_alloc_pio_regions(trio_context, 1, 0, 0);
>> + if (ret < 0) {
>> + pr_err("PCI: I/O PIO alloc failure on TRIO %d mac %d, "
>> + "give up\n", controller->trio_index,
>> + controller->mac);
>> +
>> + continue;
>> + }
>> +
>> + controller->pio_io_index = ret;
>> +
>> + /*
>> + * For PIO IO, the bus_address_hi parameter is hard-coded 0
>> + * because PCI I/O address space is 32-bit.
>> + */
>> + ret = gxio_trio_init_pio_region_aux(trio_context,
>> + controller->pio_io_index,
>> + controller->mac,
>> + 0,
>> + HV_TRIO_PIO_FLAG_IO_SPACE);
>> + if (ret < 0) {
>> + pr_err("PCI: I/O PIO init failure on TRIO %d mac %d, "
>> + "give up\n", controller->trio_index,
>> + controller->mac);
>> +
>> + continue;
>> + }
>> +
>> + /*
>> * Configure a Mem-Map region for each memory controller so
>> * that Linux can map all of its PA space to the PCI bus.
>> * Use the IOMMU to handle hash-for-home memory.
>> @@ -1015,9 +1142,22 @@
>> }
>> subsys_initcall(pcibios_init);
>>
>> -/* Note: to be deleted after Linux 3.6 merge. */
>> +/*
>> + * PCI scan code calls the arch specific pcibios_fixup_bus() each time it scans
>> + * a new bridge. Called after each bus is probed, but before its children are
>> + * examined.
>> + */
>> void __devinit pcibios_fixup_bus(struct pci_bus *bus)
>> {
>> + struct pci_dev *dev = bus->self;
>> +
>> + if (!dev) {
>> + struct pci_controller *controller = bus->sysdata;
>> +
>> + /* This is the root bus. */
>> + bus->resource[0] = &controller->io_space;
>> + bus->resource[1] = &controller->mem_space;
>> + }
>> }
>>
>> /*
>> @@ -1043,8 +1183,7 @@
>>
>> /*
>> * Enable memory address decoding, as appropriate, for the
>> - * device described by the 'dev' struct. The I/O decoding
>> - * is disabled, though the TILE-Gx supports I/O addressing.
>> + * device described by the 'dev' struct.
>> *
>> * This is called from the generic PCI layer, and can be called
>> * for bridges or endpoints.
>> @@ -1126,10 +1265,95 @@
>> * We need to keep the PCI bus address's in-page offset in the VA.
>> */
>> return iorpc_ioremap(trio_fd, offset, size) +
>> - (phys_addr & (PAGE_SIZE - 1));
>> + (start & (PAGE_SIZE - 1));
>> }
>> EXPORT_SYMBOL(ioremap);
>>
>> +/* Map a PCI I/O address into VA space. */
>> +void __iomem *ioport_map(unsigned long port, unsigned int size)
>> +{
>> + struct pci_controller *controller = NULL;
>> + resource_size_t bar_start;
>> + resource_size_t bar_end;
>> + resource_size_t offset;
>> + resource_size_t start;
>> + resource_size_t end;
>> + int trio_fd;
>> + int i;
>> +
>> + start = port;
>> + end = port + size - 1;
>> +
>> + /*
>> + * In the following, each PCI controller's mem_resources[0]
>> + * represents its PCI I/O resource. By searching port in each
>> + * controller's mem_resources[0], we can determine the controller
>> + * that should accept the PCI I/O access.
>> + */
>> +
>> + for (i = 0; i < num_rc_controllers; i++) {
>> + /*
>> + * Skip controllers that are not properly initialized or
>> + * have down links.
>> + */
>> + if (pci_controllers[i].root_bus == NULL)
>> + continue;
>> +
>> + bar_start = pci_controllers[i].mem_resources[0].start;
>> + bar_end = pci_controllers[i].mem_resources[0].end;
>> +
>> + if ((start >= bar_start) && (end <= bar_end)) {
>> +
>> + controller = &pci_controllers[i];
>> +
>> + goto got_it;
>> + }
>> + }
>> +
>> + if (controller == NULL)
>> + return NULL;
>> +
>> +got_it:
>> + trio_fd = controller->trio->fd;
>> +
>> + offset = HV_TRIO_PIO_OFFSET(controller->pio_io_index) + port;
>> +
>> + /*
>> + * We need to keep the PCI bus address's in-page offset in the VA.
>> + */
>> + return iorpc_ioremap(trio_fd, offset, size) + (port & (PAGE_SIZE - 1));
>> +}
>> +EXPORT_SYMBOL(ioport_map);
>> +
>> +void ioport_unmap(void __iomem *addr)
>> +{
>> + iounmap(addr);
>> +}
>> +EXPORT_SYMBOL(ioport_unmap);
>> +
>> +/*
>> + * Create a virtual mapping cookie for a PCI BAR (memory or IO).
>> + */
>> +void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long max)
>> +{
>> + resource_size_t start = pci_resource_start(dev, bar);
>> + resource_size_t len = pci_resource_len(dev, bar);
>> + unsigned long flags = pci_resource_flags(dev, bar);
>> +
>> + if (!len)
>> + return NULL;
>> + if (max && len > max)
>> + len = max;
>> + if (flags & IORESOURCE_IO)
>> + return ioport_map(start, len);
>> + if (flags & IORESOURCE_MEM)
>> + return ioremap(start, len);
>> +
>> + pr_err("PCI: Trying to map invalid resource %#lx\n", flags);
>> + return NULL;
>> +}
>> +EXPORT_SYMBOL(pci_iomap);
>> +
>> void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
>> {
>> iounmap(addr);
>> @@ -1478,32 +1702,55 @@
>> trio_context = controller->trio;
>>
>> /*
>> - * Allocate the Mem-Map that will accept the MSI write and
>> - * trigger the TILE-side interrupts.
>> - */
>> - mem_map = gxio_trio_alloc_memory_maps(trio_context, 1, 0, 0);
>> - if (mem_map < 0) {
>> - dev_printk(KERN_INFO, &pdev->dev,
>> - "%s Mem-Map alloc failure. "
>> - "Failed to initialize MSI interrupts. "
>> - "Falling back to legacy interrupts.\n",
>> - desc->msi_attrib.is_msix ? "MSI-X" : "MSI");
>> + * Allocate a scatter-queue that will accept the MSI write and
>> + * trigger the TILE-side interrupts. We use the scatter-queue regions
>> + * before the mem map regions, because the latter are needed by more
>> + * applications.
>> + */
>> + mem_map = gxio_trio_alloc_scatter_queues(trio_context, 1, 0, 0);
>> + if (mem_map >= 0) {
>> + TRIO_MAP_SQ_DOORBELL_FMT_t doorbell_template = {{
>> + .pop = 0,
>> + .doorbell = 1,
>> + }};
>> +
>> + mem_map += TRIO_NUM_MAP_MEM_REGIONS;
>> + mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
>> + mem_map * MEM_MAP_INTR_REGION_SIZE;
>> + mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
>> +
>> + msi_addr = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 8;
>> + msg.data = (unsigned int)doorbell_template.word;
>> + } else {
>> + /* SQ regions are out, allocate from map mem regions. */
>> + mem_map = gxio_trio_alloc_memory_maps(trio_context, 1, 0, 0);
>> + if (mem_map < 0) {
>> + dev_printk(KERN_INFO, &pdev->dev,
>> + "%s Mem-Map alloc failure. "
>> + "Failed to initialize MSI interrupts. "
>> + "Falling back to legacy interrupts.\n",
>> + desc->msi_attrib.is_msix ? "MSI-X" : "MSI");
>> + ret = -ENOMEM;
>> + goto msi_mem_map_alloc_failure;
>> + }
>>
>> - ret = -ENOMEM;
>> - goto msi_mem_map_alloc_failure;
>> + mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
>> + mem_map * MEM_MAP_INTR_REGION_SIZE;
>> + mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
>> +
>> + msi_addr = mem_map_base + TRIO_MAP_MEM_REG_INT3 -
>> + TRIO_MAP_MEM_REG_INT0;
>> +
>> + msg.data = mem_map;
>> }
>>
>> /* We try to distribute different IRQs to different tiles. */
>> cpu = tile_irq_cpu(irq);
>>
>> /*
>> - * Now call up to the HV to configure the Mem-Map interrupt and
>> + * Now call up to the HV to configure the MSI interrupt and
>> * set up the IPI binding.
>> */
>> - mem_map_base = MEM_MAP_INTR_REGIONS_BASE +
>> - mem_map * MEM_MAP_INTR_REGION_SIZE;
>> - mem_map_limit = mem_map_base + MEM_MAP_INTR_REGION_SIZE - 1;
>> -
>> ret = gxio_trio_config_msi_intr(trio_context, cpu_x(cpu), cpu_y(cpu),
>> KERNEL_PL, irq, controller->mac,
>> mem_map, mem_map_base, mem_map_limit,
>> @@ -1516,13 +1763,9 @@
>>
>> irq_set_msi_desc(irq, desc);
>>
>> - msi_addr = mem_map_base + TRIO_MAP_MEM_REG_INT3 - TRIO_MAP_MEM_REG_INT0;
>> -
>> msg.address_hi = msi_addr >> 32;
>> msg.address_lo = msi_addr & 0xffffffff;
>>
>> - msg.data = mem_map;
>> -
>> write_msi_msg(irq, &msg);
>> irq_set_chip_and_handler(irq, &tilegx_msi_chip, handle_level_irq);
>> irq_set_handler_data(irq, controller);
>>
>>
>> What we got after my fix:
>>
>> pci 0000:00:00.0: BAR 8: assigned [mem 0x100c0000000-0x100c00fffff]
>> pci 0000:00:00.0: BAR 9: assigned [mem 0x100c0100000-0x100c01fffff pref]
>> pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff]
>> pci 0000:01:00.0: BAR 6: assigned [mem 0x100c0100000-0x100c013ffff pref]
>> pci 0000:01:00.0: BAR 6: set to [mem 0x100c0100000-0x100c013ffff pref]
>> (PCI address [0xc0100000-0xc013ffff])
>> pci 0000:01:00.0: BAR 4: assigned [mem 0x100c0000000-0x100c000ffff
>> 64bit]
>> pci 0000:01:00.0: BAR 4: set to [mem 0x100c0000000-0x100c000ffff
>> 64bit] (PCI address [0xc0000000-0xc000ffff])
>> pci 0000:01:00.0: BAR 2: assigned [io 0x1000-0x107f]
>> pci 0000:01:00.0: BAR 2: set to [io 0x1000-0x107f] (PCI address
>> [0x1000-0x107f])
>> pci 0000:00:00.0: PCI bridge to [bus 01-01]
>> pci 0000:00:00.0: bridge window [io 0x1000-0x1fff]
>> pci 0000:00:00.0: bridge window [mem 0x100c0000000-0x100c00fffff]
>> pci 0000:00:00.0: bridge window [mem 0x100c0100000-0x100c01fffff
>> pref]
>> pci 0001:00:00.0: BAR 8: assigned [mem 0x101c0000000-0x101c00fffff]
>> pci 0001:00:00.0: BAR 9: assigned [mem 0x101c0100000-0x101c01fffff
>> pref]
>> pci 0001:00:00.0: BAR 7: assigned [io 0x80001000-0x80001fff]
>> pci 0001:01:00.0: BAR 6: assigned [mem 0x101c0100000-0x101c013ffff
>> pref]
>> pci 0001:01:00.0: BAR 6: set to [mem 0x101c0100000-0x101c013ffff pref]
>> (PCI address [0xc0100000-0xc013ffff])
>> pci 0001:01:00.0: BAR 4: assigned [mem 0x101c0000000-0x101c000ffff
>> 64bit]
>> pci 0001:01:00.0: BAR 4: set to [mem 0x101c0000000-0x101c000ffff
>> 64bit] (PCI address [0xc0000000-0xc000ffff])
>> pci 0001:01:00.0: BAR 2: assigned [io 0x80001000-0x8000107f]
>> pci 0001:01:00.0: BAR 2: set to [io 0x80001000-0x8000107f] (PCI
>> address [0x80001000-0x8000107f])
>> pci 0001:00:00.0: PCI bridge to [bus 01-01]
>> pci 0001:00:00.0: bridge window [io 0x80001000-0x80001fff]
>> pci 0001:00:00.0: bridge window [mem 0x101c0000000-0x101c00fffff]
>> pci 0001:00:00.0: bridge window [mem 0x101c0100000-0x101c01fffff
>> pref]
>> pci 0000:00:00.0: enabling device (0006 -> 0007)
>> pci 0001:00:00.0: enabling device (0006 -> 0007)
>> pci_bus 0000:00: resource 0 [io 0x1000-0x800007ff]
>> pci_bus 0000:00: resource 1 [mem 0x100c0000000-0x100ffffffff]
>> pci_bus 0000:01: resource 0 [io 0x1000-0x1fff]
>> pci_bus 0000:01: resource 1 [mem 0x100c0000000-0x100c00fffff]
>> pci_bus 0000:01: resource 2 [mem 0x100c0100000-0x100c01fffff pref]
>> pci_bus 0001:00: resource 0 [io 0x80000800-0xffffffff]
>> pci_bus 0001:00: resource 1 [mem 0x101c0000000-0x101ffffffff]
>> pci_bus 0001:01: resource 0 [io 0x80001000-0x80001fff]
>> pci_bus 0001:01: resource 1 [mem 0x101c0000000-0x101c00fffff]
>> pci_bus 0001:01: resource 2 [mem 0x101c0100000-0x101c01fffff pref]
>> ......
>> mvsas 0000:01:00.0: mvsas: driver version 0.8.2
>> mvsas 0000:01:00.0: enabling device (0000 -> 0003)
>> mvsas 0000:01:00.0: enabling bus mastering
>> mvsas 0000:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
>> scsi0 : mvsas
>> ......
>> mvsas 0001:01:00.0: mvsas: driver version 0.8.2
>> mvsas 0001:01:00.0: enabling device (0000 -> 0003)
>> mvsas 0001:01:00.0: enabling bus mastering
>> mvsas 0001:01:00.0: mvsas: PCI-E x4, Bandwidth Usage: 2.5 Gbps
>> scsi1 : mvsas
>>
>>
>> It works now, but I really need someone to confirm whether my
>> modification is sufficient and whether there are other potential
>> problems.
>>
>>
>>
>> Best regards.
>>
>> --
>> Cyberman Wu



--
Cyberman Wu

2012-10-26 09:22:35

by Bjorn Helgaas

Subject: Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

On Fri, Oct 26, 2012 at 3:01 AM, Cyberman Wu wrote:
> We're not using 3.6.x; what we're using comes from Tilera's MDE-4.1.0,
> which patches 3.0.38.

That's fine, but you sent the email to the linux-pci and linux-kernel
lists, and on those lists, we're only concerned with the upstream
Linux kernels, e.g., 3.6. If you need support for MDE-4.1.0, you need
to talk to whoever supplies that, because we have no idea what it is.

> For mvsas, it does seem to think that I/O address 0 is invalid.

That's a driver bug. Zero is a perfectly valid I/O address. On many
systems it's not usable because of platform restrictions, but the
driver has no way to know about those restrictions, and the driver
should still work on the platforms where zero *is* usable.

> When we were using MDE-4.0.0, which doesn't support I/O space, I just
> bypassed these checks, since after investigating all of the mvsas code
> it seems that the I/O space mapped to BAR 2 is not really used.

If the driver doesn't need I/O space, it'd be a lot simpler to just
change it to use pci_enable_device_mem(), which indicates that we
don't need to enable I/O BARs, and strip out the code that checks
whether the I/O BARs are valid. Then you wouldn't need to mess with
adding I/O space support in your platform.
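
A minimal user-space sketch of why that helps, not kernel code. It assumes (as I understand the PCI core) that pci_enable_device() selects both memory and I/O BARs while pci_enable_device_mem() selects memory BARs only. The helper name bars_to_enable() and NUM_BARS are hypothetical; the BAR layout in the usage note (BAR 2 is I/O, BAR 4 is memory) is taken from the log earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resource type bits, same values as in the kernel's <linux/ioport.h>. */
#define IORESOURCE_IO  0x00000100u
#define IORESOURCE_MEM 0x00000200u

/* Hypothetical: model only the six standard BARs, not the ROM BAR. */
#define NUM_BARS 6

/*
 * Return a bitmask with bit i set for every BAR whose type bits
 * overlap `wanted`. This models the BAR selection step only.
 */
static unsigned int bars_to_enable(const uint32_t bar_flags[NUM_BARS],
                                   uint32_t wanted)
{
    unsigned int mask = 0;

    assert(bar_flags != NULL);
    for (int i = 0; i < NUM_BARS; i++)
        if (bar_flags[i] & wanted)
            mask |= 1u << i;
    return mask;
}
```

With flags {0, 0, IORESOURCE_IO, 0, IORESOURCE_MEM, 0}, asking for IORESOURCE_MEM alone selects only BAR 4, so the I/O BAR never has to be valid or claimed.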

Bjorn

2012-10-26 14:08:58

by Chris Metcalf

Subject: Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

On 10/26/2012 4:03 AM, Bjorn Helgaas wrote:
> [+cc Chris, also a few comments below]

Bjorn, thanks for looping me in.

> On Fri, Oct 26, 2012 at 12:59 AM, Cyberman Wu wrote:
>> After we upgraded to MDE 4.1.0 from Tilera, we encountered a problem:
>> only one HighPoint 2680 card works. I've tried to fix it, but since I
>> mostly work in user space, I'm not sure my fix is sufficient. Their FAE
>> said that the person who added PCIe I/O space support is on vacation,
>> so I can't get help from him now. I hope somebody here can help.

I asked internally and I think the response provided by Tilera was more
along the lines of "it may take a bit longer to get you an answer". :-)

> Per http://lkml.indiana.edu/hypermail/linux/kernel/1205.1/01176.html,
> Chris considered adding I/O space support and decided against it at
> that time, partly because it would use up a TRIO PIO region.
>
> I don't know his current thoughts. Possibly it could be done under a
> config option or something.

In the end, we did code up I/O port support in the root complex driver,
since we had a customer with a device that needed it. We haven't yet
returned it to the community - getting the basic PCI root complex support
returned (and networking support) took way more time than I had anticipated
so I'm batching up the next round of stuff to return for later. But
"later" is probably going to come fairly soon since it would make sense to
target the 3.8 merge window.

>> It works now, but I really need someone to confirm whether my
>> modification is sufficient and whether there are other potential
>> problems.

Cyberman: it seems like your bias hack is working for you. But, as Bjorn
says, this sounds like a driver bug. What happens if you just revert your
changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
just say "if (!res_len)"? That seems like the true error test. If that
works, you should submit that change to the community.
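
To make the difference between the two tests concrete, here is a small user-space sketch. The function names are hypothetical; only the two conditions themselves come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Length of an inclusive [start, end] range. */
static uint64_t range_len(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start + 1;
}

/* Strict test quoted above: a BAR that starts at address 0 is rejected. */
static bool bar_ok_strict(uint64_t res_start, uint64_t res_len)
{
    return !(!res_start || !res_len);
}

/* Relaxed test: only an empty BAR is treated as an error. */
static bool bar_ok_relaxed(uint64_t res_start, uint64_t res_len)
{
    (void)res_start; /* the start address is deliberately not checked */
    return res_len != 0;
}
```

For the [io 0x0000-0x007f] assignment from the log, the strict test fails even though the length is 0x80, while the relaxed test passes. For [io 0x1000-0x107f] both pass, which is consistent with the bias making the problem disappear.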

Bjorn et al: does it seem reasonable to add a bias to the mappings so that
we never report a zero value as valid? This may be sufficiently defensive
programming that it's just the right thing to do regardless of whether
drivers are technically at fault or not. If so, what's a good bias? (I'm
inclined to think 64K rather than 4K.)

Thanks!

--
Chris Metcalf, Tilera Corp.
http://www.tilera.com

2012-10-26 16:29:16

by Bjorn Helgaas

Subject: Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf wrote:

> Cyberman: it seems like your bias hack is working for you. But, as Bjorn
> says, this sounds like a driver bug. What happens if you just revert your
> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
> just say "if (!res_len)"? That seems like the true error test. If that
> works, you should submit that change to the community.

I don't *think* that is going to be enough, even with the kernel that
has some I/O space support, because both devices are assigned
identical resources:

pci 0000:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
pci 0001:01:00.0: BAR 2: assigned [io 0x0000-0x007f]

The I/O space support that's there is broken because we think the same
I/O range is available on both root buses, which is probably not the
case:

pci_bus 0000:00: resource 0 [io 0x0000-0xffffffff]
pci_bus 0001:00: resource 0 [io 0x0000-0xffffffff]
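
One way to picture the failure is the user-space sketch below. It assumes a single shared table of claimed port ranges, so a second claim of the same range is refused with -EBUSY (which is -16 on Linux, matching the "failed with error -16" probe message). The types and the claim_io() helper are hypothetical, not the kernel's resource API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CLAIMS 8

struct io_range {
    uint64_t start;
    uint64_t end; /* inclusive */
};

struct io_space {
    struct io_range claimed[MAX_CLAIMS];
    size_t count;
};

/*
 * Claim [start, end] in the shared table. Returns 0 on success,
 * -EBUSY if the range overlaps an existing claim, or -ENOMEM if
 * the table is full.
 */
static int claim_io(struct io_space *space, uint64_t start, uint64_t end)
{
    assert(space != NULL && start <= end);

    for (size_t i = 0; i < space->count; i++) {
        const struct io_range *r = &space->claimed[i];

        if (start <= r->end && r->start <= end)
            return -EBUSY;
    }
    if (space->count == MAX_CLAIMS)
        return -ENOMEM;

    space->claimed[space->count].start = start;
    space->claimed[space->count].end = end;
    space->count++;
    return 0;
}
```

Claiming 0x0000-0x007f once succeeds and claiming it again returns -EBUSY, whereas a second device placed at 0x80001000-0x8000107f is accepted.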

If mvsas really doesn't need the I/O BAR, I think it's likely that
making it use pci_enable_device_mem() will make both devices work even
without I/O space support in the kernel.

> Bjorn et al: does it seem reasonable to add a bias to the mappings so that
> we never report a zero value as valid? This may be sufficiently defensive
> programming that it's just the right thing to do regardless of whether
> drivers are technically at fault or not. If so, what's a good bias? (I'm
> inclined to think 64K rather than 4K.)

I/O space is very limited to begin with (many architectures only
*have* 64K), so I hesitate to add a bias in the PCI core. But we do
something similar in arch_remove_reservations(), and I think you could
implement it that way if you wanted to.
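
A rough user-space sketch of that idea: trim the low end of an available window before anything is allocated from it. The struct and the clip_below() helper are hypothetical stand-ins and do not reproduce the real arch_remove_reservations() interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource window; both ends are inclusive. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Trim `avail` so that nothing below `bias` can be handed out.
 * Returns 1 if some space remains, 0 if the whole window lies
 * below the bias (the window is left unchanged in that case).
 */
static int clip_below(struct range *avail, uint64_t bias)
{
    assert(avail != NULL && avail->start <= avail->end);

    if (avail->end < bias)
        return 0;
    if (avail->start < bias)
        avail->start = bias;
    return 1;
}
```

With a 64K bias, a window of 0x0-0xffffffff becomes 0x10000-0xffffffff, while a window that lies entirely below 64K ends up with nothing usable. That second case is the cost on platforms that only have 64K of I/O space.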

Bjorn

2012-10-27 01:39:40

by Cyberman Wu

Subject: Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

On Sat, Oct 27, 2012 at 12:28 AM, Bjorn Helgaas wrote:
> On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf wrote:
>
>> Cyberman: it seems like your bias hack is working for you. But, as Bjorn
>> says, this sounds like a driver bug. What happens if you just revert your
>> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
>> just say "if (!res_len)"? That seems like the true error test. If that
>> works, you should submit that change to the community.
>
> I don't *think* that is going to be enough, even with the kernel that
> has some I/O space support, because both devices are assigned
> identical resources:
>
> pci 0000:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
> pci 0001:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
>
> The I/O space support that's there is broken because we think the same
> I/O range is available on both root buses, which is probably not the
> case:
>
> pci_bus 0000:00: resource 0 [io 0x0000-0xffffffff]
> pci_bus 0001:00: resource 0 [io 0x0000-0xffffffff]
>
That's the problem, and I want to confirm that what I've changed is correct.
I've split the two root complexes so that each uses a separate I/O range. It
seems to work on our device, but since I'm not very familiar with the Linux
kernel, I'd like someone to check it.
For mvsas, I had already modified it along the lines Chris suggested when I
began using the MDE-4.0.0 GA release. I brought it up to see whether anyone
has ideas about that issue.
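
The split arithmetic from the patch can be checked in user space with the sketch below. It assumes IO_SPACE_LIMIT is 0xffffffff (inferred from the "[io 0x0000-0xffffffff]" root bus lines) and keeps the 4K bias from the patch; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the "[io 0x0000-0xffffffff]" root bus log lines. */
#define IO_SPACE_LIMIT 0xffffffffull
/* The first 4K is left unused, as in the patch. */
#define IO_BIAS        4096ull

struct io_window {
    uint64_t start;
    uint64_t end; /* inclusive */
};

/* Compute the I/O window for controller `index` out of `num_controllers`. */
static struct io_window controller_io_window(unsigned int index,
                                             unsigned int num_controllers)
{
    struct io_window w;
    uint64_t size;

    assert(num_controllers > 0 && index < num_controllers);

    size = ((IO_SPACE_LIMIT + 1) - IO_BIAS) / num_controllers;
    size &= ~3ull; /* round down to a multiple of 4, as the patch does */

    w.start = IO_BIAS + (uint64_t)index * size;
    w.end = w.start + size - 1;
    return w;
}
```

For two controllers this gives 0x1000-0x800007ff and 0x80000800-0xffffffff, the same root bus windows that appear in the "after my fix" log.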

> If mvsas really doesn't need the I/O BAR, I think it's likely that
> making it use pci_enable_device_mem() will make both devices work even
> without I/O space support in the kernel.
>
>> Bjorn et al: does it seem reasonable to add a bias to the mappings so that
>> we never report a zero value as valid? This may be sufficiently defensive
>> programming that it's just the right thing to do regardless of whether
>> drivers are technically at fault or not. If so, what's a good bias? (I'm
>> inclined to think 64K rather than 4K.)
>
> I/O space is very limited to begin with (many architectures only
> *have* 64K), so I hesitate to add a bias in the PCI core. But we do
> something similar in arch_remove_reservations(), and I think you could
> implement it that way if you wanted to.
>
> Bjorn



--
Cyberman Wu

2012-10-27 03:05:21

by Bjorn Helgaas

Subject: Re: PCIe IO space support on Tilera GX: Is there any one who can confirm my modification to fix it is OK?

On Fri, Oct 26, 2012 at 7:39 PM, Cyberman Wu wrote:
> On Sat, Oct 27, 2012 at 12:28 AM, Bjorn Helgaas wrote:
>> On Fri, Oct 26, 2012 at 8:08 AM, Chris Metcalf wrote:
>>
>>> Cyberman: it seems like your bias hack is working for you. But, as Bjorn
>>> says, this sounds like a driver bug. What happens if you just revert your
>>> changes, but then in mvsas.c change the "if (!res_start || !res_len)" to
>>> just say "if (!res_len)"? That seems like the true error test. If that
>>> works, you should submit that change to the community.
>>
>> I don't *think* that is going to be enough, even with the kernel that
>> has some I/O space support, because both devices are assigned
>> identical resources:
>>
>> pci 0000:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
>> pci 0001:01:00.0: BAR 2: assigned [io 0x0000-0x007f]
>>
>> The I/O space support that's there is broken because we think the same
>> I/O range is available on both root buses, which is probably not the
>> case:
>>
>> pci_bus 0000:00: resource 0 [io 0x0000-0xffffffff]
>> pci_bus 0001:00: resource 0 [io 0x0000-0xffffffff]
>>
> That's exactly the problem, and I want to confirm that what I've changed is
> correct. I've split the two root complexes so that each uses a separate I/O
> range. It seems to work on our device, but since I'm not very familiar with
> the Linux kernel, I'd like someone to check it. For mvsas, I already made a
> change like the one Chris describes when I began using the MDE-4.0.0 GA
> release. I brought it up here to see whether anyone has ideas about that
> issue.

Some architectures do implement multiple I/O ranges. Typical HP
parisc and ia64 boxes have a PCI host bridge for every slot, so each
slot can be in a separate PCI domain, and each host bridge can support
a separate 64KB I/O port space for its slot. In that case, the values
in the struct resource will be different from the actual addresses
that appear on the PCI buses.

For example, you might have bridge A leading to bus 0000:00 with [io
0x0000-0xffff] and bridge B leading to bus 0001:00 with [io
0x10000-0x1ffff]. The I/O port addresses used by drivers don't
overlap, and there's no ambiguity, but if you put an analyzer on bus
0001:00, you'd see port addresses in the 0x0000-0xffff range. If you
moved the analyzer to bus 0000:00, you'd see the same 0x0000-0xffff
range of port addresses. It's up to the architecture implementation
of inb()/outb()/etc. to map an I/O resource address to a host bridge
and a bus port address behind that bridge.
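
The mapping described above could be sketched roughly like this. It is a
user-space model under stated assumptions, not real arch code: the names
(struct host_bridge, struct io_target, io_resource_to_bus) are made up for
illustration, and it assumes two bridges that each own one 64KB window,
matching the bridge A / bridge B example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: every host bridge decodes one 64KB window of port space. */
#define IO_WINDOW_SIZE 0x10000u

/* Hypothetical per-bridge record. */
struct host_bridge {
    int domain;          /* PCI domain behind this bridge */
    uint32_t res_start;  /* first I/O resource address of its window */
};

/* Result of the lookup: which bridge, and which port on its bus. */
struct io_target {
    const struct host_bridge *bridge;
    uint16_t bus_port;
};

/* Bridge A: [io 0x0000-0xffff], bridge B: [io 0x10000-0x1ffff]. */
static const struct host_bridge bridges[] = {
    { .domain = 0, .res_start = 0x00000 },
    { .domain = 1, .res_start = 0x10000 },
};

/* Map an I/O resource address to a bridge and a bus port address.
 * Returns 0 on success, -1 if no bridge claims the address. */
static int io_resource_to_bus(uint32_t res_addr, struct io_target *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < sizeof(bridges) / sizeof(bridges[0]); i++) {
        uint32_t start = bridges[i].res_start;
        if (res_addr >= start && res_addr - start < IO_WINDOW_SIZE) {
            out->bridge = &bridges[i];
            out->bus_port = (uint16_t)(res_addr - start);
            return 0;
        }
    }
    return -1;
}
```

In this model an inb()-style accessor would call io_resource_to_bus() first
and then issue the access through the selected bridge. Resource address
0x10040 resolves to domain 1, bus port 0x0040, which is what an analyzer on
bus 0001:00 would show.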

The bottom line is that what you want to do seems possible and makes
some sense. Of course, the diff you posted is useless for upstream
Linux because it's all entangled with MDE and it reverts a lot of the
recent Linux work. But what you want to do is possible in principle.
It's up to you and Chris to figure out whether and how to rework the
changes to add this functionality cleanly.

Bjorn