This is v2 of my attempt to add support for a generic pci_host_bridge controller created
from a description passed in the device tree.
Changes from v1:
- Add patch to fix conversion of IO ranges into IO resources.
- Added a domain_nr member to pci_host_bridge structure, and a new function
to create a root bus in a given domain number. In order to facilitate that
I propose changing the order of initialisation between pci_host_bridge and
its related bus in pci_create_root_bus(), as a sort of revert of 7b5436635800.
This is done in patch 1/4 and 2/4.
- Added a simple allocator of domain numbers in drivers/pci/host-bridge.c. The
code will first try to get a domain id from of_alias_get_id(..., "pci-domain")
and, if that fails, assign the next unallocated domain id (see the sketch after
this list).
- Changed the name of the function that creates the generic host bridge from
pci_host_bridge_of_init to of_create_pci_host_bridge and exported it as a GPL symbol.
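A minimal sketch of that allocation order, lifted from of_create_pci_host_bridge()
in patch 4/4 (error handling omitted):

	static int domain_nr;

	domain = of_alias_get_id(parent->of_node, "pci-domain");
	if (domain == -ENODEV)
		domain = domain_nr++;	/* fall back to the next unused domain */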
v1 thread here: https://lkml.org/lkml/2014/2/3/380
The following is an edit of the original blurb:
Following the discussion started here [1], I now have a proposal for tackling
generic support for host bridges described via device tree. It is an initial
stab at it, meant to gather feedback and suggestions, but it is functional enough
that I have PCI Express for arm64 working on an FPGA, using the patch that I am
also publishing which adds PCI support for that platform.
Looking at the existing architectures that fit the requirements (use of device
tree and PCI) yields powerpc and microblaze as generic enough to make them
candidates for conversion. I have a tentative patch for microblaze that I can
only compile-test; unfortunately, using qemu-microblaze leads to an early
crash in the kernel.
As Bjorn has mentioned in the previous discussion, the idea is to add to
struct pci_host_bridge enough data to be able to reduce the size of, or remove,
the architecture-specific pci_controller structure. The arm64 support actually
manages to get rid of all the architecture's static data and has no pci_controller
structure defined. For host bridge drivers that means a change of API, unless
architectures decide to provide a compatibility layer (comments here please).
In order to initialise a host bridge with the new API, the following example
code is sufficient for a _probe() function:
static int myhostbridge_probe(struct platform_device *pdev)
{
	int err;
	struct device_node *dev;
	struct pci_host_bridge *bridge;
	struct myhostbridge_port *pp;
	resource_size_t lastbus;

	dev = pdev->dev.of_node;

	if (!of_device_is_available(dev)) {
		pr_warn("%s: disabled\n", dev->full_name);
		return -ENODEV;
	}

	pp = kzalloc(sizeof(struct myhostbridge_port), GFP_KERNEL);
	if (!pp)
		return -ENOMEM;

	bridge = of_create_pci_host_bridge(&pdev->dev, &myhostbridge_ops, pp);
	if (!bridge) {
		err = -EINVAL;
		goto bridge_init_fail;
	}

	err = myhostbridge_setup(bridge->bus);
	if (err)
		goto bridge_init_fail;

	/* We always enable PCI domains and we keep domain 0 backward
	 * compatible in /proc for video cards
	 */
	pci_add_flags(PCI_ENABLE_PROC_DOMAINS | PCI_COMPAT_DOMAIN_0);
	pci_add_flags(PCI_REASSIGN_ALL_BUS | PCI_REASSIGN_ALL_RSRC);

	lastbus = pci_scan_child_bus(bridge->bus);
	pci_bus_update_busn_res_end(bridge->bus, lastbus);

	pci_assign_unassigned_bus_resources(bridge->bus);
	pci_bus_add_devices(bridge->bus);

	return 0;

bridge_init_fail:
	kfree(pp);
	return err;
}
[1] http://thread.gmane.org/gmane.linux.kernel.pci/25946
Best regards,
Liviu
Liviu Dudau (4):
pci: OF: Fix the conversion of IO ranges into IO resources.
pci: Create pci_host_bridge before its associated bus in pci_create_root_bus.
pci: Introduce a domain number for pci_host_bridge.
pci: Add support for creating a generic host_bridge from device tree
drivers/of/address.c | 31 +++++++++++
drivers/pci/host-bridge.c | 134 +++++++++++++++++++++++++++++++++++++++++++++
drivers/pci/probe.c | 66 ++++++++++++++--------
include/linux/of_address.h | 13 +----
include/linux/pci.h | 17 ++++++
5 files changed, 227 insertions(+), 34 deletions(-)
--
1.9.0
Before commit 7b5436635800 the pci_host_bridge was created before the root bus.
That commit added a needless dependency on the bus in pci_alloc_host_bridge(),
changing the creation order for no good reason. Revert the order of creation,
as we are going to depend on the pci_host_bridge structure to retrieve the
domain number of the root bus.
Signed-off-by: Liviu Dudau <[email protected]>
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6e34498..78ccba0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -505,7 +505,7 @@ static void pci_release_host_bridge_dev(struct device *dev)
kfree(bridge);
}
-static struct pci_host_bridge *pci_alloc_host_bridge(struct pci_bus *b)
+static struct pci_host_bridge *pci_alloc_host_bridge(void)
{
struct pci_host_bridge *bridge;
@@ -514,7 +514,6 @@ static struct pci_host_bridge *pci_alloc_host_bridge(struct pci_bus *b)
return NULL;
INIT_LIST_HEAD(&bridge->windows);
- bridge->bus = b;
return bridge;
}
@@ -1727,9 +1726,21 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
char bus_addr[64];
char *fmt;
+ bridge = pci_alloc_host_bridge();
+ if (!bridge)
+ return NULL;
+
+ bridge->dev.parent = parent;
+ bridge->dev.release = pci_release_host_bridge_dev;
+ error = pcibios_root_bridge_prepare(bridge);
+ if (error) {
+ kfree(bridge);
+ return NULL;
+ }
+
b = pci_alloc_bus();
if (!b)
- return NULL;
+ goto err_out;
b->sysdata = sysdata;
b->ops = ops;
@@ -1738,26 +1749,15 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
if (b2) {
/* If we already got to this bus through a different bridge, ignore it */
dev_dbg(&b2->dev, "bus already known\n");
- goto err_out;
+ goto err_bus_out;
}
- bridge = pci_alloc_host_bridge(b);
- if (!bridge)
- goto err_out;
-
- bridge->dev.parent = parent;
- bridge->dev.release = pci_release_host_bridge_dev;
+ bridge->bus = b;
dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
- error = pcibios_root_bridge_prepare(bridge);
- if (error) {
- kfree(bridge);
- goto err_out;
- }
-
error = device_register(&bridge->dev);
if (error) {
put_device(&bridge->dev);
- goto err_out;
+ goto err_bus_out;
}
b->bridge = get_device(&bridge->dev);
device_enable_async_suspend(b->bridge);
@@ -1814,8 +1814,10 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
class_dev_reg_err:
put_device(&bridge->dev);
device_unregister(&bridge->dev);
-err_out:
+err_bus_out:
kfree(b);
+err_out:
+ kfree(bridge);
return NULL;
}
--
1.9.0
The ranges property for a host bridge controller in DT describes
the mapping between PCI bus addresses and CPU physical addresses.
The resources framework, however, expects that IO resources start
at a pseudo "port" address 0 (zero) and have a maximum size of 64KB.
The conversion from PCI ranges to resources failed to take that into
account.
In the process, move the function into drivers/of/address.c as it
now depends on the pci_address_to_pio() code.
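To illustrate (with hypothetical numbers, not taken from a real platform): for
an I/O range with pci_addr 0x0, cpu_addr 0x3ef00000 and size 0x10000, the old
code produced a resource based on the CPU physical address:

	res->start = 0x3ef00000;
	res->end   = 0x3ef0ffff;

while with this patch the resource is expressed in logical port numbers:

	port = pci_address_to_pio(range->pci_addr);	/* e.g. 0 */
	res->start = port;
	res->end   = port + 0xffff;	/* within the 64KB I/O space */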
Signed-off-by: Liviu Dudau <[email protected]>
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 1a54f1f..7cf2b16 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -719,3 +719,34 @@ void __iomem *of_iomap(struct device_node *np, int index)
return ioremap(res.start, resource_size(&res));
}
EXPORT_SYMBOL(of_iomap);
+
+/**
+ * of_pci_range_to_resource - Create a resource from an of_pci_range
+ * @range: the PCI range that describes the resource
+ * @np: device node where the range belongs to
+ * @res: pointer to a valid resource that will be updated to
+ * reflect the values contained in the range.
+ * Note that if the range is an IO range, the resource will be converted
+ * using pci_address_to_pio() which can fail if it is called too early or
+ * if the range cannot be matched to any host bridge IO space.
+ */
+void of_pci_range_to_resource(struct of_pci_range *range,
+ struct device_node *np, struct resource *res)
+{
+ res->flags = range->flags;
+ if (res->flags & IORESOURCE_IO) {
+ unsigned long port;
+ port = pci_address_to_pio(range->pci_addr);
+ if (port == (unsigned long)-1) {
+ res->start = (resource_size_t)OF_BAD_ADDR;
+ res->end = (resource_size_t)OF_BAD_ADDR;
+ return;
+ }
+ res->start = port;
+ } else {
+ res->start = range->cpu_addr;
+ }
+ res->end = res->start + range->size - 1;
+ res->parent = res->child = res->sibling = NULL;
+ res->name = np->full_name;
+}
diff --git a/include/linux/of_address.h b/include/linux/of_address.h
index 5f6ed6b..a667762 100644
--- a/include/linux/of_address.h
+++ b/include/linux/of_address.h
@@ -23,17 +23,8 @@ struct of_pci_range {
#define for_each_of_pci_range(parser, range) \
for (; of_pci_range_parser_one(parser, range);)
-static inline void of_pci_range_to_resource(struct of_pci_range *range,
- struct device_node *np,
- struct resource *res)
-{
- res->flags = range->flags;
- res->start = range->cpu_addr;
- res->end = range->cpu_addr + range->size - 1;
- res->parent = res->child = res->sibling = NULL;
- res->name = np->full_name;
-}
-
+extern void of_pci_range_to_resource(struct of_pci_range *range,
+ struct device_node *np, struct resource *res);
/* Translate a DMA address from device space to CPU space */
extern u64 of_translate_dma_address(struct device_node *dev,
const __be32 *in_addr);
--
1.9.0
Several platforms use a rather generic version of parsing
the device tree to find the host bridge ranges. Move the common code
into the generic PCI code and use it to create a pci_host_bridge
structure that can be used by arch code.
Based on early attempts by Andrew Murray to unify the code, with the
powerpc and microblaze PCI code used as a starting point.
Signed-off-by: Liviu Dudau <[email protected]>
diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
index 06ace62..feb8436 100644
--- a/drivers/pci/host-bridge.c
+++ b/drivers/pci/host-bridge.c
@@ -6,9 +6,13 @@
#include <linux/init.h>
#include <linux/pci.h>
#include <linux/module.h>
+#include <linux/of_address.h>
+#include <linux/of_pci.h>
#include "pci.h"
+static int domain_nr;
+
static struct pci_bus *find_pci_root_bus(struct pci_bus *bus)
{
while (bus->parent)
@@ -91,3 +95,133 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
res->end = region->end + offset;
}
EXPORT_SYMBOL(pcibios_bus_to_resource);
+
+/**
+ * pci_host_bridge_of_get_ranges - Parse PCI host bridge resources from DT
+ * @dev: device node of the host bridge having the range property
+ * @resources: list where the range of resources will be added after DT parsing
+ * @io_base: pointer to a variable that will contain the physical address for
+ * the start of the I/O range.
+ *
+ * If this function returns an error then the @resources list will be freed.
+ *
+ * This function will parse the "ranges" property of a PCI host bridge device
+ * node and setup the resource mapping based on its content. It is expected
+ * that the property conforms with the Power ePAPR document.
+ *
+ * Each architecture is then offered the chance of applying their own
+ * filtering of pci_host_bridge_windows based on their own restrictions by
+ * calling pcibios_fixup_bridge_ranges(). The filtered list of windows
+ * can then be used when creating a pci_host_bridge structure.
+ */
+static int pci_host_bridge_of_get_ranges(struct device_node *dev,
+ struct list_head *resources, resource_size_t *io_base)
+{
+ struct resource *res;
+ struct of_pci_range range;
+ struct of_pci_range_parser parser;
+ int err;
+
+ pr_info("PCI host bridge %s ranges:\n", dev->full_name);
+
+ /* Check for ranges property */
+ err = of_pci_range_parser_init(&parser, dev);
+ if (err)
+ return err;
+
+ pr_debug("Parsing ranges property...\n");
+ for_each_of_pci_range(&parser, &range) {
+ /* Read next ranges element */
+ pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
+ range.pci_space, range.pci_addr);
+ pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
+ range.cpu_addr, range.size);
+
+ /*
+ * If we failed translation or got a zero-sized region
+ * then skip this range
+ */
+ if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
+ continue;
+
+ res = kzalloc(sizeof(struct resource), GFP_KERNEL);
+ if (!res) {
+ err = -ENOMEM;
+ goto bridge_ranges_nomem;
+ }
+
+ of_pci_range_to_resource(&range, dev, res);
+
+ if (resource_type(res) == IORESOURCE_IO)
+ *io_base = range.cpu_addr;
+
+ pci_add_resource_offset(resources, res,
+ res->start - range.pci_addr);
+ }
+
+ /* Apply architecture specific fixups for the ranges */
+ pcibios_fixup_bridge_ranges(resources);
+
+ return 0;
+
+bridge_ranges_nomem:
+ pci_free_resource_list(resources);
+ return err;
+}
+
+/**
+ * of_create_pci_host_bridge - Create a PCI host bridge structure using
+ * information passed in the DT.
+ * @parent: device owning this host bridge
+ * @ops: pci_ops associated with the host controller
+ * @host_data: opaque data structure used by the host controller.
+ *
+ * returns a pointer to the newly created pci_host_bridge structure, or
+ * NULL if the call failed.
+ *
+ * This function will try to obtain the host bridge domain number by
+ * using of_alias_get_id() call with "pci-domain" as a stem. If that
+ * fails, a local allocator will be used that will put each host bridge
+ * in a new domain.
+ */
+struct pci_host_bridge *
+of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops, void *host_data)
+{
+ int err, domain, busno;
+ struct resource bus_range;
+ struct pci_bus *root_bus;
+ struct pci_host_bridge *bridge;
+ resource_size_t io_base;
+ LIST_HEAD(res);
+
+ domain = of_alias_get_id(parent->of_node, "pci-domain");
+ if (domain == -ENODEV)
+ domain = domain_nr++;
+
+ err = of_pci_parse_bus_range(parent->of_node, &bus_range);
+ if (err) {
+ dev_info(parent, "No bus range for %s, using default [0-255]\n",
+ parent->of_node->full_name);
+ bus_range.start = 0;
+ bus_range.end = 255;
+ bus_range.flags = IORESOURCE_BUS;
+ }
+ busno = bus_range.start;
+ pci_add_resource(&res, &bus_range);
+
+ /* now parse the rest of host bridge bus ranges */
+ if (pci_host_bridge_of_get_ranges(parent->of_node, &res, &io_base))
+ return NULL;
+
+ /* then create the root bus */
+ root_bus = pci_create_root_bus_in_domain(parent, domain, busno,
+ ops, host_data, &res);
+ if (!root_bus)
+ return NULL;
+
+ bridge = to_pci_host_bridge(root_bus->bridge);
+ bridge->io_base = io_base;
+
+ return bridge;
+}
+EXPORT_SYMBOL_GPL(of_create_pci_host_bridge);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 1eed009..0c5e269 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -395,6 +395,7 @@ struct pci_host_bridge {
struct device dev;
struct pci_bus *bus; /* root bus */
int domain_nr;
+ resource_size_t io_base; /* physical address for the start of I/O area */
struct list_head windows; /* pci_host_bridge_windows */
void (*release_fn)(struct pci_host_bridge *);
void *release_data;
@@ -1786,11 +1787,23 @@ static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus)
return bus ? bus->dev.of_node : NULL;
}
+struct pci_host_bridge *
+of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops,
+ void *host_data);
+
+void pcibios_fixup_bridge_ranges(struct list_head *resources);
#else /* CONFIG_OF */
static inline void pci_set_of_node(struct pci_dev *dev) { }
static inline void pci_release_of_node(struct pci_dev *dev) { }
static inline void pci_set_bus_of_node(struct pci_bus *bus) { }
static inline void pci_release_bus_of_node(struct pci_bus *bus) { }
+
+static inline struct pci_host_bridge *
+pci_host_bridge_of_init(struct device *parent, struct pci_ops *ops,
+ void *host_data)
+{
+ return NULL;
+}
#endif /* CONFIG_OF */
#ifdef CONFIG_EEH
--
1.9.0
Make it easier to discover the domain number of a bus by storing
the number in pci_host_bridge for the root bus. Several architectures
have their own way of storing this information, so it makes sense
to try to unify the code. While at it, add a new function that
creates a root bus in a given domain and make pci_create_root_bus()
a wrapper around this function.
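A host bridge driver that already knows its domain number could then use it
along these lines (hypothetical call site, names made up for illustration):

	bus = pci_create_root_bus_in_domain(&pdev->dev, my_domain, 0,
					    &myhost_pci_ops, sysdata, &resources);
	if (!bus)
		return -ENODEV;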
Signed-off-by: Liviu Dudau <[email protected]>
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 78ccba0..1b2f45c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1714,8 +1714,9 @@ void __weak pcibios_remove_bus(struct pci_bus *bus)
{
}
-struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
- struct pci_ops *ops, void *sysdata, struct list_head *resources)
+struct pci_bus *pci_create_root_bus_in_domain(struct device *parent,
+ int domain, int bus, struct pci_ops *ops, void *sysdata,
+ struct list_head *resources)
{
int error;
struct pci_host_bridge *bridge;
@@ -1732,6 +1733,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
bridge->dev.parent = parent;
bridge->dev.release = pci_release_host_bridge_dev;
+ bridge->domain_nr = domain;
error = pcibios_root_bridge_prepare(bridge);
if (error) {
kfree(bridge);
@@ -1745,7 +1747,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
b->sysdata = sysdata;
b->ops = ops;
b->number = b->busn_res.start = bus;
- b2 = pci_find_bus(pci_domain_nr(b), bus);
+ b2 = pci_find_bus(bridge->domain_nr, bus);
if (b2) {
/* If we already got to this bus through a different bridge, ignore it */
dev_dbg(&b2->dev, "bus already known\n");
@@ -1753,7 +1755,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
}
bridge->bus = b;
- dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
+ dev_set_name(&bridge->dev, "pci%04x:%02x", bridge->domain_nr, bus);
error = device_register(&bridge->dev);
if (error) {
put_device(&bridge->dev);
@@ -1768,7 +1770,7 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
b->dev.class = &pcibus_class;
b->dev.parent = b->bridge;
- dev_set_name(&b->dev, "%04x:%02x", pci_domain_nr(b), bus);
+ dev_set_name(&b->dev, "%04x:%02x", bridge->domain_nr, bus);
error = device_register(&b->dev);
if (error)
goto class_dev_reg_err;
@@ -1821,6 +1823,22 @@ err_out:
return NULL;
}
+struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
+ struct pci_ops *ops, void *sysdata, struct list_head *resources)
+{
+ int domain_nr;
+ struct pci_bus *b = pci_alloc_bus();
+ if (!b)
+ return NULL;
+
+ b->sysdata = sysdata;
+ domain_nr = pci_domain_nr(b);
+ kfree(b);
+
+ return pci_create_root_bus_in_domain(parent, domain_nr, bus,
+ ops, sysdata, resources);
+}
+
int pci_bus_insert_busn_res(struct pci_bus *b, int bus, int bus_max)
{
struct resource *res = &b->busn_res;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 33aa2ca..1eed009 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -394,6 +394,7 @@ struct pci_host_bridge_window {
struct pci_host_bridge {
struct device dev;
struct pci_bus *bus; /* root bus */
+ int domain_nr;
struct list_head windows; /* pci_host_bridge_windows */
void (*release_fn)(struct pci_host_bridge *);
void *release_data;
@@ -747,6 +748,9 @@ struct pci_bus *pci_scan_bus(int bus, struct pci_ops *ops, void *sysdata);
struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
struct pci_ops *ops, void *sysdata,
struct list_head *resources);
+struct pci_bus *pci_create_root_bus_in_domain(struct device *parent,
+ int domain, int bus, struct pci_ops *ops,
+ void *sysdata, struct list_head *resources);
int pci_bus_insert_busn_res(struct pci_bus *b, int bus, int busmax);
int pci_bus_update_busn_res_end(struct pci_bus *b, int busmax);
void pci_bus_release_busn_res(struct pci_bus *b);
--
1.9.0
On Thursday 27 February 2014 13:06:39 Liviu Dudau wrote:
> + res->flags = range->flags;
> + if (res->flags & IORESOURCE_IO) {
> + unsigned long port;
> + port = pci_address_to_pio(range->pci_addr);
> + if (port == (unsigned long)-1) {
> + res->start = (resource_size_t)OF_BAD_ADDR;
> + res->end = (resource_size_t)OF_BAD_ADDR;
> + return;
> + }
>
I think this conflicts with the way that pci_address_to_pio() is
defined on powerpc, where it expects a CPU address as the input,
not a PCI i/o address.
Arnd
On 27 February 2014 13:06, Liviu Dudau <[email protected]> wrote:
>
> The ranges property for a host bridge controller in DT describes
> the mapping between the PCI bus address and the CPU physical address.
> The resources framework however expects that the IO resources start
> at a pseudo "port" address 0 (zero) and have a maximum size of 64kb.
Is this just in the case of ARM? (I've tried to keep up with the
conversation, but apologies if I've misunderstood).
> The conversion from pci ranges to resources failed to take that into
> account.
>
> In the process move the function into drivers/of/address.c as it
> now depends on pci_address_to_pio() code.
>
> Signed-off-by: Liviu Dudau <[email protected]>
>
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 1a54f1f..7cf2b16 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -719,3 +719,34 @@ void __iomem *of_iomap(struct device_node *np, int index)
> return ioremap(res.start, resource_size(&res));
> }
> EXPORT_SYMBOL(of_iomap);
> +
> +/**
> + * of_pci_range_to_resource - Create a resource from an of_pci_range
> + * @range: the PCI range that describes the resource
> + * @np: device node where the range belongs to
> + * @res: pointer to a valid resource that will be updated to
> + * reflect the values contained in the range.
> + * Note that if the range is an IO range, the resource will be converted
> + * using pci_address_to_pio() which can fail if it is called to early or
> + * if the range cannot be matched to any host bridge IO space.
> + */
> +void of_pci_range_to_resource(struct of_pci_range *range,
> + struct device_node *np, struct resource *res)
> +{
> + res->flags = range->flags;
> + if (res->flags & IORESOURCE_IO) {
> + unsigned long port;
> + port = pci_address_to_pio(range->pci_addr);
Is this likely to break existing users of of_pci_range_to_resource?
For example arch/mips: IO_SPACE_LIMIT defaults to 0xffff and there is
no overridden implementation for pci_address_to_pio, therefore this
will set res->start to OF_BAD_ADDR whereas previously it would have
been the CPU address for I/O (assuming the cpu_addr was previously >
64K).
I have no idea if I/O previously worked for mips, but this patch seems
to change that behavior. It may be a similar story for microblaze and
powerpc.
Andrew Murray
> + if (port == (unsigned long)-1) {
> + res->start = (resource_size_t)OF_BAD_ADDR;
> + res->end = (resource_size_t)OF_BAD_ADDR;
> + return;
> + }
> + res->start = port;
> + } else {
> + res->start = range->cpu_addr;
> + }
> + res->end = res->start + range->size - 1;
> + res->parent = res->child = res->sibling = NULL;
> + res->name = np->full_name;
> +}
> diff --git a/include/linux/of_address.h b/include/linux/of_address.h
> index 5f6ed6b..a667762 100644
> --- a/include/linux/of_address.h
> +++ b/include/linux/of_address.h
> @@ -23,17 +23,8 @@ struct of_pci_range {
> #define for_each_of_pci_range(parser, range) \
> for (; of_pci_range_parser_one(parser, range);)
>
> -static inline void of_pci_range_to_resource(struct of_pci_range *range,
> - struct device_node *np,
> - struct resource *res)
> -{
> - res->flags = range->flags;
> - res->start = range->cpu_addr;
> - res->end = range->cpu_addr + range->size - 1;
> - res->parent = res->child = res->sibling = NULL;
> - res->name = np->full_name;
> -}
> -
> +extern void of_pci_range_to_resource(struct of_pci_range *range,
> + struct device_node *np, struct resource *res);
> /* Translate a DMA address from device space to CPU space */
> extern u64 of_translate_dma_address(struct device_node *dev,
> const __be32 *in_addr);
> --
> 1.9.0
>
On Thursday 27 February 2014 13:06:40 Liviu Dudau wrote:
> Before commit 7b5436635800 the pci_host_bridge was created before the root bus.
> As that commit has added a needless dependency on the bus for pci_alloc_host_bridge()
> the creation order has been changed for no good reason. Revert the order of
> creation as we are going to depend on the pci_host_bridge structure to retrieve the
> domain number of the root bus.
>
> Signed-off-by: Liviu Dudau <[email protected]>
>
Looks good to me.
Arnd
On Thursday 27 February 2014 13:06:41 Liviu Dudau wrote:
> Make it easier to discover the domain number of a bus by storing
> the number in pci_host_bridge for the root bus. Several architectures
> have their own way of storing this information, so it makes sense
> to try to unify the code. While at this, add a new function that
> creates a root bus in a given domain and make pci_create_root_bus()
> a wrapper around this function.
>
> Signed-off-by: Liviu Dudau <[email protected]>
>
Acked-by: Arnd Bergmann <[email protected]>
On Thursday 27 February 2014 13:06:42 Liviu Dudau wrote:
> Several platforms use a rather generic version of parsing
> the device tree to find the host bridge ranges. Move the common code
> into the generic PCI code and use it to create a pci_host_bridge
> structure that can be used by arch code.
>
> Based on early attempts by Andrew Murray to unify the code.
> Used powerpc and microblaze PCI code as starting point.
>
> Signed-off-by: Liviu Dudau <[email protected]>
Please add Benjamin Herrenschmidt to Cc here, I think it would be helpful
to get his input so we can make this work on powerpc as well.
> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> index 06ace62..feb8436 100644
> --- a/drivers/pci/host-bridge.c
> +++ b/drivers/pci/host-bridge.c
> @@ -6,9 +6,13 @@
> #include <linux/init.h>
> #include <linux/pci.h>
> #include <linux/module.h>
> +#include <linux/of_address.h>
> +#include <linux/of_pci.h>
>
> #include "pci.h"
>
> +static int domain_nr;
> +
For correctness, I think you want an 'atomic_t' here and use
atomic_inc_return() to get a new value.
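Something like this (untested sketch):

	static atomic_t domain_nr = ATOMIC_INIT(-1);
	...
	domain = of_alias_get_id(parent->of_node, "pci-domain");
	if (domain == -ENODEV)
		domain = atomic_inc_return(&domain_nr);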
> static struct pci_bus *find_pci_root_bus(struct pci_bus *bus)
> {
> while (bus->parent)
> @@ -91,3 +95,133 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
> res->end = region->end + offset;
> }
> EXPORT_SYMBOL(pcibios_bus_to_resource);
> +
> +/**
> + * pci_host_bridge_of_get_ranges - Parse PCI host bridge resources from DT
> + * @dev: device node of the host bridge having the range property
> + * @resources: list where the range of resources will be added after DT parsing
> + * @io_base: pointer to a variable that will contain the physical address for
> + * the start of the I/O range.
> + *
> + * If this function returns an error then the @resources list will be freed.
> + *
> + * This function will parse the "ranges" property of a PCI host bridge device
> + * node and setup the resource mapping based on its content. It is expected
> + * that the property conforms with the Power ePAPR document.
> + *
> + * Each architecture is then offered the chance of applying their own
> + * filtering of pci_host_bridge_windows based on their own restrictions by
> + * calling pcibios_fixup_bridge_ranges(). The filtered list of windows
> + * can then be used when creating a pci_host_bridge structure.
> + */
> +static int pci_host_bridge_of_get_ranges(struct device_node *dev,
> + struct list_head *resources, resource_size_t *io_base)
> +{
> + struct resource *res;
> + struct of_pci_range range;
> + struct of_pci_range_parser parser;
> + int err;
> +
> + pr_info("PCI host bridge %s ranges:\n", dev->full_name);
> +
> + /* Check for ranges property */
> + err = of_pci_range_parser_init(&parser, dev);
> + if (err)
> + return err;
> +
> + pr_debug("Parsing ranges property...\n");
> + for_each_of_pci_range(&parser, &range) {
> + /* Read next ranges element */
> + pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
> + range.pci_space, range.pci_addr);
> + pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
> + range.cpu_addr, range.size);
> +
> + /*
> + * If we failed translation or got a zero-sized region
> + * then skip this range
> + */
> + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
> + continue;
> +
> + res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> + if (!res) {
> + err = -ENOMEM;
> + goto bridge_ranges_nomem;
> + }
> +
> + of_pci_range_to_resource(&range, dev, res);
> +
> + if (resource_type(res) == IORESOURCE_IO)
> + *io_base = range.cpu_addr;
> +
> + pci_add_resource_offset(resources, res,
> + res->start - range.pci_addr);
> + }
This is not the correct resource for I/O space at all. Please talk
to Will, I've been over this with him in detail and he probably
understands it now. I assume you are both working in the same
building.
Since this is common PCI code, you could also decide to open-code
the pci_add_resource_offset() function. If you don't do that, I
think you have a memory leak for the resources that you can avoid
by allocating the resource and pci_host_bridge_window structures
together with a single kzalloc.
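Roughly (untested sketch, reusing the pci_host_bridge_window fields already
used elsewhere in this series):

	struct pci_host_bridge_window *window;

	window = kzalloc(sizeof(*window) + sizeof(struct resource), GFP_KERNEL);
	if (!window) {
		err = -ENOMEM;
		goto bridge_ranges_nomem;
	}

	window->res = (struct resource *)(window + 1);
	of_pci_range_to_resource(&range, dev, window->res);
	window->offset = window->res->start - range.pci_addr;
	list_add_tail(&window->list, resources);

so that freeing the list releases the resource as well.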
> + /* Apply architecture specific fixups for the ranges */
> + pcibios_fixup_bridge_ranges(resources);
> +
> + return 0;
> +
> +bridge_ranges_nomem:
> + pci_free_resource_list(resources);
> + return err;
> +}
> +
> +/**
> + * of_create_pci_host_bridge - Create a PCI host bridge structure using
> + * information passed in the DT.
> + * @parent: device owning this host bridge
> + * @ops: pci_ops associated with the host controller
> + * @host_data: opaque data structure used by the host controller.
> + *
> + * returns a pointer to the newly created pci_host_bridge structure, or
> + * NULL if the call failed.
> + *
> + * This function will try to obtain the host bridge domain number by
> + * using of_alias_get_id() call with "pci-domain" as a stem. If that
> + * fails, a local allocator will be used that will put each host bridge
> + * in a new domain.
> + */
> +struct pci_host_bridge *
> +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops, void *host_data)
> +{
> + int err, domain, busno;
> + struct resource bus_range;
> + struct pci_bus *root_bus;
> + struct pci_host_bridge *bridge;
> + resource_size_t io_base;
> + LIST_HEAD(res);
> +
> + domain = of_alias_get_id(parent->of_node, "pci-domain");
> + if (domain == -ENODEV)
> + domain = domain_nr++;
> +
> + err = of_pci_parse_bus_range(parent->of_node, &bus_range);
> + if (err) {
> + dev_info(parent, "No bus range for %s, using default [0-255]\n",
> + parent->of_node->full_name);
> + bus_range.start = 0;
> + bus_range.end = 255;
> + bus_range.flags = IORESOURCE_BUS;
> + }
> + busno = bus_range.start;
> + pci_add_resource(&res, &bus_range);
> +
> + /* now parse the rest of host bridge bus ranges */
> + if (pci_host_bridge_of_get_ranges(parent->of_node, &res, &io_base))
> + return NULL;
> +
> + /* then create the root bus */
> + root_bus = pci_create_root_bus_in_domain(parent, domain, busno,
> + ops, host_data, &res);
> + if (!root_bus)
> + return NULL;
Do we have any code that checks for conflicting domain/bus numbers here?
I guess pci_create_root_bus_in_domain() will fail if you have that.
Since pci_create_root_bus_in_domain() is a new function that you just
introduced, it would be helpful to change the calling conventions
so it returns an error pointer instead of NULL upon failing.
of_create_pci_host_bridge() can do the same, but pci_create_root_bus()
should keep returning NULL so we don't have to change all the
callers.
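Something along these lines (sketch only):

	root_bus = pci_create_root_bus_in_domain(parent, domain, busno,
						 ops, host_data, &res);
	if (IS_ERR(root_bus))
		return ERR_CAST(root_bus);

with pci_create_root_bus_in_domain() returning e.g. ERR_PTR(-ENOMEM) or
ERR_PTR(-EEXIST) instead of NULL, and pci_create_root_bus() translating the
error back to NULL for its existing callers.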
> + bridge = to_pci_host_bridge(root_bus->bridge);
> + bridge->io_base = io_base;
> +
> + return bridge;
> +}
> +EXPORT_SYMBOL_GPL(of_create_pci_host_bridge);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 1eed009..0c5e269 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -395,6 +395,7 @@ struct pci_host_bridge {
> struct device dev;
> struct pci_bus *bus; /* root bus */
> int domain_nr;
> + resource_size_t io_base; /* physical address for the start of I/O area */
> struct list_head windows; /* pci_host_bridge_windows */
> void (*release_fn)(struct pci_host_bridge *);
> void *release_data;
What is the io_base used for here?
> @@ -1786,11 +1787,23 @@ static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus)
> return bus ? bus->dev.of_node : NULL;
> }
>
> +struct pci_host_bridge *
> +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops,
> + void *host_data);
> +
> +void pcibios_fixup_bridge_ranges(struct list_head *resources);
> #else /* CONFIG_OF */
> static inline void pci_set_of_node(struct pci_dev *dev) { }
> static inline void pci_release_of_node(struct pci_dev *dev) { }
> static inline void pci_set_bus_of_node(struct pci_bus *bus) { }
> static inline void pci_release_bus_of_node(struct pci_bus *bus) { }
> +
> +static inline struct pci_host_bridge *
> +pci_host_bridge_of_init(struct device *parent, struct pci_ops *ops,
> + void *host_data)
> +{
> + return NULL;
> +}
> #endif /* CONFIG_OF */
>
> #ifdef CONFIG_EEH
>
On Thu, Feb 27, 2014 at 01:20:54PM +0000, Arnd Bergmann wrote:
> On Thursday 27 February 2014 13:06:39 Liviu Dudau wrote:
> > + res->flags = range->flags;
> > + if (res->flags & IORESOURCE_IO) {
> > + unsigned long port;
> > + port = pci_address_to_pio(range->pci_addr);
> > + if (port == (unsigned long)-1) {
> > + res->start = (resource_size_t)OF_BAD_ADDR;
> > + res->end = (resource_size_t)OF_BAD_ADDR;
> > + return;
> > + }
> >
>
> I think this conflicts with the way that pci_address_to_pio() is
> defined on powerpc, where it expects a CPU address as the input,
> not a PCI i/o address.
And you are right, maybe what I need to do is fix the weak version of the
function, as that one cannot cope with CPU addresses. But I think
the idea still stands.
Thanks for reviewing this version as well!
Liviu
>
> Arnd
>
>
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯
On Thursday 27 February 2014 14:38:32 Arnd Bergmann wrote:
> > + pr_debug("Parsing ranges property...\n");
> > + for_each_of_pci_range(&parser, &range) {
> > + /* Read next ranges element */
> > + pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
> > + range.pci_space, range.pci_addr);
> > + pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
> > + range.cpu_addr, range.size);
> > +
> > + /*
> > + * If we failed translation or got a zero-sized region
> > + * then skip this range
> > + */
> > + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
> > + continue;
> > +
> > + res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > + if (!res) {
> > + err = -ENOMEM;
> > + goto bridge_ranges_nomem;
> > + }
> > +
> > + of_pci_range_to_resource(&range, dev, res);
> > +
> > + if (resource_type(res) == IORESOURCE_IO)
> > + *io_base = range.cpu_addr;
> > +
> > + pci_add_resource_offset(resources, res,
> > + res->start - range.pci_addr);
> > + }
>
> This is not the correct resource for I/O space at all. Please talk
> to Will, I've been over this with him in detail and he probably
> understands it now. I assume you are both working in the same
> building.
>
Sorry, I initially missed part of your changes in patch 1.
I think it's actually correct now.
Arnd
On Thu, Feb 27, 2014 at 01:22:19PM +0000, Andrew Murray wrote:
> On 27 February 2014 13:06, Liviu Dudau <[email protected]> wrote:
> >
> > The ranges property for a host bridge controller in DT describes
> > the mapping between the PCI bus address and the CPU physical address.
> > The resources framework however expects that the IO resources start
> > at a pseudo "port" address 0 (zero) and have a maximum size of 64kb.
>
> Is this just in the case of ARM? (I've tried to keep up with the
> conversation, but apologies if I've misunderstood).
>
> > The conversion from pci ranges to resources failed to take that into
> > account.
> >
> > In the process move the function into drivers/of/address.c as it
> > now depends on pci_address_to_pio() code.
> >
> > Signed-off-by: Liviu Dudau <[email protected]>
> >
> > diff --git a/drivers/of/address.c b/drivers/of/address.c
> > index 1a54f1f..7cf2b16 100644
> > --- a/drivers/of/address.c
> > +++ b/drivers/of/address.c
> > @@ -719,3 +719,34 @@ void __iomem *of_iomap(struct device_node *np, int index)
> > return ioremap(res.start, resource_size(&res));
> > }
> > EXPORT_SYMBOL(of_iomap);
> > +
> > +/**
> > + * of_pci_range_to_resource - Create a resource from an of_pci_range
> > + * @range: the PCI range that describes the resource
> > + * @np: device node where the range belongs to
> > + * @res: pointer to a valid resource that will be updated to
> > + * reflect the values contained in the range.
> > + * Note that if the range is an IO range, the resource will be converted
> > + * using pci_address_to_pio() which can fail if it is called to early or
> > + * if the range cannot be matched to any host bridge IO space.
> > + */
> > +void of_pci_range_to_resource(struct of_pci_range *range,
> > + struct device_node *np, struct resource *res)
> > +{
> > + res->flags = range->flags;
> > + if (res->flags & IORESOURCE_IO) {
> > + unsigned long port;
> > + port = pci_address_to_pio(range->pci_addr);
>
> Is this likely to break existing users of of_pci_range_to_resource?
I've tested the change with a tegra2 based device (trimslice) and I've
got a functional board.
>
> For example arch/mips: IO_SPACE_LIMIT defaults to 0xffff and there is
> no overridden implementation for pci_address_to_pio, therefore this
> will set res->start to OF_BAD_ADDR whereas previously it would have
> been the CPU address for I/O (assuming the cpu_addr was previously >
> 64K).
And that is why I'm using pci_addr as the physical address passed to
pci_address_to_pio. My idea is that when converting ranges into resources,
for IORESOURCE_IO we should generate a resource that uses logical I/O
addresses, not physical CPU addresses. How we get that needs debating,
currently I've taken the shortcut of using the pci_addr to get the port
number.
>
> I have no idea if I/O previously worked for mips, but this patch seems
> to change that behavior. It may be a similar story for microblaze and
> powerpc.
Both microblaze and powerpc share an identical implementation, and it
expects that the physical address passed as a parameter fits between
io_base_phys and io_base_phys + pcibios_io_size(hose). So yes, the
correct way is to use cpu_addr and fix the weak implementation? But we
don't have enough information for the weak implementation to work, as
we don't know where the physical IO base starts (we are just about to
find that out from DT).
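In other words, those implementations have roughly this shape (paraphrased,
not the exact powerpc code):

	size = pcibios_io_size(hose);
	if (address >= hose->io_base_phys &&
	    address < hose->io_base_phys + size)
		return address - hose->io_base_phys;	/* the real code also adds
							   the hose's port base */
	return (unsigned long)-1;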
Thanks for reviewing these patches!
Liviu
>
> Andrew Murray
>
> > + if (port == (unsigned long)-1) {
> > + res->start = (resource_size_t)OF_BAD_ADDR;
> > + res->end = (resource_size_t)OF_BAD_ADDR;
> > + return;
> > + }
> > + res->start = port;
> > + } else {
> > + res->start = range->cpu_addr;
> > + }
> > + res->end = res->start + range->size - 1;
> > + res->parent = res->child = res->sibling = NULL;
> > + res->name = np->full_name;
> > +}
> > diff --git a/include/linux/of_address.h b/include/linux/of_address.h
> > index 5f6ed6b..a667762 100644
> > --- a/include/linux/of_address.h
> > +++ b/include/linux/of_address.h
> > @@ -23,17 +23,8 @@ struct of_pci_range {
> > #define for_each_of_pci_range(parser, range) \
> > for (; of_pci_range_parser_one(parser, range);)
> >
> > -static inline void of_pci_range_to_resource(struct of_pci_range *range,
> > - struct device_node *np,
> > - struct resource *res)
> > -{
> > - res->flags = range->flags;
> > - res->start = range->cpu_addr;
> > - res->end = range->cpu_addr + range->size - 1;
> > - res->parent = res->child = res->sibling = NULL;
> > - res->name = np->full_name;
> > -}
> > -
> > +extern void of_pci_range_to_resource(struct of_pci_range *range,
> > + struct device_node *np, struct resource *res);
> > /* Translate a DMA address from device space to CPU space */
> > extern u64 of_translate_dma_address(struct device_node *dev,
> > const __be32 *in_addr);
> > --
> > 1.9.0
> >
>
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯
On Thursday 27 February 2014 13:22:19 Andrew Murray wrote:
> On 27 February 2014 13:06, Liviu Dudau <[email protected]> wrote:
> >
> > The ranges property for a host bridge controller in DT describes
> > the mapping between the PCI bus address and the CPU physical address.
> > The resources framework however expects that the IO resources start
> > at a pseudo "port" address 0 (zero) and have a maximum size of 64kb.
>
> Is this just in the case of ARM? (I've tried to keep up with the
> conversation, but apologies if I've misunderstood).
We are a bit inconsistent on Linux. The limitation cited above is
indeed something we came up with on ARM to simplify the possible
cases we have to worry about.
In theory, each PCI host can have its own 4GB I/O space, but in practice
limiting it to 64KB is the most reasonable way to use it, and that
still provides plenty of room for I/O registers, since most devices
don't use any, and those that do use at most a few bytes of address space.
The limit we enforce on Linux is IO_SPACE_LIMIT, which is sometimes set
to 0xffffffff, but I think most if not all of those cases do so
in error.
> > + * of_pci_range_to_resource - Create a resource from an of_pci_range
> > + * @range: the PCI range that describes the resource
> > + * @np: device node where the range belongs to
> > + * @res: pointer to a valid resource that will be updated to
> > + * reflect the values contained in the range.
> > + * Note that if the range is an IO range, the resource will be converted
> > + * using pci_address_to_pio() which can fail if it is called to early or
> > + * if the range cannot be matched to any host bridge IO space.
> > + */
> > +void of_pci_range_to_resource(struct of_pci_range *range,
> > + struct device_node *np, struct resource *res)
> > +{
> > + res->flags = range->flags;
> > + if (res->flags & IORESOURCE_IO) {
> > + unsigned long port;
> > + port = pci_address_to_pio(range->pci_addr);
>
> Is this likely to break existing users of of_pci_range_to_resource?
>
> For example arch/mips: IO_SPACE_LIMIT defaults to 0xffff and there is
> no overridden implementation for pci_address_to_pio, therefore this
> will set res->start to OF_BAD_ADDR whereas previously it would have
> been the CPU address for I/O (assuming the cpu_addr was previously >
> 64K).
The function is used on MIPS, Microblaze and ARM at the moment.
MIPS currently gets it wrong, by calling pci_add_resource_offset
on the CPU address for IORESOURCE_IO, which is the wrong space.
Limiting to IO_SPACE_LIMIT will fix it for the first host bridge
on MIPS, and the second one will still not work, until
IO_SPACE_LIMIT is fixed.
On ARM, I believe we have a couple of drivers that make the
same mistake, and others that at the moment override the
address with range->pci_addr, so they won't change.
Microblaze does 'range.cpu_addr = range.pci_addr;' for the I/O
space window to fix it up. We should probably take a closer look there.
Arnd
On Thursday 27 February 2014 13:56:16 Liviu Dudau wrote:
> On Thu, Feb 27, 2014 at 01:22:19PM +0000, Andrew Murray wrote:
> > On 27 February 2014 13:06, Liviu Dudau <[email protected]> wrote:
> > >
> > > +/**
> > > + * of_pci_range_to_resource - Create a resource from an of_pci_range
> > > + * @range: the PCI range that describes the resource
> > > + * @np: device node where the range belongs to
> > > + * @res: pointer to a valid resource that will be updated to
> > > + * reflect the values contained in the range.
> > > + * Note that if the range is an IO range, the resource will be converted
> > > + * using pci_address_to_pio() which can fail if it is called to early or
> > > + * if the range cannot be matched to any host bridge IO space.
> > > + */
> > > +void of_pci_range_to_resource(struct of_pci_range *range,
> > > + struct device_node *np, struct resource *res)
> > > +{
> > > + res->flags = range->flags;
> > > + if (res->flags & IORESOURCE_IO) {
> > > + unsigned long port;
> > > + port = pci_address_to_pio(range->pci_addr);
> >
> > Is this likely to break existing users of of_pci_range_to_resource?
>
> I've tested the change with a tegra2 based device (trimslice) and I've
> got a functional board.
Did you have any devices using I/O ports though? They are fairly rare
these days.
> > I have no idea if I/O previously worked for mips, but this patch seems
> > to change that behavior. It may be a similar story for microblaze and
> > powerpc.
>
> Both microblaze and powerpc share an identical implementation and it is
> expecting that the physical address passed as parameter fits between
> io_base_phys and io_base_phys + pcibios_io_size(hose). So yes, the
> correct way is to use cpu_addr and fix the weak implementation? But we
> don't have enough information for the weak implementation to work, as
> we don't know where the physical IO base start (we are just about to
> find out from DT).
I think using pci_address_to_pio() at that point is just wrong
either way. Before the host is fully registered, you can't actually
look up the port number -- you are only trying to assign one at this time.
The implementation that Will wrote for ARM would work here: find the
next available virtual I/O range, call pci_ioremap_io on range->pci_addr
and then return the virtual address. Unfortunately that code is not
architecture independent at this time, and we will first have to come
up with something that can be made to work for powerpc, microblaze,
mips and arm.
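Roughly, the idea looks like this (just a sketch, not Will's actual code;
PCI_IO_VIRT_BASE and the 64K-per-bridge slot size are ARM-specific
assumptions):

	static unsigned long pci_io_next;	/* next free offset in the fixed I/O window */

	static void __iomem *map_next_io_range(phys_addr_t pci_io_phys)
	{
		unsigned long offset = pci_io_next;

		/* bounds check against the size of the virtual window omitted */
		if (pci_ioremap_io(offset, pci_io_phys))
			return NULL;

		pci_io_next += SZ_64K;
		return (void __iomem *)PCI_IO_VIRT_BASE + offset;
	}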
Arnd
On Thu, Feb 27, 2014 at 01:38:32PM +0000, Arnd Bergmann wrote:
> On Thursday 27 February 2014 13:06:42 Liviu Dudau wrote:
> > Several platforms use a rather generic version of parsing
> > the device tree to find the host bridge ranges. Move the common code
> > into the generic PCI code and use it to create a pci_host_bridge
> > structure that can be used by arch code.
> >
> > Based on early attempts by Andrew Murray to unify the code.
> > Used powerpc and microblaze PCI code as starting point.
> >
> > Signed-off-by: Liviu Dudau <[email protected]>
>
> Please add Benjamin Herrenschmidt to Cc here, I think it would be helpful
> to get his input so we can make this work on powerpc as well.
Apologies to Benjamin for omitting him; I will add him to future postings (if he finds
the direction interesting ;) ).
>
> > diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> > index 06ace62..feb8436 100644
> > --- a/drivers/pci/host-bridge.c
> > +++ b/drivers/pci/host-bridge.c
> > @@ -6,9 +6,13 @@
> > #include <linux/init.h>
> > #include <linux/pci.h>
> > #include <linux/module.h>
> > +#include <linux/of_address.h>
> > +#include <linux/of_pci.h>
> >
> > #include "pci.h"
> >
> > +static int domain_nr;
> > +
>
> For correctness, I think you want an 'atomic_t' here and use
> atomic_inc_return() to get a new value.
OK, will do.
>
> > static struct pci_bus *find_pci_root_bus(struct pci_bus *bus)
> > {
> > while (bus->parent)
> > @@ -91,3 +95,133 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
> > res->end = region->end + offset;
> > }
> > EXPORT_SYMBOL(pcibios_bus_to_resource);
> > +
> > +/**
> > + * pci_host_bridge_of_get_ranges - Parse PCI host bridge resources from DT
> > + * @dev: device node of the host bridge having the range property
> > + * @resources: list where the range of resources will be added after DT parsing
> > + * @io_base: pointer to a variable that will contain the physical address for
> > + * the start of the I/O range.
> > + *
> > + * If this function returns an error then the @resources list will be freed.
> > + *
> > + * This function will parse the "ranges" property of a PCI host bridge device
> > + * node and setup the resource mapping based on its content. It is expected
> > + * that the property conforms with the Power ePAPR document.
> > + *
> > + * Each architecture is then offered the chance of applying their own
> > + * filtering of pci_host_bridge_windows based on their own restrictions by
> > + * calling pcibios_fixup_bridge_ranges(). The filtered list of windows
> > + * can then be used when creating a pci_host_bridge structure.
> > + */
> > +static int pci_host_bridge_of_get_ranges(struct device_node *dev,
> > + struct list_head *resources, resource_size_t *io_base)
> > +{
> > + struct resource *res;
> > + struct of_pci_range range;
> > + struct of_pci_range_parser parser;
> > + int err;
> > +
> > + pr_info("PCI host bridge %s ranges:\n", dev->full_name);
> > +
> > + /* Check for ranges property */
> > + err = of_pci_range_parser_init(&parser, dev);
> > + if (err)
> > + return err;
> > +
> > + pr_debug("Parsing ranges property...\n");
> > + for_each_of_pci_range(&parser, &range) {
> > + /* Read next ranges element */
> > + pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
> > + range.pci_space, range.pci_addr);
> > + pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
> > + range.cpu_addr, range.size);
> > +
> > + /*
> > + * If we failed translation or got a zero-sized region
> > + * then skip this range
> > + */
> > + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
> > + continue;
> > +
> > + res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > + if (!res) {
> > + err = -ENOMEM;
> > + goto bridge_ranges_nomem;
> > + }
> > +
> > + of_pci_range_to_resource(&range, dev, res);
> > +
> > + if (resource_type(res) == IORESOURCE_IO)
> > + *io_base = range.cpu_addr;
> > +
> > + pci_add_resource_offset(resources, res,
> > + res->start - range.pci_addr);
> > + }
>
> This is not the correct resource for I/O space at all. Please talk
> to Will, I've been over this with him in detail and he probably
> understands it now. I assume you are both working in the same
> building.
As you noticed later, it is now correct because res->start will
hold the starting logical I/O port. I'm talking with Will on a regular basis
and I think we are both on the same page.
>
> Since this is common PCI code, you could also decide to open-code
> the pci_add_resource_offset() function. If you don't do that, I
> think you have a memory leak for the resources that you can avoid
> by allocating the resource and pci_host_bridge_window structures
> together with a single kzalloc.
This is a list of temporary resources that haven't yet been added
to pci_host_bridge_window. As noted in the comment for the function, if
the call fails it will clear the resources list.
>
> > + /* Apply architecture specific fixups for the ranges */
> > + pcibios_fixup_bridge_ranges(resources);
> > +
> > + return 0;
> > +
> > +bridge_ranges_nomem:
> > + pci_free_resource_list(resources);
> > + return err;
> > +}
> > +
> > +/**
> > + * of_create_pci_host_bridge - Create a PCI host bridge structure using
> > + * information passed in the DT.
> > + * @parent: device owning this host bridge
> > + * @ops: pci_ops associated with the host controller
> > + * @host_data: opaque data structure used by the host controller.
> > + *
> > + * returns a pointer to the newly created pci_host_bridge structure, or
> > + * NULL if the call failed.
> > + *
> > + * This function will try to obtain the host bridge domain number by
> > + * using of_alias_get_id() call with "pci-domain" as a stem. If that
> > + * fails, a local allocator will be used that will put each host bridge
> > + * in a new domain.
> > + */
> > +struct pci_host_bridge *
> > +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops, void *host_data)
> > +{
> > + int err, domain, busno;
> > + struct resource bus_range;
> > + struct pci_bus *root_bus;
> > + struct pci_host_bridge *bridge;
> > + resource_size_t io_base;
> > + LIST_HEAD(res);
> > +
> > + domain = of_alias_get_id(parent->of_node, "pci-domain");
> > + if (domain == -ENODEV)
> > + domain = domain_nr++;
> > +
> > + err = of_pci_parse_bus_range(parent->of_node, &bus_range);
> > + if (err) {
> > + dev_info(parent, "No bus range for %s, using default [0-255]\n",
> > + parent->of_node->full_name);
> > + bus_range.start = 0;
> > + bus_range.end = 255;
> > + bus_range.flags = IORESOURCE_BUS;
> > + }
> > + busno = bus_range.start;
> > + pci_add_resource(&res, &bus_range);
> > +
> > + /* now parse the rest of host bridge bus ranges */
> > + if (pci_host_bridge_of_get_ranges(parent->of_node, &res, &io_base))
> > + return NULL;
> > +
> > + /* then create the root bus */
> > + root_bus = pci_create_root_bus_in_domain(parent, domain, busno,
> > + ops, host_data, &res);
> > + if (!root_bus)
> > + return NULL;
>
> Do we have any code that checks for conflicting domain/bus numbers here?
> I guess pci_create_root_bus_in_domain() will fail if you have that.
It does fail.
>
> Since pci_create_root_bus_in_domain() is a new function that you just
> introduced, it would be helpful to change the calling conventions
> so it returns an error pointer instead of NULL upon failing.
> of_create_pci_host_bridge() can do the same, but pci_create_root_bus()
> should keep returning NULL so we don't have to change all the
> callers.
I need to check whether any useful information is being captured in the function.
Please note that pci_create_root_bus_in_domain() is actually the old
pci_create_root_bus() function with an added parameter.
>
> > + bridge = to_pci_host_bridge(root_bus->bridge);
> > + bridge->io_base = io_base;
To answer your later question, io_base is remembered here.
> > +
> > + return bridge;
> > +}
> > +EXPORT_SYMBOL_GPL(of_create_pci_host_bridge);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 1eed009..0c5e269 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -395,6 +395,7 @@ struct pci_host_bridge {
> > struct device dev;
> > struct pci_bus *bus; /* root bus */
> > int domain_nr;
> > + resource_size_t io_base; /* physical address for the start of I/O area */
> > struct list_head windows; /* pci_host_bridge_windows */
> > void (*release_fn)(struct pci_host_bridge *);
> > void *release_data;
>
> What is the io_base used for here?
It is useful for host bridge drivers as this is the only place where we store
the physical CPU address for the IO range. This is then needed when setting up the
translation registers. It is also used when calling the pci_ioremap_io function that I'm
introducing in the AArch64 patches.
The whole patch is still under legal review, but the fragment that sets up the ATR looks
like this:
	list_for_each_entry(window, &bridge->windows, list) {
		res = window->res;
		offset = window->offset;
		wsize = ilog2(resource_size(res)) - 1;

		if (resource_type(res) == IORESOURCE_MEM)
			update_atr_entry(pp->base + ATR_REG_whatever,
					 res->start,		/* CPU address */
					 res->start - offset,	/* PCI address */
					 0, wsize);
		else if (resource_type(res) == IORESOURCE_IO) {
			io_offset = pci_ioremap_io(res, bridge->io_base + offset);
			update_atr_entry(pp->base + ATR_REG_whatever,
					 bridge->io_base + res->start + offset, /* CPU address */
					 res->start,		/* PCI address */
					 0x20000, wsize);
		}
	}
Best regards,
Liviu
>
> > @@ -1786,11 +1787,23 @@ static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus)
> > return bus ? bus->dev.of_node : NULL;
> > }
> >
> > +struct pci_host_bridge *
> > +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops,
> > + void *host_data);
> > +
> > +void pcibios_fixup_bridge_ranges(struct list_head *resources);
> > #else /* CONFIG_OF */
> > static inline void pci_set_of_node(struct pci_dev *dev) { }
> > static inline void pci_release_of_node(struct pci_dev *dev) { }
> > static inline void pci_set_bus_of_node(struct pci_bus *bus) { }
> > static inline void pci_release_bus_of_node(struct pci_bus *bus) { }
> > +
> > +static inline struct pci_host_bridge *
> > +pci_host_bridge_of_init(struct device *parent, struct pci_ops *ops,
> > + void *host_data)
> > +{
> > + return NULL;
> > +}
> > #endif /* CONFIG_OF */
> >
> > #ifdef CONFIG_EEH
> >
>
>
>
--
====================
| I would like to |
| fix the world, |
| but they're not |
| giving me the |
\ source code! /
---------------
¯\_(ツ)_/¯
On Thu, Feb 27, 2014 at 02:08:44PM +0000, Arnd Bergmann wrote:
> On Thursday 27 February 2014 13:56:16 Liviu Dudau wrote:
> > On Thu, Feb 27, 2014 at 01:22:19PM +0000, Andrew Murray wrote:
> > > On 27 February 2014 13:06, Liviu Dudau <[email protected]> wrote:
> > > >
> > > > +/**
> > > > + * of_pci_range_to_resource - Create a resource from an of_pci_range
> > > > + * @range: the PCI range that describes the resource
> > > > + * @np: device node where the range belongs to
> > > > + * @res: pointer to a valid resource that will be updated to
> > > > + * reflect the values contained in the range.
> > > > + * Note that if the range is an IO range, the resource will be converted
> > > > + * using pci_address_to_pio() which can fail if it is called to early or
> > > > + * if the range cannot be matched to any host bridge IO space.
> > > > + */
> > > > +void of_pci_range_to_resource(struct of_pci_range *range,
> > > > + struct device_node *np, struct resource *res)
> > > > +{
> > > > + res->flags = range->flags;
> > > > + if (res->flags & IORESOURCE_IO) {
> > > > + unsigned long port;
> > > > + port = pci_address_to_pio(range->pci_addr);
> > >
> > > Is this likely to break existing users of of_pci_range_to_resource?
> >
> > I've tested the change with a tegra2 based device (trimslice) and I've
> > got a functional board.
>
> Did you have any devices using I/O ports though? They are fairly rare
> these days.
>
> > > I have no idea if I/O previously worked for mips, but this patch seems
> > > to change that behavior. It may be a similar story for microblaze and
> > > powerpc.
> >
> > Both microblaze and powerpc share an identical implementation, and it
> > expects that the physical address passed as a parameter fits between
> > io_base_phys and io_base_phys + pcibios_io_size(hose). So yes, the
> > correct way is to use cpu_addr and fix the weak implementation? But we
> > don't have enough information for the weak implementation to work, as
> > we don't know where the physical IO base starts (we are just about to
> > find out from DT).
>
> I think using pci_address_to_pio() at that point is just wrong
> in either way. Before the host is fully registered, you can't actually
> look up the port number -- you are only trying to assign one at this time.
>
> The implementation that Will wrote for ARM would work here: find the
> next available virtual I/O range, call pci_ioremap_io on range->pci_addr
> and then return the virtual address. Unfortunately that code is not
> architecture independent at this time, and we will first have to come
> up with something that can be made to work for powerpc, microblaze,
> mips and arm.
No, no, no... we cannot use the virtual address as the start of the resource,
as this will later be used when doing pcibios_resource_to_bus(). What
we need here is a portable way of converting from a PCI range that uses physical
CPU addresses to an IORESOURCE_IO type resource that uses logical IO
addresses. Using logical IO values works, as Bjorn's code treats them as
physical addresses.
The alternative is to introduce a thorough revision of the pci_host_bridge
calls so that they understand that IORESOURCE_IO and IORESOURCE_MEM are different
beasts. That is what caught Will and others a number of times.
Best regards,
Liviu
>
> Arnd
>
>
On Thu, Feb 27, 2014 at 02:08:44PM +0000, Arnd Bergmann wrote:
> On Thursday 27 February 2014 13:56:16 Liviu Dudau wrote:
> > On Thu, Feb 27, 2014 at 01:22:19PM +0000, Andrew Murray wrote:
> > > On 27 February 2014 13:06, Liviu Dudau <[email protected]> wrote:
> > > >
> > > > +/**
> > > > + * of_pci_range_to_resource - Create a resource from an of_pci_range
> > > > + * @range: the PCI range that describes the resource
> > > > + * @np: device node where the range belongs to
> > > > + * @res: pointer to a valid resource that will be updated to
> > > > + * reflect the values contained in the range.
> > > > + * Note that if the range is an IO range, the resource will be converted
> > > > + * using pci_address_to_pio() which can fail if it is called too early or
> > > > + * if the range cannot be matched to any host bridge IO space.
> > > > + */
> > > > +void of_pci_range_to_resource(struct of_pci_range *range,
> > > > + struct device_node *np, struct resource *res)
> > > > +{
> > > > + res->flags = range->flags;
> > > > + if (res->flags & IORESOURCE_IO) {
> > > > + unsigned long port;
> > > > + port = pci_address_to_pio(range->pci_addr);
> > >
> > > Is this likely to break existing users of of_pci_range_to_resource?
> >
> > I've tested the change with a tegra2 based device (trimslice) and I've
> > got a functional board.
>
> Did you have any devices using I/O ports though? They are fairly rare
> these days.
# cat /proc/ioports
00001000-0000ffff : PCI0 I/O
00001000-00001fff : PCI Bus 0000:01
00001000-000010ff : 0000:01:00.0
00001000-000010ff : r8169
Looks like it does.
Liviu
>
> > > I have no idea if I/O previously worked for mips, but this patch seems
> > > to change that behavior. It may be a similar story for microblaze and
> > > powerpc.
> >
> > Both microblaze and powerpc share an identical implementation, and it
> > expects that the physical address passed as a parameter fits between
> > io_base_phys and io_base_phys + pcibios_io_size(hose). So yes, the
> > correct way is to use cpu_addr and fix the weak implementation? But we
> > don't have enough information for the weak implementation to work, as
> > we don't know where the physical IO base starts (we are just about to
> > find out from DT).
>
> I think using pci_address_to_pio() at that point is just wrong
> in either way. Before the host is fully registered, you can't actually
> look up the port number -- you are only trying to assign one at this time.
>
> The implementation that Will wrote for ARM would work here: find the
> next available virtual I/O range, call pci_ioremap_io on range->pci_addr
> and then return the virtual address. Unfortunately that code is not
> architecture independent at this time, and we will first have to come
> up with something that can be made to work for powerpc, microblaze,
> mips and arm.
>
> Arnd
>
>
On Thursday 27 February 2014 14:13:22 Liviu Dudau wrote:
>
> It is useful for host bridge drivers as this is the only place where we store
> the physical CPU address for the IO range. This is then needed when setting up the
> translation registers. Also used when calling the pci_ioremap_io function that I'm
> introducing in the AArch64 patches.
I don't understand what translation windows you are talking about. Is this
about how the PCI spaces are mapped into the CPU address space? If so, I
would strongly recommend having this handled by the boot loader before
calling into the kernel. For ARM32, we have a lot of embedded systems
that require the PCI host driver to set up those windows, but actually
it would be much better to just have the firmware tell us what the setup
is and then use that.
> Whole patch is still under legal review, but the fragment for setting up the ATR looks
> like this:
>
> list_for_each_entry(window, &bridge->windows, list) {
> res = window->res;
> offset = window->offset;
> wsize = ilog2(resource_size(res)) - 1;
>
> if (resource_type(res) == IORESOURCE_MEM)
> update_atr_entry(pp->base + ATR_REG_whatever,
> res->start, /* CPU address */
> res->start - offset, /* PCI address */
> 0, wsize);
> else if (resource_type(res) == IORESOURCE_IO) {
> io_offset = pci_ioremap_io(res, bridge->io_base + offset);
> update_atr_entry(pp->base + ATR_REG_whatever,
> bridge->io_base + res->start + offset, /* CPU address */
> res->start, /* PCI address */
> 0x20000, wsize);
> }
> }
Hmm, I again don't see how 'bridge->io_base + res->start + offset' is
the correct address here. What is it you are trying to pass into
update_atr_entry()?
Arnd
On Thursday 27 February 2014 14:21:03 Liviu Dudau wrote:
> On Thu, Feb 27, 2014 at 02:08:44PM +0000, Arnd Bergmann wrote:
> > I think using pci_address_to_pio() at that point is just wrong
> > in either way. Before the host is fully registered, you can't actually
> > look up the port number -- you are only trying to assign one at this time.
> >
> > The implementation that Will wrote for ARM would work here: find the
> > next available virtual I/O range, call pci_ioremap_io on range->pci_addr
> > and then return the virtual address. Unfortunately that code is not
> > architecture independent at this time, and we will first have to come
> > up with something that can be made to work for powerpc, microblaze,
> > mips and arm.
>
> No, no, no... we cannot use the virtual address as the start of the resource
> as this will later be used when doing pcibios_resource_to_bus().
My mistake: I meant to say return the offset into the virtual window, i.e.
what IORESOURCE_IO space is about.
> What we need here is a portable way of converting from a PCI range that uses physical
> CPU addresses to an IORESOURCE_IO type resource that uses logical IO
> addresses. Using logical IO values works, as Bjorn's code treats them as
> physical addresses.
Right.
Arnd
On Thu, Feb 27, 2014 at 03:58:41PM +0000, Arnd Bergmann wrote:
> On Thursday 27 February 2014 14:13:22 Liviu Dudau wrote:
> >
> > It is useful for host bridge drivers as this is the only place where we store
> > the physical CPU address for the IO range. This is then needed when setting up the
> > translation registers. Also used when calling the pci_ioremap_io function that I'm
> > introducing in the AArch64 patches.
>
> I don't understand what translation windows you are talking about. Is this
> about how the PCI spaces are mapped into the CPU address space? If so, I
> would strongly recommend having this handled by the boot loader before
> calling into the kernel. For ARM32, we have a lot of embedded systems
> that require the PCI host driver to set up those windows, but actually
> it would be much better to just have the firmware tell us what the setup
> is and then use that.
The AXI to PCI bridge that I'm using has a set of registers for doing address translation.
When it sees an AXI transaction that matches the programmed translation window, it will
convert it into a PCI write using the PCI address base written in that translation window.
In other words, you basically program the DT range into those address translation registers
and the bridge does the AXI to PCI conversion for you.
>
> > Whole patch is still under legal review, but the fragment for setting up the ATR looks
> > like this:
> >
> > list_for_each_entry(window, &bridge->windows, list) {
> > res = window->res;
> > offset = window->offset;
> > wsize = ilog2(resource_size(res)) - 1;
> >
> > if (resource_type(res) == IORESOURCE_MEM)
> > update_atr_entry(pp->base + ATR_REG_whatever,
> > res->start, /* CPU address */
> > res->start - offset, /* PCI address */
> > 0, wsize);
> > else if (resource_type(res) == IORESOURCE_IO) {
> > io_offset = pci_ioremap_io(res, bridge->io_base + offset);
> > update_atr_entry(pp->base + ATR_REG_whatever,
> > bridge->io_base + res->start + offset, /* CPU address */
> > res->start, /* PCI address */
> > 0x20000, wsize);
> > }
> > }
>
> Hmm, I again don't see how 'bridge->io_base + res->start + offset' is
> the correct address here. What is it you are trying to pass into
> update_atr_entry()?
The physical CPU address where the IO range starts.
For IO:
res->start = beginning of logical I/O port block
offset = window->offset = res->start - range.pci_addr
bridge->io_base = range.cpu_addr
bridge->io_base + res->start + offset = range.cpu_addr + res->start + res->start - range.pci_addr
Hmm, writing code without coffee .... :)
Lucky for me res->start = range.pci_addr = 0, but I can see now where I've got it wrong. Thanks!
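For what it's worth, here is a minimal sketch of how the IO branch of that fragment could
look once the mixup above is untangled; update_atr_entry(), pci_ioremap_io() and
ATR_REG_whatever are the same hypothetical names used in the fragment earlier, so treat
this as an illustration rather than the real driver code:

	else if (resource_type(res) == IORESOURCE_IO) {
		/* bridge->io_base is range.cpu_addr, the physical CPU
		 * address where the I/O window starts */
		io_offset = pci_ioremap_io(res, bridge->io_base);
		update_atr_entry(pp->base + ATR_REG_whatever,
				bridge->io_base,	/* CPU address */
				res->start - offset,	/* PCI address (range.pci_addr) */
				0x20000, wsize);
	}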
Liviu
>
> Arnd
>
>
On Thursday 27 February 2014 16:20:03 Liviu Dudau wrote:
> On Thu, Feb 27, 2014 at 03:58:41PM +0000, Arnd Bergmann wrote:
> > On Thursday 27 February 2014 14:13:22 Liviu Dudau wrote:
> > >
> > > It is useful for host bridge drivers as this is the only place where we store
> > > the physical CPU address for the IO range. This is then needed when setting up the
> > > translation registers. Also used when calling the pci_ioremap_io function that I'm
> > > introducing in the AArch64 patches.
> >
> > I don't understand what translation windows you are talking about. Is this
> > about how the PCI spaces are mapped into the CPU address space? If so, I
> > would strongly recommend having this handled by the boot loader before
> > calling into the kernel. For ARM32, we have a lot of embedded systems
> > that require the PCI host driver to set up those windows, but actually
> > it would be much better to just have the firmware tell us what the setup
> > is and then use that.
>
> The AXI to PCI bridge that I'm using has a set of registers for doing address translation.
> When it sees an AXI transaction that matches the programmed translation window, it will
> convert it into a PCI write using the PCI address base written in that translation window.
> In other words, you basically program the DT range into those address translation registers
> and the bridge does the AXI to PCI conversion for you.
Right. That should definitely be done in the boot loader
before Linux is started.
Arnd
On Thu, Feb 27, 2014 at 04:45:24PM +0000, Arnd Bergmann wrote:
> On Thursday 27 February 2014 16:20:03 Liviu Dudau wrote:
> > On Thu, Feb 27, 2014 at 03:58:41PM +0000, Arnd Bergmann wrote:
> > > On Thursday 27 February 2014 14:13:22 Liviu Dudau wrote:
> > > >
> > > > It is useful for host bridge drivers as this is the only place where we store
> > > > the physical CPU address for the IO range. This is then needed when setting up the
> > > > translation registers. Also used when calling the pci_ioremap_io function that I'm
> > > > introducing in the AArch64 patches.
> > >
> > > I don't understand what translation windows you are talking about. Is this
> > > about how the PCI spaces are mapped into the CPU address space? If so, I
> > > would strongly recommend having this handled by the boot loader before
> > > calling into the kernel. For ARM32, we have a lot of embedded systems
> > > that require the PCI host driver to set up those windows, but actually
> > > it would be much better to just have the firmware tell us what the setup
> > > is and then use that.
> >
> > The AXI to PCI bridge that I'm using has a set of registers for doing address translation.
> > When it sees an AXI transaction that matches the programmed translation window, it will
> > convert it into a PCI write using the PCI address base written in that translation window.
> > In other words, you basically program the DT range into those address translation registers
> > and the bridge does the AXI to PCI conversion for you.
>
> Right. That should definitely be done in the boot loader
> before Linux is started.
There are examples of other host controllers doing similar things. pci-tegra.c does it in
tegra_pcie_setup_translations(), pcie-designware.c has dw_pcie_prog_viewport_*() functions.
I'm afraid for the moment the bootloader on my platform is not going to be of too much use on
this, so it will have to be initially in Linux.
Best regards,
Liviu
>
>
> Arnd
>
>
On Thu, Feb 27, 2014 at 01:06:39PM +0000, Liviu Dudau wrote:
> + if (res->flags & IORESOURCE_IO) {
> + unsigned long port;
> + port = pci_address_to_pio(range->pci_addr);
This looks very suspicious, pci_addr is not unique across all domains,
so there is no way to convert from a pci_addr to the virtual IO
address without knowing the domain number as well.
I would like to see it be:
port = pci_address_to_pio(range->cpu_addr);
cpu_addr is unique across all domains.
Looking at the microblaze and PPC versions I think the above version
is actually correct (assuming io_base_phys is the CPU address of the
IO window)
Jason
On Thursday 27 February 2014, Liviu Dudau wrote:
> There are examples of other host controllers doing similar things. pci-tegra.c does it in
> tegra_pcie_setup_translations(), pcie-designware.c has dw_pcie_prog_viewport_*() functions.
>
> I'm afraid for the moment the bootloader on my platform is not going to be of too much use on
> this, so it will have to be initially in Linux.
Ah, too bad. I know that we always do it wrong on arm32, I was hoping for a fresh
start on arm64 to fix this particular issue.
Arnd
On Thu, Feb 27, 2014 at 11:19:51AM -0700, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2014 at 01:06:39PM +0000, Liviu Dudau wrote:
> > + if (res->flags & IORESOURCE_IO) {
> > + unsigned long port;
> > + port = pci_address_to_pio(range->pci_addr);
>
> This looks very suspicious, pci_addr is not unique across all domains,
> so there is no way to convert from a pci_addr to the virtual IO
> address without knowing the domain number as well.
>
> I would like to see it be:
> port = pci_address_to_pio(range->cpu_addr);
>
> cpu_addr is unique across all domains.
Jason,
First thanks for reviewing the updated series.
We agreed early on that using range->cpu_addr is indeed the correct approach,
and that my taking the lazy shortcut of using range->pci_addr because of the broken
default implementation of pci_address_to_pio() is wrong. I will fix this for v3.
The outstanding issue is how to fix pci_address_to_pio(), as it will not
work for range->cpu_addr > IO_SPACE_LIMIT (16MB in my case).
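To make that concrete, here is a minimal sketch of what the v3 conversion could look like
once it switches to the CPU address; this is only an assumption about the final shape,
reusing the names from the quoted patch, and it still relies on the architecture providing
a pci_address_to_pio() that understands its physical I/O window:

	void of_pci_range_to_resource(struct of_pci_range *range,
				      struct device_node *np, struct resource *res)
	{
		res->flags = range->flags;
		if (res->flags & IORESOURCE_IO) {
			/* map the CPU physical address of the window to a
			 * logical port number */
			res->start = pci_address_to_pio(range->cpu_addr);
		} else {
			res->start = range->cpu_addr;
		}
		res->end = res->start + range->size - 1;
		res->parent = res->sibling = res->child = NULL;
		res->name = np->full_name;
	}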
Best regards,
Liviu
>
> Looking at the microblaze and PPC versions I think the above version
> is actually correct (assuming io_base_phys is the CPU address of the
> IO window)
>
> Jason
On Thu, Feb 27, 2014 at 07:12:59PM +0000, Liviu Dudau wrote:
> The outstanding issue is how to fix pci_address_to_pio(), as it will not
> work for range->cpu_addr > IO_SPACE_LIMIT (16MB in my case).
The default actually looks fine to me, it is the correct behavior for
systems that actually have a dedicated IO space (like x86) where the
'CPU' value for IO is the exact value used in the IO accessor
instructions. In this case the IO_SPACE_LIMIT test is appropriate.
It also looks correct for architectures that use the CPU MMIO address
as the IO address directly (where IO_SPACE_LIMIT would be 4G)
Architectures that use the virtual IO window technique will always
require a custom pci_address_to_pio implementation.
BTW, something that occurred to me after reading the patches:
For ARM64 you might want to think about doing away with the fixed
virtual IO window like we see in ARM32. Just use the CPU MMIO address
directly within the kernel, and implement an ioport_map to set up the MM
on demand.
I think the legacy reasons for having all those layers of translation
are probably not applicable to ARM64, and it is much simpler without
the extra translation step....
Arnd, what do you think?
Jason
On Thursday 27 February 2014 12:36:27 Jason Gunthorpe wrote:
> On Thu, Feb 27, 2014 at 07:12:59PM +0000, Liviu Dudau wrote:
> > The outstanding issue is how to fix pci_address_to_pio(), as it will not
> > work for range->cpu_addr > IO_SPACE_LIMIT (16MB in my case).
>
> The default actually looks fine to me, it is the correct behavior for
> systems that actually have a dedicated IO space (like x86) where the
> 'CPU' value for IO is the exact value used in the IO accessor
> instructions. In this case the IO_SPACE_LIMIT test is appropriate.
Right.
> It also looks correct for architectures that use the CPU MMIO address
> as the IO address directly (where IO_SPACE_LIMIT would be 4G)
Are you aware of any that still do? I thought we had stopped doing
that.
> Architectures that use the virtual IO window technique will always
> require a custom pci_address_to_pio implementation.
Hmm, at the moment we only call it from of_address_to_resource(),
which in turn does not get called on PCI devices, and does not
call pci_address_to_pio for 'simple' platform devices. The only
case I can think of where it actually matters is when we have
ISA devices in DT that use an I/O port address in the reg property,
and that case hopefully won't happen on ARM32 or ARM64.
> BTW, something that occurred to me after reading the patches:
>
> For ARM64 you might want to think about doing away with the fixed
> virtual IO window like we see in ARM32. Just use the CPU MMIO address
> directly within the kernel, and implement a ioport_map to setup the MM
> on demand.
>
> I think the legacy reasons for having all those layers of translation
> are probably not applicable to ARM64, and it is much simpler without
> the extra translation step....
>
> Arnd, what do you think?
Either I don't like it or I misunderstand you ;-)
Most PCI drivers normally don't call ioport_map or pci_iomap, so
we can't just do it there. If you are thinking of calling ioport_map
for every PCI device that has an I/O BAR and storing the virtual
address in the pci_dev resource, I don't see what that gains us
in terms of complexity, and it will also break /dev/port.
Arnd
On Thu, Feb 27, 2014 at 08:48:08PM +0100, Arnd Bergmann wrote:
> > It also looks correct for architectures that use the CPU MMIO address
> > as the IO address directly (where IO_SPACE_LIMIT would be 4G)
>
> Are you aware of any that still do? I thought we had stopped doing
> that.
I thought ia64 used to, but it has been a long time since I've touched
one...
> > Architectures that use the virtual IO window technique will always
> > require a custom pci_address_to_pio implementation.
>
> Hmm, at the moment we only call it from of_address_to_resource(),
> which in turn does not get called on PCI devices, and does not
> call pci_address_to_pio for 'simple' platform devices. The only
> case I can think of where it actually matters is when we have
> ISA devices in DT that use an I/O port address in the reg property,
> and that case hopefully won't happen on ARM32 or ARM64.
Sure, I meant that after Liviu's patch it will become required, since he is
cleverly using it to figure out what I/O mapping the bridge driver
set up before calling the helper.
> > I think the legacy reasons for having all those layers of translation
> > are probably not applicable to ARM64, and it is much simpler without
> > the extra translation step....
> >
> > Arnd, what do you think?
>
> Either I don't like it or I misunderstand you ;-)
>
> Most PCI drivers normally don't call ioport_map or pci_iomap, so
> we can't just do it there. If you are thinking of calling ioport_map
Okay, that was one of the 'legacy reasons'. Certainly lots of drivers
do call pci_iomap, but if you think legacy drivers that don't are
important to ARM64 then it makes sense to use the virtual IO window.
> for every PCI device that has an I/O BAR and storing the virtual
> address in the pci_dev resource, I don't see what that gains us
Mainly we get to drop the fancy dynamic allocation stuff for the fixed
virtual window, and it gives the option to have a 1:1 relationship
between CPU addresses and PCI BARs.
> in terms of complexity, and it will also break /dev/port.
Yes, /dev/port needs updating, it would need to iomap (arguably it
probably should be doing that already anyhow), and the hardwired limit
of 65536 needs to be replaced with the arch's IO limit, but those do
not seem to be fundamental problems with the UAPI??
Jason
On Thursday 27 February 2014 13:07:29 Jason Gunthorpe wrote:
> On Thu, Feb 27, 2014 at 08:48:08PM +0100, Arnd Bergmann wrote:
> > > It also looks correct for architectures that use the CPU MMIO address
> > > as the IO address directly (where IO_SPACE_LIMIT would be 4G)
> >
> > Are you aware of any that still do? I thought we had stopped doing
> > that.
>
> I thought ia64 used to, but it has been a long time since I've touched
> one...
They have a different way of doing it now, no idea how it looked in
the past:
#define IO_SPACE_LIMIT 0xffffffffffffffffUL
#define MAX_IO_SPACES_BITS 8
#define MAX_IO_SPACES (1UL << MAX_IO_SPACES_BITS)
#define IO_SPACE_BITS 24
#define IO_SPACE_SIZE (1UL << IO_SPACE_BITS)
#define IO_SPACE_NR(port) ((port) >> IO_SPACE_BITS)
#define IO_SPACE_BASE(space) ((space) << IO_SPACE_BITS)
#define IO_SPACE_PORT(port) ((port) & (IO_SPACE_SIZE - 1))
#define IO_SPACE_SPARSE_ENCODING(p) ((((p) >> 2) << 12) | ((p) & 0xfff))
So their port number is a logical token that contains the I/O space number
and a 16MB offset.
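A tiny, purely illustrative example of decoding such a token with the macros above
(the values are made up, not taken from a real machine):

	unsigned long port  = IO_SPACE_BASE(2) | 0x1f0;	/* token 0x20001f0 */
	unsigned long space = IO_SPACE_NR(port);	/* 2 */
	unsigned long off   = IO_SPACE_PORT(port);	/* 0x1f0 */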
Apparently sparc64 uses physical memory addressing for I/O space, the
same way they do for memory space, and they just set IO_SPACE_LIMIT to
0xffffffffffffffffUL.
> > > Architectures that use the virtual IO window technique will always
> > > require a custom pci_address_to_pio implementation.
> >
> > Hmm, at the moment we only call it from of_address_to_resource(),
> > which in turn does not get called on PCI devices, and does not
> > call pci_address_to_pio for 'simple' platform devices. The only
> > case I can think of where it actually matters is when we have
> > ISA devices in DT that use an I/O port address in the reg property,
> > and that case hopefully won't happen on ARM32 or ARM64.
>
> > Sure, I meant that after Liviu's patch it will become required, since he is
> > cleverly using it to figure out what I/O mapping the bridge driver
> > set up before calling the helper.
Ok. I was arguing more that we should not add this dependency.
> > > I think the legacy reasons for having all those layers of translation
> > > are probably not applicable to ARM64, and it is much simpler without
> > > the extra translation step....
> > >
> > > Arnd, what do you think?
> >
> > Either I don't like it or I misunderstand you ;-)
> >
> > Most PCI drivers normally don't call ioport_map or pci_iomap, so
> > we can't just do it there. If you are thinking of calling ioport_map
>
> Okay, that was one of the 'legacy reasons'. Certainly lots of drivers
> do call pci_iomap, but if you think legacy drivers that don't are
> important to ARM64 then it makes sense to use the virtual IO window.
I think all uses of I/O space are legacy, but I don't think that
drivers doing inb/outb are more obsolete than those doing pci_iomap.
It's got more to do with the subsystem requirements, e.g. libata
requires the use of pci_iomap.
> > for every PCI device that has an I/O BAR and storing the virtual
> > address in the pci_dev resource, I don't see what that gains us
>
> Mainly we get to drop the fancy dynamic allocation stuff for the fixed
> virtual window, and it gives the option to have a 1:1 relationship
> between CPU addresses and PCI BARs.
I don't think the allocation is much of a problem, as long as we
can localize it in one function that is shared by everyone.
The problems I saw were all about explaining to people how it
works, but they really shouldn't have to know.
Arnd
On Thu, 2014-02-27 at 14:38 +0100, Arnd Bergmann wrote:
> On Thursday 27 February 2014 13:06:42 Liviu Dudau wrote:
> > Several platforms use a rather generic version of parsing
> > the device tree to find the host bridge ranges. Move the common code
> > into the generic PCI code and use it to create a pci_host_bridge
> > structure that can be used by arch code.
> >
> > Based on early attempts by Andrew Murray to unify the code.
> > Used powerpc and microblaze PCI code as starting point.
So that is only going to work for archs that don't have any special
twist. For example, it won't work for powerpc, which is why Andrew's
original approach didn't fly.
The range walk helpers do help though. I need to review in more detail
and test Andrew's powerpc patch here, and will merge it.
> > Signed-off-by: Liviu Dudau <[email protected]>
>
> Please add Benjamin Herrenschmidt to Cc here, I think it would be helpful
> to get his input so we can make this work on powerpc as well.
Tricky... We would need hooks which would turn the whole thing into a
pile of spaghetti. I think we should stick to using the range helpers
(Andrew's latest patch), which makes the powerpc code a lot smaller,
and leave it at that.
> > diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> > index 06ace62..feb8436 100644
> > --- a/drivers/pci/host-bridge.c
> > +++ b/drivers/pci/host-bridge.c
> > @@ -6,9 +6,13 @@
> > #include <linux/init.h>
> > #include <linux/pci.h>
> > #include <linux/module.h>
> > +#include <linux/of_address.h>
> > +#include <linux/of_pci.h>
> >
> > #include "pci.h"
> >
> > +static int domain_nr;
> > +
>
> For correctness, I think you want an 'atomic_t' here and use
> atomic_inc_return() to get a new value.
>
> > static struct pci_bus *find_pci_root_bus(struct pci_bus *bus)
> > {
> > while (bus->parent)
> > @@ -91,3 +95,133 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
> > res->end = region->end + offset;
> > }
> > EXPORT_SYMBOL(pcibios_bus_to_resource);
> > +
> > +/**
> > + * pci_host_bridge_of_get_ranges - Parse PCI host bridge resources from DT
> > + * @dev: device node of the host bridge having the range property
> > + * @resources: list where the range of resources will be added after DT parsing
> > + * @io_base: pointer to a variable that will contain the physical address for
> > + * the start of the I/O range.
> > + *
> > + * If this function returns an error then the @resources list will be freed.
> > + *
> > + * This function will parse the "ranges" property of a PCI host bridge device
> > + * node and setup the resource mapping based on its content. It is expected
> > + * that the property conforms with the Power ePAPR document.
> > + *
> > + * Each architecture is then offered the chance of applying their own
> > + * filtering of pci_host_bridge_windows based on their own restrictions by
> > + * calling pcibios_fixup_bridge_ranges(). The filtered list of windows
> > + * can then be used when creating a pci_host_bridge structure.
> > + */
> > +static int pci_host_bridge_of_get_ranges(struct device_node *dev,
> > + struct list_head *resources, resource_size_t *io_base)
> > +{
> > + struct resource *res;
> > + struct of_pci_range range;
> > + struct of_pci_range_parser parser;
> > + int err;
> > +
> > + pr_info("PCI host bridge %s ranges:\n", dev->full_name);
> > +
> > + /* Check for ranges property */
> > + err = of_pci_range_parser_init(&parser, dev);
> > + if (err)
> > + return err;
> > +
> > + pr_debug("Parsing ranges property...\n");
> > + for_each_of_pci_range(&parser, &range) {
> > + /* Read next ranges element */
> > + pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
> > + range.pci_space, range.pci_addr);
> > + pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
> > + range.cpu_addr, range.size);
> > +
> > + /*
> > + * If we failed translation or got a zero-sized region
> > + * then skip this range
> > + */
> > + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
> > + continue;
Shouldn't that test move into the parsing helper ?
> > + res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > + if (!res) {
> > + err = -ENOMEM;
> > + goto bridge_ranges_nomem;
> > + }
> > +
> > + of_pci_range_to_resource(&range, dev, res);
> > +
> > + if (resource_type(res) == IORESOURCE_IO)
> > + *io_base = range.cpu_addr;
You don't care about the size of the IO space ?
> > + pci_add_resource_offset(resources, res,
> > + res->start - range.pci_addr);
> > + }
>
> This is not the correct resource for I/O space at all. Please talk
> to Will, I've been over this with him in detail and he probably
> understands it now. I assume you are both working in the same
> building.
Yes, the IO offsets work differently on powerpc as well
> Since this is common PCI code, you could also decide to open-code
> the pci_add_resource_offset() function. If you don't do that, I
> think you have a memory leak for the resources that you can avoid
> by allocating the resource and pci_host_bridge_window structures
> together with a single kzalloc.
>
> > + /* Apply architecture specific fixups for the ranges */
> > + pcibios_fixup_bridge_ranges(resources);
> > +
> > + return 0;
> > +
> > +bridge_ranges_nomem:
> > + pci_free_resource_list(resources);
> > + return err;
> > +}
> > +
> > +/**
> > + * of_create_pci_host_bridge - Create a PCI host bridge structure using
> > + * information passed in the DT.
> > + * @parent: device owning this host bridge
> > + * @ops: pci_ops associated with the host controller
> > + * @host_data: opaque data structure used by the host controller.
> > + *
> > + * returns a pointer to the newly created pci_host_bridge structure, or
> > + * NULL if the call failed.
> > + *
> > + * This function will try to obtain the host bridge domain number by
> > + * using of_alias_get_id() call with "pci-domain" as a stem. If that
> > + * fails, a local allocator will be used that will put each host bridge
> > + * in a new domain.
> > + */
> > +struct pci_host_bridge *
> > +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops, void *host_data)
> > +{
> > + int err, domain, busno;
> > + struct resource bus_range;
> > + struct pci_bus *root_bus;
> > + struct pci_host_bridge *bridge;
> > + resource_size_t io_base;
> > + LIST_HEAD(res);
> > +
> > + domain = of_alias_get_id(parent->of_node, "pci-domain");
> > + if (domain == -ENODEV)
> > + domain = domain_nr++;
> >
We probably want some uniqueness testing here.
> > + err = of_pci_parse_bus_range(parent->of_node, &bus_range);
> > + if (err) {
> > + dev_info(parent, "No bus range for %s, using default [0-255]\n",
> > + parent->of_node->full_name);
> > + bus_range.start = 0;
> > + bus_range.end = 255;
> > + bus_range.flags = IORESOURCE_BUS;
> > + }
> > + busno = bus_range.start;
> > + pci_add_resource(&res, &bus_range);
> > +
> > + /* now parse the rest of host bridge bus ranges */
> > + if (pci_host_bridge_of_get_ranges(parent->of_node, &res, &io_base))
> > + return NULL;
> > +
> > + /* then create the root bus */
> > + root_bus = pci_create_root_bus_in_domain(parent, domain, busno,
> > + ops, host_data, &res);
> > + if (!root_bus)
> > + return NULL;
>
> Do we have any code that checks for conflicting domain/bus numbers here?
> I guess pci_create_root_bus_in_domain() will fail if you have that.
>
> Since pci_create_root_bus_in_domain() is a new function that you just
> introduced, it would be helpful to change the calling conventions
> so it returns an error pointer instead of NULL upon failing.
> of_create_pci_host_bridge() can do the same, but pci_create_root_bus()
> should keep returning NULL so we don't have to change all the
> callers.
>
> > + bridge = to_pci_host_bridge(root_bus->bridge);
> > + bridge->io_base = io_base;
> > +
> > + return bridge;
> > +}
> > +EXPORT_SYMBOL_GPL(of_create_pci_host_bridge);
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 1eed009..0c5e269 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -395,6 +395,7 @@ struct pci_host_bridge {
> > struct device dev;
> > struct pci_bus *bus; /* root bus */
> > int domain_nr;
> > + resource_size_t io_base; /* physical address for the start of I/O area */
> > struct list_head windows; /* pci_host_bridge_windows */
> > void (*release_fn)(struct pci_host_bridge *);
> > void *release_data;
>
> What is the io_base used for here?
>
> > @@ -1786,11 +1787,23 @@ static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus)
> > return bus ? bus->dev.of_node : NULL;
> > }
> >
> > +struct pci_host_bridge *
> > +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops,
> > + void *host_data);
> > +
> > +void pcibios_fixup_bridge_ranges(struct list_head *resources);
> > #else /* CONFIG_OF */
> > static inline void pci_set_of_node(struct pci_dev *dev) { }
> > static inline void pci_release_of_node(struct pci_dev *dev) { }
> > static inline void pci_set_bus_of_node(struct pci_bus *bus) { }
> > static inline void pci_release_bus_of_node(struct pci_bus *bus) { }
> > +
> > +static inline struct pci_host_bridge *
> > +pci_host_bridge_of_init(struct device *parent, struct pci_ops *ops,
> > + void *host_data)
> > +{
> > + return NULL;
> > +}
> > #endif /* CONFIG_OF */
> >
> > #ifdef CONFIG_EEH
> >
>
On Friday 28 February 2014 10:32:04 Benjamin Herrenschmidt wrote:
> On Thu, 2014-02-27 at 14:38 +0100, Arnd Bergmann wrote:
> > On Thursday 27 February 2014 13:06:42 Liviu Dudau wrote:
> > > Signed-off-by: Liviu Dudau <[email protected]>
> >
> > Please add Benjamin Herrenschmidt to Cc here, I think it would be helpful
> > to get his input so we can make this work on powerpc as well.
>
> Tricky... We would need hooks which would turn the whole thing into a
> pile of spaghetti. I think we should stick to using the range helpers
> (Andrew latest patch), which makes the powerpc code a lot smaller,
> and leave it at that.
We can certainly do both: small helpers that let you shrink the powerpc
code, and a generic implementation that can be shared by some of the
other architectures that you don't use. The PCI core already uses
a number of 'weak' functions here, and we can expand on that.
> > + res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > > + if (!res) {
> > > + err = -ENOMEM;
> > > + goto bridge_ranges_nomem;
> > > + }
> > > +
> > > + of_pci_range_to_resource(&range, dev, res);
> > > +
> > > + if (resource_type(res) == IORESOURCE_IO)
> > > + *io_base = range.cpu_addr;
>
> You don't care about the size of the IO space ?
We probably should. The ARM code currently assumes that each I/O
space is 64KB, but for a generic implementation we probably want
to handle both smaller and larger windows. I suggested not supporting
more than 1MB though, which is the maximum that I can see a reason
for, i.e. the pci-mvebu fake host bridge that has to combine
multiple per-port HW I/O spaces into one logical space.
> > > + pci_add_resource_offset(resources, res,
> > > + res->start - range.pci_addr);
> > > + }
> >
> > This is not the correct resource for I/O space at all. Please talk
> > to Will, I've been over this with him in detail and he probably
> > understands it now. I assume you are both working in the same
> > building.
>
> Yes, the IO offsets work differently on powerpc as well
As I noticed later, the first patch in the series actually changes
the range_to_resource parser to return the logical start here, which
would make the offset correct even on powerpc, but unfortunately
I think that cannot work.
> > > +struct pci_host_bridge *
> > > +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops, void *host_data)
> > > +{
> > > + int err, domain, busno;
> > > + struct resource bus_range;
> > > + struct pci_bus *root_bus;
> > > + struct pci_host_bridge *bridge;
> > > + resource_size_t io_base;
> > > + LIST_HEAD(res);
> > > +
> > > + domain = of_alias_get_id(parent->of_node, "pci-domain");
> > > + if (domain == -ENODEV)
> > > + domain = domain_nr++;
> > >
> We probably want some uniqueness testing here.
I thought this at first too, but as Liviu mentioned, this does
get caught later when trying to create the root bus with
a conflicting number.
What the above code cannot do is to put multiple host bridges
into the same domain, using distinct bus ranges. This is an
intentional simplification over what some architectures currently
do, and we could not see a reason why you would still need
to put multiple host bridges into one domain in 2014.
x86 with ACPI does it, but they won't call of_create_pci_host_bridge.
Arnd
On Thu, Feb 27, 2014 at 11:32:04PM +0000, Benjamin Herrenschmidt wrote:
> On Thu, 2014-02-27 at 14:38 +0100, Arnd Bergmann wrote:
> > On Thursday 27 February 2014 13:06:42 Liviu Dudau wrote:
> > > Several platforms use a rather generic version of parsing
> > > the device tree to find the host bridge ranges. Move the common code
> > > into the generic PCI code and use it to create a pci_host_bridge
> > > structure that can be used by arch code.
> > >
> > > Based on early attempts by Andrew Murray to unify the code.
> > > Used powerpc and microblaze PCI code as starting point.
>
> So that is only going to work for archs that don't have any special
> twist. For example, it won't work for powerpc, which is why Andrew's
> original approach didn't fly.
>
> The range walk helpers do help though. I need to review in more detail
> and test Andrew's powerpc patch here, and will merge it.
>
> > > Signed-off-by: Liviu Dudau <[email protected]>
> >
> > Please add Benjamin Herrenschmidt to Cc here, I think it would be helpful
> > to get his input so we can make this work on powerpc as well.
>
> Tricky... We would need hooks which would turn the whole thing into a
> pile of spaghetti. I think we should stick to using the range helpers
> (Andrew's latest patch), which makes the powerpc code a lot smaller,
> and leave it at that.
Benjamin,
Thanks for looking at the patches.
I did look at powerpc and microblaze. I've actually started modifying microblaze
as it seems to have fewer corner cases and that's how I came up with the current
design. Unfortunately my QEMU environment crashes when trying to feed it a custom
dtb file that has a pci host bridge node.
I know that in the range parsing powerpc does a lot more work and that is why
I've split my code into the generic part and the arch specific callback that can
do validation of the ranges and also a bit of housekeeping.
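Just to illustrate the sort of thing that callback could do, here is a rough sketch (not
the actual code from this series; the 64K limit mirrors what the ARM code assumes today
and is only an example restriction):

	void pcibios_fixup_bridge_ranges(struct list_head *resources)
	{
		struct pci_host_bridge_window *window;

		list_for_each_entry(window, resources, list) {
			struct resource *res = window->res;

			/* example arch restriction: only 64K of I/O space */
			if (resource_type(res) == IORESOURCE_IO &&
			    resource_size(res) > SZ_64K) {
				pr_warn("%s: I/O range larger than 64K, truncating\n",
					res->name);
				res->end = res->start + SZ_64K - 1;
			}
		}
	}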
The next phase will be to see how much of the pci_controller structure can move
into the generic code and reduce its size. I do suffer from not having a PowerPC
platform where I can test the changes, but I can produce some compiled code that
someone can test.
>
> > > diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
> > > index 06ace62..feb8436 100644
> > > --- a/drivers/pci/host-bridge.c
> > > +++ b/drivers/pci/host-bridge.c
> > > @@ -6,9 +6,13 @@
> > > #include <linux/init.h>
> > > #include <linux/pci.h>
> > > #include <linux/module.h>
> > > +#include <linux/of_address.h>
> > > +#include <linux/of_pci.h>
> > >
> > > #include "pci.h"
> > >
> > > +static int domain_nr;
> > > +
> >
> > For correctness, I think you want an 'atomic_t' here and use
> > atomic_inc_return() to get a new value.
> >
> > > static struct pci_bus *find_pci_root_bus(struct pci_bus *bus)
> > > {
> > > while (bus->parent)
> > > @@ -91,3 +95,133 @@ void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
> > > res->end = region->end + offset;
> > > }
> > > EXPORT_SYMBOL(pcibios_bus_to_resource);
> > > +
> > > +/**
> > > + * pci_host_bridge_of_get_ranges - Parse PCI host bridge resources from DT
> > > + * @dev: device node of the host bridge having the range property
> > > + * @resources: list where the range of resources will be added after DT parsing
> > > + * @io_base: pointer to a variable that will contain the physical address for
> > > + * the start of the I/O range.
> > > + *
> > > + * If this function returns an error then the @resources list will be freed.
> > > + *
> > > + * This function will parse the "ranges" property of a PCI host bridge device
> > > + * node and setup the resource mapping based on its content. It is expected
> > > + * that the property conforms with the Power ePAPR document.
> > > + *
> > > + * Each architecture is then offered the chance of applying their own
> > > + * filtering of pci_host_bridge_windows based on their own restrictions by
> > > + * calling pcibios_fixup_bridge_ranges(). The filtered list of windows
> > > + * can then be used when creating a pci_host_bridge structure.
> > > + */
> > > +static int pci_host_bridge_of_get_ranges(struct device_node *dev,
> > > + struct list_head *resources, resource_size_t *io_base)
> > > +{
> > > + struct resource *res;
> > > + struct of_pci_range range;
> > > + struct of_pci_range_parser parser;
> > > + int err;
> > > +
> > > + pr_info("PCI host bridge %s ranges:\n", dev->full_name);
> > > +
> > > + /* Check for ranges property */
> > > + err = of_pci_range_parser_init(&parser, dev);
> > > + if (err)
> > > + return err;
> > > +
> > > + pr_debug("Parsing ranges property...\n");
> > > + for_each_of_pci_range(&parser, &range) {
> > > + /* Read next ranges element */
> > > + pr_debug("pci_space: 0x%08x pci_addr:0x%016llx ",
> > > + range.pci_space, range.pci_addr);
> > > + pr_debug("cpu_addr:0x%016llx size:0x%016llx\n",
> > > + range.cpu_addr, range.size);
> > > +
> > > + /*
> > > + * If we failed translation or got a zero-sized region
> > > + * then skip this range
> > > + */
> > > + if (range.cpu_addr == OF_BAD_ADDR || range.size == 0)
> > > + continue;
>
> Shouldn't that test move into the parsing helper ?
Good question. Should we then automatically skip the badly translated
ranges?
>
> > > + res = kzalloc(sizeof(struct resource), GFP_KERNEL);
> > > + if (!res) {
> > > + err = -ENOMEM;
> > > + goto bridge_ranges_nomem;
> > > + }
> > > +
> > > + of_pci_range_to_resource(&range, dev, res);
> > > +
> > > + if (resource_type(res) == IORESOURCE_IO)
> > > + *io_base = range.cpu_addr;
>
> You don't care about the size of the IO space ?
Not here. I will leave it to the platform specific hook to do the validation.
>
> > > + pci_add_resource_offset(resources, res,
> > > + res->start - range.pci_addr);
> > > + }
> >
> > This is not the correct resource for I/O space at all. Please talk
> > to Will, I've been over this with him in detail and he probably
> > understands it now. I assume you are both working in the same
> > building.
>
> Yes, the IO offsets work differently on powerpc as well
Can you check for powerpc in light of the changes I've made in the
of_pci_range_to_resource() function? (And please read range.cpu_addr instead
of range.pci_addr in the pci_address_to_pio() call. I'll post v3 shortly.)
>
> > Since this is common PCI code, you could also decide to open-code
> > the pci_add_resource_offset() function. If you don't do that, I
> > think you have a memory leak for the resources that you can avoid
> > by allocating the resource and pci_host_bridge_window structures
> > together with a single kzalloc.
> >
> > > + /* Apply architecture specific fixups for the ranges */
> > > + pcibios_fixup_bridge_ranges(resources);
> > > +
> > > + return 0;
> > > +
> > > +bridge_ranges_nomem:
> > > + pci_free_resource_list(resources);
> > > + return err;
> > > +}
> > > +
> > > +/**
> > > + * of_create_pci_host_bridge - Create a PCI host bridge structure using
> > > + * information passed in the DT.
> > > + * @parent: device owning this host bridge
> > > + * @ops: pci_ops associated with the host controller
> > > + * @host_data: opaque data structure used by the host controller.
> > > + *
> > > + * returns a pointer to the newly created pci_host_bridge structure, or
> > > + * NULL if the call failed.
> > > + *
> > > + * This function will try to obtain the host bridge domain number by
> > > + * using of_alias_get_id() call with "pci-domain" as a stem. If that
> > > + * fails, a local allocator will be used that will put each host bridge
> > > + * in a new domain.
> > > + */
> > > +struct pci_host_bridge *
> > > +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops, void *host_data)
> > > +{
> > > + int err, domain, busno;
> > > + struct resource bus_range;
> > > + struct pci_bus *root_bus;
> > > + struct pci_host_bridge *bridge;
> > > + resource_size_t io_base;
> > > + LIST_HEAD(res);
> > > +
> > > + domain = of_alias_get_id(parent->of_node, "pci-domain");
> > > + if (domain == -ENODEV)
> > > + domain = domain_nr++;
> > >
> We probably want some uniqueness testing here.
You mean you want one host bridge per domain? It's not entirely clear if there is
general consensus here on this, but I'm in favour of enforcing that.
Best regards,
Liviu
>
> > > + err = of_pci_parse_bus_range(parent->of_node, &bus_range);
> > > + if (err) {
> > > + dev_info(parent, "No bus range for %s, using default [0-255]\n",
> > > + parent->of_node->full_name);
> > > + bus_range.start = 0;
> > > + bus_range.end = 255;
> > > + bus_range.flags = IORESOURCE_BUS;
> > > + }
> > > + busno = bus_range.start;
> > > + pci_add_resource(&res, &bus_range);
> > > +
> > > + /* now parse the rest of host bridge bus ranges */
> > > + if (pci_host_bridge_of_get_ranges(parent->of_node, &res, &io_base))
> > > + return NULL;
> > > +
> > > + /* then create the root bus */
> > > + root_bus = pci_create_root_bus_in_domain(parent, domain, busno,
> > > + ops, host_data, &res);
> > > + if (!root_bus)
> > > + return NULL;
> >
> > Do we have any code that checks for conflicting domain/bus numbers here?
> > I guess pci_create_root_bus_in_domain() will fail if you have that.
> >
> > Since pci_create_root_bus_in_domain() is a new function that you just
> > introduced, it would be helpful to change the calling conventions
> > so it returns an error pointer instead of NULL upon failing.
> > of_create_pci_host_bridge() can do the same, but pci_create_root_bus()
> > should keep returning NULL so we don't have to change all the
> > callers.
> >
> > > + bridge = to_pci_host_bridge(root_bus->bridge);
> > > + bridge->io_base = io_base;
> > > +
> > > + return bridge;
> > > +}
> > > +EXPORT_SYMBOL_GPL(of_create_pci_host_bridge);
> > > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > > index 1eed009..0c5e269 100644
> > > --- a/include/linux/pci.h
> > > +++ b/include/linux/pci.h
> > > @@ -395,6 +395,7 @@ struct pci_host_bridge {
> > > struct device dev;
> > > struct pci_bus *bus; /* root bus */
> > > int domain_nr;
> > > + resource_size_t io_base; /* physical address for the start of I/O area */
> > > struct list_head windows; /* pci_host_bridge_windows */
> > > void (*release_fn)(struct pci_host_bridge *);
> > > void *release_data;
> >
> > What is the io_base used for here?
> >
> > > @@ -1786,11 +1787,23 @@ static inline struct device_node *pci_bus_to_OF_node(struct pci_bus *bus)
> > > return bus ? bus->dev.of_node : NULL;
> > > }
> > >
> > > +struct pci_host_bridge *
> > > +of_create_pci_host_bridge(struct device *parent, struct pci_ops *ops,
> > > + void *host_data);
> > > +
> > > +void pcibios_fixup_bridge_ranges(struct list_head *resources);
> > > #else /* CONFIG_OF */
> > > static inline void pci_set_of_node(struct pci_dev *dev) { }
> > > static inline void pci_release_of_node(struct pci_dev *dev) { }
> > > static inline void pci_set_bus_of_node(struct pci_bus *bus) { }
> > > static inline void pci_release_bus_of_node(struct pci_bus *bus) { }
> > > +
> > > +static inline struct pci_host_bridge *
> > > +pci_host_bridge_of_init(struct device *parent, struct pci_ops *ops,
> > > + void *host_data)
> > > +{
> > > + return NULL;
> > > +}
> > > #endif /* CONFIG_OF */
> > >
> > > #ifdef CONFIG_EEH
> > >
> >
On Thu, Feb 27, 2014 at 08:22:12PM +0000, Arnd Bergmann wrote:
> On Thursday 27 February 2014 13:07:29 Jason Gunthorpe wrote:
> > On Thu, Feb 27, 2014 at 08:48:08PM +0100, Arnd Bergmann wrote:
> > > > It also looks correct for architectures that use the CPU MMIO address
> > > > as the IO address directly (where IO_SPACE_LIMIT would be 4G)
> > >
> > > Are you aware of any that still do? I thought we had stopped doing
> > > that.
> >
> > I thought ia64 used to, but it has been a long time since I've touched
> > one...
>
> They have a different way of doing it now, no idea how it looked in
> the past:
>
> #define IO_SPACE_LIMIT 0xffffffffffffffffUL
>
> #define MAX_IO_SPACES_BITS 8
> #define MAX_IO_SPACES (1UL << MAX_IO_SPACES_BITS)
> #define IO_SPACE_BITS 24
> #define IO_SPACE_SIZE (1UL << IO_SPACE_BITS)
>
> #define IO_SPACE_NR(port) ((port) >> IO_SPACE_BITS)
> #define IO_SPACE_BASE(space) ((space) << IO_SPACE_BITS)
> #define IO_SPACE_PORT(port) ((port) & (IO_SPACE_SIZE - 1))
>
> #define IO_SPACE_SPARSE_ENCODING(p) ((((p) >> 2) << 12) | ((p) & 0xfff))
>
> So their port number is a logical token that contains the I/O space number
> and a 16MB offset.
>
> Apparently sparc64 uses physical memory addressing for I/O space, the
> same way they do for memory space, and they just set IO_SPACE_LIMIT to
> 0xffffffffffffffffUL.
>
> > > > Architectures that use the virtual IO window technique will always
> > > > require a custom pci_address_to_pio implementation.
> > >
> > > Hmm, at the moment we only call it from of_address_to_resource(),
> > > which in turn does not get called on PCI devices, and does not
> > > call pci_address_to_pio for 'simple' platform devices. The only
> > > case I can think of where it actually matters is when we have
> > > ISA devices in DT that use an I/O port address in the reg property,
> > > and that case hopefully won't happen on ARM32 or ARM64.
> >
> > Sure, I meant that after Liviu's patch it will become required, since he is
> > cleverly using it to figure out what I/O mapping the bridge driver
> > set up before calling the helper.
>
> Ok. I was arguing more that we should not add this dependency.
I've thought about this last night and I think I was trying to be too clever
for my own good. As Jason points out, arm64 needs its own version of
pci_address_to_pio(). I have an idea on how to borrow the powerpc/microblaze
one and make it useful without the need for pci_controller *hose. It would
be generic enough that other platforms using virtual I/O windows could use it,
but I'll start with it being defined for arm64 for discussion on this list.
I'll post v3 shortly.
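Purely as a sketch of the direction I have in mind (the io_range structure, the list and
the implied registration step are assumptions for illustration, not the actual v3 code):
host bridge drivers would register their physical I/O windows in a generic list, and
pci_address_to_pio() would walk it to hand out logical port numbers:

	struct io_range {
		struct list_head list;
		phys_addr_t start;	/* CPU physical address of the window */
		resource_size_t size;
	};

	static LIST_HEAD(io_range_list);

	unsigned long pci_address_to_pio(phys_addr_t address)
	{
		struct io_range *range;
		resource_size_t offset = 0;

		list_for_each_entry(range, &io_range_list, list) {
			if (address >= range->start &&
			    address < range->start + range->size)
				return offset + (address - range->start);
			offset += range->size;
		}

		return (unsigned long)-1;
	}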
Best regards,
Liviu
>
> > > > I think the legacy reasons for having all those layers of translation
> > > > are probably not applicable to ARM64, and it is much simpler without
> > > > the extra translation step....
> > > >
> > > > Arnd, what do you think?
> > >
> > > Either I don't like it or I misunderstand you ;-)
> > >
> > > Most PCI drivers normally don't call ioport_map or pci_iomap, so
> > > we can't just do it there. If you are thinking of calling ioport_map
> >
> > Okay, that was one of the 'legacy reasons'. Certainly lots of drivers
> > do call pci_iomap, but if you think legacy drivers that don't are
> > important to ARM64 then it makes sense to use the virtual IO window.
>
> I think all uses of I/O space are legacy, but I don't think that
> drivers doing inb/outb are more obsolete than those doing pci_iomap.
> It's got more to do with the subsystem requirements, e.g. libata
> requires the use of pci_iomap.
>
> > > for every PCI device that has an I/O BAR and storing the virtual
> > > address in the pci_dev resource, I don't see what that gains us
> >
> > Mainly we get to drop the fancy dynamic allocation stuff for the fixed
> > virtual window, and it gives the option to have a 1:1 relationship
> > between CPU addresses and PCI BARs.
>
> I don't think the allocation is much of a problem, as long as we
> can localize it in one function that is shared by everyone.
> The problems I saw were all about explaining to people how it
> works, but they really shouldn't have to know.
>
>
> Arnd
>
>